GenBank Searches

Biol 400 F2016
Names: _____________________________________
Section: ______
GENBANK WORKSHEET
You will work in pairs. Each pair of two students will turn in ONE GenBank worksheet.
Please write both of your names on the top of this sheet.
GenBank is an amazing resource with sequence information from thousands of different
organisms. This worksheet will help familiarize you with the information stored on GenBank
and help you learn to access it freely.
1. Select "GenBank" from the “links” page on the Bio 400 homepage.
2. Read "What is GenBank".
3. Click on the annotated sample "record" link to open the sample GenBank record
for a Saccharomyces cerevisiae DNA sequence.
Read over this example and answer the following questions:
(Note: you can click on any of the words written in blue for a description of that word or
abbreviation)
a. What information does the “locus” give you? __________________________________
b. How many genes are present in this sequence? ___________________
c. Which of these genes is only partially included in this sequence? _____________
d. What TWO pieces of information in this record allowed you to determine that it was
a partial sequence?
1)
2)
e. What is the 2nd amino acid of the REV7 gene product? ____________
f. Write the 6-base-pair nucleotide sequence surrounding the start site of translation for
AXL2. Hint: What does CDS stand for?
____ ____ ____ ____ ____ ____
 -3   -2   -1   +1   +2   +3
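The position labels in question f can be illustrated with a short Python sketch. It assumes the CDS coordinate from a GenBank record is 1-based, that +1 is the first base of the start codon, and that there is no position 0. The function name `start_context` and the toy sequence are invented for illustration and are not taken from the sample record.

```python
def start_context(sequence: str, cds_start: int) -> dict[int, str]:
    """Map positions -3..+3 (no position 0) to the bases around a CDS start.

    cds_start is the 1-based coordinate of the first base of the start codon,
    as written in the CDS line of a GenBank record.
    """
    if cds_start < 4 or cds_start + 2 > len(sequence):
        raise ValueError("need 3 bases on each side of the start coordinate")
    zero_based = cds_start - 1  # Python strings are 0-indexed
    window = sequence[zero_based - 3 : zero_based + 3]
    labels = [-3, -2, -1, 1, 2, 3]
    return dict(zip(labels, window))


# Toy sequence, invented for illustration; its CDS is assumed to start at base 7.
toy = "ccgaccatggcttaa"
context = start_context(toy, 7)
for position, base in context.items():
    print(f"{position:+d}: {base}")
```

For a real record, the sequence would come from the ORIGIN section and the coordinate from the CDS feature.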
SEARCHING GENBANK
Now that you are familiar with a GenBank record, you will perform a GenBank search of your
own. You are interested in finding the nucleotide sequence of the gene that codes for an
Arabidopsis thaliana chromatin remodeling protein called SWI3B, or switch subunit 3.
1. Go back to the GenBank home page.
2. Select “Search GenBank” from the menu on the right side of the page under GenBank
Resources. This will take you to a new window that will allow you to search for nucleotide
sequences present in GenBank. (If you want to search for other sequences, e.g. protein or SNPs,
you can do this by selecting a different database in the drop-down menu in the box that
reads “nucleotide” at the very top of the page.)
3. Using the GenBank Search function you should be able to identify several different
Arabidopsis thaliana nucleotide sequences that contain all or part of the gene "AtSWI3B". Note
that there are also sequences from entire chromosomes, but it is hard to find your gene
sequence in this large set of nucleotides so don’t choose these!
4. Select a record that contains the complete coding sequence for AtSWI3B
- Accession number for this record (e.g. NM_354907): __________________
5. Click on the link to access that GenBank record and answer these questions:
a. How do you know this entry contains a complete coding sequence for Arabidopsis
switch subunit 3 (AtSWI3B)?
b. What is the common name for Arabidopsis thaliana? __________________
c. How long (in base pairs) is the AtSWI3B sequence? _____________________________
d. What is the base pair number of the first base pair of the start codon? ________
e. Write the sequence surrounding the start codon from -3 to +3
(where +1 is the “A” of the start codon):
6. To learn more about this gene, click on the “gene” link. A box should open up. In the box
select: Gene ID 817927 and answer the following questions.
a. What chromosome is AtSWI3B located on? ____
b. What gene is located closest to the 5’ end of AtSWI3B on the chromosome? _________
c. Scroll down to “Interactions”. What are some proteins (not gene locus numbers) that
AtSWI3B has been found to interact with?
___________________________
d. Under Bibliography (on the right), follow the PubMed link to find five abstracts for
publications about AtSWI3B.
What evidence is there that this chromatin remodeler is involved in gene silencing?
What evidence is there that this chromatin remodeler is involved in gene activation?
What evidence directly indicates that this chromatin remodeler is involved in
stress adaptation (hint: look at the abstract by Saez et al.)?
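The search in steps 2 to 4 of this section can also be written as a URL for NCBI's E-utilities ESearch service. The Python sketch below only builds that URL and makes no network request. The function name `build_esearch_url` is hypothetical, and the choices of `nuccore` as the nucleotide database name, the `[Organism]` field tag, and `retmax=20` are assumptions for illustration rather than part of the worksheet.

```python
from urllib.parse import urlencode

# Base address of the ESearch utility.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(gene: str, organism: str,
                      database: str = "nuccore", max_records: int = 20) -> str:
    """Build an ESearch URL for a gene name restricted to one organism."""
    # The gene name is searched in all fields; the organism is limited to its own field.
    term = f'{gene} AND "{organism}"[Organism]'
    params = {"db": database, "term": term, "retmax": max_records}
    return f"{ESEARCH}?{urlencode(params)}"


url = build_esearch_url("AtSWI3B", "Arabidopsis thaliana")
print(url)
```

Pasting the printed URL into a browser should return an XML list of record IDs. The web search form used in this worksheet remains the intended route.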
BLASTing (BLAST = Basic Local Alignment Search Tool)
Imagine that you have cloned and sequenced a portion of an Arabidopsis gene.
gtgaacccgt caacccttga acctcggctg gcaagtctaa tcaaaggcag gcagttaaat
The questions you want to ask are:
1. Does this sequence match an existing sequenced gene?
2. Is this gene unique to Arabidopsis or does it have orthologs in other species?
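The query sequence above is printed in GenBank-style blocks of ten bases. A minimal Python sketch for stripping the spaces, checking the length, and wrapping the result as FASTA before pasting it into a search form is shown here. The helper names and the FASTA label `worksheet_query` are invented for illustration.

```python
raw_query = "gtgaacccgt caacccttga acctcggctg gcaagtctaa tcaaaggcag gcagttaaat"


def clean_sequence(raw: str) -> str:
    """Keep only letters (dropping spaces and position numbers) and lowercase them."""
    bases = "".join(ch for ch in raw if ch.isalpha()).lower()
    unexpected = set(bases) - set("acgtn")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return bases


def to_fasta(name: str, bases: str, width: int = 60) -> str:
    """Format a sequence as FASTA: a '>' header line, then fixed-width sequence lines."""
    lines = [bases[i:i + width] for i in range(0, len(bases), width)]
    return f">{name}\n" + "\n".join(lines)


query = clean_sequence(raw_query)
print(len(query))
print(to_fasta("worksheet_query", query))
```

The cleaned query is 60 bases long.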
To answer these questions you need to search online genome databases:
1. Select "BLAST" from the “links” page on the Bio 400 web page, or search Google for “BLAST search”:
http://blast.ncbi.nlm.nih.gov/Blast.cgi
2. Under Basic BLAST, select “Nucleotide BLAST”.
3. Paste your sequence into the “Enter Query Sequence” box. (The sequence can be copied and
pasted from the Word document “BioinformaticsSequence” in the Assignments folder on the
Bio 400 web page.)
4. Select a database from the dropdown menu to search. For the broadest search, use the
nucleotide collection (nr/nt).
5. In the Program Selection box, click the ? button next to “Choose a BLAST algorithm” and
read the information.
a. Which type of search will best allow you to answer your first question (listed
above)? ______________________
Why?
b. Which type of search would be best to answer your 2nd question? _____________
Why?
6. You decide to focus on your 2nd question. Choose your algorithm and then click the blue
"BLAST" button.
7. Your request is now being processed. The sequence you entered is being compared to
every one of the millions of sequences in the database you selected. Isn’t that amazing?
8. Your results will be presented in graphic format. Scroll down to see the pair-wise
alignments below the graph (or click on a bar inside the graph and you will be taken to that
sequence alignment). See if you can figure out what the parts of the graph represent.
a. What does the length of each line indicate? _____________________________
b. What does a red color indicate compared to a pink color?
c. Can you find any potential orthologs of this sequence in organisms other than
Arabidopsis? If so, what species?
________________________
(Note: if you didn’t identify any non-Arabidopsis sequences, you can go back to the BLAST
search page and broaden your search.)
9. From the NCBI website: "E Value (Expectation Value) describes the likelihood that a
sequence with a similar score will occur in the database by chance. The smaller the E Value,
the more significant the alignment. For example, an alignment with a very low E value of e-117
means that a sequence with a similar score is very unlikely to have matched your sequence
simply by chance."
Do alignments of your sequence with those in other species have
higher or lower E-values than alignments with A. thaliana sequences? ______________
What does this suggest?
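To make the E value in step 9 more concrete, here is a rough Python sketch of the commonly cited relation E = m × n × 2^(-S'), where S' is the alignment's bit score, m is the query length, and n is the total length of the database. It is a simplification: as far as I know, BLAST itself uses corrected "effective" lengths, so values from this formula will not match a real report exactly. The function name `expect_value` and all numbers are invented for illustration.

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E value from a bit score using E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Invented numbers: a 60-base query searched against a database of one billion bases.
query_length = 60
database_length = 10**9

for bit_score in (30, 60, 111):
    e_value = expect_value(bit_score, query_length, database_length)
    print(f"bit score {bit_score:>3}: E is about {e_value:.3g}")
```

Under this relation, each extra bit of score halves the E value, and doubling the database size doubles it.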
10. Click on the blue GenBank link for the entry that represents the Arabidopsis gene with an
exact match over these 60 base pairs. This will take you to the GenBank Record for this gene.
11. Arabidopsis has 5 chromosomes. Each gene in the Arabidopsis genome has a unique identifier
(locus tag) in the format AtNgNNNNN. The At refers to “Arabidopsis thaliana”, the "N" in Ng
refers to the chromosome number, and the final 5 digits refer to the location on the
chromosome. The genes are numbered sequentially along the chromosome. Look through the
GenBank record to identify the Arabidopsis locus tag for your sequence and write it below.
Arabidopsis locus tag: ___________________
What chromosome is this gene located on? ____
12. What is the gene’s three-letter acronym (usually followed by a number if the gene belongs
to a family of genes)? __________________
13. As you did earlier, to learn more about this gene, click on the “gene” link on the left side of
the page. In the box that opens up, you can then click on the gene ID number (844241).
a. What is the function of this gene product? ____________________________
b. What is one thing you learned about HAC1 from one of the abstracts listed?
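The locus tag format described in step 11 (AtNgNNNNN) can be taken apart with a small Python sketch. It assumes tags are matched case-insensitively and accepts only the five chromosomes mentioned in the worksheet. The function name `parse_locus_tag` and the example tags are made up and do not refer to any particular gene.

```python
import re

# "At", a chromosome digit 1-5, "g", then a five-digit position identifier.
LOCUS_TAG = re.compile(r"^At([1-5])g(\d{5})$", re.IGNORECASE)


def parse_locus_tag(tag: str) -> tuple[int, int]:
    """Return (chromosome number, five-digit identifier) for an AtNgNNNNN tag."""
    match = LOCUS_TAG.match(tag.strip())
    if match is None:
        raise ValueError(f"not an AtNgNNNNN locus tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


# Made-up tag, used only to show the format.
chromosome, identifier = parse_locus_tag("At3g12345")
print(f"chromosome {chromosome}, identifier {identifier}")
```

Because genes are numbered sequentially along each chromosome, two tags with the same chromosome digit can be compared by their identifiers to see which gene comes first.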
Congratulations! You have found the Arabidopsis gene that you will be
studying in this class!
Turn in your worksheet to a TA when you are finished.