GenBank Searches
Biol 400 F2016
Names: _____________________________________ Section: ______

GENBANK WORKSHEET

You will work in pairs. Each pair will turn in ONE GenBank worksheet. Please write both of your names at the top of this sheet.

GenBank is an amazing resource with sequence information from thousands of different organisms. This worksheet will help familiarize you with the information stored in GenBank and help you learn to access it freely.

1. Select "GenBank" from the "links" page on the Bio 400 homepage.
2. Read "What is GenBank".
3. Click on the annotated sample "record" link to go to a sample GenBank record for a Saccharomyces cerevisiae DNA sequence. Read over this example and answer the following questions. (Note: you can click on any of the words written in blue for a description of that word or abbreviation.)
   a. What information does the "locus" give you? __________________________________
   b. How many genes are present in this sequence? ___________________
   c. Which of these genes is only partially included in this sequence? _____________
   d. What TWO pieces of information in this record allowed you to determine that it was a partial sequence?
      1)
      2)
   e. What is the 2nd amino acid of the REV7 gene product? ____________
   f. Write the 6 base pair nucleotide sequence surrounding the start site of translation for AXL2. Hint: What does CDS stand for?
      ____ ____ ____ ____ ____ ____
       -3   -2   -1   +1   +2   +3

SEARCHING GENBANK

Now that you are familiar with a GenBank record, you will perform a GenBank search of your own. You are interested in finding the nucleotide sequence of the gene that codes for an Arabidopsis thaliana chromatin remodeling protein called SWI3B, or switch subunit 3.

1. Go back to the GenBank home page.
2. Select "Search GenBank" from the menu on the right side of the page under GenBank Resources.
   This will take you to a new window that will allow you to search for nucleotide sequences present in GenBank. (If you want to search for other sequences, e.g. protein or SNPs, you can do so by selecting a different database from the drop-down menu in the box that reads "nucleotide" at the very top of the page.)
3. Using the GenBank Search function, you should be able to identify several different Arabidopsis thaliana nucleotide sequences that contain all or part of the gene "AtSWI3B". Note that there are also sequences from entire chromosomes, but it is hard to find your gene sequence in such a large set of nucleotides, so don't choose these!
4. Select a record that contains the complete coding sequence for AtSWI3B.
   Accession number for this record (e.g. NM 354907): __________________
5. Click on the link to access that GenBank record and answer these questions:
   a. How do you know this entry contains a complete coding sequence for Arabidopsis switch subunit 3 (AtSWI3B)?
   b. What is the common name for Arabidopsis thaliana? __________________
   c. How long is the AtSWI3B sequence? _____________________________
   d. What is the base pair number of the first base pair of the start codon? ________
   e. Write the sequence surrounding the start codon from -3 to +3 (where +1 is the "A" of the start codon):
6. To learn more about this gene, click on the "gene" link. A box should open up. In the box, select Gene ID 817927 and answer the following questions.
   a. What chromosome is AtSWI3B located on? ____
   b. What gene is located closest to the 5' end of AtSWI3B on the chromosome? _________
   c. Scroll down to "Interactions". What are some proteins (not gene locus numbers) that AtSWI3B has been found to interact with? ___________________________
   d. Under bibliography (on the right), follow the PubMed link to find five abstracts for publications about AtSWI3B. What evidence is there that this chromatin remodeler is involved in gene silencing?
      What evidence is there that this chromatin remodeler is involved in gene activation?
      What evidence directly indicates that this chromatin remodeler is involved in stress adaptation (hint: look at the abstract by Saez et al.)?

BLASTing (BLAST = Basic Local Alignment Search Tool)

Imagine that you have cloned and sequenced a portion of an Arabidopsis gene:

gtgaacccgt caacccttga acctcggctg gcaagtctaa tcaaaggcag gcagttaaat

The questions you want to ask are:
1. Does this sequence match an existing sequenced gene?
2. Is this gene unique to Arabidopsis or does it have orthologs in other species?

To answer these questions you need to search online genome databases:
1. Select "BLAST" from the "links" page on the Bio 400 web page, or google "BLAST search": http://blast.ncbi.nlm.nih.gov/Blast.cgi
2. Under Basic BLAST, select "Nucleotide BLAST".
3. Paste your sequence into the "Enter Query Sequence" box. (The sequence can be copied and pasted from the Word document "BioinformaticsSequence" in the Assignments folder on the Bio 400 web page.)
4. Select a database to search from the dropdown menu. For the broadest search, use the nucleotide collection (nr/nt).
5. In the Program Selection box, click the ? button next to "choose a BLAST algorithm" and read the information.
   a. Which type of search will best allow you to answer your first question (at the top of this page)? ______________________ Why?
   b. Which type of search would be best to answer your 2nd question? _____________ Why?
6. You decide to focus on your 2nd question. Choose your algorithm and then click the blue "BLAST" button.
7. Your request is now being processed. The sequence you entered is being compared to every one of the millions of sequences in the database you selected. Isn't that amazing?
8. Your results will be presented in graphic format. Scroll down to see the pairwise alignments below the graph (or click on a bar inside the graph and you will be taken to that sequence alignment).
   See if you can figure out what the parts of the graph represent.
   a. What does the length of each line indicate? _____________________________
   b. What does a red color indicate compared to a pink color?
   c. Can you find any potential orthologs of this sequence in organisms other than Arabidopsis? If so, what species? ________________________ (Note: if you didn't identify any non-Arabidopsis sequences, you can go back to the BLAST search page and broaden your search.)
9. From the NCBI website: "E Value (Expectation Value) describes the likelihood that a sequence with a similar score will occur in the database by chance. The smaller the E Value, the more significant the alignment. For example, an alignment with a very low E value of e-117 means that a sequence with a similar score is very unlikely to have matched your sequence simply by chance."
   Do alignments of your sequence with those in other species have higher or lower E values than alignments with A. thaliana sequences? ______________
   What does this suggest?
10. Click on the blue GenBank link for the entry that represents the Arabidopsis gene with an exact match over these 60 base pairs. This will take you to the GenBank record for this gene.
11. Arabidopsis has 5 chromosomes. Each gene in the Arabidopsis genome has a unique identifier (locus tag) in the format AtNgNNNNN. The "At" refers to "Arabidopsis thaliana", the "N" in "Ng" refers to the chromosome number, and the final 5 digits refer to the location on the chromosome. The genes are numbered sequentially along the chromosome. Look through the GenBank record to identify the Arabidopsis locus tag for your sequence and write it below.
    Arabidopsis locus tag: ___________________
    What chromosome is this gene located on? ____
12. What is the gene's three-letter acronym (usually followed by a number if the gene belongs to a family of genes)? __________________
13. As you did earlier, to learn more about this gene, click on the "gene" link on the left side of the page.
    In the box that opens up, click on the gene ID number (844241).
    a. What is the function of this gene product? ____________________________
    b. What is one thing you learned about HAC1 from one of the abstracts listed?

Congratulations! You have found the Arabidopsis gene that you will be studying in this class! Turn in your worksheet to a TA when you are finished.
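
Optional: two of the tasks in this worksheet (writing the -3 to +3 bases around a start codon, and reading the chromosome number out of an AtNgNNNNN locus tag) are mechanical enough to double-check with a few lines of Python. The sketch below is only an illustration. The function names are made up for this example and are not part of GenBank or any NCBI tool, and the sequence and locus tag used in the usage lines are arbitrary placeholders, not answers to any question above. It assumes you already know the 1-based position of the "A" of the start codon (the first number in the CDS feature) and that the locus tag is on one of the 5 nuclear chromosomes described in step 11.

```python
import re


def start_codon_window(sequence: str, cds_start: int) -> str:
    """Return the six bases at positions -3, -2, -1, +1, +2, +3.

    sequence  : DNA text, e.g. pasted from the ORIGIN block of a record;
                position numbers and spaces are stripped out.
    cds_start : 1-based position of the "A" of the start codon (+1).
    """
    # Keep letters only, so "1 gatcctccat ggttctg..." style text works.
    bases = re.sub(r"[^A-Za-z]", "", sequence).lower()
    plus_one = cds_start - 1  # convert to a 0-based index
    if plus_one < 3 or plus_one + 3 > len(bases):
        raise ValueError("not enough sequence on both sides of the start codon")
    # Three bases upstream (-3..-1) plus the first three bases of the CDS.
    return bases[plus_one - 3:plus_one + 3]


# Assumption: only the nuclear-chromosome format from step 11 (At1g..At5g).
LOCUS_TAG = re.compile(r"^At([1-5])g(\d{5})$", re.IGNORECASE)


def chromosome_from_locus_tag(tag: str) -> int:
    """Return the chromosome number encoded in an AtNgNNNNN locus tag."""
    match = LOCUS_TAG.match(tag.strip())
    if match is None:
        raise ValueError(f"not in AtNgNNNNN format: {tag!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # Made-up 12-base sequence with its start codon beginning at position 7.
    print(start_codon_window("1 ccaaccatgg ct", 7))
    # Arbitrary placeholder tag, used only to show the format.
    print(chromosome_from_locus_tag("At3g12345"))
```

To use it on a real record, paste the sequence text from the record in place of the made-up sequence and use the CDS start position you wrote down in question 5d. The code only slices text, so you still need to read the record yourself to find that position.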