COMP.350/580.202 LAB: GENOME ANNOTATION
2/3/16

Reference on Annotation: www.cs.uml.edu/~kim/580/review_Annotation.pdf

LAB Experiments DUE 2/4 (Th) 5:00 PM

Write answers to the bold-faced questions through Experiment 3. If you can complete any questions beyond question 12 in Experiment 3, include them in your answers. Email the answers to [email protected].

Experiment 1: Find Repeats in DNA
Concept: Genomes consist, to a greater or lesser extent, of various types of repetitive DNA.

I. Create a Project
1. Go to http://www.dnasubway.org and sign up as a guest.
2. In DNA Subway, click the red square to annotate a genomic sequence.
3. Under ‘Select Organism type,’ select ‘Plant’ and ‘Dicotyledon.’
4. Select ‘Select a sample sequence,’ and pick Arabidopsis thaliana (mouse-ear cress) Synthetic Contig.
5. Provide a title (required) and a project description (optional), then click Continue.

II. Identify and Mask Repeats
1. Click RepeatMasker.
2. Once the bullet has finished blinking, click RepeatMasker again to view a listing of the repetitive DNA sequences RepeatMasker has identified and masked.
3. How many and which types of repetitive DNA did RepeatMasker identify? What are their lengths? Can you identify any association between types and length ranges?
4. Close the table to return to DNA Subway.
5. Click Local Browser to view the results in a graphical interface.
6. Maximize the browser window.
7. Change Show 10 kb to Show 25 kbp.
8. How many and which types of repetitive DNA does the browser display?
9. Which of the two views, table or graphics, would you find easier to work with?
10. Close the Local Browser screen to return to DNA Subway.

Experiment 2: Predict Genes in DNA
Genes can be identified by their characteristics. Where do gene predictors “see” genes?

1. Click Augustus.
2. Once Augustus has finished, click FGenesH. Then click SNAP. Finally, click tRNA Scan. (The Augustus, FGenesH and SNAP algorithms predict protein-coding genes; tRNA Scan identifies tRNA genes.)
3. Which program runs significantly longer than any other?
4. Again, view the results in the table view and the Local Browser. How many genes did the gene predictors predict? What kinds of structures can you identify in the browser? What do the different structural elements symbolize?
5. Do the different programs predict the same genes? Can you identify differences among the predictions? Which do you think got it right?

Experiment 3: Insert a Start Codon into a Gene
Genes have a beginning and an end.

1. Click Apollo.
2. Click Tiers and select Expand Tiers to view all of the available evidence. (Apollo initially collapses each evidence type onto a single line, regardless of how many pieces of evidence are available for each position.)
3. Describe how gene features are displayed by Apollo. Does Apollo use the same graphical elements as the browser, or different ones?
4. Compare and contrast the predicted gene models for the four locations. Zoom, pan and scroll to nucleotide position 600-1,600 until you can comfortably view details for a gene on the forward strand in this location.
5. Compare the predictions with each other. What similarities and differences can you identify?
6. Discrepancies between gene predictions and biological evidence consist of inaccurate transcriptional start and termination sites and, as a result, inaccurate 5’- and 3’-untranslated regions. The cause is that first and last exons are difficult to predict, because transcriptional start and termination sites do not follow easily discernible patterns.
7. Double-click the FGenesH prediction and move it onto the workspace.
8. What is the meaning of the green and red lines that appeared at the ends of this prediction when you moved it onto the workspace? Zoom into the beginning of the gene until you can discern the nucleotide triplet in the position to the left. What does the green highlight indicate?
9. Zoom and pan to the end of the gene to examine the meaning of the red highlight. What nucleotide triplet do you find? What is its meaning?
10. Do these findings agree with what you know about molecular biology? Explain how a G on DNA ends up being a G on mRNA instead of a C.
11. Zoom out to view the region from position 600 to position 1,600.
12. Double-click the Augustus prediction and move it onto the workspace. What structures can you identify in this model? Zoom into the model until you can discern the individual letters of the sequences. What does the filled box indicate? What about the open part of the box?
13. The Augustus-predicted model does not seem to include a start codon. To fix this, move your cursor to the top of the Apollo screen, where you should be able to identify three rows of green ticks and three rows of red ticks. What do you think these represent? (Hint: zoom into the locations of a few of these ticks and check the sequence associated with each of them.)
14. Drag the first green tick located within the boundaries of the Augustus-predicted gene model onto the model in your workspace and let go. Describe the result of this action.
15. The FGenesH prediction and the Augustus prediction for this gene are not mutually exclusive; explain why this is so. What parts of genes do you think FGenesH is programmed to predict? How about Augustus?

Experiment 4: Examine Spliced Genes
What sequence patterns signify splice sites?

1. Zoom, pan and scroll to nucleotide position 2,000-5,600 until you can comfortably view details for a gene on the forward strand in this location.
2. Compare the predictions with each other. What similarities and differences can you identify?
3. What would you need in order to decide which of the predictions is correct?
4. Double-click each of the three predictions and move them onto the workspace.
5. Determine the pattern that signifies the borders between exons and introns (splice sites):
   a. Zoom into the first exon-intron border of the Augustus-derived model (position 2387/2388) until you can read the nucleotide sequence.
   b. Record the last three nucleotides of the exon and the first three nucleotides of the following intron.
   c. Pan to the next exon-intron border (position 3017/3018) and repeat.
   d. Repeat again for the last exon-intron border at position 4353/4354.
   e. Pan to the first intron-exon border (position 2694/2695) and record the last three nucleotides of the intron and the first three nucleotides of the following exon.
   f. Repeat for the intron-exon borders at positions 3723/3724 and 4445/4446.
   g. Determine the nucleotide sequence pattern for exon-intron borders and the pattern for intron-exon borders.
   h. Refine your findings by conducting the same analysis for the exon-intron border of the Augustus-derived gene model in position 6,400-9,200.
6. Zoom out to view the region from position 2,000 to position 5,600 again.
7. What differentiates the Augustus-predicted model from the FGenesH-predicted model? Which of the two does SNAP emulate?
8. Move on to the other two locations that contain predicted genes and determine the differences between the models predicted by the three algorithms Augustus, FGenesH and SNAP.
9. If you find different predictions leading to conflicting models, explain what would be required to decide which gene prediction got it right.
10. To conclude your work, click the File menu tab and select Upload to DNA Subway.
11. Close Apollo to return to DNA Subway.

Experiment 5: Identify Biological Evidence
Protein-coding genes are transcribed into RNA, which is processed into mRNA, which is translated into proteins. Where can one find material evidence for the genes in this contig?

1. Click the BLAST buttons to search databases of known genes and transcripts, such as cDNAs or ESTs (BLASTN), and proteins (BLASTX) for sequences that match the genomic DNA sequence. (To brush up on how mRNA is isolated and converted into expressed sequence tags (ESTs) and complementary DNA (cDNA), click the Background button at the bottom of the DNA Subway screen.)
2. View BLASTN and BLASTX matches in the table view and the Local Browser.
3. For how many predicted genes did BLAST generate biological evidence?
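
The hint in Experiment 3, step 13 (zoom into a few ticks and check the sequence under each) can also be followed outside Apollo. The Python sketch below lists, for each of three forward reading frames, the positions at which a chosen set of triplets occurs. It is only an illustration under stated assumptions: the function name, the toy sequence and the frame numbering (frame = (position - 1) mod 3) are hypothetical and are not taken from Apollo or DNA Subway.

```python
def triplet_positions_by_frame(seq, triplets):
    """Map each forward frame (0, 1, 2) to the 1-based start positions
    of the triplets of interest that begin in that frame.

    Assumption: frame = (position - 1) % 3. Apollo may number its
    three tick rows differently.
    """
    seq = seq.upper()
    wanted = {t.upper() for t in triplets}
    hits = {0: [], 1: [], 2: []}
    for i in range(len(seq) - 2):
        if seq[i:i + 3] in wanted:
            hits[i % 3].append(i + 1)  # report 1-based positions
    return hits


# Hypothetical toy sequence, NOT the lab contig.
toy = "CATGAATGTAA"
print(triplet_positions_by_frame(toy, ["ATG"]))
print(triplet_positions_by_frame(toy, ["TAA", "TAG", "TGA"]))
```

Running the same function on the contig sequence with the triplets you read under the green ticks, and then with those under the red ticks, gives one list per frame that can be compared with the three rows of each colour.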
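
The bookkeeping in Experiment 4, step 5 (recording three nucleotides on each side of every border) can be sketched in Python as well. This is a minimal sketch, assuming the contig is available as a plain string and that Apollo's positions are 1-based and line up with that string; the helper names and the toy sequence are hypothetical. A border written as 2387/2388 is passed as its left-hand position, 2387.

```python
from collections import Counter


def border_flanks(seq, left_pos, n=3):
    """Return (last n nt left of the border, first n nt right of it)
    for a border written as left_pos/left_pos+1 in 1-based positions."""
    if left_pos - n < 0 or left_pos + n > len(seq):
        raise ValueError("border too close to the end of the sequence")
    # With 1-based left_pos, seq[left_pos] is the first base to the right.
    return seq[left_pos - n:left_pos], seq[left_pos:left_pos + n]


def tally_flanks(seq, left_positions, n=3):
    """Count how often each (left flank, right flank) pair occurs."""
    return Counter(border_flanks(seq, p, n) for p in left_positions)


# Hypothetical toy sequence chosen so the indexing is easy to check.
toy = "AAACCCGGGTTT"
print(border_flanks(toy, 6))      # flanks of border 6/7
print(tally_flanks(toy, [3, 6]))

# With the real contig loaded into a string `contig` (not provided here):
# exon_intron = tally_flanks(contig, [2387, 3017, 4353])
# intron_exon = tally_flanks(contig, [2694, 3723, 4445])
```

Tallying the exon-intron and intron-exon borders separately keeps the two patterns apart, which is what step 5g asks for.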