MBoMS Genomics of Model Microbes
Lab 5: Recap of TaxPlot and Alignments

Comparing Microbial Genomes
• Last time you learned how to find specific genes in a target microbial genome
  – Often, you will be less interested in a specific gene or protein and more interested in making broader comparisons between genomes
• The exercises in this lab will teach you how to use several NCBI-based genome comparison tools

Exercise 1
• Go to the Microbial Genome Resource Page
• Find the tool box on the right edge
• Click on TaxPlot
  – Use the help page to learn a bit about TaxPlot

TaxPlot
• What is TaxPlot?
  – A three-way genome comparison tool based on pre-computed protein BLASTP E-values
  – It displays a point for each protein in the reference genome, based on that protein's best alignment with proteins in each of the two genomes being compared
• What is new about TaxPlot?
  – It employs the BLAST Score Ratio (BSR) approach, which classifies all putative peptides within three genomes using a similarity measure based on the ratio of BLAST scores
  – BSR analysis departs from traditional genome-scale analyses: it overcomes the limitations of BLAST E-values in comparative studies by normalizing the BLAST raw scores (each score is divided by the score of the reference protein aligned against itself)
• What does TaxPlot provide?
  – The output of the BSR analysis enables global visualization of the degree of proteome similarity among all three genomes
  – Additional output allows the genomic synteny (conserved gene order) between each genome pair to be assessed
  – The synteny analysis is overlaid with BSR data as a color dimension, showing how similar the compared peptides are
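The BLAST Score Ratio idea described above can be sketched in a few lines of Python. This is a minimal sketch, not TaxPlot's implementation. It assumes that BSR is a reference protein's raw BLAST score against its best hit in a genome divided by that protein's raw score against itself. The 0.05 "equal" tolerance, the function names (`bsr`, `classify`, `three_way`) and all scores are hypothetical.

```python
"""Three-way BLAST Score Ratio (BSR) comparison (illustrative sketch).

Assumptions (hypothetical, not TaxPlot's actual code):
- BSR = raw BLAST score of a reference protein against its best hit in a
  genome, divided by the raw score of that protein aligned to itself;
- a protein with no hit in a genome gets a score of 0 for that genome;
- BSR values within `tolerance` of each other are counted as "equal".
"""
from collections import Counter
from typing import Dict, Optional, Tuple


def bsr(hit_score: float, self_score: float) -> float:
    """Normalize a raw BLAST score by the protein's self-alignment score."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return hit_score / self_score


def classify(bsr_a: float, bsr_b: float, tolerance: float = 0.05) -> str:
    """Say which of the two compared genomes holds the closer homolog."""
    if abs(bsr_a - bsr_b) <= tolerance:
        return "equal"
    return "better_a" if bsr_a > bsr_b else "better_b"


Scores = Tuple[float, Optional[float], Optional[float]]


def three_way(proteins: Dict[str, Scores], tolerance: float = 0.05) -> Counter:
    """Tally a classification for every reference protein.

    `proteins` maps protein id -> (self_score, score_in_a, score_in_b);
    None means the protein had no hit in that genome.
    """
    counts: Counter = Counter()
    for self_score, score_a, score_b in proteins.values():
        if score_a is None and score_b is None:
            counts["no_hit"] += 1  # counted separately from the hits
            continue
        a = bsr(score_a or 0.0, self_score)
        b = bsr(score_b or 0.0, self_score)
        counts[classify(a, b, tolerance)] += 1
    return counts


# Hypothetical raw scores for four reference proteins.
example = {
    "p1": (500.0, 480.0, 478.0),  # about equally similar in both genomes
    "p2": (300.0, 290.0, 150.0),  # much closer homolog in genome A
    "p3": (200.0, 60.0, 180.0),   # much closer homolog in genome B
    "p4": (400.0, None, None),    # no homolog in either genome
}
print(dict(three_way(example)))
```

Dividing by the self-score puts every protein on roughly a 0-to-1 scale. A short protein and a long protein with equally good matches therefore get similar BSR values, even though their raw scores and E-values can differ a lot.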
Exercise 2
• Now use TaxPlot to compare multiple genomes from each of your two species
  – In TaxPlot, choose 3 genomes for one of your species
    • You can scroll through the options at the top of the page to find all available genomes for each species
  – Let TaxPlot calculate the relationships among the 3 genomes
  – Repeat for the second species

Sample TaxPlot: E. coli 101 vs. E. coli K12 vs. E. coli O157:H7
[Figure annotations: 4238 query proteins produced 3645 hits; K12 and 101 share the greatest similarity (2120 hits); K12 and O157 share a large number of proteins that are equally similar to 101 (1002 equal hits); 523 hits closer to O157]

Sample TaxPlot Results
• 4238 proteins in 101 were compared in a 101 x K12 x O157 three-way comparison, producing 3645 hits (1002 + 2120 + 523)
  – 1002 of the comparisons were equivalent for K12 and O157
  – 2120 of the comparisons had "better" scores for K12
  – 523 of the comparisons had "better" scores for O157
• You can click on any of the circles to get details of the specific gene(s)

Exercise 2, cont.
• Go to the TaxPlot results for each of your two species
  – Compare and contrast the results for the two species
  – Are the genomes from one species more or less similar than the genomes from the second species?
  – Do the plots show high levels of synteny (the genes are in the same order or same place in the genomes)?
[Figure panels: Ec1 x Ec2 x Ec3; Ec1 x Ec2 x Se1; Ec1 x Ec2 x Bc1; Ec1 x Ec2 x Ba1]

Table 1. TaxPlot percent better scores
        ec1   ec2   ec3   se1   se2   se3
ec1     100    75    85    62    62    61
ec2      38   100    36    63    63    63
ec3      83    73   100    76    75    76
se1      38    38    38   100    30    11
se2      71    71    71    80   100    94
se3      50    60    61    71    77   100

Table 2. TaxPlot raw data (one row per three-way comparison)
taxa1  taxa2  taxa3  query  hits  equal  above  below
ec1    ec2    ec3    4238   3823   786    652   2385
ec2    ec1    ec3    4629   3981  1906   1089    986
ec3    ec1    ec2    4783   4005   834   2581    590
se1    se2    se3    4510   3011  1523    876    612
se2    se1    se3    5604   3273   231    763   2279
se3    se1    se2    5386   3828   202    215   3411
ec1    se1    se2    4238   3410   303    981   2126
ec1    se3    se2    4238   2944   330   1141   1472
ec2    se1    se2    4629   3637   321   1059   2257
ec2    se3    se2    4629   3173   361   1260   1552
ec3    se1    se2    4783   3556   322   1031   2203
ec3    se3    se2    4783   3029   337   1177   1515
se1    ec1    ec2    4510   3281   845   1205   1231
se1    ec3    ec2    4510   3305  1802   1231    726
se2    ec1    ec2    5604   3630   903   1337   1390
se2    ec3    ec2    5604   3741  1921    919    901
se3    ec1    ec2    5386   3572   873   1308   1391
se3    ec3    ec2    5386   3624  1901    883    840

Table 3. TaxPlot pairwise percent best scores
Within ec    ec1   ec1   ec2   ec2   ec3   ec3
  vs.        ec2   ec3   ec1   ec3   ec1   ec2
  count      652  2385  1089   986  2581   590
  %best       21    79    52    48    81    19

Within se    se1   se1   se2   se2   se3   se3
  vs.        se2   se3   se1   se3   se1   se2
  count      876   612   763  2279   215  3411
  %best       59    41    25    75     6    94

Worked example (ec1 as query, equal + above divided by hits): (786 + 652) / 3823 x 100 ≈ 38, the ec2 entry in the ec1 column of Table 1

What other ways could we visualize these data?
[Figure: % proteome similarity (0 to 100) within Ec, between Ec x Se, within Se, and between Se x Ec]

DUE NEXT LAB
TaxPlot results
• What did TaxPlot tell you about the comparison of genomes within your two species?
• What did TaxPlot tell you about the comparison of genomes between your two species?
• How can we use these results to refine our study?
Put in lab notebook

FROM LAST TIME
Alignments
• You should have with you alignments produced by CLUSTALW for each of your proteins
  – Was CLUSTALW straightforward to use?
    • If not, why?
  – Did you have problems entering your data?
    • If yes, how did you solve them?
  – Did adjusting the gap and extension penalties help to improve your alignments?
    • If yes, which proteins?
  – Do you feel your alignments are the best they can be?
    • If yes, why?
    • If not, what should you do?

FROM LAST TIME
Alignments
• Let's look at your 12 alignment files
  – Print them out if you have not already done so
• How well does each protein align within one species?
• How does the alignment change when compared between two species?
• Can we compare any of these proteins among all of our species (~30 species)?

FROM LAST TIME
Alignments
• Please make sure that Peg or Michelle looks at your alignments and helps you make them as robust as possible
  – In some cases, we may choose to exclude a protein from analysis
  – In some cases, we may urge you to delete a taxon
  – In some cases, we may urge you to try more gap and/or extension penalty values
• The goal for today is to finalize our alignments

FROM LAST TIME
Alignments
• Your initial alignments are due today in class; please print them out and hand them in (they probably won't be perfect, and that is okay; we will work on them in class today)
• Your finalized alignments are due on Thursday in class; please print them out and hand them in on Thursday
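The gap and extension penalties raised in the alignment questions above trade off against residue matches. The Python sketch below scores a two-row alignment under an affine gap model, for illustration only. CLUSTALW itself uses a substitution matrix and position-specific gap adjustments. Everything here is a hypothetical assumption: the match/mismatch scores, the penalty values, the convention that a gap of length L costs open + extend x (L - 1), the function name `score_alignment`, and the sequences.

```python
"""Score a pairwise alignment under an affine gap penalty (illustration only).

Assumptions (hypothetical, not CLUSTALW's actual scoring):
- a gap of length L costs gap_open + gap_extend * (L - 1);
- identical residues score `match`, different residues score `mismatch`
  (real aligners use a substitution matrix such as BLOSUM or Gonnet).
"""


def score_alignment(row1: str, row2: str, gap_open: float = 10.0,
                    gap_extend: float = 0.2, match: float = 5.0,
                    mismatch: float = -4.0) -> float:
    """Return the score of two equal-length aligned rows ('-' marks a gap)."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    score = 0.0
    gap_row = None  # which row (1 or 2) has the currently open gap, if any
    for a, b in zip(row1, row2):
        if a == "-" and b == "-":
            continue  # an all-gap column contributes nothing
        if a == "-" or b == "-":
            row = 1 if a == "-" else 2
            # The first gap position pays the opening penalty;
            # each further position in the same gap pays the extension penalty.
            score -= gap_extend if gap_row == row else gap_open
            gap_row = row
        else:
            gap_row = None  # a residue pair closes any open gap
            score += match if a == b else mismatch
    return score


seq_row = "MKTAYIAKQR"
one_long_gap = "MKT---AKQR"      # one gap of length 3
scattered_gaps = "MK-T-A-KQR"    # three gaps of length 1

for label, row in (("one long gap", one_long_gap),
                   ("scattered gaps", scattered_gaps)):
    print(label, round(score_alignment(seq_row, row), 2))
```

With these values, one 3-residue gap (score 24.6) beats three scattered 1-residue gaps plus two mismatches (score -13.0). This shows why a high gap opening penalty relative to the extension penalty favors fewer, longer gaps.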