MBoMS
Genomics of Model Microbes
Lab 5: Recap of TaxPlot and Alignments
Comparing Microbial Genomes
• Last time you learned how to find specific
genes in a target microbial genome
– Often, you will be less interested in a specific
gene or protein and more interested in making
more sweeping comparisons between genomes
• The exercises in this lab will teach you how
to employ several NCBI-based genome
comparison tools
Exercise 1
• Go to the Microbial Genome
Resource Page
• Find the tool box on the right
edge
• Click on TaxPlot
– Use help to learn a bit about
TaxPlot
TaxPlot
• What is TaxPlot?
– A three-way genome comparison tool based on pre-computed protein BLASTP
E-values
– It displays a point for each protein in the reference genome based on the best
alignment with proteins in each of the two genomes being compared
• What is new about TaxPlot?
– It employs the BLAST Score Ratio (BSR) approach, which classifies all putative
peptides within three genomes using a measure of similarity based on the ratio of
BLAST scores
– BSR analysis is a departure from traditional genome scale analyses as it
overcomes the limitations of BLAST E-values in comparative studies by
normalizing the BLAST raw scores.
• What does TaxPlot provide?
– The output of the BSR analysis enables global visualization of the degree of
proteome similarity between all three genomes
– Additional output enables the genomic synteny (conserved gene order) between
each genome pair to be assessed
– The synteny analysis is overlain with BSR data as a color dimension, enabling
visualization of the degree of similarity of the peptides being compared
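The BSR idea described above can be sketched in a few lines of Python. This is a minimal illustration, not TaxPlot's actual code: it assumes the usual BSR definition (a protein's best-hit BLAST score against another genome, divided by its score against itself), and the function names, the protein name, and the scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BsrPoint:
    """One plotted point: a reference protein and its two score ratios."""
    protein: str
    bsr_a: float  # ratio against comparison genome A
    bsr_b: float  # ratio against comparison genome B


def blast_score_ratio(best_hit_score: float, self_score: float) -> float:
    """Normalize a best-hit BLAST score by the query's score against itself."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return best_hit_score / self_score


def bsr_point(protein: str, self_score: float,
              best_score_a: float, best_score_b: float) -> BsrPoint:
    """Build the (x, y) pair for one reference protein."""
    return BsrPoint(
        protein,
        blast_score_ratio(best_score_a, self_score),
        blast_score_ratio(best_score_b, self_score),
    )


# Hypothetical scores: a strong match in genome A, a weak match in genome B.
point = bsr_point("protX", self_score=500.0,
                  best_score_a=450.0, best_score_b=100.0)
print(point)
```

Because every ratio falls between 0 and 1 regardless of protein length, points from long and short proteins can share one plot, which is the normalization benefit the slide refers to.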
Exercise 2
• Now, use TaxPlot to compare
multiple genomes from each of
your two species
– In TaxPlot, choose 3 genomes for one
of your species
• you can scroll through the options at
the top of the page to find all available
genomes for each species
• Let TaxPlot calculate the relationships
between the 3 genomes
– Repeat for the second species
Sample TaxPlot:
E. coli 101 vs E. coli K12 vs E. coli O157:H7
[Figure: sample TaxPlot output, with the following annotations]
• 4238 query proteins produced 3645 hits
• 2120 hits: K12 and 101 share the greatest similarity
• 1002 equal hits: K12 and O157 share a large number of proteins that are equally similar to 101
• 523 hits
Sample
TaxPlot Results
• 4238 proteins in 101 were
compared in a 101 x K12 x O157
three-way comparison
– 1002 of the comparisons were
equivalent for K12 and O157
– 2120 of the comparisons had “better”
scores for K12
– 523 of the comparisons had “better”
scores for O157
• You can click on any of the circles
to get details of the specific gene(s)
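The counts on this slide can be checked with a short tally. The sketch below uses only the numbers given above; treating the gap between query proteins and hits as "proteins with no hit" is an assumption, and the variable names are illustrative.

```python
# Counts from the sample 101 x K12 x O157 comparison.
query_proteins = 4238
counts = {
    "equal for K12 and O157": 1002,
    "better score for K12": 2120,
    "better score for O157": 523,
}

# The three categories should add up to the reported number of hits.
total_hits = sum(counts.values())

# Assumption: the remaining query proteins produced no hit at all.
no_hit = query_proteins - total_hits

for label, n in counts.items():
    share = 100 * n / total_hits
    print(f"{label}: {n} ({share:.1f}% of hits)")
print(f"total hits: {total_hits}, query proteins without a hit: {no_hit}")
```

The total matches the 3645 hits reported on the plot.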
Exercise 2, cont.
• Go to the TaxPlot results for each of your
two species
– Compare and contrast the results for the two
species
– Are the genomes from one species more or
less similar than the genomes for the second
species?
– Do the plots show high levels of synteny (the
genes are in the same order or same place in
the genomes)?
Ec1 x Ec2 x Ec3
Ec1 x Ec2 x Se1
Ec1 x Ec2 x Bc1
Ec1 x Ec2 x Ba1
Table 1. TaxPlot percent better scores
(columns: query genome; rows: genome compared against)
       ec1   ec2   ec3   se1   se2   se3
ec1    100    75    85    62    62    61
ec2     38   100    36    63    63    63
ec3     83    73   100    76    75    76
se1     38    38    38   100    30    11
se2     71    71    71    80   100    94
se3     50    60    61    71    77   100
Table 2. TaxPlot raw data
taxa1  taxa2  taxa3  query   hits  equal  above  below
ec1    ec2    ec3     4238   3823    786    652   2385
ec2    ec1    ec3     4629   3981   1906   1089    986
ec3    ec1    ec2     4783   4005    834   2581    590
se1    se2    se3     4510   3011   1523    876    612
se2    se1    se3     5604   3273    231    763   2279
se3    se1    se2     5386   3828    202    215   3411
ec1    se1    se2     4238   3410    303    981   2126
ec1    se3    se2     4238   2944    330   1141   1472
ec2    se1    se2     4629   3637    321   1059   2257
ec2    se3    se2     4629   3173    361   1260   1552
ec3    se1    se2     4783   3556    322   1031   2203
ec3    se3    se2     4783   3029    337   1177   1515
se1    ec1    ec2     4510   3281    845   1205   1231
se1    ec3    ec2     4510   3305   1802   1231    726
se2    ec1    ec2     5604   3630    903   1337   1390
se2    ec3    ec2     5604   3741   1921    919    901
se3    ec1    ec2     5386   3572    873   1308   1391
se3    ec3    ec2     5386   3624   1901    883    840
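One way to turn a row of the raw data into the percent scores of Table 1 is sketched below. The reading of the columns is an assumption inferred from the numbers rather than stated on the slides: "above" is taken as the count of proteins scoring better against taxa2, "below" as the count scoring better against taxa3, and "equal" as counting toward both. The class and function names are illustrative.

```python
from typing import NamedTuple


class TaxplotRow(NamedTuple):
    taxa1: str   # query (reference) genome
    taxa2: str   # first comparison genome
    taxa3: str   # second comparison genome
    query: int   # proteins in the query genome
    hits: int    # query proteins with a hit
    equal: int   # hits scoring the same against taxa2 and taxa3
    above: int   # assumed: hits scoring better against taxa2
    below: int   # assumed: hits scoring better against taxa3


def percent_scores(row: TaxplotRow) -> tuple[int, int]:
    """Percent of hits at least as good against taxa2, then against taxa3."""
    pct_taxa2 = 100 * (row.equal + row.above) / row.hits
    pct_taxa3 = 100 * (row.equal + row.below) / row.hits
    return round(pct_taxa2), round(pct_taxa3)


# First row of the raw data: ec1 compared against ec2 and ec3.
row = TaxplotRow("ec1", "ec2", "ec3", 4238, 3823, 786, 652, 2385)
print(percent_scores(row))  # (38, 83)
```

Under this reading, the first row gives 38 and 83, the values listed for ec1 against ec2 and ec3 in Table 1.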
Table 3. TaxPlot pairwise percent best scores
ec within                        se within
taxa1  taxa2  better  %best      taxa1  taxa2  better  %best
ec1    ec2       652     21      se1    se2       876     59
ec1    ec3      2385     79      se1    se3       612     41
ec2    ec1      1089     52      se2    se1       763     25
ec2    ec3       986     48      se2    se3      2279     75
ec3    ec1      2581     81      se3    se1       215      6
ec3    ec2       590     19      se3    se2      3411     94
Example calculation: (786 + 652) / 3823 × 100
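The %best columns appear to split the unequal hits for one query genome between its two comparison genomes. The sketch below reproduces that arithmetic under this assumption; the function name is illustrative and the formula is inferred from the table values, not given on the slide.

```python
def percent_best(better_first: int, better_second: int) -> tuple[int, int]:
    """Share of unequal hits that favor each of the two comparison genomes."""
    total = better_first + better_second
    if total == 0:
        raise ValueError("no unequal hits to split")
    return (round(100 * better_first / total),
            round(100 * better_second / total))


# ec1 as query: 652 hits score better against ec2, 2385 against ec3.
print(percent_best(652, 2385))   # (21, 79)

# se3 as query: 215 hits score better against se1, 3411 against se2.
print(percent_best(215, 3411))   # (6, 94)
```

Equal hits are left out of this split, which is why the two percentages for each query genome sum to 100.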
What other ways could we
visualize these data?
[Figure: axis label "100 % Proteome similarity"; Ec groups: Within Ec, Between Ec x Se; Se groups: Within Se, Between Se x Ec]
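As one possible answer, the percent scores of Table 1 can be averaged into within-species and between-species groups and shown as a simple text bar chart. This is only a sketch: the values are entered in the order they are listed in Table 1 (each label followed by its six values), and the grouping and names are assumptions made for illustration.

```python
from statistics import mean

genomes = ["ec1", "ec2", "ec3", "se1", "se2", "se3"]

# Table 1 values, one list per label, in the order ec1..se3.
table1 = {
    "ec1": [100, 75, 85, 62, 62, 61],
    "ec2": [38, 100, 36, 63, 63, 63],
    "ec3": [83, 73, 100, 76, 75, 76],
    "se1": [38, 38, 38, 100, 30, 11],
    "se2": [71, 71, 71, 80, 100, 94],
    "se3": [50, 60, 61, 71, 77, 100],
}


def species(genome: str) -> str:
    """'ec1' -> 'ec', 'se3' -> 'se'."""
    return genome[:2]


# Collect values by (species of label, species of position), skipping
# the self-comparisons on the diagonal.
groups: dict[tuple[str, str], list[int]] = {}
for label, values in table1.items():
    for other, value in zip(genomes, values):
        if label == other:
            continue
        groups.setdefault((species(label), species(other)), []).append(value)

summary = {key: mean(vals) for key, vals in groups.items()}

for (a, b), avg in sorted(summary.items()):
    kind = "within" if a == b else "between"
    bar = "#" * round(avg / 5)
    print(f"{kind:7s} {a} x {b}: {avg:5.1f} {bar}")
```

The same grouped averages could be fed to any plotting tool; the text bars are used here only to keep the example free of extra dependencies.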
DUE NEXT LAB
TaxPlot results
• What did TaxPlot tell you about the comparison
of genomes within your two species?
• What did TaxPlot tell you about the comparison
of genomes between your two species?
• How can we use these results to refine our
study?
Put in lab notebook
FROM LAST TIME
Alignments
• You should have with you alignments produced by
CLUSTALW for each of your proteins
– Was CLUSTALW straightforward to use?
• If not, why?
– Did you have problems entering your data?
• If yes, how did you solve them?
– Did adjusting the gap and extension penalties help to
improve your alignments?
• If yes, which proteins?
– Do you feel your alignments are the best they can be?
• If yes, why?
• If not, what should you do?
FROM LAST TIME
Alignments
• Let’s look at your 12 alignment files
– Print them out if you have not already done so
• How well does each protein align within one
species?
• How does the alignment change when
compared between two species?
• Can we compare any of these proteins
between all of our species (~30 species)?
FROM LAST TIME
Alignments
• Please make sure that Peg or Michelle looks
at your alignments and helps you to make
them as robust as is possible
– In some cases, we may choose to exclude a
protein from analysis
– In some cases, we may urge you to delete a taxon
– In some cases, we may urge you to try more gap
and/or extension penalty values
• The goal for today is to finalize our
alignments
FROM LAST TIME
Alignments
• Your initial alignments are due today in
class. Please print them out and hand
them in (they probably won’t be perfect;
that is okay, we will work on them in
class today)
• Your finalized alignments are due on
Thursday in class. Please print them out
and hand them in on Thursday