Download Phylogeny

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Human genome wikipedia , lookup

History of genetic engineering wikipedia , lookup

Designer baby wikipedia , lookup

Genome evolution wikipedia , lookup

Point mutation wikipedia , lookup

Genomics wikipedia , lookup

Metagenomics wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Non-coding DNA wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Microevolution wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Gene expression programming wikipedia , lookup

Genome editing wikipedia , lookup

Helitron (biology) wikipedia , lookup

Multiple sequence alignment wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Maximum parsimony (phylogenetics) wikipedia , lookup

Quantitative comparative linguistics wikipedia , lookup

Computational phylogenetics wikipedia , lookup

Transcript
Phylogeny
Modified by Dietlind Gerloff
for use in BME110/BIOL181
Fall 2009
Winter
2009
Winter
2008
© Wiley Publishing. 2007. All Rights
Reserved.
Learning Objectives
!Understand the most basic concepts of phylogeny
!Be able to compute simple phylogenetic trees
!Understand/remember the difference between
orthologs and paralogs
!Understand what bootstrapping means in phylogeny
Outline
! Finding out what to do with phylogeny
! Gathering sequences to make a tree
! Preparing your multiple-sequence alignment
! Computing a tree
! Bootstrapping your tree to check its reliability
! Displaying your tree
Why Build
a Phylogenetic Tree ?
! Phylogenetic trees reconstruct the evolutionary history
of your sequences
! They tell you who is closer to whom in the big tree of life
! Phylogenetic trees are based on sequence similarity
rather than morphologic characters
3 Ways to Use Your Gene Tree
! Finding the closest relative of your organism
• Usually done with a tree based on the ribosomal RNA
! Discovering the function of a gene
• Finding the orthologues of your gene
! Finding the origin of your gene
• Finding whether your gene comes from another species
Orthology and Paralogy
! Orthologous genes
• Separated by speciation
• Often have the same function (but this is not
! Paralogous genes
a requirement!)
• Separated by duplications
• Can have different functions
! In the graph:
• A is paralogous with B
• A1 is orthologous with A2
• A1 is also paralogous with B2!
Working on the Right Data
! Garbage in " garbage out
! The quality of your tree depends on the quality of the data
! Your first task is to assemble a very accurate MSA
DNA or Proteins
! Most phylogenetic methods work on Proteins and DNA sequences
! If possible, always compute a multiple-sequence alignment on the
protein sequences
• Translate the sequences if the DNA is coding
• Align the sequences
http://www.bork.embl.de/
coot.embl.de/
• Thread the DNA sequences back onto the protein MSA with coot.embl.de/pal2nal
pal2nal/
pal2nal
! If your DNA sequences are coding and have more than 70% identity . . .
• Compute the tree on the DNA multiple-sequence alignment
! If your DNA sequences are coding and have less than 70% identity . . .
• Compute the tree on the protein multiple-sequence alignment
Which Sequences ?
! Orthologous sequences
• If you want to produce a species tree
• Show how the considered species have diverged
! Paralogous sequences
• If you want to produce a gene tree, include them
• Show the evolution of a protein family
• May help speculate about the function of a protein of interest
Establishing Orthology
! Establishing orthology is very complicated
! It is common practice to establish orthology using the best
reciprocal BLAST
•
•
•
•
A is a gene of Genome X
B is a gene of Genome Y
BLAST (Gene A against Genome X) = B
BLAST (Gene B against Genome Y) = A
Note: the COG functional annotation that we encountered early in the course
approximately follows this same idea, only it considers more than two species.
! A is B’s best friend and B is A’s best friend…
! Phylogeny purists dislike this method
Creating the Perfect Dataset
(Note: “recombinant” in the sense of hybrid,
or chimeric, sequences - cloned expression
is OK if the sequence is not altered)
Building the Right MSA
! Your MSA should have as few gaps as possible.
• Typical alignment editing step: consolidating gaps
! Some variability but not too much!
! Some conservation but not too much!
Building the Right Tree
! There are two types of tree-reconstruction methods
• Distance-based methods
• Statistical methods
! Statistical methods are the most accurate
• Maximum likelihood of success
Note: Parsimony methods are intuitive (favor to
• Parsimony
use a minimum number of substitutions) but are
Note: Parsimony methods are intuitive (= favor using as few as possible mutations) but
beclosely
accurate
forsequences.
very closely
are only deemedonly
to be deemed
accurate fortovery
related
related genes/proteins.
! Statistical methods take
more time
• Limited to small datasets
Distance-based Methods for
Tree Reconstruction
! Distance-based methods are the most popular
• Neighbor Joining (NJ)
• UPGMA
(Note: most MSA guide trees are UPGMA trees NEVER use those for actual phylogeny, and/or publications)
! Distance-based methods involve 2 steps:
• Measure the distances between pairs of sequences in the MSA
• Transform the distance matrix into a tree
(Note: some distance-based methods do not use MSAs but pair-wise alignments; this is not
optimal but one can obtain decent trees that way if the proper stats/computation is applied!)
! The two most popular packages for making trees are
• Clustalw: very simple, not very sophisticated (best avoided for phylogeny, s.above)
easily obtained/learned
• Phylip: very powerful, less convivial
• TreeTop (Genebee server): is a web server that produces high quality trees
Which Format ?
! Trees are displayed in graphic formats
! Always keep a version of your tree in
newick format
• Also called new-hampshire, or nh
• Note the parentheses in this format
! Display your nh tree
with "Phylodendron"
(for example)
• Use iubio.bio.indiana.edu/treeapp
Reading Your Tree
! There’s a lot of vocabulary in a tree
! Nodes correspond to common ancestors
! The root is the oldest ancestor
• Often artificial
• Only meaningful with a good outgroup
! Trees can be un-rooted
! Branch lengths are only meaningful when
the tree is scaled *
*(Spotting the difference
• Cladograms are often scaled
• Phenograms are usualy unscaled is easy - unscaled trees
look very/too regular!)
A good tree figure contains: an indication of distance (scale), and branching
confidence (e.g. bootstrap values) - sadly we only rarely see these…
Bootstrapping
! Use bootstrapping to verify the solidity of each node
! ClustalW and Phylip do bootstrap operations automatically
! Bootstrapping involves these steps:
•
•
•
•
•
Select a subset of your MSA
Redo the tree
Repeat this operation N times (100 or 1000 times if you can)
Compute a consensus tree of the N trees
Measure how many of the N trees agree with the consensus tree on
each node
! Each node gets a bootstrap figure between 0 and N
! High bootstrap " good node
A Bootstrapped Tree
! This tree was produced with 2
bootstrap cycles
! It shows some nodes as more
robust than others
! In practice, always use more
than 100 cycles
(e.g. TreeTop offers 100)