Orthology, paralogy and GO annotation
Paul D. Thomas
SRI International
Outline
• Why does orthology matter to us?
• A little background on evolution, orthology and paralogy
• Practical considerations for RefGenome
Why does “orthology” matter to us?
• Goal
– identify genes in reference genomes that have the same or similar functions, so that comprehensive curation can be done simultaneously
• Why?
– Different model organisms have different strengths for exploring different facets of gene function, and these can often inform each other
– Most genes did not first evolve within a given extant species: they were INHERITED from a common ancestor. Genes in different organisms have similar functions because they were inherited, and haven’t changed much since the common ancestor.
How do we identify genes with similar functions?
• Evolutionary analysis
• Where do orthologs fit in, and what do we mean by orthologs?
– Simple answer: “The same gene in different organisms” (separated only by speciation)
• Orthology = similar function
– Unfortunately, the world is not that simple
• Orthologous genes can have different functions
• Paralogous genes (duplications) can have (to some extent at least) similar functions
– Fortunately, a slightly more complicated view can get us much closer to addressing the question of gene function
Representing evolution of related genes
• Start with Darwin’s basic model:
– Copying
• An ancestral “species” “splits” into two separate species
– Divergence
• Each copy (species) changes independently over generations
– NATURAL SELECTION: adaptation to different environment
Darwin’s species tree
• Number of generations/time along one axis
• Amount of divergence along other axis
• Characters in common are due to inheritance
– Also tells us something about common ancestor
Representing evolution of related genes
• “Gene families”
• Add detail from population genetics/molecular evolution to apply to genes
– Copying
• An ancestral species “splits” into two separate species
– SPECIATION
• A gene is duplicated in one population and subsequently inherited
– DUPLICATION
– Divergence
• Each copy (gene sequence) changes independently over generations
– NATURAL SELECTION: sequence substitutions to adapt to new function/role
– NEUTRAL DRIFT: accumulation of “neutral” substitutions
How does this relate to gene function?
• Copying
– An ancestral species “splits” into two separate species
• SPECIATION: likely to continue performing ancestral function
– BUT not always
– A gene is duplicated in one population and subsequently inherited
• DUPLICATION: “redundant gene” free from previous constraints can adapt to a new function
– BUT still inherits some aspects of ancestral function
• Divergence
– Each “new” copy (gene sequence) changes independently over generations
• NATURAL SELECTION: sequence substitutions adapt to new/modified function/role
• NEUTRAL DRIFT: sequence changes from accumulation of “neutral” substitutions. This is the MAJOR source of sequence differences!
A gene tree
[Figure: gene tree. Leaves as listed: E.c.; A.t. MTHFR1; A.t. MTHFR2; D.d.; S.p.; S.c. MET13; S.p.; S.c. MET12; C.e.; D.m.; A.g.; D.r.; G.g.; H.s. MTHFR; R.n.; M.m.]
• Only one “informative” axis: rate of sequence evolution
– For neutral changes this can often act as a “molecular clock”
– Non-neutral changes will speed up the rate of evolution
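The "amount of sequence change" axis can be made concrete with a small distance calculation. The sketch below is only an illustration, not the method used to build the tree on the slide: it assumes two already-aligned nucleotide sequences and applies the standard Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), to the observed proportion of differing sites p. The function names and the example sequences are made up for this sketch.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for x, y in pairs if x != y)
    return differences / len(pairs)


def jukes_cantor_distance(p):
    """Estimated substitutions per site under the Jukes-Cantor model.

    Corrects the observed proportion p for multiple hits at the same site.
    The estimate is undefined once p reaches 0.75 (saturation).
    """
    if p >= 0.75:
        raise ValueError("sequences are saturated; distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned fragments, one mismatch in eight sites.
p = p_distance("ACGTACGT", "ACGTACGA")
d = jukes_cantor_distance(p)
print(f"observed p = {p:.3f}, corrected d = {d:.3f}")
```

Under a strict molecular clock with rate r substitutions per site per unit time, two lineages that split t time units ago are expected to differ by roughly d = 2rt, which is why a distance like this can stand in for time when changes are neutral.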
So what?
Practical considerations
OrthoMCL “ortholog cluster”
[Figure: gene tree. Leaves as listed: E.c.; A.t. MTHFR1; A.t. MTHFR2; D.d.; S.p.; S.c. MET13; S.p.; S.c. MET12; C.e.; D.m.; A.g.; D.r.; G.g.; H.s. MTHFR; R.n.; M.m.]
• An “ortholog cluster” is made by one or more “slices” through the protein family tree
• Some combination of evolutionary rates and history of duplications
• Might miss genes that could be efficiently annotated at the same time
• From a strict evolutionary standpoint, orthologs are separated ONLY by speciation events; TIGRFAMs has coined the term “equivalog” for functionally conserved groups
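The strict definition above can be sketched in code: given a gene tree whose internal nodes are labeled as speciation or duplication events, two genes are orthologs if their last common ancestor is a speciation node and paralogs if it is a duplication node. The tree below is a hypothetical toy topology chosen for illustration (it is not the tree from the slide), and the `Node` class and function names are assumptions of this sketch.

```python
class Node:
    """A gene-tree node. Internal nodes carry an event label; leaves do not."""

    def __init__(self, name, event=None, children=()):
        self.name = name
        self.event = event  # "speciation" or "duplication"; None for leaves
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return the path from a node up to the root, starting with the node."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(a, b):
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("nodes are not in the same tree")


def relationship(a, b):
    """Classify two distinct leaves by the event at their last common ancestor."""
    if a is b:
        raise ValueError("need two different genes")
    lca = last_common_ancestor(a, b)
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Hypothetical toy tree: a duplication in a fungal ancestor, followed by
# speciation within each copy, with a single-copy gene in human.
met13 = Node("S.c. MET13")
met12 = Node("S.c. MET12")
sp_copy1 = Node("S.p. copy 1")
sp_copy2 = Node("S.p. copy 2")
human = Node("H.s. MTHFR")
clade1 = Node("fungal clade 1", "speciation", [sp_copy1, met13])
clade2 = Node("fungal clade 2", "speciation", [sp_copy2, met12])
fungal = Node("fungal duplication", "duplication", [clade1, clade2])
root = Node("root", "speciation", [fungal, human])

print(relationship(met13, sp_copy1))  # orthologs
print(relationship(met13, met12))     # paralogs
print(relationship(human, met12))     # orthologs
```

In this toy tree the human gene is orthologous to both yeast copies, which is one way a flat one-to-one "ortholog cluster" can leave out genes that could be annotated together.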
“ISS”
• Inference from sequence similarity
• A class of database search algorithm (e.g. BLAST) has become a metaphor
– Implies “genes have similar functions because they have similar sequences”
– Function is best determined using pairwise comparison
“ISS”
• More properly, ISS of function is inheritance!
– “related genes have a common function because their common ancestor had that function, which was inherited by its descendants”
– ISS is not just a statement about one gene. It is also making assertions about
• The common ancestor
• Inheritance of a “character” by
– Both “pairwise similar” descendants
– Other descendants
Homology inference in a tree: inheritance and divergence of function
[Figure: gene tree. Leaves as listed: E.c.; A.t. MTHFR1; A.t. MTHFR2; D.d.; S.p.; S.c. MET13; S.p.; S.c. MET12; C.e.; D.m.; A.g.; D.r.; G.g.; H.s. MTHFR; R.n.; M.m. Annotations shown on the tree: “methylene tetrahydrofolate reductase activity” (m.f.); “methionine metabolic process” (b.p.)]
Homology inference in a tree
[Figure: gene tree. Leaves as listed: E.c.; A.t. MTHFR1; A.t. MTHFR2; D.d.; S.p.; S.c. MET13; S.p.; S.c. MET12; C.e.; D.m.; A.g.; D.r.; G.g.; H.s. MTHFR; R.n.; M.m. Annotations shown on the tree: “methylene tetrahydrofolate reductase activity” (m.f.); “methionine metabolic process” (b.p.); NOT “methionine metabolic process” (b.p.); NOT “methylene tetrahydrofolate reductase activity” (m.f.)?; NOT “methionine metabolic process” (b.p.)?; “regulation of homocysteine metabolic process” (b.p.)]
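The idea of annotating ancestral nodes and letting descendants inherit can be sketched as a top-down pass over the tree: each node passes its terms to its children, a "NOT" at a node removes a term for that whole subtree, and a newly gained term is added from that node downward. This is a minimal sketch under stated assumptions: the `Node` class, the `gained`/`lost` fields, the clade names, and the placement of annotations on the toy tree are all hypothetical and do not reproduce where the annotations sit on the slide.

```python
class Node:
    """A gene-tree node with GO terms gained or lost ("NOT") at this node."""

    def __init__(self, name, children=(), gained=(), lost=()):
        self.name = name
        self.children = list(children)
        self.gained = set(gained)
        self.lost = set(lost)


def inherit_annotations(node, inherited=frozenset()):
    """Return {leaf name: set of GO terms} for all leaves under node.

    A term reaches a leaf if some ancestor gained it and no node on the
    path in between lost it. Leaf names are assumed to be unique.
    """
    terms = (set(inherited) - node.lost) | node.gained
    if not node.children:
        return {node.name: terms}
    leaf_terms = {}
    for child in node.children:
        leaf_terms.update(inherit_annotations(child, terms))
    return leaf_terms


MTHFR_ACTIVITY = "methylene tetrahydrofolate reductase activity"
MET_PROCESS = "methionine metabolic process"
HCY_REGULATION = "regulation of homocysteine metabolic process"

# Hypothetical placement: the root carries two terms; clade B loses one
# of them and gains a different process term.
clade_a = Node("clade A", [Node("gene A1"), Node("gene A2")])
clade_b = Node("clade B", [Node("gene B1")],
               gained=[HCY_REGULATION], lost=[MET_PROCESS])
root = Node("root", [clade_a, clade_b], gained=[MTHFR_ACTIVITY, MET_PROCESS])

result = inherit_annotations(root)
for leaf, terms in result.items():
    print(leaf, sorted(terms))
```

Recording the gain or loss at the node where it happened, rather than on each leaf separately, is what keeps the inferred annotations consistent across all descendants.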
Homology inference in a tree
[Figure: gene tree. Leaves as listed: E.c.; A.t. MTHFR1; A.t. MTHFR2; D.d.; S.p.; S.c. MET13; S.p.; S.c. MET12; C.e.; D.m.; A.g.; D.r.; G.g.; H.s. MTHFR; R.n.; M.m.]
COMBINES:
1. Evolutionary information (tree)
2. Experimental knowledge (GO annotations from literature)
3. Organism-specific biological knowledge (curators)
This is just an easy, self-consistent way of doing ISS!
[Figure: gene tree. Leaves as listed: E.c.; A.t. MTHFR1; A.t. MTHFR2; D.d.; S.p.; S.c. MET13; S.p.; S.c. MET12; C.e.; D.m.; A.g.; D.r.; G.g.; H.s. MTHFR; R.n.; M.m.]
We have a picture of ALL the relationships rather than N flat lists that need to be reconciled
Tree annotation tool for RefGenome
• Pre-computed, searchable “library” of gene trees
– Include “outgroup” organisms to help infer evolutionary histories
– Gene members in any tree can be modified by curator feedback
• Tool for viewing tree and selecting “homology group” to be annotated
• Tool for viewing tree labeled with in-depth GO annotations from all MODs, and inferring ancestral functions and homology annotations
• Homology annotations will be supported by a tree node as evidence, trees will be available to scientific community
• HMMs will be constructed to allow other genome projects to infer GO terms, distributed by InterPro