Lecture 9

Genome Annotation and Analysis
9/24/08
Phylogenetic profile analysis
• A minimum spanning tree of an undirected, weighted graph is a spanning tree whose edges have the minimum total weight.
[Figure: a weighted graph of genes in which the edge between genes gi and gj has weight d(gi, gj), and its minimum spanning tree]
• Prim’s algorithm for finding the minimum spanning tree of a graph:
1. Label any vertex v, find the edge e(v, w) with the smallest weight among v’s edges, and label w;
2. Among all edges between a labeled vertex u and an unlabeled vertex x, find the one with the smallest weight, and label x;
3. Repeat step 2 until all vertices are labeled.
[Figure: an example graph with edge weights, annotated with the order in which Prim’s algorithm recruits the vertices]
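The steps above can be sketched in Python as follows, assuming the graph is given as a symmetric adjacency dictionary (vertex → {neighbor: weight}). The names `prim_mst` and `Graph` are illustrative, not from the lecture. A binary heap replaces the explicit "find the smallest edge" scan of step 2, and the returned list records the tree edges in the order the vertices are recruited.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical input format: symmetric adjacency map, vertex -> {neighbor: weight}
Graph = Dict[str, Dict[str, float]]


def prim_mst(graph: Graph, start: str) -> List[Tuple[str, str, float]]:
    """Return MST edges (u, x, weight) in the order Prim's algorithm labels x.

    If the graph is disconnected, only the component containing `start`
    is spanned.
    """
    labeled = {start}
    # Candidate edges leaving the labeled set, ordered by weight
    heap = [(w, start, x) for x, w in graph[start].items()]
    heapq.heapify(heap)
    tree: List[Tuple[str, str, float]] = []
    while heap and len(labeled) < len(graph):
        w, u, x = heapq.heappop(heap)
        if x in labeled:
            continue  # stale entry: x was already labeled through a lighter edge
        labeled.add(x)  # label the closest unlabeled vertex
        tree.append((u, x, w))
        for y, wy in graph[x].items():
            if y not in labeled:
                heapq.heappush(heap, (wy, x, y))
    return tree


g = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2, "d": 5},
    "c": {"a": 4, "b": 2, "d": 3},
    "d": {"b": 5, "c": 3},
}
print(prim_mst(g, "a"))  # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 3)]
```

The heap may hold several entries for the same vertex; skipping stale ones on pop is simpler than updating priorities in place.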
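• When recruiting vertices into the minimum spanning tree, Prim’s algorithm exhausts the vertices of one cluster before jumping to another cluster, provided the clusters are well separated (every within-cluster distance is smaller than every between-cluster distance);
• Plotting the recruiting order of each vertex v (X axis) against the distance between v and the vertex that recruits it into the tree (Y axis) therefore gives a good visualization of the clusters: each jump to a new cluster shows up as a peak.
For points in the plane, the distance is
$$d(i,j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$
[Figure: points in the X-Y plane, and the plot of d(i, j) against the recruiting index]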
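To make the recruiting-index plot concrete, here is a small Python sketch, assuming 2-D points and the Euclidean distance d(i, j) given above. It runs the dense O(n²) form of Prim's algorithm on the complete graph and returns, in recruiting order, each vertex together with its distance to the vertex that recruited it; `recruiting_profile` and the example points are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def recruiting_profile(points: List[Point]) -> List[Tuple[int, float]]:
    """Dense Prim's algorithm on the complete graph of points.

    Returns (vertex index, distance to its recruiter) in recruiting order;
    the start vertex 0 gets distance 0.0.
    """
    n = len(points)
    best = [math.inf] * n  # shortest edge from each vertex to the labeled set
    labeled = [False] * n
    if n:
        best[0] = 0.0
    profile: List[Tuple[int, float]] = []
    for _ in range(n):
        # label the unlabeled vertex closest to the labeled set
        v = min((i for i in range(n) if not labeled[i]), key=lambda i: best[i])
        labeled[v] = True
        profile.append((v, best[v]))
        for x in range(n):
            if not labeled[x]:
                # Euclidean d(v, x) = sqrt((x_v - x_x)^2 + (y_v - y_x)^2)
                best[x] = min(best[x], math.dist(points[v], points[x]))
    return profile


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
for index, (v, dist) in enumerate(recruiting_profile(pts)):
    print(index, v, round(dist, 2))
# the largest distance (about 13.45, at index 3) marks the jump to the second cluster
```

Plotting the second column of the profile against its position gives the recruiting-index plot; `math.dist` needs Python 3.8 or later.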
• A minimum spanning tree based clustering algorithm
[Figure: clusters found, labeled: genes unique to WH8102; housekeeping (universal) genes existing in all genomes; genes related to “phosphorus”]
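One simple way to turn the recruiting order into clusters, sketched here as an assumption rather than the lecture's exact algorithm, is to start a new cluster whenever a gene is recruited through an edge longer than a chosen threshold. On a complete graph this yields the same clusters as deleting every minimum spanning tree edge above the threshold (single-linkage clustering). The sketch uses binary phylogenetic profiles with Hamming distance; the profiles, the threshold and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def hamming(p: Sequence[int], q: Sequence[int]) -> int:
    """Number of genomes in which two 0/1 profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def mst_clusters(profiles: Dict[str, Sequence[int]], threshold: float) -> List[List[str]]:
    """Prim's recruiting order on the complete graph of profiles,
    split wherever the recruiting edge is longer than `threshold`."""
    if not profiles:
        return []
    names = list(profiles)
    best = {g: float("inf") for g in names}  # distance to the labeled set
    best[names[0]] = 0
    unlabeled = set(names)
    clusters: List[List[str]] = []
    while unlabeled:
        # closest unlabeled gene; ties broken by input order
        v = min((g for g in names if g in unlabeled), key=best.__getitem__)
        unlabeled.remove(v)
        if not clusters or best[v] > threshold:
            clusters.append([])  # long recruiting edge: start a new cluster
        clusters[-1].append(v)
        for g in unlabeled:
            best[g] = min(best[g], hamming(profiles[v], profiles[g]))
    return clusters


profiles = {  # hypothetical presence (1) / absence (0) in four genomes
    "g1": (1, 1, 1, 1),
    "g2": (1, 1, 1, 1),
    "g3": (1, 0, 0, 0),
    "g4": (1, 0, 0, 0),
    "g5": (1, 1, 1, 0),
}
print(mst_clusters(profiles, threshold=1))  # [['g1', 'g2', 'g5'], ['g3', 'g4']]
```

Raising the threshold merges clusters; with `threshold=3` all five genes fall into one cluster.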
• COG-specific phylogenetic profiles can be used to predict functional associations among COGs and the lifestyles of organisms: clustering the COGs (rows) gives functional clusters, and clustering the genomes (columns) gives lifestyle clusters.
         Genome1  Genome2  ………  Genome3
COG1     1        0        ………  1
COG2     0        1        ………  0
…        …        …        ………  …
COGm     0        0        ………  1
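A short Python sketch of how such a matrix can be built from per-genome COG content, assuming each genome is given as a set of COG identifiers (the data below are made up). `cog_profiles` gives the rows and `genome_profiles` the columns; as one crude criterion for functional association, assumed here for illustration, COGs with identical profiles are grouped together.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Profile = Tuple[int, ...]


def cog_profiles(genome_cogs: Dict[str, Set[str]]) -> Dict[str, Profile]:
    """Rows of the matrix: one 0/1 profile per COG over the sorted genomes."""
    genomes = sorted(genome_cogs)
    all_cogs = sorted(set().union(*genome_cogs.values()))
    return {c: tuple(int(c in genome_cogs[g]) for g in genomes) for c in all_cogs}


def genome_profiles(genome_cogs: Dict[str, Set[str]]) -> Dict[str, Profile]:
    """Columns of the matrix: one 0/1 profile per genome over the sorted COGs."""
    all_cogs = sorted(set().union(*genome_cogs.values()))
    return {g: tuple(int(c in cogs) for c in all_cogs) for g, cogs in genome_cogs.items()}


def identical_profile_groups(profiles: Dict[str, Profile]) -> List[List[str]]:
    """Groups of two or more items sharing exactly the same profile."""
    groups: Dict[Profile, List[str]] = defaultdict(list)
    for name, profile in sorted(profiles.items()):
        groups[profile].append(name)
    return [names for names in groups.values() if len(names) > 1]


genome_cogs = {  # hypothetical COG content of three genomes
    "Genome1": {"COG1", "COG3", "COG4"},
    "Genome2": {"COG2", "COG4"},
    "Genome3": {"COG1", "COG3"},
}
rows = cog_profiles(genome_cogs)
print(rows["COG1"], rows["COG4"])               # (1, 0, 1) (1, 1, 0)
print(identical_profile_groups(rows))           # [['COG1', 'COG3']]
print(genome_profiles(genome_cogs)["Genome2"])  # (0, 1, 0, 1)
```

Requiring identical profiles is strict; a distance cutoff (for example Hamming distance) is a natural relaxation.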
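• COG-specific phylogenetic profiles, when inverted, can be used to predict non-orthologous gene displacements: if the profile of one COG is the complement of another’s (each genome contains one of the two, but not both), the two COGs may carry out the same function in different genomes.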
         Genome1  Genome2  ………  Genome3
COG1     1        0        ………  1
COG2     0        1        ………  0
…        …        …        ………  …
COGm     0        0        ………  1
Non-orthologous gene displacement (complementary profiles):
         Genome1  Genome2  ………  Genome3
COG’1    0        1        ………  0
COG’2    1        0        ………  1
…        …        …        ………  …
COG’m    1        1        ………  0
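Detecting complementary profiles can be sketched in Python as below, assuming 0/1 profiles over the same ordered list of genomes. Exact complementarity is rare in real data, so the hypothetical `max_agree` parameter (an assumption, not from the lecture) tolerates a few genomes in which both COGs are present or both are absent. The COG names and profiles are made up.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]


def complementary_pairs(profiles: Dict[str, Profile], max_agree: int = 0) -> List[Tuple[str, str]]:
    """COG pairs whose profiles agree in at most `max_agree` genomes.

    With max_agree=0 the profiles are exact complements: every genome
    has exactly one of the two COGs.
    """
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        agree = sum(x == y for x, y in zip(profiles[a], profiles[b]))
        if agree <= max_agree:
            pairs.append((a, b))
    return pairs


profiles = {  # hypothetical profiles over four genomes
    "COG_A": (1, 0, 1, 0),
    "COG_B": (0, 1, 0, 1),
    "COG_C": (1, 1, 1, 0),
}
print(complementary_pairs(profiles))               # [('COG_A', 'COG_B')]
print(complementary_pairs(profiles, max_agree=1))  # [('COG_A', 'COG_B'), ('COG_B', 'COG_C')]
```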
• Examples of non-orthologous gene displacement found by detecting complementary phylogenetic profiles:
  ◦ Phosphoglycerate mutase: cofactor-dependent vs. cofactor-independent
  ◦ Fructose-1,6-bisphosphate aldolases: metal-dependent (E&B), metal-independent (U&B), metal-independent (A)
  ◦ Thymidylate synthases
Prediction of protein-protein interaction through detecting domain fusion events
• If two genes in one genome are fused into a single gene coding for a multi-domain protein in another genome, then the proteins encoded by the two separate genes are likely to physically interact with each other, and are thus functionally related. The fused multi-domain protein is called a Rosetta stone protein.
[Figure: genes gi and gj of the target genome are used in a homology search against the nr database; both hit the same fused protein, gi with E-value E1 and gj with E-value E2]
The fusion score for the pair is
$$\text{Score} = -\frac{1}{2}\left(\log E_1 + \log E_2\right)$$
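A minimal Python sketch of the scoring and pair search. It assumes the homology-search hits have already been parsed into records of (subject protein, E-value, aligned start, aligned end); the `Hit` record, the requirement that the two genes align to non-overlapping regions of the fused protein, and the use of base-10 logarithms are all assumptions, since the lecture does not specify them. The hits in the example are made up.

```python
import math
from itertools import combinations
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    subject: str  # hypothetical identifier of a protein in the nr database
    evalue: float
    start: int    # aligned region on the subject protein
    end: int


def fusion_score(e1: float, e2: float) -> float:
    """Score = -(log E1 + log E2) / 2, with log10 assumed and E-values
    floored to avoid log(0)."""
    e1, e2 = max(e1, 1e-300), max(e2, 1e-300)
    return -(math.log10(e1) + math.log10(e2)) / 2


def rosetta_pairs(hits: Dict[str, List[Hit]]) -> List[Tuple[str, str, str, float]]:
    """Gene pairs (gi, gj) hitting disjoint regions of the same subject
    protein, sorted by decreasing score."""
    results = []
    for gi, gj in combinations(sorted(hits), 2):
        for hi in hits[gi]:
            for hj in hits[gj]:
                disjoint = hi.end < hj.start or hj.end < hi.start
                if hi.subject == hj.subject and disjoint:
                    results.append((gi, gj, hi.subject, fusion_score(hi.evalue, hj.evalue)))
    return sorted(results, key=lambda r: r[3], reverse=True)


hits = {  # made-up hits for three genes of a target genome
    "gA": [Hit("fusedX", 1e-30, 1, 150)],
    "gB": [Hit("fusedX", 1e-20, 160, 300), Hit("other", 1e-5, 1, 100)],
    "gC": [Hit("other", 1e-8, 50, 120)],
}
print(rosetta_pairs(hits))  # one pair: gA and gB via fusedX, score about 25
```

The gB/gC hits on "other" overlap, so they are not reported as a fusion.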
• Predicted protein-protein interaction network in Synechococcus sp. WH8102.
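[Figure: distribution of the number of interactions per protein, plotted as Counts (log C) against Number of interactions (log n); the points follow a power-law relationship]
$$C \propto n^{-\gamma}$$
so that log C decreases linearly with log n, with slope −γ.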
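The exponent γ can be estimated, roughly, from a least-squares line through the log-log points. The Python sketch below assumes the network is given as a list of interacting protein pairs; least squares on a log-log histogram is a crude estimator (maximum-likelihood fits are generally preferred), and the function names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def degree_counts(edges: Iterable[Tuple[str, str]]) -> Dict[int, int]:
    """Map n (number of interactions) -> C (number of proteins with n partners)."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(Counter(degree.values()))


def fit_gamma(counts: Dict[int, int]) -> float:
    """Least-squares slope of log10 C against log10 n, returned as gamma = -slope.
    Needs at least two distinct values of n."""
    xs = [math.log10(n) for n in counts]
    ys = [math.log10(c) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope


print(degree_counts([("a", "b"), ("a", "c"), ("a", "d")]))  # {3: 1, 1: 3}
print(fit_gamma({1: 10000, 10: 100, 100: 1}))               # about 2.0
```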
• Potential pitfall of the Rosetta stone method: the transitive rule can be applied, but promiscuous domains (domains found fused to many different partners) should be excluded;
• The method works better when combined with other genome context methods.
• An example: Peptide methionine sulfoxide
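Excluding promiscuous domains can be sketched as a simple partner-count filter in Python. This is an illustrative assumption: the input format (pairs of domains observed fused together), the `max_partners` cutoff and the domain names in the example are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def drop_promiscuous(links: Iterable[Tuple[str, str]], max_partners: int) -> List[Tuple[str, str]]:
    """Remove fusion links involving a domain fused to more than `max_partners` others."""
    link_list = list(links)
    partners: Dict[str, Set[str]] = defaultdict(set)
    for a, b in link_list:
        partners[a].add(b)
        partners[b].add(a)
    promiscuous = {d for d, ps in partners.items() if len(ps) > max_partners}
    return [(a, b) for a, b in link_list if a not in promiscuous and b not in promiscuous]


links = [("dom1", "dom2"), ("dom1", "dom3"), ("dom1", "dom4"), ("dom5", "dom6")]
print(drop_promiscuous(links, max_partners=2))  # [('dom5', 'dom6')]
```

The cutoff trades sensitivity for specificity; in practice it would be tuned on known interactions.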
Gene clusters and genomic neighborhoods
• Gene order on the chromosome is generally not conserved; however, operon structures are more or less conserved;
• Thus, if the neighborhood of a gene pair with the same orientation is conserved in distantly related genomes, the two genes are likely to be located in the same operon and to be functionally related.
[Figure: conserved gene neighborhoods in Pseudomonas aeruginosa, Chlamydia trachomatis, Chlamydia pneumoniae and Escherichia coli]
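The neighborhood criterion can be sketched in Python, assuming each genome is given as an ordered list of (ortholog group, strand) pairs. The input format, the `min_genomes` cutoff and the example data are hypothetical; the sketch also ignores chromosome circularity and does not check how distantly related the genomes are.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Gene = Tuple[str, str]  # (ortholog group, strand "+" or "-"), hypothetical format


def same_strand_neighbors(genome: List[Gene]) -> Set[Tuple[str, str]]:
    """Unordered pairs of adjacent genes lying on the same strand."""
    pairs: Set[Tuple[str, str]] = set()
    for (a, strand_a), (b, strand_b) in zip(genome, genome[1:]):
        if strand_a == strand_b and a != b:
            pairs.add((min(a, b), max(a, b)))
    return pairs


def conserved_pairs(genomes: Dict[str, List[Gene]], min_genomes: int) -> Dict[Tuple[str, str], List[str]]:
    """Same-strand neighbor pairs seen in at least `min_genomes` genomes."""
    seen: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for name in sorted(genomes):
        for pair in same_strand_neighbors(genomes[name]):
            seen[pair].append(name)
    return {pair: names for pair, names in seen.items() if len(names) >= min_genomes}


genomes = {  # made-up gene orders for three genomes
    "G1": [("A", "+"), ("B", "+"), ("C", "-")],
    "G2": [("C", "+"), ("A", "-"), ("B", "-")],
    "G3": [("B", "+"), ("D", "+"), ("A", "+")],
}
print(conserved_pairs(genomes, min_genomes=2))  # {('A', 'B'): ['G1', 'G2']}
```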
• Operons are relatively conserved in prokaryotes because the operon structure facilitates horizontal gene transfer (HGT) of functionally linked genes together (the selfish operon hypothesis);
• Uber-operon: a set of operons in a genome that are inferred to be functionally related because orthologs of genes in different operons are located in the same operon in some other genomes.
[Figure: operons of an uber-operon compared across genomes]
• Genes in an uber-operon tend to be involved in the same biological process.
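A simplified Python sketch of grouping operons into uber-operons, assuming genes are already mapped to ortholog group identifiers: two operons of the reference genome are merged (with a union-find structure) whenever some operon in another genome contains orthologs from both. This is an illustrative assumption, not the published uber-operon algorithm, and the names and data are hypothetical.

```python
from typing import Dict, List, Set


def uber_operons(ref_operons: List[Set[str]], other_operons: List[Set[str]]) -> List[Set[str]]:
    """Merge reference operons whose genes co-occur in a single operon elsewhere."""
    parent = list(range(len(ref_operons)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for operon in other_operons:
        # reference operons sharing at least one ortholog with this operon
        touched = [i for i, ref in enumerate(ref_operons) if ref & operon]
        for i in touched[1:]:
            parent[find(i)] = find(touched[0])

    groups: Dict[int, Set[str]] = {}
    for i, ref in enumerate(ref_operons):
        groups.setdefault(find(i), set()).update(ref)
    return list(groups.values())


ref = [{"A", "B"}, {"C"}, {"D", "E"}, {"F"}]  # operons of the reference genome
other = [{"B", "C"}, {"E", "X"}]              # operons from other genomes
print(uber_operons(ref, other))  # groups {A, B, C}, {D, E} and {F}
```

Merging is transitive, so a chain of shared orthologs across several genomes can join more than two operons.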
• Gene neighborhood based analyses seem unsuitable for eukaryotes because functionally linked genes are apparently not clustered in eukaryotic genomes;
• Online tools for gene neighborhood analysis:
  ◦ STRING database (http://string.embl.de/): includes all three types of genomic context analysis (phylogenetic profiles, gene fusion and gene neighborhood), with a nice graphical view.
  ◦ KEGG SSDB gene cluster analysis tool (http://www.genome.jp/kegg/ssdb/)
• Example 1: Archaeal shikimate kinase
• Example 2: Prediction of a novel DNA repair system in thermophiles