Genome Annotation and Analysis 9/24/08

Phylogenetic profile analysis

- Minimum spanning tree of an undirected graph: a spanning tree that has the minimum total weight on its edges.
  [Figure: a graph on genes g_i and g_j with edge weights d(g_i, g_j), and its minimum spanning tree]

- Prim's algorithm for finding the minimum spanning tree of a graph:
  1. Mark any vertex v, find the edge e(v, w) with the smallest weight among v's edges, and label w;
  2. Find the edge with the smallest weight among all the edges between any labeled vertex u and any unlabeled vertex x, and label x;
  3. Repeat step 2 until all vertices are labeled.
  [Figure: worked example on a small numbered graph, with a plot of edge weight against recruiting order]

- When recruiting vertices into the minimum spanning tree, Prim's algorithm always exhausts the vertices in a cluster before jumping out to another cluster;
- Plotting the order in which each vertex v is recruited (the recruiting index) against the distance between v and the vertex that recruits it into the minimum spanning tree gives a good visualization of the clusters. For points in the plane the distance is

  $d(i, j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$

  [Figure: points on X and Y axes, and the plot of d(i, j) against recruiting index]

- A minimum spanning tree based clustering algorithm.
  [Figure: clusters labeled "genes unique to WH8102", "housekeeping (universal) genes existing in all genomes", and "genes related to phosphorus"]

- COG-specific phylogenetic profiles can be used to predict functional associations among COGs and the living styles of organisms.
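The Prim steps and the recruiting-order plot described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own implementation: it assumes the data are points in the plane with Euclidean distance, and the function names, the `threshold` parameter, and the rule "start a new cluster when the recruiting distance exceeds the threshold" are hypothetical choices made for the example.

```python
import math


def prim_recruiting_order(points, start=0):
    """Run Prim's algorithm on the complete graph over `points`.

    Returns a list of (vertex, recruiter, distance) tuples in the order the
    vertices are recruited into the tree. The start vertex has recruiter None
    and distance 0.0.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from the tree to each vertex
    parent = [None] * n     # labeled vertex at the other end of that edge
    best[start] = 0.0
    order = []
    for _ in range(n):
        # Step 2: pick the unlabeled vertex with the cheapest edge to the tree.
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[v] = True
        order.append((v, parent[v], best[v]))
        # Update the cheapest connecting edge of every still-unlabeled vertex.
        for x in range(n):
            if not in_tree[x]:
                d = math.dist(points[v], points[x])
                if d < best[x]:
                    best[x], parent[x] = d, v
    return order


def split_clusters(order, threshold):
    """Hypothetical cut rule: open a new cluster whenever the recruiting
    distance exceeds `threshold` (a peak in the recruiting plot)."""
    clusters = []
    for v, _, d in order:
        if not clusters or d > threshold:
            clusters.append([])
        clusters[-1].append(v)
    return clusters


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
order = prim_recruiting_order(points)
print(split_clusters(order, threshold=5.0))
```

Plotting the third element of each tuple against its position in `order` gives the recruiting plot: valleys correspond to clusters and peaks to jumps between clusters. For phylogenetic profiles the Euclidean distance would be replaced by a distance between profile vectors.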
Phylogenetic profile matrix (rows: COGs, grouped into functional clusters; columns: genomes, grouped into living-style clusters):

          Genome1  Genome2  ...  Genome3
  COG1       1        0     ...     1
  COG2       0        1     ...     0
  ...       ...      ...    ...    ...
  COGm       0        0     ...     1

- COG-specific phylogenetic profiles, when inverted, can be used to predict non-orthologous gene displacements:

             COG1  COG2  ...  COGm  |  COG'1  COG'2  ...  COG'm
  Genome1      1     0   ...    0   |    0      1    ...    1
  Genome2      0     1   ...    0   |    1      0    ...    1
  ...         ...   ...  ...   ...  |   ...    ...   ...   ...
  Genome3      1     0   ...    1   |    0      1    ...    0

  (Non-orthologous gene displacement: each COG'k has the profile complementary to that of COGk.)

- Examples of non-orthologous gene displacement found by detecting complementary phylogenetic profiles:
  • Phosphoglycerate mutase: cofactor dependent; cofactor independent
  • Fructose-1,6-bisphosphate aldolases: metal dependent (E&B); metal independent (U&B); metal independent (A)
  • Thymidylate synthetases

Prediction of protein-protein interaction through detecting domain fusion events

- If two genes in a genome are known to be fused into one gene coding for a multi-domain protein in another genome, then the proteins encoded by these two genes are likely to physically interact with each other, and thus are functionally related. The fused multi-domain protein is called the Rosetta stone.
  [Figure: genes g_i and g_j from the target genome are used in a homology search against the nr database; they match the same fused protein with E-values E1 and E2]

  $\text{Score} = -\frac{1}{2}\left(\log E_1 + \log E_2\right)$

- Predicted protein-protein interaction network in Synechococcus sp. WH8102.
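The Rosetta stone score, Score = -(1/2)(log E1 + log E2), can be computed directly from the two E-values of a candidate gene pair. The sketch below is illustrative only. The slides do not state the base of the logarithm, so base 10 is an assumption here, and the `floor` parameter (which guards against E-values reported as exactly 0) and the gene and protein names in the example are hypothetical.

```python
import math


def rosetta_score(e1, e2, floor=1e-300):
    """Score = -(1/2) * (log E1 + log E2).

    Assumptions: log base 10; E-values reported as 0 are clamped to `floor`
    so that the logarithm is defined.
    """
    e1 = max(e1, floor)
    e2 = max(e2, floor)
    return -0.5 * (math.log10(e1) + math.log10(e2))


def rank_candidate_pairs(hits):
    """Sort candidate pairs by decreasing score.

    `hits` is a list of (gene_i, gene_j, fused_protein, e1, e2) tuples,
    a hypothetical record layout for this example.
    """
    scored = [(rosetta_score(e1, e2), gi, gj, fused) for gi, gj, fused, e1, e2 in hits]
    return sorted(scored, reverse=True)


# Made-up E-values, for illustration only.
hits = [
    ("geneA", "geneB", "fusedX", 1e-20, 1e-10),
    ("geneC", "geneD", "fusedY", 1e-4, 1e-2),
]
for score, gi, gj, fused in rank_candidate_pairs(hits):
    print(f"{gi}-{gj} via {fused}: score {score:.1f}")
```

A larger score means both genes match the fused protein with small E-values, which makes the predicted link more credible. Averaging the two log E-values keeps one strong match from hiding a weak one.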
[Figure: log-log plot of counts (log C) against number of interactions (log n) for the predicted network, showing a power law relationship $C = n^{-\gamma}$]

- Potential pitfalls of the Rosetta stone method: the transitive rule can be applied, but promiscuous domains should be excluded;
- It is better combined with other genome context methods.
- An example: Peptide methionine sulfoxide

Gene clusters and genomic neighborhoods

- Gene order on the chromosome is generally not conserved; however, operon structures are more or less conserved;
- Thus, if the neighborhood of a gene pair with the same orientation is conserved in genomes that are not closely related, then these two genes are likely to be located in the same operon and to be functionally related.
  [Figure labels: Pseudomonas aeruginosa, Chlamydia trachomatis, Chlamydia pneumoniae, Escherichia coli]

- Operons are relatively conserved in prokaryotes because operon structure facilitates horizontal gene transfer (HGT): the selfish operon hypothesis;
- Uber-operon: a set of operons in a genome that are functionally related because the orthologs of genes in different operons are located in the same operons in some other genomes.
  [Figure: gene arrangements in several genomes illustrating an uber-operon]
- Genes in an uber-operon tend to be involved in the same biological process.

- Gene neighborhood based analyses seem unsuitable for eukaryotes because of the apparent lack of clustering of functionally linked genes;
- Online tools for gene neighbor analysis:
  • STRING database (http://string.embl.de/): includes all three types of genomic context analysis, with a nice graphical view.
  • KEGG SSDB gene cluster analysis tool (http://www.genome.jp/kegg/ssdb/)

- Example 1: Archaeal shikimate kinase
- Example 2: Prediction of a novel DNA repair system in thermophiles
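The conserved-neighborhood rule from this section (a same-orientation gene pair that stays adjacent in distantly related genomes probably shares an operon) can be sketched as a simple count. This is a minimal illustration under stated assumptions: genes have already been mapped to ortholog identifiers, "neighborhood" is simplified to immediate adjacency on a linear gene list (a circular chromosome's wrap-around is ignored), the input genomes are assumed not to be closely related, and the `min_genomes` cutoff and all names are hypothetical.

```python
from collections import Counter


def adjacent_same_strand_pairs(genome):
    """Return the set of unordered ortholog pairs that are immediate
    neighbors on the same strand.

    `genome` is a list of (ortholog_id, strand) tuples in chromosomal order,
    with strand given as "+" or "-".
    """
    pairs = set()
    for (a, strand_a), (b, strand_b) in zip(genome, genome[1:]):
        if strand_a == strand_b and a != b:
            pairs.add(frozenset((a, b)))
    return pairs


def conserved_pairs(genomes, min_genomes=2):
    """Count, for every pair, the number of genomes in which it is adjacent
    with the same orientation; keep pairs seen in at least `min_genomes`."""
    counts = Counter()
    for genome in genomes.values():
        counts.update(adjacent_same_strand_pairs(genome))
    return {tuple(sorted(p)): c for p, c in counts.items() if c >= min_genomes}


# Toy gene orders with made-up ortholog identifiers.
genomes = {
    "genomeA": [("x", "+"), ("y", "+"), ("z", "-")],
    "genomeB": [("q", "-"), ("y", "-"), ("x", "-"), ("z", "+")],
    "genomeC": [("x", "+"), ("z", "+"), ("y", "+")],
}
print(conserved_pairs(genomes))
```

Here only the pair (x, y) is adjacent on the same strand in more than one genome, so it is the only predicted operon partner pair. A practical analysis would also weight each supporting genome by its evolutionary distance from the others, since adjacency shared by close relatives carries little information.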