Protein Targeting by Functional Linkage of Non-Homologous Proteins, with examples from M. tuberculosis

Structural Genomics of Complexes: Identifying subunits of complexes by analyzing co-evolution of non-homologous proteins, from genome-wide functional linkage maps
[Figure: genome-wide functional linkage map; axes TB Gene A vs. TB Gene B, 0 to 4000]

Limitations of Relying Entirely on Homology-Based Targeting
• Many (most?) proteins function in complexes made up of non-homologous proteins
• Some (many?) proteins are crystallizable only with their functional partners
This suggests that targeting non-homologous, functionally linked proteins may offer a useful shortcut to learning protein structures and functions.

Structural Genomics of Protein Complexes: Identifying Subunits of Protein Complexes by Analyzing the Co-evolution of Non-homologous Proteins

4 Methods to Infer Non-Homologous Protein Pairs that have Co-evolved and hence are Functionally Linked
• Rosetta Stone: protein fusion
• Phylogenetic Profile: protein co-occurrence
• Gene Neighbor: constant separation
• Operon: small separation

[Figure: four linkage maps, one per method (Operon, Rosetta Stone, Conserved Gene Neighbor, Phylogenetic Profiles); axes TB Gene A vs. TB Gene B, 0 to 4000. Figure 7, M. Strong, T. Graeber et al.]

Functional Linkages Between Genes of M. tuberculosis
Classical graphical representation of protein functional linkages

Whole Genome Functional Linkage Map (RS, PP, GN, OP methods for TB)
[Figure: combined linkage map; axes TB Gene A vs. TB Gene B, 0 to 4000]
Requiring 2 or more functional linkages: 1,865 genes make 9,766 linkages
Research of Michael Strong and Morgan Beeby

Hierarchical Clustering of the Combined Genome-Wide Linkage Map for M. tb Reveals Complexes and Pathways
• Start from the genome-wide functional linkage map based on the 4 methods
• Cluster similar linkage patterns
• Result: a clustered linkage map showing complexes and pathways
Each cluster is a complex or pathway
[Figure: unclustered and clustered linkage maps; axes TB Gene A vs. TB Gene B, 0 to 5000]

Clusters labeled on the clustered map (Fig 4, M. Strong, T. Graeber et al.):
• Cell Envelope, Cell Division
• Energy Metabolism, TCA
• Broad Regulatory, Serine Threonine Protein Kinase
• Cell Envelope, Murein Sacculus and Peptidoglycan
• Transport/Binding Proteins
• Transport/Binding Proteins, Cations
• Chaperones
• Cell Envelope
• Energy Metabolism, ATP Proton Motive Force
• Biosynthesis of Cofactors
• Cytochrome P450
• Two Component Systems
• Energy Metabolism, Anaerobic Respiration
• Sugar Metabolism
• Purine, Pyrimidine Nucleotide Biosynthesis
• Aromatic Amino Acid Biosynthesis
• Novel Group
• Biosynthesis of Cofactors, Prosthetic Groups
• Synthesis and Modif. of Macromolecules, rpl, rpm, rps
• Amino Acid Biosynthesis (Branched)
• Degradation of Fatty Acids
• Energy Metab., Respiration, Aerobic
• Energy Metabolism, Oxidoreductase
• Energy Metabolism, Oxidoreductase
• Polyketide and Non-ribosomal Peptide Synthesis
• Lipid Biosynthesis
• Amino Acid Biosynthesis
• Virulence
• Deg. of Fatty Acids
• Detoxification

Quantitative Assessment of Inferred Protein Complexes: Calculating Probabilities of Co-evolution

Phylogenetic Profile and Rosetta Stone:
\[ P(k \mid n, m, N) = \frac{\binom{n}{k}\binom{N-n}{m-k}}{\binom{N}{m}} \]
N = number of fully sequenced genomes
n = number of homologs of protein A
m = number of homologs of protein B
k = number of genomes shared in common

Gene Neighbor:
\[ P_m(\le X) = 1 - P_m(> X) = X \sum_{k=0}^{m-1} \frac{(-\ln X)^k}{k!} \]
X = fractional separation of genes

Operon:
\[ P(n) = 1 - e^{-n} \]
n = intergenic separation

Combining Inferences of Co-evolution from 4 Methods
We use a Bayesian approach to combine the probabilities from the four methods to arrive at a single probability that two proteins co-evolve:
\[ O_{post} = \frac{P(pos)}{P(neg)} \prod_{i=1}^{4} \frac{P(f_i \mid pos)}{P(f_i \mid neg)} \]
where positive pairs are proteins with a common pathway annotation and negative pairs are proteins with different annotations.

Benchmarking this Approach Against Known Complexes
EcoCyc: Karp et al., NAR 30, 56 (2002)
[Figure: ROC plot; x axis Fraction of False Positives (0 to 0.009), y axis Fraction of True Positives (0 to 0.4), with a Random baseline]
For high-confidence links, we find 1/3 of the true interactions with only 1/1000 of the false positive ones.
True positive interactions are between subunits of known complexes, and false positive ones are between subunits of different complexes.

Example Complex: NADH Dehydrogenase I
• 11 of 13 subunits detected
• 3 false positives

Functional Linkages Among Cytochrome Oxidase Genes
[Diagram: linkage graph connecting CtaB, CtaC, CtaD, and CtaE]
• Functional linkages relate all 3 components of the cytochrome oxidase complex and also CtaB, the cytochrome oxidase assembly factor
• These genes are at four different chromosomal locations
• Membrane proteins are linked to soluble proteins

From Inferred Protein Complexes to their Structures: The Problem of PE and PPE Proteins in M. tb

PE, PE-PGRS, and PPE Proteins in M. tuberculosis
• 38 PE proteins; 61 PE-PGRS proteins; 68 PPE proteins
• Together they comprise about 5% of the genome
• No function is known, but some appear to be membrane bound
• No structure is known: always insoluble when expressed
Goal: use functional linkages to predict a complex between a PE and a PPE protein, express the complex, and determine its structure
Research of Shuishu Wang and Michael Strong

Construction of a co-expression vector to test for protein-protein interactions (Mike Strong)
[Diagram: pET 29b(+) co-expression vector; elements labeled T7 promoter, lac operator, RBS, gene A, RBS, gene B, thrombin site, His tag; restriction sites labeled NdeI, KpnI, NcoI, HindIII]
Transcription gives a polycistronic mRNA; translation gives protein A and protein B (with His tag).
• If the proteins do not interact: protein A and protein B (with His tag) remain separate
• If the proteins interact (protein-protein interaction): protein A is bound to protein B (with His tag)

When co-expressed, the PE and PPE proteins, inferred to interact, do form a soluble complex, Mr = 35,200
Sedimentation equilibrium experiments: Rv2430c + Rv2431c, fraction 49, in 20 mM HEPES, 150 mM NaCl, pH 7.8
Concentration (OD280): 0.7, 0.45, 0.15
Expected Mr:
• Rv2431c (PE): 10,687 (10,563.12 from mass spec)
• Rv2430c + His tag (PPE): 24,072 (23,895.00 from mass spec)
Possibly suggests a 1:1 complex between these two proteins

Crystallization trials of the Complex Between PE Protein Rv2431c and PPE Protein Rv2430c

Summary
• Many functional linkages are revealed from genomic data (high coverage)
• Clustered genome-wide functional maps can reveal and organize information on complexes (and pathways)
• Known subunits of E. coli complexes can be identified with high accuracy from functional linkages
• A protein complex suitable for structural studies has been revealed from functional linkages
• The procedures for identifying and producing protein complexes can be adapted for high throughput

Protein Interactions in M. tb
• Analysis of M. tb Genome: Michael Strong, Debnath Pal, Sulmin Kim
• Whole Genome Interaction Maps: Michael Strong, Tom Graeber, Huiying Li, Matteo Pellegrini
• Methods of Inferring Interactions: Edward Marcotte, Matteo Pellegrini, Todd Yeates, Michael Thompson
• PI of Tb Structural Genomics Consortium: Tom Terwilliger
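
The whole-genome map keeps only pairs "requiring 2 or more functional linkages". A minimal sketch of such a filter follows, assuming this means a gene pair must be reported by at least two of the four methods (RS, PP, GN, OP). The function name, the input layout, and the gene labels in the usage note are hypothetical and are not taken from the slides.

```python
from collections import defaultdict

def confident_linkages(method_pairs, min_methods=2):
    """Keep gene pairs reported by at least `min_methods` distinct methods.

    method_pairs maps a method name (e.g. "RS") to an iterable of
    (gene_a, gene_b) tuples. Pairs are treated as unordered.
    """
    support = defaultdict(set)
    for method, pairs in method_pairs.items():
        for gene_a, gene_b in pairs:
            # frozenset makes (a, b) and (b, a) the same linkage
            support[frozenset((gene_a, gene_b))].add(method)
    return {pair: methods for pair, methods in support.items()
            if len(methods) >= min_methods}
```

For example, `confident_linkages({"RS": [("g1", "g2")], "PP": [("g2", "g1"), ("g1", "g3")]})` keeps only the g1/g2 pair, because g1/g3 is supported by a single method.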
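
The phylogenetic profile and Rosetta Stone probability P(k | n, m, N) has the form of a hypergeometric distribution. The sketch below evaluates it, assuming n and m count the genomes (out of N) that contain a homolog of protein A and protein B respectively. The helper `tail_probability`, which sums over k or more shared genomes, is an illustrative addition and is not stated on the slides.

```python
from math import comb

def cooccurrence_probability(k, n, m, N):
    """P(k | n, m, N): probability that exactly k genomes contain both
    proteins by chance, given n genomes with A and m genomes with B."""
    return comb(n, k) * comb(N - n, m - k) / comb(N, m)

def tail_probability(k, n, m, N):
    """Probability of k or more shared genomes by chance (hypothetical helper)."""
    return sum(cooccurrence_probability(j, n, m, N)
               for j in range(k, min(n, m) + 1))
```

A small tail probability means the two proteins co-occur in more genomes than independent inheritance would predict, which is the signal the method treats as evidence of co-evolution.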
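
One reading of the gene neighbor expression is the cumulative distribution of a product of m independent uniform values: P_m(<= X) = X * sum over k from 0 to m-1 of (-ln X)^k / k!. The sketch below evaluates that sum. Treating X as the product of the fractional separations observed in m genomes is an assumption made here, since the slide defines X only as the "fractional separation of genes".

```python
from math import factorial, log

def gene_neighbor_probability(X, m):
    """P_m(<= X) = X * sum_{k=0}^{m-1} (-ln X)^k / k!

    X: product of fractional gene separations (assumed), 0 < X <= 1.
    m: number of terms in that product (assumed: number of genomes).
    """
    if not 0.0 < X <= 1.0:
        raise ValueError("X must lie in (0, 1]")
    if m < 1:
        raise ValueError("m must be at least 1")
    return X * sum((-log(X)) ** k / factorial(k) for k in range(m))
```

With m = 1 the expression reduces to X itself, as expected for a single uniform variable. For a fixed X the value grows with m, so a small product is less surprising when more genomes contribute to it.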
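
The Bayesian combination of the four methods multiplies the prior odds P(pos)/P(neg) by one likelihood ratio P(f_i | pos)/P(f_i | neg) per method. A minimal sketch, assuming P(neg) = 1 - P(pos) and that the per-method likelihoods have already been estimated from annotated positive and negative pairs. All numbers in the usage note are illustrative and are not results from the talk.

```python
def posterior_odds(likelihoods_pos, likelihoods_neg, prior_pos):
    """O_post = P(pos)/P(neg) * prod_i P(f_i | pos) / P(f_i | neg).

    likelihoods_pos[i] and likelihoods_neg[i] are P(f_i | pos) and
    P(f_i | neg) for method i; prior_pos is P(pos), with P(neg) = 1 - P(pos).
    """
    odds = prior_pos / (1.0 - prior_pos)
    for p_pos, p_neg in zip(likelihoods_pos, likelihoods_neg):
        odds *= p_pos / p_neg
    return odds

def odds_to_probability(odds):
    """Convert odds O to the probability O / (1 + O)."""
    return odds / (1.0 + odds)
```

For instance, with even prior odds and two methods that each favor a linkage by a factor of 2 while the other two are uninformative, `posterior_odds([0.4, 0.4, 0.2, 0.2], [0.2, 0.2, 0.2, 0.2], 0.5)` gives odds of about 4, which corresponds to a probability of about 0.8.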
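
The 1:1 stoichiometry suggested by the sedimentation equilibrium result can be checked with simple arithmetic: 10,687 + 24,072 = 34,759, which is within about 1.3% of the measured Mr of 35,200. The sketch below compares the measured value against small integer stoichiometries. The function name and the `max_copies` search limit are assumptions introduced for illustration.

```python
from itertools import product

def closest_stoichiometry(observed, mass_a, mass_b, max_copies=3):
    """Return (copies_a, copies_b) whose summed mass is nearest to `observed`.

    Searches all combinations with 0..max_copies of each subunit,
    excluding the empty complex.
    """
    candidates = [(a, b)
                  for a, b in product(range(max_copies + 1), repeat=2)
                  if a + b > 0]
    return min(candidates,
               key=lambda ab: abs(ab[0] * mass_a + ab[1] * mass_b - observed))

# Expected masses from sequence, as listed on the slide
PE_MR = 10_687.0    # Rv2431c (PE)
PPE_MR = 24_072.0   # Rv2430c + His tag (PPE)
OBSERVED_MR = 35_200.0

print(closest_stoichiometry(OBSERVED_MR, PE_MR, PPE_MR))  # prints (1, 1)
```

Using the mass spec values (10,563.12 and 23,895.00) instead gives a 1:1 sum of 34,458.12, and the same (1, 1) assignment remains the closest match.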