Functional genomics: assigning functions to genome sequences

Protein Targeting by Functional Linkage of
Non-Homologous Proteins
with examples from M. tuberculosis
Structural Genomics
of Complexes:
Identifying subunits
of complexes by analyzing
co-evolution of non-homologous proteins, from
genome-wide functional
linkage maps
[Figure: Genome-wide functional linkage map; axis label: TB Gene A; axis ticks 0 to 4000]
Limitations of Relying Entirely on
Homology-Based Targeting
• Many (most?) proteins function in
complexes made up of non-homologous
proteins
• Some (many?) proteins are crystallizable
only with their functional partners
This suggests that targeting non-homologous, functionally
linked proteins may offer a useful shortcut to learning
protein structures and functions
Structural Genomics of Protein Complexes
Identifying Subunits of Protein
Complexes by Analyzing the
Co-evolution of
Non-homologous Proteins
4 Methods to Infer Non-Homologous
Protein Pairs that have Co-evolved and
hence are Functionally Linked
• Rosetta Stone: protein fusion
• Phylogenetic Profile: protein co-occurrence
• Gene Neighbor: constant separation
• Operon: small separation
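The phylogenetic profile idea (protein co-occurrence across genomes) can be illustrated with a minimal sketch. The genome names and protein labels below are hypothetical toy data, not taken from the slides, and the representation (one 0/1 flag per genome) is an assumption about how a profile might be stored.

```python
# Toy presence/absence data (hypothetical): genome name -> proteins with a homolog there.
genomes = {
    "genome1": {"A", "B", "C"},
    "genome2": {"A", "B"},
    "genome3": {"C"},
    "genome4": {"A", "B", "C"},
}

def profile(protein, genomes):
    """Return the profile of a protein as a tuple of 0/1 flags,
    one per genome, in sorted genome order."""
    return tuple(int(protein in genomes[g]) for g in sorted(genomes))

def shared_genomes(profile_a, profile_b):
    """Count the genomes in which both proteins have a homolog."""
    return sum(a & b for a, b in zip(profile_a, profile_b))

pa = profile("A", genomes)  # present in genomes 1, 2, 4
pb = profile("B", genomes)  # same pattern as A: a candidate functional linkage
pc = profile("C", genomes)  # present in genomes 1, 3, 4
```

Proteins A and B have identical profiles here, which is the kind of co-occurrence pattern the method looks for; A and C share only two genomes.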
[Figure: four genome-wide linkage maps, one per method: Operon Method, Rosetta Stone Method, Conserved Gene Neighbor Method, Phylogenetic Profiles Method; axes: TB Gene A vs. TB Gene B, 0 to 4000]
Figure 7.
M. Strong, T. Graeber et al.
Functional Linkages Between Genes of M. tuberculosis
Classical graphical representation of
protein functional linkages
Whole Genome Functional Linkage Map
(RS, PP, GN, OP methods for TB)
[Figure: linkage map; axes: TB Gene A vs. TB Gene B, 0 to 4000]
Requiring 2 or more functional linkages:
1,865 genes make 9,766 linkages
Research of Michael Strong and Morgan Beeby
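The "2 or more functional linkages" requirement can be read as keeping only gene pairs supported by at least two of the four methods. The sketch below takes that reading as an assumption; the function name, the data layout, and the gene labels are hypothetical.

```python
from collections import defaultdict

def combine_linkages(links_by_method, min_methods=2):
    """Keep gene pairs that are linked by at least `min_methods` methods.

    links_by_method: dict mapping a method name (e.g. "RS", "PP", "GN", "OP")
    to an iterable of (gene_a, gene_b) pairs. Pair order is ignored.
    Returns (kept, genes): kept maps each unordered pair to its supporting
    methods, genes is the set of genes that appear in any kept pair.
    """
    support = defaultdict(set)
    for method, pairs in links_by_method.items():
        for a, b in pairs:
            support[frozenset((a, b))].add(method)
    kept = {pair: methods for pair, methods in support.items()
            if len(methods) >= min_methods}
    genes = set().union(*kept) if kept else set()
    return kept, genes

# Hypothetical toy input, not real M. tuberculosis linkages.
links = {
    "RS": [("g1", "g2")],
    "PP": [("g2", "g1"), ("g3", "g4")],
    "GN": [("g3", "g4"), ("g5", "g6")],
    "OP": [("g3", "g4")],
}
kept, genes = combine_linkages(links)
```

Here the pair g5/g6 is dropped because only one method supports it; counting `len(genes)` and `len(kept)` on real data would give figures of the kind quoted on the slide.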
Hierarchical Clustering of the Combined Genome-Wide
Linkage Map for M. Tb. Reveals Complexes and
Pathways
Genome-wide functional linkage
map based on 4 methods
Cluster similar linkage patterns
Clustered linkage map
showing complexes and pathways
[Figure: linkage maps before and after clustering; axes: TB Gene A vs. TB Gene B]
Each cluster is a
complex or pathway
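Clustering genes with similar linkage patterns can be sketched as follows. The slides do not state which distance measure or linkage rule was used, so the Jaccard distance, the single-linkage merging, and the stopping threshold here are all assumptions chosen to keep the example short; the gene labels are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of linked partners."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def cluster_genes(partners, max_distance=0.5):
    """Agglomerative single-linkage clustering of genes by linkage pattern.

    partners: dict mapping each gene to the set of genes it is linked to.
    Merging stops once the closest pair of clusters is farther apart
    than max_distance.
    """
    # A gene's pattern includes the gene itself, so two genes that link
    # to each other and share partners end up with identical patterns.
    patterns = {g: set(p) | {g} for g, p in partners.items()}
    clusters = [[g] for g in sorted(patterns)]

    def dist(c1, c2):
        return min(jaccard_distance(patterns[x], patterns[y])
                   for x in c1 for y in c2)

    while len(clusters) > 1:
        d, i, j = min((dist(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical toy linkage map with two fully connected groups.
partners = {
    "g1": {"g2", "g3"}, "g2": {"g1", "g3"}, "g3": {"g1", "g2"},
    "g4": {"g5"}, "g5": {"g4"},
}
clusters = cluster_genes(partners)
```

On this toy input the two groups separate cleanly; in the terms of the slide, each resulting cluster would be a candidate complex or pathway.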
Cell Envelope, Cell Division
Energy Metabolism TCA
Broad Regulatory, Serine Threonine Protein Kinase
Cell Envelope, Murein Sacculus and Peptidoglycan
Transport/Binding Proteins
Transport/Binding Proteins Cations
Chaperones
Cell Envelope
Energy Metabolism, ATP Proton Motive force
Biosynthesis of cofactors
Cytochrome P450
Two component systems
Energy Metabolism, Anaerobic Respiration
Sugar Metabolism
Purine, Pyrimidine nucleotide biosynthesis
Aromatic Amino Acid Biosynthesis
Novel Group
Biosynthesis of Cofactors, Prosthetic groups
Synthesis and Modif. Of Macromolecules, rpl,rpm, rps
Amino Acid Biosynthesis (Branched)
Degradation of Fatty Acids
Energy Metab. Respiration Aerobic
Energy Metabolism, oxidoreductase
Fig 4.
M. Strong, T. Graeber et al.
Energy Metabolism, oxidoreductase
Polyketide and non-ribosomal peptide synthesis
Lipid Biosynthesis
Amino acid Biosynthesis
Virulence
Deg. of Fatty Acids
Detoxification
Quantitative Assessment of
Inferred Protein Complexes
Calculating Probabilities of Co-evolution
Phylogenetic Profile, Rosetta Stone:
$$P(k \mid n, m, N) = \frac{\binom{n}{k}\binom{N-n}{m-k}}{\binom{N}{m}}$$
N = number of fully sequenced genomes
n = number of homologs of protein A
m = number of homologs of protein B
k = number of genomes shared in common
Gene Neighbor:
$$P_m(\le X) = 1 - P_m(> X) = X \sum_{k=0}^{m-1} \frac{(-\ln X)^k}{k!}$$
X = fractional separation of genes
Operon:
$$P(n) = 1 - e^{-n}$$
n = intergenic separation
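A minimal numerical sketch of two of these probabilities is given below. It assumes the phylogenetic profile expression is the hypergeometric probability C(n,k)·C(N−n,m−k)/C(N,m) with N, n, m, k as defined on the slide, and reads the gene-neighbor expression as X·Σ(−ln X)^k/k! summed over k = 0 to m−1; both readings are interpretations of the slide, and the function names are hypothetical. The operon expression is left out.

```python
import math

def hypergeometric_p(k, n, m, N):
    """P(k | n, m, N) = C(n, k) * C(N - n, m - k) / C(N, m).

    Probability of exactly k shared genomes by chance, given n homologs
    of protein A and m homologs of protein B among N genomes.
    Requires 0 <= k <= m and n <= N (math.comb rejects negative arguments).
    """
    return math.comb(n, k) * math.comb(N - n, m - k) / math.comb(N, m)

def gene_neighbor_p(X, m):
    """P_m(<= X) = X * sum_{k=0}^{m-1} (-ln X)^k / k!, for 0 < X <= 1."""
    return X * sum((-math.log(X)) ** k / math.factorial(k) for k in range(m))

# Example values (hypothetical): 10 genomes, A in 3 of them, B in 4.
p_all_shared = hypergeometric_p(3, 3, 4, 10)
p_neighbor = gene_neighbor_p(0.5, 2)  # equals 0.5 * (1 + ln 2)
```

A small probability means the observed overlap or proximity is unlikely to arise by chance, which is the evidence for co-evolution.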
Combining Inferences of Co-evolution from 4 Methods
We use a Bayesian approach to combine the probabilities from the four methods
to arrive at a single probability that two proteins co-evolve:
$$O_{post} = \prod_{i=1}^{4} \frac{P(f_i \mid pos)}{P(f_i \mid neg)} \cdot \frac{P(pos)}{P(neg)}$$
where positive pairs are proteins with common pathway annotation
and negative pairs are proteins with different annotation
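The combination step can be sketched as a product of per-method likelihood ratios multiplied by the prior odds. The sketch assumes that positive and negative pairs are exhaustive, so P(neg) = 1 − P(pos), and that the four ratios can simply be multiplied; the function names and the example numbers are hypothetical.

```python
import math

def posterior_odds(likelihood_ratios, prior_pos):
    """O_post = prod_i [P(f_i | pos) / P(f_i | neg)] * P(pos) / P(neg).

    likelihood_ratios: one ratio P(f_i | pos) / P(f_i | neg) per method.
    prior_pos: prior probability that a pair is positive, 0 < prior_pos < 1.
    Assumes P(neg) = 1 - P(pos).
    """
    prior_odds = prior_pos / (1.0 - prior_pos)
    return math.prod(likelihood_ratios) * prior_odds

def odds_to_probability(odds):
    """Convert odds into a probability."""
    return odds / (1.0 + odds)

# Hypothetical ratios for the four methods (RS, PP, GN, OP).
odds = posterior_odds([4.0, 2.0, 1.0, 0.5], prior_pos=0.2)
probability = odds_to_probability(odds)
```

A ratio above 1 raises the odds that a pair is functionally linked, a ratio below 1 lowers them, and a ratio of 1 leaves them unchanged.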
Benchmarking this Approach
Against Known Complexes
Ecocyc: Karp et al. NAR, 30, 56 (2002)
[Figure: ROC plot; x-axis: Fraction of False Positives, 0 to 0.009; y-axis: Fraction of True Positives, 0 to 0.4; a "Random" reference line is marked]
For high confidence links,
we find 1/3 of true interactions
with only 1/1000 of the false
positive ones
True positive interactions are between subunits of known complexes and false positive
ones are between subunits of different complexes.
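How points on such a ROC plot are obtained can be sketched as below: pairs are ranked by confidence score, and the fractions of true and false positives recovered are recorded as the threshold is lowered. The scores and labels are hypothetical, and tied scores are not treated specially in this sketch.

```python
def roc_points(scored_pairs):
    """Return (false positive fraction, true positive fraction) points.

    scored_pairs: list of (score, is_true) tuples, where is_true marks a
    pair of subunits from the same known complex. The list must contain
    at least one true and one false pair.
    """
    total_pos = sum(1 for _, is_true in scored_pairs if is_true)
    total_neg = len(scored_pairs) - total_pos
    points = []
    tp = fp = 0
    # Walk from the most confident link to the least confident one.
    for _, is_true in sorted(scored_pairs, key=lambda p: p[0], reverse=True):
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((fp / total_neg, tp / total_pos))
    return points

# Hypothetical scored links.
points = roc_points([(0.9, True), (0.8, True), (0.7, False),
                     (0.4, True), (0.2, False)])
```

The region of interest on the slide is the far left of the curve, where the false positive fraction is still very small.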
Example Complex: NADH
Dehydrogenase I
11 of 13 subunits detected
3 false positives
Functional Linkages Among Cytochrome Oxidase Genes
[Figure: linkage diagram connecting CtaB, CtaC, CtaD, and CtaE]
Functional linkages relate all 3 components
of cytochrome oxidase complex
and also CtaB, the cytochrome
oxidase assembly factor
These genes are at four different chromosomal
locations
Membrane proteins linked to soluble proteins
From Inferred Protein
Complexes to their
Structures
The Problem of PE and PPE Proteins in M. tb
PE, PE-PGRS, and PPE Proteins in M. tuberculosis
38 PE proteins; 61 PE-PGRS proteins; 68 PPE proteins
Together comprise about 5% of the genome
No function is known, but some appear to be membrane bound
No structure is known: always insoluble when expressed
Goal: use functional linkages to predict a complex between
a PE and a PPE protein: express complex, and determine
its structure
Research of Shuishu Wang and Michael Strong
Construction of a co-expression vector to test for
protein-protein interactions (Mike Strong)
[Figure: co-expression construct in pET 29b(+); labeled elements: T7 promoter, lac operator, RBS, gene A, RBS, gene B, thrombin site, His tag; restriction sites: NdeI, KpnI, NcoI, HindIII]
Transcription gives a polycistronic mRNA;
translation gives protein A and protein B (with His tag)
If proteins do not interact:
protein A and protein B (with His tag) stay separate
If proteins interact (protein-protein interaction):
protein A is bound to protein B (with His tag)
When co-expressed, the PE and PPE proteins,
inferred to interact, do form a soluble complex,
Mr = 35,200
Sedimentation equilibrium experiments:
Rv2430c + Rv2431c fraction 49, in 20 mM HEPES, 150 mM NaCl, pH 7.8
Concentration OD280 0.7, 0.45, 0.15
Expected Mr:
Rv2431c (PE): 10,687 (10,563.12 from mass spec)
Rv2430c + His tag (PPE): 24,072 (23,895.00 from mass spec)
Possibly suggests a 1:1 complex between these
two proteins
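The stoichiometry argument is simple arithmetic: the two expected masses sum to 34,759, close to the measured Mr of 35,200. The sketch below compares a few candidate stoichiometries against the measured value; the function name and the restriction to at least one copy of each protein (up to three) are assumptions made for illustration.

```python
EXPECTED_PE = 10687        # Rv2431c (PE), expected Mr from the slide
EXPECTED_PPE_HIS = 24072   # Rv2430c + His tag (PPE), expected Mr from the slide
OBSERVED_COMPLEX = 35200   # Mr from sedimentation equilibrium

def closest_stoichiometry(observed, mass_a, mass_b, max_copies=3):
    """Return the (copies_a, copies_b) whose summed mass is nearest to
    the observed mass, trying 1..max_copies copies of each protein."""
    candidates = {(a, b): a * mass_a + b * mass_b
                  for a in range(1, max_copies + 1)
                  for b in range(1, max_copies + 1)}
    return min(candidates, key=lambda ab: abs(candidates[ab] - observed))

one_to_one = EXPECTED_PE + EXPECTED_PPE_HIS
best = closest_stoichiometry(OBSERVED_COMPLEX, EXPECTED_PE, EXPECTED_PPE_HIS)
```

Among the combinations tried, one copy of each protein gives the mass nearest to the measured value, in line with the 1:1 complex suggested on the slide.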
Crystallization trials of the Complex Between
PE Protein Rv2431c and PPE Protein Rv2430c
Summary
Many functional linkages are revealed
from genomic data (high coverage)
Clustered genome-wide functional maps can reveal and
organize information on complexes (and pathways)
Known subunits of E. coli complexes can be
identified with high accuracy from functional linkages
A protein complex suitable for structural studies
has been revealed from functional linkages
The procedures for identifying and producing
protein complexes can be adapted for high throughput
Protein Interactions in M. tb.
Analysis of M.tb. Genome
Michael Strong, Debnath Pal,
Sulmin Kim
Whole Genome Interaction Maps
Michael Strong, Tom Graeber,
Huiying Li, Matteo Pellegrini
Methods of Inferring Interactions
Edward Marcotte, Matteo Pellegrini,
Todd Yeates, Michael Thompson
PI of Tb Structural Genomics Consortium
Tom Terwilliger