Genome Biology and
Biotechnology
9. The localizome
Prof. M. Zabeau
Department of Plant Systems Biology
Flanders Interuniversity Institute for Biotechnology (VIB)
University of Gent
International course 2005
Summary
¤ DNA localizome or DNA interactome
– Genome-wide mapping of DNA binding proteins
• Transcription factor binding sites
• Localization of replication origins
¤ Protein localizome
– High throughput localization of proteins in cellular
compartments
Functional Maps or “-omes”
(Figure: genes or proteins 1 to n, mapped across “conditions”)
ORFeome: genes
Phenome: mutational phenotypes
Transcriptome: expression profiles
DNA interactome: protein-DNA interactions
Localizome: cellular, tissue location
Interactome: protein interactions
Proteome: proteins
After: Vidal M., Cell, 104, 333 (2001)
Genome-wide Analysis of Regulatory Sequences
¤ Gene expression is regulated by transcription factors
selectively binding to regulatory regions
– protein–DNA interactions involve sequence-specific recognition
– Other factors, such as chromatin structure, may be involved
¤ Sequence-specific DNA-binding proteins from
eukaryotes generally
– recognize degenerate motifs of 5–10 base pairs
– Consequently, potential recognition sequences for transcription
factors occur frequently throughout the genome
¤ Genome-wide surveys of in vivo DNA binding proteins
– provide a platform to determine which of these potential sites are
actually occupied in vivo
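As a rough illustration of why short degenerate motifs occur so frequently, the sketch below estimates the expected number of chance matches in a random sequence. It assumes equal base frequencies and independent positions; the genome size and the motif degeneracy in the example are illustrative values, not figures from the lecture.

```python
def expected_matches(genome_length, allowed_bases_per_position, both_strands=True):
    """Expected number of chance matches of a motif in a random genome.

    allowed_bases_per_position: one entry per motif position giving how many
    of the four bases are accepted there, e.g. [1, 1, 2, 2, 1, 1].
    Assumes equal base frequencies (0.25 each) and independent positions.
    """
    p = 1.0
    for n_allowed in allowed_bases_per_position:
        p *= n_allowed / 4.0
    positions = genome_length - len(allowed_bases_per_position) + 1
    strands = 2 if both_strands else 1
    return positions * p * strands


# Illustrative values: a 12-Mb genome (roughly yeast-sized) and a 6-bp motif
strict = expected_matches(12_000_000, [1] * 6)
degenerate = expected_matches(12_000_000, [1, 1, 2, 2, 1, 1])
print(round(strict), round(degenerate))
```

Under these assumptions even a fully specified 6-bp motif is expected thousands of times by chance, and each two-fold degenerate position doubles that number.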
Genome-wide Analysis of Regulatory Sequences
¤ Methods combine
– Large-scale analysis of in vivo
protein–DNA crosslinking
– microarray technology
¤ ChIP-on-chip
– Chromatin ImmunoPrecipitation on DNA chips
Reprinted from: Biggin M., Nature Genet. 28, 303 (2001)
Genome-Wide Location and Function of DNA Binding
Proteins
Ren et al., Science, 290, 2306 (2000)
¤ Paper presents
– proof of principle for microarray-based approaches to determine
the genome-wide location of DNA-bound proteins
• Study of the binding sites of two well-known gene-specific
transcription activators in yeast: Gal4 and Ste12
– Combines data from
• in vivo DNA binding analysis with
• expression analysis
• to identify genes whose expression is directly controlled by these
transcription factors
Chromatin Immunoprecipitation (ChIP) Procedure
– Cells are fixed with formaldehyde, harvested, and sonicated
– DNA fragments cross-linked to a protein of interest are enriched by
immunoprecipitation with a specific antibody
– Immuno-precipitated DNA is amplified and labeled with the fluorescent
dye Cy5
– Control DNA not enriched by immunoprecipitation is amplified and
labeled with the different fluorophore Cy3
– DNAs are mixed and hybridized to a microarray of intergenic sequences
– The relative binding of the protein of interest to each sequence is
calculated from the IP-enriched/unenriched ratio of fluorescence from
3 experiments
Reprinted from: Ren et al., Science, 290, 2306 (2000)
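The ratio calculation in the last step can be sketched as follows. This is a minimal illustration, not the error model used in the paper: it takes the median log2 ratio across experiments as a simple swapped-in summary, and the spot names, intensities and pseudocount are hypothetical.

```python
import math
from statistics import median


def log2_ratios(cy5_ip, cy3_control, pseudocount=1.0):
    """Per-spot log2(IP-enriched / unenriched) for one hybridization.

    The pseudocount (an assumption) avoids division by zero on empty spots.
    """
    return {
        spot: math.log2((cy5_ip[spot] + pseudocount) / (cy3_control[spot] + pseudocount))
        for spot in cy5_ip
    }


def combine_experiments(experiments):
    """Median log2 ratio per spot across replicate experiments."""
    spots = experiments[0].keys()
    return {spot: median(exp[spot] for exp in experiments) for spot in spots}


# Hypothetical intensities for two intergenic spots in three experiments
ip = [
    {"iGAL1": 7999, "iACT1": 999},
    {"iGAL1": 3999, "iACT1": 1099},
    {"iGAL1": 15999, "iACT1": 899},
]
ctrl = [{"iGAL1": 999, "iACT1": 999}] * 3
combined = combine_experiments([log2_ratios(i, c) for i, c in zip(ip, ctrl)])
```

A spot with a combined log2 ratio well above zero is a candidate binding site; a spot near zero shows no enrichment.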
Modified Chromatin Immunoprecipitation (ChIP) Procedure
Close-up of a scanned image of a micro-array containing 6361 intergenic
region DNA fragments of the yeast genome
ChIP-enriched
DNA fragment
Reprinted from: Ren et al., Science, 290, 2306 (2000)
Proof of concept: Gal4 transcription factor
¤ Identification of sites bound by the transcriptional
activator Gal4 in the yeast genome and genes induced
by galactose
– Gal4 activates genes necessary for galactose metabolism
• The best characterized transcription factor in yeast
– 10 genes were bound by Gal4 and induced in galactose
• 7 genes in the Gal pathway, previously reported to be
regulated by Gal4
• 3 novel genes: MTH1, PCL10, and FUR4
Reprinted from: Ren et al., Science, 290, 2306 (2000)
Genome-wide location of Gal4 protein
Genes whose promoter regions are bound by Gal4 and whose expression levels
were induced at least twofold by galactose
Reprinted from: Ren et al., Science, 290, 2306 (2000)
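Combining location and expression data amounts to intersecting two gene lists. The sketch below assumes per-gene binding P values and fold-induction values are already available; the twofold cutoff comes from the slide, while the P-value cutoff, the placeholder gene names YFG1/YFG2 and all numbers are assumptions for illustration.

```python
def direct_targets(binding_pvalue, fold_induction, p_cutoff=0.001, fold_cutoff=2.0):
    """Genes whose promoter is bound AND whose expression is induced.

    binding_pvalue: gene -> P value from location analysis (assumed input)
    fold_induction: gene -> expression ratio, galactose vs. reference
    """
    return sorted(
        gene
        for gene, p in binding_pvalue.items()
        if p <= p_cutoff and fold_induction.get(gene, 0.0) >= fold_cutoff
    )


# Hypothetical values
binding = {"GAL1": 0.0001, "GAL10": 0.0002, "YFG1": 0.0005, "YFG2": 0.2}
induction = {"GAL1": 50.0, "GAL10": 40.0, "YFG1": 1.1, "YFG2": 3.0}
targets = direct_targets(binding, induction)
```

Genes that are bound but not induced, or induced but not bound, drop out; the latter are candidates for indirect regulation.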
Role of Gal4 in Galactose-dependent Cellular Regulation
The identification of MTH1, PCL10, and FUR4 as Gal4-regulated genes explains
how regulation of several different metabolic pathways can be coordinated
Figure labels:
Fur4: increases intracellular pools of uracil
Mth1: reduces levels of glucose transporter
Pcl10
Reprinted from: Ren et al., Science, 290, 2306 (2000)
Conclusions
¤ The genes whose expression is controlled directly by
transcriptional activators in vivo
– Are identified by a combination of genome-wide location and
expression analysis
¤ Genome-wide location analysis provides information
– On the binding sites at which proteins reside in the genome
under in vivo conditions
Genomic Binding Sites of the Yeast Cell-cycle
Transcription Factors SBF and MBF
Iyer et al., Nature 409: 533 (2001)
¤ Paper presents
– The use of ChIP and DNA microarrays to define the genomic binding sites
of the SBF and MBF transcription factors in vivo
– The SBF and MBF transcription factors are active in the initiation of the
cell division cycle (G1/S) in yeast
• A few target genes of SBF and MBF are known but the precise roles of
these two transcription factors are unknown
• The two transcription factors are heterodimers containing the same
Swi6 subunit and a DNA binding subunit
– MBF is a heterodimer of Mbp1 and Swi6
– SBF is a heterodimer of Swi4 and Swi6
Genomic targets of SBF and MBF
Reprinted from: Iyer et al., Nature 409: 533 (2001)
In Vivo Targets of SBF and MBF
¤ The ChIP experiments identified
– 163 possible targets of SBF
– 87 possible targets of MBF
– 43 possible targets of both factors
¤ Support for the possible in vivo targets
– Most of the genes downstream of the putative binding sites peak
in G1/S
– Target genes are highly enriched for functions related to DNA
replication, budding and the cell cycle
– In vivo binding sites are highly enriched for sequences matching
the defined consensus binding sites
Reprinted from: Iyer et al., Nature 409: 533 (2001)
Transcriptome data for synchronized cell cultures
Expression Profiles of SBF and MBF Targets
Reprinted from: Iyer et al., Nature 409: 533 (2001)
Expression Profiles of SBF and MBF Targets
¤ Why are two different transcription factors used to
mediate identical transcriptional programmes during
the cell-division cycle in yeast?
– A possible answer is suggested by differences in the functions
of the genes that they regulate
• Many of the targets of SBF have roles in cell-wall biogenesis and
budding
• 25% of the MBF target genes have known roles in DNA replication,
recombination and repair
– The results support a model in which
• SBF is the principal controller of membrane and cell-wall formation
• MBF primarily controls DNA replication
¤ The need for DNA replication and membrane / cell-wall
biogenesis may be different in the mitotic and
meiotic cell cycle
Reprinted from: Iyer et al., Nature 409: 533 (2001)
A high-resolution map of active promoters in the
human genome
Kim et al., Nature 436: 876-880 (2005)
¤ Paper presents
– a genome-wide map of active promoters in human fibroblast cells
• determined by experimentally locating the sites of RNA polymerase
II preinitiation complex (PIC) binding
• map defines 10,567 active promoters corresponding to
– 6,763 known genes
– >1,196 un-annotated transcriptional units
– Global view of functional relationships in human cells between
• transcriptional machinery
• chromatin structure
• gene expression
Identification of active promoters in the human genome
¤ Microarrays cover
– All non-repeat DNA at 100 bp
resolution
¤ Pol II preinitiation complex
(PIC)
– RNA polymerase II
– transcription factor IID
– general transcription factors
¤ ChIP of PIC-bound DNA
– monoclonal antibody against
TAF1 subunit of the complex
(TBP associated factor 1 )
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
Results from TFIID ChIP-on-chip analysis
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
Characterization of active promoters
¤ Matched the 12,150 TFIID-binding sites to
– the 5' end of known transcripts in transcript databases
– 87% of the PIC-binding sites were within 2.5 kb of annotated 5' ends of
known messenger RNAs
¤ 8,960 promoters were mapped
– within annotated boundaries of 6,763 known genes in the EnsEMBL genes
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
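Matching binding sites to annotated 5' ends can be sketched as a nearest-neighbour lookup. The example below assumes all coordinates lie on a single chromosome and that the list of 5' ends is non-empty; the 2.5-kb window comes from the slide, while the positions are made up.

```python
from bisect import bisect_left


def nearest_tss_distance(site, sorted_tss):
    """Distance from a binding site to the closest annotated 5' end."""
    i = bisect_left(sorted_tss, site)
    # Only the neighbours just left and right of the insertion point can be closest
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(site - t) for t in candidates)


def fraction_near_tss(sites, tss_positions, window=2500):
    """Fraction of binding sites within `window` bp of any annotated 5' end."""
    sorted_tss = sorted(tss_positions)
    near = sum(1 for s in sites if nearest_tss_distance(s, sorted_tss) <= window)
    return near / len(sites)


# Hypothetical coordinates on one chromosome
sites = [1000, 10000, 50000]
tss = [1200, 12000, 80000]
fraction = fraction_near_tss(sites, tss)
```

Sites that fall outside the window are the candidates for un-annotated transcriptional units.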
The chromatin-modification features of the
active promoters
¤ Validation of active
promoters
– ChIP-on-chip using an anti-RNAP antibody
– ChIP-on-chip analysis using
• anti-acetylated histone H3
(AcH3) antibodies
• anti-dimethylated lysine 4 on
histone H3 (MeH3K4)
antibodies
• known epigenetic markers of
active genes
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
TFIID, RNAP, AcH3 and MeH3K4 profiles on
the promoter of RPS24 gene
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
Additional findings
¤ Promoters of non-coding transcripts
– Are very similar to promoters of protein coding genes
¤ Promoters of novel genes
– Estimate 13% of human genes remain to be annotated in the genome
¤ Clustering of active promoters
– co-regulated genes tend to be organized into coordinately regulated
domains
¤ Genes using multiple promoters
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
Multiple promoters in human genes
¤ WEE1 gene locus
– Two different transcripts with alternative 5' ends
• Encoding different proteins
– Two different TFIID-binding sites: two promoters
– Differential transcription during the cell cycle
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
The transcriptome of a cell line
¤ Functional relationship between transcription
machinery and gene expression
– correlated genome-wide expression profiles with PIC promoter
occupancy
¤ Four general classes of promoters
I. Actively transcribed genes
II. Weakly expressed genes
III. Weakly PIC bound genes
IV. Inactive genes
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
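The four classes can be read as the four combinations of two yes/no observations. The sketch below is one interpretation of the class names, assuming class II means PIC binding without a clearly detected transcript and class III means a detected transcript with weak or undetected PIC binding; the slide itself does not spell out the thresholds.

```python
def promoter_class(pic_bound, transcript_detected):
    """Assign a promoter to class I-IV from two boolean observations.

    The mapping of classes II and III is an interpretation of the class
    names, not a definition taken from the lecture.
    """
    if pic_bound and transcript_detected:
        return "I"    # actively transcribed
    if pic_bound:
        return "II"   # PIC present, transcript weak or undetected
    if transcript_detected:
        return "III"  # transcript present, PIC weak or undetected
    return "IV"       # inactive


classes = [promoter_class(p, t) for p in (True, False) for t in (True, False)]
```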
Genome-Wide Distribution of ORC and MCM Proteins in
yeast: High-Resolution Mapping of Replication Origins
Wyrick et al., Science, 294, 2357 (2001)
¤ Paper presents
– Genome-wide location analysis to map the DNA replication origins
in the 16 yeast chromosomes by determining the binding sites of
prereplicative complex proteins
Chromosome Replication In Eukaryotic Cells
¤ Chromosome replication
– initiates from origins of replication distributed along
chromosomes
– Origins of replication comprise autonomously replicating
sequences (ARS)
• ARS contain an 11-bp ARS consensus sequence (ACS)
– Essential for replication initiation
– Recognized by the Origin Recognition Complex (ORC)
• The majority of sequence matches to the ACS in the genome do not
have ARS activity
¤ Prereplicative complexes at replication origins comprise
– Origin Recognition Complex (ORC) proteins
– Minichromosome Maintenance (MCM) proteins
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
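Scanning a sequence for ACS matches shows why sequence alone is a poor predictor of origins. The sketch below assumes the commonly cited 11-bp consensus WTTTAYRTTTW (W = A/T, Y = T/C, R = A/G), which the slide does not spell out, and reports exact matches on both strands; the test sequences are made up.

```python
import re

# Commonly cited ACS consensus WTTTAYRTTTW, written as a regular expression.
# The lookahead lets overlapping matches be reported.
ACS = re.compile(r"(?=([AT]TTTA[TC][AG]TTT[AT]))")
ACS_LENGTH = 11
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def acs_matches(seq):
    """Return sorted (start, strand) pairs; start is 0-based on the forward strand."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in ACS.finditer(seq)]
    rc = reverse_complement(seq)
    # Convert reverse-strand match positions back to forward coordinates
    hits += [(len(seq) - m.start() - ACS_LENGTH, "-") for m in ACS.finditer(rc)]
    return sorted(hits)


forward_hits = acs_matches("GGATTTATGTTTAGG")
reverse_hits = acs_matches("CCTAAACATAAATCC")
```

Because most such matches lack ARS activity, a scan like this only nominates candidates; the ORC and MCM binding data are what identify functional origins.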
Prereplicative Complexes At Origins Of
Replication
Reprinted from: Stillman, Science, 294, 2301(2001)
ORC- and MCM-binding sites compared with known ARSs
¤ High degree of correlation
between MCM and ORC
binding sites and known ARSs
– Correct identification of 88% of
known ARSs
¤ The method can accurately
identify the position of ARSs to
a resolution of 1 kb or less
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
Genome-wide Location Of Potential Replication Origins
Identification of 429 potential origins on the entire genome
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
Conclusions
¤ The ChIP-based method identified the majority of
origins found in the analysis of genome-wide
replication timing in yeast
– and provides direct, high-resolution mapping of potential origins
¤ Similar approaches identified origins in other
organisms
– For example: Coordination of replication and transcription along
a Drosophila chromosome
• MacAlpine et al., Genes & Dev. 18: 3094-3105 (2004)
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
Global analysis of protein localization in budding
yeast
Huh et al., Nature 425, 686-691 (2003)
¤ Paper presents
– An approach to define the organization of proteins in the
context of cellular compartments involving
– the construction and analysis of a collection of yeast strains
expressing full-length, chromosomally tagged green fluorescent
protein fusion proteins
Experimental Strategy
¤ Systematic tagging of yeast ORFs with green
fluorescent protein (GFP)
– GFP is fused to the carboxy terminus of each ORF
– Full length fusion proteins are expressed from their native
promoters and chromosomal location
¤ The collection of yeast strains expressing GFP fusions
was analyzed by
– fluorescence microscopy to determine the primary subcellular
localization of the fusion proteins
• Defines 12 categories
– co-localization with red fluorescent protein (RFP) markers to
refine the subcellular localization
• Defines 11 additional categories
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Construction of GFP fusion proteins
¤ For each ORF a pair of PCR primers was designed
– Homologous to the chromosomal insertion site
– Matching a GFP – selectable marker construct
¤ Yeast was transformed with the PCR products to generate
– Strains expressing chromosomally tagged ORFs
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Representative GFP Images
Nucleus
Nuclear periphery
Bud neck
mitochondrion
ER
Lipid particle
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
GFP and RFP Co-localization Images
Nucleolar marker
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Global results
22 categories
¤ Constructed ~6,000 ORF-GFP
fusions
– 4,156 had localizable GFP signals
(~75% of the yeast proteome)
– Good concordance with data from
earlier studies
• GFP does not affect the location
• Localized 70% of the new proteins
– Major compartments: cytoplasm
(30%) and the nucleus (25%)
– 20 other compartments: 44% of the
proteins
¤ Most of the proteins can be located in
discrete cellular compartments
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
The proteome of the nucleolus
¤ Detected 164 proteins
in the nucleolus
– Plus 45 identified in other
studies
¤ Data are consistent with
MS analysis of human
Nucleolar proteins
– Allows identification of
yeast-human orthologs
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Transcriptional co-regulation and subcellular
localization are correlated
subcellular localization
33 transcription modules
Co-regulated genes
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Conclusion
¤ The high-resolution, high-coverage localization data
set
– represents 75% of the yeast proteome
• classified into 22 distinct subcellular localization categories,
¤ Analysis of these proteins
– in the context of transcriptional, genetic, and protein–protein
interaction data
• provides a comprehensive view of interactions within and between
organelles in eukaryotic cells.
• helps reveal the logic of transcriptional co-regulation
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Genome Biology and
Biotechnology
10. The proteome
International course 2005
Summary
¤ Protein interactome
– Yeast two-hybrid protein interaction mapping
¤ Proteome
– Isolation of protein complexes
¤ Multilevel functional genomics
– Combination of
• phenome analysis
• protein interaction mapping
Basic Concept of the Yeast Two-hybrid System
¤ Eukaryotic transcription factors
– activate RNA polymerase II at promoters by binding to upstream
activating DNA sequences (UAS)
¤ Basic structure of eukaryotic transcription factors
– The DNA binding and the activating functions are located in
physically separable domains
• The DNA-binding domain (DB)
• The activation domain (AD)
– The connection between DB and AD is structurally flexible
¤ Protein-protein interactions can reconstitute a
functional transcription factor
– by bringing the DB domain and the AD domain into close physical
proximity
Reprinted from: Vidal M. and Legrain P., Nucleic Acids Res. 27: 919 (1999)
Yeast two-hybrid system
¤ ‘Architectural blueprint’ for a functional transcription
factor
– DB-X/AD-Y, where X and Y could be essentially any proteins
from any organism
Figure labels:
Bait: DB-X (X fused to the Gal4 DNA binding domain)
Prey: AD-Y (Y fused to the Gal4 transcription-activation domain)
UAS (Upstream Activating Sequence) upstream of a selectable marker gene
Yeast two-hybrid system
¤ The yeast two-hybrid system allows
– Genetic selection of genes encoding potential interacting
proteins without the need for protein purification
• The system is used to isolate genes encoding proteins that potentially
interact with DB-X (referred to as the ‘bait’) from complex AD-Y
libraries (referred to as the ‘prey’)
– Limitations of the system include
• False positives: clones with no biological relevance
• False negatives: failure to identify known interactions
– Stringent criteria must be used to evaluate both the specificity
and the sensitivity of the assay
Reprinted from: Vidal M. and Legrain P., Nucleic Acids Res. 27: 919 (1999)
Protein Interaction Mapping in C. elegans Using Proteins
Involved in Vulval Development
Walhout et al, Science 287: 116 (2000)
¤ Landmark paper presents
– First demonstration of large-scale two-hybrid analysis for
protein interaction mapping in C. elegans
• starting with 27 proteins involved in vulval development in C. elegans
Experimental Approach
¤ Start from known genes in vulval development
– Used Recombinational cloning to introduce ORFs of 29 known
genes involved in vulval development into two-hybrid vectors
¤ Matrix two-hybrid experiment with 29 ORFs
– Each DB-vORF/AD-vORF pairwise combination was
• tested for protein-protein interactions by scoring two-hybrid
phenotypes
¤ Exhaustive two-hybrid screen
– using 27 vORF-DB fusion proteins as baits to select interactors
from a AD-Y cDNA library
• sequenced the selected clones: interaction sequence tag (IST)
Reprinted from: Walhout et al, Science 287: 116 (2000)
Construction of DB and AD Fusions by Recombinational
Cloning
Figure labels:
DNA binding domain: DB-ORF fusions
Activation domain: AD-ORF fusions
Phage lambda excision: Integrase, IHF & Excisionase
Reprinted from: Walhout et al, Science 287: 116 (2000)
Matrix of Two-hybrid Interactions Between the vORFs
Reprinted from: Walhout et al, Science 287: 116 (2000)
Interaction Sequence Tag (IST) screening
Reprinted from: Walhout et al, Science 287: 116 (2000)
Results
¤ Matrix two-hybrid experiment with 29 ORFs
– ~ 50% (6 of 11) of the interactions reported were detected
• Two novel potential interactions were identified
– Typically the yeast two-hybrid system will detect ~50% of the
naturally occurring interactions
¤ Two-hybrid screen
– Identified 992 AD-Y encoding sequences
– ISTs corresponded to a total 124 different interacting proteins
• 15 previously known
– Provides a functional annotation for 109 predicted genes
Reprinted from: Walhout et al, Science 287: 116 (2000)
Validation of Potential Interactions
¤ Conservation of interactions in other organisms
– If X' and Y' are orthologs of X and Y, respectively, and both
X/Y and X'/Y' interact
• the conserved X/Y interactions are referred to as "interologs"
Reprinted from: Walhout et al, Science 287: 116 (2000)
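The interolog idea can be sketched as a mapping of known interactions through an ortholog table. The protein names and the one-to-one ortholog table below are hypothetical; predicted interologs still need experimental confirmation.

```python
def predict_interologs(interactions, orthologs):
    """Map interacting pairs (X, Y) onto ortholog pairs (X', Y').

    interactions: iterable of (X, Y) pairs known to interact in species A
    orthologs: dict mapping species-A proteins to species-B orthologs
               (assumed one-to-one for simplicity)
    Returns a set of unordered pairs predicted to interact in species B.
    """
    predicted = set()
    for x, y in interactions:
        if x in orthologs and y in orthologs:
            predicted.add(frozenset((orthologs[x], orthologs[y])))
    return predicted


# Hypothetical names
known = [("wormA", "wormB"), ("wormB", "wormC")]
ortholog_table = {"wormA": "yeastA", "wormB": "yeastB"}
interologs = predict_interologs(known, ortholog_table)
```

Only the first pair is carried over, because wormC has no ortholog in the table.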
Validation of Potential Interactions
¤ Systematic clustering analysis
– closed loop connections between vORF-encoded proteins
• X interacts with Y, Y interacts with Z, Z interacts with W, and so on
(X/Y/Z/W/...)
Mutations with
Similar phenotypes
Reprinted from: Walhout et al, Science 287: 116 (2000)
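Closed loops can be found by treating the interactions as an undirected graph. The sketch below only looks for the shortest loops, triangles of three mutually interacting proteins, as a simplification of the longer X/Y/Z/W loops mentioned above; the protein names are placeholders.

```python
from itertools import combinations


def closed_triangles(pairs):
    """Return sorted triples of proteins that all interact with each other."""
    neighbours = {}
    for a, b in pairs:
        if a == b:
            continue  # ignore self-interactions
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    triangles = set()
    for node, nbrs in neighbours.items():
        # Two neighbours of the same node that also interact close a loop
        for u, v in combinations(sorted(nbrs), 2):
            if v in neighbours[u]:
                triangles.add(tuple(sorted((node, u, v))))
    return sorted(triangles)


loops = closed_triangles([("X", "Y"), ("Y", "Z"), ("Z", "X"), ("Z", "W")])
```

Interactions that take part in such loops support each other, which is why clustering raises confidence in individual two-hybrid hits.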
Conclusions
¤ Demonstrated the feasibility of generating genome-wide
protein interaction maps
– Two-hybrid screens are
• Simple
• sensitive
• amenable to high-throughput
– Feasible using the C. elegans ORFeome
¤ Y2H detects approximately 50% of the interactions
– provides a useful coverage of biologically important interactions
Reprinted from: Walhout et al, Science 287: 116 (2000)
A Comprehensive Analysis of Protein–protein
Interactions in Saccharomyces Cerevisiae
Uetz et al., Nature 403: 623 (2000)
¤ Landmark paper presents
– The first large-scale, high-throughput mapping of protein-protein
interactions between ORFs predicted in S. cerevisiae using
– Two complementary yeast two-hybrid screening strategies
• Two-hybrid array of 6,000 hybrid proteins
• High-throughput library screen
The two-hybrid array screening
¤ Two-hybrid array of 6,000 hybrid proteins comprises
– Haploid yeast colonies derived from ~6,000 yeast ORFs fused to
the Gal4 activation domain (AD)
– The two-hybrid array was contained on 16 plates of 384 colonies each
¤ Matrix screen for interactions
– 192 different Gal4 DB ORF hybrids were mated to the two-hybrid
array
– 192 two-hybrid array screens were performed in duplicate
• Each yielded 1–30 positives
• But only ~ 20% were reproduced in the duplicate screen
¤ Putative interacting partners identified
– 87/192 DB hybrids yielded putative protein–protein interactions
– Identified 281 interacting protein pairs
Reprinted from: Uetz et al., Nature 403: 623 (2000)
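Keeping only positives seen in both duplicate screens is a set intersection. The sketch below uses hypothetical prey names, and it measures reproducibility as the shared fraction of all positives, which is one possible definition and not necessarily the one behind the ~20% figure.

```python
def reproducible_positives(screen_a, screen_b):
    """Positives found in both duplicate screens, plus the shared fraction."""
    a, b = set(screen_a), set(screen_b)
    both = a & b
    union = a | b
    fraction = len(both) / len(union) if union else 0.0
    return both, fraction


# Hypothetical prey ORFs scored positive for one DB-ORF bait
first = ["preyA", "preyB", "preyC"]
second = ["preyB", "preyD", "preyE"]
confirmed, fraction = reproducible_positives(first, second)
```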
The two-hybrid array screening
Positive control: 6,000 haploid yeast
Gal4 activation domain - ORF fusions
Two-hybrid positives from a mating with
a Gal4 DNA-binding domain - ORF fusion
16 microassay plates
Reprinted from: Uetz et al., Nature 403: 623 (2000)
High-Throughput Library Screen
¤ Used a library made by pooling ORF-AD fusions
– Each ORF was fused separately to a Gal4 activation domain
– ORF-AD fusions were pooled to form an activation-domain library
• Advantage over traditional cDNA libraries is the uniform
presentation of each ORF
¤ Protein interactions were screened by
– mating the 6,000 DNA-binding domain hybrids in duplicate to the
activation domain library
– 817 yeast ORFs (15%) yielded protein–protein interactions
– Identified 692 interacting protein pairs
• 68% of the interactions were identified multiple times
Reprinted from: Uetz et al., Nature 403: 623 (2000)
Results of the Systematic Two-Hybrid Screens
¤ The matrix array screens
– gave more interactors
• 45% of the 192 proteins in the array screens yielded interactions
– are much more labour- and material-intensive
• limits the number of screens that can be performed
• Full matrix would require testing 6,000 × 6,000 = 36,000,000
interactions!
¤ The library screens gave
– fewer interactors
• 8% of the proteins tested in the library screens yielded interactions
– a much higher throughput
Reprinted from: Uetz et al., Nature 403: 623 (2000)
Analysis of the protein-protein interactions
¤ The analysis reveals
– Interactions that place unknown proteins into a biological
context
– Novel interactions between proteins involved in the same
biological function
– Novel interactions that connect biological functions into larger
cellular processes
Interactions involving unknown proteins
Reprinted from: Uetz et al., Nature 403: 623 (2000)
Interactions Between Proteins in the RNA
Splicing Complex
Interactions are consistent with the crystallographic data
Reprinted from: Uetz et al., Nature 403: 623 (2000)
Interaction Connecting two different Complexes
spindle checkpoint complex
microtubule checkpoint complex
Reprinted from: Uetz et al., Nature 403: 623 (2000)
Analysis of Interologs
Yeast
Human
Reprinted from: Uetz et al., Nature 403: 623 (2000)
Conclusions
¤ The two-hybrid array approach is feasible
– for systematic genome-wide analysis of protein interactions
¤ The large scale mapping of protein-protein interactions
reveals
– many new interactions between proteins
– that protein interactions should be viewed as potential
interactions that must be confirmed independently
– This conclusion is supported by the fact that the results of
different screens only partially overlap
Reprinted from: Uetz et al., Nature 403: 623 (2000)
A Map of the Interactome Network of the
Metazoan C. elegans
Li et al., Science, 303, 540-543 (2004)
¤ Paper presents
– Large-scale mapping of protein-protein interactions in C. elegans
using yeast two-hybrid screens with a subset of metazoan-specific proteins
• identified > 4000 interactions
– Together with already described Y2H interactions and
interologs predicted in silico,
• the current version of the Worm Interactome map contains 5500
interactions
Worm Interactome map
Phylogenetic classes
Eukaryotic
Multi cellular
Worm
Reprinted from: Li et al., Science, 303, 540-543 (2004)
A Protein Interaction Map of Drosophila
melanogaster
Giot et al., Science, 302, 1727-1736 (2003)
¤ Paper presents
– a two-hybrid–based protein-interaction map of the fly proteome
by screening 10,623 ORFs against cDNA libraries to produce
• a draft map of 7048 proteins and 20,405 interactions.
• Computational rating of interaction confidence produced
– a high confidence interaction network of 4679 proteins and
4780 interactions showing two levels of organization
• a short-range organization, presumably corresponding to
multiprotein complexes
• a more global organization, presumably corresponding to
intercomplex connections
The fly protein-interaction map: Protein family/human disease orthologs
Reprinted from: Giot et al., Science, 302, 1727-1736 (2003)
The fly protein-interaction map: Subcellular localization
Reprinted from: Giot et al., Science, 302, 1727-1736 (2003)
Towards a proteome-scale map of the human
protein–protein interaction network
Rual et al., Nature 437: 1173-1178 (2005)
¤ Paper presents
– First step towards a systematic and comprehensive analysis of
the human interactome using
• stringent, high-throughput yeast two-hybrid system to test
pairwise interactions among the products of 8,100 currently
available Gateway-cloned open reading frames
High-throughput yeast two-hybrid pipeline
¤ Stringent test
– Second test using
GAL1::HIS3 and
GAL1::lacZ
– Reduces the number of
false positives
¤ Detected 2,800
interactions
Reprinted from: Rual et al., Nature 437: 1173-1178 (2005)
Overlap of CCSB-HI1 with literature data
¤ Compared the overlap between
– Observed interactions
– Interactions reported in the
literature
¤ Conclude that the CCSB-HI1
data set contains 1% of the
human interactome
– Human interactome is estimated at
200,000 to 300,000 interactions.
Reprinted from: Rual et al., Nature 437: 1173-1178 (2005)
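The ~1% figure follows directly from the numbers on these slides. The short calculation below uses the 2,800 detected interactions and the estimated interactome size of 200,000 to 300,000; the function name is just for illustration.

```python
def coverage_range(observed, low_estimate, high_estimate):
    """Return (min, max) fraction of the interactome covered by `observed`."""
    return observed / high_estimate, observed / low_estimate


low, high = coverage_range(2_800, 200_000, 300_000)
print(f"{low:.1%} to {high:.1%}")
```

The result is a coverage of roughly 0.9% to 1.4%, consistent with the 1% stated above.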
Interaction network of disease-associated
CCSB-HI1 proteins
¤ The human interactome will
further
– the understanding of human
health and disease
¤ Illustrated by
– The network of disease-associated proteins (green
nodes)
• EWS protein
Reprinted from: Rual et al., Nature 437: 1173-1178 (2005)
Proteome Analysis
¤ Large scale and comprehensive analysis of the
proteome has so far not been feasible
– Lack of suitable and sensitive protein fractionation methods
• 2-D gels are limited to a few thousand proteins, and only the most abundant
– Protein characterization is slow and laborious
• Despite enormous improvements in mass spectrometry, the
characterization of individual proteins remains the bottleneck
– Level of proteome characterization to date is in the order of a
few thousand proteins at best
• Represents 5% to 25% of the proteome
¤ Tandem affinity purification (TAP) technology
constitutes an important breakthrough
– Fast and reliable method of protein purification
A generic protein purification method for protein
complex characterization
Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)
¤ Paper presents
– a generic procedure to purify protein complexes under native
conditions using
• tandem affinity purification (TAP) tag procedure
– Using a combination of high-affinity tags for purification
Tag-based Characterization of protein complexes
Reprinted from: Kumar A. and Snyder M., Nature 415, 123 (2002)
High-affinity Tags
¤ High-affinity protein tags
– Must allow efficient recovery of proteins present at low
concentrations
• ProtA tag: two IgG-binding units of protein A of S. aureus
– released from matrix-bound IgG under denaturing conditions
• CBP tag: calmodulin-binding peptide
– released from the affinity column under mild conditions
¤ Tandem affinity purification (TAP) tag
– A fusion cassette encoding both the ProtA tag and the CBP tag
• Separated by a specific TEV protease recognition sequence which
allows proteolytic release of the bound material under native
conditions
Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)
Tandem affinity purification (TAP) tag
CBP
ProtA
Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)
The TAP Purification Procedure
ProtA affinity purification step
TEV protease cleavage step
CBP affinity purification step
Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)
Advantage of the Two-step Procedure
¤ Purification of U1 snRNP
– Single-step affinity
purification yields a high level
of contaminating proteins
– Two-step affinity purification
yields highly specific
purification with very low
background
Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)
Functional organization of the yeast proteome by
systematic analysis of protein complexes
Gavin et al., Nature 415, 141 (2002)
¤ Landmark paper presents
– Large-scale application of the TAP technology for a systematic
analysis of multiprotein complexes from yeast
• Generated gene-specific TAP tag cassettes by PCR
• Inserted TAP cassettes by homologous recombination at the 3' end of
the genes to generate fusion proteins in their native location
• Purified protein assemblies from cellular lysates by TAP
• Separated purified assemblies by denaturing gel electrophoresis
• Digested individual bands with trypsin
• Analyzed peptides by MALDI–TOF MS to identify the proteins using
database search algorithms
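Database searching relies on predicting which peptides trypsin would release from each candidate protein. The sketch below performs a simple in-silico digest using the usual rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P); it ignores missed cleavages and peptide masses, and the test sequence is made up.

```python
def tryptic_peptides(protein, min_length=1):
    """Split a protein sequence at tryptic cleavage sites.

    Cleaves C-terminal to K or R, except when the next residue is P.
    Missed cleavages and modifications are not modelled.
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return [p for p in peptides if len(p) >= min_length]


peptides = tryptic_peptides("MKRPAAKGG")
```

A search algorithm then compares the masses of such predicted peptides with the masses measured for each gel band.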
The Gene Targeting Procedure
TAP tag gene-specific cassette
Reprinted from: Gavin et al., Nature 415, 141 (2002)
Large-scale Analysis of Protein Complexes
¤ Experimental outline
– Started with a selection of 1,739 genes
• 1,143 genes representing eukaryotic orthologues
• 596 genes nonorthologous set
– Generated 1,167 strains expressing tagged proteins to detectable
levels
– Analyzed 589 protein complexes
• Comprising 418 different orthologues
– Generated 20,946 samples for mass spectrometry
• Identified 16,830 proteins
– Characterized a total of 232 protein complexes
• Comprising 1,440 distinct proteins ~ 25% of the ORFs in the genome
Reprinted from: Gavin et al., Nature 415, 141 (2002)
Purification and Identification of TAP Complexes
Reprinted from: Gavin et al., Nature 415, 141 (2002)
Sensitivity and Specificity of the Approach
¤ Very efficient large-scale purification and
identification of protein complexes
– 78% of the 589 purified complexes have associated proteins
– The remaining 22% showing no interacting proteins
• May not form stable or soluble complexes
• The TAP tag may interfere with complex assembly or function
¤ Complexes are stable and show the same composition
when purified with different entry points
– Example: the polyadenylation machinery, responsible for
eukaryotic messenger RNA cleavage and polyadenylation
• Identified 12 of the 13 known components
• Identified 7 new components
Reprinted from: Gavin et al., Nature 415, 141 (2002)
The Polyadenylation Protein Complex
New components of the polyadenylation complex
Reprinted from: Gavin et al., Nature 415, 141 (2002)
Composition of the Polyadenylation Complex
protein tagged for affinity purification
Reprinted from: Gavin et al., Nature 415, 141 (2002)
Reliability of the TAP Method
¤ High sensitivity
– identify proteins present at 15 copies per cell
¤ High reproducibility
– 70% of the proteins are detected in independent purifications
¤ Low background
– The background comprises highly expressed proteins
• Identified 17 contaminant proteins (heat-shock and ribosomal
proteins)
¤ Limitations
– 18% of the tagged essential genes gave no viable strains
• The carboxy-terminal tagging can impair protein function
Reprinted from: Gavin et al., Nature 415, 141 (2002)
Organization of the purified assemblies into complexes
¤ 589 purified complexes characterized
– 245 complexes corresponded to 98 known multiprotein complexes in
yeast
– 242 complexes corresponded to 134 new complexes
¤ In total 232 annotated TAP complexes are identified
– 102 proteins showed no detectable association with other proteins
Reprinted from: Gavin et al., Nature 415, 141 (2002)
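The counts on this slide can be cross-checked with simple arithmetic. The sketch below only re-adds the numbers quoted above; the variable names are illustrative, and it assumes that the 102 entries without detectable association count purifications rather than complexes.

```python
# Counts quoted on the slide (Gavin et al., 2002).
purified_assemblies = 589
matching_known, known_complexes = 245, 98
matching_new, new_complexes = 242, 134
without_partners = 102  # assumption: counts purifications, not complexes

# Every purification should fall into one of the three groups.
assemblies_accounted_for = matching_known + matching_new + without_partners

# Known plus new complexes should give the annotated total.
annotated_complexes = known_complexes + new_complexes

# Average number of purifications that map onto one annotated complex.
redundancy = (matching_known + matching_new) / annotated_complexes

print(assemblies_accounted_for, annotated_complexes, round(redundancy, 2))
```

Under that assumption the three groups add up to the 589 purifications, and the known and new complexes add up to the 232 annotated complexes.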
Number Of Proteins Per Complex
Average of 12 proteins per complex
Reprinted from: Gavin et al., Nature 415, 141 (2002)
Functional Classification Of The Complexes
wide functional distribution of complexes
Reprinted from: Gavin et al., Nature 415, 141 (2002)
Protein Complexes are Dynamic
¤ Complexes are not necessarily of invariable composition
– Using distinct tagged proteins as entry points to purify a complex
• Core components can be identified as invariably present
• Regulatory components may be present differentially
¤ Dynamic complexes: e.g. signaling complexes
– The interactions of a signalling enzyme may be sufficiently strong
to allow the detection of distinct cellular complexes
• They may be diagnostic for the role of these enzymes in different
cellular activities
Reprinted from: Gavin et al., Nature 415, 141 (2002)
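The idea of separating core from differentially present components can be sketched with set operations. This is a minimal illustration, not the procedure used in the paper: the entry points and protein names are hypothetical, and "core" is simply taken to mean "recovered in every purification".

```python
def split_core_and_variable(purifications):
    """Split proteins into those found in every purification and the rest.

    purifications maps each tagged entry point to the set of proteins
    recovered with it (hypothetical input format).
    """
    protein_sets = [set(found) for found in purifications.values()]
    core = set.intersection(*protein_sets)
    variable = set.union(*protein_sets) - core
    return core, variable


# Hypothetical purifications of one complex from three entry points.
purifications = {
    "tagA": {"A", "B", "C", "R1"},
    "tagB": {"A", "B", "C"},
    "tagC": {"A", "B", "C", "R2"},
}

core, variable = split_core_and_variable(purifications)
print(sorted(core), sorted(variable))
```

In this toy input A, B and C behave as core components, while R1 and R2 appear only with some entry points, as a regulatory subunit might.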
Higher-order Organization of The Proteome Map
¤ Most complexes are linked together
– Complexes belonging to the same functional class often share
components
• mRNA metabolism, cell cycle, protein synthesis and turnover,
intermediate and energy metabolism
¤ Shared components linking complexes into a network
– The network connections reflect physical interaction of
complexes
• common architecture, localization or regulation
– Relationships between complexes suggest integration and
coordination of cellular functions
– The more connected a complex, the more central its position in
the network
Reprinted from: Gavin et al., Nature 415, 141 (2002)
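Linking complexes through shared components can be sketched as follows. The complex and protein names are hypothetical, and connecting two complexes whenever they share at least one protein is an assumed simplification of the approach described above.

```python
from itertools import combinations


def complex_network(complexes):
    """Link every pair of complexes that shares at least one protein."""
    links = {}
    for (name_a, prots_a), (name_b, prots_b) in combinations(sorted(complexes.items()), 2):
        shared = set(prots_a) & set(prots_b)
        if shared:
            links[(name_a, name_b)] = shared
    return links


def connectivity(links):
    """Count how many other complexes each complex is linked to."""
    degree = {}
    for name_a, name_b in links:
        degree[name_a] = degree.get(name_a, 0) + 1
        degree[name_b] = degree.get(name_b, 0) + 1
    return degree


# Hypothetical complexes and their components.
complexes = {
    "C1": {"P1", "P2", "P3"},
    "C2": {"P3", "P4"},
    "C3": {"P4", "P5"},
    "C4": {"P6"},
}

links = complex_network(complexes)
degree = connectivity(links)
print(links, degree)
```

Here C2 shares a protein with both C1 and C3, so it is the most connected complex, while C4 shares nothing and stays outside the network.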
The Yeast Protein Complex Network
membrane biogenesis and traffic
cell polarity and structure
protein synthesis and turnover
intermediate and energy metabolism
signalling
cell cycle
transcription, DNA maintenance, chromatin structure
RNA metabolism
protein and RNA transport
Reprinted from: Gavin et al., Nature 415, 141 (2002)
Protein Complexes Have a Similar Composition in Yeast
and Human
Reprinted from: Gavin et al., Nature 415, 141 (2002)
Conclusions
¤ The paper clearly demonstrates the merits of the TAP
technology for
– characterizing protein complexes from different compartments,
including low-abundance and large complexes
– TAP data and yeast two-hybrid assay data show only a very small
overlap
• The two methodologies address different aspects of protein
interaction and are complementary
¤ The TAP analysis provides an outline of the eukaryotic
proteome as a network of protein complexes
– The human–yeast orthologous proteome represents core functions
for the eukaryotic cell
• Orthologous proteins are often responsible for essential functions
Reprinted from: Gavin et al., Nature 415, 141 (2002)
Genome Biology and
Biotechnology
The next frontier: Systems biology
International course 2005
Genomics
Functional Genomics
Systems
Biology
From genes to networks
gene: Molecular Biology (60s to mid 80s)
pathway: Molecular Genetics (since mid 80s)
network: Systems Biology (since mid 90s)
The large-scale organisation of metabolic
networks
Jeong et al (2000) Nature 407: 651
¤ Study of the design principles underlying the
structure of biological systems
– Dissection of integrated “pathway-genome” databases providing
complex connectivity maps
Case study
¤ Analysis of core cellular metabolism as
– described in the ‘Intermediate metabolism and bioenergetics’
portions of the WIT database
¤ Prediction of metabolic pathways in an organism
– on the basis of its annotated genome (presence of a presumed
open reading frame for each enzyme that catalyses a given metabolic
reaction)
– in combination with firmly established data from the biochemical
literature
¤ 6 archaea, 32 bacteria and 5 eukaryotes
Reprinted from: Jeong et al (2000) Nature 407: 651
Graph theoretic representation
Nodes are substrates
Links are metabolic reactions (with EC enzyme numbers)
Reprinted from: Jeong et al (2000) Nature 407: 651
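The representation above can be sketched as a directed graph in which each reaction links its educts to its products. This is a simplified illustration: the reactions and EC numbers are invented placeholders rather than entries from the WIT database, and linking every educt to every product is an assumption.

```python
from collections import defaultdict

# Hypothetical toy reactions: (EC number, educts, products).
reactions = [
    ("1.1.1.1", ["A", "B"], ["C"]),
    ("2.2.2.2", ["C"], ["D", "E"]),
    ("3.3.3.3", ["E"], ["A"]),
]


def build_substrate_graph(reactions):
    """Return graph[educt][product] = set of EC numbers linking them."""
    graph = defaultdict(dict)
    for ec_number, educts, products in reactions:
        for educt in educts:
            for product in products:
                graph[educt].setdefault(product, set()).add(ec_number)
    return graph


def degrees(graph):
    """Count incoming and outgoing links for each substrate."""
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for educt, targets in graph.items():
        out_degree[educt] += len(targets)
        for product in targets:
            in_degree[product] += 1
    return dict(in_degree), dict(out_degree)


graph = build_substrate_graph(reactions)
in_degree, out_degree = degrees(graph)
print(in_degree, out_degree)
```

The incoming and outgoing link counts per substrate are the quantities whose distribution is examined on the following slides.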
Theoretical Network Architectures
The World Wide Web and social networks have a scale-free structure
Figure: probability that a node has k links, for a random (uniform) network and a scale-free (heterogeneous) network
Reprinted from: Jeong et al (2000) Nature 407: 651
Connectivity distribution
Metabolic networks are scale-free, as shown by the distribution of incoming and outgoing links for each substrate.
Figure panels: Archaeoglobus fulgidus, E. coli, C. elegans, all 43 organisms
This is a general rule applying to all organisms studied.
Reprinted from: Jeong et al (2000) Nature 407: 651
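A connectivity distribution P(k) and a rough power-law exponent can be computed from a list of node degrees. This is a minimal sketch on synthetic degrees, not the paper's data. Fitting a straight line to log P(k) against log k is a crude estimator chosen for brevity, and the function names are illustrative.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return P(k): the fraction of nodes that have exactly k links."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: count / total for k, count in sorted(counts.items())}


def loglog_slope(distribution):
    """Least-squares slope of log P(k) versus log k (needs >= 2 distinct k > 0)."""
    points = [(math.log(k), math.log(p)) for k, p in distribution.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Synthetic degrees constructed so that P(k) is proportional to k**-2.
synthetic_degrees = [1] * 400 + [2] * 100 + [4] * 25

distribution = degree_distribution(synthetic_degrees)
slope = loglog_slope(distribution)
print(distribution, slope)
```

For a scale-free network P(k) follows k to the power of minus gamma, so the slope of the log-log plot estimates minus gamma; a random network would instead show a distribution peaked around its mean degree.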
Network diameter
Definition: the shortest “pathway” averaged over all pairs of substrates
Unexpectedly, network diameter does not increase with complexity.
Therefore interconnectivity grows with the addition of substrates.
Figure panels: biochemical pathway length in E. coli; average path length for the 43 organisms (Archaea, Bacteria, Eukarya); incoming links; outgoing links
Reprinted from: Jeong et al (2000) Nature 407: 651
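The diameter as defined above can be computed with a breadth-first search from every node. This is a minimal sketch on a toy directed graph; averaging only over ordered pairs that are actually connected by a path is an assumption made here for simplicity.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest number of links from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def average_path_length(graph):
    """Mean shortest path over all ordered pairs connected by a path."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    total = 0
    pairs = 0
    for source in nodes:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")


# Toy linear pathway a -> b -> c.
toy_graph = {"a": ["b"], "b": ["c"]}
diameter = average_path_length(toy_graph)
print(diameter)
```

In the toy pathway the connected pairs are a to b, b to c and a to c, with lengths 1, 1 and 2.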
Hub properties
• A few hubs dominate the overall connectivity
• Sequential removal (“mutation”) of the most connected hubs
dramatically increases the network diameter until the network disintegrates
• The metabolic networks seem highly robust in computer simulations
(cf. lethal mutation rate observed in vivo)
Reprinted from: Jeong et al (2000) Nature 407: 651
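The contrast between removing a hub and removing a peripheral node can be sketched on a toy undirected network. Note the substitution: the slide tracks network diameter, whereas this sketch uses the size of the largest connected component as a simpler proxy for disintegration. The network and function names are hypothetical.

```python
def remove_nodes(graph, removed):
    """Return a copy of the undirected graph without the removed nodes."""
    removed = set(removed)
    return {
        node: {n for n in neighbours if n not in removed}
        for node, neighbours in graph.items()
        if node not in removed
    }


def top_hubs(graph, count):
    """Most connected nodes first; ties are broken by name."""
    return sorted(graph, key=lambda node: (-len(graph[node]), node))[:count]


def largest_component_size(graph):
    """Size of the largest connected component, found by depth-first search."""
    seen = set()
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


# Toy star network: one hub linked to four peripheral nodes.
star = {"hub": {"a", "b", "c", "d"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}

after_hub_removal = largest_component_size(remove_nodes(star, top_hubs(star, 1)))
after_leaf_removal = largest_component_size(remove_nodes(star, ["a"]))
print(after_hub_removal, after_leaf_removal)
```

Removing the hub leaves only isolated nodes, while removing a peripheral node leaves the rest of the network connected, which mirrors the combination of robustness and hub sensitivity described above.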
Conclusions
¤ The structure of biological networks is far from
random
– Their contemporary topology reflects a long evolutionary
process
– They show a robust response towards internal defects
¤ Contrary to other scale-free networks,
– metabolic ones do not grow in diameter with increasing
complexity
– which may represent an additional (necessary?) survival and
growth advantage
Reprinted from: Jeong et al (2000) Nature 407: 651
Extension of the concept
¤ Protein-protein interaction networks are also scale-free
– yeast Y2H data
¤ The probability for a gene to be essential
– increases with the connectedness of the encoded protein
– 93% of proteins have 5 links or less
• 21% of their genes are essential
– 0.7% have more than 15 links
• 62 % of their genes are essential
Jeong et al (2001) Nature 411: 41
Reprinted from: Jeong et al (2001) Nature 411: 41
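The relation between connectivity and essentiality amounts to computing the essential fraction within bins of link counts. The sketch below uses made-up proteins, so its fractions do not reproduce the percentages above. The record format and function name are assumptions.

```python
def essential_fraction(proteins, min_links, max_links=None):
    """Fraction of essential genes among proteins with min_links..max_links links."""
    selected = [
        p for p in proteins
        if p["links"] >= min_links and (max_links is None or p["links"] <= max_links)
    ]
    if not selected:
        return None
    return sum(1 for p in selected if p["essential"]) / len(selected)


# Hypothetical proteins: number of interaction partners and deletion phenotype.
proteins = [
    {"links": 1, "essential": False},
    {"links": 3, "essential": True},
    {"links": 5, "essential": False},
    {"links": 2, "essential": False},
    {"links": 20, "essential": True},
    {"links": 16, "essential": False},
]

low_connectivity = essential_fraction(proteins, 0, 5)    # 5 links or fewer
high_connectivity = essential_fraction(proteins, 16)     # more than 15 links
print(low_connectivity, high_connectivity)
```

Applied to real interaction and deletion-phenotype data, the same binning gives the comparison between weakly and highly connected proteins.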
A long way to go…
¤ List of biological components
– cells, genes, proteins, metabolites
¤ Description of local relationships
– expression cluster
– protein-protein interaction
– molecule trafficking
– cell-cell crosstalk
¤ Whole system architecture
¤ Dynamic regulatory mechanisms
¤ System behaviour prediction
¤ System manipulation, de novo design
need more data!
Thank you!