Genome Biology and Biotechnology 9. The localizome
Prof. M. Zabeau
Department of Plant Systems Biology
Flanders Interuniversity Institute for Biotechnology (VIB)
University of Gent
International course 2005

Summary
¤ DNA localizome or DNA interactome
  – Genome-wide mapping of DNA binding proteins
    • Transcription factor binding sites
    • Localization of replication origins
¤ Protein localizome
  – High throughput localization of proteins in cellular compartments

Functional Maps or “-omes”
[Figure: matrix of genes or proteins (1, 2, 3, 4, 5 ... n) against “Conditions”]
  ORFeome: Genes
  Phenome: Mutational phenotypes
  Transcriptome: Expression profiles
  DNA Interactome: Protein-DNA interactions
  Localizome: Cellular, tissue location
  Interactome: Protein interactions
  Proteome: Proteins
After: Vidal M., Cell, 104, 333 (2001)

Genome-wide Analysis of Regulatory Sequences
¤ Gene expression is regulated by transcription factors selectively binding to regulatory regions
  – Protein–DNA interactions involve sequence-specific recognition
  – Other factors, such as chromatin structure, may be involved
¤ Sequence-specific DNA-binding proteins from eukaryotes generally
  – recognize degenerate motifs of 5–10 base pairs
  – Consequently, potential recognition sequences for transcription factors occur frequently throughout the genome
¤ Genome-wide surveys of in vivo DNA binding proteins
  – provide a platform to answer these questions

Genome-wide Analysis of Regulatory Sequences
¤ Methods combine
  – Large-scale analysis of in vivo protein–DNA crosslinking
  – Microarray technology
¤ ChIP-on-chip
  – Chromatin ImmunoPrecipitation on DNA chips
Reprinted from: Biggin M., Nature Genet. 28, 303 (2001)

Genome-Wide Location and Function of DNA Binding Proteins
Ren et
al., Science, 290, 2306 (2000)
¤ Paper presents
  – proof of principle for microarray-based approaches to determine the genome-wide location of DNA-bound proteins
    • Study of the binding sites of two well-known gene-specific transcription activators in yeast: Gal4 and Ste12
  – Combines data from
    • in vivo DNA binding analysis with
    • expression analysis
    • to identify genes whose expression is directly controlled by these transcription factors

Chromatin Immunoprecipitation (ChIP) Procedure
  – Cells are fixed with formaldehyde, harvested, and sonicated
  – DNA fragments cross-linked to a protein of interest are enriched by immunoprecipitation with a specific antibody
  – Immunoprecipitated DNA is amplified and labeled with the fluorescent dye Cy5
  – Control DNA not enriched by immunoprecipitation is amplified and labeled with a different fluorophore, Cy3
  – The two DNAs are mixed and hybridized to a microarray of intergenic sequences
  – The relative binding of the protein of interest to each sequence is calculated from the IP-enriched/unenriched ratio of fluorescence from 3 experiments
Reprinted from: Ren et al., Science, 290, 2306 (2000)

Modified Chromatin Immunoprecipitation (ChIP) Procedure
Close-up of a scanned image of a microarray containing 6361 intergenic region DNA fragments of the yeast genome
[Figure label: ChIP-enriched DNA fragment]
Reprinted from: Ren et al., Science, 290, 2306 (2000)

Proof of concept: Gal4 transcription factor
¤ Identification of sites bound by the transcriptional activator Gal4 in the yeast genome and genes induced by galactose
  – Gal4 activates genes necessary for galactose metabolism
    • The best characterized transcription factor in yeast
  – 10 genes were bound by Gal4 and induced in galactose
    • 7 genes in the Gal pathway, previously reported to be regulated by Gal4
    • 3 novel genes: MTH1, PCL10, and FUR4
Reprinted from: Ren et
al., Science, 290, 2306 (2000)

Genome-wide location of Gal4 protein
Genes whose promoter regions are bound by Gal4 and whose expression levels were induced at least twofold by galactose
Reprinted from: Ren et al., Science, 290, 2306 (2000)

Role of Gal4 in Galactose-dependent Cellular Regulation
The identification of MTH1, PCL10, and FUR4 as Gal4-regulated genes explains how regulation of several different metabolic pathways can be coordinated
[Figure labels: Fur4 (increases intracellular pools of uracil); Pcl10; Mth1 (reduces levels of glucose transporter)]
Reprinted from: Ren et al., Science, 290, 2306 (2000)

Conclusions
¤ The genes whose expression is controlled directly by transcriptional activators in vivo
  – Are identified by a combination of genome-wide location and expression analysis
¤ Genome-wide location analysis provides information
  – On the binding sites at which proteins reside in the genome under in vivo conditions

Genomic Binding Sites of the Yeast Cell-cycle Transcription Factors SBF and MBF
Iyer et al., Nature 409: 533 (2001)
¤ Paper presents
  – The use of ChIP and DNA microarrays to define the genomic binding sites of the SBF and MBF transcription factors in vivo
  – The SBF and MBF transcription factors are active in the initiation of the cell division cycle (G1/S) in yeast
    • A few target genes of SBF and MBF are known but the precise roles of these two transcription factors are unknown
    • The two transcription factors are heterodimers containing the same Swi6 subunit and a DNA binding subunit
      – MBF is a heterodimer of Mbp1 and Swi6
      – SBF is a heterodimer of Swi4 and Swi6

Genomic targets of SBF and MBF
Reprinted from: Iyer et al., Nature 409: 533 (2001)

In Vivo Targets of SBF and MBF
¤ The ChIP experiments identified
  – 163 possible targets of SBF
  – 87 possible targets of MBF
  – 43 possible targets of both factors
¤ Support for the possible in vivo targets
  – Most of the genes downstream of the putative binding sites peak in G1/S
  – Target genes are highly enriched for
functions related to DNA replication, budding and the cell cycle
  – In vivo binding sites are highly enriched for sequences matching the defined consensus binding sites
Reprinted from: Iyer et al., Nature 409: 533 (2001)

Expression Profiles of SBF and MBF Targets
[Figure: transcriptome data for synchronized cell cultures]
Reprinted from: Iyer et al., Nature 409: 533 (2001)

Expression Profiles of SBF and MBF Targets
¤ Why are two different transcription factors used to mediate identical transcriptional programmes during the cell-division cycle in yeast?
  – A possible answer is suggested by differences in the functions of the genes that they regulate
    • Many of the targets of SBF have roles in cell-wall biogenesis and budding
    • 25% of the MBF target genes have known roles in DNA replication, recombination and repair
  – The results support a model in which
    • SBF is the principal controller of membrane and cell-wall formation
    • MBF primarily controls DNA replication
¤ The need for DNA replication and membrane / cell-wall biogenesis may be different in the mitotic and meiotic cell cycle
Reprinted from: Iyer et al., Nature 409: 533 (2001)

A high-resolution map of active promoters in the human genome
Kim et
al., Nature 436: 876-880 (2005)
¤ Paper presents
  – a genome-wide map of active promoters in human fibroblast cells
    • determined by experimentally locating the sites of RNA polymerase II preinitiation complex (PIC) binding
    • map defines 10,567 active promoters corresponding to
      – 6,763 known genes
      – >1,196 un-annotated transcriptional units
  – Global view of functional relationships in human cells between
    • transcriptional machinery
    • chromatin structure
    • gene expression

Identification of active promoters in the human genome
¤ Microarrays cover
  – All non-repeat DNA at 100 bp resolution
¤ Pol II preinitiation complex (PIC)
  – RNA polymerase II
  – transcription factor IID
  – general transcription factors
¤ ChIP of PIC-bound DNA
  – monoclonal antibody against the TAF1 subunit of the complex (TBP-associated factor 1)
Reprinted from: Kim et al., Nature 436: 876-880 (2005)

Results from TFIID ChIP-on-chip analysis
Reprinted from: Kim et al., Nature 436: 876-880 (2005)

Characterization of active promoters
¤ Matched the 12,150 TFIID-binding sites to
  – the 5' end of known transcripts in transcript databases
  – 87% of the PIC-binding sites were within 2.5 kb of annotated 5' ends of known messenger RNAs
¤ 8,960 promoters were mapped
  – within annotated boundaries of 6,763 known EnsEMBL genes
Reprinted from: Kim et al., Nature 436: 876-880 (2005)

The chromatin-modification features of the active promoters
¤ Validation of active promoters
  – ChIP-on-chip using an anti-RNAP antibody
  – ChIP-on-chip analysis using
    • anti-acetylated histone H3 (AcH3) antibodies
    • anti-dimethylated lysine 4 on histone H3 (MeH3K4) antibodies
    • known epigenetic markers of active genes
Reprinted from: Kim et al., Nature 436: 876-880 (2005)

TFIID, RNAP, AcH3 and MeH3K4 profiles on the promoter of the RPS24 gene
Reprinted from: Kim et
al., Nature 436: 876-880 (2005)

Additional findings
¤ Promoters of non-coding transcripts
  – Are very similar to promoters of protein coding genes
¤ Promoters of novel genes
  – Estimate 13% of human genes remain to be annotated in the genome
¤ Clustering of active promoters
  – co-regulated genes tend to be organized into coordinately regulated domains
¤ Genes using multiple promoters
Reprinted from: Kim et al., Nature 436: 876-880 (2005)

Multiple promoters in human genes
¤ WEE1 gene locus
  – Two different transcripts with alternative 5' ends
    • Encoding different proteins
  – Two different TFIID-binding sites: two promoters
  – Differential transcription during the cell cycle
Reprinted from: Kim et al., Nature 436: 876-880 (2005)

The transcriptome of a cell line
¤ Functional relationship between transcription machinery and gene expression
  – correlated genome-wide expression profiles with PIC promoter occupancy
¤ Four general classes of promoters
  I. Actively transcribed genes
  II. Weakly expressed genes
  III. Weakly PIC bound genes
  IV. Inactive genes
Reprinted from: Kim et al., Nature 436: 876-880 (2005)

Genome-Wide Distribution of ORC and MCM Proteins in yeast: High-Resolution Mapping of Replication Origins
Wyrick et
al., Science, 294, 2357 (2001)
¤ Paper presents
  – Genome-wide location analysis to map the DNA replication origins in the 16 yeast chromosomes by determining the binding sites of prereplicative complex proteins

Chromosome Replication In Eukaryotic Cells
¤ Chromosome replication
  – initiates from origins of replication distributed along chromosomes
  – Origins of replication comprise autonomously replicating sequences (ARS)
    • ARS contain an 11-bp ARS consensus sequence (ACS)
      – Essential for replication initiation
      – Recognized by the Origin Recognition Complex (ORC)
    • The majority of sequence matches to the ACS in the genome do not have ARS activity
¤ Prereplicative complexes at replication origins comprise
  – Origin Recognition Complex (ORC) proteins
  – Minichromosome Maintenance (MCM) proteins
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)

Prereplicative Complexes At Origins Of Replication
Reprinted from: Stillman, Science, 294, 2301 (2001)

ORC- and MCM-binding sites compared with known ARSs
¤ High degree of correlation between MCM and ORC binding sites and known ARSs
  – Correct identification of 88% of known ARSs
¤ The method can accurately identify the position of ARSs to a resolution of 1 kb or less
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)

Genome-wide Location Of Potential Replication Origins
Identification of 429 potential origins on the entire genome
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)

Conclusions
¤ The ChIP-based method identified the majority of origins found in the analysis of genome-wide replication timing in yeast
  – and provides direct, high-resolution mapping of potential origins
¤ Similar approaches identified origins in other organisms
  – For example: Coordination of replication and transcription along a Drosophila chromosome
    • MacAlpine et al., Genes & Dev. 18: 3094-3105 (2004)
Reprinted from: Wyrick et
al., Science, 294, 2357 (2001)

Functional Maps or “-omes”
[Figure: matrix of genes or proteins (1, 2, 3, 4, 5 ... n) against “Conditions”]
  ORFeome: Genes
  Phenome: Mutational phenotypes
  Transcriptome: Expression profiles
  DNA Interactome: Protein-DNA interactions
  Localizome: Cellular, tissue location
  Interactome: Protein interactions
  Proteome: Proteins
After: Vidal M., Cell, 104, 333 (2001)

Global analysis of protein localization in budding yeast
Huh et al., Nature 425, 686-691 (2003)
¤ Paper presents
  – An approach to define the organization of proteins in the context of cellular compartments involving
  – the construction and analysis of a collection of yeast strains expressing full-length, chromosomally tagged green fluorescent protein fusion proteins

Experimental Strategy
¤ Systematic tagging of yeast ORFs with green fluorescent protein (GFP)
  – GFP is fused to the carboxy terminus of each ORF
  – Full length fusion proteins are expressed from their native promoters and chromosomal location
¤ The collection of yeast strains expressing GFP fusions was analyzed by
  – fluorescence microscopy to determine the primary subcellular localization of the fusion proteins
    • Defines 12 categories
  – co-localization with red fluorescent protein (RFP) markers to refine the subcellular localization
    • Defines 11 additional categories
Reprinted from: Huh et al., Nature 425, 686-691 (2003)

Construction of GFP fusion proteins
¤ For each ORF a pair of PCR primers was designed
  – Homologous to the chromosomal insertion site
  – Matching a GFP–selectable marker construct
¤ Yeast was transformed with the PCR products to generate
  – Strains expressing chromosomally tagged ORFs
Reprinted from: Huh et al., Nature 425, 686-691 (2003)

Representative GFP Images
[Figure panels: nucleus, nuclear periphery, bud neck, mitochondrion, ER, lipid particle]
Reprinted from: Huh et al., Nature 425, 686-691 (2003)

GFP and RFP Co-localization Images
[Figure label: nucleolar marker]
Reprinted from: Huh et
al., Nature 425, 686-691 (2003)

Global results
[Figure label: 22 categories]
¤ Constructed ~6,000 ORF-GFP fusions
  – 4,156 had localizable GFP signals (~75% of the yeast proteome)
  – Good concordance with data from earlier studies
    • GFP does not affect the location
    • Localized 70% of the new proteins
  – Major compartments: cytoplasm (30%) and the nucleus (25%)
  – 20 other compartments: 44% of the proteins
¤ Most of the proteins can be located in discrete cellular compartments
Reprinted from: Huh et al., Nature 425, 686-691 (2003)

The proteome of the nucleolus
¤ Detected 164 proteins in the nucleolus
  – Plus 45 identified in other studies
¤ Data are consistent with MS analysis of human nucleolar proteins
  – Allows identification of yeast-human orthologs
Reprinted from: Huh et al., Nature 425, 686-691 (2003)

Transcriptional co-regulation and subcellular localization are correlated
[Figure labels: subcellular localization; 33 transcription modules; co-regulated genes]
Reprinted from: Huh et al., Nature 425, 686-691 (2003)

Conclusion
¤ The high-resolution, high-coverage localization data set
  – represents 75% of the yeast proteome
    • classified into 22 distinct subcellular localization categories
¤ Analysis of these proteins
  – in the context of transcriptional, genetic, and protein–protein interaction data
    • provides a comprehensive view of interactions within and between organelles in eukaryotic cells
    • helps reveal the logic of transcriptional co-regulation
Reprinted from: Huh et al., Nature 425, 686-691 (2003)

Genome Biology and Biotechnology 10.
The proteome
International course 2005

Summary
¤ Protein interactome
  – Yeast two-hybrid protein interaction mapping
¤ Proteome
  – Isolation of protein complexes
¤ Multilevel functional genomics
  – Combination of
    • phenome analysis
    • protein interaction mapping

Functional Maps or “-omes”
[Figure: matrix of genes or proteins (1, 2, 3, 4, 5 ... n) against “Conditions”]
  ORFeome: Genes
  Phenome: Mutational phenotypes
  Transcriptome: Expression profiles
  DNA Interactome: Protein-DNA interactions
  Localizome: Cellular, tissue location
  Interactome: Protein interactions
  Proteome: Proteins
After: Vidal M., Cell, 104, 333 (2001)

Basic Concept of the Yeast Two-hybrid System
¤ Eukaryotic transcription factors
  – activate RNA polymerase II at promoters by binding to upstream activating DNA sequences (UAS)
¤ Basic structure of eukaryotic transcription factors
  – The DNA binding and the activating functions are located in physically separable domains
    • The DNA-binding domain (DB)
    • The activation domain (AD)
  – The connection between DB and AD is structurally flexible
¤ Protein-protein interactions can reconstitute a functional transcription factor
  – by bringing the DB domain and the AD domain into close physical proximity
Reprinted from: Vidal M. and Legrain P., Nucleic Acids Res.
27: 919 (1999)

Yeast two-hybrid system
¤ ‘Architectural blueprint’ for a functional transcription factor
  – DB-X/AD-Y, where X and Y could be essentially any proteins from any organism
[Figure labels: prey = AD-Y (Gal4 transcription-activation domain fused to Y); bait = DB-X (Gal4 DNA binding domain fused to X); UAS (Upstream Activating Sequence); selectable marker gene]

Yeast two-hybrid system
¤ The yeast two-hybrid system allows
  – Genetic selection of genes encoding potential interacting proteins without the need for protein purification
    • The system is used to isolate genes encoding proteins that potentially interact with DB-X (referred to as the ‘bait’) from complex AD-Y libraries (referred to as the ‘prey’)
  – Limitations of the system include
    • False positives: clones with no biological relevance
    • False negatives: failure to identify known interactions
  – Stringent criteria must be used to evaluate both the specificity and the sensitivity of the assay
Reprinted from: Vidal M. and Legrain P., Nucleic Acids Res. 27: 919 (1999)

Protein Interaction Mapping in C. elegans Using Proteins Involved in Vulval Development
Walhout et al., Science 287: 116 (2000)
¤ Landmark paper presents
  – First demonstration of large-scale two-hybrid analysis for protein interaction mapping in C. elegans
    • starting with 27 proteins involved in vulval development in C.
elegans

Experimental Approach
¤ Start from known genes in vulval development
  – Used recombinational cloning to introduce ORFs of 29 known genes involved in vulval development into two-hybrid vectors
¤ Matrix two-hybrid experiment with 29 ORFs
  – Each DB-vORF/AD-vORF pairwise combination was
    • tested for protein-protein interactions by scoring two-hybrid phenotypes
¤ Exhaustive two-hybrid screen
  – using 27 vORF-DB fusion proteins as baits to select interactors from an AD-Y cDNA library
    • sequenced the selected clones: interaction sequence tag (IST)
Reprinted from: Walhout et al., Science 287: 116 (2000)

Construction of DB and AD Fusions by Recombinational Cloning
[Figure labels: DNA binding domain; activation domain; phage lambda excision (Integrase, IHF & Excisionase); DB-ORF fusions; AD-ORF fusions]
Reprinted from: Walhout et al., Science 287: 116 (2000)

Matrix of Two-hybrid Interactions Between the vORFs
Reprinted from: Walhout et al., Science 287: 116 (2000)

Interaction Sequence Tag (IST) screening
Reprinted from: Walhout et al., Science 287: 116 (2000)

Results
¤ Matrix two-hybrid experiment with 29 ORFs
  – ~50% (6 of 11) of the interactions reported were detected
    • Two novel potential interactions were identified
  – Typically the yeast two-hybrid system will detect ~50% of the naturally occurring interactions
¤ Two-hybrid screen
  – Identified 992 AD-Y encoding sequences
  – ISTs corresponded to a total of 124 different interacting proteins
    • 15 previously known
  – Provides a functional annotation for 109 predicted genes
Reprinted from: Walhout et al., Science 287: 116 (2000)

Validation of Potential Interactions
¤ Conservation of interactions in other organisms
  – If X' and Y' are orthologs of X and Y, respectively
    • X/Y conserved interactions are referred to as "interologs"
Reprinted from: Walhout et al., Science 287: 116 (2000)

Validation of Potential Interactions
¤ Systematic clustering analysis
  – closed loop connections between vORF-encoded proteins
    • X interacts with Y, Y interacts with Z, Z
interacts with W, and so on (X/Y/Z/W/...)
[Figure label: mutations with similar phenotypes]
Reprinted from: Walhout et al., Science 287: 116 (2000)

Conclusions
¤ Demonstrated the feasibility of generating genome-wide protein interaction maps
  – Two-hybrid screens are
    • simple
    • sensitive
    • amenable to high-throughput
  – Feasible using the C. elegans ORFeome
¤ Y2H detects approximately 50% of the interactions
  – provides a useful coverage of biologically important interactions
Reprinted from: Walhout et al., Science 287: 116 (2000)

A Comprehensive Analysis of Protein–protein Interactions in Saccharomyces cerevisiae
Uetz et al., Nature 403: 623 (2000)
¤ Landmark paper presents
  – The first large-scale, high-throughput mapping of protein-protein interactions between ORFs predicted in S. cerevisiae using
  – Two complementary yeast two-hybrid screening strategies
    • Two-hybrid array of 6,000 hybrid proteins
    • High-throughput library screen

The two-hybrid array screening
¤ Two-hybrid array of 6,000 hybrid proteins comprises
  – Haploid yeast colonies derived from ~6,000 yeast ORFs fused to the Gal4 activation domain (AD)
  – The two-hybrid array is contained on 16 plates of 384 colonies
¤ Matrix screen for interactions
  – 192 different Gal4 DB ORF hybrids were mated to the two-hybrid array
  – 192 two-hybrid array screens were performed in duplicate
    • Each yielded 1–30 positives
    • But only ~20% were reproduced in the duplicate screen
¤ Putative interacting partners identified
  – 87/192 DB hybrids yielded putative protein–protein interactions
  – Identified 281 interacting protein pairs
Reprinted from: Uetz et al., Nature 403: 623 (2000)

The two-hybrid array screening
[Figure labels: positive control: 6,000 haploid yeast Gal4 activation domain-ORF fusions; two-hybrid positives from a mating with a Gal4 DNA-binding domain-ORF fusion; 16 microassay plates]
Reprinted from: Uetz et al., Nature 403: 623 (2000)

High-Throughput Library Screen
¤ Used a library made by pooling ORF-AD fusions
  – Each ORF was fused separately to a
Gal4 activation domain
  – ORF-AD fusions were pooled to form an activation-domain library
    • Advantage over traditional cDNA libraries is the uniform presentation of each ORF
¤ Protein interactions were screened by
  – mating the 6,000 DNA-binding domain hybrids in duplicate to the activation domain library
  – 817 yeast ORFs (15%) yielded protein–protein interactions
  – Identified 692 interacting protein pairs
    • 68% of the interactions were identified multiple times
Reprinted from: Uetz et al., Nature 403: 623 (2000)

Results of the Systematic Two-Hybrid Screens
¤ The matrix array screens
  – gave more interactors
    • 45% of the 192 proteins in the array screens yielded interactions
  – are much more labour- and material-intensive
    • limits the number of screens that can be performed
    • Full matrix would require testing 6,000 × 6,000 = 36,000,000 interactions!
¤ The library screens gave
  – fewer interactors
    • 8% of the proteins tested in the library screens yielded interactions
  – a much higher throughput
Reprinted from: Uetz et al., Nature 403: 623 (2000)

Analysis of the protein-protein interactions
¤ The analysis reveals
  – Interactions that place unknown proteins into a biological context
  – Novel interactions between proteins involved in the same biological function
  – Novel interactions that connect biological functions into larger cellular processes

Interactions involving unknown proteins
Reprinted from: Uetz et al., Nature 403: 623 (2000)

Interactions Between Proteins in the RNA Splicing Complex
Interactions are consistent with the crystallographic data
Reprinted from: Uetz et al., Nature 403: 623 (2000)

Interaction Connecting two different Complexes
[Figure labels: spindle checkpoint complex; microtubule checkpoint complex]
Reprinted from: Uetz et al., Nature 403: 623 (2000)

Analysis of Interologs
[Figure labels: yeast; human]
Reprinted from: Uetz et al., Nature 403: 623 (2000)

Conclusions
¤ The two-hybrid array approach is feasible
  – for systematic genome-wide analysis of protein interactions
¤ The large scale
mapping of protein-protein interactions reveals
  – many new interactions between proteins
  – that protein interactions should be viewed as potential interactions that must be confirmed independently
  – This conclusion is supported by the fact that the results of different screens only partially overlap
Reprinted from: Uetz et al., Nature 403: 623 (2000)

A Map of the Interactome Network of the Metazoan C. elegans
Li et al., Science, 303, 540-543 (2004)
¤ Paper presents
  – Large scale mapping of protein-protein interactions in C. elegans using yeast two-hybrid screens with a subset of metazoan-specific proteins
    • identified > 4000 interactions
  – Together with already described Y2H interactions and interologs predicted in silico,
    • the current version of the Worm Interactome map contains 5500 interactions

Worm Interactome map
[Figure legend: phylogenetic classes: eukaryotic, multicellular, worm]
Reprinted from: Li et al., Science, 303, 540-543 (2004)

A Protein Interaction Map of Drosophila melanogaster
Giot et al., Science, 302, 1727-1736 (2003)
¤ Paper presents
  – a two-hybrid–based protein-interaction map of the fly proteome by screening 10,623 ORFs against cDNA libraries to produce
    • a draft map of 7048 proteins and 20,405 interactions
    • Computational rating of interaction confidence produced
  – a high confidence interaction network of 4679 proteins and 4780 interactions showing two levels of organization
    • a short-range organization, presumably corresponding to multiprotein complexes
    • a more global organization, presumably corresponding to intercomplex connections

The fly protein-interaction map: Protein family/human disease orthologs
Reprinted from: Giot et al., Science, 302, 1727-1736 (2003)

The fly protein-interaction map: Subcellular localization
Reprinted from: Giot et al., Science, 302, 1727-1736 (2003)

Towards a proteome-scale map of the human protein–protein interaction network
Rual et
al., Nature 437: 1173-1178 (2005)
¤ Paper presents
  – First step towards a systematic and comprehensive analysis of the human interactome using
    • a stringent, high-throughput yeast two-hybrid system to test pairwise interactions among the products of 8,100 currently available Gateway-cloned open reading frames

High-throughput yeast two-hybrid pipeline
¤ Stringent test
  – Second test using GAL1::HIS3 and GAL1::lacZ
  – Reduces the number of false positives
¤ Detected 2,800 interactions
Reprinted from: Rual et al., Nature 437: 1173-1178 (2005)

Overlap of CCSB-HI1 with literature data
¤ Compared the overlap between
  – Observed interactions
  – Interactions reported in the literature
¤ Conclude that the CCSB-HI1 data set contains 1% of the human interactome
  – Human interactome is estimated at 200,000 to 300,000 interactions
Reprinted from: Rual et al., Nature 437: 1173-1178 (2005)

Interaction network of disease-associated CCSB-HI1 proteins
¤ The human interactome will further
  – the understanding of human health and disease
¤ Illustrated by
  – The network of disease-associated proteins (green nodes)
    • EWS protein
Reprinted from: Rual et
al., Nature 437: 1173-1178 (2005)

Functional Maps or “-omes”
[Figure: matrix of genes or proteins (1, 2, 3, 4, 5 ... n) against “Conditions”]
  ORFeome: Genes
  Phenome: Mutational phenotypes
  Transcriptome: Expression profiles
  DNA Interactome: Protein-DNA interactions
  Localizome: Cellular, tissue location
  Interactome: Protein interactions
  Proteome: Proteins
After: Vidal M., Cell, 104, 333 (2001)

Proteome Analysis
¤ Large scale and comprehensive analysis of the proteome has so far not been feasible
  – Lack of suitable and sensitive protein fractionation methods
    • 2-D gels are limited to a few thousand of the most abundant proteins
  – Protein characterization is slow and laborious
    • Despite enormous improvements in mass spectrometry, the characterization of individual proteins remains the bottleneck
  – Level of proteome characterization to date is in the order of a few thousand proteins at best
    • Represents 5% to 25% of the proteome
¤ Tandem affinity purification (TAP) technology constitutes an important breakthrough
  – Fast and reliable method of protein purification

A generic protein purification method for protein complex characterization
Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)
¤ Paper presents
  – a generic procedure to purify protein complexes under native conditions using
    • tandem affinity purification (TAP) tag procedure
  – Using a combination of high-affinity tags for purification

Tag-based Characterization of protein complexes
Reprinted from: Kumar A. and Snyder M., Nature 415, 123 (2002)

High-affinity Tags
¤ High-affinity protein tags
  – Must allow efficient recovery of proteins present at low concentrations
    • ProtA tag: two IgG-binding units of protein A of S.
aureus
      – released from matrix-bound IgG under denaturing conditions
    • CBP tag: calmodulin-binding peptide
      – released from the affinity column under mild conditions
¤ Tandem affinity purification (TAP) tag
  – A fusion cassette encoding both the ProtA tag and the CBP tag
    • Separated by a specific TEV protease recognition sequence which allows proteolytic release of the bound material under native conditions
Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)

Tandem affinity purification (TAP) tag
[Figure labels: CBP; ProtA]
Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)

The TAP Purification Procedure
[Figure steps: ProtA affinity purification step; TEV protease cleavage step; CBP affinity purification step]
Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)

Advantage of the Two-step Procedure
¤ Purification of U1 snRNP
  – Single-step affinity purification yields a high level of contaminating proteins
  – Two-step affinity purification yields highly specific purification with very low background
Reprinted from: Rigaut et al., Nat. Biotechnol. 17, 1030 (1999)

Functional organization of the yeast proteome by systematic analysis of protein complexes
Gavin et al., Nature 415, 141 (2002)
¤ Landmark paper presents
  – Large-scale application of the TAP technology for a systematic analysis of multiprotein complexes from yeast
    • Generated gene-specific TAP tag cassettes by PCR
    • Inserted TAP cassettes by homologous recombination at the 3' end of the genes to generate fusion proteins in their native location
    • Purified protein assemblies from cellular lysates by TAP
      – Separated purified assemblies by denaturing gel electrophoresis
      – Digested individual bands with trypsin
    • Analyzed peptides by MALDI–TOF MS to identify the proteins using database search algorithms

The Gene Targeting Procedure
[Figure label: TAP tag gene-specific cassette]
Reprinted from: Gavin et
al., Nature 415, 141 (2002)

Large-scale Analysis of Protein Complexes
¤ Experimental outline
  – Started with a selection of 1,739 genes
    • 1,143 genes representing eukaryotic orthologues
    • 596 genes in the nonorthologous set
  – Generated 1,167 strains expressing tagged proteins to detectable levels
  – Analyzed 589 protein complexes
    • Comprising 418 different orthologues
  – Generated 20,946 samples for mass spectrometry
    • Identified 16,830 proteins
  – Characterized a total of 232 protein complexes
    • Comprising 1,440 distinct proteins, ~25% of the ORFs in the genome
Reprinted from: Gavin et al., Nature 415, 141 (2002)

Purification and Identification of TAP Complexes
Reprinted from: Gavin et al., Nature 415, 141 (2002)

Sensitivity and Specificity of the Approach
¤ Very efficient large-scale purification and identification of protein complexes
  – 78% of the 589 purified complexes have associated proteins
  – The remaining 22% show no interacting proteins
    • May not form stable or soluble complexes
    • The TAP tag may interfere with complex assembly or function
¤ Complexes are stable and show the same composition when purified with different entry points
  – Example: the polyadenylation machinery, responsible for eukaryotic messenger RNA cleavage and polyadenylation
    • Identified 12 of the 13 known components
    • Identified 7 new components
Reprinted from: Gavin et al., Nature 415, 141 (2002)

The Polyadenylation Protein Complex
[Figure label: new components of the polyadenylation complex]
Reprinted from: Gavin et al., Nature 415, 141 (2002)

Composition of the Polyadenylation Complex
[Figure label: protein tagged for affinity purification]
Reprinted from: Gavin et
al., Nature 415, 141 (2002)

Reliability of the TAP Method
¤ High sensitivity
  – Identifies proteins present at 15 copies per cell
¤ High reproducibility
  – 70% of the proteins are detected in independent purifications
¤ Low background
  – The background comprises highly expressed proteins
    • Identified 17 contaminant proteins (heat-shock and ribosomal proteins)
¤ Limitations
  – 18% of the tagged essential genes gave no viable strains
    • The carboxy-terminal tagging can impair protein function
Reprinted from: Gavin et al., Nature 415, 141 (2002)

Organization of the purified assemblies into complexes
¤ 589 purified complexes characterized
  – 245 complexes corresponded to 98 known multiprotein complexes in yeast
  – 242 complexes corresponded to 134 new complexes
¤ In total, 232 annotated TAP complexes were identified
  – 102 proteins showed no detectable association with other proteins
Reprinted from: Gavin et al., Nature 415, 141 (2002)

Number Of Proteins Per Complex
Average of 12 proteins per complex
Reprinted from: Gavin et al., Nature 415, 141 (2002)

Functional Classification Of The Complexes
Wide functional distribution of complexes
Reprinted from: Gavin et al., Nature 415, 141 (2002)

Protein Complexes are Dynamic
¤ Complexes are not necessarily of invariable composition
  – Using distinct tagged proteins as entry points to purify a complex
    • Core components can be identified as invariably present
    • Regulatory components may be present differentially
¤ Dynamic complexes: e.g. signalling complexes
  – The interactions of a signalling enzyme may be sufficiently strong to allow the detection of distinct cellular complexes
    • They may be diagnostic for the role of these enzymes in different cellular activities
Reprinted from: Gavin et
al., Nature 415, 141 (2002)

Higher-order Organization of The Proteome Map
¤ Most complexes are linked together
  – Complexes belonging to the same functional class often share components
    • mRNA metabolism, cell cycle, protein synthesis and turnover, intermediate and energy metabolism
¤ Shared components link complexes into a network
  – The network connections reflect physical interaction of complexes
    • common architecture, localization or regulation
  – Relationships between complexes suggest integration and coordination of cellular functions
  – The more connected a complex, the more central its position in the network
Reprinted from: Gavin et al., Nature 415, 141 (2002)

The Yeast Protein Complex Network
[Figure legend: membrane biogenesis and traffic; cell polarity and structure; protein synthesis and turnover; intermediate and energy metabolism; signalling; cell cycle; transcription, DNA maintenance, chromatin structure; RNA metabolism; protein and RNA transport]
Reprinted from: Gavin et al., Nature 415, 141 (2002)

Protein Complexes Have a Similar Composition in Yeast and Human
Reprinted from: Gavin et al., Nature 415, 141 (2002)

Conclusions
¤ The paper clearly demonstrates the merits of the TAP technology for
  – characterizing protein complexes from different compartments, including low-abundance and large complexes
  – TAP data and yeast two-hybrid assay data show only a very small overlap
    • The two methodologies address different aspects of protein interaction and are complementary
¤ The TAP analysis provides an outline of the eukaryotic proteome as a network of protein complexes
  – The human–yeast orthologous proteome represents core functions for the eukaryotic cell
    • Orthologous proteins are often responsible for essential functions
Reprinted from: Gavin et
al., Nature 415, 141 (2002)

Genome Biology and Biotechnology
The next frontier: Systems biology
International course 2005

Genomics, Functional Genomics, Systems Biology

From genes to networks
  gene: Molecular Biology, 60s to mid 80s
  pathway: Molecular Genetics, since mid 80s
  network: Systems Biology, since mid 90s

The large-scale organisation of metabolic networks
Jeong et al (2000) Nature 407: 651
¤ Study of the design principles underlying the structure of biological systems
  – Dissection of integrated “pathway-genome” databases providing complex connectivity maps

Case study
¤ Analyses of core cellular metabolism as
  – described in the “Intermediate metabolism and bioenergetics” portions of the WIT database
¤ Prediction of metabolic pathways in organisms
  – on the basis of each organism's annotated genome (presence of presumed open reading frames for enzymes that catalyse a given metabolic reaction)
  – in combination with firmly established data from the biochemical literature
¤ 6 archaea, 32 bacteria and 5 eukaryotes
Reprinted from: Jeong et al (2000) Nature 407: 651

Graph theoretic representation
Nodes are substrates
Links are metabolic reactions (with EC enzyme numbers)
Reprinted from: Jeong et al (2000) Nature 407: 651

Theoretical Network Architectures
The World Wide Web and social networks have a scale-free structure
[Figure labels: probability that a node has k links; random (uniform); scale-free (heterogeneous)]
Reprinted from: Jeong et al (2000) Nature 407: 651

Connectivity distribution
Metabolic networks are scale-free, as shown by the distribution of incoming and outgoing links for each substrate.
[Figure panels: Archaeoglobus fulgidus; E. coli; C. elegans; all 43 organisms]
This is a general rule applying to all organisms studied.
Reprinted from: Jeong et al (2000) Nature 407: 651

Network diameter
Biochemical pathway length in E. coli
Definition: the shortest “pathway” averaged over all pairs of substrates
Unexpectedly, network diameter does not increase with complexity.
Therefore interconnectivity grows with the addition of substrates.
[Figure panels: average path length (43 organisms: Archaea, Bacteria, Eukarya); incoming links; outgoing links]
Reprinted from: Jeong et al (2000) Nature 407: 651

Hub properties
• A few hubs dominate the overall connectivity
• The sequential removal (“mutations”) of the most connected hubs dramatically increases the network diameter until disintegration
• The metabolic networks seem highly robust in computer simulations (cf. lethal mutation rate observed in vivo)
Reprinted from: Jeong et al (2000) Nature 407: 651

Conclusions
¤ The structure of biological networks is far from random
  – Their contemporary topology reflects a long evolutionary process
  – They show a robust response towards internal defects
¤ Contrary to other scale-free networks,
  – metabolic ones do not grow in diameter with increasing complexity
  – which may represent an additional (necessary?) survival and growth advantage
Reprinted from: Jeong et al (2000) Nature 407: 651

Extension of the concept
¤ Protein-protein interaction networks are also scale-free
  – yeast Y2H data
¤ The probability for a gene to be essential
  – increases with the connectedness of the encoded protein
  – 93% of proteins have 5 links or fewer
    • 21% of their genes are essential
  – ~0.7% have more than 15 links
    • 62% of their genes are essential
Reprinted from: Jeong et al (2001) Nature 411: 41

A long way to go…
¤ List of biological components
  – cells, genes, proteins, metabolites
¤ Description of local relationships
  – expression cluster
  – protein-protein interaction
  – molecule trafficking
  – cell-cell crosstalk
¤ Whole system architecture
¤ Dynamic regulatory mechanisms
¤ System behaviour prediction
¤ System manipulation, de novo design
Need more data!

Thank you!
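
Supplementary sketch: network diameter and hub removal
The "Network diameter" and "Hub properties" slides describe two computations: the shortest path averaged over all pairs of nodes, and how that average changes when the most connected nodes are removed. The Python sketch below illustrates both on a toy graph. It is not the analysis of Jeong et al.: the graph is generated by a simple preferential-attachment rule (an assumption made here to obtain a few hubs), it is undirected, and all function names are hypothetical.

```python
import random
from collections import deque


def preferential_attachment_graph(n, m, seed=0):
    """Toy undirected graph with a few hubs (assumption, not metabolic data).

    Starts from a fully connected core of m + 1 nodes; each new node links to
    m distinct existing nodes chosen with probability proportional to degree.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    pool = []  # each node appears once per link end -> degree-weighted sampling
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            pool += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            pool += [new, t]
    return adj


def average_path_length(adj):
    """Return (mean shortest path over connected ordered pairs,
    fraction of ordered pairs that are connected), using BFS from each node."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    n = len(adj)
    possible = n * (n - 1)
    mean = total / pairs if pairs else float("inf")
    return mean, (pairs / possible if possible else 0.0)


def remove_nodes(adj, victims):
    """Copy of the graph without the given nodes and their links."""
    victims = set(victims)
    return {u: {v for v in nbrs if v not in victims}
            for u, nbrs in adj.items() if u not in victims}


if __name__ == "__main__":
    g = preferential_attachment_graph(300, 2, seed=1)
    hubs = sorted(g, key=lambda u: len(g[u]), reverse=True)[:15]
    randoms = random.Random(2).sample(sorted(g), 15)
    for label, victims in [("intact", []),
                           ("random removal", randoms),
                           ("hub removal", hubs)]:
        mean, connected = average_path_length(remove_nodes(g, victims))
        print(f"{label:15s} mean path = {mean:.2f}, "
              f"connected pairs = {connected:.0%}")
```

Running the script prints the mean path length and the fraction of still-connected node pairs for the intact graph, after removing 15 random nodes, and after removing the 15 best-connected nodes. The exact values depend on the seeds and on the toy generator, so they should be read only as an illustration of the comparison made on the slides.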