Genome Biology and
Biotechnology
9. The localizome
Prof. M. Zabeau
Department of Plant Systems Biology
Flanders Interuniversity Institute for Biotechnology (VIB)
University of Gent
International course 2005
Summary
¤ DNA localizome or DNA interactome
– Genome-wide mapping of DNA binding proteins
• Transcription factor binding sites
• Localization of replication origins
¤ Protein localizome
– High throughput localization of proteins in cellular
compartments
Functional Maps or “-omes”
[Figure: matrix of genes or proteins (1 ... n) against “conditions”]
– ORFeome: genes
– Phenome: mutational phenotypes
– Transcriptome: expression profiles
– DNA interactome: protein-DNA interactions
– Localizome: cellular, tissue location
– Interactome: protein interactions
– Proteome: proteins
After: Vidal M., Cell, 104, 333 (2001)
Genome-wide Analysis of Regulatory Sequences
¤ Gene expression is regulated by transcription factors
selectively binding to regulatory regions
– protein–DNA interactions involve sequence-specific recognition
– Other factors, such as chromatin structure, may be involved
¤ Sequence-specific DNA-binding proteins from
eukaryotes generally
– recognize degenerate motifs of 5–10 base pairs
– Consequently, potential recognition sequences for transcription
factors occur frequently throughout the genome
¤ Genome-wide surveys of in vivo DNA binding proteins
– provide a platform for determining which of these potential sites are actually bound
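The point about degenerate motifs can be made concrete with a rough calculation. The sketch below assumes a random genome with equal, independent base frequencies and counts both strands; the function name, the example motifs and the genome size of about 12 Mb for yeast are illustrative assumptions, not values from the slides.

```python
# IUPAC degenerate codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expected_matches(motif: str, genome_size: int) -> float:
    """Expected number of motif matches on both strands of a random genome.

    Assumes each base occurs with probability 0.25, independently of its
    neighbours. Real genomes only approximate this, and palindromic motifs
    are counted once per strand.
    """
    p = 1.0
    for symbol in motif.upper():
        p *= len(IUPAC[symbol]) / 4.0
    positions = genome_size - len(motif) + 1
    return 2 * positions * p


if __name__ == "__main__":
    yeast = 12_000_000  # approximate genome size in bp (assumption)
    # Hypothetical motifs of increasing degeneracy.
    for motif in ("CGCGAAA", "CRCGAAA", "NNCGAAA"):
        print(motif, round(expected_matches(motif, yeast)))
```

Even a fully specified 7-bp motif is expected to occur more than a thousand times by chance in a genome of this size, which is why sequence matches alone do not identify the sites that are bound in vivo.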
Genome-wide Analysis of Regulatory Sequences
¤ Methods combine
– Large-scale analysis of in vivo
protein–DNA crosslinking
– microarray technology
¤ ChIP-on-chip
– Chromatin ImmunoPrecipitation on DNA chips
Reprinted from: Biggin M., Nature Genet. 28, 303 (2001)
Genome-Wide Location and Function of DNA Binding
Proteins
Ren et al., Science, 290, 2306 (2000)
¤ Paper presents
– proof of principle for microarray-based approaches to determine
the genome-wide location of DNA-bound proteins
• Study of the binding sites of two well-known gene-specific
transcription activators in yeast: Gal4 and Ste12
– Combines data from
• in vivo DNA binding analysis with
• expression analysis
• to identify genes whose expression is directly controlled by these
transcription factors
Chromatin Immunoprecipitation (ChIP) Procedure
– Cells are fixed with formaldehyde, harvested, and sonicated
– DNA fragments cross-linked to a protein of interest are enriched by
immunoprecipitation with a specific antibody
– Immunoprecipitated DNA is amplified and labeled with the fluorescent
dye Cy5
– Control DNA not enriched by immunoprecipitation is amplified and
labeled with the different fluorophore Cy3
– DNAs are mixed and hybridized to a microarray of intergenic sequences
– The relative binding of the protein of interest to each sequence is
calculated from the IP-enriched/unenriched ratio of fluorescence from
3 experiments
Reprinted from: Ren et al., Science, 290, 2306 (2000)
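The ratio calculation in the last step can be sketched as follows. This is a simplified illustration, not the statistical treatment used in the paper: it takes the median log2(Cy5/Cy3) over replicate experiments, and the region names, intensities and cutoff are hypothetical.

```python
import math
from statistics import median


def log2_ratio(cy5: float, cy3: float) -> float:
    """log2 of IP-enriched (Cy5) over unenriched (Cy3) fluorescence."""
    return math.log2(cy5 / cy3)


def combine_replicates(spots):
    """Median log2 ratio per intergenic region across experiments.

    spots maps a region name to a list of (cy5, cy3) intensity pairs,
    one pair per experiment.
    """
    return {
        region: median(log2_ratio(cy5, cy3) for cy5, cy3 in pairs)
        for region, pairs in spots.items()
    }


def enriched(scores, min_log2=1.0):
    """Regions whose combined log2 ratio reaches the (assumed) cutoff."""
    return sorted(region for region, s in scores.items() if s >= min_log2)


if __name__ == "__main__":
    # Hypothetical intensities from three experiments.
    spots = {
        "region_A": [(800, 100), (400, 100), (200, 100)],
        "region_B": [(110, 100), (90, 100), (100, 100)],
    }
    scores = combine_replicates(spots)
    print(scores)
    print(enriched(scores))
```

Working in log2 space makes enrichment and depletion symmetric around zero, and the median limits the influence of a single outlying experiment.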
Modified Chromatin Immunoprecipitation (ChIP) Procedure
[Figure: close-up of a scanned image of a microarray containing 6361 intergenic region DNA fragments of the yeast genome; label: ChIP-enriched DNA fragment]
Reprinted from: Ren et al., Science, 290, 2306 (2000)
Proof of concept: Gal4 transcription factor
¤ Identification of sites bound by the transcriptional
activator Gal4 in the yeast genome and genes induced
by galactose
– Gal4 activates genes necessary for galactose metabolism
• The best characterized transcription factor in yeast
– 10 genes were bound by Gal4 and induced in galactose
• 7 genes in the Gal pathway, previously reported to be
regulated by Gal4
• 3 novel genes: MTH1, PCL10, and FUR4
Reprinted from: Ren et al., Science, 290, 2306 (2000)
Genome-wide location of Gal4 protein
Genes whose promoter regions are bound by Gal4 and whose expression levels
were induced at least twofold by galactose
Reprinted from: Ren et al., Science, 290, 2306 (2000)
Role of Gal4 in Galactose-dependent Cellular Regulation
The identification of MTH1, PCL10, and FUR4 as Gal4-regulated genes explains
how regulation of several different metabolic pathways can be coordinated
– Fur4: increases intracellular pools of uracil
– Mth1: reduces levels of glucose transporter
– Pcl10
Reprinted from: Ren et al., Science, 290, 2306 (2000)
Conclusions
¤ The genes whose expression is controlled directly by
transcriptional activators in vivo
– Are identified by a combination of genome-wide location and
expression analysis
¤ Genome-wide location analysis provides information
– On the binding sites at which proteins reside in the genome
under in vivo conditions
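The combination of location and expression data amounts to intersecting two gene lists. The sketch below assumes the twofold induction cutoff quoted for the Gal4 experiment; the function name and the example values are hypothetical.

```python
def direct_targets(bound_genes, fold_change, min_fold=2.0):
    """Genes that are both bound by the factor and induced.

    bound_genes: iterable of gene names whose promoter region is bound.
    fold_change: dict mapping gene name to expression fold change.
    Genes without expression data are treated as not induced.
    """
    return sorted(
        gene for gene in set(bound_genes)
        if fold_change.get(gene, 0.0) >= min_fold
    )


if __name__ == "__main__":
    # Hypothetical inputs: binding calls and galactose/glucose fold changes.
    bound = ["GAL1", "GAL10", "GENE_X"]
    folds = {"GAL1": 50.0, "GAL10": 40.0, "GENE_X": 1.1, "GENE_Y": 8.0}
    print(direct_targets(bound, folds))
```

In this toy input GENE_X is bound but not induced and GENE_Y is induced but not bound, so neither is reported as a direct target.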
Genomic Binding Sites of the Yeast Cell-cycle
Transcription Factors SBF and MBF
Iyer et al., Nature 409: 533 (2001)
¤ Paper presents
– The use of ChIP and DNA microarrays to define the genomic binding sites
of the SBF and MBF transcription factors in vivo
– The SBF and MBF transcription factors are active in the initiation of the
cell division cycle (G1/S) in yeast
• A few target genes of SBF and MBF are known but the precise roles of
these two transcription factors are unknown
• The two transcription factors are heterodimers containing the same
Swi6 subunit and a DNA binding subunit
– MBF is a heterodimer of Mbp1 and Swi6
– SBF is a heterodimer of Swi4 and Swi6
Genomic targets of SBF and MBF
Reprinted from: Iyer et al., Nature 409: 533 (2001)
In Vivo Targets of SBF and MBF
¤ The ChIP experiments identified
– 163 possible targets of SBF
– 87 possible targets of MBF
– 43 possible targets of both factors
¤ Support for the possible in vivo targets
– Most of the genes downstream of the putative binding sites peak
in G1/S
– Target genes are highly enriched for functions related to DNA
replication, budding and the cell cycle
– In vivo binding sites are highly enriched for sequences matching
the defined consensus binding sites
Reprinted from: Iyer et al., Nature 409: 533 (2001)
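Statements such as "highly enriched" are typically backed by a test of over-representation. A minimal sketch using the hypergeometric tail probability is shown below; it is a generic calculation rather than the analysis reported in the paper, and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_tail(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    population: total number of genes considered.
    annotated:  genes in the population carrying the annotation.
    drawn:      size of the target gene list.
    k:          annotated genes observed in the target list.
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical counts: 6000 genes, 200 annotated with a function,
    # 163 target genes of which 40 carry the annotation.
    print(hypergeom_tail(40, 6000, 200, 163))
```

A very small tail probability indicates that the observed overlap would be unlikely if target genes were drawn at random from the genome.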
Transcriptome data for synchronized cell cultures: expression profiles of SBF and MBF targets
Reprinted from: Iyer et al., Nature 409: 533 (2001)
Expression Profiles of SBF and MBF Targets
¤ Why are two different transcription factors used to
mediate identical transcriptional programmes during
the cell-division cycle in yeast?
– A possible answer is suggested by differences in the functions
of the genes that they regulate
• Many of the targets of SBF have roles in cell-wall biogenesis and
budding
• 25% of the MBF target genes have known roles in DNA replication,
recombination and repair
– The results support a model in which
• SBF is the principal controller of membrane and cell-wall formation
• MBF primarily controls DNA replication
¤ The need for DNA replication and membrane / cell-wall biogenesis may differ between the mitotic and
meiotic cell cycles
Reprinted from: Iyer et al., Nature 409: 533 (2001)
A high-resolution map of active promoters in the
human genome
Kim et al., Nature 436: 876-880 (2005)
¤ Paper presents
– a genome-wide map of active promoters in human fibroblast cells
• determined by experimentally locating the sites of RNA polymerase
II preinitiation complex (PIC) binding
• map defines 10,567 active promoters corresponding to
– 6,763 known genes
– >1,196 un-annotated transcriptional units
– Global view of functional relationships in human cells between
• transcriptional machinery
• chromatin structure
• gene expression
Identification of active promoters in the human genome
¤ Microarrays cover
– All non-repeat DNA at 100 bp
resolution
¤ Pol II preinitiation complex
(PIC)
– RNA polymerase II
– transcription factor IID
– general transcription factors
¤ ChIP of PIC-bound DNA
– monoclonal antibody against
TAF1 subunit of the complex
(TBP-associated factor 1)
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
Results from TFIID ChIP-on-chip analysis
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
Characterization of active promoters
¤ Matched the 12,150 TFIID-binding sites to
– the 5' end of known transcripts in transcript databases
– 87% of the PIC-binding sites were within 2.5 kb of annotated 5' ends of
known messenger RNAs
¤ 8,960 promoters were mapped
– within the annotated boundaries of 6,763 known Ensembl genes
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
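Matching binding sites to annotated 5' ends can be sketched as a nearest-neighbour lookup. The example below assumes positions on a single chromosome and uses the 2.5 kb window quoted above; the function names and coordinates are hypothetical, and a real analysis would repeat this per chromosome and strand.

```python
from bisect import bisect_left


def nearest_tss(site: int, tss_sorted: list) -> int:
    """Return the annotated 5' end closest to a binding site.

    tss_sorted must be a non-empty, ascending list of positions.
    """
    i = bisect_left(tss_sorted, site)
    candidates = tss_sorted[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda t: abs(t - site))


def fraction_near_tss(sites, tss_positions, window=2500) -> float:
    """Fraction of binding sites within `window` bp of an annotated 5' end."""
    tss_sorted = sorted(tss_positions)
    hits = sum(
        1 for s in sites if abs(nearest_tss(s, tss_sorted) - s) <= window
    )
    return hits / len(sites)


if __name__ == "__main__":
    # Hypothetical coordinates on one chromosome.
    tss = [1_000, 10_000, 50_000]
    sites = [1_200, 12_000, 47_000, 30_000]
    print(fraction_near_tss(sites, tss))
```

Sorting the annotated positions once and using binary search keeps the lookup fast even for thousands of binding sites.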
The chromatin-modification features of the
active promoters
¤ Validation of active promoters
– ChIP-on-chip using an anti-RNAP antibody
– ChIP-on-chip analysis using antibodies against known epigenetic markers of active genes
• anti-acetylated histone H3 (AcH3) antibodies
• anti-dimethylated lysine 4 on histone H3 (MeH3K4) antibodies
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
TFIID, RNAP, AcH3 and MeH3K4 profiles on
the promoter of RPS24 gene
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
Additional findings
¤ Promoters of non-coding transcripts
– Are very similar to promoters of protein coding genes
¤ Promoters of novel genes
– An estimated 13% of human genes remain to be annotated in the genome
¤ Clustering of active promoters
– co-regulated genes tend to be organized into coordinately regulated
domains
¤ Genes using multiple promoters
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
Multiple promoters in human genes
¤ WEE1 gene locus
– Two different transcripts with alternative 5' ends
• Encoding different proteins
– Two different TFIID-binding sites, hence two promoters
– Differential transcription during the cell cycle
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
The transcriptome of a cell line
¤ Functional relationship between transcription machinery and gene expression
– correlated genome-wide expression profiles with PIC promoter occupancy
¤ Four general classes of promoters
I. Actively transcribed genes
II. Weakly expressed genes
III. Weakly PIC bound genes
IV. Inactive genes
Reprinted from: Kim et al., Nature 436: 876-880 (2005)
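One way to read the four classes is as the combinations of two calls per promoter: PIC bound or not, and transcript detected or not. The sketch below encodes that reading; the mapping of combinations to class numbers is an interpretation of the slide labels, and the example calls are hypothetical.

```python
from collections import Counter


def promoter_class(pic_bound: bool, expressed: bool) -> str:
    """Assign a promoter to class I-IV from two yes/no calls (assumed scheme)."""
    if pic_bound and expressed:
        return "I"    # actively transcribed
    if pic_bound:
        return "II"   # PIC bound, weak or undetected expression
    if expressed:
        return "III"  # expressed, weak or undetected PIC binding
    return "IV"       # inactive


if __name__ == "__main__":
    # Hypothetical (pic_bound, expressed) calls for five promoters.
    calls = [(True, True), (True, False), (False, True),
             (False, False), (True, True)]
    print(Counter(promoter_class(b, e) for b, e in calls))
```

How a continuous occupancy or expression signal is turned into a yes/no call depends on the detection thresholds chosen, which this sketch leaves to the caller.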
Genome-Wide Distribution of ORC and MCM Proteins in
yeast: High-Resolution Mapping of Replication Origins
Wyrick et al., Science, 294, 2357 (2001)
¤ Paper presents
– Genome-wide location analysis to map the DNA replication origins in the
16 yeast chromosomes by determining the binding sites of prereplicative
complex proteins
Chromosome Replication In Eukaryotic Cells
¤ Chromosome replication
– initiates from origins of replication distributed along
chromosomes
– Origins of replication comprise autonomously replicating
sequences (ARS)
• ARS contain an 11-bp ARS consensus sequence (ACS)
– Essential for replication initiation
– Recognized by the Origin Recognition Complex (ORC)
• The majority of sequence matches to the ACS in the genome do not
have ARS activity
¤ Prereplicative complexes at replication origins comprise
– Origin Recognition Complex (ORC) proteins
– Minichromosome Maintenance (MCM) proteins
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
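Scanning a sequence for ACS matches illustrates why sequence alone is a poor predictor of origins. The sketch below assumes the commonly cited 11-bp consensus (A/T)TTTA(T/C)(A/G)TTT(A/T) and reports matches on both strands; the function names are illustrative.

```python
import re

# Commonly cited 11-bp ARS consensus sequence, WTTTAYRTTTW (assumption).
# The lookahead lets overlapping matches be reported.
ACS = re.compile(r"(?=([AT]TTTA[CT][AG]TTT[AT]))")
ACS_LENGTH = 11
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_acs(seq: str):
    """Return sorted (start, strand) pairs for ACS matches.

    start is the 0-based leftmost position on the given (plus) strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+") for m in ACS.finditer(seq)]
    n = len(seq)
    hits += [
        (n - m.start() - ACS_LENGTH, "-")
        for m in ACS.finditer(revcomp(seq))
    ]
    return sorted(hits)


if __name__ == "__main__":
    print(find_acs("GGGATTTATATTTACCC"))
```

Because the consensus is short and AT-rich, a scan like this returns many matches genome-wide, most of which lack ARS activity; the ChIP-based mapping identifies the sites that are actually bound.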
Prereplicative Complexes At Origins Of
Replication
Reprinted from: Stillman, Science, 294, 2301 (2001)
ORC- and MCM-binding sites compared with known ARSs
¤ High degree of correlation
between MCM and ORC
binding sites and known ARSs
– Correct identification of 88% of
known ARSs
¤ The method can accurately
identify the position of ARSs to
a resolution of 1 kb or less
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
Genome-wide Location of Potential Replication Origins
[Figure: identification of 429 potential origins across the entire genome]
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
Conclusions
¤ The ChIP-based method identified the majority of
origins found in the analysis of genome-wide
replication timing in yeast
– and provides direct, high-resolution mapping of potential origins
¤ Similar approaches identified origins in other
organisms
– For example: Coordination of replication and transcription along
a Drosophila chromosome
• MacAlpine et al., Genes & Dev. 18: 3094-3105 (2004)
Reprinted from: Wyrick et al., Science, 294, 2357 (2001)
Global analysis of protein localization in budding
yeast
Huh et al., Nature 425, 686-691 (2003)
¤ Paper presents
– An approach to define the organization of proteins in the context of
cellular compartments involving
– the construction and analysis of a collection of yeast strains expressing
full-length, chromosomally tagged green fluorescent protein fusion
proteins
Experimental Strategy
¤ Systematic tagging of yeast ORFs with green
fluorescent protein (GFP)
– GFP is fused to the carboxy terminus of each ORF
– Full length fusion proteins are expressed from their native
promoters and chromosomal location
¤ The collection of yeast strains expressing GFP fusions
was analyzed by
– fluorescence microscopy to determine the primary subcellular
localization of the fusion proteins
• Defines 12 categories
– co-localization with red fluorescent protein (RFP) markers to
refine the subcellular localization
• Defines 11 additional categories
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Construction of GFP fusion proteins
¤ For each ORF a pair of PCR primers was designed
– Homologous to the chromosomal insertion site
– Matching a GFP – selectable marker construct
¤ Yeast was transformed with the PCR products to generate
– Strains expressing chromosomally tagged ORFs
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
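The primer design described above can be sketched as string assembly: each primer carries a stretch homologous to the chromosomal insertion site followed by a stretch that primes on the GFP and selectable marker construct. The 40-nt homology length and the cassette priming sequences below are placeholder assumptions, not values from the paper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tagging_primers(upstream_of_stop: str, downstream_of_stop: str,
                    cassette_fwd: str, cassette_rev: str,
                    homology: int = 40):
    """Build a primer pair for C-terminal tagging of one ORF.

    upstream_of_stop:   coding-strand sequence ending just before the stop codon.
    downstream_of_stop: genomic sequence starting just after the stop codon.
    cassette_fwd/rev:   sequences priming on the tag-marker construct
                        (hypothetical placeholders supplied by the caller).
    """
    if len(upstream_of_stop) < homology or len(downstream_of_stop) < homology:
        raise ValueError("flanking sequence shorter than homology length")
    forward = upstream_of_stop[-homology:] + cassette_fwd
    reverse = revcomp(downstream_of_stop[:homology]) + cassette_rev
    return forward, reverse


if __name__ == "__main__":
    # Toy sequences with a short homology length for readability.
    fwd, rev = tagging_primers("ATGAAACCC", "GGGTTTAAA", "AC", "GT", homology=4)
    print(fwd, rev)
```

Placing the forward homology immediately before the stop codon keeps the tag in frame at the carboxy terminus, so the fusion is expressed from the native promoter at its chromosomal location.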
Representative GFP Images
[Figure panels: nucleus, nuclear periphery, bud neck, mitochondrion, ER, lipid particle]
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
GFP and RFP Co-localization Images
[Figure: co-localization with a nucleolar marker]
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Global results
[Figure: distribution of proteins across 22 categories]
¤ Constructed ~6,000 ORF-GFP fusions
– 4,156 had localizable GFP signals (~75% of the yeast proteome)
– Good concordance with data from earlier studies
• GFP does not affect the location
• Localized 70% of the new proteins
– Major compartments: cytoplasm (30%) and the nucleus (25%)
– 20 other compartments: 44% of the proteins
¤ Most of the proteins can be located in discrete cellular compartments
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
The proteome of the nucleolus
¤ Detected 164 proteins
in the nucleolus
– Plus 45 identified in other
studies
¤ Data are consistent with MS analysis of human nucleolar proteins
– Allows identification of yeast-human orthologs
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Transcriptional co-regulation and subcellular
localization are correlated
[Figure: subcellular localization of co-regulated genes across 33 transcription modules]
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Conclusion
¤ The high-resolution, high-coverage localization data
set
– represents 75% of the yeast proteome
• classified into 22 distinct subcellular localization categories
¤ Analysis of these proteins
– in the context of transcriptional, genetic, and protein–protein
interaction data
• provides a comprehensive view of interactions within and between
organelles in eukaryotic cells.
• helps reveal the logic of transcriptional co-regulation
Reprinted from: Huh et al., Nature 425, 686-691 (2003)
Recommended reading
¤ DNA-interactome
– Genome-Wide Location of DNA Binding Proteins
• Ren et al., Science, 290, 2306 (2000)
– Map of active promoters in the human genome
• Kim et al., Nature 436: 876-880 (2005)
¤ Global analysis of protein localization in yeast
• Huh et al., Nature 425, 686-691 (2003)
Further reading
¤ Genome-Wide Location of DNA Binding Proteins
– Genomic Binding Sites of the Yeast Cell-cycle Transcription
Factors SBF and MBF
• Iyer et al., Nature 409: 533 (2001)
– High-Resolution Mapping of Replication Origins
• Wyrick et al., Science, 294, 2357 (2001)