DNA:chromatin interactions
Exploring transcription factor binding
and the epigenomic landscape
Lisa Stubbs
Eukaryotic genomes are complex structures composed of
modified and unmodified DNA, RNA and many types of
interacting proteins
• Most DNA is wrapped around a “histone core”, to form nucleosomes
• The classical histone protein complexes bind very tightly to DNA and prevent
association with other proteins
• Modifications of the classical histones, or their replacement with unusual histone
types under certain conditions, can “loosen” the interaction with DNA, allowing
access to transcription factors, RNA polymerase, and other proteins
All four core histones in the octamer have “tails” that can
be modified in various ways, but the most
consequential modifications, with respect to
transcriptional activity, appear to involve methylation
or acetylation of lysines (K) in histone H3
Histone H3 modifications, especially methylation and
acetylation, mark “open” or “closed” DNA
• CLOSED: Histones bound more tightly to DNA
– H3K27me3, H3K9me3
• OPEN: Histones can be displaced by TFs, RNA Polymerase, and
other proteins
– H3K27Ac, H3K4me1, H3K4me3
• Histone marks, together with other assays of open chromatin,
are presently the only reliable indicators of the locations and
activities of regulatory elements
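As a rough illustration of how the marks above translate into an open/closed call, the sketch below labels a region from the set of H3 marks detected on it. The function name and the "mixed" and "unknown" labels are assumptions made for illustration; they are not part of any published tool.

```python
# Marks listed above as indicators of closed or open chromatin.
CLOSED_MARKS = {"H3K27me3", "H3K9me3"}
OPEN_MARKS = {"H3K27ac", "H3K4me1", "H3K4me3"}


def chromatin_state(marks):
    """Label a region 'open', 'closed', 'mixed' or 'unknown' from its H3 marks.

    Mark names are compared case-insensitively, so 'H3K27Ac' and 'H3K27ac'
    are treated as the same mark.
    """
    seen = {m.lower() for m in marks}
    has_open = any(m.lower() in seen for m in OPEN_MARKS)
    has_closed = any(m.lower() in seen for m in CLOSED_MARKS)
    if has_open and has_closed:
        return "mixed"      # hypothetical label for conflicting marks
    if has_open:
        return "open"
    if has_closed:
        return "closed"
    return "unknown"        # hypothetical label when no listed mark is present
```

For example, chromatin_state(["H3K27Ac", "H3K4me1"]) gives "open", while a region carrying only H3K27me3 is labelled "closed".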
Many types of regulatory elements
• “Docking sites” for site-specific regulatory proteins
– Transcription factors, TATA binding factors, and other site-specific binders
– Recruit additional proteins: co-factors, RNA polymerase and others
• Enhancers
– Tissue-specific activators of transcription
– Binding sites for proteins that interact with the promoter to enhance transcription
• Silencers
– Also prevalent, but more difficult to detect and assay
– Many transcription factors repress, rather than enhance, gene expression
– “Enhancers” and “Silencers” are not mutually exclusive! Most regulatory elements can
serve either function, depending on the proteins bound at a particular time
• Insulators
– “boundary elements” that shield genes from the enhancers or heterochromatin proteins
in neighboring gene “territories”
– Involved in establishing loop structures that isolate genes
How to find them? Chromatin ImmunoPrecipitation (ChIP)
• Antibody to a DNA binding protein is used to “fish out” DNA bound to the
protein in a living cell
– DNA and protein are crosslinked in the cell using brief treatment with a low
concentration of high-quality formaldehyde
– Crosslinked chromatin is sheared, usually by sonication, to yield short fragments
of DNA+protein complexes
– Antibody to a TF or other binding protein is used to fish out fragments containing
that DNA binding protein
– DNA is then “released” and can be analyzed by various methods:
• Original method is PCR: query for enrichment of specific (known or suspected)
DNA binding regions in ChIP-enriched DNA
• Creates a pool of sequences highly enriched in binding sites for a particular
protein
– Requires availability of excellent antibodies that can detect the protein in
its in vivo context
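The PCR read-out above is commonly expressed as "percent of input". The sketch below shows that arithmetic under two assumptions that are not stated on the slide: quantitative PCR with Ct values, and roughly 100% amplification efficiency (signal doubles each cycle). The function and parameter names are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent-of-input enrichment for one ChIP-qPCR amplicon.

    ct_ip          -- Ct of the immunoprecipitated sample
    ct_input       -- Ct of the input sample
    input_fraction -- fraction of chromatin saved as input (e.g. 0.01 for 1%)

    Assumes ~100% PCR efficiency (signal doubles every cycle).
    """
    # Adjust the input Ct to what it would be if 100% of the chromatin were used.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def fold_enrichment(target_pct, control_pct):
    """Ratio of percent-input at a candidate site to a negative-control region."""
    return target_pct / control_pct
```

A candidate binding region is usually compared with a region that is not expected to bind the protein; fold_enrichment gives that ratio.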
ChIP can be used to map DNA:protein interactions of
virtually any type
• Histone modifications
• Secondary interactions (no direct linkage to DNA)
– Histone modifying proteins, such as SWI/SNF, histone deacetylases,
histone methylases
– Cofactors that bind to TFs at particular sites, and that stabilize
chromatin loops
– Proteins that link chromatin to nuclear matrix
• RNA polymerase and elongation factors, to find promoters
and active sites of transcription
• Proteins involved in DNA recombination, repair, and
replication
• All of these methods require highly specific and efficient
antibodies (which are rare!)
ChIP Analytical challenges
• Genomic neighborhoods
– Shear efficiency is not really “random”
• Some genomic regions are fragile and sensitive
• Some regions are protected from shear or degradation
– Other artifacts
• Centromeres: repeat sequences that are not all represented in the
genome sequence build
• Polymorphic regions, and e.g. regions that are amplified in cell line DNA
• Repeats: most programs cannot manage sequence reads that are not
mapped uniquely
• Peak width
– Transcription factors are typically sharp peaks; chromatin marks are more
diffuse
• The best tools permit the user to modify these parameters
– MACS (Xiaole Liu Lab; Zhang et al., 2008; Feng et al., Nature Protocols
2012) is a user-friendly and widely used tool
– HOMER, a highly versatile tool with many different annotation features
and high sensitivity (Chris Benner, http://homer.salk.edu/homer/ngs/)
ChIP computational issues
• First step is to map reads: Bowtie, Novoalign, BWA or other
• ChIP-seq reads surround but may not contain the DNA binding site
• Sequence is generated from the ends of randomly sheared fragments, which overlap
at the protein binding site
• Gives rise to two adjacent sets of read peaks (one per strand) separated by
roughly the fragment length
• Defines a “shift” distance (about half the separation between read peaks) at
which you will find the true ChIP peak summit
• Programs like MACS and HOMER automatically compare sample reads against your
control (genomic input) to define a final set of peaks
[Figure: ChIP fragments overlapping the binding site, with sequence reads at the
fragment ends on either side]
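A minimal sketch of the shift idea above, assuming plain lists of 5' read positions for each strand. It searches for the offset that best lines up forward-strand reads with reverse-strand reads, then moves each read half that distance toward the fragment centre. This is an illustration only; it is not how MACS or HOMER implement their models, and all names are hypothetical.

```python
from collections import Counter


def estimate_shift(forward_starts, reverse_ends, max_distance=500):
    """Estimate the strand offset from 5' read positions on each strand.

    Tries every offset d in [0, max_distance] and counts how many
    forward-strand positions land on a reverse-strand position when moved
    by d. Returns (d, d // 2): the peak-to-peak distance and the shift.
    """
    reverse_counts = Counter(reverse_ends)
    best_d, best_score = 0, -1
    for d in range(max_distance + 1):
        score = sum(reverse_counts[pos + d] for pos in forward_starts)
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_d // 2


def shifted_positions(forward_starts, reverse_ends, shift):
    """Move reads toward the fragment centre: forward +shift, reverse -shift."""
    return [p + shift for p in forward_starts] + [p - shift for p in reverse_ends]
```

After shifting, reads from both strands pile up over the same position, which is where a peak caller would look for the summit.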
Traditional methods fail with broad, flat peaks
• Most tools are designed for TF proteins: discrete, sharp peaks
• Certain chromatin proteins, and modified histones in certain regions, bind
continuously to large regions of chromatin and do not yield “peaks”
• MACS in default mode will carve the “mesa” into many peaks, or not
detect it at all
• New settings in MACS 2 can be set to overcome this problem
• HOMER has a wide variety of settings ideal for data of different types
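One simple way to avoid carving a "mesa" into many peaks is to merge neighbouring enriched windows into a single domain, tolerating short dips. The sketch below assumes the signal has already been summarised as one enrichment value per fixed-size window; the function name, threshold and gap parameter are hypothetical and do not correspond to MACS 2 or HOMER options.

```python
def call_broad_domains(window_signal, window_size, threshold, max_gap=1):
    """Merge enriched fixed-size windows into broad domains.

    window_signal -- list of enrichment values, one per consecutive window
    window_size   -- window width in bp
    threshold     -- minimum value for a window to count as enriched
    max_gap       -- number of consecutive sub-threshold windows tolerated
                     inside a domain

    Returns a list of (start_bp, end_bp) half-open intervals.
    """
    domains = []
    start = None      # index of the first window in the open domain
    last_hit = None   # index of the most recent enriched window
    for i, value in enumerate(window_signal):
        if value >= threshold:
            if start is None:
                start = i
            last_hit = i
        elif start is not None and i - last_hit > max_gap:
            # The dip is too long: close the domain at the last enriched window.
            domains.append((start * window_size, (last_hit + 1) * window_size))
            start = None
    if start is not None:
        domains.append((start * window_size, (last_hit + 1) * window_size))
    return domains
```

Setting max_gap=0 reproduces the fragmented behaviour described above; raising it joins the pieces into one broad region.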
[Figure: UCSC Genome Browser view of the HOXA cluster (hg18, chr7:27,140,000-27,210,000;
scale bar 20 kb) showing RefSeq genes (HOXA3 through HOXA13, HOTTIP, MIR196B, HOXA antisense
transcripts) with ENCODE tracks: Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on
8 Cell Lines; Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on 8 Cell Lines;
Promoter-Associated Histone Mark (H3K4Me3) on 9 Cell Lines; and the NHGRI Catalog of
Published Genome-Wide Association Studies]
DNAse sensitivity assays are antibody free
The first approach: from Crawford et al., Genome
Research 16:123, 2006 (Francis Collins’ laboratory)
1. Digest with DNAse I to “erase” all the hypersensitive regions
– Easier to do: less need to optimize and minimize DNAse cutting
2. Polish and ligate the remaining double-strand ends
3. Ligate 5’-biotinylated linkers to the DS ends
4. Shear (sonicate) or restriction-digest DNA into smaller fragments
5. Purify end sequences on a streptavidin column
6. Release sequences, add new linkers, and sequence
– Does not allow footprinting, because TF binding sites inside the HS regions have
been digested away
Latest (and better) approach: sequences DNAse sensitive regions
per se and permits transcription factor “Footprinting”
• The easiest method uses low concentrations of DNAse I to generate short fragments
at sensitive (“open”) sites
• Released fragments can be blunt-ended, ligated to linkers and sequenced directly
• Permits DNAse footprinting: very deep sequencing can “see” short protected regions
that are absent from the released DNA, and appear as protected “valleys” inside
the DNAse sensitive peaks
– Protected from DNAse I because they are occupied by TF proteins
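The "valleys inside peaks" idea can be sketched as a scan for runs of low cut counts that are flanked by cut sites on both sides. The example assumes per-base DNAse I cut counts across a single hypersensitive peak; the cutoff rule (a fraction of the peak's mean) and all names are assumptions for illustration, not a published footprinting algorithm.

```python
def find_footprints(cut_counts, min_width=2, ratio=0.25):
    """Return (start, end) half-open index ranges of candidate footprints.

    cut_counts -- per-base DNAse I cut counts across one hypersensitive peak
    min_width  -- minimum length of a protected run
    ratio      -- a base counts as protected if its cuts fall below
                  ratio * mean cut count of the peak

    Runs touching either edge of the peak are ignored, because a footprint
    must be flanked by cutting on both sides.
    """
    if not cut_counts:
        return []
    cutoff = ratio * sum(cut_counts) / len(cut_counts)
    footprints = []
    start = None
    for i, count in enumerate(cut_counts):
        if count < cutoff:
            if start is None:
                start = i
        else:
            # A protected run ends here; keep it only if interior and long enough.
            if start is not None and start > 0 and i - start >= min_width:
                footprints.append((start, i))
            start = None
    return footprints
```

Real data would need much deeper coverage and a statistical model of DNAse I sequence bias; the sketch only shows the shape of the computation.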
Related methods and twists on the theme
(see Furey et al., 2012 for review)
• Exo-ChIP
– Follows sonication with an exonuclease step, to “pare back” all but the
protein-protected region in ChIP
• “Nano-ChIP”
– ChIP normally required ~10^7 cells as input; hard to achieve for many cell types
– Nano ChIP can be carried out in several ways:
1. With carrier DNA: not the best for sequence analysis but can be done
2. Amplification after ChIP: very tricky because it can cause serious biases and
artifacts, but can be done with care; linear amplification is the best strategy
3. Tagmentation: a new method that creates libraries directly by transposon
insertion
– The problem is library preparation, which needs a minimal amount of
input for success
From Schmidl et al., Nature Methods 2015
Lessons from ENCODE chromatin assays: human and
Drosophila data
• Massive deep-sequencing of multiple chromatin features in cell lines
(ENCODE), primary cell types and tissues (Epigenetics Roadmap)
• Histone H3 modifications: highlight on H3K4me1, H3K4me3, H3K27Ac,
H3K27me3
• Other chromatin proteins: e.g. P300 (acetyltransferase)
– H3K4me3 marks are enriched at active promoters
• H3K4me3 marks are largely the same in all cell lines, with a small fraction of marks
being cell-specific
– P300, and H3K4me1 without H3K4me3, are enriched at enhancers
• Most P300 peaks also contain H3K4me1
• P300, H3K4me1 marks are highly cell-type specific
• Most P300 marks are enhancers, but not all enhancers have P300
• Most enhancers have an H3K4me1 mark, but not all H3K4me1 marks are in
enhancers
– Other marks: H3K27Ac or H3K27me3
• Mutually exclusive marks for open (Ac) versus closed (Me3) chromatin regions
• H3K27Ac is perhaps the most general mark of open chromatin: promoters and
enhancers
• Can be found in combination with H3K4me1/me3
Combinatorial marks define subclasses of enhancers
• H3K4me1+, H3K27Ac+ mark enhancers with highest levels of activity
– Represent cell-type specific active enhancers in differentiated cells
– Mouse enhancers: gain K27Ac upon differentiation in mouse ES cells, leading to higher
expression
• H3K4me1+, H3K27Ac- marks
– Called “intermediate” enhancers, linked to a variety of non-specific cellular functions
• In humans especially, K4me1+, K27me3+ elements are called “poised” enhancers
– K27me3 is a mark of polycomb repression; polycomb proteins are also associated with
these sites
– K9me3+ marks also found at poised enhancers
– These enhancers are associated specifically with development-related functions;
K27me3 may be replaced by K27Ac as differentiation progresses
– Poised enhancers are more likely to be conserved between species, and therefore most
of the enhancers that have been tested so far are probably of this subclass
• Explains why H3K4me1 does not always find active enhancers (finds the
“poised” ones too)
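The combinatorial scheme above reduces to a small decision rule. The sketch below takes three boolean mark calls for one candidate region; the function name, the "not called" label, and the choice to check H3K27Ac before H3K27me3 are assumptions for illustration.

```python
def enhancer_class(h3k4me1, h3k27ac, h3k27me3):
    """Classify a candidate enhancer from three boolean mark calls."""
    if not h3k4me1:
        return "not called"      # every subclass above starts from H3K4me1+
    if h3k27ac:
        return "active"          # H3K4me1+, H3K27Ac+
    if h3k27me3:
        return "poised"          # H3K4me1+, H3K27me3+
    return "intermediate"        # H3K4me1+, H3K27Ac-
```

Because H3K27Ac and H3K27me3 are described as mutually exclusive, the order of the two middle checks should rarely matter in practice.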
Back to the nucleus:
Distant regulatory elements interact with promoters (and
each other) through long-range chromatin loops
• Regulatory elements are essentially “docking sites” for specific types of
DNA-binding proteins
– Transcription factors, TATA-binding factors, and others
• These proteins serve to attract co-factors, which then mediate protein:protein
interactions across chromatin loops
• Very long range interactions are common in vertebrates, less so in invertebrate
species with higher coding:noncoding ratios
[Figure: TF-bound chromatin loop; shear chromatin (sonication or restriction enzyme)]
ChIP with an antibody that binds to “E”
DNA will bring down “P” DNA as well
• Proteins are crosslinked very efficiently to each other, as well as to DNA, by
formaldehyde treatment
• When crosslinking is reversed the complex falls apart, and both DNA fragments are
released independently
• Only one sequence binds to the TF!
• Common issue in analysis of ChIP
Chromatin conformation capture methods can identify
these loop-linked sequences
• Ends of the co-captured DNA fragments are ligated while still captured on the
antibody-bead with protein complex
• DNA is released and can be
– Queried by PCR for enrichment of suspected candidate interactors
– Circularized and PCR amplified using a primer from a “bait” region (4C)
– Directly sequenced for all X all interactions (5C, Hi-C, ChIA-PET)
• Issues include
– Random co-ligation between fragments that are not truly connected in the cell
– Over-crosslinking, which may join sequences that are nearby, incorrectly
• Provides a view of 3-D chromatin architecture, especially important for
mammalian cells
[Figure: from Wikipedia]
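For the "all X all" read-out, ligation products are usually summarised as a binned contact matrix. The sketch below assumes each ligation product has already been reduced to a pair of coordinates on one chromosome; bin size, function names and the distance filter are hypothetical, and no normalisation for random co-ligation is attempted.

```python
def contact_matrix(ligation_pairs, bin_size, n_bins):
    """Build a symmetric bin-by-bin contact count matrix.

    ligation_pairs -- iterable of (pos_a, pos_b) bp coordinates on one chromosome
    bin_size       -- bin width in bp
    n_bins         -- number of bins covering the region of interest
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in ligation_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        if not (0 <= i < n_bins and 0 <= j < n_bins):
            continue  # pair falls outside the region of interest
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1
    return matrix


def long_range_contacts(matrix, min_bin_distance):
    """List (i, j, count) for i < j contacts at least min_bin_distance bins apart."""
    return [
        (i, j, matrix[i][j])
        for i in range(len(matrix))
        for j in range(i + min_bin_distance, len(matrix))
        if matrix[i][j] > 0
    ]
```

Contacts between distant bins are the candidates for enhancer-promoter loops, but they need to be weighed against the random co-ligation and over-crosslinking issues listed above.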