ChIP-seq QC
Xiaole Shirley Liu
STAT115, STAT215

Initial QC
• FASTQC
• Mappability
• Uniquely mapped reads
• Uniquely mapped locations
• Uniquely mapped locations / uniquely mapped reads
• Good to keep one read per location in peak calling

Peak Calls
• Tag distribution along the genome ~ Poisson distribution (λ_BG = total tags / genome size)
• ChIP-seq shows local biases in the genome
  – Chromatin and sequencing bias
  – 200-300 bp control windows have too few tags
  – But can look further
• Dynamic λ_local = max(λ_BG, [λ_ctrl, λ_1k,] λ_5k, λ_10k)
[Figure: ChIP and control tracks with 300 bp, 1 kb, 5 kb, and 10 kb windows]
http://liulab.dfci.harvard.edu/MACS/
Zhang et al, Genome Bio, 2008

Peak Call Statistics
• P-value and FDR
• Simulation: random sampling of reads?
• FDR = A / B, BH correction or q-value
• P-value / FDR changes with sequencing depth
• Fold change does not
[Figure labels: Quality Control; <1% enriched; A; B]

ChIP-seq QC
• Number of peaks with good FDR and fold change
• FRiP score:
  – Fraction of reads in peaks
  – Often higher for histone modifications than for transcription factors
  – Often increases slightly with increasing read depth
• Overlap with the union of peaks in public DNase-seq data
  – Working ChIP-seq peaks overlap > 70% of union DHS

DNase-seq
• Captures all regulatory sequences in the prostate genome
Sabo et al, Nat Methods 2006; Thurman et al, Nat 2012

ChIP-seq QC
• Evolutionary conservation
  – Can be used for ChIP QC
• Conserved sites more functional?
  – Majority of functional sites are not conserved
Odom et al, Nat Genet 2007

Enrichment Distribution
• CEAS (Shin et al, Bioinfo, 2009)
  – Meta-gene profiles: TF and histone marks
  – % of peaks at promoters, exons, introns, and distal intergenic sequences
  – SitePro of signal at specific sites
• Replicate agreement: > 60% or > 0.6

ChIP-seq Downstream Analysis

Target Gene Assignment
Yeast TF Regulatory Network
[Figure labels: Protein; Transcribe; Regulate; Gene]

Human TF Binding Distribution
• Most TF binding sites are outside promoters
• How to assign targets?
• Nearest distance?
• Binding within 10 kb?
• Number of binding sites?
• Other knowledge?

Higher Order Chromatin Interactions
• Chromatin conformation capture: Hi-C
• Interactions follow exponential decay with distance
Lieberman-Aiden et al, Science 2009

How to Assign Targets for Enhancer Binding Transcription Factors?
• Regulatory potential: sum of binding sites weighted by distance to TSS with exponential decay
• Decay modeled from Hi-C experiments
[Figure label: TSS]

Direct Target Identification
• Binary decision?
• Rank product of regulatory potential and differential expression
• BETA

Is My Factor an Activator, Repressor, or Both?
• Most labs have differential expression profiling of the transcription factor together with TF ChIP-seq
• Do genes with higher regulatory potential show more up- or down-regulation than all the genes in the genome?

ChIP-chip/seq Motif Finding
• ChIP-chip gives 10-5000 binding regions ~200-1000 bp long. Precise binding motif?
  – Raw data is like perfect clustering, plus enrichment values
• MDscan
  – High ChIP ranking => true targets, contain more sites
  – Search TF motif from highest ranking targets first (high signal / background ratio)
  – Refine candidate motifs with all targets

Similarity Defined by m-match
For a given w-mer and any other random w-mer:

  w-mer                 matched positions
  TGTAACGT (8-mer)
  TGTAACGT              8
  AGTAACGT              7
  TGCAACAT              6
  TGACACGG              5
  AATAACAG              4

m-matches for TGTAACGT
Pick a reasonable m to call two w-mers similar

MDscan Seeds
[Figure labels: Higher enrichment; A 9-mer; Seed motif pattern; ChIP-chip selected upstream sequences]
Sequences shown: ATTGCAAAT TTGCAAATC TTTGCGAAT TTGCAAATC TTGCGAATA TTGCAAATT TTGCCCATC ATTGCAAAT TTTGCGAAT TTTGCAAAT TTTGCAAAT GCAAATCCA CAAATCCAA GCAAATTCG CAAATCCAA GCAAATCCA GAAATCCAC GGAAATCCA GGAAATCCT TGCAAATCC TGCAAATTC GCCACCGT ACCACCGT ACCACGGT GCCACGGC …

Update Motifs With Remaining Seqs
[Figure labels: Seed1 m-matches; Extreme High Rank; All ChIP-selected targets]

Refine the Motifs
[Figure labels: Seed1 m-matches; Extreme High Rank; All ChIP-selected targets]

Further Refine Motifs
• Could also be
used to examine known motif enrichment
• Is motif enrichment correlated with ChIP-seq enrichment?
• Is the motif more enriched in peak summits than in peak flanks?
• Motif analysis could identify transcription factor partners of ChIP-seq factors

Estrogen Receptor
• Overactive in > 70% of breast cancers
• Where does it go in the genome?
• ChIP-chip on chr21/22, motif and expression analysis found its “pioneering factor” FoxA1
[Figure labels: ER; TF??]
Carroll et al, Cell 2005

Estrogen Receptor (ER) Cistrome in Breast Cancer
• ER may function far away (100-200 kb) from genes
• Only 20% of ER sites have PhastCons > 0.2
• ER has different effects depending on its collaborators
[Figure labels: NRIP; ER; AP1]
Carroll et al, Nat Genet 2006

Cell Type-Specific Binding
• The same TF binds to very different locations in different tissues and conditions. Why?
• TF concentration?
• Collaborating factors, especially pioneering factors
• Interesting observations about pioneering factors

Summary
• ChIP-seq identifies genome-wide in vivo protein-DNA interaction sites
• ChIP-seq peak calling: shift reads, then calculate correct enrichment and FDR
• Functional analysis of ChIP-seq data:
  – Strong vs weak binding, conserved vs non-conserved
  – Target identification
  – Motif analysis
• Cell type-specific binding, epigenetics
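
Worked sketch: dynamic local background for peak calls
The Peak Calls slide models tag counts as Poisson and takes the local rate as the maximum of the genome-wide background and several control windows. The Python sketch below illustrates that idea only; it is not the MACS implementation. The function names, the example tag counts, the 2.7 Gb genome size, and the assumption that control counts are already scaled to the ChIP sequencing depth are all hypothetical choices made for illustration.

```python
import math


def background_lambda(total_tags, genome_size, window):
    """Expected tags in a `window`-bp region under a uniform background."""
    return total_tags * window / genome_size


def local_lambda(lambda_bg, ctrl_counts, peak_width):
    """Dynamic local rate: max of the background and each control window.

    ctrl_counts maps a control window size in bp (e.g. 1000, 5000, 10000)
    to the control tag count in that window around the candidate peak.
    Each count is rescaled to the peak width before taking the maximum.
    Assumption: control counts are already depth-normalised to the ChIP.
    """
    scaled = [count * peak_width / size for size, count in ctrl_counts.items()]
    return max([lambda_bg] + scaled)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


# Hypothetical numbers, for illustration only.
peak_width = 300
lam_bg = background_lambda(10_000_000, 2_700_000_000, peak_width)
lam = local_lambda(lam_bg, {1000: 10, 5000: 30, 10000: 40}, peak_width)
p_value = poisson_sf(20, lam)  # 20 ChIP tags observed in the candidate peak
print(f"lambda_BG={lam_bg:.3f} lambda_local={lam:.3f} p={p_value:.3g}")
```

Taking the maximum makes the test conservative: a region with elevated control signal at any scale needs more ChIP tags to reach the same p-value. The 300 bp control window is left out of the example because, as the slide notes, windows that small hold too few control tags.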