Genome-Wide Mapping of in Vivo Protein-DNA Interactions
Johnson et al. (Science 2007)
Presented by Leo J. Lee
Mar. 19, 2008, CSC 2417

Outline
• Background on ChIP-based methods to study protein-DNA interactions
• Salient features of ChIPSeq
• Overview of the experimental protocol
• Data analysis pipeline used in the paper
• Important biological findings/contributions
• General discussions

Protein-DNA interaction
• DNA is the information carrier of almost all living organisms.
• Protein is the major building block of life.
• Interactions between DNA and protein play vital roles in the development and normal function of living organisms, and can lead to disease when something goes wrong.
• An important mechanism of protein-DNA interaction is direct binding, i.e., a protein binds to a particular fragment of the DNA.

Chromatin Immunoprecipitation (ChIP)
• ChIP is a method to investigate protein-DNA interaction in vivo.
• The output of ChIP is enriched fragments of DNA that were bound by a particular protein.
• The identity of the DNA fragments needs to be determined by a second method.

ChIP-chip (or ChIP-on-chip)
• ChIP-chip uses microarray technology to determine the identity of DNA fragments produced by ChIP.
• Typically a control sample (genomic DNA that has not gone through ChIP) is used to properly define the relative enrichment of specific sequences in the ChIP DNA.
• It was the dominant high-throughput technique before the arrival of ChIPSeq.

ChIPSeq Workflow
ChIP → Size Selection (200-700bp for Exp 1; 150-300bp for Exp 2) → Solexa Sequencing → Mapping onto Genome

ChIPSeq vs. ChIP-chip
• The experimental design of ChIPSeq is considerably simpler.
• ChIPSeq can typically achieve higher genomic coverage than ChIP-chip (this also depends on read length vs. probe length).
• The data from ChIPSeq is arguably cleaner and easier to process.
• Costs are comparable (?).

Nice things about NRSF (REST)
• Considerable knowledge on NRSF has been accumulated from previous studies, which provides a set of true positives and negatives.
• Yet there is still room to make new discoveries, as illustrated in the paper.
• The DNA motif bound by NRSF (called NRSE) is long and well-specified.
• There is a high-quality antibody that recognizes NRSF efficiently.

ChIPSeq Workflow
ChIP → Size Selection (200-700bp for Exp 1; 150-300bp for Exp 2) → Solexa Sequencing → Mapping onto Genome

Sequence Mapping & Filtering
• Only sequence reads mapped to a unique position on the human genome are kept (about 50%).
• Two mismatches were allowed to accommodate polymorphism (and sequencing error).
• The resulting sequence read distributions are processed by a peak locator algorithm to find each local concentration of sequence hits and its peak.
• A minimum five-fold enrichment over the control sample is required.

ChIPSeq Peak Locator Algorithm
• Merge enriched regions within 500bp of one another.
• Apply a triangular 5-point smoothing and identify the peak as the coordinate with the greatest number of overlapping reads.

Selecting a read count threshold
• A ROC curve was obtained by analyzing true positives and negatives.
• A sequence read threshold of 13 was selected to reach 98% specificity and 87% sensitivity.

Precision of ChIPSeq
• Evaluated against the center of high-scoring canonical NRSE motifs.
• 94% of these strong motifs fall within 50bp of the called experimental peak.

Comprehensiveness of ChIPSeq
• Virtually all strong canonical NRSE motif instances are detectably occupied.
• Most of the sites previously studied by transfection analysis are also detected.

Motif Visualization
(figure slide; image not included in the transcript)
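
Returning to the peak locator slide: its steps (merge enriched regions within 500bp, smooth, take the coordinate with the most overlapping reads) can be sketched in Python as below. This is a minimal illustration, not the authors' implementation. The slides do not give the smoothing weights or the edge handling, so the 1-2-3-2-1 weights, the renormalisation at region edges, the representation of reads as half-open (start, end) intervals, and all function names are assumptions made for the example.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open (start, end) interval in bp; an assumed representation


def merge_regions(regions: List[Region], max_gap: int = 500) -> List[Region]:
    """Merge regions separated by at most max_gap bp (500bp on the slide)."""
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def coverage(reads: List[Region], region: Region) -> List[int]:
    """Count, for every coordinate in region, how many reads overlap it."""
    start, end = region
    counts = [0] * (end - start)
    for r_start, r_end in reads:
        for pos in range(max(r_start, start), min(r_end, end)):
            counts[pos - start] += 1
    return counts


def triangular_smooth(values: List[int]) -> List[float]:
    """5-point triangular smoothing; weights 1,2,3,2,1 are an assumption."""
    weights = (1, 2, 3, 2, 1)
    n = len(values)
    smoothed: List[float] = []
    for i in range(n):
        total = 0.0
        weight_sum = 0
        for k, w in enumerate(weights):
            j = i + k - 2  # window is centred on i
            if 0 <= j < n:  # near the edges, use only the points that exist
                total += w * values[j]
                weight_sum += w
        smoothed.append(total / weight_sum)
    return smoothed


def call_peak(reads: List[Region], region: Region) -> int:
    """Return the coordinate with the highest smoothed read overlap."""
    smoothed = triangular_smooth(coverage(reads, region))
    best = max(range(len(smoothed)), key=smoothed.__getitem__)
    return region[0] + best


# Toy data, invented purely for illustration.
print(merge_regions([(100, 200), (600, 700), (1500, 1600)]))  # -> [(100, 700), (1500, 1600)]
print(call_peak([(10, 35), (20, 45), (30, 55)], (0, 60)))     # -> 32
```

In a full pipeline, the enrichment filter (five-fold over control) and the read count threshold of 13 from the slides would be applied to the merged regions before or after peak calling; where exactly is not specified in the transcript.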

Motif Discovery
• Two new kinds of motifs are discovered:
  – A noncanonical motif with variable spacing between the left and right half sites of the canonical motif
  – Half-site motifs
• The enrichment of both kinds of motifs is highly statistically significant.
• The authors are able to tell a nice evolutionary story about them.

GO enrichment analysis
• As expected, NRSF-bound loci are highly enriched in gene ontology (GO) terms related to neurons and their development.
• A group of genes encoding transcription factors that are critical in driving islet cell development in the pancreas is newly discovered among the bound loci.
• Sequence counts for this group are modest but comfortably above the threshold of 13.
• The authors are able to provide strong arguments for the significance of this discovery.

Discussions
What makes this a Science paper?
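
As a supplementary note to the GO enrichment slide: one common way to score whether a GO term is over-represented among bound genes is a one-sided hypergeometric test. The slides do not say which test the paper used, so the choice of test, the function name, and all numbers below are assumptions for illustration only.

```python
from math import comb


def go_enrichment_p(total_genes: int, term_genes: int,
                    bound_genes: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    total_genes: size of the gene universe
    term_genes:  genes annotated with the GO term
    bound_genes: genes associated with a bound site
    overlap:     bound genes that also carry the GO term
    """
    upper = min(term_genes, bound_genes)
    denom = comb(total_genes, bound_genes)
    tail = sum(
        comb(term_genes, k) * comb(total_genes - term_genes, bound_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


# Hypothetical counts, not taken from the paper.
p = go_enrichment_p(total_genes=20000, term_genes=200, bound_genes=1000, overlap=40)
print(f"p-value for the toy example: {p:.3g}")
```

A small p-value here means the observed overlap would be unlikely if bound genes were drawn at random from the gene universe; in practice a multiple-testing correction across GO terms would also be needed.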