Homework - The Fenyo Lab
ChIP-seq Tutorial

The ChIP-seq tutorial typically uses the HPC cluster to process large data files. This requires a user account and a basic level of skill with Unix. The usual ChIP-seq informatics workflow is as follows:

1) Assess sequencing data quality (Illumina run-time stats, FastQC) for the ChIP and control samples.
2) Align the reads in the FASTQ files to the reference genome with BWA (or Bowtie).
3) Convert the SAM output to sorted, indexed BAM files with SAMtools.
4) Analyze the ChIP and control samples with MACS.
5) Visualize the BAM files, together with the peaks, on the reference genome using IGV.
6) Annotate the peaks (many different methods are available).
7) Compare/integrate the peaks with other genome annotations in the UCSC Genome Browser.

For homework this week, we will only do steps 5-7. An indexed BAM file and the MACS output (ChIP vs. control) are available in the ChIP-seq homework zip file. These data come from a human ChIP-seq experiment aligned to hg19 with a no-IP input DNA control. I will let you guess which antibody was used for the IP.

A) Install IGV on your computer: http://www.broadinstitute.org/software/igv/download (you will need to register on the website, and you may need to upgrade to 64-bit Java on your computer).

B) Examine the tutorial_peaks.bed and tutorial_peaks.xls files.
a. How many peaks did MACS call in this data set?
b. Most peaks are on chromosome 1. Why do you think a few are on other chromosomes?

C) Annotate the peaks to the nearest gene Transcription Start Site (TSS). Add this information to the tutorial_peaks.xls file and add your name to the name of this file. You can choose your own annotation method, but there are clearly too many peaks to do this by hand. Suggestions:
a. Galaxy <https://usegalaxy.org/>: Upload the peaks.bed file. Download human hg19 genes plus 1 kb of upstream sequence, then find overlaps with the peaks.bed file (Operate on Genomic Intervals > Intersect).
b. Cistrome server <http://cistrome.org/ap/root>: Integrative Analysis > Beta-minus
c. BEDTools <http://bedtools.readthedocs.org/en/latest/content/tools/closest.html>: closestBed
d. R/Bioconductor ChIPpeakAnno: http://www.bioconductor.org/packages/release/bioc/html/ChIPpeakAnno.html
[The problem with all of these methods is that they are imprecise. Some peaks are close to two genes, some are not near any gene, and some are in the middle of one gene and also close to the TSS of another. Associations of a peak with a gene must be VALIDATED in some way.]

D) Load the BED file and the BAM file into IGV, zoom in, and take a screenshot of the reads around MACS_peak_10, 11, and 12. Then look at MACS_peak_13 and scroll left and right a few kb at a time. Why are there aligned reads all over the genome? Why do you think no peak was called near chr1:1,033,817? What changes to the MACS parameters would you recommend for reanalyzing this data set?

E) Load the peaks.bed file into the UCSC Genome Browser as a custom track: <http://genome.ucsc.edu/cgi-bin/hgCustom?clade=mammal&org=Human&db=hg19>. Take a screenshot of the peaks near the gene NOC2L with the ENCODE Transcription Factor tracks turned on (and most of the other tracks hidden). Which ENCODE transcription factors have peaks near MACS_peak_12?
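For part B, both the total number of peaks and the per-chromosome breakdown can be read directly from tutorial_peaks.bed. That file holds one peak per line, with the chromosome in the first column. The Python sketch below is one way to tally them. It assumes a plain BED file in which any header lines start with "track", "browser", or "#". The names count_peaks_by_chrom and print_summary are illustrative and do not belong to any tool used in this tutorial.

```python
from collections import Counter
from typing import Iterable


def count_peaks_by_chrom(bed_lines: Iterable[str]) -> Counter:
    """Tally BED records per chromosome (first column)."""
    counts: Counter = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC "track"/"browser" header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        # BED columns are tab-separated; split() also tolerates spaces.
        chrom = line.split()[0]
        counts[chrom] += 1
    return counts


def print_summary(path: str) -> None:
    """Print the total peak count and the per-chromosome counts for a BED file."""
    with open(path) as fh:
        counts = count_peaks_by_chrom(fh)
    print("total peaks:", sum(counts.values()))
    for chrom, n in counts.most_common():
        print(f"{chrom}\t{n}")
```

For example, print_summary("tutorial_peaks.bed") prints the total first, followed by the chromosomes ordered from most to fewest peaks.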
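Suggestion a in part C intersects the peaks with hg19 genes extended by 1 kb upstream. Which side counts as "upstream" depends on the gene's strand: for a plus-strand gene it lies before txStart, and for a minus-strand gene it lies after txEnd. The Python sketch below illustrates that logic. It is not Galaxy's implementation. The Gene record, the helper names, and the use of 0-based, half-open coordinates (the BED/UCSC convention) are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    chrom: str
    tx_start: int  # 0-based, half-open, as in BED and UCSC gene tables
    tx_end: int
    strand: str    # "+" or "-"
    name: str


def extended_interval(gene: Gene, upstream: int = 1000) -> Tuple[str, int, int]:
    """Return the gene body extended by `upstream` bp on its 5' side."""
    if gene.strand == "+":
        # Plus strand: upstream is to the left of txStart (clipped at 0).
        return gene.chrom, max(0, gene.tx_start - upstream), gene.tx_end
    # Minus strand: upstream is to the right of txEnd.
    return gene.chrom, gene.tx_start, gene.tx_end + upstream


def genes_hit_by_peak(peak: Tuple[str, int, int], genes: List[Gene],
                      upstream: int = 1000) -> List[str]:
    """Names of genes whose body plus upstream window overlaps the peak."""
    chrom, start, end = peak
    hits = []
    for gene in genes:
        g_chrom, g_start, g_end = extended_interval(gene, upstream)
        # Half-open intervals overlap when each one starts before the other ends.
        if g_chrom == chrom and start < g_end and g_start < end:
            hits.append(gene.name)
    return hits
```

A peak can hit zero, one, or several genes this way, which is exactly the ambiguity the bracketed note in part C warns about. A linear scan over every gene is simple but slow for a genome-wide gene table. Sorting the genes by chromosome and position would scale better.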
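Suggestions c (closestBed) and d (ChIPpeakAnno) in part C instead report the nearest TSS for each peak. Below is a minimal Python sketch of that idea. It assumes a hypothetical list of TSS records, for example exported from a UCSC hg19 gene table, where the TSS is txStart for plus-strand genes and txEnd for minus-strand genes. The sketch measures distance from the peak midpoint. MACS also reports a summit position for each peak, which may be a better anchor. This is not the algorithm used by BEDTools or ChIPpeakAnno, and the Peak/Tss records and function names are illustrative. The nearest TSS it finds is still only a candidate association that needs validation.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open, as in BED
    end: int
    name: str


class Tss(NamedTuple):
    chrom: str
    pos: int     # txStart for "+" genes, txEnd for "-" genes (UCSC tables)
    strand: str  # "+" or "-"
    gene: str


# chrom -> (sorted TSS positions, TSS records in the same order)
TssIndex = Dict[str, Tuple[List[int], List[Tss]]]


def build_index(tss_list: List[Tss]) -> TssIndex:
    """Group TSS records by chromosome and sort them for binary search."""
    by_chrom: Dict[str, List[Tss]] = defaultdict(list)
    for tss in tss_list:
        by_chrom[tss.chrom].append(tss)
    index: TssIndex = {}
    for chrom, items in by_chrom.items():
        items.sort(key=lambda t: t.pos)
        index[chrom] = ([t.pos for t in items], items)
    return index


def nearest_tss(peak: Peak, index: TssIndex) -> Optional[Tuple[Tss, int]]:
    """Closest TSS to the peak midpoint, with a strand-aware signed distance.

    Negative distance: the peak lies upstream of the TSS; positive: downstream.
    Returns None when the peak's chromosome has no TSS records.
    """
    if peak.chrom not in index:
        return None
    positions, items = index[peak.chrom]
    mid = (peak.start + peak.end) // 2
    i = bisect.bisect_left(positions, mid)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(items)]
    best = min(candidates, key=lambda j: abs(positions[j] - mid))
    tss = items[best]
    offset = mid - tss.pos
    signed = offset if tss.strand == "+" else -offset
    return tss, signed
```

The sign convention follows the gene's strand. A peak 50 bp before a plus-strand TSS, or 50 bp after a minus-strand TSS, both come back as -50 (upstream).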
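For part E, the UCSC Genome Browser can display a BED file as a custom track. Adding a track line at the top lets you give the peaks a recognizable name and choose a display mode. The Python sketch below writes such a header using the track-line attributes name, description, visibility, and color. The function name add_track_line and its default values (pack visibility, blue) are illustrative choices, not requirements.

```python
from typing import Iterable, List

# Named display modes for the track-line visibility attribute.
# UCSC also accepts the numbers 0-4; this sketch only allows the names.
VISIBILITY_MODES = {"hide", "dense", "squish", "pack", "full"}


def add_track_line(bed_lines: Iterable[str], name: str, description: str,
                   visibility: str = "pack", color: str = "0,0,255") -> List[str]:
    """Return BED records preceded by a single UCSC custom-track line.

    Existing "track"/"browser" lines and blank lines are dropped so the
    output has exactly one header.
    """
    if visibility not in VISIBILITY_MODES:
        raise ValueError(f"visibility must be one of {sorted(VISIBILITY_MODES)}")
    header = (f'track name="{name}" description="{description}" '
              f"visibility={visibility} color={color}")
    records = [line.rstrip("\n") for line in bed_lines
               if line.strip() and not line.startswith(("track", "browser"))]
    return [header] + records
```

Write the returned lines to a new file, one per line, and upload it on the hgCustom page. The peaks then appear under the chosen name next to the ENCODE tracks.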