Download Homework - The Fenyo Lab

ChIP-seq Tutorial
This ChIP-seq tutorial typically uses the HPC cluster to process large data files. It requires a user
account and a basic level of skill with Unix.
The usual ChIP-seq informatics workflow is as follows:
1) Assess sequencing data quality (Illumina run-time stats, FastQC) for ChIP and control samples
2) Align reads in FASTQ files to the reference genome with BWA (or Bowtie)
3) Convert BAM output to sorted, indexed BAM files with SAMTools
4) Analyze ChIP and control sample with MACS
5) Visualize BAM files with peaks on reference genome using IGV
6) Annotate peaks using many different methods
7) Compare/integrate peaks with other genome annotations in UCSC Genome Browser
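Steps 1-4 of the workflow can be sketched as a short list of shell commands. The Python helper below only builds and prints the command strings; it does not run anything. It is an illustration under several assumptions that are not stated in this tutorial: single-end reads, a reference FASTA already indexed with `bwa index`, and MACS version 2 (`macs2 callpeak`), since the tutorial says only "MACS". The function name and all file names are hypothetical.

```python
import shlex


def build_pipeline(ref, chip_fq, ctrl_fq, name="tutorial"):
    """Return shell command strings for QC, alignment, sorting/indexing, and peak calling.

    Assumes single-end FASTQ files and a reference already indexed with `bwa index`.
    """
    q = shlex.quote
    cmds = []

    # Step 1: quality reports for the ChIP and control reads.
    for fq in (chip_fq, ctrl_fq):
        cmds.append(f"fastqc {q(fq)}")

    # Steps 2-3: align with BWA, then sort and index the alignments with samtools.
    bams = []
    for label, fq in (("chip", chip_fq), ("control", ctrl_fq)):
        bam = f"{name}_{label}.sorted.bam"
        cmds.append(f"bwa mem {q(ref)} {q(fq)} | samtools sort -o {q(bam)} -")
        cmds.append(f"samtools index {q(bam)}")
        bams.append(bam)

    # Step 4: call peaks, ChIP vs. control (MACS2 is an assumption, see above).
    cmds.append(
        f"macs2 callpeak -t {q(bams[0])} -c {q(bams[1])} -f BAM -g hs -n {q(name)}"
    )
    return cmds


# Hypothetical file names, for illustration only.
commands = build_pipeline("hg19.fa", "chip.fastq", "input.fastq")
for cmd in commands:
    print(cmd)
```

Printing the commands first makes it easy to review them before submitting anything to the cluster.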
For homework this week, we will only do steps 5-7. An indexed BAM file and the MACS output (ChIP vs.
control) are available in the ChIP-seq homework zip file. These data come from a human ChIP-seq experiment
aligned to hg19, with a no-IP input DNA control. I will let you guess which antibody was used for the IP.
A) Install IGV on your computer:
http://www.broadinstitute.org/software/igv/download
(you will need to register on the website, and you may need to upgrade to 64-bit Java on your
computer)
B) Examine the tutorial_peaks.bed and tutorial_peaks.xls files
a. How many peaks did MACS call in this data set?
b. Most are on chromosome 1. Why do you think a few are on other chromosomes?
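One way to answer both questions without opening the file in a spreadsheet is to count the BED lines and tally them by chromosome. The sketch below assumes a standard whitespace-separated BED file with the chromosome in the first column. The example records are made up and are not taken from the homework data.

```python
from collections import Counter


def count_peaks(bed_lines):
    """Return (total peak count, Counter of peaks per chromosome) from BED lines."""
    per_chrom = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC "track"/"browser" header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split()[0]
        per_chrom[chrom] += 1
    return sum(per_chrom.values()), per_chrom


# Made-up records, for illustration only.
example = [
    "track name=example",
    "chr1\t100\t200\tMACS_peak_1\t50.0",
    "chr1\t500\t650\tMACS_peak_2\t61.2",
    "chr7\t900\t990\tMACS_peak_3\t55.5",
]
total, per_chrom = count_peaks(example)
print(total)
for chrom, n in per_chrom.most_common():
    print(chrom, n)
```

For the homework file, the same function can be called on an open file handle, for example `with open("tutorial_peaks.bed") as fh: total, per_chrom = count_peaks(fh)`.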
C) Annotate the peaks to the nearest gene Transcription Start Site (TSS). Add this information to
the tutorial_peaks.xls file and add your name to the name of this file.
You can choose your own annotation method, but there are clearly too many peaks to do this by
hand. Suggestions:
a. Galaxy <https://usegalaxy.org/>: Upload the peaks.bed file. Download human hg19
genes plus 1 kb of upstream sequence, then find overlaps with the peaks.bed file (Operate on
Genomic Intervals > Intersect)
b. Cistrome server <http://cistrome.org/ap/root>: Integrative Analysis > Beta-minus
c. BEDTools <http://bedtools.readthedocs.org/en/latest/content/tools/closest.html>:
closestBed
d. R/BioConductor ChIPpeakAnno:
http://www.bioconductor.org/packages/release/bioc/html/ChIPpeakAnno.html
[The problem with all of these methods is that they are imprecise – some peaks are close to
two genes, some are not near any gene, some are in the middle of one and also close to the
TSS of another, etc. Associations of a peak with a gene must be VALIDATED in some way.]
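To make the idea of "nearest TSS" concrete, here is a minimal pure-Python sketch. It is not how Galaxy, Cistrome, BEDTools, or ChIPpeakAnno implement the annotation. It assumes each peak is represented by its midpoint (MACS also reports a summit, which may be a better choice), ignores gene strand, and reports only the single closest TSS, so the caveat above still applies in full. All gene names and coordinates are made up.

```python
import bisect


def nearest_tss(peaks, tss_sites):
    """Annotate each peak with the closest TSS on the same chromosome.

    peaks:     iterable of (chrom, start, end, peak_name)
    tss_sites: iterable of (chrom, position, gene_name)
    Returns a list of (peak_name, gene_name, distance_in_bp); gene and distance
    are None when the peak's chromosome has no TSS in tss_sites.
    """
    # Group TSS positions by chromosome and sort them for binary search.
    by_chrom = {}
    for chrom, pos, gene in tss_sites:
        by_chrom.setdefault(chrom, []).append((pos, gene))
    for sites in by_chrom.values():
        sites.sort()

    results = []
    for chrom, start, end, name in peaks:
        sites = by_chrom.get(chrom, [])
        if not sites:
            results.append((name, None, None))
            continue
        mid = (start + end) // 2  # assumption: the peak midpoint represents the peak
        i = bisect.bisect_left(sites, (mid,))
        # Only the TSS just before and just after the midpoint can be the closest.
        candidates = sites[max(i - 1, 0):i + 1]
        pos, gene = min(candidates, key=lambda s: abs(s[0] - mid))
        results.append((name, gene, abs(pos - mid)))
    return results


# Made-up TSS positions and peaks, for illustration only.
tss = [("chr1", 1000, "GENE_A"), ("chr1", 5000, "GENE_B"), ("chr2", 300, "GENE_C")]
peaks = [
    ("chr1", 1100, 1300, "peak_1"),
    ("chr1", 4000, 4400, "peak_2"),
    ("chr3", 10, 20, "peak_3"),
]
for row in nearest_tss(peaks, tss):
    print(row)
```

A peak equidistant from two TSSs is assigned to the upstream one here, which is an arbitrary choice and one more reason to validate any peak-to-gene association.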
D) Load the BED file and the BAM file into IGV, zoom in, and take a screenshot of the reads around
MACS_peak_10, 11, and 12. Then look at MACS_peak_13. Scroll left and right a few kb at
a time. Why are there aligned reads all over the genome? Why do you think there is no peak
called near chr1:1,033,817? What changes to the MACS parameters do you recommend for
reanalyzing this data set?
E) Load the peaks.bed file into UCSC Genome Browser as a custom track.
<http://genome.ucsc.edu/cgi-bin/hgCustom?clade=mammal&org=Human&db=hg19>
Take a screenshot of the peaks near gene NOC2L with the ENCODE Transcription Factor tracks
turned on (and most of the other tracks hidden). Which ENCODE transcription factors have peaks near
MACS_peak_12?
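A custom track is easier to recognize in the browser if the BED file starts with a `track` line giving it a name and description. This header is optional, and the sketch below is only one way to add it. The function name, track name, and description are hypothetical, and the example records are made up.

```python
def add_track_line(bed_lines, name, description):
    """Return BED lines with a UCSC custom-track header line prepended."""
    header = f'track name="{name}" description="{description}" visibility=pack'
    # Drop blank lines and any existing track line so the header is not duplicated.
    body = [
        ln.rstrip("\n")
        for ln in bed_lines
        if ln.strip() and not ln.startswith("track")
    ]
    return [header] + body


# Made-up records, for illustration only.
example_bed = [
    "chr1\t100\t200\tMACS_peak_1\t50.0\n",
    "chr1\t500\t650\tMACS_peak_2\t61.2\n",
]
track = add_track_line(example_bed, "ChIP_peaks", "MACS peaks, ChIP vs control")
print("\n".join(track))
```

The resulting text can be pasted into the custom track form or saved to a file and uploaded.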