Using ChIPMunk for motif discovery
-quick-start guide-
• Preparing data
• Running ChIPMunk
• Do I need ChIPHorde?
"A short guide to breeding and
taming highly intelligent ChIPMunks"
Some basic questions
• Can I use ChIPMunk for the WHOLE PEAK
SEGMENTS from a ChIP-Seq experiment?
– YES! But you will need to supply the “base coverage
profile” (also called the “peak shape”).
• Should I cut short segments around ChIP-Seq
peak summits for ChIPMunk?
– NO! Use the whole peaks with the base coverage data
when possible.
• Want more details? Move to the next slides!
Prerequisites
• To use the ChIPMunk motif discovery tool you need:
– Java runtime environment (JRE, also called the Java
Virtual Machine), version 1.5 or later
• You may already have Java; test it by typing
java -version
• Linux users: check your distro-specific package manager.
ChIPMunk will run under Oracle Java as well as under
OpenJDK.
• Windows users: go directly to java.com
! [NOTE] You do not need JDK (Java Development
Kit), only JRE/JVM.
Extracting ChIPMunk
• Let’s assume you have successfully downloaded
chipmunk_v?_binary.zip from the official ChIPMunk
website (see the downloads section):
http://autosome.ru/ChIPMunk
– Unpack it to any suitable folder. You should now see the
autosome directory. This is the ChIPMunk Java package
autosome.ru.
– Now you can run ChIPMunk from the folder that
contains the autosome package. For simplicity you may
wish to store the sequence files one level above the
ChIPMunk autosome folder.
! [NOTE] Do not try to move anything out of the
autosome folder. Your ChIPMunk should live
there.
Preparing your data: overview
• No prior information: simple multi-fasta, Simple
data set
• Some arbitrary weights or quality values assigned
for each sequence: multi-fasta with weights in
headers, Weighted data set
• Prior positional profile along each sequence:
multi-fasta with profiles in headers, Peak data set
• Peak and Weighted data sets can be useful not
only for ChIP-Seq data but for any kind of data set
where you have some quality rating or known
positional preferences.
Preparing your data: Simple data set
• The simplest case: you already have a number of
sequences to be used for motif discovery with
ChIPMunk. No additional information is
available.
– Arrange a plain multi-fasta file like:
> header1
ACTGTGTGAAA
> header2
AGTGTGTGTGTG
! [NOTE] You can omit fasta headers since ChIPMunk
simply skips them. Remember: this is the Simple
data set.
Preparing your data: Weighted data set
• Let’s assume you have some prior information,
such as a quality rating or a prior measure of the
presence/strength of binding sites.
– Arrange a simple multi-fasta file that
specifies the quality of each sequence in its
fasta header:
> 10.0
ACGGTGTAAAAA
> 2.0
GGTAGTGTCGTAGTG
! [NOTE] Your weights (quality values) should always
be positive. Never use negative or zero-quality.
Remember, this is Weighted data set.
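A Weighted multi-fasta like the one above can be generated with a few lines of Python. This is only a minimal sketch: the function name format_weighted_fasta and the (weight, sequence) input pairs are assumptions made for illustration and are not part of ChIPMunk.

```python
def format_weighted_fasta(records):
    """Return Weighted multi-fasta text from (weight, sequence) pairs.

    Hypothetical helper, not part of ChIPMunk. Each header carries one
    weight, in the layout shown on the slide.
    """
    lines = []
    for weight, sequence in records:
        # The slide requires strictly positive weights.
        if weight <= 0:
            raise ValueError(f"weight must be positive, got {weight}")
        lines.append(f"> {float(weight)}")
        lines.append(sequence)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [(10.0, "ACGGTGTAAAAA"), (2.0, "GGTAGTGTCGTAGTG")]
    print(format_weighted_fasta(example), end="")
```

Rejecting zero and negative weights early keeps an invalid file from ever reaching ChIPMunk.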
Preparing your data: Peak data set
• If you have any prior profile information, such as the
shape of ChIP-Seq peaks, then you can provide a
profile in the fasta header like:
> 1.0 2.0 3.0 2.0 1.5 2.0
AGTAAC
> 1.0 2.0 3.0 2.0 1.5
CAGTA
! [NOTE] The length of each profile should be equal
to the length of the corresponding sequence.
Remember, this is Peak (or Profiled) data set.
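The length rule in the note is easy to check before running ChIPMunk. The sketch below parses a Peak multi-fasta and fails on the first mismatch. It assumes one header line per record with whitespace-separated numbers; parse_peak_fasta is a hypothetical helper name, not a ChIPMunk function.

```python
def parse_peak_fasta(text):
    """Return (profile, sequence) pairs from Peak multi-fasta text.

    Hypothetical helper. Raises ValueError if a profile and its
    sequence differ in length.
    """
    records = []
    # Everything before the first ">" is ignored; each later chunk is one record.
    for block in text.split(">")[1:]:
        header, _, body = block.partition("\n")
        profile = [float(value) for value in header.split()]
        # Join wrapped sequence lines and drop any whitespace.
        sequence = "".join(body.split())
        if len(profile) != len(sequence):
            raise ValueError(
                f"profile has {len(profile)} values, "
                f"sequence has {len(sequence)} bases"
            )
        records.append((profile, sequence))
    return records


if __name__ == "__main__":
    sample = "> 1.0 2.0 3.0 2.0 1.5 2.0\nAGTAAC\n> 1.0 2.0 3.0 2.0 1.5\nCAGTA\n"
    for profile, sequence in parse_peak_fasta(sample):
        print(sequence, profile)
```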
ChIP-Seq data: what to do
1. The best usage case: ChIP-Seq data with base
coverage (often provided in wiggle-files, .wig).
Extract peak heights for each position of each
sequence and generate the Peak multi-fasta.
2. Only the peak height h and the peak summit position are
known. Manually generate triangle-shaped profiles with
0.0 height at both ends of the sequence and height h at
the peak summit position.
3. Only the peak height h is known. Then use the
Weighted data set, specifying h as the weight/quality.
! [NOTE] When available always use base coverage
information or generate triangle profiles. This is
extremely important for ChIPMunk performance.
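For case 2, a triangle profile can be sketched as follows. The slide only asks for a triangle shape; the linear interpolation, the 0-based summit index and the name triangle_profile are assumptions made for this illustration.

```python
def triangle_profile(length, summit, height):
    """Return a per-base triangle profile as a list of floats.

    Hypothetical helper. `summit` is a 0-based position inside the
    sequence; values rise linearly from 0.0 at the first base to
    `height` at the summit and fall linearly to 0.0 at the last base.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0 <= summit < length:
        raise ValueError("summit must lie inside the sequence")
    profile = []
    for i in range(length):
        if i == summit:
            value = float(height)
        elif i < summit:
            # Rising edge: summit > 0 here, so the division is safe.
            value = height * i / summit
        else:
            # Falling edge: summit < length - 1 here, so the division is safe.
            value = height * (length - 1 - i) / (length - 1 - summit)
        profile.append(value)
    return profile


if __name__ == "__main__":
    print(triangle_profile(5, 2, 4.0))  # [0.0, 2.0, 4.0, 2.0, 0.0]
```

If the summit sits on the first or last base, that end keeps height h instead of 0.0, since the triangle has only one edge there.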
Running ChIPMunk: specifying data set
• So, now you know the type of your sequence.mfa
data set. It is either Simple (s:sequence.mfa),
Weighted (w:sequence.mfa) or Peak
(p:sequence.mfa).
• Remember to supply it to ChIPMunk like
p:sequence.mfa if your file is placed in your
current directory. You can specify the local path to
your file after p: if your file is located somewhere
else on your drive.
! [NOTE] We strongly advise using the Peak data set if
possible.
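The prefix rule can be captured in a tiny helper when ChIPMunk runs are scripted. A minimal sketch: the names PREFIXES and dataset_argument are hypothetical; only the s:, w: and p: prefixes come from the slide.

```python
# Prefixes taken from the slide; the dictionary keys are illustrative names.
PREFIXES = {"simple": "s", "weighted": "w", "peak": "p"}


def dataset_argument(kind, path):
    """Return the data set argument, e.g. 'p:sequence.mfa'."""
    try:
        prefix = PREFIXES[kind]
    except KeyError:
        raise ValueError(f"unknown data set type: {kind!r}") from None
    return f"{prefix}:{path}"


if __name__ == "__main__":
    print(dataset_argument("peak", "sequence.mfa"))  # p:sequence.mfa
```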
Running ChIPMunk: default mode
java -Xms512M -Xmx1G autosome.ru.ChIPMunk p:your_sequences_with_profiles.mfa > output.log
This will write all informative output to output.log
and allow Java to use from 512 MB to 1 GB of RAM.
! [NOTE] This is the best way to search for an
unknown motif; it lets ChIPMunk automatically
use the default parameter settings.
Running ChIPMunk: tweaking parameters
• The most obvious thing to tweak is the motif length
range (from 7 to 22 bp, for example):
java -Xms512M -Xmx1G autosome.ru.ChIPMunk 7 22 yes 1.0 w:your_weighted_set.mfa
• The number of starting seeds; increasing it from the
default 100 will improve precision:
java -Xms512M -Xmx1G autosome.ru.ChIPMunk 7 22 yes 1.0 w:your_weighted_set.mfa 200
• Allow ChIPMunk to automatically estimate the
background model instead of using the predefined GC content of 0.5:
java autosome.ru.ChIPMunk 7 22 yes 1.0 p:peak_data.fasta 200 20 1 2 random local
! [NOTE] Don’t hesitate to consult the ChIPMunk manual
or to contact ivan-dot-kulakovskiy-at-gmail-dot-com.
There are many useful advanced options for ChIPMunk.
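When ChIPMunk is launched from a script, the command lines above can be assembled as an argument list. The sketch below only reproduces the first two commands from this slide. The function name and its parameters are hypothetical, and the tokens "yes" and "1.0" are copied verbatim from the slide without interpreting them; see the ChIPMunk manual for their meaning.

```python
import shlex


def chipmunk_command(min_length, max_length, dataset, seeds=None,
                     min_heap="512M", max_heap="1G"):
    """Return the ChIPMunk command line as a list of arguments.

    Hypothetical helper mirroring the slide's examples; it does not
    run anything.
    """
    command = [
        "java", f"-Xms{min_heap}", f"-Xmx{max_heap}",
        "autosome.ru.ChIPMunk",
        str(min_length), str(max_length),
        "yes", "1.0",  # copied verbatim from the slide
        dataset,
    ]
    if seeds is not None:
        # Number of starting seeds; the slide gives 100 as the default.
        command.append(str(seeds))
    return command


if __name__ == "__main__":
    args = chipmunk_command(7, 22, "w:your_weighted_set.mfa", seeds=200)
    print(shlex.join(args))
```

The returned list can be passed to subprocess.run, run from the folder that contains the autosome package.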
ChIPHorde extension: do I need it?
• You want to find the most significant motif in the set
(for example find a common motif for a given
transcription factor, TF)
– ChIPHorde? NO, ChIPMunk is enough.
• You want to check different motif lengths (like 10, 12
and 15 bp) and manually select the best motif.
– ChIPHorde? NO, run ChIPMunk several times with 10 to
10, 12 to 12 and 15 to 15 motif length ranges.
– OR YES, you can run ChIPHorde in its ‘dummy’ mode like:
java autosome.ru.ChIPHorde 10:10,12:12,15:15 dummy yes 1.0 w:your_weighted_sequence_set.mfa
! [NOTE] So, if you want to find the MOST SIGNIFICANT motif
for a dataset then you DO NOT NEED ChIPHorde extension.
But you can use it in dummy mode to check different lengths
and then manually select required motifs.
You need ChIPHorde if
• You suspect several distinct motifs for your TF. Use
‘filter’ mode (dropping sequences with motif hits
from the previous step):
java autosome.ru.ChIPHorde 7:21,7:21,7:21 filter yes 0.0 w:your_weighted_sequence_set.mfa
• You want to find potential cofactor TFs. Use ‘mask’
mode (masking good motif hits from the previous
step):
java autosome.ru.ChIPHorde 7:21,7:21,7:21 mask yes 0.0 w:your_weighted_sequence_set.mfa
In both examples the length range from 7 to 21 bp is used
to search for three different motifs.
! [NOTE] ZOOPS factor (0.0 in this example) may
heavily affect results. Please consult the manual!
The length ranges are also important, especially in
‘mask’ mode.
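The ChIPHorde command lines from the last two slides follow one pattern: a comma-separated list of length ranges, a mode, and the ZOOPS factor. A minimal sketch of assembling them, where chiphorde_command and its parameters are hypothetical names and the "yes" token is copied verbatim from the slides:

```python
# The three modes named on the slides.
MODES = ("dummy", "filter", "mask")


def chiphorde_command(length_ranges, mode, zoops, dataset):
    """Return the ChIPHorde command line as a list of arguments.

    Hypothetical helper. `length_ranges` is a list of (min, max)
    pairs, one per motif to search for.
    """
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    ranges = ",".join(f"{low}:{high}" for low, high in length_ranges)
    return [
        "java", "autosome.ru.ChIPHorde",
        ranges, mode,
        "yes",  # copied verbatim from the slides
        str(zoops), dataset,
    ]


if __name__ == "__main__":
    args = chiphorde_command([(7, 21)] * 3, "mask", 0.0,
                             "w:your_weighted_sequence_set.mfa")
    print(" ".join(args))
```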