Walk-thru of CAGE exercise
• Also at
http://people.binf.ku.dk/albin/teaching/htbinf/tag_analysis/
• …together with updated slides
• And linked from web page
Interlude: a logistics problem
• The largest cDNA project so far
made 102,000 cDNAs
• If you publish, you need to be
able to ship these to the
people asking for them
• This would take >50 kg of dry
ice! Expensive, and a logistics
nightmare since you need to
keep track of the 102,000
tubes
• How can we transfer DNA?
RNA-seq
• With a high-throughput tag sequencer, we can
also do the brute force approach – fragment
all mRNAs in a cell and sequence the pieces
(or part of the pieces)
• This is commonly referred to as RNA-seq
Compared to SAGE, CAGE
• Sequence the whole mRNA – not just the end
or the start
• Can give connectivity, so that we know which
exons are used, and which isoforms
• Is actually bad at capturing 5’ and 3’ edges,
due to statistical issues (white board demo)
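One possible reading of the edge problem, sketched under a simplifying assumption that is not from the slides: fragments have a fixed length and can start anywhere along the transcript with equal probability. Counting how many fragment placements cover each position then shows that positions near either end are covered by fewer placements than positions in the middle. The function name and the toy lengths are hypothetical.

```python
def fragment_coverage(transcript_len, fragment_len):
    """For each position, count the fragment placements that cover it.

    Assumes fixed-length fragments and fragment_len <= transcript_len.
    """
    last_start = transcript_len - fragment_len  # last valid start position
    counts = []
    for pos in range(transcript_len):
        first = max(0, pos - fragment_len + 1)  # earliest start covering pos
        last = min(pos, last_start)             # latest start covering pos
        counts.append(last - first + 1)
    return counts

# Toy transcript of 10 nt, fragments of 4 nt (hypothetical numbers)
edge_profile = fragment_coverage(transcript_len=10, fragment_len=4)
print(edge_profile)
```

The profile rises from the 5’ end, is flat in the middle, and falls toward the 3’ end, so both edges are under-sampled in this simplified model.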
Typical protocol
• Isolate mRNA
• Break up mRNAs
• Make cDNAs of RNA fragments
• Add adapters, amplify and sequence
• We sequence 25-35 bp reads, randomly selected from each
side of the fragment
Mapping tags
Challenge: What do we get (pros and cons) if we
map the tags
a) To the genome
b) To the transcriptome (e.g. all RefSeq
transcripts)
Genome: unbiased – we could hit any
transcript. Hard to map spliced tags (tags that
span an exon-exon junction), and
possibly mRNAs that get modified…
Transcriptome: We hit annotated genes, and
splice sites are not a problem. On the other
hand, we cannot find new things
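The trade-off can be illustrated with a toy example. This is only a sketch: real mappers use indexed, mismatch-tolerant alignment, whereas here "mapping" is exact substring search, and all sequences and names are made up for illustration.

```python
# Toy sequences (hypothetical, not real genes)
exon1 = "ATGGCCATTG"
intron = "GTAAGTCCCCTTTTAG"
exon2 = "GATCCGTACA"
novel = "TTGACGGCTA"  # transcribed region missing from the annotation

genome = exon1 + intron + exon2 + "AAAAAAAA" + novel
transcriptome = [exon1 + exon2]  # annotated, spliced transcript only

def maps_to_genome(tag):
    """Exact-match stand-in for genome mapping."""
    return tag in genome

def maps_to_transcriptome(tag):
    """Exact-match stand-in for mapping to annotated transcripts."""
    return any(tag in transcript for transcript in transcriptome)

junction_tag = exon1[-5:] + exon2[:5]  # spans the exon-exon junction
exonic_tag = exon2[2:]                 # lies entirely inside one exon
novel_tag = novel                      # comes from the unannotated region

for name, tag in [("junction", junction_tag),
                  ("exonic", exonic_tag),
                  ("novel", novel_tag)]:
    print(name, maps_to_genome(tag), maps_to_transcriptome(tag))
```

The junction tag is found only in the transcriptome, the novel tag only in the genome, and the purely exonic tag in both.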
Going from tags to wigs
Showing all tags as blocks in the browser is
possible, but dumb – because there are
potentially thousands in the window of
interest, and we go blind
An easy way to summarize is to make per-nucleotide
histograms – whiteboard demo
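A minimal sketch of the histogram idea: for every nucleotide, count how many mapped tags cover it. Tag coordinates are assumed to be 0-based, half-open (start, end) pairs on a single chromosome. The function names and the toy tags are hypothetical, and the output uses the UCSC fixedStep wiggle layout as an assumption about what the wig file could look like.

```python
def nucleotide_histogram(tags, region_length):
    """Per-base count of tags covering each position in the region."""
    coverage = [0] * region_length
    for start, end in tags:
        for pos in range(start, end):
            coverage[pos] += 1
    return coverage

def to_fixed_step_wig(chrom, coverage):
    """Render coverage as fixedStep wiggle text (wiggle start is 1-based)."""
    lines = [f"fixedStep chrom={chrom} start=1 step=1"]
    lines.extend(str(count) for count in coverage)
    return "\n".join(lines)

toy_tags = [(0, 4), (2, 6), (3, 7)]  # three overlapping toy tags
histogram = nucleotide_histogram(toy_tags, region_length=8)
wig_text = to_fixed_step_wig("chr1", histogram)
print(wig_text)
```

Instead of three separate blocks, the browser now needs only one number per nucleotide, which stays readable however many tags pile up.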
Looking at RNA-seq data
• At the tag_analysis web directory, there is a
wig file, mm9_brain.wig, showing tags from an
RNA-seq experiment on mouse brains.
Upload this to the browser and look at the
two genes below – are they expressed, and
how much?
• Kcnc3
• Hoxa5
Thought challenge: from tags to
expression
• We have a wig file showing where all the tags
match on the genome
• We have the UCSC annotation for all known
genes
• We want something like a microarray, saying
– Gene X has an expression of Y
– How can we do this? (2 minutes with your
neighbor)
“Naïve solution”
• For each gene, count the tags that overlap it
– Gene X has 45 tags
– Gene Y has 4578 tags
– Etc
Problems with this?
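The naïve solution can be sketched as a plain overlap count. Coordinates are assumed to be 0-based, half-open intervals on one chromosome; the gene names, positions and tags are invented for illustration.

```python
def count_overlapping_tags(genes, tags):
    """Count, for each gene, the tags that overlap it by at least 1 nt."""
    counts = {}
    for name, (gene_start, gene_end) in genes.items():
        counts[name] = sum(
            1
            for tag_start, tag_end in tags
            if tag_start < gene_end and gene_start < tag_end
        )
    return counts

toy_genes = {"geneX": (100, 200), "geneY": (500, 900)}
toy_tags = [(90, 115), (150, 175), (195, 220),
            (300, 325), (600, 625), (880, 905)]

naive_counts = count_overlapping_tags(toy_genes, toy_tags)
print(naive_counts)
```

This scans every tag for every gene, which is fine for a toy example but too slow for a whole genome; a real analysis would sort or index the intervals first.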
Length of transcripts will have an
effect!
• A long transcript gives more tags when broken
up, and can be captured more easily
• So, the number of tags from a transcript
depends on
– Actual expression (number of RNA molecules)
– Length of the RNAs
Normalizing for length – not that hard
• For each gene, count the tags that overlap it,
and divide by gene length
– Gene X has 45/(length of X) tags
– Gene Y has 4578/(length of Y) tags
– Etc
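A short sketch of the length normalization, using the tag counts from the slide. The gene lengths are hypothetical numbers chosen only to make the effect visible.

```python
tag_counts = {"X": 45, "Y": 4578}        # counts from the slide
gene_lengths = {"X": 1500, "Y": 30000}   # hypothetical lengths in nt

def tags_per_nucleotide(counts, lengths):
    """Divide each gene's tag count by its length."""
    return {gene: counts[gene] / lengths[gene] for gene in counts}

per_nt = tags_per_nucleotide(tag_counts, gene_lengths)
print(per_nt)
```

With these assumed lengths, gene Y has about 100 times as many tags as gene X but only about 5 times as many tags per nucleotide.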
What if we want to compare two experiments?
We also need to normalize for sample
size, just as in SAGE, CAGE and ESTs
• Recap: TPM (tags per million) is a normalization that rescales the
tag counts into what we would get if we had
exactly one million tags
• …so, 10^6 * (#tags in my gene)/(total tags)
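The TPM formula as a small function. The library sizes below are hypothetical and only show why the same raw count means different things in experiments of different depth.

```python
def tags_per_million(gene_tags, total_tags):
    """Rescale a raw tag count to a library of exactly one million tags."""
    return 1e6 * gene_tags / total_tags

# The same 45 tags in two experiments of different size (hypothetical totals)
tpm_deep = tags_per_million(45, 2_000_000)
tpm_shallow = tags_per_million(45, 500_000)
print(tpm_deep, tpm_shallow)
```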
Combining the two
• Normalize by gene length AND sample size
• Gene X has an expression of
– Z TPM / N
– where Z is the TPM value of the gene and N is the RNA length.
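Both normalizations combined, as a sketch. The function name and all input numbers are hypothetical. Multiplying the result by 1,000, so that length is expressed in kilobases, would give a value on the same scale as the commonly used RPKM measure.

```python
def length_normalized_tpm(gene_tags, total_tags, rna_length):
    """TPM divided by RNA length: corrects for library size and length."""
    tpm = 1e6 * gene_tags / total_tags
    return tpm / rna_length

# Gene X: 45 tags, library of 2,000,000 tags, RNA of 1,500 nt (all assumed)
expression_x = length_normalized_tpm(45, 2_000_000, 1500)
# Same gene in a second, smaller library with proportionally fewer tags
expression_x_rep = length_normalized_tpm(9, 400_000, 1500)
print(expression_x, expression_x_rep)
```

The two experiments give the same value even though their raw counts differ, which is the point of normalizing before comparing samples.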
Summary of tag technologies
• ESTs: old, expensive, long tags. Biased to the 5’ and 3’ ends of genes. Can be used for
exploration
• SAGE: 3’ end tags. Only gene expression, no functional data. Limited for
exploration
• CAGE/5’SAGE: 5’ end tags. Promoter expression and location. Can be used
for exploration
• RNA-seq: “Random” tags over the whole mRNA. Expression and location –
can be used for both expression and exploration