Walk-thru of CAGE exercise
• Also at http://people.binf.ku.dk/albin/teaching/htbinf/tag_analysis/
• …together with updated slides
• And linked from the web page

Interlude: a logistics problem
• The largest cDNA project so far made 102,000 cDNAs
• If you publish, you need to be able to ship these to the people asking for them
• This would take >50 kg of dry ice! Expensive, and a logistics nightmare, since you need to keep track of the 102,000 tubes
• How can we transfer DNA?

RNA-seq
• With a high-throughput tag sequencer, we can also take the brute-force approach: fragment all mRNAs in a cell and sequence the pieces (or parts of the pieces)
• This is commonly referred to as RNA-seq

Compared to SAGE and CAGE, RNA-seq:
• Sequences the whole mRNA, not just the end or the start
• Can give connectivity, so that we know which exons are used and which isoforms are present
• Is actually bad at capturing the 5’ and 3’ edges, due to statistical issues (whiteboard demo)

Typical protocol
[Figure: schematic of the protocol steps below]
• Isolate mRNA
• Break up the mRNAs
• Make cDNAs of the RNA fragments
• Add adapters, amplify and sequence
We sequence 25–35 bp reads, randomly selected from either side of the fragment

Mapping tags
Challenge: what do we get (pros and cons) if we map the tags
a) to the genome, or
b) to the transcriptome (e.g. all RefSeq transcripts)?
Genome: unbiased, since we could hit any transcript. But spliced tags are hard to map (a tag spanning an exon–exon junction does not match the genome as one contiguous piece), and possibly so are mRNAs that get modified…
Transcriptome: we hit annotated genes, and splice sites are not a problem. On the other hand, we cannot find new, unannotated things.

Going from tags to wigs
• Showing all tags as blocks in the browser is possible, but unhelpful: there are potentially thousands of them in the window of interest, and we go blind
• An easy way to summarize them is to make nucleotide histograms, i.e. count for each genomic position how many tags cover it (whiteboard demo)

Looking at RNA-seq data
• In the tag_analysis web directory there is a wig file, mm9_brain.wig, showing tags from an RNA-seq experiment on mouse brains.
Upload this to the browser and look at the two genes below: are they expressed, and how much?
• Kcnc3
• Hoxa5

Thought challenge: from tags to expression
• We have a wig file showing where all the tags match on the genome
• We have the UCSC annotation for all known genes
• We want something like a microarray, saying:
  – Gene X has an expression of Y
• How can we do this? (2 minutes with your neighbor)

“Naïve solution”
• For each gene, count the tags that overlap it
  – Gene X has 45 tags
  – Gene Y has 4578 tags
  – Etc.

Problems with this?
Length of transcripts will have an effect!
• A long transcript gives more tags when broken up, and can be captured more easily
• So, the number of tags from a transcript depends on
  – Actual expression (number of RNA molecules)
  – Length of the RNAs

Normalizing for length – not that hard
• For each gene, count the tags that overlap it, and divide by gene length
  – Gene X has 45/(length of X) tags
  – Gene Y has 4578/(length of Y) tags
  – Etc.

What if we want to compare two experiments?
• We also need to normalize for sample size, just as in SAGE, CAGE and ESTs
• Recap: TPM (tags per million) rescales the tag count to what we would get if we had exactly one million tags
• …so, TPM = 10^6 × (#tags in my gene) / (total tags)

Combining the two
• Normalize by gene length AND sample size
• Gene X has an expression of
  – Z/N, where Z is its TPM value and N is the RNA length

Summary of tag technologies
• ESTs: old, expensive, long tags. Biased to the 5’ and 3’ ends of genes. Can be used for exploration
• SAGE: 3’ end tags. Only gene expression, no functional data. Limited for exploration
• CAGE/5’SAGE: 5’ end tags. Promoter expression and location. Can be used for exploration
• RNA-seq: “random” tags over the whole mRNA. Expression and location; can be used for both expression profiling and exploration
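The slides go from mapped tags to a nucleotide histogram (the wig file) and then to one expression value per gene: naive tag counting, TPM, and TPM divided by length. A minimal Python sketch of those steps follows. It assumes tags and genes are given as simple intervals with 0-based, end-exclusive coordinates; the `Tag`/`Gene` classes and every function name are hypothetical, and gene length is taken as the genomic span (start to end), not the spliced RNA length N from the slides, which would need exon annotations.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    """A mapped tag (read) on the genome; 0-based start, end-exclusive."""
    chrom: str
    start: int
    end: int


@dataclass
class Gene:
    """Hypothetical gene model reduced to one interval (no exon structure)."""
    name: str
    chrom: str
    start: int
    end: int


def coverage_histogram(tags, chrom, start, end):
    """Nucleotide histogram: for each position in [start, end), count the tags covering it."""
    counts = [0] * (end - start)
    for t in tags:
        if t.chrom != chrom:
            continue
        # Clip the tag to the window so positions outside it are ignored.
        for pos in range(max(t.start, start), min(t.end, end)):
            counts[pos - start] += 1
    return counts


def to_fixed_step_wig(chrom, start, counts):
    """Write a histogram as a UCSC wig 'fixedStep' track (wig positions are 1-based)."""
    lines = [f"fixedStep chrom={chrom} start={start + 1} step=1"]
    lines.extend(str(c) for c in counts)
    return "\n".join(lines)


def overlaps(tag, gene):
    """True if the half-open intervals of tag and gene share at least one base."""
    return tag.chrom == gene.chrom and tag.start < gene.end and gene.start < tag.end


def count_tags(tags, genes):
    """Naive solution: tags overlapping each gene (a tag may count for several genes)."""
    return {g.name: sum(1 for t in tags if overlaps(t, g)) for g in genes}


def tpm(counts, total_tags):
    """Sample-size normalization: 10^6 * (#tags in gene) / (total tags)."""
    return {name: 1e6 * c / total_tags for name, c in counts.items()}


def length_normalized_tpm(counts, genes, total_tags):
    """Both normalizations: TPM divided by length (assumed here to be the genomic span)."""
    lengths = {g.name: g.end - g.start for g in genes}
    return {name: value / lengths[name] for name, value in tpm(counts, total_tags).items()}


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    tags = [Tag("chr1", 10, 35), Tag("chr1", 20, 45), Tag("chr1", 500, 525), Tag("chr1", 2000, 2025)]
    genes = [Gene("geneX", "chr1", 0, 100), Gene("geneY", "chr1", 400, 1400)]
    counts = count_tags(tags, genes)
    print(counts)                                           # {'geneX': 2, 'geneY': 1}
    print(length_normalized_tpm(counts, genes, len(tags)))  # {'geneX': 5000.0, 'geneY': 250.0}
```

Note that `total_tags` is the number of tags in the whole sample, including tags that fall outside every gene (the fourth toy tag), which is what makes TPM values comparable between experiments. Dividing by length then turns geneX's higher raw count per base into a clearly higher value than geneY's, even though geneY is ten times longer.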