Transcriptome analysis
Edouard Severing

Overview
• Introduction: Transcriptome complexity
• Transcriptome reconstruction
  – Without a genome
  – With a genome
• Transcript abundances
  – Differential expression
• Transcript abundance models
  – (Maximum likelihood)

Gene expression / Phenotypes
What are the gene expression differences that underlie these phenotypic differences?
Gene expression is measured by assessing the abundance of mRNA molecules.

Transcriptome vs. genome
Initial assumption: N protein-coding genes → N mRNA molecules → N proteins
This assumption is based on studies that were performed on bacterial systems.

Complexity and gene count
[Figure labels: 20,000 genes; 25,000 genes]

Transcriptome vs. genes in eukaryotes
Current view: N protein-coding genes → XN mRNA molecules → (What happens here?) → ?N proteins

Splicing
[Diagram: gene → pre-mRNA (5’ to 3’, exons separated by introns) → splicing → mRNA (5’ to 3’, exons joined)]

Alternative splicing II (Alternative splicing)
[Diagram: one pre-mRNA is spliced in different ways, giving different mRNAs]

Complexity and AS
[Figure labels: 90% of genes have AS; 42% of genes have AS]
The average number of transcripts produced by human genes is also higher than the average number of transcripts produced by plant genes.

Extremes
The Dscam gene produces over 35,000 transcripts.

AS type difference
• In humans, exon skipping is the most frequent AS event type.
• In plants, intron retention is the most common AS event type.
[Figure labels: Humans, exon skipping; Plants, intron retention]

RNA editing (Base modification)
Primary transcript (predicted sequence): 5’- A C U A C G A U -3’
After editing (observed sequence): 5’- A C U A U G A U -3’
Difficulty: distinguishing genuine RNA editing from sequencing errors.

Translation or decay
• A large fraction (>30%) of the transcripts of protein-coding genes is degraded by the nonsense-mediated decay (NMD) pathway.
• The position of the stop codon is used to predict whether a transcript is likely to be degraded by the NMD pathway.

NMD target prediction
[Diagram: pre-mRNA and spliced mRNA with the exon/exon junctions marked; the open reading frame runs from the start (M) to the stop codon, with a distance labelled d]
Transcripts containing a stop codon more than 55 nt upstream of the last exon/exon junction are predicted to be targets of the NMD pathway.

Remember
• The number of unique mRNA molecules is much larger than the number of genes.
• A large fraction of the mRNA molecules is degraded by the NMD pathway.
  – NMD provides a means to regulate gene expression at the post-transcriptional level.

Transcriptome analysis
• Reconstruction of the expressed transcripts given the (fragmented) sequencing data
  – Without a reference genome
    • Trinity, Trans-ABySS and Velvet
  – With a reference genome
    • Cufflinks, Scripture
• Determining the relative abundances of the predicted transcripts (Cufflinks)
• Differential analysis (Cufflinks)
  – Gene expression
  – Alternative splicing

Without genome I

Without genome II

With a genome (Spliced alignment)
[Diagram: mRNA aligned to the genome, 5’ to 3’]

With a genome

With genome II

Assignment: Transcriptome reconstruction
• Mapping of reads to the genome using TopHat
• Reconstruction of the transcriptome using Cufflinks
• BLAST analysis of the assembly result

Your login
barshap, berryk, cizara, dennisv, dirkv, dunyac, giorgiot, heleenw, hildam, ioannism, jitskel, joelk, kamleshs, leilas, luigif, mushtar, patricial, peterve, roberte, seyeda, taox, tristanj, weic, xiaoxues, yanickh
All logins have the same password: wvdABcv12

Change password
• ssh <yourlogin>@137.224.100.201
• passwd
  – Enter your password
  – Change it to a new password
  – Type the new password again
• exit

Details
• ssh -X <yourlogin>@137.224.100.212
• cd /mnt/geninf15/work/bif_course_2012
• Assignments are in assignment.txt

Estimating expression levels
• Would be easy if only full-length transcripts were recovered.
• However, we have transcript fragments.
• Simply counting the number of reads mapping to a gene or transcript is not good enough (normalization is needed).
• The number of fragments that can be produced from a transcript depends not only on its abundance but also on its length.

Expression levels
RPKM = (10^9 × number of reads mapped to a region) / (total reads × region length)
FPKM is analogous to RPKM, but counts fragments instead of reads. [Figure labels: one fragment; one read]

Back to gene level expression (I)

Back to gene level expression (II)

Differential expression analysis
– A gene is differentially expressed under two conditions if its expression difference is statistically significant, i.e. larger than you would expect based on random natural variation.
– In order to estimate the variance it is important to have experimental replicates. (Variation between biological replicates is larger than that between technical replicates.)

Expression assignment
• Estimate the expression levels of predicted transcripts / genes in Arabidopsis roots and flower buds. (Cufflinks)
• Differential expression analysis of transcript abundances in Arabidopsis roots and flower buds. (Cuffdiff)
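
Worked example: RPKM normalization
The RPKM definition from the "Expression levels" slide can be sketched in a few lines of Python. This is only an illustration of the formula, not how Cufflinks computes its estimates. The function name rpkm and all read counts below are made up for the example.

```python
def rpkm(mapped_reads: int, total_reads: int, region_length: int) -> float:
    """Reads per kilobase of region per million mapped reads.

    mapped_reads  : reads mapped to the region (gene or transcript)
    total_reads   : total number of mapped reads in the sample
    region_length : length of the region in nucleotides
    """
    if total_reads <= 0 or region_length <= 0:
        raise ValueError("total_reads and region_length must be positive")
    # 10^9 = 10^3 (per kilobase) * 10^6 (per million reads)
    return 1e9 * mapped_reads / (total_reads * region_length)


# Hypothetical numbers: two transcripts with the same raw read count
# but different lengths, in a sample of 20 million mapped reads.
short_transcript = rpkm(mapped_reads=500, total_reads=20_000_000, region_length=1_000)
long_transcript = rpkm(mapped_reads=500, total_reads=20_000_000, region_length=4_000)

print(short_transcript)  # 25.0
print(long_transcript)   # 6.25
```

Both transcripts have the same raw count, but the four times longer transcript gets a four times lower RPKM. This is the length effect that makes simple read counting insufficient.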