Transcriptome analysis
Edouard Severing
Overview
• Introduction: Transcriptome complexity
• Transcriptome reconstruction
– Without a genome
– With a genome
• Transcript abundances
– Differential expression
• Transcript abundances models
– (Maximum likelihood)
Gene-expression/Phenotypes
What are the gene expression differences that underlie these phenotypic differences?
Gene expression measured by assessing the abundance of mRNA molecules
Transcriptome vs. genome
Initial assumption
N protein-coding genes → N mRNA molecules → N proteins
Assumption is based on studies that were performed on bacterial systems
Complexity and gene count
20,000 genes
25,000 genes
Transcriptome vs. genes
in eukaryotes
Current view
N protein-coding genes → X·N mRNA molecules → ?·N proteins
What happens between the mRNA molecules and the proteins?
Splicing
[Figure: a gene is transcribed into a pre-mRNA (5’-exon, intron, exon, intron, exon-3’); splicing removes the introns to give the mRNA (5’-exon, exon, exon-3’).]
Alternative splicing II
(Alternative splicing)
[Figure: one pre-mRNA (5’ to 3’) spliced in two different ways, giving two different mRNAs.]
Complexity and AS
90% of human genes have AS
42% of plant genes have AS
The average number of transcripts produced by human genes is also higher than the average number of transcripts produced by plant genes
Extremes
Dscam gene produces over 35,000 transcripts
AS type difference
In humans, exon skipping is the most frequent AS event type.
In plants, intron retention is the most common AS event type.
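The two event types above can be illustrated with a toy example. This is only a sketch: the sequence, the exon coordinates and the function name `splice` are made up for illustration (exons in upper case, introns in lower case).

```python
def splice(pre_mrna, exons, skip=(), retain=()):
    """Build an mRNA from a pre-mRNA.

    exons  : list of (start, end) positions, 0-based, end exclusive
    skip   : indices of exons to leave out (exon skipping)
    retain : indices of introns to keep (intron retention);
             intron i lies between exon i and exon i + 1
    """
    parts = []
    for i, (start, end) in enumerate(exons):
        if i in skip:
            continue
        parts.append(pre_mrna[start:end])
        if i in retain and i + 1 < len(exons):
            # keep the intron that follows this exon
            parts.append(pre_mrna[end:exons[i + 1][0]])
    return "".join(parts)


# Hypothetical pre-mRNA: exon, intron, exon, intron, exon
pre_mrna = "AAAgtccagCCCgtttagGGG"
exons = [(0, 3), (9, 12), (18, 21)]

constitutive = splice(pre_mrna, exons)               # all introns removed
exon_skipped = splice(pre_mrna, exons, skip={1})     # middle exon left out
intron_kept = splice(pre_mrna, exons, retain={0})    # first intron retained
```

One gene (one pre-mRNA) gives three different transcripts here, which is why the number of mRNA molecules can be much larger than the number of genes.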
RNA editing
(Base modification)
Primary transcript (predicted sequence): 5’-ACUACGAU-3’
RNA editing
After editing (observed sequence): 5’-ACUAUGAU-3’
Difficulty: Distinguish genuine RNA-editing from sequencing errors
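As a small illustration, the predicted and observed sequences of the example can be compared position by position. The function name `candidate_edit_sites` is hypothetical, and the sketch assumes both sequences have the same length. A mismatch found this way is only a candidate: on its own it does not tell you whether it is genuine RNA editing or a sequencing error.

```python
def candidate_edit_sites(predicted, observed):
    """Return (position, predicted base, observed base) for every mismatch.

    Positions are 1-based, counted from the 5' end.
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, p, o)
        for i, (p, o) in enumerate(zip(predicted, observed))
        if p != o
    ]


sites = candidate_edit_sites("ACUACGAU", "ACUAUGAU")
print(sites)  # [(5, 'C', 'U')]
```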
Translation or decay
• A large fraction (>30%) of transcripts of
protein coding genes are degraded by the
nonsense-mediated decay (NMD) pathway.
• The position of the stop codon is used to
predict whether a transcript is likely to be
degraded by the NMD pathway
NMD target prediction
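[Figure: pre-mRNA and mRNA (5’ to 3’) with the exon/exon junctions marked; the open reading frame runs from the start codon (M) to the stop codon, and d is the distance between the stop codon and the last exon/exon junction.]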
Transcripts containing a stop codon more than 55 nt upstream of the last exon/exon
junction are predicted to be targets for the NMD pathway.
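The 55 nt rule can be written down as a short sketch. The function names and the coordinate convention are assumptions made for this example: positions are counted in nt along the spliced mRNA, and the last exon/exon junction is placed at the summed length of all exons except the last one.

```python
def last_junction_position(exon_lengths):
    """Position of the last exon/exon junction on the mRNA, or None
    when the transcript has a single exon (no junction)."""
    if len(exon_lengths) < 2:
        return None
    return sum(exon_lengths[:-1])


def is_predicted_nmd_target(stop_codon_pos, exon_lengths, threshold=55):
    """True if the stop codon lies more than `threshold` nt upstream
    of the last exon/exon junction."""
    junction = last_junction_position(exon_lengths)
    if junction is None:
        return False
    d = junction - stop_codon_pos  # distance d between stop and junction
    return d > threshold


# Hypothetical transcript with exons of 150, 200 and 300 nt:
# the last junction is at position 350.
print(is_predicted_nmd_target(200, [150, 200, 300]))  # d = 150
print(is_predicted_nmd_target(320, [150, 200, 300]))  # d = 30
```

A stop codon in the last exon gives a negative d, so such a transcript is not predicted to be an NMD target.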
Remember
• The number of unique mRNA molecules is
much larger than the number of genes.
• A large fraction of the mRNA molecules is
degraded by the NMD pathway.
– NMD provides a means to regulate gene-expression at the
post-transcriptional level
Transcriptome analysis.
• Reconstruction of the expressed transcripts given the
sequencing data (Fragmented).
– Without a reference genome
• Trinity, Trans-ABySS and Velvet
– With a reference genome
• Cufflinks, Scripture
• Determining the relative abundances of the predicted
transcripts (Cufflinks)
• Differential analysis (Cufflinks)
– Gene-expression
– Alternative splicing
Without genome I
Without genome II
With a genome
(Spliced alignment)
[Figure: an mRNA (5’ to 3’) aligned to the genome by spliced alignment.]
With a genome
With Genome II
Assignment
Transcriptome reconstruction
Mapping of reads to the genome using TopHat
Reconstruction of the transcriptome using Cufflinks
BLAST analysis of the assembly result
Your login
barshap
berryk
cizara
dennisv
dirkv
dunyac giorgiot
heleenw hildam
ioannism jitskel joelk
kamleshs leilas
luigif mushtar patricial
peterve roberte
seyeda taox
tristanj weic
xiaoxues
yanickh
All logins share the same password:
wvdABcv12
Change password
• ssh <yourlogin>@137.224.100.201
• passwd
– Enter your password
– Change it to new password
– Type new password again
• Exit
Details
• ssh -X <yourlogin>@137.224.100.212
• cd /mnt/geninf15/work/bif_course_2012
• assignments are in assignment.txt
Estimating Expression levels
• Would be easy if only full length transcripts were
recovered.
• However, we have transcript fragments.
• Simply counting the number of reads mapping to a
gene or transcript is not good enough (Normalization is
needed)
• The number of fragments that can be produced from a
transcript depends not only on its abundance but also on its
length.
Expression levels
\[ \mathrm{RPKM} = 10^{9} \times \frac{\text{number of reads mapped to a region}}{\text{total reads} \times \text{region length}} \]
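FPKM is analogous to RPKM, but counts fragments instead of reads.
With single-end sequencing, one fragment gives one read; with paired-end sequencing, one fragment gives two reads.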
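A worked example of the RPKM formula, as a minimal sketch. The function name `rpkm` and all numbers are made up for illustration; the region length is in nt.

```python
def rpkm(reads_in_region, total_reads, region_length):
    """Reads per kilobase of region per million mapped reads."""
    return 1e9 * reads_in_region / (total_reads * region_length)


# Hypothetical sample with 10 million mapped reads.
total = 10_000_000

long_transcript = rpkm(500, total, 2000)   # 500 reads on a 2000 nt transcript
short_transcript = rpkm(250, total, 1000)  # 250 reads on a 1000 nt transcript

print(long_transcript, short_transcript)   # 25.0 25.0
```

The long transcript collects twice as many reads, but after normalization both transcripts have the same expression level. This is why the raw read count alone is not good enough.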
Back to gene level expression (I)
Back to gene level expression (II)
Differential expression analysis
- A gene is differentially expressed between two conditions if its expression difference
is statistically significant, i.e. larger than you would expect based on random natural
variation.
- In order to estimate the variance, it is important to have experimental replicates.
(Variation between biological replicates is larger than that between technical
replicates.)
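Why replicates matter can be shown with a simple test statistic. The sketch below uses Welch's t statistic, which is not the model used by Cuffdiff; it is swapped in here only because it is easy to compute by hand. The function name `welch_t` and all expression values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two groups of replicate measurements.

    The difference between the group means is divided by its standard
    error, which is estimated from the variance between replicates.
    Each group needs at least two replicates.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Same kind of difference in means, different variation between replicates
# (hypothetical expression levels, three replicates per condition).
consistent = welch_t([10.0, 12.0, 11.0], [20.0, 22.0, 21.0])
noisy = welch_t([10.0, 20.0, 30.0], [15.0, 25.0, 35.0])

print(abs(consistent) > abs(noisy))  # True
```

With only one measurement per condition the variance cannot be estimated at all, so there is no way to judge whether a difference is larger than random variation. To turn the statistic into a p-value it would still have to be compared with a t distribution, which is left out here.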
Expression assignment
• Estimate the expression levels of predicted
transcripts / genes in Arabidopsis roots and
flower buds. (Cufflinks)
• Differential expression analysis of transcript
abundances in Arabidopsis roots and flower
buds (Cuffdiff)