Exploring the package TopHat-CuffDiff
Jean-François Taly
Bioinformatics Core Facilities
Group meeting
October 2nd 2012
RNAseq expression data analysis
1. TopHat for mapping reads to the reference
– Reads directionality
2. CuffDiff for the differential enrichment
– Statistics with version 2.0.0 or 2.0.1
3. Enrichment threshold
– Which transcripts are present in mitochondria?
MitomiR project
[Diagram labels: miRNP, ?, regulation of mitochondrial translation, miRNAs, mRNAs, PNPASE, mitochondrial proteins]
Question 1: Are nuclear DNA-encoded miRNAs imported into mitochondria?
Slide from MitomiR EU0183
MitomiR project
[Diagram labels: miRNP, ?, miRNAs, regulation of mitochondrial translation, proteins, mRNAs]
Question 2: Do miRNAs exist in the mitochondrial genome?
One cell, two DNAs
Nucleus
- 23 chromosome pairs
- Human DNA: 2.9 billion DNA base pairs
- Between 20,000 and 25,000 human protein-coding genes
- "Junk" DNA or non-coding DNA
- Non-coding functional RNA (tRNA, rRNA, miRNA…)
The human genome may encode over 1000 miRNAs, which may target about 60% of mammalian genes
Mitochondria
- Circular DNA
- Human mitochondrial genome (mtDNA) = 16.6 kb
- 13 genes for subunits of respiratory complexes I, III, IV and V
- 22 genes for mitochondrial tRNA
- 2 genes for rRNA
* One mitochondrion can contain two to ten copies of its DNA
* Exceptions to the universal genetic code (UGC) in mitochondria
From Lung et al., 2006
RNAseq libraries
• Short insert size: searching for miRNAs
– No poly-A selection
– No fragmentation
– Size selected: 18-36 nt
– Stranded
• Long insert size: searching for lncRNAs
– No poly-A selection
– Fragmented
– Size selected: 200 nt
– Stranded
2 Conditions
• Total fraction (tot)
– Full cell lysate
• Mitochondrial fraction (mit)
– RNA extracted from mitochondria
Stranded RNAseq: Vocabulary
[Diagram: forward strand (5’ to 3’) and reverse strand (3’ to 5’), each carrying a coding region]
Forward = the strand whose 5’ end is closest to the centromere in human
50% of the genes are encoded on the forward strand
Forward / Reverse = Plus / Minus
Coding / Template = Sense / Anti-sense
http://www.biostars.org/post/show/3423/forward-and-reverse-strand-conventions/
Orientation of reads?
[Diagram: coding DNA (5’ to 3’) and template DNA (3’ to 5’); transcription gives RNA (5’ to 3’); reverse transcription gives cDNA (3’ to 5’); duplication gives the coding DNA / cDNA pair]
First strand sequencing: dUTP, NSR, NNSR
Second strand sequencing: Directional Illumina (Ligation), Standard SOLiD
Proper TopHat option?
--library-type:
• fr-unstranded: Default, Standard Illumina Reads
• fr-firststrand: dUTP, NSR, NNSR
• fr-secondstrand: Directional Illumina (Ligation), Standard SOLiD
We mapped the reads with both fr-unstranded and fr-secondstrand for comparison.
How can we evaluate directionality?
• Reads mapping to the F strand should align with genes that are also encoded on F.
• Bitwise FLAG of the BAM file:
– How many reads in forward?
samtools view -c -F 16 accepted_hits.bam
– How many reads in reverse?
samtools view -c -f 16 accepted_hits.bam
--library-type    Total number of reads   Percentage of Forward Mapping (PFM)
fr-secondstrand   173,219,584             55%
default           173,196,005             55%
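The two samtools counts above rely on bit 0x10 of the SAM FLAG field, which is set when a read is mapped to the reverse strand. As a minimal sketch, assuming the FLAG values have already been extracted from the BAM file into a list of integers, the PFM could be computed as below. The function name `percent_forward` and the example flags are illustrative and not part of the original analysis.

```python
REVERSE_STRAND_BIT = 0x10  # SAM FLAG bit: read mapped to the reverse strand


def percent_forward(flags):
    """Return the percentage of alignments whose FLAG lacks the reverse-strand bit."""
    flags = list(flags)
    if not flags:
        raise ValueError("no alignments given")
    forward = sum(1 for flag in flags if not flag & REVERSE_STRAND_BIT)
    return 100.0 * forward / len(flags)


# Hypothetical FLAG values for five alignments:
# 0 and 256 are forward, 16 and 272 (256 + 16) are reverse.
example_flags = [0, 16, 0, 256, 272]
print(f"PFM = {percent_forward(example_flags):.0f}%")
```

This mirrors `samtools view -c -F 16` (forward) divided by the sum of both counts; it makes no attempt to filter secondary or duplicate alignments.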
How can we evaluate directionality? (2)
• Gene by gene
– Bitwise FLAG + gene strand annotation

default
                        Transcripts in the (+) strand   Transcripts in the (-) strand   Transcripts in both strands
Number of transcripts   82,782                          80,648                          163,430
Average PFM             77%                             24%                             51%
Median PFM              92%                             1%                              55%

--library-type fr-secondstrand
                        Transcripts in the (+) strand   Transcripts in the (-) strand   Transcripts in both strands
Number of transcripts   82,868                          80,693                          163,561
Average PFM             77%                             24%                             51%
Median PFM              92%                             1%                              54%

A small number of genes received a huge number of mis-mapped reads!
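The gene-by-gene check can be sketched as follows, assuming forward and reverse read counts have already been tallied per transcript (for example from the FLAG field) and paired with the annotated strand. The tuple layout, the function names and the example counts are hypothetical.

```python
from statistics import mean, median


def transcript_pfm(forward_reads, reverse_reads):
    """Percentage of Forward Mapping for one transcript, or None without reads."""
    total = forward_reads + reverse_reads
    if total == 0:
        return None
    return 100.0 * forward_reads / total


def summarize_pfm(counts, strand):
    """Return (number of transcripts, average PFM, median PFM) for one strand.

    counts holds (strand, forward_reads, reverse_reads) tuples; transcripts
    without any read are skipped.
    """
    values = []
    for transcript_strand, forward_reads, reverse_reads in counts:
        if transcript_strand != strand:
            continue
        pfm = transcript_pfm(forward_reads, reverse_reads)
        if pfm is not None:
            values.append(pfm)
    if not values:
        return 0, None, None
    return len(values), mean(values), median(values)


# Hypothetical per-transcript counts: (annotated strand, forward, reverse).
example_counts = [
    ("+", 90, 10), ("+", 95, 5), ("+", 40, 60), ("+", 0, 0),
    ("-", 1, 99), ("-", 0, 50), ("-", 30, 70),
]
for strand in ("+", "-"):
    print(strand, summarize_pfm(example_counts, strand))
```

Reporting both the average and the median is deliberate: each transcript counts once regardless of depth, so the per-transcript summary stays informative even when a few transcripts with very large read counts dominate the library-wide PFM.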
Example of misaligned reads
• AC097532.1: chr2:133038647-133038738
– miRNA automatically annotated in E67 but retired in E68;
– the CIGAR string of some reads spans 26 kb;
– 11,000,115 reads mapped (6% of total);
– 8,205,667 mapped to position 133,038,644;
– NCBI BLAST of the major sequence:
• hit on the opposite strand but with 100% coverage and 100% identity to the 28S ribosomal RNA.
CuffDiff needs a special GTF
• CuffDiff needs a GTF with the 2 following tags:
– tss_id: The ID of this transcript's inferred start site.
– p_id: The ID of the coding sequence this transcript contains.
• You can produce a compatible GTF with CuffCompare:
cuffcompare -s /path/to/genome_seqs.fa -CG -r annotation.gtf annotation.gtf
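Before running CuffDiff it can help to confirm that the GTF actually carries both tags. The sketch below is one way to do that, assuming standard 9-column, tab-separated GTF records with quoted attribute values; `missing_cuffdiff_tags` and the two example records are illustrative only.

```python
import re

REQUIRED_TAGS = ("tss_id", "p_id")
# Matches attribute pairs of the form: key "value"
ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(attribute_column):
    """Turn the 9th GTF column into a dict of quoted key/value attributes."""
    return dict(ATTRIBUTE_PATTERN.findall(attribute_column))


def missing_cuffdiff_tags(gtf_line):
    """Return the required tags that are absent from one GTF record."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF record needs 9 tab-separated fields")
    attributes = parse_attributes(fields[8])
    return [tag for tag in REQUIRED_TAGS if tag not in attributes]


# Hypothetical records, with and without the two tags.
tagged = ('chr1\tsrc\texon\t1\t100\t.\t+\t.\t'
          'gene_id "G1"; transcript_id "T1"; tss_id "TSS1"; p_id "P1";')
untagged = ('chr1\tsrc\texon\t1\t100\t.\t+\t.\t'
            'gene_id "G1"; transcript_id "T1";')
print(missing_cuffdiff_tags(tagged))
print(missing_cuffdiff_tags(untagged))
```

Unquoted attribute values are ignored by this pattern, which is enough for the two tags of interest here.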
Effect of CuffCompare
[Figure: panels "CuffCompare + CuffDiff V2.0.2" and "CuffDiff V2.0.2"]
Effect of CuffDiff Version
[Figure: panels "CuffDiff V2.0.2" and "CuffDiff V2.0.1"]
Highly sensitive statistics
Reproducibility?
Version effect?
CuffCompare effect?
Genome annotation effect?
From 902 differentially expressed genes with V2.0.1, we went to 15 with V2.0.2!
Expression data reflects expectations
Ensembl Ids      Gene     Length (shortest)  qPCR(tot)/qPCR(mit) 21-07-2011  qPCR 29-07-2011  RNA seq ShortIS  RNA seq LongIS
ENSG00000198899  MT-ATP6  681                0.600                           0.500            -                0.18
ENSG00000198840  MT-ND3   346                0.400                           0.400            -                0.21
ENSG00000111640  GAPDH    390                416.000                         362.000          -                7.1
ENSG00000089157  RPLP0    402                611.000                         446.000          -                8.6
The statistics may not be trustworthy, but the fold change is!
• Define an enrichment threshold based on log2(FPKMtot/FPKMmit)
– Cytosol
– Vicinity of mitochondria
– Mitochondrial genes
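The log2(FPKMtot/FPKMmit) ratio can be sketched as below. The function names are illustrative, and the cutoff of -1.0 is a placeholder assumption, since no threshold value is stated here. The example FPKM pair (377 in tot, 1820 in mit) is the long insert MT-ND6 row of the back-up table.

```python
import math


def log2_enrichment(fpkm_tot, fpkm_mit):
    """Return log2(FPKMtot / FPKMmit), or None when either FPKM is not positive."""
    if fpkm_tot <= 0 or fpkm_mit <= 0:
        return None
    return math.log2(fpkm_tot / fpkm_mit)


def is_mito_enriched(fpkm_tot, fpkm_mit, threshold):
    """True when the log2 ratio is at or below the threshold (more signal in mit)."""
    value = log2_enrichment(fpkm_tot, fpkm_mit)
    return value is not None and value <= threshold


# Placeholder cutoff, assumed for illustration only.
HYPOTHETICAL_THRESHOLD = -1.0

ratio = log2_enrichment(377, 1820)
print(f"log2(tot/mit) = {ratio:.2f}")
print(is_mito_enriched(377, 1820, HYPOTHETICAL_THRESHOLD))
```

Negative values mean the transcript is more abundant in the mitochondrial fraction than in the total fraction; transcripts with a zero FPKM in either fraction are left uncalled rather than given an infinite ratio.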
Compartmentalized genes
• Cytosolic genes:
– UniProt: experimentally observed in cytosol
– Ensembl: no automatic annotations
• Vicinity of mitochondria:
– Paper from Kang et al. 2012
• Mitochondrial genes
– The 37 genes on the mitochondrial chromosome
Log2(Fold Change) distributions for the long insert library
Summary
                   All    Cyt Ensembl67  Cyt UniProt  Mitochondrial  Kang2012 VicinityMit
ShortIS DE Mean    1.7    0.41           -            -0.6           -
ShortIS DE Median  2.05   0.46           -            -0.65          -
ShortIS SeqNumb    2117   9              0            22             0
LongIS DE Mean     0.46   1.05           0.9          -2.21          1.94
LongIS DE Median   0.5    1.14           0.96         -2.27          2.2
LongIS SeqNumb     21030  1664           127          34             13
Significantly enriched genes
Method           Short Insert  Long Insert
CuffDiff V2.0.1  988           908
Threshold        309           714
Intersection     22            99
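The "Intersection" row is a set overlap between the genes called by CuffDiff and those passing the threshold. A minimal sketch, using hypothetical gene labels rather than the real call sets:

```python
def overlap_summary(cuffdiff_genes, threshold_genes):
    """Count the genes called by each method and the genes called by both."""
    cuffdiff_genes = set(cuffdiff_genes)
    threshold_genes = set(threshold_genes)
    shared = cuffdiff_genes & threshold_genes
    return {
        "cuffdiff": len(cuffdiff_genes),
        "threshold": len(threshold_genes),
        "intersection": len(shared),
        "shared_genes": sorted(shared),
    }


# Hypothetical call sets, for illustration only.
cuffdiff_calls = ["geneA", "geneB", "geneC", "geneD"]
threshold_calls = ["geneC", "geneD", "geneE"]
summary = overlap_summary(cuffdiff_calls, threshold_calls)
print(summary)
```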
Back Up slides
Mitochondrial genome
Mitochondrial genome – first 3 genes
                                    Short                              Long
Ensembl Ids      Gene       Length  FPKM mit  FPKM tot  log2(tot/mit)  FPKM mit  FPKM tot  log2(tot/mit)
ENSG00000198695  MT-ND6     525     81        23        -1.81          1820      377       -2.27
ENSG00000198712  MT-CO2     684     459       169       -1.44          4063      764       -2.41
ENSG00000198727  MT-CYB     1141    159       144       -0.15          2332      504       -2.21
ENSG00000198763  MT-ND2     1042    172       59        -1.53          1559      285       -2.45
ENSG00000198786  MT-ND5     1812    129       58        -1.15          2153      437       -2.30
ENSG00000198804  MT-CO1     1542    154       58        -1.42          4186      766       -2.45
ENSG00000198840  MT-ND3     346     226       66        -1.77          2890      610       -2.24
ENSG00000198886  MT-ND4     1378    166       56        -1.56          3400      698       -2.28
ENSG00000198888  MT-ND1     956     150       92        -0.71          1183      233       -2.35
ENSG00000198899  MT-ATP6    681     94        26        -1.83          2357      431       -2.45
ENSG00000198938  MT-CO3     784     270       269       -0.01          2037      401       -2.34
ENSG00000209082  J01415.1   75      39041     34034     -0.20          56409     9045      -2.64
ENSG00000210049  J01415.2   71      179164    80467     -1.15          257938    55524     -2.22
ENSG00000210077  J01415.3   69      96298     67810     -0.51          2524440   682409    -1.89
ENSG00000210082  J01415.4   1559    1546      642       -1.27          HIDATA    27286     0.00
ENSG00000210100  J01415.5   69      10163     12512     0.30           63087     11058     -2.51
ENSG00000210107  J01415.6   72      75946     35617     -1.09          2191      455       -2.27
ENSG00000210112  J01415.7   68      171524    97116     -0.82          67897     22503     -1.59
ENSG00000210117  J01415.8   68      11418     7479      -0.61          7944      2424      -1.71
ENSG00000210127  J01415.9   69      1932      1427      -0.44          13615     3971      -1.78
ENSG00000210135  J01415.10  73      20509     12667     -0.70          1864      196       -3.25
ENSG00000210140  J01415.11  66      12550     7616      -0.72          77355     13629     -2.50
ENSG00000210144  J01415.12  66      9804      5234      -0.91          74448     11999     -2.63
ENSG00000210151  J01415.13  69      5078      1809      -1.49          NOTEST    NOTEST    NOTEST
ENSG00000210154  J01415.14  68      5943      3392      -0.81          1800      760       -1.24
ENSG00000210156  J01415.15  70      28619     32650     0.19           1734      345       -2.33
ENSG00000210164  J01415.16  68      5627      3232      -0.80          5572      1972      -1.50
ENSG00000210174  J01415.17  65      7569      10780     0.51           11149     4206      -1.41
ENSG00000210176  J01415.18  69      43092     28770     -0.58          150713    34863     -2.11
ENSG00000210184  J01415.19  59      1175590   395027    -1.57          735380    208681    -1.82
ENSG00000210191  J01415.20  71      67641     36817     -0.88          70081     14281     -2.29
ENSG00000210194  J01415.21  69      157602    115972    -0.44          603010    124182    -2.28
ENSG00000210195  J01415.22  66      71836     77279     0.11           19871     4777      -2.06
ENSG00000210196  J01415.23  68      45761     30983     -0.56          121678    15826     -2.94
ENSG00000211459  J01415.24  954     943       583       -0.69          HIDATA    29151     0.00
ENSG00000212907  MT-ND4L    297     412       141       -1.54          9230      1991      -2.21
ENSG00000228253  J01415.25  207     735       160       -2.20          36590     8531      -2.10
Ensembl Ids      Gene       Type            Status  Level
ENSG00000198695  MT-ND6     protein_coding  KNOWN   3
ENSG00000198712  MT-CO2     protein_coding  KNOWN   3
ENSG00000198727  MT-CYB     protein_coding  KNOWN   3
ENSG00000198763  MT-ND2     protein_coding  KNOWN   3
ENSG00000198786  MT-ND5     protein_coding  KNOWN   3
ENSG00000198804  MT-CO1     protein_coding  KNOWN   3
ENSG00000198840  MT-ND3     protein_coding  KNOWN   3
ENSG00000198886  MT-ND4     protein_coding  KNOWN   3
ENSG00000198888  MT-ND1     protein_coding  KNOWN   3
ENSG00000198899  MT-ATP6    protein_coding  KNOWN   3
ENSG00000198938  MT-CO3     protein_coding  KNOWN   3
ENSG00000209082  J01415.1   Mt_tRNA         NOVEL   3
ENSG00000210049  J01415.2   Mt_tRNA         NOVEL   3
ENSG00000210077  J01415.3   Mt_tRNA         NOVEL   3
ENSG00000210082  J01415.4   Mt_rRNA         KNOWN   3
ENSG00000210100  J01415.5   Mt_tRNA         NOVEL   3
ENSG00000210107  J01415.6   Mt_tRNA         NOVEL   3
ENSG00000210112  J01415.7   Mt_tRNA         NOVEL   3
ENSG00000210117  J01415.8   Mt_tRNA         NOVEL   3
ENSG00000210127  J01415.9   Mt_tRNA         NOVEL   3
ENSG00000210135  J01415.10  Mt_tRNA         NOVEL   3
ENSG00000210140  J01415.11  Mt_tRNA         NOVEL   3
ENSG00000210144  J01415.12  Mt_tRNA         KNOWN   3
ENSG00000210151  J01415.13  Mt_tRNA         NOVEL   3
ENSG00000210154  J01415.14  Mt_tRNA         NOVEL   3
ENSG00000210156  J01415.15  Mt_tRNA         NOVEL   3
ENSG00000210164  J01415.16  Mt_tRNA         NOVEL   3
ENSG00000210174  J01415.17  Mt_tRNA         NOVEL   3
ENSG00000210176  J01415.18  Mt_tRNA         NOVEL   3
ENSG00000210184  J01415.19  Mt_tRNA         NOVEL   3
ENSG00000210191  J01415.20  Mt_tRNA         NOVEL   3
ENSG00000210194  J01415.21  Mt_tRNA         KNOWN   3
ENSG00000210195  J01415.22  Mt_tRNA         NOVEL   3
ENSG00000210196  J01415.23  Mt_tRNA         NOVEL   3
ENSG00000211459  J01415.24  Mt_rRNA         KNOWN   3
ENSG00000212907  MT-ND4L    protein_coding  KNOWN   3
ENSG00000228253  J01415.25  protein_coding  KNOWN   3
Cellular metabolism regulation (E2C slide)
[Diagram: two metabolic states linked by differentiation
– Warburg effect (proliferative cells, undifferentiated cells, biosynthesis efficiency): glucose, glycolysis, 2 ATP, pyruvate, lactate, amino acids, nucleotides, mitochondrial dysfunction
– OXPHOS (working cells, differentiated cells, energetic efficiency): glucose, glycolysis, O2, 2 ATP, pyruvate, CO2, OXPHOS, 36 ATP, lactate]
MCF7 is a breast cancer cell line able to grow in OXPHOS conditions.
Cells grown in different metabolic conditions might represent a unique way to distinguish RNA subpopulations expressed in mitochondria (ncRNA and … miRNA?)
Experimental design
[Diagram: stable MCF-7 cell lines (AGB:CH3854, ATCC:HTB-22) kept for a minimum of 3 weeks in OXPHOS (0 mM glucose), low glucose or high glucose. At J0 the cultures are MCF7 OXPHOS and MCF7 High Gluc. Shifts are then applied, giving at J1: MCF7 OXPHOS, MCF7 OXPHOS shifted to High Gluc, MCF7 High Gluc, and MCF7 High Gluc shifted to OXPHOS. Each goes through total cell and mitochondria extraction, then TLDA and RNA-seq.]
N = 3 to 4 independent batches
TLDA = Microfluidic miRNA qPCR
[Diagram labels: Exon, Exon 1, Exon 2]