No evidence for viral sequences in lepidic

Supplemental Methods

Preparation of nucleic acid (DNA/RNA) and sequencing.
Ten cryostat sections (20 µm) of each tumor sample were submitted to nucleic acid
extraction. The presence of at least 50% tumor cells was verified under a light microscope
using standard HES staining. The samples were first ground with the cell lyser, and the
nucleic acids were extracted with the RNeasy Micro Kit according to the manufacturer's
instructions (Qiagen GmbH, Hilden, Germany). The extracted RNA was treated with
TurboDNase (Invitrogen Inc., Carlsbad, CA) and then reverse transcribed into cDNA using
SuperScript III reverse transcriptase (Invitrogen Inc., Carlsbad, CA) and random
hexamer primers. The cDNA was amplified using a Phi29 polymerase-based method as
previously described [16]. The extracted DNAs were sequenced directly without prior
amplification, whereas the extracted RNAs were amplified after a depletion step
intended to remove as much ribosomal RNA as possible. Illumina sequencing on a
HiSeq 2000 was conducted with a mean depth per sample of 1.5 × 10^8 single reads of
100 nucleotides (nt). For the ATLL sample, the DNA and RNA runs produced 1.54 × 10^8
and 1.71 × 10^8 reads, respectively. For the PEL sample, the DNA and RNA runs produced
1.31 × 10^8 and 1.44 × 10^8 reads, respectively, whereas a mean depth per AIS sample of
1.8 × 10^8 (range 1.5–1.9 × 10^8) single reads of 100 nt was obtained.

Pilot Study.
The quality of the reads was assessed with FastQC (web site:
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), then the sequences were
selected and trimmed according to their quality scores with cutadapt (web site:
https://code.google.com/p/cutadapt/). Human sequences were filtered out by mapping the
reads to the Homo sapiens hg19 reference with bowtie2 using the "sensitive" option [17].
For the RNA runs, the percentage of reads mapped to rRNA was 8.6% for the HTLV1 sample
and 25% for the HHV8 sample. Bowtie2 was then used on the remaining reads to search for
similarity with the two reference genomes (HTLV1 Acc J02029 and HHV8 Acc AF148805), and
the reads were assembled with SPAdes [18] to produce four sets of contigs.

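The sequence of steps described above (quality report, trimming, host filtering, assembly) can be sketched as a list of command lines. This is only an illustration under stated assumptions, not the authors' actual invocation: the file names, the index name `hg19`, the quality cutoff of 20 and the minimum length of 30 are hypothetical, and the SPAdes call assumes a version that accepts a single-read library through `-s`. The function only builds the commands; nothing is executed.

```python
from pathlib import Path


def pilot_commands(reads_fastq, host_index, out_dir):
    """Build command lines for QC, trimming, host filtering and assembly.

    Nothing is run here; each command is a list suitable for subprocess.run.
    All paths and numeric thresholds are illustrative assumptions.
    """
    out = Path(out_dir)
    trimmed = out / "trimmed.fastq"
    non_host = out / "non_host.fastq"
    return [
        # 1. Quality report on the raw reads.
        ["fastqc", "--outdir", str(out), str(reads_fastq)],
        # 2. Quality trimming (cutoff 20, minimum length 30: assumed values).
        ["cutadapt", "-q", "20", "-m", "30", "-o", str(trimmed), str(reads_fastq)],
        # 3. Map to the host index; --un writes the reads that did not align.
        ["bowtie2", "--sensitive", "-x", str(host_index), "-U", str(trimmed),
         "--un", str(non_host), "-S", str(out / "host.sam")],
        # 4. Assemble the non-host single-end reads.
        ["spades.py", "-s", str(non_host), "-o", str(out / "spades")],
    ]


cmds = pilot_commands("reads.fastq", "hg19", "work")
for cmd in cmds:
    print(" ".join(cmd))
```

The file written by `--un` in step 3 is the input of step 4, which mirrors the idea that only reads not assigned to the host go on to assembly.
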
Patient study.
The quality of the reads was assessed with FastQC, then the sequences were selected and
trimmed according to their quality scores with cutadapt. Human sequences were filtered
out by mapping the reads to the Homo sapiens hg19 reference with bowtie2 using the
"sensitive" option. This host filtering step eliminated an average of 99.5% of the reads
per sample (range 99.4-99.7%). The remaining reads of the two control samples were
assembled with SPAdes [18] to produce a set of control-contigs (43714 contigs totaling
5.7 × 10^6 bases). For the five other samples, the remaining sequences were further
filtered by mapping against the control-contigs using bowtie2 ("sensitive" parameters).
This second filtering step eliminated an average of 76% of the reads per sample
(range 61-91%).

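As a rough consistency check on the two filtering steps, the averages quoted above can be chained: a mean depth of 1.8 × 10^8 reads per AIS sample, 99.5% removed by host filtering, then 76% of the remainder removed by the control-contig filter. The sketch below is a back-of-the-envelope calculation, assuming the averages can be combined this way; the function name is hypothetical and per-sample values will differ.

```python
def reads_remaining(total_reads, removed_fractions):
    """Apply successive filters, each removing a fraction of the reads still left."""
    remaining = float(total_reads)
    for fraction in removed_fractions:
        remaining *= 1.0 - fraction
    return remaining


# Mean AIS depth and mean removal rates quoted in the text.
mean_left = reads_remaining(1.8e8, [0.995, 0.76])
print(f"about {mean_left:.3g} reads left on average")
```

The result is on the order of 2 × 10^5 reads, which is compatible with the per-sample read counts reported after filtering.
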
At the end of the filtering steps, each sample consisted of a set of between 0.3 × 10^5
and 2.2 × 10^5 reads. These sets of reads were assembled with SPAdes [18] both
individually and as a single pooled set. The pooled assembly produced by SPAdes was
composed of 596 contigs with an average length of 312 bases (maximum length = 2982
bases). For all contigs and singlets, taxonomic assignment was attempted by similarity
search against the sequences of the EMBL database (STD section) using BLASTN [19] and
against the sequences of the UniProt database using BLASTX, with an E-value threshold of
1E-3. All the possible assignments were counted with a weight equal to the number of
reads of each contig. The results were sorted with homemade software
(taxoptimizer/rankoptimizer) and explored using the Krona visualization system [20].

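The read-weighted counting of assignments can be illustrated with a short sketch. It is not the taxoptimizer/rankoptimizer software, whose internals are not described here; it only shows one way to give every candidate taxon of a contig a weight equal to that contig's read count, with singlets counted as one read. The function name and the toy data are hypothetical.

```python
from collections import Counter


def weighted_taxon_counts(contig_reads, contig_taxa):
    """Sum read counts per taxon.

    contig_reads: contig id -> number of reads in the contig (default 1 for singlets).
    contig_taxa:  contig id -> iterable of candidate taxa for that contig.
    Every candidate assignment receives the contig's full read count as weight.
    """
    counts = Counter()
    for contig, taxa in contig_taxa.items():
        weight = contig_reads.get(contig, 1)
        for taxon in set(taxa):  # count each taxon once per contig
            counts[taxon] += weight
    return counts


# Hypothetical toy data, not taken from the study.
reads = {"contig_1": 120, "contig_2": 15, "singlet_1": 1}
taxa = {
    "contig_1": ["Homo sapiens"],
    "contig_2": ["Bacteria", "Homo sapiens"],
    "singlet_1": ["Bacteria"],
}
counts = weighted_taxon_counts(reads, taxa)
for taxon, n in counts.most_common():
    print(taxon, n)
```

Weighting by reads rather than by contigs keeps a large, well-covered contig from counting the same as a singlet when the assignments are summarized.
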
Supplemental figures.
Figure S1. Results of the control processes for the RNA library from the HHV8 sample (pilot study). A:
Annotation of the HHV8 genome (Acc AF148805). B1: Bowtie2 mapping of the RNA library; reads in the
sense orientation are drawn in red; reads in the antisense orientation are drawn in green. The most
expressed genes are noted (mainly K proteins). B2: zoom of the same mapping. C: blastn mapping of the
contigs built with the RNA library.

Figure S2. Results of the control processes for the DNA library from the HHV8 sample (pilot study). A:
Annotation of the HHV8 genome (Acc AF148805). B: Bowtie2 mapping of the DNA library; reads in the
sense orientation are drawn in red; reads in the antisense orientation are drawn in green. C: blastn
mapping of the contigs built with the DNA library.

Figure S3. Results of the control processes for the RNA library from the HTLV1 sample (pilot study). A:
Annotation of the HTLV1 genome (Acc J02029). B: Bowtie2 mapping of the RNA library; reads in the
sense orientation are drawn in red; reads in the antisense orientation are drawn in green. C: blastn
mapping of the contigs built with the RNA library.

Figure S4. Results of the control processes for the DNA library from the HTLV1 sample (pilot study). A:
Annotation of the HTLV1 genome (Acc J02029). B: Bowtie2 mapping of the DNA library; reads in the
sense orientation are drawn in red; reads in the antisense orientation are drawn in green. C: blastn
mapping of the contigs built with the DNA library.

Figure S5. Krona representation of the incidence of taxonomic assignments in the RNA library of
the HHV8 sample.

Figure S6. Krona representation of the incidence of taxonomic assignments in the DNA library of
the HHV8 sample.

Figure S7. Krona representation of the incidence of taxonomic assignments in the RNA library of
the HTLV1 sample.

Figure S8. Krona representation of the incidence of taxonomic assignments for the branch "Viruses"
in the RNA library of the HTLV1 sample.

Figure S9. Krona representation of the incidence of taxonomic assignments in the DNA library of
the HTLV1 sample.

Figure S10. Krona representation of the incidence of taxonomic assignments for the branch "Viruses"
in the DNA library of the HTLV1 sample.

Figure S11. Krona representation of the incidence of taxonomic assignments in the global assembly of
the five BAC samples.