19/01/2016
Recent publication in New Negatives in Plant Science
The sugarcane chloroplast genome: a next generation sequencing perspective
Nam V. Hoang | Agnelo Furtado | Richard B. McQualter | Robert J. Henry
Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD 4072, Australia
International Plant and Animal Genome XXIV Conference, San Diego, CA, USA 2016
@ Sugar Cane Sequencing Initiative
Background
Sequencing and assembly of the chloroplast (cp) genome
• DNA from PCR amplification
• DNA from chloroplast isolation
• Total genomic DNA from whole leaf
Chloroplast (cp), mitochondria (mt), nucleus (nr)
61-100% similarity between genuine cp sequences and cp-insertions
(reviewed in Notsu et al., Mol. Genet. Genomics 2004).
(Gao et al., Funct Integr Genomics 2014).
• What if the cp-insertions were amplified instead of the genuine cp DNA?
• What if mt-DNA or nr-DNA still remained in the isolated cp-DNA?
cp heteroplasmy (defined as the presence of more than one type of cp genome in an organism)
Medicago, kiwifruit, date palm..
• What if the heteroplasmy reported was due to the DNA from mt or nr?
Current study | Using sugarcane as a test case for this hypothesis
Why sugarcane?
• It has a large, complex and polyploid genome.
• It is a C4 plant with two types of photosynthetic cells, mesophyll and bundle sheath cells.
Aims
• To construct the sugarcane cp-genome from high-coverage Illumina NGS data.
• To re-investigate the presence of cp heteroplasmy by using different types of samples with different DNA proportions from chloroplast, mitochondria and nucleus.
Methods | Sample preparation
Leaf tissue of sugarcane hybrid (Saccharum spp. hybrids) cv. Q155
• Whole leaf tissue → Whole leaf (WL) sample
• Mesophyll cell isolation → Mesophyll cell (MC) sample
• Bundle sheath cell isolation → Bundle sheath cell (BC) sample
DNA extraction
• Cell isolation (Majeran et al., Plant Cell Online 2005)
• DNA extraction followed the modified CTAB method (Furtado, Methods Mol Biol 2014).
• Sequencing was conducted by the Australian Genome Research Facility (Melbourne, Australia) using Illumina HiSeq 2000 (2x100 bp reads).
Methods | Overview
Sample collection → DNA extraction → Library preparation → DNA sequencing → Read QC, trimming
Part 1. Sugarcane chloroplast genome assembly using Next Generation Sequencing read data.
• Genome assembly
• Annotation
Part 2. Whether or not the sugarcane chloroplast genome displays heteroplasmy, from an NGS perspective
• Read proportion estimation
• Contamination estimation
• Variant analysis
Part 1.
Sugarcane chloroplast genome assembly using Next Generation Sequencing read data.
Methods | Assembly | Workflow
Trimmed reads
• Mapping to the reference → Indels and structural variants analysis → Local realignment → cp mapping consensus
• De novo assembly → Blast to the cp-database → Extract the cp contigs → Update contigs → Anchor to the reference → cp de novo consensus
Alignment of mapping and de novo consensus sequences → Check for mismatches → Final cp-sequence
Analysis was done using the CLC Genomics Workbench ver. 7.0.4
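The last step of the workflow, checking the mapping consensus against the de novo consensus for mismatches, can be illustrated with a minimal Python sketch. It is not the CLC Genomics Workbench procedure used in the study: it assumes the two consensus sequences are already aligned to the same length, and the function name `find_mismatches` and the toy sequences are hypothetical.

```python
def find_mismatches(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for every difference.

    Assumes both sequences are already aligned and have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1)
        if a != b
    ]


# Toy sequences for illustration only, not sugarcane data.
mapping_consensus = "ATGGCTAGCTA"
de_novo_consensus = "ATGGCTTGCTA"

mismatches = find_mismatches(mapping_consensus, de_novo_consensus)
print(mismatches)
```

An empty list would mean the two consensus sequences agree at every position; any reported position is a candidate for manual inspection of the underlying reads.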
Methods | Assembly | Reference
Furtado et al., unpublished
Chloroplast genome of sugarcane (Saccharum spp. hybrids) was sequenced for:
• cv. NCo310: using PCR products (cp-DNA) amplified on leaf DNA template (Asano et al., DNA Res 2004)
• cv. SP80-3280: using cp-DNA isolation from leaf DNA and Sanger sequencing (BigDye Terminator cycle) with an 8x coverage (Calsa Júnior et al., Curr. Genet. 2004).
• Reference used for mapping assembly: cv. SP80-3280 chloroplast genome.
Results | Mapping Assembly
Mapped to cp-RefSeq (SP80-3280)
Sample       Trimmed reads   Reads mapped   % mapped
WL-DNA       106,785,114     3,308,467      3.12%
MC-cp-DNA    29,018,308      231,777        0.81%
BC-cp-DNA    29,252,862      140,531        0.49%
InDels and structural variant analysis → Local realignment → Three mapping consensus sequences, 141,181 bp *
* Minor differences between the three sequences are due to the SP80-3280 cp-sequence (containing some errors) being used as reference and to different read proportions in the three samples
Results | de novo Assembly
De novo assembly
Sample       Contigs   cp-Contig1 (bp)   cp-Contig2 (bp)   cp-Contig3 (bp)
WL-DNA       523,854   83,125            12,622            22,795
MC-cp-DNA    251,339   83,091            12,588            22,795
BC-cp-DNA    258,657   83,091            12,588            22,798
Figure: The average coverage of three cp-contigs in three samples
Contigs aligned to the reference sequence in Clone Manager 9
Three identical de novo consensus sequences, 141,181 bp
Results | Final cp sequence
WL-DNA, MC-cp-DNA and BC-cp-DNA mapping consensus + WL-DNA, MC-cp-DNA and BC-cp-DNA de novo consensus → Alignment to check for mismatches → 1 sequence
Final Q155 cp-sequence ~ de novo consensus
Length (bp)   LSC (bp)   IRa (bp)   SSC (bp)   IRb (bp)
141,181       83,047     22,795     12,544     22,795
Coverage
WL-DNA sample: 2,357x
MC-cp-DNA sample: 166x
BC-cp-DNA sample: 101x
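As a rough cross-check of the coverage figures, average depth can be approximated as mapped reads times read length divided by genome length. The Python sketch below is a back-of-envelope estimate, not the authors' calculation: it assumes every mapped read contributes the nominal 100 bp (trimmed reads are shorter), and the function name `approx_depth` is hypothetical. The read counts and genome length are taken from the slides.

```python
GENOME_LENGTH_BP = 141_181  # final Q155 cp-sequence length reported on the slide
READ_LENGTH_BP = 100        # nominal HiSeq 2000 read length (assumption: no trimming loss)

# Reads mapped to the cp reference, per sample, as reported on the slides.
mapped_reads = {
    "WL-DNA": 3_308_467,
    "MC-cp-DNA": 231_777,
    "BC-cp-DNA": 140_531,
}


def approx_depth(n_reads: int,
                 read_length: int = READ_LENGTH_BP,
                 genome_length: int = GENOME_LENGTH_BP) -> float:
    """Approximate average depth as total mapped bases / genome length."""
    return n_reads * read_length / genome_length


for sample, n_reads in mapped_reads.items():
    print(f"{sample}: ~{approx_depth(n_reads):.0f}x")
```

The estimates land within a few percent of the reported 2,357x, 166x and 101x, which is as close as this simplification can be expected to get.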
Results | Annotation
• DOGMA: gene prediction
• CPGAVAS: gene prediction
• Apollo: adjusting gene coordinates
• e-PCR: Sequence Tagged Sites (STS)
• Sequin: GenBank submission
• OGDRAW: cp-genome map
GenBank Accession Number: KU214867
Results | Comparative Analysis
The Q155 cp-genome in comparison with NCo310 and SP80-3280 cp-genomes.
cv. NCo310 | AP006714.1
cv. SP80-3280 | AE009947.2
cv. Q155 | KU214867
              NCo310    SP80-3280   Q155
Length (bp)   141,182   141,182     141,181
LSC (bp)      83,048    83,047      83,047
IRa (bp)      22,795    22,795      22,795
SSC (bp)      12,544    12,544      12,544
IRb (bp)      22,795    22,796      22,795
Positions of difference and genes involved in the chloroplast genome of Q155 compared to NCo310 and SP80-3280.
Position   Gene
11007      psbC
11811      psbC
11963      Intergenic
12027      trnS-UGA
14312      Intergenic
16884      Intergenic
37993      atpA
54595      trnM-CAU
67367      Intergenic
79557      Intergenic
123625     rrn23
123651     rrn23
123802     rrn23
123815     rrn23
Nucleotides at the positions of difference, listed per column in slide order (empty cells not shown):
NCo310: C A C C C A A G A G G T
SP80-3280: G T C T T T C T A A A A
Q155: T A C T T A T A G G T
Consensus nucleotide: T A C T T A T A G G T
This suggests that there were some errors in the two previous cp-sequences due to limitations of the techniques used, possibly contamination by DNA from mitochondria or nucleus.
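The region lengths reported for the three cultivars can be checked for internal consistency: in a quadripartite chloroplast genome, LSC + IRa + SSC + IRb should equal the total length. The short Python sketch below uses the values from the comparison table; the dictionary layout and the function name `regions_sum_to_total` are illustrative choices, not part of the original analysis.

```python
# Region lengths (bp) as reported in the comparison table on the slide.
cp_genomes = {
    "NCo310":    {"LSC": 83_048, "IRa": 22_795, "SSC": 12_544, "IRb": 22_795, "total": 141_182},
    "SP80-3280": {"LSC": 83_047, "IRa": 22_795, "SSC": 12_544, "IRb": 22_796, "total": 141_182},
    "Q155":      {"LSC": 83_047, "IRa": 22_795, "SSC": 12_544, "IRb": 22_795, "total": 141_181},
}


def regions_sum_to_total(genome: dict[str, int]) -> bool:
    """True if LSC + IRa + SSC + IRb equals the reported total length."""
    parts = genome["LSC"] + genome["IRa"] + genome["SSC"] + genome["IRb"]
    return parts == genome["total"]


for cultivar, genome in cp_genomes.items():
    status = "consistent" if regions_sum_to_total(genome) else "inconsistent"
    print(f"{cultivar}: {status}")
```

All three cultivars pass the check. The one-base length differences sit in the LSC of NCo310 and the IRb of SP80-3280.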
Conclusions
• Chloroplast genome can be extracted from whole genome shotgun sequencing data (of total DNA or enriched DNA samples), even from a highly polyploid plant like sugarcane, by using de novo assembly and reference mapping.
• Q155 cp-sequence was based on a much higher coverage of ~2,600x
• The cp-genome of sugarcane cv. Q155 reported here is likely to be the cp-genome sequence for cultivated sugarcane as it is the consensus of sequences reported for other cultivars.
• Narrow origin of sugarcane
Part 2.
Whether or not the sugarcane chloroplast genome displays heteroplasmy, from an NGS perspective
Methods | Contamination Estimation
Trimmed reads → Align to single copy genes (10 chloroplast genes, 6 mitochondrial genes, 4 nuclear genes) → Average coverage of each set → Contamination proportions
Results | Contamination Estimation
Non-chloroplast DNA reads (contamination fractions)
Sample       mt reads   nr reads
WL-DNA       13.59%     0.66%
MC-cp-DNA    7.82%      3.48%
BC-cp-DNA    14.68%     5.18%
Different proportions of chloroplast, mitochondrial and nuclear DNA in samples collected from different tissues of a plant
Results | Contamination Estimation
Illumina data: ~107M reads of Q155 WL-DNA, ~3% of reads mapped to the cp reference (chloroplast homologous regions with mt and nuclear)
• Chloroplast reads, ~85%: genuine cp reads, average coverage of 2,191x
• Mitochondrial homologous reads, ~14%: originally from cp insertions in the mt genome, average coverage of 220x
• Nuclear homologous reads, ~1%: originally from cp insertions in the nuclear genome, average coverage of 9x
The chloroplast genome de novo and mapping assembly was based on the majority read fraction of cp-reads.
For variant detection in the chloroplast genome, the two read fractions from mt and nuclear could cause problems.
Methods | Variant Analysis
Trimmed reads → Align to Q155 cp-genome → Mapping files → SNP detection → SNP filtering → Check for true SNPs at each variant location → Final conclusions
Results | Variant Analysis
The number of SNPs detected at minimum variant frequency of 1%, 5%, 10%, 15% and 20%.
Sample       Non-cp fraction   1%   5%   10%   15%   20%
WL-DNA       13.59%            85   53   13    0     0
MC-cp-DNA    7.82%             42   15   1     0     0
BC-cp-DNA    14.68%            50   31   11    3     0
• No SNPs were detected at a minimum variant frequency of 20%, and only three at 15%, all in the most contaminated sample (BC-cp-DNA).
• In the WL-DNA sample, no SNPs occurred at a frequency higher than the contamination fraction. Only three SNPs in each of the MC-cp-DNA and BC-cp-DNA samples (the lower coverage samples) were above the contamination fraction of that sample.
• All SNPs detected could be attributed to the fractions of non-chloroplast sequences from the mitochondria and nucleus.
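The comparison of each SNP's variant frequency with the non-cp fraction of its sample can be sketched in a few lines of Python. This is an illustration of the idea only, not the CLC-based filtering used in the study: the `Snp` class, the function name and the example SNP calls (positions and frequencies) are hypothetical, while the non-cp fractions are the values reported on the slide.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    position: int      # position on the cp-genome (hypothetical values below)
    frequency: float   # variant frequency in percent


# Non-cp fraction per sample, in percent, as reported on the slide.
NON_CP_FRACTION = {"WL-DNA": 13.59, "MC-cp-DNA": 7.82, "BC-cp-DNA": 14.68}


def snps_above_contamination(snps: list[Snp], sample: str) -> list[Snp]:
    """Keep SNPs whose frequency exceeds the sample's non-cp fraction.

    SNPs at or below that fraction could be explained by mt or nr reads
    mapping to cp-homologous regions.
    """
    threshold = NON_CP_FRACTION[sample]
    return [snp for snp in snps if snp.frequency > threshold]


# Hypothetical SNP calls for illustration only.
calls = [Snp(1000, 1.8), Snp(2000, 9.5), Snp(3000, 15.2)]

flagged = snps_above_contamination(calls, "BC-cp-DNA")
print([snp.position for snp in flagged])
```

SNPs that survive this filter still need to be checked individually at each variant location, as in the workflow, before they are taken as evidence of heteroplasmy.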
Conclusions
• There could be varied numbers of chloroplast, mitochondrial and nuclear genomes in preparations from different tissues of a plant. Different read proportions and sample coverage influence the number of variants detected.
• This may explain some earlier reports of heteroplasmy in the chloroplast in different parts of plants.
• There is no positive evidence from NGS data for heteroplasmy in the sugarcane cp-genome.
• It is possible that plant cp-genomes do not display heteroplasmy.
Figure: Scheme of possible sources of chloroplast SNPs detected from NGS data of sugarcane cultivar Q155. cp and mt denote chloroplast and mitochondria, respectively.
Acknowledgement
This project is jointly supported by the Department of Agriculture and Fisheries and the University of Queensland.
Photo by UQ Absolute Interns
Thank you very much for your attention!