19/01/2016

The sugarcane chloroplast genome: a next generation sequencing perspective

Nam V. Hoang | Agnelo Furtado | Richard B. McQualter | Robert J. Henry
Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD 4072, Australia

International Plant and Animal Genome XXIV Conference, San Diego, CA, USA 2016 @ Sugar Cane Sequencing Initiative

Recent publication in New Negatives in Plant Science

Background
Sequencing and assembly of the chloroplast (cp) genome
• DNA from PCR amplification
• DNA from chloroplast isolation
• Total genomic DNA from whole leaf

Chloroplast (cp), Mitochondria (mt), Nucleus (nr)
61-100% similarity between genuine cp sequences and cp-insertions (reviewed in Notsu et al., Mol. Genet. Genomics 2004; Gao et al., Funct Integr Genomics 2014).
• What if the cp-insertions were amplified instead of the genuine cp DNA?
• What if mt-DNA or nr-DNA still remained in the isolated cp-DNA?

Background
cp heteroplasmy (defined as the presence of more than one type of cp genome in an organism)
Examples: Medicago, kiwifruit, date palm..
• What if the heteroplasmy reported was due to DNA from mt or nr?

Current study | Using sugarcane as a test case for this hypothesis
Why sugarcane?
• It has a large, complex and polyploid genome.
• It is a C4 plant with two types of photosynthetic cells, mesophyll and bundle sheath cells.

Aims
• To construct the sugarcane cp-genome from high coverage Illumina NGS data.
• To re-investigate the presence of cp heteroplasmy by using different types of samples with different DNA proportions from chloroplast, mitochondria and nucleus.

Methods | Sample preparation
Leaf tissue of sugarcane hybrid (Saccharum spp. hybrids) cv. Q155
• Whole leaf tissue -> Whole leaf (WL) sample
• Mesophyll cell isolation -> Mesophyll cell (MC) sample
• Bundle sheath cell isolation -> Bundle sheath cell (BC) sample
DNA extraction
• Cell isolation (Majeran et al., Plant Cell Online 2005)
• DNA extraction followed the modified CTAB method (Furtado, Methods Mol Biol 2014).
• Sequencing was conducted by the Australian Genome Research Facility (Melbourne, Australia) using Illumina HiSeq 2000 (2 x 100 bp reads).

Methods | Overview
Sample collection -> DNA extraction -> Library preparation -> DNA sequencing -> Read QC, trimming
Part 1. Sugarcane chloroplast genome assembly using Next Generation Sequencing read data.
• Genome assembly
• Annotation
Part 2. Whether or not the sugarcane chloroplast genome displays heteroplasmy, from an NGS perspective
• Read proportion estimation
• Contamination estimation
• Variant analysis

Methods | Assembly | Workflow
Part 1. Sugarcane chloroplast genome assembly using Next Generation Sequencing read data.
Trimmed reads
• Mapping route: Mapping to the reference -> Indels and structural variants analysis -> Local realignment -> cp mapping consensus
• De novo route: De novo assembly -> Blast to the cp-database -> Extract the cp contigs -> Update contigs -> Anchor to the reference -> cp de novo consensus
Alignment of mapping and de novo consensus sequences -> Check for mismatches -> Final cp-sequence
Analysis was done using the CLC Genomics Workbench ver7.0.4

Methods | Assembly | Reference
Furtado et al., unpublished
The chloroplast genome of sugarcane (Saccharum spp. hybrids) was previously sequenced for:
• cv. NCo310: using PCR products (cp-DNA) amplified on leaf DNA template (Asano et al., DNA Res 2004)
• cv. SP80-3280: using cp-DNA isolation from leaf DNA and Sanger sequencing (BigDye Terminator cycle) with an 8x coverage (Calsa Júnior et al., Curr. Genet. 2004).
• Reference used for mapping assembly: cv. SP80-3280 chloroplast genome.

Results | Mapping Assembly
Mapped to cp-RefSeq (SP80-3280)
Sample       Trimmed reads   Reads mapped   Mapped (%)
WL-DNA       106,785,114     3,308,467      3.12
MC-cp-DNA    29,018,308      231,777        0.81
BC-cp-DNA    29,252,862      140,531        0.49

InDels and structural variant analysis -> Local realignment -> Three mapping consensus sequences, 141,181 bp*
*Minor differences between the three sequences are due to the SP80-3280 cp-sequence (containing some errors) being used as reference and to the different read proportions in the three samples.

Results | de novo Assembly
Sample       Contigs    cp-Contig1 (bp)   cp-Contig2 (bp)   cp-Contig3 (bp)
WL-DNA       523,854    83,125            12,622            22,795
MC-cp-DNA    251,339    83,091            12,588            22,795
BC-cp-DNA    258,657    83,091            12,588            22,798

Contigs aligned to the reference sequence in Clone Manager 9
Three identical de novo consensus sequences, 141,181 bp
[Figure: The average coverage of three cp-contigs in three samples]

Results | Final cp sequence
Mapping consensus (WL-DNA, MC-cp-DNA, BC-cp-DNA) and de novo consensus (WL-DNA, MC-cp-DNA, BC-cp-DNA) -> Alignment to check for mismatches -> 1 sequence
Final Q155 cp-sequence ~ de novo consensus

Length (bp)   LSC (bp)   IRa (bp)   SSC (bp)   IRb (bp)
141,181       83,047     22,795     12,544     22,795

Coverage
• WL-DNA sample: 2,357x
• MC-cp-DNA sample: 166x
• BC-cp-DNA sample: 101x

Results | Annotation
• DOGMA: gene prediction
• CPGAVAS: gene prediction
• Apollo: adjusting gene coordinates
• e-PCR: Sequence Tagged Sites (STS)
• Sequin: GenBank submission
• OGDRAW: cp-genome map
GenBank Accession Number: KU214867

Results | Comparative Analysis
The Q155 cp-genome in comparison with NCo310 and SP80-3280 cp-genomes.
• cv. NCo310 | AP006714.1
• cv. SP80-3280 | AE009947.2
• cv. Q155 | KU214867

              NCo310    SP80-3280   Q155
Length (bp)   141,182   141,182     141,181
LSC (bp)      83,048    83,047      83,047
IRa (bp)      22,795    22,795      22,795
SSC (bp)      12,544    12,544      12,544
IRb (bp)      22,795    22,796      22,795

Positions of difference and genes involved in the chloroplast genome of Q155 compared with NCo310 and SP80-3280.

Position   Gene
11007      psbC
11811      psbC
11963      Intergenic
12027      trnS-UGA
14312      Intergenic
16884      Intergenic
37993      atpA
54595      trnM-CAU
67367      Intergenic
79557      Intergenic
123625     rrn23
123651     rrn23
123802     rrn23
123815     rrn23

Nucleotides (NCo310, SP80-3280, Q155): C A C C C A A G A G G T G T C T T T C T A A A A T A C T T A T A G G T
Consensus nucleotide: T A C T T A T A G G T

Q155 cp-sequence was based on a much higher coverage of ~2,600x.
This suggests that there were some errors in the two previous cp-sequences due to limitations of the techniques used, possibly contamination by DNA from the mitochondria or nucleus.

Conclusions
• The chloroplast genome can be extracted from whole genome shotgun sequencing data (of total DNA or enriched DNA samples), even from a highly polyploid plant like sugarcane, by using de novo assembly and reference mapping.
• The cp-genome of sugarcane cv. Q155 reported here is likely to be the cp-genome sequence for cultivated sugarcane, as it is the consensus of sequences reported for other cultivars.
• Narrow origin of sugarcane

Part 2. Whether or not the sugarcane chloroplast genome displays heteroplasmy, from an NGS perspective

Methods | Contamination Estimation
Trimmed reads -> Align to single copy genes -> Average coverage of each set -> Contamination proportions
Single copy genes:
• 10 chloroplast genes
• 6 mitochondrial genes
• 4 nuclear genes

Results | Contamination Estimation
Non-chloroplast DNA reads (contamination fractions)

Sample       mt reads (%)   nr reads (%)
WL-DNA       13.59          0.66
MC-cp-DNA    7.82           3.48
BC-cp-DNA    14.68          5.18

Different proportions of chloroplast, mitochondrial and nuclear DNA in samples collected from different tissues of a plant

Results | Contamination Estimation
Illumina data: ~107M reads of Q155 WL-DNA, ~3% of reads mapped to the cp reference
• Chloroplast reads, ~85%. Genuine cp reads, average coverage of 2,191x.
• Mitochondrial homologous reads, ~14%. Originally from cp insertions in the mt genome, average coverage of 220x.
• Nuclear homologous reads, ~1%. Originally from cp insertions in the nuclear genome, average coverage of 9x.
[Figure: Chloroplast homologous regions with mt and nuclear, aligned to the cp-reference]
The chloroplast genome de novo and mapping assembly was based on the majority fraction of genuine cp-reads.
For variant detection in the chloroplast genome, the two read fractions from mt and nuclear could cause trouble.

Methods | Variant Analysis
Trimmed reads -> Align to Q155 cp-genome -> Mapping files -> SNP detection -> SNP filtering -> Check for true SNPs at each variant location -> Final conclusions

Results | Variant Analysis
The number of SNPs detected at minimum variant frequency of 1%, 5%, 10%, 15% and 20%.

                               Minimum variant frequency
Sample       Non-cp fraction   1%    5%    10%   15%   20%
WL-DNA       13.59%            85    53    13    0     0
MC-cp-DNA    7.82%             42    15    1     0     0
BC-cp-DNA    14.68%            50    31    11    3     0

• No SNPs were detected at a minimum variant frequency of 20%; three were detected at 15%, all in the most contaminated sample (BC-cp-DNA).
• In the WL-DNA sample, no SNP had a frequency higher than the contamination fraction. Only three SNPs in each of the MC-cp-DNA and BC-cp-DNA samples (the lower coverage samples) had a frequency higher than the contamination fraction of that sample.
• All SNPs detected could be attributed to the fractions of non-chloroplast sequences from the mitochondria and nucleus.

[Figure: Scheme of possible sources of chloroplast SNPs detected from NGS data of sugarcane cultivar Q155. cp and mt denote chloroplast and mitochondria, respectively.]

Conclusions
• There could be varied numbers of chloroplast, mitochondrial and nuclear genomes in preparations from different tissues of a plant.
• Different read proportions and sample coverage influence the number of variants detected.
• This may explain some earlier reports of heteroplasmy in the chloroplast in different parts of plants.
• There is no positive evidence from NGS data for heteroplasmy in the sugarcane cp-genome.
• It is possible that plant cp-genomes do not display heteroplasmy.

Acknowledgement
This project is jointly supported by the Department of Agriculture and Fisheries and the University of Queensland.
Photo by UQ Absolute Interns

Thank you very much for your attention!
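Worked check: the region lengths reported in the comparative analysis (LSC, IRa, SSC, IRb) should add up to the total cp-genome length of each cultivar. The short Python sketch below verifies this for the three sequences using only the numbers given on the slides; the function and variable names are illustrative and not part of the original analysis.

```python
# Region lengths (bp) as listed in the comparative analysis table.
genomes = {
    "NCo310":    {"total": 141_182, "LSC": 83_048, "IRa": 22_795, "SSC": 12_544, "IRb": 22_795},
    "SP80-3280": {"total": 141_182, "LSC": 83_047, "IRa": 22_795, "SSC": 12_544, "IRb": 22_796},
    "Q155":      {"total": 141_181, "LSC": 83_047, "IRa": 22_795, "SSC": 12_544, "IRb": 22_795},
}

REGIONS = ("LSC", "IRa", "SSC", "IRb")


def region_sum(entry):
    """Sum the four region lengths of one cp-genome entry."""
    return sum(entry[region] for region in REGIONS)


def length_report(table):
    """Map cultivar -> (summed regions, reported total, whether they agree)."""
    return {
        name: (region_sum(entry), entry["total"], region_sum(entry) == entry["total"])
        for name, entry in table.items()
    }


report = length_report(genomes)
for name, (summed, total, ok) in report.items():
    status = "consistent" if ok else "MISMATCH"
    print(f"{name}: regions sum to {summed} bp, reported {total} bp ({status})")
```

The same check also makes the one-base differences visible: NCo310 differs from Q155 in the LSC, and SP80-3280 differs in the IRb.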
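Worked sketch: the final step of the variant analysis compares each SNP frequency with the non-chloroplast (contamination) fraction of its sample. A minimal Python illustration of that comparison follows. The non-cp fractions are taken from the slides; the SNP records, the function name and the strict "greater than" rule are assumptions made for illustration only, not the CLC Genomics Workbench procedure used in the study.

```python
# Non-chloroplast read fraction per sample, in percent (from the slides).
NON_CP_FRACTION = {"WL-DNA": 13.59, "MC-cp-DNA": 7.82, "BC-cp-DNA": 14.68}


def snps_above_contamination(snps, sample):
    """Return the SNPs whose variant frequency (%) exceeds the sample's non-cp fraction.

    SNPs at or below that fraction could be explained by reads from
    cp-insertions in the mitochondrial or nuclear genome.
    """
    cutoff = NON_CP_FRACTION[sample]
    return [snp for snp in snps if snp["freq"] > cutoff]


# Hypothetical SNP calls: positions and frequencies are invented for illustration
# and do not come from the Q155 data.
example_snps = [
    {"pos": 1000, "freq": 2.4},
    {"pos": 2000, "freq": 9.1},
    {"pos": 3000, "freq": 15.2},
]

for sample in NON_CP_FRACTION:
    candidates = snps_above_contamination(example_snps, sample)
    positions = [snp["pos"] for snp in candidates]
    print(f"{sample}: cutoff {NON_CP_FRACTION[sample]}%, positions above cutoff: {positions}")
```

Under this rule a SNP counts as a heteroplasmy candidate only when contamination alone cannot account for its frequency, which is why the same hypothetical call can pass in a less contaminated sample and fail in a more contaminated one.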