An in-silico functional genomics resource: Targeted re-sequencing of wheat TILLING mutant populations
Cristobal Uauy, JIC / NIAB
UC Davis: Ksenia Krasileva, Vince Buffalo, Jorge Dubcovsky
TGAC: Sarah Ayling, Matt Clark, Paul Bailey
Rothamsted: Andy Phillips

TILLING: Targeting Induced Local Lesions IN Genomes
Combines:
• A population of EMS-mutagenised plants
• A high-throughput screen to identify mutations in a gene of interest
TILLING (reverse genetics):
• is a reverse-genetics approach
• requires knowledge of the gene sequence
• is non-transgenic
• is best suited to knocking out genes

Polyploid TILLING
• Screening ~2,000 mutant plants should yield at least one knockout (stop or splice-junction mutation) in >90% of wheat genes
• However, traditional approaches require genome-specific amplification (Wang et al. 2012)

Can we perform TILLING on thousands of genes?
• High-quality TILLING populations: 4x Kronos (Uauy et al. 2009); 6x Cadenza (Rakszegi et al. 2010)
• "Low-cost" NGS sequencing
• PCR or enrichment methods: pooling of multiple PCR products (Tsai et al. 2011) or exome capture

Approach:
1. Establish the feasibility of genome capture in 6x wheat
2. Define the wheat exome (not just the transcriptome)
3. Perform NimbleGen capture (multiplexing mutants pre-capture)
4. Illumina sequencing
5. Identify mutations and assign each one to a gene and to the specific homoeologue
6. Make data and germplasm accessible through a public database and seed store

Feasibility of exome capture in wheat
Design
• 1,846 sequences (RIKEN FL-cDNAs and some genes of interest)
• MySelect capture array (solution-based hybridization)
• 120-mer probes designed with a 60-bp overlap
• Exon-intron boundaries were not considered
• Probes based on one homoeologue of each target gene
Capture
• Three 6x Cadenza EMS mutants were used
• DNA was barcoded and sequenced (Illumina PE, 100 cycles)
Quality control
• BLASTN against 5x 454 raw sequences of Chinese Spring (CS)
• 5.8% of probes were excluded (more than 60 hits at an E-value of 1e-50)

Capture achieved a 1,300-fold enrichment of targets
• Expected coverage: 0.19x
• Median coverage: 250x
• Good coverage of most exons using cDNA-based probes; short introns were covered, while one small exon had no probes
• Using a genome-specific genomic contig as the reference improves exon-junction coverage and mapping quality for SNP detection

Known mutations identified in captured sequence
• A G>A mutation in the GA20ox1D gene, previously identified by HRM, was recovered (Mutant 1, Mutant 2)

Probes based on one genome capture all three homoeologues, but with varying efficiency
• On average, the target genome accounts for 45% of captured reads vs 25% for each of the two non-target genomes (using the target genome as reference)

Contig           Target  Cov. avg  Cov. median  Freq. A (%)  Freq. B (%)  Freq. D (%)
RFL_Contig1056   D       217       189          16.36        22.88        46.60
RFL_Contig1063   B       32        30           25.68        47.34        26.88
RFL_Contig1673   B       392       408          24.15        48.17        19.93
RFL_Contig1686   D       324       322          25.97        28.34        42.77
RFL_Contig1705   B       444       359          21.30        42.78        26.90
RFL_Contig1851   A       313       272          45.31        25.83        22.35

Coverage is over all SNPs; frequencies are the average read frequency for a given SNP in each genome.
Weighted average (%): target genome 44.5; non-target genome 24.1
Frequencies do not add up to 100% because some reads could not be unambiguously assigned.

Preliminary conclusions
• There is a trade-off between a "generic" probe and capture efficiency (sequencing costs vs design)
• Exon junctions will be important for future designs (padding exon junctions)
• Using genome-specific genomic contigs to map reads translates into better SNP calling

Next steps?
• Develop genome-specific gene models to design the wheat exome
• Combine IWGSC chromosome arm assemblies with transcriptome data to achieve this

Integrated transcriptome dataset (Ksenia Krasileva, Vince Buffalo)
• >140k non-redundant transcript sequences from different origins:
  • 4x Kronos transcriptome
  • Complementary dataset
  • RIKEN FL-cDNA
  • Re-assembled NCBI UniGenes (4,530)
  • Published transcriptomes (Brenchley et al. 2012; Cantu et al. 2011)
• Identified 84,068 ORFs (69.3 Mb)
  • 30% have no similarity to any plant protein (BLASTX E-value 1e-3; Pfam E-value 1e-3)
  • 13% are disrupted by a premature termination codon (pseudogenes)

Combining transcriptome and chromosome arms
• Transcript sequences are aligned to gDNA contigs from the chromosome arm assemblies with the EXONERATE est2genome model (splice aware), predicting exons, introns and splice sites

Classification                                    Identity (%)  Coverage (%)  Transcripts (durum and 6x)
Target homoeologue, all exons                     ≥99           ≥95           48%
Inferred homoeologue, all exons                   94-99         ≥95           29%
Partial coverage                                  ≥94           65-95         11%
No hits to assembly sequence with our criteria    <94           <65           13%

• For ~75% of transcripts, all exons can be identified within the chromosome arm assemblies, in either the target genome or the homoeologous genome
• 13% of transcripts had no hits, or hits below our criteria, in any of the chromosome arm assemblies

Final output
• Predicted genome-specific exons, the associated chromosome arm, and the contig for future mapping of reads. The design was submitted to NimbleGen last week.
Example output for transcript AY625680_1 (1AL and 1DL homoeologues):

>AY625680_1 1AL 1AL_3960881 (12201 bp) exon1 5358 .. 5151 (208 bp)
ATGATGTACCATGCTAAGAAGTTTTCTGTGCCCTTTGCACCACAGAGGGCTCAGAATAGTGAGCATGTAAGTAACATTGGAGCTTTCGGTGGATCCAACATAAGCAACCCTGCTAATCCTGTAGGGAGTGGCAAACAACGTCTAAGATGGACCTCTGATCTCCATAGTCGTTTTGTGGATGCAATCGCCCAACTTGGTGGACCAGATA
>AY625680_1 1AL 1AL_3960881 (12201 bp) exon2 4472 .. 4396 (77 bp)
GAGCAACACCTAAAGGAGTACTGACTGTAATGGGTGTACCGGGGATTACAATTTATCACGTGAAGAGCCATTTGCAG
>AY625680_1 1AL 1AL_3960881 (12201 bp) exon3 4313 .. 4271 (43 bp)
AAGTATCGCCTTGCAAAGTACATACCAGAATCTCCTGCTGAAG
>AY625680_1 1AL 1AL_3960881 (12201 bp) exon4 4168 .. 4111 (58 bp)
GTTCCAAGGACGAAAAGAAGGATTCTAGTGATTCATTCTCTAATGCAGATTCTGCACC
>AY625680_1 1AL 1AL_3960881 (12201 bp) exon5 3979 .. 3910 (70 bp)
GGGTTCACAAATCAATGAAGCATTGAAGATGCAAATGGAAGTTCAGAAGCGGCTCCATGAACAACTCGAG
>AY625680_1 1AL 1AL_3960881 (12201 bp) exon6 3549 .. 3205 (345 bp)
GTTCAAAAGCAGTTGCAGCTGAGAATCGAAGCACAAGGGAAGTACTTGCAGATGATCATAGAGGAGCAGCAAAAGCTTGGTGGCTCACTTGAAGGTTCTGAGGAGAGGAAGCTTTCACATTCACCACCTACCTTAGATGACTACCCTGACAGCATACAGCCTTCTCCGAAGAAACCACGGTTGGATGATCTGTCAACAGATGCGGTCCGGGGTGTTACACAGCCAGGGTTTGAATCCCATCTTATTGGCCCATGGGATCAAGAACTCTGTCCGAAGACCAACATATGCGATCCTGCATTCCAAGTGGATGAGTTTAAGGCAAACCCTGGTTTGAGCAAGTCATAA
>AY625680_1 1DL 1DL_2252313 (9898 bp) exon1 1193 .. 1400 (208 bp)
ATGATGTACCATGCTAAGAAGTTTTCTGTGCCCTTTGCACCACAGAGGGCTCAGAATAGTGAGCATGTGAGTAATATTGGAGCTTTCGGTGGATCCAACATAAGCAACCCTGCCAATCCTGTAGGGAGTGGGAAACAACGTCTAAGATGGACCTCTGATCTTCATAGTCGTTTTGTGGATGCAATCGCCCAACTTGGTGGACCAGATA
>AY625680_1 1DL 1DL_2252313 (9898 bp) exon2 2024 .. 2100 (77 bp)
GAGCAACACCTAAAGGAGTACTGACTGTAATGGGTGTACCGGGGATTACAATTTATCACGTGAAGAGCCATTTGCAG
>AY625680_1 1DL 1DL_2252313 (9898 bp) exon3 2183 .. 2225 (43 bp)
AAGTATCGCCTTGCAAAGTACATACCAGAATCGCCTGCTGAAG
>AY625680_1 1DL 1DL_2252313 (9898 bp) exon4 2329 .. 2386 (58 bp)
GTTCCAAGGACGAAAAGAAGGATTCTAGTGATTCATTCTCTAATGCAGATTCTGCACC
>AY625680_1 1DL 1DL_2252313 (9898 bp) exon5 2516 .. 2585 (70 bp)
GGGTTCACAAATCAATGAAGCATTGAAGATGCAAATGGAAGTTCAGAAGCGGCTCCATGAACAACTCGAG
>AY625680_1 1DL 1DL_2252313 (9898 bp) exon6 2937 .. 3281 (345 bp)
GTTCAAAAGCAGTTGCAGCTGAGAATCGAAGCACAAGGGAAGTACTTGCAGATGATCATAGAGGAGCAGCAAAAGCTTGGTGGCTCACTTGAAGGTTCTGAGGAGAGGAAGCTTTCACATTCACCACCTACCTTAGATGACTACCCTGACAGCATACAGCCTTCTCCGAAGAAACCACGGTTGGATGATCTGTCAACAGACGCGGTCCGGGGTGTTGCACAGCCAGGGTTTGAATCCCATCTTGTCGGCCCATGGGATCAAGAACTCTGTCCAAAGACCAACATATGCGATCCCGCATTCCAAGTGGATGAGTTTAAGGCAAACCCTGGTTTGAGCAAGTCATAA

Final thoughts
• Sequencing ~2,000 mutants should yield at least one knockout in >90% of wheat genes
• Genome capture works well in polyploids, and there is a trade-off between a "generic" probe and capture efficiency
• Determining exon junctions was important for probe design, and using genome-specific genomic contigs to map reads will be critical for proper SNP calling
• Full-length genomic contigs are available in at least one homoeologue for ~75% of transcripts

Access to mutants
• We plan to hold a mirror collection of seeds at UC Davis and the JIC Seed Store
• Mutants will be free from any IP for the mutations people find
• We plan to charge a small fee for 10-15 seeds of each mutant to maintain the collections

Acknowledgements
IWGSC
UC Davis: Ksenia Krasileva, Vince Buffalo, Jorge Dubcovsky
TGAC: Sarah Ayling, Matt Clark, Paul Bailey, Sophie Janacek
Rothamsted: Andy Phillips
EBI: Paul Kersey, Dan Bolser
JIC: Mike Bevan, James Simmonds, Lorelei Bilham

Wheat (and human) genetic diversity
Exploiting genetic diversity for cereal breeding: 4 July (all day), 5 July (am)
• Keynote speakers
  • Jorge Dubcovsky (UC Davis and HHMI)
  • Pat Schnable (Univ. of Iowa)
  • Robbie Waugh (JHI)
• Invited speakers
  • Beat Keller (Univ. of Zurich)
  • Bin Han (SIPPE, CAS)
  • Thorsten Schnurbusch (IPK)
  • Viktor Korzun (KWS)
  • Kentaro Yoshida (TSL)
  • Wen Wang (Kunming Institute of Zoology, CAS)
• Woolhouse Lecture: Susan McCouch (Cornell)
http://www.sebiology.org/
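Worked example for the "Polyploid TILLING" slide. The claim that ~2,000 plants give a knockout in >90% of genes can be read through a simple independence model. This is an assumption: each plant carries a knockout allele of a given gene with the same probability p, independently of the other plants. The slide does not give p or the mutation density. The sketch solves 1 - (1 - p)^N = 0.9 for p with N = 2,000.

```python
def p_at_least_one(p_per_plant: float, n_plants: int) -> float:
    """Probability that at least one of n_plants carries a knockout of a gene,
    assuming each plant does so independently with probability p_per_plant."""
    return 1.0 - (1.0 - p_per_plant) ** n_plants


def min_p_for_target(target: float, n_plants: int) -> float:
    """Smallest per-plant knockout probability reaching `target` with n_plants.
    Solves 1 - (1 - p)^n = target for p."""
    return 1.0 - (1.0 - target) ** (1.0 / n_plants)


p_needed = min_p_for_target(0.90, 2000)
print(f"per-plant knockout probability needed: {p_needed:.5f}")
print(f"check, P(at least one knockout): {p_at_least_one(p_needed, 2000):.3f}")
```

Under this model, a gene needs a knockout allele in roughly one plant in ~870. Treating ">90% of genes" as a per-gene probability of 0.9 is a simplification, because real genes differ in length and in how many EMS-susceptible codons they contain.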
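Worked example for the capture design in "Feasibility of exome capture in wheat" (120-mer probes with a 60-bp overlap). This is a minimal tiling sketch: with a 60-bp overlap, each probe starts 60 bp after the previous one. The extra probe anchored at the 3' end of the target is an assumption added here so that the tail is covered. The slide does not say how the actual MySelect design handled target ends.

```python
def tile_probes(seq: str, probe_len: int = 120, step: int = 60) -> list[tuple[int, str]]:
    """Tile fixed-length probes along `seq`; neighbours overlap by probe_len - step bp.

    Returns (0-based start, probe sequence) pairs. Targets shorter than one probe
    are returned whole (assumption). If the last regular probe stops short of the
    3' end, one extra end-anchored probe is added (assumption).
    """
    if len(seq) <= probe_len:
        return [(0, seq)]
    starts = list(range(0, len(seq) - probe_len + 1, step))
    if starts[-1] != len(seq) - probe_len:
        starts.append(len(seq) - probe_len)  # hypothetical end-anchored probe
    return [(s, seq[s:s + probe_len]) for s in starts]


# A 345-bp target (the length of exon6 in the AY625680_1 example) gets five probes.
for start, probe in tile_probes("A" * 345):
    print(start, len(probe))
```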
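Worked example for the quality-control step: probes were BLASTN-searched against 5x 454 Chinese Spring reads, and those with more than 60 hits at E-value 1e-50 were excluded (5.8% of probes). This sketch makes two assumptions. First, "E-50" means hits with E-value ≤ 1e-50. Second, every alignment row counts as one hit. The input is assumed to be already parsed into (probe_id, subject_id, evalue) tuples, for example from BLAST tabular output. The toy data are hypothetical.

```python
from collections import Counter


def repetitive_probes(hits, max_hits: int = 60, evalue_cutoff: float = 1e-50) -> set[str]:
    """Return IDs of probes with more than `max_hits` hits at E-value <= `evalue_cutoff`.

    hits: iterable of (probe_id, subject_id, evalue) tuples (layout is an assumption).
    """
    counts = Counter(probe for probe, _subject, evalue in hits if evalue <= evalue_cutoff)
    return {probe for probe, n in counts.items() if n > max_hits}


# Hypothetical toy input: p1 looks repetitive; p3 has many hits, but all are weak.
toy_hits = (
    [("p1", f"read{i}", 1e-80) for i in range(61)]
    + [("p2", f"read{i}", 1e-80) for i in range(10)]
    + [("p3", f"read{i}", 1e-10) for i in range(100)]
)
excluded = repetitive_probes(toy_hits)
print(sorted(excluded), f"{len(excluded) / 3:.1%} of probes excluded")
```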
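Arithmetic check for "Capture achieved a 1,300-fold enrichment of targets". The sketch assumes that the fold enrichment is the ratio of observed median coverage to the coverage expected without capture. The slide reports both numbers but does not state the formula.

```python
def fold_enrichment(observed_coverage: float, expected_coverage: float) -> float:
    """Ratio of observed on-target coverage to the coverage expected without capture."""
    return observed_coverage / expected_coverage


median_coverage = 250.0   # observed median coverage of targets (x), from the slide
expected_coverage = 0.19  # expected coverage without enrichment (x), from the slide

print(round(fold_enrichment(median_coverage, expected_coverage)))
```

The ratio 250 / 0.19 comes to about 1,316, which matches the rounded "1,300-fold" on the slide.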
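Worked example for "Known mutations identified in captured sequence". The mutation recovered in GA20ox1D was a G>A change. EMS mainly induces G/C to A/T transitions, so on the reference strand a candidate EMS mutation reads as either G>A or C>T. Using this as a filter on SNP calls is one plausible heuristic and an assumption here, not a pipeline stated in the slides. The call tuples below are hypothetical.

```python
# EMS-type substitutions as seen against the reference strand.
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def is_ems_type(ref: str, alt: str) -> bool:
    """True if the ref>alt substitution is a G>A or C>T transition."""
    return (ref.upper(), alt.upper()) in EMS_TRANSITIONS


def filter_ems(calls):
    """Keep (contig, position, ref, alt) calls whose substitution is EMS-type."""
    return [call for call in calls if is_ems_type(call[2], call[3])]


# Hypothetical calls: only the first and third look like EMS-induced changes.
calls = [("c1", 10, "G", "A"), ("c1", 20, "A", "G"), ("c2", 5, "C", "T"), ("c2", 7, "G", "T")]
print(filter_ems(calls))
```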
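Worked example for the slide "Probes based on one genome capture all three homoeologues, but with varying efficiency". This sketch encodes the six contigs from the table and splits each row's SNP read frequencies into target and non-target genomes. It also computes the share of reads left unassigned in each row. It reports unweighted means only, because the slide does not state which weights produced its weighted averages.

```python
# contig -> (target genome, {genome: average read frequency (%) for a given SNP})
TABLE = {
    "RFL_Contig1056": ("D", {"A": 16.36, "B": 22.88, "D": 46.60}),
    "RFL_Contig1063": ("B", {"A": 25.68, "B": 47.34, "D": 26.88}),
    "RFL_Contig1673": ("B", {"A": 24.15, "B": 48.17, "D": 19.93}),
    "RFL_Contig1686": ("D", {"A": 25.97, "B": 28.34, "D": 42.77}),
    "RFL_Contig1705": ("B", {"A": 21.30, "B": 42.78, "D": 26.90}),
    "RFL_Contig1851": ("A", {"A": 45.31, "B": 25.83, "D": 22.35}),
}


def split_target_nontarget(table):
    """Collect target-genome and non-target-genome frequencies across all rows."""
    target, nontarget = [], []
    for target_genome, freqs in table.values():
        for genome, freq in freqs.items():
            (target if genome == target_genome else nontarget).append(freq)
    return target, nontarget


def unassigned_pct(freqs) -> float:
    """Share of reads not assigned to any genome (rows need not sum to 100%)."""
    return 100.0 - sum(freqs.values())


t, nt = split_target_nontarget(TABLE)
print(f"unweighted mean, target genome: {sum(t) / len(t):.2f}%")
print(f"unweighted mean, non-target genomes: {sum(nt) / len(nt):.2f}%")
```

With these six rows, the unweighted means are about 45.5% (target) and 23.9% (non-target). The slide's 44.5% and 24.1% are weighted averages with unstated weights, so the sketch does not attempt to reproduce them.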
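Worked example for "Exon junctions will be important for future designs (padding exon junctions)". This sketch extends each exon interval by a fixed pad on both sides, clips it to the contig, and merges intervals that overlap or touch. The 50-bp pad is a hypothetical value. The exon coordinates are those of the 1DL example (contig 1DL_2252313, 9898 bp).

```python
def pad_and_merge(exons: list[tuple[int, int]], pad: int, contig_len: int) -> list[tuple[int, int]]:
    """Pad 1-based inclusive exon intervals by `pad` bp, clip to the contig, then merge.

    Coordinates may be given in either orientation (start > end is allowed).
    """
    padded = sorted(
        (max(1, lo - pad), min(contig_len, hi + pad))
        for lo, hi in (sorted(e) for e in exons)
    )
    merged: list[tuple[int, int]] = []
    for lo, hi in padded:
        if merged and lo <= merged[-1][1] + 1:  # overlaps or abuts the previous interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


exons_1DL = [(1193, 1400), (2024, 2100), (2183, 2225), (2329, 2386), (2516, 2585), (2937, 3281)]
print(pad_and_merge(exons_1DL, pad=50, contig_len=9898))
```

With a 50-bp pad, exons 2 and 3 merge into one capture region because the intron between them is only 82 bp. This is one way short introns end up covered.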
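Worked example for the transcript classification table in "Combining transcriptome and chromosome arms". The thresholds come from the table. How the boundaries are handled is an assumption: for example, identity of exactly 99% is counted as "≥99", and coverage of exactly 95% as "≥95". The table gives ranges only.

```python
def classify_transcript(identity: float, coverage: float) -> str:
    """Assign a transcript-to-contig alignment (percent identity, percent coverage)
    to one of the four classes in the table. Boundary handling is an assumption."""
    if identity >= 99 and coverage >= 95:
        return "target homoeologue, all exons"
    if identity >= 94 and coverage >= 95:
        return "inferred homoeologue, all exons"
    if identity >= 94 and coverage >= 65:
        return "partial coverage"
    return "no hits with our criteria"


for ident, cov in [(99.5, 98), (96, 97), (99.5, 80), (99.5, 50), (90, 99)]:
    print(ident, cov, classify_transcript(ident, cov))
```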
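Worked example for the "Final output" records. The header layout below is inferred from the AY625680_1 example: transcript, chromosome arm, contig, contig length, exon number, coordinates, exon length. Reading high-to-low coordinates as the reverse strand of the contig is an assumption. The parser also checks that each stated exon length equals |end - start| + 1, and this holds for all twelve example headers.

```python
import re
from dataclasses import dataclass

HEADER_RE = re.compile(
    r"^>(?P<transcript>\S+) (?P<arm>\S+) (?P<contig>\S+) \((?P<contig_len>\d+) bp\) "
    r"exon(?P<exon>\d+) (?P<start>\d+) \.\. (?P<end>\d+) \((?P<exon_len>\d+) bp\)$"
)


@dataclass
class ExonRecord:
    transcript: str
    arm: str
    contig: str
    exon: int
    start: int
    end: int
    strand: str

    @property
    def length(self) -> int:
        return abs(self.end - self.start) + 1


def parse_header(line: str) -> ExonRecord:
    """Parse one exon header; raise ValueError if the layout or length is inconsistent."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised header: {line!r}")
    start, end = int(m["start"]), int(m["end"])
    strand = "-" if start > end else "+"  # assumption: high-to-low means reverse strand
    record = ExonRecord(m["transcript"], m["arm"], m["contig"], int(m["exon"]), start, end, strand)
    if record.length != int(m["exon_len"]):
        raise ValueError("stated exon length does not match the coordinates")
    return record


print(parse_header(">AY625680_1 1AL 1AL_3960881 (12201 bp) exon1 5358 .. 5151 (208 bp)"))
```

The six 1AL exons add up to 801 bp (267 codons). The concatenated sequence starts with ATG and ends with the stop codon TAA.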
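Worked example for assigning reads and mutations to a specific homoeologue. Homoeologous exons differ at only a few positions, and those positions are what genome-specific mapping relies on. This sketch lists the differing positions between two equal-length, ungapped exon sequences. The no-indel assumption is stated in the code. Exon3 of AY625680_1 is taken from the 1AL and 1DL records.

```python
def diff_positions(a: str, b: str) -> list[int]:
    """0-based positions where two equal-length, ungapped sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (no indels assumed)")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


exon3_1AL = "AAGTATCGCCTTGCAAAGTACATACCAGAATCTCCTGCTGAAG"
exon3_1DL = "AAGTATCGCCTTGCAAAGTACATACCAGAATCGCCTGCTGAAG"

for i in diff_positions(exon3_1AL, exon3_1DL):
    print(f"position {i + 1}: 1AL {exon3_1AL[i]} vs 1DL {exon3_1DL[i]}")
```

In this 43-bp exon, the two homoeologues differ at a single position (position 33, T on 1AL vs G on 1DL). A read can be assigned to one homoeologue only if it covers such a diagnostic site.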