An in-silico functional genomics resource:
Targeted re-sequencing of wheat TILLING mutant populations
Cristobal Uauy
([email protected])
JIC / NIAB
UC Davis
Ksenia Krasileva
Vince Buffalo
Jorge Dubcovsky
TGAC
Sarah Ayling
Matt Clark
Paul Bailey
Rothamsted
Andy Phillips
TILLING: Targeting Induced Local Lesions IN Genomes
Combines:
Population of (EMS) mutagenised plants
High throughput screen to identify mutations in a gene of interest
• reverse-genetics approach,
• requires knowledge of gene sequence,
• non-transgenic,
• best suited to knockout genes.
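A minimal sketch of why EMS populations suit knock-out screens, assuming EMS acts mainly through G>A and C>T transitions (a general property of EMS, not stated on the slide). It enumerates the sense codons that a single such change turns into a stop codon. The function name `ems_stop_routes` is illustrative only.

```python
# Assumption: EMS induces mostly G>A and C>T transitions (read on the coding strand).
STOP_CODONS = {"TAA", "TAG", "TGA"}
EMS_CHANGES = {"G": "A", "C": "T"}


def ems_stop_routes():
    """Map each sense codon to the stop codons reachable by one EMS-type change."""
    routes = {}
    bases = "ACGT"
    for a in bases:
        for b in bases:
            for c in bases:
                codon = a + b + c
                if codon in STOP_CODONS:
                    continue  # already a stop codon, nothing to gain
                for pos, base in enumerate(codon):
                    if base not in EMS_CHANGES:
                        continue  # A and T are not EMS targets in this model
                    mutant = codon[:pos] + EMS_CHANGES[base] + codon[pos + 1:]
                    if mutant in STOP_CODONS:
                        routes.setdefault(codon, set()).add(mutant)
    return routes


routes = ems_stop_routes()
for codon in sorted(routes):
    print(codon, "->", ", ".join(sorted(routes[codon])))
```

Under this model only four codons (CAA, CAG, CGA and TGG) can become stop codons, so the chance of a knock-out in a given gene depends on how many such codons and splice junctions it contains.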
Reverse Genetics
Polyploid TILLING
Screening of ~2,000 mutant plants should lead to at least one knock-out in >90% of
wheat genes (stop or splice junction mutation)
However, traditional approaches require genome specific amplification
Wang et al 2012
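The slide does not say how the ">90% of genes" figure was derived. As a rough sketch only, assume knock-out mutations in a gene occur independently across plants (a Poisson model, which is an assumption made here). The per-plant knock-out probability needed to reach a 90% chance of at least one hit in 2,000 plants is then:

```python
import math


def prob_at_least_one(n_plants, p_per_plant):
    """P(at least one knock-out among n_plants) under a Poisson approximation."""
    return 1.0 - math.exp(-n_plants * p_per_plant)


def min_rate_for_target(n_plants, target=0.90):
    """Smallest per-plant knock-out probability that reaches the target probability."""
    return -math.log(1.0 - target) / n_plants


rate = min_rate_for_target(2000, 0.90)
print(f"required per-plant knock-out probability: {rate:.5f}")
print(f"check: P(at least one) = {prob_at_least_one(2000, rate):.2f}")
```

Under these assumptions a gene needs roughly a 1-in-870 chance per plant of carrying a stop or splice-junction mutation.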
Can we perform TILLING on thousands of genes?
High quality TILLING populations
4x: Kronos (Uauy et al 2009)
6x: Cadenza (Rakszegi et al 2010)
“Low cost” sequencing
PCR or enrichment methods
Pooling of multiple PCR products (Tsai et al 2011)
Exome capture
NGS approaches
Approach:
1. Establish feasibility of genome capture in 6x wheat
2. Define wheat exome (not just transcriptome)
3. Perform Nimblegen capture (multiplex mutants pre-capture)
4. Illumina sequence
5. Identify and assign mutation to gene and the specific homoeologue
6. Make data and germplasm accessible through public database and seed store
Feasibility of exome capture in wheat
Design
• 1,846 sequences (RIKEN FL-cDNA and some genes of interest)
• MySelect capture array (solution based hybridization)
• Designed 120-mer probes (60-bp overlap design)
• Exon-intron boundaries were not considered
• Probes based on one homoeologue of each target gene
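The probe layout described above (120-mers with a 60-bp overlap) can be sketched as a simple tiling. The function name `tile_probes` and the toy target are illustrative; the actual design software is not described on the slide.

```python
def tile_probes(sequence, probe_len=120, overlap=60):
    """Return (start, probe) tuples tiled across the sequence with a fixed overlap."""
    step = probe_len - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the probe length")
    probes = []
    # Only full-length probes are emitted; a trailing remainder is left uncovered.
    for start in range(0, len(sequence) - probe_len + 1, step):
        probes.append((start, sequence[start:start + probe_len]))
    return probes


target = "ACGT" * 75  # 300-bp toy target, not a real wheat sequence
probes = tile_probes(target)
print([start for start, _ in probes])  # [0, 60, 120, 180]
```

With this layout every interior base is covered by two probes, and targets shorter than 120 bp receive no probe at all in this simplified version.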
Capture
• Three 6x Cadenza EMS mutants were used
• DNA was barcoded and sequenced (Illumina PE, 100 cycles)
Quality control
• BLASTN of probes against the 5x 454 raw sequences of CS (Chinese Spring)
• 5.8% of probes were excluded (over 60 hits; E-50)
Capture achieved a 1,300-fold enrichment of targets
• Expected coverage: 0.19x
• Median coverage: 250x
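The enrichment figure follows from the two coverage values, assuming "expected coverage" means the coverage expected without capture and that the slide rounds the ratio:

```python
expected_coverage = 0.19  # x, coverage expected without enrichment (assumed meaning)
median_coverage = 250.0   # x, median coverage observed after capture

fold_enrichment = median_coverage / expected_coverage
print(f"{fold_enrichment:.0f}-fold enrichment")  # 1316-fold, i.e. roughly 1,300-fold
```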
Good coverage of most exons using cDNA-based probes
Small exon, no probes
Short introns covered
• Genome-specific genomic contig as reference improves exon junction coverage
and mapping quality for SNP detection
Known mutations identified in captured sequence
• G>A mutation in GA20ox1D gene previously identified by HRM
Mutant 1
Mutant 2
Probes based on one genome capture all three homoeologues but
with varying efficiency
• On average, the target genome accounts for 45% of captured reads vs 25% for each of the
two non-target genomes (using the target genome as reference)
Contig           Target   Coverage, all SNPs   Average freq. (%) for given SNP
                 genome   Average   Median     A       B       D
RFL_Contig1056   D        217       189        16.36   22.88   46.60
RFL_Contig1063   B        32        30         25.68   47.34   26.88
RFL_Contig1673   B        392       408        24.15   48.17   19.93
RFL_Contig1686   D        324       322        25.97   28.34   42.77
RFL_Contig1705   B        444       359        21.30   42.78   26.90
RFL_Contig1851   A        313       272        45.31   25.83   22.35
Weighted average (%)
Target genome: 44.5
Non-target genome: 24.1
(does not add up to 100% because some reads could not be unambiguously assigned)
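A small consistency check on the six contigs listed above: for each one, the genome with the highest average SNP frequency should be the target genome, and whatever is left after summing A, B and D is the share of reads that could not be assigned. The slide's weighted averages are not recomputed here because the weighting scheme, and whether it covers only these six contigs, is not stated.

```python
# Values copied from the table; the dict layout is illustrative.
rows = [
    ("RFL_Contig1056", "D", {"A": 16.36, "B": 22.88, "D": 46.60}),
    ("RFL_Contig1063", "B", {"A": 25.68, "B": 47.34, "D": 26.88}),
    ("RFL_Contig1673", "B", {"A": 24.15, "B": 48.17, "D": 19.93}),
    ("RFL_Contig1686", "D", {"A": 25.97, "B": 28.34, "D": 42.77}),
    ("RFL_Contig1705", "B", {"A": 21.30, "B": 42.78, "D": 26.90}),
    ("RFL_Contig1851", "A", {"A": 45.31, "B": 25.83, "D": 22.35}),
]


def summarise(rows):
    """Per contig: most frequent genome, mean non-target share, unassigned share."""
    summary = []
    for contig, target, freqs in rows:
        top = max(freqs, key=freqs.get)
        non_target = [v for genome, v in freqs.items() if genome != target]
        summary.append({
            "contig": contig,
            "target": target,
            "top": top,
            "target_pct": freqs[target],
            "mean_non_target_pct": sum(non_target) / len(non_target),
            "unassigned_pct": 100.0 - sum(freqs.values()),
        })
    return summary


summary = summarise(rows)
for s in summary:
    print(f'{s["contig"]}: target {s["target"]} = {s["target_pct"]:.1f}%, '
          f'non-target mean = {s["mean_non_target_pct"]:.1f}%, '
          f'unassigned = {s["unassigned_pct"]:.1f}%')
```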
Preliminary conclusions
• Trade off between “generic” probe and efficiency (sequencing costs vs design)
• Exon junctions will be important for future designs (padding exon junctions)
• Genome-specific genomic contigs to map reads translates into better SNP calling
Next steps?
• Develop genome-specific gene models to design wheat exome
• Combine IWGSC chromosome arm assemblies with transcriptome data to achieve this.
Chromosome arm assemblies
Integrated transcriptome data-set
• >140k non-redundant transcript sequences from different origins:
• 4x Kronos transcriptome
• Complementary dataset
• RIKEN FL-cDNA
• Re-assembled NCBI Unigenes (4,530)
• Published transcriptomes (Brenchley et al 2012, Cantu et al 2011)
• Identified 84,068 ORFs (69.3 Mb)
• 30% with no similarity to any plant protein (BLASTX e-3; Pfam e-3)
• 13% disrupted by premature termination codon (pseudogenes)
Ksenia Krasileva
Vince Buffalo
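The slide does not describe how ORFs disrupted by a premature termination codon were detected. A minimal sketch of one possible check is to scan an in-frame coding sequence for a stop codon before the final codon. The function name and the toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(cds):
    """Return the 0-based codon index of the first in-frame stop before the
    final codon, or None if the reading frame is intact."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):  # the last codon is allowed to be a stop
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


intact = "ATGGCTTGGAAATAA"     # ATG GCT TGG AAA TAA
disrupted = "ATGGCTTGAAAATAA"  # ATG GCT TGA AAA TAA (TGG>TGA)
print(first_premature_stop(intact), first_premature_stop(disrupted))  # None 2
```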
Combining transcriptome and chromosome arms
[Figure: transcript sequence (5´ to 3´) aligned to gDNA contigs (chromosome arms) with the EXONERATE est2genome model (splice aware) to give predicted exons; labels: exon, intron, splice site]
Classification                                   Identity (%)   Coverage (%)   Durum and 6x
target homoeologue + all exons                   ≥99            ≥95            48%
inferred homoeologue + all exons                 94-99          ≥95            29%
partial coverage (65-95% cov.)                   ≥94            65-95          11%
no hits to assembly sequence with our criteria   <94            <65            13%
• For ~75% of transcripts, all exons can be identified within the chromosome arm
assemblies, in either the target genome or a homoeologous genome
• 13% of transcripts had no hits, or only hits below our criteria, in any of the
chromosome arm assemblies
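The identity and coverage thresholds can be read as a decision rule. The sketch below is one interpretation: it treats a hit below either threshold (identity <94% or coverage <65%) as "no hit", which the slide does not spell out. The function name `classify` is illustrative.

```python
def classify(identity, coverage):
    """Classify a transcript-to-assembly alignment by % identity and % coverage."""
    if identity < 94 or coverage < 65:
        # Assumption: failing either threshold counts as no usable hit.
        return "no hits with our criteria"
    if coverage < 95:
        return "partial coverage"
    if identity >= 99:
        return "target homoeologue + all exons"
    return "inferred homoeologue + all exons"


for identity, coverage in [(99.5, 98), (96.0, 97), (97.0, 80), (90.0, 99)]:
    print(identity, coverage, "->", classify(identity, coverage))
```

The first two classes are the ones in which all exons are recovered, which is where the ~75% figure comes from (48% + 29%, allowing for rounding).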
Final output
• Predicted genome specific exons, associated chromosome arm, and contig for future
mapping of reads. Design submitted to Nimblegen last week.
>AY625680_1 1AL 1AL_3960881 (12201 bp) exon1 5358 .. 5151 (208 bp)
ATGATGTACCATGCTAAGAAGTTTTCTGTGCCCTTTGCACCACAGAGGGCTCAGAATAGTGAGCATGTAAGTAACATTGGAGCTTTCGGTGGATCCAACATAAGCAACCCTGCTAATCCTGTAGGGAGTGGCAAACAACGTCTAAGAT
GGACCTCTGATCTCCATAGTCGTTTTGTGGATGCAATCGCCCAACTTGGTGGACCAGATA
>AY625680_1 1AL 1AL_3960881 (12201 bp) exon2 4472 .. 4396 (77 bp)
GAGCAACACCTAAAGGAGTACTGACTGTAATGGGTGTACCGGGGATTACAATTTATCACGTGAAGAGCCATTTGCAG
>AY625680_1 1AL 1AL_3960881 (12201 bp) exon3 4313 .. 4271 (43 bp)
AAGTATCGCCTTGCAAAGTACATACCAGAATCTCCTGCTGAAG
>AY625680_1 1AL 1AL_3960881 (12201 bp) exon4 4168 .. 4111 (58 bp)
GTTCCAAGGACGAAAAGAAGGATTCTAGTGATTCATTCTCTAATGCAGATTCTGCACC
>AY625680_1 1AL 1AL_3960881 (12201 bp) exon5 3979 .. 3910 (70 bp)
GGGTTCACAAATCAATGAAGCATTGAAGATGCAAATGGAAGTTCAGAAGCGGCTCCATGAACAACTCGAG
>AY625680_1 1AL 1AL_3960881 (12201 bp) exon6 3549 .. 3205 (345 bp)
GTTCAAAAGCAGTTGCAGCTGAGAATCGAAGCACAAGGGAAGTACTTGCAGATGATCATAGAGGAGCAGCAAAAGCTTGGTGGCTCACTTGAAGGTTCTGAGGAGAGGAAGCTTTCACATTCACCACCTACCTTAGATGACTACCCTG
ACAGCATACAGCCTTCTCCGAAGAAACCACGGTTGGATGATCTGTCAACAGATGCGGTCCGGGGTGTTACACAGCCAGGGTTTGAATCCCATCTTATTGGCCCATGGGATCAAGAACTCTGTCCGAAGACCAACATATGCGATCCTGC
ATTCCAAGTGGATGAGTTTAAGGCAAACCCTGGTTTGAGCAAGTCATAA
>AY625680_1 1DL 1DL_2252313 (9898 bp) exon1 1193 .. 1400 (208 bp)
ATGATGTACCATGCTAAGAAGTTTTCTGTGCCCTTTGCACCACAGAGGGCTCAGAATAGTGAGCATGTGAGTAATATTGGAGCTTTCGGTGGATCCAACATAAGCAACCCTGCCAATCCTGTAGGGAGTGGGAAACAACGTCTAAGAT
GGACCTCTGATCTTCATAGTCGTTTTGTGGATGCAATCGCCCAACTTGGTGGACCAGATA
>AY625680_1 1DL 1DL_2252313 (9898 bp) exon2 2024 .. 2100 (77 bp)
GAGCAACACCTAAAGGAGTACTGACTGTAATGGGTGTACCGGGGATTACAATTTATCACGTGAAGAGCCATTTGCAG
>AY625680_1 1DL 1DL_2252313 (9898 bp) exon3 2183 .. 2225 (43 bp)
AAGTATCGCCTTGCAAAGTACATACCAGAATCGCCTGCTGAAG
>AY625680_1 1DL 1DL_2252313 (9898 bp) exon4 2329 .. 2386 (58 bp)
GTTCCAAGGACGAAAAGAAGGATTCTAGTGATTCATTCTCTAATGCAGATTCTGCACC
>AY625680_1 1DL 1DL_2252313 (9898 bp) exon5 2516 .. 2585 (70 bp)
GGGTTCACAAATCAATGAAGCATTGAAGATGCAAATGGAAGTTCAGAAGCGGCTCCATGAACAACTCGAG
>AY625680_1 1DL 1DL_2252313 (9898 bp) exon6 2937 .. 3281 (345 bp)
GTTCAAAAGCAGTTGCAGCTGAGAATCGAAGCACAAGGGAAGTACTTGCAGATGATCATAGAGGAGCAGCAAAAGCTTGGTGGCTCACTTGAAGGTTCTGAGGAGAGGAAGCTTTCACATTCACCACCTACCTTAGATGACTACCCTG
ACAGCATACAGCCTTCTCCGAAGAAACCACGGTTGGATGATCTGTCAACAGACGCGGTCCGGGGTGTTGCACAGCCAGGGTTTGAATCCCATCTTGTCGGCCCATGGGATCAAGAACTCTGTCCAAAGACCAACATATGCGATCCCGC
ATTCCAAGTGGATGAGTTTAAGGCAAACCCTGGTTTGAGCAAGTCATAA
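The exon records above carry the transcript, chromosome arm, contig and exon coordinates in the header line. A minimal sketch of parsing that header and comparing homoeologous exons position by position follows. The regular expression is written against the header layout shown here, and reading "start > end" as the reverse strand is an assumption.

```python
import re

HEADER = re.compile(
    r"^>(?P<transcript>\S+) (?P<arm>\S+) (?P<contig>\S+) \((?P<contig_len>\d+) bp\) "
    r"(?P<exon>exon\d+) (?P<start>\d+) \.\. (?P<end>\d+) \((?P<exon_len>\d+) bp\)$"
)


def parse_header(line):
    """Parse one exon header into a dict with integer coordinates."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised header: {line!r}")
    rec = match.groupdict()
    for key in ("contig_len", "start", "end", "exon_len"):
        rec[key] = int(rec[key])
    # Assumption: descending coordinates mean the exon lies on the reverse strand.
    rec["strand"] = "+" if rec["start"] <= rec["end"] else "-"
    rec["span"] = abs(rec["end"] - rec["start"]) + 1
    return rec


def mismatches(seq_a, seq_b):
    """List (1-based position, base_a, base_b) for equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


a_hdr = parse_header(">AY625680_1 1AL 1AL_3960881 (12201 bp) exon1 5358 .. 5151 (208 bp)")
d_hdr = parse_header(">AY625680_1 1DL 1DL_2252313 (9898 bp) exon1 1193 .. 1400 (208 bp)")
print(a_hdr["strand"], a_hdr["span"], d_hdr["strand"], d_hdr["span"])  # - 208 + 208

exon3_1AL = "AAGTATCGCCTTGCAAAGTACATACCAGAATCTCCTGCTGAAG"
exon3_1DL = "AAGTATCGCCTTGCAAAGTACATACCAGAATCGCCTGCTGAAG"
print(mismatches(exon3_1AL, exon3_1DL))  # [(33, 'T', 'G')]
```

Single-base differences such as the one in exon 3 are what distinguish the 1AL and 1DL copies, so they must be told apart from induced mutations when reads are assigned to a homoeologue.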
Final thoughts
• Sequencing of ~2,000 mutants should lead to at least one knock-out in >90% of wheat genes
• Genome capture works well in polyploids, but there is a trade-off between “generic”
probes and capture efficiency
• Determining exon junctions was important for probe design, and using genome-specific genomic contigs to map reads will be critical for mapping and proper SNP calling
• Full-length genome contigs in at least one homoeologue for ~75% of transcripts
Access to mutants
• We plan to hold mirror collections of seeds at UC Davis and the JIC Seed Store
• Mutants will be free of any IP claims on the mutations people find
• We plan to charge a small fee for 10-15 seeds of each mutant to maintain the collections
IWGSC
UC Davis
Ksenia Krasileva
Vince Buffalo
Jorge Dubcovsky
TGAC
Sarah Ayling
Matt Clark
Paul Bailey
Sophie Janacek
Rothamsted
Andy Phillips
EBI
Paul Kersey
Dan Bolser
JIC
Mike Bevan
James Simmonds
Lorelei Bilham
Wheat (and human) genetic diversity
Exploiting genetic diversity for cereal breeding
4 July (all day), 5 July (am)
• Keynote speakers
• Jorge Dubcovsky (UC Davis and HHMI)
• Pat Schnable (Univ of Iowa)
• Robbie Waugh (JHI)
• Invited speakers
• Beat Keller (Univ of Zurich)
• Bin Han (SIPPE, CAS)
• Thorsten Schnurbusch (IPK)
• Viktor Korzun (KWS)
• Kentaro Yoshida (TSL)
• Wen Wang (Kunming Institute of Zoology, CAS)
• Woolhouse Lecture: Susan McCouch (Cornell)
http://www.sebiology.org/