Toward the genetic basis of adaptation using arrays
Justin Borevitz
Ecology & Evolution, University of Chicago
http://naturalvariation.org

Light Affects the Entire Plant Life Cycle
[Figure labels: de-etiolation, hypocotyl]

Local Population Variation
Ivan Baxter, Scott Hodges

Seasonal Variation
Matt Horton, Megan Dunning

Seasons in the Growth Chamber
• Changing day length
• Cycle light intensity
• Cycle light colors
• Cycle temperature
[Figure: three panels plotted by month: Light Intensity (W/m2), Day Length (hours), Temperature (degrees C). Series: Sweden, Spain, standard; Sweden High/Low, Spain High/Low.]

Talk Outline
• Natural Variation in Light Response
• Single Feature Polymorphisms (SFPs)
  – Potential deletions
  – Bulk segregant / eXtreme Mapping
• Barley RNA SFPs
• Aquilegia

Light Affects the Entire Plant Life Cycle
Light response variation can be seen under constant conditions in the lab.

Quantitative Trait Loci

Which arrays should be used?
• Spotted oligo arrays, Arizona: 29,000 70mers
• ATH1, Affymetrix expression GeneChip: 202,806 unique 25 bp oligonucleotide features
• AtTILE1, universal whole genome array: every ~35 bp, > 3 million PM features
• Re-sequencing array: 120 Mbp * 8 features
  – 20 accessions, Perlegen
  – Max Planck (Weigel), USC (Nordborg)

Which arrays should be used?
[Figure labels: GeneChip, cDNA array, Long oligo array]

Which 25mer arrays should be used?
Gene array, Exon array, Tiling array

Universal Whole Genome Array
~35 bp tile, “good” binding oligos, non-repetitive regions, evenly spaced
• RNA
  – Gene discovery: gene model correction, non-coding/micro-RNA, antisense transcription
  – Transcriptome atlas: expression levels, tissue specificity
  – Alternative splicing
• DNA
  – Chromatin immunoprecipitation (ChIP chip)
  – Methylation
  – Polymorphism: SFP discovery/genotyping
  – Comparative Genome Hybridization (CGH): insertions/deletions

Transcriptome Atlas

Improved Genome Annotation
[Figure labels: ORFa, ORFb, start, conservation]

[Figure: SFPs, SNPs and a deletion along Chromosome (bp)]

Potential Deletions
Delta   p0     FALSE    Called    FDR
1.00    0.95   18865    160145    11.2%
1.25    0.95   10477    132390    7.5%
1.50    0.95   6545     115042    5.4%
1.75    0.95   4484     102385    4.2%
2.00    0.95   3298     92027     3.4%

False Discovery and Sensitivity
Notes: Cereon may be a sequencing error; TIGR match is a match; PM only.

SAM threshold: 5% FDR
                    Total    SFPs    nonSFPs
GeneChip                     3806    89118
Sequence            817      121     696
  Polymorphic       340      117     223
  Non-polymorphic   477      4       473

Cereon marker accuracy   100%   90%   80%   70%
Sensitivity              34%    41%   53%   85%

False discovery rate: 3%
Test for independence of all factors: Chisq = 177.34, df = 1, p-value = 1.845e-40

SAM threshold: 18% FDR
                    Total    SFPs    nonSFPs
GeneChip                     10627   82297
Sequence            817      223     594
  Polymorphic       340      195     145
  Non-polymorphic   477      28      449

Cereon marker accuracy   100%   90%   80%   70%
Sensitivity              57%    67%   85%   100%

False discovery rate: 13%
Test for independence of all factors: Chisq = 265.13, df = 1, p-value = 1.309e-59

3/4 Cvi markers were also confirmed in PHYB.

Chip genotyping of a Recombinant Inbred Line
29 kb interval
• Discovery: 8 replicates x $500 / 80,000 SFPs = $0.05 per SFP
• Typing: 1 replicate x $500 / 80,000 SFPs = $0.00625 per SFP

Array Mapping
Map bibb
• 100 bibb mutant plants
• 100 wt mutant plants
Hazen et al., Plant Physiology 2005

eXtreme Array Mapping
15 tallest RILs pooled vs 15 shortest RILs pooled
[Figure: Histogram of Kas/Col RILs, red light. x-axis: hypocotyl length (mm); y-axis: counts.]

eXtreme Array Mapping
Allele frequencies determined by SFP genotyping. Thresholds set by simulations.
[Figure: RED2 QTL, 12 cM, LOD along Chromosome 2]
[Figure: Composite Interval Mapping, RED2 QTL. x-axis: cM; y-axis: LOD.]
Red light QTL RED2 from 100 Kas/Col RILs (Wolyn et al., Genetics 2004)

Potential Deletions
• >500 potential deletions
• 45 confirmed by Ler sequence
• 23 (of 114) transposons
• Disease Resistance (R) gene clusters
• Single R gene deletions
• Genes involved in secondary metabolism
• Unknown genes

Potential Deletions Suggest Candidate Genes
[Figure: FLM natural deletion, FLOWERING1 QTL, Chr1 (bp)]
Flowering Time QTL caused by a natural deletion in FLM (Werner et al., PNAS 2005)

Fast Neutron deletions
• FKF1: 80 kb deletion, CHR1
• Het cry2: 10 kb deletion, CHR1

Natural Variation on Tiling Arrays

Review
• Single Feature Polymorphisms (SFPs) can be used to
  – Identify recombination breakpoints
  – eXtreme Array Mapping
  – Potential deletions (candidate genes)
  – Haplotyping
  – Diversity/Selection
  – Association Mapping

Complex, Large Genomes?
• Signal to noise with large genomes
• RNA, less complex, but differential expression
• Barley SFPs

Barley SFPs
RNA, 2 genotypes, 18 replicates

False Discovery Rate, RNA
RNA hybridization: 17 Golden Promise, 19 Morex, 6 tissues
SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances
s0 = 0.0342 (The 5 % quantile of the s values.)
Number of permutations: 500
MEAN number of falsely called genes is computed.
Delta   p0     Called   FALSE   FDR
0.5     0.95   27159    5884    0.206
1.0     0.95   17744    594     0.032
1.5     0.95   13285    65      0.005
2.0     0.95   10504    7       0.001
2.5     0.95   8583     0       0.000

Sequence Verification of SFPs (RNA)
                    Total    mxSFP    nonSFP    gpSFP
GeneChip                     5301     240307    5203
Sequence
  MX                178      115      45        18
  Nonpolymorphic    2200     27       2045      128
  GP                223      7        61        155

Chisq = 2049.2, df = 4, p-value = 0

[Figure: Position of SNP]

Aquilegia (Columbines)
Recent adaptive radiation, 350 Mb genome
[Figure: Species with > 20k ESTs, 11/14/2003]
• Animal lineage: good coverage
• Plant lineage: crop plant coverage

Aquilegia (Columbines)
• 300 F3 RILs growing (Evadne Smith)
• 85,000 5’ 3’ ESTs, 51,000 clones, >16,00 SNPs
• TIGR gene index and GenBank
• Arrays being designed by Nimblegen

Genetics of Speciation along a Hybrid Zone

NSF Genome Complexity
• Physical Map (BAC tiling path)
  – Physical assignment of ESTs
• QTL for pollinator preference
  – ~400 RILs, map abiotic stress
  – QTL fine mapping / LD mapping
• Develop transformation techniques
• http://www.AQgenome.org
Scott Hodges (UCSB), Elena Kramer (Harvard), Magnus Nordborg (USC), Justin Borevitz (U Chicago), Jeff Tompkins (Clemson)

NaturalVariation.org
• University of Chicago: Xu Zhang, Evadne Smith, Ken Okamoto
• Salk: Jon Werner, Joanne Chory, Joseph Ecker
• Max Planck: Detlef Weigel
• UC San Diego: Charles Berry
• Scripps: Sam Hazen, Elizabeth Winzeler
• Purdue: Ivan Baxter
• UC Davis: Julin Maloof
• University of Guelph, Canada: Dave Wolyn
• Sainsbury Laboratory: Jonathan Jones

Barley SFPs
Genomic DNA, 3 genotypes, 3 replicates

False Discovery Rate, DNA
Genomic DNA hybridization: 3 replicates, 3 genotypes
SAM Analysis for the Multi-Class Case with 3 Classes
s0 = 0.0123 (The 25 % quantile of the s values.)
Number of permutations: 100
MEAN number of falsely called genes is computed.

Delta   p0     Called   FALSE   FDR
1       0.95   4017     2073    0.47
2       0.95   1728     583     0.31
3       0.95   1090     258     0.22
4       0.95   789      139     0.16
5       0.95   631      86      0.13
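
Worked check of the slide arithmetic

Several numbers on the slides can be reproduced with a few lines of arithmetic. The sketch below is not taken from the talk. It assumes that the FDR column of the Arabidopsis "Potential Deletions" SAM table is p0 x FALSE / Called, that sensitivity is the number of sequence-confirmed polymorphic SFP calls divided by all sequenced polymorphic features (the 100% Cereon marker accuracy column), and that the verification false discovery rate is non-polymorphic SFP calls divided by all sequenced SFP calls. The function names are hypothetical.

```python
def sam_fdr(p0, falsely_called, called):
    """Assumed SAM-style estimate: p0 * mean falsely called / number called."""
    return p0 * falsely_called / called


def sensitivity(confirmed_sfps, polymorphic_total):
    """Fraction of sequenced polymorphic features that were called as SFPs."""
    return confirmed_sfps / polymorphic_total


def verification_fdr(nonpolymorphic_sfps, sequenced_sfps):
    """Fraction of sequenced SFP calls that turned out non-polymorphic."""
    return nonpolymorphic_sfps / sequenced_sfps


def cost_per_sfp(replicates, cost_per_array, n_sfps):
    """Array cost spread over the number of SFPs scored (US dollars)."""
    return replicates * cost_per_array / n_sfps


# Rows of the "Potential Deletions" table: (Delta, p0, FALSE, Called).
potential_deletions = [
    (1.00, 0.95, 18865, 160145),
    (1.25, 0.95, 10477, 132390),
    (1.50, 0.95, 6545, 115042),
    (1.75, 0.95, 4484, 102385),
    (2.00, 0.95, 3298, 92027),
]

for delta, p0, false_calls, called in potential_deletions:
    print(f"Delta {delta:.2f}: FDR = {sam_fdr(p0, false_calls, called):.1%}")

# Sequence verification at the 5% and 18% SAM thresholds.
for label, confirmed, nonpoly, sfps in [("5%", 117, 4, 121), ("18%", 195, 28, 223)]:
    print(
        f"SAM threshold {label}: "
        f"sensitivity = {sensitivity(confirmed, 340):.0%}, "
        f"FDR = {verification_fdr(nonpoly, sfps):.0%}"
    )

# Cost per SFP for discovery (8 replicates) and typing (1 replicate).
print(f"Discovery: ${cost_per_sfp(8, 500, 80_000)} per SFP")
print(f"Typing:    ${cost_per_sfp(1, 500, 80_000)} per SFP")
```

Under these assumptions the printed values agree with the slides: 11.2%, 7.5%, 5.4%, 4.2% and 3.4% for the deletion table, 34% sensitivity with 3% FDR and 57% sensitivity with 13% FDR for the two verification tables, and $0.05 and $0.00625 per SFP.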