Toward the genetic basis of adaptation: Arrays/Association Mapping
Justin Borevitz
Ecology & Evolution, University of Chicago
http://naturalvariation.org/

Widely Distributed
Olivier Loudet, http://www.inra.fr/qtlat/NaturalVar/NewCollection.htm
Aranzana et al., PLoS Genetics (2005); Sung Kim, Keyan Zhao
17k SNPs, 96 lines

Local Population Variation
Ivan Baxter, Scott Hodges

Seasonal Variation
Matt Horton, Megan Dunning

Light Affects the Entire Plant Life Cycle
[Figure labels: de-etiolation, hypocotyl]

Seasons in the Growth Chamber
• Changing Day Length
• Cycle Light Intensity
• Cycle Light Colors
• Cycle Temperature
[Figure: three panels plotted by month: Light Intensity (W/m2), Day Length (hours), and Temperature (degrees C). Series: standard, Sweden, and Spain, with High and Low curves for Sweden and Spain.]
Developmental Plasticity == Behavior

Talk Outline
• Arabidopsis Light Response
  – PHYA, QTL mapping
• Whole Genome Tiling Arrays
  – Alternative splicing/Methylation
  – Single Feature Polymorphisms (SFPs)
  – Potential deletions/Copy Number Variants
  – Genetic Mapping
• Resequencing/Haplotypes
  – Variation Scanning
• Aquilegia for Genetics of Adaptive Radiations

Quantitative Trait Loci

Tiling Arrays vs Resequencing Arrays
• AtTILE1, universal whole genome array: 25mer every ~35bp, > 6.5 million features, single array, many individuals.
• Resequencing array: 120 Mbp × 8 features, ~1 billion features, 8 wafers. 20 accessions available mid year. Perlegen, Max Planck (Weigel), USC (Nordborg), Salk (Ecker)

GeneChip

Which arrays should be used?
• cDNA array
• Long oligo array

Which 25mer arrays should be used?
• Gene array
• Exon array
• Tiling array

Which 25mer arrays should be used?
• SNP array
• Resequencing array
• Tiling/SNP array

Universal Whole Genome Array
RNA
• Gene Discovery
  – Gene model correction
  – Non-coding/micro-RNA
  – Antisense transcription
• Transcriptome Atlas
  – Expression levels
  – Tissue specificity
• Alternative Splicing
DNA
• Chromatin Immunoprecipitation (ChIP chip)
• Methylation
• Polymorphism: SFPs
  – Discovery/Genotyping
• Comparative Genome Hybridization (CGH)
  – Insertions/Deletions
Control for hybridization/genetic polymorphisms to understand true EXPRESSION polymorphisms.
True cis variation == Allele Specific Expression

Alternative Splicing
[Figure: Van vs Col samples (VVVCCC). Credit: Xu Zhang]

Potential Deletions

SFP detection on tiling arrays

Delta   p0     FALSE    Called    FDR
1.00    0.95   18865    160145    11.2%
1.25    0.95   10477    132390    7.5%
1.50    0.95   6545     115042    5.4%
1.75    0.95   4484     102385    4.2%
2.00    0.95   3298     92027     3.4%

              SFPs     total     %
Intergenic    60770    685575    8.86%
Exon          23519    665524    3.53%
Intron        17216    301648    5.71%

SFPs/gene   0       >=1    >=2    >=3    >=4    >=5
genes       16322   9146   4304   2495   1687   1121

Methods for labeling
• Extract 100 ng genomic DNA (single leaf)
• Digest with either MspI or HpaII (CCGG)
• Label with biotin random primers
• Hybridize to array
• Fit model: methylated features and mSFPs

Enzyme effect, on CCGG features
GxE
mQTL?
>10,000 of 100,000 at 5% FDR
276 at 15% FDR

SFP Resequencing
• Advantages
  – Discovery and typing tool
  – Indels, rare variants, HMM tool
  – Quantitative score
  – Good for low polymorphism < 1%
• Caveats
  – No SNP knowledge, synonymous?
  – Bad for high polymorphism > 1%
    • Rearrangements, reference sequence

Chip genotyping of a Recombinant Inbred Line
29kb interval

Potential Deletions
• >500 potential deletions
• 45 confirmed by Ler sequence
• 23 (of 114) transposons
• Disease Resistance (R) gene clusters
• Single R gene deletions
• Genes involved in secondary metabolism
• Unknown genes

Potential Deletions Suggest Candidate Genes
[Figure: FLM natural deletion under the FLOWERING1 QTL, plotted along Chr1 (bp)]
FLM: flowering time QTL caused by a natural deletion in FLM (Werner et al., PNAS 2005)

Natural Variation on Tiling Arrays

Map bibb
• 100 bibb mutant plants
• 100 wt mutant plants
Array Mapping: Hazen et al., Plant Physiology 2005

eXtreme Array Mapping
[Figure: Histogram of Kas/Col RILs, red light. x-axis: hypocotyl length (mm); y-axis: counts]
15 tallest RILs pooled vs 15 shortest RILs pooled

eXtreme Array Mapping
Drosophila, Chao-Qiang Lai, Tufts University
Allele frequencies determined by SFP genotyping. Thresholds set by simulations.
[Figure: LOD vs position (cM) on Chromosome 2, showing the RED2 QTL, 12 cM]
Composite Interval Mapping: red light QTL RED2 from 100 Kas/Col RILs

Transcriptome Atlas

Improved Genome Annotation
[Figure: annotation tracks along Chromosome (bp). Labels: ORFa, ORFb, start, conservation, SFP, SNP, deletion]

Array Haplotyping
• What about diversity/selection across the genome?
• A genome-wide estimate of population genetics parameters: θw, π, Tajima's D, ρ
• LD decay, haplotype block size
• Deep population structure?
• Col, Lz, Bur, Ler, Bay, Shah, Cvi, Kas, C24, Est, Kin, Mt, Nd, Sorbo, Van, Ws2
  Fl-1, Ita-0, Mr-0, St-0, Sah-0

Array Haplotyping
Chromosome 1, ~500kb
• Inbred lines
• Low effective recombination due to partial selfing
• Extensive LD blocks
[Figure: haplotype tracks for Col, Ler, Cvi, Kas, Bay, Shah, Lz, Nd]

SFPs for reverse genetics
14 accessions, 30,950 SFPs
http://naturalvariation.org/sfp

Chromosome Wide Diversity
[Figure: diversity in 50kb windows and a Tajima's D like statistic in 50kb windows. Labels: RPS4, unknown]

Selection: R genes vs bHLH
[Figure: histogram of a Tajima's D like statistic for R genes and bHLH genes. y-axis: frequency]

Experimental Design of Association Study
• Sample > 3000 wild strains, ~100 SNPs
• Select 500 less structured reference fine mapping set for SFP resequencing
• Scan genome for variation/selection
• Measure phenotype in Seasonal Chambers
• Haplotype map/LD recombination blocks
• Associate quantitative phenotypes with HapMap

Aquilegia (Columbines)
Recent adaptive radiation, 350Mb genome

Species with > 20k ESTs (11/14/2003)
• Animal lineage: good coverage
• Plant lineage: crop plant coverage

Aquilegia (Columbines)
• 300 F3 RILs growing (Evadne Smith)
• TIGR gene index: 85,000 ESTs, >16,00 SNPs
• Complete BAC physical map (Clemson)
• Nimblegen arrays

Genetics of Speciation along a Hybrid Zone

NSF Genome Complexity
• Microarray development
  – QTL candidates
• Physical Map (BAC tiling path)
  – Physical assignment of ESTs
• QTL for pollinator preference
  – ~400 RILs, map abiotic stress
  – QTL fine mapping/LD mapping
• Develop transformation techniques
  – VIGS
• Whole Genome Sequencing (JGI?)
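
Sketch: Tajima's D in windows
The population-genetic summaries named on the Array Haplotyping slides (θw, π, Tajima's D) can be computed per window from a 0/1 matrix of polymorphism calls. The sketch below is a minimal illustration using the classical Tajima's D formula, which is not necessarily the "Tajima's D like" statistic behind the slides. Treating SFP calls as biallelic 0/1 genotypes of inbred lines is an assumption, and the names tajimas_d and windowed_tajimas_d are hypothetical, introduced only for this example.

```python
from math import sqrt


def tajimas_d(genotypes):
    """Classical Tajima's D for equal-length 0/1 rows (one row per inbred line)."""
    n = len(genotypes)
    if n < 4:
        raise ValueError("need at least 4 lines; the variance term is zero for n < 4")
    # Allele count per site; a site is segregating if both alleles are present.
    counts = [sum(col) for col in zip(*genotypes)]
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return None  # D is undefined without segregating sites
    # pi: mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))


def windowed_tajimas_d(positions, genotypes, window=50_000):
    """Group sites into fixed-size windows by position (bp) and compute D per window."""
    bins = {}
    for j, pos in enumerate(positions):
        bins.setdefault(pos // window * window, []).append(j)
    return {
        start: tajimas_d([[row[j] for j in cols] for row in genotypes])
        for start, cols in sorted(bins.items())
    }


# Toy data (made up): 4 lines, 3 sites, every variant a singleton.
singletons = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(tajimas_d(singletons))
print(windowed_tajimas_d([10, 20, 60_000], singletons))
```

An excess of rare variants, as in the toy data, gives a negative D; an excess of intermediate-frequency variants gives a positive D. The default window of 50,000 bp mirrors the 50kb windows named on the diversity slide.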
Scott Hodges (UCSB), Elena Kramer (Harvard), Magnus Nordborg (USC), Justin Borevitz (U Chicago), Jeff Tompkins (Clemson)

NaturalVariation.org
• University of Chicago: Xu Zhang, Evadne Smith, Ken Okamoto
• USC: Magnus Nordborg, Paul Marjoram
• Max Planck: Detlef Weigel
• Scripps: Sam Hazen
• University of Michigan: Sebastian Zollner
• Michigan State: Shinhan Shui
• Purdue: Ivan Baxter
• University of Guelph, Canada: Dave Wolyn
• Sainsbury Laboratory: Jonathan Jones
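
Sketch: checking the FDR column of the SFP detection table
The SFP detection slide reports Delta, p0, FALSE, Called, and FDR but does not say how FDR was derived. The reported percentages are consistent with FDR = p0 × FALSE / Called, so the sketch below recomputes the column under that assumption. Reading FALSE as the expected number of false calls at each threshold is also an assumption, and the helper name fdr is hypothetical.

```python
# Rows copied from the SFP detection slide: (delta, p0, FALSE, Called).
ROWS = [
    (1.00, 0.95, 18865, 160145),
    (1.25, 0.95, 10477, 132390),
    (1.50, 0.95, 6545, 115042),
    (1.75, 0.95, 4484, 102385),
    (2.00, 0.95, 3298, 92027),
]


def fdr(p0, false_calls, called):
    """Assumed estimate: share of called features expected to be false, scaled by p0."""
    return p0 * false_calls / called


# Percentages rounded to one decimal, as on the slide.
fdr_percent = [round(100 * fdr(p0, f, c), 1) for _, p0, f, c in ROWS]

for (delta, _, _, called), pct in zip(ROWS, fdr_percent):
    print(f"delta={delta:.2f}  called={called}  FDR={pct}%")
```

Raising the Delta threshold from 1.00 to 2.00 cuts the number of called features from 160145 to 92027 while the estimated FDR falls from 11.2% to 3.4%, which is the trade-off the table is showing.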