Toward the genetic basis of adaptation:
Arrays/Association Mapping
Justin Borevitz
Ecology & Evolution
University of Chicago
http://naturalvariation.org/
Arabidopsis thaliana
• Genome Sequence 2000 (120Mb)
• 20 strains by Perlegen, Weigel, Nordborg, Ecker
• ~1% sequence variation, LD extends 50-250kb
• ~3000 collected “inbred” lines, >50 RIL sets
• A. lyrata, Capsella rubella sister species JGI 2006
• >5300 Research Labs (17th annual conference)
• Field study data to come, Annie Schmidt et al
• 340k Sequence Indexed collection of KO lines
• Gene Expression Atlas >300 tissues, time points
• 15,000 full length cDNAs in recombination clones
Widely Distributed
Olivier Loudet
http://www.inra.fr/qtlat/NaturalVar/NewCollection.htm
Aranzana, et al PLOS genetics (2005), Sung Kim, Keyan Zhao
17k SNPs 96 lines
Local Population Variation
Ivan Baxter
Scott Hodges
Seasonal Variation
Matt Horton
Megan Dunning
Light Affects the Entire Plant Life Cycle
de-etiolation
hypocotyl
Seasons in the Growth Chamber
• Changing Day length
• Cycle Light Intensity
• Cycle Light Colors
• Cycle Temperature
[Figure: three panels plotted by month: Light Intensity (W/m2), Day Length (hours), and Temperature (degrees C). Series are labeled Sweden, Spain, and standard, with additional Sweden High, Sweden Low, Spain High, and Spain Low curves.]
Developmental Plasticity == Behavior
Talk Outline
• Arabidopsis Light Response
– PHYA, QTL mapping
• Whole Genome Tiling Arrays
– Alternative splicing/Methylation
– Single Feature Polymorphisms (SFPs)
– Potential deletions/ Copy Number Variants
– Genetic Mapping
• Resequencing/ Haplotypes
– Variation Scanning
• Aquilegia for Genetics of Adaptive Radiations
Quantitative Trait Loci
Tiling Arrays vs Resequencing Arrays
• AtTILE1, universal whole genome array
25mer every ~35bp, > 6.5 Million features
single array, many individuals.
• Re-sequencing array 120Mbp*8features
~1 Billion features, 8 wafers
20 Accessions available mid year
Perlegen, Max Planck (Weigel),
USC (Nordborg), Salk (Ecker)
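The feature counts quoted above can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch using the numbers on the slide (120 Mbp genome, one 25mer every ~35 bp, 8 features per base, 8 wafers); the variable names are made up for illustration. The slide's own figure of > 6.5 million tiling features is larger than the count of tiled start positions computed here, and the slide does not say how the remaining features are used.

```python
# Back-of-envelope feature counts, using only numbers quoted on the slide.
GENOME_BP = 120_000_000        # "120Mbp"
TILE_SPACING_BP = 35           # "25mer every ~35bp"
RESEQ_FEATURES_PER_BASE = 8    # "120Mbp*8features"
WAFERS = 8                     # "8 wafers"

# Number of 25mer start positions if one probe is placed every ~35 bp.
tiled_positions = GENOME_BP // TILE_SPACING_BP

# Resequencing array: 8 features for every base of the genome.
reseq_features = GENOME_BP * RESEQ_FEATURES_PER_BASE
features_per_wafer = reseq_features // WAFERS

print(f"tiled start positions: {tiled_positions:,}")
print(f"resequencing features: {reseq_features:,}")
print(f"features per wafer:    {features_per_wafer:,}")
```

The resequencing total of 960 million is what the slide rounds to "~1 Billion features".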
GeneChip
Which arrays should be used?
cDNA array
Long oligo array
Which 25mer arrays should be used?
Gene array
Exon array
Tiling array
Which 25mer arrays should be used?
SNP array
Resequencing array
Tiling/SNP array
Universal Whole Genome Array
RNA
Gene Discovery
Gene model correction
Non-coding/ micro-RNA
Antisense transcription
DNA
Chromatin Immunoprecipitation (ChIP chip)
Methylation
Transcriptome Atlas
Expression levels
Tissue specificity
Alternative Splicing
Polymorphism SFPs: Discovery/Genotyping
Comparative Genome Hybridization (CGH)
Insertion/Deletions
Control for hybridization/genetic polymorphisms
to understand true EXPRESSION polymorphisms
True cis variation == Allele Specific Expression
Alternative Splicing
Van
Col
VVVCCC
Xu Zhang
Potential Deletions
SFP detection on tiling arrays

Delta   p0     FALSE    Called    FDR
1.00    0.95   18865    160145    11.2%
1.25    0.95   10477    132390    7.5%
1.50    0.95   6545     115042    5.4%
1.75    0.95   4484     102385    4.2%
2.00    0.95   3298     92027     3.4%

             SFPs     total     %
Intergenic   60770    685575    8.86%
Exon         23519    665524    3.53%
intron       17216    301648    5.71%

SFPs/gene   genes
0           16322
>=1         9146
>=2         4304
>=3         2495
>=4         1687
>=5         1121
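The FDR values in the SFP detection numbers appear to follow from the other columns. Reading them as Delta, p0, FALSE (expected false calls) and Called, the listed percentages match FDR = p0 × FALSE / Called. That formula is an inference from the numbers, not something the slide states, and the function name below is made up for illustration.

```python
# Check the FDR column against the other columns of the SFP detection table.
# Assumption (not stated on the slide): FDR = p0 * FALSE / Called.
P0 = 0.95

# (Delta, FALSE, Called) as listed on the slide.
rows = [
    (1.00, 18865, 160145),
    (1.25, 10477, 132390),
    (1.50, 6545, 115042),
    (1.75, 4484, 102385),
    (2.00, 3298, 92027),
]

def estimated_fdr(p0, false_calls, called):
    """Estimated false discovery rate under the assumed formula."""
    return p0 * false_calls / called

fdr_percent = [round(100 * estimated_fdr(P0, f, c), 1) for _, f, c in rows]

for (delta, false_calls, called), pct in zip(rows, fdr_percent):
    print(f"Delta {delta:.2f}: {called} called, FDR {pct}%")
```

Raising the Delta threshold trades calls for accuracy: going from 1.00 to 2.00 drops the called SFPs from 160145 to 92027 while the estimated FDR falls from 11.2% to 3.4%.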
Methods for labeling
• Extract 100 ng genomic DNA (single leaf)
• Digest with either MspI or HpaII (CCGG)
• Label with biotin random primers
• Hybridize to array
• Fit model
methylated features and mSFPs
Enzyme effect, on CCGG features
GxE
mQTL?
>10,000 of 100,000 at 5% FDR
276 at 15% FDR
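The slide does not say which model is fit. One plausible reading is a two-way model per feature, with a genotype effect, an enzyme effect (MspI vs HpaII) and their interaction, taking "GxE" to mean genotype by enzyme. The sketch below works under that assumption for a balanced 2x2 design; the genotype labels "g1"/"g2", the function name, and the intensity values are invented for illustration.

```python
from statistics import mean

def two_by_two_effects(intensity):
    """Effects for one array feature in a balanced genotype x enzyme design.

    intensity maps (genotype, enzyme) -> list of replicate log intensities.
    Genotypes are the hypothetical labels "g1" and "g2".
    """
    m = {cell: mean(values) for cell, values in intensity.items()}
    g1_msp, g1_hpa = m[("g1", "MspI")], m[("g1", "HpaII")]
    g2_msp, g2_hpa = m[("g2", "MspI")], m[("g2", "HpaII")]
    return {
        # g2 minus g1, averaged over both digests.
        "genotype": ((g2_msp + g2_hpa) - (g1_msp + g1_hpa)) / 2,
        # HpaII minus MspI, averaged over both genotypes.
        "enzyme": ((g1_hpa + g2_hpa) - (g1_msp + g2_msp)) / 2,
        # How much the enzyme effect differs between genotypes.
        "interaction": (g2_hpa - g2_msp) - (g1_hpa - g1_msp),
    }

# Invented log intensities for a single CCGG feature, two replicates per cell.
example = {
    ("g1", "MspI"): [10.0, 10.2],
    ("g1", "HpaII"): [10.1, 9.9],
    ("g2", "MspI"): [10.0, 10.0],
    ("g2", "HpaII"): [8.0, 8.2],
}
effects = two_by_two_effects(example)
print(effects)
```

In a real analysis each effect would be tested against replicate variance across all features, with the FDR controlled as on the slide; this sketch only shows the contrasts.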
SFP Resequencing
• Advantages
– Discovery and typing tool
– Indels, rare variants, HMM tool
– Quantitative score
– Good for low polymorphism < 1%
• Caveats
– No SNP knowledge, synonymous?
– Bad for high polymorphism > 1%
• Rearrangements, Reference sequence
Chip genotyping of a Recombinant Inbred Line
29kb interval
Potential Deletions
>500 potential deletions
45 confirmed by Ler sequence
23 (of 114) transposons
Disease Resistance (R) gene clusters
Single R gene deletions
Genes involved in Secondary metabolism
Unknown genes
Potential Deletions Suggest Candidate Genes
FLM natural deletion
FLOWERING1 QTL
Chr1 (bp)
FLM
Flowering Time QTL caused by a natural deletion in FLM
(Werner et al PNAS 2005)
Natural Variation on Tiling Arrays
Map bibb
100 bibb mutant plants
100 wt mutant plants
Array Mapping
Hazen et al Plant Physiology 2005
eXtreme Array Mapping
[Figure: Histogram of Kas/Col RILs, Red light; x-axis hypocotyl length (mm), y-axis counts.]
15 tallest RILs pooled vs 15 shortest RILs pooled
eXtreme Array Mapping
Drosophila, Chao-Qiang Lai -Tufts University
Allele frequencies determined by SFP genotyping. Thresholds set by simulations
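Setting a threshold by simulation can be sketched as follows. The slide does not describe the simulation actually used, so this is a minimal illustration under stated assumptions: each pooled inbred line carries either parental allele with probability 0.5 at an unlinked marker, pools hold 15 lines each (as in the tallest vs shortest RIL pools), and the threshold is a quantile of the absolute allele-frequency difference between pools. Function names and the 0.99 quantile are made up.

```python
import random

def pool_frequency(genotypes):
    """Fraction of lines in a pool carrying allele 1 (lines coded 0 or 1)."""
    return sum(genotypes) / len(genotypes)

def null_threshold(pool_size, n_sim, quantile, rng):
    """Quantile of |freq(tall) - freq(short)| at a marker unlinked to the trait."""
    diffs = []
    for _ in range(n_sim):
        tall = [rng.randint(0, 1) for _ in range(pool_size)]
        short = [rng.randint(0, 1) for _ in range(pool_size)]
        diffs.append(abs(pool_frequency(tall) - pool_frequency(short)))
    diffs.sort()
    return diffs[min(n_sim - 1, int(quantile * n_sim))]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
threshold = null_threshold(pool_size=15, n_sim=2000, quantile=0.99, rng=rng)

# A marker tightly linked to the QTL: the pools are fixed for opposite alleles.
linked_diff = abs(pool_frequency([1] * 15) - pool_frequency([0] * 15))
print(f"null threshold: {threshold:.3f}, linked marker difference: {linked_diff:.3f}")
```

On the array the pool allele frequencies come from hybridization signal at SFPs rather than from individual genotypes, and a genome-wide threshold would also have to account for linkage between neighboring markers; both are left out here.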
RED2 QTL 12cM
[Figure: LOD score (0-16) along Chromosome 2 (0-100 cM), with the RED2 QTL peak marked.]
Composite Interval Mapping
Red light QTL RED2 from 100 Kas/Col RILs
Transcriptome Atlas
Improved Genome Annotation
[Figure: annotation tracks along Chromosome (bp) showing ORFa, ORFb, start, conservation, SFP, SNP, and deletion features.]
Array Haplotyping
• What about Diversity/selection across the genome?
• A genome wide estimate of population genetics parameters, θw, π, Tajima’s D, ρ
• LD decay, Haplotype block size
• Deep population structure?
• Col, Lz, Bur, Ler, Bay, Shah, Cvi, Kas, C24, Est, Kin, Mt, Nd, Sorbo, Van, Ws2, Fl-1, Ita-0, Mr-0, St-0, Sah-0
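Three of the parameters named above (θw, π, Tajima's D) can be computed from allele counts at segregating sites with the standard Watterson and Tajima (1989) formulas. The sketch below assumes each site is scored as the number of lines carrying the non-reference allele; the function name and the toy counts are invented. Because SFP calls are not complete sequence, a statistic computed this way from array data is only "Tajima's D like". ρ is not covered here.

```python
from math import sqrt

def diversity_stats(site_counts, n):
    """Return (theta_w, pi, tajimas_d) for a window, as totals over its sites.

    site_counts: for each segregating site, how many of the n sampled lines
                 carry the non-reference allele (each value in 1..n-1).
    """
    s = len(site_counts)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = s / a1
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in site_counts)
    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return theta_w, pi, d

N_LINES = 8
# Toy windows: ten sites that are all singletons vs ten at intermediate frequency.
theta_single, pi_single, d_single = diversity_stats([1] * 10, N_LINES)
theta_mid, pi_mid, d_mid = diversity_stats([4] * 10, N_LINES)
print(f"singletons:   D = {d_single:.2f}")
print(f"intermediate: D = {d_mid:.2f}")
```

An excess of rare variants pushes D negative and an excess of intermediate-frequency variants pushes it positive, which is why the statistic is used to scan windows for departures from neutrality.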
Array Haplotyping
Chromosome1 ~500kb
Inbred lines
Low effective recombination due to partial selfing
Extensive LD blocks
Col Ler Cvi Kas Bay Shah Lz Nd
SFPs for reverse genetics
14 Accessions 30,950 SFPs
http://naturalvariation.org/sfp
Chromosome Wide Diversity
Diversity 50kb windows
Tajima’s D like 50kb windows
RPS4
unknown
R genes vs bHLH
[Figure: histogram of a Tajima's D like statistic (x-axis, bins from (-1,-0.8] to (0.6,0.8]) against frequency (y-axis, 0-70) for R genes and bHLH genes; annotated "Selection".]
Experimental Design of Association Study
• Sample > 3000 wild strains, ~100 SNPs
• Select 500 less structured reference fine mapping set for SFP resequencing
• Scan Genome for variation/selection
• Measure phenotype in Seasonal Chambers
• Haplotype map/ LD recombination blocks
• Associate Quantitative phenotypes with HapMap
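The last step of the design, associating a quantitative phenotype with a marker or haplotype, can be illustrated with a single-marker permutation test. This is a minimal sketch, not the planned analysis: it ignores population structure (the reason the design selects a less structured set), tests one biallelic marker, and uses invented phenotype values and made-up function names.

```python
import random
from statistics import mean

def allele_effect(phenotypes, alleles):
    """Mean phenotype of lines with allele 1 minus lines with allele 0."""
    group0 = [p for p, a in zip(phenotypes, alleles) if a == 0]
    group1 = [p for p, a in zip(phenotypes, alleles) if a == 1]
    return mean(group1) - mean(group0)

def permutation_p_value(phenotypes, alleles, n_perm, rng):
    """Two-sided p-value from shuffling phenotypes across lines."""
    observed = abs(allele_effect(phenotypes, alleles))
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(allele_effect(shuffled, alleles)) >= observed:
            hits += 1
    # Add one so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Invented data: 12 lines, marker allele per line, hypocotyl length in mm.
alleles = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
lengths = [8.1, 8.4, 7.9, 8.6, 8.2, 8.0, 11.5, 12.1, 11.8, 12.4, 11.9, 12.0]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
effect = allele_effect(lengths, alleles)
p_value = permutation_p_value(lengths, alleles, n_perm=2000, rng=rng)
print(f"allele effect: {effect:.2f} mm, permutation p-value: {p_value:.4f}")
```

With many markers tested across the genome, the per-marker p-values would still need a multiple-testing correction.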
Aquilegia (Columbines)
Recent adaptive radiation, 350Mb genome
Species with > 20k ESTs, 11/14/2003
Animal lineage: good coverage
Plant lineage: crop plant coverage
Aquilegia (Columbines)
• 300 F3 RILs growing (Evadne Smith)
• TIGR gene index 85,000 ESTs >16,00 SNPs
• Complete BAC physical map Clemson
• Nimblegen arrays
Genetics of Speciation along a Hybrid Zone
NSF Genome Complexity
• Microarray development
– QTL candidates
• Physical Map (BAC tiling path)
– Physical assignment of ESTs
• QTL for pollinator preference
– ~400 RILs, map abiotic stress
– QTL fine mapping/ LD mapping
• Develop transformation techniques
– VIGS
• Whole Genome Sequencing (JGI?)
Scott Hodges (UCSB)
Elena Kramer (Harvard)
Magnus Nordborg (USC)
Justin Borevitz (U Chicago)
Jeff Tompkins (Clemson)
NaturalVariation.org
University of Chicago
USC
Magnus Nordborg
Paul Marjoram
Max Planck
Detlef Weigel
Scripps
Sam Hazen
University of Michigan
Sebastian Zollner
Xu Zhang
Evadne Smith
Ken Okamoto
Michigan State
Shinhan Shui
Purdue
Ivan Baxter
University of Guelph, Canada
Dave Wolyn
Sainsbury Laboratory
Jonathan Jones