Toward the genetic basis of adaptation using arrays
Justin Borevitz
Ecology & Evolution
University of Chicago
http://naturalvariation.org
Light Affects the Entire Plant Life Cycle
de-etiolation
hypocotyl
Local Population Variation
Ivan Baxter
Scott Hodges
Seasonal Variation
Matt Horton
Megan Dunning
Seasons in the Growth Chamber
• Changing Day length
• Cycle Light Intensity
• Cycle Light Colors
• Cycle Temperature
[Figure: three panels plotted by month for Sweden, Spain, and standard chamber settings: Light Intensity (W/m2), Day Length (hours), and Temperature (degrees C; high and low curves for Sweden and Spain)]
Talk Outline
• Natural Variation in Light Response
• Single Feature Polymorphisms (SFPs)
– Potential deletions
– Bulk segregant/ eXtreme Mapping
• Barley RNA SFPs
• Aquilegia
Light Affects the Entire Plant Life Cycle
Light response variation can be seen under constant conditions in the lab
Quantitative Trait Loci
Which arrays should be used?
• Spotted oligo arrays, Arizona: 29,000 70mers
• ATH1, Affymetrix expression GeneChip: 202,806 unique 25 bp oligonucleotide features
• AtTILE1, universal whole genome array: every ~35 bp, >3 million PM features
• Re-sequencing array: 120 Mbp × 8 features
– 20 accessions, Perlegen
– Max Planck (Weigel), USC (Nordborg)
GeneChip
Which arrays should be used?
cDNA array
Long oligo array
Which 25mer arrays should be used?
Gene array
Exon array
Tiling array
Universal Whole Genome Array
RNA
Gene Discovery
Gene model correction
Non-coding/ micro-RNA
Antisense transcription
DNA
Chromatin Immunoprecipitation (ChIP chip)
Methylation
Transcriptome Atlas
Expression levels
Tissue specificity
Alternative Splicing
Polymorphism SFPs
Discovery/Genotyping
Comparative Genome Hybridization (CGH)
Insertion/Deletions
~35 bp tile, “good” binding oligos, non-repetitive regions, evenly spaced
Transcriptome Atlas
Improved Genome Annotation
[Figure: array signal along Chromosome (bp) over a gene model with ORFa, ORFb, start, and conservation tracks; features marked as SFP, SNP, and deletion]
Potential Deletions
Delta   p0     FALSE   Called   FDR
1.00    0.95   18865   160145   11.2%
1.25    0.95   10477   132390   7.5%
1.50    0.95   6545    115042   5.4%
1.75    0.95   4484    102385   4.2%
2.00    0.95   3298    92027    3.4%
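The FDR column above appears consistent with the usual SAM estimate, FDR = p0 × FALSE / Called. The sketch below checks that reading against the five rows; the function name `sam_fdr` is illustrative, and the formula is an assumption inferred from the numbers rather than something stated on the slide.

```python
# Rows transcribed from the slide: (Delta, p0, FALSE, Called).
rows = [
    (1.00, 0.95, 18865, 160145),
    (1.25, 0.95, 10477, 132390),
    (1.50, 0.95, 6545, 115042),
    (1.75, 0.95, 4484, 102385),
    (2.00, 0.95, 3298, 92027),
]

def sam_fdr(p0, false_calls, called):
    """Assumed estimate: p0 * (mean falsely called features) / (features called)."""
    return p0 * false_calls / called

# FDR as a percentage, rounded to one decimal like the slide.
fdr_percent = [round(100 * sam_fdr(p0, f, c), 1) for _, p0, f, c in rows]

for (delta, _, _, _), pct in zip(rows, fdr_percent):
    print(f"Delta {delta:.2f}: FDR {pct}%")
```

Raising Delta trades sensitivity for specificity: fewer features are called, and a smaller share of them are expected to be false.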
False Discovery and Sensitivity
Cereon may be a sequencing error
TIGR match is a match
PM only

SAM threshold: 5% FDR

                  Total   SFPs    nonSFPs
GeneChip                  3806    89118
Sequence          817     121     696
Polymorphic       340     117     223
Non-polymorphic   477     4       473

Cereon marker accuracy   100%   90%   80%   70%
Sensitivity              34%    41%   53%   85%

False Discovery rate: 3%
Test for independence of all factors:
Chisq = 177.34, df = 1, p-value = 1.845e-40
SAM threshold: 18% FDR

                  Total   SFPs    nonSFPs
GeneChip                  10627   82297
Sequence          817     223     594
Polymorphic       340     195     145
Non-polymorphic   477     28      449

Cereon marker accuracy   100%   90%   80%   70%
Sensitivity              57%    67%   85%   100%

False Discovery rate: 13%
Test for independence of all factors:
Chisq = 265.13, df = 1, p-value = 1.309e-59
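The sensitivity (at 100% marker accuracy), false discovery rate, and chi-square values on these two slides can be recomputed from the four sequence-verified counts. The sketch below assumes a Pearson chi-square without continuity correction, which is what reproduces the reported statistics; `summarize_2x2` and its argument names are illustrative.

```python
import math

def summarize_2x2(tp, fn, fp, tn):
    """tp/fn: sequence-polymorphic features called SFP / nonSFP.
    fp/tn: sequence-non-polymorphic features called SFP / nonSFP."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # polymorphic features detected as SFPs
    fdr = fp / (tp + fp)           # SFP calls not supported by sequence
    # Pearson chi-square for a 2x2 table, no continuity correction.
    chi2 = n * (tp * tn - fn * fp) ** 2 / (
        (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return sensitivity, fdr, chi2, p_value

strict = summarize_2x2(117, 223, 4, 473)     # SAM threshold 5% FDR
relaxed = summarize_2x2(195, 145, 28, 449)   # SAM threshold 18% FDR

for name, (sens, fdr, chi2, p) in (("5% FDR", strict), ("18% FDR", relaxed)):
    print(f"{name}: sensitivity={sens:.0%} FDR={fdr:.0%} chisq={chi2:.2f} p={p:.3e}")
```

The sensitivities at 90%, 80%, and 70% marker accuracy are not recomputed here, because the slides do not say how the adjustment was made.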
3/4 Cvi markers were also confirmed in PHYB
Chip genotyping of a Recombinant Inbred Line
29kb interval
Discovery: 8 replicates × $500 / 80,000 SFPs = $0.05 per SFP
Typing: 1 replicate × $500 / 80,000 SFPs = $0.00625 per SFP
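The per-SFP costs are array cost times replicates, divided by the number of SFPs scored. A small sketch with the slide's figures follows; the constant and function names are illustrative.

```python
ARRAY_COST_USD = 500   # cost per hybridization, from the slide
N_SFPS = 80_000        # SFPs scored per experiment, from the slide

def cost_per_sfp(replicates, array_cost=ARRAY_COST_USD, n_sfps=N_SFPS):
    """Cost in dollars attributed to each SFP."""
    return replicates * array_cost / n_sfps

discovery = cost_per_sfp(8)   # 8 replicates for discovery
typing = cost_per_sfp(1)      # 1 replicate for genotyping a line

print(f"discovery: ${discovery:.5f} per SFP, typing: ${typing:.5f} per SFP")
```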
Map bibb
100 bibb mutant plants
100 wt mutant plants
Array Mapping
Hazen et al Plant Physiology 2005
eXtreme Array Mapping
[Figure: Histogram of Kas/Col RILs, red light; x-axis: hypocotyl length (mm), y-axis: counts]
15 tallest RILs pooled vs 15 shortest RILs pooled
eXtreme Array Mapping
Allele frequencies determined by SFP genotyping. Thresholds set by simulations.
RED2 QTL 12cM
[Figure: LOD (0 to 16) vs. position (0 to 100 cM) on Chromosome 2, with the RED2 QTL peak]
Composite Interval Mapping
Red light QTL RED2 from 100 Kas/Col RILs (Wolyn et al Genetics 2004)
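The talk does not describe how the simulations were run. One simple way to set a threshold is sketched here under stated assumptions: each RIL is homozygous for one parental allele with probability 0.5 at an unlinked locus, the pools hold 15 lines each, and the statistic is the absolute allele-frequency difference between the tall and short pools. This gives a per-marker null threshold only; a genome-wide threshold would also need to account for the many linked markers tested. The function name and defaults are illustrative.

```python
import random

def null_threshold(pool_size=15, n_sim=10_000, alpha=0.05, seed=0):
    """Simulate |freq(tall pool) - freq(short pool)| with no linked QTL
    and return the (1 - alpha) quantile of that null distribution."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sim):
        # Each RIL carries allele A with probability 0.5 under the null.
        tall = sum(rng.random() < 0.5 for _ in range(pool_size)) / pool_size
        short = sum(rng.random() < 0.5 for _ in range(pool_size)) / pool_size
        diffs.append(abs(tall - short))
    diffs.sort()
    index = min(n_sim - 1, int(round((1 - alpha) * n_sim)))
    return diffs[index]

threshold = null_threshold()
print(f"per-marker threshold on |allele frequency difference|: {threshold:.3f}")
```

Markers whose pooled allele-frequency difference exceeds the threshold would be candidates for linkage to a QTL such as RED2.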
Potential Deletions
>500 potential deletions
45 confirmed by Ler sequence
23 (of 114) transposons
Disease Resistance (R) gene clusters
Single R gene deletions
Genes involved in Secondary metabolism
Unknown genes
Potential Deletions Suggest Candidate Genes
FLM natural deletion
FLOWERING1 QTL
Chr1 (bp)
FLM
Flowering Time QTL caused by a natural deletion in FLM
(Werner et al PNAS 2005)
Fast Neutron deletions
FKF1 80kb deletion CHR1
Het
cry2 10kb deletion CHR1
Natural Variation on Tiling Arrays
Review
• Single Feature Polymorphisms (SFPs) can be used to
• Identify recombination breakpoints
• eXtreme Array Mapping
• Potential deletions (candidate genes)
• Haplotyping
• Diversity/Selection
• Association Mapping
Complex, Large Genomes?
• Signal to Noise with Large Genomes
• RNA, less complex, but differential expression
• Barley SFPs
Barley SFPs
RNA 2 genotypes, 18 replicates
False Discovery Rate RNA
RNA hybridization 17 Golden Promise 19 Morex, 6 tissues
SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances
s0 = 0.0342 (The 5 % quantile of the s values.)
Number of permutations: 500
MEAN number of falsely called genes is computed.

Delta   p0     Called   FALSE   FDR
0.5     0.95   27159    5884    0.206
1.0     0.95   17744    594     0.032
1.5     0.95   13285    65      0.005
2.0     0.95   10504    7       0.001
2.5     0.95   8583     0       0.000
Sequence Verification of SFPs
RNA

                  Total   mxSFP   nonSFP   gpSFP
GeneChip                  5301    240307   5203
Sequence
MX                178     115     45       18
Nonpolymorphic    2200    27      2045     128
GP                223     7       61       155

Chisq = 2049.2, df = 4, p-value = 0
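The reported chi-square can be checked from the nine sequence-verified counts. The assignment of counts to cells below is a reading of the flattened slide text, chosen because each row sums to its stated total (178, 2200, 223); treat it as an assumption. The helper `chi_square` is an illustrative Pearson test of independence written from scratch.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: sequence class (MX, Nonpolymorphic, GP).
# Columns: GeneChip call (mxSFP, nonSFP, gpSFP).
counts = [
    [115, 45, 18],
    [27, 2045, 128],
    [7, 61, 155],
]

chi2, df = chi_square(counts)
# Share of sequenced features where the array call and the sequence agree.
agreement = sum(counts[i][i] for i in range(3)) / sum(map(sum, counts))
print(f"Chisq = {chi2:.1f}, df = {df}, agreement = {agreement:.3f}")
```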
Position of SNP
Aquilegia (Columbines)
Recent adaptive radiation, 350Mb genome
Species with > 20k ESTs
11/14/2003
Animal lineage:
good coverage
Plant lineage:
crop plant coverage
Aquilegia (Columbines)
• 300 F3 RILs growing (Evadne Smith)
• 85,000 5’ 3’ ESTs -- 51,000 clones, >16,00 SNPs
• TIGR gene index and GenBank
• arrays being designed by Nimblegen
Genetics of Speciation
along a Hybrid Zone
NSF Genome Complexity
• Physical Map (BAC tiling path)
– Physical assignment of ESTs
• QTL for pollinator preference
– ~400 RILs, map abiotic stress
– QTL fine mapping/ LD mapping
• Develop transformation techniques
• http://www.AQgenome.org
Scott Hodges (UCSB)
Elena Kramer (Harvard)
Magnus Nordborg (USC)
Justin Borevitz (U Chicago)
Jeff Tompkins (Clemson)
NaturalVariation.org
University of Chicago
Salk
Jon Werner
Joanne Chory
Joseph Ecker
Max Planck
Detlef Weigel
UC San Diego
Charles Berry
Scripps
Sam Hazen
Elizabeth Winzeler
Xu Zhang
Evadne Smith
Ken Okamoto
Purdue
Ivan Baxter
UC Davis
Julin Maloof
University of Guelph, Canada
Dave Wolyn
Sainsbury Laboratory
Jonathan Jones
Barley SFPs
Genomic DNA
3 genotypes
3 replicates
False Discovery Rate DNA
Genomic DNA hybridization 3 replicates 3 genotypes
SAM Analysis for the Multi-Class Case with 3 Classes
s0 = 0.0123 (The 25 % quantile of the s values.)
Number of permutations: 100
MEAN number of falsely called genes is computed.
Delta   p0     Called   FALSE   FDR
1       0.95   4017     2073    0.47
2       0.95   1728     583     0.31
3       0.95   1090     258     0.22
4       0.95   789      139     0.16
5       0.95   631      86      0.13