Download Presentation #2 - UCLA Human Genetics

HG236B, April 30, 2010 (Lusis)
Mouse Genetics: Gene Mapping
1. Mendelian traits
2. Quantitative trait locus mapping
3. Fine mapping
4. Association analysis
Genome Scan
[Figure: genetic markers spaced along mouse chromosomes 1 to 19, X and Y]
Breeding Strategies for Mapping Genes in Mice: Backcross
[Figure: Parental Strain #1 (1A/1A, 2A/2A) x Parental Strain #2 (1B/1B, 2B/2B) gives the F1 heterozygote (1A/1B, 2A/2B). F1 heterozygote x Parental Strain #2 gives the backcross progeny, which are classed as parental or recombinant.]
Linkage Analysis in a Backcross: An Example
Progeny class                                  Parental   Parental   Recombinant   Recombinant
Genotype (A locus, B locus)                    A/A, B/B   a/A, b/B   A/A, b/B      a/A, B/B
Expected in absence of linkage (total 100):    25         25         25            25
Observed:                                      30         32         20            18
Estimated distance: 38/100 = 38 cM (20 + 18 = 38 recombinants out of 100 progeny)
But are these data significant?
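Chi Squared Test
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
O_i = observed value in ith group
E_i = expected value in ith group
number of groups = n
degrees of freedom = n - 1
For the backcross example (r = recombinant, p = parental):
$\chi^2 = \frac{(\mathrm{obs}_r - \mathrm{exp}_r)^2}{\mathrm{exp}_r} + \frac{(\mathrm{obs}_p - \mathrm{exp}_p)^2}{\mathrm{exp}_p} = \frac{(38 - 50)^2}{50} + \frac{(62 - 50)^2}{50} = \frac{144}{50} + \frac{144}{50} = 5.76$
Number of degrees of freedom = one less than the number of potential outcome classes = 1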
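The arithmetic of the backcross example can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and the counts are the ones given in the example (20 + 18 recombinants, 30 + 32 parentals).

```python
def recombination_fraction(recombinant, parental):
    """Fraction of backcross progeny that are recombinant."""
    return recombinant / (recombinant + parental)


def chi_squared(observed, expected):
    """Pearson chi-squared statistic over paired observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts from the example: 20 + 18 recombinants, 30 + 32 parentals.
recombinant, parental = 20 + 18, 30 + 32
total = recombinant + parental

theta = recombination_fraction(recombinant, parental)
# In the absence of linkage, half the progeny are expected in each class.
chi2 = chi_squared([recombinant, parental], [total / 2, total / 2])

print(f"recombination fraction = {theta:.2f} (about {100 * theta:.0f} cM)")
print(f"chi-squared = {chi2:.2f} on 1 degree of freedom")
```

Pooling the four genotype classes into two (recombinant and parental) is what leaves a single degree of freedom.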
Chi-Square Distribution
Degrees of                                    Probability
Freedom     0.95   0.90   0.80   0.70   0.50   0.30   0.20   0.10   0.05   0.01   0.001
 1         0.004   0.02   0.06   0.15   0.46   1.07   1.64   2.71   3.84   6.64   10.83
 2          0.10   0.21   0.45   0.71   1.39   2.41   3.22   4.60   5.99   9.21   13.82
 3          0.35   0.58   1.01   1.42   2.37   3.66   4.64   6.25   7.82  11.34   16.27
 4          0.71   1.06   1.65   2.20   3.36   4.88   5.99   7.78   9.49  13.28   18.47
 5          1.14   1.61   2.34   3.00   4.35   6.06   7.29   9.24  11.07  15.09   20.52
 6          1.63   2.20   3.07   3.83   5.35   7.23   8.56  10.64  12.59  16.81   22.46
 7          2.17   2.83   3.82   4.67   6.35   8.38   9.80  12.02  14.07  18.48   24.32
 8          2.73   3.49   4.59   5.53   7.34   9.52  11.03  13.36  15.51  20.09   26.12
 9          3.32   4.17   5.38   6.39   8.34  10.66  12.24  14.68  16.92  21.67   27.88
10          3.94   4.86   6.18   7.27   9.34  11.78  13.44  15.99  18.31  23.21   29.59
Nonsignificant: probabilities above 0.05. Significant: probabilities of 0.05 and below.
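As a rough illustration of how the table is read, the sketch below looks up the statistic 5.76 from the backcross example against the 1 degree of freedom row, and also computes the exact tail probability. The helper names are illustrative. The exact formula relies on a standard fact that the slides do not state: a chi-squared variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math

# Critical values for 1 degree of freedom, copied from the table.
CRITICAL_1DF = {0.10: 2.71, 0.05: 3.84, 0.01: 6.64, 0.001: 10.83}


def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared statistic with 1 df.

    With 1 df the statistic is a squared standard normal, so
    P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


def smallest_level_exceeded(chi2, critical=CRITICAL_1DF):
    """Smallest tabulated probability whose critical value chi2 exceeds."""
    passed = [p for p, value in critical.items() if chi2 > value]
    return min(passed) if passed else None


chi2 = 5.76
level = smallest_level_exceeded(chi2)
p = p_value_1df(chi2)
print(f"chi2 = {chi2}: exceeds the critical value for P = {level}; exact P = {p:.3f}")
```

Since 5.76 lies between 3.84 and 6.64, the probability lies between 0.05 and 0.01, so the deviation from the no-linkage expectation is significant at the 0.05 level.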
Ikeda, et al. Nature, 30:401 (2002)
Inbred strains of mice differ in traits relevant to common diseases in humans
Naturally Occurring Mouse Models for Common Human Diseases
Disorder                     Strain
Alcoholism/drug addiction    C57BL/6
Arthritis                    MRL
Asthma                       A
Atherosclerosis              C57BL/6, DBA
Autoimmune disease           NZB, NZW
Cleft palate                 A
Deafness                     LP
Dental disease               C57BL/6, BALB/c
Diabetes, Type 1             NOD
Diabetes, Type 2             C57BL/6
Epilepsy                     EL, SWR
Hemolytic anemia             NZB
Hepatitis                    BALB/c
Hodgkin’s disease            SJL
Hypertension                 MA/My
Obesity                      Many strains
Osteoporosis                 DBA
Daily Average Food Intake Adjusted by Weight (g/30gBW)
Mapping Genes for a Complex Trait in a Cross between Two Strains of Mice
Strain B
Strain A
F1 Hybrids
F2 Intercross Mice
Hepatic fibrosis in 7 inbred strains and A x BALB/c F2 mice
Screen inbred strains for the trait of interest to identify those that differ the most.
Construct an F2 cross using the 2 extreme strains (A and BALB/c) to generate a large number of mice to map loci responsible for trait differences in the parental strains.
382 AxBALB/c F2
6 wk time point
GENOTYPE
PHENOTYPE
A backcross between two strains typed for a trait
The backcross mice were typed at a marker on Chr 1 and another on Chr 2
Linear regression model
y = βx + e
[Figure: y plotted against x, with the fitted line f(x)]
y = observed value (ex: weight = 2.2, 2, 4)
x = value of the predictive variable (ex: SNP genotype = AA, GG, AA); x is observed
β = slope, the expected change in y for a one-unit change in x
e = unobserved random variable that adds noise to the observed y (contributes to variation in y). Sometimes referred to as “error”, although it is not necessarily error
Mapping using linear regression
[Figure: simple case, phenotype y plotted against genotype x_i (A/A, A/G, G/G)]
y = βx_i + e
y = observed phenotype for each individual, ex: weight
x_i = genotype at a given marker
β = slope, the change in y per unit change in x_i; β = SNP effect size
e = remaining variation in phenotype y, not explained by x_i
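A minimal sketch of this regression in Python follows. Several things here are assumptions rather than part of the slides: the genotypes are coded as the number of G alleles (0, 1, 2), an intercept µ is included as in the later form y = µ + βx + ε, and the weights are made-up numbers.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = mu + beta * x; returns (mu, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


# Hypothetical data: genotype coded as the number of G alleles.
CODE = {"A/A": 0, "A/G": 1, "G/G": 2}
genotypes = ["A/A", "A/A", "A/G", "A/G", "G/G", "G/G"]
weights = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0]

x = [CODE[g] for g in genotypes]
mu, beta = fit_line(x, weights)
# e: the part of each phenotype that the genotype does not explain.
residuals = [w - (mu + beta * xi) for xi, w in zip(x, weights)]
print(f"intercept = {mu:.2f}, beta (effect per G allele) = {beta:.2f}")
```

The additive 0/1/2 coding forces the heterozygote to sit midway between the homozygotes; a dominance term would relax that.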
– With linear regression, a likelihood ratio is used, derived from the goodness of fit of the model with genetic effects included vs. that without:
• H1 : y = µ + β1(additive) + β2(dominant) + e (full model) vs
• H0 : y = µ + e (reduced model)
• The likelihood ratio $LR = n \log_e(\mathrm{RSS}_{reduced} / \mathrm{RSS}_{full})$
– RSS = residual sum of squares; n = # of observations
• LOD score = LR/4.61
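The LOD calculation can be sketched in Python as below. The data are hypothetical, and the full model is fitted as one mean per genotype class, which for three genotype classes gives the same residual sum of squares as fitting the additive and dominance terms. The divisor 4.61 is approximately 2 log_e(10), which converts the likelihood ratio to a base-10 log of the odds.

```python
import math


def rss_about_mean(values):
    """Residual sum of squares of `values` around their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def lod_score(phenotypes, genotypes):
    """LOD score at one marker, from LR = n * ln(RSS_reduced / RSS_full)."""
    n = len(phenotypes)
    # Reduced model (H0): a single mean for all individuals.
    rss_reduced = rss_about_mean(phenotypes)
    # Full model (H1): one mean per genotype class.
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    rss_full = sum(rss_about_mean(ys) for ys in groups.values())
    lr = n * math.log(rss_reduced / rss_full)
    return lr / 4.61


# Hypothetical F2 data at a single marker.
genotypes = ["A/A", "A/A", "A/A", "A/G", "A/G", "A/G", "G/G", "G/G", "G/G"]
weights = [20.0, 21.0, 22.0, 24.0, 25.0, 26.0, 28.0, 29.0, 30.0]
lod = lod_score(weights, genotypes)
print(f"LOD = {lod:.2f}")
```

The sketch does not guard against a full model that fits perfectly (RSS_full = 0), which real mapping software has to handle.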
Distribution of body weight and body weight QTL in B6 x BTBR ob/ob F2 cross
Stoehr et al. Diabetes 59:245 (2004)
Identifying QTL
• Interval mapping:
– “Simple interval mapping” incorporates marker map position and adjacency
– “Composite interval mapping” additionally incorporates background markers and is designed for detecting multiple QTL.
• A number of QTL mapping programs have been developed. (List at: http://www.mapmanager.org/qtsoftware.html)
– MAPMAKER/QTL
– Map Manager QTL
– QTL Cartographer
– R/qtl
“…we can effectively destroy any association between the trait values and the analysis points linked to the QTL by randomly shuffling the trait values, i.e., by reassigning each trait value to a new individual while retaining the individual’s genetic map.”
The standard error for an empirical p-value is the square root of p(1 − p)/N, where p is the empirical p-value and N is the number of permuted data sets. Thus, for example, 800 permuted data sets are sufficient to establish a standard error of 0.005 for an empirical p-value of 0.02, assuring us that it is well below the 0.05 significance level.
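The shuffling procedure and the standard error formula can be sketched as follows. This is an illustration under stated assumptions: the data are hypothetical backcross values at a single marker, and the test statistic is a simple difference in group means standing in for a LOD score. In a genome scan the statistic recorded for each shuffle would normally be the maximum over all markers.

```python
import math
import random


def standard_error(p, n_perm):
    """Standard error of an empirical p-value: sqrt(p * (1 - p) / N)."""
    return math.sqrt(p * (1 - p) / n_perm)


def mean_difference(trait, genotype):
    """Absolute difference in mean trait between genotype classes 0 and 1."""
    a = [t for t, g in zip(trait, genotype) if g == 0]
    b = [t for t, g in zip(trait, genotype) if g == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(trait, genotype, n_perm=800, seed=0):
    """Empirical p-value from shuffling trait values over fixed genotypes."""
    rng = random.Random(seed)
    observed = mean_difference(trait, genotype)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign each trait value to a new individual
        if mean_difference(shuffled, genotype) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical backcross: 10 mice of each genotype at one marker.
genotype = [0] * 10 + [1] * 10
trait = [20 + 0.5 * i for i in range(10)] + [24 + 0.5 * i for i in range(10)]

p = permutation_p_value(trait, genotype)
print(f"empirical p = {p:.4f}, standard error = {standard_error(p, 800):.4f}")
print(f"slide example: SE for p = 0.02, N = 800 is {standard_error(0.02, 800):.4f}")
```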
QTL analysis is highly reproducible
Estimating QTL effect size in crosses
• Total trait (y) variance is the sum of genetic and environmental components, determined in the F2 mice by:
Variance = $s^2 = \sum_i (y_i - \bar{y})^2 / (n - 1)$
• Environmental variance is estimated from parental strain data as:
$(s_a^2 + s_b^2) / 2$
• Overall genetic variance is:
Total – Environmental variance. (Heritability is this genetic variance as a fraction of the total variance.)
• QTL effect size is the % of total variance explained by a given QTL.
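A small Python sketch of this variance partition is given below. The trait values are made up for illustration, and `statistics.variance` is used because it computes the sample variance with the n - 1 divisor. The inbred parental strains are assumed to be genetically uniform, so any variance within them is treated as environmental.

```python
from statistics import variance


def variance_components(f2, parent_a, parent_b):
    """Return (total, environmental, genetic) variance for a trait."""
    total = variance(f2)  # sample variance, divisor n - 1
    environmental = (variance(parent_a) + variance(parent_b)) / 2
    genetic = total - environmental
    return total, environmental, genetic


# Hypothetical trait values.
parent_a = [20.0, 21.0, 22.0]
parent_b = [30.0, 31.0, 32.0]
f2 = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0]

total, environmental, genetic = variance_components(f2, parent_a, parent_b)
heritability = genetic / total
print(f"total = {total:.2f}, environmental = {environmental:.2f}, genetic = {genetic:.2f}")
print(f"heritability = {heritability:.2f}")
```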
The effect size of most QTL is under 10%
Flint NRG 2005
The resolution of QTL is generally poor, and thus identification of causative gene is a bottleneck
• QTL mapping began in the early 1990s
• By 2005, approximately 2,050 mouse and 700 rat QTLs reported
• Only 20 causative genes identified
• At this rate (20 genes/15 years) it will take 1500 years to identify causative genes for already identified QTL
• What approaches might help?
Flint et al. 2005
Strategies to dissect quantitative trait loci for gene identification
[Figure: starting from Parent 1 x Parent 2, the derived resources are congenic strains, recombinant inbred lines, recombinant inbred congenic lines, chromosome substitution strains, and advanced intercross lines]
Construction of Congenic Strains
[Figure: Strain A x Strain B, followed by repeated backcrossing for ~10 generations (N1: 25% Strain B; N2: 12.5% Strain B), yielding the congenic strain A.B. Locus]
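The halving of the donor genome with each backcross can be tabulated with a short Python sketch. The labeling follows the figure (N1 = 25% Strain B, N2 = 12.5% Strain B), so N here counts backcrosses after the F1; other nomenclatures count the F1 as N1. The figures are expectations for donor genome unlinked to the selected locus, and the function name is illustrative.

```python
def expected_donor_fraction(backcrosses):
    """Expected fraction of unlinked donor (Strain B) genome.

    The F1 (zero backcrosses) is 50% donor; each backcross to the
    recipient strain halves the expected donor fraction.
    """
    return 0.5 ** (backcrosses + 1)


for k in range(11):
    label = "F1" if k == 0 else f"N{k}"
    print(f"{label}: {100 * expected_donor_fraction(k):.3f}% Strain B")
```

After about 10 backcrosses the expected unlinked donor contribution is well under 0.1%, which is why roughly 10 generations are used.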
Fine mapping using “subcongenic strains”
[Figure: subcongenic strains carrying BALB or MRL segments, each scored for plasma cholesterol (High or Low), delimiting the critical region]
Construction of Recombinant Inbred (RI) Strains
Chromosome 15 genotypes in recombinant inbred (RI) strains derived from C57BL/6 (filled) and DBA/2 (open)
RI strain number
The “Collaborative Cross”‐ multi‐line RI strains
Design: 8 progenitor strains
[Figure: representative genotype distribution]
X 1000
Benefits: More genetic variation; finer mapping; cumulative data; single genotyping
Demant P. (2003)
Science, 2004
Genome‐wide genetic association of complex traits in heterogeneous stock mice
William Valdar, Leah C Solberg, Dominique Gauguier, Stephanie Burnett, Paul Klenerman, William O Cookson, Martin S Taylor, J Nicholas P Rawlins, Richard Mott & Jonathan Flint Nat. Genet 38, 879 (2006)
Boxes above peaks are 95% confidence intervals and corresponding bootstrap probabilities
History of laboratory inbred strains of mice
Laboratory inbred strains are mosaics derived from several widely divergent subspecies
from Frazer KA, Eskin E, Kang HM et al. Nature. Aug 2007
http://mouse.cs.ucla.edu/
Linear Model
y = μ + βx + ε
[Figure: two panels of phenotype (axis from 200 to 400) by SNP allele. G vs. C: associated, β≠0. T vs. C: not associated, β=0.]
Association studies
[Figure: -log10(P value) plotted by chromosome, with p=0.001 marked]
Complex genetic relatedness of lab strains
[Figure: body weight, scale 10.0 to 35.0]
Eun Yong Kang, Chris Jones, E. Eskin
Efficient Mixed Model Association (EMMA) reduces inflated p‐values
[Figure panels: body weight, t-test; body weight, EMMA; saccharin preference, t-test; saccharin preference, EMMA]
Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, Eskin E. Genetics. 2008; 178:1709-23.
Linear Mixed Model
Fixed effects + Random effects
(aka variance components)
Let’s break it up....
Random effects are not ‘random’, they are Random Variables!
A random variable (RV) is a variable that takes on each of its values with a given probability
Examples:
a) Roll a die, let U denote the number observed:
p(U=1) = 1/6, p(U=2) = 1/6, ...
b) Roll two dice, let U denote the sum of the two numbers observed:
p(U=1) = 0, p(U=2) = (1/6) x (1/6), etc.
c) Let U denote the number of alleles shared identical by descent at a locus between sibs:
p(U=0) = 1/4, p(U=1) = 1/2, p(U=2) = 1/4
The values of U occur with a given probability; they are not fixed, hence U is a random variable
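The three examples can be enumerated directly in Python. This is a sketch using exact fractions; the `distribution` helper is illustrative. The identity-by-descent case assumes that each parent independently passes the same allele to both sibs with probability 1/2, which is the standard reasoning behind the 1/4, 1/2, 1/4 values.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distribution(outcomes):
    """Map each value to its probability, given equally likely outcomes."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {value: Fraction(count, total) for value, count in sorted(counts.items())}


faces = range(1, 7)

# a) One die: every face has probability 1/6.
one_die = distribution(faces)

# b) Sum of two dice: 36 equally likely ordered pairs.
two_dice = distribution(a + b for a, b in product(faces, repeat=2))

# c) IBD sharing between sibs: 0 or 1 allele shared through each parent.
ibd = distribution(m + f for m, f in product((0, 1), repeat=2))

print("one die: ", one_die)
print("two dice:", two_dice)
print("sib IBD: ", ibd)
```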
y = Xβ + Zu + e
Variance components:
var(u) = $\sigma_g^2 K$, var(e) = $\sigma_e^2 I$
$\sigma_g^2 K$ is the n x n variance-covariance matrix of u
It describes the covariance structure among strains, i.e. the additive genetic variance
The covariance of u is proportional to the kinship K, with $\sigma_g^2$ as the scale factor
The kinship itself is not random, it’s a constant
By including K, we allow part of the genetic variance to be explained by K
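To make the variance structure concrete, the sketch below builds the covariance matrix of y implied by the model. It assumes Z is the identity (one observation per strain), so that var(y) = σ²g K + σ²e I. The kinship values and variance components are hypothetical, and this is not the EMMA estimation procedure itself, only the covariance that such a model works with.

```python
def phenotype_covariance(kinship, sigma2_g, sigma2_e):
    """Covariance of y with Z = identity: V = sigma2_g * K + sigma2_e * I."""
    n = len(kinship)
    return [
        [sigma2_g * kinship[i][j] + (sigma2_e if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical kinship among three strains; the first two are close relatives.
K = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
V = phenotype_covariance(K, sigma2_g=2.0, sigma2_e=1.0)
for row in V:
    print(["%.2f" % value for value in row])
```

Closely related strains get a large off-diagonal covariance, so their phenotypes are not treated as independent evidence for a SNP effect.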
Inbred/recombinant inbred population for high resolution mapping : Whole genome association
~40 classical inbred strains (>135,000 SNPs): provide mapping resolution
~70 recombinant inbred strains (>135,000 SNPs): provide statistical power
Data collection: phenomic, transcriptomic, proteomic, metabolomic
Whole genome association
Plasma high density lipoproteins (n=8 males/group)
Chromosome 1 locus for HDL
40 Mb (~300 genes)
Chromosome 1 locus for HDL: Validation of ApoA2 gene involvement