Basic QTL Analysis
Is there an association between marker genotype and quantitative trait phenotype?
- Classify progeny by marker genotype
- Compare phenotypic mean between classes (t-test or ANOVA)
- Significance = marker linked to QTL
- Difference between means = estimate of QTL effect
g = genotypic effect: g = (µ1 - µ2)/2
µ1 = trait mean for genotypic class AA
µ2 = trait mean for genotypic class aa
[Figure: trait value (y) plotted against genotypic classes aa and AA (x), with intercept βo]
Notations for single-QTL models in backcross and F2 populations
Model                  Genotype   Value   Genetic effect
Backcross (Qq x QQ)    QQ         µ1      g = 0.5(µ1 - µ2)
                       Qq         µ2
DH (qq x QQ)           QQ         µ1      g = 0.5(µ1 - µ2)
                       qq         µ2
F2 (Qq x Qq)           QQ         µ1      Additive: a = 0.5(µ1 - µ3)
                       Qq         µ2      Dominance: d = 0.5(2µ2 - µ1 - µ3)
                       qq         µ3
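The effect definitions in the notation table can be checked numerically. The following is a minimal sketch, assuming the three F2 genotype means are already known; the function names and the example means are hypothetical and chosen only for illustration.

```python
def backcross_effect(mu1, mu2):
    """Genetic effect g = 0.5(mu1 - mu2) for a two-class (backcross or DH) model."""
    return 0.5 * (mu1 - mu2)


def f2_effects(mu1, mu2, mu3):
    """Additive (a) and dominance (d) effects from the F2 means of QQ, Qq and qq."""
    a = 0.5 * (mu1 - mu3)
    d = 0.5 * (2 * mu2 - mu1 - mu3)
    return a, d


# Hypothetical genotype means: QQ = 10, Qq = 9, qq = 6
a, d = f2_effects(10.0, 9.0, 6.0)
print("additive a =", a, "dominance d =", d)
print("backcross g =", backcross_effect(10.0, 9.0))
```

With these made-up means the heterozygote lies above the midparent value of 8, so d comes out positive.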
Single-marker analysis
• How it works
– Finds associations between marker genotype and trait value
$$ y_j = \mu + f(A) + \varepsilon_j $$
[Figure: marker A and putative QTL Q separated by recombination fraction r]
yj = trait value for the jth individual in the population
μ = population mean
f(A) = function of marker genotype
εj = residual associated with the jth individual
r = recombination fraction
• When to use
– Order of markers unknown or incomplete maps
– Quick scan
– Find best possible QTLs
– Identify missing or incorrectly formatted data
• Limitations
– Underestimates QTL number and effects
– QTL position cannot be precisely determined
Single-marker analysis in backcross progeny
• Parents: AAQQ x aaqq
• Backcross: aaqq x AaQq, or AaQq x AAQQ
• BC Progeny
Backcross to aaqq    Backcross to AAQQ    Expected frequency
AaQq                 AAQQ                 0.5(1 - r)
Aaqq                 AAQq                 0.5r
aaQq                 AaQQ                 0.5r
aaqq                 AaQq                 0.5(1 - r)
r is recombination frequency between A and Q
Expected QTL genotypic frequencies conditional on genotypes
Marker     Observed   Marginal     QTL genotype            Expected trait
genotype   count      frequency    QQ          Qq          value
Joint frequency
AA         n1         0.5          0.5(1-r)    0.5r
Aa         n2         0.5          0.5r        0.5(1-r)
Conditional frequency
AA         n1         0.5          1-r         r           (1-r)µ1 + rµ2
Aa         n2         0.5          r           1-r         rµ1 + (1-r)µ2
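The conditional frequencies above imply that the difference between marker-class means shrinks by a factor of (1 - 2r) relative to the true QTL difference. A minimal sketch of that arithmetic follows, assuming a backcross to the AAQQ parent; the function names and the values of r, µ1 and µ2 are hypothetical.

```python
def conditional_qtl_freqs(r):
    """P(QTL genotype | marker genotype) for a backcross to the AAQQ parent."""
    return {
        "AA": {"QQ": 1 - r, "Qq": r},
        "Aa": {"QQ": r, "Qq": 1 - r},
    }


def expected_marker_means(r, mu1, mu2):
    """Expected trait value of each marker class (mu1 = QQ mean, mu2 = Qq mean)."""
    freqs = conditional_qtl_freqs(r)
    return {m: p["QQ"] * mu1 + p["Qq"] * mu2 for m, p in freqs.items()}


# Hypothetical values: r = 0.1, QQ mean = 12, Qq mean = 8
means = expected_marker_means(0.1, 12.0, 8.0)
diff = means["AA"] - means["Aa"]
print(means, "difference =", diff)  # difference equals (1 - 2r)(mu1 - mu2)
```

At r = 0.5 the two expected marker means coincide, which is why an unlinked marker shows no association with the trait.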
Single-marker analysis
[Figure: marker A and putative QTL Q separated by recombination fraction r]
- Simple t-test
- Analysis of variance
- Linear regression
- Likelihood
Simple t-test using backcross progeny
H0: [μAa - μaa] = 0, that is, (a + d) = 0 or r = 0.5
Yj(i)k = μ + Mi + g(M)j(i) + ei(j)k
Test statistic using the pooled variance:
$$ t_M = \frac{\hat{\mu}_{Aa} - \hat{\mu}_{aa}}{\sqrt{\hat{s}_M^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} $$
Test statistic using separate class variances:
$$ t_M = \frac{\hat{\mu}_{Aa} - \hat{\mu}_{aa}}{\sqrt{\frac{\hat{s}_{Aa}^2}{n_1} + \frac{\hat{s}_{aa}^2}{n_2}}} $$
t-distribution with df = N - 2
Yj(i)k = trait value for individual j with genotype i in the replication k
μ = population mean
Mi = effect of the marker genotype
g(M)j(i) = genotypic effect which cannot be explained by the marker genotype
ei(j)k = error term
µAa = trait mean for genotypic class Aa
µaa = trait mean for genotypic class aa
s2M = pooled variance within the two classes
If tM is significant, then a QTL is declared to be near the marker
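The two forms of the t statistic can be sketched with the Python standard library. This is an illustrative sketch only: the trait values are made up, the function names are hypothetical, and one observation per individual is assumed (no replications).

```python
import math
from statistics import mean, variance


def pooled_t(y_Aa, y_aa):
    """t statistic with pooled within-class variance; returns (t, df)."""
    n1, n2 = len(y_Aa), len(y_aa)
    s2_pooled = ((n1 - 1) * variance(y_Aa) + (n2 - 1) * variance(y_aa)) / (n1 + n2 - 2)
    t = (mean(y_Aa) - mean(y_aa)) / math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def separate_variance_t(y_Aa, y_aa):
    """t statistic using each class's own sample variance."""
    n1, n2 = len(y_Aa), len(y_aa)
    se = math.sqrt(variance(y_Aa) / n1 + variance(y_aa) / n2)
    return (mean(y_Aa) - mean(y_aa)) / se


# Hypothetical trait values for the two marker classes
y_Aa = [4.1, 3.8, 4.4, 3.9, 4.3]
y_aa = [2.2, 1.9, 2.4, 1.8, 2.2]

t, df = pooled_t(y_Aa, y_aa)
print("pooled t =", t, "df =", df)
print("separate-variance t =", separate_variance_t(y_Aa, y_aa))
```

When both classes have the same size, the two statistics are identical; they diverge only for unequal class sizes with unequal variances.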
Analysis of variance using backcross progeny
H0: [μAa - μaa] = 0, that is, (a + d) = 0 or r = 0.5
Source           df          MS (Mean Square)   Expected MS
Total Genetics   N - 1       MSG                σ²e + bσ²G
Marker           1           MSM                σ²e + b[σ²G(QTL) + 4r(1 - r)a²] + bc(1 - 2r)²a²
G(Marker)        N - 2       MSG(M)             σ²e + b[σ²G(QTL) + 4r(1 - r)a²]
Residual         N(b - 1)    MSE                σ²e
F = MSM / MSG(M)
F-distribution with 1 and N - 2 df
If F is significant, then a QTL is declared to be near the marker
F = t² if df for numerator is 1
N = no. of individuals in pop.
b = no. of replications
r = recombination fraction
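The relation between F and t for a one-degree-of-freedom numerator can be verified on a small data set. The sketch below assumes a single observation per individual (b = 1), so the within-class mean square stands in for MSG(M); the data and function names are hypothetical.

```python
import math
from statistics import mean, variance


def one_way_f(groups):
    """F = MS(between classes) / MS(within classes) for a one-way layout."""
    all_y = [y for g in groups for y in g]
    grand_mean = mean(all_y)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_y) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def pooled_t(g1, g2):
    """Two-sample t statistic with pooled variance."""
    n1, n2 = len(g1), len(g2)
    s2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / math.sqrt(s2 * (1 / n1 + 1 / n2))


# Hypothetical trait values; the classes are deliberately of unequal size
y_Aa = [4.1, 3.8, 4.4, 3.9]
y_aa = [2.2, 1.9, 2.4, 1.8, 2.2]

F = one_way_f([y_Aa, y_aa])
t = pooled_t(y_Aa, y_aa)
print("F =", F, "t squared =", t ** 2)
```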
Analysis of variance using SAS
(A simple example)
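data a;
  /* Marker1 and Marker2 hold letter codes, so they are read as character ($) */
  input Individuals Trait1 Marker1 $ Marker2 $;
  cards;
1 1.57 A B
2 1.35 B A
3 10.7 B B
…
;
/* Marker genotypes are classification effects; lsmeans reports class means */
proc glm data=a;
  class Marker1 Marker2;
  model Trait1 = Marker1 Marker2;
  lsmeans Marker1 Marker2;
run;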
Linear regression using backcross progeny
y j  0  1 x j   j
H0: [μAa - μaa ] = 0
(a + d) = 0
R2: percent of the phenotypic variance explained by
the QTL
r = 0.5
y
β1
Dummy variables:
yj= trait value for the jth
individual
βo
aa = -1
xj= dummy variable
Aa = 1
βo= intercept for the regression
0
-1
Expectations:
aa
Aa
Genotypic classes
x
β1= slope for the regression
j= random error
E(βo) = 0.5 (µAa + µaa) = Mean for the trait
E(β1) = 0.5 (1 - 2r) (µAa - µaa) = (1 - 2r) g = 0.5 (a + d) (1 - 2r)
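With the -1/+1 coding, ordinary least squares returns the midpoint of the two class means as the intercept and half their difference as the slope. The following is a minimal sketch with made-up data; the `ols` helper is hypothetical and written out by hand so that no external library is needed.

```python
from statistics import mean


def ols(x, y):
    """Simple linear regression; returns (intercept, slope, R squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot


# Hypothetical backcross data: class means are 4.0 (Aa) and 2.0 (aa)
genotypes = ["Aa", "Aa", "Aa", "aa", "aa", "aa"]
y = [4.2, 3.9, 3.9, 2.1, 1.8, 2.1]

x = [1 if g == "Aa" else -1 for g in genotypes]   # coding: Aa = 1, aa = -1
b0, b1, r2 = ols(x, y)
print("intercept =", b0, "slope =", b1, "R2 =", r2)

# Reversing the coding (Aa = -1, aa = 1) flips the sign of the slope only
b0_rev, b1_rev, r2_rev = ols([-xi for xi in x], y)
print("reversed coding slope =", b1_rev)
```

The sign of the slope therefore carries meaning only once the coding of the dummy variable is known.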
Linear regression using backcross
progeny
Interpretation of results depends on coding of the dummy variables
[Figure, two panels: trait value (y) plotted against genotypic classes aa and Aa (x)]
Panel 1: y = 3 + x + e
µ = 3
µAa = 4
µaa = 2
g = 0.5(µAa - µaa) = 1
Panel 2: y = 3 - x + e
µ = 3
µAa = 2
µaa = 4
g = 0.5(µAa - µaa) = -1
A likelihood approach using backcross
progeny
Joint distribution function:
$$ L = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{N} \prod_{i=1}^{N} \sum_{j=1}^{2} p(Q_j / M_i) \exp\left[ -\frac{(y_i - \mu_j)^2}{2\sigma^2} \right] $$
A likelihood approach using backcross progeny
(cont.)
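Full model:
$$ \ln L(\mu_1, \mu_2, \sigma^2, r) = \sum_{i=1}^{N} \ln \left\{ \sum_{j=1}^{2} p(Q_j / M_i) \exp\left[ -\frac{(y_i - \mu_j)^2}{2\sigma^2} \right] \right\} - \frac{N}{2} \ln(2\pi\sigma^2) $$
No QTL effect (µ1 = µ2 = µ):
$$ \ln L(\mu_1 = \mu_2 = \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \mu)^2 - \frac{N}{2} \ln(2\pi\sigma^2) $$
No linkage (r = 0.5):
$$ \ln L(r = 0.5) = \sum_{i=1}^{N} \ln \left\{ \frac{1}{2} \exp\left[ -\frac{(y_i - \mu_1)^2}{2\sigma^2} \right] + \frac{1}{2} \exp\left[ -\frac{(y_i - \mu_2)^2}{2\sigma^2} \right] \right\} - \frac{N}{2} \ln(2\pi\sigma^2) $$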
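The log-likelihood of the mixture model can be evaluated directly for given parameter values. This is a sketch under stated assumptions: marker classes are coded 0 and 1 (a hypothetical coding), class 0 is the one in which QTL genotype 1 has conditional probability 1 - r, and no maximization over the parameters is attempted.

```python
import math


def p_qtl_given_marker(marker, r):
    """(P(Q_1 | M_i), P(Q_2 | M_i)); marker classes coded 0 and 1 (assumed coding)."""
    return (1 - r, r) if marker == 0 else (r, 1 - r)


def log_likelihood(y, markers, mu1, mu2, sigma2, r):
    """Log-likelihood of the two-component normal mixture for backcross data."""
    total = 0.0
    for yi, mi in zip(y, markers):
        p1, p2 = p_qtl_given_marker(mi, r)
        mixture = (p1 * math.exp(-(yi - mu1) ** 2 / (2 * sigma2))
                   + p2 * math.exp(-(yi - mu2) ** 2 / (2 * sigma2)))
        total += math.log(mixture)
    return total - len(y) / 2 * math.log(2 * math.pi * sigma2)


def log_likelihood_no_qtl(y, mu, sigma2):
    """Log-likelihood when mu1 = mu2 = mu (a single normal distribution)."""
    ss = sum((yi - mu) ** 2 for yi in y)
    return -ss / (2 * sigma2) - len(y) / 2 * math.log(2 * math.pi * sigma2)


# Hypothetical data: trait values and marker classes for six individuals
y = [4.2, 3.9, 3.9, 2.1, 1.8, 2.1]
markers = [0, 0, 0, 1, 1, 1]

ll_linked = log_likelihood(y, markers, mu1=4.0, mu2=2.0, sigma2=0.05, r=0.05)
ll_unlinked = log_likelihood(y, markers, mu1=4.0, mu2=2.0, sigma2=0.05, r=0.5)
print("lnL at r = 0.05:", ll_linked)
print("lnL at r = 0.5 :", ll_unlinked)
```

Because the two conditional probabilities sum to one, setting mu1 = mu2 collapses the mixture to a single normal density, so `log_likelihood` then agrees with `log_likelihood_no_qtl` for any value of r.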
A likelihood approach using backcross
progeny (cont.)
(Weller, 1986)
G-statistics
H0: [μAa - μaa] = 0, that is, (a + d) = 0 or r = 0.5
Likelihood ratio test statistics (LR)
L(r = 0.5): probability of occurrence of the data under the null hypothesis
$$ G = 2 \left[ \ln L(\hat{\mu}_{Aa}, \hat{\mu}_{aa}, \hat{\sigma}^2, \hat{r}) - \ln L(r = 0.5) \right] $$
G is distributed asymptotically as a chi-square variable with one degree of freedom
$$ G = 2 \left[ \ln L(\hat{\mu}_{Aa}, \hat{\mu}_{aa}, \hat{\sigma}^2, \hat{r}) - \ln L(\mu_{Aa} = \mu_{aa} = \mu) \right] $$
The t-test is approximately equivalent to the likelihood ratio test using this formula
LOD score
LOD: logarithm of the odds ratio
Base-10 logarithm of the likelihood ratio on which G is based
LR = 2 ln(10) × LOD = 4.605 LOD
LOD = 0.217 LR
LOD is interpreted as the base-10 logarithm of an odds ratio
(probability of observing the data under linkage/probability of observing the same data under no linkage)
No theoretical distribution is needed to interpret a LOD score
Key value: ≥ 3 (the data are 1000 times more likely under H1 than under H0, no linkage)
(approx: p = 0.001)
p= probability of type I error
Type I error: false positive (declare a QTL when there is no QTL)
G-Statistics and LOD score
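The conversion between the likelihood ratio statistic and the LOD score is a change of logarithm base. The sketch below assumes LR denotes twice the natural-log likelihood difference (the G statistic); the function names are hypothetical.

```python
import math


def lod_to_lr(lod):
    """LR = 2 ln(10) * LOD, roughly 4.605 * LOD."""
    return 2 * math.log(10) * lod


def lr_to_lod(lr):
    """LOD = LR / (2 ln(10)), roughly 0.217 * LR."""
    return lr / (2 * math.log(10))


print("LR for LOD = 3:", lod_to_lr(3.0))
print("LOD for LR = 13.8:", lr_to_lod(13.8))
print("odds ratio at LOD = 3:", 10 ** 3.0)
```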
Single-marker analysis
Summary
• Identify marker-trait associations
• Identify missing or incorrectly formatted data
• Genetic map is not required
• Divide the population into subpopulations based on the allelic segregation of individual loci (one marker at a time)
• Get trait means for each subpopulation (genotypic class)
• Determine if the subpopulations' trait means are significantly different
• Limitations
– Underestimates QTL number and effects
– QTL position cannot be precisely determined
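The summary steps can be strung together as a one-marker-at-a-time scan. This is a rough sketch, not a full QTL package: the data layout (a dict of marker name to genotype codes, with None for missing scores), the function name and the example values are all assumptions made for illustration, and only the pooled t statistic is reported.

```python
import math
from statistics import mean, variance


def single_marker_scan(trait, marker_table):
    """For each marker, split individuals by genotype and compute a pooled t statistic."""
    results = {}
    for name, genotypes in marker_table.items():
        classes = {}
        for g, y in zip(genotypes, trait):
            if g is None:          # skip individuals with a missing marker score
                continue
            classes.setdefault(g, []).append(y)
        # A backcross or DH marker should segregate into exactly two classes
        if len(classes) != 2 or min(len(v) for v in classes.values()) < 2:
            continue
        (g1, y1), (g2, y2) = sorted(classes.items())
        n1, n2 = len(y1), len(y2)
        s2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
        if s2 == 0:                # no within-class variation, t is undefined
            continue
        t = (mean(y1) - mean(y2)) / math.sqrt(s2 * (1 / n1 + 1 / n2))
        results[name] = {"means": {g1: mean(y1), g2: mean(y2)}, "t": t, "df": n1 + n2 - 2}
    return results


# Hypothetical data: six individuals scored at two markers
trait = [4.1, 3.8, 4.4, 2.2, 1.9, 2.4]
marker_table = {
    "M1": ["A", "A", "A", "B", "B", "B"],
    "M2": ["A", "B", "A", "B", "A", "B"],
}

scan = single_marker_scan(trait, marker_table)
for name, res in scan.items():
    print(name, res)
```

Markers that fail to split into two usable classes are skipped, which is one simple way a scan of this kind can surface missing or incorrectly formatted data.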