Download Lecture 3: Resemblance Between Relatives

Lecture 5: Major Genes, Polygenes, and QTLs
Major genes --- genes that have a significant effect on the phenotype
Polygenes --- a general term for the genes of small effect that influence a trait
QTL, quantitative trait locus --- a particular gene underlying the trait.
Usually used when a gene underlying a trait is mapped to a particular chromosomal region
Candidate gene --- a particular known gene that is of interest as a potential contributor
to the variation in a trait
Mendelizing allele --- an allele with a sufficiently large effect that its impact is
obvious when looking at the phenotype
Major Genes
• Major morphological mutations of classical
genetics that arose by spontaneous or induced mutation
• Genes of large effect have been found in selected lines
– pygmy, obese, dwarf and hg alleles in mice
– booroola F in sheep
– halothane sensitivity in pigs
• Major genes tend to be deleterious and are at very
low frequencies in unselected populations, and
contribute little to Var(A)
Genes for Genetic modification of muscling
“Natural” mutations in the myostatin gene in cattle
“Natural” mutation in the callipyge gene in sheep
“Booroola” gene in sheep increasing ovulation rate
Merino Sheep
Major genes for mouse body size
The mutations ob and db cause a deficiency in
leptin production (ob) or in the leptin receptor (db)
Major Genes and Isoalleles
What is the genetic basis for quantitative variation?
Honest answer --- don’t know.
One hypothesis: isoalleles. A locus that has an allele of
major effect may also have alleles of much smaller effect
(isoalleles) that influence the trait of interest.
Structural vs. regulatory changes
Structural: change in an amino acid sequence
Regulatory: change affecting gene regulation
General assumption: regulatory changes are likely more
important
Cis vs. trans effects
Cis effect --- regulatory change only affects gene
(tightly) linked on the same chromosome
Trans effect --- a diffusible factor that can influence
regulation of unlinked genes
Cis-acting locus. The allele influences the regulation of a gene on the same
DNA molecule
Trans-acting locus. This locus influences genes on other chromosomes and
non-adjacent sites on the same chromosome
[Figure: genomic location of mRNA level modifiers and genomic location of genes on array;
CIS-modifiers, TRANS-modifiers, and MASTER modifiers are labeled]
Polygenic Mutation
For “normal” genes (i.e., those with large effects)
simply giving a mutation rate is sufficient
(e.g. the rate at which a dwarfing allele
appears)
For alleles contributing to quantitative variation,
we must account for both the rate at which mutants
appear and the phenotypic effect of each
Mutational variance, $V_m$ or $\sigma^2_m$ --- the amount of new additive
genetic variance introduced by mutation each generation
Typically $V_m$ is on the order of $10^{-3} V_E$
Simple Tests for the
Presence of Major Genes
Simple Visual tests:
• Phenotypes fall into discrete classes
• Multimodality --- distribution has several modes (peaks)
Simple statistical tests
• Fit to a mixture model (LR test)
$$p(z) = \Pr(QQ)\,p(z \mid QQ) + \Pr(Qq)\,p(z \mid Qq) + \Pr(qq)\,p(z \mid qq)$$
• Heterogeneity of within-family variances
• Select and backcross
Mixture Models
The distribution of trait value z is the weighted sum of n
underlying distributions
$$p(z) = \sum_{i=1}^{n} \Pr(i)\, p_i(z)$$
Pr(i): the probability that a random individual is from class i
$p_i(z)$: the distribution of phenotypes z conditional on the individual belonging to class i
The component distributions are typically assumed normal
$$p(z) = \sum_{i=1}^{n} \Pr(i)\, \varphi(z; \mu_i, \sigma_i^2)$$
$$\varphi(z; \mu_i, \sigma_i^2) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(z-\mu_i)^2}{2\sigma_i^2}\right)$$
(Normal with mean $\mu_i$ and variance $\sigma_i^2$)
• 3n-1 parameters: n-1 mixture proportions, n means, n variances
• Typically assume common variances -> 2n parameters
(n-1 mixture proportions, n means, 1 variance)
In quantitative genetics, the underlying classes are
typically different genotypes (e.g. QQ vs. Qq) although
we could also model different environments in the same
fashion
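A minimal Python sketch of the normal mixture density above may make the weighting concrete. The function names and the three-genotype example values (proportions, means, variance) are illustrative assumptions, not taken from the lecture.

```python
import math

def normal_pdf(z, mean, var):
    """Normal density phi(z; mean, var)."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(z, proportions, means, variances):
    """p(z) = sum_i Pr(i) * phi(z; mu_i, sigma_i^2)."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("mixture proportions must sum to 1")
    return sum(pr * normal_pdf(z, mu, var)
               for pr, mu, var in zip(proportions, means, variances))

# Hypothetical example: classes QQ, Qq, qq with a common variance.
props = [0.25, 0.50, 0.25]
means = [12.0, 10.0, 8.0]
variances = [1.0, 1.0, 1.0]
density_at_10 = mixture_pdf(10.0, props, means, variances)
```

With well-separated means and a small common variance the density is multimodal; as the means move together the mixture becomes hard to tell apart from a single normal.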
Likelihood function for an individual under a mixture model
$$\ell(z_j) = \Pr(QQ)\, p_{QQ}(z_j) + \Pr(Qq)\, p_{Qq}(z_j) + \Pr(qq)\, p_{qq}(z_j)$$
$$= \Pr(QQ)\, \varphi(z_j; \mu_{QQ}, \sigma^2) + \Pr(Qq)\, \varphi(z_j; \mu_{Qq}, \sigma^2) + \Pr(qq)\, \varphi(z_j; \mu_{qq}, \sigma^2)$$
Mixture proportions follow from Hardy-Weinberg,
e.g. $\Pr(QQ) = p_Q \cdot p_Q$
Likelihood function for a random sample of m individuals
$$\ell(\mathbf{z}) = \ell(z_1, z_2, \cdots, z_m) = \prod_{j=1}^{m} \ell(z_j)$$
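As a rough sketch, the sample likelihood can be evaluated on the log scale, with the mixture proportions taken from Hardy-Weinberg. The helper names, the made-up phenotypes, and the parameter values are assumptions for illustration only.

```python
import math

def normal_pdf(z, mean, var):
    """Normal density phi(z; mean, var)."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def hardy_weinberg(p_Q):
    """Genotype frequencies (QQ, Qq, qq) for allele frequency p_Q."""
    q = 1.0 - p_Q
    return (p_Q * p_Q, 2.0 * p_Q * q, q * q)

def log_likelihood(sample, p_Q, means, var):
    """ln l(z) = sum_j ln l(z_j), major-gene mixture with a common variance.

    `means` is ordered (mu_QQ, mu_Qq, mu_qq)."""
    props = hardy_weinberg(p_Q)
    total = 0.0
    for z in sample:
        l_j = sum(pr * normal_pdf(z, mu, var) for pr, mu in zip(props, means))
        total += math.log(l_j)
    return total

sample = [7.9, 8.4, 9.8, 10.1, 10.3, 11.9]  # made-up phenotypes
ll = log_likelihood(sample, p_Q=0.5, means=(12.0, 10.0, 8.0), var=1.0)
```

Working with the sum of logs rather than the raw product avoids numerical underflow when m is large.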
Likelihood Ratio test for Mixtures
Null hypothesis: A single normal distribution is
adequate to fit the data. The maximum of the
likelihood function under the null hypothesis is
$$\max \ell_0(z_1, z_2, \cdots, z_m) = (2\pi S^2)^{-m/2} \exp\left(-\frac{1}{2S^2}\sum_{j=1}^{m}(z_j - \bar{z})^2\right)$$
where
$$S^2 = \frac{1}{m}\sum_{i=1}^{m}(z_i - \bar{z})^2$$
The LR test for a significantly better fit under a mixture
is given by $2 \ln(\max\{\text{likelihood under mixture}\} / \max \ell_0)$
The LR follows a chi-square distribution with n-2 df, where
n-1 = number of fitted parameters for the mixture
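The null maximum has a simple closed form on the log scale: because the sum of squared deviations equals $m S^2$, the exponent reduces to $-m/2$. The sketch below uses that simplification. The maximized mixture log-likelihood has to come from a separate numerical fit (for example an EM routine), which is not shown; the function names are assumptions.

```python
import math

def null_max_loglik(sample):
    """ln max l_0 = -(m/2) * (ln(2*pi*S^2) + 1), with S^2 the ML variance."""
    m = len(sample)
    zbar = sum(sample) / m
    s2 = sum((z - zbar) ** 2 for z in sample) / m  # divisor m, not m - 1
    return -0.5 * m * (math.log(2.0 * math.pi * s2) + 1.0)

def lr_statistic(max_loglik_mixture, max_loglik_null):
    """LR = 2 * ln(max mixture likelihood / max null likelihood)."""
    return 2.0 * (max_loglik_mixture - max_loglik_null)

sample = [1.0, 2.0, 3.0, 4.0]  # made-up data
ll0 = null_max_loglik(sample)
```

The value returned by `lr_statistic` would then be compared against the chi-square reference distribution described above.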
Complex Segregation Analysis
A significant fit to a mixture only suggests the possibility
of a major gene.
A much more formal demonstration of a major gene is
given by the likelihood-based method of Complex
Segregation Analysis (CSA)
Testing the fit of a mixture model requires a sample of
random individuals from the population.
CSA requires a pedigree of individuals. CSA uses
likelihood to formally test for the transmission of
a major gene in the pedigree
Building the likelihood for CSA
Start with a mixture model
Difference is that the mixing proportions are not the
same for each individual, but rather are a function of
its parental (presumed) genotypes
$$\ell(z_{ij} \mid g_f, g_m) = \sum_{g_o=1}^{3} \Pr(g_o \mid g_f, g_m)\, \varphi(z_{ij}; \mu_{g_o}, \sigma^2)$$
$z_{ij}$: phenotypic value of individual j in family i
$g_f, g_m$: major-locus genotypes of parents
$\Pr(g_o \mid g_f, g_m)$: transmission probability of an offspring having genotype $g_o$,
conditioned on the parental major-locus genotypes
$\mu_{g_o}$: mean of genotype $g_o$
$\sigma^2$: phenotypic variance conditioned on major-locus genotype
Sum is over all possible genotypes, indexed by $g_o$ = 1, 2, 3
Example: code qq=3, Qq=2, QQ=1
$$\Pr(g_o = 3 \mid g_f = 1, g_m = 2) = \Pr(qq \mid g_f = QQ, g_m = Qq) = 0$$
$$\Pr(g_o = 2 \mid g_f = 1, g_m = 2) = \Pr(Qq \mid g_f = QQ, g_m = Qq) = 1/2$$
$$\Pr(g_o = 1 \mid g_f = 1, g_m = 2) = \Pr(QQ \mid g_f = QQ, g_m = Qq) = 1/2$$
Conditional family likelihood: likelihood for family i given the parental genotypes are $g_f$, $g_m$
$$\ell(z_{i\cdot} \mid g_f, g_m) = \prod_{j=1}^{n_i} \ell(z_{ij} \mid g_f, g_m)$$
Likelihood for family i: sum over all possible parental genotypes
$$\ell(z_{i\cdot}) = \sum_{g_f=1}^{3} \sum_{g_m=1}^{3} \ell(z_{i\cdot} \mid g_f, g_m)\, \Pr(g_f, g_m)$$
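The family likelihood can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: it assumes Mendelian transmission and, as an extra assumption not stated in the slides, takes the parental genotype probabilities Pr(g_f, g_m) as the product of Hardy-Weinberg frequencies (random mating). All names are hypothetical.

```python
import math
from itertools import product

GENOTYPES = ("QQ", "Qq", "qq")
TAU = {"QQ": 1.0, "Qq": 0.5, "qq": 0.0}  # Mendelian Pr(parent transmits Q)

def normal_pdf(z, mean, var):
    """Normal density phi(z; mean, var)."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def offspring_probs(g_f, g_m):
    """Pr(g_o | g_f, g_m) for each offspring genotype."""
    tf, tm = TAU[g_f], TAU[g_m]
    return {"QQ": tf * tm,
            "Qq": tf * (1 - tm) + tm * (1 - tf),
            "qq": (1 - tf) * (1 - tm)}

def conditional_family_likelihood(family, g_f, g_m, means, var):
    """Product over offspring of the parent-specific mixture likelihood."""
    probs = offspring_probs(g_f, g_m)
    like = 1.0
    for z in family:
        like *= sum(probs[g] * normal_pdf(z, means[g], var) for g in GENOTYPES)
    return like

def family_likelihood(family, p_Q, means, var):
    """Sum over all 3 x 3 parental genotype pairs (assumes HW, random mating)."""
    q = 1.0 - p_Q
    hw = {"QQ": p_Q * p_Q, "Qq": 2.0 * p_Q * q, "qq": q * q}
    return sum(conditional_family_likelihood(family, gf, gm, means, var) * hw[gf] * hw[gm]
               for gf, gm in product(GENOTYPES, repeat=2))
```

For a family with a single offspring this reduces to the ordinary Hardy-Weinberg mixture density, which is a convenient sanity check.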
Transmission Probabilities
Explicitly model the transmission probabilities
$$\Pr(qq \mid g_f, g_m) = (1 - \tau_{g_f})(1 - \tau_{g_m})$$
$$\Pr(Qq \mid g_f, g_m) = \tau_{g_f}(1 - \tau_{g_m}) + \tau_{g_m}(1 - \tau_{g_f})$$
$$\Pr(QQ \mid g_f, g_m) = \tau_{g_f}\, \tau_{g_m}$$
$\tau_{g_f}$: probability that the father transmits Q
$\tau_{g_m}$: probability that the mother transmits Q
Formal CSA test of a major gene (three steps):
• Significantly better overall fit of a mixture model compared
with a single normal
• Failure to reject the hypothesis of Mendelian segregation:
$\tau_{QQ} = 1$, $\tau_{Qq} = 1/2$, $\tau_{qq} = 0$
• Rejection of the hypothesis of equal transmission for all
genotypes ($\tau_{QQ} = \tau_{Qq} = \tau_{qq}$)
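To see what the two transmission hypotheses imply, the sketch below evaluates the three offspring probabilities for arbitrary transmission parameters. The function name and the value 0.5 used for the equal-transmission case are illustrative assumptions.

```python
def transmission_probs(tau_f, tau_m):
    """Offspring genotype probabilities given each parent's Pr(transmit Q)."""
    return {
        "QQ": tau_f * tau_m,
        "Qq": tau_f * (1 - tau_m) + tau_m * (1 - tau_f),
        "qq": (1 - tau_f) * (1 - tau_m),
    }

# Mendelian segregation: tau depends on the parental genotype.
mendelian_tau = {"QQ": 1.0, "Qq": 0.5, "qq": 0.0}
# Equal transmission: the same tau for every genotype (0.5 is a made-up value).
equal_tau = {"QQ": 0.5, "Qq": 0.5, "qq": 0.5}

# Father QQ, mother qq under each hypothesis.
mendel = transmission_probs(mendelian_tau["QQ"], mendelian_tau["qq"])
equal = transmission_probs(equal_tau["QQ"], equal_tau["qq"])
```

Under Mendelian segregation a QQ x qq mating yields only Qq offspring, whereas under equal transmission the offspring distribution does not depend on the parents at all. That contrast is what the second and third steps of the test exploit.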
CSA Modification: Common Family Effects
Families can share a common environmental effect
Expected value for $g_o$ genotype, family i is $\mu_{g_o} + c_i$
Likelihood conditioned on common family effect $c_i$
$$\ell(z_{i\cdot} \mid g_f, g_m, c_i) = \prod_{j=1}^{n_i} \left[ \sum_{g_o=1}^{3} \Pr(g_o \mid g_f, g_m)\, \varphi(z_{ij}; \mu_{g_o} + c_i, \sigma^2) \right]$$
Unconditional likelihood (average over all c --- assumed
Normal with mean zero and variance $\sigma_c^2$)
$$\ell(z_{i\cdot} \mid g_f, g_m) = \int_{-\infty}^{\infty} \ell(z_{i\cdot} \mid g_f, g_m, c)\, \varphi(c; 0, \sigma_c^2)\, dc$$
Likelihood function with no major gene, but family effects
$$\ell(z_{i\cdot}) = \int_{-\infty}^{\infty} \ell(z_{i\cdot} \mid c)\, \varphi(c; 0, \sigma_c^2)\, dc = \int_{-\infty}^{\infty} \left[ \prod_{j=1}^{n_i} \varphi(z_{ij}; \mu + c, \sigma^2) \right] \varphi(c; 0, \sigma_c^2)\, dc$$
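The integral over the family effect c generally has to be evaluated numerically. As a rough sketch for the no-major-gene case, the code below uses a plain trapezoid rule over plus or minus eight standard deviations of c. The integration range, the step count, and the function names are all assumptions chosen for illustration.

```python
import math

def normal_pdf(z, mean, var):
    """Normal density phi(z; mean, var)."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def family_likelihood_no_major_gene(family, mu, var, var_c, n_steps=2000):
    """Integrate prod_j phi(z_ij; mu + c, var) * phi(c; 0, var_c) over c."""
    sd_c = math.sqrt(var_c)
    lo, hi = -8.0 * sd_c, 8.0 * sd_c  # tails beyond this are negligible
    h = (hi - lo) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        c = lo + k * h
        integrand = normal_pdf(c, 0.0, var_c)
        for z in family:
            integrand *= normal_pdf(z, mu + c, var)
        weight = 0.5 if k in (0, n_steps) else 1.0  # trapezoid end weights
        total += weight * integrand
    return total * h

single = family_likelihood_no_major_gene([10.5], mu=10.0, var=1.0, var_c=0.5)
```

For a single offspring the integral has a known answer, a normal density with variance $\sigma^2 + \sigma_c^2$, which gives a simple check on the numerical routine.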
Maps and Mapping Functions
The unit of genetic distance between two markers is
the recombination frequency, c
If the phase of a parent is AB/ab, then 1-c is the
frequency of “parental” gametes (e.g., AB and ab),
while c is the frequency of “nonparental” gametes
(e.g., Ab and aB).
A parental gamete results from an EVEN number of
crossovers, e.g., 0, 2, 4, etc.
For a nonparental (also called a recombinant) gamete,
we need an ODD number of crossovers between A and B,
e.g., 1, 3, 5, etc.
Hence, simply using the frequency of “recombinant”
(i.e. nonparental) gametes UNDERESTIMATES
the number of crossovers m, with E[m] > c
In particular, c = Prob(odd number of crossovers)
Mapping functions attempt to estimate the expected
number of crossovers m from observed recombination
frequencies c
When considering two linked loci, the phenomenon
of interference must be taken into account
The presence of a crossover in one interval typically
decreases the likelihood of a nearby crossover
Suppose the order of the genes is A-B-C.
If there is no interference (i.e., crossovers occur
independently of each other) then
$$c_{AC} = c_{AB}(1 - c_{BC}) + (1 - c_{AB})\, c_{BC} = c_{AB} + c_{BC} - 2 c_{AB} c_{BC}$$
$c_{AC}$: Probability(odd number of crossovers btw A and C)
First term: odd number of crossovers in A-B, even number in B-C
Second term: even number of crossovers btw A & B and odd number in B-C
We need to assume independence in order to multiply these two probabilities
When interference is present, we can write this as
$$c_{AC} = c_{AB} + c_{BC} - 2(1 - \delta)\, c_{AB} c_{BC}$$
Interference parameter $\delta$
$\delta$ = 0 --> No interference. Crossovers occur independently of each other
$\delta$ = 1 --> complete interference: The presence of a crossover eliminates
nearby crossovers
Mapping functions. Moving from c to m
Haldane’s mapping function (gives Haldane map
distances)
Assume the number k of crossovers in a region
follows a Poisson distribution with parameter m
This makes the assumption of NO INTERFERENCE
Poisson: $\Pr(k) = \lambda^k \exp[-\lambda] / k!$
$\lambda$ = expected number of successes
Prob(Odd number of crossovers):
$$c = \sum_{k=0}^{\infty} p(m, 2k+1) = e^{-m} \sum_{k=0}^{\infty} \frac{m^{2k+1}}{(2k+1)!} = \frac{1 - e^{-2m}}{2}$$
This gives the estimated Haldane distance as
$$m = -\frac{\ln(1 - 2c)}{2}$$
Usually reported in units of Morgans or Centimorgans (cM)
One Morgan --> m = 1.0. One cM --> m = 0.01
Linkage disequilibrium mapping
Idea is to use a random sample of individuals from
the population rather than a large pedigree.
Ironically, in the right settings this approach has
more power for fine mapping than pedigree analysis.
Why?
Key is the expected number of recombinants.
In a pedigree, Prob(no recombinants) in n
individuals is $(1-c)^n$
LD mapping uses the historical recombinants in
a sample. Prob(no recomb) = $(1-c)^{2t}$, where t =
time back to most recent common ancestor
Expected number of recombinants in a sample of
n sibs is cn
Expected number of recombinants in a sample of
n random individuals with a time t back to the
MRCA (most recent common ancestor) is 2cnt
Hence, if t is large, many more expected recombinants
in random sample and hence more power for very
fine mapping (i.e. c < 0.01)
Because there are so many expected recombinants, this only works
when c is very small
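The contrast between the two designs can be made concrete with a quick calculation. The sample size, recombination frequency, and number of generations below are made-up values for illustration, and the function names are assumptions.

```python
def expected_recombinants_sibs(c, n):
    """Expected recombinants among n sibs: c * n."""
    return c * n

def expected_recombinants_ld(c, n, t):
    """Expected historical recombinants among n random individuals
    whose MRCA lies t generations back: 2 * c * n * t."""
    return 2.0 * c * n * t

c, n, t = 0.001, 100, 100  # hypothetical values
in_pedigree = expected_recombinants_sibs(c, n)
in_ld_sample = expected_recombinants_ld(c, n, t)
```

With these numbers a pedigree of 100 sibs is expected to contain about 0.1 recombinants, while a random sample of 100 individuals is expected to carry about 20 historical ones, which is why the population sample is more informative at this scale.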
Fine-mapping genes
Suppose an allele causing a large effect on the trait
arose as a single mutation in a closed population
New mutation arises on
red chromosome
Initially, the new mutation is
largely associated with the
red haplotype
Hence, markers that define the red haplotype are
likely to be associated (i.e. in LD) with the mutant allele
This linkage disequilibrium decays slowly with time if
c is small
Let p = Prob(mutation associated with original haplotype)
$$p = (1-c)^t$$
Thus if we can estimate p and t, we can solve for c,
$$c = 1 - p^{1/t}$$
Diastrophic dysplasia (DTD) association with
CSF1R marker locus alleles
Allele   Normal        DTD-bearing
1-1      4 (3.3%)      144 (94.7%)
1-2      28 (22.7%)    1 (0.7%)
2-1      7 (5.7%)      0 (0%)
2-2      84 (68.3%)    7 (4.6%)
Most frequent allele type varies between normal and
DTD-bearing haplotypes
Hence, allele 1-1 appears to be on the original haplotype
in which the DTD mutation arose --> p = 0.947
$$c = 1 - p^{1/t} = 1 - 0.947^{1/100}$$
100 generations to MRCA used for Finnish population
Gives c = 0.00051 between marker and DTD.
Best estimate from pedigrees is c = 0.012 (1.2 cM)
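The DTD calculation is a one-liner once p and t are fixed. The sketch below evaluates the expression for c with the values quoted above; the function name is an assumption.

```python
def recomb_from_ld_decay(p, t):
    """Solve p = (1 - c)^t for c, giving c = 1 - p^(1/t)."""
    if not 0.0 < p <= 1.0 or t <= 0:
        raise ValueError("need 0 < p <= 1 and t > 0")
    return 1.0 - p ** (1.0 / t)

c_dtd = recomb_from_ld_decay(p=0.947, t=100)
c_dtd_rounded_p = recomb_from_ld_decay(p=0.95, t=100)
```

Evaluated with p = 0.947 the expression gives roughly 0.00054, while p rounded to 0.95 gives roughly 0.00051, which may be how the quoted figure was obtained. Either way the LD-based estimate is far smaller than the pedigree estimate of 0.012.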
Candidate Loci and the TDT
Often try to map genes by using case/control contrasts,
also called association mapping.
The frequencies of marker alleles are measured in both a
case sample -- showing the trait (or extreme values)
control sample -- not showing the trait
The idea is that if the marker is in tight linkage, we might
expect LD between it and the particular DNA site causing
the trait variation.
Problem with case-control approach: Population
stratification can give false positives.
When the population being sampled actually consists of
several distinct subpopulations we have lumped together,
marker alleles may provide information as to which group an
individual belongs. If there are other risk factors in
a group, this can create a false association btw marker and
trait
Example. The Gm marker was thought (for biological
reasons) to be an excellent candidate gene for
diabetes in the high-risk population of Pima Indians
in the American Southwest. Initially a very strong
association was observed:
Gm+       Total    % with diabetes
Present   293      8%
Absent    4,627    29%
Problem: freq(Gm+) in Caucasians (lower-risk diabetes
population) is 67%, Gm+ rare in full-blooded Pima
The association was re-examined in a population of Pima
that were 7/8th (or more) full heritage:
Gm+       Total    % with diabetes
Present   17       59%
Absent    1,764    60%
Transmission-disequilibrium test (TDT)
The TDT accounts for population structure. It requires
sets of relatives and compares the number of times a
marker allele is transmitted (T) versus not-transmitted
(NT) from a marker heterozygote parent to affected
offspring.
Under the hypothesis of no linkage, these values
should be equal, resulting in a chi-square test for
lack of fit:
$$\chi^2_{td} = \frac{(T - NT)^2}{T + NT}$$
Scan for type I diabetes in humans. Marker locus
D2S152
Allele   T    NT   $\chi^2$   p
228      81   45   10.29      0.001
230      59   73   1.48       0.223
240      36   24   2.40       0.121
For allele 228:
$$\chi^2 = \frac{(81 - 45)^2}{81 + 45} = 10.29$$
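The TDT statistic and its 1-df p-value can be reproduced for the D2S152 alleles with the standard library alone. This is a minimal sketch: the function names are assumptions, and the p-value uses the identity that the upper tail of a chi-square with 1 df equals erfc(sqrt(x/2)).

```python
import math

def tdt_chi_square(t, nt):
    """TDT statistic (T - NT)^2 / (T + NT)."""
    return (t - nt) ** 2 / (t + nt)

def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# (transmitted, not transmitted) counts from the D2S152 table
counts = {"228": (81, 45), "230": (59, 73), "240": (36, 24)}
results = {allele: (tdt_chi_square(t, nt), chi2_1df_pvalue(tdt_chi_square(t, nt)))
           for allele, (t, nt) in counts.items()}
```

For allele 240 the formula gives 144/60 = 2.40, and the corresponding p-value is about 0.121. Only allele 228 shows a transmission distortion that is significant at conventional levels.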