Biomath 207B / Biostat 237 / HG 207B
Lecture 2 - Segregation Analysis
1/15/04
If a trait is inherited in a recessive manner with complete
penetrance, then the child of two unaffected carriers has a
25% chance of having the trait.
Example: Brown and Blue eyes. The Blue is recessive to
Brown.
[Figure: pedigree with genotypes B-b, B-B, B-b, B-b, B-b, b-b]
If a trait is inherited in a dominant manner with complete
penetrance, then the child of an affected heterozygous
parent and an unaffected parent has a 50:50 chance of
having the trait. Dominant and recessive are relative terms.
Note that the Brown eye gene is dominant to the Blue eye gene.
[Figure: pedigree with genotypes B-b, b-b, b-b, b-b, B-b, B-b]
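The 25% and 50% figures in the two examples above follow from each parent passing on one of its two alleles with probability 1/2. A minimal sketch of that calculation, assuming full penetrance; the function name `offspring_distribution` and the two-letter genotype strings are illustrative choices, not part of the lecture:

```python
from fractions import Fraction
from itertools import product

def offspring_distribution(parent1, parent2):
    """Return {genotype: probability} for a child of two parents.

    Each parent is a 2-character string of alleles, e.g. "Bb".
    Each parent passes on one of its two alleles with probability 1/2.
    """
    dist = {}
    for a1, a2 in product(parent1, parent2):
        # Sort so that "bB" and "Bb" count as the same genotype
        # (uppercase sorts before lowercase, giving "Bb").
        genotype = "".join(sorted(a1 + a2))
        dist[genotype] = dist.get(genotype, Fraction(0)) + Fraction(1, 4)
    return dist

# Recessive trait (blue eyes, bb): child of two unaffected carriers.
carriers = offspring_distribution("Bb", "Bb")
print(carriers["bb"])   # 1/4

# Dominant trait (brown eyes): heterozygous parent x unaffected bb parent.
cross = offspring_distribution("Bb", "bb")
print(cross["Bb"])      # 1/2
```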
Simple segregation patterns:
(1) recessive pattern of inheritance.
(2) disease is fully penetrant
(3) let D denote the disease allele
(4) p(d)=0.7, p(D)=0.3
(5) collect all families with exactly two children
What distribution of affecteds do we expect
to see under Hardy Weinberg Equilibrium and random
mating?
Parents                      Affected children (of two)
                             0          1          2
Unaffected parents           75.1%      6.6%       1.1%
One affected parent          10.7%      3.8%       1.9%
  (male or female)
Two affected parents         0%         0%         0.8%
A disease that is inherited in a dominant manner has a
different pattern
(1) disease is fully penetrant
(2) let D denote the disease allele
(3) p(d)=0.9, p(D)=0.1
(4) collect all families with exactly two children
(5) Hardy Weinberg equilibrium and random mating
Parents                      Affected children (of two)
                             0          1          2
Unaffected parents           65.6%      0%         0%
One affected parent          7.3%       14.6%      8.9%
  (male or female)
Two affected parents         0.2%       1.2%       2.2%
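The percentages for the recessive and dominant two-child family examples can be reproduced by enumerating parental mating types. The sketch below assumes Hardy-Weinberg proportions, random mating, Mendelian transmission, and full penetrance, as stated on the slides; the function and variable names are illustrative.

```python
from itertools import product

GENOTYPES = ("DD", "Dd", "dd")

def hwe_frequencies(p_D):
    """Genotype frequencies under Hardy-Weinberg equilibrium."""
    p_d = 1.0 - p_D
    return {"DD": p_D ** 2, "Dd": 2 * p_D * p_d, "dd": p_d ** 2}

def child_genotype_probs(father, mother):
    """Mendelian offspring genotype probabilities for one mating."""
    tf = father.count("D") / 2   # P(father transmits D)
    tm = mother.count("D") / 2   # P(mother transmits D)
    return {"DD": tf * tm,
            "Dd": tf * (1 - tm) + (1 - tf) * tm,
            "dd": (1 - tf) * (1 - tm)}

def family_distribution(p_D, mode):
    """P(number of affected parents, number of affected children)
    for two-child families; mode is 'recessive' or 'dominant'."""
    affected = {"recessive": {"DD"}, "dominant": {"DD", "Dd"}}[mode]
    freq = hwe_frequencies(p_D)
    table = {(a, c): 0.0 for a in range(3) for c in range(3)}
    for father, mother in product(GENOTYPES, repeat=2):
        mating = freq[father] * freq[mother]          # random mating
        n_parents = (father in affected) + (mother in affected)
        child = child_genotype_probs(father, mother)
        q = sum(pr for g, pr in child.items() if g in affected)
        # The number of affected children among two is Binomial(2, q).
        binomial = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
        for n_children, pr in enumerate(binomial):
            table[(n_parents, n_children)] += mating * pr
    return table

recessive = family_distribution(0.3, "recessive")
dominant = family_distribution(0.1, "dominant")
for name, table in (("recessive", recessive), ("dominant", dominant)):
    print(name)
    for n_parents in range(3):
        row = [f"{100 * table[(n_parents, c)]:.1f}%" for c in range(3)]
        print(f"  {n_parents} affected parent(s):", row)
```

Each row lists the share of all two-child families with 0, 1, and 2 affected children for that number of affected parents.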
Why is it not always this simple?
- More than one gene can be involved,
and environment influences disease risk. That is,
there are diseases with reduced penetrance and
sporadic cases of disease.
- Can’t sample everyone. Complete ascertainment
is impractical for rare diseases.
- Family structures will vary. Parents may not be
available.
Most common diseases are examples of multi-factorial, or
complex, traits.
Complex trait: more than one gene or gene(s) and environment
play a role.
[Figure: three diagrams of complex-trait models.
Two genes, additive effects: gene 1, gene 2 -> TRAIT.
Gene-environment additive effects: gene, environment -> TRAIT.
Genes-environment interactions: gene 1, gene 2, environment -> TRAIT.]
In a multi-factorial disease, genes that play a role in
susceptibility to a disease may not be necessary or sufficient for
disease expression. We do not observe Mendelian inheritance
patterns.
Mendelian inheritance patterns include the transmission
patterns expected if there is a single gene obeying Mendel’s law
of segregation of alleles at a single locus, e.g.
dominant, recessive.
Quantifying the Familial Aggregation
The first step of any genetic study is to ask one of the
following related questions:
(1) Does the disease aggregate in families (more than
expected by chance)?
(2) Are family members’ trait values more likely to be
similar than the trait values of two randomly
selected people from the same population?
One popular method of answering these questions is to
calculate the recurrence risk ratio for relatives.
Recurrence risk ratio for relatives of type R:
$$
\lambda_R = \frac{\mathrm{Prob}(\text{relative of type } R \text{ affected} \mid \text{subject affected})}{\mathrm{Prob}(\text{random person affected})}
$$
The larger λ_R, the greater the degree of aggregation in
families, but a large value of λ_R does not prove that the disease
has a genetic basis. Aggregation could be due to common
environmental factors.
Prob(random person affected) = population prevalence.
The observation that λ_offspring > λ_siblings argues against a
purely Mendelian trait.
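To make the recurrence risk ratio concrete, the sketch below evaluates it for offspring and for siblings under a single biallelic locus. It assumes Hardy-Weinberg proportions, random mating, Mendelian transmission, and no shared environment, so that relatives' phenotypes are independent given their genotypes. The function name and the penetrance dictionary are illustrative, not taken from the lecture.

```python
from itertools import product

GENOTYPES = ("DD", "Dd", "dd")

def recurrence_risk_ratios(p_D, penetrance):
    """Return (prevalence, ratio for offspring, ratio for siblings).

    penetrance maps each genotype to P(affected | genotype).
    """
    p_d = 1.0 - p_D
    freq = {"DD": p_D ** 2, "Dd": 2 * p_D * p_d, "dd": p_d ** 2}

    def child_probs(father, mother):
        tf, tm = father.count("D") / 2, mother.count("D") / 2
        return {"DD": tf * tm,
                "Dd": tf * (1 - tm) + (1 - tf) * tm,
                "dd": (1 - tf) * (1 - tm)}

    # Population prevalence = Prob(random person affected).
    prevalence = sum(freq[g] * penetrance[g] for g in GENOTYPES)

    parent_and_child = 0.0   # P(parent affected and child affected)
    both_sibs = 0.0          # P(two siblings both affected)
    for father, mother in product(GENOTYPES, repeat=2):
        mating = freq[father] * freq[mother]
        # P(a child of this mating is affected)
        q = sum(pr * penetrance[g]
                for g, pr in child_probs(father, mother).items())
        parent_and_child += mating * penetrance[father] * q
        both_sibs += mating * q ** 2

    # Conditional risk to the relative, divided by the prevalence.
    lam_offspring = (parent_and_child / prevalence) / prevalence
    lam_sibling = (both_sibs / prevalence) / prevalence
    return prevalence, lam_offspring, lam_sibling

# Fully penetrant recessive disease with p(D) = 0.3.
K, lam_o, lam_s = recurrence_risk_ratios(
    0.3, {"DD": 1.0, "Dd": 0.0, "dd": 0.0})
print(f"prevalence={K:.3f}  offspring={lam_o:.2f}  sibling={lam_s:.2f}")
```

In this recessive example the sibling ratio comes out larger than the offspring ratio, which is the ordering a single-gene model allows.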
Segregation Analysis
• Goal of Segregation analysis: To identify
the specific genetic mechanisms that may
control traits associated with disease.
• Segregation Analysis is used to determine
if the observed familial aggregation has a
genetic basis. In addition, it is used to
estimate the relative effects of genetic and
environmental factors shared among
family members. It can also be used to test
for gene-environment interactions.
• See Jarvik (1998), "Complex Segregation
Analyses: Uses and Limitations," AJHG
63:942-946, for more information.
Why go to all the trouble of
segregation analysis?
(1) Calculating relative risks isn’t good enough.
Familial aggregation can be due to shared
environment. High sibling relative risk (λ_s) or
heritability does not prove that the disease has a
genetic component (see for example, Guo AJHG
1998). Segregation analysis increases the
confidence that genes play a role in the
susceptibility to the disease.
(2) The most powerful forms of linkage analysis
require accurate knowledge of the inheritance mode
and penetrance of the disease.
Genetic model based gene mapping (classical
linkage analysis) requires that the inheritance mode
(dominant, recessive, etc) for the major gene and
the probability of disease given a particular
genotype be known. If the genetic model is wrong
the false negative rate is increased (Martinez M. et
al, Gen. Epi., 1989, 6:253-8).
Segregation analysis is a more difficult but more
informative method of gathering evidence for
substantial genetic involvement in susceptibility to the
trait.
Familial aggregation can be due to:
(1) Shared genes
    (a) one gene acting in a (let D be the disease risk allele)
        (i) dominant manner
            P(disease|DD) = P(disease|Dd) > P(disease|dd)
        (ii) recessive manner
            P(disease|DD) > P(disease|Dd) = P(disease|dd)
        (iii) additive manner
            P(disease|Dd) = 1/2 (P(disease|DD) + P(disease|dd))
        (iv) codominant manner
            P(disease|DD) > P(disease|Dd) > P(disease|dd)
    (b) several genes
    (c) many genes (polygene model)
(2) Shared environment
(3) A combination of both genes and environment that
can include interactions between the genes and the
environment.
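The single-gene cases above differ only in how the three penetrances are ordered. As a small illustration, the hypothetical helper below (its name, labels, and tolerance are assumptions, not part of the lecture) classifies a set of penetrances by those orderings. The additive case is checked before the codominant one because an additive model also satisfies the codominant inequality.

```python
import math

def inheritance_mode(f_DD, f_Dd, f_dd, tol=1e-9):
    """Classify penetrances P(disease | genotype), D being the risk allele."""
    def equal(x, y):
        return math.isclose(x, y, rel_tol=0.0, abs_tol=tol)

    if f_DD - f_dd <= tol:
        return "no increased risk from D"
    if equal(f_DD, f_Dd):
        return "dominant"       # P(DD) = P(Dd) > P(dd)
    if equal(f_Dd, f_dd):
        return "recessive"      # P(DD) > P(Dd) = P(dd)
    if equal(f_Dd, 0.5 * (f_DD + f_dd)):
        return "additive"       # P(Dd) is the midpoint
    if f_DD > f_Dd > f_dd:
        return "codominant"     # P(DD) > P(Dd) > P(dd)
    return "other"

print(inheritance_mode(0.8, 0.8, 0.1))
print(inheritance_mode(0.8, 0.1, 0.1))
print(inheritance_mode(0.9, 0.5, 0.1))
print(inheritance_mode(0.9, 0.3, 0.1))
```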
Segregation Analysis involves:
(1) Specifying a mathematical model (similar to genetic
model based linkage analysis).
(2) Computing the likelihood of the observed data under
the model
(3) Comparing various models to find the “best” fitting
model.
Note that with segregation analysis, the best model is the
best model among those examined. For example, if a
polygene model is not among the choices for a disease
caused by many loci, the best fitting model might end up
being a major gene model with spurious environmental
factors.
Environmental factors must be identified and carefully
documented for accurate results. The method of finding the
families (ascertainment) should be included in the model.
The overall approach to
segregation analysis is:
• Step (1): Specify null and alternative hypotheses.
• For example: no aggregation in families at all
(sporadic model) for the null hypothesis and
Mendelian inheritance (single gene) as the
alternative hypothesis.
• Step (2): Translate into mathematical models.
• Step (3): Compute the maximum likelihood of the
data and maximum likelihood estimates for the
parameters in the mathematical model for both
hypotheses.
• Step (4): If the null model is a special case of the
alternative (nested models), then compare the
models using likelihood ratio tests (LRT) to find
the hypothesis that is best supported by the data
(hierarchical testing). If the models are not nested,
then use the AIC or simulation to compare them.
• Repeat these steps for as many hypotheses as you
wish to test.
Comparing models:
(1) If the null hypothesis is a special case of the
alternative model, then one way to compare them is with
an LRT. For example, a dominant Mendelian
model is a restriction of the codominant Mendelian
model. Under the null hypothesis, the statistic
-2 ln(LR) = 2 [ln L(alternative) - ln L(null)]
asymptotically has a chi-square distribution.
The degrees of freedom are determined by the
difference in the number of parameters. When
comparing the dominant and codominant
Mendelian models, there is one degree of freedom.
The chi-square statistic has an associated p-value.
If it is less than 0.05, then reject the null hypothesis
in favor of the alternative. If it is greater than 0.05,
then do not reject the null hypothesis.
(2) If the null hypothesis is not a special case of the
alternative, use the AIC to compare. For
example, a dominant Mendelian model under HWE
is not a special case of a recessive Mendelian model
where we do not assume HWE. The model with the
lowest AIC is preferred.
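The two comparisons can be sketched in a few lines. The code assumes that the maximized log-likelihoods have already been obtained from a segregation analysis program; the numbers used are hypothetical placeholders, not results from any data set. The p-value formula is specific to one degree of freedom, as in the dominant versus codominant comparison.

```python
import math

def lrt_1df(loglik_null, loglik_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p-value), where the statistic is
    2 * (ln L_alt - ln L_null), referred to chi-square with 1 df.
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def aic(loglik, n_params):
    """Akaike information criterion; lower values are preferred."""
    return 2.0 * n_params - 2.0 * loglik

# Hypothetical maximized log-likelihoods (illustration only).
loglik_dominant = -1234.6     # null: dominant Mendelian model
loglik_codominant = -1231.2   # alternative: codominant Mendelian model
stat, p_value = lrt_1df(loglik_dominant, loglik_codominant)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")

# Non-nested comparison: hypothetical parameter counts of 3 and 4.
print(aic(loglik_dominant, 3), aic(loglik_codominant, 4))
```

For more than one degree of freedom, a general chi-square tail function from a statistics library would be needed in place of the `erfc` shortcut.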
Converting hypotheses into
models:
• The mathematical models have three parts:
• The penetrance - a measure of how likely the
trait value is given that a person is in a particular risk
group. In genetics, the most relevant parameters are
m = g_aa, g_Aa, and g_AA, representing the phenotype
value for the aa group and the change in value for the Aa
or AA group.
• The prior - the probability that a founder belongs
to a particular risk group (under HWE, determined
by q_A).
• The transmission probabilities - the probability that
an offspring belongs to a particular risk group given
their parents’ risk groups. The relevant parameters
are t_aa, t_Aa, and t_AA. For example,
t_aa = P(A transmitted from an aa parent)
and
t_aa t_aa = P(AA transmitted from aa and aa parents).
Under Mendelian inheritance, t_aa = 0, t_Aa = 1/2, and
t_AA = 1.
With this information, compute the likelihood of the
observed trait values for the family members. (Sum over
the possible genotypes for the trait.)
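Prob for family r:
$$
\mathrm{Prob}(\text{family } r) = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} \mathrm{Pen}(X_i \mid G_i) \prod_{j} \mathrm{Prior}(G_j) \prod_{\{k,l,m\}} \mathrm{Trans}(G_k \mid G_l, G_m)
$$
Here i runs over all n family members, j over the founders,
and {k, l, m} over each offspring k with parents l and m.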
Each family is independent so the individual family
likelihoods multiply.
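For a small nuclear family the sum over genotypes can be evaluated directly. The sketch below is a brute-force illustration for a binary trait, assuming HWE priors for the two founders. It enumerates all 3^n genotype assignments, so it is only practical for small families; pedigree software typically uses more efficient peeling algorithms. All function and parameter names are illustrative.

```python
from itertools import product

GENOTYPES = ("aa", "Aa", "AA")

def prior(g, q_A):
    """HWE founder genotype probability; q_A is the frequency of allele A."""
    return {"aa": (1 - q_A) ** 2,
            "Aa": 2 * q_A * (1 - q_A),
            "AA": q_A ** 2}[g]

def trans(child, father, mother, t):
    """P(child genotype | parents); t[g] = P(a parent of type g transmits A)."""
    tf, tm = t[father], t[mother]
    return {"aa": (1 - tf) * (1 - tm),
            "Aa": tf * (1 - tm) + (1 - tf) * tm,
            "AA": tf * tm}[child]

def pen(affected, g, f):
    """P(phenotype | genotype) for a binary trait; f[g] = P(affected | g)."""
    return f[g] if affected else 1.0 - f[g]

def nuclear_family_likelihood(father_aff, mother_aff, children_aff, q_A, f, t):
    """Sum over all genotype assignments of Pen * Prior * Trans."""
    n = 2 + len(children_aff)
    total = 0.0
    for genos in product(GENOTYPES, repeat=n):
        gf, gm, kids = genos[0], genos[1], genos[2:]
        term = prior(gf, q_A) * prior(gm, q_A)                    # founders
        term *= pen(father_aff, gf, f) * pen(mother_aff, gm, f)
        for gk, aff in zip(kids, children_aff):                   # offspring
            term *= trans(gk, gf, gm, t) * pen(aff, gk, f)
        total += term
    return total

mendelian = {"aa": 0.0, "Aa": 0.5, "AA": 1.0}
recessive = {"aa": 0.0, "Aa": 0.0, "AA": 1.0}   # fully penetrant, A = risk allele
lik = nuclear_family_likelihood(False, False, [True, False],
                                q_A=0.3, f=recessive, t=mendelian)
print(lik)

# Independent families: multiply likelihoods (or add log-likelihoods).
two_families = lik * nuclear_family_likelihood(
    True, False, [False, False], q_A=0.3, f=recessive, t=mendelian)
```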
Ousiotype model:
Define t_AA, t_Aa, t_aa to be the probability of "transmitting" type A to an
offspring depending on the parental type. These transmission probabilities
determine Pr(g_i | g_fi, g_mi), where g_i is person i's ousiotype,
g_fi is his father's ousiotype, and g_mi is his mother's ousiotype.

P(offspring ousiotype | parents' ousiotypes)

father's   mother's   offspring's ousiotype (g_i):
ousiotype  ousiotype
(g_fi)     (g_mi)     aa                  aA                             AA

aa         aa         (1-t_aa)^2          2 t_aa(1-t_aa)                 t_aa^2
aa         Aa         (1-t_aa)(1-t_Aa)    t_aa(1-t_Aa) + (1-t_aa)t_Aa    t_aa t_Aa
aa         AA         (1-t_aa)(1-t_AA)    t_aa(1-t_AA) + (1-t_aa)t_AA    t_aa t_AA

Aa         aa         (1-t_Aa)(1-t_aa)    t_Aa(1-t_aa) + (1-t_Aa)t_aa    t_Aa t_aa
Aa         Aa         (1-t_Aa)^2          2 t_Aa(1-t_Aa)                 t_Aa^2
Aa         AA         (1-t_Aa)(1-t_AA)    t_Aa(1-t_AA) + (1-t_Aa)t_AA    t_Aa t_AA

AA         aa         (1-t_AA)(1-t_aa)    t_AA(1-t_aa) + (1-t_AA)t_aa    t_AA t_aa
AA         Aa         (1-t_AA)(1-t_Aa)    t_AA(1-t_Aa) + (1-t_AA)t_Aa    t_AA t_Aa
AA         AA         (1-t_AA)^2          2 t_AA(1-t_AA)                 t_AA^2
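Every entry in the transmission table follows one rule: the father passes on type A with the probability attached to his ousiotype, the mother does likewise, and the two transmissions are independent. A minimal sketch that generates the whole table from the three transmission parameters; the function name and dictionary layout are illustrative:

```python
from itertools import product

OUSIOTYPES = ("aa", "Aa", "AA")

def transmission_table(t):
    """Return {(father, mother): {offspring type: probability}}.

    t[g] is the probability that a parent of ousiotype g transmits type A.
    """
    table = {}
    for gf, gm in product(OUSIOTYPES, repeat=2):
        tf, tm = t[gf], t[gm]
        table[(gf, gm)] = {
            "aa": (1 - tf) * (1 - tm),             # neither parent transmits A
            "Aa": tf * (1 - tm) + (1 - tf) * tm,   # exactly one parent does
            "AA": tf * tm,                         # both parents do
        }
    return table

# Mendelian values: t_aa = 0, t_Aa = 1/2, t_AA = 1.
mendelian = transmission_table({"aa": 0.0, "Aa": 0.5, "AA": 1.0})
print(mendelian[("Aa", "Aa")])

# Equal transmission probabilities (the value 0.3 is arbitrary): the
# offspring distribution no longer depends on the parents' types.
equal_t = transmission_table({"aa": 0.3, "Aa": 0.3, "AA": 0.3})
print(equal_t[("aa", "aa")] == equal_t[("AA", "AA")])
```

Each row of the table sums to 1 for any choice of the three probabilities, which is a quick check on a reconstructed table.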