04/01/08 Lecture - UCLA Human Genetics
Applied Genetic Modeling –
Biological/Mathematical Basics and Overview of the
first half of the course
(1) Genetic terms and concepts.
(2) Probability
(3) Locating the gene for a trait using a genetic model based
method: what is the basic idea?
(4) Does the trait cluster in families? Segregation analysis
Suggested Reading:
P. Sham: Read Chapter 1 and Skim Chapter 2
Khoury et al.: Read Chapters 2 and 4.
In the second half of the course we will discuss methods to
map multi-factorial traits and diseases (complex traits).
1
Genetics = study of how traits are
transmitted from generation to generation.
Genetic Epidemiology = The study of the
roles of genetic factors and their interaction
with environmental factors in the
occurrence of a trait or disease.
Gene = the basic unit of heredity. It is a
sequence of DNA that encodes polypeptides
(part of a protein) or controls them. It
determines the nature of an individual's
inherited traits.
More terminology given in the Glossary
handout.
2
Reminder - nucleotides are the
building blocks of genes
A gene is composed of strings of bases (A, G, C, T) held
together by a sugar-phosphate backbone.
3
Usually only one strand of DNA is read (codes for a protein).
Mutations can occur along the sequence so that two individuals
have different bases at sequence positions.
From www.kadets.D20.co.edu/~lundberg/dna.html
4
Human Genetic Studies:
1. Our Objective: To map the position of disease
susceptibility or trait genes relative to genes of known
location (markers).
2. A single gene can vary in its composition (due to
mutations, insertions or deletions), so that two randomly
selected people may have biochemically distinct forms of
the gene. These distinct forms are called alleles. We are
also interested in finding the alleles which lead to increased
risk of disease.
3. Examples of traits affected by genes with multiple alleles.
(a) Some alleles in genes may increase the risk of disease
(qualitative trait) like diabetes
(b) Allelic differences may result in different enzyme levels
or activities (quantitative traits) like insulin levels.
5
What is a genetic marker?
(1) Markers are loci with known locations, several
common alleles, and known transmission
patterns. Using linkage analysis, markers are
mapped relative to one another without studying
a disease.
(2) Markers were available before their physical
locations were actually known. Allelic forms
were distinguished by their biochemical
properties (examples RH, ABO blood group).
Genetic locations were determined by inferring
the order and recombination fractions between
markers by linkage analysis.
(3) Currently, studies commonly use microsatellite
markers. They are strings of di-, tri- or
tetranucleotide repeats that can be detected by
PCR. They are highly polymorphic. Single
nucleotide polymorphisms (SNPs) have also
become routinely used. See Box 11.1 of Strachan
and Read for more information.
6
Single Base Mutations in the genome
• 1.4 million locations
• single base DNA differences
= single nucleotide polymorphisms
= SNPs
• Example:
AAGGCTAA to ATGGCTAA
• once every 100 to 300 bases
• predispose to disease
• influence drug response
Wellcome News, Issue 20 Q3 1999
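The single-base example above can be checked mechanically. The following is a minimal sketch, assuming the two sequences are already aligned and of equal length; the function name `snp_positions` is an illustrative choice, not something from the lecture.

```python
def snp_positions(seq_a, seq_b):
    """Return (0-based position, base in seq_a, base in seq_b) for every mismatch.

    Assumes the two sequences are aligned and have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# The example from the slide: the second base differs (A in one copy, T in the other).
print(snp_positions("AAGGCTAA", "ATGGCTAA"))  # [(1, 'A', 'T')]
```

Real SNP calling also has to deal with alignment and sequencing error, which this sketch ignores.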
7
Gene Mapping – Genetics Review
1. Genes are made of strings of bases (DNA) and are arranged
linearly on chromosomes.
2. Humans have 46 chromosomes. Two of these are the sex
determining chromosomes (X and Y). The remaining 44
form 22 distinct types; each type is found in duplicate.
These are the autosomes. At any given autosomal locus
(gene location), one copy is maternally derived and one
copy is paternally derived. This is Mendel's first law (the
principle of independent assortment of alleles).
3. In general for autosomal chromosomes, each of the two
alleles present in the parent has a 50:50 chance of
being passed on.
4. Think of the markers as genetic guideposts along the
chromosome.
[Figure: a chromosome with markers 1 through 11 placed along it. Legend: one symbol denotes a marker, the other the location of the disease susceptibility gene.]
8
How to Gene Map? Use pedigrees. Pedigrees (families)
provide a powerful source of information in genetics
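[Figure: pedigree of seven individuals, numbered 1 to 7, with marker genotypes 3-5 (person 1), 4-4 (person 2), 4-5 (person 3), 4-5 (person 4) and 4-7 (person 7); persons 5 and 6 are not typed.]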
Phenotype = an individual's observable (measurable) values.
Examples: blue/brown eyes, diabetes (yes, no), cholesterol
concentration
Genotype = the often unobservable state of an individual’s
genes at a particular locus that relates to the phenotype
of interest. If alleles are the same then the genotype is
homozygous, otherwise heterozygous.
By Mendel’s law 1 (independent assortment of alleles at a
locus), person 1 is equally likely to transmit a 3 or a 5 to his
child at the marker locus. Person 2 must transmit a 4.
Suppose the marker has alleles, 2, 3, …, 7. What are the
possible marker genotypes for person 5? How about person 6?
9
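[Figure: gel showing a size standard with bands for alleles 2 through 7, next to lanes for persons 1, 2, 3, 4 and 7. Marker genotypes read from the lanes: 3-5, 4-4, 4-5, 4-5 and 4-7.]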
10
The pedigree can be expressed in a file,
if the appropriate fields are recorded
[Figure: the seven-person pedigree with marker genotypes 3-5, 4-4, 4-5, 4-5 and 4-7.]
Fam.  Subj  mom  dad  sex  Status  trait    genotype
1     1     0    0    M    alive   brown    3-5
1     2     0    0    F    alive   blue     4-4
1     3     2    1    F    alive   blue     4-5
1     4     2    1    F    alive   brown    4-5
1     5     2    1    F    dead    blue     0-0
1     6     0    0    M    dead    unknown  0-0
1     7     3    6    F    alive   blue     4-7
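The question about persons 5 and 6 can be worked through in code once the pedigree is in file form. The sketch below is only an illustration: the dictionary layout and the function names are assumptions made here, and the check looks only at typed parents and at typed children whose other parent is also typed (a one-step consistency check, not a full genotype elimination algorithm).

```python
from itertools import combinations_with_replacement

ALLELES = range(2, 8)  # marker alleles 2, 3, ..., 7, as supposed on the slide

# subject -> (mom, dad, genotype); 0 means parent not in the pedigree,
# None means the marker genotype is missing (0-0 in the file).
PEDIGREE = {
    1: (0, 0, (3, 5)),
    2: (0, 0, (4, 4)),
    3: (2, 1, (4, 5)),
    4: (2, 1, (4, 5)),
    5: (2, 1, None),
    6: (0, 0, None),
    7: (3, 6, (4, 7)),
}


def child_genotypes(parent_a, parent_b):
    """Unordered genotypes a child can receive: one allele from each parent."""
    return {tuple(sorted((a, b))) for a in parent_a for b in parent_b}


def possible_genotypes(person):
    """Unordered marker genotypes for `person` consistent with typed relatives."""
    mom, dad, observed = PEDIGREE[person]
    if observed is not None:
        return {tuple(sorted(observed))}
    candidates = set(combinations_with_replacement(ALLELES, 2))
    # Constraint from the parents, when both are in the pedigree and typed.
    if mom and dad and PEDIGREE[mom][2] and PEDIGREE[dad][2]:
        candidates &= child_genotypes(PEDIGREE[mom][2], PEDIGREE[dad][2])
    # Constraint from each typed child whose other parent is typed.
    for c_mom, c_dad, c_geno in PEDIGREE.values():
        if person in (c_mom, c_dad) and c_geno is not None:
            spouse = c_dad if person == c_mom else c_mom
            spouse_geno = PEDIGREE[spouse][2]
            if spouse_geno is not None:
                target = tuple(sorted(c_geno))
                candidates = {g for g in candidates
                              if target in child_genotypes(g, spouse_geno)}
    return candidates


print(sorted(possible_genotypes(5)))  # [(3, 4), (4, 5)]
print(sorted(possible_genotypes(6)))
```

Person 5 must receive a 4 from person 2 and a 3 or a 5 from person 1. Person 6 must carry at least one 7 allele, because person 7 (genotype 4-7) cannot have received the 7 from her mother (genotype 4-5).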
11
Phenotype, Genotype and Penetrance:
Example ABO blood group
ABO blood group:
A and B are antigens (molecules that cause an immune
response) expressed on the surface of blood cells.
Phenotypes detected by antibody reaction.
O is the absence of an antigen.
A phenotype reacts with the A antibody but not B antibody
B phenotype reacts with the B antibody but not A antibody
AB phenotype reacts with both A and B antibodies
O phenotype does not react with either antibody
Phenotype   Genotype
A           A/A, A/O, or O/A
AB          A/B or B/A
B           B/B, B/O, or O/B
O           O/O
12
A masks the presence of O so we don’t know if
person 1 is A-A or A-O.
A is dominant to O
O is recessive to A
B is dominant to O
A and B are both detectable (codominant)
Penetrance = probability of observing a phenotype
given a genotype.
If given the genotype we know the phenotype with 100%
certainty then the trait is fully penetrant. Example ABO:
P(A|A-O)=1 P(B|A-O)=P(O|A-O)=P(AB|A-O)=0.
Most trait or disease genes are not so penetrant.
Having the apoE4-apoE4 genotype elevates lifetime
risk of Alzheimer’s disease but
P(lifetime risk of Alzheimer’s disease | apoE4/apoE4) < 0.50
Likewise for a mutated BRCA1 allele and breast cancer:
P(lifetime risk of Breast Cancer | BRCA1/brca1) = .30-.80.
13
Genotypes: Example ABO
Since there are three alleles, there are $3^2 = 9$ ordered
genotypes. By ordered genotype we mean that
we know the parental origins of the alleles (the phase).
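As a quick check on the count of ordered genotypes, the sketch below enumerates them and groups them by ABO phenotype, treating the trait as fully penetrant (each genotype gives exactly one phenotype). The function name and the "maternal/paternal" ordering convention are illustrative assumptions.

```python
from itertools import product

ALLELES = ("A", "B", "O")


def abo_phenotype(maternal, paternal):
    """Phenotype for an ordered genotype maternal/paternal under full penetrance."""
    antigens = {maternal, paternal} - {"O"}  # O is the absence of an antigen
    if antigens == {"A", "B"}:
        return "AB"                          # A and B are codominant
    if antigens:
        return antigens.pop()                # A or B masks O
    return "O"


ordered = list(product(ALLELES, repeat=2))   # (maternal, paternal) pairs
by_phenotype = {}
for m, p in ordered:
    by_phenotype.setdefault(abo_phenotype(m, p), []).append(f"{m}/{p}")

print(len(ordered))  # 9
for phenotype in ("A", "AB", "B", "O"):
    print(phenotype, by_phenotype[phenotype])
```

Collapsing each pair such as A/O and O/A into one unordered genotype leaves six unordered genotypes.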
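[Figure: pedigree of five individuals with ABO phenotypes: person 1: A; person 2: O; person 3: A; person 4: AB; person 5: B.]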
We know the phase for the genotypes of persons 3 and 5. Person
4 is heterozygous but the phase is unknown. Also note that we
can infer person 2’s genotype with certainty but not person 1’s.
14
If a trait is inherited in a recessive manner with complete
penetrance, then a child of two unaffected carriers has a 25%
chance of having the trait.
Example: Brown and Blue eyes. The Blue is recessive to
Brown.
[Figure: pedigree for the recessive eye-color example, with genotypes B-b, B-B, B-b, B-b, B-b and b-b.]
15
If a trait is inherited in a dominant manner with complete
penetrance, then a child of an affected heterozygous parent
and an unaffected parent has a 50:50 chance of having
the trait. Dominant and recessive are relative terms.
Note Brown eye gene is dominant to Blue eye gene.
[Figure: pedigree for the dominant eye-color example, with genotypes B-b, b-b, b-b, b-b, B-b and b-b.]
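The 25% and 50:50 figures for the recessive and dominant cases follow from each parent passing on either of its two alleles with chance 1/2. A minimal sketch, with genotypes written as two-letter strings and an illustrative function name:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1, parent2):
    """Probability of each unordered offspring genotype.

    Each parent is a two-character string such as "Bb"; every allele is
    transmitted with probability 1/2, independently in the two parents.
    """
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Recessive case: two unaffected carriers, blue eyes require genotype bb.
carriers = offspring_distribution("Bb", "Bb")
print(carriers["bb"])   # 1/4

# Dominant case: heterozygous brown-eyed parent and blue-eyed parent.
mixed = offspring_distribution("Bb", "bb")
print(mixed["Bb"])      # 1/2
```

Both results assume complete penetrance, as on the slides, so that genotype determines eye color.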
16
Each event, A, is assigned a probability
Pr(A) satisfying the following rules:
(a) $\Pr(A) = 1$ if $A = \Omega$ (the whole sample space).
(b) $\Pr(A) = 0$ if $A = \emptyset$ (the empty set).
(c) $0 \le \Pr(A) \le 1$.
(d) $\Pr(A \cup B) = \Pr(A) + \Pr(B)$ if $A \cap B = \emptyset$.
17
Conditional Probability:
$\Pr(A \mid B) = \dfrac{\Pr(A \cap B)}{\Pr(B)}$
Example: Consider a recessive disease with full
penetrance. What is the probability that a normal child of
two parents with one copy each of the disease gene has a
copy of disease gene?
Hint: Let A denote the event having one or two copies of
the disease allele. Let B denote the event of having a
normal phenotype.
$\Pr(A) = 3/4$, $\Pr(B) = 3/4$, $\Pr(A \cap B) = 2/4$
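The hint can be carried through by listing the four equally likely ordered genotypes of a child of two carriers and counting. This is a sketch under the slide's assumptions (recessive disease, full penetrance); the names `has_copy` and `normal` are illustrative labels for events A and B.

```python
from fractions import Fraction

# Ordered genotypes (maternal, paternal) of a child of two carriers.
# D = disease allele, d = normal allele; all four are equally likely.
children = [("D", "D"), ("D", "d"), ("d", "D"), ("d", "d")]


def pr(event):
    """Probability of an event, given as a predicate on a genotype."""
    return Fraction(sum(1 for g in children if event(g)), len(children))


def has_copy(g):   # event A: one or two copies of the disease allele
    return "D" in g


def normal(g):     # event B: normal phenotype (recessive, fully penetrant)
    return g != ("D", "D")


pr_a = pr(has_copy)
pr_b = pr(normal)
pr_a_and_b = pr(lambda g: has_copy(g) and normal(g))
pr_a_given_b = pr_a_and_b / pr_b

print(pr_a, pr_b, pr_a_and_b, pr_a_given_b)  # 3/4 3/4 1/2 2/3
```

So a normal child of two carriers is a carrier with probability (2/4)/(3/4) = 2/3.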
18
One common use of the conditional probability is
“to divide and conquer.”
If the events B1, B2, ... are mutually exclusive and
exhaustive, then the following decomposition often
simplifies calculation:
$\Pr(A) = \Pr((A \cap B_1) \cup (A \cap B_2) \cup \dots \cup (A \cap B_m))$
$= \sum_i \Pr(A \cap B_i)$
$= \sum_i \Pr(B_i) \Pr(A \mid B_i)$
Keep this formula in mind when we look at the
equations for segregation analysis and genetic
model based linkage.
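As a small illustration of the decomposition, take the events B_i to be genotype classes and A to be "affected". The numbers below are hypothetical, chosen only to make the arithmetic visible; the function name is also an assumption of this sketch.

```python
def total_probability(partition):
    """Pr(A) from a list of (Pr(B_i), Pr(A | B_i)) pairs.

    The B_i must be mutually exclusive and exhaustive, so the Pr(B_i)
    have to sum to 1.
    """
    if abs(sum(p_b for p_b, _ in partition) - 1.0) > 1e-9:
        raise ValueError("the Pr(B_i) must sum to 1")
    return sum(p_b * p_a_given_b for p_b, p_a_given_b in partition)


# Hypothetical genotype frequencies and penetrances (not from the lecture):
# (Pr(genotype), Pr(affected | genotype)) for three genotype classes.
genotype_classes = [(0.01, 0.80), (0.18, 0.40), (0.81, 0.05)]
prevalence = total_probability(genotype_classes)
print(round(prevalence, 4))  # 0.1205
```

The same sum over unobserved genotypes, weighted by their probabilities, is the core of the likelihoods used in segregation and linkage analysis.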
19
Final Probability Rule to remember:
Independence:
A and B are independent events if
$\Pr(A \cap B) = \Pr(A) \Pr(B)$.
If Pr(B) >0, then the independence of A and B
is equivalent to Pr(A|B)=Pr(A).
That is, knowing that B occurred does not help
us predict whether A has occurred.
20
Application: Using probability rules and Hardy
Weinberg equilibrium to calculate genotype
frequencies from allele frequencies
ABO blood group: Suppose we know that the allele
frequencies are pA = 0.21, pB = 0.05, and pO = 0.74. What
is the probability that a randomly selected individual
has the unordered genotype AO?
Hardy-Weinberg Equilibrium (HWE): If mating is
random with respect to the gene in question and there is no
selection, migration or mutation, then the allele frequencies
will remain constant from generation to generation and we
can calculate the genotype frequencies from the allele
frequencies (and vice versa).
P(AO unordered genotype) =
P(A/O or O/A ordered genotype)
= P(A/O) + P(O/A) = 2 P(A/O).
P(A/O) = P(A allele and O allele) = pA pO (HWE is a form
of independence)
P(AO unordered genotype) = 2 pA pO = 2(0.21)(0.74) = 0.3108
Important: HWE explains why multiple alleles persist in
the population
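The same calculation extends to every unordered genotype. A minimal sketch, assuming HWE holds; the function name is illustrative and the allele frequencies are the ones given on the slide.

```python
from itertools import combinations_with_replacement


def hwe_genotype_frequencies(allele_freqs):
    """Unordered genotype frequencies under Hardy-Weinberg equilibrium.

    Homozygotes have frequency p_i^2; heterozygotes have 2 * p_i * p_j,
    because the two ordered genotypes i/j and j/i are pooled.
    """
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    freqs = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        p = allele_freqs[a] * allele_freqs[b]
        freqs[a + b] = p if a == b else 2 * p
    return freqs


abo = hwe_genotype_frequencies({"A": 0.21, "B": 0.05, "O": 0.74})
for genotype, freq in abo.items():
    print(genotype, round(freq, 4))
```

The AO entry reproduces the 0.3108 computed above, and the six frequencies sum to 1.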
21
Using the cosegregation patterns of markers
and traits in families to map trait genes.
If the two genes are on different chromosomes then they
are passed from parent to offspring independently.
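[Figure: the father is heterozygous at both the trait locus (B b) and the marker locus (3 5); the mother is b b and 4 4. The four possible offspring types, defined by the paternal alleles (B with 3, B with 5, b with 3, b with 5) together with the maternal b and 4, each occur with P = 1/4.]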
Mendel’s law #2 - Independent assortment of loci
22
What if the two loci are on the same
chromosome?
One might expect that the two loci are always
transmitted together.
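[Figure: the father carries haplotypes B3 and b5; the mother carries b4 on both chromosomes. If the two loci were always transmitted together, the offspring would be B3/b4 with P = 1/2 or b5/b4 with P = 1/2.]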
The set of alleles arranged by parental origin is called a
haplotype. Here we observe three haplotypes: B3, b5
and b4.
23
One often observes more than the parental
haplotypes in the children: e.g., we see B5, B3, b5, and b3
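[Figure: the same mating (father with haplotypes B3 and b5, mother b4/b4), now allowing recombination. Offspring and probabilities: B3/b4 with P = (1-q)/2; b3/b4 with P = q/2; B5/b4 with P = q/2; b5/b4 with P = (1-q)/2.]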
1. The two paternal copies of the chromosome can crossover
and recombine during the formation of the gametes (sperm).
Recombination can also occur between the maternal chromosomes, but it
is not observable in this case.
2. q is the recombination fraction and equals the probability of
recombination between the two loci. q is a function
of the distance between the two loci. q=1/2 when
loci are far apart or on different chromosomes.
24
The distortion in the expected segregation pattern of 1/4:1/4:1/4:1/4
depends on the frequency of crossovers between the two loci. The
frequency is dependent on the distance between the two loci.
The two members of the same autosomal pair, duplicate and
pair up
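[Figure: homologous chromosomes carrying A1-B1 and A2-B2 duplicate, pair up, and may cross over between the A and B loci.]
Let the crossover frequency be 10%, so that the recombination
fraction is q = 10% in gametes:
Gamete   Frequency
A1 B1    45%
A1 B2    5%
A2 B1    5%
A2 B2    45%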
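The 45% / 5% / 5% / 45% split follows directly from the recombination fraction. The sketch below computes gamete probabilities for a parent whose two haplotypes are given; the function name and the tuple representation of a haplotype are assumptions made for illustration.

```python
def gamete_probabilities(haplotype1, haplotype2, q):
    """Gamete probabilities for a parent with the two given haplotypes.

    Each haplotype is (allele at locus 1, allele at locus 2) and q is the
    recombination fraction between the loci, 0 <= q <= 1/2. Parental
    (non-recombinant) gametes get (1 - q)/2 each, recombinants q/2 each.
    """
    if not 0.0 <= q <= 0.5:
        raise ValueError("the recombination fraction must lie in [0, 1/2]")
    (a1, b1), (a2, b2) = haplotype1, haplotype2
    probs = {}
    for gamete, p in [((a1, b1), (1 - q) / 2), ((a2, b2), (1 - q) / 2),
                      ((a1, b2), q / 2), ((a2, b1), q / 2)]:
        # Accumulate, so a parent homozygous at a locus is handled correctly.
        probs[gamete] = probs.get(gamete, 0.0) + p
    return probs


linked = gamete_probabilities(("A1", "B1"), ("A2", "B2"), q=0.10)
unlinked = gamete_probabilities(("A1", "B1"), ("A2", "B2"), q=0.50)
print(linked)
print(unlinked)
```

With q = 0.10 the parental gametes A1 B1 and A2 B2 each have probability 0.45 and the recombinants 0.05. With q = 1/2 all four gametes have probability 1/4, which is the independent assortment pattern.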
25
Comments about using pedigrees to locate genes:
(1) Statistical mapping (linkage analysis) is the first
step in the strategy of positional cloning of disease
loci. Prior to cloning, linkage analysis can be used
for risk prediction.
(2) It has been an unqualified success for mapping disease genes
with regular Mendelian patterns of inheritance,
e.g. cystic fibrosis and hypercholesterolemia.
(3) Based on the fact that genes close in distance
violate Mendel’s law of independent assortment of
loci. The actual relationship between physical
distance and genetic distance is complicated.
(4) Trait loci are positioned (mapped) relative to nondisease markers.
26
The Genetic Model based methods have been
successful in localizing genes in single gene traits with
high penetrance (Mendelian traits):
In genetic model based linkage analysis we need to know or
estimate:
(1) Transmission Probabilities. Is there one or more
genes involved? Is there recombination between
genes?
(2) Penetrance: What is the probability of having the
trait given a particular genotype? (Penetrance can
be dependent upon age, environmental factors etc.).
(3) Priors: Marker genotype frequencies must be
determined.
With this information, determine the likelihood of the
trait gene location given the marker genotypes for the
family members. (Sum over the possible genotypes for
the trait).
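Likelihood for family r =
$\sum_{G_1} \cdots \sum_{G_n} \prod_i \mathrm{Pen}(X_i \mid G_i) \prod_j \mathrm{Prior}(G_j) \prod_{\{k,l,m\}} \mathrm{Trans}(G_k \mid G_l, G_m)$
Here i runs over all n family members, j over the founders, and
{k,l,m} over every child k with parents l and m.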
Each family is independent so the individual family
likelihoods multiply.
Examples: Duchenne muscular dystrophy,
Retinoblastoma, Cystic Fibrosis
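To make the structure of the family likelihood concrete, the sketch below evaluates it by brute force for the smallest possible family: a father, a mother and one child. It is a simplified illustration, not the method used by linkage software: it models a single trait locus with alleles D and d and no marker, takes founder priors from Hardy-Weinberg equilibrium, and all function names and numeric values are assumptions.

```python
from itertools import product

GENOTYPES = ("DD", "Dd", "dd")  # D = disease allele, d = normal allele


def prior(genotype, p_D):
    """Founder genotype frequency, assuming Hardy-Weinberg equilibrium."""
    p_d = 1.0 - p_D
    return {"DD": p_D ** 2, "Dd": 2 * p_D * p_d, "dd": p_d ** 2}[genotype]


def prob_transmit_D(genotype):
    """Probability that a parent with this genotype passes on allele D."""
    return {"DD": 1.0, "Dd": 0.5, "dd": 0.0}[genotype]


def trans(child, father, mother):
    """Mendelian transmission probability of the child's genotype."""
    f, m = prob_transmit_D(father), prob_transmit_D(mother)
    return {"DD": f * m,
            "Dd": f * (1 - m) + (1 - f) * m,
            "dd": (1 - f) * (1 - m)}[child]


def pen(affected, genotype, penetrance):
    """Probability of the observed phenotype given the genotype."""
    risk = penetrance[genotype]
    return risk if affected else 1.0 - risk


def trio_likelihood(affected, penetrance, p_D):
    """Sum over all genotype assignments of penetrances x priors x transmission.

    affected = (father, mother, child) as booleans.
    """
    aff_f, aff_m, aff_c = affected
    total = 0.0
    for g_f, g_m, g_c in product(GENOTYPES, repeat=3):
        total += (pen(aff_f, g_f, penetrance)
                  * pen(aff_m, g_m, penetrance)
                  * pen(aff_c, g_c, penetrance)
                  * prior(g_f, p_D) * prior(g_m, p_D)   # founders
                  * trans(g_c, g_f, g_m))               # the one child
    return total


# Fully penetrant recessive disease, hypothetical allele frequency 0.1:
recessive = {"DD": 1.0, "Dd": 0.0, "dd": 0.0}
lik = trio_likelihood((False, False, True), recessive, p_D=0.1)
print(round(lik, 6))  # 0.0081
```

With unaffected parents and an affected child, only the assignment Dd x Dd with a DD child contributes, giving (2 x 0.1 x 0.9)^2 x 1/4 = 0.0081. For several independent families the individual likelihoods would be multiplied (or their logarithms added). The brute-force sum grows exponentially with family size, which is why practical programs organise the computation differently.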
27
But it’s not as easy as it seems.
The degree that genes play a role in trait susceptibility
varies from trait to trait
–Some traits are primarily determined by genes
–Some traits are primarily determined by the environment
–Some traits are a combination of genes and environment
[Figure: a spectrum of traits running from "Single Gene" through "Genes and Environment" to "Environment". Examples placed along it: cystic fibrosis, Huntington’s, diabetes, heart disease, cancer, infections, accidents.]
We need accurate information about family history,
life styles, environment, genetic data, to determine
first whether it is worth attempting gene mapping
and then to map them.
28
Quantifying the Familial Aggregation
The first step of any genetic study is to ask one of the
following related questions:
(1) Does the disease aggregate in families (more than
expected by chance)?
(2) Are family members’ trait values more likely to be
similar than the trait values of two randomly
selected people from the same population?
One popular method of answering these questions is to
calculate the recurrence risk to relatives.
Recurrence risk to relatives of type R:
$\lambda_R = \dfrac{\mathrm{Prob}(\text{relative of type } R \text{ affected} \mid \text{subject affected})}{\mathrm{Prob}(\text{random person affected})}$
The larger $\lambda_R$, the greater the degree of aggregation in
families, but a large value of $\lambda_R$ does not prove that the disease
has a genetic basis. Aggregation could be due to common
environmental factors.
Prob(random person affected) = population prevalence.
The observation that $\lambda_{\text{offspring}} > \lambda_{\text{siblings}}$ argues against a
purely Mendelian trait.
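The recurrence risk ratio defined above is a simple quotient once the two probabilities have been estimated. A minimal sketch with hypothetical counts (the numbers and the function name are illustrative, not data from the lecture):

```python
def recurrence_risk_ratio(affected_relatives, total_relatives, prevalence):
    """Risk to relatives of affected subjects divided by population prevalence.

    affected_relatives / total_relatives estimates
    Prob(relative of type R affected | subject affected).
    """
    if total_relatives <= 0:
        raise ValueError("need at least one relative")
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must lie in (0, 1]")
    risk_to_relatives = affected_relatives / total_relatives
    return risk_to_relatives / prevalence


# Hypothetical study: 30 of 200 siblings of affected subjects are affected,
# and the population prevalence is 5%.
ratio_siblings = recurrence_risk_ratio(30, 200, 0.05)
print(round(ratio_siblings, 2))  # 3.0
```

A ratio near 1 means relatives are at about the population risk. A ratio well above 1 indicates familial aggregation, which by itself does not distinguish shared genes from shared environment.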
29
Segregation analysis is a more difficult but more
informative method of gathering evidence for
substantial genetic involvement in susceptibility to the
trait.
The goal of segregation analysis is to identify the
specific genetic mechanisms that may control traits that
are associated with disease.
Familial Aggregation can be due to:
(1) Shared genes
(a) one gene acting in a
(i) dominant manner:
P(disease|DD) = P(disease|Dd) > P(disease|dd)
(ii) recessive manner:
P(disease|DD) > P(disease|Dd) = P(disease|dd)
(iii) additive manner:
P(disease|Dd) = 1/2 (P(disease|DD) + P(disease|dd))
(iv) codominant manner:
P(disease|DD) > P(disease|Dd) > P(disease|dd)
(b) several genes
(c) many genes
(2) Shared environment
(3) A combination of both genes and environment that
can include interactions between the genes and the
environment.
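The four single-gene modes listed under (1)(a) are defined purely by how the three penetrances compare. The sketch below turns those comparisons into a small classifier; the function name, the tolerance, and the handling of cases outside the four definitions are assumptions of this illustration.

```python
def single_gene_mode(f_DD, f_Dd, f_dd, tol=1e-9):
    """Classify a one-gene model from its penetrances P(disease | genotype).

    Checks are made in the order dominant, recessive, additive, codominant;
    an additive model also satisfies the codominant ordering, so it is
    tested first.
    """
    def close(x, y):
        return abs(x - y) <= tol

    if f_DD <= f_dd + tol:
        return "D does not raise risk"
    if close(f_DD, f_Dd):
        return "dominant"
    if close(f_Dd, f_dd):
        return "recessive"
    if close(f_Dd, (f_DD + f_dd) / 2):
        return "additive"
    if f_DD > f_Dd > f_dd:
        return "codominant"
    return "other"


print(single_gene_mode(0.8, 0.8, 0.1))  # dominant
print(single_gene_mode(0.8, 0.1, 0.1))  # recessive
print(single_gene_mode(0.9, 0.5, 0.1))  # additive
print(single_gene_mode(0.9, 0.3, 0.1))  # codominant
```

In a real segregation analysis the penetrances are estimated from family data rather than given, and the models are compared through their likelihoods.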
30
Segregation Analysis involves:
(1) Specifying a mathematical model (similar to genetic
model based linkage analysis).
(2) Computing the likelihood of the observed data under
the model
(3) Comparing various genetic models to find the “best”
fitting model.
Note that with segregation analysis, the best model is the
best model among those examined. For example, if a
polygene model is not among the choices for a disease
caused by many loci, the best fitting model might end up
being a major gene model with spurious environmental
factors.
Environmental factors must be identified and carefully
documented for accurate results. The method of finding the
families (ascertainment) should be included in the model.
31
Most common diseases are examples of multi-factorial, or
complex, traits.
Complex trait: more than one gene or gene(s) and environment
play a role.
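[Figure: three diagrams of complex traits. (1) Two genes, additive effects: gene 1 and gene 2 each act on the trait. (2) Gene-environment, additive effects: a gene and the environment each act on the trait. (3) Genes-environment interactions: gene 1, gene 2 and the environment act jointly on the trait.]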
In a multi-factorial disease, genes that play a role in
susceptibility to a disease may not be necessary or sufficient for
disease expression. We do not observe Mendelian inheritance
patterns.
Mendelian inheritance patterns include the transmission
patterns expected if there is a single gene obeying Mendel’s law
of independent assortment of alleles at a single locus, e.g.
dominant, recessive.
32
Does genetic model based linkage work with
complex traits?
Yes, provided the model is correctly (nearly correctly)
specified. It has worked with complex diseases (example:
Alzheimer's disease, breast cancer, hemolytic disease of
the newborn) if one can identify a population, the
environmental factors involved etc. In some cases, there
is a subpopulation where the trait seems to be segregating
in a Mendelian (single gene) manner.
Model Based Linkage Analysis has worked best when the
disease gene is necessary and sufficient for disease
expression.
Genetic model based linkage analysis can give misleading
results if
(1) Incorrect ascertainment correction is used;
(2) Phenotypes are incorrectly classified (the penetrance
is misspecified);
(3) The degree of heterogeneity is misspecified
(populations differ in the causal gene); or
(4) The mode of transmission is misspecified.
(Clerget-Darpoux, 1986; J. Ott, 1991)
All of these problems can occur in linkage analysis of a
complex trait.
33
Alternatives to Genetic Model Based
Linkage Analysis for Mapping Genes
(1) Affected Relative Methods
(2) Variance Component (linear regression)
Models
(3) Association Methods
These methods use the degree of allele
sharing to determine the location of genes.
We will discuss these alternative methods,
called genetic model free methods, in the
second half of the course.
34