Multilocus Genetics
Molecular Evolution: Selection
• The KA/KS ratio compares the number of nonsynonymous substitutions per
nonsynonymous site (KA) with the number of synonymous substitutions per
synonymous site (KS) in a gene during a specific evolutionary period.
• Assuming that KS provides an index of the
random mutation rate, the KA/KS ratio measures
whether the rate of protein evolution differs from
the rate expected under neutral drift.
– If KA>KS, this is taken to indicate accelerated amino-acid change, which might be due to positive selection.
– Conversely, if KA<KS, this suggests purifying
selection.
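The interpretation rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the `tolerance` band around 1, and the input values are assumptions for the example, not part of the lecture or of any published MCPH1/ASPM analysis.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Interpret a KA/KS ratio using the rule of thumb from the slide.

    `tolerance` is a hypothetical band around 1 inside which the ratio is
    treated as indistinguishable from neutral drift.
    """
    if ks <= 0:
        raise ValueError("KS must be positive to form the ratio")
    ratio = ka / ks
    if ratio > 1 + tolerance:
        return "possible positive selection"
    if ratio < 1 - tolerance:
        return "purifying selection"
    return "consistent with neutral drift"


# Hypothetical substitution rates, chosen only to exercise each branch.
print(classify_selection(0.012, 0.030))  # KA < KS
print(classify_selection(0.045, 0.030))  # KA > KS
print(classify_selection(0.030, 0.030))  # KA = KS
```

A real analysis would estimate KA and KS from aligned sequences and test the ratio statistically rather than apply a fixed cutoff.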
Brain Development in Primates
• MCPH1 (the gene that encodes
microcephalin)
• and ASPM (abnormal-spindle-like,
microcephaly associated)
• Both MCPH1 and ASPM are evolutionarily
ancient, with orthologues that are likely to
be present in all chordates
MCPH
(A) Schematic representation of the alignment. Promoter regions, exons, and
introns are marked in gray, red, and blue, respectively. White segments
correspond to gaps.
(B) Positions of long (50 bp or longer) insertions/deletions. “O” denotes
orangutan, “M” macaque, “OGCH” the orangutan–gorilla–chimpanzee–human
clade, and “GCH” the gorilla–chimpanzee–human clade.
(C) Positions of polymorphic bases derived from the GenBank single nucleotide
polymorphism (SNP) database.
(D) Positions of the CpG island. The approximately 800-bp-long CpG island
includes promoter, 5′ UTR, first exon, and a small portion of the first intron.
(E) Location of an approximately 3-kb-long segmental duplication.
(F) Positions of selected motifs associated with genomic rearrangements in the
human sequence. Numbers in parentheses reflect number of allowed
differences from the consensus motif (zero for short or two ambiguous motifs,
two for longer sites).
(G) Distribution of repetitive elements. The individual ASPM genes share the
same repeats except for the indels marked in (B).
(H) DNA identity and GC content. Both plots were made using a 1-kb-long
sliding window with 100-bp overlaps. The GC profile corresponds to the
consensus sequence; the individual sequences have nearly identical profiles.
Linkage Studies
Monogenic and Complex Studies
Genetic and physical markers
• Genetic map
– Markers ordered based on recombination distances
– Measured in centiMorgans (cM)
• Physical map
– Markers ordered based on sequence distance
– Measured in number of bases
• Can integrate physical and genetic maps
[Figure: genetic map (0; 17.0 cM, DIIMit35; 34.5 cM, DIIMit260; 62.0 cM, DIIMit199) aligned with a physical map in kilobases (0 to 1,000) showing IL5 and IL3]
© 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458
Linkage analysis
• Find markers that are genetically linked to disease phenotype
• Calculate the odds of linkage vs. nonlinkage
• Threshold for significant linkage typically taken at 1,000:1 odds of being linked vs. nonlinked
– Usually converted into LOD (logarithm of the odds)
– LOD threshold is 3
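The link between the 1,000:1 odds threshold and a LOD of 3 is just a base-10 logarithm. A minimal sketch (the function names are made up for illustration):

```python
import math


def odds_to_lod(odds_for_linkage: float) -> float:
    """Convert odds of linkage vs. nonlinkage into a LOD score."""
    return math.log10(odds_for_linkage)


def lod_to_odds(lod: float) -> float:
    """Convert a LOD score back into odds of linkage vs. nonlinkage."""
    return 10 ** lod


print(odds_to_lod(1000))  # the 1,000:1 threshold from the slide
print(lod_to_odds(3))     # and the reverse conversion
```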
An example of linkage analysis
• Cystic fibrosis found to be linked to well-known “met” marker on chromosome 7
– But linkage was not strong enough to start sequencing
• Finer genetic mapping
– Markers D7S122 and D7S340 showed very strong linkage
– This finding allowed researchers to narrow down search area for a chromosome walk
Chromosome walking
• In 1989, vectors could not carry the large DNA inserts that are possible today
• Example: finding the cystic fibrosis gene
– Walk began with probes constructed from the DNA markers D7S122 and D7S340
– Clones (~30 kb) are pulled out, beginning with marker probe
– Clones used for new probes to find next sequence
– Chromosome jumping facilitated the process and avoided unclonable regions
– Process repeated for 280-kb region
Analyzing candidate genes
• Four genes found by chromosome walking
• Associate gene with disease
– CF affects lung and pancreas, not testes or brain
• Northern blot analysis
– One gene expressed in affected tissues, not expressed in unaffected tissues
[Figure: Northern blot, lanes 1 to 17, with 28S and 18S size markers. Lane labels include sweat gland, colon, pancreas, nasal polyp, lung, T84, brain, placenta, liver, adrenal gland, testis, parotid gland, and kidney]
Using genomics to find Mendelian disease genes
[Flowchart steps: Linkage analysis; Finer genetic mapping; Physical mapping; Identify candidate genes in human genome databases; Gene cloning; Mutation identification]
Postgenome approach for finding Mendelian disease genes
• High-density SNP map makes linkage analysis easier
– Pre–human genome: ~6,000 chromosomal markers
– Post–human genome: millions of SNPs
• Complete sequence of human genome provides all possible candidate genes
• Homology searches now much easier
• Characteristics of disease may suggest what disease gene should look like (e.g., a brain-specific ion channel for a neurological disease)
Complex disease
• Monogenic disease traits
– Mendelian disease
– Mitochondrial disease
– Typically rare: < 0.1%
• Complex disease
– Common: > 0.1%
– Polygenic or oligogenic
– Environmental factors contribute
Factors complicating the analysis of complex-disease traits
• Incomplete penetrance
– Inheritance of gene predisposing individual to disease is not sufficient to cause disease
• Phenocopies
– People may develop disease without necessarily possessing a gene that predisposes them to that illness
Common multifactorial diseases
• Frequency of some well-known complex-disease traits
– Hypertension (~23%)
– Diabetes (~5%)
– Epilepsy (~1%)
– Schizophrenia (~1%)
– Bipolar disorder (~1%)
– Multiple sclerosis (~0.1%)
– Autism (~0.1%)
Environmental contribution to complex disease
• Environmental factors can affect expression of a complex-disease phenotype
• Example: diabetes
– Large increase in adult-onset diabetes in the U.S. from 1990 to 2000
– Likely due to sedentary lifestyle and poor eating habits
[Figure: percentage of adults with diabetes, 1990 and 2000; legend: no data, < 4%, 4%–6%, > 6%]
Quantifying the genetic component of complex diseases
• Familial clustering
– Frequency of complex disease in relatives (λs)
• Twin studies
– Frequency of disease traits in twins
• Studies of adoptees
– Example: twins separated at birth
• Studies of populations with similar genetics, but different environments
– Example: members of the Pima tribe in the U.S.A. and Mexico
A genetic component for epilepsy
• Epilepsy affects 1% of the population
• Diverse etiology
• Genetics of epilepsy
– Concordance of epilepsy in monozygotic twin pairs: 62%
– Can also be used as a measure of heritability
– Concordance of epilepsy in dizygotic twins: 18%
– Probability that the child of an epileptic parent will have the illness: 4%
The Pima tribe and diabetes
• Pimas of Arizona have highest rate of type 2 diabetes in world
– ~50% of Pimas over 35 have type 2 diabetes
– High-fat American diet
– 2 hrs/week hard labor
• Pimas of Mexico
– Normal rates of diabetes
– Low-fat diet
– 23 hrs/week hard labor
Understanding polygenic traits
• Phenotypes associated with polygenic traits
– Continuous
– Threshold effect
• Distribution of phenotype explained by multiple genes, each with multiple alleles

      AB    Ab    aB    ab
AB    AABB  AABb  AaBB  AaBb
Ab    AABb  AAbb  AaBb  Aabb
aB    AaBB  AaBb  aaBB  aaBb
ab    AaBb  Aabb  aaBb  aabb
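The two-gene cross above can be enumerated to show how a spread of phenotypes arises. The sketch below assumes a simple additive model in which each uppercase allele contributes one unit to the trait. That model, and the function names, are illustrative assumptions rather than something stated on the slide.

```python
from collections import Counter
from itertools import product


def gametes(genotype: str) -> list[str]:
    """All gametes from a two-locus genotype such as 'AaBb'."""
    locus_a, locus_b = genotype[:2], genotype[2:]
    return [a + b for a, b in product(locus_a, locus_b)]


def offspring_dosage(parent1: str, parent2: str) -> Counter:
    """Count offspring by number of uppercase (contributing) alleles.

    Assumes an additive model: every uppercase allele adds one unit.
    """
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        dosage = sum(ch.isupper() for ch in g1 + g2)
        counts[dosage] += 1
    return counts


counts = offspring_dosage("AaBb", "AaBb")
for dosage in sorted(counts):
    print(dosage, counts[dosage])
```

Under this assumption the 16 cells of the square fall into five classes in a 1:4:6:4:1 ratio, which is already roughly bell-shaped.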
More genes means a normal distribution of phenotype
• A continuous trait
• More underlying genes create bell-shaped distribution of phenotype
• Bell curve also called normal distribution, or Gaussian
• Shape of distribution is expected from statistics
[Figure: histogram of frequency (0 to 6) against height (4'0" to 6'0")]
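Why more genes give a bell shape can be seen from the binomial distribution. The sketch below assumes an idealized cross between fully heterozygous parents, with equal and additive allele effects, so each of the 2n alleles in an offspring is a contributing allele with probability 1/2. These assumptions and the function name are for illustration only.

```python
from math import comb


def dosage_distribution(n_genes: int) -> list[float]:
    """Probability of carrying k contributing alleles, for k = 0..2n.

    Assumes each of the 2n alleles is independently a contributing
    allele with probability 1/2 (heterozygous parents, additive effects).
    """
    n_alleles = 2 * n_genes
    return [comb(n_alleles, k) / 2 ** n_alleles for k in range(n_alleles + 1)]


for n in (1, 2, 6):
    probs = dosage_distribution(n)
    print(n, [round(p, 3) for p in probs])
```

With one gene there are three classes; with six genes there are thirteen, and the probabilities already trace a smooth bell curve.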
The threshold effect
• Mendelian disease
– Example: autosomal-dominant disease
– Almost all individuals carrying M allele will have disease
• Complex disease
– Disease risk small for cc genotype
– Greater for Cc
– Greater still for CC
[Figure: Mendelian panel with genotypes mm and Mm on a healthy/sick scale; Complex panel with genotypes cc, Cc, and CC on a healthy/sick scale]
Strict criteria for complex-disease gene studies
• Statistical significance for genomewide association or linkage
– No genomewide association studies to date; only linkage
– Resolution of linkage insufficient for identifying gene
• Fine mapping of locus
– Linkage disequilibrium
• Analysis of the sequence
– Find nucleotide variants consistent with disease phenotype
• Testing of candidate-gene function
– Complement deficient gene to restore healthy state
– Circumstantial evidence
Complex-disease genes that satisfy the criteria
• At end of 2002: Seven complex-disease gene studies in humans satisfied the strict criteria
– HLA-DQA (Type 1 diabetes)
– HLA-DQB (Type 1 diabetes)
– CAPN10 (Type 2 diabetes)
– NOD2 (Crohn’s disease)
– ApoE (Alzheimer’s disease)
– ADAM33 (Asthma)
– ACE (Cardiovascular disease)
• No complementation studies for these genes
The impact of genomics on finding complex-disease genes
• Improved annotation
– Better candidate genes
• High-resolution marker map makes linkage analyses easier
• Density of SNP markers makes linkage-disequilibrium studies possible
• Finding human homologues of nonhuman genes
[Figure: % windows with 1 or more SNP (0 to 100) against size of sequence windows (1, 2, 5, 10, 15, 20, 40, 80 kb)]
Nail-Patella Syndrome
• Nail-Patella Syndrome (also called Fong's
Disease or Hereditary Onycho-Osteodysplasia ['HOOD']) is characterized
by several typical abnormalities of the
arms and legs as well as kidney disease
and glaucoma
Recombination Frequency
• The goal is to determine the linkage distance between the
two genes (B/O and NP genes). The original
mating in generation I and the first two matings
in generation II are test crosses. The third mating
in generation II is not informative because it
involves the A allele, which we are not following.
We have a total of 16 offspring that are
informative. Of these, three were recombinant.
As with all test crosses, this gives a genetic
distance of 18.8 cM [100*(3/16)].
http://www.ndsu.edu/instruct/mcclean/plsc431/linkage/linkage6.htm
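The test-cross arithmetic can be written as a small helper. The function names are illustrative; the counts are the ones quoted above (3 recombinants among 16 informative offspring).

```python
def recombination_frequency(recombinants: int, informative: int) -> float:
    """Fraction of informative offspring that are recombinant."""
    if informative <= 0 or not 0 <= recombinants <= informative:
        raise ValueError("need 0 <= recombinants <= informative, informative > 0")
    return recombinants / informative


def map_distance_cm(recombinants: int, informative: int) -> float:
    """Genetic distance in centiMorgans from a test cross (100 x frequency)."""
    return 100 * recombination_frequency(recombinants, informative)


print(map_distance_cm(3, 16))  # 18.75, reported on the slide as 18.8 cM
```

Equating distance with 100 times the recombination frequency is the simple test-cross estimate used in the lecture; it ignores double crossovers, so it is best suited to short distances.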
Lod Score Method of Estimating
Linkage Distances
The following pedigree will be used to
demonstrate a method developed to
determine the distance between genes.
This approach has been widely adapted
to various systems, and genetic programs
have been developed based on this
technique.
Pedigree
• Even though we are working with the same two genes,
nail-patella and blood type, in this pedigree the dominant
allele seems to be coupled with the A blood type allele.
• Remember that in the previous example, the dominant nail-patella
allele was linked with the B allele. This is an
important point in genetics: not all linkages between
alleles of two genes are found to be constant throughout
a species.
• Why? Because at some point in the lineage of this
family, the disease (nail-patella) allele recombined and
became linked to a different blood type allele. In still
other lineages, the nail-patella-causing allele is linked to
the O blood type allele.
Recombination Frequency
• We have one recombinant among the eight
progeny. This gives us a recombination
frequency of 0.125 and a distance of 12.5
cM.
LOD Score Method
• The method was developed by Newton E. Morton. It is an
iterative approach that includes a series of
lod scores calculated from a number of
proposed linkage distances.
LOD Score Method
• A linkage distance is estimated, and given
that estimate, the probability of a given
birth sequence is calculated. That value is
then divided by the probability of a given
birth sequence assuming that the genes
are unlinked. The log of this value is
calculated, and that value is the lod score
for this linkage distance estimate.
LOD Score Method
Example
• In this first birth sequence, we have an
individual with a parental genotype. The
probability of this event is (1 - 0.125).
Because there are two parental types, this
value is divided by two to give a value of
0.4375. In this pedigree we have a total of
seven parental types. We also have one
recombinant type. The probability of this
event is 0.125 which is divided by two
because two recombinant types exist.
Example
• What would the sequence of births be if
these genes were unlinked?
• When two genes are unlinked the
recombination frequency is 0.5. Therefore,
the probability of any given genotype
would be 0.25.
Linkage Probability
• The probability of a given birth sequence
is the product of each of the independent
events. So the probability of the birth
sequence based on our estimate of 0.125
as the recombination frequency would be
equal to (0.4375)^7 × (0.0625)^1 = 0.0001917.
Non-linkage Probability
• The probability of the birth sequence
based on no linkage would be (0.25)^8 =
0.0000153.
Calculation of LOD score
• Now divide the linkage probability by the
non-linkage probability and you get a
value of 12.566 (using the unrounded probabilities).
Next take the base-10 log of this
value, and you obtain a value of 1.099.
This value is the lod score.
• LOD = log10(0.0001917 / 0.0000153) = log10(12.566) = 1.099
In practice, we would like to see a lod
score greater than 3.0.
What this means is that the likelihood
of linkage occurring at this distance is
1000 times greater than the likelihood of no linkage.
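The worked example can be reproduced, and the iterative part of the method illustrated, with a short script. It is a sketch under the same simplifications as the slides: a single pedigree with 7 parental and 1 recombinant offspring, and two equally likely parental and recombinant types. The function name and the grid of proposed distances are assumptions made for the illustration.

```python
import math


def lod_score(theta: float, n_parental: int, n_recombinant: int) -> float:
    """LOD score for a proposed recombination frequency theta."""
    n_total = n_parental + n_recombinant
    # Each of the two parental types has probability (1 - theta) / 2 and
    # each of the two recombinant types has probability theta / 2.
    p_linked = ((1 - theta) / 2) ** n_parental * (theta / 2) ** n_recombinant
    # With no linkage (theta = 0.5) every genotype has probability 0.25.
    p_unlinked = 0.25 ** n_total
    return math.log10(p_linked / p_unlinked)


# The value from the slides: theta = 0.125, 7 parental, 1 recombinant.
print(round(lod_score(0.125, 7, 1), 3))  # 1.099

# Iterate over a grid of proposed linkage distances and keep the best one.
thetas = [i / 1000 for i in range(5, 500, 5)]
best = max(thetas, key=lambda t: lod_score(t, 7, 1))
print(best)  # 0.125
```

The scan peaks at the observed recombination frequency, and the maximum lod of about 1.1 is well short of the 3.0 threshold, so this one small pedigree is not enough on its own to declare linkage.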
What we can learn from LD
• Finding disease genes
– Fine mapping of genes
– Genomewide association studies
• Other uses
– Revealing the history of human populations
– Understanding human origins
– Studying patterns of recombination in the human genome
Finding a gene associated with asthma
• Use classic linkage analysis to localize disease gene to area on the chromosome
– Avoid the problem of multiple etiologies by restricting disease definition
• Use linkage disequilibrium to narrow down area where disease gene might be located
• Look for candidate genes
– Perform gene expression study to ensure that disease gene is expressed in appropriate tissues
Case Control Studies
Modified from Iris A. Granek,
M.D., M.S.
Case-control studies
• Search for differences in allele frequency
between disease carriers (cases) and noncarriers
(controls), on the assumption that
differences in frequencies are associated
with disease outcome.
• Can be applied to exposure to a chemical
or a carcinogen instead of alleles
(genotypes).
Case Selection
• Define the source population
– residents of a geographic region
– hospital inpatient or clinic
• Strict case definition
– inclusion criteria
Control Selection
• Same source population as the cases
• Choose the controls at random from
the source population
– spouses
– associates
– patients within the same facility
– matched for certain criteria
Hospital Controls
• Without regard to diagnosis
• Excluding certain diseases
• Including only diseases believed to
be unrelated to the exposures (or
alleles) being studied
• Clinic patients from same hospital
Case Control Study Design
Compares distribution of exposure
cases (disease)
vs.
controls (without disease)
Exposure History: Cases & Controls

                      CASES      CONTROLS
Exposed               a          b
Not Exposed           c          d
Totals                a+c        b+d
Proportions exposed   a/(a+c)    b/(b+d)
Distribution of past benzene
exposure among leukemia cases vs.
controls
• 20 leukemia cases
found among large
group of chemical
workers
• 16 cases had past
benzene exposure
• Proportion of cases
exposed to benzene:
16/20=80%
• 100 healthy controls
randomly selected
from same group of
chemical workers
• 12 controls had past
benzene exposure
• Proportion of
controls exposed to
benzene:
12/100=12%
Odds Ratio Unmatched Analysis

               CASES    CONTROLS
EXPOSED        a        b
NOT EXPOSED    c        d

Odds of exposure in cases: a/c
Odds of exposure in controls: b/d
Odds Ratio = ratio of the odds of exposure in cases to the odds of exposure in controls
OR = (a/c) / (b/d) = ad / bc
Odds Ratio Unmatched Analysis

               LUNG CA    CONTROLS
BENZENE        16         12
NO BENZENE     4          88

Odds of exposure in cases: 16/4
Odds of exposure in controls: 12/88
Odds Ratio
OR = (16 × 88) / (4 × 12) = 29.3
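The same calculation in Python, using the counts from the benzene table (a = 16, b = 12, c = 4, d = 88). The function names are illustrative.

```python
def proportion_exposed(exposed: int, unexposed: int) -> float:
    """Proportion of a group (cases or controls) with past exposure."""
    return exposed / (exposed + unexposed)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Unmatched odds ratio.

    a, b: exposed cases, exposed controls
    c, d: unexposed cases, unexposed controls
    """
    odds_cases = a / c
    odds_controls = b / d
    return odds_cases / odds_controls


print(proportion_exposed(16, 4))            # cases: 0.8
print(proportion_exposed(12, 88))           # controls: 0.12
print(round(odds_ratio(16, 12, 4, 88), 1))  # 29.3
```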
Odds Ratios
• OR > 1 indicates a positive
association between the factor and
the disease
– The odds of past benzene exposure were
about 29 times higher among the lung cancer
patients than among the controls
• OR < 1 indicates the factor is
protective
• OR = 1 indicates no association
95% Confidence Limits
• 95% probability that the true value lies
within the confidence interval or
between the confidence limits
• Odds ratios are statistically significant if
their confidence intervals do not include 1
• OR = 7 (0.5 - 15.0) not statistically
significant
• OR = 7 (3.0 - 12.0) is statistically
significant
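The slides do not say how the confidence limits are obtained. One common approximation is Woolf's logit method, sketched below for the benzene table. The choice of method and the function name are assumptions for this example, not something stated in the lecture.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with an approximate 95% CI (Woolf's logit method).

    a, b: exposed cases, exposed controls
    c, d: unexposed cases, unexposed controls
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


or_, lower, upper = odds_ratio_ci(16, 12, 4, 88)
print(f"OR = {or_:.1f} ({lower:.1f} - {upper:.1f})")
print("significant" if lower > 1 or upper < 1 else "not significant")
```

For these counts the whole interval lies above 1, so the association would be called statistically significant by the rule above. With cells as small as 4, an exact method may be preferable to this large-sample approximation.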
Advantages of Case
Control
• Quick and Inexpensive
• Optimal for rare diseases
• Useful for diseases of long latency
from exposure to disease
development
• Can evaluate multiple risk factors
Bias in Case Control
Studies
• Bias is a systematic error in the
study that distorts the results &
limits the validity of the
conclusions.
– Selection Bias
– Confounding
– Observation Bias (recall bias,
interviewer bias, misclassification)
Selection Bias
• Systematic errors arising from the
way the subjects are selected
– Study subjects are selected in a way
that can misleadingly increase or
decrease the magnitude of an
association
• Exposure among the selected cases differs from
exposure among all cases in the source
population, or exposure among the selected
controls differs from exposure among the
nondiseased in the source population
Selection Bias

                   Source Population    Study Sample
With disease       EEEEXX XXXXXX        EEE EXX  (Cases)
Without disease    EEEEEE EEXXXX        EEX XXX  (Controls)
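The effect of the biased sample can be made concrete by computing the odds ratio in both columns. The sketch assumes that E marks an exposed subject and X an unexposed one, and that the letter groups pair with the source population and the study sample as read from the figure. Both points are interpretations of the diagram, not statements from the slides.

```python
def exposure_counts(group: str) -> tuple[int, int]:
    """Return (exposed, unexposed) counts, assuming E = exposed, X = unexposed."""
    return group.count("E"), group.count("X")


def odds_ratio_from_groups(diseased: str, nondiseased: str) -> float:
    a, c = exposure_counts(diseased)     # exposed / unexposed, with disease
    b, d = exposure_counts(nondiseased)  # exposed / unexposed, without disease
    return (a * d) / (b * c)


# Letter groups as read from the figure (an interpretation).
source_or = odds_ratio_from_groups("EEEEXX" "XXXXXX", "EEEEEE" "EEXXXX")
sample_or = odds_ratio_from_groups("EEE" "EXX", "EEX" "XXX")

print(source_or)  # odds ratio in the source population
print(sample_or)  # odds ratio in the selected study sample
```

Under this reading, exposure looks protective in the source population (OR below 1) but harmful in the study sample (OR above 1), which is the kind of distortion the slide warns about.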
Confounding
• Distortion of the true relationship
between the exposure and outcome
due to a mutual relationship with
another factor
• Can be the reason for an apparent
association & also may cause a true
association to not be observed
• Confounder must be associated with
the outcome and the exposure
Confounding Factors
Benzene Exposure
Lung Cancer
Cigarette Smoking
(Confounder)
Controlling for
Confounding
The effect of confounding
variables
• can be controlled during the data
analysis by various methods
– stratification
– multivariate analysis
• can be controlled during the study
design by matching controls and
cases for the factor
Matched Case Control Design
• Controls selected matched to cases on
factors associated with the disease
– age, sex, race, socioeconomic status
• Makes the two groups similar on
factors other than the exposure of
interest
• Cannot compare groups on matched
factors
• Must use matched analysis
Observation Bias
• Interviewer (data collection) bias
– keep data collection same for
cases and controls
• Misclassification Bias
– incorrect characterization of
exposure
• Recall Bias
– recall of exposures may be
influenced by current disease
status
Calculate the Odds Ratio
Esophageal cancer and alcohol
Fisher’s Exact Test
http://www.matforsk.no/ola/fisher.htm
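The esophageal cancer counts are not reproduced in this transcript, so the sketch below applies a one-sided Fisher's exact test to the benzene table from the earlier slides instead. It is a plain hypergeometric-tail implementation written for illustration; the function name is made up, and a statistics package would normally be used in practice.

```python
from math import comb


def fisher_exact_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(exposed cases >= a) under no association, with margins fixed.

    a, b: exposed cases, exposed controls
    c, d: unexposed cases, unexposed controls
    """
    exposed = a + b
    cases = a + c
    total = a + b + c + d
    denom = comb(total, cases)
    tail = 0
    # Sum hypergeometric probabilities for tables at least as extreme.
    for k in range(a, min(exposed, cases) + 1):
        tail += comb(exposed, k) * comb(total - exposed, cases - k)
    return tail / denom


# Benzene table from the earlier slides: 16, 12, 4, 88.
print(fisher_exact_one_sided(16, 12, 4, 88))
```

The p-value for the benzene table is very small, consistent with the large odds ratio computed earlier.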
Circumstantial evidence for disease-gene function
• ADAM33 gene is expressed in lung tissue, bronchial smooth muscle (BSM), and other tissues
• Gene codes for a putative protease
– Perhaps ADAM33 processes proteins in affected tissues
• Still no definitive theory for cause of asthma
[Figure: blot with lanes for BSM, lung, and nonlung tissues; bands at 5.0 kb and 3.5 kb; actin]
Progress in finding complex-disease genes
• Mendelian disease rarer, but genes much easier to find than complex-disease genes
• Progress in finding complex-disease genes is slow, but increasing
[Figure: Mendelian traits and complex traits, 1980 to 2000 (axis values 1,800 and 100); series: Human Mendelian, Human complex, All complex]
Genomewide association studies
• Based on concept of linkage disequilibrium
• Basic concept
– Characterize SNPs in a variety of populations
– Use SNP associations with disease
• Presently prohibitively expensive
• New technologies being developed
– DNA pooling
– More efficient genotyping methods
Linkage disequilibrium
[Figure: a mutation M on a chromosome with two markers, A (alleles A1, A2; 100 kb) and B (alleles B1, B2; 100 Mb). Allele frequencies in the whole population and on chromosomes carrying the mutation:]

After 100 Years      A1     A2     B1     B2
Population           50%    50%    50%    50%
Mutation             99%    1%     65%    35%

After 1,000 Years    A1     A2     B1     B2
Population           50%    50%    50%    50%
Mutation             70%    30%    50%    50%
Quantifying LD
• Measures of linkage disequilibrium (LD) depend on the frequency of alleles
• D often used as a measure of LD
– What is the linkage disequilibrium between loci A and B?
– D = p_AB – p_A × p_B
– p_AB is frequency of alleles A and B occurring together
– p_A is frequency of allele A occurring alone
– p_B is frequency of allele B occurring alone
Scaling for allele frequency
• Value of D will depend strongly on allele frequency
– D can be positive or negative
• Developing a more consistent measure of LD
– What is Dmax for given allele frequencies?
– Dmax = min(p_A × p_b, p_a × p_B) if D is positive
– Dmax = min(p_A × p_B, p_a × p_b) if D is negative
– D’ = D / Dmax
– |D’| = 0 for linkage equilibrium
– |D’| = 1 for 100% co-occurrence of alleles
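The definitions of D and D’ translate directly into code. In the sketch below the frequencies of the alternative alleles are taken as p_a = 1 − p_A and p_b = 1 − p_B (two alleles per locus); the input frequencies are hypothetical values chosen for illustration.

```python
def ld_d(p_ab: float, p_a: float, p_b: float) -> float:
    """D = p_AB - p_A * p_B."""
    return p_ab - p_a * p_b


def ld_d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """D' = D / Dmax, with Dmax chosen according to the sign of D."""
    d = ld_d(p_ab, p_a, p_b)
    q_a, q_b = 1 - p_a, 1 - p_b  # frequencies of the alternative alleles a, b
    if d >= 0:
        d_max = min(p_a * q_b, q_a * p_b)
    else:
        d_max = min(p_a * p_b, q_a * q_b)
    return d / d_max if d_max > 0 else 0.0


# Hypothetical frequencies, not data from the lecture.
print(ld_d(0.40, 0.5, 0.5), ld_d_prime(0.40, 0.5, 0.5))  # partial LD
print(ld_d_prime(0.25, 0.5, 0.5))                        # linkage equilibrium
print(ld_d_prime(0.50, 0.5, 0.5))                        # complete co-occurrence
```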
Square-of-the-correlation coefficient
• Another common measure of linkage disequilibrium is the square of the correlation coefficient, r^2
– r^2 = D^2 / (p_A × p_a × p_B × p_b)
• Advantages of using r^2
– Existing body of literature in population genetics already uses r^2
– Correlation coefficient is standard statistical measure
– Sample size needed to detect statistically significant LD is inversely proportional to r^2
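A short sketch of r^2 and of the sample-size remark. The helper `relative_sample_size` only expresses the stated inverse proportionality (a factor of 1/r^2 relative to perfect LD). The names and the input frequencies are assumptions for illustration.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 = D^2 / (p_A * p_a * p_B * p_b), with p_a = 1 - p_A, p_b = 1 - p_B."""
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def relative_sample_size(r_squared: float) -> float:
    """Sample-size factor relative to r^2 = 1, using the 1 / r^2 scaling."""
    return 1 / r_squared


# Hypothetical frequencies, not data from the lecture.
r2 = ld_r_squared(0.40, 0.5, 0.5)
print(round(r2, 2))                        # 0.36
print(round(relative_sample_size(r2), 2))  # roughly 2.78 times the sample
```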
LD and recombination distance
[Figure: r^2 (0 to 1.0) plotted against recombination distance (0.00 to 0.30 cM)]
Caveats about linkage disequilibrium
• Population stratification
– Arises when examining populations with different genetic backgrounds
– Need population-specific strategies
– 300,000 SNPs likely needed for Europeans
– 1,000,000 SNPs likely needed for Africans
• False associations
– Marker in LD with the ability to speak Icelandic
– Have you found the Icelandic-language gene?