Ann. Hum. Genet. (2000), 64, 171–186
Printed in Great Britain
Estimation of admixture and detection of linkage in admixed populations by
a Bayesian approach : application to African-American populations
P. M. MKEIGUE", J. R. CARPENTER", E. J. PARRA#  M. D. SHRIVER#
" Department of Epidemiology and Population Health, London School of Hygiene & Tropical
Medicine, Keppel Street, London WC1E 7HT, UK
# Department of Anthropology, College of Liberal Arts, 409 Carpenter Building, The Pennsylvania
State University, University Park, PA 16802, USA
(Received 12.11.99. Accepted 7.2.00)

We describe a novel method for analysis of marker genotype data from admixed populations,
based on a hybrid of Bayesian and frequentist approaches in which the posterior distribution is
generated by Markov chain simulation and score tests are obtained from the missing-data likelihood.
We analysed data on unrelated individuals from eight African-American populations, genotyped at
ten marker loci of which two (FY and AT3) are linked (22 cM apart). Linkage between these two loci
was detected by testing for association of ancestry conditional on parental admixture. The strength
of this association was consistent with European gene flow into the African-American population
between five and nine generations ago. To mimic the mapping of an unknown gene in an ‘affecteds-only’ analysis, a binary trait was constructed from the genotype at the AT3 locus and a score test
was shown to detect linkage of this ‘trait’ with the FY locus. Mis-specification of the ancestry-specific
allele frequencies – the probabilities of each allelic state given the ancestry of the allele – was
detected at three of the ten marker loci. The methods described here have wide application to the
analysis of data from admixed populations, allowing the effects of linkage and population structure
(variation of admixture between individuals) to be distinguished. With more markers and a more
complex statistical model, genes underlying ethnic differences in disease risk could be mapped by this
approach.

In a population formed by admixture between two or more ancestral founding populations, marker genotype data can be used to estimate the ancestry (origin from one of the founding populations) of the two alleles at each marker locus in each individual studied, the admixture of the individual (proportion of that individual’s genome that has ancestry from each founding population), and the distribution of admixture in the population. As long as the ancestry-specific allele frequencies – the probability of each allelic state given the ancestry of the allele – in the admixed population under study are correctly specified for each marker locus, Bayes’s theorem can be applied to invert these conditional probabilities and obtain the posterior distribution of ancestry at each locus given the observed marker genotypes.

Correspondence: Dr Paul McKeigue, Department of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, Keppel Street, London WC1E 7HT, UK. Tel: +44 (020) 7927 2312; Fax: +44 (020) 7580 6897. E-mail: paul.mckeigue@lshtm.ac.uk
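As an illustration of this Bayes inversion, the following minimal Python sketch computes the posterior probabilities of the four ordered ancestry states of the two alleles at a single biallelic locus, given the number of copies of allele 1 observed, the ancestry-specific frequencies of allele 1, and the European admixture of each parental gamete. The function and argument names are hypothetical, the prior probability that each allele is European is assumed to equal the admixture of the gamete that carried it, and the locus is treated in isolation (no information borrowed from other loci).

```python
from itertools import product

ANCESTRIES = ("Afr", "Eur")


def ancestry_posterior(copies_of_allele1, admix_eur_1, admix_eur_2, p_afr, p_eur):
    """Posterior over (ancestry of allele from gamete 1, ancestry of allele from gamete 2).

    copies_of_allele1 -- observed genotype: 0, 1 or 2 copies of allele 1
    admix_eur_k       -- European admixture of parental gamete k (prior P(European))
    p_afr, p_eur      -- ancestry-specific frequencies of allele 1
    """
    freq = {"Afr": p_afr, "Eur": p_eur}
    joint = {}
    for a1, a2 in product(ANCESTRIES, repeat=2):
        # Prior probability of this ordered pair of ancestry states
        prior1 = admix_eur_1 if a1 == "Eur" else 1.0 - admix_eur_1
        prior2 = admix_eur_2 if a2 == "Eur" else 1.0 - admix_eur_2
        q1, q2 = freq[a1], freq[a2]
        # Probability of the observed (unordered) genotype given the two ancestry states
        if copies_of_allele1 == 0:
            lik = (1.0 - q1) * (1.0 - q2)
        elif copies_of_allele1 == 2:
            lik = q1 * q2
        else:
            lik = q1 * (1.0 - q2) + (1.0 - q1) * q2
        joint[(a1, a2)] = prior1 * prior2 * lik
    total = sum(joint.values())
    # Bayes' theorem: normalise prior x likelihood over all ancestry states
    return {state: weight / total for state, weight in joint.items()}
```

With frequencies like those used for FY in this study (pAfr = 0, pEur = 1), a heterozygote must carry one allele of each ancestry, so `ancestry_posterior(1, 0.2, 0.2, 0.0, 1.0)` assigns probability 0.5 to each of the two mixed states; a less informative marker leaves the posterior close to the prior.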
With estimates of individual admixture, it is
possible to study the relationship of admixture to
disease risk, and to control for admixture as a
confounder in studies of the associations of other
genetic or environmental factors with disease
risk. This can help to distinguish between genetic
and environmental explanations for ethnic variation in disease risk (Chakraborty & Weiss,
1986). This approach has been used, for instance,
to study the relation of blood pressure to
admixture in African-Americans (MacLean et al.
1974) and the relation of diabetes to admixture
in Mexican-Americans (Chakraborty et al. 1986).
With information about the ancestry of alleles
at marker loci in individuals of mixed descent, it
is possible to map genes underlying ethnic
differences in disease risk in a manner analogous
to linkage analysis of an experimental cross
(McKeigue, 1998). The basis of this approach is
to test for association with states of ancestry on
chromosomes of mixed descent, conditioning on
parental admixture. By combining information
from all marker loci in a multipoint analysis, it is
possible to estimate accurately the ancestry of
the alleles at each locus, even though no single
marker is fully informative for ancestry. In
principle this approach can extract all the
information about linkage that is generated by
admixture (McKeigue, 1998), in contrast to other
approaches that rely on detecting the allelic
association that is secondary to the correlation of
states of ancestry at linked loci on chromosomes
of mixed descent. As this approach has more in
common with linkage analysis of a cross than
with conventional linkage disequilibrium mapping, the term ‘admixture mapping’ (Zheng &
Elston, 1999) is preferable to the term ‘mapping
by admixture linkage disequilibrium’ coined by
earlier writers (Stephens et al. 1994).
We describe a novel method for estimating
admixture from marker data and exploiting the
information generated by admixture to detect
linkage, based on a hybrid of Bayesian and
frequentist approaches. This is applied to a large
dataset of marker genotypes from African-American populations.
Limitations of existing methods for estimating admixture
Classical methods for estimation of admixture
from marker data estimate the mean proportion
of alleles in the population under study that have
ancestry from each founding population, but
cannot estimate variation of admixture between
individuals (Elston, 1971 ; Reed, 1971 ; Chakraborty, 1975 ; Long & Smouse, 1983). This limits
our ability to detect the substructure that occurs
in admixed populations where there has been
continuing gene flow from one or both founding
populations. Existing methods assume that for
any given individual the states of ancestry at
different marker loci are independent of each
other. This assumption will not hold if the
marker loci are linked, because the stochastic
variation of ancestry on chromosomes of mixed
descent generates correlation of ancestry between
linked marker loci. The closer the loci, the
more highly correlated will be the states of
ancestry. This limits the number of autosomal
markers that can be used in admixture studies to
one per chromosome arm, which is not enough
for accurate estimation of individual admixture
(Reed, 1973). To estimate individual admixture
accurately, it is necessary to model the stochastic
variation of ancestry on chromosomes of mixed
descent so that information from multiple
markers on each chromosome can be combined.
To map genes underlying ethnic differences in
disease risk in a genome search even higher
marker densities are required, so that ancestry of
the two alleles at each locus can be estimated at
all points on the genome (McKeigue, 1998).
Estimation of admixture from marker genotype data relies on specifying the ancestry-specific allele frequencies correctly for the admixed population under study. These frequencies
are usually estimated in samples of modern
descendants of the founding populations that
contributed to the admixed population. Often,
however, it is impossible to determine what
subpopulations should be sampled to draw a
representative sample of descendants of a founding population. For instance we cannot easily
draw a sample that is exactly representative of
the West African subpopulations that contributed to the African-American population. Even
where it is feasible to draw a representative
sample from the founding populations, the
ancestry-specific allele frequencies within an
admixed population may vary from the allele
frequencies in modern descendants of the founding populations as a result of drift, mutation or
selection since admixture.
To overcome this difficulty we require methods
for detecting mis-specification of ancestry-specific allele frequencies and re-estimating the
correct frequencies within the admixed population under study. A simple test for mis-specified
ancestry-specific allele frequencies is to compare
the estimates for population admixture obtained
from single loci (Cavalli-Sforza & Bodmer, 1971 ;
Elston, 1971). If one of the ancestry-specific
allele frequencies at a locus is mis-specified, the
estimate of admixture obtained from that locus
will vary from the estimate obtained by combining marker genotype data for all loci. This
however does not fully exploit the information
about ancestry-specific allele frequencies that is
available in a dataset in which individuals of
mixed descent have been typed at multiple
marker loci.
Advantages of a Bayesian approach
To model admixture in the population adequately, a hierarchical model is required, estimating the admixture of each individual and
the distribution of individual admixture in the
population. In such a model the admixture of
each parental gamete, the ancestry of the two
alleles at each locus in each typed individual, and
any missing marker genotypes can be specified as
‘missing data’. Classical likelihood-based methods are difficult to apply to such a hierarchical
model where most of the data are missing. An
alternative is to adopt a Bayesian approach
(MacLean & Workman, 1973). If we specify a full
probability model in which all observed and
missing data are treated as random variables and
prior distributions are assigned where necessary,
the posterior distribution of the missing data
given the observed data can be generated by
Markov chain simulation (Gelman et al. 1995).
Bayesian inference is based on combining the
prior distribution and the likelihood to generate
a posterior distribution. With non-informative
prior distributions (Gelman et al. 1995) and large
sample sizes, the posterior distribution is dominated by the likelihood, and Bayesian analyses
yield results that are close to those obtained by
a frequentist approach. Thus in large samples,
the mode (or mean) of the posterior distribution
is asymptotically equivalent to the maximum-likelihood estimate, and the 95 % central posterior interval is asymptotically equivalent to a
95 % confidence interval : that is, it has 95 %
probability of covering the true value under
repeated sampling with any fixed true value of
the parameter (Gelman et al. 1995).
Once we have generated the posterior distribution of the missing data, the expectation of
any quantity of interest over this posterior
distribution can be evaluated and we can use a
frequentist approach to test whether any of the
assumptions of the model should be rejected. For
any null hypothesis of interest, it is straightforward to construct a score test from the
missing-data likelihood, as described below. With
this hybrid of Bayesian and frequentist approaches, it is not necessary to assign a prior
distribution to the parameter that we are testing
for departure from its null value. In large
samples, score tests are asymptotically equivalent to the likelihood ratio tests (lod scores)
conventionally used in genetic linkage studies.

Data source
The African-American population samples and
the genotyping have been described in detail
elsewhere (Parra et al. 1998). From the ten
African-American and Afro-Caribbean population samples studied previously, eight populations that represented the range of mean
admixture values were selected for this analysis.
The samples were obtained from a paternity
testing lab (Houston), blood donation centres
(Pittsburgh and New Orleans) and participants
in epidemiological studies (New York, Philadelphia, Baltimore, Charleston and Jamaica).
The New York sample comprised cases and
controls in a study of obesity in African-Americans. The Philadelphia and Jamaica
samples were collected as controls in studies of
hypertension. The Baltimore sample was collected in an epidemiological study of HIV
infection among intravenous drug users. The
Charleston sample comprised pregnant women
who participated in a study of blood lead levels.
Ten marker loci, chosen to have large differentials in allele frequencies between West Africans and Europeans, were typed. Nine of these
loci – RB2300, LPL, FY, AT3, APO, SB19.3,
ICAM, OCA2, GC – have been described previously (Parra et al. 1998, ID : 4444). RB2300 is a
polymorphism in the retinoblastoma gene
located at 13q14.3. APO is an alu insertion
polymorphism in the APO4 gene at 11q23. ICAM
is a polymorphism in the ICAM1 gene at
19p13.3–p13.2. SB19.3 is an alu insertion located
on chromosome 19. For this analysis genotypes
at an additional locus – L19.2, a SacI polymorphism located on chromosome 11 – were
available. Most of these markers are restriction
site polymorphisms which were detected by
digestion with the appropriate restriction enzyme after PCR. Markers were genotyped by
standard PCR and electrophoretic separation of
DNA fragments as described previously (Parra et
al. 1998). Two of the marker loci – FY and AT3
– are located in the same chromosomal band,
22 cM apart. All markers except GC are biallelic.
For simplicity in programming the statistical
analysis, the three allelic states at the GC locus
were grouped into two categories so that all
markers could be treated as biallelic. Allele 1F
was coded as allelic state 1, and the two alleles
that have higher frequency in Europeans than in
west Africans – 1S and 2 – were combined as
allelic state 2. Ancestry-specific allele frequencies
were estimated from samples of West African
(Nigerian and Central African) and European
(English, Irish and German) populations as
described previously (Parra et al. 1998). The
Wahlund variance (f), a function of the ancestry-specific allele frequencies, measures the proportion of information about ancestry extracted
by a biallelic marker where the prior probabilities
of each state of ancestry are equal (McKeigue,
1998). For the 10 markers used, the average f-value was 0.38, with a range from 0.14 (ICAM) to
1 (FY).
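A minimal Python sketch of this quantity follows, assuming that for a biallelic marker with equal prior weight on each ancestry f = (pAfr − pEur)² / (4p̄(1 − p̄)), where p̄ = (pAfr + pEur)/2. That is, the Wahlund variance of allele frequency between the two equally weighted founding populations divided by p̄(1 − p̄). This form is an assumption here; it is checked only in that it reproduces the reported range (0.14 for ICAM to 1 for FY) and average (about 0.38) from the ancestry-specific frequencies listed in Table 4.

```python
from statistics import fmean

# Ancestry-specific allele frequencies (pAfr, pEur) as listed in Table 4
FREQS = {
    "RB2300": (0.92, 0.333), "LPL": (0.973, 0.486), "FY": (0.0, 1.0),
    "AT3": (0.874, 0.279), "APO": (0.441, 0.927), "SB19.3": (0.425, 0.91),
    "ICAM": (0.756, 1.0), "OCA2": (0.098, 0.769), "L19.2": (0.089, 0.541),
    "GC": (0.824, 0.156),
}


def wahlund_f(p_afr, p_eur):
    """Assumed form: between-population variance of allele frequency / pbar(1 - pbar)."""
    p_bar = (p_afr + p_eur) / 2.0
    between_var = ((p_afr - p_eur) / 2.0) ** 2   # Wahlund variance with equal weights
    return between_var / (p_bar * (1.0 - p_bar))


f_values = {locus: wahlund_f(*p) for locus, p in FREQS.items()}
average_f = fmean(f_values.values())
```

Under this assumption a fully informative marker such as FY (one allele fixed in each founding population) gives f = 1, while ICAM, with frequencies 0.756 and 1, extracts only about 14% of the ancestry information.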
Statistical model
The analysis was based on the directed graphical model shown in Figure 1. As the distribution
of admixture in the population cannot be
assumed unimodal or even continuous, parental
admixture was modelled as a discrete random
variable with 33 possible values ranging from 0
to 1 as integer fractions of 32. This random
variable was assigned a multinomial distribution with 33 possible outcomes. The conjugate
prior for this multinomial distribution is a
Dirichlet distribution with a parameter vector
that has 33 coordinates. Each of these 33
coordinates was assigned a value of 1/33. This prior
distribution can be interpreted as contributing
information equivalent to observing the admixture of a single gamete, with equal weight given
to each possible outcome. With the sample sizes
of more than 150 gametes that were available for
all the subpopulations in this study, the contribution of prior information to the posterior
distribution is thus small in comparison to the
contribution of the observed marker genotype
data.
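The conjugate structure described above can be sketched in Python. Under the Dirichlet prior with each of the 33 coordinates equal to 1/33, the prior parameters act as pseudo-counts summing to one gamete. If the admixture of every parental gamete were known (in the analysis these are missing data sampled by the Gibbs sampler), the full conditional distribution of the population distribution would be Dirichlet with parameters 1/33 plus the number of gametes at each grid value. The helper names below are hypothetical.

```python
import random
from fractions import Fraction

N_GRID = 33
GRID = [Fraction(k, 32) for k in range(N_GRID)]   # admixture values 0, 1/32, ..., 1
ALPHA = [Fraction(1, N_GRID)] * N_GRID             # Dirichlet prior parameters, total 1


def prior_weight(n_gametes):
    """Share of the posterior pseudo-counts contributed by the prior."""
    return sum(ALPHA) / (sum(ALPHA) + n_gametes)


def draw_population_distribution(counts, rng):
    """One draw from Dirichlet(ALPHA + counts), via normalised gamma variates."""
    gammas = [rng.gammavariate(float(a) + c, 1.0) for a, c in zip(ALPHA, counts)]
    total = sum(gammas)
    return [g / total for g in gammas]
```

With 150 gametes, `prior_weight(150)` is 1/151, under 1% of the total pseudo-counts, which is the sense in which the prior contributes little compared with the marker data.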
The admixture of each parental gamete is
modelled as an independent draw from the
multinomial distribution of parental admixture.
This assumes that mating in the population from
which the parental gametes were drawn is not
assortative for admixture. The ancestry of the
allele transmitted on each parental gamete at
each locus is then a Bernoulli variable (0 =
European, 1 = African) with probability parameter specified by the admixture of the parental
gamete. The probabilities of observing each of
the three possible marker genotypes (0, 1 or 2
copies of allele 1) are then simple functions of the
ancestry of the two alleles at the marker locus
and the ancestry-specific allele frequencies.
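The generative model for one individual can be written as a short forward simulation. This Python sketch assumes unlinked loci (ancestry at different loci independent given gamete admixture, as in the model without the odds-ratio node), codes ancestry as a European indicator rather than the 0/1 coding used above, and uses hypothetical function and argument names.

```python
import random


def simulate_genotypes(pop_probs, grid, allele1_freqs, rng):
    """Forward-simulate one individual under the model of Figure 1 (unlinked loci).

    pop_probs     -- population distribution of parental admixture over `grid`
    grid          -- possible European admixture values of a parental gamete
    allele1_freqs -- list of (p_afr, p_eur) frequencies of allele 1, one per locus
    Returns the two gamete admixtures and the number of copies of allele 1 per locus.
    """
    # Each parental gamete's admixture is an independent draw (no assortative mating)
    gamete_admix = rng.choices(grid, weights=pop_probs, k=2)
    genotypes = []
    for p_afr, p_eur in allele1_freqs:
        copies = 0
        for admix in gamete_admix:
            is_european = rng.random() < admix        # Bernoulli ancestry of this allele
            p_allele1 = p_eur if is_european else p_afr
            copies += int(rng.random() < p_allele1)   # allelic state given ancestry
        genotypes.append(copies)
    return gamete_admix, genotypes
```

Inference in the paper runs this model in reverse: the genotypes are observed, and the gamete admixtures and ancestry states are sampled as missing data.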
Fig. 1. Directed graphical model for dependence of marker genotypes on admixture at population level and individual level. Constants are shown as single-edged rectangles, observed data as double-edged rectangles, and stochastic nodes as ellipses. Stochastic dependence is indicated by continuous arrows. Single-edged rectangles represent strata: individuals and loci.

The score tests for linkage are based on the model in Figure 1, in which there is no association between the ancestry of alleles at different loci, conditional on parental admixture. We have shown previously that for unlinked loci this assumption follows directly from Mendel’s law of independent assortment (McKeigue, 1998). To
estimate the association between the ancestry of
alleles at the two linked marker loci – FY and
AT3 – on each gamete, a stochastic node representing the odds ratio for this association was
added to the model. The likelihood as a function
of the odds ratio, conditional on parental admixture, is given in the Appendix. With a flat
prior for the log odds ratio, the program was
unable to sample adequately from the posterior
distribution. To overcome this problem, the
natural logarithm of the odds ratio was assigned
a normal prior with mean zero and precision 0.5
(equivalent to variance of 2). This incorporates
our prior knowledge that extreme values of the
log odds ratio are implausible. For two loci 22 cM
apart, the highest possible values for the log odds
ratio are 2.8 (three grandparents African, one
European) in gametes from a population of
individuals with 25 % European admixture, and
2.9 (seven grandparents African, one European)
in gametes from a population of individuals with
12.5 % European admixture.
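The two ceiling values quoted above can be reproduced under stated assumptions, sketched here in Python: a single European ancestor, all other ancestors African, the Haldane map function (no interference) to convert 22 cM into a recombination fraction, and k meioses in each of which the chromosome carrying European ancestry must be transmitted intact at both loci (k = 2 when the transmitting individual has one European grandparent, 25% admixture; k = 3 for one European great-grandparent, 12.5% admixture). This derivation is a reconstruction and is not necessarily the one the authors used.

```python
import math


def haldane_recombination_fraction(distance_morgans):
    """Haldane map function (assumes no crossover interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def max_log_odds_ratio(k_meioses, distance_morgans):
    """Log odds ratio for European ancestry at two loci on one gamete.

    Assumes a single European ancestor whose chromosome has passed through
    `k_meioses` meioses (the first in a half-European descendant), with all
    other ancestors African.
    """
    theta = haldane_recombination_fraction(distance_morgans)
    p_e = 0.5 ** k_meioses                        # European at one locus
    p_ee = ((1.0 - theta) / 2.0) ** k_meioses     # European at both loci
    p_ea = p_e - p_ee                             # European at first locus only
    p_aa = 1.0 - 2.0 * p_e + p_ee                 # African at both loci
    return math.log(p_ee * p_aa / (p_ea ** 2))
```

For 22 cM this gives about 2.84 (k = 2) and 2.91 (k = 3), matching the values in the text; under the normal prior with variance 2 both lie roughly two prior standard deviations from zero.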
Using Markov chain simulation to generate the posterior distribution of the missing data
The BUGS program (WinBUGS 1.2, beta test
version) (Spiegelhalter et al. 1999) was used to
generate the posterior distribution by Gibbs
sampling. The dataset of marker genotypes for
each study population was analysed separately.
Two chains with overdispersed starting values
were run in parallel, with an initial run of 1000
iterations to ensure convergence, as assessed by
the Gelman–Rubin method (Gelman & Rubin,
1992). After convergence at least 5000 further
iterations were run to monitor parameters of
interest, continuing if necessary to reduce the
Monte Carlo error of the estimates.
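The convergence diagnostic can be illustrated with the following Python sketch of the basic potential scale reduction factor of Gelman & Rubin (1992) for one scalar parameter, computed from several parallel chains of equal length; values close to 1 indicate that between-chain and within-chain variation agree. It omits the degrees-of-freedom correction and is not necessarily the exact statistic reported by WinBUGS.

```python
import math
from statistics import fmean, variance


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin R-hat for one scalar parameter.

    chains -- list of m >= 2 chains, each a list of n >= 2 post-burn-in draws
    """
    m = len(chains)
    n = len(chains[0])
    chain_means = [fmean(c) for c in chains]
    grand_mean = fmean(chain_means)
    # Between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = fmean(variance(c) for c in chains)
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)
```

Two chains started from overdispersed values that have not yet mixed give a value well above 1; monitoring would continue until it falls close to 1.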
P. M. MK  
176
Construction of score tests from the missing-data likelihood
Once we have generated the posterior distribution of the missing data given the observed data, the following method can be used to construct a significance test for any null hypothesis of interest. The gradient of the log-likelihood of the observed data (expressed as a function of a parameter $\theta$) is the efficient score. If the null hypothesis ($\theta = \theta_0$) is correct, the score at $\theta_0$ has expectation zero and variance given by the observed information (defined as minus the curvature of the log-likelihood function) at $\theta_0$. We can write down an expression for the log-likelihood of any realization of the complete data as a function of $\theta$, and differentiate this expression twice with respect to $\theta$ to obtain the score and information. The score and information (evaluated at $\theta_0$) are averaged over the posterior distribution of the missing data to yield the score for the observed data and the complete information. The variance of the score over this posterior distribution is the missing information (Louis, 1982). The observed information is calculated by subtracting the missing information from the complete information (Little & Rubin, 1987). The ratio of observed information to complete information is the proportion of information about $\theta$ that is extracted by the analysis. For a simple null hypothesis, the score $U$ and observed information $V$ are scalars, and $U V^{-1/2}$ has a standard normal distribution under the null hypothesis. For a composite null hypothesis, where $U$ is a row vector and $V$ is a matrix, a $\chi^2$ test statistic is calculated as $U V^{-1} U'$.
Derivations of the score tests used in these
analyses are given in the Appendix.
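The scalar case of this procedure can be sketched in a few lines of Python, assuming that for each posterior draw of the missing data the complete-data score and complete-data information at θ₀ have already been evaluated (the model-specific derivatives are those given in the Appendix and are not reproduced here). The function name is hypothetical; for a composite hypothesis the same averages would be taken over vectors and matrices.

```python
import math
from statistics import fmean, pvariance


def missing_data_score_test(score_draws, info_draws):
    """Score test for a scalar parameter from posterior draws of the missing data.

    score_draws -- complete-data score at theta_0, one value per posterior draw
    info_draws  -- complete-data information at theta_0, one value per draw
    Returns (U, V, proportion of information extracted, Z = U / sqrt(V)).
    """
    u = fmean(score_draws)                 # score for the observed data
    complete_info = fmean(info_draws)      # complete information
    missing_info = pvariance(score_draws)  # variance of score over posterior (Louis, 1982)
    v = complete_info - missing_info       # observed information
    return u, v, v / complete_info, u / math.sqrt(v)
```

The Z statistics in Tables 2 and 3 follow this relation: for example, for New York FY–AT3, 7.69/√7.68 ≈ 2.77.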

Distribution of individual admixture in each population
Table 1 shows the estimates for the distribution of individual admixture (in the parental
generation) in each population. The estimated
mean admixture (mean of the posterior distribution of the population mean) ranged from
0.07 in Jamaica to 0.22 in New Orleans. These
estimates are similar to those obtained previously by conventional likelihood-based approaches (Parra et al. 1998). With a Bayesian
approach, however, we can also estimate the
distribution of individual admixture in the
population. The estimated proportion of parents
with more than 50 % European admixture was
low in most of the populations studied : only in
New Orleans was this proportion estimated to be
greater than 10 %. The posterior intervals for the
proportion of parents with less than 12.5 % (one-eighth) European admixture were wide in comparison with the posterior intervals for the mean
admixture of each population. The low information content of these estimates is because only
10 markers have been used in each individual
and because the individual’s parents themselves
have not been genotyped.
Testing for linkage by testing for association of ancestry conditional on parental admixture
To examine the ability to detect linkage by
testing for association of ancestry conditional on
parental admixture, we examined the association
between the two linked loci – FY and AT3 –
using a score test based on the odds ratio, as
described in the Appendix. If this test performs
as theory predicts, it should detect association of
ancestry between linked loci in populations
where admixture has been recent, but should not
detect association of ancestry between unlinked
loci more often than expected by chance. To
examine this, we applied the test to a pair of
unlinked loci : FY and OCA2. Of the ten markers
used, these two have the highest information
content for ancestry, so that if variation of
admixture between individuals generates association in the absence of linkage, it is between
these two loci that we would expect to detect it
most easily.
As there is no plausible biological mechanism for an inverse association of ancestry between two loci, one-tailed p-values are given in Table 2. For each pair of loci a summary test was obtained by summing the score and the observed information across all the populations studied (Table 2). For loci FY and AT3, the summary score test was significant at a p-value of 2 × 10⁻⁵, whereas for loci FY and OCA2 there was no evidence of association (p = 0.9).

When the score tests in each population were examined separately, the association of ancestry between FY and AT3 was statistically significant at the 5% level in three of the eight African-American populations studied: Charleston, New York and New Orleans. The overall proportion of information about the odds ratio that was extracted by the analysis was low: 14% for the FY–AT3 odds ratio, and 3% for the FY–OCA2 odds ratio.

Table 1. Distribution of parental admixture in each African-American subpopulation (proportion of European admixture; each entry is posterior mean (95% PI))

| Population | n | Population mean | Percent >50% European | Percent <12.5% European |
|---|---|---|---|---|
| Jamaica | 93 | 0.07 (0.04–0.10) | 1 (1–4) | 79 (33–100) |
| Charleston | 94 | 0.12 (0.08–0.16) | 2 (0–8) | 59 (0–73) |
| Philadelphia | 303 | 0.15 (0.13–0.16) | 0 (1–2) | 12 (0–45) |
| Baltimore | 100 | 0.15 (0.12–0.19) | 5 (0–15) | 36 (0–88) |
| Houston | 100 | 0.16 (0.15–0.19) | 3 (0–12) | 24 (0–75) |
| New York | 236 | 0.21 (0.18–0.24) | 7 (0–20) | 9 (0–32) |
| Pittsburgh | 84 | 0.22 (0.18–0.26) | 3 (0–15) | 11 (0–59) |
| New Orleans | 105 | 0.22 (0.18–0.26) | 13 (0–26) | 36 (0–81) |

* PI, posterior interval.

Table 2. Association of ancestry between loci FY and AT3, and between FY and OCA2, conditional on parental admixture

| Population | Loci tested | Score at odds ratio of 1 | Observed information | Percent information extracted | Z test statistic | One-tailed p-value | Log odds ratio (posterior mean) | 95% posterior interval |
|---|---|---|---|---|---|---|---|---|
| Jamaica | FY-AT3 | 1.22 | 0.67 | 12 | 1.49 | 0.07 | 1.0 | −2.0, 3.5 |
| | FY-OCA2 | −0.15 | 0.04 | 1 | −0.71 | 0.76 | | |
| Charleston | FY-AT3 | 2.17 | 1.14 | 11 | 2.04 | 0.02 | 1.3 | −1.4, 3.5 |
| | FY-OCA2 | 0.78 | 0.20 | 2 | 1.74 | 0.04 | | |
| Philadelphia | FY-AT3 | 2.13 | 5.82 | 13 | 0.88 | 0.19 | 0.2 | −1.6, 1.6 |
| | FY-OCA2 | −1.94 | 0.62 | 2 | −2.47 | 0.99 | | |
| Baltimore | FY-AT3 | 1.71 | 1.84 | 15 | 1.26 | 0.10 | 1.0 | −1.3, 3.0 |
| | FY-OCA2 | −0.24 | 0.20 | 2 | −0.53 | 0.70 | | |
| Houston | FY-AT3 | 2.50 | 2.41 | 13 | 1.61 | 0.05 | 1.1 | −1.3, 3.0 |
| | FY-OCA2 | 0.99 | 0.68 | 4 | 1.19 | 0.12 | | |
| New York | FY-AT3 | 7.69 | 7.68 | 16 | 2.77 | 0.003 | 1.8 | 0.5, 3.0 |
| | FY-OCA2 | −1.59 | 1.46 | 4 | −1.32 | 0.91 | | |
| Pittsburgh | FY-AT3 | −0.95 | 3.17 | 19 | −0.53 | 0.70 | −0.5 | −2.8, 1.3 |
| | FY-OCA2 | −0.01 | 0.84 | 5 | −0.02 | 0.51 | | |
| New Orleans | FY-AT3 | 4.08 | 2.83 | 16 | 2.42 | 0.008 | 1.9 | 0, 3.6 |
| | FY-OCA2 | −0.53 | 0.59 | 4 | −0.69 | 0.76 | | |
| All | FY-AT3 | 20.55 | 25.57 | 15 | 4.06 | 2 × 10⁻⁵ | | |
| | FY-OCA2 | −2.70 | 4.63 | 3 | −1.25 | 0.90 | | |
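The summary tests, obtained by summing scores and observed informations across populations, can be checked against the ‘All’ rows of Tables 2 and 3 with the Python sketch below. The per-population values are rounded to two decimals, so the sums can differ from the tabulated totals in the last digit; the one-tailed p-value is taken as the upper tail of the standard normal distribution.

```python
import math
from statistics import NormalDist


def summary_score_test(scores, observed_informations):
    """Combine independent per-population score tests by summing U and V."""
    u = sum(scores)
    v = sum(observed_informations)
    z = u / math.sqrt(v)
    p_one_tailed = 1.0 - NormalDist().cdf(z)   # upper tail
    return z, p_one_tailed


# Table 2, FY-AT3 rows (Jamaica ... New Orleans)
FY_AT3_SCORES = [1.22, 2.17, 2.13, 1.71, 2.50, 7.69, -0.95, 4.08]
FY_AT3_INFO = [0.67, 1.14, 5.82, 1.84, 2.41, 7.68, 3.17, 2.83]
# Table 2, FY-OCA2 rows
FY_OCA2_SCORES = [-0.15, 0.78, -1.94, -0.24, 0.99, -1.59, -0.01, -0.53]
FY_OCA2_INFO = [0.04, 0.20, 0.62, 0.20, 0.68, 1.46, 0.84, 0.59]
# Table 3, affecteds-only test
TRAIT_SCORES = [0.53, 1.08, 3.37, 0.81, 2.64, 10.83, -1.16, 0.76]
TRAIT_INFO = [1.41, 2.15, 8.28, 2.24, 4.62, 15.85, 0.61, 3.40]

for name, s, i in [("FY-AT3", FY_AT3_SCORES, FY_AT3_INFO),
                   ("FY-OCA2", FY_OCA2_SCORES, FY_OCA2_INFO),
                   ("affecteds-only", TRAIT_SCORES, TRAIT_INFO)]:
    z, p = summary_score_test(s, i)
    print(f"{name}: Z = {z:.2f}, one-tailed p = {p:.2g}")
```

The three Z statistics come out at about 4.06, −1.25 and 3.04, in agreement with the summary rows.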
Bayesian estimates (posterior means) for the
log odds ratio for association of ancestry at loci
FY and AT3 ranged from −0.5 in Pittsburgh to
1.9 in New Orleans. As the posterior intervals for
the log odds ratio were wide, it was not possible
to establish whether there was heterogeneity of
the odds ratio between populations.
Mapping the AT3 gene as if the phenotype were a binary trait determined by an unknown locus
To mimic the mapping of an unknown disease gene in an affecteds-only analysis, we defined a binary trait in which those with at least one copy of allele 2 (the allele more common in Europeans) at the AT3 locus were coded as ‘affected’. The prevalence of this ‘trait’ is 0.24 in unadmixed west Africans and 0.92 in unadmixed Europeans: a population risk ratio of 3.9. If we sample individuals ‘affected’ with this trait from an admixed population, at any marker locus that is linked to AT3 the observed proportion of alleles that are of European ancestry will be higher than expected on the basis of the parental admixture of these individuals. This principle can be applied to construct an affecteds-only test for linkage, as outlined previously (McKeigue, 1998). If we assume a multiplicative genetic model, the effect of the locus can be represented by the population risk ratio r. As shown in the Appendix, the score at r = 1 is simply the observed minus the expected proportion of alleles that have European ancestry at the marker locus, summed over all individuals and averaged over the posterior distribution of the missing data. As the correct genetic model for this ‘trait’ is a dominant one, the score test based on a multiplicative model will be less powerful than one based on the correct model. However, this does not affect the validity of the p-values obtained from the score test, as these are calculated on the assumption that the null hypothesis (r = 1) is correct. Where we have established in advance that the trait is more common in the high-risk population (Europeans in this example of a trait defined by the genotype at the AT3 locus) than in the low-risk population (West Africans in this example), it is appropriate to use a one-tailed test.

Table 3. ‘Affecteds-only’ score test for linkage with the FY locus, using a ‘trait’ constructed from the AT3 genotype

| Population | n | Proportion of European ancestry at FY locus: observed | Proportion of European ancestry at FY locus: expected | Score | Observed information | Percent information extracted | Z-test statistic | One-tailed p-value |
|---|---|---|---|---|---|---|---|---|
| Jamaica | 33 | 0.12 | 0.11 | 0.53 | 1.41 | 73 | 0.44 | 0.33 |
| Charleston | 37 | 0.18 | 0.15 | 1.08 | 2.15 | 72 | 0.74 | 0.23 |
| Philadelphia | 123 | 0.18 | 0.16 | 3.37 | 8.28 | 76 | 1.17 | 0.12 |
| Baltimore | 46 | 0.20 | 0.18 | 0.81 | 2.24 | 69 | 0.54 | 0.29 |
| Houston | 45 | 0.24 | 0.19 | 2.64 | 4.62 | 83 | 1.23 | 0.11 |
| New York | 120 | 0.33 | 0.24 | 10.83 | 15.85 | 85 | 2.72 | 0.003 |
| Pittsburgh | 34 | 0.21 | 0.24 | −1.16 | 0.61 | 37 | −1.48 | 0.93 |
| New Orleans | 47 | 0.33 | 0.31 | 0.76 | 3.40 | 71 | 0.41 | 0.34 |
| All | 485 | 0.24 | 0.20 | 18.85 | 38.56 | 78 | 3.04 | 0.001 |
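The prevalences (0.24 and 0.92) and population risk ratio (3.9) quoted for the AT3-derived ‘trait’ can be recovered from the AT3 frequencies in Table 4 by a short Python calculation, assuming that pAfr and pEur there are frequencies of allele 1 and that genotypes are in Hardy–Weinberg proportions within each unadmixed population (both are assumptions of this sketch). ‘Affected’ means carrying at least one copy of allele 2, so prevalence is one minus the frequency of allele-1 homozygotes.

```python
def prevalence_at_least_one_allele2(p_allele1):
    """P(at least one copy of allele 2) under Hardy-Weinberg proportions."""
    return 1.0 - p_allele1 ** 2


# Frequencies of allele 1 at AT3 (pAfr, pEur in Table 4)
prev_west_african = prevalence_at_least_one_allele2(0.874)
prev_european = prevalence_at_least_one_allele2(0.279)
risk_ratio = prev_european / prev_west_african
```

This gives prevalences of about 0.236 and 0.922 and a ratio of about 3.9.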
To construct the test dataset, ‘affected’ individuals were defined as above and the AT3
genotype was dropped from the data. As shown
in Table 3, the affecteds-only score test detected
significant evidence of linkage of the ‘trait’ with
the FY locus in all populations combined (summary p-value 0.001), and separately in the New
York sample (p = 0.003).
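The score at r = 1 described above can be sketched in Python under a hypothetical data layout: each posterior draw lists, for every ‘affected’ individual, the number of European-ancestry alleles (0, 1 or 2) at the marker locus and the European admixture of the two parental gametes, and the expected proportion is taken as the mean of the two gamete admixtures. The matching information term, derived in the Appendix, is not reproduced.

```python
from statistics import fmean


def affecteds_only_score(draws):
    """Score at r = 1: observed minus expected proportion of European-ancestry
    alleles at the marker, summed over individuals, averaged over draws.

    draws -- list of posterior draws; each draw is a list of tuples
             (n_european_alleles, admix_gamete_1, admix_gamete_2)
    """
    per_draw = []
    for draw in draws:
        u = sum(n_eur / 2.0 - (x1 + x2) / 2.0 for n_eur, x1, x2 in draw)
        per_draw.append(u)
    return fmean(per_draw)
```

As a rough consistency check on Table 3, 120 × (0.33 − 0.24) ≈ 10.8 for New York, close to the tabulated score of 10.83.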
Testing for mis-specification of ancestry-specific allele frequencies
The score test for mis-specification of the
ancestry-specific allele frequencies yields three
test statistics : a Z statistic (standard normal
deviate) for the African-specific allele frequency,
a Z statistic for the European-specific allele
frequency, and a summary chi-square statistic
with 2 degrees of freedom for the joint null
hypothesis that both ancestry-specific allele
frequencies are correctly specified. A positive
value of the Z statistic (positive gradient of the
log-likelihood function at the null value) implies
that the most likely value of the ancestry-specific
allele frequency is higher than the value specified
in the model, and a negative value that the most
likely value is lower than the specified value.
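The construction of one of these Z statistics can be sketched in Python by combining a binomial complete-data log-likelihood with the missing-information correction described earlier. The data layout is hypothetical: each posterior draw lists 0/1 indicators (1 = allele 1) for the alleles assigned the ancestry in question in that draw. The null frequency must lie strictly between 0 and 1, so loci specified as fixed (such as FY, with pAfr = 0 and pEur = 1) are outside the scope of this sketch, and the joint 2-df chi-square is not shown.

```python
import math
from statistics import fmean, pvariance


def allele_frequency_z(draws, p0):
    """Z statistic for H0: an ancestry-specific frequency of allele 1 equals p0.

    draws -- posterior draws; each draw is a list of 0/1 indicators (1 = allele 1)
             for the alleles assigned that ancestry in the draw
    """
    scores, infos = [], []
    for alleles in draws:
        # d/dp of sum[y log p + (1 - y) log(1 - p)], evaluated at p0
        scores.append(sum((y - p0) / (p0 * (1.0 - p0)) for y in alleles))
        # minus the second derivative at p0 (complete-data information)
        infos.append(sum(y / p0 ** 2 + (1 - y) / (1.0 - p0) ** 2 for y in alleles))
    u = fmean(scores)
    v = fmean(infos) - pvariance(scores)   # observed = complete - missing information
    return u / math.sqrt(v)
```

If 60 of 100 alleles assigned a given ancestry carry allele 1 and the specified frequency is 0.5, the statistic is +2.0: the positive sign indicates that the most likely frequency is higher than the specified one.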
Table 4 shows, for the four largest population
samples, the results of these score tests. The
percentage of information extracted by the
analysis is typically around 80 %.
The summary chi-square statistic for the four
African-American populations combined yielded
significant evidence (p-values less than 0.01) for mis-specification of the ancestry-specific allele frequencies at three of the ten loci: AT3, OCA2 and GC. The Z statistics for the African-specific and the European-specific allele frequencies are correlated with each other, for reasons that are explained in the Discussion. Examination of the Z statistics in Table 4 suggests that where the ancestry-specific allele frequencies are mis-specified, this mis-specification is fairly consistent across the African-American populations studied. Thus at the AT3 locus, the Z statistics are negative in all four populations, even though their deviation from zero is statistically significant only in the New York sample. Similarly, at the GC locus the Z statistics are positive in all four populations studied, even though their deviation from zero is statistically significant only in the Philadelphia sample.

Table 4. Score tests for mis-specified ancestry-specific allele frequencies

Locus                   RB2300     LPL      FY     AT3     APO SB 19.3    ICAM    OCA2   L19.2      GC
pAfr                      0.92   0.973       0   0.874   0.441   0.425   0.756   0.098   0.089   0.824
pEur                     0.333   0.486       1   0.279   0.927    0.91       1   0.769   0.541   0.156

Philadelphia
  Z African-specific     -0.13   -0.55    0.92   -1.00   -0.62    0.05   -1.82   -2.43    1.45    2.53
  % info                    97      94      91      98      99      99     100      97      98      98
  Z European-specific    -0.11   -0.44    0.98   -1.03   -0.42    0.04   -1.93   -2.36    1.36    2.46
  % info                    94      97      92      94      86      85      77      91      95      87
  Summary χ²              0.05    1.05    1.38    1.08    2.12    0.34    4.03    6.02    2.43    6.49

Baltimore
  Z African-specific      0.51   -1.00   -0.80   -1.66   -0.53    1.17    2.61   -1.86    0.37    1.60
  % info                    90      83      69      93      98      98      99      91      92      95
  Z European-specific     0.81   -1.16   -0.06   -1.38   -0.75    1.18    1.69   -2.10    1.35    1.42
  % info                    79      91      58      79      66      66      60      69      84      64
  Summary χ²              0.71    1.34    3.03    2.75    0.57    1.61    7.14    4.51    0.14    2.61

New York
  Z African-specific      0.63   -0.49    1.32   -2.60   -0.55    0.84   -1.86   -1.60    1.99    1.71
  % info                    87      76      63      90      97      97      99      88      91      93
  Z European-specific     1.40   -1.43    1.31   -2.53    0.41    0.91   -1.52   -1.34   -2.41    1.21
  % info                    82      92      54      84      72      70      52      77      86      72
  Summary χ²              3.00    4.63    1.82    7.05    4.42    0.85    3.56    2.58    5.81    2.97

New Orleans
  Z African-specific      1.05    0.28    0.60   -1.07    0.77   -1.60    1.32    0.31   -0.44    0.03
  % info                    90      85      67      90      97      98      99      89      91      93
  Z European-specific     1.93    1.29    0.89   -1.47   -0.08   -1.45    1.78    0.41   -0.26    0.16
  % info                    76      87      74      88      54      65      74      83      88      77
  Summary χ²              5.87    7.07    1.77    2.97    2.34    2.56    4.31    0.22    0.29    0.10

All four populations combined
  Z African-specific      0.69   -1.01    1.21   -3.09   -0.74    0.57   -1.23   -3.20    0.11    3.26
  % info                    92      86      76      94      98      98     100      93      95      96
  Z European-specific     1.68   -1.38    1.69   -3.24   -0.30    0.69   -1.24   -2.88   -0.74    2.65
  % info                    85      93      65      86      72      72      59      80      89      74
  Summary χ²              5.58    2.18    3.81   10.70    0.98    0.47    1.61   10.20    3.42   10.70

Z scores with absolute value greater than 2.33, or chi-square values greater than 9.21, are significant at p < 0.01.
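The two thresholds in the footnote to Table 4 can be checked with a few lines of Python. This is a minimal sketch assuming that each summary chi-square has 2 degrees of freedom (one for each of the two ancestry-specific frequencies), which is what the 9.21 cut-off implies; 2.33 is the one-sided 1% point of the standard normal. The dictionary simply copies the combined-population row of the table.

```python
import math
from statistics import NormalDist

# One-sided 1% point of the standard normal distribution.
z_crit = NormalDist().inv_cdf(0.99)

# 1% point of chi-square on 2 df: the 2-df survival function is exp(-x/2),
# so the quantile is -2 ln(alpha).
chi2_crit = -2.0 * math.log(0.01)

# Summary chi-square values for all four populations combined (Table 4).
combined_chi2 = {
    "RB2300": 5.58, "LPL": 2.18, "FY": 3.81, "AT3": 10.70, "APO": 0.98,
    "SB 19.3": 0.47, "ICAM": 1.61, "OCA2": 10.20, "L19.2": 3.42, "GC": 10.70,
}

significant = [locus for locus, x in combined_chi2.items() if x > chi2_crit]
print(f"z_crit = {z_crit:.2f}, chi2_crit = {chi2_crit:.2f}")
print("Loci significant at p < 0.01 (combined):", significant)
```

This prints thresholds of 2.33 and 9.21 and flags AT3, OCA2 and GC, the three loci named in the text.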

In this analysis we are able to estimate the distribution of individual admixture within the African-American populations studied, even though we cannot estimate admixture accurately for any single individual. In most of these populations admixture varies only over a narrow range, and fewer than 10% of individuals who identify themselves as African-American have more than 50% European admixture. This limits the statistical power of epidemiological studies designed to detect relationships between disease risk and individual admixture in African-Americans. To overcome this limitation, one could identify other populations in which admixture varies over a wider range, or use more markers so that individual admixture can be estimated more accurately. We can estimate (using the large-sample variance of the maximum-likelihood estimator) that at least 40 unlinked markers with an average f-value of 0.4 would be required to estimate individual admixture with a standard error of no more than 0.1. Even more markers would be required if there is association of ancestry (independent of parental admixture) between some of the markers, as there will be if the markers are linked and admixture has been recent.
Our approach to the detection of linkage in admixed populations differs fundamentally from previous approaches that rely on detection of allelic association (linkage disequilibrium) (Chakraborty & Weiss, 1988; Briscoe et al. 1994; Kaplan et al. 1998; Zheng & Elston, 1999). Instead of testing for allelic association, we use marker genotype data to extract information about states of ancestry on chromosomes of mixed descent, then test for association with ancestry at the marker locus. The association between states of ancestry at two linked loci in an admixed population depends only upon the history of admixture and the map distance between the loci; it does not, of course, depend upon which markers are typed. The analysis demonstrates with real data a result derived previously (McKeigue, 1998): if we condition on parental admixture, a test for association with ancestry at a marker locus is a specific test for linkage. The score test detects association between the two linked loci, FY and AT3, but not between unlinked loci. In a previous analysis of this dataset, a conventional test for linkage disequilibrium detected allelic association between two unlinked loci, FY and OCA2, in the New Orleans sample at a p-value of less than 1% (Parra et al. 1998). This association between unlinked loci is eliminated by conditioning on parental admixture, as theory predicts (McKeigue, 1998). The analysis extracts less than 20% of the information about the odds ratio for association of ancestry that we would have if the complete data (ancestry at both loci on each gamete) were available. This is because without genotype data on other family members we have no information on phase, and because no other loci adjacent to the AT3 or the OCA2 locus have been typed. By typing other family members, and by typing more markers in the region under study, the proportion of information extracted could be increased as desired.
The strength of the association of ancestry
(measured as the log odds ratio) between two
linked loci separated by a given map distance,
conditioned on parental admixture, is a measure
of how recently admixture has occurred. For
most of the African-American populations in
this study, estimates of the log odds ratio for
association of ancestry between loci FY and AT3
are in the range of 1 to 2, although the posterior
intervals are wide. Using the equations given in
the Appendix, we can calculate that log odds
ratios between 1 and 2 in African-American
populations where proportionate European admixture is three-sixteenths (18.75%) are consistent with European gene flow between five
and nine generations ago. If most European gene
flow into these African-American populations
had occurred more recently than this, the log
odds ratios would be in the range of 2 to 3.
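The recursion given in the Appendix can be iterated to see how the log odds ratio decays with time since admixture. The sketch below is a minimal illustration under assumptions that are ours, not necessarily those used for the figures above: a single pulse of admixture followed by random mating, every parent treated as having the population mean admixture M = 3/16 (so the recursion reduces to P' = (1 - θ)P + θM), one meiosis per generation, and Haldane's map function to convert the 22 cM between FY and AT3 into a recombination fraction. The function names are hypothetical.

```python
import math

def haldane_theta(x_morgans: float) -> float:
    """Recombination fraction for a map distance in Morgans (Haldane's map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * x_morgans))

def p11_after(M: float, theta: float, generations: int) -> float:
    """Iterate the Appendix recursion with both parents at admixture M.

    With M1 = M2 = M and P1 = P2 = P, ((1-theta)(M1 P1 + M2 P2) + 2 theta M1 M2) / (M1 + M2)
    simplifies to (1 - theta) P + theta M. Founder European gametes start at P = 1.
    """
    p = 1.0
    for _ in range(generations):
        p = (1.0 - theta) * p + theta * M
    return p

def log_psi(M: float, p11: float) -> float:
    """Log odds ratio from p11 via equations (2) and (3) of the Appendix."""
    p01 = M * (1.0 - p11) / (1.0 - M)                 # equation (2)
    psi = p11 * (1.0 - p01) / (p01 * (1.0 - p11))     # equation (3)
    return math.log(psi)

M = 3 / 16                    # proportionate European admixture
theta = haldane_theta(0.22)   # FY and AT3 are 22 cM apart
for t in (3, 5, 9):
    print(t, round(log_psi(M, p11_after(M, theta, t)), 2))
```

Under these assumptions the loop prints log odds ratios of about 2.97, 1.99 and 0.97 for 3, 5 and 9 generations, in line with the ranges quoted above.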
If we assume exponential distributions for the
lengths of chromosomal segments that are of
African and European ancestry, we can calculate
the density of transitions of ancestry on chromosomes of mixed descent – the ancestry crossover
rate – from the log odds ratio for loci FY and
AT3, the map distance between these loci and
the mean admixture as described in the Appendix. These calculations show that log odds
ratios between 1 and 2 for loci FY and AT3 imply
ancestry crossover rates between 3.9 and 2.2 per
Morgan. These estimates should be considered to
be only approximate, as the assumption of
exponentially-distributed segment lengths does
not in general hold in admixed populations
(McKeigue, 1998). The ancestry crossover rate
has practical implications for studies that exploit
admixture to map genes. For an initial genome
search, the optimal strategy is to choose populations where admixture is recent (crossover rate
less than 3 per Morgan), thus keeping down the
number of markers required and the number of
independent hypotheses to be tested. For finer
mapping of a trait locus identified in such a
study, one would try to identify populations
where admixture has occurred less recently
(crossover rates greater than 4 per Morgan).
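The conversion from log odds ratio to ancestry crossover rate can be sketched by solving the exponential-segment transition matrix given in the Appendix for λ + μ. This sketch assumes M = 3/16 and x = 0.22 Morgans, uses simple bisection (our choice of root-finder, not necessarily the authors'), and returns ½(λ + μ), the crossover rate as defined in the Appendix; the function names are hypothetical.

```python
import math

def log_psi_exponential(s: float, M: float, x: float) -> float:
    """Log odds ratio for loci x Morgans apart when lambda + mu = s and M = mu / s."""
    e = math.exp(-s * x)
    lam, mu = (1.0 - M) * s, M * s
    p00, p01 = (lam + mu * e) / s, (mu - mu * e) / s    # row: African ancestry at A
    p10, p11 = (lam - lam * e) / s, (mu + lam * e) / s  # row: European ancestry at A
    return math.log(p00 * p11 / (p01 * p10))

def crossover_rate(target_log_psi: float, M: float, x: float) -> float:
    """Find s = lambda + mu by bisection; return the crossover rate (lambda + mu) / 2."""
    lo, hi = 1e-6, 1e3            # log psi falls monotonically as s increases
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_psi_exponential(mid, M, x) > target_log_psi:
            lo = mid              # association still too strong: need more transitions
        else:
            hi = mid
    return 0.5 * (0.5 * (lo + hi))

for lp in (1.0, 2.0):
    print(lp, round(crossover_rate(lp, M=3 / 16, x=0.22), 1))
```

With these inputs the function gives roughly 3.9 per Morgan for a log odds ratio of 1 and roughly 2.2 per Morgan for a log odds ratio of 2, matching the figures quoted above.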
The ‘affecteds-only’ analysis mimics the approach that we have suggested for mapping genes underlying ethnic differences in disease risk (McKeigue, 1998). This analysis detects linkage in all populations combined, and in New York separately with a sample of 121 ‘affected’ individuals. We have estimated previously that in a population where proportionate admixture from the high-risk population is between 0.3 and 0.7, a sample of about 130 affected individuals is adequate to detect linkage at p < 0.001 with a
locus that accounts for a population risk ratio of
3. As the FY locus is fully informative for
ancestry, the main limitations on our ability to
detect linkage in this analysis are that the
proportion of European admixture in these
populations is low and the marker locus is 22 cM
from the trait locus. In practice, admixture
mapping would use more closely-spaced markers
(McKeigue, 1998).
Obtaining the posterior distribution of ancestry at marker loci given marker genotype data depends upon correctly specifying the ancestry-specific allele frequencies in the admixed population under study. As long as these frequencies are correctly specified, the analysis does not depend upon any assumptions about population history, as we are simply applying Bayes’s theorem to invert conditional probabilities. The ability to test for mis-specification of the ancestry-specific allele frequencies, and to re-estimate the correct frequencies where necessary, is thus crucial to the application of admixture mapping in practice. The score test that we have derived detects mis-specification of ancestry-specific allele frequencies at three of the ten marker loci, but does not clearly distinguish whether it is the African-specific frequencies, the European-specific frequencies, or both, that are mis-specified. At each locus, the scores for the African-specific and the European-specific frequencies are highly correlated. This is because
with only 10 markers we cannot accurately
estimate the admixture of each individual, even
though we can accurately estimate the average
admixture of the population. One way to convey
this argument is to think of the score test
conditional on parental admixture as comparing
at each locus in each individual the observed
allele frequency with the expected frequency
given the admixture of the individual’s parents.
The expected frequency is a weighted average of
the African-specific and European-specific allele
frequencies, with weights equal to the admixture
of the individual’s parents. If we have sufficient
information to estimate accurately the average
admixture of the population, we can estimate
accurately the expected frequency of each allele
in the total sample. This allows us to detect a
difference between observed and expected frequencies, but does not tell us whether it is the
European-specific or the African-specific allele
frequencies that are mis-specified. If individual
admixture varies over a wide range in the sample,
and enough markers have been typed for individual admixture to be estimated accurately,
the score test will be able to distinguish which
ancestry-specific allele frequencies are mis-specified. If, for instance, only the European-specific allele frequencies are mis-specified, the
difference between observed and expected frequencies will be largest in those individuals who
have the highest proportion of European admixture. In a full multipoint analysis, additional
information about the ancestry of the alleles at
each locus would be available from modelling the
stochastic variation of ancestry on chromosomes
of mixed descent and combining information
from all markers to estimate ancestry at each
locus (McKeigue, 1998). We could then use a score
test conditional on ancestry at each locus, rather
than simply conditioning on parental admixture
as in this study. In principle it would be
straightforward to re-estimate the ancestry-specific allele frequencies within an admixed population under study using a Bayesian approach. If we assign prior distributions to these parameters that reflect our uncertainty about their values, the Gibbs sampler will estimate
their posterior means. We have not attempted to
re-estimate the ancestry-specific allele frequencies in this dataset, because with only 10
markers we do not have enough information for
these re-estimates to be accurate. From a
practical point of view, we note that in a previous
analysis of this dataset estimates of population
admixture obtained from single loci did not differ
significantly (Parra et al. 1998). The effect of any
mis-specification of ancestry-specific allele frequencies upon estimates of overall admixture is
therefore likely to be small. Mis-specification of
allele frequencies at the OCA2 locus could affect
the validity of the test for linkage between FY
and OCA2, but there is no evidence in Table 2 of
a false-positive result (positive association between ancestry at the OCA2 locus and the FY
locus, conditional on parental admixture) in the
summary score test.
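The argument above about why the two scores are correlated can be illustrated with a simplified allele-level model (our simplification, not the genotype-level test used in this paper): each allele is allele 1 with probability e = (1 - m)pX + m pY, where m is the parental admixture. Under the null, the correlation between the pX and pY score components is I_XY/√(I_XX I_YY), with each individual contributing in proportion to (1 - m, m). The GC frequencies from Table 4 are used only as example inputs, and the function name is hypothetical.

```python
import math

def score_correlation(admixtures, pX, pY):
    """Null correlation between the pX and pY score components in a simplified
    allele-level model: each allele is allele 1 with probability (1 - m) pX + m pY."""
    sxx = syy = sxy = 0.0
    for m in admixtures:
        e = (1 - m) * pX + m * pY       # expected allele frequency given admixture m
        w = 1.0 / (e * (1 - e))         # Bernoulli information for that frequency
        sxx += w * (1 - m) ** 2         # d e / d pX = 1 - m
        syy += w * m ** 2               # d e / d pY = m
        sxy += w * (1 - m) * m
    return sxy / math.sqrt(sxx * syy)

pX, pY = 0.824, 0.156                   # GC frequencies from Table 4, example inputs only
for ms in ([0.19] * 5, [0.15, 0.20, 0.25], [0.1, 0.5, 0.9]):
    print(ms, round(score_correlation(ms, pX, pY), 2))
```

When every individual has the same admixture the correlation is exactly 1; for admixture confined to 0.15 to 0.25 it is about 0.97, and for admixture spread over 0.1 to 0.9 it falls to about 0.36, which is why a wide range of individual admixture helps to separate the two sets of frequencies.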
These preliminary analyses suggest that the
ancestry-specific allele frequencies are likely to
be fairly consistent across different African-American subpopulations. At the three loci where a summary chi-square test for all four populations combined indicates that ancestry-specific allele frequencies are mis-specified, the deviations of the Z statistics from zero are generally in the same direction in each subpopulation. This suggests that even though diverse West African
subpopulations contributed genes to the modern
African-American populations, within the pool
of alleles that are of African ancestry the allele
frequencies do not vary much between regions of
the US. Studies with larger samples and more
markers are required to resolve this.
This study demonstrates the application of
some of the statistical techniques that we have
proposed for mapping genes underlying ethnic
differences in disease risk (McKeigue, 1998). We
have shown that the posterior distribution of
parental admixture and ancestry at each marker
locus, conditional on the observed marker genotype data, can be generated by Markov chain
simulation. We have shown that score tests
based on the missing-data likelihood can be
applied to detect linkage and to detect mis-
specification of ancestry-specific allele frequencies within the admixed population under
study. We have shown, using the genotype at the
AT3 locus as if it were a binary trait determined
by an unknown gene, that a simple ‘affecteds-only’ score test can detect linkage of a marker
locus at a distance of 22 cM from a trait locus.
To extend the methods described in this paper
to a search of the genome for genes underlying
ethnic differences in disease risk would require
more markers, a more complicated statistical
model, and far more computing power, but would
introduce no fundamentally new principles beyond those demonstrated in this paper. Simulations suggest that about 1000 biallelic markers
with average f-values of 0.4 are required to
extract 80 % of information about ancestry in an
initial genome search in a population where
the ancestry crossover rate is 2 per Morgan
(McKeigue, 1998). At the current rate of progress
in identification and typing of single nucleotide
polymorphisms (SNPs), identification of such
marker sets will soon be feasible even if it is
necessary to screen a library of 50 000 or more
SNPs to select those that have high f-values. For
a full multipoint analysis that uses data on
affected individuals and their relatives, it will be
necessary to model the stochastic variation of
inheritance (maternally-derived or paternally-derived allele transmitted) and ancestry on
chromosomes of mixed descent. The approach
can be extended to deal with admixture between
three or more founding populations, or to handle
multi-allelic markers.
In conclusion, these results provide further
evidence that admixture mapping is a feasible
approach to mapping genes for complex traits.
As it is now clear that conventional allele-sharing
designs lack adequate statistical power to map
genes for complex traits with realistic sample
sizes (Risch & Merikangas, 1996), such novel
methodological approaches may offer the best
chance for finding human genes that influence
the risks of conditions such as diabetes, hypertension, obesity and autoimmune disease.
We thank David Clayton for suggesting the use of a
score test based on the missing data likelihood.
Appendix
Testing for linkage between two marker loci by testing for association between the ancestry of alleles
conditional on parental admixture
At each locus on each gamete there are two possible states of ancestry: African (population X) or European (population Y). The null hypothesis is that, conditional on parental admixture, the odds ratio for the association between states of ancestry at any two loci on the same gamete is 1. For two loci A and B on a single gamete, we write $p_{ij}$ for the probability of ancestry state $j$ at locus B given ancestry state $i$ at locus A, where $i$ and $j$ have value 0 for African ancestry and 1 for European ancestry. There are three possible outcomes (0, 1 or 2 alleles with European ancestry), for which the likelihoods are

$$\Pi_0 = (1-M)(1-p_{01}), \qquad \Pi_1 = 2(1-M)\,p_{01}, \qquad \Pi_2 = M\,p_{11},$$

where $\Pi_i$ is the probability that the gamete has (at loci A and B combined) a total of $i$ alleles that have European ancestry, and $M$ is the proportion of the parent’s genome that is of European ancestry.
We can write the conditional probabilities of each ancestry state at locus B given the state at locus A as a matrix of transition probabilities in which the rows and columns represent states of ancestry at loci A and B respectively:

$$\begin{pmatrix} 1 - p_{01} & p_{01} \\ 1 - p_{11} & p_{11} \end{pmatrix} \tag{1}$$

As the probabilities of European ancestry at locus A and locus B are both equal to $M$, we have

$$(1-M)\,p_{01} + M\,p_{11} = M. \tag{2}$$

We can write the odds ratio in terms of the transition probabilities as

$$\psi = \frac{p_{11}(1-p_{01})}{p_{01}(1-p_{11})}. \tag{3}$$

From these equations we can obtain the likelihood of each of the three possible outcomes in terms of $M$ and ψ:

$$\Pi_0 = \begin{cases} 1 - M - \dfrac{1}{2}\,\dfrac{\sqrt{1+4M(1-M)(\psi-1)}-1}{\psi-1} & \text{for } \psi \neq 1 \\[1ex] (1-M)^2 & \text{for } \psi = 1 \end{cases}$$

$$\Pi_1 = \begin{cases} \dfrac{\sqrt{1+4M(1-M)(\psi-1)}-1}{\psi-1} & \text{for } \psi \neq 1 \\[1ex] 2M(1-M) & \text{for } \psi = 1 \end{cases}$$

$$\Pi_2 = \begin{cases} M - \dfrac{1}{2}\,\dfrac{\sqrt{1+4M(1-M)(\psi-1)}-1}{\psi-1} & \text{for } \psi \neq 1 \\[1ex] M^2 & \text{for } \psi = 1 \end{cases}$$
The first and second derivatives with respect to ψ of

$$\frac{\sqrt{1+4M(1-M)(\psi-1)}-1}{\psi-1}$$

at ψ = 1 (evaluated as the limit from above) are $-4M^2(1-M)^2$ and $4M^3(1-M)^3$. At ψ = 1, the scores corresponding to these three possible realized states are therefore $2M^2$, $-2M(1-M)$ and $2(1-M)^2$. The corresponding values of the information are $2M^3(1+M)$, $2M^2(1-M)^2$ and $2(1-M)^3(2-M)$. For each realization of the posterior distribution, the score and information are summed over all gametes in the dataset.
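The closed-form outcome probabilities above can be checked numerically. This minimal sketch (the function name is hypothetical) evaluates Π0, Π1 and Π2 from M and ψ, then confirms that they sum to 1 and that the implied p01 = Π1/(2(1 - M)) and p11 = Π2/M reproduce ψ through equation (3). It covers only the likelihoods, not the score and information expressions.

```python
import math

def outcome_probs(M: float, psi: float) -> tuple[float, float, float]:
    """Pi_0, Pi_1, Pi_2: probabilities that a gamete carries 0, 1 or 2 alleles of
    European ancestry at loci A and B, given parental admixture M and odds ratio psi."""
    if psi == 1.0:
        return (1 - M) ** 2, 2 * M * (1 - M), M ** 2
    g = (math.sqrt(1 + 4 * M * (1 - M) * (psi - 1)) - 1) / (psi - 1)
    return 1 - M - g / 2, g, M - g / 2

# Self-check: probabilities sum to 1, and recovering p01 and p11 gives back psi.
M = 0.2
for psi in (0.5, 2.7, 10.0):
    pi0, pi1, pi2 = outcome_probs(M, psi)
    p01 = pi1 / (2 * (1 - M))       # from Pi_1 = 2(1 - M) p01
    p11 = pi2 / M                   # from Pi_2 = M p11
    assert abs(pi0 + pi1 + pi2 - 1) < 1e-12
    assert abs(p11 * (1 - p01) / (p01 * (1 - p11)) - psi) < 1e-9

# Near psi = 1 the general formula approaches the independence values.
near = outcome_probs(M, 1 + 1e-7)
assert all(abs(a - b) < 1e-6 for a, b in zip(near, outcome_probs(M, 1.0)))
```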
P. M. MK  
184
Relationships of crossover rate and odds ratio for association of ancestry between linked loci to history
of admixture
If parents 1 and 2 have admixture $M_1$ and $M_2$ and produce gametes on which the probabilities of European ancestry at locus B given European ancestry at locus A (equivalent to coordinate $p_{11}$ in the transition matrix (1) above) are $P_1$ and $P_2$, the expected admixture of their offspring is $\tfrac{1}{2}(M_1 + M_2)$, and these offspring will produce gametes on which the probability of European ancestry at locus B given European ancestry at locus A is

$$\frac{(1-\theta)(M_1P_1 + M_2P_2) + 2\theta M_1M_2}{M_1 + M_2},$$

where θ is the recombination fraction between A and B.

Using this equation repeatedly, it is possible to calculate for any given history of admixture the transition probability $p_{11}$ in the transition matrix (1) above. From $p_{11}$, the other coordinates of the transition matrix and the odds ratio ψ can be calculated using equations (2) and (3) above.
If the lengths of chromosomal segments of African and European ancestry are distributed exponentially with parameters μ and λ respectively on a gamete with admixture M, the transition matrix for two loci separated by map distance x is

$$\frac{1}{\lambda+\mu}\begin{pmatrix} \lambda + \mu e^{-(\lambda+\mu)x} & \mu - \mu e^{-(\lambda+\mu)x} \\ \lambda - \lambda e^{-(\lambda+\mu)x} & \mu + \lambda e^{-(\lambda+\mu)x} \end{pmatrix}$$

and $M = \mu/(\lambda+\mu)$. The parameters λ and μ can thus be calculated from x, ψ and M. The crossover rate is $\tfrac{1}{2}(\lambda+\mu)$.
Affecteds-only test for linkage of a disease or binary trait with a marker locus
The null hypothesis is that the population risk ratio r that the locus accounts for is 1. Suppose that
we sample only affected individuals from the population under study. We have shown previously
(McKeigue, 1998) that if a multiplicative model for penetrance applies, the probability $\Pi_i$ that an affected individual has $i$ alleles that have ancestry from the high-risk population (Europeans in this example, where the trait is defined by the presence of allele 1 at the AT3 locus), conditional on parental admixture (which determines the probabilities $P_0$, $P_1$, $P_2$), is:

$$\Pi_0 = \frac{P_0}{P_0 + P_1\sqrt{r} + P_2 r}, \qquad \Pi_1 = \frac{P_1\sqrt{r}}{P_0 + P_1\sqrt{r} + P_2 r}, \qquad \Pi_2 = \frac{P_2 r}{P_0 + P_1\sqrt{r} + P_2 r}.$$
We write $d$ for the realized proportion of alleles at the marker locus that have European ancestry in the affected individual. The log likelihood for this realization, given that the individual is affected, is

$$d\log r - \log\left(P_0 + P_1\sqrt{r} + P_2 r\right)$$

and the score function is

$$\frac{d}{r} - \frac{\tfrac{1}{2}P_1 r^{-1/2} + P_2}{P_0 + P_1 r^{1/2} + P_2 r}.$$

At $r = 1$, the score is simply the realized minus the expected proportion of alleles that have European ancestry:

$$U = d - \left(\tfrac{1}{2}P_1 + P_2\right),$$

and the information is

$$d - \tfrac{1}{4}P_1 - \left(\tfrac{1}{2}P_1 + P_2\right)^2.$$
For each realization, these expressions are summed over all individuals in the dataset.
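The affecteds-only score and information at r = 1 reduce to simple sums. The sketch below is a complete-data illustration, assuming the ancestry proportion d is known for each affected individual (in the analysis above, d comes from each realization of the posterior), and assuming that each parent transmits a European-ancestry allele with probability equal to its own admixture, so that P0 = (1 - M1)(1 - M2), P1 = M1(1 - M2) + M2(1 - M1) and P2 = M1M2. The class, the function names and the toy sample are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Affected:
    d: float    # realized proportion of alleles with European ancestry: 0, 0.5 or 1
    m1: float   # admixture of parent 1
    m2: float   # admixture of parent 2

def transmission_probs(m1: float, m2: float) -> tuple[float, float, float]:
    """P0, P1, P2 assuming each parent independently transmits a European-ancestry
    allele with probability equal to its own admixture."""
    return (1 - m1) * (1 - m2), m1 * (1 - m2) + m2 * (1 - m1), m1 * m2

def affecteds_only_test(sample: list[Affected]) -> tuple[float, float, float]:
    """Score U and information I at r = 1, summed over individuals, and Z = U / sqrt(I)."""
    U = I = 0.0
    for ind in sample:
        P0, P1, P2 = transmission_probs(ind.m1, ind.m2)
        expected = 0.5 * P1 + P2              # expected proportion of European-ancestry alleles
        U += ind.d - expected
        I += ind.d - 0.25 * P1 - expected ** 2
    if I <= 0:
        # Observed information can be non-positive in very small samples.
        raise ValueError("observed information is not positive")
    return U, I, U / math.sqrt(I)

# Hypothetical toy sample: three affected individuals whose parents have admixture 0.2.
sample = [Affected(0.5, 0.2, 0.2), Affected(1.0, 0.2, 0.2), Affected(0.0, 0.2, 0.2)]
U, I, Z = affecteds_only_test(sample)
print(f"U = {U:.3f}, I = {I:.3f}, Z = {Z:.2f}")
```

Forming Z as U/√I is one conventional way to turn the summed score and information into a test statistic; a positive Z indicates more European-ancestry alleles at the marker than parental admixture predicts.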
Testing for mis-specification of ancestry-specific allele frequencies
We write $p_X$ for the frequency of allele 1 given African ancestry, $p_Y$ for the frequency of the allele given European ancestry, and $P_0$, $P_1$, $P_2$ for the probabilities that the parents of the individual transmit 0, 1, 2 alleles of European ancestry. These probabilities are functions of parental admixture. The probabilities $\Pi_i$ of observing the genotype with $i$ copies of allele 1 in an individual, conditional on parental admixture, are as below:

$$\begin{aligned}
\Pi_0 &= q_X^2 P_0 + q_X q_Y P_1 + q_Y^2 P_2 \\
\Pi_1 &= 2p_X q_X P_0 + (p_X q_Y + q_X p_Y) P_1 + 2p_Y q_Y P_2 \\
\Pi_2 &= p_X^2 P_0 + p_X p_Y P_1 + p_Y^2 P_2,
\end{aligned}$$

where $q_X = 1 - p_X$ and $q_Y = 1 - p_Y$.
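Differentiating the logarithms of these three expressions with respect to $p_X$ and $p_Y$, we obtain expressions for the score vector for the three possible observed genotypes:

$$\begin{aligned}
U_0 &= [u_{X0}\;\; u_{Y0}] = \left[\frac{-2q_X P_0 - q_Y P_1}{\Pi_0} \quad \frac{-q_X P_1 - 2q_Y P_2}{\Pi_0}\right] \\
U_1 &= [u_{X1}\;\; u_{Y1}] = \left[\frac{2(q_X - p_X)P_0 + (q_Y - p_Y)P_1}{\Pi_1} \quad \frac{(q_X - p_X)P_1 + 2(q_Y - p_Y)P_2}{\Pi_1}\right] \\
U_2 &= [u_{X2}\;\; u_{Y2}] = \left[\frac{2p_X P_0 + p_Y P_1}{\Pi_2} \quad \frac{p_X P_1 + 2p_Y P_2}{\Pi_2}\right].
\end{aligned}$$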
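Differentiating again with respect to $p_X$, $p_Y$ and multiplying by minus 1 yields the corresponding expressions for the information matrix:

$$\begin{aligned}
I_0 &= \begin{bmatrix} u_{X0}^2 - 2\Pi_0^{-1}P_0 & u_{X0}u_{Y0} - \Pi_0^{-1}P_1 \\ u_{X0}u_{Y0} - \Pi_0^{-1}P_1 & u_{Y0}^2 - 2\Pi_0^{-1}P_2 \end{bmatrix} \\[1ex]
I_1 &= \begin{bmatrix} u_{X1}^2 + 4\Pi_1^{-1}P_0 & u_{X1}u_{Y1} + 2\Pi_1^{-1}P_1 \\ u_{X1}u_{Y1} + 2\Pi_1^{-1}P_1 & u_{Y1}^2 + 4\Pi_1^{-1}P_2 \end{bmatrix} \\[1ex]
I_2 &= \begin{bmatrix} u_{X2}^2 - 2\Pi_2^{-1}P_0 & u_{X2}u_{Y2} - \Pi_2^{-1}P_1 \\ u_{X2}u_{Y2} - \Pi_2^{-1}P_1 & u_{Y2}^2 - 2\Pi_2^{-1}P_2 \end{bmatrix}
\end{aligned}$$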
For each realization, the score and information are summed over all individuals in the dataset
(excluding those with missing genotypes at the locus under study).
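The score vectors and information matrices above translate directly into code. This is a minimal complete-data sketch, assuming the genotype and the transmission probabilities P0, P1, P2 are known for each individual; in the analysis described here these vary across posterior realizations. Forming Z as U/√I for each frequency and the 2-df summary as UᵀI⁻¹U is our assumption about how the statistics in Table 4 are constructed, and all names and example values are hypothetical.

```python
import math

def genotype_probs(pX, pY, P):
    """Pi_0, Pi_1, Pi_2 for 0, 1, 2 copies of allele 1 given transmission probabilities P."""
    qX, qY = 1 - pX, 1 - pY
    P0, P1, P2 = P
    return (qX * qX * P0 + qX * qY * P1 + qY * qY * P2,
            2 * pX * qX * P0 + (pX * qY + qX * pY) * P1 + 2 * pY * qY * P2,
            pX * pX * P0 + pX * pY * P1 + pY * pY * P2)

def score_and_info(g, pX, pY, P):
    """Score (uX, uY) and information (IXX, IXY, IYY) for one individual with genotype g."""
    qX, qY = 1 - pX, 1 - pY
    P0, P1, P2 = P
    Pi = genotype_probs(pX, pY, P)[g]
    if g == 0:
        uX, uY = (-2 * qX * P0 - qY * P1) / Pi, (-qX * P1 - 2 * qY * P2) / Pi
        H = (2 * P0, P1, 2 * P2)            # second derivatives of Pi_0 (XX, XY, YY)
    elif g == 1:
        uX = (2 * (qX - pX) * P0 + (qY - pY) * P1) / Pi
        uY = ((qX - pX) * P1 + 2 * (qY - pY) * P2) / Pi
        H = (-4 * P0, -2 * P1, -4 * P2)     # second derivatives of Pi_1
    else:
        uX, uY = (2 * pX * P0 + pY * P1) / Pi, (pX * P1 + 2 * pY * P2) / Pi
        H = (2 * P0, P1, 2 * P2)            # second derivatives of Pi_2
    # Information = u u^T - (Hessian of Pi) / Pi, as in the matrices I_0, I_1, I_2.
    return (uX, uY), (uX * uX - H[0] / Pi, uX * uY - H[1] / Pi, uY * uY - H[2] / Pi)

def summary_tests(data, pX, pY):
    """data: (genotype, (P0, P1, P2)) pairs. Returns (Z_X, Z_Y, 2-df chi-square)."""
    UX = UY = IXX = IXY = IYY = 0.0
    for g, P in data:
        (uX, uY), (ixx, ixy, iyy) = score_and_info(g, pX, pY, P)
        UX, UY = UX + uX, UY + uY
        IXX, IXY, IYY = IXX + ixx, IXY + ixy, IYY + iyy
    det = IXX * IYY - IXY * IXY
    chi2 = (IYY * UX * UX - 2 * IXY * UX * UY + IXX * UY * UY) / det   # U^T I^-1 U
    return UX / math.sqrt(IXX), UY / math.sqrt(IYY), chi2

# Hypothetical check: genotype counts exactly in their expected proportions
# (0.0975, 0.43, 0.4725 of 2000) should give statistics of essentially zero.
pX, pY, P = 0.8, 0.3, (0.6, 0.35, 0.05)
data = [(0, P)] * 195 + [(1, P)] * 860 + [(2, P)] * 945
print(summary_tests(data, pX, pY))
```

Writing the information as uuᵀ minus the Hessian of Πi divided by Πi keeps the code in one-to-one correspondence with the three matrices above, which makes it easy to verify each entry against a finite-difference derivative of log Πi.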
References
Briscoe, D., Stephens, J. C. & O’Brien, S. J. (1994).
Linkage disequilibrium in admixed populations : applications in gene mapping. J. Hered. 85, 59–63.
Cavalli-Sforza, L. L. & Bodmer, W. F. (1971). The
genetics of human populations, San Francisco : Freeman.
Chakraborty, R. (1975). Estimation of race admixture
– a new method. Am. J. Phys. Anthropol. 42, 507–511.
Chakraborty, R., Ferrell, R. E., Stern, M. P.,
Haffner, S. M., Hazuda, H. P. & Rosenthal, M.
(1986). Relationship of prevalence of non-insulin-dependent diabetes mellitus to Amerindian admixture
in the Mexican Americans of San Antonio, Texas.
Genet. Epidemiol. 3, 435–454.
Chakraborty, R. & Weiss, K. M. (1986). Frequencies of
complex diseases in hybrid populations. Am. J. Phys.
Anthropol. 70, 489–503.
Chakraborty, R. & Weiss, K. M. (1988). Admixture as
a tool for finding linked genes and detecting that
difference from allelic association between loci. Proc.
Natl. Acad. Sci. USA 85, 9119–9123.
Elston, R. C. (1971). The estimation of admixture in
racial hybrids. Ann. Hum. Genet. 35, 9–17.
Gelman, A., Carlin, J. B., Stern, H. S. & Rubin, D. B.
(1995). Bayesian data analysis, London : Chapman &
Hall.
Gelman, A. & Rubin, D. B. (1992). Inference from
iterative simulation using multiple sequences. Statistical Science 7, 457–511.
Kaplan, N. L., Martin, E. R., Morris, R. W. & Weir,
B. S. (1998). Marker selection for the transmission/
disequilibrium test in recently admixed populations.
Am. J. Hum. Genet. 62, 703–712.
Little, R. J. A. & Rubin, D. B. (1987). Statistical
analysis with missing data, New York : Wiley.
Long, J. C. & Smouse, P. E. (1983). Intertribal gene flow
between the Yecuana and Yanomama : genetic analysis
of an admixed village. Am. J. Phys. Anthropol. 61,
411–422.
Louis, T. A. (1982). Finding the observed information
matrix when using the EM algorithm. J. R. Statistical
Soc. Series B 44, 226–232.
Maclean, C. J., Adams, M. S., Leyshon, W. C., Workman, P. L., Reed, T. E., Gershowitz, H. &
Weitkamp, L. R. (1974). Genetic studies on hybrid
populations. III. Blood pressure in an American Black
community. Am. J. Hum. Genet. 26, 614–626.
Maclean, C. J. & Workman, P. L. (1973). Genetic
studies on hybrid populations. I. Individual estimates
of ancestry and their relation to quantitative traits.
Ann. Hum. Genet. 36, 341–351.
McKeigue, P. M. (1998). Mapping genes that underlie
ethnic differences in disease risk : methods for detecting
linkage in admixed populations by conditioning on
parental admixture. Am. J. Hum. Genet. 63, 241–251.
Parra, E. J., Marcini, A., Akey, J., Martinson, J.,
Batzer, M. A., Cooper, R., Forrester, T., Allison,
D. B., Deka, R., Ferrell, R. E. & Shriver, M. D.
(1998). Estimating African-American admixture proportions by use of population-specific alleles. Am. J.
Hum. Genet. 63, 1839–1851.
Reed, T. E. (1971). The population variance of the
proportion of genetic admixture in human intergroup
hybrids. Proc. Natl. Acad. Sci. USA 68, 3168–3169.
Reed, T. E. (1973). Number of gene loci required for
accurate estimation of ancestral population proportions in individual human hybrids. Nature 244,
575–576.
Risch, N. & Merikangas, K. (1996). The future of
genetic studies of complex human diseases. Science
273, 1516–1517.
Spiegelhalter, D. J., Thomas, A., Best, N. G. & Gilks,
W. R. (1999). BUGS : Bayesian inference using Gibbs
sampling. WinBUGS version 1.2, Cambridge : Medical
Research Council Biostatistics Unit.
Stephens, J. C., Briscoe, D. & O’Brien, S. J. (1994).
Mapping by admixture linkage disequilibrium in
human populations : limits and guidelines. Am. J.
Hum. Genet. 55, 809–824.
Zheng, C. & Elston, R. C. (1999). Multipoint linkage
disequilibrium mapping with particular reference to the
African-American population. Genet. Epidemiol. 17,
79–101.