Haseman, J. K. (1970). The genetic analysis of quantitative traits using twin and sib data.

THE GENETIC ANALYSIS OF QUANTITATIVE TRAITS
USING TWIN AND SIB DATA
by
Joseph Kyd Haseman
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No.671
March 1970
JOSEPH KYD HASEMAN. The Genetic Analysis of Quantitative Traits
Using Twin and Sib Data. (Under the direction of R.C. ELSTON.)
A paired observations model is given for the genetic analysis
of quantitative traits. The special case of twin pairs is considered
and the biases in the usual procedures for estimating genetic
variance from twin data are examined. New methods are described for
estimating genetic variance simultaneously from monozygotic and
dizygotic twin data.
Procedures are given for estimating and detecting genetic
variance from sib pair data when the proportion of genes identical
by descent over the entire genome is known for all sib pairs.
Methods are also given for estimating this proportion when it is
unknown.
Procedures are described for detecting linkage between a single
major trait locus and a marker locus from sib pair data. These
procedures are based upon the estimated proportion of genes identical
by descent at the marker locus. A maximum likelihood procedure is
given that permits estimation of both the recombination fraction
and the genetic effect at the trait locus.
Data from Gottesman's Harvard Twin Study are analyzed, the
quantitative traits being MMPI subtest scores and the markers being
the ABO, MNS and Rhesus blood groups. It is found that there may
be a single locus, closely linked to the ABO blood group, that is
responsible for a major part of the genetic variation on the PaS
scale.
ACKNOWLEDGMENTS
The author expresses his appreciation to his advisor, Dr. R.C.
Elston, who suggested the topic of this dissertation and provided
invaluable guidance and counsel.
Appreciation is also expressed to
the other members of the advisory committee, Professors J.E. Grizzle,
G.G. Koch, D.R. Brogan, L.V. Jones and E.M. Cramer.
All of these
committee members gave assistance and made valuable suggestions in
the preparation of this dissertation.
Miss Maureen Moczek and Miss Cheryl Sheps helped type the
dissertation and their assistance is gratefully acknowledged.
Finally, the author is indebted to Dr. I.I. Gottesman, who
permitted his Harvard Twin Study data to be used in this dissertation, and to Mrs. Ellen B. Kaplan, who provided valuable assistance with the computer aspects of the data analysis.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
Chapter
I. INTRODUCTION AND BASIC GENETIC MODEL
    1.1. Introduction
    1.2. Notation and Definitions
    1.3. The Seven Mating Types
    1.4. Partitioning the Genetic Component of Variance
    1.5. Heritability
    1.6. Underlying Model for Paired Observations
    1.7. Underlying Assumptions
II. REVIEW OF LITERATURE
    2.1. Heritability Studies
        2.1.1. Early Heritability Measures
        2.1.2. Intraclass Correlation as a Heritability Index
        2.1.3. Testing the Significance of $\sigma_g^2$ by an F test
        2.1.4. Procedures with Identical Twins Reared Apart
        2.1.5. Procedures that Allow for Genotype-Environment Covariance
    2.2. Procedures for Detecting Linkage from Sib Data
        2.2.1. Bernstein's Method and Fisher's U Scores
        2.2.2. Penrose's Sib Pair Method
        2.2.3. Morton's Sequential Test for Linkage
        2.2.4. Other Methods for Detecting Linkage
III. DETECTING AND ESTIMATING $\sigma_g^2$ FROM TWIN DATA
    3.1. Estimation of $\sigma_g^2$ from the Analysis of Variance Tables
        3.1.1. Unweighted Least Squares Estimation
        3.1.2. Weighted Least Squares Estimation
    3.2. Significance Tests
    3.3. Maximum Likelihood Estimation
    3.4. Nonparametric Test Procedures in Twin Studies
IV. ESTIMATION OF $\sigma_g^2$ FROM SIB DATA WHEN THE PROPORTION OF GENES IDENTICAL BY DESCENT IS KNOWN
    4.1. Genetic Variance at a Single Locus
    4.2. The Sib Pair Probability Tables for Two Alleles
    4.3. Genetic Covariance at a Single Locus for Sib Pairs
    4.4. Estimation of Genetic Variance by Regression Analysis
        4.4.1. Assuming no Dominance
        4.4.2. Allowing for Dominance
    4.5. Weighted Least Squares Estimation of $\sigma_g^2$
    4.6. Maximum Likelihood Estimation of $\sigma_g^2$
    4.7. Detecting $\sigma_g^2$ by Nonparametric Test Procedures
        4.7.1. Spearman's Rho
        4.7.2. Kendall's Tau
V. MAXIMUM LIKELIHOOD ESTIMATION OF THE PROPORTION OF GENES IDENTICAL BY DESCENT IN SIB PAIRS
    5.1. Case A: Both Parental Genotypes Known
    5.2. Case B: One or Both Parental Genotypes Unknown
    5.3. Estimation when only the Sib Phenotypes are Known
VI. DERIVATION OF THE CLASSIFICATION TABLES
    6.1. The Sixteen Classification Types
    6.2. The Classification Table when the Genotypes are Known
    6.3. The Classification Tables when Some Genotypes are Unknown
VII. ESTIMATING THE PROPORTION OF GENES IDENTICAL BY DESCENT AT A SINGLE LOCUS IN SIB PAIRS
    7.1. Properties of the Estimator
    7.2. Estimation for the 16 Classification Types
VIII. DETECTING LINKAGE BETWEEN A TRAIT AND MARKER LOCUS
    8.1. Conditional Expectation of the Squared Pair Differences
    8.2. Deriving the Expected Value of the Regression Coefficient
    8.3. Detecting Linkage by Nonparametric Methods
IX. MAXIMUM LIKELIHOOD ESTIMATION OF LINKAGE
    9.1. Deriving the Likelihood Function
    9.2. Obtaining the Maximum Likelihood Estimates
X. ESTIMATING LINKAGE BETWEEN MARKERS WHEN BOTH PARENTAL PHENOTYPES ARE UNKNOWN
    10.1. Derivation of the Likelihood Function
    10.2. Example of the Estimation Procedure
XI. AN EXAMPLE OF THE GENETIC ANALYSIS OF QUANTITATIVE TRAITS USING SIB PAIR DATA
    11.1. Data
    11.2. Results of the Genetic Analysis
XII. SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH
    12.1. Summary
    12.2. Suggestions for Further Research
APPENDIX I
APPENDIX II
BIBLIOGRAPHY
LIST OF TABLES
Table
1.1. The Seven Mating Types
2.1. General ANOVA Table for Paired Data
2.2. Heritability Coefficients for Selected Pairs
2.3. ANOVA Table for n Families of k Sibs Each
4.1. Sib Pair Probabilities for an $A_1A_2 \times A_3A_4$ Mating Conditional on $\pi$
4.2. Probabilities of Sib Pairs for a Two-Allele Locus Conditional on Parental Mating and $\pi$
4.3. Probabilities of Sib Pairs for a Two-Allele Locus Conditional on $\pi$
5.1. Sib Pair Probabilities for an m-Allele Locus Conditional on Parental Mating and $\pi$
5.2. Likelihoods for a Sib Pair that is $A_1A_1$-$A_1A_2$
5.3. Probability of Sib Pairs for an m-Allele Gene Conditional on $\pi$
5.4. Probability of Sib Pairs for an m-Allele Gene Conditional on $\pi$ When one Parental Genotype is Known
6.1. The 16 Classification Types
6.2. Classification Table: Both Parental and Sib Genotypes Known
6.3. Classification Table: Both Parental Phenotypes Unknown
6.4. Classification Table: One Parental Genotype and Both Sib Genotypes Known
6.5. Conditional Probability of Sib Pair Types Given 0, 1 or 2 Genes I.B.D.
7.1. [title illegible]
8.1. Conditional Distribution of Y
8.2. Joint Distribution of $\pi_{jm}$ and $\pi_{jt}$
9.1. Values of the Coefficient [symbol illegible] for the 16 Classification Types
10.1. Estimated Gene Frequencies From the Harvard Twin Study Blood Group Data
10.2. ML Estimates of Linkage Between Blood Groups Using the Harvard Twin Study Data
11.1. Weighted Least Squares Estimates of the Genetic Parameters for the MMPI Variables
11.2. The 23 MMPI Variables with the Best Model Fit
11.3. MMPI Variables with Significant Genetic Variance
11.4. Comparison of ML and Weighted Least Squares Estimation of the Genetic Parameters for Lie and PaS Variables
11.5. Observed Means for Lie and PaS Variables for the ABO Phenotypes
11.6. Observed Means for Lie and PaS Variables for the MNS Phenotypes
11.7. Rank Correlations Between $D_j$ and $\pi_{jm}$
11.8. ML Estimates for PaS and its Linkage to ABO
CHAPTER I - INTRODUCTION AND BASIC GENETIC MODEL
1.1. Introduction
A major area of interest in human genetics is the study of
quantitative traits.
One important problem in this area is that of
determining the degree to which a quantitative trait is genetically
determined and the degree to which it is environmentally determined
in a specified human population.
The variance of a particular
quantitative trait is assumed to be composed of a genetic component
($\sigma_g^2$) and an environmental component ($\sigma_e^2$), the relative sizes of
these values being a measure of the relative effects of heredity
and environment.
The problem thus reduces to one of estimating
these variance components.
Unfortunately, in many of the methods
proposed to date, it is not clearly specified what underlying model
is used and what assumptions are required in order to estimate $\sigma_g^2$
and $\sigma_e^2$.
In this dissertation the underlying model is stated in detail
and various methods of estimation are discussed.
Furthermore, if we are to understand the mechanisms by which
quantitative traits are inherited, perhaps the best approach is to
attempt to map the major genes responsible for them.
So far in human
genetics techniques have been devised solely to detect linkage between the loci of these hypothesized major genes and those of marker
genes, the genetics of which are known.
If linkage is detected,
there is evidence that such major genes exist.
In this dissertation
both the detection and estimation of linkage are considered.
Rapid
progress is being made in mapping the markers (Renwick, 1969) and
thus the approximate location of any gene linked to them will also
soon be determinable.
First, a general model for paired observations is presented
and a list of the most common simplifying assumptions is given (Chapter I).
Then the literature in the area is reviewed and various
methods of estimating $\sigma_g^2$ and $\sigma_e^2$ from twin and sib data are discussed
(Chapter II).
In particular, the biases inherent in these estimation
methods are examined.
In Chapter III new methods of estimating $\sigma_g^2$
and $\sigma_e^2$ from twin data are given, requiring less stringent assumptions
than those required by the methods of analysis discussed in Chapter II.
The use of nonparametric procedures in the analysis of twin data, a
topic that has received little mention in the literature, is also
briefly discussed.
In Chapter IV new procedures are derived for estimating $\sigma_g^2$ from
sib data, procedures based on $\pi$, the proportion of genes two sibs have
identical by descent, which is assumed to be known for each sib pair.
Chapter V describes the maximum likelihood procedure for estimating
$\pi$ when its value is unknown.
In Chapters VI-IX new methods are given for estimating a single
major trait gene's genetic effect and distance from a marker locus.
These methods are based on $\pi_m$, the proportion of genes two sibs have
identical by descent at the marker locus.
Chapters VI-VII are devoted to the problem of estimating $\pi_m$.
In Chapter VIII the estimator of $\pi_m$
and the sib pair differences are used in a regression analysis
to detect linkage between trait and marker locus.
In Chapter IX a
maximum likelihood procedure is described for estimating both the
genetic effect of a major trait gene and the linkage between it and
a marker locus, using sib data.
In Chapter X a new maximum likeli-
hood procedure is given for estimating the linkage between two
marker genes from sib pair data only, i.e., no information is available as to the phenotypes of the parents.
Finally, in Chapter XI, data from Gottesman's (1966) Harvard
Twin Study are analyzed.
The 63 mental traits under investigation
are variables measured by subtest scores of the Minnesota Multiphasic Personality Inventory (MMPI).
First, the twin analyses of
Chapter III are performed to determine which variables have a strong
genetic effect.
These variables are then subjected to the sib pair
analyses of Chapters VI-IX in an effort to link the variables to the
ABO, MNS and Rhesus blood groups.
1.2. Notation and Definitions
In this section we introduce the terminology that will be
used later and give some definitions.
Two genes are said to be identical by descent (i.b.d.) if
they are derived from the same gene through division and subsequent
transmission (Cotterman, 1940; Malecot, 1948).
For example, suppose
that at a particular locus there are four possible alleles:
$A_1$, $A_2$, $A_3$ and $A_4$.
If a mating that is $A_1A_2 \times A_3A_4$ for this locus
yields two sibs that are $A_1A_4$ and $A_1A_3$, then the sibs have one gene
i.b.d. ($A_1$) at this locus.
If the sibs are both $A_1A_3$, then they
have two genes i.b.d. at this locus.
Two genetically related individuals have a certain proportion
of their genes i.b.d.
We shall denote by $\pi$ the proportion of genes
i.b.d. over the entire genome, and by $\pi_m$ the proportion of genes
i.b.d. at a particular locus m, for two individuals.
Thus $\pi_m$ must be 0, 1/2 or 1, regardless of how the individuals are related.
Since this dissertation deals primarily with sib pairs, we let $\pi_{jm}$ and $\pi_j$
denote $\pi_m$ and $\pi$ respectively for the jth sib pair.
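The i.b.d. bookkeeping in the $A_1A_2 \times A_3A_4$ example can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the dissertation; the function name `ibd_distribution` is hypothetical, and the code assumes four distinct parental alleles and ordinary Mendelian segregation, so that two sibs carrying the same allele label necessarily carry it identical by descent.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def ibd_distribution(father=("A1", "A2"), mother=("A3", "A4")):
    """Distribution of pi_m for a sib pair when all four parental alleles differ."""
    # Each child receives one paternal and one maternal allele, each with
    # probability 1/2, so the four child genotypes are equally likely.
    children = list(product(father, mother))
    counts = Counter()
    for sib1, sib2 in product(children, repeat=2):
        # Number of genes i.b.d.: shared paternal allele plus shared maternal allele.
        shared = (sib1[0] == sib2[0]) + (sib1[1] == sib2[1])
        counts[Fraction(shared, 2)] += 1
    total = sum(counts.values())
    return {pi: Fraction(n, total) for pi, n in sorted(counts.items())}


dist = ibd_distribution()
for pi_m, prob in dist.items():
    print(f"pi_m = {pi_m}: probability {prob}")
```

Under these assumptions the enumeration gives probabilities 1/4, 1/2 and 1/4 for $\pi_m$ = 0, 1/2 and 1.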
The term random mating refers to the situation in which every
individual in a large population has an equal probability of mating
with any individual of the opposite sex in the population.
We assume random mating for the loci we shall be working with.
As a direct
consequence of this assumption (if there is no selection) the random
mating population will be in Hardy-Weinberg equilibrium.
That is,
a population in which the two alleles A and a occur with gene frequencies p and q = 1 - p respectively will consist of three genotypes
AA, Aa and aa, and the probabilities of these genotypes are $p^2$, $2pq$
and $q^2$ respectively.
Suppose there are two alleles at each of two loci, i.e., alleles
A and a with frequencies $p_1$ and $1 - p_1$ at one locus and alleles B and
b with frequencies $p_2$ and $1 - p_2$ at the other.
By linkage equilibrium
we shall mean that the proportions of gametes in the population that
are AB, Ab, aB and ab are $p_1p_2$, $p_1(1-p_2)$, $(1-p_1)p_2$ and $(1-p_1)(1-p_2)$
respectively.
If an individual is AB/ab, he can produce gametes that are
AB, ab, aB or Ab, the relative proportions depending upon how
tightly linked the two loci involved are.
Let c denote the proportion
of crossover gametes (Ab or aB) that are formed in this
situation.
Then c, often called the "recombination fraction", is
a measure of the distance between the two loci and will be assumed
to lie between 0 and 1/2.
It will also be assumed that c is the same
for both sexes.
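As a small illustration of the recombination fraction, the sketch below (Python, not from the dissertation) gives the expected gamete proportions for an AB/ab individual. The function name `gamete_proportions` is hypothetical, and the code assumes that the two crossover gametes are equally frequent, as are the two non-crossover gametes.

```python
def gamete_proportions(c):
    """Expected gamete proportions from an AB/ab individual with recombination fraction c."""
    if not 0.0 <= c <= 0.5:
        raise ValueError("c is assumed to lie between 0 and 1/2")
    # Crossover gametes (Ab, aB) share the total proportion c equally;
    # non-crossover gametes (AB, ab) share the remaining 1 - c.
    return {"AB": (1 - c) / 2, "ab": (1 - c) / 2, "Ab": c / 2, "aB": c / 2}


for c in (0.0, 0.1, 0.5):
    print(c, gamete_proportions(c))
```

At c = 0 only the parental gametes AB and ab appear (complete linkage); at c = 1/2 all four gametes are equally likely, which is the unlinked case.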
A phenoset is defined to be the set consisting of all genotypes that have a certain phenotype (Cotterman, 1969).
For example,
if there are two alleles A and a with A dominant to a, then the
two genotypes AA and Aa are phenotypically indistinguishable and
hence belong to the same phenoset.
There are two phenosets in this
situation, namely
$P_1$: AA, Aa
and
$P_2$: aa
A marker gene for human populations is a gene involving a
single locus, the genetics of which are known.
That is, we can
specify the phenotype corresponding to each known genotype.
In
order to be a useful marker, a gene must also be polymorphic
(Ford, 1940), by which we shall mean that the gene frequency of
the most common allele cannot be too large, a commonly quoted figure being p = .99.
The ABO blood group is an example of a marker
gene.
By a trait gene we shall mean an hypothesized gene of unknown
location and effect that influences a particular quantitative trait.
Much of the present work deals with methods for detecting trait
genes and estimating the linkage distances between them and various
marker genes.
By the term Classification Table shall be meant a table that
gives the conditional probability that $\pi_{jm}$ = 0, 1/2 or 1 for a particular
locus m and sib pair j, given the phenotypes of both sibs and
the phenotypes of the parents (if known).
1.3. The Seven Mating Types
We next make clear what we mean by a mating type, a term
that has been used in the literature in two different ways.
Some
authors use this term to refer to genotypically distinct matings.
However, we prefer to call these matings and will use the term
mating type in a broader sense as does Kempthorne (1957).
To illustrate the terminology, consider the six matings in the
two allele case:
AA x AA; aa x aa; AA x Aa; AA x aa; Aa x aa; Aa x Aa.
The first two matings are alike in a sense since they both
involve identical homozygotes.
Thus, these two matings belong to
the same mating type, and a mating of two identical homozygotes will
be called a Type I mating.
It is very easy to show that for an M-allele autosomal gene
($M \ge 4$) there are exactly seven mating types for diploid organisms.
The proof of this is as follows:
Each parent must either be a heterozygote ($A_iA_j$) or a homozygote ($A_iA_i$).
(1) Suppose both parents are homozygotes.
Then they must have either 2 or 0 (but not 1) alleles
alike, i.e., the mating must be
    $A_iA_i \times A_iA_i$ or $A_iA_i \times A_jA_j$.
(2) Suppose one parent is homozygous, the other heterozygous.
Then they must have 1 or 0 (but not 2) alleles alike, i.e., the
mating must be of the form
    $A_iA_i \times A_iA_j$ or $A_iA_i \times A_jA_k$.
(3) Finally, suppose that both parents are heterozygotes.
Then
they have either 2, 1 or 0 genes alike, i.e., the mating may be
    $A_iA_j \times A_iA_j$ or $A_iA_j \times A_iA_k$ or $A_iA_j \times A_kA_l$.
Since the above cases exhaust all possibilities we conclude
that there are seven mating types.
Yasuda (1968) has tentatively
given names to these mating types as indicated in Table 1.1; in
this table $p_i$, $p_j$, $p_k$ and $p_l$ are the gene frequencies of $A_i$, $A_j$,
$A_k$ and $A_l$ respectively.
For haploid organisms there are only two mating types
($A_i \times A_i$ and $A_i \times A_j$).
For triploid organisms the number of mating
types increases sharply to 22.
The terms sib pair type and sib pair will be used analogously
to mating type and mating respectively.
TABLE 1.1
THE SEVEN MATING TYPES

Type   Mating                     Name (Yasuda)       Frequency of mating
I      $A_iA_i \times A_iA_i$     Incross             $p_i^4$
II     $A_iA_i \times A_jA_j$     Outcross            $2p_i^2p_j^2$
III    $A_iA_i \times A_iA_j$     Backcross           $4p_i^3p_j$
IV     $A_iA_i \times A_jA_k$     3-Way Outcross      $4p_i^2p_jp_k$
V      $A_iA_j \times A_iA_j$     Intercross          $4p_i^2p_j^2$
VI     $A_iA_j \times A_iA_k$     3-Way Intercross    $8p_i^2p_jp_k$
VII    $A_iA_j \times A_kA_l$     4-Way Intercross    $8p_ip_jp_kp_l$
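The classification argument and the frequencies of Table 1.1 can be checked numerically. The Python sketch below is illustrative only: the names `mating_type` and `type_frequencies` and the gene frequencies used are hypothetical, and Hardy-Weinberg genotype frequencies with independent (random) choice of the two parents are assumed.

```python
from fractions import Fraction
from itertools import combinations_with_replacement, product


def mating_type(g1, g2):
    """Classify a mating of two genotypes into Types I-VII of Table 1.1."""
    hom1, hom2 = g1[0] == g1[1], g2[0] == g2[1]
    shared = len(set(g1) & set(g2))  # number of distinct alleles in common
    if hom1 and hom2:
        return "I" if shared == 1 else "II"
    if hom1 or hom2:
        return "III" if shared == 1 else "IV"
    return {2: "V", 1: "VI", 0: "VII"}[shared]


def type_frequencies(p):
    """Total frequency of each mating type for gene frequencies p (assumed to sum to 1)."""
    genotypes = list(combinations_with_replacement(range(len(p)), 2))

    def geno_freq(g):
        i, j = g
        return p[i] ** 2 if i == j else 2 * p[i] * p[j]

    totals = {}
    # Ordered (father, mother) pairs, so each unordered mating of two
    # different genotypes is counted twice, as in Table 1.1.
    for g1, g2 in product(genotypes, repeat=2):
        t = mating_type(g1, g2)
        totals[t] = totals.get(t, 0) + geno_freq(g1) * geno_freq(g2)
    return totals


p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
freqs = type_frequencies(p)
for name in ("I", "II", "III", "IV", "V", "VI", "VII"):
    print(name, freqs[name])
```

With four alleles all seven types occur and their frequencies sum to one; with fewer than four alleles Type VII cannot occur, which is why the count of seven needs at least four alleles.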
1.4. Partitioning the Genetic Component of Variance
Consider a quantitative trait that is influenced by both
environment and alleles at one or more loci.
The genetic component
of variance for that trait, $\sigma_g^2$, can be partitioned as indicated
below (Li, 1955):
$$\sigma_g^2 = \sigma_a^2 + \sigma_d^2 + \sigma_i^2 \qquad (1.1)$$
where:
$\sigma_a^2$ is the sum over all loci of the additive genetic variance
for each individual locus.
It is usually the major component
of genetic variance, and it is often assumed that $\sigma_a^2 = \sigma_g^2$.
$\sigma_d^2$ is the sum of dominance variances for each locus and may
be thought of as intra-locus interaction.
For example, consider a
single locus with two alleles A and a.
If the effects are additive,
then the heterozygote will have a trait value exactly midway between
the two homozygote values.
$\sigma_d^2$ is a measure of the amount of departure
from this intra-locus additivity.
$\sigma_i^2$ is the variance due to epistasis and can be thought of as
inter-locus interaction.
It represents the combined effects of
variance due to interaction among additive and dominance deviations
at two or more loci and can be written (Kempthorne, 1957)
$$\sigma_i^2 = \sigma_{aa}^2 + \sigma_{ad}^2 + \sigma_{dd}^2 + \sigma_{aaa}^2 + \cdots \qquad (1.2)$$
For example, $\sigma_{ad}^2$ represents the sum over sets of loci of the variance
due to additive x dominance interaction; $\sigma_{aaa}^2$ represents the
variance due to additive x additive x additive interaction, etc.
For further discussion of epistasis see Cockerham (1954) and
Kempthorne (1957).
1.5. Heritability
Although the concept of heritability was used prior to Lush's
work (1945), his definitions form the basis for most considerations
today.
He defines heritability (1949) in both a broad and narrow
sense.
Symbolically,
$$h^2_{(broad)} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2} \qquad (1.3)$$
$$h^2_{(narrow)} = \frac{\sigma_a^2}{\sigma_g^2 + \sigma_e^2} \qquad (1.4)$$
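The two heritability definitions (1.3) and (1.4) amount to simple ratios of variance components. A minimal Python sketch follows; the function name `heritability` and the numerical values are hypothetical, and the total variance is taken to be the sum of the genetic and environmental components (no genotype-environment covariance), as in those definitions.

```python
def heritability(var_a, var_d, var_i, var_e):
    """Return (broad, narrow) heritability from variance components."""
    # Genetic variance is the sum of additive, dominance and epistatic parts
    # (the partition of Section 1.4).
    var_g = var_a + var_d + var_i
    total = var_g + var_e
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_g / total, var_a / total


# Illustrative (made-up) variance components.
broad, narrow = heritability(var_a=6.0, var_d=2.0, var_i=0.0, var_e=8.0)
print(broad, narrow)
```

The narrow-sense value can never exceed the broad-sense value, and the two coincide when the dominance and epistatic components are zero.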
Sometimes (1.3) is termed "the degree of genetic determination" and the
term "heritability" used only for (1.4)
(Falconer, 1960, p. 146).
A number of authors have cautioned against making undue inferences
in the interpretation of $h^2$.
Elston and Gottesman (1968) quote Fisher
(1951) as remarking that heritability
"has both a numerator and a denominator, and its value depends on both elements; whereas, however, the numerator
has a simple genetic meaning, and if properly determined
should be an accurate estimate of the genetic variance ...
the denominator is the total variance due to errors of
measurement, in the strict sense, and, what in the wider
sense are also errors of measurement, namely, those due
to uncontrolled, but potentially controllable, environmental variation ... Obviously, the information contained
in the numerator is largely jettisoned when its actual
value is forgotten, and it is only reported as a ratio
to this hotch-potch of a denominator."
For this and other reasons some authors feel that the primary
concern of studies in this area should not be with heritability, but
with the genetic component of variance.
That is, one should be concerned
primarily with techniques for estimating $\sigma_g^2$, and testing whether
or not it is significantly different from zero.
The present work will
deal with both of these problems.
1.6. Underlying Model for Paired Observations
Since most of the work in this area has made use of pairs of individuals
(generally twins or sibs), we will begin by introducing a model
that is designed to handle paired observations.
Some special cases of
this general model will then be studied.
Suppose we have data for n pairs of individuals, in particular
their observed values for a particular quantitative trait of interest,
such as I.Q.
Let $x_{1j}$ and $x_{2j}$ be the observed values for the two
individuals in the jth pair.
We assume that the observed values are
due to three causes:
an overall mean, a genetic effect, and an
environmental effect.
The model may be written
$$x_{1j} = \mu + g_{1j} + e_{1j}$$
$$x_{2j} = \mu + g_{2j} + e_{2j}, \qquad j = 1, 2, \ldots, n \qquad (1.5)$$
We assume that the random variables $g_{ij}$ and $e_{ij}$ have means
zero and variances $\sigma_g^2$ and $\sigma_e^2$ respectively.
We make no distributional
assumptions other than a particular structure for the means, variances and covariances of the random variables in the model.
Since
in most cases we would expect the environmental effects of individuals in the same pair to be related, we let
$$\mathrm{Cov}(e_{1j}, e_{2j}) = \sigma_{ee'} \qquad (1.6)$$
Later, when considering special cases, we will adopt the notation
of Elston and Gottesman (1968), e.g., we will let $C_{MZ}$, $C_{DZ}$ and $C_{FS}$
denote the environmental covariance for monozygotic twins, dizygotic
twins and full sibs respectively.
The genetic makeup of individuals in the same pair will certainly be related.
Thus we let
$$\mathrm{Cov}(g_{1j}, g_{2j}) = \sigma_{gg'} \qquad (1.7)$$
$\sigma_{gg'}$ can generally be expressed as a function of $\sigma_a^2$, $\sigma_d^2$, and
$\sigma_i^2$, the exact expression depending upon how the paired individuals are
related.
For example, it is well known (Lush, 1949) that under random
mating, for the special cases of monozygotic twins, dizygotic twins,
sibs and parent-offspring pairs
Monozygotic twins:
$$\sigma_{gg'} = \sigma_g^2 \qquad (1.8)$$
Dizygotic twins (and full sibs):
$$\sigma_{gg'} = \tfrac{1}{2}\sigma_a^2 + \tfrac{1}{4}\sigma_d^2 + f_1(\sigma_i^2) \qquad (1.9)$$
Parent-offspring:
$$\sigma_{gg'} = \tfrac{1}{2}\sigma_a^2 + f_2(\sigma_i^2) \qquad (1.10)$$
where $f_1(\sigma_i^2)$ and $f_2(\sigma_i^2)$ refer to certain fractions of components of
$\sigma_i^2$ (see Cockerham, 1954, for details).
The genetic effect for an individual may not be independent of his
environmental effect.
Thus, initially at least, we let
$$\mathrm{Cov}(g_{ij}, e_{ij}) = \sigma_{ge}, \qquad i = 1, 2 \qquad (1.11)$$
$$\mathrm{Cov}(g_{1j}, e_{2j}) = \mathrm{Cov}(g_{2j}, e_{1j}) = \sigma_{ge}^* \qquad (1.12)$$
More will be said about this problem later.
Finally, we assume that individuals not in the same pair are independent with respect to genetic and environmental effects, that is,
$$\mathrm{Cov}(e_{ij}, e_{i'j'}) = \mathrm{Cov}(g_{ij}, g_{i'j'}) = \mathrm{Cov}(g_{ij}, e_{i'j'}) = 0, \qquad i = 1, 2;\ i' = 1, 2;\ j \ne j' \qquad (1.13)$$
We now consider two special cases:
monozygotic (identical)
twins and dizygotic (fraternal) twins.
Since monozygotic twins
are genetically identical, $g_{1j} = g_{2j}$ and from (1.5)-(1.8), (1.11)
and (1.12)
$$\mathrm{Var}(x_{ij}) = \sigma_{x(MZ)}^2 = \sigma_g^2 + \sigma_e^2 + 2\sigma_{ge}, \qquad i = 1, 2 \qquad (1.14)$$
$$\mathrm{Cov}(x_{1j}, x_{2j}) = \sigma_{xx'(MZ)} = \sigma_g^2 + 2\sigma_{ge} + C_{MZ} \qquad (1.15)$$
From (1.14) and (1.15) it follows that
$$\mathrm{Var}(x_{1j} - x_{2j}) = \sigma_{MZ}^2 = 2\sigma_e^2 - 2C_{MZ} \qquad (1.16)$$
Dizygotic twins are genetically the same as full sibs.
Hence
$$\mathrm{Var}(x_{ij}) = \sigma_{x(DZ)}^2 = \sigma_g^2 + \sigma_e^2 + 2\sigma_{ge}, \qquad i = 1, 2 \qquad (1.17)$$
$$\mathrm{Cov}(x_{1j}, x_{2j}) = \sigma_{xx'(DZ)} = \tfrac{1}{2}\sigma_a^2 + \tfrac{1}{4}\sigma_d^2 + f_1(\sigma_i^2) + 2\sigma_{ge}^* + C_{DZ} \qquad (1.18)$$
From (1.17) and (1.18) it follows that
$$\mathrm{Var}(x_{1j} - x_{2j}) = \sigma_{DZ}^2 = \sigma_a^2 + \tfrac{3}{2}\sigma_d^2 + 2\sigma_i^2 - 2f_1(\sigma_i^2) + 2\sigma_e^2 + 4\sigma_{ge} - 4\sigma_{ge}^* - 2C_{DZ} \qquad (1.19)$$
For full sibs (1.17)-(1.19) will hold with $C_{FS}$ replacing $C_{DZ}$.
We have assumed that
$\sigma_{x(MZ)}^2 = \sigma_{x(DZ)}^2$,
an intuitive assumption that
should be true "under almost any circumstances" (Kempthorne and
Osborne, 1961, p. 329).
In Chapter III a procedure will be given
that provides an approximate test of the validity of this assumption.
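The variances of the pair differences follow mechanically from Var(x1 - x2) = 2 Var(x) - 2 Cov(x1, x2) applied to the model of this section. The Python sketch below is offered only as a numerical check of (1.16) and (1.19) under the stated covariance structure; the function name `pair_moments`, the argument `f1_i` (standing for the unspecified quantity $f_1(\sigma_i^2)$) and all numerical values are hypothetical.

```python
def pair_moments(var_a, var_d, var_i, var_e, cov_ge, cov_ge_star, c_env,
                 zygosity, f1_i=0.0):
    """Return (Var(x), Cov(x1, x2), Var(x1 - x2)) for MZ or DZ pairs."""
    var_g = var_a + var_d + var_i
    var_x = var_g + var_e + 2 * cov_ge
    if zygosity == "MZ":
        cov_gg = var_g
        # g1j == g2j for MZ twins, so Cov(g1j, e2j) equals Cov(g2j, e2j).
        cross = cov_ge
    elif zygosity == "DZ":
        cov_gg = 0.5 * var_a + 0.25 * var_d + f1_i
        cross = cov_ge_star
    else:
        raise ValueError("zygosity must be 'MZ' or 'DZ'")
    cov_xx = cov_gg + 2 * cross + c_env
    return var_x, cov_xx, 2 * var_x - 2 * cov_xx


# Made-up parameter values, chosen only to exercise the formulas.
params = dict(var_a=4.0, var_d=2.0, var_i=1.0, var_e=3.0,
              cov_ge=0.5, cov_ge_star=0.25, c_env=1.0, f1_i=0.25)
_, _, diff_mz = pair_moments(zygosity="MZ", **params)
_, _, diff_dz = pair_moments(zygosity="DZ", **params)
print(diff_mz, diff_dz)
```

For monozygotic pairs every genetic term cancels in the difference, leaving only the environmental variance and covariance; for dizygotic pairs the genetic and genotype-environment terms remain.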
1.7. Underlying Assumptions
Certain of the parameters in the model discussed in the previous
section are often assumed to be zero.
The assumptions most often
made are given below.
Assumption I      $\sigma_i^2 = 0$
Assumption II     $\sigma_d^2 = 0$
Assumption III    $\sigma_{ge} = \sigma_{ge}^* = 0$
Assumption IV     $C_{MZ} = C_{DZ}$
Assumption V      $C_{MZ} = 0$ and/or $C_{DZ} = 0$
One may doubt the validity of these assumptions, yet some or
all of them are commonly made by researchers, often without even
mentioning this fact.
In particular, Assumption V is often overlooked.
In a number
of instances authors have given results ostensibly depending upon
Assumption IV, but in actuality depending upon Assumption V.
For
a further discussion of these assumptions see Price (1950), Harris
(1965) and Ostlyngen (1949).
CHAPTER II - LITERATURE REVIEW
In this chapter literature in two areas of genetic analysis
is reviewed.
First, published techniques for estimating herit-
ability from twin and sib data are reviewed and discussed in terms
of the model introduced in Section 1.6.
Then, a survey is made
of the literature dealing with the detection of linkage between
marker genes and major trait genes from sib data.
2.1. Heritability Studies
2.1.1. Early Heritability Measures.
Although geneticists
have long been interested in quantitative traits, it was not until
Galton's work with twins (1875) that a methodology was developed
to deal with the heritability of such traits.
The reasoning
behind the "twin method" is this:
since monozygotic twins are
genetically identical and dizygotic twins are genetically the
same as full sibs, a trait that is primarily genetic results in
twin pair differences that are smaller for monozygotic twins than
for dizygotic twins.
On the other hand, an environmentally determined
trait produces twin pair differences that are approximately the same for both types of twins.
The problem is to con-
struct a measure that accurately reflects the relative effects
of heredity and environment.
The earliest heritability measures were based on the sample
mean deviation ($MD = \sum_j |x_{1j} - x_{2j}| / n$). Let MD(MZ) and MD(DZ) denote the
sample mean deviation for monozygotic and dizygotic twins respectively.
Then the "difference method" of Lenz and von Verschuer (1928) led to
the following statistic as a heritability measure:
$$h^2 = \frac{MD(DZ) - MD(MZ)}{MD(DZ)} \qquad (2.1)$$
This formula has a certain intuitive appeal.
An $h^2$ of zero indicates
that twin pair differences are virtually the same for both
types of twins, implying the absence of a genetic effect.
An $h^2$ of
one implies that all monozygotic twins have the same trait value, thus
indicating a strong genetic effect.
Intermediate values reflect the
relative "strength" of the two effects.
The "quotient method" of Gottschaldt (1939) led to the following
measure:
$$h^2 = \frac{MD(DZ)}{MD(DZ) + MD(MZ)} \qquad (2.2)$$
The formula proposed by Wilde (1941) may be written
$$h^2 = \frac{\sqrt{MD^2(DZ) - MD^2(MZ)}}{\sqrt{MD^2(DZ) - MD^2(MZ)} + MD(MZ)} \qquad (2.3)$$
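The three early measures (2.1)-(2.3) are all simple functions of the two sample mean deviations. The following Python sketch computes them for made-up twin data; the function names and the data are hypothetical and serve only to show the arithmetic, and the Wilde measure is assumed to require MD(DZ) to be at least as large as MD(MZ).

```python
import math


def mean_deviation(pairs):
    """Sample mean deviation: the average absolute within-pair difference."""
    return sum(abs(x1 - x2) for x1, x2 in pairs) / len(pairs)


def early_heritability_measures(mz_pairs, dz_pairs):
    """Difference, quotient and Wilde measures from MZ and DZ twin pairs."""
    md_mz = mean_deviation(mz_pairs)
    md_dz = mean_deviation(dz_pairs)
    if md_dz < md_mz:
        raise ValueError("Wilde's measure needs MD(DZ) >= MD(MZ)")
    root = math.sqrt(md_dz ** 2 - md_mz ** 2)
    return {
        "difference": (md_dz - md_mz) / md_dz,   # (2.1)
        "quotient": md_dz / (md_dz + md_mz),     # (2.2)
        "wilde": root / (root + md_mz),          # (2.3)
    }


# Invented trait values for three MZ and three DZ pairs.
mz = [(10, 13), (8, 11), (12, 9)]
dz = [(10, 15), (7, 12), (14, 9)]
measures = early_heritability_measures(mz, dz)
print(measures)
```

Note that the quotient measure equals 1/2, not 0, when the two mean deviations are equal, so the three statistics are not on a common scale.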
These early heritability measures are little used today.
The primary reason for this lack of acceptance is the fact that
these measures do not lend themselves easily to statistical treatment, since they are based on the mean deviation rather than the
standard deviation.
2.1.2. Intraclass Correlation as a Heritability Index.
A
number of recently proposed heritability coefficients have been
based upon the intraclass correlation $\rho$ and its sample estimate
r.
We begin by defining $\rho$ using Table 2.1, the general ANOVA
table for paired data derived by Kempthorne and Osborne (1961).
In this table $\sigma_x^2 = \mathrm{Var}(x_{1j}) = \mathrm{Var}(x_{2j})$ and $\sigma_{xx'} = \mathrm{Cov}(x_{1j}, x_{2j})$,
where $x_{1j}$ and $x_{2j}$ are defined by (1.5).
$\sigma_W^2$ and $\sigma_A^2$ are the usual
within and among group components of variance (Graybill, 1961).

TABLE 2.1
GENERAL ANOVA TABLE FOR PAIRED DATA

Source          df     MS       EMS
Among pairs     n-1    $M_A$    $\sigma_x^2 + \sigma_{xx'} = \sigma_W^2 + 2\sigma_A^2$
Within pairs    n      $M_W$    $\sigma_x^2 - \sigma_{xx'} = \sigma_W^2$
The intraclass correlation is defined by
$$\rho = \frac{\sigma_{xx'}}{\sigma_x^2} = \frac{\sigma_A^2}{\sigma_A^2 + \sigma_W^2} \qquad (2.4)$$
Note that since $\mathrm{Var}(x_{1j}) = \mathrm{Var}(x_{2j}) = \sigma_x^2$, $\rho$ can be thought of as
the (population) correlation between $x_{1j}$ and $x_{2j}$.
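A sample intraclass correlation for paired data can be sketched from the two mean squares of the paired ANOVA. The Python code below uses the usual moment estimator, the difference of the among-pairs and within-pairs mean squares divided by their sum; this form is an assumption made here for illustration, since the estimator is not legible in this transcript, and the function name and data are hypothetical.

```python
def intraclass_correlation(pairs):
    """ANOVA-based sample intraclass correlation for n pairs (n >= 2)."""
    n = len(pairs)
    grand_mean = sum(x1 + x2 for x1, x2 in pairs) / (2 * n)
    # Among-pairs mean square, n - 1 degrees of freedom.
    ms_among = 2 * sum(((x1 + x2) / 2 - grand_mean) ** 2
                       for x1, x2 in pairs) / (n - 1)
    # Within-pairs mean square, n degrees of freedom.
    ms_within = sum((x1 - x2) ** 2 for x1, x2 in pairs) / (2 * n)
    return (ms_among - ms_within) / (ms_among + ms_within)


# Invented trait values for four pairs.
pairs = [(10, 12), (14, 13), (9, 9), (16, 15)]
print(intraclass_correlation(pairs))
```

The estimate equals 1 when the two members of every pair agree exactly and can be negative when within-pair differences exceed the differences among pairs.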
Kempthorne and Osborne (1961) remark that $\rho$, estimated by

$$r = \frac{M_A - M_W}{M_A + M_W} \tag{2.5}$$

"has been called heritability" by some authors (Hancock, 1952; Stormont, 1954), and "in some cases there are good reasons for using this word which is likely to imply 'the degree' to which a trait is inherited" (Kempthorne and Osborne, 1961, p. 324).
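As an illustration of how the mean squares of Table 2.1 and the sample intraclass correlation might be computed from raw paired data, the following is a minimal sketch. The function name `paired_anova` and the list-of-tuples input are assumptions made for this example only; the estimator used is the moment estimator $(M_A - M_W)/(M_A + M_W)$ implied by the expected mean squares of Table 2.1.

```python
def paired_anova(pairs):
    """Return (M_A, M_W, r) for a list of (x1, x2) paired observations.

    Hypothetical helper: M_A and M_W are the among- and within-pair mean
    squares of Table 2.1, and r = (M_A - M_W) / (M_A + M_W).
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    grand_mean = sum(x1 + x2 for x1, x2 in pairs) / (2 * n)
    # Among pairs: 2 * sum of squared deviations of pair means, on n-1 df.
    ms_among = 2 * sum(((x1 + x2) / 2 - grand_mean) ** 2
                       for x1, x2 in pairs) / (n - 1)
    # Within pairs: sum of squared pair differences / 2, on n df.
    ms_within = sum((x1 - x2) ** 2 for x1, x2 in pairs) / (2 * n)
    r = (ms_among - ms_within) / (ms_among + ms_within)
    return ms_among, ms_within, r
```

When both members of every pair are identical the within mean square is zero and $r = 1$; when all pair means coincide the among mean square is zero and $r = -1$.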
From (1.14) and (1.15) the intraclass correlation for monozygotic twins may be written as

$$\rho_{MZ} = \frac{\sigma_g^2 + 2\sigma_{ge} + C_{MZ}}{\sigma_g^2 + \sigma_e^2 + 2\sigma_{ge}} \tag{2.6}$$

which reduces to Lush's $h^2_{(broad)}$ if there is no genotype-environment covariance, and if $C_{MZ}$, the environmental covariance for monozygotic twins, is zero. If in addition Assumptions I and II of Section 1.7 hold, then $\rho_{MZ} = h^2_{(broad)} = h^2_{(narrow)}$. Similarly, it can be shown that if Assumptions I-III are valid, and if $C_{DZ} = 0$, then $2\rho_{DZ} = h^2_{(broad)} = h^2_{(narrow)}$.
There is no reason why the paired observations need be twins, or even sibs, as long as it is clearly understood what assumptions must be valid in order for the resulting heritability coefficient to be Lush's $h^2$. For example, Kempthorne and Tandon (1953) estimate heritability from parent-offspring pairs. Table 2.2 gives the heritability coefficient for those pairs of individuals most
often used. In this table $h^2 = h^2_{(broad)} = h^2_{(narrow)}$ except where noted.
TABLE 2.2
HERITABILITY COEFFICIENTS FOR SELECTED PAIRS

Type of pair         Heritability coefficient        Assumptions required
Monozygotic Twins    $\rho_{MZ} = h^2_{(broad)}$     III, $C_{MZ}=0$
                     $\rho_{MZ} = h^2$               I, II, III, $C_{MZ}=0$
Dizygotic Twins      $2\rho_{DZ} = h^2$              I, II, III, $C_{DZ}=0$
Full Sibs            $2\rho_{FS} = h^2$              I, II, III, $C_{FS}=0$
Parent-Offspring     $2\rho_{PO} = h^2$              I, III, $C_{PO}=0$
Half-Sibs            $4\rho_{HS} = h^2$              I, III, $C_{HS}=0$
Uncle-Nephew         $4\rho_{UN} = h^2$              I, III, $C_{UN}=0$
From Table 2.2 we see that in all cases the environmental covariance between pair members must be zero in order for the heritability coefficient to reduce to Lush's $h^2$. This will seldom be the case, however, and few studies to date have successfully handled this problem of correlated environmental effects. Elston and Gottesman (1968) overcome this difficulty to a certain extent by incorporating data on parents and non-twin sibs into the analysis. Using an Analysis of Variance approach, the authors derive unbiased estimators of $\sigma_g^2$ and $\sigma_e^2$ under Assumptions I-IV, or under a set of alternative assumptions (I, III, IV and $C_{PO} = C_{FS}$). In the next chapter some alternative procedures that also attempt to handle this problem will be discussed.

The ANOVA approach of Kempthorne and Osborne can be extended
to the case of k sibs per family (e.g., Fuller and Thompson, 1960).
Table 2.3 gives the ANOVA table for this more general situation.
For the special case of k=2, Table 2.3 reduces to Table 2.1.
TABLE 2.3
ANOVA TABLE FOR n FAMILIES OF k SIBS EACH

Source             df        MS      EMS
Among families     n-1       $M_A$   $\sigma_x^2 + (k-1)\sigma_{xx'} = \sigma_W^2 + k\sigma_A^2$
Within families    (k-1)n    $M_W$   $\sigma_x^2 - \sigma_{xx'} = \sigma_W^2$
The statistic corresponding to (2.5) may be written

$$r = \frac{M_A - M_W}{M_A + (k-1)M_W} \tag{2.7}$$

and the appropriate heritability coefficient can be obtained from Table 2.2. However, the environmental correlation between family members still must be zero in order for the heritability coefficient to reduce to Lush's $h^2$.
Another method that has been suggested is to construct a heritability measure based on the intraclass correlations for both monozygotic and dizygotic twins. The best known of these measures is due to Holzinger (1929) and may be written

$$H = \frac{r_{MZ} - r_{DZ}}{1 - r_{DZ}} \tag{2.8}$$

where $r_{MZ}$ and $r_{DZ}$ are sample correlations between $x_{1j}$ and $x_{2j}$ for monozygotic and dizygotic twins respectively. Although Holzinger's
measure involves only sample values, "Holzinger's Formula" has been referred to by many authors (e.g., Kempthorne and Osborne, 1961; Nichols, 1965; Harris, 1965) as involving population values and then written as

$$H = \frac{\rho_{MZ} - \rho_{DZ}}{1 - \rho_{DZ}} = \frac{\sigma_{DZ}^2 - \sigma_{MZ}^2}{\sigma_{DZ}^2} \tag{2.9}$$

where $\sigma_{MZ}^2$ and $\sigma_{DZ}^2$ are the variances of twin pair differences as defined by (1.16) and (1.19).

Holzinger did not intend for his statistic to measure heritability as defined by Lush, but it has often been used for this purpose. From (1.16) and (1.19) we see that if Assumptions I-V are valid, then (2.9) may be written

$$\frac{\sigma_g^2}{\sigma_g^2 + 2\sigma_e^2} \tag{2.10}$$

which is not Lush's $h^2$. Assumption V is necessary even to obtain (2.10), a point apparently overlooked by some authors (Harris, 1965; Elston and Gottesman, 1968), who felt that Assumptions I-IV were sufficient for this result.

Another heritability index based on $\rho_{MZ}$ and $\rho_{DZ}$ has been proposed by Nichols (1965) and can be written

$$h^2 = \frac{2(\rho_{MZ} - \rho_{DZ})}{\rho_{MZ}} \tag{2.11}$$

which under Assumptions I-IV reduces to

$$h^2 = \frac{\sigma_g^2}{\sigma_g^2 + C} \tag{2.12}$$

where $C$ is the environmental covariance. If Assumption V is also made, then $C=0$, and (2.12) reduces to $h^2=1$.
2.1.3 Testing the Significance of $\sigma_g^2$ by an F Test. A number of studies (e.g., Clark, 1956; Vandenberg, 1962; Vandenberg et al., 1968; Block, 1968) use the F test proposed by Dahlberg (1926) to test the significance of $\sigma_g^2$. It is assumed that the pair differences for monozygotic and dizygotic twins are normally and independently distributed, with means 0 and variances $\sigma_{MZ}^2$ and $\sigma_{DZ}^2$, which are given by (1.16) and (1.19). If Assumptions I-IV are valid, then for monozygotic and dizygotic twins respectively, $x_{1j} - x_{2j}$ is distributed as follows:

$$D_j(MZ) = x_{1j} - x_{2j} \sim N(0,\ 2(\sigma_e^2 - C))$$
$$D_j(DZ) = x_{1j} - x_{2j} \sim N(0,\ 2(\sigma_e^2 - C) + \sigma_g^2) \tag{2.13}$$

Suppose there are data for $N_M$ pairs of monozygotic twins and $N_D$ pairs of dizygotic twins. From (2.13) we see that if $\sigma_g^2 = 0$, then the statistic

$$F = \frac{M_W(DZ)}{M_W(MZ)} \tag{2.14}$$

has a central F distribution with $N_D$ and $N_M$ degrees of freedom, where $M_W(DZ)$ and $M_W(MZ)$ are the within pair mean squares of Table
2.1. Thus, a significantly large F indicates that $\sigma_g^2 > 0$, when the conditions given above are satisfied.
Although it seems not to have been noted in the literature, it
is possible to construct an F test in the more general situation in
which the twins can be classified in some sense, such as birth order.
If there is an order effect when such a classification is made, the
means of the pair differences will not be zero and may not even be
the same for the two groups.
However, if in this more general case
Assumptions I-IV are valid and the differences are normally distributed, i.e.,
$$D_j(MZ) \sim N(\mu_{MZ},\ 2(\sigma_e^2 - C))$$
$$D_j(DZ) \sim N(\mu_{DZ},\ 2(\sigma_e^2 - C) + \sigma_g^2) \tag{2.15}$$

then it can be shown that when $\sigma_g^2 = 0$, the statistic

$$F^* = \frac{S_{DZ}^2}{S_{MZ}^2} \tag{2.16}$$

where $S_{DZ}^2$ and $S_{MZ}^2$ are the sample variances of twin pair differences, has a central F distribution with $(N_D - 1)$ and $(N_M - 1)$ degrees of freedom.
Thus, when these more general conditions are satisfied,
a significantly large F* indicates the presence of a genetic effect.
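The two F statistics of this section can be sketched as follows, assuming the data arrive as plain lists of pair differences. The function names are hypothetical. The sketch returns only each statistic and its degrees of freedom; comparing them with tabulated F critical values is left to the reader.

```python
from statistics import variance

def f_zero_mean(d_mz, d_dz):
    """F of (2.14): ratio of mean squared pair differences.

    Assumes the differences have mean zero (no order effect).
    Returns the statistic and its (numerator, denominator) df.
    """
    ms_dz = sum(d * d for d in d_dz) / len(d_dz)
    ms_mz = sum(d * d for d in d_mz) / len(d_mz)
    return ms_dz / ms_mz, (len(d_dz), len(d_mz))

def f_star(d_mz, d_dz):
    """F* of (2.16): ratio of sample variances of the pair differences.

    The sample variance removes any nonzero mean, so an order effect
    does not disturb the statistic; one df is lost in each group.
    """
    return variance(d_dz) / variance(d_mz), (len(d_dz) - 1, len(d_mz) - 1)
```

Adding the same constant to every difference within a group leaves `f_star` unchanged but alters `f_zero_mean`, which is why the second form is the appropriate one when an order effect is suspected.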
2.1.4 Procedures with Identical Twins Reared Apart. A method often used in an effort to overcome the problem of correlated environmental effects is a procedure based on monozygotic twins reared apart (e.g., Newman et al., 1937; Burks, 1942; Burt, 1966).
It is hoped in such studies that the environments of the separated
twins are not related, and hence the assumption of CMZ=O is not as
unreasonable as would be the case if the twins were reared together.
One objection to this method is that since separated twins
must have shared the same pre-natal environment, and often also
for a short while the same post-natal environment, one would not
expect the environmental effects to be independent no matter how
early in life the twins were separated.
A second objection, although a practical rather than a theoretical one, is that identical twins reared apart are difficult to obtain.

Even those studies that do use identical twins reared apart do not use the heritability estimate suggested by Table 2.2. The conventional approach is to examine concurrently monozygotic and dizygotic twins reared together, and to use Holzinger's statistic (2.8) based on these twins to estimate heritability (Newman et al., 1937). Then, as "a logical extension of Holzinger's Formula," the following coefficient based on both sets of monozygotic twins is used as an index of "the percentage of phenotypic variation ascribable to environment" (Neel and Schull, 1954, p. 276):

$$E = \frac{\rho_{MZT} - \rho_{MZA}}{1 - \rho_{MZA}} = \frac{\sigma_{MZA}^2 - \sigma_{MZT}^2}{\sigma_{MZA}^2} \tag{2.17}$$

where $\rho_{MZT}$ and $\rho_{MZA}$ are the intraclass correlations for monozygotic twins reared together and apart respectively; $\sigma_{MZT}^2$ and $\sigma_{MZA}^2$ are the corresponding variances of twin pair differences as defined by (1.16). E may be written as
$$E = \frac{C_{MZT} - C_{MZA}}{\sigma_e^2 - C_{MZA}} \tag{2.18}$$

where $C_{MZT}$ and $C_{MZA}$ are the environmental covariances for monozygotic twins reared together and apart respectively. Note that this is not the environmental proportion of phenotypic variation as is claimed. For other criticisms of the methodology and statistical techniques used in analyzing data from monozygotic twins reared apart, see Burks (1938) and McNemar (1938).
2.1.5 Procedures that Allow for Genotype-Environment Covariance. One difficulty of all methods mentioned above is the necessity of Assumption III, the independence of genotypic and environmental effects. There are some cases in human genetics in which this assumption seems unwarranted, and it would be desirable to have a design and analysis that permits the estimation of this covariance.
One such design for animal genetics has been proposed by Le
Roy (1960), but in human genetics, where one can not "design" an
experiment or control environmental effects, this design is of
little value.
One way of avoiding the problem altogether is to treat it
as one of semantics and define the environmental effect to be
that effect "which affects the phenotype independently of genotype" (Roberts, 1967, p. 218).
However, one should question the
interpretability of an effect defined in this manner.
One method that tries to allow for genotype-environment
covariance is the Multiple Abstract Variance Analysis (MAVA) of
Cattell (1960). Unfortunately, this method has problems, both practical and theoretical. Practically, the design requires data for such difficult to obtain individuals as monozygotic twins reared apart, half-sibs reared together (and apart), and half-sibs reared together by one true parent. A study necessitating data from such individuals would be a mammoth undertaking. Theoretically, there is a problem in interpreting the "abstract variances" in terms of more familiar genetic parameters. A more serious problem, however, is that Loehlin (1965) has pointed out several serious errors in the MAVA equations that invalidate much of Cattell's results. An attempt was made to correct these mistakes and reanalyze Cattell's published data, but "a number of the corrected variances were negative, and the effort was abandoned" (Loehlin, 1965, p. 161).

Thus, the problem of allowing for genotype-environment covariance remains essentially unsolved. For further discussion of this problem see Falconer (1960), Cattell (1963) and Parsons (1967).
2.2 Procedures for Detecting Linkage from Sib Data

2.2.1 Bernstein's Method and Fisher's U Scores. Bernstein (1931) was the first to point out that linkage can be detected and estimated from data involving information from only two generations. His method assigned each family a score, whose sum, expected value and variance provide a linkage test in any body of data that is sufficiently large for the distribution of the total
score to be nearly normal. Bernstein's approach was further developed by Hogben (1934) and Haldane (1934).

Fisher (1935), following the same general procedure adopted by Bernstein, devised a maximum likelihood scoring procedure that made earlier methods obsolete. Fisher's "U Scores" were found to be more efficient than Bernstein's scores for all linkage intensities, and also permitted easier combination of information from different sized families.

Although Fisher's U Score method is still recommended by some authors (Bailey, 1961), these early methods have generally been replaced by the test procedures discussed in the following sections.
2.2.2 Penrose's Sib Pair Method.
Penrose (1935) was the first to
propose a method for detecting linkage that uses only sib pair data.
His 1935 paper dealt with the special case of detecting linkage when
each locus involves only two alleles; one dominant, the other recessive.
Penrose (1938) later extended the sib pair method to the case
of "graded human characters."
This essentially involved relaxing the
dominance assumption of the earlier paper and assuming that the trait
value of the heterozygote was midway between that of the two homozygotes.
The sib pair method was later made even more general (Penrose,
1950; 1953) to allow for multiple alleles.
With minor modifications the sib pair method has been used by a
number of authors in linkage studies (Kloepfer, 1946; Howells and
Slowey, 1956; Lowry and Shultz, 1959).
It has the advantages of arithmetic simplicity and of serving as a linkage test when the parental genotypes are unknown, even when the genetic mechanisms of both traits are unknown. On the other hand, the method often requires a large
number of pairs in order to achieve significant results, and in certain situations (Finney, VI, 1942) it was found to extract only a
small fraction of the information that could be obtained by Fisher's
U Scores.
2.2.3 Morton's Sequential Test for Linkage. Morton (1955) derived a sequential probability ratio test for linkage when both parental genotypes are known and there are only two alleles at each locus. The test procedure is based on "lod scores," which for a particular family are defined as

$$Z = \log_{10}\left[\frac{P(F \mid c, c')}{P(F \mid \tfrac{1}{2}, \tfrac{1}{2})}\right]$$

where $P(F \mid c, c')$ denotes the probability of occurrence of a family F when the recombination fraction is $c$ in females and $c'$ in males.

In a later paper Morton (1956) used lod scores to obtain likelihood ratio tests of homogeneity and maximum likelihood estimation of linkage. Later the method was extended to multiple allele test loci (Steinberg and Morton, 1956) and multiple alleles at both loci (Morton, 1957).

Morton's sequential test procedure has been found to be superior to Fisher's U Scores and Penrose's sib pair method in a number of situations (Morton, 1955). It has the advantage of allowing for both detection and estimation of linkage. On the other hand, it requires knowledge of parental genotype and is cumbersome numerically, requiring the calculation of lod scores. Maynard-Smith et al. (1961) give tables of lod scores for the simpler mating types.
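To make the definition of a lod score concrete, the sketch below evaluates it for the simplest textbook situation. It assumes a family in which every offspring can be scored directly as recombinant or non-recombinant, and a single recombination fraction common to both sexes ($c = c'$). These simplifications and the function name `lod_score` are assumptions of this example; the sketch is not Morton's general procedure.

```python
import math

def lod_score(recombinants, offspring, c):
    """Lod score for `recombinants` out of `offspring` scorable children.

    Illustrative assumption: the family likelihood is
    c**k * (1 - c)**(n - k), compared with (1/2)**n under no linkage.
    """
    if not 0 < c <= 0.5:
        raise ValueError("c must lie in (0, 0.5]")
    k, n = recombinants, offspring
    if not 0 <= k <= n:
        raise ValueError("recombinants must lie between 0 and offspring")
    # log10 likelihood under linkage with recombination fraction c
    log_linked = k * math.log10(c) + (n - k) * math.log10(1 - c)
    # log10 likelihood under free recombination (c = 1/2)
    log_free = n * math.log10(0.5)
    return log_linked - log_free
```

Because lod scores are logarithms of likelihood ratios, scores from independent families evaluated at the same $c$ can simply be added.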
2.2.4 Other Methods for Detecting Linkage. A number of other techniques have been used to detect linkage. Haldane and Smith (1947)
devised a probability ratio test that avoided some of the assumptions required by Fisher's U Scores. However, the method is conservative and a proposed modification (Smith, 1953) is less efficient (Morton, 1955).

Brues (1950), using a test statistic based on the square root of the average squared metric trait differences between sib pairs, detected linkage between body build and freckling. However, little use has been made of this method in recent studies.

Recently Thoday (1967) suggested a new approach that may prove useful. Assuming that the metric trait has only two codominant alleles in the population in equal frequency and assuming complete linkage between marker and metric trait gene and attainment of linkage equilibrium, Thoday's model leads to a higher variance within and a lower variance among families for sibs heterozygous for the marker trait than for sibs homozygous for the same.

Thoday's method has been used to isolate major trait genes in Drosophila and mice and seems well suited for adaptation to human genetics. However, the method as originally presented is quite restrictive, and as yet no general treatment has been given.
CHAPTER III - DETECTING AND ESTIMATING $\sigma_g^2$ FROM TWIN DATA

In the previous chapter it was found that many heritability measures implicitly require the environmental covariances to be zero. Some procedures that avoid this difficulty, all based on twin data, are now described.

3.1 Estimation of $\sigma_g^2$ from the Analysis of Variance Tables

In this section some procedures are derived for estimating $\sigma_g^2$ using the Analysis of Variance Table 2.1. The estimation procedure is based on four mean squares: the within and among mean squares for monozygotic and dizygotic twins.

3.1.1 Unweighted least squares estimation. Suppose we have data for $N_M$ pairs of monozygous twins and $N_D$ pairs of dizygous twins and perform two separate Analyses of Variance as indicated by Table 2.1. We make Assumptions I-IV and wish to estimate $\sigma_g^2$, $\sigma_e^2$ and $C = C_{MZ} = C_{DZ}$ from the four independent mean squares $M_A(MZ)$, $M_W(MZ)$, $M_A(DZ)$ and $M_W(DZ)$. From (1.14)-(1.18) the expected mean squares are given by

$$E_{AM} = E(M_A(MZ)) = 2\sigma_g^2 + \sigma_e^2 + C$$
$$E_{WM} = E(M_W(MZ)) = \sigma_e^2 - C$$
$$E_{AD} = E(M_A(DZ)) = \tfrac{3}{2}\sigma_g^2 + \sigma_e^2 + C$$
$$E_{WD} = E(M_W(DZ)) = \tfrac{1}{2}\sigma_g^2 + \sigma_e^2 - C \tag{3.1}$$
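which in matrix notation may be written

$$E(\mathbf{Y}) = X\boldsymbol{\beta} \tag{3.2}$$

where

$$\mathbf{Y} = \begin{bmatrix} M_A(MZ) \\ M_W(MZ) \\ M_A(DZ) \\ M_W(DZ) \end{bmatrix}, \qquad X = \begin{bmatrix} 2 & 1 & 1 \\ 0 & 1 & -1 \\ 1.5 & 1 & 1 \\ 0.5 & 1 & -1 \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \sigma_g^2 \\ \sigma_e^2 \\ C \end{bmatrix} \tag{3.3}$$

An intuitive procedure is to select estimators that give the best least squares fit to the four mean squares. The unweighted least squares estimators are in general (Graybill, 1961)

$$\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{Y} \tag{3.4}$$

and in this special case may be written

$$\hat\sigma_g^2 = M_A(MZ) - M_W(MZ) - M_A(DZ) + M_W(DZ)$$
$$\hat\sigma_e^2 = \tfrac{1}{4}\left(-3M_A(MZ) + 5M_W(MZ) + 5M_A(DZ) - 3M_W(DZ)\right) \tag{3.5}$$
$$\hat C = \tfrac{1}{2}\left(-M_A(MZ) + M_W(MZ) + 2M_A(DZ) - 2M_W(DZ)\right)$$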
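The least squares fit just described can be checked numerically. The sketch below solves the normal equations $X'X\boldsymbol{\beta} = X'\mathbf{Y}$ in exact rational arithmetic for the design matrix implied by (3.1), and compares the result with closed-form expressions in the four mean squares. The helper names (`solve`, `unweighted_ls`, `closed_form`) are assumptions of this example, and the closed forms are the ones obtained by working the algebra through for this particular $X$.

```python
from fractions import Fraction as Fr

# Rows follow (3.1): coefficients of (sigma_g^2, sigma_e^2, C).
X = [[Fr(2), Fr(1), Fr(1)],       # E[M_A(MZ)]
     [Fr(0), Fr(1), Fr(-1)],      # E[M_W(MZ)]
     [Fr(3, 2), Fr(1), Fr(1)],    # E[M_A(DZ)]
     [Fr(1, 2), Fr(1), Fr(-1)]]   # E[M_W(DZ)]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

def unweighted_ls(y):
    """Return [sigma_g^2, sigma_e^2, C] minimising the unweighted sum of squares."""
    y = [Fr(v) for v in y]
    xtx = [[sum(X[i][r] * X[i][c] for i in range(4)) for c in range(3)]
           for r in range(3)]
    xty = [sum(X[i][r] * y[i] for i in range(4)) for r in range(3)]
    return solve(xtx, xty)

def closed_form(y):
    """Closed-form estimators; y = [M_A(MZ), M_W(MZ), M_A(DZ), M_W(DZ)]."""
    a, b, c, d = (Fr(v) for v in y)
    return [a - b - c + d,
            (-3 * a + 5 * b + 5 * c - 3 * d) / 4,
            (-a + b + 2 * c - 2 * d) / 2]
```

For instance, the mean squares 12, 2, 10 and 4 correspond exactly to $\sigma_g^2 = 4$, $\sigma_e^2 = 3$ and $C = 1$ under (3.1), and both functions recover those values.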
The estimators of (3.5) will be unbiased if Assumptions I-IV are valid. If these assumptions are not valid, however, (3.5) will probably tend to overestimate $\sigma_g^2$ and underestimate $\sigma_e^2$ and $C$. For example, from (1.14)-(1.18), (3.5) and Table 2.1
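$$E(\hat\sigma_g^2) = 2\left(\sigma_g^2 + 2\sigma_{ge} + C_{MZ} - \tfrac{1}{2}\sigma_a^2 - \tfrac{1}{4}\sigma_d^2 - f_1(\sigma_i^2) - C_{DZ} - 2\sigma_{ge}^*\right)$$
$$= \sigma_g^2 + \tfrac{1}{2}\sigma_d^2 + \left(\sigma_i^2 - 2f_1(\sigma_i^2)\right) + 2(C_{MZ} - C_{DZ}) + 4(\sigma_{ge} - \sigma_{ge}^*) \tag{3.6}$$

$\sigma_d^2$ and $\sigma_i^2 - 2f_1(\sigma_i^2)$ must be nonnegative. Moreover, we would expect $|C_{MZ}| > |C_{DZ}|$ and $|\sigma_{ge}| > |\sigma_{ge}^*|$. Hence, if these covariances are positive, as one would often expect, $\hat\sigma_g^2$ may overestimate $\sigma_g^2$. Note, however, that if the quantitative trait is one in which the environmental covariances or genotype-environment covariances are negative, $\hat\sigma_g^2$ may actually underestimate $\sigma_g^2$. It can also be shown that

$$E(\hat\sigma_e^2) = \sigma_e^2 - \tfrac{1}{2}\sigma_d^2 - \left(\sigma_i^2 - 2f_1(\sigma_i^2)\right) - 2(\sigma_{ge} - 2\sigma_{ge}^*) - 2(C_{MZ} - C_{DZ}) \tag{3.7}$$

$$E(\hat C) = (2C_{DZ} - C_{MZ}) - \tfrac{1}{2}\sigma_d^2 - \left(\sigma_i^2 - 2f_1(\sigma_i^2)\right) - 2(\sigma_{ge} - 2\sigma_{ge}^*) \tag{3.8}$$

resulting in probable underestimates of $\sigma_e^2$ and $C$ if the covariances mentioned above are positive.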
Bock and Vandenberg (1968) also use the four mean squares of (3.1) to estimate the genetic and environmental components of variance, but their model appears to be in error. Their expected mean squares may be written in the form

$$E_{AM} = \sigma_1^2 + \sigma_2^2$$
$$E_{WM} = \sigma_1^2$$
$$E_{AD} = \sigma_1^2 + \sigma_2^2 + \sigma_3^2$$
$$E_{WD} = \sigma_1^2 + \sigma_3^2$$
Note that $E_{AM} + E_{WM} \neq E_{AD} + E_{WD}$ and $E_{AM} - E_{WM} = E_{AD} - E_{WD}$. From Table 2.1 we see that this implies that $\sigma_{x(MZ)}^2 \neq \sigma_{x(DZ)}^2$ and that the phenotypic covariance $\sigma_{xx'}$ is the same for both types of twins. The authors do not justify these assumptions, and the estimators they obtain are of questionable value.
3.1.2 Weighted least squares estimation of $\sigma_g^2$. If the quantitative trait under investigation is normally distributed, then $\sigma_g^2$, $\sigma_e^2$ and $C$ can be estimated by a weighted least squares procedure. Consider the general model given by (3.2) and suppose that $\mathrm{Var}(\mathbf{Y}) = V$. Then the weighted least squares estimator of $\boldsymbol{\beta}$ may be written (see e.g., Kendall and Stuart, 1967)

$$\hat{\boldsymbol{\beta}} = (X'V^{-1}X)^{-1}X'V^{-1}\mathbf{Y} \tag{3.9}$$

For the special case in which $X$, $\mathbf{Y}$ and $\boldsymbol{\beta}$ are given by (3.3), if we make the additional assumption that $x_{1j}$ and $x_{2j}$ are normally distributed, the four mean squares that are elements of $\mathbf{Y}$ are independent and each distributed proportionally to a chi square. That is, for $Y_i$, the $i$th element of $\mathbf{Y}$,

$$\frac{N_i Y_i}{E(Y_i)} \sim \chi^2_{N_i} \tag{3.10}$$

where $E(Y_i)$ is given by (3.1) and $N_i$ is the corresponding number of degrees of freedom from Table 2.1. It is well known that if a random variable $z$ has a chi square distribution with $N$ degrees of freedom, then $\mathrm{Var}(z) = 2N$. Hence, from (3.1) and (3.10)

$$\mathrm{Var}(Y_i) = \frac{2[E(Y_i)]^2}{N_i}, \qquad i = 1, 2, 3, 4 \tag{3.11}$$
and the variance covariance matrix V is

$$V = \begin{bmatrix} \dfrac{2E_{AM}^2}{N_M - 1} & 0 & 0 & 0 \\ 0 & \dfrac{2E_{WM}^2}{N_M} & 0 & 0 \\ 0 & 0 & \dfrac{2E_{AD}^2}{N_D - 1} & 0 \\ 0 & 0 & 0 & \dfrac{2E_{WD}^2}{N_D} \end{bmatrix} \tag{3.12}$$

Since V involves the unknown parameters, an iterative procedure must be used in order to find the weighted least squares estimates given by (3.9). This procedure, which can easily be adapted for computer use, is as follows: choose an initial set of values for the three parameters (e.g., the unweighted least squares estimates). Use these values as the true values in V and calculate $\hat{\boldsymbol{\beta}}$ by (3.9). Substitute these new values back into V and calculate another new set of estimates. Continue this procedure until convergence is achieved, the final $\hat{\boldsymbol{\beta}}$ being the weighted least squares solution.

The variance-covariance matrix of $\hat{\boldsymbol{\beta}}$ may be written

$$\mathrm{Var}(\hat{\boldsymbol{\beta}}) = (X'V^{-1}X)^{-1} \tag{3.13}$$

which can be estimated by using the final weighted least squares estimates in the calculation of V. The ratio of each parameter estimate to its estimated standard error can be used as an approximate test of the hypothesis that the parameter in question is zero.
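The iterative procedure described above might be programmed along the following lines. This is a minimal sketch under stated assumptions: the function names, the convergence tolerance and the iteration cap are choices made for this example, and the code assumes that all fitted mean squares stay strictly positive so that the weights are defined.

```python
# Design matrix of (3.1); columns are (sigma_g^2, sigma_e^2, C).
X = [[2.0, 1.0, 1.0], [0.0, 1.0, -1.0], [1.5, 1.0, 1.0], [0.5, 1.0, -1.0]]

def solve(a, b):
    """Solve a x = b for a small square system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

def weighted_ls(y, weights):
    """One evaluation of (3.9) for diagonal V, with weights[i] = 1 / Var(Y_i)."""
    xtwx = [[sum(w * X[i][r] * X[i][c] for i, w in enumerate(weights))
             for c in range(3)] for r in range(3)]
    xtwy = [sum(w * X[i][r] * y[i] for i, w in enumerate(weights))
            for r in range(3)]
    return solve(xtwx, xtwy)

def iterate_wls(y, n_mz, n_dz, tol=1e-10, max_iter=100):
    """Repeat (3.9), rebuilding V from the current fit, until estimates settle.

    y = [M_A(MZ), M_W(MZ), M_A(DZ), M_W(DZ)]; n_mz and n_dz are pair counts.
    """
    df = [n_mz - 1, n_mz, n_dz - 1, n_dz]          # df of each mean square
    beta = weighted_ls(y, [1.0] * 4)               # unweighted starting values
    for _ in range(max_iter):
        fitted = [sum(x * b for x, b in zip(row, beta)) for row in X]
        # Var(Y_i) = 2 E(Y_i)^2 / N_i, so the weight is N_i / (2 E(Y_i)^2).
        weights = [f / (2.0 * e * e) for f, e in zip(df, fitted)]
        new_beta = weighted_ls(y, weights)
        if max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

When the four mean squares happen to satisfy the model exactly, every choice of weights gives the same solution, so the iteration stops after its first pass.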
Finally, note that if Assumptions I-IV are valid, then
$$\tilde\sigma_g^2 = 2\left[M_W(DZ) - M_W(MZ)\right] \tag{3.14}$$

is also an unbiased estimator of $\sigma_g^2$ and should not differ greatly from the least squares estimate. It will be shown in Section 3.3 that under certain conditions $\tilde\sigma_g^2$ as defined by (3.14) is the maximum likelihood estimate of $\sigma_g^2$.
3.2 Significance Tests

Before testing for the significance of $\sigma_g^2$, and indeed even before estimating $\sigma_g^2$, a test should be made of the equality of the phenotypic variances for monozygotic and dizygotic twins; a comparison of (1.14) and (1.17) shows that this has been implicitly assumed. An appropriate test statistic is seen from Table 2.1 to be

$$F^{**} = \frac{M_W(DZ) + M_A(DZ)}{M_W(MZ) + M_A(MZ)} = \frac{\hat\sigma_{x(DZ)}^2}{\hat\sigma_{x(MZ)}^2} \tag{3.15}$$

This statistic follows an approximate F-distribution if the observations are normally distributed. The degrees of freedom for (3.15) are calculated by Satterthwaite's (1946) formula; for a linear function of mean squares $\sum_i a_i MS_i$, where $MS_i$ is the $i$th mean square with $f_i$ degrees of freedom, the appropriate degrees of freedom is

$$f = \frac{\left(\sum_i a_i MS_i\right)^2}{\sum_i \left(a_i^2 MS_i^2 / f_i\right)} \tag{3.16}$$
If the data are normally distributed, a significantly large or small
value of (3.15) indicates one assumption of the model is violated.
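Satterthwaite's approximation and the variance-equality statistic can be sketched as below. The function names are hypothetical, and the degrees of freedom attached to each mean square ($n-1$ among pairs, $n$ within pairs) are taken from Table 2.1.

```python
def satterthwaite_df(coefs, mean_squares, dfs):
    """Approximate df for sum(a_i * MS_i), following Satterthwaite's formula."""
    total = sum(a * ms for a, ms in zip(coefs, mean_squares))
    denom = sum((a * ms) ** 2 / f for a, ms, f in zip(coefs, mean_squares, dfs))
    return total ** 2 / denom

def f_double_star(ma_mz, mw_mz, ma_dz, mw_dz, n_mz, n_dz):
    """Ratio of (M_W + M_A) for DZ over MZ, with approximate df for each part."""
    f = (mw_dz + ma_dz) / (mw_mz + ma_mz)
    df_num = satterthwaite_df([1, 1], [mw_dz, ma_dz], [n_dz, n_dz - 1])
    df_den = satterthwaite_df([1, 1], [mw_mz, ma_mz], [n_mz, n_mz - 1])
    return f, df_num, df_den
```

Two sanity checks on the formula: a single mean square returns its own degrees of freedom, and the sum of two equal mean squares with equal degrees of freedom returns twice that number.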
Provided the phenotypic variances can be considered equal, and provided Assumptions III and IV hold, (2.14) can be used to test the hypothesis that $\sigma_g^2 = 0$ against the alternative that $\sigma_g^2 > 0$. The corresponding ratio of expected mean squares is

$$\frac{E(M_W(DZ))}{E(M_W(MZ))} = \frac{\tfrac{1}{2}\sigma_g^2 + \sigma_e^2 - C}{\sigma_e^2 - C} \tag{3.17}$$

A number of approximate F tests can be used to test the significance of $\sigma_g^2$. One such test suggested by (3.17) is

$$F^* = \frac{\tfrac{1}{2}\hat\sigma_g^2 + \hat\sigma_e^2 - \hat C}{\hat\sigma_e^2 - \hat C} = \frac{M_A(MZ) + M_W(MZ) - M_A(DZ) + 3M_W(DZ)}{-M_A(MZ) + 3M_W(MZ) + M_A(DZ) + M_W(DZ)} \tag{3.18}$$

where $\hat\sigma_g^2$, $\hat\sigma_e^2$ and $\hat C$ are the unweighted least squares estimates given by (3.5). $F^*$ has the advantage of using more information than does (2.14), since it uses four rather than two mean squares. On the other hand, the test is approximate rather than exact, the degrees of freedom being calculated by Satterthwaite's formula (3.16).

Similarly, an F test using the weighted least squares estimators can be constructed. The coefficient of each mean square in the numerator and denominator of $F^*$ depends upon the final least squares estimates. The resulting test will be approximate and the degrees of freedom are again calculated from (3.16).

Another method of testing for the significance of $\sigma_g^2$ is to compare an estimate of it directly with that estimate's standard error, assuming the ratio of these quantities to be normally distributed. For example, if the unweighted least squares estimate of $\sigma_g^2$ given by
(3.5) is used, then from (3.11) the variance of $\hat\sigma_g^2$ is

$$\mathrm{Var}(\hat\sigma_g^2) = 2\left[\frac{E_{AM}^2}{N_M - 1} + \frac{E_{WM}^2}{N_M} + \frac{E_{AD}^2}{N_D - 1} + \frac{E_{WD}^2}{N_D}\right] \tag{3.19}$$

which would be estimated by substituting the actual mean squares for their expected values. This method of comparing an estimate of $\sigma_g^2$ with its standard error is probably always more powerful than use of (2.14) or (3.18), especially if the weighted least squares estimate is used. In sections 3.3 and 3.4 other significance tests are discussed.
3.3
Maximum Likelihood Estimation
Although maximum likelihood (ML) techniques have been used to
estimate the genetic component of variance in plant-breeding experiments (Hayman, 1960), little use has been made of this method in
human genetics.
The two primary reasons for this are (1) practical
objections to the method as being computationally difficult and (2)
theoretical objections to the assumption that the trait of interest
is normally distributed.
The first objection might have been valid a decade ago, but it
is certainly not so today, given the availability of high speed computers.
The second objection is more serious, but one could argue
that empirically a large number of traits do approximately follow
a normal distribution, and if there is any doubt, one can do a preliminary test for non-normality before subjecting the data to ML
analysis.
Suppose we have data for $N_M$ pairs of monozygotic twins and $N_D$ pairs of dizygotic twins. We make Assumptions I-IV and also assume that $x_{1j}$ and $x_{2j}$ follow a bivariate normal distribution, i.e.,

$$\begin{bmatrix} x_{1j} \\ x_{2j} \end{bmatrix} \sim N\left(\begin{bmatrix} \mu \\ \mu \end{bmatrix},\ V\right) \tag{3.20}$$

Note that it is assumed that $E(x_{1j}) = E(x_{2j})$, which implies that the twins are ordered at random, so that there is no order effect.

The variance-covariance matrix V depends upon the type of twin pair, and from (1.14), (1.15), (1.17) and (1.18) we see that

$$V_{MZ} = \begin{bmatrix} \sigma_g^2 + \sigma_e^2 & \sigma_g^2 + C \\ \sigma_g^2 + C & \sigma_g^2 + \sigma_e^2 \end{bmatrix} \tag{3.21}$$

and

$$V_{DZ} = \begin{bmatrix} \sigma_g^2 + \sigma_e^2 & \tfrac{1}{2}\sigma_g^2 + C \\ \tfrac{1}{2}\sigma_g^2 + C & \sigma_g^2 + \sigma_e^2 \end{bmatrix} \tag{3.22}$$

The log likelihood (apart from a constant term) may be written

$$\log L = -\tfrac{1}{2}N_M\log|V_{MZ}| - \tfrac{1}{2}N_D\log|V_{DZ}| - \tfrac{1}{2}\sum_{j=1}^{N_M}(\mathbf{x}_j - \boldsymbol{\mu})'V_{MZ}^{-1}(\mathbf{x}_j - \boldsymbol{\mu}) - \tfrac{1}{2}\sum_{j=1}^{N_D}(\mathbf{x}_j - \boldsymbol{\mu})'V_{DZ}^{-1}(\mathbf{x}_j - \boldsymbol{\mu}) \tag{3.23}$$

Standard computer techniques can be used to find the ML estimates of $\mu$, $\sigma_g^2$, $\sigma_e^2$ and $C$. The simplest procedure is to search the likelihood surface directly, as explained elsewhere (Elston and Kaplan, 1970). The ratio of each estimate to its standard error provides a test of the hypothesis that the parameter in question is zero.
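A direct search of the likelihood surface needs only a routine that evaluates the log likelihood at a trial parameter point. The following is a minimal sketch of such a routine for the bivariate normal model described above. The function names are assumptions of this example, and no optimiser is included; any general-purpose search could call `twin_loglik` repeatedly.

```python
import math

def pair_loglik(pairs, mu, var, cov):
    """Log likelihood (up to a constant) of pairs sharing V = [[var, cov], [cov, var]]."""
    det = var * var - cov * cov
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    total = -0.5 * len(pairs) * math.log(det)
    for x1, x2 in pairs:
        d1, d2 = x1 - mu, x2 - mu
        # Quadratic form d' V^{-1} d, using the closed-form 2x2 inverse.
        total -= 0.5 * (var * (d1 * d1 + d2 * d2) - 2 * cov * d1 * d2) / det
    return total

def twin_loglik(mz_pairs, dz_pairs, mu, sigma_g2, sigma_e2, c):
    """Sum of the MZ and DZ contributions under Assumptions I-IV."""
    var = sigma_g2 + sigma_e2
    return (pair_loglik(mz_pairs, mu, var, sigma_g2 + c)
            + pair_loglik(dz_pairs, mu, var, 0.5 * sigma_g2 + c))
```

The routine rejects parameter points at which either covariance matrix fails to be positive definite, which a search procedure would need to treat as out of bounds.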
An alternative method of ML estimation that can be used is a procedure based on twin pair differences. There are both theoretical and practical advantages to this method. Theoretically, it is less restrictive, requiring only that the twin pair differences be normally distributed. Practically, the method is simpler computationally. However, it has the serious disadvantage that information is sacrificed in the estimation procedure; and this loss of information results in $\sigma_e^2$ being confounded with $C$.

If we make Assumptions I-IV and also assume that the twin differences are normally distributed, then we have

for monozygotic twins: $D_j(MZ) \sim N(0, \sigma_{MZ}^2)$
for dizygotic twins: $D_j(DZ) \sim N(0, \sigma_{DZ}^2)$          (3.24)

where from (1.16) and (1.19) we see that

$$\sigma_{MZ}^2 = 2(\sigma_e^2 - C)$$
$$\sigma_{DZ}^2 = 2(\sigma_e^2 - C) + \sigma_g^2 \tag{3.25}$$

The log likelihood may be written

$$\log L = -\tfrac{1}{2}N_M\log(\sigma_{MZ}^2) - \frac{Z_{MZ}}{2\sigma_{MZ}^2} - \tfrac{1}{2}N_D\log(\sigma_{DZ}^2) - \frac{Z_{DZ}}{2\sigma_{DZ}^2} \tag{3.26}$$

where $Z_{MZ}$ and $Z_{DZ}$ are the sums of the observed squared pair differences for monozygotic and dizygotic twins respectively.
Using a general result (proved e.g. by Lindgren (1960) pp. 224-225), we can find the ML estimates of $2(\sigma_e^2 - C)$ and $\sigma_g^2$ by finding those estimates of $\sigma_{MZ}^2$ and $\sigma_{DZ}^2$ that maximize (3.26) and then, using these estimates, solving (3.25) for $2(\sigma_e^2 - C)$ and $\sigma_g^2$. That is, the ML solution is given by

$$2(\hat\sigma_e^2 - \hat C) = \hat\sigma_{MZ}^2 \quad \text{and} \quad \hat\sigma_g^2 = \hat\sigma_{DZ}^2 - \hat\sigma_{MZ}^2 \tag{3.27}$$

where $\hat\sigma_{MZ}^2$ and $\hat\sigma_{DZ}^2$ are the ML estimates of $\sigma_{MZ}^2$ and $\sigma_{DZ}^2$ respectively.

It is well known (e.g., Kendall and Stuart, 1967) that the maximum likelihood estimate of the variance of a normally distributed random variable $x$ with mean zero and variance $\sigma^2$ is

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 \tag{3.28}$$

From (3.26) we see that the log likelihood is simply the sum of the two log likelihoods when monozygotic and dizygotic twins are considered separately. Hence from (3.28) the ML estimates of $\sigma_{MZ}^2$ and $\sigma_{DZ}^2$ are

$$\hat\sigma_{MZ}^2 = \frac{Z_{MZ}}{N_M}, \qquad \hat\sigma_{DZ}^2 = \frac{Z_{DZ}}{N_D} \tag{3.29}$$

From (3.27) and (3.29) the ML estimator of $\sigma_g^2$ is thus given by

$$\hat\sigma_g^2 = \hat\sigma_{DZ}^2 - \hat\sigma_{MZ}^2 = \frac{Z_{DZ}}{N_D} - \frac{Z_{MZ}}{N_M}$$

Hence, we have shown that $\tilde\sigma_g^2$ as defined by (3.14) is the ML estimator of $\sigma_g^2$ if the only information available consists of the twin pair differences.
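The estimator just derived takes only a few lines to compute. In this sketch the function name and the list inputs are assumptions of the example. Nothing constrains the difference of the two variance estimates to be positive, so a negative value for $\hat\sigma_g^2$ can occur in small samples.

```python
def ml_from_differences(d_mz, d_dz):
    """Return (var_mz, var_dz, sigma_g2) from lists of twin pair differences.

    Each variance is the mean squared difference (zero-mean normal ML
    estimate); sigma_g2 is the DZ estimate minus the MZ estimate.
    """
    var_mz = sum(d * d for d in d_mz) / len(d_mz)   # Z_MZ / N_M
    var_dz = sum(d * d for d in d_dz) / len(d_dz)   # Z_DZ / N_D
    return var_mz, var_dz, var_dz - var_mz
```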
3.4. Nonparametric Test Procedures in Twin Studies

If the primary consideration in a twin study is testing the hypothesis $\sigma_g^2 = 0$ against the alternative that $\sigma_g^2 > 0$, rather than estimating $\sigma_g^2$, a number of nonparametric techniques can be used.

Few authors have suggested the use of nonparametric tests for analyzing twin data, which is surprising, since most nonparametric test procedures are not difficult, and they can be used in a variety of situations. They are particularly useful when the normality assumptions are known to be false, and hence the F test given by (2.14) is not appropriate.

To see how these techniques might be used, suppose that the (absolute) twin pair difference for a particular quantitative trait is calculated for each of the $N_M + N_D$ twin pairs. These differences are then ranked in order of magnitude from 1 to $N_M + N_D$, tied scores being assigned the average of the tied ranks. Let $R_{MZ}$ and $R_{DZ}$ be the sum of the ranks for monozygotic and dizygotic twin pairs respectively.

We make Assumptions III and IV and first assume that $\sigma_g^2 = 0$. Then from (1.16) and (1.19) we see that the variance of twin pair differences is the same for the two groups, and hence the absolute pair differences have the same expectation. Alternatively, if there is a genetic effect, then the expected absolute pair difference is greater for dizygotic twins than for monozygotic twins.

Thus, the test procedure given by Mann and Whitney (1947) can be used in this situation to test the hypothesis of no genetic effect.
Their test statistic may be written

$$U = N_M N_D + \frac{N_M(N_M + 1)}{2} - R_{MZ} \tag{3.30}$$

Tables of the critical value of U are available (e.g., Siegel, 1956; Owen, 1962), and a significantly large U implies the existence of a genetic effect. A normal approximation often used when $N_M$ and $N_D$ are both large is obtained by calculating

$$Z = \frac{U - \tfrac{1}{2}N_M N_D}{\sqrt{N_M N_D (N_M + N_D + 1)/12}} \tag{3.31}$$

In large samples Z has an approximate normal distribution with mean 0 and variance 1, and hence the critical value of U can be calculated from tables of the normal distribution.
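The ranking procedure and the normal approximation can be sketched as follows. This is an illustrative implementation under stated assumptions: the function names are hypothetical, U is taken in the form $N_M N_D + N_M(N_M+1)/2 - R_{MZ}$ so that large values point to larger dizygotic differences, and the variance in the Z statistic carries no correction for ties.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        # Positions start..end (0-based) hold ranks start+1..end+1.
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def mann_whitney_twins(d_mz, d_dz):
    """Return (U, Z) computed from absolute twin pair differences."""
    n_m, n_d = len(d_mz), len(d_dz)
    ranks = average_ranks([abs(d) for d in list(d_mz) + list(d_dz)])
    r_mz = sum(ranks[:n_m])                         # rank sum for MZ pairs
    u = n_m * n_d + n_m * (n_m + 1) / 2 - r_mz
    z = (u - n_m * n_d / 2) / math.sqrt(n_m * n_d * (n_m + n_d + 1) / 12)
    return u, z
```

With two monozygotic and three dizygotic pairs in which every dizygotic difference exceeds every monozygotic one, U reaches its maximum of $N_M N_D = 6$.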
CHAPTER IV - ESTIMATION OF $\sigma_g^2$ FROM SIB DATA WHEN THE PROPORTION OF GENES IDENTICAL BY DESCENT IS KNOWN

In this chapter we derive procedures for estimating $\sigma_g^2$ using $\pi$, the proportion of genes two sibs (or dizygotic twins) have i.b.d., which will be assumed to be known. The reasoning behind this approach is similar to that of the twin method: if there is a large genetic effect, then those sibs that have a large proportion of their genes i.b.d. will be more "alike" with respect to the quantitative trait than will those sibs who have a smaller proportion of their genes i.b.d. Alternatively, if there is little or no genetic effect, the sib pair differences should be approximately the same, regardless of $\pi$.

4.1. Genetic Variance at a Single Locus

Consider the subpopulation of sib pairs that have exactly $\pi$ of their genes i.b.d. over the entire genome ($0 \le \pi \le 1$), and the special case of a two allele trait locus. Denote the two alleles at this locus by A and a, and the corresponding gene frequencies by $p$ and $q = 1 - p$. Without loss of generality we can let the genetic effect at this locus be given by

$$g = \begin{cases} \alpha & \text{if sib is } AA \\ d & \text{if sib is } Aa \\ -\alpha & \text{if sib is } aa \end{cases} \tag{4.1}$$
The genetic component of variance at this locus can be partitioned into additive and dominance components, i.e.,

$$\sigma_g^2 = \sigma_a^2 + \sigma_d^2 \tag{4.2}$$

C.C. Li (1955) shows that for the special case of two alleles with genetic effects given by (4.1), $\sigma_a^2$ and $\sigma_d^2$ may be written

$$\sigma_a^2 = 2pq\left(\alpha - d(p - q)\right)^2 \tag{4.3}$$

$$\sigma_d^2 = 4p^2q^2d^2 \tag{4.4}$$

When no dominance is present, the heterozygote has a genetic effect midway between the two homozygote values. This implies that $d = 0$ in (4.1) and hence $\sigma_d^2 = 0$. Thus, the genetic variance at this locus for the special case of no dominance reduces to

$$\sigma_g^2 = \sigma_a^2 = 2pq\alpha^2 \tag{4.5}$$
4.2 The Sib Pair Probability Tables for Two Alleles

We now derive the sib pair probability tables for the subpopulation of sib pairs having $\pi$ of their genes i.b.d. at a two-allele locus. We begin by considering the general mating $A_1A_2 \times A_3A_4$. The probability of every sib pair that can result from this mating, conditional on $\pi$, is given in Table 4.1 below.
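TABLE 4.1
SIB PAIR PROBABILITIES FOR AN $A_1A_2 \times A_3A_4$ MATING CONDITIONAL ON $\pi$

                                          Sib I
Sib II      $A_1A_3$               $A_2A_3$               $A_1A_4$               $A_2A_4$
$A_1A_3$    $\frac14\pi^2$         $\frac14\pi(1-\pi)$    $\frac14\pi(1-\pi)$    $\frac14(1-\pi)^2$
$A_2A_3$    $\frac14\pi(1-\pi)$    $\frac14\pi^2$         $\frac14(1-\pi)^2$     $\frac14\pi(1-\pi)$
$A_1A_4$    $\frac14\pi(1-\pi)$    $\frac14(1-\pi)^2$     $\frac14\pi^2$         $\frac14\pi(1-\pi)$
$A_2A_4$    $\frac14(1-\pi)^2$     $\frac14\pi(1-\pi)$    $\frac14\pi(1-\pi)$    $\frac14\pi^2$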
Table 4.1 can be thought of as a 4 x 4 matrix M with cell ij, where i = 1, 2, 3, 4 and j = 1, 2, 3, 4. Sibs falling into cells 11, 22, 33, or 44 have both genes i.b.d., while those in cells 14, 23, 32, or 41 have no genes i.b.d. Sib pairs in the remaining eight cells have one gene i.b.d.

For example, consider cell 11. The probability that the first sib is $A_1A_3$ is $\frac14$. Moreover, since the two sibs have $\pi$ of their genes i.b.d., the conditional probability that the second sib will also be $A_1A_3$ is $\pi^2$. Hence, the probability of cell 11 is $\frac14\pi^2$. Similarly, the remaining elements in Table 4.1 can be derived.
Any particular mating and resulting sib pair can be handled as a special case of Table 4.1. In this chapter we are primarily concerned with the special case of two alleles, A and a. For example, consider the mating AA x Aa ($A_1 = A_2 = A_3 = A$ and $A_4 = a$). From Table 4.1 we see that sib pairs from this mating that are AA-AA must fall into cells 11, 12, 21, or 22. Hence,

$$ \Pr(\text{both sibs AA} \mid \text{AA} \times \text{Aa mating and } \pi) = 2(\tfrac14\pi^2) + 2(\tfrac14\pi(1-\pi)) = \tfrac12\pi $$

Similarly, the remaining elements of Table 4.2 can be derived.
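The cell-summing argument can be checked mechanically. The following is a minimal sketch, not part of the original text: the function names `sib_matrix` and `cell_sum` are hypothetical, and the genotype order A1A3, A2A3, A1A4, A2A4 is the one implied by the cell numbers quoted for the AA x Aa example.

```python
from fractions import Fraction

def sib_matrix(pi):
    """Return the 4x4 matrix M of Table 4.1 for an A1A2 x A3A4 mating.

    Row and column order of the sib genotypes: A1A3, A2A3, A1A4, A2A4
    (an assumption inferred from the cell numbers used in the text).
    """
    # (allele from parent 1, allele from parent 2) for each genotype index.
    genotypes = [(1, 3), (2, 3), (1, 4), (2, 4)]
    quarter = Fraction(1, 4)
    m = []
    for g1 in genotypes:
        row = []
        for g2 in genotypes:
            shared = sum(a == b for a, b in zip(g1, g2))  # genes i.b.d.
            row.append(quarter * pi**shared * (1 - pi)**(2 - shared))
        m.append(row)
    return m

def cell_sum(m, cells):
    """Sum the entries of M named by strings such as "12" (1-indexed)."""
    return sum(m[int(c[0]) - 1][int(c[1]) - 1] for c in cells)

pi = Fraction(1, 3)
M = sib_matrix(pi)
total = sum(sum(row) for row in M)               # all sixteen cells
aa_aa = cell_sum(M, ["11", "12", "21", "22"])    # AA-AA from an AA x Aa mating
print(total, aa_aa)
```

Exact fractions are used so that the identity "cells 11, 12, 21 and 22 sum to one half of pi" can be compared without rounding.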
TABLE 4.2
PROBABILITIES OF SIB PAIRS FOR A TWO-ALLELE LOCUS
CONDITIONAL ON PARENTAL MATING AND $\pi$

Mating    Probability   Sib pair   L*                           Cells in M                 $L_{ij}$
          of mating
AA x AA   $p^4$         AA-AA      1                            all 16                     $p^4$
aa x aa   $q^4$         aa-aa      1                            all 16                     $q^4$
AA x aa   $2p^2q^2$     Aa-Aa      1                            all 16                     $2p^2q^2$
AA x Aa   $4p^3q$       AA-AA      $\frac12\pi$                 11-12-21-22                $2p^3q\pi$
                        AA-Aa      $1-\pi$                      13-14-23-24-31-32-41-42    $4p^3q(1-\pi)$
                        Aa-Aa      $\frac12\pi$                 33-34-43-44                $2p^3q\pi$
Aa x aa   $4pq^3$       Aa-Aa      $\frac12\pi$                 11-12-21-22                $2pq^3\pi$
                        Aa-aa      $1-\pi$                      13-14-23-24-31-32-41-42    $4pq^3(1-\pi)$
                        aa-aa      $\frac12\pi$                 33-34-43-44                $2pq^3\pi$
Aa x Aa   $4p^2q^2$     AA-AA      $\frac14\pi^2$               11                         $p^2q^2\pi^2$
                        aa-aa      $\frac14\pi^2$               44                         $p^2q^2\pi^2$
                        AA-aa      $\frac12(1-\pi)^2$           14-41                      $2p^2q^2(1-\pi)^2$
                        AA-Aa      $\pi(1-\pi)$                 12-13-21-31                $4p^2q^2\pi(1-\pi)$
                        Aa-aa      $\pi(1-\pi)$                 24-34-42-43                $4p^2q^2\pi(1-\pi)$
                        Aa-Aa      $\frac12(1-2\pi+2\pi^2)$     22-23-32-33                $2p^2q^2(1-2\pi+2\pi^2)$
In Table 4.2 L* is the probability of the sib pair conditional on the mating and $\pi$; it is obtained by summing the probabilities of the indicated cells in Matrix M (Table 4.1). $L_{ij}$ is defined as

$$ L_{ij} = \Pr(\text{sib pair } j \text{ and mating } i \mid \pi) \qquad (4.6) $$

and is obtained by multiplying L* by the probability of mating. Note from Table 4.2 that for two alleles there are just six possible matings and sib pairs. The corresponding table for an m-allele locus will be given in Chapter V (Table 5.1).

The sib pair probabilities conditional on $\pi$ alone can be obtained from Table 4.2 by summing $L_{ij}$ over all matings. For example,

$$ \Pr(\text{sibs AA-AA} \mid \pi) = \sum_{i=1}^{6} L_{ij} = p^4 + 2p^3q\pi + p^2q^2\pi^2 = p^2(q\pi + p)^2 = p^2(\pi + (1-\pi)p)^2 $$

Similarly, the remaining elements of Table 4.3 can be derived. If $\pi = \frac12$, Table 4.3 reduces to the usual table of sib pair frequencies in a random mating population (e.g., Table 9 of C.C. Li, 1955).
TABLE 4.3
PROBABILITIES OF SIB PAIRS FOR A TWO-ALLELE
LOCUS CONDITIONAL ON $\pi$

Sib Pair    Probability
AA-AA       $p^2(\pi + (1-\pi)p)^2$
aa-aa       $q^2(\pi + (1-\pi)q)^2$
AA-aa       $2p^2q^2(1-\pi)^2$
AA-Aa       $4p^2q(1-\pi)(\pi + (1-\pi)p)$
Aa-aa       $4pq^2(1-\pi)(\pi + (1-\pi)q)$
Aa-Aa       $2pq(\pi + 2(1-\pi)^2pq)$
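The entries of Table 4.3 can be reproduced by brute force. The sketch below is illustrative only: the helper name `sib_pair_probs` is hypothetical, and it assumes (as Table 4.1 does) that the four parental genes are drawn independently with frequencies p and q and that each parent transmits the same gene to both sibs with probability pi.

```python
from fractions import Fraction
from itertools import product

def sib_pair_probs(p, pi):
    """Pr(unordered sib genotype pair | pi) at a two-allele locus, by enumeration."""
    freq = {"A": p, "a": 1 - p}
    # Parental gene indices (0,1 = parent 1; 2,3 = parent 2) carried by each
    # of the four possible sib genotypes of Table 4.1.
    cells = [(0, 2), (1, 2), (0, 3), (1, 3)]
    out = {}
    for alleles in product("Aa", repeat=4):
        mating = freq[alleles[0]] * freq[alleles[1]] * freq[alleles[2]] * freq[alleles[3]]
        for c1 in cells:
            for c2 in cells:
                shared = (c1[0] == c2[0]) + (c1[1] == c2[1])   # genes i.b.d.
                cell = Fraction(1, 4) * pi**shared * (1 - pi)**(2 - shared)
                g1 = "".join(sorted(alleles[i] for i in c1))
                g2 = "".join(sorted(alleles[i] for i in c2))
                key = "-".join(sorted([g1, g2]))
                out[key] = out.get(key, 0) + mating * cell
    return out

p, pi = Fraction(3, 10), Fraction(2, 5)
q = 1 - p
probs = sib_pair_probs(p, pi)
print(probs["AA-AA"], p**2 * (pi + (1 - pi) * p)**2)
```

Summing over all sixteen ordered assignments of alleles to the parental genes plays the role of summing $L_{ij}$ over the six matings of Table 4.2.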
4.3 Genetic Covariance at a Single Locus for Sib Pairs
Table 4.3 can be used to derive the genetic covariance at a single locus for sib pairs. Let $g_1$ and $g_2$ denote the genetic effects at a locus for two sibs. Then we have

$$
\begin{aligned}
\sigma_{gg'} &= E(g_1g_2) - E(g_1)E(g_2) \\
&= a^2[p^2(\pi+(1-\pi)p)^2 + q^2(\pi+(1-\pi)q)^2 - 2p^2q^2(1-\pi)^2] \\
&\quad + ad[4p^2q(1-\pi)(\pi+(1-\pi)p) - 4pq^2(1-\pi)(\pi+(1-\pi)q)] \\
&\quad + d^2[2pq(\pi+2(1-\pi)^2pq)] - [a(p^2-q^2)+2pqd]^2 \\
&= a^2[p^2(q\pi+p)^2 + q^2(q+p\pi)^2 - 2p^2q^2(1-\pi)^2 - (p-q)^2] \\
&\quad + ad[4pq(1-\pi)(p\pi+p^2(1-\pi)-q\pi-q^2(1-\pi)) - 4pq(p-q)] \\
&\quad + d^2[2pq(\pi + 2pq - 4\pi pq + 2\pi^2pq - 2pq)] \\
&= a^2[\pi^2(p^2q^2+p^2q^2-2p^2q^2) + \pi(2pq(p^2+q^2+2pq)) + (p^2-q^2)^2] + a^2(-(p-q)^2) \\
&\quad + 4pq\,ad[(1-\pi)(\pi(p-q) + (1-\pi)(p^2-q^2)) - (p-q)] \\
&\quad + 2pqd^2[\pi - 4\pi pq + 2\pi^2pq] \\
&= a^2[\pi(2pq) + (p-q)^2(p+q)^2 - (p-q)^2] \\
&\quad + 4pq\,ad[(1-\pi)(\pi(p-q) + (1-\pi)(p-q)) - (p-q)] \\
&\quad + 2pqd^2\pi[(p+q)^2 - 4pq + 2pq\pi] \\
&= 2pqa^2\pi + 4pq\,ad[-\pi(p-q)] + 2pqd^2\pi[(p-q)^2 + 2pq\pi] \\
&= \pi[2pq(a-d(p-q))^2] + \pi^2(4p^2q^2d^2) = \pi\sigma_a^2 + \pi^2\sigma_d^2 \qquad (4.7)
\end{aligned}
$$

Suppose that an individual's total genetic effect for a particular quantitative trait is due to the additive effect of n such loci. Because of linkage equilibrium (if we assume no epistasis) (4.7) will still hold, where now instead of (4.3) and (4.4) we have

$$ \sigma_a^2 = 2\sum_{i=1}^{n} p_iq_i\,(a_i - d_i(p_i-q_i))^2 \qquad (4.8) $$

and

$$ \sigma_d^2 = 4\sum_{i=1}^{n} p_i^2q_i^2d_i^2 \qquad (4.9) $$

where $p_i$ and $q_i = 1 - p_i$ are the gene frequencies of the two alleles at the i-th locus and $a_i$ and $d_i$ are the genetic effects at this locus corresponding to a and d for the single-locus case. Note that $\sigma_a^2$ and $\sigma_d^2$ are simply the sum over n loci of the additive and dominance variances respectively at each locus.
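Identity (4.7) can be spot-checked with exact arithmetic. This is only a sketch under stated assumptions: the function `sib_covariance` is a hypothetical name, it takes the Table 4.3 probabilities as given, and the numerical values of p, pi, a and d are arbitrary.

```python
from fractions import Fraction

def sib_covariance(p, pi, a, d):
    """Genetic covariance of a sib pair computed directly from Table 4.3."""
    q = 1 - p

    def w(x):
        return pi + (1 - pi) * x

    pair_prob = {
        ("AA", "AA"): p**2 * w(p)**2,
        ("aa", "aa"): q**2 * w(q)**2,
        ("AA", "aa"): 2 * p**2 * q**2 * (1 - pi)**2,
        ("AA", "Aa"): 4 * p**2 * q * (1 - pi) * w(p),
        ("Aa", "aa"): 4 * p * q**2 * (1 - pi) * w(q),
        ("Aa", "Aa"): 2 * p * q * (pi + 2 * (1 - pi)**2 * p * q),
    }
    effect = {"AA": a, "Aa": d, "aa": -a}          # genetic effects of (4.1)
    e_product = sum(pr * effect[g1] * effect[g2] for (g1, g2), pr in pair_prob.items())
    mean = a * (p**2 - q**2) + 2 * p * q * d       # E(g) for a single sib
    return e_product - mean**2

p, pi = Fraction(3, 10), Fraction(2, 5)
a, d = Fraction(3), Fraction(1, 2)
q = 1 - p
var_a = 2 * p * q * (a - d * (p - q))**2           # (4.3)
var_d = 4 * p**2 * q**2 * d**2                     # (4.4)
cov = sib_covariance(p, pi, a, d)
print(cov, pi * var_a + pi**2 * var_d)
```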
Finally, the results of this section can be generalized to an m-allele locus. Fisher (1918), using a different approach, proved (4.7) for an m-allele locus in sib pairs for the special case of $\pi = \frac12$. His method can be used to obtain this result for the more general case of $\pi \ne \frac12$ as well.

4.4 Estimation of Genetic Variance by Regression Analysis

4.4.1 Assuming no Dominance. Suppose we have n sib pairs and observe trait values $x_{1j}$ and $x_{2j}$ for individuals in the jth pair. We assume that $\pi_j$, the proportion of genes i.b.d. for pair j, is known and that $C_{FS}$, $\sigma_{ge}$ and $\sigma_{ge}^*$ do not depend upon $\pi_j$ (note that we do not require that these parameters be zero). We also, initially at least, assume no dominance or epistasis. Thus, $\sigma_g^2 = \sigma_a^2$.

Consider the simple linear regression of the squared pair differences on the proportion of genes i.b.d., i.e.,

$$ E(Y_j) = \alpha + \beta\pi_j, \qquad j = 1, 2, \ldots, n \qquad (4.10) $$

where $Y_j = (x_{1j} - x_{2j})^2$ is the squared pair difference for sib pair j. Let $\hat\alpha$ and $\hat\beta$ be the usual unweighted least squares estimators of $\alpha$ and $\beta$ respectively. We now prove that $-\frac12\hat\beta$ is an unbiased estimator of $\sigma_g^2$.

Proof:

$$ E(Y_j) = E(g_{1j} + e_{1j} - g_{2j} - e_{2j})^2 = (2\sigma_g^2 + 2\sigma_e^2 + 4\sigma_{ge} - 4\sigma_{ge}^* - 2C_{FS}) - 2\pi_j\sigma_g^2 \qquad (4.11) $$

From (4.10) and (4.11) it follows that

$$ \beta = -2\sigma_g^2 \qquad (4.12) $$

From (4.12) we conclude that $\hat\beta$ is an unbiased estimator of $-2\sigma_g^2$, and hence $-\frac12\hat\beta$ is an unbiased estimator of $\sigma_g^2$.
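The regression estimator can be sketched in a few lines. This is an illustration, not the author's program: `regression_estimate` is a hypothetical name, and the data are artificial values constructed so that the squared differences fall exactly on the line 10 - 4*pi, which corresponds to a genetic variance of 2.

```python
import math

def regression_estimate(pi_values, x1, x2):
    """Unweighted least squares fit of Y_j = (x1j - x2j)^2 on pi_j.

    Returns (alpha_hat, beta_hat, sigma_g2_hat), where
    sigma_g2_hat = -beta_hat / 2 as in (4.12).
    """
    n = len(pi_values)
    y = [(u - v) ** 2 for u, v in zip(x1, x2)]
    mean_pi = sum(pi_values) / n
    mean_y = sum(y) / n
    sxx = sum((t - mean_pi) ** 2 for t in pi_values)
    sxy = sum((t - mean_pi) * (v - mean_y) for t, v in zip(pi_values, y))
    beta_hat = sxy / sxx
    alpha_hat = mean_y - beta_hat * mean_pi
    return alpha_hat, beta_hat, -0.5 * beta_hat

# Artificial data (not from the thesis): squared differences equal 10 - 4*pi.
pi_values = [0.0, 0.25, 0.5, 0.75, 1.0]
x1 = [math.sqrt(10.0 - 4.0 * t) for t in pi_values]
x2 = [0.0] * len(pi_values)
alpha_hat, beta_hat, sigma_g2_hat = regression_estimate(pi_values, x1, x2)
print(alpha_hat, beta_hat, sigma_g2_hat)
```

With real data the squared differences scatter widely about the line, so the estimate can be negative in small samples.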
This regression procedure may also be applied to twin data. We know that $\pi_j = 1$ for all monozygotic twin pairs, and if it is assumed that $\pi_j = \frac12$ for all dizygotic twin pairs, then it is not difficult to show that the regression estimator of $\sigma_g^2$ described above ($-\frac12\hat\beta$) reduces to (3.14). The proof of this is given in Appendix I.

Note that when twins are used, one assumption we require in order for $-\frac12\hat\beta$ to be an unbiased estimator of $\sigma_g^2$ is $C_{MZ} = C_{DZ}$, i.e., the environmental covariance must not depend upon the proportion of genes i.b.d. This is less likely to be valid for twins than for non-twin sibs, since monozygotic twins are generally treated more "alike" (dressed alike, etc.) than are dizygotic twins. Hence the environmental forces influencing a particular quantitative trait may likewise be more "alike" for monozygotic twins than for dizygotic twins. On the other hand, there is little evidence to suggest that the proportion of genes i.b.d. influences the environmental forces that affect non-twin sibs.
4.4.2 Allowing for Dominance. The regression analysis of the previous section can be modified slightly to allow for dominance. In this situation we assume the more general underlying model

$$ E(Y_j) = \alpha + \beta\pi_j + \gamma\pi_j^2, \qquad j = 1, 2, \ldots, n \qquad (4.13) $$

where $Y_j = (x_{1j} - x_{2j})^2$. Using (4.7) we have

$$ E(Y_j) = E(g_{1j} + e_{1j} - g_{2j} - e_{2j})^2 = 2\sigma_a^2 + 2\sigma_d^2 + 2\sigma_e^2 + 4\sigma_{ge} - 4\sigma_{ge}^* - 2C_{FS} - 2\pi_j\sigma_a^2 - 2\pi_j^2\sigma_d^2 \qquad (4.14) $$

and from (4.13) and (4.14) we have

$$ \beta = -2\sigma_a^2 \qquad (4.15) $$

$$ \gamma = -2\sigma_d^2 \qquad (4.16) $$

Denote by $\hat\beta$ and $\hat\gamma$ the least squares estimators of $\beta$ and $\gamma$ respectively. From (4.13), (4.15) and (4.16) we see that $-\frac12\hat\beta$ and $-\frac12\hat\gamma$ are unbiased estimators of $\sigma_a^2$ and $\sigma_d^2$ respectively.

4.5 Weighted Least Squares Estimation of $\sigma_g^2$
The analysis above does not require the assumption of normality in order to obtain unbiased estimators of $\sigma_a^2$ and $\sigma_d^2$. However, if we do assume that

$$ x_{1j} - x_{2j} \sim N(0, \sigma_j^2(FS)) \qquad (4.17) $$

then weighted least squares estimates of $\alpha$, $\beta$ and $\gamma$ can be found by an iterative procedure similar to the one in Chapter III. From (4.14)

$$ E(Y_j) = (2\sigma_e^2 + 4\sigma_{ge} - 4\sigma_{ge}^* - 2C_{FS}) + 2(1-\pi_j)\sigma_a^2 + 2(1-\pi_j^2)\sigma_d^2 = \sigma_j^2(FS) \qquad (4.18) $$

From properties of the chi-square distribution we have

$$ \mathrm{Var}(Y_j) = 2\sigma_j^4(FS) \qquad (4.19) $$

Hence, if $\pi_j$ is known, weighted least squares estimators of $\alpha$, $\beta$ and $\gamma$ are given by (3.9) where

$$
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \qquad
X = \begin{pmatrix} 1 & \pi_1 & \pi_1^2 \\ 1 & \pi_2 & \pi_2^2 \\ \vdots & \vdots & \vdots \\ 1 & \pi_n & \pi_n^2 \end{pmatrix}, \qquad
V = \begin{pmatrix} 2\sigma_1^4(FS) & 0 & \cdots & 0 \\ 0 & 2\sigma_2^4(FS) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 2\sigma_n^4(FS) \end{pmatrix} \qquad (4.20)
$$

Since $\sigma_j^2(FS)$ is a function of $\alpha$, $\beta$ and $\gamma$, the iterative procedure described in Chapter III must be used in order to find the weighted least squares estimators. When convergence is achieved, $\hat\sigma_a^2 = -\frac12\hat\beta$ and $\hat\sigma_d^2 = -\frac12\hat\gamma$. The estimated standard error of $\hat\sigma_a^2$ and $\hat\sigma_d^2$ can be calculated by (3.13) where the variance-covariance matrix V is determined by the final weighted least squares solution. Then, for example, if there is no additive genetic variance, the ratio of $\hat\sigma_a^2$ to its estimated standard error has an approximate normal distribution with mean zero and variance one. Hence, this ratio can be used as an approximate test of the hypothesis that $\sigma_a^2 = 0$ against the alternative that $\sigma_a^2 > 0$.
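The iteration can be illustrated for the simpler no-dominance model. The sketch below is an assumption-laden stand-in for the Chapter III procedure, which is not reproduced in this section: the names `weighted_line` and `iterate_wls`, the fixed number of iterations, and the floor that keeps fitted variances positive are all choices made here for illustration.

```python
def weighted_line(pi_values, y, weights):
    """Weighted least squares fit of y = alpha + beta * pi."""
    sw = sum(weights)
    mean_pi = sum(w * t for w, t in zip(weights, pi_values)) / sw
    mean_y = sum(w * v for w, v in zip(weights, y)) / sw
    sxx = sum(w * (t - mean_pi) ** 2 for w, t in zip(weights, pi_values))
    sxy = sum(w * (t - mean_pi) * (v - mean_y)
              for w, t, v in zip(weights, pi_values, y))
    beta = sxy / sxx
    return mean_y - beta * mean_pi, beta

def iterate_wls(pi_values, y, n_iter=20):
    """Re-estimate the weights 1 / (2 * sigma_j^4) from the current fit."""
    alpha, beta = weighted_line(pi_values, y, [1.0] * len(y))  # unweighted start
    for _ in range(n_iter):
        # Fitted E(Y_j) plays the role of sigma_j^2(FS); floor it to stay positive.
        fitted = [max(alpha + beta * t, 1e-12) for t in pi_values]
        weights = [1.0 / (2.0 * f * f) for f in fitted]
        alpha, beta = weighted_line(pi_values, y, weights)
    return alpha, beta

# Artificial squared differences lying exactly on 10 - 4*pi.
pi_values = [0.0, 0.25, 0.5, 0.75, 1.0]
y = [10.0 - 4.0 * t for t in pi_values]
alpha_w, beta_w = iterate_wls(pi_values, y)
print(alpha_w, beta_w, -0.5 * beta_w)
```

Because the artificial data have no scatter, the weights do not change the answer here; with real data the reweighting gives less influence to pairs whose squared difference has the larger variance.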
4.6 Maximum Likelihood Estimation of $\sigma_g^2$

In this section we describe the maximum likelihood procedures for estimating the genetic variance, assuming no dominance. It will be apparent how the methods can be modified slightly to permit estimation of $\sigma_a^2$ and $\sigma_d^2$ in the more general case when dominance is present. These two procedures are similar to those described in Section 3.3 for twin pairs, the first being based upon the individual observations and the second upon the sib pair differences.

If ML estimation is based upon the individual observations, then the model is given by (3.20), where V for the jth sib pair is given by

$$
V_j = \begin{pmatrix}
\sigma_g^2 + \sigma_e^2 + 2\sigma_{ge} & C_{FS} + 2\sigma_{ge}^* + \pi_j\sigma_g^2 \\
C_{FS} + 2\sigma_{ge}^* + \pi_j\sigma_g^2 & \sigma_g^2 + \sigma_e^2 + 2\sigma_{ge}
\end{pmatrix} \qquad (4.21)
$$

The log likelihood for this case (apart from a constant term) may be written

$$ \log L = -\tfrac12\sum_{j=1}^{n}\log|V_j| - \tfrac12\sum_{j=1}^{n}(x_j - \mu)'\,V_j^{-1}(x_j - \mu) \qquad (4.22) $$
If sib differences are used, then the model is given by (4.17) and using (4.18) the log likelihood may be written

$$ \log L = -\tfrac12\sum_{j=1}^{n}\log[C^* + 2(1-\pi_j)\sigma_g^2] - \tfrac12\sum_{j=1}^{n}\frac{(x_{1j} - x_{2j})^2}{C^* + 2(1-\pi_j)\sigma_g^2} \qquad (4.23) $$

Note that the first method requires ML estimation of four parameters ($\mu$, $C_1$, $C_2$ and $\sigma_g^2$) while the method using sib pair differences requires ML estimation of only two ($C^*$ and $\sigma_g^2$). ML estimates of these parameters can be found by the procedures mentioned in Section 3.3.
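The difference-based log likelihood (4.23) is simple enough to evaluate directly. The sketch below uses a coarse grid search as a stand-in for the numerical procedures of Section 3.3, which are not shown in this section; the names `log_likelihood` and `grid_mle`, the grids, and the artificial data are assumptions made for illustration.

```python
import math

def log_likelihood(c_star, var_g, pi_values, diffs):
    """Log likelihood (4.23), apart from a constant, for sib pair differences."""
    total = 0.0
    for t, d in zip(pi_values, diffs):
        v = c_star + 2.0 * (1.0 - t) * var_g   # variance of the pair difference
        total += -0.5 * math.log(v) - 0.5 * d * d / v
    return total

def grid_mle(pi_values, diffs, c_grid, g_grid):
    """Return the (C*, sigma_g^2) grid point with the largest log likelihood."""
    best = max((log_likelihood(c, g, pi_values, diffs), c, g)
               for c in c_grid for g in g_grid)
    return best[1], best[2]

# Artificial differences whose squares equal the model variance for C* = 1
# and sigma_g^2 = 2.
pi_values = [0.0, 0.5, 1.0]
diffs = [math.sqrt(1.0 + 2.0 * (1.0 - t) * 2.0) for t in pi_values]
c_hat, g_hat = grid_mle(pi_values, diffs, [0.5, 1.0, 1.5], [1.0, 2.0, 3.0])
print(c_hat, g_hat)
```

In practice a derivative-based optimiser would replace the grid; the grid is used here only because it is transparent.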
4.7 Detecting $\sigma_g^2$ by Nonparametric Test Procedures

If we wish only to detect rather than estimate $\sigma_g^2$, several nonparametric test procedures can be used. Two statistics based on rank correlation, Spearman's Rho and Kendall's Tau, are particularly well suited for use with sibs when $\pi_j$ is known. For a full discussion of these statistics see Kendall (1955) or Siegel (1956).
4.7.1 Spearman's Rho. Suppose we know $\pi_j$ for each of n pairs of sibs and for each pair we calculate the absolute pair difference $|x_{1j} - x_{2j}|$ for a particular quantitative trait. If there is no genetic effect, then the absolute pair differences should be independent of $\pi_j$. On the other hand, if there is a genetic effect, the absolute pair differences should be smaller for those sib pairs having a large proportion of genes i.b.d. than for those pairs having a small $\pi_j$ value. We proceed as follows:

First the $\pi_j$ and $|x_{1j} - x_{2j}|$ are separately ranked in order of magnitude from 1 to n, tied scores being assigned the average of the tied ranks. Let $R_j$ and $R_j^*$ be the rank given sib pair j for $\pi_j$ and $|x_{1j} - x_{2j}|$ respectively. Then Spearman's Rho may be written

$$ r_s = \frac{\sum R_jR_j^* - \frac1n\sum R_j\sum R_j^*}{\left\{\left[\sum R_j^2 - \frac1n(\sum R_j)^2\right]\left[\sum R_j^{*2} - \frac1n(\sum R_j^*)^2\right]\right\}^{1/2}} \qquad (4.24) $$

If no ties are present, $r_s$ can be written more simply as

$$ r_s = 1 - \frac{6\sum(R_j - R_j^*)^2}{n^3 - n} \qquad (4.25) $$

A significantly large $r_s$ indicates a significant genetic effect. Tables of critical values of $r_s$ are available (e.g., Siegel, 1956) for small values of n. For large values of n the statistic

$$ t = r_s\left[\frac{n-2}{1 - r_s^2}\right]^{1/2} \qquad (4.26) $$

has an approximate Student's t distribution with n-2 degrees of freedom and hence critical values of $r_s$ can be obtained directly from tables of the Student's t distribution.
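A short sketch of the rank correlation follows. It computes the product-moment correlation of average ranks, which is the usual tie-corrected form of Spearman's Rho; the helper names and the small data sets are hypothetical, and the sign convention is simply that of the correlation between the two sets of ranks.

```python
import math

def average_ranks(values):
    """Ranks 1..n in order of magnitude; ties receive the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Correlation of the average ranks of x and y (tie-corrected form)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sx, sy = sum(rx), sum(ry)
    num = sum(a * b for a, b in zip(rx, ry)) - sx * sy / n
    den = math.sqrt((sum(a * a for a in rx) - sx * sx / n)
                    * (sum(b * b for b in ry) - sy * sy / n))
    return num / den

def t_statistic(r_s, n):
    """Large-sample t statistic with n - 2 degrees of freedom."""
    return r_s * math.sqrt((n - 2) / (1.0 - r_s * r_s))

# Hypothetical ranks-only example without ties.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(rho, t_statistic(rho, 5))
```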
4.7.2 Kendall's Tau. An alternative nonparametric test statistic that can be used to test $\sigma_g^2 = 0$ against the alternative that $\sigma_g^2 > 0$ is Kendall's Tau. We use the notation of the previous section and rank $\pi_j$ and $|x_{1j} - x_{2j}|$ as is done above. We define

$$ S_{ij} = \begin{cases} 1 & \text{if } (R_i - R_j)(R_i^* - R_j^*) > 0 \\ 0 & \text{if } (R_i - R_j)(R_i^* - R_j^*) = 0 \\ -1 & \text{if } (R_i - R_j)(R_i^* - R_j^*) < 0 \end{cases} \qquad (4.27) $$

and

$$ S = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} S_{ij} \qquad (4.28) $$

Note that the summation is over the $\binom{n}{2}$ unique ways of selecting two of n sib pairs. We also define

$$ T = \tfrac12\sum t(t-1) \quad\text{and}\quad T^* = \tfrac12\sum t^*(t^*-1) \qquad (4.29) $$

where t and t* are the number of tied observations at a given rank for $\pi_j$ and $|x_{1j} - x_{2j}|$ respectively. Then Kendall's Tau may be written

$$ r_k = \frac{S}{[\tfrac12 n(n-1) - T]^{1/2}\,[\tfrac12 n(n-1) - T^*]^{1/2}} \qquad (4.30) $$

Tables of critical values of $r_k$ are available for small n (e.g., Siegel, 1956), and if n is large

$$ z = r_k\left[\frac{9n(n-1)}{2(2n+5)}\right]^{1/2} \qquad (4.31) $$

is distributed approximately normally with mean zero and variance one and hence critical values of $r_k$ can be found from tables of the normal distribution. A significantly large $r_k$ indicates that $\sigma_g^2 > 0$.
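Equations (4.27) to (4.31) translate directly into code. The sketch below is illustrative: the function name `kendall_tau` is hypothetical, and it works on the raw values rather than the ranks, which gives the same signs because ranking preserves order and ties.

```python
import math
from collections import Counter

def kendall_tau(x, y):
    """Return (S, r_k, z) following (4.28), (4.30) and (4.31)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)       # S_ij of (4.27)
    # Tie corrections T and T* of (4.29): t = size of each group of ties.
    t_x = sum(t * (t - 1) for t in Counter(x).values()) / 2.0
    t_y = sum(t * (t - 1) for t in Counter(y).values()) / 2.0
    pairs = n * (n - 1) / 2.0
    r_k = s / math.sqrt((pairs - t_x) * (pairs - t_y))
    z = r_k * math.sqrt(9.0 * n * (n - 1) / (2.0 * (2 * n + 5)))
    return s, r_k, z

s_tied, r_tied, z_tied = kendall_tau([1, 1, 2], [1, 2, 3])
print(s_tied, r_tied, z_tied)
```

The double loop is quadratic in the number of sib pairs, which is adequate for samples of the size discussed here.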
CHAPTER V - MAXIMUM LIKELIHOOD ESTIMATION OF THE PROPORTION
OF GENES IDENTICAL BY DESCENT IN SIB PAIRS

In Chapter IV a number of procedures were described for estimating $\sigma_g^2$ and testing the hypothesis that $\sigma_g^2 = 0$ against the alternative that $\sigma_g^2 > 0$ when $\pi$ is known for all sib pairs. However, in general $\pi$ is unknown and must be estimated for each sib pair. In this chapter we derive the maximum likelihood estimator of $\pi$ when the estimation procedure is based upon k marker genes.

Ideally, these markers should be mutually unlinked and sufficient in number to give good coverage of the entire genome. In practice the markers will not satisfy this condition exactly, although they may well do so approximately. Consequently, if the analyses of the previous chapter are performed with $\pi_j$ replaced by its maximum likelihood estimate $\hat\pi_j$, then only that portion of the genetic variance linked to the markers can be detected. However, it is not unreasonable to suppose that the number of markers will soon be sufficient to permit detection of virtually all of $\sigma_g^2$, as rapid progress is being made in this area (Renwick, 1969).

5.1 Case A: Both Parental Genotypes Known

Table 5.1 below is a generalization of Table 4.2 and gives the probability of all possible matings and sib pairs for an m-allele locus, conditional on $\pi$, when both parental genotypes are known.
TABLE 5.1 (continued)

Number   Mating type              Probability        Sib pair              L*                           Cells in M     $L_{ij}$
                                  of mating
Ve                                                   $A_iA_j$-$A_jA_j$     $\pi(1-\pi)$                 24-34-42-43    $4p_i^2p_j^2\pi(1-\pi)$
Vf                                                   $A_iA_j$-$A_iA_j$     $\frac12(1-2\pi+2\pi^2)$     22-23-32-33    $2p_i^2p_j^2(1-2\pi+2\pi^2)$
VIa      $A_iA_j \times A_iA_k$   $8p_i^2p_jp_k$     $A_iA_i$-$A_iA_i$     $\frac14\pi^2$               11             $2p_i^2p_jp_k\pi^2$
VIb                                                  $A_iA_i$-$A_iA_j$     $\frac12\pi(1-\pi)$          12-21          $4p_i^2p_jp_k\pi(1-\pi)$
VIc                                                  $A_iA_i$-$A_iA_k$     $\frac12\pi(1-\pi)$          13-31          $4p_i^2p_jp_k\pi(1-\pi)$
VId                                                  $A_iA_i$-$A_jA_k$     $\frac12(1-\pi)^2$           14-41          $4p_i^2p_jp_k(1-\pi)^2$
VIe                                                  $A_iA_j$-$A_iA_j$     $\frac14\pi^2$               22             $2p_i^2p_jp_k\pi^2$
VIf                                                  $A_iA_k$-$A_iA_k$     $\frac14\pi^2$               33             $2p_i^2p_jp_k\pi^2$
VIg                                                  $A_jA_k$-$A_jA_k$     $\frac14\pi^2$               44             $2p_i^2p_jp_k\pi^2$
VIh                                                  $A_iA_j$-$A_iA_k$     $\frac12(1-\pi)^2$           23-32          $4p_i^2p_jp_k(1-\pi)^2$
VIi                                                  $A_iA_j$-$A_jA_k$     $\frac12\pi(1-\pi)$          24-42          $4p_i^2p_jp_k\pi(1-\pi)$
VIj                                                  $A_iA_k$-$A_jA_k$     $\frac12\pi(1-\pi)$          34-43          $4p_i^2p_jp_k\pi(1-\pi)$
VIIa     $A_iA_j \times A_kA_l$   $8p_ip_jp_kp_l$    $A_iA_k$-$A_iA_k$     $\frac14\pi^2$               11             $2p_ip_jp_kp_l\pi^2$
VIIb                                                 $A_jA_k$-$A_jA_k$     $\frac14\pi^2$               22             $2p_ip_jp_kp_l\pi^2$
VIIc                                                 $A_iA_l$-$A_iA_l$     $\frac14\pi^2$               33             $2p_ip_jp_kp_l\pi^2$
VIId                                                 $A_jA_l$-$A_jA_l$     $\frac14\pi^2$               44             $2p_ip_jp_kp_l\pi^2$
VIIe                                                 $A_iA_k$-$A_iA_l$     $\frac12\pi(1-\pi)$          13-31          $4p_ip_jp_kp_l\pi(1-\pi)$
VIIf                                                 $A_iA_k$-$A_jA_k$     $\frac12\pi(1-\pi)$          12-21          $4p_ip_jp_kp_l\pi(1-\pi)$
VIIg                                                 $A_jA_k$-$A_jA_l$     $\frac12\pi(1-\pi)$          24-42          $4p_ip_jp_kp_l\pi(1-\pi)$
VIIh                                                 $A_iA_l$-$A_jA_l$     $\frac12\pi(1-\pi)$          34-43          $4p_ip_jp_kp_l\pi(1-\pi)$
VIIi                                                 $A_iA_l$-$A_jA_k$     $\frac12(1-\pi)^2$           23-32          $4p_ip_jp_kp_l(1-\pi)^2$
VIIj                                                 $A_iA_k$-$A_jA_l$     $\frac12(1-\pi)^2$           14-41          $4p_ip_jp_kp_l(1-\pi)^2$
We are also assuming in this and in the next section that the genotypes of both sibs are known.

Suppose we wish to estimate $\pi$ from k marker genes and both parental genotypes are known for each locus. From Table 5.1 $L_{ij}$ can be read for each locus and the overall likelihood calculated by taking the product of these k individual likelihoods.

For example, suppose the estimation procedure is based upon two two-allele marker loci: A, with gene frequencies $p_A$ and $p_a = 1 - p_A$; and B, with gene frequencies $p_B$ and $p_b = 1 - p_B$. We observe a mating that is AABb x AaBb and a sib pair that is AABB-AaBB. Then from IIIb and Va of Table 5.1 we see that if the loci are unlinked,

$$ L_A = 4p_A^3p_a(1-\pi) \quad\text{and}\quad L_B = p_B^2p_b^2\pi^2 $$

are the likelihoods for loci A and B respectively. The overall likelihood may be written

$$ L = L_AL_B = 4p_A^3p_ap_B^2p_b^2\,\pi^2(1-\pi) $$

and it is easy to show that $\hat\pi = 2/3$ is the ML estimate of $\pi$.

Note that the gene frequencies need not be known in the above example in order to find the ML estimate of $\pi$. It can easily be shown that this result holds for every Case A situation, i.e., the gene frequencies need not be known as long as both parental genotypes are known for all k markers. However, this result does not hold if one or both parental genotypes are unknown.
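The maximisation in the two-locus example can be reproduced numerically. This is a minimal sketch: the names `likelihood_kernel` and `maximise_on_unit_interval` are hypothetical, the gene-frequency factor is dropped because it does not involve pi, and a simple grid search stands in for whatever optimiser is actually used.

```python
def likelihood_kernel(pi):
    """Part of the overall likelihood that depends on pi in the example.

    Locus A (IIIb) contributes (1 - pi); locus B (Va) contributes pi**2.
    The constant factor built from the gene frequencies is omitted.
    """
    return pi**2 * (1 - pi)

def maximise_on_unit_interval(f, steps=3000):
    """Return the grid point in [0, 1] at which f is largest."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=f)

pi_hat = maximise_on_unit_interval(likelihood_kernel)
print(pi_hat)
```

Setting the derivative of $\pi^2(1-\pi)$ to zero gives $2\pi - 3\pi^2 = 0$, so the interior maximum is at $\pi = 2/3$, in agreement with the grid search.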
5.2 Case B: One or Both Parental Genotypes Unknown

We now derive the sib pair probabilities for an m-allele gene, conditional only on $\pi$ (the generalization of Table 4.3). The resulting table (Table 5.3) can be used to obtain the likelihood for a particular locus when both parental genotypes are unknown. We then give the corresponding table (Table 5.4) when one parental genotype is known.

To obtain the sib pair probabilities when both parental genotypes are unknown, $L_{ij}$ from Table 5.1 is summed over all matings as in Chapter IV. That is, for a particular sib pair we find from Table 5.1 all matings that could have given rise to this sib pair and sum the corresponding $L_{ij}$ values.

Suppose, for example, that there are four alleles $A_1$, $A_2$, $A_3$ and $A_4$ with corresponding frequencies $p_1$, $p_2$, $p_3$ and $p_4 = 1 - p_1 - p_2 - p_3$ at a single locus. We observe a sib pair that is $A_1A_1$-$A_1A_2$ and do not know the parental genotypes. To find the probability of this pair, we let i, j, k and l each assume the values 1, 2, 3 and 4 in Table 5.1 and examine all matings to determine which ones could give rise to an $A_1A_1$-$A_1A_2$ sib pair. We find that there are only four such matings, and the corresponding probabilities and $L_{ij}$ values are given in Table 5.2. The first mating in this table is obtained from Table 5.1 by setting i=1 and j=2 in IIIb; the second by setting i=1 and j=2 in Vd. The last two matings have the same number but different alleles, i.e., the first of these is VIb with i=1, j=2 and k=3; the second is VIb with i=1, j=2 and k=4.
TABLE 5.2
LIKELIHOODS FOR A SIB PAIR THAT IS $A_1A_1$-$A_1A_2$

Number   Mating                   Prob. of mating    $L_{ij}$
IIIb     $A_1A_1 \times A_1A_2$   $4p_1^3p_2$        $4p_1^3p_2(1-\pi)$
Vd       $A_1A_2 \times A_1A_2$   $4p_1^2p_2^2$      $4p_1^2p_2^2\pi(1-\pi)$
VIb      $A_1A_2 \times A_1A_3$   $8p_1^2p_2p_3$     $4p_1^2p_2p_3\pi(1-\pi)$
VIb      $A_1A_2 \times A_1A_4$   $8p_1^2p_2p_4$     $4p_1^2p_2p_4\pi(1-\pi)$

Thus the probability of an $A_1A_1$-$A_1A_2$ sib pair, conditional on $\pi$, is simply the sum of the four $L_{ij}$ values of Table 5.2 and may be written

$$
\begin{aligned}
L &= 4(1-\pi)p_1^2p_2(p_1 + p_2\pi + p_3\pi + p_4\pi) \\
&= 4(1-\pi)p_1^2p_2(p_1 + (1-p_1)\pi) \\
&= 4p_1^2p_2(1-\pi)(\pi + p_1(1-\pi))
\end{aligned}
$$

We have derived the probability of a type III sib pair, conditional on $\pi$, when both parental genotypes are unknown. The probabilities of the six other sib pair types can likewise be derived from Table 5.1 and are given in Table 5.3. By summing the appropriate $L_{ij}$ values from Table 5.1, the sib pair probabilities can also be obtained when one parental genotype is known. These probabilities are given in Table 5.4.
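The sum of the four likelihoods in Table 5.2 can be confirmed with exact arithmetic. The sketch is illustrative only; `type_iii_probability` is a hypothetical name and the allele frequencies are arbitrary values that sum to one.

```python
from fractions import Fraction

def type_iii_probability(p, pi):
    """Sum of the four L_ij values of Table 5.2; p = (p1, p2, p3, p4)."""
    p1, p2, p3, p4 = p
    terms = [
        4 * p1**3 * p2 * (1 - pi),                # IIIb: A1A1 x A1A2
        4 * p1**2 * p2**2 * pi * (1 - pi),        # Vd:   A1A2 x A1A2
        4 * p1**2 * p2 * p3 * pi * (1 - pi),      # VIb:  A1A2 x A1A3
        4 * p1**2 * p2 * p4 * pi * (1 - pi),      # VIb:  A1A2 x A1A4
    ]
    return sum(terms)

p = (Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10))
pi = Fraction(1, 4)
summed = type_iii_probability(p, pi)
closed_form = 4 * p[0]**2 * p[1] * (1 - pi) * (pi + p[0] * (1 - pi))
print(summed, closed_form)
```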
TABLE 5.3
PROBABILITY OF SIB PAIRS FOR AN
M-ALLELE GENE CONDITIONAL ON $\pi$

Sib pair type                Probability
I     $A_iA_i$-$A_iA_i$      $p_i^2[\pi + (1-\pi)p_i]^2$
II    $A_iA_i$-$A_jA_j$      $2p_i^2p_j^2(1-\pi)^2$
III   $A_iA_i$-$A_iA_j$      $4p_i^2p_j(1-\pi)[\pi + p_i(1-\pi)]$
IV    $A_iA_i$-$A_jA_k$      $4p_i^2p_jp_k(1-\pi)^2$
V     $A_iA_j$-$A_iA_j$      $2p_ip_j[\pi^2 + \pi(1-\pi)(p_i+p_j) + 2(1-\pi)^2p_ip_j]$
VI    $A_iA_j$-$A_iA_k$      $4p_ip_jp_k(1-\pi)[\pi + 2p_i(1-\pi)]$
VII   $A_iA_j$-$A_kA_l$      $8p_ip_jp_kp_l(1-\pi)^2$
TABLE 5.4
PROBABILITY OF SIB PAIRS FOR AN M-ALLELE GENE CONDITIONAL ON $\pi$
WHEN ONE PARENTAL GENOTYPE IS KNOWN

Sib pair type                Known parent: $A_iA_i$           Known parent: $A_iA_j$
I     $A_iA_i$-$A_iA_i$      $p_i^3[(1-\pi)p_i + \pi]$        $p_i^2p_j\pi[p_i + (1-p_i)\pi]$
II    $A_iA_i$-$A_jA_j$      0                                $2p_i^2p_j^2(1-\pi)^2$
III   $A_iA_i$-$A_iA_j$      $2p_i^3p_j(1-\pi)$               $2p_i^2p_j(1-\pi)[p_i + \pi(1+p_j-p_i)]$
      $A_iA_i$-$A_iA_k$      $2p_i^3p_k(1-\pi)$               $2p_i^2p_jp_k\pi(1-\pi)$
IV    $A_iA_i$-$A_jA_k$      0                                $2p_i^2p_jp_k(1-\pi)^2$
V     $A_iA_j$-$A_iA_j$      $p_i^2p_j[p_j(1-\pi) + \pi]$     $p_ip_j[2p_ip_j(1-\pi)^2 + (p_i^2+p_j^2)\pi(1-\pi) + (p_i+p_j)\pi^2]$
      $A_iA_k$-$A_iA_k$      $p_i^2p_k[p_k(1-\pi) + \pi]$     $p_ip_jp_k\pi[p_k(1-\pi) + \pi]$
VI    $A_iA_j$-$A_iA_k$      $2p_i^2p_jp_k(1-\pi)$            $2p_ip_jp_k(1-\pi)[p_i(1-\pi) + p_j\pi]$
      $A_iA_k$-$A_iA_l$      $2p_i^2p_kp_l(1-\pi)$            $2p_ip_jp_kp_l\pi(1-\pi)$
      $A_iA_k$-$A_jA_k$      0                                $2p_ip_jp_k(1-\pi)[p_k(1-\pi) + \pi]$
VII   $A_iA_k$-$A_jA_l$      0                                $4p_ip_jp_kp_l(1-\pi)^2$
5.3 Estimation When only the Sib Phenotypes are Known
If the genotypes of both sibs are known for a particular locus,
then Tables 5.1, 5.3 and 5.4 can be used to obtain the corresponding
sib pair probability.
Often however, only the sibs' phenotypes are
known, as in the case when dominance is present.
That is, in some
cases there may be a number of genotypes in each sib's phenoset.
In
this more general situation the sib pair probability can still be
obtained from Tables 5.1, 5.3 and 5.4 by summing the probabilities
corresponding to all sib pair combinations that can be made by pairing
an element in one phenoset with an element in the other.
Thus, if
there are m elements in the first sib's phenoset and n elements in
that of the second, then the sib pair probability is calculated by
summing mn individual probabilities, each of which can be read from
Table 5.1, 5.3 or 5.4.
For example, suppose that there are two alleles A and a with gene frequencies $p_A$ and $p_a = 1 - p_A$ respectively. A is dominant to a and we wish to calculate the probability of an A-a sib pair when both parental genotypes are unknown. There are two genotypes in the first sib's phenoset (AA and Aa) and the second sib's genotype is known (aa). Hence, the sib pair must be either AA-aa or Aa-aa and from Table 5.3 the probability of this sib pair is

$$
\begin{aligned}
L &= 2p_A^2p_a^2(1-\pi)^2 + 4p_Ap_a^2(1-\pi)(\pi + p_a(1-\pi)) \\
&= 2p_Ap_a^2(1-\pi)\{(1+p_a) + \pi(1-p_a)\}
\end{aligned}
$$
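The phenoset sum for the dominant example can be verified with exact fractions. The sketch assumes the two Table 5.3 entries quoted in the text (types II and III); the function name `dominant_recessive_pair` and the numerical values are hypothetical.

```python
from fractions import Fraction

def dominant_recessive_pair(p_A, pi):
    """Pr(one sib shows A, the other is aa | pi), parents unknown.

    The first sib's phenoset is {AA, Aa}; the second sib is aa, so the
    probability is the sum of the AA-aa and Aa-aa entries of Table 5.3.
    """
    p_a = 1 - p_A
    aa_with_AA = 2 * p_A**2 * p_a**2 * (1 - pi)**2                     # type II
    aa_with_Aa = 4 * p_A * p_a**2 * (1 - pi) * (pi + p_a * (1 - pi))   # type III
    return aa_with_AA + aa_with_Aa

p_A, pi = Fraction(3, 5), Fraction(1, 3)
p_a = 1 - p_A
summed = dominant_recessive_pair(p_A, pi)
factored = 2 * p_A * p_a**2 * (1 - pi) * ((1 + p_a) + pi * (1 - p_a))
print(summed, factored)
```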
Clearly, a similar procedure can be used when only parental and sib phenotypes are known. For example, suppose for the two-allele gene above that both parents are A and both sibs are also A. Then the mating must be either AA x AA, AA x Aa or Aa x Aa, and the sibs must also be one of these three types. The nine matings and sib pairs that could account for this observed result are given below with the corresponding probabilities obtained from Table 5.1.

Mating     Sib pair    Probability
AA x AA    AA-AA       $p_A^4$
AA x AA    AA-Aa       0
AA x AA    Aa-Aa       0
AA x Aa    AA-AA       $2p_A^3p_a\pi$
AA x Aa    AA-Aa       $4p_A^3p_a(1-\pi)$
AA x Aa    Aa-Aa       $2p_A^3p_a\pi$
Aa x Aa    AA-AA       $p_A^2p_a^2\pi^2$
Aa x Aa    AA-Aa       $4p_A^2p_a^2\pi(1-\pi)$
Aa x Aa    Aa-Aa       $2p_A^2p_a^2(1-2\pi+2\pi^2)$

The likelihood of this event is simply the sum of these nine probabilities. We find that

$$ L = p_A^4 + 4p_A^3p_a + p_A^2p_a^2(2 + \pi^2) $$
69
~
I
I
I
I
I
I
I
~
I
I
I...
I
I....
•-e
The results of this chapter permit the calculation of the sib
pair probability for any marker gene, regardless of the number of
alleles or the number of genotypes in each sib's or parent's phenoset.
Tables 5.1, 5.3 and 5.4 give the sib pair probability for
the special cases in which both sibs' genotypes are known and 0, 1
or 2 parental genotypes are known.
In all other cases the sib pair
probability can be obtained from these three tables by summing the
appropriate probabilities as indicated above.
When the sib pair probabilities have been calculated for a
number of marker gene loci for a particular sib pair, the overall
sib pair likelihood is simply the product of these probabilities.
Standard computer procedures can then be used to find the value
of $\pi$ that makes the likelihood a maximum. $\pi$ can then be replaced by its ML estimate $\hat\pi$ in the analyses of Chapter IV, and these procedures can be used to detect the portion of genetic variance closely linked to these markers.
CHAPTER VI - DERIVATION OF THE CLASSIFICATION TABLES

In the two previous chapters a number of techniques were presented for estimating $\sigma_g^2$ based on $\pi$, the proportion of genes two sibs have i.b.d. over the entire genome. In the next four chapters we derive techniques for detecting and estimating the contribution to the total genetic variance of a single major trait gene and the distance of this gene from a particular marker gene. The estimation procedures are based on $\pi_{jm}$, the proportion of genes two sibs have i.b.d. at a particular locus m. Thus, $\pi_{jm} = 0$, $\frac12$ or 1 for each sib pair j. In this chapter the Classification Tables (as defined in Chapter I) are derived.
6.1 The Sixteen Classification Types

In the population of all sib pairs a sib pair will have "on the average" half of their genes i.b.d. This follows from the assumption made earlier that the sibs come from a large random mating population. Since in general we will not know the value of $\pi_{jm}$, we now assume that sib pair j is randomly selected from the population of all sib pairs. That is, we assume that the sixteen possible sib pairs that can result from a general $A_1A_2 \times A_3A_4$ mating (see Table 4.1) are equally likely.
Under this assumption, if the sibs' genotypes are known, there are sixteen "Classification types" or classes, which are listed for convenience in Table 6.1. In this table $p_i^*$ (i = 0, 1, 2) is the probability that there are exactly i genes i.b.d. at a particular locus, conditional upon the sibs' genotypes at that locus and also the parental genotypes if known. In a later section a general formula will be given for the calculation of these probabilities. In Table 6.1 $p_i$ and $p_j$ are gene frequencies.
TABLE 6.1
THE 16 CLASSIFICATION TYPES

Classification
type             $p_0^*$        $p_1^*$        $p_2^*$
(i)              $\frac14$      $\frac12$      $\frac14$
(ii)             $\frac12$      $\frac12$      0
(iii)            0              $\frac12$      $\frac12$
(iv)             $\frac12$      0              $\frac12$
(v)              1              0              0
(vi)             0              1              0
(vii)            0              0              1
(viii)
a
Pi
l+p.
1
l+p.
~
~
(ix)
l+p.+p.
ZPiP j
Z
Z
Pi + Pj
J
J
~
(x)
l+p.
Pi
l+p.+p.
(Pi+P j) (l+P i +pj)
(xi)
(xii)
~
!z/(l+p.)
~
k2
1
l+p.
ZPiP j
l+p.+p.+Zp.p.
Pi+P j
l+Pi+Pj+ZP i Pj
Zp.~
1
l+Zp.~
a
ZP i
1
J
a
~
~
1+2p.
Pi
(l+p. ) Z
~
J
a
Pi
l+p.~
Z~
(xvi)
l+p.+p.
Pi+Pj
!zPi/(l+P i )
~
(xv)
(Pi+p j ) (l+Pi+Pj)
1
Pj
Pi
Pi+P j
(xiii)
(xiv)
0
J
~
J
(l+p. ) 2
~
1
l+p.+p.+Zp.p.
J
~
(l+p. )
~
~
Z
J
6.2 The Classification Table When the Genotypes are Known

When sib and parental genotypes are both known, Table 5.1 enables us to derive Classification Table 6.2. Recall that in Table 4.1 cells 11, 22, 33 and 44 correspond to having two genes i.b.d.; cells 14, 23, 32, and 41 correspond to no genes i.b.d.; the remaining cells correspond to one gene i.b.d. Moreover, each cell represents an equally likely outcome for a particular mating and sib pair. Thus, for an observed sib pair, $p_i^*$ (i = 0, 1, 2) can be obtained from Table 5.1 by simply noting which cells correspond to this particular event.

For example, consider an AA x AA mating. This mating is uninformative since only sibs that are also AA can result. This fact is reflected in I of Table 5.1 in which we see that all 16 (equally likely) cells could account for an AA-AA sib pair from an AA x AA mating. Hence

$$ p_2^* = 4/16 = \tfrac14, \qquad p_1^* = 8/16 = \tfrac12, \qquad p_0^* = 4/16 = \tfrac14 $$

and from Table 6.1 we see that this is a Class (i) pair.

Consider a second example: an AA x Aa mating resulting in two sibs that are each AA. From IIIa of Table 5.1 the corresponding cells are 11, 12, 21 and 22 and hence

$$ p_2^* = 2/4 = \tfrac12, \qquad p_1^* = 2/4 = \tfrac12, \qquad p_0^* = 0 $$
From Table 6.1 we see that this is a Class (iii) pair.

Thus, using Table 5.1 we can calculate $p_i^*$ for all matings and sib pairs when the genotypes are known. Table 6.2 is the classification table for this situation. Note that this table is similar in form to Table 5.1, but certain sib pairs are grouped into a single entry. This is seen from the "sib pair type" column of Table 6.2 in which the numbers in parentheses for certain entries refer to the number of sib pairs of the given sib pair type. The actual sib pairs are not given for these cases, but they can be read directly from Table 5.1.

For example, from Table 5.1 we see that a type IV mating can produce two distinct type V sib pairs, namely, $A_iA_j$-$A_iA_j$ and $A_iA_k$-$A_iA_k$. However, these two sib pairs have the same probability ($p_i^2p_jp_k$) and are of the same Classification Type; hence they are presented as a single entry in Table 6.2.
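The cell-counting rule used in the two examples can be written as a small function. This is a sketch with hypothetical names (`classify`, `IBD2_CELLS`, `IBD0_CELLS`); it assumes, as the text does, that every listed cell of matrix M is equally likely.

```python
from fractions import Fraction

IBD2_CELLS = {"11", "22", "33", "44"}   # both genes i.b.d.
IBD0_CELLS = {"14", "23", "32", "41"}   # no genes i.b.d.

def classify(cells):
    """Return (p0*, p1*, p2*) for a sib pair occupying the given cells of M."""
    n = len(cells)
    p2 = Fraction(sum(c in IBD2_CELLS for c in cells), n)
    p0 = Fraction(sum(c in IBD0_CELLS for c in cells), n)
    return p0, 1 - p0 - p2, p2

all_cells = [f"{i}{j}" for i in range(1, 5) for j in range(1, 5)]
uninformative = classify(all_cells)                    # AA x AA, sibs AA-AA
second_example = classify(["11", "12", "21", "22"])    # AA x Aa, sibs AA-AA
print(uninformative, second_example)
```

The first result corresponds to Class (i) and the second to Class (iii) of Table 6.1.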
TABLE 6.2
CLASSIFICATION TABLE: BOTH PARENTAL AND SIB GENOTYPES KNOWN

Mating type                      Sib pair                        Probability of mating      $p_0^*$      $p_1^*$      $p_2^*$      Classification
                                                                 and sib pair                                                      type
I:   $A_iA_i \times A_iA_i$      I:   $A_iA_i$-$A_iA_i$          $p_i^4$                    $\frac14$    $\frac12$    $\frac14$    (i)
II:  $A_iA_i \times A_jA_j$      V:   $A_iA_j$-$A_iA_j$          $2p_i^2p_j^2$              $\frac14$    $\frac12$    $\frac14$    (i)
III: $A_iA_i \times A_iA_j$      I:   $A_iA_i$-$A_iA_i$          $p_i^3p_j$                 0            $\frac12$    $\frac12$    (iii)
                                 III: $A_iA_i$-$A_iA_j$          $2p_i^3p_j$                $\frac12$    $\frac12$    0            (ii)
                                 V:   $A_iA_j$-$A_iA_j$          $p_i^3p_j$                 0            $\frac12$    $\frac12$    (iii)
IV:  $A_iA_i \times A_jA_k$      V:   (2)                        $p_i^2p_jp_k$              0            $\frac12$    $\frac12$    (iii)
                                 VI:  $A_iA_j$-$A_iA_k$          $2p_i^2p_jp_k$             $\frac12$    $\frac12$    0            (ii)
V:   $A_iA_j \times A_iA_j$      I:   (2)                        $\frac14p_i^2p_j^2$        0            0            1            (vii)
                                 II:  $A_iA_i$-$A_jA_j$          $\frac12p_i^2p_j^2$        1            0            0            (v)
                                 III: (2)                        $p_i^2p_j^2$               0            1            0            (vi)
                                 V:   $A_iA_j$-$A_iA_j$          $p_i^2p_j^2$               $\frac12$    0            $\frac12$    (iv)
VI:  $A_iA_j \times A_iA_k$      I:   $A_iA_i$-$A_iA_i$          $\frac12p_i^2p_jp_k$       0            0            1            (vii)
                                 III: (2)                        $p_i^2p_jp_k$              0            1            0            (vi)
                                 IV:  $A_iA_i$-$A_jA_k$          $p_i^2p_jp_k$              1            0            0            (v)
                                 V:   (3)                        $\frac12p_i^2p_jp_k$       0            0            1            (vii)
                                 VI:  $A_iA_j$-$A_iA_k$          $p_i^2p_jp_k$              1            0            0            (v)
                                 VI:  $A_iA_j$-$A_jA_k$          $p_i^2p_jp_k$              0            1            0            (vi)
                                 VI:  $A_iA_k$-$A_jA_k$          $p_i^2p_jp_k$              0            1            0            (vi)
VII: $A_iA_j \times A_kA_l$      V:   (4)                        $\frac12p_ip_jp_kp_l$      0            0            1            (vii)
                                 VI:  (4)                        $p_ip_jp_kp_l$             0            1            0            (vi)
                                 VII: (2)                        $p_ip_jp_kp_l$             1            0            0            (v)
6.3 The Classification Tables When Some Genotypes are Unknown
An algorithm is now given that permits the calculation of $P_2^*$, $P_1^*$ and $P_0^*$ when some parental or sib genotypes are unknown. The two situations most likely to arise that require this algorithm are cases involving genes in which (1) there is dominance, so that only parental and sib phenotypes are known; or (2) data are collected only for sibs, so that even the parental phenotypes are unknown. The algorithm given below handles both of these situations.
Let $P_{1p}$ and $P_{2p}$ denote the phenosets for two parents for a particular locus, i.e., $P_{1p}$ is the set of all genotypes that could give rise to the phenotype of one parent, and $P_{2p}$ is the analogous set for the other parent. If there is no information as to the parental phenotypes, then the phenosets will consist of all possible genotypes. Let $P_p$ denote the set consisting of all possible ordered pairs of genotypes resulting when an element of $P_{1p}$ is paired with an element of $P_{2p}$. Thus, if there are $n_1$ genotypes in $P_{1p}$ and $n_2$ genotypes in $P_{2p}$, then there are $n_1 n_2$ elements of $P_p$.
Similarly, let $P_{1s}$ and $P_{2s}$ denote the phenosets for the two sibs, and $P_s$ the set of all possible ordered pairs of genotypes from $P_{1s}$ and $P_{2s}$. Let $X$ and $Y$ be elements of $P_p$ and $P_s$ respectively. Then $P_0^*$, $P_1^*$ and $P_2^*$ for a particular locus $m$ can be calculated as follows:
$$P_k^* = \frac{\sum_{X \in P_p} \sum_{Y \in P_s} \Pr(X \text{ and } Y \text{ and } \pi_{jm} = \tfrac12 k)}{\sum_{X \in P_p} \sum_{Y \in P_s} \sum_{h=0,1,2} \Pr(X \text{ and } Y \text{ and } \pi_{jm} = \tfrac12 h)}, \qquad k = 0, 1, 2. \tag{6.1}$$
Each term in the summations in (6.1) above can be obtained from Table 6.2, since it is the product of one of the probabilities in the third column and one of the corresponding $P_k^*$. It is necessary only to specify the genotypes that belong to $P_{1p}$, $P_{2p}$, $P_{1s}$ and $P_{2s}$, or equivalently, the pairs of genotypes that belong to $P_p$ and $P_s$. The elements of $P_p$ are the possible matings that could result in the observed sib pair; the elements of $P_s$ are the sib pair genotypes that the observed sib pair could assume. Thus, if the elements in these two sets are specified, then Table 6.2 can be used to find all probabilities in the summations of (6.1) and $P_k^*$ can be found by summing the appropriate probabilities. The calculation can easily be programmed in general for a computer.
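As an illustration of how (6.1) might be programmed, the following is a minimal sketch in Python. It does not look entries up in Table 6.2; instead it enumerates matings and transmissions directly under random mating, which should give the same sums. The function names, the representation of a phenoset as a set of sorted allele tuples, and the example frequencies are assumptions made for this sketch only.

```python
from itertools import product

def genotype(a, b):
    """Unordered genotype, stored as a sorted tuple of allele labels."""
    return tuple(sorted((a, b)))

def ibd_probabilities(freqs, sib1_set, sib2_set, parent1_set=None, parent2_set=None):
    """Return [P0*, P1*, P2*] for one locus, following the form of (6.1).

    freqs       : dict mapping allele label -> gene frequency (assumed input)
    sib*_set    : phenosets of the two sibs (sets of genotypes)
    parent*_set : phenosets of the parents; None means no information
    """
    joint = [0.0, 0.0, 0.0]  # Pr(X and Y and k genes i.b.d.), summed over X and Y
    alleles = list(freqs)
    for f1, f2, m1, m2 in product(alleles, repeat=4):
        if parent1_set is not None and genotype(f1, f2) not in parent1_set:
            continue
        if parent2_set is not None and genotype(m1, m2) not in parent2_set:
            continue
        mating = freqs[f1] * freqs[f2] * freqs[m1] * freqs[m2]
        # Each sib receives allele number i from parent 1 and j from parent 2.
        for i1, j1, i2, j2 in product((0, 1), repeat=4):
            s1 = genotype((f1, f2)[i1], (m1, m2)[j1])
            s2 = genotype((f1, f2)[i2], (m1, m2)[j2])
            if s1 in sib1_set and s2 in sib2_set:
                k = int(i1 == i2) + int(j1 == j2)  # number of genes i.b.d.
                joint[k] += mating / 16.0
    total = sum(joint)
    return [x / total for x in joint]

# Example: both sibs are A1A1 and nothing is known about the parents.
freqs = {"A1": 0.3, "A2": 0.7}
p_star = ibd_probabilities(freqs, {("A1", "A1")}, {("A1", "A1")})
```

With dominance, a phenoset simply lists every genotype compatible with the observed phenotype, for example `{("A1", "A1"), ("A1", "A2")}` for a dominant `A1` phenotype under this sketch's labels.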
There are two special cases that are of interest and permit an
easy algebraic solution.
First, the sib pair genotypes may both be
known, but no information available on the genotypes of the parents.
Table 6.3 is the Classification Table for this special case.
Table
6.4 gives the Classification Table for the special case in which both
sibs' genotypes and one parental genotype are known, but no information is available as to the genotype of the other parent.
Both
tables were derived by repeated use of (6.1) using the information
in Table 6.2.
TABLE 6.3
CLASSIFICATION TABLE: BOTH PARENTAL PHENOTYPES UNKNOWN

Sib pair type           Probability                           P0*                              P1*                                  P2*                        Classification
                                                                                                                                                               type
I:   A_iA_i-A_iA_i      (1/4) p_i^2 (1+p_i)^2                 p_i^2/(1+p_i)^2                  2p_i/(1+p_i)^2                       1/(1+p_i)^2                (xvi)
II:  A_iA_i-A_jA_j      (1/2) p_i^2 p_j^2                     1                                0                                    0                          (v)
III: A_iA_i-A_iA_j      p_i^2 p_j (1+p_i)                     p_i/(1+p_i)                      1/(1+p_i)                            0                          (xiii)
IV:  A_iA_i-A_jA_k      p_i^2 p_j p_k                         1                                0                                    0                          (v)
V:   A_iA_j-A_iA_j      (1/2) p_i p_j (1+p_i+p_j+2p_i p_j)    2p_i p_j/(1+p_i+p_j+2p_i p_j)    (p_i+p_j)/(1+p_i+p_j+2p_i p_j)       1/(1+p_i+p_j+2p_i p_j)     (xiv)
VI:  A_iA_j-A_iA_k      p_i p_j p_k (1+2p_i)                  2p_i/(1+2p_i)                    1/(1+2p_i)                           0                          (xv)
VII: A_iA_j-A_kA_l      2 p_i p_j p_k p_l                     1                                0                                    0                          (v)
TABLE 6.4
CLASSIFICATION TABLE: ONE PARENTAL GENOTYPE AND BOTH SIB GENOTYPES KNOWN

Known     Sib pair type          Probability                             P0*                                  P1*                                         P2*               Classification
parent                                                                                                                                                                      type
A_iA_i    I:   A_iA_i-A_iA_i     (1/2) p_i^3 (p_i+1)                     p_i/[2(1+p_i)]                       1/2                                         1/[2(1+p_i)]      (xii)
          III: A_iA_i-A_iA_j     p_i^3 p_j                               1/2                                  1/2                                         0                 (ii)
          V:   A_iA_j-A_iA_j     (1/2) p_i^2 p_j (p_j+1)                 p_j/[2(1+p_j)]                       1/2                                         1/[2(1+p_j)]      (xii)
          VI:  A_iA_j-A_iA_k     p_i^2 p_j p_k                           1/2                                  1/2                                         0                 (ii)
A_iA_j    I:   A_iA_i-A_iA_i     (1/4) p_i^2 p_j (p_i+1)                 0                                    p_i/(1+p_i)                                 1/(1+p_i)         (viii)
          II:  A_iA_i-A_jA_j     (1/2) p_i^2 p_j^2                       1                                    0                                           0                 (v)
          III: A_iA_i-A_iA_j     (1/2) p_i^2 p_j (1+p_i+p_j)             p_i/(1+p_i+p_j)                      (1+p_j)/(1+p_i+p_j)                         0                 (ix)
          III: A_iA_i-A_iA_k     (1/2) p_i^2 p_j p_k                     0                                    1                                           0                 (vi)
          IV:  A_iA_i-A_jA_k     (1/2) p_i^2 p_j p_k                     1                                    0                                           0                 (v)
          V:   A_iA_j-A_iA_j     (1/4) p_i p_j (p_i+p_j)(1+p_i+p_j)      2p_i p_j/[(p_i+p_j)(1+p_i+p_j)]      (p_i^2+p_j^2)/[(p_i+p_j)(1+p_i+p_j)]        1/(1+p_i+p_j)     (x)
          V:   A_iA_k-A_iA_k     (1/4) p_i p_j p_k (p_k+1)               0                                    p_k/(1+p_k)                                 1/(1+p_k)         (viii)
          VI:  A_iA_j-A_iA_k     (1/2) p_i p_j p_k (p_i+p_j)             p_i/(p_i+p_j)                        p_j/(p_i+p_j)                               0                 (xi)
          VI:  A_iA_k-A_iA_l     (1/2) p_i p_j p_k p_l                   0                                    1                                           0                 (vi)
          VI:  A_iA_k-A_jA_k     (1/2) p_i p_j p_k (p_k+1)               p_k/(1+p_k)                          1/(1+p_k)                                   0                 (xiii)
          VII: A_iA_k-A_jA_l     p_i p_j p_k p_l                         1                                    0                                           0                 (v)
Table 6.5 gives the probability of each sib pair type conditional upon the number of genes i.b.d. at that locus and can easily be derived from Table 6.3. For example, suppose a pair of sibs have both genes i.b.d. at a locus and we wish to find the conditional probability that this sib pair is type I ($A_iA_i$-$A_iA_i$). We know that $\Pr(\pi_{jm} = 1) = \tfrac14$, and from Table 6.3 we have
$$\Pr(\text{type I sib pair and } \pi_{jm} = 1) = \left(\tfrac14 p_i^2 (1+p_i)^2\right) / (1+p_i)^2 = \tfrac14 p_i^2 .$$
Hence,
$$\Pr(\text{type I sib pair} \mid \pi_{jm} = 1) = \tfrac14 p_i^2 / \tfrac14 = p_i^2 .$$
Similarly, the remaining elements of Table 6.5 can be derived.
TABLE 6.5
CONDITIONAL PROBABILITY OF SIB PAIR TYPES GIVEN 0, 1 OR 2 GENES I.B.D.

Sib pair            Number of genes i.b.d.
type        0                      1                      2
I           p_i^4                  p_i^3                  p_i^2
II          2 p_i^2 p_j^2          0                      0
III         4 p_i^3 p_j            2 p_i^2 p_j            0
IV          4 p_i^2 p_j p_k        0                      0
V           4 p_i^2 p_j^2          p_i p_j (p_i+p_j)      2 p_i p_j
VI          8 p_i^2 p_j p_k        2 p_i p_j p_k          0
VII         8 p_i p_j p_k p_l      0                      0
Alternatively, Table 6.5 can be derived by arguing as follows: when $\pi_{jm} = 0$ the sibs are "unrelated" at that locus, and so the distribution of sib pairs is simply the same as the distribution of matings in a random mating population given in Table 1.1. When $\pi_{jm} = 1$ a particular sib pair can occur only if both sibs have the same genotype; and in that case the probability of the sib pair is simply the probability in the population of one of them. Finally, Table 6.3 gives the probability of each sib pair type and hence the sib pair probability conditional on $\pi_{jm} = \tfrac12$ can be found by subtraction, i.e., denoting sib pair type by "SPT,"
$$\Pr(\mathrm{SPT} \mid \pi_{jm} = \tfrac12) = 2\left[\Pr(\mathrm{SPT}) - \tfrac14 \Pr(\mathrm{SPT} \mid \pi_{jm} = 0) - \tfrac14 \Pr(\mathrm{SPT} \mid \pi_{jm} = 1)\right].$$
Consider, for example, an AA-AA sib pair. We have
$$\Pr(\text{Sibs AA-AA} \mid \pi_{jm} = \tfrac12) = 2\Pr(\text{Sibs AA-AA}) - \tfrac12 \Pr(\text{Sibs AA-AA} \mid \pi_{jm} = 0) - \tfrac12 \Pr(\text{Sibs AA-AA} \mid \pi_{jm} = 1)$$
$$= 2[p^2(p+1)^2/4] - \tfrac12 (p^4) - \tfrac12 (p^2) = p^3 \qquad \text{(using Tables 1.1 and 6.3)}$$
Similarly, the remaining elements in Table 6.5 can be derived.
CHAPTER VII- ESTIMATING THE PROPORTION OF GENES IDENTICAL
BY DESCENT AT A SINGLE LOCUS IN SIB PAIRS
In this chapter we present a method for estimating $\pi_{jm}$, the proportion of genes sib pair $j$ has i.b.d. at a single locus $m$. The problem is one of estimating a parameter that takes on a different (but known) value in each of three populations when it is not known for certain from which population the observation comes. We let $P^*_{ijm}$ ($i = 0, 1, 2$) be the probability that the $j$th sib pair should have $i$ genes i.b.d. at locus $m$, conditional on $I_m$, the information available on the sib pair and parental phenotypes at this locus. The estimator of $\pi_{jm}$ we shall use is
$$\hat\pi_{jm} = \tfrac12 P^*_{1jm} + P^*_{2jm} \tag{7.1}$$
7.1 Properties of the Estimator
There are several desirable properties that the estimator (7.1) possesses. Among them are
Property I - $\hat\pi_{jm}$ is the Bayes estimator of $\pi_{jm}$ when the squared error loss function is used, i.e., $\hat\pi_{jm}$ minimizes $E(\hat\pi_{jm} - \pi_{jm})^2$.
Property II - $\hat\pi_{jm}$ has the maximum possible correlation with $\pi_{jm}$ when $\pi_{jm}$ is considered as a random variable taking on the values 0, $\tfrac12$ and 1.
Although $\hat\pi_{jm}$ as defined by (7.1) is unbiased for the population in which $\pi_{jm} = \tfrac12$, it is not unbiased for the two populations in which $\pi_{jm} = 0$ and $\pi_{jm} = 1$. However, an unbiased estimator would be unreasonable for these two populations as it would require estimates of $\pi_{jm}$ outside the parameter range 0 to 1. For example, in order for an estimator to be unbiased for the population in which $\pi_{jm} = 0$, it must assume negative values for certain sib pair types. Clearly, such an estimator is unreasonable.
We now prove Property I. Let $f(\pi_{jm} \mid I_m)$ denote the conditional density of $\pi_{jm}$ given $I_m$. Then
$$f(\pi_{jm} \mid I_m) = \begin{cases} P^*_{0jm} & \text{if } \pi_{jm} = 0 \\ P^*_{1jm} & \text{if } \pi_{jm} = \tfrac12 \\ P^*_{2jm} & \text{if } \pi_{jm} = 1 \end{cases} \tag{7.2}$$
If we use the squared error loss function, then the Bayes estimator will be the value of $\hat\pi_{jm}$ that minimizes
$$S = \sum_{\pi_{jm}} (\hat\pi_{jm} - \pi_{jm})^2 f(\pi_{jm} \mid I_m) = (\hat\pi_{jm} - 0)^2 P^*_{0jm} + (\hat\pi_{jm} - \tfrac12)^2 P^*_{1jm} + (\hat\pi_{jm} - 1)^2 P^*_{2jm} .$$
In order to minimize $S$, we set the first derivative equal to zero, obtaining
$$\frac{dS}{d\hat\pi_{jm}} = 2\hat\pi_{jm} P^*_{0jm} + 2(\hat\pi_{jm} - \tfrac12) P^*_{1jm} + 2(\hat\pi_{jm} - 1) P^*_{2jm} = 0,$$
which implies, since $P^*_{0jm} + P^*_{1jm} + P^*_{2jm} = 1$, that
$$\hat\pi_{jm} = \tfrac12 P^*_{1jm} + P^*_{2jm} .$$
Note that the second derivative of $S$ with respect to $\hat\pi_{jm}$ is 2, which, being greater than zero, implies that $\hat\pi_{jm}$ as defined by (7.1) is indeed a solution that minimizes $S$. Thus, Property I is proved.
We next prove Property II. Since in the population of sib pairs $\pi_{jm}$ takes on the values 0, $\tfrac12$ and 1 with probabilities $\tfrac14$, $\tfrac12$ and $\tfrac14$ respectively, we see that
$$E(\pi_{jm}) = \tfrac12 \tag{7.3}$$
$$\mathrm{Var}(\pi_{jm}) = (\tfrac14)(\tfrac14) + (\tfrac14)(\tfrac14) = 1/8 \tag{7.4}$$
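Note also that
$$\sum_{I_m} \Pr(I_m)\left[\tfrac12 \Pr(\pi_{jm} = \tfrac12 \mid I_m) + \Pr(\pi_{jm} = 1 \mid I_m)\right] = \tfrac14 + \tfrac14 = \tfrac12 . \tag{7.5}$$
For each distinct $I_m$ there will be a corresponding $\hat\pi_{jm}$. We denote by $I_{km}$ the $k$th distinct $I_m$ and denote the corresponding $\hat\pi_{jm}$ by $\hat\pi_{jmk}$. We define
$$F_{ka} = \Pr(I_{km} \text{ and } \pi_{jm} = \tfrac12 a), \qquad a = 0, 1, 2 \tag{7.6}$$
and
$$F_k = F_{k0} + F_{k1} + F_{k2} = \Pr(I_{km}). \tag{7.7}$$
Note that
$$\hat\pi_{jmk} = \frac{\Pr(\pi_{jm} = 1 \text{ and } I_{km}) + \tfrac12 \Pr(\pi_{jm} = \tfrac12 \text{ and } I_{km})}{\Pr(I_{km})} = \frac{\tfrac12 F_{k1} + F_{k2}}{F_k} . \tag{7.8}$$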
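Since $\mathrm{Var}(\pi_{jm}) = 1/8$, a constant, in order to maximize the correlation between $\hat\pi_{jm}$ and $\pi_{jm}$, we must select $\hat\pi_{jm}$ to maximize
$$T = \frac{\mathrm{Cov}(\hat\pi_{jm}, \pi_{jm})}{[\mathrm{Var}(\hat\pi_{jm})]^{1/2}} = \frac{\sum_k \hat\pi_{jmk}(\tfrac12 F_{k1} + F_{k2}) - \tfrac12 \sum_k \hat\pi_{jmk} F_k}{\left[\sum_k \hat\pi_{jmk}^2 F_k - \left(\sum_k \hat\pi_{jmk} F_k\right)^2\right]^{1/2}} . \tag{7.9}$$
Denote the numerator and denominator of (7.9) by $\theta$ and $\omega^{1/2}$ respectively. Then, taking the derivative of $T$ with respect to a particular $\hat\pi_{jmr}$ we have
$$\frac{dT}{d\hat\pi_{jmr}} = \left[\omega^{1/2}(\tfrac12 F_{r1} + F_{r2} - \tfrac12 F_r) - \tfrac12 (\theta/\omega^{1/2})\left(2\hat\pi_{jmr} F_r - 2F_r \sum_k \hat\pi_{jmk} F_k\right)\right] / \omega .$$
Setting the first derivative equal to zero we find that (for $\omega \neq 0$)
$$\omega(\tfrac12 F_{r1} + F_{r2} - \tfrac12 F_r) = \theta F_r \left(\hat\pi_{jmr} - \sum_k \hat\pi_{jmk} F_k\right). \tag{7.10}$$
It has been found numerically in all cases so far that any values of $\hat\pi_{jmr}$ satisfying (7.10) above will give a maximum rather than a minimum or saddle point. We now show that $\hat\pi_{jmr}$ as defined by (7.8) satisfies (7.10).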
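If $\hat\pi_{jmk}$ is defined by (7.8), then
$$\sum_k \hat\pi_{jmk} F_k = \sum_k (\tfrac12 F_{k1} + F_{k2}) = \tfrac12,$$
using (7.5). We also have
$$\theta = \sum_k \hat\pi_{jmk}^2 F_k - \tfrac14 = \omega .$$
Furthermore,
$$F_r \left(\hat\pi_{jmr} - \sum_k \hat\pi_{jmk} F_k\right) = \tfrac12 F_{r1} + F_{r2} - \tfrac12 F_r .$$
Thus, we see that the right hand side of (7.10) may be written $\omega(\tfrac12 F_{r1} + F_{r2} - \tfrac12 F_r)$, and hence (7.10) is satisfied if $\hat\pi_{jmk}$ as defined by (7.8) is used. This proves Property II.
7.2 Estimation for the 16 Classification Types
Table 7.1 gives $\hat\pi_{jm}$ for the 16 Classification types of Table 6.1.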
TABLE 7.1
ESTIMATE OF pi_jm FOR THE 16 CLASSIFICATION TYPES

Classification type    Estimate of pi_jm
I                      1/2
II                     1/4
III                    3/4
IV                     1/2
V                      0
VI                     1/2
VII                    1
VIII                   (2+p_i) / [2(1+p_i)]
IX                     (1+p_j) / [2(1+p_i+p_j)]
X                      {[p_i(2+p_i)] + [p_j(2+p_j)]} / [2(p_i+p_j)(1+p_i+p_j)]
XI                     p_j / [2(p_i+p_j)]
XII                    (3+p_i) / [4(1+p_i)]
XIII                   (1/2) / (1+p_i)
XIV                    (2+p_i+p_j) / [2(1+p_i+p_j+2p_i p_j)]
XV                     (1/2) / (1+2p_i)
XVI                    1 / (1+p_i)
For all 16 Classification types $\hat\pi_{jm}$ is calculated by (7.1). Moreover, since it involves only $P^*_{1jm}$ and $P^*_{2jm}$, $\hat\pi_{jm}$ can also easily be calculated when only the sib phenotypes are known by the simple algorithm described in the previous chapter.
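A small worked check of (7.1) may help here. The sketch below, in Python, applies the estimator to the probabilities for a type V sib pair with unknown parents (Table 6.3) and compares the result with the closed form obtained by substituting those probabilities into (7.1). The function names and the numerical gene frequencies are illustrative assumptions only.

```python
def pi_hat(p0, p1, p2):
    """Estimator (7.1): one half of P1* plus P2* (p0 is accepted but not needed)."""
    return 0.5 * p1 + p2

def type_v_unknown_parents(pi, pj):
    """(P0*, P1*, P2*) for an AiAj-AiAj sib pair with unknown parents (Table 6.3)."""
    denom = 1 + pi + pj + 2 * pi * pj
    return (2 * pi * pj / denom, (pi + pj) / denom, 1 / denom)

pi, pj = 0.2, 0.5                      # hypothetical gene frequencies
estimate = pi_hat(*type_v_unknown_parents(pi, pj))
closed_form = (2 + pi + pj) / (2 * (1 + pi + pj + 2 * pi * pj))
```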
CHAPTER VIII- DETECTING LINKAGE BETWEEN
A TRAIT AND MARKER LOCUS
In this chapter we derive a regression procedure for detecting linkage between an m-allele marker locus and a two-allele trait locus. We define the genotypic values at the trait locus as follows:
$$g_{ij} = \begin{cases} a & \text{if sib is BB} \\ d & \text{if sib is Bb} \\ -a & \text{if sib is bb} \end{cases} \tag{8.1}$$
where $g_{ij}$ is the genetic effect in the general model (1.5).
Using
Thus, $\sigma_\epsilon^2$ is a function of the environmental variance, the environmental covariance, and any order effect.
8.1 Conditional Expectation of the Squared Pair Differences
Let $Y_j = (x_{1j} - x_{2j})^2$ be the squared pair difference for sib pair $j$. Then, for fixed $\epsilon_j$, $Y_j$ can take on seven values depending upon the genotypes of the first and second sibs. These values, obtained from (1.5) and (8.1), are shown in the second column of Table 8.1. This table gives the distribution of $Y_j$ conditional on $\pi_{jt}$, the proportion of genes i.b.d. at the trait locus. The conditional probabilities in the last three columns of this table are read directly from Table 6.5.
TABLE 8.1
CONDITIONAL DISTRIBUTION OF Y_j

                                      Conditional probability
Sib pair    Y_j                    pi_jt = 0    pi_jt = 1/2    pi_jt = 1
BB-BB       eps_j^2                p^4          p^3            p^2
bb-bb       eps_j^2                q^4          q^3            q^2
Bb-Bb       eps_j^2                4p^2q^2      pq             2pq
BB-Bb       (a-d+eps_j)^2          2p^3q        p^2q           0
Bb-BB       (-a+d+eps_j)^2         2p^3q        p^2q           0
Bb-bb       (a+d+eps_j)^2          2pq^3        pq^2           0
bb-Bb       (-a-d+eps_j)^2         2pq^3        pq^2           0
BB-bb       (2a+eps_j)^2           p^2q^2       0              0
bb-BB       (-2a+eps_j)^2          p^2q^2       0              0
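We can use Table 8.1 to find the expected value of $Y_j$ conditional on $\pi_{jt}$. We have
$$E(Y_j \mid \pi_{jt} = 1) = E(\epsilon_j^2) = \sigma_\epsilon^2 \tag{8.2}$$
$$E(Y_j \mid \pi_{jt} = \tfrac12) = E(\epsilon_j^2 (p^3 + q^3 + pq)) + E[(a-d+\epsilon_j)^2 + (-a+d+\epsilon_j)^2]\, p^2 q + E[(a+d+\epsilon_j)^2 + (-a-d+\epsilon_j)^2]\, pq^2$$
$$= \sigma_\epsilon^2 + (a^2 + d^2)(2p^2q + 2pq^2) + 4ad(pq^2 - p^2q)$$
$$= \sigma_\epsilon^2 + 2pq(a^2 + d^2 - 2ad(p-q))$$
$$= \sigma_\epsilon^2 + 2pq(a - (p-q)d)^2 + 2pqd^2(1 - (p-q)^2)$$
$$= \sigma_\epsilon^2 + \sigma_a^2 + 2pqd^2(4pq)$$
$$= \sigma_\epsilon^2 + \sigma_a^2 + 2\sigma_d^2 \tag{8.3}$$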
and similarly it can be shown that
$$E(Y_j \mid \pi_{jt} = 0) = \sigma_\epsilon^2 + 2\sigma_a^2 + 2\sigma_d^2 \tag{8.4}$$
It is clear from (8.2)-(8.4) that if there is no dominance ($d = 0$, or equivalently, $\sigma_d^2 = 0$) we can write
$$E(Y_j \mid \pi_{jt}) = (\sigma_\epsilon^2 + 2\sigma_g^2) - 2\sigma_g^2 \pi_{jt} \tag{8.5}$$
This implies that if $\pi_{jt}$ were known and we fitted the simple linear regression model
$$E(Y_j \mid \pi_{jt}) = \alpha + \beta \pi_{jt} \tag{8.6}$$
then $-\tfrac12 \hat\beta$ would be an unbiased estimator of $\sigma_g^2$, where $\hat\beta$ is the usual least squares estimator of $\beta$. This same result will hold asymptotically even when dominance is present. It is shown in Appendix II that in this more general case
(8.7)
where $n_i$ ($i = 0, 1, 2$) is the number of sib pairs in the sample that have $i$ genes i.b.d. at the trait gene locus. As the sample size increases $n_2$ and $n_0$ tend to equality, and so the term in $\sigma_d^2$ vanishes asymptotically.
8.2
Deriving the Expected Value of the Regression Coefficient
In the previous section we showed that if the proportion of genes i.b.d. at the trait locus, $\pi_{jt}$, is known for each sib pair, then the simple linear regression model given by (8.6) will result in $-\tfrac12 \hat\beta$ being an unbiased estimate of $\sigma_g^2$ when $\sigma_d^2 = 0$. This estimate is also asymptotically unbiased even when dominance is present. In Chapter VII we derived an estimate, $\hat\pi_{jm}$, of the proportion of genes i.b.d. at a marker locus. In this section we investigate how the regression analysis of Section 8.1 is affected if we substitute $\hat\pi_{jm}$ for $\pi_{jt}$ in the regression equation. We shall show that if there is no dominance, then
$$E(Y_j \mid \hat\pi_{jm}) = \alpha + \beta \hat\pi_{jm}, \tag{8.8}$$
where
$$\beta = -2(1-2c)^2 \sigma_g^2 \tag{8.9}$$
and $c$ is the recombination fraction between the trait and marker loci. We shall also show that (8.8) holds approximately even when dominance is present.
We assume linkage equilibrium between the trait and marker loci, so that (i) for fixed $\pi_{jt}$, $Y_j$ and $\hat\pi_{jm}$ are independent; and (ii) for fixed $\pi_{jm}$, $\pi_{jt}$ and $\hat\pi_{jm}$ are independent. It follows that
$$E(Y_j \mid \hat\pi_{jm}) = \sum_{\pi_{jt}} \sum_{\pi_{jm}} E(Y_j \mid \pi_{jt}) \Pr(\pi_{jt} \mid \pi_{jm}) \Pr(\pi_{jm} \mid \hat\pi_{jm}) \tag{8.10}$$
where the summations are over the three values that $\pi_{jt}$ and $\pi_{jm}$ can assume. $E(Y_j \mid \pi_{jt})$ is given by (8.2)-(8.4) and $\Pr(\pi_{jm} \mid \hat\pi_{jm})$ was defined in Chapter VII to be $P^*_{ijm}$ ($i = 0, 1, 2$). We now derive the joint distribution of $\pi_{jt}$ and $\pi_{jm}$.
Consider a general mating that at two loci A and B is
$$A_1B_1/A_2B_2 \times A_3B_3/A_4B_4 .$$
Let $c$ be the recombination fraction (assumed the same for both sexes) between these two loci. Then the gametic frequencies are:

Parent I                          Parent II
Gamete     Frequency              Gamete     Frequency
A_1B_1     (1/2)(1-c)             A_3B_3     (1/2)(1-c)
A_2B_2     (1/2)(1-c)             A_4B_4     (1/2)(1-c)
A_1B_2     (1/2)c                 A_3B_4     (1/2)c
A_2B_1     (1/2)c                 A_4B_3     (1/2)c

Suppose that two sibs result from the above mating and we wish to find $\Pr(\pi_{jm} = \pi_{jt} = 1)$ in these sibs, where $\pi_{jm}$ and $\pi_{jt}$ are the proportion of genes these sibs have i.b.d. at the A and B loci respectively. This probability can be found by summing the squares of all 16 zygote frequencies formed when a gamete frequency from Parent I is multiplied by a gamete frequency from Parent II. For example, if both sibs are $A_1B_1/A_3B_3$, the probability is $[\tfrac12(1-c)]^2[\tfrac12(1-c)]^2 = (1-c)^4/16$. The 15 remaining probabilities are calculated similarly, and so
$$\Pr(\pi_{jm} = \pi_{jt} = 1) = \tfrac14 \Psi^2$$
where
$$\Psi = c^2 + (1-c)^2 . \tag{8.11}$$
By symmetry we have $\Pr(\pi_{jm} = \pi_{jt} = 0) = \Pr(\pi_{jm} = \pi_{jt} = 1) = \tfrac14 \Psi^2$, which can also be established by summing the appropriate cross product frequencies.
We now find $\Pr(\pi_{jm} = 1 \text{ and } \pi_{jt} = 0)$. Note that $\pi_{jm} = 1$ and $\pi_{jt} = 0$ if, for example, the first sib is $A_1B_1/A_3B_3$, and the second sib is $A_1B_2/A_3B_4$. The probability of this sib pair is
$$[\tfrac12(1-c)]^2[\tfrac12 c]^2 = c^2(1-c)^2/16 .$$
There are fifteen other sib pairs that could result in $\pi_{jm} = 1$ and $\pi_{jt} = 0$, and all fifteen are found to have the same probability $c^2(1-c)^2/16$. Hence,
$$\Pr(\pi_{jm} = 1 \text{ and } \pi_{jt} = 0) = c^2(1-c)^2 = \tfrac14 (1-\Psi)^2 .$$
By symmetry we have
$$\Pr(\pi_{jm} = 0 \text{ and } \pi_{jt} = 1) = \Pr(\pi_{jm} = 1 \text{ and } \pi_{jt} = 0) = \tfrac14 (1-\Psi)^2 .$$
The marginal distribution of $\pi_{jm}$ (and $\pi_{jt}$) is given by
$$f(\pi_{jm}) = \begin{cases} \tfrac14 & \text{if } \pi_{jm} = 0 \\ \tfrac12 & \text{if } \pi_{jm} = \tfrac12 \\ \tfrac14 & \text{if } \pi_{jm} = 1 \end{cases} \tag{8.12}$$
Hence, the remaining probabilities in the joint distribution of $\pi_{jm}$ and $\pi_{jt}$ can be obtained by subtraction. For example,
$$\Pr(\pi_{jm} = 1 \text{ and } \pi_{jt} = \tfrac12) = \Pr(\pi_{jm} = 1) - \Pr(\pi_{jm} = 1 \text{ and } \pi_{jt} = 0) - \Pr(\pi_{jm} = 1 \text{ and } \pi_{jt} = 1)$$
$$= \tfrac14 - \tfrac14 \Psi^2 - \tfrac14 (1-\Psi)^2 = \tfrac12 \Psi(1-\Psi) .$$
Similarly, the other probabilities in Table 8.2 can be obtained.
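The joint distribution of the number of genes i.b.d. at two linked loci can also be checked by brute-force enumeration of gametes, in the spirit of the argument above. The following Python sketch is illustrative only: the function name, the coding of gametes by which parental strand supplies each locus, and the choice c = 0.1 are assumptions of this sketch.

```python
from itertools import product

def joint_ibd_distribution(c):
    """Return {(a, b): probability}, where a and b are the numbers of genes
    i.b.d. (0, 1 or 2) at loci A and B for a sib pair, given recombination
    fraction c between the loci."""
    # A gamete is coded by which parental strand (0 or 1) supplies locus A and locus B.
    gametes = {(0, 0): (1 - c) / 2, (1, 1): (1 - c) / 2,
               (0, 1): c / 2, (1, 0): c / 2}
    # For one parent: do the two sibs share the transmitted gene at A, and at B?
    per_parent = {}
    for (g1, w1), (g2, w2) in product(gametes.items(), repeat=2):
        key = (int(g1[0] == g2[0]), int(g1[1] == g2[1]))
        per_parent[key] = per_parent.get(key, 0.0) + w1 * w2
    # The two parents transmit independently; add up the shared genes.
    joint = {}
    for (k1, w1), (k2, w2) in product(per_parent.items(), repeat=2):
        key = (k1[0] + k2[0], k1[1] + k2[1])
        joint[key] = joint.get(key, 0.0) + w1 * w2
    return joint

c = 0.1
psi = c**2 + (1 - c)**2          # equation (8.11)
joint = joint_ibd_distribution(c)
```

Under these assumptions `joint[(2, 2)]` should agree with one quarter of the square of `psi`, and `joint[(2, 1)]` with one half of `psi * (1 - psi)`.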
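TABLE 8.2
JOINT DISTRIBUTION OF pi_jm AND pi_jt

                                   pi_jt
pi_jm      0                   1/2                      1                   Total
0          (1/4)Psi^2          (1/2)Psi(1-Psi)          (1/4)(1-Psi)^2      1/4
1/2        (1/2)Psi(1-Psi)     (1/2)(1-2Psi+2Psi^2)     (1/2)Psi(1-Psi)     1/2
1          (1/4)(1-Psi)^2      (1/2)Psi(1-Psi)          (1/4)Psi^2          1/4
Total      1/4                 1/2                      1/4                 1

We now can find $E(Y_j \mid \hat\pi_{jm})$. Using (8.10), (8.2)-(8.4) and Tables 8.1 and 8.2 we have
$$E(Y_j \mid \hat\pi_{jm}) = \sigma_\epsilon^2 \left[(1-\Psi)^2 P^*_{0jm} + \Psi(1-\Psi) P^*_{1jm} + \Psi^2 P^*_{2jm}\right]$$
$$\quad + (\sigma_\epsilon^2 + \sigma_a^2 + 2\sigma_d^2)\left[2\Psi(1-\Psi) P^*_{0jm} + (1-2\Psi+2\Psi^2) P^*_{1jm} + 2\Psi(1-\Psi) P^*_{2jm}\right]$$
$$\quad + (\sigma_\epsilon^2 + 2\sigma_a^2 + 2\sigma_d^2)\left[\Psi^2 P^*_{0jm} + \Psi(1-\Psi) P^*_{1jm} + (1-\Psi)^2 P^*_{2jm}\right]$$
$$= \sigma_\epsilon^2 + 2\sigma_g^2\left[\Psi^2(1 - P^*_{1jm} - P^*_{2jm}) + \Psi(1-\Psi) P^*_{1jm} + (1-\Psi)^2 P^*_{2jm}\right]$$
$$\quad + (\sigma_g^2 + \sigma_d^2)\left[2\Psi(1-\Psi)(1 - P^*_{1jm} - P^*_{2jm}) + (1-2\Psi+2\Psi^2) P^*_{1jm} + 2\Psi(1-\Psi) P^*_{2jm}\right]$$
$$= \sigma_\epsilon^2 + \sigma_g^2\left[2(1-2\Psi)(\tfrac12 P^*_{1jm} + P^*_{2jm}) + 2\Psi\right] + \sigma_d^2\left[2\Psi(1-\Psi) + (1-2\Psi)^2 P^*_{1jm}\right] \tag{8.13}$$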
When $\sigma_d^2 = 0$, we see from (8.13) above that $\beta$ in the regression model (8.8) may be written
$$\beta = 2(1-2\Psi)\sigma_g^2 = -2(1-2c)^2 \sigma_g^2 \tag{8.14}$$
When $\sigma_d^2 \neq 0$ this result will still hold approximately. Note that $\hat\pi_{jm}$ can be written in the form
$$\hat\pi_{jm} = \tfrac12 (1 + P^*_{2jm} - P^*_{0jm}) .$$
Thus, we would expect high values of $\hat\pi_{jm}$ to be associated with high values of $P^*_{2jm}$ and low values of $P^*_{0jm}$, and vice versa. However, there is no reason to associate large values of $\hat\pi_{jm}$ with either high or low values of $P^*_{1jm}$. For this reason the bracketed expression multiplying $\sigma_d^2$ in (8.13) will be approximately the same for all values of $\hat\pi_{jm}$ and hence (8.14) will hold approximately even when dominance is present.
Thus the regression procedure described in the previous section can be used with $\hat\pi_{jm}$ replacing $\pi_{jt}$, and the hypothesis that $\beta = 0$ can be tested approximately by comparing the calculated $\hat\beta$ to its estimated standard error: a significantly large $|\hat\beta|$ indicates that $c \neq \tfrac12$, and so linkage is present. Note, however, that it is only possible to detect linkage, not to estimate $c$, using this regression procedure, since $c$ is confounded with $\sigma_g^2$.
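The regression just described amounts to an ordinary least squares fit of the squared pair differences on the estimated proportions i.b.d. A minimal sketch in Python follows; the function name and the four toy data points are hypothetical and are included only to show the arithmetic, not taken from any data set in the text.

```python
import math

def sib_pair_regression(pi_hat, y):
    """Least squares fit of y (squared sib pair differences) on pi_hat
    (estimated proportion of genes i.b.d. at the marker locus).
    Returns (alpha, beta, standard error of beta)."""
    n = len(y)
    mean_x = sum(pi_hat) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in pi_hat)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(pi_hat, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((v - alpha - beta * x) ** 2 for x, v in zip(pi_hat, y))
    se_beta = math.sqrt(rss / (n - 2) / sxx)   # usual residual-variance estimate
    return alpha, beta, se_beta

# Toy illustration with made-up values.
pi_hat_values = [0.0, 0.5, 1.0, 0.5]
squared_diffs = [4.0, 3.0, 1.0, 2.0]
alpha, beta, se_beta = sib_pair_regression(pi_hat_values, squared_diffs)
t_ratio = beta / se_beta   # compared with its reference distribution to test beta = 0
```

A markedly negative `t_ratio` is what linkage would be expected to produce, since (8.14) makes the slope non-positive.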
Finally, suppose there are $K$ trait loci, each linked to the marker locus. Then (8.14) will hold for each trait locus separately, and if the trait loci are mutually unlinked and there is no epistasis
$$E(\hat\beta) = -2 \sum_{i=1}^{K} (1-2c_i)^2 \sigma_i^2 \tag{8.15}$$
where $\sigma_i^2$ is the contribution to the total genetic variance of the $i$th trait locus and $c_i$ is the recombination fraction between it and the marker locus. The equality is exact if there is no dominance.
An even stronger result holds if linkage equilibrium among the trait loci is assumed. At linkage equilibrium the genetic effects at two loci are independent, which implies that (8.15) will hold at equilibrium even if the trait loci are linked, as long as the effects at the different loci are additive (i.e., there is no epistasis). Thus a significantly large $|\hat\beta|$ indicates that there is a linkage relationship between the marker locus and one or more trait loci.
8.3 Detecting Linkage by Nonparametric Methods
If there is a major trait gene located near a marker, then there should be a definite (inverse) association between the sib pair difference $|x_{1j} - x_{2j}|$ and $\pi_{jm}$, the proportion of genes i.b.d. at the marker locus. On the other hand, if there is no major trait gene near the marker, then $|x_{1j} - x_{2j}|$ and $\pi_{jm}$ should be independent. Hence standard rank correlation procedures, such as Spearman's Rho and Kendall's Tau, can be used as a test of such linkage. In the analysis $\pi_{jm}$ is replaced by its estimate $\hat\pi_{jm}$, as defined by (7.1).
First, $\hat\pi_{jm}$ and $|x_{1j} - x_{2j}|$ are separately ranked in order of magnitude, tied scores being assigned the average of the tied ranks. The rank correlations are then calculated by the formulas given in Section 4.7. A significantly large correlation implies that there is either a relatively large genetic effect at a moderate distance from the marker, or that there is a smaller genetic effect close to the marker.
This test procedure is easy to apply, requiring only the calculation of $\hat\pi_{jm}$ and the sib pair differences. Furthermore, it requires no distributional assumptions for the trait of interest. The primary disadvantage is that, being a nonparametric test with $\pi_{jm}$ estimated by $\hat\pi_{jm}$, it is likely to require relatively large samples in order to detect anything but fairly close linkage.
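The ranking step can be sketched as follows in Python. Since the formulas of Section 4.7 are not reproduced here, this sketch assumes the common definition of Spearman's Rho as the product-moment correlation of the average ranks; the function names and the toy values are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward, giving tied scores the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1      # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = tied_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Correlation of the average ranks of x and y (an assumed definition)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Toy illustration: larger pi-hat goes with smaller absolute difference.
pi_hat_values = [0.0, 0.5, 1.0, 0.5]
abs_diffs = [3.0, 2.0, 1.0, 2.0]
rho = spearman_rho(pi_hat_values, abs_diffs)
```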
CHAPTER IX- MAXIMUM LIKELIHOOD ESTIMATION OF LINKAGE
One disadvantage of the methods discussed in the previous chapter is that $\sigma_g^2$ is confounded with the recombination fraction $c$, and hence although linkage can be detected, it cannot be estimated. In this chapter we show how maximum likelihood techniques can be used to overcome this difficulty.
9.1 Deriving the Likelihood Function
We assume there is a two-allele trait locus, with genetic effects given by (8.1), located at a linkage distance $c$ from a multi-allele marker locus. We denote by $\pi_{jm}$ and $\pi_{jt}$ the proportion of genes i.b.d. at marker and trait loci respectively for sib pair $j$. We assume linkage equilibrium for trait and marker loci and also assume that sib pair differences are normally distributed. More precisely, we assume that $x_{1j} - x_{2j}$ is distributed as a mixture of seven normal distributions. From the values of $Y_j$ given in Table 8.1 we see immediately that if $E(\epsilon_j) = 0$ the means of these distributions are 0, $a-d$, $-a+d$, $a+d$, $-a-d$, $2a$ and $-2a$, depending upon the sib pair genotypes. We shall assume $E(\epsilon_j) = 0$, i.e., the data have been corrected for any sib order effect, and so the variance of each distribution is $\sigma_\epsilon^2$.
Without loss of generality we can reduce the number of distributions from seven to four by considering only the absolute pair differences.
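This is reasonable, since the order of the sibs' scores is unimportant if we correct for age. Thus, we henceforth consider only the absolute differences $D_j = |x_{1j} - x_{2j}|$. The distribution of $D_j$, conditional on the sib pair genotypes at the trait locus, is given by
$$f_1 = f(D_j \mid \text{Sibs BB-BB, bb-bb or Bb-Bb}) = \frac{1}{\sigma_\epsilon}\sqrt{\frac{2}{\pi}} \exp\left(-D_j^2 / 2\sigma_\epsilon^2\right), \quad D_j \geq 0; \qquad = 0 \text{ otherwise} \tag{9.1}$$
$$f_2 = f(D_j \mid \text{Sibs BB-Bb or Bb-BB}) = \frac{1}{\sigma_\epsilon}\sqrt{\frac{2}{\pi}} \exp\left(\frac{-(D_j - a + d)^2}{2\sigma_\epsilon^2}\right), \quad D_j \geq 0; \qquad = 0 \text{ otherwise} \tag{9.2}$$
$$f_3 = f(D_j \mid \text{Sibs Bb-bb or bb-Bb}) = \frac{1}{\sigma_\epsilon}\sqrt{\frac{2}{\pi}} \exp\left(\frac{-(D_j - a - d)^2}{2\sigma_\epsilon^2}\right), \quad D_j \geq 0; \qquad = 0 \text{ otherwise} \tag{9.3}$$
$$f_4 = f(D_j \mid \text{Sibs BB-bb or bb-BB}) = \frac{1}{\sigma_\epsilon}\sqrt{\frac{2}{\pi}} \exp\left(\frac{-(D_j - 2a)^2}{2\sigma_\epsilon^2}\right), \quad D_j \geq 0; \qquad = 0 \text{ otherwise} \tag{9.4}$$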
If we knew the sib pair genotypes at locus B for all sib pairs, the likelihood function could be easily constructed. Instead, we have information on the sib pair phenotypes at the marker locus and possibly, in addition, the phenotypes of one or both parents at this locus. We now show how this information at the marker locus, $I_m$, can be used to obtain the likelihood function for an observed sib pair.
The likelihood function for sib pair $j$ may be written
$$L_j = f(D_j \mid I_m) = \frac{f(D_j \text{ and } I_m)}{\Pr(I_m)} = \sum_{\pi_{jt}} \frac{f(D_j \text{ and } I_m \mid \pi_{jt}) \Pr(\pi_{jt})}{\Pr(I_m)}$$
$$= \sum_{\pi_{jt}} \frac{f(D_j \mid \pi_{jt}) \Pr(I_m \mid \pi_{jt}) \Pr(\pi_{jt})}{\Pr(I_m)} \qquad \text{(because of linkage equilibrium)}$$
$$= \sum_{\pi_{jt}} f(D_j \mid \pi_{jt}) \left\{ \sum_{\pi_{jm}} \frac{\Pr(I_m \text{ and } \pi_{jt} \mid \pi_{jm}) \Pr(\pi_{jm})}{\Pr(I_m)} \right\}$$
$$= \sum_{\pi_{jt}} \sum_{\pi_{jm}} \frac{f(D_j \mid \pi_{jt}) \Pr(I_m \mid \pi_{jm}) \Pr(\pi_{jt} \mid \pi_{jm}) \Pr(\pi_{jm})}{\Pr(I_m)} \qquad \text{(because of linkage equilibrium)}$$
$$= \sum_{\pi_{jt}} \sum_{\pi_{jm}} f(D_j \mid \pi_{jt}) \Pr(\pi_{jt} \mid \pi_{jm}) \Pr(\pi_{jm} \mid I_m) \tag{9.5}$$
$\Pr(\pi_{jt} = \tfrac12 h \mid \pi_{jm} = \tfrac12 k)$, apart from a factor of 2 or 4, is given in Table 8.2 and $\Pr(\pi_{jm} = \tfrac12 k \mid I_m)$ can be obtained from the Classification Tables in simple cases, or found numerically in more complex situations by the use of (6.1).
We now find $f(D_j \mid \pi_{jt} = \tfrac12 h)$. Note that $D_j$, conditional on $\pi_{jt}$, is distributed as a mixture of the four distributions given by (9.1)-(9.4), i.e.,
$$f(D_j \mid \pi_{jt} = \tfrac12 h) = \sum_{i=1}^{4} \lambda_{hi} f_i \tag{9.6}$$
where
$$\lambda_{hi} = \Pr(D_j \text{ has density function } f_i \mid \pi_{jt} = \tfrac12 h) \qquad (h = 0, 1, 2 \text{ and } i = 1, 2, 3, 4).$$
The coefficients $\lambda_{hi}$ can be calculated from Table 8.1 and are given in Table 9.1. Thus,
$$\lambda_{01} = \Pr(\text{sibs BB-BB, bb-bb or Bb-Bb} \mid \pi_{jt} = 0) = p^4 + q^4 + 4p^2q^2$$
$$\lambda_{02} = \Pr(\text{sibs BB-Bb or Bb-BB} \mid \pi_{jt} = 0) = 4p^3q \tag{9.7}$$
etc.

TABLE 9.1
VALUES OF THE COEFFICIENT lambda_hi

                              i
h      1                     2          3          4
0      p^4+q^4+4p^2q^2       4p^3q      4pq^3      2p^2q^2
1      1-2pq                 2p^2q      2pq^2      0
2      1                     0          0          0
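To show how the pieces of (9.5) and (9.6) fit together numerically, here is a minimal Python sketch. It builds the conditional distribution of the trait-locus i.b.d. state given the marker-locus state (Table 8.2 rescaled by its row totals) and the mixture coefficients of Table 9.1, and combines them with the marker probabilities for one sib pair. All function names and the numerical inputs are assumptions for illustration; the densities (9.1)-(9.4) are deliberately left out.

```python
def trait_given_marker(c):
    """Row k, column h: Pr(h genes i.b.d. at trait | k genes i.b.d. at marker)."""
    psi = c**2 + (1 - c)**2
    return [
        [psi**2, 2 * psi * (1 - psi), (1 - psi)**2],
        [psi * (1 - psi), 1 - 2 * psi + 2 * psi**2, psi * (1 - psi)],
        [(1 - psi)**2, 2 * psi * (1 - psi), psi**2],
    ]

def mixture_coefficients(p):
    """Table 9.1: row h (genes i.b.d. at trait locus), column i (density f_i)."""
    q = 1 - p
    return [
        [p**4 + q**4 + 4 * p**2 * q**2, 4 * p**3 * q, 4 * p * q**3, 2 * p**2 * q**2],
        [1 - 2 * p * q, 2 * p**2 * q, 2 * p * q**2, 0.0],
        [1.0, 0.0, 0.0, 0.0],
    ]

def component_weights(c, p, p_star):
    """Weight of each density f_1..f_4 for a sib pair whose marker information
    gives p_star = (P0*, P1*, P2*); the likelihood is the weighted sum of f_i(D_j)."""
    cond = trait_given_marker(c)
    coef = mixture_coefficients(p)
    trait = [sum(p_star[k] * cond[k][h] for k in range(3)) for h in range(3)]
    return [sum(trait[h] * coef[h][i] for h in range(3)) for i in range(4)]

weights = component_weights(c=0.1, p=0.4, p_star=(0.0, 0.5, 0.5))
```

Multiplying each weight by the corresponding density evaluated at the observed absolute difference and summing would then give the contribution of one sib pair to the likelihood.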
Thus, $f(D_j \mid \pi_{jt} = \tfrac12 h)$ can be obtained using (9.1)-(9.4), (9.6) and Table 9.1 and hence all elements in the likelihood (9.5) can be calculated. There are five parameters in the likelihood function: $c$, $p$, $\sigma_\epsilon^2$, $a$ and $d$ (if the gene frequencies at the marker locus are unknown, they too can be appropriately estimated). After ML estimates of these five parameters have been obtained, the estimated additive and dominance variance can be calculated by substituting the parameter estimates for the true values in (4.3) and (4.4).
9.2 Obtaining the Maximum Likelihood Estimates
Because of the complexity of the likelihood function, computer methods must be used in order to find the ML estimates. Note, however, that little information is needed beyond that already supplied for the computer calculation of $\hat\pi_{jm}$. The only additional information required in order to evaluate the likelihood function are $\Pr(\pi_{jt} \mid \pi_{jm})$ and $f(D_j \mid \pi_{jt})$, which are constant for all sib pairs. Thus, the only probabilities in the likelihood function that vary from sib pair to sib pair are those given by (6.1). Once the likelihood has been programmed, various methods are available for calculating the ML estimates; the simplest is to search the likelihood surface directly, as explained elsewhere (Elston and Kaplan, 1970).
Finally, it might be noted that if we make the simplifying assumption that $\sigma_d^2 = 0$, then the number of distributions involved is reduced from four to three, and the number of parameters to be estimated is reduced from five to four. The ML procedure described above can be modified accordingly to permit estimation of $c$, $p$, $\sigma_\epsilon^2$ and $a$.
CHAPTER X- ESTIMATING LINKAGE BETWEEN MARKERS WHEN
BOTH PARENTAL PHENOTYPES ARE UNKNOWN
Although the problem of detecting linkage from sib data has been dealt with before (see Chapter II), most work in this area concerns itself only with the detection rather than the estimation of linkage. A second shortcoming is that those studies that do estimate linkage are restricted to rather simple cases in which both parental genotypes are known. Often, however, the parental genotypes will not be known. Although it is more difficult to detect linkage in the absence of parental information, a general maximum likelihood estimation procedure for this purpose can be derived from the results of the previous chapters. This procedure, which can easily be adapted for computer use, can handle any pattern of dominance and any number of alleles.
10.1 Derivation of the Likelihood Function
Suppose we have data for $n$ pairs of sibs, all parental phenotypes are unknown, and we wish to estimate the linkage distance $c$ between two loci A and B. We assume the sib pairs are independent and there is linkage equilibrium for both loci. Let $\pi_{jA}$ and $\pi_{jB}$ be the proportion of genes i.b.d. for loci A and B respectively for sib pair $j$. Then for each locus the sibs are one of seven sib pair types with corresponding frequencies as indicated in Table 6.3. Let $T_{jA}$ and $T_{jB}$ denote the observed sib pair type for pair $j$ for loci A and B respectively. The likelihood for this pair may be written
$$L_j = \Pr(T_{jA} \text{ and } T_{jB}) = \Pr(T_{jA} \mid T_{jB}) \Pr(T_{jB})$$
$$= \sum_{h=0}^{2} \sum_{k=0}^{2} \Pr(T_{jA} \mid \pi_{jA} = \tfrac12 h) \Pr(\pi_{jA} = \tfrac12 h \mid \pi_{jB} = \tfrac12 k) \Pr(\pi_{jB} = \tfrac12 k \mid T_{jB}) \Pr(T_{jB}) \qquad \text{(using (9.5))}$$
$$= \sum_{k=0}^{2} \sum_{h=0}^{2} \Pr(\pi_{jB} = \tfrac12 h \text{ and } \pi_{jA} = \tfrac12 k) \Pr(T_{jA} \mid \pi_{jA} = \tfrac12 k) \Pr(T_{jB} \mid \pi_{jB} = \tfrac12 h) \tag{10.1}$$
The first of these probabilities can be obtained from Table 8.2 and the other two from Table 6.5. For example, suppose that each gene has only two alleles and that sib pair $j$ is AABB-AaBB. Let $p_A$, $1-p_A$, $p_B$ and $1-p_B$ be the gene frequencies for alleles A, a, B and b respectively. Then from (10.1) and Tables 8.2 and 6.5 the likelihood for this sib pair may be written
$$L_j = \tfrac14 \Psi^2 \{4p_A^3(1-p_A)\} p_B^4 + \tfrac12 \Psi(1-\Psi)\{2p_A^2(1-p_A)\} p_B^4 + 0$$
$$\quad + \tfrac12 \Psi(1-\Psi)\{4p_A^3(1-p_A)\} p_B^3 + \tfrac12 (1-2\Psi+2\Psi^2)\{2p_A^2(1-p_A)\} p_B^3 + 0$$
$$\quad + \tfrac14 (1-\Psi)^2 \{4p_A^3(1-p_A)\} p_B^2 + \tfrac12 \Psi(1-\Psi)\{2p_A^2(1-p_A)\} p_B^2 + 0$$
It is easy to program a computer to evaluate the likelihood numerically. Simply store Tables 8.2 and 6.5 as matrices $S$ (3x3) and $T$ (7x3) respectively, specifying the gene frequencies if they are known. The likelihood for sib pair $j$ is then simply
$$L_j = t_{1j}' \, S \, t_{2j} \tag{10.2}$$
where $t_{1j}$ and $t_{2j}$ are 3x1 vectors of $T$ corresponding to the observed sib pair types for sib pair $j$ for loci A and B respectively. If only the sib phenotypes are known, a simple modification can be made as follows: specify which sib pair types could account for the observed pair. The likelihood is then simply
$$L_j = \sum_i \sum_k t_{1ji}' \, S \, t_{2jk} \tag{10.3}$$
where for each locus the summation is over all sib pair types that could give rise to the observed pair.
Finally, the overall likelihood $L$ for $n$ sib pairs is simply the product of the $L_j$, and computer techniques can be used to find the ML estimate of the recombination fraction $c$. If the gene frequencies are unknown, they too can be estimated.
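The matrix form of the pair likelihood can be sketched in a few lines of Python. The sketch below stores Table 8.2 as a 3x3 list and uses two rows of Table 6.5 (types I and III) for the worked example of an AABB-AaBB sib pair; the function names and the numerical gene frequencies are assumptions made for illustration.

```python
def table_8_2(c):
    """Joint distribution of genes i.b.d. at two loci (rows and columns 0, 1, 2)."""
    psi = c**2 + (1 - c)**2
    return [
        [psi**2 / 4, psi * (1 - psi) / 2, (1 - psi)**2 / 4],
        [psi * (1 - psi) / 2, (1 - 2 * psi + 2 * psi**2) / 2, psi * (1 - psi) / 2],
        [(1 - psi)**2 / 4, psi * (1 - psi) / 2, psi**2 / 4],
    ]

def type_I_row(p):
    """Table 6.5, sib pair type I (AiAi-AiAi), given 0, 1, 2 genes i.b.d."""
    return [p**4, p**3, p**2]

def type_III_row(p_i, p_j):
    """Table 6.5, sib pair type III (AiAi-AiAj), given 0, 1, 2 genes i.b.d."""
    return [4 * p_i**3 * p_j, 2 * p_i**2 * p_j, 0.0]

def pair_likelihood(t1, S, t2):
    """Quadratic form t1' S t2 for one sib pair."""
    return sum(t1[h] * S[h][k] * t2[k] for h in range(3) for k in range(3))

# Worked example: sib pair AABB-AaBB, with hypothetical frequencies.
p_A, p_B = 0.3, 0.6
t_A = type_III_row(p_A, 1 - p_A)     # locus A: AA-Aa
t_B = type_I_row(p_B)                # locus B: BB-BB
likelihood_linked = pair_likelihood(t_A, table_8_2(0.1), t_B)
likelihood_unlinked = pair_likelihood(t_A, table_8_2(0.5), t_B)
```

With c = 1/2 the joint table factorizes, so `likelihood_unlinked` should reduce to the product of the two Table 6.3 probabilities; the log of the product of such pair likelihoods over all pairs could then be searched over c.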
10.2 Example of the Estimation Procedure
As a practical example of this procedure, blood grouping data for the 46 pairs of dizygotic twins of Gottesman's Harvard Twin Study (1966) were analyzed. The ML technique described above was used to estimate the linkage distance $c$ between the ABO and Rhesus blood groups, the ABO and MNS blood groups and the Rhesus and MNS blood groups. Previous studies have shown no evidence for linkage between any of these groups.
In order to simplify the estimation procedure, the gene frequencies for the three blood groups were first estimated separately by ML procedures. The ML estimates were then used as the true gene frequencies in the ML estimation of $c$.
To illustrate how the gene frequencies were estimated, consider a twin pair that for the ABO locus is $A_1B$-$A_2$. This implies that the pair must be either $A_1B$-$A_2A_2$ or $A_1B$-$A_2O$, and from Table 6.3 the likelihood of this sib pair is
$$p_{A_1} p_B p_{A_2}^2 + 2 p_{A_1} p_B p_{A_2} p_O = p_{A_1} p_B p_{A_2}(p_{A_2} + 2p_O) .$$
Similarly, all sib pair likelihoods can be obtained and ML estimates of the gene frequencies for the three blood groups can be found. Table 10.1 gives the ML estimates and standard errors for the Harvard Twin Study blood group data.

TABLE 10.1
ESTIMATED GENE FREQUENCIES FROM THE HARVARD TWIN STUDY BLOOD GROUP DATA

ABO                       Rhesus                      MNS
A1    .1947 ± .0367       CDe     .4293 ± .0453       MS    .2724 ± .0395
A2    .0990 ± .0284       cde     .4044 ± .0450       Ms    .2495 ± .0383
O     .6265 ± .0454       cDE     .1361 ± .0306       NS    .0268 ± .0153
B     .0798 ± .0221       Cde     .0076 ± .0075       Ns    .4513 ± .0496
                          cdE     .0076 ± .0075
                          CwDe    .0150 ± .0106
The estimated gene frequencies for the ABO and Rhesus blood
groups from Table 10.1 agree closely with estimates obtained
for other Caucasian populations.
Agreement is not as close for
the MNS blood group, but the estimates are not radically different
(e.g., Race and Sanger, 1968, give MS = .2546, Ms = .3043, NS = .0607
and Ns = .3804).
The gene frequency estimates of Table 10.1 were then taken
as the true gene frequencies, and the ML estimate of linkage between the blood groups was calculated.
These estimates and their
standard errors are given in Table 10.2.
TABLE 10.2
ML ESTIMATES OF LINKAGE BETWEEN BLOOD GROUPS
USING THE HARVARD TWIN STUDY DATA
Blood groups    ML estimate of c    Estimated standard error
ABO-Rhesus      .5                  .6846
MNS-Rhesus      .5                  .2109
ABO-MNS         .5                  .1907
Not surprisingly, we see from Table 10.2 that there is no
evidence of linkage between any of the blood groups.
Note that
the estimated standard errors are fairly large, since the analysis
was based on only 46 twin pairs.
In order to determine the practical value of this ML procedure,
further work is necessary.
Monte Carlo studies are now in progress
to determine how well this procedure detects linkage for various
sample sizes and for various values of the recombination fraction
between 0 and .5.
The results of this study will be made known in
a future communication.
Preliminary evidence indicates that for no linkage the ML estimate of the recombination fraction is usually exactly .5, with a
fairly large standard error; for loose linkage the estimate is still
often .5, but the standard error is reduced.
For tight linkage the
estimate is often zero, and only for moderate linkage does the estimate fall strictly inside the interval from 0 to .5.
The tendency of the estimated recombination fraction to fall at
the endpoints of the interval is reduced if the sample size is increased.
Nevertheless, because of this tendency, further research
may reveal that the procedure is best used only to detect linkage,
rather than to estimate the recombination fraction.
CHAPTER XI - AN EXAMPLE OF THE GENETIC ANALYSIS OF
QUANTITATIVE TRAITS USING SIB PAIR DATA
In order to determine the practical value of the test procedures
described in the previous chapters, the data from Gottesman's (1966)
Harvard Twin Study were analyzed.
First, the data were subjected to
the twin analyses of Chapter III; then, since dizygotic twins are
genetically the same as full sibs, the sib analyses described in
Chapters VIII and IX were performed using these twin pairs.
11.1 Data
The final sample used in the Harvard Twin Study consisted of
147 pairs of same-sex twins taken from greater Boston area schools
(grades 9-12).
The breakdown by sex and zygosity is: 34 male monozygotes,
45 female monozygotes, 32 male dizygotes and 36 female
dizygotes.
All subjects were administered the Minnesota Multiphasic
Personality Inventory (MMPI) and 63 subtest scores were recorded
for each subject.
The first column of Table 11.1 gives the 63 sub-
test scores used.
For an interpretation of the underlying factors
being measured by these subtest scores, see Dahlstrom and Welsh (1960).
For the Harvard Twins it was found that one variable (He) was essentially a dummy variable, since all 294 subjects had scores of 50
on this variable.
Blood grouping data were also collected for 40 of the 68 dizygotic twin pairs and 76 of the 79 monozygotic twin pairs.
Recall
from the previous chapter that gene frequency estimates for the blood
groups were based on 46 dizygotic twin pairs.
The slight discrepancy
in sample size is due to the fact that six dizygotic twin pairs were
not included in the final sample of 147 because one or both twins
invalidated their MMPI scores as determined by the Lie scale.
11.2 Results of the Genetic Analysis
First, Assumptions I-IV of Section 1.7 were made and unweighted
least squares estimates of $\sigma_g^2$, $\sigma_e^2$ and the environmental covariance C
were calculated by (3.5) for each variable.
Then weighted least
squares estimates of these three parameters were calculated by (3.9),
using the iterative procedure described in Section 3.1.2.
Table 11.1
gives the resulting weighted least squares estimates for the MMPI
variables in order of estimated heritability.
TABLE 11.1
WEIGHTED LEAST SQUARES ESTIMATES OF THE
GENETIC PARAMETERS FOR THE MMPI VARIABLES
Variable       $\hat{\sigma}_g^2$   $\hat{\sigma}_e^2$   $\hat{C}$    Estimated heritability
Lie             55.32     8.89   -16.76    .86
MaS             75.36    31.55   -34.64    .70
PaS             67.30    31.50   -25.56    .68
D               63.66    38.89   -26.94    .62
Ul              59.72    36.74   -14.58    .62
Pt              58.76    39.14   -29.59    .60
Rosen Sm        54.19    37.70   -19.04    .59
Mal             44.13    32.10   -13.81    .58
Sel             46.58    37.52    -9.12    .55
Si              39.80    32.88    -8.79    .55
D               58.08    48.22   -17.30    .55
Pt              53.68    46.85    -4.69    .53
Pd              55.88    50.90    -3.22    .53
Do              91.68    83.92   -12.25    .52
PaO             72.03    65.93   -26.28    .52
Welsh R         49.65    45.62    -3.80    .52
Pa              47.39    46.30   -30.15    .51
PdA             55.21    55.82   -10.59    .50
Pd              34.81    36.16     0.21    .49
HyS             46.06    48.73     1.43    .49
Pa              43.29    47.31   -27.51    .48
Hy              33.94    37.16    -3.15    .48
MaO             69.39    80.76    -6.39    .46
ScI             37.35    43.91    -7.88    .46
Sc              54.01    68.46   -11.05    .44
Sit             42.63    56.25    -3.91    .43
Sc              32.98    45.15     2.88    .42
Co27            44.16    65.54     2.52    .40
PdS             31.65    47.27    -1.00    .40
Es              28.60    45.96     5.51    .38
K               29.25    48.30     7.72    .38
Lp              40.87    72.99     7.78    .36
D'              38.81    70.10     7.88    .36
PdB             33.35    61.75     5.89    .35
F               31.13    59.64     3.73    .34
Ds              43.39    88.86     4.00    .34
Ma              24.31    48.21     5.61    .34
Eo              36.91    76.12    20.66    .33
Rosen Cr        25.88    56.82     7.48    .31
HyO             27.74    62.16    21.02    .31
Nu              24.79    56.97    14.96    .30
Fm              37.25    86.35    11.03    .30
Welsh A         27.11    68.69    22.01    .28
Edwards So      24.76    65.83    18.96    .27
Taylor At       27.81    81.73    21.68    .25
Rosen Ar        14.78    57.60    21.11    .20
Rosen Dr        21.10    82.96    22.38    .20
Ma              20.60    89.98    18.24    .19
PdO             15.35    71.76    13.84    .18
N               13.82    81.23    30.27    .15
Hs               7.78    54.50    17.55    .12
Et              12.47    90.26    41.37    .12
Mf              11.86    91.60    36.20    .11
No               6.78    80.96     6.14    .08
Rosen pz         5.43    68.66    21.40    .07
SCI              4.29    68.15    17.88    .06
Dy              -1.31    91.80    38.88   -.01
Hs              -3.22    90.18    40.51   -.04
Pn              -7.45   105.48    44.50   -.08
CoB             -6.48    90.28    36.76   -.08
Dq             -16.22    97.22    53.19   -.20
Rosen Dr       -24.36   113.03    38.76   -.27
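The heritability column of Table 11.1 appears to be the ratio of the estimated genetic variance to the sum of the estimated genetic and environmental variances; the definition itself is given in an earlier chapter, so this is an inference from the tabulated values rather than a quotation. A minimal Python check on four rows of the table (the function name is hypothetical):

```python
def heritability(var_g, var_e):
    """Estimated heritability as var_g / (var_g + var_e) (assumed definition)."""
    return var_g / (var_g + var_e)

# (genetic variance, environmental variance) taken from Table 11.1
rows = {
    "Lie": (55.32, 8.89),
    "MaS": (75.36, 31.55),
    "PaS": (67.30, 31.50),
    "Dy": (-1.31, 91.80),
}

for name, (var_g, var_e) in rows.items():
    print(name, round(heritability(var_g, var_e), 2))
```

The rounded values agree with the tabulated .86, .70, .68 and -.01. A negative variance estimate, as for Dy, simply carries through to a negative heritability estimate.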
Note from Table 11.1 that 25 of the 62 variables, including the
18 with the largest estimated heritability, have negative environmental covariance estimates.
This association between large heritability
estimates and negative covariances is not surprising, since in
Section 3.1.1 we showed that invalidity of Assumptions I-IV will
result in likely overestimates of $\sigma_g^2$ and underestimates of $\sigma_e^2$ and C.
Thus a negative covariance estimate suggests either that dominance
or epistasis is present; the genotype-environment covariance is not
zero; or that the environmental covariance is not the same for
monozygotic and dizygotic twins.
Alternatively, a negative covariance
may reflect a true state of nature.
That is, the variable may just
be one in which the environmental forces tend to produce dissimilar
scores for members of the same twin pair.
In order to determine how well the model fits the data, the
observed mean squares for each variable were compared to those
"expected," i.e., those obtained by substituting the least squares
estimates for the parameter values in the expected mean squares of
(3.1). The criterion chosen to measure model fit was "SQD," the sum
of squared pair differences between observed and expected mean
squares. It was found that for a number of variables, the fit was
good. Table 11.2 gives SQD for the 23 variables having the best
model fit. Since large SQD scores indicate invalidity of the model,
the genetic analyses for variables with high SQD scores are of
questionable value.
TABLE 11.2
THE 23 MMPI VARIABLES WITH THE BEST MODEL FIT
Variable        SQD
Ma              0.071
Edwards So      0.084
Es              0.105
Rosen Cr        1.369
Fm              2.279
Ma              2.438
MaO             2.685
Co27            3.028
Taylor At       5.886
N               6.994
Ma'            13.175
Nu             19.220
Sc             19.387
HyO            23.464
Lie            25.319
Pt             28.425
No             31.417
Pt             41.644
Sit            43.354
PaS            46.988
Hy             47.021
PdA            53.470
Rosen Sm       53.509
The hypothesis that $\sigma_g^2 = 0$ was then tested by four different
procedures for each variable.
The first test procedure used the
ratio of the weighted least squares estimate of $\sigma_g^2$ to its estimated
standard error and will hereafter be called the "Ratio Z Test."
The second test was the exact F test given by (2.14).
The third
test procedure was the approximate F test (3.18), which utilizes
more information than does the exact test.
Finally, the Mann Whitney
Test was performed, using the normal approximation (3.31).
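As a rough illustration of how a Ratio Z statistic translates into a significance level, the sketch below computes an upper-tail standard normal probability. Treating the test as one-sided is an assumption; it is suggested by Table 11.3, where 1.654 is starred at the .05 level and 1.641 is not. The function name is hypothetical.

```python
import math

def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Ratio Z values taken from Table 11.3
for z in (3.062, 1.654, 1.641):
    print(z, round(upper_tail_p(z), 4))
```

Under this reading, 3.062 falls below the .01 level, 1.654 just below .05, and 1.641 just above it, which matches the pattern of asterisks in the table.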
Table 11.3 gives the 25 variables found to have a significant
genetic variance by at least one of the test procedures.
Of these
25 variables, 11 were judged to have a sufficiently poor model fit
(an SQD of 350 or more) to exclude them from further consideration.
Of the remaining variables, only Lie and PaS (subtle paranoia) were
found to have a significant genetic variance by all four tests.
Note also that these two variables are among the 23 best in terms
of model fit.
There is also clear evidence of a genetic factor for
the following variables: Pd (psychopathic deviate), Si (social
introversion), MaS (subtle hypomania) and Pt (psychasthenia).
However, the genetic influences are most apparent for Lie and
PaS, and these two variables are used later in the sib pair analyses.
The Lie scale was first introduced into the MMPI as a basis for
evaluating the general frankness with which the subjects were answering the test.
It is also sensitive to the subject's tendency
to cover up and deny undesirable personal faults (Dahlstrom and
Welsh, 1960).
The PaS subscale is due to Wiener (1948) and is de-
signed to measure paranoia by "subtle" rather than "obvious" test
items.
It is noteworthy that paranoid schizophrenics made up a
large proportion of the patient sample used in the derivation of
this particular scale.
Thus, we have evidence to support the hypothesis
that heredity plays a major role in schizophrenia, a hypothesis
that has also been supported by results from a number of other
studies in this area. Later we shall find evidence that a single
major trait gene may be responsible for this particular variable.
TABLE 11.3
MMPI VARIABLES WITH SIGNIFICANT GENETIC VARIANCE
Variable        Ratio Z test
Lie             3.062**
MaS             2.204*
PaS             2.197*
Ul              2.104*
Pt              1.900*
Rosen Sm        1.892*
Pd              1.881*
Ma'             1.881*
Pt              1.838*
Si              1.834*
PdA             1.654*
Hy              1.641
Es              1.392
HyO             1.288
D***            1.949*
Sc1***          1.881*
Welsh R***      1.851*
Do***           1.802*
HyS***          1.785*
Pd***           1.784
D***            1.771*
PaO***          1.647*
Pa***           1.517
ScI***          1.498
Pa***           1.434
Exact F test
Approximate F test
1.993**
2.078**
1.569*
1.590*
1.582*
1.521*
1.478*
1.516*
1.481*
1.427
1.780**
1.515*
1.453
1.475*
1.564*
1.636*
1.432
1.506*
1.690*
1.479*
1.508*
1.350
1.369
1.756**
1.791**
1.245
1.659*
1.274
1.770**
1.798**
1.700*
1.693*
1.615*
1.676*
Mann Whitney Z statistic
1.478*
1.416
1.421
1.354
1.337
1.484*
1.499*
1.502*
1.477*
1.487*
1.484*
1.443
1.391
1.310
1.361
1.289
2.531**
0.798
1.948*
0.416
1.348
0.581
0.633
1.488
1.183
1.548
0.909
1.600
1.750*
1.841*
0.853
1.968*
1.193
1.841*
1.136
1.177
1.278
0.686
2.016*
1.280
1.943*
* Significant at .05 level
** Significant at .01 level
*** These variables judged to have inadequate model fit
Computer techniques were employed to obtain ML estimates of
$\sigma_g^2$, $\sigma_e^2$ and C for the Lie and PaS variables using the log likelihood
(3.22). It was found that the ML estimates differed only slightly
from the weighted least squares estimates of Table 11.1.
Table 11.4
compares the results of these two methods of estimation.
TABLE 11.4
COMPARISON OF ML AND WEIGHTED LEAST SQUARES ESTIMATION
OF THE GENETIC PARAMETERS FOR Lie AND PaS VARIABLES
Variable   Method of estimation   $\hat{\sigma}_g^2$    $\hat{\sigma}_e^2$    $\hat{C}$
Lie        ML                     55.286    8.314   -17.341
Lie        Weighted L.S.          55.317    8.886   -16.764
PaS        ML                     66.820   32.200   -25.640
PaS        Weighted L.S.          67.303   31.500   -25.561
Having established the strong effect of heredity on Lie and PaS,
we next attempt to link these variables to the ABO, Rhesus and MNS
blood groups.
For this purpose the sib pair analyses of Chapters VIII
and IX are employed on the 40 dizygotic twin pairs for which blood
grouping data are available.
First, the assumption of no association (and hence linkage
equilibrium) between trait and marker locus was tested for each
possible marker-trait pair (the markers being the three blood groups
and the trait loci being the ones responsible for Lie and PaS).
The assumption was tested by determining whether or not the phenotypes
for a particular blood group differed significantly with respect to
the variable of interest using all the Harvard Twin Study data.
For example, Table 11.5 gives the observed means for Lie and PaS for
each ABO phenotype.
TABLE 11.5
OBSERVED MEANS FOR Lie AND PaS VARIABLES
FOR THE ABO PHENOTYPES
ABO phenotype   Sample size   Lie       PaS
O               102           46.843    53.922
A_1              70           47.171    55.829
A_2              26           49.308    52.308
B                26           48.846    56.000
A_1B              5           50.600    52.000
A_2B              3           49.333    57.333
An analysis of variance reveals that the phenotypes do not differ significantly for either variable.
The F statistic (with 5 and
226 degrees of freedom) was calculated to be 0.759 for Lie and 0.791
for PaS.
Hence, the assumption of no association seems to be a valid
one here.
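The F statistic used for this test is that of an ordinary one-way analysis of variance with the phenotypes as groups. A minimal Python sketch follows; the function name and the toy scores are hypothetical, and only the sample sizes are taken from Table 11.5.

```python
import numpy as np

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Degrees of freedom implied by the six ABO sample sizes of Table 11.5
abo_sizes = [102, 70, 26, 26, 5, 3]
abo_df = (len(abo_sizes) - 1, sum(abo_sizes) - len(abo_sizes))

# Hypothetical scores for three small groups, for illustration only
f_stat, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

The sample sizes give 5 and 226 degrees of freedom, in agreement with the values quoted for the ABO analysis.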
A similar result was found for the Rhesus blood group, the F
values being (9 and 222 degrees of freedom) 1.634 for Lie and 1.462
for PaS, both values nonsignificant at the .05 level.
However, significant differences were found for the MNS system, the F values
being (7 and 224 degrees of freedom) 2.092 for Lie (significant at
the .05 level) and 2.785 for PaS (significant at the .01 level).
This implies that our model does not hold in this case, and hence the
MNS system was excluded from further analyses with these two variables.
However, this association implies that the 8 MNS phenotypes differ significantly with respect to the trait of interest, which, if
it is not just a chance occurrence, is an interesting result in itself;
it would be worth the attempt to discover the cause of such a phenomenon.
Table 11.6 gives the means for the 8 MNS phenotypes.
TABLE 11.6
OBSERVED MEANS FOR Lie AND PaS VARIABLES
FOR THE MNS PHENOTYPES
MNS phenotype   Sample size   Lie       PaS
MSMS            16            44.063    59.500
MSMs            34            49.147    52.235
MSNS             4            55.000    65.000
MsMs            15            46.333    52.000
MsNs            58            46.448    54.552
NSNs            11            46.818    47.636
NsNs            42            46.405    53.905
MNSs            52            49.692    56.462
As a preliminary test for linkage, the simple nonparametric
tests described in Chapter VIII were applied. First the absolute
twin pair difference ($D_j = |x_{1j} - x_{2j}|$) was calculated for both variables
for all dizygotic twin pairs. Then $\hat{\pi}_{jm}$ for the ABO and Rhesus blood
groups was calculated for all dizygotic twin pairs by the procedures
described earlier. The rank correlations were found and are given in
Table 11.7.
TABLE 11.7
RANK CORRELATIONS BETWEEN $D_j$ AND $\hat{\pi}_{jm}$
Variable   Marker    Spearman's Rho   Kendall's Tau
Lie        ABO        .178             .144
Lie        Rhesus    -.176            -.118
PaS        ABO       -.322            -.250
PaS        Rhesus     .039             .032
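The two rank correlations can be computed directly from the paired values of the absolute pair difference and the estimated proportion i.b.d. The Python sketch below is illustrative only: the data are hypothetical, ties are given average ranks for Spearman's rho, and Kendall's tau is computed without a tie correction (tau-a), which may differ from the treatment of ties used in the analysis reported here.

```python
def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n, score = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

# Hypothetical absolute pair differences and estimated proportions i.b.d.
d = [2.0, 9.0, 4.0, 15.0, 7.0, 11.0]
pi_hat = [1.0, 0.25, 0.75, 0.0, 0.5, 0.5]
rho, tau = spearman_rho(d, pi_hat), kendall_tau_a(d, pi_hat)
```

Under linkage, pairs sharing more genes i.b.d. at the marker should differ less on the trait, so negative correlations are the ones of interest.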
Using the large sample approximations (4.26) and (4.31), it was
found that both rank correlations between $\hat{\pi}_{jm}$ based on the ABO system
and $D_j$ for the PaS variable are significant at the .05 level. The
other correlations are not significantly less than zero. Thus, there
is evidence that the ABO blood group may be linked to a major trait
gene for PaS. To further investigate this possibility, the maximum
likelihood procedures of Chapter IX were applied so that the recombination
fraction c could actually be estimated. The results of the ML
analysis are given in Table 11.8.
TABLE 11.8
ML ESTIMATES FOR PaS AND ITS LINKAGE TO ABO
Parameter       ML estimate   Standard error   Restricted estimates
c                -0.1836       0.1008            0.0
p                 0.5925       0.0842            0.5840
a                17.2216       1.6172           17.2720
$\sigma_\epsilon^2$      15.0403       6.3718           14.0405
d                 5.1298       1.1861            5.1633
$\sigma_a^2$    127.8626                       130.8005
$\sigma_d^2$      6.1360                         6.2981
In Table 11.8 $\sigma_a^2$ and $\sigma_d^2$ are estimated by substituting parameter
estimates for their true values in (4.3) and (4.4) respectively.
Note that the ML estimate of c falls outside the parameter range,
which is $0 \le c \le .5$. The last column of Table 11.8 gives the ML
estimates when c is restricted to this range of values. Note that there
is little change in the resulting parameter estimates.
There are several procedures that can be used to evaluate the
results of this analysis. First, the ML estimate of c is not within
two standard errors of .5, which is strong evidence that linkage is
present.
The likelihood ratio test (comparing the ratio of the likelihood
when c=.5 to the likelihood when c=0) results in
    R = L(c=.5) / L(c=0) = .0299
which again implies that linkage may be present.
Finally, if the
Bayes procedure suggested by Smith (1959) is used, the a priori
probability that c=.5 is reduced from 21/22=.9545 to .7509.
The regression analysis of Chapter VIII was then performed,
the squared pair differences being regressed on $\hat{\pi}_{jm}$. The estimated
regression coefficient was found to be $\hat{\beta} = -499.8$, which from (8.14)
implies that $\hat{\sigma}_g^2 = 249.9$ if c=0. This estimate of genetic variance does
not agree closely with that of the ML method. However, inspection of
the data revealed one extreme observation [$Y_j = (x_{1j} - x_{2j})^2 = 1296$ and
$\hat{\pi}_{jm} = 0$], and since the regression procedure is very sensitive to such
observations, this particular observation was eliminated and $\sigma_g^2$ was
estimated again by both methods. The new estimates were found to be
ML estimate:           $\hat{\sigma}_a^2 = 104.428$, $\hat{\sigma}_d^2 = 4.778$
Regression estimate:   $\hat{\sigma}_g^2 = 103.600$
Note that the resulting regression estimate of genetic variance
is drastically reduced by elimination of one observation, while the ML
estimate is only mildly affected.
Elimination of this observation
also reduced the nonparametric correlations (Spearman's Rho = -.266
and Kendall's Tau = -.204) to the point where they were barely
significant at the .05 level.
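The regression step can be sketched as an ordinary least squares fit of the squared pair differences on the estimated proportion i.b.d. at the marker. The relation used below to convert the slope into a genetic variance, slope = -2(1-2c)^2 times the genetic variance, is an assumption about the form of (8.14); it is consistent with the figures quoted in the text (a slope of -499.8 giving 249.9 at c=0). The function names and the data are hypothetical.

```python
import numpy as np

def regress_sq_diff_on_ibd(sq_diff, pi_hat):
    """Least squares intercept and slope of squared pair differences on pi_hat."""
    y = np.asarray(sq_diff, dtype=float)
    x = np.asarray(pi_hat, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    (alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(alpha), float(beta)

def genetic_variance_from_slope(beta, c=0.0):
    """Assumed form of (8.14): beta = -2 (1 - 2c)^2 sigma_g^2."""
    return -beta / (2.0 * (1.0 - 2.0 * c) ** 2)

print(genetic_variance_from_slope(-499.8))  # 249.9, as quoted in the text

# Hypothetical data: one extreme squared difference at pi_hat = 0
pi_hat = [0.0, 0.25, 0.5, 0.5, 0.75, 1.0, 0.0]
sq_diff = [300.0, 250.0, 180.0, 200.0, 90.0, 40.0, 1296.0]
_, beta_all = regress_sq_diff_on_ibd(sq_diff, pi_hat)
_, beta_trimmed = regress_sq_diff_on_ibd(sq_diff[:-1], pi_hat[:-1])
```

Comparing `beta_all` with `beta_trimmed` shows how strongly a single large squared difference at one end of the range can pull the fitted slope, which is the sensitivity described above.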
To summarize briefly: we have found that the underlying variables
being measured by the Lie and PaS scales of the MMPI have a
significant genetic component. There is also evidence that a major
trait gene responsible for PaS may be linked to the ABO locus.
CHAPTER XII- SUMMARY AND SUGGESTIONS
FOR FURTHER RESEARCH
12.1 Summary
In this dissertation a paired observations model is given for
the genetic analysis of quantitative traits.
The model for the special
case of twin pairs is discussed in detail, a model that should hold
in all cases where we can assume negligible biases in the sampling of
the twins and negligible effects due to non-random mating.
On the
basis of this model we indicate what further assumptions are necessary
in order to obtain unbiased estimates of genetic variance, environmental variance and environmental covariance.
Methods are presented
for estimating these parameters simultaneously from monozygotic and
dizygotic twin data.
Testing for the presence of genetic variance
is considered, by both parametric and nonparametric means.
We then consider the model for the special case of sib pairs
having $\pi$ of their genes identical by descent over the entire genome.
A regression procedure is described that permits unbiased estimation
of genetic variance when $\pi$ is known. Maximum likelihood estimation
of genetic variance is also discussed and nonparametric procedures
for testing for the presence of genetic variance are given. A method
is then described for finding the maximum likelihood estimate of $\pi$
when its value is unknown.
We then deal with the problem of detecting linkage between a
major quantitative trait locus and a marker locus from sib pair data.
We allow for multiple allelism at the marker locus but, in view of
the numerical difficulties that would be involved in practice, not
at the trait locus. We also allow for the incorporation of data on
the sibs' parents with regard to the marker locus. Several methods
are discussed for detecting linkage, using the estimated proportion
of genes two sibs have identical by descent at the marker locus. In
addition, a maximum likelihood procedure is given that permits estimation
of the recombination fraction between a trait and marker locus.
We also give a simple maximum likelihood procedure for estimating the
recombination fraction between two marker loci when both parental
phenotypes at these loci are unknown.
Finally, Gottesman's (1966) Harvard Twin Study data are analyzed
using these test procedures. The twin analyses give evidence
that certain MMPI variables have significant genetic components,
notably Lie and PaS. The sib pair analyses reveal that there may
be a single locus, closely linked to the ABO blood group, that is
responsible for a major part of the genetic variation on the PaS
scale.
12.2 Suggestions for Further Research
The assumption of random mating was an important one for all
test procedures described in this dissertation.
Since an important
consequence of assortative (nonrandom) mating is linkage disequilibrium,
further research is needed on the problem of allowing for
assortative mating and other situations in which linkage disequilibrium can occur.
It may also be possible to overcome the numerical
difficulties involved in generalizing the maximum likelihood analyses
in this dissertation to an m-allele trait locus.
It is believed that the methods presented in this dissertation
are more powerful and more general than those proposed so far, and
it is hoped that these new test procedures will be used more extensively on further sets of data.
APPENDIX I
Suppose there are $N_M$ pairs of monozygotic twins and $N_D$ pairs
of dizygotic twins. The regression model (4.10) can be written
    $E(\mathbf{Y}) = X\boldsymbol{\gamma}$
where $\mathbf{Y}$ is an $(N_M+N_D) \times 1$ vector of squared pair differences; $X$ is an
$(N_M+N_D) \times 2$ matrix whose first $N_M$ rows are (1,1) and whose next $N_D$
rows are (1,½); $\boldsymbol{\gamma}$ is a 2x1 vector whose two elements are $\alpha$ and $\beta$.
The normal equations may in general be written (e.g., Graybill,
1961)
    $X'X\hat{\boldsymbol{\gamma}} = X'\mathbf{Y}$
which in the present case becomes
    $2N_M MW(MZ) + 2N_D MW(DZ)$
Eliminating $\alpha$ we have
which reduces to
or
as required.
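As a numerical check on the algebra of this appendix, the Python sketch below solves the normal equations for the design matrix described above and compares the result with a closed form. Two assumptions are made: MW(MZ) and MW(DZ) are taken to be the within-pair mean squares, i.e. the sum of squared pair differences divided by twice the number of pairs, and the closed form (slope equal to 4[MW(MZ) - MW(DZ)], intercept equal to 4MW(DZ) - 2MW(MZ)) is what those definitions imply, not a quotation of the result the appendix arrives at. The data and function names are hypothetical.

```python
import numpy as np

def fit_twin_regression(y_mz, y_dz):
    """Solve X'X g = X'y with rows (1, 1) for MZ pairs and (1, 1/2) for DZ pairs."""
    y = np.concatenate([y_mz, y_dz])
    pi = np.concatenate([np.ones(len(y_mz)), np.full(len(y_dz), 0.5)])
    X = np.column_stack([np.ones(len(y)), pi])
    alpha, beta = np.linalg.solve(X.T @ X, X.T @ y)
    return float(alpha), float(beta)

def closed_form(y_mz, y_dz):
    """Intercept and slope in terms of the within-pair mean squares (assumed MW)."""
    mw_mz = y_mz.sum() / (2 * len(y_mz))
    mw_dz = y_dz.sum() / (2 * len(y_dz))
    return float(4 * mw_dz - 2 * mw_mz), float(4 * (mw_mz - mw_dz))

# Hypothetical squared pair differences for 79 MZ and 68 DZ pairs
rng = np.random.default_rng(0)
y_mz = 40.0 * rng.chisquare(1, size=79)
y_dz = 90.0 * rng.chisquare(1, size=68)

fitted = fit_twin_regression(y_mz, y_dz)
algebraic = closed_form(y_mz, y_dz)
```

With only two distinct values of the regressor, the fitted line passes exactly through the two group means, which is why the least squares solution reduces to a simple function of the two mean squares.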
APPENDIX II
Consider the simple linear regression model (8.6) in which we
regress the squared pair differences $Y_j$ on $\pi_j$, which is assumed to
be known. Suppose that of the n sib pairs used in the analysis,
$n_i$ (i=0,1,2) have i/2 of their genes i.b.d. at the trait locus.
Then in matrix notation we can write
    $E(\mathbf{Y}) = X\boldsymbol{\gamma}$
where $\mathbf{Y}$ is an nx1 column vector whose elements are $Y_j$; $X$ is an nx2
matrix whose first $n_2$ rows are (1,1), whose next $n_1$ rows are (1,½),
and whose last $n_0$ rows are (1,0); $\hat{\boldsymbol{\gamma}}$, the least squares estimator
of $\boldsymbol{\gamma}$, can be obtained from the normal equations
    $X'X\hat{\boldsymbol{\gamma}} = X'\mathbf{Y}$
so that
    $X'X E(\hat{\boldsymbol{\gamma}}) = X'E(\mathbf{Y})$
In this particular case, we have
    $X'X$
Since $E(\hat{\boldsymbol{\gamma}})$ is a 2x1 vector whose two elements are $E(\hat{\alpha})$ and
$E(\hat{\beta})$, we can find $E(\hat{\beta})$ by solving the following system of equations:
Eliminating $E(\hat{\alpha})$ we see that
which reduces to
or
BIBLIOGRAPHY
Bailey, N.T.J. Introduction to the Mathematical Theory of Genetic
Linkage, Clarendon, Oxford, 1961.
Bernstein, F. "Zur Grundlegung der Chromosomentheorie der Vererbung
beim Menschen mit besonderer Berücksichtigung der Blutgruppen." Z.
indukt. Abstamm. u. VererbLehre, Vol. 57, 1931, pp. 113-138.
Block, J.B. "Hereditary Components in the Performance of Twins on
the WAIS." in Progress in Human Behavior Genetics, ed. by S.G.
Vandenberg. Johns Hopkins Press, Baltimore, 1968, pp. 221-228.
Bock, R.D. and S.G. Vandenberg. "Components of Heritable Variation
in Mental Test Scores." in Progress in Human Behavior Genetics,
ed. by S.G. Vandenberg. Johns Hopkins Press, Baltimore, 1968,
pp. 233-260.
Brues, A.M. "Linkage of Body Build with Sex, Eye Color and Freckling."
Am. J. Hum. Genet., Vol. 2, 1950, pp. 215-239.
Burks, B.S. "Review of Twins: a Study of Heredity and Environment."
J. Abnorm. Soc. Psychol., Vol. 33, 1938, pp. 128-133.
Burks, B.S. "A Study of Identical Twins Reared Apart under Differing
Types of Family Relationships." in Studies in Personality, ed.
by J.F. Dashiell. McGraw-Hill, New York, 1942, pp. 35-69.
Burt, C. "The Genetic Determination of Differences in Intelligence:
a Study of Monozygotic Twins Reared Together and Apart." Brit.
J. Psychol., Vol. 57, pp. 137-153.
Cattell, R.B. "The Multiple Abstract Variance Analysis Equations
and Solutions for Nature-Nurture Research on Continuous Variables."
Psychol. Rev., Vol. 67, 1960, pp. 353-372.
Cattell, R.B. "The Interaction of Hereditary and Environmental
Influences." Brit. J. Stat. Psychol., Vol. 16, 1963, pp. 191-210.
Clark, P.J. "The Heritability of Certain Anthropometric Characters
as Ascertained from the Measurement of Twins." Am. J. Hum. Genet.,
Vol. 8, 1956, pp. 49-54.
Cockerham, C.C. "An Extension of the Concept of Partitioning
Hereditary Variance for Analysis of Covariance among Relatives
when Epistasis is Present." Genetics, Vol. 39, 1954, pp. 859-882.
Cotterman, C.W. A Calculus for Statistico-Genetics. Unpublished
Ph.D. thesis, Ohio State University, 1940.
Cotterman, C.W. "Factor-union Phenotype Systems." in Computer
Applications in Genetics, ed. by N.E. Morton. University of
Hawaii Press, Honolulu, 1969, pp. 1-19.
Dahlberg, G. Twin Births and Twins from a Hereditary Point of View,
Tidens, Stockholm, 1926.
Dahlstrom, W.G. and G.S. Welsh. An MMPI Handbook, University of
Minnesota Press, Minneapolis, 1960.
Elston, R.C. and I.I. Gottesman. "The Analysis of Quantitative
Inheritance Simultaneously from Twin and Family Data." Am. J.
Hum. Genet., Vol. 20, 1968, pp. 512-521.
Elston, R.C. and E.B. Kaplan.
Paper in Preparation.
1970.
Falconer, D.S. Introduction to Quantitative Genetics, Oliver and
Boyd, Edinburgh, 1960.
Finney, D.J. "The Detection of Linkage, VI." Ann. Eug., Vol. 11,
1942, pp. 233-244.
Fisher, R.A. "Correlation Between Relatives on the Supposition of
Mendelian Inheritance." Trans. Roy. Soc. Edinburgh, Vol. 52,
1918, pp. 399-433.
Fisher, R.A. "The Detection of Linkage with Dominant Abnormalities."
Ann. Eug., Vol. 6, 1935, pp. 187-201.
Fisher, R.A. "Limits to Intensive Production in Animals." Brit.
Agric. Bull., Vol. 4, 1951, pp. 217-218.
Ford, E.B. "Polymorphism and Taxonomy." in The New Systematics,
ed. by J.S. Huxley. Oxford University Press, London, 1940,
pp. 493-513.
Fuller, J.L. and W.R. Thompson. Behavior Genetics, John Wiley and
Sons, New York, 1960.
Galton, F. "The History of Twins as a Criterion of the Relative
Powers of Nature and Nurture." Pop. Sci. Monthly, Vol. 8, 1875,
pp. 345-357.
Gottesman, I.I. "Genetic Variance in Adaptive Personality Traits."
J. Child Psychol. Psychiat., Vol. 7, 1966, pp. 199-208.
Gottschaldt, K. "Phänogenetische Fragestellungen im Bereich der
Erbpsychologie." Z. indukt. Abstamm. u. VererbLehre, Vol. 76,
1939, pp. 118-157.
Graybill, F.A. An Introduction to Linear Statistical Models, McGraw-Hill, New York, 1961.
Haldane, J.B.S. "Methods for the Detection of Autosomal Linkage
in Man." Ann. Eug., Vol. 6, 1934, pp. 26-65.
Haldane, J.B.S. and C.A.B. Smith. "A New Estimate of the Linkage
Between the Genes for Color Blindness and Haemophilia in Man."
Ann. Eug., Vol. 14, 1947, pp. 10-31.
Hancock, J. "Studies in Monozygotic Twins." New Zealand J. Sci.
and Tech., Vol. 34A, 1952, pp. 131-152.
Harris, D.L. "Biometrical Genetics in Man," in Methods and Goals
in Human Behavior Genetics, ed. by S.G. Vandenberg. Academic,
New York, 1965, pp. 81-94.
Hayman, B.I. "Maximum Likelihood Estimation of the Genetic Components
of Variance." Biometrics, Vol. 16, 1960, pp. 369-381.
Hogben, L.T. "The Detection of Linkage in Human Families." Proc.
Royal Soc. B, Vol. 114, 1934, pp. 340-363.
Holzinger, K.J. "The Relative Effects of Nature and Nurture
Influences on Twin Differences." J. Educ. Psychol., Vol. 20,
1929, pp. 241-248.
Howells, W.W. and A.P. Slowey. "Linkage Studies in Morphological
Traits." Am. J. Hum. Genet., Vol. 8, 1956, pp. 154-161.
Kempthorne, O. "The Correlation Between Relatives in a Random Mating
Population." Proc. Royal Soc. B, Vol. 143, 1954, pp. 103-113.
Kempthorne, O. An Introduction to Genetic Statistics, John Wiley
and Sons, New York, 1957.
Kempthorne, O. and R.H. Osborne. "The Interpretation of Twin Data."
Am. J. Hum. Genet., Vol. 13, 1961, pp. 320-339.
Kempthorne, O. and O.B. Tandon. "The Estimation of Heritability by
Regression of Offspring on Parent." Biometrics, Vol. 9, 1953,
pp. 90-100.
Kendall, M.G. Rank Correlation Methods, Charles Griffin, London,
1955.
Kendall, M.G. and A. Stuart. The Advanced Theory of Statistics,
Vol. II, Hafner, New York, 1967.
Kloepfer, H.W. "An Investigation of 171 Possible Linkage Relationships in Man." Ann. Eug., Vol. 13, 1946, pp. 35-71.
Lenz, F. and O. von Verschuer. "Zur Bestimmung des Anteils von
Erbanlage und Umwelt an der Variabilität." Archiv für Rassen- und Gesellschaftsbiologie, Vol. 20, 1928, pp. 425-428.
Le Roy, H.L. "The Interpretation of Calculated Heritability
Coefficients." in Biometrical Genetics, ed. by O. Kempthorne.
Pergamon Press, New York, pp. 107-116.
Li, C.C. Population Genetics, University of Chicago Press, Chicago,
1955.
Lindgren, B.W. Statistical Theory, MacMillan, New York, 1960.
Loehlin, J.C. "Some Methodological Problems in Cattell's Multiple
Abstract Variance Analysis." Psychol. Rev., Vol. 72, 1965,
pp. 156-161.
Lowry, D.C. and LT. Shultz. "Testing Association of Metric Traits
and Marker Genes." Ann. Hum. Genet., Vol. 23, 1959, pp. 83-90.
Lush, J.L. Animal Breeding Plans, Iowa State University Press,
Ames, Iowa, 1945.
Lush, J.L. "Heritability of Quantitative Characters in Farm Animals."
in Proceedings of the Eighth International Congress of Genetics,
Stockholm, July 7-14, 1948, ed. by G. Bonnier and R. Larsson.
Berlingska Boktryckeriet, Lund., 1949, pp. 356-375.
McNemar, Q. "Special Review: Newman, Freeman and Holzinger's Twins."
Psychol. Bull., Vol. 35, 1938, pp. 237-249.
Malécot, G. Les Mathématiques de l'Hérédité, Masson et Cie, Paris,
1948.
Mann, H.B. and D.R. Whitney. "On a Test of Whether one of two Random
Variables is Stochastically Larger than the Other." Annals of
Mathematical Statistics, Vol. 18, 1947, pp. 50-60.
Maynard-Smith, S., L.S. Penrose and C.A.B. Smith. Mathematical
Tables for Research Workers in Human Genetics, J. and A. Churchill,
London, 1961.
Morton, N.E. "Sequential Test for Detection of Linkage." Am. J.
Hum. Genet., Vol. 7, 1955, pp. 277-318.
Morton, N.E. "The Detection and Estimation of Linkage Between the
Genes for Elliptocytosis and the Rh Blood Type." Am. J. Hum.
Genet., Vol. 8, 1956, pp. 80-96.
Morton, N.E. "Further Scoring Types in Sequential Linkage Tests
with a Critical Review of Autosomal and Partial Sex-Linkage in
Man." Am. J. Hum. Genet., Vol. 9, 1957, pp. 55-75.
Neel, J.V. and W.J. Schull. Human Heredity, University of Chicago
Press, Chicago, 1954.
Newman, H.H., F.N. Freeman and K.J. Holzinger. Twins: A Study of
Heredity and Environment, University of Chicago Press, Chicago,
1937.
Nichols, R.C. "The National Merit Twin Study." in Methods and Goals
in Human Behavior Genetics, ed. by S.G. Vandenberg. Academic,
New York, 1965, pp. 231-243.
Noether, G.E. Elements of Nonparametric Statistics, John Wiley and
Sons, New York, 1967.
Ostlyngen, E. "Possibilities and Limitations of Twin Research as a
Means of Solving Problems of Heredity and Environment." Acta
Psychol., Vol. 6, 1949, pp. 59-90.
Owen, D.B. Handbook of Statistical Tables, Pergamon Press, London,
1962.
Parsons, P.A. The Genetic Analysis of Behavior, Methuen, London,
1967.
Penrose, L.S. "The Detection of Autosomal Linkage in Data which
Consists of Pairs of Brothers and Sisters of Unspecified Parentage."
Ann. Eug., Vol. 6, 1935, pp. 133-138.
Penrose, L.S. "Genetic Linkage in Graded Human Characters." Ann. Eug.,
Vol. 8, 1938, pp. 233-238.
Penrose, L.S. "Data for the Study of Linkage in Man: Red Hair and
the ABO Locus." Ann. Eug., Vol. 15, 1950, pp. 243-247.
Penrose, L.S. "The General Purpose Sib-Pair Linkage Test." Ann.
Eug., Vol. 18, 1953, pp. 120-124.
Price, B. "Primary Biases in Twin Studies." Am. J. Hum. Genet.,
Vol. 2, 1950, pp. 293-352.
Race, R.R. and R. Sanger. Blood Groups in Man, F.A. Davis Company,
Philadelphia, 1968.
Renwick, J.H. "Progress in Mapping Human Autosomes." British
Medical Bulletin, Vol. 25, 1969, pp. 65-73.
Roberts, R.C. "Some Concepts in Quantitative Genetics." in Behavior-
Genetic Analysis, ed. by J. Hirsch. McGraw-Hill, New York, 1967,
pp. 214-257.
Satterthwaite, F.E. "An Approximate Distribution of Estimates of
Variance Components." Biometrics Bulletin, Vol. 2, 1946,
pp. 110-114.
Siegel, S. Nonparametric Statistics for the Behavioral Sciences,
McGraw-Hill, New York, 1956.
Smith, C.A.B. "The Detection of Linkage in Human Genetics." J.
Roy. Stat. Soc. B, Vol. 15, 1953, pp. 153-192.
Smith, C.A.B. "Some Comments on the Statistical Methods used in
Linkage Investigations." Am. J. Hum. Genet., Vol. 11, 1959,
pp. 289-304.
Steinberg, A.G. and N.E. Morton. "Sequential Test for Linkage
Between Cystic Fibrosis of the Pancreas and the MNS Locus."
Am. J. Hum. Genet., Vol. 8, 1956, pp. 177-189.
Stormont, C. "Research with Cattle Twins." in Statistics and
Mathematics in Biology, ed. by O. Kempthorne and others. Iowa
State University Press, Ames, Iowa, 1954, pp. 407-418.
Thoday, J.M. "New Insights into Continuous Variation." in Proceedings
of the Third International Congress of Human Genetics,
ed. by J.F. Crow and J.V. Neel. Johns Hopkins Press, Baltimore,
1967, pp. 339-350.
Vandenberg, S.G. "How 'Stable' are Heritability Estimates?" Amer.
J. Phys. Anthrop., Vol. 20, 1962, pp. 331-338.
Vandenberg, S.G., R.E. Stafford and A.M. Brown. "The Louisville
Twin Study." in Progress in Human Behavior Genetics, ed. by S.G.
Vandenberg. Johns Hopkins Press, Baltimore, 1968, pp. 153-204.
Wiener, D.N. "The Subtle-Obvious Factor in Vocational and Educational
Success." American Psychologist, Vol. 3, 1948, p. 299.
Wilde, K. "Mess- und Auswertungsmethoden in Erbpsychologischen
Zwillingsuntersuchungen." Archiv. fur Gesamte Psychologie,
Vol. 109, 1941, pp. 1-81.
Yasuda, N. "An Extension of Wahlund's Principle to Evaluating Mating
Type Frequency." Am. J. Hum. Genet., Vol. 20, 1968, pp. 1-23.