THE GENETIC ANALYSIS OF QUANTITATIVE TRAITS USING TWIN AND SIB DATA

by Joseph Kyd Haseman

Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 671
March 1970

JOSEPH KYD HASEMAN. The Genetic Analysis of Quantitative Traits Using Twin and Sib Data. (Under the direction of R.C. ELSTON.)

A paired observations model is given for the genetic analysis of quantitative traits. The special case of twin pairs is considered and the biases in the usual procedures for estimating genetic variance from twin data are examined. New methods are described for estimating genetic variance simultaneously from monozygotic and dizygotic twin data.

Procedures are given for estimating and detecting genetic variance from sib pair data when the proportion of genes identical by descent over the entire genome is known for all sib pairs. Methods are also given for estimating this proportion when it is unknown.

Procedures are described for detecting linkage between a single major trait locus and a marker locus from sib pair data. These procedures are based upon the estimated proportion of genes identical by descent at the marker locus. A maximum likelihood procedure is given that permits estimation of both the recombination fraction and the genetic effect at the trait locus.

Data from Gottesman's Harvard Twin Study are analyzed, the quantitative traits being MMPI subtest scores and the markers being the ABO, MNS and Rhesus blood groups. It is found that there may be a single locus, closely linked to the ABO blood group, that is responsible for a major part of the genetic variation on the PaS scale.

ACKNOWLEDGMENTS

The author expresses his appreciation to his advisor, Dr. R.C. Elston, who suggested the topic of this dissertation and provided invaluable guidance and counsel.
Appreciation is also expressed to the other members of the advisory committee, Professors J.E. Grizzle, G.G. Koch, D.R. Brogan, L.V. Jones and E.M. Cramer. All of these committee members gave assistance and made valuable suggestions in the preparation of this dissertation.

Miss Maureen Moczek and Miss Cheryl Sheps helped type the dissertation and their assistance is gratefully acknowledged.

Finally, the author is indebted to Dr. I.I. Gottesman, who permitted his Harvard Twin Study data to be used in this dissertation, and to Mrs. Ellen B. Kaplan, who provided valuable assistance with the computer aspects of the data analysis.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES

Chapter

I. INTRODUCTION AND BASIC GENETIC MODEL
    1.1 Introduction
    1.2 Notation and Definitions
    1.3 The Seven Mating Types
    1.4 Partitioning the Genetic Component of Variance
    1.5 Heritability
    1.6 Underlying Model for Paired Observations
    1.7 Underlying Assumptions

II. REVIEW OF LITERATURE
    2.1 Heritability Studies
        2.1.1 Early Heritability Measures
        2.1.2 Intraclass Correlation as a Heritability Index
        2.1.3 Testing the Significance of $\sigma_g^2$ by an F test
        2.1.4 Procedures with Identical Twins Reared Apart
        2.1.5 Procedures that Allow for Genotype-Environment Covariance
    2.2 Procedures for Detecting Linkage from Sib Data
        2.2.1 Bernstein's Method and Fisher's U Scores
        2.2.2 Penrose's Sib Pair Method
        2.2.3 Morton's Sequential Test for Linkage
        2.2.4 Other Methods for Detecting Linkage

III. DETECTING AND ESTIMATING $\sigma_g^2$ FROM TWIN DATA
    3.1 Estimation of $\sigma_g^2$ from the Analysis of Variance Tables
        3.1.1 Unweighted Least Squares Estimation
        3.1.2 Weighted Least Squares Estimation
    3.2 Significance Tests
    3.3 Maximum Likelihood Estimation
    3.4 Nonparametric Test Procedures in Twin Studies

IV. ESTIMATION OF $\sigma_g^2$ FROM SIB DATA WHEN THE PROPORTION OF GENES IDENTICAL BY DESCENT IS KNOWN
    4.1 Genetic Variance at a Single Locus
    4.2 The Sib Pair Probability Tables for Two Alleles
    4.3 Genetic Covariance at a Single Locus for Sib Pairs
    4.4 Estimation of Genetic Variance by Regression Analysis
        4.4.1 Assuming no Dominance
        4.4.2 Allowing for Dominance
    4.5 Weighted Least Squares Estimation of $\sigma_g^2$
    4.6 Maximum Likelihood Estimation of $\sigma_g^2$
    4.7 Detecting $\sigma_g^2$ by Nonparametric Test Procedures
        4.7.1 Spearman's Rho
        4.7.2 Kendall's Tau

V. MAXIMUM LIKELIHOOD ESTIMATION OF THE PROPORTION OF GENES IDENTICAL BY DESCENT IN SIB PAIRS
    5.1 Case A: Both Parental Genotypes Known
    5.2 Case B: One or Both Parental Genotypes Unknown
    5.3 Estimation when only the Sib Phenotypes are Known

VI. DERIVATION OF THE CLASSIFICATION TABLES
    6.1 The Sixteen Classification Types
    6.2 The Classification Table when the Genotypes are Known
    6.3 The Classification Tables when Some Genotypes are Unknown

VII. ESTIMATING THE PROPORTION OF GENES IDENTICAL BY DESCENT AT A SINGLE LOCUS IN SIB PAIRS
    7.1 Properties of the Estimator
    7.2 Estimation for the 16 Classification Types

VIII. DETECTING LINKAGE BETWEEN A TRAIT AND MARKER LOCUS
    8.1 Conditional Expectation of the Squared Pair Differences
    8.2 Deriving the Expected Value of the Regression Coefficient
    8.3 Detecting Linkage by Nonparametric Methods

IX. MAXIMUM LIKELIHOOD ESTIMATION OF LINKAGE
    9.1 Deriving the Likelihood Function
    9.2 Obtaining the Maximum Likelihood Estimates

X. ESTIMATING LINKAGE BETWEEN MARKERS WHEN BOTH PARENTAL PHENOTYPES ARE UNKNOWN
    10.1 Derivation of the Likelihood Function
    10.2 Example of the Estimation Procedure

XI. AN EXAMPLE OF THE GENETIC ANALYSIS OF QUANTITATIVE TRAITS USING SIB PAIR DATA
    11.1 Data
    11.2 Results of the Genetic Analysis

XII. SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH
    12.1 Summary
    12.2 Suggestions for Further Research

APPENDIX I
APPENDIX II
BIBLIOGRAPHY

LIST OF TABLES

Table
1.1   The Seven Mating Types
2.1   General ANOVA Table for Paired Data
2.2   Heritability Coefficients for Selected Pairs
2.3   ANOVA Table for n Families of k Sibs Each
4.1   Sib Pair Probabilities for an $A_1A_2 \times A_3A_4$ Mating Conditional on $\pi$
4.2   Probabilities of Sib Pairs for a Two-Allele Locus Conditional on Parental Mating and $\pi$
4.3   Probabilities of Sib Pairs for a Two-Allele Locus Conditional on $\pi$
5.1   Sib Pair Probabilities for an m-Allele Locus Conditional on Parental Mating and $\pi$
5.2   Likelihoods for a Sib Pair that is $A_1A_1$-$A_1A_2$
5.3   Probability of Sib Pairs for an m-Allele Gene Conditional on $\pi$
5.4   Probability of Sib Pairs for an m-Allele Gene Conditional on $\pi$ When One Parental Genotype is Known
6.1   The 16 Classification Types
6.2   Classification Table: Both Parental and Sib Genotypes Known
6.3   Classification Table: Both Parental Phenotypes Unknown
6.4   Classification Table: One Parental Genotype and Both Sib Genotypes Known
6.5   Conditional Probability of Sib Pair Types Given 0, 1 or 2 Genes I.B.D.
7.1   $\pi$ ...
8.1   Conditional Distribution of Y
8.2   Joint Distribution of $\pi_{jm}$ and $\pi_{jt}$
9.1   Values of the Coefficient for the 16 Classification Types
10.1  Estimated Gene Frequencies From the Harvard Twin Study Blood Group Data
10.2  ML Estimates of Linkage Between Blood Groups Using the Harvard Twin Study Data
11.1  Weighted Least Squares Estimates of the Genetic Parameters for the MMPI Variables
11.2  The 23 MMPI Variables with the Best Model Fit
11.3  MMPI Variables with Significant Genetic Variance
11.4  Comparison of ML and Weighted Least Squares Estimation of the Genetic Parameters for Lie and PaS Variables
11.5  Observed Means for Lie and PaS Variables for the ABO Phenotypes
11.6  Observed Means for Lie and PaS Variables for the MNS Phenotypes
11.7  Rank Correlations Between $D_j$ and $\pi_{jm}$
11.8  ML Estimates for PaS and its Linkage to ABO

CHAPTER I - INTRODUCTION AND BASIC GENETIC MODEL

1.1 Introduction

A major area of interest in human genetics is the study of quantitative traits. One important problem in this area is that of determining the degree to which a quantitative trait is genetically determined and the degree to which it is environmentally determined in a specified human population. The variance of a particular quantitative trait is assumed to be composed of a genetic component ($\sigma_g^2$) and an environmental component ($\sigma_e^2$), the relative sizes of these values being a measure of the relative effects of heredity and environment. The problem thus reduces to one of estimating these variance components.
Unfortunately, in many of the methods proposed to date, it is not clearly specified what underlying model is used and what assumptions are required in order to estimate $\sigma_g^2$ and $\sigma_e^2$. In this dissertation the underlying model is stated in detail and various methods of estimation are discussed.

Furthermore, if we are to understand the mechanisms by which quantitative traits are inherited, perhaps the best approach is to attempt to map the major genes responsible for them. So far in human genetics techniques have been devised solely to detect linkage between the loci of these hypothesized major genes and those of marker genes, the genetics of which are known. If linkage is detected, there is evidence that such major genes exist. In this dissertation both the detection and estimation of linkage are considered. Rapid progress is being made in mapping the markers (Renwick, 1969) and thus the approximate location of any gene linked to them will also soon be determinable.

First, a general model for paired observations is presented and a list of the most common simplifying assumptions is given (Chapter I). Then the literature in the area is reviewed and various methods of estimating $\sigma_g^2$ and $\sigma_e^2$ from twin and sib data are discussed (Chapter II). In particular, the biases inherent in these estimation methods are examined. In Chapter III new methods of estimating $\sigma_g^2$ and $\sigma_e^2$ from twin data are given, requiring less stringent assumptions than those required by the methods of analysis discussed in Chapter II. The use of nonparametric procedures in the analysis of twin data, a topic that has received little mention in the literature, is also briefly discussed. In Chapter IV new procedures are derived for estimating $\sigma_g^2$ from sib data, procedures based on $\pi$, the proportion of genes two sibs have identical by descent, which is assumed to be known for each sib pair.
Chapter V describes the maximum likelihood procedure for estimating $\pi$ when its value is unknown.

In Chapters VI-IX new methods are given for estimating a single major trait gene's genetic effect and distance from a marker locus. These methods are based on $\pi_m$, the proportion of genes two sibs have identical by descent at the marker locus. Chapters VI-VII are devoted to the problem of estimating $\pi_m$. In Chapter VIII the estimator of $\pi_m$ and the sib pair differences are used in a regression analysis to detect linkage between trait and marker locus. In Chapter IX a maximum likelihood procedure is described for estimating both the genetic effect of a major trait gene and the linkage between it and a marker locus, using sib data. In Chapter X a new maximum likelihood procedure is given for estimating the linkage between two marker genes from sib pair data only, i.e., when no information is available as to the phenotypes of the parents.

Finally, in Chapter XI, data from Gottesman's (1966) Harvard Twin Study are analyzed. The 63 mental traits under investigation are variables measured by subtest scores of the Minnesota Multiphasic Personality Inventory (MMPI). First, the twin analyses of Chapter III are performed to determine which variables have a strong genetic effect. These variables are then subjected to the sib pair analyses of Chapters VI-IX in an effort to link the variables to the ABO, MNS and Rhesus blood groups.

1.2 Notation and Definitions

In this section we introduce the terminology that will be used later and give some definitions.

Two genes are said to be identical by descent (i.b.d.) if they are derived from the same gene through division and subsequent transmission (Cotterman, 1940; Malecot, 1948). For example, suppose that at a particular locus there are four possible alleles: $A_1$, $A_2$, $A_3$ and $A_4$. If a mating that is $A_1A_2 \times A_3A_4$ for this locus yields two sibs that are $A_1A_3$ and $A_1A_4$, then the sibs have one gene i.b.d. ($A_1$) at this locus. If the sibs are both $A_1A_3$, then they have two genes i.b.d. at this locus.

Two genetically related individuals have a certain proportion of their genes i.b.d. We shall denote by $\pi$ the proportion of genes i.b.d. over the entire genome, and by $\pi_m$ the proportion of genes i.b.d. at a particular locus m, for two individuals. Thus $\pi_m$ must be 0, ½ or 1, regardless of how the individuals are related. Since this dissertation deals primarily with sib pairs, we let $\pi_j$ and $\pi_{jm}$ denote $\pi$ and $\pi_m$ respectively for the jth sib pair.

The term random mating refers to the situation in which every individual in a large population has an equal probability of mating with any individual of the opposite sex in the population. We assume random mating for the loci we shall be working with. As a direct consequence of this assumption (if there is no selection) the random mating population will be in Hardy-Weinberg equilibrium. That is, a population in which the two alleles A and a occur with gene frequencies p and q = 1 - p respectively will consist of three genotypes AA, Aa and aa, and the probabilities of these genotypes are $p^2$, $2pq$ and $q^2$ respectively.

Suppose there are two alleles at each of two loci, i.e., alleles A and a with frequencies $p_1$ and $1-p_1$ at one locus and alleles B and b with frequencies $p_2$ and $1-p_2$ at the other. By linkage equilibrium we shall mean that the proportions of gametes in the population that are AB, Ab, aB and ab are $p_1p_2$, $p_1(1-p_2)$, $(1-p_1)p_2$ and $(1-p_1)(1-p_2)$ respectively.

If an individual is AB/ab, he can produce gametes that are AB, ab, aB or Ab, the relative proportions depending upon how tightly linked the two loci involved are.
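The two equilibrium definitions above can be checked numerically. The following is a minimal sketch: the function names and the example frequencies are illustrative assumptions and do not come from the dissertation.

```python
def hardy_weinberg(p):
    """Genotype probabilities for a two-allele locus, where p is the frequency of A."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


def linkage_equilibrium_gametes(p1, p2):
    """Gamete proportions when alleles at the two loci combine independently.

    p1 is the frequency of A at the first locus, p2 the frequency of B at the second.
    """
    return {
        "AB": p1 * p2,
        "Ab": p1 * (1.0 - p2),
        "aB": (1.0 - p1) * p2,
        "ab": (1.0 - p1) * (1.0 - p2),
    }


# Hypothetical allele frequencies, chosen only for illustration.
genotypes = hardy_weinberg(0.3)
gametes = linkage_equilibrium_gametes(0.3, 0.6)
print(genotypes)
print(gametes)
```

In both cases the proportions sum to one, which is a quick sanity check when the frequencies are estimated from data rather than assumed.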
For an individual that is AB/ab, let c denote the proportion of crossover gametes (Ab or aB) that are formed. Then c, often called the "recombination fraction", is a measure of the distance between the two loci and will be assumed to lie between 0 and ½. It will also be assumed that c is the same for both sexes.

A phenoset is defined to be the set consisting of all genotypes that have a certain phenotype (Cotterman, 1969). For example, if there are two alleles A and a with A dominant to a, then the two genotypes AA and Aa are phenotypically indistinguishable and hence belong to the same phenoset. There are two phenosets in this situation, namely $P_1$: AA, Aa and $P_2$: aa.

A marker gene for human populations is a gene involving a single locus, the genetics of which are known. That is, we can specify the phenotype corresponding to each known genotype. In order to be a useful marker, a gene must also be polymorphic (Ford, 1940), by which we shall mean that the gene frequency of the most common allele cannot be too large, a commonly quoted figure being p = .99. The ABO blood group is an example of a marker gene.

By a trait gene we shall mean an hypothesized gene of unknown location and effect that influences a particular quantitative trait. Much of the present work deals with methods for detecting trait genes and estimating the linkage distances between them and various marker genes.

By the term Classification Table shall be meant a table that gives the conditional probability that $\pi_{jm}$ = 0, ½ or 1 for a particular locus m and sib pair j, given the phenotypes of both sibs and the phenotypes of the parents (if known).

1.3 The Seven Mating Types

We next make clear what we mean by a mating type, a term that has been used in the literature in two different ways. Some authors use this term to refer to genotypically distinct matings. However, we prefer to call these matings and will use the term mating type in a broader sense, as does Kempthorne (1957).

To illustrate the terminology, consider the six matings in the two allele case: AA x AA; aa x aa; AA x Aa; AA x aa; Aa x aa; Aa x Aa. The first two matings are alike in a sense since they both involve identical homozygotes. Thus, these two matings belong to the same mating type, and a mating of two identical homozygotes will be called a Type I mating.

It is very easy to show that for an m-allele autosomal gene ($m \ge 4$) there are exactly seven mating types for diploid organisms. The proof of this is as follows. Each parent must either be a heterozygote ($A_iA_j$) or a homozygote ($A_iA_i$).

(1) Suppose both parents are homozygotes. Then they must have either 2 or 0 (but not 1) alleles alike, i.e., the mating must be
    $A_iA_i \times A_iA_i$ or $A_iA_i \times A_jA_j$.

(2) Suppose one parent is homozygous, the other heterozygous. Then they must have 1 or 0 (but not 2) alleles alike, i.e., the mating must be of the form
    $A_iA_i \times A_iA_j$ or $A_iA_i \times A_jA_k$.

(3) Finally, suppose that both parents are heterozygotes. Then they have either 2, 1 or 0 genes alike, i.e., the mating may be
    $A_iA_j \times A_iA_j$ or $A_iA_j \times A_iA_k$ or $A_iA_j \times A_kA_l$.

Since the above cases exhaust all possibilities we conclude that there are seven mating types. Yasuda (1968) has tentatively given names to these mating types as indicated in Table 1.1; in this table $p_i$, $p_j$, $p_k$ and $p_l$ are the gene frequencies of $A_i$, $A_j$, $A_k$ and $A_l$ respectively.

For haploid organisms there are only two mating types ($A_i \times A_i$ and $A_i \times A_j$). For triploid organisms the number of mating types increases sharply to 22.

The terms sib pair type and sib pair will be used analogously to mating type and mating respectively.

TABLE 1.1
THE SEVEN MATING TYPES

Mating type   Mating                     Name (Yasuda)       Frequency of mating
I             $A_iA_i \times A_iA_i$     Incross             $p_i^4$
II            $A_iA_i \times A_jA_j$     Outcross            $2p_i^2p_j^2$
III           $A_iA_i \times A_iA_j$     Backcross           $4p_i^3p_j$
IV            $A_iA_i \times A_jA_k$     3-Way Outcross      $4p_i^2p_jp_k$
V             $A_iA_j \times A_iA_j$     Intercross          $4p_i^2p_j^2$
VI            $A_iA_j \times A_iA_k$     3-Way Intercross    $8p_i^2p_jp_k$
VII           $A_iA_j \times A_kA_l$     4-Way Intercross    $8p_ip_jp_kp_l$

1.4 Partitioning the Genetic Component of Variance

Consider a quantitative trait that is influenced by both environment and alleles at one or more loci. The genetic component of variance for that trait, $\sigma_g^2$, can be partitioned as indicated below (Li, 1955):

$$\sigma_g^2 = \sigma_a^2 + \sigma_d^2 + \sigma_i^2 \tag{1.1}$$

where:

$\sigma_a^2$ is the sum over all loci of the additive genetic variance for each individual locus. It is usually the major component of genetic variance, and it is often assumed that $\sigma_a^2 = \sigma_g^2$.

$\sigma_d^2$ is the sum of dominance variances for each locus and may be thought of as intra-locus interaction. For example, consider a single locus with two alleles A and a. If the effects are additive, then the heterozygote will have a trait value exactly midway between the two homozygote values. $\sigma_d^2$ is a measure of the amount of departure from this intra-locus additivity.

$\sigma_i^2$ is the variance due to epistasis and can be thought of as inter-locus interaction. It represents the combined effects of variance due to interaction among additive and dominance deviations at two or more loci and can be written (Kempthorne, 1957)

$$\sigma_i^2 = \sigma_{aa}^2 + \sigma_{ad}^2 + \sigma_{dd}^2 + \sigma_{aaa}^2 + \cdots \tag{1.2}$$

For example, $\sigma_{ad}^2$ represents the sum over sets of loci of the variance due to additive x dominance interaction; $\sigma_{aaa}^2$ represents the variance due to additive x additive x additive interaction, etc. For further discussion of epistasis see Cockerham (1954) and Kempthorne (1957).

1.5 Heritability

Although the concept of heritability was used prior to Lush's work (1945), his definitions form the basis for most considerations today. He defines heritability in both a broad and narrow sense (1949). Symbolically,

$$h^2_{(\text{broad})} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2} \tag{1.3}$$

$$h^2_{(\text{narrow})} = \frac{\sigma_a^2}{\sigma_g^2 + \sigma_e^2} \tag{1.4}$$
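As a small worked illustration of the two definitions, the sketch below computes broad and narrow heritability from variance components. It assumes the partition $\sigma_g^2 = \sigma_a^2 + \sigma_d^2 + \sigma_i^2$ of Section 1.4 and a denominator of $\sigma_g^2 + \sigma_e^2$ (no genotype-environment covariance); the function name and the numbers are hypothetical.

```python
def heritability(var_additive, var_dominance, var_epistatic, var_env):
    """Return (broad, narrow) heritability from variance components.

    Assumes total variance = genetic variance + environmental variance,
    i.e. no genotype-environment covariance term.
    """
    var_genetic = var_additive + var_dominance + var_epistatic
    total = var_genetic + var_env
    broad = var_genetic / total    # sigma_g^2 / (sigma_g^2 + sigma_e^2)
    narrow = var_additive / total  # sigma_a^2 / (sigma_g^2 + sigma_e^2)
    return broad, narrow


# Hypothetical components: additive 6, dominance 2, epistatic 0, environmental 8.
broad, narrow = heritability(6.0, 2.0, 0.0, 8.0)
print(broad, narrow)
```

The two values coincide only when the dominance and epistatic components are both zero, which is the situation in which the additive variance equals the whole genetic variance.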
I I I I I I I 1_ 10 Sometimes (1.3) is termed "the degree of genetic determination" and the term "heritability" used only for (1.4) (Falconer, 1960, p. 146). A number of authors have cautioned against making undue inferences 2 in the interpretation of h • Elston and Gottesman (1968) quote Fisher (1951) as remarking that heritability "has both a numerator and a denominator, and its value depends on both elements; whereas, however, the numerator has a simple genetic meaning, and if properly determined should be an accurate estimate of the genetic variance ..• the denominator is the total variance due to errors of measurement, in the strict sense, and, what in the wider sense are also errors of measurement, namely, those due to uncontrolled, but potentially controllable, environmental variation ••• Obvious1y, the information contained in the numerator is largely jettisoned when its actual value is forgotten, and it is only reported as a ratio to this hotch-potch of a denominator." For this and other reasons some authors feel that the primary concern of studies in this area should not be with heritability, but I I I with the genetic component of variance. That is, one should be con2 cerned primarily with techniques for estimating 0 , and testing whether g or not it is significantly different from zero. The present work will deal with both of these problems. 1.6 Underlying Model for Paired Observations Since most of the work in this area has made use of pairs of indi- vidua1s (generally twins or sibs), we will begin by introducing a model that is designed to handle paired observations. this general model will then be studied. Some special cases of I ..I I I I I I I 1_ I I I I I I I. I I 11 Suppose we have data for n pairs of individuals, in particular their observed values for a particular quantitative trait of interest, such as I.Q. Let x lj and x 2j individuals in the jth pair. 
due to three causes: We assume that the observed values are an overall mean, a genetic effect, and an environmental effect. = be the observed values for the two ]J The model may be written + glj + e lj j (1. 5) = 1,2, ... n We assume that the random variables g .. and e .. have means 1J zero and variances 0 2 g and 0 2 e respectively. 1J We make no distributional assumptions other than a particular structure for the means, variances and covariances of the random variables in the model. Since in most cases we would expect the environmental effects of individuals in the same pair to be related, we let = o ee' (1. 6) Later, when considering special cases, we will adopt the notation of Elston and Gottesman (1968), e.g., we will let C ' C nz and CFS MZ denote the environmental covariance for monozygotic twins, dizygotic twins and full sibs respectively. The genetic makeup of individuals in the same pair will certainly be related. Thus we let o gg' (1. 7) I le I I I I I I I I' I I I I I I ..I I 12 2 2 0gg' can generally be expressed as a function of 0a' ad' and 2 ai' the exact expression depending upon how the paired individuals are related. For example, it is well known (Lush, 1949) that under random mating, for the special cases of monozygotic twins, dizygotic twins, sibs and parent-offspring pairs Monozygotic twins: Dizygotic twins (and full sibs): Parent-offspring: a gg' = a gg' = ~O; a gg' = ~O; + f2(0~) (1.8) + ~O~ + fl(O~) (1. 9) (1.10) 2 2 Where fl(Oi) and f (Oi) refer to certain fractions of components of 2 2 0 (see Cockerham, 1954, for details). i The genetic effect for an individual may not be independent of his environmental effect. Thus, initially at least, we let Cov(g .. , e .. ) J.J J.J = a Cov(glj' e 2j ) = Cov(g2j' e lj ) i=1,2 ge (1.11) *ge a (1.12) More will be said about this problem later. Finally, we assume that individuals not in the same pair are independent with respect to genetic and environmental effects, that is, Cov(e .. , e.,.,) = Cov(g .. 
g.,.,) = Cov(gJ..J., eJ..'J.') = 0 J.J J. J J.J, J. J i 1,2 i' 1,2 j r j' (1.13) I ..I I I I I I I --I I I I I I I. I I 13 - We now consider two special cases: monozygotic (identical) twins and dizygotic (fraternal) twins. are genetically identical, glj Since monozygotic twins = g2j and from (1.5)-(1.8), (1.11) and (1. 12) Var(x .. ) 1J 2 ax(HZ) = ag2 + a e2 + 2a ge (1.14) i = 1,2 2 ag + 2a ge + CHZ (1. 15) From (1. 14) and (LIS) it follows that = 2 a MZ (1.16) = Dizygotic twins are genetically the same as full sibs. 2 2 2 Var(x .. ) = ax(DZ) = a + a + 2a g e ge 1J Cov(x lj , x 2j ) axx' (DZ) = ~a 2 + a ~a Hence i = 1,2 2 + f l (a i2) d (1.17) * + 2a ge + CDZ (1.18) From (1.17) and (1. 18) it follows that Var (x lj - x 2 ) = aDZ 2j 2 3 2 aa + ~d + 2a:1 - 2f l 2 2 2a e i + (a ) + 4a ge - 4a*ge - 2C Dz (1. 19) For full sibs (1.17)-(1.19) will hold with C replacing C ' DZ FS We have assumed that a~(MZ) = a;(DZ)' an intuitive assumption that should be true "under almost any circumstances" (Kanpthorne and Osborne, 1961, p. 329). In Chapter III a procedure will be given that provides an approximate test of the validity of this assumption. I ..I I I I I I I --I I I I I I I. I I 14 1.7 Underlying Assumptions Certain of the parameters in the model discussed in the previous section are often assumed to be zero. The assumptions most often made are given below. 2 Assumption I a.1. Assumption II ad 2 0 0 ge a* ge Assumption IV C MZ = Cnz Assumption V C MZ = Assumption III a 0 0 and/or nz = C 0 One may doubt the validity of these assumptions, yet some or all of them are commonly made by researchers, often without even mentioning this fact. In particular, Assumption V is often overlooked. In a number of instances authors have given results ostensibly depending upon Assumption IV, but in actuality depending upon Assumption V. For a further discussion of these assumptions see Price (1950), Harris (1965) and Ostlyngen (1949). I ..I I I I I I I ae I I I I I I I. 
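To see what these assumptions buy, the pair-difference variances of Section 1.6 can be evaluated numerically. The sketch below is a minimal illustration: it builds each variance as twice the individual variance minus twice the within-pair covariance, following (1.14)-(1.19). The function names, argument names and numbers are assumptions made for illustration only.

```python
def var_diff_mz(var_e, c_mz):
    """Variance of MZ twin pair differences, eq. (1.16): 2*sigma_e^2 - 2*C_MZ."""
    return 2.0 * var_e - 2.0 * c_mz


def var_diff_dz(var_a, var_d, var_i, f1_i, var_e, cov_ge, cov_ge_star, c_dz):
    """Variance of DZ twin pair differences, built from (1.17) and (1.18).

    f1_i stands for f_1(sigma_i^2), the epistatic fraction shared by full sibs.
    """
    var_g = var_a + var_d + var_i
    var_x = var_g + var_e + 2.0 * cov_ge                      # eq. (1.17)
    cov_xx = (0.5 * var_a + 0.25 * var_d + f1_i
              + 2.0 * cov_ge_star + c_dz)                     # eq. (1.18)
    return 2.0 * var_x - 2.0 * cov_xx                         # eq. (1.19)


# Hypothetical values satisfying Assumptions I-V: only additive and
# environmental variance are non-zero.
s2_mz = var_diff_mz(var_e=6.0, c_mz=0.0)
s2_dz = var_diff_dz(var_a=4.0, var_d=0.0, var_i=0.0, f1_i=0.0,
                    var_e=6.0, cov_ge=0.0, cov_ge_star=0.0, c_dz=0.0)
print(s2_mz, s2_dz, s2_dz - s2_mz)
```

With every nuisance term set to zero the difference between the two variances equals the additive (here, the whole genetic) variance. Re-running the sketch with a non-zero `c_mz` or `c_dz` shows how quickly that identity breaks when Assumption V fails.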
CHAPTER II - LITERATURE REVIEW

In this chapter literature in two areas of genetic analysis is reviewed. First, published techniques for estimating heritability from twin and sib data are reviewed and discussed in terms of the model introduced in Section 1.6. Then, a survey is made of the literature dealing with the detection of linkage between marker genes and major trait genes from sib data.

2.1 Heritability Studies

2.1.1 Early Heritability Measures. Although geneticists have long been interested in quantitative traits, it was not until Galton's work with twins (1875) that a methodology was developed to deal with the heritability of such traits. The reasoning behind the "twin method" is this: since monozygotic twins are genetically identical and dizygotic twins are genetically the same as full sibs, a trait that is primarily genetic results in twin pair differences that are smaller for monozygotic twins than for dizygotic twins. On the other hand, an environmentally determined trait produces twin pair differences that are approximately the same for both types of twins. The problem is to construct a measure that accurately reflects the relative effects of heredity and environment.

The earliest heritability measures were based on the sample mean deviation ($MD = \sum_j |x_{1j} - x_{2j}| / n$). Let $MD_{(MZ)}$ and $MD_{(DZ)}$ denote the sample mean deviation for monozygotic and dizygotic twins respectively. Then the "difference method" of Lenz and von Verschuer (1928) led to the following statistic as a heritability measure:

$$h^2 = \frac{MD_{(DZ)} - MD_{(MZ)}}{MD_{(DZ)}} \tag{2.1}$$

This formula has a certain intuitive appeal. An $h^2$ of zero indicates that twin pair differences are virtually the same for both types of twins, implying the absence of a genetic effect. An $h^2$ of one implies that all monozygotic twins have the same trait value, thus indicating a strong genetic effect. Intermediate values reflect the relative "strength" of the two effects.
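The difference method of (2.1) is simple enough to sketch directly. The code below is an illustration only: the function names are hypothetical and the twin scores are made-up numbers, not data from any study cited here.

```python
def mean_deviation(pairs):
    """Sample mean deviation: the average absolute within-pair difference."""
    return sum(abs(x1 - x2) for x1, x2 in pairs) / len(pairs)


def difference_method(mz_pairs, dz_pairs):
    """Heritability measure of eq. (2.1): (MD_DZ - MD_MZ) / MD_DZ."""
    md_mz = mean_deviation(mz_pairs)
    md_dz = mean_deviation(dz_pairs)
    return (md_dz - md_mz) / md_dz


# Made-up trait scores for four MZ pairs and four DZ pairs.
mz = [(10, 11), (14, 13), (9, 9), (12, 14)]    # |differences|: 1, 1, 0, 2
dz = [(10, 13), (15, 11), (8, 10), (12, 15)]   # |differences|: 3, 4, 2, 3
h2 = difference_method(mz, dz)
print(mean_deviation(mz), mean_deviation(dz), h2)
```

Here the MZ mean deviation is 1 and the DZ mean deviation is 3, so the measure is 2/3. Nothing in the formula prevents a negative value when MZ pairs happen to differ more than DZ pairs in a given sample.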
The "quotient method" of Gottschaldt (1939) led to the following measure:

$$h^2 = \frac{MD_{(DZ)}}{MD_{(DZ)} + MD_{(MZ)}} \tag{2.2}$$

The formula proposed by Wilde (1941) may be written

$$h^2 = \frac{\sqrt{MD^2_{(DZ)} - MD^2_{(MZ)}}}{\sqrt{MD^2_{(DZ)} - MD^2_{(MZ)}} + MD_{(MZ)}} \tag{2.3}$$

These early heritability measures are little used today. The primary reason for this lack of acceptance is the fact that these measures do not lend themselves easily to statistical treatment, since they are based on the mean deviation rather than the standard deviation.

2.1.2 Intraclass Correlation as a Heritability Index. A number of recently proposed heritability coefficients have been based upon the intraclass correlation $\rho$ and its sample estimate r. We begin by defining $\rho$ using Table 2.1, the general ANOVA table for paired data derived by Kempthorne and Osborne (1961). In this table $\sigma_x^2 = \mathrm{Var}(x_{1j}) = \mathrm{Var}(x_{2j})$ and $\sigma_{xx'} = \mathrm{Cov}(x_{1j}, x_{2j})$, where $x_{1j}$ and $x_{2j}$ are defined by (1.5). $\sigma_W^2$ and $\sigma_A^2$ are the usual within and among group components of variance (Graybill, 1961).

TABLE 2.1
GENERAL ANOVA TABLE FOR PAIRED DATA

Source          df     MS      EMS
Among pairs     n-1    $M_A$   $\sigma_x^2 + \sigma_{xx'} = \sigma_W^2 + 2\sigma_A^2$
Within pairs    n      $M_W$   $\sigma_x^2 - \sigma_{xx'} = \sigma_W^2$

The intraclass correlation is defined by

$$\rho = \frac{\sigma_{xx'}}{\sigma_x^2} = \frac{\sigma_A^2}{\sigma_A^2 + \sigma_W^2} \tag{2.4}$$

Note that since $\mathrm{Var}(x_{1j}) = \mathrm{Var}(x_{2j}) = \sigma_x^2$, $\rho$ can be thought of as the (population) correlation between $x_{1j}$ and $x_{2j}$. Kempthorne and Osborne (1961) remark that $\rho$, estimated by

$$r = \frac{M_A - M_W}{M_A + M_W} \tag{2.5}$$

"has been called heritability" by some authors (Hancock, 1952; Stormont, 1954), and "in some cases there are good reasons for using this word which is likely to imply 'the degree' to which a trait is inherited" (Kempthorne and Osborne, 1961, p. 324).
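The paired-data ANOVA of Table 2.1 and the resulting sample intraclass correlation can be sketched as follows. The estimator used is the one implied by the expected mean squares in that table, $r = (M_A - M_W)/(M_A + M_W)$; the function name and the example data are hypothetical.

```python
def intraclass_correlation(pairs):
    """Return (M_A, M_W, r) for paired data, following Table 2.1.

    M_A: among-pairs mean square on n-1 degrees of freedom.
    M_W: within-pairs mean square on n degrees of freedom.
    r:   (M_A - M_W) / (M_A + M_W).
    """
    n = len(pairs)
    grand_mean = sum(x1 + x2 for x1, x2 in pairs) / (2 * n)
    # Each pair mean is based on two observations, hence the factor 2.
    ss_among = sum(2 * ((x1 + x2) / 2 - grand_mean) ** 2 for x1, x2 in pairs)
    # The within-pair sum of squares for two observations is (x1 - x2)^2 / 2.
    ss_within = sum((x1 - x2) ** 2 / 2 for x1, x2 in pairs)
    ms_among = ss_among / (n - 1)
    ms_within = ss_within / n
    r = (ms_among - ms_within) / (ms_among + ms_within)
    return ms_among, ms_within, r


# Made-up data: three pairs whose members differ by 2 within each pair.
ms_a, ms_w, r = intraclass_correlation([(10, 12), (14, 16), (18, 20)])
print(ms_a, ms_w, r)
```

For these pairs the among-pairs mean square is 32 and the within-pairs mean square is 2, giving r = 30/34. Unlike the ordinary product-moment correlation, this estimate does not depend on which member of a pair is labelled first.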
From (1.14) and (1.15) the intraclass correlation for monozygotic twins may be written as

$$\rho_{MZ} = \frac{\sigma_g^2 + 2\sigma_{ge} + C_{MZ}}{\sigma_g^2 + 2\sigma_{ge} + \sigma_e^2} \qquad (2.6)$$

which reduces to Lush's $h^2_{(broad)}$ if there is no genotype-environment covariance, and if $C_{MZ}$, the environmental covariance for monozygotic twins, is zero. If in addition Assumptions I and II of Section 1.7 hold, then $\rho_{MZ} = h^2_{(broad)} = h^2_{(narrow)}$. Similarly, it can be shown that if Assumptions I-III are valid, and if $C_{DZ} = 0$, then $2\rho_{DZ} = h^2_{(broad)} = h^2_{(narrow)}$.

There is no reason why the paired observations need be twins, or even sibs, as long as it is clearly understood what assumptions must be valid in order for the resulting heritability coefficient to be Lush's $h^2$. For example, Kempthorne and Tandon (1953) estimate heritability from parent-offspring pairs. Table 2.2 gives the heritability coefficient for those pairs of individuals most often used. In this table $h^2 = h^2_{(broad)} = h^2_{(narrow)}$ except where noted.

TABLE 2.2
HERITABILITY COEFFICIENTS FOR SELECTED PAIRS

Type of pair          Heritability coefficient         Assumptions required
Monozygotic Twins     $\rho_{MZ} = h^2_{(broad)}$      III, $C_{MZ} = 0$
Monozygotic Twins     $\rho_{MZ} = h^2$                I, II, III, $C_{MZ} = 0$
Dizygotic Twins       $2\rho_{DZ} = h^2$               I, II, III, $C_{DZ} = 0$
Full Sibs             $2\rho_{FS} = h^2$               I, II, III, $C_{FS} = 0$
Parent-Offspring      $2\rho_{PO} = h^2$               I, III, $C_{PO} = 0$
Half-Sibs             $4\rho_{HS} = h^2$               I, III, $C_{HS} = 0$
Uncle-Nephew          $4\rho_{UN} = h^2$               I, III, $C_{UN} = 0$

From Table 2.2 we see that in all cases the environmental covariance between pair members must be zero in order for the heritability coefficient to reduce to Lush's $h^2$. This will seldom be the case, however, and few studies to date have successfully handled this problem of correlated environmental effects. Elston and Gottesman (1968) overcome this difficulty to a certain extent by incorporating data on parents and non-twin sibs into the analysis.
Using an Analysis of Variance approach, the authors derive unbiased estimators of $\sigma_g^2$ and $\sigma_e^2$ under Assumptions I-IV, or under a set of alternative assumptions (I, III, IV and $C_{PO} = C_{FS}$). In the next chapter some alternative procedures that also attempt to handle this problem will be discussed.

The ANOVA approach of Kempthorne and Osborne can be extended to the case of k sibs per family (e.g., Fuller and Thompson, 1960). Table 2.3 gives the ANOVA table for this more general situation. For the special case of k=2, Table 2.3 reduces to Table 2.1.

TABLE 2.3
ANOVA TABLE FOR n FAMILIES OF k SIBS EACH

Source             df        MS       EMS
Among families     n-1       $M_A$    $\sigma_x^2 + (k-1)\sigma_{xx'} = \sigma_W^2 + k\sigma_A^2$
Within families    (k-1)n    $M_W$    $\sigma_x^2 - \sigma_{xx'} = \sigma_W^2$

The statistic corresponding to (2.5) may be written

$$r = \frac{M_A - M_W}{M_A + (k-1)M_W} \qquad (2.7)$$

and the appropriate heritability coefficient can be obtained from Table 2.2. However, the environmental correlation between family members still must be zero in order for the heritability coefficient to reduce to Lush's $h^2$.

Another method that has been suggested is to construct a heritability measure based on the intraclass correlations for both monozygotic and dizygotic twins. The best known of these measures is due to Holzinger (1929) and may be written

$$H = \frac{r_{MZ} - r_{DZ}}{1 - r_{DZ}} \qquad (2.8)$$

where $r_{MZ}$ and $r_{DZ}$ are sample correlations between $x_{1j}$ and $x_{2j}$ for monozygotic and dizygotic twins respectively. Although Holzinger's measure involves only sample values, "Holzinger's Formula" has been referred to by many authors (e.g., Kempthorne and Osborne, 1961; Nichols, 1965; Harris, 1965) as involving population values and then written as

$$H = \frac{\rho_{MZ} - \rho_{DZ}}{1 - \rho_{DZ}} = \frac{\sigma_{DZ}^2 - \sigma_{MZ}^2}{\sigma_{DZ}^2} \qquad (2.9)$$

where $\sigma_{MZ}^2$ and $\sigma_{DZ}^2$ are the variances of twin pair differences as defined by (1.16) and (1.19). Holzinger did not intend for his statistic to measure heritability as defined by Lush, but it has often been used for this purpose.
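The two ways of writing Holzinger's coefficient can be checked numerically. The sketch below assumes that both twin types share the same phenotypic variance, so that the variance of a pair difference is 2·σ_x²·(1 - ρ); the function names and the numerical values are hypothetical.

```python
def holzinger_from_correlations(rho_mz, rho_dz):
    """Correlation form of Holzinger's coefficient: (rho_MZ - rho_DZ) / (1 - rho_DZ)."""
    return (rho_mz - rho_dz) / (1.0 - rho_dz)


def holzinger_from_difference_variances(var_diff_mz, var_diff_dz):
    """Difference-variance form: (sigma^2_DZ - sigma^2_MZ) / sigma^2_DZ."""
    return (var_diff_dz - var_diff_mz) / var_diff_dz


# Hypothetical values: common phenotypic variance 1.0, rho_MZ = 0.8, rho_DZ = 0.5.
sigma_x2 = 1.0
rho_mz, rho_dz = 0.8, 0.5
var_diff_mz = 2.0 * sigma_x2 * (1.0 - rho_mz)
var_diff_dz = 2.0 * sigma_x2 * (1.0 - rho_dz)

h_corr = holzinger_from_correlations(rho_mz, rho_dz)
h_var = holzinger_from_difference_variances(var_diff_mz, var_diff_dz)
```

Both forms give 0.6 for these inputs (up to rounding), which is the agreement expected when the phenotypic variances are equal.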
From (1.16) and (1.19) we see that if Assumptions I-V are valid, then (2.9) may be written

$$\frac{\sigma_g^2}{\sigma_g^2 + 2\sigma_e^2} \qquad (2.10)$$

which is not Lush's $h^2$. Assumption V is necessary even to obtain (2.10), a point apparently overlooked by some authors (Harris, 1965; Elston and Gottesman, 1968), who felt that Assumptions I-IV were sufficient for this result.

Another heritability index based on $\rho_{MZ}$ and $\rho_{DZ}$ has been proposed by Nichols (1965) and can be written

$$h^2 = \frac{2(\rho_{MZ} - \rho_{DZ})}{\rho_{MZ}} \qquad (2.11)$$

which under Assumptions I-IV reduces to

$$h^2 = \frac{\sigma_g^2}{\sigma_g^2 + C} \qquad (2.12)$$

where C is the environmental covariance. If Assumption V is also made, then C=0, and (2.12) reduces to $h^2 = 1$.

2.1.3 Testing the Significance of $\sigma_g^2$ by an F Test. A number of studies (e.g., Clark, 1956; Vandenberg, 1962; Vandenberg et al., 1968; Block, 1968) use the F test proposed by Dahlberg (1926) to test the significance of $\sigma_g^2$. It is assumed that the pair differences for monozygotic and dizygotic twins are normally and independently distributed, with means 0 and variances $\sigma_{MZ}^2$ and $\sigma_{DZ}^2$, which are given by (1.16) and (1.19). If Assumptions I-IV are valid, then for monozygotic and dizygotic twins respectively, $x_{1j} - x_{2j}$ is distributed as follows:

$$D_j(MZ) = x_{1j} - x_{2j} \sim N(0,\ 2(\sigma_e^2 - C))$$
$$D_j(DZ) = x_{1j} - x_{2j} \sim N(0,\ 2(\sigma_e^2 - C) + \sigma_g^2) \qquad (2.13)$$

Suppose there are data for $N_M$ pairs of monozygotic twins and $N_D$ pairs of dizygotic twins. From (2.13) we see that if $\sigma_g^2 = 0$, then the statistic

$$F = \frac{M_W(DZ)}{M_W(MZ)} \qquad (2.14)$$

has a central F distribution with $N_D$ and $N_M$ degrees of freedom, where $M_W(DZ)$ and $M_W(MZ)$ are the within pair mean squares of Table 2.1. Thus, a significantly large F indicates that $\sigma_g^2 > 0$, when the conditions given above are satisfied.

Although it seems not to have been noted in the literature, it is possible to construct an F test in the more general situation in which the twins can be classified in some sense, such as birth order.
If there is an order effect when such a classification is made, the means of the pair differences will not be zero and may not even be the same for the two groups. However, if in this more general case Assumptions I-IV are valid and the differences are normally distributed, i.e.,

$$D_j(MZ) \sim N(\mu_{MZ},\ 2(\sigma_e^2 - C))$$
$$D_j(DZ) \sim N(\mu_{DZ},\ 2(\sigma_e^2 - C) + \sigma_g^2) \qquad (2.15)$$

then it can be shown that when $\sigma_g^2 = 0$, the statistic

$$F^* = \frac{S_{DZ}^2}{S_{MZ}^2} \qquad (2.16)$$

where $S_{DZ}^2$ and $S_{MZ}^2$ are the sample variances of twin pair differences, has a central F distribution with $(N_D - 1)$ and $(N_M - 1)$ degrees of freedom. Thus, when these more general conditions are satisfied, a significantly large F* indicates the presence of a genetic effect.

2.1.4 Procedures with Identical Twins Reared Apart. A method often used in an effort to overcome the problem of correlated environmental effects is a procedure based on monozygotic twins reared apart (e.g., Newman et al., 1937; Burks, 1942; Burt, 1966). It is hoped in such studies that the environments of the separated twins are not related, and hence the assumption of $C_{MZ} = 0$ is not as unreasonable as would be the case if the twins were reared together.

One objection to this method is that since separated twins must have shared the same pre-natal environment, and often also for a short while the same post-natal environment, one would not expect the environmental effects to be independent no matter how early in life the twins were separated. A second objection, although a practical rather than a theoretical one, is that identical twins reared apart are difficult to obtain.

Even those studies that do use identical twins reared apart do not use the heritability estimate suggested by Table 2.2. The conventional approach is to examine concurrently monozygotic and dizygotic twins reared together, and to use Holzinger's statistic (2.8) based on these twins to estimate heritability (Newman et al., 1937).
Then, as "a logical extension of Holzinger's Formula," the following coefficient based on both sets of monozygotic twins is used as an index of "the percentage of phenotypic variation ascribable to environment" (Neel and Schull, 1954, p. 276):

$$E = \frac{\rho_{MZT} - \rho_{MZA}}{1 - \rho_{MZA}} = \frac{\sigma_{MZA}^2 - \sigma_{MZT}^2}{\sigma_{MZA}^2} \qquad (2.17)$$

where $\rho_{MZT}$ and $\rho_{MZA}$ are the intraclass correlations for monozygotic twins reared together and apart respectively; $\sigma_{MZT}^2$ and $\sigma_{MZA}^2$ are the corresponding variances of twin pair differences as defined by (1.16). E may be written as

$$E = \frac{C_{MZT} - C_{MZA}}{\sigma_e^2 - C_{MZA}} \qquad (2.18)$$

where $C_{MZT}$ and $C_{MZA}$ are the environmental covariances for monozygotic twins reared together and apart respectively. Note that this is not the environmental proportion of phenotypic variation as is claimed. For other criticisms of the methodology and statistical techniques used in analyzing data from monozygotic twins reared apart, see Burks (1938) and McNemar (1938).

2.1.5 Procedures that Allow for Genotype-Environment Covariance. One difficulty of all methods mentioned above is the necessity of Assumption III, the independence of genotypic and environmental effects. There are some cases in human genetics in which this assumption seems unwarranted, and it would be desirable to have a design and analysis that permits the estimation of this covariance. One such design for animal genetics has been proposed by Le Roy (1960), but in human genetics, where one cannot "design" an experiment or control environmental effects, this design is of little value.

One way of avoiding the problem altogether is to treat it as one of semantics and define the environmental effect to be that effect "which affects the phenotype independently of genotype" (Roberts, 1967, p. 218). However, one should question the interpretability of an effect defined in this manner.
One method that tries to allow for genotype-environment covariance is the Multiple Abstract Variance Analysis (MAVA) of Cattell (1960). Unfortunately, this method has problems, both practical and theoretical. Practically, the design requires data for such difficult to obtain individuals as monozygotic twins reared apart, half-sibs reared together (and apart), and half-sibs reared together by one true parent. A study necessitating data from such individuals would be a mammoth undertaking. Theoretically, there is a problem in interpreting the "abstract variances" in terms of more familiar genetic parameters. A more serious problem, however, is that Loehlin (1965) has pointed out several serious errors in the MAVA equations that invalidate much of Cattell's results. An attempt was made to correct these mistakes and reanalyze Cattell's published data, but "a number of the corrected variances were negative, and the effort was abandoned" (Loehlin, 1965, p. 161).

Thus, the problem of allowing for genotype-environment covariance remains essentially unsolved. For further discussion of this problem see Falconer (1960), Cattell (1963) and Parsons (1967).

2.2 Procedures for Detecting Linkage from Sib Data

2.2.1 Bernstein's Method and Fisher's U Scores. Bernstein (1931) was the first to point out that linkage can be detected and estimated from data involving information from only two generations. His method assigned each family a score, whose sum, expected value and variance provide a linkage test in any body of data that is sufficiently large for the distribution of the total score to be nearly normal. Bernstein's approach was further developed by Hogben (1934) and Haldane (1934).

Fisher (1935), following the same general procedure adopted by Bernstein, devised a maximum likelihood scoring procedure that made earlier methods obsolete.
Fisher's "U Scores" were found to be more efficient than Bernstein's scores for all linkage intensities, and also permitted easier combination of information from different sized families. Although Fisher's U Score method is still recommended by some authors (Bailey, 1961), these early methods have generally been replaced by the test procedures discussed in the following sections.

2.2.2 Penrose's Sib Pair Method. Penrose (1935) was the first to propose a method for detecting linkage that uses only sib pair data. His 1935 paper dealt with the special case of detecting linkage when each locus involves only two alleles, one dominant and the other recessive. Penrose (1938) later extended the sib pair method to the case of "graded human characters." This essentially involved relaxing the dominance assumption of the earlier paper and assuming that the trait value of the heterozygote was midway between that of the two homozygotes. The sib pair method was later made even more general (Penrose, 1950; 1953) to allow for multiple alleles.

With minor modifications the sib pair method has been used by a number of authors in linkage studies (Kloepfer, 1946; Howells and Slowey, 1956; Lowry and Shultz, 1959). It has the advantages of arithmetic simplicity and of serving as a linkage test when the parental genotypes are unknown, even when the genetic mechanisms of both traits are unknown. On the other hand, the method often requires a large number of pairs in order to achieve significant results, and in certain situations (Finney, VI, 1942) it was found to extract only a small fraction of the information that could be obtained by Fisher's U Scores.

2.2.3 Morton's Sequential Test for Linkage. Morton (1955) derived a sequential probability ratio test for linkage when both parental genotypes are known and there are only two alleles at each locus.
The test procedure is based on "lod scores," which for a particular family are defined as

$$Z = \log_{10}\left[\frac{P(F \mid c, c')}{P(F \mid \tfrac{1}{2}, \tfrac{1}{2})}\right]$$

where $P(F \mid c, c')$ denotes the probability of occurrence of a family F when the recombination fraction is c in females and c' in males. In a later paper Morton (1956) used lod scores to obtain likelihood ratio tests of homogeneity and maximum likelihood estimation of linkage. Later the method was extended to multiple allele test loci (Steinberg and Morton, 1956) and multiple alleles at both loci (Morton, 1957).

Morton's sequential test procedure has been found to be superior to Fisher's U Scores and Penrose's sib pair method in a number of situations (Morton, 1955). It has the advantage of allowing for both detection and estimation of linkage. On the other hand, it requires knowledge of parental genotype and is cumbersome numerically, requiring the calculation of lod scores. Maynard-Smith et al. (1961) give tables of lod scores for the simpler mating types.

2.2.4 Other Methods for Detecting Linkage. A number of other techniques have been used to detect linkage. Haldane and Smith (1947) devised a probability ratio test that avoided some of the assumptions required by Fisher's U Scores. However, the method is conservative and a proposed modification (Smith, 1953) is less efficient (Morton, 1955).

Brues (1950), using a test statistic based on the square root of the average squared metric trait differences between sib pairs, detected linkage between body build and freckling. However, little use has been made of this method in recent studies.

Recently Thoday (1967) suggested a new approach that may prove useful.
Assuming that the metric trait has only two codominant alleles in the population in equal frequency and assuming complete linkage between marker and metric trait gene and attainment of linkage equilibrium, Thoday's model leads to a higher variance within and a lower variance among families for sibs heterozygous for the marker trait than for sibs homozygous for the same. Thoday's method has been used to isolate major trait genes in Drosophila and mice and seems well suited for adaptation to human genetics. However, the method as originally presented is quite restrictive, and as yet no general treatment has been given.

CHAPTER III - DETECTING AND ESTIMATING $\sigma_g^2$ FROM TWIN DATA

In the previous chapter it was found that many heritability measures implicitly require the environmental covariances to be zero. Some procedures that avoid this difficulty, all based on twin data, are now described.

3.1 Estimation of $\sigma_g^2$ from the Analysis of Variance Tables

In this section some procedures are derived for estimating $\sigma_g^2$ using the Analysis of Variance Table 2.1. The estimation procedure is based on four mean squares: the within and among mean squares for monozygotic and dizygotic twins.

3.1.1 Unweighted least squares estimation. Suppose we have data for $N_M$ pairs of monozygous twins and $N_D$ pairs of dizygous twins and perform two separate Analyses of Variance as indicated by Table 2.1. We make Assumptions I-IV and wish to estimate $\sigma_g^2$, $\sigma_e^2$ and $C = C_{MZ} = C_{DZ}$ from the four independent mean squares $M_A(MZ)$, $M_W(MZ)$, $M_A(DZ)$ and $M_W(DZ)$. From (1.14)-(1.18) the expected mean squares are given by

$$E_{AM} = E(M_A(MZ)) = 2\sigma_g^2 + \sigma_e^2 + C$$
$$E_{WM} = E(M_W(MZ)) = \sigma_e^2 - C$$
$$E_{AD} = E(M_A(DZ)) = \tfrac{3}{2}\sigma_g^2 + \sigma_e^2 + C$$
$$E_{WD} = E(M_W(DZ)) = \tfrac{1}{2}\sigma_g^2 + \sigma_e^2 - C \qquad (3.1)$$

which in matrix notation may be written

$$E(\mathbf{Y}) = X\boldsymbol{\beta} \qquad (3.2)$$

where

$$\mathbf{Y} = \begin{pmatrix} M_A(MZ) \\ M_W(MZ) \\ M_A(DZ) \\ M_W(DZ) \end{pmatrix}, \qquad X = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & -1 \\ 1.5 & 1 & 1 \\ 0.5 & 1 & -1 \end{pmatrix}, \qquad \boldsymbol{\beta} = \begin{pmatrix} \sigma_g^2 \\ \sigma_e^2 \\ C \end{pmatrix} \qquad (3.3)$$

An intuitive procedure is to select estimators that give the best least squares fit to the four mean squares. The unweighted least squares estimators are in general (Graybill, 1961)

$$\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{Y} \qquad (3.4)$$

and in this special case may be written

$$\hat\sigma_g^2 = M_A(MZ) - M_W(MZ) - M_A(DZ) + M_W(DZ)$$
$$\hat\sigma_e^2 = \tfrac{1}{4}\left(-3M_A(MZ) + 5M_W(MZ) + 5M_A(DZ) - 3M_W(DZ)\right)$$
$$\hat{C} = \tfrac{1}{2}\left(-M_A(MZ) + M_W(MZ) + 2M_A(DZ) - 2M_W(DZ)\right) \qquad (3.5)$$

The estimators of (3.5) will be unbiased if Assumptions I-IV are valid. If these assumptions are not valid, however, (3.5) will probably tend to overestimate $\sigma_g^2$ and underestimate $\sigma_e^2$ and C. For example, from (1.14)-(1.18), (3.5) and Table 2.1

$$E(\hat\sigma_g^2) = 2\left(\sigma_g^2 + 2\sigma_{ge} + C_{MZ} - \tfrac{1}{2}\sigma_a^2 - \tfrac{1}{4}\sigma_d^2 - f_1(\sigma_i^2) - 2\sigma_{ge}^* - C_{DZ}\right)$$
$$= \sigma_g^2 + \tfrac{1}{2}\sigma_d^2 + \left(\sigma_i^2 - 2f_1(\sigma_i^2)\right) + 4(\sigma_{ge} - \sigma_{ge}^*) + 2(C_{MZ} - C_{DZ}) \qquad (3.6)$$

$\sigma_d^2$ and $\sigma_i^2 - 2f_1(\sigma_i^2)$ must be nonnegative. Moreover, we would expect $|C_{MZ}| > |C_{DZ}|$ and $|\sigma_{ge}| > |\sigma_{ge}^*|$. Hence, if these covariances are positive, as one would often expect, $\hat\sigma_g^2$ may overestimate $\sigma_g^2$. Note, however, that if the quantitative trait is one in which the environmental covariances or genotype-environment covariances are negative, $\hat\sigma_g^2$ may actually underestimate $\sigma_g^2$. It can also be shown that

$$E(\hat\sigma_e^2) = \sigma_e^2 - \tfrac{1}{2}\sigma_d^2 - \left(\sigma_i^2 - 2f_1(\sigma_i^2)\right) - 2(\sigma_{ge} - 2\sigma_{ge}^*) - 2(C_{MZ} - C_{DZ}) \qquad (3.7)$$

$$E(\hat{C}) = (2C_{DZ} - C_{MZ}) - \tfrac{1}{2}\sigma_d^2 - \left(\sigma_i^2 - 2f_1(\sigma_i^2)\right) - 2(\sigma_{ge} - 2\sigma_{ge}^*) \qquad (3.8)$$

resulting in probable underestimates of $\sigma_e^2$ and C if the covariances mentioned above are positive.

Bock and Vandenberg (1968) also use the four mean squares of (3.1) to estimate the genetic and environmental components of variance, but their model appears to be in error. Their expected mean squares may be written in the form

$$E_{AM} = \sigma_1^2 + \sigma_2^2$$
$$E_{WM} = \sigma_1^2$$
$$E_{AD} = \sigma_1^2 + \sigma_2^2 + \sigma_3^2$$
$$E_{WD} = \sigma_1^2 + \sigma_3^2$$

Note that $E_{AM} + E_{WM} \neq E_{AD} + E_{WD}$ and $E_{AM} - E_{WM} = E_{AD} - E_{WD}$. From Table 2.1 we see that this implies that $\sigma_{x(MZ)}^2 \neq \sigma_{x(DZ)}^2$ and that the phenotypic covariance $\sigma_{xx'}$ is the same for both types of twins.
The authors do not justify these assumptions, and the estimators they obtain are of questionable value.

3.1.2 Weighted least squares estimation of $\sigma_g^2$. If the quantitative trait under investigation is normally distributed, then $\sigma_g^2$, $\sigma_e^2$ and C can be estimated by a weighted least squares procedure. Consider the general model given by (3.2) and suppose that $Var(\mathbf{Y}) = V$. Then the weighted least squares estimator of $\boldsymbol{\beta}$ may be written (see e.g., Kendall and Stuart, 1967)

$$\hat{\boldsymbol{\beta}} = (X'V^{-1}X)^{-1}X'V^{-1}\mathbf{Y} \qquad (3.9)$$

For the special case in which X, $\mathbf{Y}$ and $\boldsymbol{\beta}$ are given by (3.3), if we make the additional assumption that $x_{1j}$ and $x_{2j}$ are normally distributed, the four mean squares that are elements of $\mathbf{Y}$ are independent and each distributed proportionally to a chi square. That is, for $Y_i$, the i-th element of $\mathbf{Y}$,

$$\frac{N_i Y_i}{E(Y_i)} \sim \chi^2_{N_i} \qquad (3.10)$$

where $E(Y_i)$ is given by (3.1) and $N_i$ is the corresponding number of degrees of freedom from Table 2.1. It is well known that if a random variable z has a chi square distribution with N degrees of freedom, then Var(z) = 2N. Hence, from (3.1) and (3.10)

$$Var(Y_i) = \frac{2[E(Y_i)]^2}{N_i}, \qquad i = 1, 2, 3, 4 \qquad (3.11)$$

and the variance covariance matrix V is

$$V = \begin{pmatrix} \dfrac{2E_{AM}^2}{N_M - 1} & 0 & 0 & 0 \\ 0 & \dfrac{2E_{WM}^2}{N_M} & 0 & 0 \\ 0 & 0 & \dfrac{2E_{AD}^2}{N_D - 1} & 0 \\ 0 & 0 & 0 & \dfrac{2E_{WD}^2}{N_D} \end{pmatrix} \qquad (3.12)$$

Since V involves the unknown parameters, an iterative procedure must be used in order to find the weighted least squares estimates given by (3.9). This procedure, which can easily be adapted for computer use, is as follows: choose an initial set of values for the three parameters (e.g., the unweighted least squares estimates). Use these values as the true values in V and calculate $\hat{\boldsymbol{\beta}}$ by (3.9). Substitute these new values back into V and calculate another new set of estimates. Continue this procedure until convergence is achieved, the final $\hat{\boldsymbol{\beta}}$ being the weighted least squares solution.
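The iterative procedure just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the program used in this study: the design matrix is the one implied by the expected mean squares (3.1), the weights are the reciprocals of the variances (3.11), and the function names, tolerance and iteration cap are hypothetical choices.

```python
# Rows: M_A(MZ), M_W(MZ), M_A(DZ), M_W(DZ); columns: sigma_g^2, sigma_e^2, C.
X = [
    [2.0, 1.0, 1.0],
    [0.0, 1.0, -1.0],
    [1.5, 1.0, 1.0],
    [0.5, 1.0, -1.0],
]


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_ls(y, weights):
    """Solve (X'WX) beta = X'Wy for diagonal W; unit weights give unweighted LS."""
    xtwx = [[sum(w * X[i][r] * X[i][c] for i, w in enumerate(weights))
             for c in range(3)] for r in range(3)]
    xtwy = [sum(w * X[i][r] * y[i] for i, w in enumerate(weights)) for r in range(3)]
    return solve(xtwx, xtwy)


def iterative_wls(y, n_mz, n_dz, tol=1e-10, max_iter=100):
    """Re-weight by 1 / Var(Y_i) = N_i / (2 E(Y_i)^2) until the estimates settle."""
    dfs = [n_mz - 1, n_mz, n_dz - 1, n_dz]
    beta = weighted_ls(y, [1.0] * 4)  # start from the unweighted estimates
    for _ in range(max_iter):
        fitted = [sum(x * b for x, b in zip(row, beta)) for row in X]
        # Assumes no fitted mean square is exactly zero.
        weights = [df / (2.0 * e * e) for df, e in zip(dfs, fitted)]
        new_beta = weighted_ls(y, weights)
        if max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

For mean squares that agree exactly with (3.1), for instance `[8.0, 2.0, 7.0, 3.0]`, both the unweighted and the weighted fits recover sigma_g^2 = 2, sigma_e^2 = 3 and C = 1. With real data the four mean squares will not be mutually consistent, and the two fits will generally differ.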
The variance-covariance matrix of $\hat{\boldsymbol{\beta}}$ may be written

$$Var(\hat{\boldsymbol{\beta}}) = (X'V^{-1}X)^{-1} \qquad (3.13)$$

which can be estimated by using the final weighted least squares estimates in the calculation of V. The ratio of each parameter estimate to its estimated standard error can be used as an approximate test of the hypothesis that the parameter in question is zero.

Finally, note that if Assumptions I-IV are valid, then

$$\tilde\sigma_g^2 = 2\left[M_W(DZ) - M_W(MZ)\right] \qquad (3.14)$$

is also an unbiased estimator of $\sigma_g^2$ and should not differ greatly from the least squares estimate. It will be shown in Section 3.3 that under certain conditions $\tilde\sigma_g^2$ as defined by (3.14) is the maximum likelihood estimate of $\sigma_g^2$.

3.2 Significance Tests

Before testing for the significance of $\sigma_g^2$, and indeed even before estimating $\sigma_g^2$, a test should be made of the equality of the phenotypic variances for monozygotic and dizygotic twins; a comparison of (1.14) and (1.17) shows that this has been implicitly assumed. An appropriate test statistic is seen from Table 2.1 to be

$$F^{**} = \frac{M_W(DZ) + M_A(DZ)}{M_W(MZ) + M_A(MZ)} = \frac{\hat\sigma_{x(DZ)}^2}{\hat\sigma_{x(MZ)}^2} \qquad (3.15)$$

This statistic follows an approximate F-distribution if the observations are normally distributed. The degrees of freedom for (3.15) are calculated by Satterthwaite's (1946) formula; for a linear function of mean squares $\sum_i a_i MS_i$, where $MS_i$ is the i-th mean square with $f_i$ degrees of freedom, the appropriate degrees of freedom is

$$f = \frac{\left(\sum_i a_i MS_i\right)^2}{\sum_i \left(a_i^2 MS_i^2 / f_i\right)} \qquad (3.16)$$

If the data are normally distributed, a significantly large or small value of (3.15) indicates that an assumption of the model is violated.

Provided the phenotypic variances can be considered equal, and provided Assumptions III and IV hold, (2.14) can be used to test the hypothesis that $\sigma_g^2 = 0$ against the alternative that $\sigma_g^2 > 0$.
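The preliminary check of equal phenotypic variances can be sketched as below. The sketch takes the statistic to be the ratio of the sums of the within and among mean squares for the two twin types, with Satterthwaite's approximation supplying the degrees of freedom for numerator and denominator. The function names and the numbers in the usage note are hypothetical.

```python
def satterthwaite_df(coeffs, mean_squares, dfs):
    """Approximate d.f. of sum(a_i * MS_i): (sum a_i MS_i)^2 / sum(a_i^2 MS_i^2 / f_i)."""
    numerator = sum(a * ms for a, ms in zip(coeffs, mean_squares)) ** 2
    denominator = sum((a * ms) ** 2 / f for a, ms, f in zip(coeffs, mean_squares, dfs))
    return numerator / denominator


def phenotypic_variance_ratio(ma_mz, mw_mz, ma_dz, mw_dz, n_mz, n_dz):
    """Return the variance ratio with its approximate numerator and denominator d.f."""
    f_ratio = (mw_dz + ma_dz) / (mw_mz + ma_mz)
    # Within-pair mean squares have n d.f., among-pair mean squares have n - 1.
    df_num = satterthwaite_df([1.0, 1.0], [mw_dz, ma_dz], [n_dz, n_dz - 1])
    df_den = satterthwaite_df([1.0, 1.0], [mw_mz, ma_mz], [n_mz, n_mz - 1])
    return f_ratio, df_num, df_den
```

For example, `phenotypic_variance_ratio(8.0, 2.0, 7.0, 3.0, 20, 20)` returns a ratio of 1.0, since both sums of mean squares equal 10. The approximate degrees of freedom are generally not integers, so referring the ratio to F tables requires rounding or interpolation.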
The ratio of expected mean squares corresponding to (2.14) is

$$\frac{E(M_W(DZ))}{E(M_W(MZ))} = \frac{\tfrac{1}{2}\sigma_g^2 + \sigma_e^2 - C}{\sigma_e^2 - C} \qquad (3.17)$$

A number of approximate F tests can be used to test the significance of $\sigma_g^2$. One such test suggested by (3.17) is

$$F^* = \frac{\tfrac{1}{2}\hat\sigma_g^2 + \hat\sigma_e^2 - \hat{C}}{\hat\sigma_e^2 - \hat{C}} = \frac{M_A(MZ) + M_W(MZ) - M_A(DZ) + 3M_W(DZ)}{-M_A(MZ) + 3M_W(MZ) + M_A(DZ) + M_W(DZ)} \qquad (3.18)$$

where $\hat\sigma_g^2$, $\hat\sigma_e^2$ and $\hat{C}$ are the unweighted least squares estimates given by (3.5). F* has the advantage of using more information than does (2.14), since it uses four rather than two mean squares. On the other hand, the test is approximate rather than exact, the degrees of freedom being calculated by Satterthwaite's formula (3.16).

Similarly, an F test using the weighted least squares estimators can be constructed. The coefficient of each mean square in the numerator and denominator of F* depends upon the final least squares estimates. The resulting test will be approximate and the degrees of freedom are again calculated from (3.16).

Another method of testing for the significance of $\sigma_g^2$ is to compare an estimate of it directly with that estimate's standard error, assuming the ratio of these quantities to be normally distributed. For example, if the unweighted least squares estimate of $\sigma_g^2$ given by (3.5) is used, then from (3.11) the variance of $\hat\sigma_g^2$ is

$$Var(\hat\sigma_g^2) = 2\left[\frac{E_{AM}^2}{N_M - 1} + \frac{E_{WM}^2}{N_M} + \frac{E_{AD}^2}{N_D - 1} + \frac{E_{WD}^2}{N_D}\right] \qquad (3.19)$$

which would be estimated by substituting the actual mean squares for their expected values. This method of comparing an estimate of $\sigma_g^2$ with its standard error is probably always more powerful than use of (2.14) or (3.18), especially if the weighted least squares estimate is used. In Sections 3.3 and 3.4 other significance tests are discussed.

3.3 Maximum Likelihood Estimation

Although maximum likelihood (ML) techniques have been used to estimate the genetic component of variance in plant-breeding experiments (Hayman, 1960), little use has been made of this method in human genetics.
The two primary reasons for this are (1) practical objections to the method as being computationally difficult and (2) theoretical objections to the assumption that the trait of interest is normally distributed. The first objection might have been valid a decade ago, but it is certainly not so today, given the availability of high speed computers. The second objection is more serious, but one could argue that empirically a large number of traits do approximately follow a normal distribution, and if there is any doubt, one can do a preliminary test for non-normality before subjecting the data to ML analysis.

Suppose we have data for $N_M$ pairs of monozygotic twins and $N_D$ pairs of dizygotic twins. We make Assumptions I-IV and also assume that $x_{1j}$ and $x_{2j}$ follow a bivariate normal distribution, i.e.,

$$\mathbf{x}_j = \begin{pmatrix} x_{1j} \\ x_{2j} \end{pmatrix} \sim N\left[\begin{pmatrix} \mu \\ \mu \end{pmatrix},\ V\right] \qquad (3.20)$$

Note that it is assumed that $E(x_{1j}) = E(x_{2j})$, which implies that the twins are ordered at random, so that there is no order effect. The variance-covariance matrix V depends upon the type of twin pair, and from (1.14), (1.15), (1.17) and (1.18) we see that

$$V_{MZ} = \begin{pmatrix} \sigma_g^2 + \sigma_e^2 & \sigma_g^2 + C \\ \sigma_g^2 + C & \sigma_g^2 + \sigma_e^2 \end{pmatrix} \qquad (3.21)$$

and

$$V_{DZ} = \begin{pmatrix} \sigma_g^2 + \sigma_e^2 & \tfrac{1}{2}\sigma_g^2 + C \\ \tfrac{1}{2}\sigma_g^2 + C & \sigma_g^2 + \sigma_e^2 \end{pmatrix} \qquad (3.22)$$

The log likelihood (apart from a constant term) may be written

$$\log L = -\tfrac{1}{2}N_M \log|V_{MZ}| - \tfrac{1}{2}N_D \log|V_{DZ}| - \tfrac{1}{2}\sum_{j=1}^{N_M} (\mathbf{x}_j - \boldsymbol{\mu})' V_{MZ}^{-1} (\mathbf{x}_j - \boldsymbol{\mu}) - \tfrac{1}{2}\sum_{j=1}^{N_D} (\mathbf{x}_j - \boldsymbol{\mu})' V_{DZ}^{-1} (\mathbf{x}_j - \boldsymbol{\mu}) \qquad (3.23)$$

Standard computer techniques can be used to find the ML estimates of $\mu$, $\sigma_g^2$, $\sigma_e^2$ and C. The simplest procedure is to search the likelihood surface directly, as explained elsewhere (Elston and Kaplan, 1970). The ratio of each estimate to its standard error provides a test of the hypothesis that the parameter in question is zero.

An alternative method of ML estimation that can be used is a procedure based on twin pair differences. There are both theoretical and practical advantages to this method.
Theoretically, it is less restrictive, requiring only that the twin pair differences be normally distributed. Practically, the method is simpler computationally. However, it has the serious disadvantage that information is sacrificed in the estimation procedure; and this loss of information results in $\sigma_e^2$ being confounded with C.

If we make Assumptions I-IV and also assume that the twin differences are normally distributed, then we have

$$\text{for monozygotic twins: } D_j \sim N(0,\ \sigma_{MZ}^2)$$
$$\text{for dizygotic twins: } D_j \sim N(0,\ \sigma_{DZ}^2) \qquad (3.24)$$

where from (1.16) and (1.19) we see that

$$\sigma_{MZ}^2 = 2(\sigma_e^2 - C)$$
$$\sigma_{DZ}^2 = 2(\sigma_e^2 - C) + \sigma_g^2 \qquad (3.25)$$

The log likelihood (apart from a constant term) may be written

$$\log L = -\tfrac{1}{2}N_M \log(\sigma_{MZ}^2) - \frac{Z_{MZ}}{2\sigma_{MZ}^2} - \tfrac{1}{2}N_D \log(\sigma_{DZ}^2) - \frac{Z_{DZ}}{2\sigma_{DZ}^2} \qquad (3.26)$$

where $Z_{MZ}$ and $Z_{DZ}$ are the sums of the observed squared pair differences for monozygotic and dizygotic twins respectively.

Using a general result (proved e.g. by Lindgren (1960) pp. 224-225), we can find the ML estimates of $2(\sigma_e^2 - C)$ and $\sigma_g^2$ by finding those estimates of $\sigma_{MZ}^2$ and $\sigma_{DZ}^2$ that maximize (3.26) and then, using these estimates, solving (3.25) for $2(\sigma_e^2 - C)$ and $\sigma_g^2$. That is, the ML solution is given by

$$\widehat{2(\sigma_e^2 - C)} = \hat\sigma_{MZ}^2$$
$$\hat\sigma_g^2 = \hat\sigma_{DZ}^2 - \hat\sigma_{MZ}^2 \qquad (3.27)$$

where $\hat\sigma_{MZ}^2$ and $\hat\sigma_{DZ}^2$ are the ML estimates of $\sigma_{MZ}^2$ and $\sigma_{DZ}^2$ respectively. It is well known (e.g., Kendall and Stuart, 1967) that the maximum likelihood estimate of the variance of a normally distributed random variable x with mean zero and variance $\sigma^2$ is

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 \qquad (3.28)$$

From (3.26) we see that the log likelihood is simply the sum of the two log likelihoods when monozygotic and dizygotic twins are considered separately. Hence from (3.28) the ML estimates of $\sigma_{MZ}^2$ and $\sigma_{DZ}^2$ are

$$\hat\sigma_{MZ}^2 = \frac{Z_{MZ}}{N_M}, \qquad \hat\sigma_{DZ}^2 = \frac{Z_{DZ}}{N_D} \qquad (3.29)$$

From (3.27) and (3.29) the ML estimator of $\sigma_g^2$ is thus given by

$$\hat\sigma_g^2 = \hat\sigma_{DZ}^2 - \hat\sigma_{MZ}^2 = \frac{Z_{DZ}}{N_D} - \frac{Z_{MZ}}{N_M}$$

Hence, we have shown that $\tilde\sigma_g^2$ as defined by (3.14) is the ML estimator of $\sigma_g^2$ if the only information available consists of the twin pair differences.

3.4 Nonparametric Test Procedures in Twin Studies

If the primary consideration in a twin study is testing the hypothesis $\sigma_g^2 = 0$ against the alternative that $\sigma_g^2 > 0$, rather than estimating $\sigma_g^2$, a number of nonparametric techniques can be used. Few authors have suggested the use of nonparametric tests for analyzing twin data, which is surprising, since most nonparametric test procedures are not difficult, and they can be used in a variety of situations. They are particularly useful when the normality assumptions are known to be false, and hence the F test given by (2.14) is not appropriate.

To see how these techniques might be used, suppose that the (absolute) twin pair difference for a particular quantitative trait is calculated for each of the $N_M + N_D$ twin pairs. These differences are then ranked in order of magnitude from 1 to $N_M + N_D$, tied scores being assigned the average of the tied ranks. Let $R_{MZ}$ and $R_{DZ}$ be the sum of the ranks for monozygotic and dizygotic twin pairs respectively.

We make Assumptions III and IV and first assume that $\sigma_g^2 = 0$. Then from (1.16) and (1.19) we see that the variance of twin pair differences is the same for the two groups, and hence the absolute pair differences have the same expectation. Alternatively, if there is a genetic effect, then the expected absolute pair difference is greater for dizygotic twins than for monozygotic twins. Thus, the test procedure given by Mann and Whitney (1947) can be used in this situation to test the hypothesis of no genetic effect. Their test statistic may be written

$$U = N_M N_D + \frac{N_M(N_M + 1)}{2} - R_{MZ} \qquad (3.30)$$

Tables of the critical value of U are available (e.g., Siegel, 1956; Owen, 1962), and a significantly large U implies the existence of a genetic effect.
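The ranking step and the rank-sum statistic can be sketched as follows. The sketch assumes the usual Mann-Whitney form U = N_M·N_D + N_M(N_M + 1)/2 - R_MZ, for which large values correspond to small monozygotic ranks and hence to a genetic effect. The function names and the toy data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_u(mz_pairs, dz_pairs):
    """U computed from the rank sum of the absolute MZ pair differences."""
    diffs = [abs(a - b) for a, b in mz_pairs] + [abs(a - b) for a, b in dz_pairs]
    ranks = average_ranks(diffs)
    n_m, n_d = len(mz_pairs), len(dz_pairs)
    r_mz = sum(ranks[:n_m])
    return n_m * n_d + n_m * (n_m + 1) / 2 - r_mz


# Hypothetical trait values for three MZ pairs and three DZ pairs.
mz = [(10.0, 11.0), (14.0, 13.0), (9.0, 9.0)]
dz = [(10.0, 13.0), (15.0, 12.0), (8.0, 10.0)]
u = mann_whitney_u(mz, dz)
```

In this toy example every monozygotic difference is smaller than every dizygotic difference, so the monozygotic rank sum is 6 and U takes its largest possible value, 9.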
A normal approximation often used when N and N M D are both large is obtained by calculating (3.31) In large samples Z has an approximate normal distribution with mean 0 and variance 1, and hence the critical value of U can be calculated from tables of the normal distribution. I ..I I I I I I I --I I I I I I CHAPTER IV - ESTIMATION OF g FROM SIB DATA WHEN THE PROPORTION OF GENES IDENTICAL BY DESCENT IS KNOWN In this chapter we derive procedures for estimating 02 g using TI, the proportion of genes two sibs (or dizygotic twins) have i.b.d., which will be assumed to be known. The reasoning behind this approach is similar to that of the twin method: if there is a large genetic effect, then those sibs that have a large proportion of their genes i. b.d. will be more "alike" with respect to the quantitative trait than will those sibs who have a smaller proportion of their genes i.b.d. Alternatively, if there is little or no genetic effect, the sib pair differences should be approximately the same, regardless of TI. 4.1. Genetic Variance at a Single Locus Consider the subpopulation of sib pairs that have exactly their genes i.b.d. over the entire genome (0 case of a two allele trait locus. ~ TI ~ TI of 1), and the special Denote the two alleles at this locus by A and a, and the corresponding gene frequencies by p and q = 1 - p. Without loss of generality we can let the genetic effect at this locus be given by a i f sib is AA g I. I I 02 = d if sib is Aa . -a i f sib is aa (4.1) I ..I I I I I I I --I I I I I I Ie I I 44 The genetic component of variance at this locus can be partitioned into additive and dominance components, i.e., (4.2) C.C. Li (1955) shows that for the special case of two alleles with genetic effects given by (4.1), 0: and o~ may be written 2pq(et-d(p-q» 2 (4.3) (4.4) When no dominance is present, the heterozygote has a genetic effect midway between the two homozygote that d=O in (4.1) and hence values. This implies O~=O. 
Thus, the genetic variance at this locus for the special case of no dominance reduces to

$$\sigma_g^2 = \sigma_a^2 = 2pqa^2 \tag{4.5}$$

4.2 The Sib Pair Probability Tables for Two Alleles

We now derive the sib pair probability tables for the subpopulation of sib pairs having $\pi$ of their genes i.b.d. at a two-allele locus. We begin by considering the general mating $A_1A_2 \times A_3A_4$. The probability of every sib pair that can result from this mating, conditional on $\pi$, is given in Table 4.1 below.

TABLE 4.1 SIB PAIR PROBABILITIES FOR AN $A_1A_2 \times A_3A_4$ MATING CONDITIONAL ON $\pi$

| Sib II \ Sib I | $A_1A_3$ | $A_2A_3$ | $A_1A_4$ | $A_2A_4$ |
|---|---|---|---|---|
| $A_1A_3$ | $\tfrac14\pi^2$ | $\tfrac14\pi(1-\pi)$ | $\tfrac14\pi(1-\pi)$ | $\tfrac14(1-\pi)^2$ |
| $A_2A_3$ | $\tfrac14\pi(1-\pi)$ | $\tfrac14\pi^2$ | $\tfrac14(1-\pi)^2$ | $\tfrac14\pi(1-\pi)$ |
| $A_1A_4$ | $\tfrac14\pi(1-\pi)$ | $\tfrac14(1-\pi)^2$ | $\tfrac14\pi^2$ | $\tfrac14\pi(1-\pi)$ |
| $A_2A_4$ | $\tfrac14(1-\pi)^2$ | $\tfrac14\pi(1-\pi)$ | $\tfrac14\pi(1-\pi)$ | $\tfrac14\pi^2$ |

Table 4.1 can be thought of as a 4 x 4 matrix $M$ with cell $ij$, where $i = 1, 2, 3, 4$ and $j = 1, 2, 3, 4$. Sibs falling into cells 11, 22, 33, or 44 have both genes i.b.d., while those in cells 14, 23, 32, or 41 have no genes i.b.d. Sib pairs in the remaining eight cells have one gene i.b.d.

For example, consider cell 11. The probability that the first sib is $A_1A_3$ is $\tfrac14$. Moreover, since the two sibs have $\pi$ of their genes i.b.d., the conditional probability that the second sib will also be $A_1A_3$ is $\pi^2$. Hence, $M_{11} = \tfrac14\pi^2$. Similarly, the remaining elements in Table 4.1 can be derived.

Any particular mating and resulting sib pair can be handled as a special case of Table 4.1. In this chapter we are primarily concerned with the special case of two alleles, A and a. For example, consider the mating AA x Aa ($A_1 = A_2 = A_3 = A$ and $A_4 = a$). From Table 4.1 we see that sib pairs from this mating that are AA-AA must fall into cells 11, 12, 21, or 22. Hence,

$$\Pr(\text{both sibs AA} \mid \text{AA} \times \text{Aa mating and } \pi) = 2\left(\tfrac14\pi^2\right) + 2\left(\tfrac14\pi(1-\pi)\right) = \tfrac12\pi$$

Similarly, the remaining elements of Table 4.2 can be derived.
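The cell-summing argument above can be sketched as a short program. This is an illustrative sketch only: the function names are hypothetical, and the cell order ($A_1A_3$, $A_2A_3$, $A_1A_4$, $A_2A_4$) is the one implied by the cell numbers quoted in the AA x Aa example. Exact fractions are used so that the result can be compared with $\tfrac12\pi$ without rounding.

```python
from fractions import Fraction

# Offspring genotypes in the row/column order of Table 4.1, written as positions
# in the mating (A1, A2) x (A3, A4): A1A3, A2A3, A1A4, A2A4.
CELLS = [(0, 2), (1, 2), (0, 3), (1, 3)]


def cell_probability(i, j, pi):
    """Probability of cell (i, j) of matrix M for a given value of pi."""
    shared = len(set(CELLS[i]) & set(CELLS[j]))  # number of genes i.b.d.
    conditional = (pi ** 2, pi * (1 - pi), (1 - pi) ** 2)[2 - shared]
    return Fraction(1, 4) * conditional


def sib_pair_probability(mating, sib1, sib2, pi):
    """Sum the cells of M that give the (unordered) sib pair sib1-sib2.

    mating is a tuple of four allele labels (A1, A2, A3, A4).
    """
    genotypes = ["".join(sorted(mating[a] + mating[b])) for a, b in CELLS]
    total = Fraction(0)
    for i in range(4):
        for j in range(4):
            if {genotypes[i], genotypes[j]} == {sib1, sib2}:
                total += cell_probability(i, j, pi)
    return total


pi = Fraction(3, 10)                      # arbitrary illustrative value
mating = ("A", "A", "A", "a")             # the AA x Aa mating of the example
prob_AA_AA = sib_pair_probability(mating, "AA", "AA", pi)
prob_AA_Aa = sib_pair_probability(mating, "AA", "Aa", pi)
matrix_total = sum(cell_probability(i, j, pi) for i in range(4) for j in range(4))
print(prob_AA_AA, prob_AA_Aa, matrix_total)
```

Counting shared parental gene positions, rather than hard-coding the sixteen entries, keeps the sketch tied to the i.b.d. interpretation of the cells.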
TABLE 4.2 PROBABILITIES OF SIB PAIRS FOR A TWO-ALLELE LOCUS CONDITIONAL ON PARENTAL MATING AND $\pi$

| Mating | Probability of mating | Sib pair | $L^*$ | Cells in $M$ | $L_{ij}$ |
|---|---|---|---|---|---|
| AA x AA | $p^4$ | AA-AA | $1$ | all 16 | $p^4$ |
| aa x aa | $q^4$ | aa-aa | $1$ | all 16 | $q^4$ |
| AA x aa | $2p^2q^2$ | Aa-Aa | $1$ | all 16 | $2p^2q^2$ |
| AA x Aa | $4p^3q$ | AA-AA | $\tfrac12\pi$ | 11-12-21-22 | $2p^3q\pi$ |
| | | AA-Aa | $1-\pi$ | 13-14-23-24, 31-32-41-42 | $4p^3q(1-\pi)$ |
| | | Aa-Aa | $\tfrac12\pi$ | 33-34-43-44 | $2p^3q\pi$ |
| Aa x aa | $4pq^3$ | Aa-Aa | $\tfrac12\pi$ | 11-12-21-22 | $2pq^3\pi$ |
| | | Aa-aa | $1-\pi$ | 13-14-23-24, 31-32-41-42 | $4pq^3(1-\pi)$ |
| | | aa-aa | $\tfrac12\pi$ | 33-34-43-44 | $2pq^3\pi$ |
| Aa x Aa | $4p^2q^2$ | AA-AA | $\tfrac14\pi^2$ | 11 | $p^2q^2\pi^2$ |
| | | aa-aa | $\tfrac14\pi^2$ | 44 | $p^2q^2\pi^2$ |
| | | AA-aa | $\tfrac12(1-\pi)^2$ | 14-41 | $2p^2q^2(1-\pi)^2$ |
| | | AA-Aa | $\pi(1-\pi)$ | 12-13-21-31 | $4p^2q^2\pi(1-\pi)$ |
| | | Aa-aa | $\pi(1-\pi)$ | 24-34-42-43 | $4p^2q^2\pi(1-\pi)$ |
| | | Aa-Aa | $\tfrac12(1-2\pi+2\pi^2)$ | 22-23-32-33 | $2p^2q^2(1-2\pi+2\pi^2)$ |

In Table 4.2 $L^*$ is the probability of the sib pair conditional on the mating and $\pi$; it is obtained by summing the probabilities of the indicated cells in matrix $M$ (Table 4.1). $L_{ij}$ is defined as

$$L_{ij} = \Pr(\text{sib pair } j \text{ and mating } i \mid \pi) \tag{4.6}$$

and is obtained by multiplying $L^*$ by the probability of mating. Note from Table 4.2 that for two alleles there are just six possible matings and sib pairs. The corresponding table for an m-allele locus will be given in Chapter V (Table 5.1).

The sib pair probabilities conditional on $\pi$ alone can be obtained from Table 4.2 by summing $L_{ij}$ over all matings. For example, with $j$ denoting the AA-AA sib pair,

$$\Pr(\text{sibs AA-AA} \mid \pi) = \sum_{i=1}^{6} L_{ij} = p^4 + 2p^3q\pi + p^2q^2\pi^2 = p^2(q\pi + p)^2 = p^2(\pi + (1-\pi)p)^2$$

Similarly, the remaining elements of Table 4.3 can be derived. If $\pi = \tfrac12$, Table 4.3 reduces to the usual table of sib pair frequencies in a random mating population (e.g., Table 9 of C.C. Li, 1955).
TABLE 4.3 PROBABILITIES OF SIB PAIRS FOR A TWO-ALLELE LOCUS CONDITIONAL ON $\pi$

| Sib pair | Probability |
|---|---|
| AA-AA | $p^2(\pi + (1-\pi)p)^2$ |
| aa-aa | $q^2(\pi + (1-\pi)q)^2$ |
| AA-aa | $2p^2q^2(1-\pi)^2$ |
| AA-Aa | $4p^2q(1-\pi)(\pi + (1-\pi)p)$ |
| Aa-aa | $4pq^2(1-\pi)(\pi + (1-\pi)q)$ |
| Aa-Aa | $2pq(\pi + 2(1-\pi)^2pq)$ |

4.3 Genetic Covariance at a Single Locus for Sib Pairs

Table 4.3 can be used to derive the genetic covariance at a single locus for sib pairs. Let $g_1$ and $g_2$ denote the genetic effects at a locus for two sibs. Then we have

$$\begin{aligned}
\sigma_{gg'} &= E(g_1g_2) - E(g_1)E(g_2) \\
&= a^2\left[p^2(\pi+(1-\pi)p)^2 + q^2(\pi+(1-\pi)q)^2 - 2p^2q^2(1-\pi)^2\right] \\
&\quad + ad\left[4p^2q(1-\pi)(\pi+(1-\pi)p) - 4pq^2(1-\pi)(\pi+(1-\pi)q)\right] \\
&\quad + d^2\left[2pq(\pi+2(1-\pi)^2pq)\right] - \left[a(p^2-q^2)+2pqd\right]^2 \\
&= a^2\left[p^2(q\pi+p)^2 + q^2(q+p\pi)^2 - 2p^2q^2(1-\pi)^2 - (p-q)^2\right] \\
&\quad + ad\left[4pq(1-\pi)\left(p\pi+p^2(1-\pi)-q\pi-q^2(1-\pi)\right) - 4pq(p-q)\right] \\
&\quad + d^2\left[2pq(\pi + 2pq - 4\pi pq + 2\pi^2pq - 2pq)\right] \\
&= a^2\left[\pi^2(p^2q^2+p^2q^2-2p^2q^2) + \pi\left(2pq(p^2+q^2+2pq)\right) + (p^2-q^2)^2\right] - a^2(p-q)^2 \\
&\quad + 4pq\,ad\left[(1-\pi)\left(\pi(p-q) + (1-\pi)(p^2-q^2)\right) - (p-q)\right] \\
&\quad + 2pqd^2\left[\pi - 4\pi pq + 2\pi^2pq\right] \\
&= a^2\left[\pi(2pq) + (p-q)^2(p+q)^2 - (p-q)^2\right] \\
&\quad + 4pq\,ad\left[(1-\pi)\left(\pi(p-q) + (1-\pi)(p-q)\right) - (p-q)\right] \\
&\quad + 2pqd^2\pi\left[(p+q)^2 - 4pq + 2pq\pi\right] \\
&= 2pqa^2\pi + 4pq\,ad\left[-\pi(p-q)\right] + 2pqd^2\pi\left[(p-q)^2 + 2pq\pi\right] \\
&= \pi\left[2pq(a-d(p-q))^2\right] + \pi^2(4p^2q^2d^2) = \pi\sigma_a^2 + \pi^2\sigma_d^2
\end{aligned} \tag{4.7}$$

Suppose that an individual's total genetic effect for a particular quantitative trait is due to the additive effect of $n$ such loci. Because of linkage equilibrium (if we assume no epistasis) (4.7) will still hold, where now instead of (4.3) and (4.4) we have

$$\sigma_a^2 = \sum_{i=1}^{n} 2p_iq_i\,[a_i - d_i(p_i - q_i)]^2 \tag{4.8}$$

and

$$\sigma_d^2 = 4\sum_{i=1}^{n} p_i^2q_i^2d_i^2 \tag{4.9}$$

where $p_i$ and $q_i = 1 - p_i$ are the gene frequencies of the two alleles at the $i$th locus and $a_i$ and $d_i$ are the genetic effects at this locus corresponding to $a$ and $d$ for the single-locus case. Note that $\sigma_a^2$ and $\sigma_d^2$ are simply the sum over $n$ loci of the additive and dominance variances respectively at each locus.
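The single-locus identity (4.7) can be checked numerically from the entries of Table 4.3. The sketch below uses exact fractions; the function names and the particular values of $p$, $\pi$, $a$ and $d$ are arbitrary illustrations, not values taken from the text.

```python
from fractions import Fraction


def table_4_3(p, pi):
    """Sib pair probabilities of Table 4.3 (unordered pairs) for given p and pi."""
    q = 1 - p
    u = 1 - pi
    return {
        ("AA", "AA"): p**2 * (pi + u * p) ** 2,
        ("aa", "aa"): q**2 * (pi + u * q) ** 2,
        ("AA", "aa"): 2 * p**2 * q**2 * u**2,
        ("AA", "Aa"): 4 * p**2 * q * u * (pi + u * p),
        ("Aa", "aa"): 4 * p * q**2 * u * (pi + u * q),
        ("Aa", "Aa"): 2 * p * q * (pi + 2 * u**2 * p * q),
    }


def sib_covariance(p, pi, a, d):
    """E(g1 g2) - E(g1) E(g2) computed directly from Table 4.3 and (4.1)."""
    q = 1 - p
    effect = {"AA": a, "Aa": d, "aa": -a}
    probs = table_4_3(p, pi)
    e_g1g2 = sum(pr * effect[s1] * effect[s2] for (s1, s2), pr in probs.items())
    e_g = a * (p - q) + 2 * p * q * d     # mean genetic effect of one sib
    return e_g1g2 - e_g**2


# Arbitrary illustrative values.
p, pi, a, d = Fraction(3, 10), Fraction(2, 5), Fraction(2), Fraction(1, 2)
q = 1 - p
sigma_a2 = 2 * p * q * (a - d * (p - q)) ** 2      # (4.3)
sigma_d2 = 4 * p**2 * q**2 * d**2                  # (4.4)
cov = sib_covariance(p, pi, a, d)
total = sum(table_4_3(p, pi).values())
print(cov, pi * sigma_a2 + pi**2 * sigma_d2, total)
```

The two printed covariance values agree, and the six probabilities sum to one, for any admissible choice of the inputs.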
Finally, the results of this section can be generalized to an m-allele locus. Fisher (1918), using a different approach, proved (4.7) for an m-allele locus in sib pairs for the special case of $\pi = \tfrac12$. His method can be used to obtain this result for the more general case of $\pi \ne \tfrac12$ as well.

4.4 Estimation of Genetic Variance by Regression Analysis

4.4.1 Assuming no Dominance. Suppose we have $n$ sib pairs and observe trait values $x_{1j}$ and $x_{2j}$ for individuals in the $j$th pair. We assume that $\pi_j$, the proportion of genes i.b.d. for pair $j$, is known and that $C_{FS}$, $\sigma_{ge}$ and $\sigma^*_{ge}$ do not depend upon $\pi_j$ (note that we do not require that these parameters be zero). We also, initially at least, assume no dominance or epistasis. Thus, $\sigma_g^2 = \sigma_a^2$. Consider the simple linear regression of the squared pair differences on the proportion of genes i.b.d., i.e.,

$$E(Y_j) = \alpha + \beta\pi_j, \qquad j = 1, 2, \ldots, n \tag{4.10}$$

where $Y_j = (x_{1j} - x_{2j})^2$ is the squared pair difference for sib pair $j$. Let $\hat\alpha$ and $\hat\beta$ be the usual unweighted least squares estimators of $\alpha$ and $\beta$ respectively. We now prove that $-\tfrac12\hat\beta$ is an unbiased estimator of $\sigma_g^2$.

Proof: Using (4.7) with $\sigma_d^2 = 0$,

$$E(Y_j) = 2\sigma_g^2 + 2\sigma_e^2 + 4\sigma_{ge} - 4\sigma^*_{ge} - 2C_{FS} - 2\pi_j\sigma_g^2 \tag{4.11}$$

From (4.10) and (4.11) it follows that

$$\beta = -2\sigma_g^2 \tag{4.12}$$

From (4.12) we conclude that $\hat\beta$ is an unbiased estimator of $-2\sigma_g^2$, and hence $-\tfrac12\hat\beta$ is an unbiased estimator of $\sigma_g^2$.

This regression procedure may also be applied to twin data. We know that $\pi_j = 1$ for all monozygotic twin pairs, and if it is assumed that $\pi_j = \tfrac12$ for all dizygotic twin pairs, then it is not difficult to show that the regression estimator of $\sigma_g^2$ described above ($-\tfrac12\hat\beta$) reduces to (3.14). The proof of this is given in Appendix I.

Note that when twins are used, one assumption we require in order for $-\tfrac12\hat\beta$ to be an unbiased estimator of $\sigma_g^2$ is $C_{MZ} = C_{DZ}$, i.e., the environmental covariance must not depend upon the proportion of genes i.b.d.
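The regression estimator of Section 4.4.1 can be sketched as follows. The function name and the small data set are made up for illustration; the code simply forms the squared pair differences, fits the unweighted least squares line on $\pi_j$, and returns $-\tfrac12\hat\beta$ as the estimate of $\sigma_g^2$.

```python
def sib_pair_regression(x1, x2, pi):
    """Unweighted least squares fit of Y_j = (x1j - x2j)^2 on pi_j.

    Returns (alpha_hat, beta_hat, sigma_g2_hat), where sigma_g2_hat = -beta_hat / 2.
    """
    n = len(pi)
    y = [(a - b) ** 2 for a, b in zip(x1, x2)]
    pi_bar = sum(pi) / n
    y_bar = sum(y) / n
    s_xy = sum((p - pi_bar) * (v - y_bar) for p, v in zip(pi, y))
    s_xx = sum((p - pi_bar) ** 2 for p in pi)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * pi_bar
    return alpha_hat, beta_hat, -0.5 * beta_hat


# Made-up trait values for four sib pairs: two pairs with pi = 0, two with pi = 1.
x1 = [5.0, 3.0, 2.0, 4.0]
x2 = [2.0, 2.0, 1.0, 3.0]
pi = [0.0, 0.0, 1.0, 1.0]
alpha_hat, beta_hat, sigma_g2_hat = sib_pair_regression(x1, x2, pi)
print(alpha_hat, beta_hat, sigma_g2_hat)
```

In this toy data set the mean squared difference falls from 5 at $\pi_j = 0$ to 1 at $\pi_j = 1$, so the slope is $-4$ and the estimate of $\sigma_g^2$ is 2. With real data the estimate can be negative, since nothing in the unweighted fit constrains the sign of $\hat\beta$.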
The assumption $C_{MZ} = C_{DZ}$ is less likely to be valid for twins than for non-twin sibs, since monozygotic twins are generally treated more "alike" (dressed alike, etc.) than are dizygotic twins. Hence the environmental forces influencing a particular quantitative trait may likewise be more "alike" for monozygotic twins than for dizygotic twins. On the other hand, there is little evidence to suggest that the proportion of genes i.b.d. influences the environmental forces that affect non-twin sibs.

4.4.2 Allowing for Dominance. The regression analysis of the previous section can be modified slightly to allow for dominance. In this situation we assume the more general underlying model

$$E(Y_j) = \alpha + \beta\pi_j + \gamma\pi_j^2, \qquad j = 1, 2, \ldots, n \tag{4.13}$$

where $Y_j = (x_{1j} - x_{2j})^2$. Using (4.7) we have

$$E(Y_j) = E(g_{1j} + e_{1j} - g_{2j} - e_{2j})^2 = 2\sigma_a^2 + 2\sigma_d^2 + 2\sigma_e^2 + 4\sigma_{ge} - 4\sigma^*_{ge} - 2C_{FS} - 2\pi_j\sigma_a^2 - 2\pi_j^2\sigma_d^2 \tag{4.14}$$

and from (4.13) and (4.14) we have

$$\beta = -2\sigma_a^2 \tag{4.15}$$

$$\gamma = -2\sigma_d^2 \tag{4.16}$$

Denote by $\hat\beta$ and $\hat\gamma$ the least squares estimators of $\beta$ and $\gamma$ respectively. From (4.13), (4.15) and (4.16) we see that $-\tfrac12\hat\beta$ and $-\tfrac12\hat\gamma$ are unbiased estimators of $\sigma_a^2$ and $\sigma_d^2$ respectively.

4.5 Weighted Least Squares Estimation of $\sigma_a^2$ and $\sigma_d^2$

The analysis above does not require the assumption of normality in order to obtain unbiased estimators of $\sigma_a^2$ and $\sigma_d^2$. However, if we do assume that

$$x_{1j} - x_{2j} \sim N\left(0,\ \sigma_j^2(FS)\right) \tag{4.17}$$

then weighted least squares estimates of $\alpha$, $\beta$ and $\gamma$ can be found by an iterative procedure similar to the one in Chapter III. From (4.14)

$$E(Y_j) = \left(2\sigma_e^2 + 4\sigma_{ge} - 4\sigma^*_{ge} - 2C_{FS}\right) + 2(1-\pi_j)\sigma_a^2 + 2(1-\pi_j^2)\sigma_d^2 = \sigma_j^2(FS) \tag{4.18}$$

From properties of the chi-square distribution we have

$$\operatorname{Var}(Y_j) = 2\sigma_j^4(FS) \tag{4.19}$$

Hence, if $\pi_j$ is known, weighted least squares estimators of $\alpha$, $\beta$ and $\gamma$ are given by (3.9) where

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \qquad X = \begin{pmatrix} 1 & \pi_1 & \pi_1^2 \\ 1 & \pi_2 & \pi_2^2 \\ \vdots & \vdots & \vdots \\ 1 & \pi_n & \pi_n^2 \end{pmatrix}, \qquad V = \begin{pmatrix} 2\sigma_1^4(FS) & 0 & \cdots & 0 \\ 0 & 2\sigma_2^4(FS) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 2\sigma_n^4(FS) \end{pmatrix} \tag{4.20}$$

Since $\sigma_j^2(FS)$ is a function of $\alpha$, $\beta$ and $\gamma$, the iterative procedure described in Chapter III must be used in order to find the weighted least squares estimators. When convergence is achieved, the estimated standard errors of $\hat\sigma_a^2$ and $\hat\sigma_d^2$ can be calculated by (3.13), where the variance-covariance matrix $V$ is determined by the final weighted least squares solution. Then, for example, if there is no additive genetic variance, the ratio of $\hat\sigma_a^2$ to its estimated standard error has an approximate normal distribution with mean zero and variance one. Hence, this ratio can be used as an approximate test of the hypothesis that $\sigma_a^2 = 0$ against the alternative that $\sigma_a^2 > 0$.

4.6 Maximum Likelihood Estimation of $\sigma_g^2$

In this section we describe the maximum likelihood procedures for estimating the genetic variance, assuming no dominance. It will be apparent how the methods can be modified slightly to permit estimation of $\sigma_a^2$ and $\sigma_d^2$ in the more general case when dominance is present. These two procedures are similar to those described in Section 3.3 for twin pairs, the first being based upon the individual observations and the second upon the sib pair differences.

If ML estimation is based upon the individual observations, then the model is given by (3.20), where $V$ for the $j$th sib pair is given by

$$V_j = \begin{pmatrix} \sigma_g^2 + \sigma_e^2 + 2\sigma_{ge} & C_{FS} + 2\sigma^*_{ge} + \pi_j\sigma_g^2 \\ C_{FS} + 2\sigma^*_{ge} + \pi_j\sigma_g^2 & \sigma_g^2 + \sigma_e^2 + 2\sigma_{ge} \end{pmatrix} \tag{4.21}$$

The log likelihood for this case (apart from a constant term) may be written

$$\log L = -\tfrac12\sum_{j=1}^{n}\log|V_j| - \tfrac12\sum_{j=1}^{n}(\mathbf{x}_j - \boldsymbol\mu)'\,V_j^{-1}(\mathbf{x}_j - \boldsymbol\mu) \tag{4.22}$$

If sib differences are used, then the model is given by (4.17) and using (4.18) the log likelihood may be written

$$\log L = -\tfrac12\sum_{j=1}^{n}\log\left[C^* + 2(1-\pi_j)\sigma_g^2\right] - \tfrac12\sum_{j=1}^{n}\frac{(x_{1j} - x_{2j})^2}{C^* + 2(1-\pi_j)\sigma_g^2} \tag{4.23}$$

where $C^*$ denotes the part of (4.18) that does not depend on $\pi_j$. Note that the first method requires ML estimation of four parameters ($\mu$, $C_1$, $C_2$ and $\sigma_g^2$) while the method using pair differences requires estimation of only two ($C^*$ and $\sigma_g^2$). ML estimates of these parameters can be found by the procedures mentioned in Section 3.3.

4.7 Detecting $\sigma_g^2$ by Nonparametric Test Procedures

If we wish only to detect rather than estimate $\sigma_g^2$, several nonparametric test procedures can be used. Two statistics based on rank correlation, Spearman's Rho and Kendall's Tau, are particularly well suited for use with sibs when $\pi_j$ is known. For a full discussion of these statistics see Kendall (1955) or Siegel (1956).

4.7.1 Spearman's Rho. Suppose we know $\pi_j$ for each of $n$ pairs of sibs and for each pair we calculate the absolute pair difference $|x_{1j} - x_{2j}|$ for a particular quantitative trait. If there is no genetic effect, then the absolute pair differences should be independent of $\pi_j$. On the other hand, if there is a genetic effect, the absolute pair differences should be smaller for those sib pairs having a large proportion of genes i.b.d. than for those pairs having a small $\pi_j$ value. We proceed as follows: First the $\pi_j$ and $|x_{1j} - x_{2j}|$ are separately ranked in order of magnitude from 1 to $n$, tied scores being assigned the average of the tied ranks. Let $R_j$ and $R_j^*$ be the rank given sib pair $j$ for $\pi_j$ and $|x_{1j} - x_{2j}|$ respectively. Then Spearman's Rho, $r_s$, may be written as in (4.24). If no ties are present, $r_s$ can be written more simply as

$$r_s = 1 - \frac{6\sum_j (R_j - R_j^*)^2}{n^3 - n} \tag{4.25}$$

A significantly large $r_s$ indicates a significant genetic effect. Tables of critical values of $r_s$ are available (e.g., Siegel, 1956) for small values of $n$. For large values of $n$ the statistic

$$t = r_s\sqrt{\frac{n-2}{1 - r_s^2}} \tag{4.26}$$

has an approximate Student's t distribution with $n-2$ degrees of freedom, and hence critical values of $r_s$ can be obtained directly from tables of the Student's t distribution.

4.7.2 Kendall's Tau. An alternative nonparametric test statistic that can be used to test $\sigma_g^2 = 0$ against the alternative that $\sigma_g^2 > 0$ is Kendall's Tau. We use the notation of the previous section and rank $\pi_j$ and $|x_{1j} - x_{2j}|$ as is done above. We define

$$S_{ij} = \begin{cases} \ \ 1 & \text{if } (R_i - R_j)(R_i^* - R_j^*) > 0 \\ \ \ 0 & \text{if } (R_i - R_j)(R_i^* - R_j^*) = 0 \\ -1 & \text{if } (R_i - R_j)(R_i^* - R_j^*) < 0 \end{cases} \tag{4.27}$$

and

$$S = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} S_{ij} \tag{4.28}$$

Note that the summation is over the $\binom{n}{2}$ unique ways of selecting two of $n$ sib pairs. We also define

$$T = \tfrac12\sum t(t-1) \qquad \text{and} \qquad T^* = \tfrac12\sum t^*(t^*-1) \tag{4.29}$$

where $t$ and $t^*$ are the number of tied observations at a given rank for $\pi_j$ and $|x_{1j} - x_{2j}|$ respectively. Then Kendall's Tau may be written

$$r_k = \frac{S}{\left[\tfrac12 n(n-1) - T\right]^{1/2}\left[\tfrac12 n(n-1) - T^*\right]^{1/2}} \tag{4.30}$$

Tables of critical values of $r_k$ are available for small $n$ (e.g., Siegel, 1956), and if $n$ is large

$$z = r_k\left[\frac{9n(n-1)}{2(2n+5)}\right]^{1/2} \tag{4.31}$$

is distributed approximately normally with mean zero and variance one, and hence critical values of $r_k$ can be found from tables of the normal distribution. A significantly large $r_k$ indicates that $\sigma_g^2 > 0$.

CHAPTER V - MAXIMUM LIKELIHOOD ESTIMATION OF THE PROPORTION OF GENES IDENTICAL BY DESCENT IN SIB PAIRS

In Chapter IV a number of procedures were described for estimating $\sigma_g^2$ and testing the hypothesis that $\sigma_g^2 = 0$ against the alternative that $\sigma_g^2 > 0$ when $\pi$ is known for all sib pairs. However, in general $\pi$ is unknown and must be estimated for each sib pair. In this chapter we derive the maximum likelihood estimator of $\pi$ when the estimation procedure is based upon $k$ marker genes. Ideally, these markers should be mutually unlinked and sufficient in number to give good coverage of the entire genome. In practice the markers will not satisfy this condition exactly, although they may well do so approximately. Consequently, if the analyses of the previous chapter are performed with $\pi_j$ replaced by its maximum likelihood estimate $\hat\pi_j$, then only that portion of the genetic variance linked to the markers can be detected. However, it is not unreasonable to suppose that the number of markers will soon be sufficient to permit detection of virtually all of $\sigma_g^2$, as rapid progress is being made in this area (Renwick, 1969).

5.1 Case A: Both Parental Genotypes Known

Table 5.1 below is a generalization of Table 4.2 and gives the probability of all possible matings and sib pairs for an m-allele locus, conditional on $\pi$, when both parental genotypes are known.

TABLE 5.1 (continued)

| Number | Mating type | Probability of mating | Sib pair | $L^*$ | Cells in $M$ | $L_{ij}$ |
|---|---|---|---|---|---|---|
| Ve | | | $A_iA_j$-$A_jA_j$ | $\pi(1-\pi)$ | 24-34-42-43 | $4p_i^2p_j^2\pi(1-\pi)$ |
| Vf | | | $A_iA_j$-$A_iA_j$ | $\tfrac12(1-2\pi+2\pi^2)$ | 22-23-32-33 | $2p_i^2p_j^2(1-2\pi+2\pi^2)$ |
| VIa | $A_iA_j \times A_iA_k$ | $8p_i^2p_jp_k$ | $A_iA_i$-$A_iA_i$ | $\tfrac14\pi^2$ | 11 | $2p_i^2p_jp_k\pi^2$ |
| VIb | | | $A_iA_i$-$A_iA_j$ | $\tfrac12\pi(1-\pi)$ | 12-21 | $4p_i^2p_jp_k\pi(1-\pi)$ |
| VIc | | | $A_iA_i$-$A_iA_k$ | $\tfrac12\pi(1-\pi)$ | 13-31 | $4p_i^2p_jp_k\pi(1-\pi)$ |
| VId | | | $A_iA_i$-$A_jA_k$ | $\tfrac12(1-\pi)^2$ | 14-41 | $4p_i^2p_jp_k(1-\pi)^2$ |
| VIe | | | $A_iA_j$-$A_iA_j$ | $\tfrac14\pi^2$ | 22 | $2p_i^2p_jp_k\pi^2$ |
| VIf | | | $A_iA_k$-$A_iA_k$ | $\tfrac14\pi^2$ | 33 | $2p_i^2p_jp_k\pi^2$ |
| VIg | | | $A_jA_k$-$A_jA_k$ | $\tfrac14\pi^2$ | 44 | $2p_i^2p_jp_k\pi^2$ |
| VIh | | | $A_iA_j$-$A_iA_k$ | $\tfrac12(1-\pi)^2$ | 23-32 | $4p_i^2p_jp_k(1-\pi)^2$ |
| VIi | | | $A_iA_j$-$A_jA_k$ | $\tfrac12\pi(1-\pi)$ | 24-42 | $4p_i^2p_jp_k\pi(1-\pi)$ |
| VIj | | | $A_iA_k$-$A_jA_k$ | $\tfrac12\pi(1-\pi)$ | 34-43 | $4p_i^2p_jp_k\pi(1-\pi)$ |
| VIIa | $A_iA_j \times A_kA_l$ | $8p_ip_jp_kp_l$ | $A_iA_k$-$A_iA_k$ | $\tfrac14\pi^2$ | 11 | $2p_ip_jp_kp_l\pi^2$ |
| VIIb | | | $A_jA_k$-$A_jA_k$ | $\tfrac14\pi^2$ | 22 | $2p_ip_jp_kp_l\pi^2$ |
| VIIc | | | $A_iA_l$-$A_iA_l$ | $\tfrac14\pi^2$ | 33 | $2p_ip_jp_kp_l\pi^2$ |
| VIId | | | $A_jA_l$-$A_jA_l$ | $\tfrac14\pi^2$ | 44 | $2p_ip_jp_kp_l\pi^2$ |
| VIIe | | | $A_iA_k$-$A_iA_l$ | $\tfrac12\pi(1-\pi)$ | 13-31 | $4p_ip_jp_kp_l\pi(1-\pi)$ |
| VIIf | | | $A_iA_k$-$A_jA_k$ | $\tfrac12\pi(1-\pi)$ | 12-21 | $4p_ip_jp_kp_l\pi(1-\pi)$ |
| VIIg | | | $A_jA_k$-$A_jA_l$ | $\tfrac12\pi(1-\pi)$ | 24-42 | $4p_ip_jp_kp_l\pi(1-\pi)$ |
| VIIh | | | $A_iA_l$-$A_jA_l$ | $\tfrac12\pi(1-\pi)$ | 34-43 | $4p_ip_jp_kp_l\pi(1-\pi)$ |
| VIIi | | | $A_iA_l$-$A_jA_k$ | $\tfrac12(1-\pi)^2$ | 23-32 | $4p_ip_jp_kp_l(1-\pi)^2$ |
| VIIj | | | $A_iA_k$-$A_jA_l$ | $\tfrac12(1-\pi)^2$ | 14-41 | $4p_ip_jp_kp_l(1-\pi)^2$ |

We are also assuming in this and in the next section that the genotypes of both sibs are known.

Suppose we wish to estimate $\pi$ from $k$ marker genes and both parental genotypes are known for each locus. From Table 5.1 $L_{ij}$ can be read for each locus and the overall likelihood calculated by taking the product of these $k$ individual likelihoods.

For example, suppose the estimation procedure is based upon two two-allele marker loci: A, with gene frequencies $p_A$ and $p_a = 1 - p_A$; and B, with gene frequencies $p_B$ and $p_b = 1 - p_B$. We observe a mating that is AABb x AaBb and a sib pair that is AABB-AaBB. Then from IIIb and Va of Table 5.1 we see that if the loci are unlinked,

$$L_A = 4p_A^3p_a(1-\pi) \qquad \text{and} \qquad L_B = p_B^2p_b^2\pi^2$$

are the likelihoods for loci A and B respectively. The overall likelihood may be written

$$L = L_AL_B = 4p_A^3p_a\,p_B^2p_b^2\,\pi^2(1-\pi)$$

and it is easy to show that $\hat\pi = 2/3$ is the ML estimate of $\pi$.

Note that the gene frequencies need not be known in the above example in order to find the ML estimate of $\pi$. It can easily be shown that this result holds for every Case A situation, i.e., the gene frequencies need not be known as long as both parental genotypes are known for all $k$ markers. However, this result does not hold if one or both parental genotypes are unknown.
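The two-marker example above can be sketched numerically. The code below is only an illustration under stated assumptions: the per-locus likelihoods are taken to be proportional to $(1-\pi)$ for locus A and to $\pi^2$ for locus B, as the entries IIIb and Va referred to in the text indicate, and the maximization uses a simple grid search (a hypothetical helper, not a procedure described in the source). Two arbitrary sets of gene frequencies are tried to illustrate that they do not affect the estimate in Case A.

```python
def likelihood(pi, p_A, p_B):
    """Overall likelihood for mating AABb x AaBb and sib pair AABB-AaBB."""
    p_a, p_b = 1 - p_A, 1 - p_B
    L_A = 4 * p_A**3 * p_a * (1 - pi)      # locus A: AA x Aa mating, AA-Aa sib pair
    L_B = p_B**2 * p_b**2 * pi**2          # locus B: Bb x Bb mating, BB-BB sib pair
    return L_A * L_B


def ml_pi(p_A, p_B, steps=3000):
    """Grid search for the value of pi in [0, 1] that maximizes the likelihood."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda pi: likelihood(pi, p_A, p_B))


# Arbitrary illustrative gene frequencies; the estimate should not depend on them.
pi_hat_1 = ml_pi(0.3, 0.6)
pi_hat_2 = ml_pi(0.8, 0.1)
print(pi_hat_1, pi_hat_2)
```

Because the gene frequencies enter only as a positive multiplicative constant, both calls return the same grid point, which is the value $2/3$ obtained analytically by maximizing $\pi^2(1-\pi)$.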
5.2 Case B: One or Both Parental Genotypes Unknown

We now derive the sib pair probabilities for an m-allele gene, conditional only on $\pi$ (the generalization of Table 4.3). The resulting table (Table 5.3) can be used to obtain the likelihood for a particular locus when both parental genotypes are unknown. We then give the corresponding table (Table 5.4) when one parental genotype is known.

To obtain the sib pair probabilities when both parental genotypes are unknown, $L_{ij}$ from Table 5.1 is summed over all matings as in Chapter IV. That is, for a particular sib pair we find from Table 5.1 all matings that could have given rise to this sib pair and sum the corresponding $L_{ij}$ values.

Suppose, for example, that there are four alleles $A_1$, $A_2$, $A_3$ and $A_4$ with corresponding frequencies $p_1$, $p_2$, $p_3$ and $p_4 = 1 - p_1 - p_2 - p_3$ at a single locus. We observe a sib pair that is $A_1A_1$-$A_1A_2$ and do not know the parental genotypes. To find the probability of this pair, we let $i$, $j$, $k$ and $l$ each assume the values 1, 2, 3 and 4 in Table 5.1 and examine all matings to determine which ones could give rise to an $A_1A_1$-$A_1A_2$ sib pair. We find that there are only four such matings, and the corresponding probabilities and $L_{ij}$ values are given in Table 5.2. The first mating in this table is obtained from Table 5.1 by setting $i=1$ and $j=2$ in IIIb; the second by setting $i=1$ and $j=2$ in Vd. The last two matings have the same number but different alleles, i.e., the first of these is VIb with $i=1$, $j=2$ and $k=3$; the second is VIb with $i=1$, $j=2$ and $k=4$.

TABLE 5.2 LIKELIHOODS FOR A SIB PAIR THAT IS $A_1A_1$-$A_1A_2$

| Number | Mating | Prob. of mating | $L_{ij}$ |
|---|---|---|---|
| IIIb | $A_1A_1 \times A_1A_2$ | $4p_1^3p_2$ | $4p_1^3p_2(1-\pi)$ |
| Vd | $A_1A_2 \times A_1A_2$ | $4p_1^2p_2^2$ | $4p_1^2p_2^2\pi(1-\pi)$ |
| VIb | $A_1A_2 \times A_1A_3$ | $8p_1^2p_2p_3$ | $4p_1^2p_2p_3\pi(1-\pi)$ |
| VIb | $A_1A_2 \times A_1A_4$ | $8p_1^2p_2p_4$ | $4p_1^2p_2p_4\pi(1-\pi)$ |

Thus the probability of an $A_1A_1$-$A_1A_2$ sib pair, conditional on $\pi$, is simply the sum of the four $L_{ij}$ values of Table 5.2 and may be written

$$\begin{aligned}
L &= 4(1-\pi)p_1^2p_2\,(p_1 + p_2\pi + p_3\pi + p_4\pi) \\
&= 4(1-\pi)p_1^2p_2\,(p_1 + (1-p_1)\pi) \\
&= 4p_1^2p_2(1-\pi)(\pi + p_1(1-\pi))
\end{aligned}$$

We have derived the probability of a type III sib pair, conditional on $\pi$, when both parental genotypes are unknown. The probabilities of the six other sib pair types can likewise be derived from Table 5.1 and are given in Table 5.3. By summing the appropriate $L_{ij}$ values from Table 5.1, the sib pair probabilities can also be obtained when one parental genotype is known. These probabilities are given in Table 5.4.

TABLE 5.3 PROBABILITY OF SIB PAIRS FOR AN M-ALLELE GENE CONDITIONAL ON $\pi$

| Sib pair type | Sib pair | Probability |
|---|---|---|
| I | $A_iA_i$-$A_iA_i$ | $p_i^2[\pi + (1-\pi)p_i]^2$ |
| II | $A_iA_i$-$A_jA_j$ | $2p_i^2p_j^2(1-\pi)^2$ |
| III | $A_iA_i$-$A_iA_j$ | $4p_i^2p_j(1-\pi)[\pi + p_i(1-\pi)]$ |
| IV | $A_iA_i$-$A_jA_k$ | $4p_i^2p_jp_k(1-\pi)^2$ |
| V | $A_iA_j$-$A_iA_j$ | $2p_ip_j[\pi^2 + \pi(1-\pi)(p_i+p_j) + 2(1-\pi)^2p_ip_j]$ |
| VI | $A_iA_j$-$A_iA_k$ | $4p_ip_jp_k(1-\pi)[\pi + 2p_i(1-\pi)]$ |
| VII | $A_iA_j$-$A_kA_l$ | $8p_ip_jp_kp_l(1-\pi)^2$ |

TABLE 5.4 PROBABILITY OF SIB PAIRS FOR AN M-ALLELE GENE CONDITIONAL ON $\pi$ WHEN ONE PARENTAL GENOTYPE IS KNOWN

| Sib pair type | Sib pair | Known parent: $A_iA_i$ | Known parent: $A_iA_j$ |
|---|---|---|---|
| I | $A_iA_i$-$A_iA_i$ | $p_i^3[(1-\pi)p_i + \pi]$ | $p_i^2p_j\pi[p_i + (1-p_i)\pi]$ |
| II | $A_iA_i$-$A_jA_j$ | $0$ | $2p_i^2p_j^2(1-\pi)^2$ |
| III | $A_iA_i$-$A_iA_j$ | $2p_i^3p_j(1-\pi)$ | $2p_i^2p_j(1-\pi)[p_i + \pi(1+p_j-p_i)]$ |
| III | $A_iA_i$-$A_iA_k$ | $2p_i^3p_k(1-\pi)$ | $2p_i^2p_jp_k\pi(1-\pi)$ |
| IV | $A_iA_i$-$A_jA_k$ | $0$ | $2p_i^2p_jp_k(1-\pi)^2$ |
| V | $A_iA_j$-$A_iA_j$ | $p_i^2p_j[p_j(1-\pi) + \pi]$ | $p_ip_j[2p_ip_j(1-\pi)^2 + (p_i^2+p_j^2)\pi(1-\pi) + (p_i+p_j)\pi^2]$ |
| V | $A_iA_k$-$A_iA_k$ | $p_i^2p_k[p_k(1-\pi) + \pi]$ | $p_ip_jp_k\pi[p_k(1-\pi) + \pi]$ |
| VI | $A_iA_j$-$A_iA_k$ | $2p_i^2p_jp_k(1-\pi)$ | $2p_ip_jp_k(1-\pi)[p_i(1-\pi) + p_j\pi]$ |
| VI | $A_iA_k$-$A_iA_l$ | $2p_i^2p_kp_l(1-\pi)$ | $2p_ip_jp_kp_l\pi(1-\pi)$ |
| VI | $A_iA_k$-$A_jA_k$ | $0$ | $2p_ip_jp_k(1-\pi)[p_k(1-\pi) + \pi]$ |
| VII | $A_iA_k$-$A_jA_l$ | $0$ | $4p_ip_jp_kp_l(1-\pi)^2$ |

5.3 Estimation When Only the Sib Phenotypes are Known

If the genotypes of both sibs are known for a particular locus, then Tables 5.1, 5.3 and 5.4 can be used to obtain the corresponding sib pair probability. Often, however, only the sibs' phenotypes are known, as in the case when dominance is present. That is, in some cases there may be a number of genotypes in each sib's phenoset. In this more general situation the sib pair probability can still be obtained from Tables 5.1, 5.3 and 5.4 by summing the probabilities corresponding to all sib pair combinations that can be made by pairing an element in one phenoset with an element in the other. Thus, if there are $m$ elements in the first sib's phenoset and $n$ elements in that of the second, then the sib pair probability is calculated by summing $mn$ individual probabilities, each of which can be read from Table 5.1, 5.3 or 5.4.

For example, suppose that there are two alleles A and a with gene frequencies $p_A$ and $p_a = 1 - p_A$ respectively. A is dominant to a and we wish to calculate the probability of an A-a sib pair (that is, one sib of phenotype A and one of phenotype a) when both parental genotypes are unknown. There are two genotypes in the first sib's phenoset (AA and Aa) and the second sib's genotype is known (aa). Hence, the sib pair must be either AA-aa or Aa-aa and from Table 5.3 the probability of this sib pair is

$$\begin{aligned}
L &= 2p_A^2p_a^2(1-\pi)^2 + 4p_Ap_a^2(1-\pi)(\pi + p_a(1-\pi)) \\
&= 2p_Ap_a^2(1-\pi)\{(1+p_a) + \pi(1-p_a)\}
\end{aligned}$$

Clearly, a similar procedure can be used when only parental and sib phenotypes are known. For example, suppose for the two-allele gene above that both parents are A and both sibs are also A. Then the mating must be either AA x AA, AA x Aa or Aa x Aa, and the sib pair must likewise be one of the three types AA-AA, AA-Aa or Aa-Aa.
The nine matings and sib pairs that could account for this observed result are given below with the corresponding probabilities obtained from Table 5.1.

| Mating | Sib pair | Probability |
|---|---|---|
| AA x AA | AA-AA | $p_A^4$ |
| AA x AA | AA-Aa | $0$ |
| AA x AA | Aa-Aa | $0$ |
| AA x Aa | AA-AA | $2p_A^3p_a\pi$ |
| AA x Aa | AA-Aa | $4p_A^3p_a(1-\pi)$ |
| AA x Aa | Aa-Aa | $2p_A^3p_a\pi$ |
| Aa x Aa | AA-AA | $p_A^2p_a^2\pi^2$ |
| Aa x Aa | AA-Aa | $4p_A^2p_a^2\pi(1-\pi)$ |
| Aa x Aa | Aa-Aa | $2p_A^2p_a^2(1-2\pi+2\pi^2)$ |

The likelihood of this event is simply the sum of these nine probabilities. We find that

$$L = p_A^4 + 4p_A^3p_a + p_A^2p_a^2(2 + \pi^2)$$

The results of this chapter permit the calculation of the sib pair probability for any marker gene, regardless of the number of alleles or the number of genotypes in each sib's or parent's phenoset. Tables 5.1, 5.3 and 5.4 give the sib pair probability for the special cases in which both sibs' genotypes are known and 0, 1 or 2 parental genotypes are known. In all other cases the sib pair probability can be obtained from these three tables by summing the appropriate probabilities as indicated above.

When the sib pair probabilities have been calculated for a number of marker gene loci for a particular sib pair, the overall sib pair likelihood is simply the product of these probabilities. Standard computer procedures can then be used to find the value of $\pi$ that makes the likelihood a maximum. $\pi$ can then be replaced by its ML estimate $\hat\pi$ in the analyses of Chapter IV, and these procedures can be used to detect the portion of genetic variance closely linked to these markers.

CHAPTER VI - DERIVATION OF THE CLASSIFICATION TABLES

In the two previous chapters a number of techniques were presented for estimating $\sigma_g^2$ based on $\pi$, the proportion of genes two sibs have i.b.d. over the entire genome. In the next four chapters
In the next four chapters we derive techniques for detecting and estimating the contribution to the total genetic variance of a single major trait gene and the distance of this gene from a particular marker gene. The estimation procedures are based on TI. , the proportion of genes two sibs have Jm i.b.d. at a particular locus m. pair j. Thus, TI. Jm = 0, ~ or 1 for each sib In this chapter the Classification Tables (as defined in Chapter I) are derived. I I I I I I i. - I - ! 6.1 The Sixteen Classification Types In the population of all sib pairs a sib pair will have "on the average" half of their genes i. b. d. This follows from the assumption made earlier that the sibs come from a large random mating population. Since in general we will not know the value of TI. ,we now asJ sume that sib pair j is randomly selected from the population of all sib pairs. That is, we assume that the sixteen possible sib pairs that can result from a general AIA are equally likely. Z x A3A mating (see Table 4.1) 4 I ..I I I I I I - -- i 71 Under this assumption, if the sibs' genotypes are known, there are sixteen :'Classification types" or classes, which are listed for convenience in Table 6.1. In this table p.* (i=O,1,2) is the prob1 ability that there are exactly i genes i.b.d. at a particular locus, conditional upon the sibs' genotypes at that locus and also the parental genotypes if known. In a later section a general formula will be given for the calculation of these probabilities. 6.1 Pi and Pj are gene frequencies. In Table I 72 ..I I I I I I I ae I I I I I I I. I I TABLE 6.1 THE 16 CLASSIFICATION TYPES Classification type PI* Po* P* z (i) ~ !z ~ (ii) !z !z 0 (iii) 0 !z !z (iv) !z 0 !z (v) 1 0 a (vi) a 1 a (vii) a a 1 (viii) a Pi l+p. 1 l+p. ~ ~ (ix) l+p.+p. ZPiP j Z Z Pi + Pj J J ~ (x) l+p. Pi l+p.+p. (Pi+P j) (l+P i +pj) (xi) (xii) ~ !z/(l+p.) ~ k2 1 l+p. ZPiP j l+p.+p.+Zp.p. Pi+P j l+Pi+Pj+ZP i Pj Zp.~ 1 l+Zp.~ a ZP i 1 J a ~ ~ 1+2p. Pi (l+p. 
) Z ~ J a Pi l+p.~ Z~ (xvi) l+p.+p. Pi+Pj !zPi/(l+P i ) ~ (xv) (Pi+p j ) (l+Pi+Pj) 1 Pj Pi Pi+P j (xiii) (xiv) 0 J ~ J (l+p. ) 2 ~ 1 l+p.+p.+Zp.p. J ~ (l+p. ) ~ ~ Z J I ..I I I I I I I 1_ 73 6.2 The Classification Table When the Genotypes are Known When sib and parental genotypes are both known, Table 5.1 enables us to derive Classification Table 6.2. cells 11, 22, 33 and 44 correspond to having two genes i. b. d. ; cells 14, 23, 32, and 41 correspond to no genes i.b.d.; the remaining cells correspond to one gene i.b.d. I. I I Moreover, each cell represen~s an equally likely outcome for a particular mating and sib pair. Thus, for an observed sib pair, p~ (i=O, 1, 2) can be obtained from ~ Table 5.1 by simply noting which cells correspond to this particular event. For example, consider an AA x AA mating. This mating is uninformative since only sibs that are also AA can result. This fact is reflected in I of Table 5.1 in which we see that all 16 (equally likely) cells could account for an AA-AA sib pair from an AAxAA mating. I I I I I I Recall that in Table 4.1 Hence * P2 p* 1 * PO 4/16 = ~ = 8/16 .. ~ ~ 4/16 and from Table 6.1 we see that this is a Class Consider a second example: sibs that are each AA. (i) pair. an AA x Aa mating resulting in two From IlIa of Table 5.1 the corresponding cells are 11, 12, 21 and 22 and hence * P2 = 2/4 = ~ PI* 2/4 = ~ * Po = 0 I Ie I I I I I I I --I I I I I I I. .. I - 74 From Table 6.1 we see that this is a Class (iii) pair. Thus, using Table 5.1 we can calculate p.1* for all matings and sib pairs when the genotypes are known. cation Table for this situation. Table 6.2 is the classifi- Note that this table is similar in form to Table 5.1, but certain sib pairs are grouped into a single entry. This is seen from the "sib pair type" column of Table 6.2 in which the numbers in parenthesis for certain entries refer to the number of sib pairs of the given sib pair type. 
The actual sib pairs are not given for these cases, but they can be read directly from Table 5.1. For example, from Table 5.1 we see that a type IV mating can produce two distinct type V sib pairs, namely, $A_iA_j$-$A_iA_j$ and $A_iA_k$-$A_iA_k$. However, these two sib pairs have the same probability ($p_i^2p_jp_k$) and are of the same Classification Type; hence they are presented as a single entry in Table 6.2.

TABLE 6.2 CLASSIFICATION TABLE: BOTH PARENTAL AND SIB GENOTYPES KNOWN

| Mating type | Sib pair | Probability of mating and sib pair | $p_0^*$ | $p_1^*$ | $p_2^*$ | Classification type |
|---|---|---|---|---|---|---|
| I: $A_iA_i \times A_iA_i$ | I: $A_iA_i$-$A_iA_i$ | $p_i^4$ | $\tfrac14$ | $\tfrac12$ | $\tfrac14$ | (i) |
| II: $A_iA_i \times A_jA_j$ | V: $A_iA_j$-$A_iA_j$ | $2p_i^2p_j^2$ | $\tfrac14$ | $\tfrac12$ | $\tfrac14$ | (i) |
| III: $A_iA_i \times A_iA_j$ | I: $A_iA_i$-$A_iA_i$ | $p_i^3p_j$ | $0$ | $\tfrac12$ | $\tfrac12$ | (iii) |
| | III: $A_iA_i$-$A_iA_j$ | $2p_i^3p_j$ | $\tfrac12$ | $\tfrac12$ | $0$ | (ii) |
| | V: $A_iA_j$-$A_iA_j$ | $p_i^3p_j$ | $0$ | $\tfrac12$ | $\tfrac12$ | (iii) |
| IV: $A_iA_i \times A_jA_k$ | V: (2) | $p_i^2p_jp_k$ | $0$ | $\tfrac12$ | $\tfrac12$ | (iii) |
| | VI: $A_iA_j$-$A_iA_k$ | $2p_i^2p_jp_k$ | $\tfrac12$ | $\tfrac12$ | $0$ | (ii) |
| V: $A_iA_j \times A_iA_j$ | I: (2) | $\tfrac14p_i^2p_j^2$ | $0$ | $0$ | $1$ | (vii) |
| | II: $A_iA_i$-$A_jA_j$ | $\tfrac12p_i^2p_j^2$ | $1$ | $0$ | $0$ | (v) |
| | III: (2) | $p_i^2p_j^2$ | $0$ | $1$ | $0$ | (vi) |
| | V: $A_iA_j$-$A_iA_j$ | $p_i^2p_j^2$ | $\tfrac12$ | $0$ | $\tfrac12$ | (iv) |
| VI: $A_iA_j \times A_iA_k$ | I: $A_iA_i$-$A_iA_i$ | $\tfrac12p_i^2p_jp_k$ | $0$ | $0$ | $1$ | (vii) |
| | III: (2) | $p_i^2p_jp_k$ | $0$ | $1$ | $0$ | (vi) |
| | IV: $A_iA_i$-$A_jA_k$ | $p_i^2p_jp_k$ | $1$ | $0$ | $0$ | (v) |
| | V: (3) | $\tfrac12p_i^2p_jp_k$ | $0$ | $0$ | $1$ | (vii) |
| | VI: $A_iA_j$-$A_iA_k$ | $p_i^2p_jp_k$ | $1$ | $0$ | $0$ | (v) |
| | VI: $A_iA_j$-$A_jA_k$ | $p_i^2p_jp_k$ | $0$ | $1$ | $0$ | (vi) |
| | VI: $A_iA_k$-$A_jA_k$ | $p_i^2p_jp_k$ | $0$ | $1$ | $0$ | (vi) |
| VII: $A_iA_j \times A_kA_l$ | V: (4) | $\tfrac12p_ip_jp_kp_l$ | $0$ | $0$ | $1$ | (vii) |
| | VI: (4) | $p_ip_jp_kp_l$ | $0$ | $1$ | $0$ | (vi) |
| | VII: (2) | $p_ip_jp_kp_l$ | $1$ | $0$ | $0$ | (v) |

6.3 The Classification Tables When Some Genotypes are Unknown

An algorithm is now given that permits the calculation of $p_2^*$, $p_1^*$ and $p_0^*$ when some parental or sib genotypes are unknown.
The two situations most likely to arise that require this algorithm are cases involving genes in which (1) there is dominance, so that only parental and sib phenotypes are known; or (2) data are collected only for sibs, so that even the parental phenotypes are unknown. The algorithm given below handles both of these situations.

Let $P_{1p}$ and $P_{2p}$ denote the phenosets for two parents for a particular locus, i.e., $P_{1p}$ is the set of all genotypes that could give rise to the phenotype of one parent, and $P_{2p}$ is the analogous set for the other parent. If there is no information as to the parental phenotypes, then the phenosets will consist of all possible genotypes. Let $P_p$ denote the set consisting of all possible ordered pairs of genotypes resulting when an element of $P_{1p}$ is paired with an element of $P_{2p}$. Thus, if there are $n_1$ genotypes in $P_{1p}$ and $n_2$ genotypes in $P_{2p}$, then there are $n_1 n_2$ elements of $P_p$. Similarly, $P_{1s}$ and $P_{2s}$ denote the phenosets for the two sibs, and $P_s$ the set of all possible ordered pairs of genotypes from $P_{1s}$ and $P_{2s}$. Let $X$ and $Y$ be elements of $P_p$ and $P_s$ respectively. Then $p_0^*$, $p_1^*$ and $p_2^*$ for a particular locus $m$ can be calculated as follows:

$$p_k^* = \frac{\sum_{X \in P_p} \sum_{Y \in P_s} \Pr(X \text{ and } Y \text{ and } \pi_{jm} = \tfrac12 k)}{\sum_{X \in P_p} \sum_{Y \in P_s} \sum_{h=0,1,2} \Pr(X \text{ and } Y \text{ and } \pi_{jm} = \tfrac12 h)}, \qquad k = 0, 1, 2. \tag{6.1}$$

Each term in the summations in (6.1) above can be obtained from Table 6.2, since it is the product of one of the probabilities in the third column and one of the corresponding $p_k^*$. It is necessary only to specify the genotypes that belong to $P_{1p}$, $P_{2p}$, $P_{1s}$ and $P_{2s}$, or equivalently, the pairs of genotypes that belong to $P_p$ and $P_s$. The elements of $P_p$ are the possible matings that could result in the observed sib pair; the elements of $P_s$ are the sib pair genotypes that the observed sib pair could assume. Thus, if the elements in these two sets are specified, then Table 6.2 can be used to find all probabilities in the summations of (6.1) and $p_k^*$ can be found by summing the appropriate probabilities. The calculation can easily be programmed in general for a computer.

There are two special cases that are of interest and permit an easy algebraic solution. First, the sib pair genotypes may both be known, but no information available on the genotypes of the parents. Table 6.3 is the Classification Table for this special case. Table 6.4 gives the Classification Table for the special case in which both sibs' genotypes and one parental genotype are known, but no information is available as to the genotype of the other parent. Both tables were derived by repeated use of (6.1) using the information in Table 6.2.

TABLE 6.3
CLASSIFICATION TABLE: BOTH PARENTAL PHENOTYPES UNKNOWN

| Sib pair type | Probability | $p_0^*$ | $p_1^*$ | $p_2^*$ | Classification type |
|---|---|---|---|---|---|
| I: $A_iA_i$-$A_iA_i$ | $\tfrac14 p_i^2(1+p_i)^2$ | $\dfrac{p_i^2}{(1+p_i)^2}$ | $\dfrac{2p_i}{(1+p_i)^2}$ | $\dfrac{1}{(1+p_i)^2}$ | (xvi) |
| II: $A_iA_i$-$A_jA_j$ | $\tfrac12 p_i^2p_j^2$ | 1 | 0 | 0 | (v) |
| III: $A_iA_i$-$A_iA_j$ | $p_i^2p_j(1+p_i)$ | $\dfrac{p_i}{1+p_i}$ | $\dfrac{1}{1+p_i}$ | 0 | (xiii) |
| IV: $A_iA_i$-$A_jA_k$ | $p_i^2p_jp_k$ | 1 | 0 | 0 | (v) |
| V: $A_iA_j$-$A_iA_j$ | $\tfrac12 p_ip_j(1+p_i+p_j+2p_ip_j)$ | $\dfrac{2p_ip_j}{1+p_i+p_j+2p_ip_j}$ | $\dfrac{p_i+p_j}{1+p_i+p_j+2p_ip_j}$ | $\dfrac{1}{1+p_i+p_j+2p_ip_j}$ | (xiv) |
| VI: $A_iA_j$-$A_iA_k$ | $p_ip_jp_k(1+2p_i)$ | $\dfrac{2p_i}{1+2p_i}$ | $\dfrac{1}{1+2p_i}$ | 0 | (xv) |
| VII: $A_iA_j$-$A_kA_l$ | $2p_ip_jp_kp_l$ | 1 | 0 | 0 | (v) |

TABLE 6.4
CLASSIFICATION TABLE: ONE PARENTAL GENOTYPE AND BOTH SIB GENOTYPES KNOWN

| Known parent | Sib pair type | Probability | $p_0^*$ | $p_1^*$ | $p_2^*$ | Classification type |
|---|---|---|---|---|---|---|
| $A_iA_i$ | I: $A_iA_i$-$A_iA_i$ | $\tfrac12 p_i^3(p_i+1)$ | $\dfrac{p_i}{2(1+p_i)}$ | $\tfrac12$ | $\dfrac{1}{2(1+p_i)}$ | (xii) |
| | III: $A_iA_i$-$A_iA_j$ | $p_i^3p_j$ | $\tfrac12$ | $\tfrac12$ | 0 | (ii) |
| | V: $A_iA_j$-$A_iA_j$ | $\tfrac12 p_i^2p_j(p_j+1)$ | $\dfrac{p_j}{2(1+p_j)}$ | $\tfrac12$ | $\dfrac{1}{2(1+p_j)}$ | (xii) |
| | VI: $A_iA_j$-$A_iA_k$ | $p_i^2p_jp_k$ | $\tfrac12$ | $\tfrac12$ | 0 | (ii) |
| $A_iA_j$ | I: $A_iA_i$-$A_iA_i$ | $\tfrac14 p_i^2p_j(p_i+1)$ | 0 | $\dfrac{p_i}{1+p_i}$ | $\dfrac{1}{1+p_i}$ | (viii) |
| | II: $A_iA_i$-$A_jA_j$ | $\tfrac12 p_i^2p_j^2$ | 1 | 0 | 0 | (v) |
| | III: $A_iA_i$-$A_iA_j$ | $\tfrac12 p_i^2p_j(1+p_i+p_j)$ | $\dfrac{p_i}{1+p_i+p_j}$ | $\dfrac{1+p_j}{1+p_i+p_j}$ | 0 | (ix) |
| | III: $A_iA_i$-$A_iA_k$ | $\tfrac12 p_i^2p_jp_k$ | 0 | 1 | 0 | (vi) |
| | IV: $A_iA_i$-$A_jA_k$ | $\tfrac12 p_i^2p_jp_k$ | 1 | 0 | 0 | (v) |
| | V: $A_iA_j$-$A_iA_j$ | $\tfrac14 p_ip_j(p_i+p_j)(1+p_i+p_j)$ | $\dfrac{2p_ip_j}{(p_i+p_j)(1+p_i+p_j)}$ | $\dfrac{p_i^2+p_j^2}{(p_i+p_j)(1+p_i+p_j)}$ | $\dfrac{1}{1+p_i+p_j}$ | (x) |
| | V: $A_iA_k$-$A_iA_k$ | $\tfrac14 p_ip_jp_k(p_k+1)$ | 0 | $\dfrac{p_k}{1+p_k}$ | $\dfrac{1}{1+p_k}$ | (viii) |
| | VI: $A_iA_j$-$A_iA_k$ | $\tfrac12 p_ip_jp_k(p_i+p_j)$ | $\dfrac{p_i}{p_i+p_j}$ | $\dfrac{p_j}{p_i+p_j}$ | 0 | (xi) |
| | VI: $A_iA_k$-$A_iA_l$ | $\tfrac12 p_ip_jp_kp_l$ | 0 | 1 | 0 | (vi) |
| | VI: $A_iA_k$-$A_jA_k$ | $\tfrac12 p_ip_jp_k(p_k+1)$ | $\dfrac{p_k}{1+p_k}$ | $\dfrac{1}{1+p_k}$ | 0 | (xiii) |
| | VII: $A_iA_k$-$A_jA_l$ | $\tfrac12 p_ip_jp_kp_l$ | 1 | 0 | 0 | (v) |

Table 6.5 gives the probability of each sib pair type conditional upon the number of genes i.b.d. at that locus and can easily be derived from Table 6.3. For example, suppose a pair of sibs have both genes i.b.d. at a locus and we wish to find the conditional probability that this sib pair is type I ($A_iA_i$-$A_iA_i$). We know that $\Pr(\pi_{jm} = 1) = \tfrac14$, and from Table 6.3 we have

$$\Pr(\text{type I sib pair and } \pi_{jm} = 1) = \frac{\tfrac14 p_i^2 (1+p_i)^2}{(1+p_i)^2} = \tfrac14 p_i^2.$$

Hence,

$$\Pr(\text{type I sib pair} \mid \pi_{jm} = 1) = p_i^2.$$

Similarly, the remaining elements of Table 6.5 can be derived.

TABLE 6.5
CONDITIONAL PROBABILITY OF SIB PAIR TYPES GIVEN 0, 1 OR 2 GENES I.B.D.

| Sib pair type | 0 genes i.b.d. | 1 gene i.b.d. | 2 genes i.b.d. |
|---|---|---|---|
| I | $p_i^4$ | $p_i^3$ | $p_i^2$ |
| II | $2p_i^2p_j^2$ | 0 | 0 |
| III | $4p_i^3p_j$ | $2p_i^2p_j$ | 0 |
| IV | $4p_i^2p_jp_k$ | 0 | 0 |
| V | $4p_i^2p_j^2$ | $p_ip_j(p_i+p_j)$ | $2p_ip_j$ |
| VI | $8p_i^2p_jp_k$ | $2p_ip_jp_k$ | 0 |
| VII | $8p_ip_jp_kp_l$ | 0 | 0 |

Alternatively, Table 6.5 can be derived by arguing as follows: when $\pi_{jm} = 0$ the sibs are "unrelated" at that locus, and so the distribution of sib pairs is simply the same as the distribution of matings in a random mating population given in Table 1.1. When $\pi_{jm} = 1$ a particular sib pair can occur only if both sibs have the same genotype; and in that case the probability of the sib pair is simply the probability in the population of one of them. Finally, Table 6.3 gives the probability of each sib pair type and hence the sib pair probability conditional on $\pi_{jm} = \tfrac12$ can be found by subtraction, i.e., denoting sib pair type by "SPT,"

$$\Pr(\text{SPT} \mid \pi_{jm} = \tfrac12) = 2\left[\Pr(\text{SPT}) - \tfrac14 \Pr(\text{SPT} \mid \pi_{jm} = 0) - \tfrac14 \Pr(\text{SPT} \mid \pi_{jm} = 1)\right].$$

Consider, for example, an AA-AA sib pair. We have

$$\Pr(\text{Sibs AA-AA} \mid \pi_{jm} = \tfrac12) = 2\Pr(\text{Sibs AA-AA}) - \tfrac12 \Pr(\text{Sibs AA-AA} \mid \pi_{jm} = 0) - \tfrac12 \Pr(\text{Sibs AA-AA} \mid \pi_{jm} = 1)$$
$$= 2[p^2(p+1)^2/4] - \tfrac12 (p^4) - \tfrac12 (p^2) = p^3 \qquad \text{(using Tables 1.1 and 6.3)}.$$

Similarly, the remaining elements in Table 6.5 can be derived.

CHAPTER VII - ESTIMATING THE PROPORTION OF GENES IDENTICAL BY DESCENT AT A SINGLE LOCUS IN SIB PAIRS

In this chapter we present a method for estimating $\pi_{jm}$, the proportion of genes sib pair $j$ has i.b.d. at a single locus $m$. The problem is one of estimating a parameter that takes on a different (but known) value in each of three populations when it is not known for certain from which population the observation comes. We let $p^*_{ijm}$ be the probability that the $j$th sib pair should have $i$ genes i.b.d. at locus $m$, conditional on $I_m$, the information available on the sib pair and parental phenotypes at this locus. The estimator of $\pi_{jm}$ we shall use is

$$\hat\pi_{jm} = \tfrac12 p^*_{1jm} + p^*_{2jm}. \tag{7.1}$$

7.1 Properties of the Estimator

There are several desirable properties that the estimator (7.1) possesses. Among them are

Property I - $\hat\pi_{jm}$ is the Bayes estimator of $\pi_{jm}$ when the squared error loss function is used, i.e., $\hat\pi_{jm}$ minimizes $E(\hat\pi_{jm} - \pi_{jm})^2$.

Property II - $\hat\pi_{jm}$ has the maximum possible correlation with $\pi_{jm}$ when $\pi_{jm}$ is considered as a random variable taking on the values 0, $\tfrac12$ and 1.

Although $\hat\pi_{jm}$ as defined by (7.1) is unbiased for the population in which $\pi_{jm} = \tfrac12$, it is not unbiased for the two populations in which $\pi_{jm} = 0$ and $\pi_{jm} = 1$. However, an unbiased estimator would be unreasonable for these two populations as it would require estimates of $\pi_{jm}$ outside the parameter range 0 to 1. For example, in order for an estimator to be unbiased for the population in which $\pi_{jm} = 0$, it must assume negative values for certain sib pair types. Clearly, such an estimator is unreasonable.

We now prove Property I. Let $f(\pi_{jm} \mid I_m)$ denote the conditional density of $\pi_{jm}$ given $I_m$. Then

$$f(\pi_{jm} \mid I_m) = \begin{cases} p^*_{0jm} & \text{if } \pi_{jm} = 0 \\ p^*_{1jm} & \text{if } \pi_{jm} = \tfrac12 \\ p^*_{2jm} & \text{if } \pi_{jm} = 1 \end{cases} \tag{7.2}$$

If we use the squared error loss function, then the Bayes estimator will be the value of $\hat\pi_{jm}$ that minimizes

$$S = \sum_{\pi_{jm}} (\hat\pi_{jm} - \pi_{jm})^2 f(\pi_{jm} \mid I_m) = (\hat\pi_{jm} - 0)^2 p^*_{0jm} + (\hat\pi_{jm} - \tfrac12)^2 p^*_{1jm} + (\hat\pi_{jm} - 1)^2 p^*_{2jm}.$$

In order to minimize $S$, we set the first derivative equal to zero, obtaining

$$\frac{dS}{d\hat\pi_{jm}} = 2\hat\pi_{jm} p^*_{0jm} + 2(\hat\pi_{jm} - \tfrac12) p^*_{1jm} + 2(\hat\pi_{jm} - 1) p^*_{2jm} = 0,$$

which, since $p^*_{0jm} + p^*_{1jm} + p^*_{2jm} = 1$, implies that

$$\hat\pi_{jm} = \tfrac12 p^*_{1jm} + p^*_{2jm}.$$

Note that the second derivative of $S$ with respect to $\hat\pi_{jm}$ is 2, which, being greater than zero, implies that $\hat\pi_{jm}$ as defined by (7.1) is indeed a solution that minimizes $S$. Thus, Property I is proved.

We next prove Property II. Since in the population of sib pairs $\pi_{jm}$ takes on the values 0, $\tfrac12$ and 1 with probabilities $\tfrac14$, $\tfrac12$ and $\tfrac14$ respectively, we see that

$$E(\pi_{jm}) = \tfrac12 \tag{7.3}$$

$$\mathrm{Var}(\pi_{jm}) = (\tfrac14)(\tfrac14) + (\tfrac14)(\tfrac14) = 1/8 \tag{7.4}$$

Note also that

$$E(\hat\pi_{jm}) = \sum_{I_m} \Pr(I_m)\left[\tfrac12 \Pr(\pi_{jm} = \tfrac12 \mid I_m) + \Pr(\pi_{jm} = 1 \mid I_m)\right] = \tfrac12 \tag{7.5}$$

For each distinct $I_m$ there will be a corresponding $\hat\pi_{jm}$. We denote by $I_{km}$ the $k$th distinct $I_m$ and denote the corresponding $\hat\pi_{jm}$ by $\hat\pi_{jmk}$. We define

$$F_{ka} = \Pr(I_{km} \text{ and } \pi_{jm} = \tfrac12 a), \qquad a = 0, 1, 2 \tag{7.6}$$

and

$$F_k = F_{k0} + F_{k1} + F_{k2} = \Pr(I_{km}). \tag{7.7}$$

Note that

$$\hat\pi_{jmk} = \frac{\Pr(\pi_{jm} = 1 \text{ and } I_{km}) + \tfrac12 \Pr(\pi_{jm} = \tfrac12 \text{ and } I_{km})}{\Pr(I_{km})} = \frac{\tfrac12 F_{k1} + F_{k2}}{F_k}. \tag{7.8}$$

Since $\mathrm{Var}(\pi_{jm}) = 1/8$, a constant, in order to maximize the correlation between $\pi_{jm}$ and $\hat\pi_{jm}$, we must select $\hat\pi_{jm}$ to maximize

$$T = \frac{\mathrm{Cov}(\hat\pi_{jm}, \pi_{jm})}{(\mathrm{Var}(\hat\pi_{jm}))^{1/2}} = \frac{\sum_k \hat\pi_{jmk}(\tfrac12 F_{k1} + F_{k2}) - \tfrac12 \sum_k \hat\pi_{jmk} F_k}{\left[\sum_k \hat\pi_{jmk}^2 F_k - \left(\sum_k \hat\pi_{jmk} F_k\right)^2\right]^{1/2}}. \tag{7.9}$$

Denote the numerator and denominator of (7.9) by $\theta$ and $\omega^{1/2}$ respectively. Then, taking the derivative of $T$ with respect to a particular $\hat\pi_{jmr}$ we have

$$\frac{dT}{d\hat\pi_{jmr}} = \left[\omega^{1/2}(\tfrac12 F_{r1} + F_{r2} - \tfrac12 F_r) - \tfrac12 (\theta/\omega^{1/2})\left(2\hat\pi_{jmr} F_r - 2F_r \sum_k \hat\pi_{jmk} F_k\right)\right] \Big/ \omega.$$

Setting the first derivative equal to zero we find that (for $\omega \neq 0$)

$$\omega(\tfrac12 F_{r1} + F_{r2} - \tfrac12 F_r) = \theta\left(\hat\pi_{jmr} F_r - F_r \sum_k \hat\pi_{jmk} F_k\right). \tag{7.10}$$

It has been found numerically in all cases so far that any values of $\hat\pi_{jmr}$ satisfying (7.10) above will give a maximum rather than a minimum or saddle point. We now show that $\hat\pi_{jmr}$ as defined by (7.8) satisfies (7.10).

If $\hat\pi_{jmk}$ is defined by (7.8), then

$$\sum_k \hat\pi_{jmk} F_k = \sum_k (\tfrac12 F_{k1} + F_{k2}) = \tfrac12,$$

using (7.5). We also have

$$\theta = \sum_k \hat\pi_{jmk}^2 F_k - \tfrac14 = \omega.$$

Furthermore,

$$\hat\pi_{jmr} F_r - F_r \sum_k \hat\pi_{jmk} F_k = \tfrac12 F_{r1} + F_{r2} - \tfrac12 F_r.$$

Thus, we see that the right hand side of (7.10) may be written $\omega(\tfrac12 F_{r1} + F_{r2} - \tfrac12 F_r)$, and hence (7.10) is satisfied if $\hat\pi_{jmk}$ as defined by (7.8) is used. This proves Property II.

7.2 Estimation for the 16 Classification Types

Table 7.1 gives $\hat\pi_{jm}$ for the 16 Classification types of Table 6.1.

TABLE 7.1
$\hat\pi_{jm}$ FOR THE 16 CLASSIFICATION TYPES

| Classification type | $\hat\pi_{jm}$ |
|---|---|
| I | $1/2$ |
| II | $1/4$ |
| III | $3/4$ |
| IV | $1/2$ |
| V | $0$ |
| VI | $1/2$ |
| VII | $1$ |
| VIII | $\dfrac{2+p_i}{2(1+p_i)}$ |
| IX | $\dfrac{1+p_j}{2(1+p_i+p_j)}$ |
| X | $\dfrac{p_i(2+p_i)+p_j(2+p_j)}{2(p_i+p_j)(1+p_i+p_j)}$ |
| XI | $\dfrac{p_j}{2(p_i+p_j)}$ |
| XII | $\dfrac{3+p_i}{4(1+p_i)}$ |
| XIII | $\dfrac{1}{2(1+p_i)}$ |
| XIV | $\dfrac{2+p_i+p_j}{2(1+p_i+p_j+2p_ip_j)}$ |
| XV | $\dfrac{1}{2(1+2p_i)}$ |
| XVI | $\dfrac{1}{1+p_i}$ |

For all 16 Classification types $\hat\pi_{jm}$ is calculated by (7.1). Moreover, since it involves only $p^*_{1jm}$ and $p^*_{2jm}$, $\hat\pi_{jm}$ can also easily be calculated when only the sib phenotypes are known by the simple algorithm described in the previous chapter.

CHAPTER VIII - DETECTING LINKAGE BETWEEN A TRAIT AND MARKER LOCUS

In this chapter we derive a regression procedure for detecting linkage between an m-allele marker locus and a two-allele trait locus.
We define the genotypic values at the trait locus as follows:

$$g_{ij} = \begin{cases} a & \text{if sib is BB} \\ d & \text{if sib is Bb} \\ -a & \text{if sib is bb} \end{cases} \tag{8.1}$$

where $g_{ij}$ is the genetic effect in the general model (1.5). Using [...] Thus, $\sigma_\epsilon^2$ is a function of the environmental variance, the environmental covariance, and any order effect.

8.1 Conditional Expectation of the Squared Pair Differences

Let $Y_j = (x_{1j} - x_{2j})^2$ be the squared pair difference for sib pair $j$. Then, for fixed $\epsilon_j$, $Y_j$ can take on seven values depending upon the genotypes of the first and second sibs. These values, obtained from (1.5) and (8.1), are shown in the second column of Table 8.1. This table gives the distribution of $Y_j$ conditional on $\pi_{jt}$, the proportion of genes i.b.d. at the trait locus. The conditional probabilities in the last three columns of this table are read directly from Table 6.5.

TABLE 8.1
CONDITIONAL DISTRIBUTION OF $Y_j$

| Sib pair | $Y_j$ | Probability given $\pi_{jt} = 0$ | Probability given $\pi_{jt} = \tfrac12$ | Probability given $\pi_{jt} = 1$ |
|---|---|---|---|---|
| BB-BB | $\epsilon_j^2$ | $p^4$ | $p^3$ | $p^2$ |
| bb-bb | $\epsilon_j^2$ | $q^4$ | $q^3$ | $q^2$ |
| Bb-Bb | $\epsilon_j^2$ | $4p^2q^2$ | $pq$ | $2pq$ |
| BB-Bb | $(a-d+\epsilon_j)^2$ | $2p^3q$ | $p^2q$ | 0 |
| Bb-BB | $(-a+d+\epsilon_j)^2$ | $2p^3q$ | $p^2q$ | 0 |
| Bb-bb | $(a+d+\epsilon_j)^2$ | $2pq^3$ | $pq^2$ | 0 |
| bb-Bb | $(-a-d+\epsilon_j)^2$ | $2pq^3$ | $pq^2$ | 0 |
| BB-bb | $(2a+\epsilon_j)^2$ | $p^2q^2$ | 0 | 0 |
| bb-BB | $(-2a+\epsilon_j)^2$ | $p^2q^2$ | 0 | 0 |

We can use Table 8.1 to find the expected value of $Y_j$ conditional on $\pi_{jt}$. We have

$$E(Y_j \mid \pi_{jt} = 1) = \sigma_\epsilon^2 \tag{8.2}$$

$$\begin{aligned}
E(Y_j \mid \pi_{jt} = \tfrac12) &= E(\epsilon_j^2 (p^3 + q^3 + pq)) + E[(a-d+\epsilon_j)^2 + (-a+d+\epsilon_j)^2]\, p^2q + E[(a+d+\epsilon_j)^2 + (-a-d+\epsilon_j)^2]\, pq^2 \\
&= \sigma_\epsilon^2 + (a^2+d^2)(2p^2q + 2pq^2) + 4ad(pq^2 - p^2q) \\
&= \sigma_\epsilon^2 + 2pq(a^2 + d^2 - 2ad(p-q)) \\
&= \sigma_\epsilon^2 + 2pq(a - (p-q)d)^2 + 2pqd^2(1 - (p-q)^2) \\
&= \sigma_\epsilon^2 + \sigma_a^2 + 2pqd^2(4pq) \\
&= \sigma_\epsilon^2 + \sigma_a^2 + 2\sigma_d^2
\end{aligned} \tag{8.3}$$

and similarly it can be shown that

$$E(Y_j \mid \pi_{jt} = 0) = \sigma_\epsilon^2 + 2\sigma_a^2 + 2\sigma_d^2. \tag{8.4}$$

It is clear from (8.2)-(8.4) that if there is no dominance ($d = 0$, or equivalently, $\sigma_d^2 = 0$) we can write

$$E(Y_j \mid \pi_{jt}) = (\sigma_\epsilon^2 + 2\sigma_g^2) - 2\sigma_g^2 \pi_{jt}. \tag{8.5}$$

This implies that if $\pi_{jt}$ were known and we fitted the simple linear regression model

$$E(Y_j \mid \pi_{jt}) = \alpha + \beta \pi_{jt}, \tag{8.6}$$

then $-\tfrac12 \hat\beta$ would be an unbiased estimator of $\sigma_g^2$, where $\hat\beta$ is the usual least squares estimator of $\beta$. This same result will hold asymptotically even when dominance is present. It is shown in Appendix II that in this more general case

$$E(-\tfrac12 \hat\beta) = \sigma_g^2 - \frac{n_1 (n_0 - n_2)}{n_0 n_1 + 4 n_0 n_2 + n_1 n_2}\, \sigma_d^2, \tag{8.7}$$

where $n_i$ ($i = 0, 1, 2$) is the number of sib pairs in the sample that have $i$ genes i.b.d. at the trait gene locus. As the sample size increases $n_2$ and $n_0$ tend to equality, and so the term in $\sigma_d^2$ vanishes asymptotically.

8.2 Deriving the Expected Value of the Regression Coefficient

In the previous section we showed that if the proportion of genes i.b.d. at the trait locus, $\pi_{jt}$, is known for each sib pair, then the simple linear regression model given by (8.6) will result in $-\tfrac12 \hat\beta$ being an unbiased estimate of $\sigma_g^2$ when $\sigma_d^2 = 0$. This estimate is also asymptotically unbiased even when dominance is present. In Chapter VII we derived an estimate, $\hat\pi_{jm}$, of the proportion of genes i.b.d. at a marker locus. In this section we investigate how the regression analysis of Section 8.1 is affected if we substitute $\hat\pi_{jm}$ for $\pi_{jt}$ in the regression equation. We shall show that if there is no dominance, then

$$E(Y_j \mid \hat\pi_{jm}) = \alpha + \beta \hat\pi_{jm}, \tag{8.8}$$

where

$$\beta = -2(1-2c)^2 \sigma_g^2 \tag{8.9}$$

and $c$ is the recombination fraction between the trait and marker loci. We shall also show that (8.8) holds approximately even when dominance is present.

We assume linkage equilibrium between the trait and marker loci, so that (i) for fixed $\pi_{jt}$, $Y_j$ and $\hat\pi_{jm}$ are independent; and (ii) for fixed $\pi_{jm}$, $\pi_{jt}$ and $\hat\pi_{jm}$ are independent. It follows that

$$E(Y_j \mid \hat\pi_{jm}) = \sum_{\pi_{jt}} \sum_{\pi_{jm}} E(Y_j \mid \pi_{jt}) \Pr(\pi_{jt} \mid \pi_{jm}) \Pr(\pi_{jm} \mid \hat\pi_{jm}), \tag{8.10}$$

where the summations are over the three values that $\pi_{jt}$ and $\pi_{jm}$ can assume. $E(Y_j \mid \pi_{jt})$ is given by (8.2)-(8.4) and $\Pr(\pi_{jm} = \tfrac12 i \mid \hat\pi_{jm})$ was defined in Chapter VII to be $p^*_{ijm}$ ($i = 0, 1, 2$). We now derive the joint distribution of $\pi_{jt}$ and $\pi_{jm}$.

Consider a general mating that at two loci A and B is $A_1B_1/A_2B_2 \times A_3B_3/A_4B_4$. Let $c$ be the recombination fraction (assumed the same for both sexes) between these two loci. Then the gametic frequencies are:

| Parent I gamete | Frequency | Parent II gamete | Frequency |
|---|---|---|---|
| $A_1B_1$ | $\tfrac12(1-c)$ | $A_3B_3$ | $\tfrac12(1-c)$ |
| $A_2B_2$ | $\tfrac12(1-c)$ | $A_4B_4$ | $\tfrac12(1-c)$ |
| $A_1B_2$ | $\tfrac12 c$ | $A_3B_4$ | $\tfrac12 c$ |
| $A_2B_1$ | $\tfrac12 c$ | $A_4B_3$ | $\tfrac12 c$ |

Suppose that two sibs result from the above mating and we wish to find $\Pr(\pi_{jm} = \pi_{jt} = 1)$ in these sibs, where $\pi_{jm}$ and $\pi_{jt}$ are the proportion of genes these sibs have i.b.d. at the A and B loci respectively. This probability can be found by summing the squares of all 16 zygote frequencies formed when a gamete frequency from Parent I is multiplied by a gamete frequency from Parent II. For example, if both sibs are $A_1B_1/A_3B_3$ the probability is $[\tfrac12(1-c)]^2[\tfrac12(1-c)]^2 = (1-c)^4/16$. The 15 remaining probabilities are calculated similarly, and so

$$\Pr(\pi_{jm} = \pi_{jt} = 1) = \tfrac14 \Psi^2, \qquad \text{where } \Psi = c^2 + (1-c)^2. \tag{8.11}$$

By symmetry we have $\Pr(\pi_{jm} = \pi_{jt} = 0) = \Pr(\pi_{jm} = \pi_{jt} = 1) = \tfrac14 \Psi^2$, which can also be established by summing the appropriate cross product frequencies.

We now find $\Pr(\pi_{jm} = 1 \text{ and } \pi_{jt} = 0)$. Note that $\pi_{jm} = 1$ and $\pi_{jt} = 0$ if, for example, the first sib is $A_1B_1/A_3B_3$, and the second sib is $A_1B_2/A_3B_4$. The probability of this sib pair is $[\tfrac14(1-c)^2][\tfrac14 c^2] = c^2(1-c)^2/16$. There are fifteen other sib pairs that could result in $\pi_{jm} = 1$ and $\pi_{jt} = 0$, and all fifteen are found to have the same probability $c^2(1-c)^2/16$. Hence,

$$\Pr(\pi_{jm} = 1 \text{ and } \pi_{jt} = 0) = c^2(1-c)^2 = \tfrac14 (1-\Psi)^2.$$

By symmetry we have

$$\Pr(\pi_{jm} = 0 \text{ and } \pi_{jt} = 1) = \Pr(\pi_{jm} = 1 \text{ and } \pi_{jt} = 0) = \tfrac14 (1-\Psi)^2.$$

The marginal distribution of $\pi_{jm}$ (and $\pi_{jt}$) is given by

$$f(\pi_{jm}) = \begin{cases} \tfrac14 & \text{if } \pi_{jm} = 0 \\ \tfrac12 & \text{if } \pi_{jm} = \tfrac12 \\ \tfrac14 & \text{if } \pi_{jm} = 1 \end{cases} \tag{8.12}$$

Hence, the remaining probabilities in the joint distribution of $\pi_{jm}$ and $\pi_{jt}$ can be obtained by subtraction. For example,

$$\Pr(\pi_{jm} = 1 \text{ and } \pi_{jt} = \tfrac12) = \Pr(\pi_{jm} = 1) - \Pr(\pi_{jm} = 1 \text{ and } \pi_{jt} = 0) - \Pr(\pi_{jm} = 1 \text{ and } \pi_{jt} = 1) = \tfrac14 - \tfrac14 \Psi^2 - \tfrac14 (1-\Psi)^2 = \tfrac12 \Psi(1-\Psi).$$

Similarly, the other probabilities in Table 8.2 can be obtained.

TABLE 8.2
JOINT DISTRIBUTION OF $\pi_{jm}$ AND $\pi_{jt}$

| $\pi_{jt}$ | $\pi_{jm} = 0$ | $\pi_{jm} = \tfrac12$ | $\pi_{jm} = 1$ | Total |
|---|---|---|---|---|
| 0 | $\tfrac14 \Psi^2$ | $\tfrac12 \Psi(1-\Psi)$ | $\tfrac14 (1-\Psi)^2$ | $\tfrac14$ |
| $\tfrac12$ | $\tfrac12 \Psi(1-\Psi)$ | $\tfrac12 (1 - 2\Psi + 2\Psi^2)$ | $\tfrac12 \Psi(1-\Psi)$ | $\tfrac12$ |
| 1 | $\tfrac14 (1-\Psi)^2$ | $\tfrac12 \Psi(1-\Psi)$ | $\tfrac14 \Psi^2$ | $\tfrac14$ |
| Total | $\tfrac14$ | $\tfrac12$ | $\tfrac14$ | 1 |

We now can find $E(Y_j \mid \hat\pi_{jm})$. Using (8.10), (8.2)-(8.4) and Tables 8.1 and 8.2, and writing $\sigma_g^2 = \sigma_a^2 + \sigma_d^2$, we have

$$\begin{aligned}
E(Y_j \mid \hat\pi_{jm}) &= \sigma_\epsilon^2 + (2\sigma_a^2 + 2\sigma_d^2)\left[\Psi^2 p^*_{0jm} + \Psi(1-\Psi) p^*_{1jm} + (1-\Psi)^2 p^*_{2jm}\right] \\
&\quad + (\sigma_a^2 + 2\sigma_d^2)\left[2\Psi(1-\Psi) p^*_{0jm} + (1 - 2\Psi + 2\Psi^2) p^*_{1jm} + 2\Psi(1-\Psi) p^*_{2jm}\right] \\
&= \sigma_\epsilon^2 + 2\sigma_g^2\left[\Psi^2 (1 - p^*_{1jm} - p^*_{2jm}) + \Psi(1-\Psi) p^*_{1jm} + (1-\Psi)^2 p^*_{2jm}\right] \\
&\quad + (\sigma_g^2 + \sigma_d^2)\left[2\Psi(1-\Psi)(1 - p^*_{1jm} - p^*_{2jm}) + (1 - 2\Psi + 2\Psi^2) p^*_{1jm} + 2\Psi(1-\Psi) p^*_{2jm}\right] \\
&= \sigma_\epsilon^2 + \sigma_g^2\left[2(1-2\Psi)(\tfrac12 p^*_{1jm} + p^*_{2jm}) + 2\Psi\right] + \sigma_d^2\left[(1-2\Psi)^2 p^*_{1jm} + 2\Psi(1-\Psi)\right].
\end{aligned} \tag{8.13}$$

When $\sigma_d^2 = 0$, we see from (8.13) above that $\beta$ in the regression model (8.8) may be written

$$\beta = 2(1-2\Psi)\sigma_g^2 = -2(1-2c)^2 \sigma_g^2. \tag{8.14}$$

When $\sigma_d^2 \neq 0$ this result will still hold approximately. Note that $\hat\pi_{jm}$ can be written in the form $\hat\pi_{jm} = \tfrac12 (1 + p^*_{2jm} - p^*_{0jm})$. Thus, we would expect high values of $\hat\pi_{jm}$ to be associated with high values of $p^*_{2jm}$ and low values of $p^*_{0jm}$, and vice versa. However, there is no reason to associate large values of $\hat\pi_{jm}$ with either high or low values of $p^*_{1jm}$. For this reason the bracketed expression multiplying $\sigma_d^2$ in (8.13) will be approximately the same for all values of $\hat\pi_{jm}$ and hence (8.14) will hold approximately even when dominance is present.

Thus the regression procedure described in the previous section can be used with $\hat\pi_{jm}$ replacing $\pi_{jt}$, and the hypothesis that $\beta = 0$ can be tested approximately by comparing the calculated $\hat\beta$ to its estimated standard error: a significantly large $|\hat\beta|$ indicates that $c \neq \tfrac12$, and so linkage is present. Note, however, that it is only possible to detect linkage, not to estimate $c$, using this regression procedure, since $c$ is confounded with $\sigma_g^2$.

Finally, suppose there are $K$ trait loci, each linked to the marker locus. Then (8.14) will hold for each trait locus separately, and if the trait loci are mutually unlinked and there is no epistasis

$$E(\hat\beta) = -2 \sum_{i=1}^{K} (1-2c_i)^2 \sigma_i^2, \tag{8.15}$$

where $\sigma_i^2$ is the contribution to the total genetic variance of the $i$th trait locus and $c_i$ is the recombination fraction between it and the marker locus. The equality is exact if there is no dominance. An even stronger result holds if linkage equilibrium among the trait loci is assumed. At linkage equilibrium the genetic effects at two loci are independent, which implies that (8.15) will hold at equilibrium even if the trait loci are linked, as long as the effects at the different loci are additive (i.e., there is no epistasis). Thus a significantly large $|\hat\beta|$ indicates that there is a linkage relationship between the marker locus and one or more trait loci.

8.3 Detecting Linkage by Nonparametric Methods

If there is a major trait gene located near a marker, then there should be a definite (inverse) association between the sib pair difference $|x_{1j} - x_{2j}|$ and $\pi_{jm}$, the proportion of genes i.b.d. at the marker locus. On the other hand, if there is no major trait gene near the marker, then $|x_{1j} - x_{2j}|$ and $\pi_{jm}$ should be independent. Hence standard rank correlation procedures, such as Spearman's Rho and Kendall's Tau, can be used as a test of such linkage. In the analysis $\pi_{jm}$ is replaced by its estimate $\hat\pi_{jm}$, as defined by (7.1). First, $\hat\pi_{jm}$ and $|x_{1j} - x_{2j}|$ are separately ranked in order of magnitude, tied scores being assigned the average of the tied ranks. The rank correlations are then calculated by the formulas given in Section 4.7. A significantly large correlation implies that there is either a relatively large genetic effect at a moderate distance from the marker, or that there is a smaller genetic effect close to the marker.

This test procedure is easy to apply, requiring only the calculation of $\hat\pi_{jm}$ and the sib pair differences. Furthermore, it requires no distributional assumptions for the trait of interest. The primary disadvantage is that, being a nonparametric test with $\pi_{jm}$ estimated by $\hat\pi_{jm}$, it is likely to require relatively large samples in order to detect anything but fairly close linkage.

CHAPTER IX - MAXIMUM LIKELIHOOD ESTIMATION OF LINKAGE

One disadvantage of the methods discussed in the previous chapter is that $\sigma_g^2$ is confounded with the recombination fraction $c$, and hence although linkage can be detected, it cannot be estimated. In this chapter we show how maximum likelihood techniques can be used to overcome this difficulty.

9.1 Deriving the Likelihood Function

We assume there is a two-allele trait locus, with genetic effects given by (8.1), located at a linkage distance $c$ from a multi-allele marker locus. We denote by $\pi_{jm}$ and $\pi_{jt}$ the proportion of genes i.b.d. at marker and trait loci respectively for sib pair $j$. We assume linkage equilibrium for trait and marker loci and also assume that sib pair differences are normally distributed. More precisely, we assume that $x_{1j} - x_{2j}$ is distributed as a mixture of seven normal distributions. From the values of $Y_j$ given in Table 8.1 we see immediately that if $E(\epsilon_j) = 0$ the means of these distributions are $0$, $a-d$, $-a+d$, $a+d$, $-a-d$, $2a$ and $-2a$, depending upon the sib pair genotypes. We shall assume $E(\epsilon_j) = 0$, i.e., the data have been corrected for any sib order effect, and so the variance of each distribution is $\sigma_\epsilon^2$.

Without loss of generality we can reduce the number of distributions from seven to four by considering only the absolute pair differences.
This is reasonable, since the order of the sibs' scores is unimportant if we correct for age. Thus, we henceforth consider only the absolute differences $D_j = |x_{1j} - x_{2j}|$.

The distribution of $D_j$, conditional on the sib pair genotypes at the trait locus, is given by

$$f_1 = f(D_j \mid \text{Sibs BB-BB, bb-bb or Bb-Bb}) = \frac{1}{\sigma_\epsilon}\sqrt{\frac{2}{\pi}}\, \exp\!\left(-\frac{D_j^2}{2\sigma_\epsilon^2}\right), \quad D_j \geq 0; \qquad = 0 \text{ otherwise} \tag{9.1}$$

$$f_2 = f(D_j \mid \text{Sibs BB-Bb or Bb-BB}) = \frac{1}{\sigma_\epsilon}\sqrt{\frac{2}{\pi}}\, \exp\!\left(-\frac{(D_j - a + d)^2}{2\sigma_\epsilon^2}\right), \quad D_j \geq 0; \qquad = 0 \text{ otherwise} \tag{9.2}$$

$$f_3 = f(D_j \mid \text{Sibs Bb-bb or bb-Bb}) = \frac{1}{\sigma_\epsilon}\sqrt{\frac{2}{\pi}}\, \exp\!\left(-\frac{(D_j - a - d)^2}{2\sigma_\epsilon^2}\right), \quad D_j \geq 0; \qquad = 0 \text{ otherwise} \tag{9.3}$$

$$f_4 = f(D_j \mid \text{Sibs BB-bb or bb-BB}) = \frac{1}{\sigma_\epsilon}\sqrt{\frac{2}{\pi}}\, \exp\!\left(-\frac{(D_j - 2a)^2}{2\sigma_\epsilon^2}\right), \quad D_j \geq 0; \qquad = 0 \text{ otherwise} \tag{9.4}$$

If we knew the sib pair genotypes at locus B for all sib pairs, the likelihood function could be easily constructed. Instead, we have information on the sib pair phenotypes at the marker locus and possibly, in addition, the phenotypes of one or both parents at this locus. We now show how this information at the marker locus, $I_m$, can be used to obtain the likelihood function for an observed sib pair. The likelihood function for sib pair $j$ may be written

$$\begin{aligned}
L = f(D_j \mid I_m) &= \frac{f(D_j \text{ and } I_m)}{\Pr(I_m)} \\
&= \sum_{\pi_{jt}} \frac{f(D_j \text{ and } I_m \mid \pi_{jt}) \Pr(\pi_{jt})}{\Pr(I_m)} \\
&= \sum_{\pi_{jt}} \frac{f(D_j \mid \pi_{jt}) \Pr(I_m \mid \pi_{jt}) \Pr(\pi_{jt})}{\Pr(I_m)} \qquad \text{(because of linkage equilibrium)} \\
&= \sum_{\pi_{jt}} f(D_j \mid \pi_{jt}) \left\{ \sum_{\pi_{jm}} \frac{\Pr(I_m \text{ and } \pi_{jt} \mid \pi_{jm}) \Pr(\pi_{jm})}{\Pr(I_m)} \right\} \\
&= \sum_{\pi_{jt}} \sum_{\pi_{jm}} \frac{f(D_j \mid \pi_{jt}) \Pr(I_m \mid \pi_{jm}) \Pr(\pi_{jt} \mid \pi_{jm}) \Pr(\pi_{jm})}{\Pr(I_m)} \qquad \text{(because of linkage equilibrium)} \\
&= \sum_{\pi_{jt}} \sum_{\pi_{jm}} f(D_j \mid \pi_{jt}) \Pr(\pi_{jt} \mid \pi_{jm}) \Pr(\pi_{jm} \mid I_m).
\end{aligned} \tag{9.5}$$

$\Pr(\pi_{jt} = \tfrac12 h \mid \pi_{jm} = \tfrac12 k)$, apart from a factor of 2 or 4, is given in Table 8.2, and $\Pr(\pi_{jm} = \tfrac12 k \mid I_m)$ can be obtained from the Classification Tables in simple cases, or found numerically in more complex situations by the use of (6.1).

We now find $f(D_j \mid \pi_{jt} = \tfrac12 h)$. Note that $D_j$, conditional on $\pi_{jt}$, is distributed as a mixture of the four distributions given by (9.1)-(9.4), i.e.,

$$f(D_j \mid \pi_{jt} = \tfrac12 h) = \sum_{i=1}^{4} \lambda_{hi} f_i, \tag{9.6}$$

where

$$\lambda_{hi} = \Pr(D_j \text{ has density function } f_i \mid \pi_{jt} = \tfrac12 h) \qquad (h = 0, 1, 2 \text{ and } i = 1, 2, 3, 4).$$

The coefficients $\lambda_{hi}$ can be calculated from Table 8.1 and are given in Table 9.1. Thus,

$$\begin{aligned}
\lambda_{01} &= \Pr(\text{sibs BB-BB, bb-bb or Bb-Bb} \mid \pi_{jt} = 0) = p^4 + q^4 + 4p^2q^2, \\
\lambda_{02} &= \Pr(\text{sibs BB-Bb or Bb-BB} \mid \pi_{jt} = 0) = 4p^3q,
\end{aligned} \tag{9.7}$$

etc.

TABLE 9.1
VALUES OF THE COEFFICIENT $\lambda_{hi}$

| $h$ | $i = 1$ | $i = 2$ | $i = 3$ | $i = 4$ |
|---|---|---|---|---|
| 0 | $p^4 + q^4 + 4p^2q^2$ | $4p^3q$ | $4pq^3$ | $2p^2q^2$ |
| 1 | $1 - 2pq$ | $2p^2q$ | $2pq^2$ | 0 |
| 2 | 1 | 0 | 0 | 0 |

Thus, $f(D_j \mid \pi_{jt} = \tfrac12 h)$ can be obtained using (9.1)-(9.4), (9.6) and Table 9.1, and hence all elements in the likelihood (9.5) can be calculated. There are five parameters in the likelihood function: $c$, $p$, $\sigma_\epsilon^2$, $a$ and $d$ (if the gene frequencies at the marker locus are unknown, they too can be appropriately estimated). After ML estimates of these five parameters have been obtained, the estimated additive and dominance variance can be calculated by substituting the parameter estimates for the true values in (4.3) and (4.4).

9.2 Obtaining the Maximum Likelihood Estimates

Because of the complexity of the likelihood function, computer methods must be used in order to find the ML estimates. Note, however, that little information is needed beyond that already supplied for the computer calculation of $\hat\pi_{jm}$. The only additional information required in order to evaluate the likelihood function is $\Pr(\pi_{jt} \mid \pi_{jm})$ and $f(D_j \mid \pi_{jt})$, which are constant for all sib pairs. Thus, the only probabilities in the likelihood function that vary from sib pair to sib pair are those given by (6.1).

Once the likelihood has been programmed, various methods are available for calculating the ML estimates; the simplest is to search the likelihood surface directly, as explained elsewhere (Elston and Kaplan, 1970).

Finally, it might be noted that if we make the simplifying assumption that $\sigma_d^2 = 0$, then the number of distributions involved is reduced from four to three, and the number of parameters to be estimated is reduced from five to four. The ML procedure described above can be modified accordingly to permit estimation of $c$, $p$, $\sigma_\epsilon^2$ and $a$.

CHAPTER X - ESTIMATING LINKAGE BETWEEN MARKERS WHEN BOTH PARENTAL PHENOTYPES ARE UNKNOWN

Although the problem of detecting linkage from sib data has been dealt with before (see Chapter II), most work in this area concerns itself only with the detection rather than the estimation of linkage. A second shortcoming is that those studies that do estimate linkage are restricted to rather simple cases in which both parental genotypes are known. Often, however, the parental genotypes will not be known. Although it is more difficult to detect linkage in the absence of parental information, a general maximum likelihood estimation procedure for this purpose can be derived from the results of the previous chapters. This procedure, which can easily be adapted for computer use, can handle any pattern of dominance and any number of alleles.

10.1 Derivation of the Likelihood Function

Suppose we have data for $n$ pairs of sibs, all parental phenotypes are unknown, and we wish to estimate the linkage distance $c$ between two loci A and B. We assume the sib pairs are independent and there is linkage equilibrium for both loci. Let $\pi_{jA}$ and $\pi_{jB}$ be the proportion of genes i.b.d. for loci A and B respectively for sib pair $j$. Then for each locus the sibs are one of seven sib pair types with corresponding frequencies as indicated in Table 6.3. Let $T_{jA}$ and $T_{jB}$ denote the observed sib pair type for pair $j$ for loci A and B respectively.
The likelihood for this pair may be written

$$\begin{aligned}
L_j &= \Pr(T_{jA} \text{ and } T_{jB}) = \Pr(T_{jA} \mid T_{jB}) \Pr(T_{jB}) \\
&= \sum_{h=0}^{2} \sum_{k=0}^{2} \Pr(T_{jA} \mid \pi_{jA} = \tfrac12 h) \Pr(\pi_{jA} = \tfrac12 h \mid \pi_{jB} = \tfrac12 k) \Pr(\pi_{jB} = \tfrac12 k \mid T_{jB}) \Pr(T_{jB}) \qquad \text{(using (9.5))} \\
&= \sum_{k=0}^{2} \sum_{h=0}^{2} \Pr(\pi_{jB} = \tfrac12 h \text{ and } \pi_{jA} = \tfrac12 k) \Pr(T_{jA} \mid \pi_{jA} = \tfrac12 k) \Pr(T_{jB} \mid \pi_{jB} = \tfrac12 h).
\end{aligned} \tag{10.1}$$

The first of these probabilities can be obtained from Table 8.2 and the other two from Table 6.5.

For example, suppose that each gene has only two alleles and that sib pair $j$ is AABB-AaBB. Let $p_A$, $1-p_A$, $p_B$ and $1-p_B$ be the gene frequencies for alleles A, a, B and b respectively. Then from (10.1) and Tables 8.2 and 6.5 the likelihood for this sib pair may be written

$$\begin{aligned}
L_j &= \tfrac14 \Psi^2 \{4p_A^3(1-p_A)\} p_B^4 + \tfrac12 \Psi(1-\Psi) \{2p_A^2(1-p_A)\} p_B^4 + 0 \\
&\quad + \tfrac12 \Psi(1-\Psi) \{4p_A^3(1-p_A)\} p_B^3 + \tfrac12 (1 - 2\Psi + 2\Psi^2) \{2p_A^2(1-p_A)\} p_B^3 + 0 \\
&\quad + \tfrac14 (1-\Psi)^2 \{4p_A^3(1-p_A)\} p_B^2 + \tfrac12 \Psi(1-\Psi) \{2p_A^2(1-p_A)\} p_B^2 + 0.
\end{aligned}$$

It is easy to program a computer to evaluate the likelihood numerically. Simply store Tables 8.2 and 6.5 as matrices $S$ (3x3) and $T$ (7x3) respectively, specifying the gene frequencies if they are known. The likelihood for sib pair $j$ is then simply

$$L_j = t_{1j}'\, S\, t_{2j}, \tag{10.2}$$

where $t_{1j}$ and $t_{2j}$ are 3x1 vectors of $T$ corresponding to the observed sib pair types for sib pair $j$ for loci A and B respectively. If only the sib phenotypes are known, a simple modification can be made as follows: specify which sib pair types could account for the observed pair. The likelihood is then simply

$$L_j = \sum_i \sum_k t_{1ji}'\, S\, t_{2jk}, \tag{10.3}$$

where for each locus the summation is over all sib pair types that could give rise to the observed pair.

Finally, the overall likelihood $L$ for $n$ sib pairs is simply the product of the $L_j$, and computer techniques can be used to find the ML estimate of the recombination fraction $c$. If the gene frequencies are unknown, they too can be estimated.
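The matrix form (10.2) can be sketched in Python as follows. This is a minimal illustration rather than the program used for the analyses: all function names are hypothetical, only the rows of Table 6.5 for sib pair types I and III are coded, gene frequencies are taken as known, and the grid search over $c$ in $[0, \tfrac12]$ is one assumed way of locating the maximum.

```python
import math


def psi(c):
    """Psi = c^2 + (1 - c)^2, as in (8.11)."""
    return c * c + (1.0 - c) ** 2


def s_matrix(c):
    """Table 8.2 as a 3x3 matrix; index 0, 1, 2 = number of genes i.b.d."""
    w = psi(c)
    same = 0.25 * w * w
    opposite = 0.25 * (1.0 - w) ** 2
    mid = 0.5 * w * (1.0 - w)
    centre = 0.5 * (1.0 - 2.0 * w + 2.0 * w * w)
    return [[same, mid, opposite],
            [mid, centre, mid],
            [opposite, mid, same]]


def type_I(p_i):
    """Row of Table 6.5 for sib pair type I (AiAi-AiAi)."""
    return [p_i ** 4, p_i ** 3, p_i ** 2]


def type_III(p_i, p_j):
    """Row of Table 6.5 for sib pair type III (AiAi-AiAj)."""
    return [4.0 * p_i ** 3 * p_j, 2.0 * p_i ** 2 * p_j, 0.0]


def pair_likelihood(t1, t2, c):
    """L_j = t1' S t2, equation (10.2)."""
    s = s_matrix(c)
    return sum(t1[h] * s[h][k] * t2[k] for h in range(3) for k in range(3))


def log_likelihood(pairs, c):
    """Log of the product of the L_j over independent sib pairs."""
    return sum(math.log(pair_likelihood(t1, t2, c)) for t1, t2 in pairs)


def grid_search_c(pairs, steps=500):
    """Assumed search strategy: evaluate c on an even grid over [0, 0.5]."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda c: log_likelihood(pairs, c))


# The AABB-AaBB example: type III at locus A, type I at locus B.
p_a, p_b = 0.6, 0.7  # illustrative gene frequencies, not estimates from data
example = pair_likelihood(type_III(p_a, 1.0 - p_a), type_I(p_b), c=0.2)
```

At $c = \tfrac12$ the matrix $S$ factors into the product of the two marginal distributions, so `pair_likelihood` reduces to the product of the two Table 6.3 probabilities, which gives a convenient check on any implementation.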
10.2 Example of the Estimation Procedure

As a practical example of this procedure, blood grouping data for the 46 pairs of dizygotic twins of Gottesman's Harvard Twin Study (1966) were analyzed. The ML technique described above was used to estimate the linkage distance $c$ between the ABO and Rhesus blood groups, the ABO and MNS blood groups and the Rhesus and MNS blood groups. Previous studies have shown no evidence for linkage between any of these groups.

In order to simplify the estimation procedure, the gene frequencies for the three blood groups were first estimated separately by ML procedures. The ML estimates were then used as the true gene frequencies in the ML estimation of $c$. To illustrate how the gene frequencies were estimated, consider a twin pair that for the ABO locus is $A_1B$-$A_2$. This implies that the pair must be either $A_1B$-$A_2A_2$ or $A_1B$-$A_2O$, and from Table 6.3 the likelihood of this sib pair is

$$p_{A_2}^2\, p_{A_1} p_B + 2\, p_{A_1} p_B\, p_{A_2} p_O.$$

Similarly, all sib pair likelihoods can be obtained and ML estimates of the gene frequencies for the three blood groups can be found. Table 10.1 gives the ML estimates and standard errors for the Harvard Twin Study blood group data.

TABLE 10.1
ESTIMATED GENE FREQUENCIES FROM THE HARVARD TWIN STUDY BLOOD GROUP DATA

| ABO | Estimate | Rhesus | Estimate | MNS | Estimate |
|---|---|---|---|---|---|
| $A_1$ | .1947 ± .0367 | CDe | .4293 ± .0453 | MS | .2724 ± .0395 |
| $A_2$ | .0990 ± .0284 | cde | .4044 ± .0450 | Ms | .2495 ± .0383 |
| O | .6265 ± .0454 | cDE | .1361 ± .0306 | NS | .0268 ± .0153 |
| B | .0798 ± .0221 | Cde | .0076 ± .0075 | Ns | .4513 ± .0496 |
| | | cdE | .0076 ± .0075 | | |
| | | C$^w$De | .0150 ± .0106 | | |

The estimated gene frequencies for the ABO and Rhesus blood groups from Table 10.1 agree closely with estimates obtained for other Caucasian populations. Agreement is not as close for the MNS blood group, but the estimates are not radically different (e.g., Race and Sanger, 1968, give: MS, .2546; Ms, .3043; NS, .0607; and Ns, .3804).
The gene frequency estimates of Table 10.1 were then taken as the true gene frequencies, and the ML estimate of linkage between the blood groups was calculated. These estimates and their standard errors are given in Table 10.2.

TABLE 10.2
ML ESTIMATES OF LINKAGE BETWEEN BLOOD GROUPS USING THE HARVARD TWIN STUDY DATA

Blood groups    ML estimate of c    Estimated standard error
ABO-Rhesus      .5                  .6846
MNS-Rhesus      .5                  .2109
ABO-MNS         .5                  .1907

Not surprisingly, we see from Table 10.2 that there is no evidence of linkage between any of the blood groups. Note that the estimated standard errors are fairly large, since the analysis was based on only 46 twin pairs.

In order to determine the practical value of this ML procedure, further work is necessary. Monte Carlo studies are now in progress to determine how well this procedure detects linkage for various sample sizes and for various values of the recombination fraction between 0 and .5. The results of this study will be made known in a future communication. Preliminary evidence indicates that for no linkage the ML estimate of the recombination fraction is usually exactly .5, with a fairly large standard error; for loose linkage the estimate is still often .5, but the standard error is reduced. For tight linkage the estimate is often zero, and only for moderate linkage does the estimate fall strictly within the interval 0 to .5. The tendency of the estimated recombination fraction to fall at the endpoints of the interval is reduced if the sample size is increased. Nevertheless, because of this tendency, further research may reveal that the procedure is best used only to detect linkage, rather than to estimate the recombination fraction.
CHAPTER XI
AN EXAMPLE OF THE GENETIC ANALYSIS OF QUANTITATIVE TRAITS USING SIB PAIR DATA

In order to determine the practical value of the test procedures described in the previous chapters, the data from Gottesman's (1966) Harvard Twin Study were analyzed. First, the data were subjected to the twin analyses of Chapter III; then, since dizygotic twins are genetically the same as full sibs, the sib analyses described in Chapters VIII and IX were performed using these twin pairs.

11.1 Data

The final sample used in the Harvard Twin Study consisted of 147 pairs of same-sex twins taken from greater Boston area schools (grades 9-12). The breakdown by sex and zygosity is: 34 male monozygotes, 45 female monozygotes, 32 male dizygotes and 36 female dizygotes. All subjects were administered the Minnesota Multiphasic Personality Inventory (MMPI) and 63 subtest scores were recorded for each subject. The first column of Table 11.1 gives the subtest scores used. For an interpretation of the underlying factors being measured by these subtest scores, see Dahlstrom and Welsh (1960). For the Harvard Twins it was found that one variable (He) was essentially a dummy variable, since all 294 subjects had scores of 50 on this variable.

Blood grouping data were also collected for 40 of the 68 dizygotic twin pairs and 76 of the 79 monozygotic twin pairs. Recall from the previous chapter that gene frequency estimates for the blood groups were based on 46 dizygotic twin pairs. The slight discrepancy in sample size is due to the fact that six dizygotic twin pairs were not included in the final sample of 147 because one or both twins invalidated their MMPI scores as determined by the Lie scale.
11.2 Results of the Genetic Analysis

First, Assumptions I-IV of Section 1.7 were made and unweighted least squares estimates of $\sigma_g^2$, $\sigma_e^2$ and the environmental covariance $C$ were calculated by (3.5) for each variable. Then weighted least squares estimates of these three parameters were calculated by (3.9), using the iterative procedure described in Section 3.1.2. Table 11.1 gives the resulting weighted least squares estimates for the MMPI variables in order of estimated heritability.

TABLE 11.1
WEIGHTED LEAST SQUARES ESTIMATES OF THE GENETIC PARAMETERS FOR THE MMPI VARIABLES

Variable       sigma^2_g   sigma^2_e   C         Estimated heritability
Lie              55.32       8.89     -16.76      .86
MaS              75.36      31.55     -34.64      .70
PaS              67.30      31.50     -25.56      .68
D                63.66      38.89     -26.94      .62
Ul               59.72      36.74     -14.58      .62
Pt               58.76      39.14     -29.59      .60
Rosen Sm         54.19      37.70     -19.04      .59
Ma'              44.13      32.10     -13.81      .58
Sc'              46.58      37.52      -9.12      .55
Si               39.80      32.88      -8.79      .55
D                58.08      48.22     -17.30      .55
Pt               53.68      46.85      -4.69      .53
Pd               55.88      50.90      -3.22      .53
Do               91.68      83.92     -12.25      .52
PaO              72.03      65.93     -26.28      .52
Welsh R          49.65      45.62      -3.80      .52
Pa               47.39      46.30     -30.15      .51
PdA              55.21      55.82     -10.59      .50
Pd               34.81      36.16       0.21      .49
HyS              46.06      48.73       1.43      .49
Pa               43.29      47.31     -27.51      .48
Hy               33.94      37.16      -3.15      .48
MaO              69.39      80.76      -6.39      .46
ScI              37.35      43.91      -7.88      .46
Sc               54.01      68.46     -11.05      .44
Sit              42.63      56.25      -3.91      .43
Sc               32.98      45.15       2.88      .42
Co27             44.16      65.54       2.52      .40
PdS              31.65      47.27      -1.00      .40
Es               28.60      45.96       5.51      .38
K                29.25      48.30       7.72      .38
Lp               40.87      72.99       7.78      .36
D'               38.81      70.10       7.88      .36
PdB              33.35      61.75       5.89      .35
F                31.13      59.64       3.73      .34
Ds               43.39      88.86       4.00      .34
Ma               24.31      48.21       5.61      .34
Eo               36.91      76.12      20.66      .33
Rosen Cr         25.88      56.82       7.48      .31
HyO              27.74      62.16      21.02      .31
Nu               24.79      56.97      14.96      .30
Fm               37.25      86.35      11.03      .30
Welsh A          27.11      68.69      22.01      .28
Edwards So       24.76      65.83      18.96      .27
Taylor At        27.81      81.73      21.68      .25
Rosen Ar         14.78      57.60      21.11      .20
Rosen Dr         21.10      82.96      22.38      .20
Ma               20.60      89.98      18.24      .19
PdO              15.35      71.76      13.84      .18
N                13.82      81.23      30.27      .15
Hs                7.78      54.50      17.55      .12
Et               12.47      90.26      41.37      .12
Mf               11.86      91.60      36.20      .11
No                6.78      80.96       6.14      .08
Rosen Pz          5.43      68.66      21.40      .07
SCI               4.29      68.15      17.88      .06
Dy               -1.31      91.80      38.88     -.01
Hs               -3.22      90.18      40.51     -.04
Pn               -7.45     105.48      44.50     -.08
CoB              -6.48      90.28      36.76     -.08
Dq              -16.22      97.22      53.19     -.20
Rosen Dr        -24.36     113.03      38.76     -.27

Note from Table 11.1 that 25 of the 62 variables, including the 18 with the largest estimated heritability, have negative environmental covariance estimates. This association between large heritability estimates and negative covariances is not surprising, since in Section 3.1.1 we showed that invalidity of Assumptions I-IV will result in likely overestimates of $\sigma_g^2$ and underestimates of $\sigma_e^2$ and $C$. Thus a negative covariance estimate suggests either that dominance or epistasis is present; that the genotype-environment covariance is not zero; or that the environmental covariance is not the same for monozygotic and dizygotic twins. Alternatively, a negative covariance may reflect a true state of nature. That is, the variable may just be one in which the environmental forces tend to produce dissimilar scores for members of the same twin pair.

In order to determine how well the model fits the data, the observed mean squares for each variable were compared to those "expected," i.e., those obtained by substituting the least squares estimates for the parameter values in the expected mean squares of (3.1).
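As a quick arithmetic check on Table 11.1, the heritability column appears to equal the ratio of the genetic variance estimate to the sum of the genetic and environmental variance estimates. That relation is inferred here from the tabulated numbers, not quoted from this chapter, so the short sketch below should be read as an assumption that happens to reproduce the rows tested; the function name is hypothetical.

```python
def heritability(sigma_g2, sigma_e2):
    """Heritability as sigma_g^2 / (sigma_g^2 + sigma_e^2).

    ASSUMPTION: this ratio is inferred from the rows of Table 11.1;
    the defining formula is not restated in this section.
    """
    return sigma_g2 / (sigma_g2 + sigma_e2)


# (variable, sigma_g^2, sigma_e^2, heritability as printed in Table 11.1)
table_rows = [
    ("Lie", 55.32, 8.89, 0.86),
    ("MaS", 75.36, 31.55, 0.70),
    ("PaS", 67.30, 31.50, 0.68),
    ("Dy", -1.31, 91.80, -0.01),
]

for name, g, e, printed in table_rows:
    # Compare the recomputed value with the printed one, to two decimals.
    print(name, round(heritability(g, e), 2), printed)
```

Because the genetic variance estimate is not constrained to be positive, the same ratio also yields the small negative "heritabilities" in the last six rows of the table.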
The criterion chosen to measure model fit was "SQD," the sum of squared pair differences between observed and expected mean squares. It was found that for a number of variables, the fit was good. Table 11.2 gives SQD for the 23 variables having the best model fit. Since large SQD scores indicate invalidity of the model, the genetic analysis for variables with high SQD scores is of questionable value.

TABLE 11.2
THE 23 MMPI VARIABLES WITH THE BEST MODEL FIT

Variable       SQD
Ma              0.071
Edwards So      0.084
Es              0.105
Rosen Cr        1.369
Fm              2.279
Ma              2.438
MaO             2.685
Co27            3.028
Taylor At       5.886
N               6.994
Ma'            13.175
Nu             19.220
Sc             19.387
HyO            23.464
Lie            25.319
Pt             28.425
No             31.417
Pt             41.644
Sit            43.354
PaS            46.988
Hy             47.021
PdA            53.470
Rosen Sm       53.509

The hypothesis that $\sigma_g^2 = 0$ was then tested by four different procedures for each variable. The first test procedure used the ratio of the weighted least squares estimate of $\sigma_g^2$ to its estimated standard error and will hereafter be called the "Ratio Z Test." The second test was the exact F test given by (2.14). The third test procedure was the approximate F test (3.18), which utilizes more information than does the exact test. Finally, the Mann Whitney Test was performed, using the normal approximation (3.31).

Table 11.3 gives the 25 variables found to have a significant genetic variance by at least one of the test procedures. Of these 25 variables, 11 were judged to have a sufficiently poor model fit (an SQD of 350 or more) to exclude them from further consideration. Of the remaining variables, only Lie and PaS (subtle paranoia) were found to have a significant genetic variance by all four tests. Note also that these two variables are among the 23 best in terms of model fit. There is also clear evidence of a genetic factor for the following variables: Pd (psychopathic deviate), Si (social introversion), MaS (subtle hypomania) and Pt (psychasthenia).
However, the genetic influences are most apparent for Lie and PaS, and these two variables are used later in the sib pair analyses.

The Lie scale was first introduced into the MMPI as a basis for evaluating the general frankness with which the subjects were answering the test. It is also sensitive to the subject's tendency to cover up and deny undesirable personal faults (Dahlstrom and Welsh, 1960). The PaS subscale is due to Wiener (1948) and is designed to measure paranoia by "subtle" rather than "obvious" test items. It is noteworthy that paranoid schizophrenics made up a large proportion of the patient sample used in the derivation of this particular scale. Thus, we have evidence to support the hypothesis that heredity plays a major role in schizophrenia, a hypothesis that has also been supported by results from a number of other studies in this area. Later we shall find evidence that a single major trait gene may be responsible for this particular variable.

TABLE 11.3
MMPI VARIABLES WITH SIGNIFICANT GENETIC VARIANCE

Variable       Ratio Z test   Exact F test   Approximate F test   Mann Whitney Z statistic
Lie            3.062**        1.993**        2.078**              2.531**
MaS            2.204*         1.780**        1.569*               0.798
PaS            2.197*         1.515*         1.590*               1.948*
Ul             2.104*         1.453          1.582*               0.416
Pt             1.900*         1.475*         1.521*               1.348
Rosen Sm       1.892*         1.564*         1.478*               0.581
Pd             1.881*         1.636*         1.516*               0.633
Ma'            1.881*         1.432          1.481*               1.488
Pt             1.838*         1.506*         1.427                1.183
Si             1.834*         1.690*         1.478*               1.548
PdA            1.654*         1.479*         1.416                0.909
Hy             1.641          1.508*         1.421                1.600
Es             1.392          1.350          1.354                1.750*
HyO            1.288          1.369          1.337                1.841*
D***           1.949*         1.756**        1.484*               0.853
Sc1***         1.881*         1.791**        1.499*               1.968*
Welsh R***     1.851*         1.245          1.502*               1.193
Do***          1.802*         1.659*         1.477*               1.841*
HyS***         1.785*         1.274          1.487*               1.136
Pd***          1.784          1.770**        1.484*               1.177
D***           1.771*         1.798**        1.443                1.278
PaO***         1.647*         1.700*         1.391                0.686
Pa***          1.517          1.693*         1.310                2.016*
ScI***         1.498          1.615*         1.361                1.280
Pa***          1.434          1.676*         1.289                1.943*

* Significant at .05 level
** Significant at .01 level
*** These variables judged to have inadequate model fit

Computer techniques were employed to obtain ML estimates of $\sigma_g^2$, $\sigma_e^2$ and $C$ for the Lie and PaS variables using the log likelihood (3.22). It was found that the ML estimates differed only slightly from the weighted least squares estimates of Table 11.1. Table 11.4 compares the results of these two methods of estimation.

TABLE 11.4
COMPARISON OF ML AND WEIGHTED LEAST SQUARES ESTIMATION OF THE GENETIC PARAMETERS FOR Lie AND PaS VARIABLES

Variable   Method of estimation   sigma^2_g   sigma^2_e   C
Lie        ML                     55.286       8.314      -17.341
Lie        Weighted L.S.          55.317       8.886      -16.764
PaS        ML                     66.820      32.200      -25.640
PaS        Weighted L.S.          67.303      31.500      -25.561

Having established the strong effect of heredity on Lie and PaS, we next attempt to link these variables to the ABO, Rhesus and MNS blood groups. For this purpose the sib pair analyses of Chapters VIII and IX are employed on the 40 dizygotic twin pairs for which blood grouping data are available.

First, the assumption of no association (and hence linkage equilibrium) between trait and marker locus was tested for each possible marker-trait pair (the markers being the three blood groups and the trait loci being the ones responsible for Lie and PaS). The assumption was tested by determining whether or not the phenotypes for a particular blood group differed significantly with respect to the variable of interest using all the Harvard Twin Study data. For example, Table 11.5 gives the observed means for Lie and PaS for each ABO phenotype.

TABLE 11.5
OBSERVED MEANS FOR Lie AND PaS VARIABLES FOR THE ABO PHENOTYPES

ABO phenotype   Sample size   Lie      PaS
O               102           46.843   53.922
A1               70           47.171   55.829
A2               26           49.308   52.308
B                26           48.846   56.000
A1B               5           50.600   52.000
A2B               3           49.333   57.333

An analysis of variance reveals that the phenotypes do not differ significantly for either variable.
The F statistic (with 5 and 226 degrees of freedom) was calculated to be 0.759 for Lie and 0.791 for PaS. Hence, the assumption of no association seems to be a valid one here. A similar result was found for the Rhesus blood group, the F values (with 9 and 222 degrees of freedom) being 1.634 for Lie and 1.462 for PaS, both nonsignificant at the .05 level. However, significant differences were found for the MNS system, the F values (with 7 and 224 degrees of freedom) being 2.092 for Lie (significant at the .05 level) and 2.785 for PaS (significant at the .01 level). This implies that our model does not hold in this case, and hence the MNS system was excluded from further analyses with these two variables. However, this association implies that the 8 MNS phenotypes differ significantly with respect to the trait of interest, which, if it is not just a chance occurrence, is an interesting result in itself; it would be worth the attempt to discover the cause of such a phenomenon. Table 11.6 gives the means for the 8 MNS phenotypes.

TABLE 11.6
OBSERVED MEANS FOR Lie AND PaS VARIABLES FOR THE MNS PHENOTYPES

MNS phenotype   Sample size   Lie      PaS
MSMS            16            44.063   59.500
MSMs            34            49.147   52.235
MSNS             4            55.000   65.000
MsMs            15            46.333   52.000
MsNs            58            46.448   54.552
NSNs            11            46.818   47.636
NsNs            42            46.405   53.905
MNSs            52            49.692   56.462

As a preliminary test for linkage, the simple nonparametric tests described in Chapter VIII were applied. First the absolute twin pair difference ($D_j = |x_{1j} - x_{2j}|$) was calculated for both variables for all dizygotic twin pairs. Then $\hat{\pi}_{jm}$ for the ABO and Rhesus blood groups was calculated for all dizygotic twin pairs by the procedures described earlier. The rank correlations were found and are given in Table 11.7.

TABLE 11.7
RANK CORRELATIONS BETWEEN $D_j$ AND $\hat{\pi}_{jm}$

Variable   Marker   Spearman's Rho   Kendall's Tau
Lie        ABO       .178             .144
Lie        Rhesus   -.176            -.118
PaS        ABO      -.322            -.250
PaS        Rhesus    .039             .032

Using the large sample approximations (4.26) and (4.31), it was found that both rank correlations between $\hat{\pi}_{jm}$ based on the ABO system and $D_j$ for the PaS variable are significant at the .05 level. The other correlations are not significantly less than zero. Thus, there is evidence that the ABO blood group may be linked to a major trait gene for PaS.

To further investigate this possibility, the maximum likelihood procedures of Chapter IX were applied so that the recombination fraction c could actually be estimated. The results of the ML analysis are given in Table 11.8.

TABLE 11.8
ML ESTIMATES FOR PaS AND ITS LINKAGE TO ABO

Parameter             ML estimate   Standard error   Restricted estimates
c                      -0.1836       0.1008             0.0
p                       0.5925       0.0842             0.5840
$\alpha$               17.2216       1.6172            17.2720
$\sigma_\epsilon^2$    15.0403       6.3718            14.0405
d                       5.1298       1.1861             5.1633
$\sigma_a^2$          127.8626                        130.8005
$\sigma_d^2$            6.1360                          6.2981

In Table 11.8 $\sigma_a^2$ and $\sigma_d^2$ are estimated by substituting parameter estimates for their true values in (4.3) and (4.4) respectively. Note that the ML estimate of c falls outside the parameter range, which is $0 \le c \le .5$. The last column of Table 11.8 gives the ML estimates when c is restricted to this range of values. Note that there is little change in the resulting parameter estimates.

There are several procedures that can be used to evaluate the results of this analysis. First, the ML estimate of c is not within two standard errors of .5, which is strong evidence that linkage is present. The likelihood ratio test (comparing the ratio of the likelihood when c=.5 to the likelihood when c=0) results in

$$R = \frac{L(c=.5)}{L(c=0)} = .0299$$

which again implies that linkage may be present.
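The last two rows of Table 11.8 are obtained by substitution into (4.3) and (4.4), which are not reproduced in this chapter. The sketch below assumes that these equations have the standard single-locus, two-allele form for the additive and dominance variance components; under that assumption the unrestricted estimates of p, the additive genotypic value (the third row of the table, called `a` in the code) and d give values close to the tabulated ones. Both function names are hypothetical.

```python
def additive_variance(p, a, d):
    """Additive genetic variance for one locus with two alleles.

    ASSUMPTION: standard form 2pq[a - d(p - q)]^2, taken here as the
    content of equation (4.3), which is not shown in this chapter.
    """
    q = 1.0 - p
    return 2.0 * p * q * (a - d * (p - q)) ** 2


def dominance_variance(p, d):
    """Dominance variance for one locus with two alleles.

    ASSUMPTION: standard form (2pqd)^2, taken here as equation (4.4).
    """
    q = 1.0 - p
    return (2.0 * p * q * d) ** 2


# Unrestricted ML estimates from Table 11.8.
p_hat, a_hat, d_hat = 0.5925, 17.2216, 5.1298
sigma_a2 = additive_variance(p_hat, a_hat, d_hat)
sigma_d2 = dominance_variance(p_hat, d_hat)
print(round(sigma_a2, 2), round(sigma_d2, 2))
```

The small discrepancies from the tabulated 127.8626 and 6.1360 are of the size expected from rounding the inputs to four decimals; the restricted estimates can be checked the same way.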
Finally, if the Bayes procedure suggested by Smith (1959) is used, the a priori probability that c=.5 is reduced from 21/22=.9545 to .7509.

The regression analysis of Chapter VIII was then performed, the squared pair differences being regressed on $\hat{\pi}_{jm}$. The estimated regression coefficient was found to be $\hat{\beta} = -499.8$, which from (8.14) implies that $\hat{\sigma}_g^2 = 249.9$ if c=0. This estimate of genetic variance does not agree closely with that of the ML method. However, inspection of the data revealed one extreme observation [$Y_j = (x_{1j} - x_{2j})^2 = 1296$ and $\hat{\pi}_{jm} = 0$], and since the regression procedure is very sensitive to such observations, this particular observation was eliminated and $\sigma_g^2$ was estimated again by both methods. The new estimates were found to be

ML estimate:           $\hat{\sigma}_a^2 = 104.428$,  $\hat{\sigma}_d^2 = 4.778$
Regression estimate:   $\hat{\sigma}_g^2 = 103.600$

Note that the resulting regression estimate of genetic variance is drastically reduced by elimination of one observation, while the ML estimate is only mildly affected. Elimination of this observation also reduced the nonparametric correlations (Spearman's Rho = -.266 and Kendall's Tau = -.204) to the point where they were barely significant at the .05 level.

To summarize briefly: we have found that the underlying variables being measured by the Lie and PaS scales of the MMPI have a significant genetic component. There is also evidence that a major trait gene responsible for PaS may be linked to the ABO locus.

CHAPTER XII
SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH

12.1 Summary

In this dissertation a paired observations model is given for the genetic analysis of quantitative traits. The model for the special case of twin pairs is discussed in detail, a model that should hold in all cases where we can assume negligible biases in the sampling of the twins and negligible effects due to non-random mating.
On the basis of this model we indicate what further assumptions are necessary in order to obtain unbiased estimates of genetic variance, environmental variance and environmental covariance. Methods are presented for estimating these parameters simultaneously from monozygotic and dizygotic twin data. Testing for the presence of genetic variance is considered, by both parametric and nonparametric means.

We then consider the model for the special case of sib pairs having $\pi$ of their genes identical by descent over the entire genome. A regression procedure is described that permits unbiased estimation of genetic variance when $\pi$ is known. Maximum likelihood estimation of genetic variance is also discussed and nonparametric procedures for testing for the presence of genetic variance are given. A method is then described for finding the maximum likelihood estimate of $\pi$ when its value is unknown.

We then deal with the problem of detecting linkage between a major quantitative trait locus and a marker locus from sib pair data. We allow for multiple allelism at the marker locus but, in view of the numerical difficulties that would be involved in practice, not at the trait locus. We also allow for the incorporation of data on the sibs' parents with regard to the marker locus. Several methods are discussed for detecting linkage, using the estimated proportion of genes two sibs have identical by descent at the marker locus. In addition, a maximum likelihood procedure is given that permits estimation of the recombination fraction between a trait and marker locus. We also give a simple maximum likelihood procedure for estimating the recombination fraction between two marker loci when both parental phenotypes at these loci are unknown.

Finally, Gottesman's (1966) Harvard Twin Study data are analyzed using these test procedures. The twin analyses give evidence that certain MMPI variables have significant genetic components, notably Lie and PaS. The sib pair analyses reveal that there may be a single locus, closely linked to the ABO blood group, that is responsible for a major part of the genetic variation on the PaS scale.

12.2 Suggestions for Further Research

The assumption of random mating was an important one for all test procedures described in this dissertation. Since an important consequence of assortative (nonrandom) mating is linkage disequilibrium, further research is needed on the problem of allowing for assortative mating and other situations in which linkage disequilibrium can occur. It may also be possible to overcome the numerical difficulties involved in generalizing the maximum likelihood analyses in this dissertation to an m-allele trait locus.

It is believed that the methods presented in this dissertation are more powerful and more general than those proposed so far, and it is hoped that these new test procedures will be used more extensively on further sets of data.

APPENDIX I

Suppose there are $N_M$ pairs of monozygotic twins and $N_D$ pairs of dizygotic twins. The regression model (4.10) can be written

$$E(\mathbf{Y}) = X\boldsymbol{\gamma}$$

where $\mathbf{Y}$ is an $(N_M+N_D) \times 1$ vector of squared pair differences; $X$ is an $(N_M+N_D) \times 2$ matrix whose first $N_M$ rows are $(1, 1)$ and whose next $N_D$ rows are $(1, \tfrac{1}{2})$; and $\boldsymbol{\gamma}$ is a $2 \times 1$ vector whose two elements are $\alpha$ and $\beta$. The normal equations may in general be written (e.g., Graybill, 1961)

$$X'X\hat{\boldsymbol{\gamma}} = X'\mathbf{Y}$$

which in the present case becomes

$$
\begin{pmatrix} N_M+N_D & N_M+\tfrac{1}{2}N_D \\ N_M+\tfrac{1}{2}N_D & N_M+\tfrac{1}{4}N_D \end{pmatrix}
\begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \end{pmatrix}
=
\begin{pmatrix} 2N_M M_W(MZ) + 2N_D M_W(DZ) \\ 2N_M M_W(MZ) + N_D M_W(DZ) \end{pmatrix}.
$$

Eliminating $\hat{\alpha}$ we have

[equation missing in source]

which reduces to

[equation missing in source]

or

[equation missing in source]

as required.

APPENDIX II

Consider the simple linear regression model (8.6) in which we regress the squared pair differences $Y_j$ on $\pi_j$, which is assumed to be known. Suppose that of the n sib pairs used in the analysis, $n_i$ (i=0,1,2) have $\tfrac{1}{2}i$ of their genes i.b.d. at the trait locus. Then in matrix notation we can write

$$E(\mathbf{Y}) = X\boldsymbol{\gamma}$$

where $\mathbf{Y}$ is an $n \times 1$ column vector whose elements are $Y_j$; $X$ is an $n \times 2$ matrix whose first $n_2$ rows are $(1, 1)$, whose next $n_1$ rows are $(1, \tfrac{1}{2})$, and whose last $n_0$ rows are $(1, 0)$. The least squares estimator of $\boldsymbol{\gamma}$, $\hat{\boldsymbol{\gamma}}$, can be obtained from the normal equations

$$X'X\hat{\boldsymbol{\gamma}} = X'\mathbf{Y}$$

so that

$$X'X\,E(\hat{\boldsymbol{\gamma}}) = X'E(\mathbf{Y}).$$

In this particular case, we have

$$X'X = \begin{pmatrix} n & n_2+\tfrac{1}{2}n_1 \\ n_2+\tfrac{1}{2}n_1 & n_2+\tfrac{1}{4}n_1 \end{pmatrix}.$$

Since $E(\hat{\boldsymbol{\gamma}})$ is a $2 \times 1$ vector whose two elements are $E(\hat{\alpha})$ and $E(\hat{\beta})$, we can find $E(\hat{\beta})$ by solving the following system of equations:

[equation missing in source]

Eliminating $E(\hat{\alpha})$ we see that

[equation missing in source]

which reduces to

[equation missing in source]

or

[equation missing in source]

BIBLIOGRAPHY

Bailey, N.T.J. Introduction to the Mathematical Theory of Genetic Linkage, Clarendon, Oxford, 1961.
Bernstein, F. "Zur Grundlegung der Chromosomentheorie der Vererbung beim Menschen mit besonderer Berücksichtigung der Blutgruppen." Z. indukt. Abstamm. u. VererbLehre, Vol. 57, 1931, pp. 113-138.
Block, J.B. "Hereditary Components in the Performance of Twins on the WAIS." in Progress in Human Behavior Genetics, ed. by S.G. Vandenberg. Johns Hopkins Press, Baltimore, 1968, pp. 221-228.
Bock, R.D. and S.G. Vandenberg. "Components of Heritable Variation in Mental Test Scores." in Progress in Human Behavior Genetics, ed. by S.G. Vandenberg. Johns Hopkins Press, Baltimore, 1968, pp. 233-260.
Brues, A.M. "Linkage of Body Build with Sex, Eye Color and Freckling." Am. J. Hum. Genet., Vol. 2, 1950, pp. 215-239.
Burks, B.S. "Review of Twins: a Study of Heredity and Environment." J. Abnorm. Soc. Psychol., Vol. 33, 1938, pp. 128-133.
Burks, B.S. "A Study of Identical Twins Reared Apart under Differing Types of Family Relationships." in Studies in Personality, ed. by J.F. Dashiell. McGraw-Hill, New York, 1942, pp. 35-69.
Burt, C. "The Genetic Determination of Differences in Intelligence: a Study of Monozygotic Twins Reared Together and Apart." Brit. J. Psychol., Vol.
57, pp. 137-153.
Cattell, R.B. "The Multiple Abstract Variance Analysis Equations and Solutions for Nature-Nurture Research on Continuous Variables." Psychol. Rev., Vol. 67, 1960, pp. 353-372.
Cattell, R.B. "The Interaction of Hereditary and Environmental Influences." Brit. J. Stat. Psychol., Vol. 16, 1963, pp. 191-210.
Clark, P.J. "The Heritability of Certain Anthropometric Characters as Ascertained from the Measurement of Twins." Am. J. Hum. Genet., Vol. 8, 1956, pp. 49-54.
Cockerham, C.C. "An Extension of the Concept of Partitioning Hereditary Variance for Analysis of Covariance among Relatives when Epistasis is Present." Genetics, Vol. 39, 1954, pp. 859-882.
Cotterman, C.W. A Calculus for Statistico-Genetics. Unpublished Ph.D. thesis, Ohio State University, 1940.
Cotterman, C.W. "Factor-union Phenotype Systems." in Computer Applications in Genetics, ed. by N.E. Morton. University of Hawaii Press, Honolulu, 1969, pp. 1-19.
Dahlberg, G. Twin Births and Twins from a Hereditary Point of View, Tidens, Stockholm, 1926.
Dahlstrom, W.G. and G.S. Welsh. An MMPI Handbook, University of Minnesota Press, Minneapolis, 1960.
Elston, R.C. and I.I. Gottesman. "The Analysis of Quantitative Inheritance Simultaneously from Twin and Family Data." Am. J. Hum. Genet., Vol. 20, 1968, pp. 512-521.
Elston, R.C. and E.B. Kaplan. Paper in preparation, 1970.
Falconer, D.S. Introduction to Quantitative Genetics, Oliver and Boyd, Edinburgh, 1960.
Finney, D.J. "The Detection of Linkage, VI." Ann. Eug., Vol. 11, 1942, pp. 233-244.
Fisher, R.A. "Correlation Between Relatives on the Supposition of Mendelian Inheritance." Trans. Roy. Soc. Edinburgh, Vol. 52, 1918, pp. 399-433.
Fisher, R.A. "The Detection of Linkage with Dominant Abnormalities." Ann. Eug., Vol. 6, 1935, pp. 187-201.
Fisher, R.A. "Limits to Intensive Production in Animals." Brit. Agric. Bull., Vol. 4, 1951, pp. 217-218.
Ford, E.B. "Polymorphism and Taxonomy." in The New Systematics, ed. by J.S. Huxley. Oxford University Press, London, 1940, pp. 493-513.
Fuller, J.L. and W.R. Thompson. Behavior Genetics, John Wiley and Sons, New York, 1960.
Galton, F. "The History of Twins as a Criterion of the Relative Powers of Nature and Nurture." Pop. Sci. Monthly, Vol. 8, 1875, pp. 345-357.
Gottesman, I.I. "Genetic Variance in Adaptive Personality Traits." J. Child Psychol. Psychiat., Vol. 7, 1966, pp. 199-208.
Gottschaldt, K. "Phänogenetische Fragestellungen im Bereich der Erbpsychologie." Z. indukt. Abstamm. u. VererbLehre, Vol. 76, 1939, pp. 118-157.
Graybill, F.A. An Introduction to Linear Statistical Models, McGraw-Hill, New York, 1961.
Haldane, J.B.S. "Methods for the Detection of Autosomal Linkage in Man." Ann. Eug., Vol. 6, 1934, pp. 26-65.
Haldane, J.B.S. and C.A.B. Smith. "A New Estimate of the Linkage Between the Genes for Color Blindness and Haemophilia in Man." Ann. Eug., Vol. 14, 1947, pp. 10-31.
Hancock, J. "Studies in Monozygotic Twins." New Zealand J. Sci. and Tech., Vol. 34A, 1952, pp. 131-152.
Harris, D.L. "Biometrical Genetics in Man." in Methods and Goals in Human Behavior Genetics, ed. by S.G. Vandenberg. Academic, New York, 1965, pp. 81-94.
Hayman, B.I. "Maximum Likelihood Estimation of the Genetic Components of Variance." Biometrics, Vol. 16, 1960, pp. 369-381.
Hogben, L.T. "The Detection of Linkage in Human Families." Proc. Royal Soc. B, Vol. 114, 1934, pp. 340-363.
Holzinger, K.J. "The Relative Effects of Nature and Nurture Influences on Twin Differences." J. Educ. Psychol., Vol. 20, 1929, pp. 241-248.
Howells, W.W. and A.P. Slowey. "Linkage Studies in Morphological Traits." Am. J. Hum. Genet., Vol. 8, 1956, pp. 154-161.
Kempthorne, O. "The Correlation Between Relatives in a Random Mating Population." Proc. Royal Soc. B, Vol. 143, 1954, pp. 103-113.
Kempthorne, O. An Introduction to Genetic Statistics, John Wiley and Sons, New York, 1957.
Kempthorne, O. and R.H. Osborne. "The Interpretation of Twin Data." Am. J. Hum. Genet., Vol. 13, 1961, pp. 320-339.
Kempthorne, O. and O.B. Tandon. "The Estimation of Heritability by Regression of Offspring on Parent." Biometrics, Vol. 9, 1953, pp. 90-100.
Kendall, M.G. Rank Correlation Methods, Charles Griffin, London, 1955.
Kendall, M.G. and A. Stuart. The Advanced Theory of Statistics, Vol. II, Hafner, New York, 1967.
Kloepfer, H.W. "An Investigation of 171 Possible Linkage Relationships in Man." Ann. Eug., Vol. 13, 1946, pp. 35-71.
Lenz, F. and O. von Verschuer. "Zur Bestimmung des Anteils von Erbanlage und Umwelt an der Variabilität." Archiv für Rassen- und Gesellschaftsbiologie, Vol. 20, 1928, pp. 425-428.
Le Roy, H.L. "The Interpretation of Calculated Heritability Coefficients." in Biometrical Genetics, ed. by O. Kempthorne. Pergamon Press, New York, pp. 107-116.
Li, C.C. Population Genetics, University of Chicago Press, Chicago, 1955.
Lindgren, B.W. Statistical Theory, MacMillan, New York, 1960.
Loehlin, J.C. "Some Methodological Problems in Cattell's Multiple Abstract Variance Analysis." Psychol. Rev., Vol. 72, 1965, pp. 156-161.
Lowry, D.C. and LT. Shultz. "Testing Association of Metric Traits and Marker Genes." Ann. Hum. Genet., Vol. 23, 1959, pp. 83-90.
Lush, J.L. Animal Breeding Plans, Iowa State University Press, Ames, Iowa, 1945.
Lush, J.L. "Heritability of Quantitative Characters in Farm Animals." in Proceedings of the Eighth International Congress of Genetics, Stockholm, July 7-14, 1948, ed. by G. Bonnier and R. Larsson. Berlingska Boktryckeriet, Lund, 1949, pp. 356-375.
McNemar, Q. "Special Review: Newman, Freeman and Holzinger's Twins." Psychol. Bull., Vol. 35, 1938, pp. 237-249.
Malecot, G. Les Mathématiques de l'Hérédité, Masson et Cie, Paris, 1948.
Mann, H.B. and D.R. Whitney. "On a Test of Whether one of two Random Variables is Stochastically Larger than the Other." Annals of Mathematical Statistics, Vol. 18, 1947, pp. 50-60.
Maynard-Smith, S., L.S. Penrose and C.A.B. Smith. Mathematical Tables for Research Workers in Human Genetics, J. and A. Churchill, London, 1961.
Morton, N.E. "Sequential Test for Detection of Linkage." Am. J. Hum. Genet., Vol. 7, 1955, pp. 277-318.
Morton, N.E. "The Detection and Estimation of Linkage Between the Genes for Elliptocytosis and the Rh Blood Type." Am. J. Hum. Genet., Vol. 8, 1956, pp. 80-96.
Morton, N.E. "Further Scoring Types in Sequential Linkage Tests with a Critical Review of Autosomal and Partial Sex-Linkage in Man." Am. J. Hum. Genet., Vol. 9, 1957, pp. 55-75.
Neel, J.V. and W.J. Schull. Human Heredity, University of Chicago Press, Chicago, 1954.
Newman, H.H., F.N. Freeman and K.J. Holzinger. Twins: A Study of Heredity and Environment, University of Chicago Press, Chicago, 1937.
Nichols, R.C. "The National Merit Twin Study." in Methods and Goals in Human Behavior Genetics, ed. by S.G. Vandenberg. Academic, New York, 1965, pp. 231-243.
Noether, G.E. Elements of Nonparametric Statistics, John Wiley and Sons, New York, 1967.
Ostlyngen, E. "Possibilities and Limitations of Twin Research as a Means of Solving Problems of Heredity and Environment." Acta Psychol., Vol. 6, 1949, pp. 59-90.
Owen, D.B. Handbook of Statistical Tables, Pergamon Press, London, 1962.
Parsons, P.A. The Genetic Analysis of Behavior, Methuen, London, 1967.
Penrose, L.S. "The Detection of Autosomal Linkage in Data which Consists of Pairs of Brothers and Sisters of Unspecified Parentage." Ann. Eug., Vol. 6, 1935, pp. 133-138.
Penrose, L.S. "Genetic Linkage in Graded Human Characters." Ann. Eug., Vol. 8, 1938, pp. 233-238.
Penrose, L.S. "Data for the Study of Linkage in Man: Red Hair and the ABO Locus." Ann. Eug., Vol. 15, 1950, pp. 243-247.
Penrose, L.S. "The General Purpose Sib-Pair Linkage Test." Ann. Eug., Vol. 18, 1953, pp. 120-124.
Price, B. "Primary Biases in Twin Studies." Am. J. Hum. Genet., Vol. 2, 1950, pp. 293-352.
Race, R.R. and R. Sanger. Blood Groups in Man, F.A. Davis Company, Philadelphia, 1968.
Renwick, J.H. "Progress in Mapping Human Autosomes." British Medical Bulletin, Vol. 25, 1969, pp. 65-73.
Roberts, R.C. "Some Concepts in Quantitative Genetics." in Behavior-Genetic Analysis, ed. by J. Hirsch. McGraw-Hill, New York, 1967, pp. 214-257.
Satterthwaite, F.E. "An Approximate Distribution of Estimates of Variance Components." Biometrics Bulletin, Vol. 2, 1946, pp. 110-114.
Siegel, S. Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill, New York, 1956.
Smith, C.A.B. "The Detection of Linkage in Human Genetics." J. Roy. Stat. Soc. B, Vol. 15, 1953, pp. 153-192.
Smith, C.A.B. "Some Comments on the Statistical Methods used in Linkage Investigations." Am. J. Hum. Genet., Vol. 11, 1959, pp. 289-304.
Steinberg, A.G. and N.E. Morton. "Sequential Test for Linkage Between Cystic Fibrosis of the Pancreas and the MNS Locus." Am. J. Hum. Genet., Vol. 8, 1956, pp. 177-189.
Stormont, C. "Research with Cattle Twins." in Statistics and Mathematics in Biology, ed. by O. Kempthorne and others. Iowa State University Press, Ames, Iowa, 1954, pp. 407-418.
Thoday, J.M. "New Insights into Continuous Variation." in Proceedings of the Third International Congress of Human Genetics, ed. by J.F. Crow and J.V. Neel. Johns Hopkins Press, Baltimore, 1967, pp. 339-350.
Vandenberg, S.G. "How 'Stable' are Heritability Estimates?" Amer. J. Phys. Anthrop., Vol. 20, 1962, pp. 331-338.
Vandenberg, S.G., R.E. Stafford and A.M. Brown. "The Louisville Twin Study." in Progress in Human Behavior Genetics, ed. by S.G. Vandenberg. Johns Hopkins Press, Baltimore, 1968, pp. 153-204.
Wiener, D.N. "The Subtle-Obvious Factor in Vocational and Educational Success." American Psychologist, Vol. 3, 1948, p. 299.
Wilde, K. "Mess- und Auswertungsmethoden in Erbpsychologischen Zwillingsuntersuchungen." Archiv fur Gesamte Psychologie, Vol. 109, 1941, pp. 1-81.
Yasuda, N. "An Extension of Wahlund's Principle to Evaluating Mating Type Frequency." Am. J. Hum. Genet., Vol. 20, 1968, pp. 1-23.