Physiol Genomics 19: 262–269, 2004; doi:10.1152/physiolgenomics.00052.2004.

A model for estimating joint maternal-offspring effects on seed development in autogamous plants

Li Zhang,1 Mark C. K. Yang,1 Xuelu Wang,2 Brian A. Larkins,2 Maria Gallo-Meagher,3 and Rongling Wu1

1Department of Statistics, 2Agronomy Department, University of Florida, Gainesville, Florida; and 3Department of Plant Sciences, University of Arizona, Tucson, Arizona

Submitted 3 March 2004; accepted in final form 6 August 2004

We present a statistical model for testing and estimating the effects of maternal-offspring genome interaction on embryo and endosperm traits during seed development in autogamous plants. Our model is constructed within a maximum likelihood framework implemented with the EM algorithm. Extensive simulations were performed to investigate the statistical properties of our approach. We identified a quantitative trait locus that exerts a significant maternal-offspring interaction effect on the amino acid content of the endosperm in maize, demonstrating the power of our approach. This approach will be broadly useful for mapping endosperm traits in many agriculturally important crop plants, and it also makes it possible to study the genetic significance of double fertilization in the evolution of higher plants.

Keywords: autogamous plants; EM algorithm; linkage; maternal-offspring interaction; quantitative trait loci

Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: R. Wu, Dept. of Statistics, 533 McCarty Hall C, Univ. of Florida, Gainesville, FL 32611.
1094-8341/04 $5.00 Copyright © 2004 the American Physiological Society

Higher plants are characterized by a complex life cycle that consists of alternating haploid and diploid generations. The diploid plant life form, called the "sporophyte," undergoes meiosis to produce the haploid male and female spores that initiate the gametophytic generation. The sporophyte also nurtures the reproductive structures, such as the integuments within which the embryo develops (4). Gametogenesis and fertilization take place in an environment where gametophytic and sporophytic structures interact and are placed under several layers of haploid and diploid genetic control (4). This interaction culminates in the formation of a new diploid generation through a complex process called "double fertilization" (12).

Following meiosis, three of the four megaspores degenerate, and the surviving megaspore produces the female gametophyte (embryo sac), which typically contains eight nuclei and seven cells. Two of these cells are female gametes: the haploid egg cell and the homodiploid central cell. The pollen grain (male gametophyte) produces a tip-growing pollen tube that grows from the stigma, enters the ovule through the (sporophytic) micropyle, and delivers two sperm cells into the embryo sac. Fusion of the two sperm cells with the egg and the central cell produces two zygotic products: the diploid zygote, which develops into the daughter plant, and a triploid cell, which develops into the endosperm with a balance of maternal and paternal genomes of 2m:1p (3, 5).

Higher plant reproduction is thus characterized by five developmental phases: the diploid sporophyte, the haploid female gametophyte, the haploid male gametophyte, the developing diploid embryo, and the developing triploid endosperm. The development of the embryo sac and the seed is under the control of both the sporophyte and the female gametophyte. Paternal gametophytic and postfertilization sporophytic controls add further levels to the complex genetic interactions that govern seed development.

Recent genetic studies have identified different classes of maternal-effect genes involved in seed development. These include genes required in the sporophyte for proper development of the embryo sac (14), genes required in the (maternal) sporophyte for normal embryo development (7), and genes required in the female gametophyte for proper embryo development (20). More recently, Evans and Kermicle (10) isolated a maize mutant with effects on postfertilization development. By performing quantitative genetic analysis of different generations initiated with inbred lines, Dilkes et al. (9) detected significant evidence of sporophytic gene control over endoreduplication in maize endosperm. Molecular markers, used in conjunction with segregating plant pedigrees, are expected to detect maternal-effect genes affecting embryo and endosperm development in higher plants with greater power and precision.

The current molecular dissection of the endosperm is mostly based on the assumption that endosperm-specific traits are controlled only by genes from the maternal sporophyte (22, 23). Under this assumption, the traditional interval mapping method for diploid tissues (15) can be used directly. Wu et al. (29) proposed an improved statistical model for dissecting endosperm traits that takes the trisomic inheritance of the endosperm into account. The traditional interval mapping method may be appropriate for unraveling the genetic basis of early seed development, because at this stage the seed's own genome has not yet played a role; for example, a recent study suggested that a large part of the paternal genome is silenced during early seed development (21). However, for agriculturally important mature seed traits, which are increasingly controlled by the seed's own genome (17), the triploid model of Wu et al. (29) should be biologically more relevant. In an analysis of mature maize endosperm (30), the model of Wu et al. detected more significant quantitative trait loci (QTL) than the method of Lander and Botstein (15).

Table 1. Genetic values for the embryo and endosperm traits as affected by both the maternal and offspring genomes in the F2 population of an autogamous plant

Maternal (t) | Embryo (t + 1) | Symbol | Value | Endosperm (t + 1) | Symbol | Value
QQ(t) | QQ(t + 1) | $\mu_1$ | $\mu + 2a$ | QQQ(t + 1) | $\mu_1$ | $\mu + \tfrac{5}{2}a$
Qq(t) | QQ(t + 1) | $\mu_2$ | $\mu + a + \delta_1$ | QQQ(t + 1) | $\mu_2$ | $\mu + \tfrac{3}{2}a + \delta_1$
Qq(t) | Qq(t + 1) | $\mu_3$ | $\mu + \delta_2$ | QQq(t + 1) | $\mu_3$ | $\mu + \tfrac{1}{2}a + \delta_2$
Qq(t) | qQ(t + 1) | $\mu_3$ | $\mu + \delta_2$ | Qqq(t + 1) | $\mu_4$ | $\mu - \tfrac{1}{2}a + \delta_3$
Qq(t) | qq(t + 1) | $\mu_4$ | $\mu - a + \delta_3$ | qqq(t + 1) | $\mu_5$ | $\mu - \tfrac{3}{2}a + \delta_4$
qq(t) | qq(t + 1) | $\mu_5$ | $\mu - 2a$ | qqq(t + 1) | $\mu_6$ | $\mu - \tfrac{5}{2}a$

Because seed development is under the control of both the sporophytic (maternal) genome and the seed's own (offspring) genome, joint maternal-offspring effects should be modeled to capture the control mechanisms influencing seed development. In this article, we develop a new statistical model for mapping seed-specific QTL expressed in both the sporophytic and offspring genomes. Our model is based on a statistical mixture model, consisting of the quantitative genetic parameters contained in each normal density and the proportion of each genome-of-origin-specific QTL genotype. Maximum likelihood implemented with the EM algorithm (8) is used to estimate the QTL effect and position parameters.
An extensive simulation study is used to examine the statistical behavior of our mapping model.

THE GENETIC MODEL

Seed development in angiosperms involves two major components, the embryo and the endosperm. These two tissues have different ploidy levels and are formed through different inheritance mechanisms, so we consider their genetic models separately.

The embryo model. For a QTL with two alleles (designated Q and q) affecting a seed trait, a tissue of diploid origin can have one of three genotypes, QQ, Qq, and qq. Because a seed-specific trait is under the control of both the sporophytic maternal genome and the offspring genome, its genetic value should be described as a joint effect of the two genomes. More specifically, modeling the overall genotypic value of an embryo trait requires following gene transmission from the sporophyte (generation t) to its zygotic offspring (generation t + 1). For an autogamous species, the sporophytic genotype QQ(t) generates a single embryo genotype, QQ(t + 1); the sporophytic genotype Qq(t) generates three embryo genotypes, QQ(t + 1), Qq(t + 1), and qq(t + 1), with respective probabilities of 1/4, 1/2, and 1/4; and the sporophytic genotype qq(t) generates a single embryo genotype, qq(t + 1). Our quantitative genetic model for seed development is built on the combinations of two-generation (maternal and offspring) genotypes at the putative QTL.

Let $a$ and $d$ be the additive and dominant effects of the QTL, respectively. The genotypic values of the three genotypes QQ, Qq, and qq can then be specified as $\mu + a$, $\mu + d$, and $\mu - a$, where $\mu$ is the overall mean. For a joint maternal-offspring QTL homozygote, only the additive effects are involved; for example, the genotypic values of QQ(t)QQ(t + 1) and qq(t)qq(t + 1) can be denoted by $\mu + 2a$ and $\mu - 2a$, respectively.
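The transmission rules above, together with the endosperm classes listed in Table 1, can be enumerated mechanically. The following Python sketch is our illustration, not the authors' code; the function and constant names are hypothetical. It assumes that both polar nuclei of the central cell carry the same allele as the egg, so a selfed endosperm receives two copies of the maternal gamete and one copy of the sperm. Reciprocal heterozygotes such as Qq and qQ are merged, as they share $\mu_3$ in the embryo model.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Genotype frequencies of the maternal F2 plants (generation t).
F2_FREQS = {"QQ": Fraction(1, 4), "Qq": Fraction(1, 2), "qq": Fraction(1, 4)}


def canonical(alleles):
    """Write a genotype with Q alleles first, e.g. 'qQ' -> 'Qq', 'qqQ' -> 'Qqq'."""
    return "".join(sorted(alleles))  # 'Q' sorts before 'q' in ASCII


def joint_frequencies(tissue):
    """Frequencies of joint maternal(t)-offspring(t+1) QTL genotypes under selfing.

    Each F2 mother contributes an egg and (by selfing) a sperm, each allele drawn
    with probability 1/2. The embryo is egg + sperm; the endosperm is modeled as
    two copies of the egg allele (assumed identical polar nuclei) + sperm.
    """
    freqs = Counter()
    for mother, p_mother in F2_FREQS.items():
        for egg, sperm in product(mother, repeat=2):  # 4 equally likely pairs
            if tissue == "embryo":
                child = canonical(egg + sperm)
            elif tissue == "endosperm":
                child = canonical(2 * egg + sperm)
            else:
                raise ValueError("tissue must be 'embryo' or 'endosperm'")
            freqs[(mother, child)] += p_mother * Fraction(1, 4)
    return dict(freqs)


if __name__ == "__main__":
    for tissue in ("embryo", "endosperm"):
        for (mother, child), p in sorted(joint_frequencies(tissue).items()):
            print(f"{tissue}: {mother}(t){child}(t+1)  {p}")
```

The embryo enumeration yields five joint classes with frequencies 1/4, 1/8, 1/4, 1/8, and 1/4, and the endosperm enumeration yields the six classes of Table 1.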
For the joint maternal-offspring QTL heterozygotes Qq(t)QQ(t + 1), Qq(t)Qq(t + 1), and Qq(t)qq(t + 1), we need to model both the additive and dominant effects; their genotypic values are expressed as $\mu + a + \delta_1$, $\mu + \delta_2$, and $\mu - a + \delta_3$ (Table 1). The dominant effect of a joint heterozygote ($\delta_1$, $\delta_2$, or $\delta_3$) can be partitioned into two components, due to the intralocus interaction within generations ($d$) and between generations. For the diploid embryo, we denote by $b_1$ and $b_2$ the between-generation dominant effects between the maternal heterozygote and an offspring homozygote and between the maternal heterozygote and the offspring heterozygote, respectively. We thus have

$$
\begin{cases}
\delta_1 = d + b_1\\
\delta_2 = 2d + b_2\\
\delta_3 = d - b_1
\end{cases}
\tag{1}
$$

The compositions of the different joint maternal-offspring QTL genotypes for the embryo are given in Table 1.

The endosperm model. The three F2 sporophytic QTL genotypes are self-crossed to form different endosperm genotypes: QQ(t) gives QQQ(t + 1); Qq(t) gives QQQ(t + 1), QQq(t + 1), Qqq(t + 1), and qqq(t + 1), each with a probability of 1/4; and qq(t) gives qqq(t + 1). For the triploid endosperm, the within-generation dominant effect can arise from interactions between different numbers of dominant vs. recessive alleles. Let $d_1$ and $d_2$ be the dominant effects of two Q vs. one q (QQq) and of one Q vs. two q (Qqq), respectively. The four endosperm QTL genotypes QQQ, QQq, Qqq, and qqq can then be modeled as $\mu + \tfrac{3}{2}a$, $\mu + \tfrac{1}{2}a + d_1$, $\mu - \tfrac{1}{2}a + d_2$, and $\mu - \tfrac{3}{2}a$. We use $b_1$ to denote the between-generation dominant effect between the maternal heterozygote and an offspring homozygote, and $b_2$ and $b_3$ to denote the between-generation dominant effects between the maternal heterozygote and the offspring heterozygotes QQq and Qqq, respectively.
Thus the dominant effects of the joint maternal-offspring endosperm heterozygotes Qq(t)QQQ(t + 1), Qq(t)QQq(t + 1), Qq(t)Qqq(t + 1), and Qq(t)qqq(t + 1) are expressed, respectively, as

$$
\begin{cases}
\delta_1 = d + b_1\\
\delta_2 = d + d_1 + b_2\\
\delta_3 = d + d_2 + b_3\\
\delta_4 = d - b_1
\end{cases}
\tag{2}
$$

See Table 1 for the compositions of the different joint maternal-offspring QTL genotypes for the endosperm.

EXPERIMENTAL DESIGN

For tissues of ploidy level >2, genotypic characterization using molecular markers can be difficult. For this reason, the triploid endosperm is generally not genotyped in endosperm trait mapping. Marker genotypes can instead be derived from two different tissues, the sporophyte (generation t) and the embryo (generation t + 1), which establishes a two-stage hierarchical design for genotyping. Suppose there is an F2 population of size $m$, initiated with two contrasting inbred lines. A number of molecular markers (denoted by M) are genotyped both for the F2 plants and for the embryos from their seeds. Let $n_i$ denote the number of seeds collected from F2 plant $i$. From these sampled seeds, various phenotypes of interest are measured for the diploid embryo and the triploid endosperm.

The conditional probabilities of the joint QTL genotypes of the maternal plant and offspring, given the genotypes of two flanking markers M and M + 1 (separated by a recombination fraction $r$) in the sporophytic plant and its embryo, can be derived from the gene transmission patterns of the different F2 genotypes. We use $r_1$ and $r_2$ to denote the recombination fractions between the left marker M and the QTL and between the QTL and the right marker M + 1, respectively. A general expression of the conditional probabilities is

$$
\omega_{ijk} = \Pr\left[\,g_k(t+1) \mid G_i(t),\; G_{ij}(t+1),\; M\text{–}(M+1),\; r_1,\; r_2\right],\qquad i = 1,\ldots,m;\; j = 1,\ldots,n_i;\; k = 1,\ldots,L
\tag{3}
$$

where $g_k(t+1)$ is the joint maternal-offspring QTL genotype for an embryo ($L = 5$) or an endosperm ($L = 6$) (Table 1), $G_i(t)$ is the marker genotype of F2 plant $i$ (generation t), and $G_{ij}(t+1)$ is the embryo marker genotype of the $j$th seed (generation t + 1) produced by sporophytic plant $i$. The table in Fig. 1 gives, for the embryo, the joint probabilities from which the conditional probabilities of Eq. 3 are obtained. A similar conditional probability matrix can be derived for the endosperm.

THE STATISTICAL MODEL

The mixture model. The basic statistical model for mapping QTL is a previously developed mixture model (15, 28). In the mixture model, each observation $y$ is assumed to have arisen from one of $L$ components, each modeled by a density from the parametric family $f$. In this study, the phenotype $y_{ij}$ of the $j$th seed of the $i$th F2 plant is assumed to be determined by one of the $L$ joint maternal-offspring QTL genotypes plus a random error, with the likelihood expressed as the mixture

$$
l(y_{ij} \mid \omega_{ij}, \mu, \sigma^2) = \omega_{ij1} f_1(y_{ij}; \mu_1, \sigma^2) + \cdots + \omega_{ijL} f_L(y_{ij}; \mu_L, \sigma^2)
\tag{4}
$$

where $\omega_{ij} = (\omega_{ij1}, \ldots, \omega_{ijL})^T$ are the mixture proportions, specified by the conditional probabilities of the joint maternal-offspring QTL genotypes given the two-stage hierarchical marker genotype of the $j$th seed from the $i$th F2 plant; $\mu = (\mu_1, \ldots, \mu_L)^T$ are the expected genotypic values of the different QTL genotypes; and $\sigma^2$ is the residual variance within each QTL genotype.

Fig. 1. Joint probabilities of maternal-offspring QTL genotypes and marker genotypes in an F2 population (conditional probabilities of QTL genotypes given marker genotypes can be derived by Bayes' theorem).
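Eq. 4 combines the genotypic means of Table 1 with the mixture proportions $\omega_{ijk}$. A minimal Python sketch for the embryo case follows; it assumes the $\omega$ values for each seed have already been obtained from the Fig. 1 table (their derivation from flanking markers is not reproduced), and all names are hypothetical. It also computes the genetic variance over the five joint genotypes at their selfing frequencies (1/4, 1/8, 1/4, 1/8, 1/4) and the residual variance that yields a target heritability $H^2 = V_g/(V_g + \sigma^2)$. That heritability definition is our assumption, although with the effect values of Table 2 it reproduces the $\sigma^2$ of 14.42 and 2.40 used there.

```python
import math

# Selfing frequencies of QQ(t)QQ, Qq(t)QQ, Qq(t)Qq, Qq(t)qq, qq(t)qq embryos.
EMBRYO_FREQS = [0.25, 0.125, 0.25, 0.125, 0.25]


def embryo_means(mu, a, d, b1, b2):
    """Means mu_1..mu_5 of the joint maternal-offspring embryo genotypes (Table 1)."""
    delta1, delta2, delta3 = d + b1, 2 * d + b2, d - b1  # Eq. 1
    return [mu + 2 * a, mu + a + delta1, mu + delta2, mu - a + delta3, mu - 2 * a]


def genetic_variance(means, freqs=EMBRYO_FREQS):
    """Variance of the genotypic means weighted by genotype frequency."""
    mean_g = sum(f * m for f, m in zip(freqs, means))
    return sum(f * (m - mean_g) ** 2 for f, m in zip(freqs, means))


def residual_variance_for(h2, means, freqs=EMBRYO_FREQS):
    """sigma^2 such that Vg / (Vg + sigma^2) equals the target heritability h2."""
    vg = genetic_variance(means, freqs)
    return vg * (1 - h2) / h2


def normal_pdf(y, mean, var):
    """Normal density f_k(y; mu_k, sigma^2)."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def log_likelihood(ys, omegas, means, var):
    """Sum over seeds of log(sum_k omega_ijk * f_k(y_ij)), i.e. the log of Eq. 4 products."""
    return sum(
        math.log(sum(w * normal_pdf(y, m, var) for w, m in zip(omega, means)))
        for y, omega in zip(ys, omegas)
    )


if __name__ == "__main__":
    m = embryo_means(14.0, 1.0, 0.8, 0.7, 0.7)  # effect values from Table 2
    print(m)
    print(round(residual_variance_for(0.2, m), 2), round(residual_variance_for(0.6, m), 2))
```

A seed whose joint genotype is known with certainty (a one-hot $\omega_{ij}$) contributes a single normal log-density, which is a convenient sanity check on any implementation.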
The EM algorithm. Under the two-stage hierarchical genotyping design, the likelihood of all observations is

$$
l(y \mid \omega, \mu, \sigma^2) = \prod_{i=1}^{m}\prod_{j=1}^{n_i} l(y_{ij} \mid \omega_{ij}, \mu, \sigma^2) = \prod_{i=1}^{m}\prod_{j=1}^{n_i}\left[\sum_{k=1}^{L}\omega_{ijk}\, f_k(y_{ij})\right]
\tag{5}
$$

We have formulated a procedure for implementing the EM algorithm to obtain the maximum likelihood estimates (MLEs) of the unknown parameters, including the QTL effects and residual variance ($\mu_k$, $\sigma^2$) and the QTL position ($r_1$) contained within $\omega_{ijk}$ (table in Fig. 1). The algorithm is as follows. In the E step, the conditional probabilities (priors) of the QTL genotypes given the marker genotypes and the normal density are used to calculate

$$
\Pi_{ijk} = \frac{\omega_{ijk}\, f_k(y_{ij})}{\sum_{k'=1}^{L}\omega_{ijk'}\, f_{k'}(y_{ij})}
\tag{6}
$$

which can be thought of as the posterior probability that the $j$th seed of the $i$th F2 plant has the $k$th joint maternal-offspring QTL genotype. In the M step, the posterior probabilities are used to update the unknown parameters:

$$
\hat{\mu}_k = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i} y_{ij}\,\Pi_{ijk}}{\sum_{i=1}^{m}\sum_{j=1}^{n_i}\Pi_{ijk}}
\tag{7}
$$

$$
\hat{\sigma}^2 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}\sum_{k=1}^{L}\left(y_{ij}-\hat{\mu}_k\right)^2\Pi_{ijk}}{\sum_{i=1}^{m} n_i}
\tag{8}
$$

Iterations are repeated between Eqs. 6–8 until convergence; the values at convergence are the MLEs. From the MLEs of the $\mu_k$, the MLEs of the overall mean, the additive effect, and the within- and between-generation dominant effects of the QTL (Table 1) can be obtained by solving a system of linear equations.

It should be pointed out that separating the within-generation from the between-generation dominant effects for the endosperm faces two difficulties. First, the endosperm model is overparameterized, because six unknown dominant parameters (on the right-hand side of Eq. 2) are contained within the estimated genotypic means of only four joint maternal-offspring heterozygotes ($\mu_2, \ldots, \mu_5$; Table 1). Second, $\mu_3$ and $\mu_4$ are indistinguishable, because the conditional probabilities of the corresponding QTL genotypes Qq(t)QQq(t + 1) and Qq(t)Qqq(t + 1) given the marker genotypes are identical (results not shown).

The QTL position can be estimated with a grid approach, which treats $r_1$ (or $r_2$) as a known parameter in the likelihood function (Eq. 4) while scanning the QTL over all marker intervals. The position at which the log-likelihood ratio reaches its maximum across a linkage group is the MLE of the QTL position.

Hypothesis tests. A number of hypothesis tests can be formulated for the proposed seed model. The first concerns the existence of any QTL affecting the expression of an embryo or endosperm trait. For the embryo model, for example, the hypotheses are

$$
\begin{cases}
H_0: a = d = b_1 = b_2 = 0\\
H_1: \text{at least one of the effects is not equal to zero}
\end{cases}
\tag{9}
$$

The test statistic is the log-likelihood ratio of the full model ($H_1$) over the reduced model ($H_0$),

$$
\mathrm{LR} = -2\left[\log l_0\!\left(a = d = b_1 = b_2 = 0, \tilde{\mu}, \tilde{\sigma}^2\right) - \log l_1\!\left(\hat{\mu}, \hat{a}, \hat{d}, \hat{b}_1, \hat{b}_2, \hat{\sigma}^2\right)\right]
\tag{10}
$$

where the tilde and the caret denote the MLEs of the unknown parameters under $H_0$ and $H_1$, respectively. The LR is asymptotically $\chi^2$-distributed with 4 degrees of freedom. However, the critical threshold for declaring the existence of a QTL is generally calculated from permutation tests (6).

After a significant QTL is found, specific components of the genotypic values can be tested. For example, the maternal-offspring intralocus interaction effect on the embryo trait can be tested with the hypotheses

$$
\begin{cases}
H_0: b_1 = b_2 = 0\\
H_1: \text{at least one of the effects is not equal to zero}
\end{cases}
\tag{11}
$$

whose log-likelihood ratio test statistic is asymptotically $\chi^2$-distributed with 2 degrees of freedom. Testing $b_1 = 0$ and $b_2 = 0$ is equivalent to testing $\mu_1 - \mu_5 = 2(\mu_2 - \mu_4)$ and $\mu_1 + \mu_5 = 2(\mu_2 + \mu_4 - \mu_3)$, respectively. Similar hypotheses can be formulated to test whether a QTL affects an endosperm trait and whether there is a significant intralocus interaction between the maternal heterozygote and the offspring homozygote ($b_1$; Table 1); the latter test can be performed under the constraint $5(\mu_2 - \mu_5) = 3(\mu_1 - \mu_6)$. One can also test whether one or both of the sums $d_1 + b_2$ and $d_2 + b_3$ (Table 1) differ significantly from zero, but $d_1$ cannot be separated from $b_2$, nor $d_2$ from $b_3$, unless particular constraints are imposed. The critical thresholds for all of these hypotheses can be obtained from simulation studies.

RESULTS

Monte Carlo simulation. We performed a series of simulation experiments to examine the statistical properties of the proposed method for mapping seed development. A linkage group of 180 cM with 10 equidistant markers, ordered M1, …, M10, is simulated for an F2 population. A QTL affecting an embryo trait is placed 5 cM from the left marker of the third interval, i.e., 45 cM from the first marker of the linkage group. As our approach requires, we simulate two-stage hierarchical marker genotypes for the F2 individuals (generation t) and their autogamous progeny (generation t + 1). The autogamous embryos derived from the F2 fall into five joint maternal-offspring QTL genotypes, QQ(t)QQ(t + 1), Qq(t)QQ(t + 1), Qq(t)Qq(t + 1), Qq(t)qq(t + 1), and qq(t)qq(t + 1), with frequencies of 1/4, 1/8, 1/4, 1/8, and 1/4, respectively.
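The two-stage marker hierarchy (an F2 mother, then her selfed embryos) can be simulated directly from haplotypes. The Python sketch below is one way to do this, not the authors' simulator, and its names are hypothetical. It starts from a coupling-phase F1 of two inbred lines, converts the 20-cM marker spacing into recombination fractions with the Kosambi map function $r = \tfrac{1}{2}\tanh(2x)$ ($x$ in Morgans), and, as a simplifying assumption, draws crossovers in adjacent intervals independently.

```python
import math
import random


def kosambi_r(distance_cM):
    """Kosambi map function: r = 0.5 * tanh(2x), with map distance x in Morgans."""
    return 0.5 * math.tanh(2.0 * distance_cM / 100.0)


def meiosis(haplotypes, rs, rng):
    """One gamete from a diploid carrying two haplotypes (tuples of marker alleles).

    rs[l] is the recombination fraction between marker l and marker l + 1;
    crossovers in different intervals are drawn independently (an assumption).
    """
    strand = rng.randrange(2)
    gamete = [haplotypes[strand][0]]
    for locus, r in enumerate(rs, start=1):
        if rng.random() < r:
            strand = 1 - strand  # recombination in this interval
        gamete.append(haplotypes[strand][locus])
    return tuple(gamete)


def simulate_family(rs, n_seeds, rng):
    """Upper level: one F2 plant (generation t); lower level: n_seeds selfed embryos (t + 1)."""
    n_markers = len(rs) + 1
    f1 = (("M",) * n_markers, ("m",) * n_markers)      # F1 of two inbred lines
    f2 = (meiosis(f1, rs, rng), meiosis(f1, rs, rng))  # maternal F2 plant
    embryos = [(meiosis(f2, rs, rng), meiosis(f2, rs, rng)) for _ in range(n_seeds)]
    return f2, embryos


def genotype(diploid, locus):
    """Unordered marker genotype at one locus, e.g. 'Mm'."""
    return "".join(sorted(diploid[0][locus] + diploid[1][locus]))


if __name__ == "__main__":
    rng = random.Random(2004)
    rs = [kosambi_r(20.0)] * 9  # 10 equidistant markers on a 180-cM group
    families = [simulate_family(rs, 40, rng) for _ in range(10)]  # 10 x 40 design
    mother, seeds = families[0]
    print(genotype(mother, 0), [genotype(s, 0) for s in seeds[:5]])
```

Tabulating `genotype` for a marker in the mother and in each of her embryos gives the five joint maternal-offspring marker classes; pairs of markers give the 25 joint classes described in the simulation design.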
The genetic variance due to this QTL is calculated from the assumed genetic effect values (Table 2). With a total sample size of 400, our simulation scenarios include either a few families (10), each of large size (40), or many families (40), each of small size (10). These two sampling strategies (10 × 40 and 40 × 10) are combined with two heritability levels ($H^2$ = 0.2 vs. 0.6).

Our simulation of the marker genotypes follows a two-stage hierarchy. The upper level of the hierarchy is the F2 genotype, whereas the lower level is the autogamous embryos from the F2. Consider the first marker, which has three genotypes, M1M1(t), M1m1(t), and m1m1(t), with probabilities of 1/4, 1/2, and 1/4 at the upper level. Genotype M1M1(t) is self-pollinated to produce a single genotype M1M1(t + 1); genotype M1m1(t) produces three genotypes, M1M1(t + 1) (1/4), M1m1(t + 1) (1/2), and m1m1(t + 1) (1/4); and m1m1(t) produces a single genotype m1m1(t + 1) at the lower level. Similarly, the second marker has three genotypes at the upper level, each of which is self-pollinated to produce the corresponding genotypes at the lower level. At the upper level, the three genotypes of the first marker combine with the three genotypes of the second marker to form nine two-locus genotypes, each with a probability that is a function of the recombination fraction between the two markers. Because genotypes differ at the upper level, each marker has five joint maternal-offspring genotypes, so a pair of markers produces 25 such joint genotypes at the lower level. The probability of each joint genotype depends on the probability of the corresponding two-locus genotype at the upper level and on the Mendelian segregation ratios of any heterozygote. This simulation strategy is extended to all markers. We use the Kosambi map function to convert map distances into recombination fractions.

A QTL is declared to exist when the log-likelihood ratio test statistic exceeds a critical threshold that controls the chromosome-wide type I error rate. Determining this threshold is difficult. Simulation is regarded as a useful approach for calculating the threshold, because it does not depend on the distribution of the test statistic. We simulate marker genotype and phenotype data under the null hypothesis that there is no QTL and analyze the simulated data with the proposed model. The distribution of the maximum log-likelihood ratio values over 1,000 simulation replicates can be approximated by a $\chi^2$ distribution. The 99th percentile of the distribution of the maximum is used as the empirical critical value for declaring the existence of a QTL on the linkage group at the significance level α = 0.01.

Figures 2 and 3 illustrate the profiles of the log-likelihood ratio test statistics across the simulated linkage group under the different sampling strategies and heritability levels. In all situations, the QTL is detected, because the peaks of the profiles exceed the critical threshold. Increasing the heritability from 0.2 to 0.6 increases the power to detect the QTL. The two sampling strategies appear to provide similar power and accuracy for detecting the QTL position (Figs. 2 and 3).

Fig. 2. Profile of the log-likelihood ratio (LR) test statistic as a function of genome position for a simulated linkage group under sampling strategy 10 × 40 at heritability levels of 0.2 (broken curve) and 0.6 (solid curve). The horizontal line is the critical threshold value at significance level α = 0.05. The vertical solid line marks the genomic position at which the LR is maximal. The true position of the QTL on the linkage group is indicated by a triangle.

In general, our model provides reliable estimates of the QTL effects, including the additive, within-generation dominant, and between-generation dominant effects (Table 2). As expected, the additive effect and the residual variance are estimated better than the dominant effects. The between-generation dominant effect of the maternal heterozygote with an offspring homozygote ($b_1$) is estimated more precisely than that of the maternal heterozygote with the offspring heterozygote ($b_2$). The MLEs of the dominant effects have large sampling errors over the 100 simulation replicates, but these errors are reduced when the heritability is increased or when sampling strategy 40 × 10 is used (Table 2); increasing the heritability from 0.2 to 0.6 markedly reduces the sampling errors of the dominant effect estimates.

Table 2. Averaged MLEs of the QTL position and effects among 100 replicated simulations under different heritability ($H^2$) levels and sampling strategies (family number × family size)

Parameter (true value) | $H^2$ = 0.2 ($\sigma^2$ = 14.42), 10 × 40 | $H^2$ = 0.2, 40 × 10 | $H^2$ = 0.6 ($\sigma^2$ = 2.40), 10 × 40 | $H^2$ = 0.6, 40 × 10
Position (45 cM) | 44.92 (2.8920) | 44.70 (2.3866) | 44.76 (1.0442) | 45.02 (1.0445)
$\mu$ = 14 | 13.84 (0.8735) | 14.04 (0.4765) | 14.06 (0.6933) | 13.99 (0.1672)
$a$ = 1.0 | 1.04 (0.8590) | 0.99 (0.4746) | 0.94 (0.6932) | 1.00 (0.1672)
$d$ = 0.8 | 0.97 (1.0035) | 0.84 (0.6783) | 0.76 (0.7608) | 0.83 (0.2862)
$b_1$ = 0.7 | 0.59 (0.7714) | 0.69 (0.5475) | 0.74 (0.3918) | 0.66 (0.2442)
$b_2$ = 0.7 | 0.47 (1.5725) | 0.55 (1.3159) | 0.70 (0.8587) | 0.65 (0.5519)
$\sigma^2$ | 14.11 (1.2995) | 14.25 (1.1760) | 2.38 (0.2175) | 2.38 (0.1889)

Values are maximum likelihood estimates (MLEs); the numbers in parentheses are the square roots of the mean square errors of the MLEs.

We performed a hypothesis test for the significance of the between-generation dominant effects based on the hypotheses of Eq. 11. For the data sets simulated under the conditions given in Table 2, we computed the log-likelihood ratios for Eq. 11 and rejected the null hypothesis $b_1 = b_2 = 0$ in all 100 simulations for every heritability level and sampling strategy. This suggests that our model has high power to detect maternal-offspring dominant effects on an embryo trait.

Similar simulation designs were used to study the statistical properties of the endosperm model. Because the endosperm model is overparameterized and the genotypic means of two heterozygous QTL genotypes are indistinguishable, we cannot estimate all of the within-generation and between-generation dominant effects defined in Eq. 2 and Table 1. However, hypothesis tests can be made for some of the dominant effects or for their sums. In practice, we may simplify the endosperm model and reduce the number of dominant effect parameters. For example, letting $d_1 + b_2 = e_1$ and $d_2 + b_3 = e_2$ makes the number of unknown parameters equal to the number of equations, which makes the model more tractable. However, because the estimated genotypic values of the joint maternal-offspring genotypes Qq(t)QQq(t + 1) and Qq(t)Qqq(t + 1) are unidentifiable, $e_1$ and $e_2$ still cannot be uniquely estimated. We therefore simulated the endosperm data assuming that $e_1$ and $e_2$ are distinguishable. The simulation results suggest that the estimation precision of the endosperm parameters (Eq. 2 and Table 1) is broadly consistent with that of the embryo model.

We performed an additional simulation to test the sensitivity of our model to false positives. A data set of two-stage maternal-offspring marker genotypes and offspring phenotypes was simulated under the assumption that there is no maternal effect. This data set was then analyzed with our (full) model, which incorporates the maternal-offspring interaction effects, and with a (reduced) model with no such effects.
The full model detected no significant maternal or maternal-offspring interaction effects in these data, although both the full and the reduced models detected the offspring QTL (results not shown).

A case study. We use two maize endosperm traits, elongation factor 1α (eEF1A) content and free amino acid (FAA) content, to demonstrate the power of our statistical approach. An F2 population of 106 plants was derived from a cross between two contrasting maize inbred lines, Oh51Ao2 (high eEF1A and low FAA content) and Oh545o2 (low eEF1A and high FAA content). The F2 and F2:3 progeny of this cross were prepared for genotypic and phenotypic analysis as previously described by Wang and Larkins (22). DNA was extracted from young leaves of the F2 plants, whereas grain protein quality traits were measured on the F3 kernels of the F2 plants, as described in Wang and Larkins (22) and Wang et al. (23). Simple sequence repeat (SSR) primers were selected from the Maize Microsatellite-RFLP consensus map; the primer sequences are described in the Maize Genome Database (22). A linkage map of 83 SSR markers was constructed for the F2 plants, based on the known order of SSR markers on the maize chromosomes.

Although maize is allogamous, our autogamous model applies here, because the F2:3 progeny on which the endosperm traits were measured were derived by artificial self-pollination. Given the structure of these data, we modified our model to map joint maternal-offspring QTL effects on endosperm traits using marker genotypes derived only from the F2 (maternal) plants (29). We recognize that this one-stage genotyping scheme provides limited information for separating the estimated genotypic means of the six two-generation QTL genotypes (Table 1). Nevertheless, the data set is still useful for testing the existence of a maternal-offspring QTL affecting the endosperm trait with the likelihood ratio test of Eq. 10.
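Applying the test of Eq. 10 requires, at each scanned position, an EM fit of the mixture (Eqs. 6–8) and a fit under $H_0$. The Python sketch below covers one position. It is our illustration, not the authors' program, and the function names, starting values, and convergence rule are assumptions. It takes the $\omega_{ijk}$ rows as given (each summing to 1) and leaves the means unconstrained; for the embryo, five free means are a one-to-one reparameterization of $(\mu, a, d, b_1, b_2)$, so this is the full model of Eq. 9. Under $H_0$ all means coincide and, because each $\omega$ row sums to 1, the mixture collapses to a single normal.

```python
import math


def normal_pdf(y, mean, var):
    """Normal density f_k(y; mu_k, sigma^2)."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_fit(ys, omegas, means, var, tol=1e-8, max_iter=1000):
    """EM for the normal mixture of Eq. 4 at one fixed QTL position.

    ys     : phenotypes y_ij, flattened over the seeds of all F2 plants
    omegas : per seed, the prior probabilities omega_ijk (each row sums to 1)
    means, var : starting values for mu_k and sigma^2
    Returns (means, var, ll); on convergence ll is the log-likelihood at the
    returned parameters.
    """
    means, n, L = list(means), len(ys), len(means)
    prev = -math.inf
    for _ in range(max_iter):
        # E step (Eq. 6): posterior Pi_ijk, proportional to omega_ijk * f_k(y_ij).
        post, ll = [], 0.0
        for y, w in zip(ys, omegas):
            terms = [w[k] * normal_pdf(y, means[k], var) for k in range(L)]
            total = sum(terms)
            ll += math.log(total)
            post.append([t / total for t in terms])
        if ll - prev < tol:  # EM does not decrease the likelihood
            break
        prev = ll
        # M step (Eq. 7): posterior-weighted mean of each joint genotype.
        for k in range(L):
            wk = sum(p[k] for p in post)
            if wk > 0:
                means[k] = sum(p[k] * y for p, y in zip(post, ys)) / wk
        # M step (Eq. 8): pooled residual variance; the denominator is sum_i n_i.
        var = sum(p[k] * (y - means[k]) ** 2
                  for p, y in zip(post, ys) for k in range(L)) / n
    return means, var, ll


def lr_no_qtl(ys, ll_full):
    """Eq. 10 with H0: a = d = b1 = b2 = 0, i.e. all mu_k equal to one mean.

    The H0 MLEs are the sample mean and the (1/n) sample variance.
    """
    n = len(ys)
    mu0 = sum(ys) / n
    var0 = sum((y - mu0) ** 2 for y in ys) / n
    ll0 = sum(math.log(normal_pdf(y, mu0, var0)) for y in ys)
    return -2.0 * (ll0 - ll_full)
```

In a genome scan, this fit would be repeated at each grid position (each $r_1$), keeping the position with the largest LR; the significance threshold would then come from permutations, as described in the text. For the six endosperm classes the same code runs with $L = 6$, but the fitted means no longer map one-to-one onto the effect parameters of Eq. 2.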
All 10 chromosomes were scanned for QTL affecting the two endosperm traits. We detected a QTL for eEF1A on chromosome 6, with an LR of 31.2, greater than the genome-wide threshold of 22.4 at the significance level α = 0.001. The estimated QTL position is 28 cM from the first marker of chromosome 6 (Fig. 4). The threshold for claiming detection of a QTL was the 99.9th percentile of the LR distribution from 1,000 permutation tests.

Fig. 3. Profile of the LR test statistic as a function of genome position for a simulated linkage group under sampling strategy 40 × 10 at heritability levels of 0.2 (broken curve) and 0.6 (solid curve). The horizontal line is the critical threshold value at significance level α = 0.05. The vertical broken and solid lines mark the genomic positions at which the LR is maximal for heritabilities of 0.2 and 0.6, respectively. The true position of the QTL on the linkage group is indicated by a triangle.

Fig. 4. Profile of the log-likelihood ratio test statistic as a function of genome position for eEF1A on chromosome 6 (Wang and Larkins, Ref. 22). The horizontal solid line is the critical threshold value at significance level α = 0.05. The vertical solid line marks the estimated position of the QTL on the linkage group. The vertical dotted lines mark the marker positions.

DISCUSSION

We have proposed a new statistical model for testing and estimating the joint effects of maternal and offspring genomes on quantitative traits expressed during seed development. Maternal inheritance has long been thought to affect animal traits (1, 13), but recent observations (9) suggest that maternal effects may be of greater importance in the seed formation of flowering plants than originally appreciated.

[…] for allogamous and mixed-pollinated systems in which all QTL genotypes can be uniquely determined by marker genotypes.

Understanding the maternal and paternal genetic regulation of seed development helps answer many fundamental evolutionary questions in higher plants. Our model can be applied directly to evolutionary genetic studies, but such applications will need to consider the patterns of gene segregation and transmission in natural plant populations. Additional parameters characterizing population structure and organization, such as allele frequencies, linkage disequilibrium, and haplotype frequencies (16), should be incorporated into our seed development mapping model. In addition, a considerable body of literature has suggested different roles for paternal and maternal loci in seed development, a phenomenon called "parent-of-origin effects" (18, 19). Our model provides a fundamental platform for detecting and characterizing these so-called imprinting genes, whose expression depends on the parent of origin. In sum, the model framework presented in this article brings us closer to unraveling the genetic basis of embryogenesis and seed development in higher plants.

ACKNOWLEDGMENTS

The publication of this manuscript is approved as journal series R-10069 by the Florida Agricultural Experiment Station.

GRANTS

This work is partially supported by an Outstanding Young Investigators Award of the National Science Foundation of China (30128017), a University of Florida Research Opportunity Fund (02050259), and a University of South Florida Biodefense Grant (7222061-12) to R. Wu.

REFERENCES

1. Agrawal AF, Brodie ED III, and Brown J. Parent-offspring coadaptation and the dual genetic control of maternal care. Science 292: 1710–1712, 2001.
2. Bazzaz FA. Plant biology in the future. Proc Natl Acad Sci USA 98: 5441–5445, 2001.
3. Berger F. Endosperm development. Curr Opin Plant Biol 2: 28–32, 1999.
4. Chaudhury AM, Koltunow A, Payne T, Luo M, Tucker MR, Dennis ES, and Peacock WJ. Control of early seed development. Annu Rev Cell Dev Biol 17: 677–699, 2001.
5. Chaudhury AM, Ming L, Miller C, Craig S, Dennis ES, and Peacock WJ. Fertilization-independent seed development in Arabidopsis thaliana. Proc Natl Acad Sci USA 94: 4223–4228, 1997.
6. Churchill GA and Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971, 1994.
7. Colombo L, Franken J, VanderKrol AR, Wittich PE, Dons HJM, and Angenent GC. Downregulation of ovule-specific MADS box genes from petunia results in maternally controlled defects in seed development. Plant Cell 9: 703–715, 1997.
8. Dempster AP, Laird NM, and Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39: 1–38, 1977.
9. Dilkes BP, Dante RA, Coelho C, and Larkins BA. Genetic analyses of endoreduplication in Zea mays endosperm: evidence of sporophytic and zygotic maternal control. Genetics 160: 1163–1177, 2002.
10. Evans MMS and Kermicle JL. Interaction between maternal effect and zygotic effect mutations during maize seed development. Genetics 159: 303–315, 2001.
11. Friedman WE. Double fertilization in Ephedra, a nonflowering seed plant: its bearing on the origin of angiosperms. Science 247: 951–954, 1990.
12. Friedman WE. The evolution of double fertilization and endosperm: an "historical" perspective. Sex Plant Reprod 11: 6–16, 1998.
13. Hager R and Johnstone RA. The genetic basis of family conflict resolution in mice. Nature 421: 533–535, 2003.
14. Hulskamp M, Schneitz K, and Pruitt RE. Genetic evidence for a long range activity that directs pollen tube guidance in Arabidopsis. Plant Cell 7: 57–64, 1995.
The formation of seed in flowering plants results from double fertilization (11), in which one of the two sperm cells from a pollen tube fertilizes the haploid egg cell to form a diploid zygote (embryo) and the other sperm cell fertilizes the diploid central cell and fuses with the central cell (polar) nuclei, thus giving rise to the triploid endosperm. Both the embryo and endosperm affect seed size and seed quality, including oil, protein, and carbohydrate contents. An understanding of how these traits are genetically determined through the integration of the formation mechanisms of the embryo and endosperm is of paramount importance to increase grain production and quality (2). Wu et al. (29) have derived a maximum likelihood-based one-QTL model for mapping endosperm-specific traits in autogamous plants. This endosperm model has proven to be more powerful to detect significant QTL than can usual diploidmodels. But the model of Wu et al. did not take into account the effects due to the interaction between the maternal genome and offspring. A considerable body of literature supports the view that the maternal and offspring genomes interact to determine the developmental processes of seeds, including those in both the embryo and the endosperm. In a recent study in Arabidopsis, the paternal genes were found not to be expressed during early stages of seed development (21). But this finding is not supported by other studies in the same plant species or different species (24). Our model can be used to test for the maternal-offspring interaction effect of a QTL on seed development and separate the effects due to the maternal genome from the effect due to the offspring genome. This separation can be made through performing the hypothesis testing of Eq. 11. 
Thus the approach proposed here can be used to examine how and when the paternal genome exerts effects on seed development and, ultimately, address the above-mentioned fundamentally important debate arising from Arabidopsis genetic research. Studies of joint effects of maternal and offspring genomes on offspring traits have received considerable attention in animals (25–27), but there is surprisingly a paucity in this kind of study for plants. Our proposed model will provide a powerful tool for mapping specific genetic loci that trigger joint maternal-offspring effects in plants. Our model is based on an F2 population for an autogamous plant system. It is not difficult to extend the model to other reproductive systems, such as allogamous, mixed-pollinated, and other mapping populations. For an autogamous plant, the eggs and two polar nuclei cells are self-fertilized so that the frequencies of male gamete genotypes are identical to those of female gamete genotypes. But in an allogamous plant, such as maize, each female gamete from each mother plant will be pollinated by all possible male gametes from the pollen pool. This difference should be considered when the current model is used to study the genetics of the allogamous seed development. As mentioned above, the autogamous system has an inherited limitation to separate genotypic means of two joint maternal-offspring QTL heterozygotes for the endosperm because the conditional probabilities of these two QTL heterozygotes given two-stage hierarchical marker genotypes are identical. This problem disap- MAPPING SEED DEVELOPMENT Physiol Genomics • VOL 19 • 23. Wang XL, Woo YM, Kim CS, and Larkins BA. Quantitative trait locus mapping of loci influencing elongation factor 1 alpha content in maize endosperm. Plant Physiol 125: 1271–1282, 2001. 24. Weijers D, Geldner N, Offringa R, and Jorgens G. Early paternal gene activity in Arabidopsis. Nature 414: 709–710, 2001. 25. Wolf JB. Gene interactions from maternal effects. 
Evolution 54: 1882– 1898, 2000. 26. Wolf JB. Genetic architecture and evolutionary constraint when the environment contains genes. Proc Natl Acad Sci USA 100: 4655–4660, 2003. 27. Wolf JB, Vaughn TT, Pletscher LS, and Cheverud JM. Contribution of maternal effect QTL to genetic architecture of early growth in mice. Heredity 89: 300–310, 2002. 28. Wu RL. Mapping quantitative trait loci by genotyping haploid tissues. Genetics 52: 1741–1752, 1999. 29. Wu RL, Ma CX, Gallo-Meagher M, Littell RC, and Casella G. Statistical methods for dissecting triploid endosperm traits using molecular markers: An autogamous model. Genetics 162: 875–892, 2002. 30. Wu RL, Lou XY, Ma CX, Wang X, Larkins BA, and Casella G. An improved mapping model generates high-resolution mapping of protein quality traits in maize endosperm. Proc Natl Acad Sci USA 99: 11281– 11286, 2002. www.physiolgenomics.org Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.3 on June 18, 2017 15. Lander ES and Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199, 1989. 16. Lou XY, Casella G, Littell RC, Yang MCK, Johnson JA, and Wu RL. A haplotype-based algorithm for multilocus linkage disequilibrium mapping of quantitative trait loci with epistasis. Genetics 163: 1533–1548, 2003. 17. Mazur B, Krebbers E, and Tingey S. Gene discovery and product development for grain quality traits. Science 285: 372–375, 1999. 18. Scott RJ, Spielman M, Bailey J, and Dickinson HG. Parent-of-origin effects on seed development in Arabidopsis thaliana. Development 125: 3329–3341, 1998. 19. Spielman M, Vinkenoog R, Dickinson HG, and Scott RJ. The epigenetic basis of gender in flowering plants and mammals. Trends Genet 17: 705–711, 2001. 20. Springer PS, Holding DR, Groover A, Yordan C, Martienssen RA. 
The essential Mcm7 protein PROLIFERA is localized to the nucleus of dividing cells during the G(1) phase and is required maternally for early Arabidopsis development. Development 127: 1815–1822, 2000. 21. Vielle-Calzada JP, Baskar R, and Grossniklaus U. Delayed activation of the paternal genome during seed development. Nature 404: 91–94, 2000. 22. Wang XL and Larkins BA. Genetic analysis of amino acid accumulation in opaque-2 maize endosperm. Plant Physiol 125: 1766–1777, 2001. 269