Physiol Genomics 19: 262–269, 2004;
doi:10.1152/physiolgenomics.00052.2004.
A model for estimating joint maternal-offspring effects on seed development
in autogamous plants
Li Zhang,1 Mark C. K. Yang,1 Xuelu Wang,2 Brian A. Larkins,2
Maria Gallo-Meagher,3 and Rongling Wu1
1Department of Statistics, 2Agronomy Department, University of Florida, Gainesville, Florida;
and 3Department of Plant Sciences, University of Arizona, Tucson, Arizona
Submitted 3 March 2004; accepted in final form 6 August 2004
autogamous plants; EM algorithm; linkage; maternal-offspring interaction; quantitative trait loci
HIGHER PLANTS ARE CHARACTERIZED by a complex life cycle that consists of alternating haploid and diploid generations. The diploid plant life form, called the “sporophyte,” supports meiosis, which produces the haploid male and female spores that initiate the gametophytic generation. The sporophyte also nurtures the reproductive structures, such as the integuments within which the embryo develops (4). Gametogenesis and fertilization take place in an environment where gametophytic and sporophytic structures interact and are placed under several layers of haploid and diploid genetic controls (4).
This interaction culminates in the formation of a new diploid generation during a complex process called “double fertilization” (12). Following meiosis, three of the four megaspores degenerate, and the surviving megaspore produces the female gametophyte (embryo sac), which typically contains eight nuclei and seven cells. Two cells are female gametes: the haploid egg cell and the homodiploid central cell. The product of meiosis in the male gametophyte (pollen) produces a tip-growing pollen tube that migrates to the stigma and eventually enters the ovule through the micropyle (sporophytic) and delivers two sperm cells into the embryo sac. Two zygotic products are produced, each following fusion with one of the two sperm cells: the diploid embryo zygote that develops as the daughter plant and a triploid cell that develops as endosperm with a balanced ratio of maternal and paternal genomes, 2m:1p (3, 5).
Article published online before print. See web site for date of publication
(http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: R. Wu, Dept. of
Statistics, 533 McCarty Hall C, Univ. of Florida, Gainesville, FL 32611
(E-mail: [email protected]).
Higher plant reproduction is thus characterized by five
developmental phases: the diploid sporophyte, the haploid
female gametophyte, the haploid male gametophyte, the developing diploid embryo, and the developing triploid endosperm.
The development of the embryo sac and the seed is under the
control of genes of both sporophytic and female gametophytic origin. The paternal gametophytic and postfertilization sporophytic controls are additional levels of the complex genetic
interactions that govern seed development. Recent genetic
studies have identified different classes of maternal effect
genes involved in seed development. These include genes
required in the sporophyte for proper development of the
embryo sac (14), genes required in the (maternal) sporophyte
for normal embryo development (7), and genes required in the
female gametophyte for proper embryo development (20).
More recently, Evans and Kermicle (10) isolated a mutant in
maize with effects on postfertilization development. By performing quantitative genetic analysis of different generations
initiated with inbred lines, Dilkes et al. (9) detected significant
evidence of sporophytic gene control over endoreduplication in
maize endosperm.
It is expected that molecular markers, in conjunction with
segregating plant pedigrees, will provide greater power and precision
in detecting maternal effect genes affecting embryo and endosperm development in higher plants. The current molecular
dissection of endosperm is mostly based on the assumption that
endosperm-specific traits are only controlled by genes from the
maternal sporophyte (22, 23). With this assumption (15), a
traditional interval mapping method for diploid tissues can be
directly used. Wu et al. (29) proposed an improved statistical
model for dissecting endosperm traits by taking its trisomic
inheritance property into consideration. The traditional interval
mapping method may be appropriate for unraveling the genetic
basis of early seed development, because at this stage the
seed’s own genome has not yet played a role. For example, a
recent study suggested that a large part of the paternal genome
is silenced during early seed development (21). However, for
agriculturally important, mature seed traits, which are to an
increasing extent controlled by the seed’s genome (17), the
triploid model of Wu et al. (29) should be biologically more
relevant. In a mature maize endosperm analysis (30), the model
of Wu et al. detected more significant quantitative trait loci
(QTL) than the method of Lander and Botstein (15).
Because seed development is under control of both the sporophytic (maternal) genome and the seed’s own genome (offspring), joint maternal-offspring effects should be modeled for control mechanisms influencing seed development. In this article, we develop a new statistical model for mapping seed-specific QTL expressed in both the sporophytic and offspring genomes. Our model is based on a statistical mixture model, consisting of quantitative genetic parameters contained in each normal density and the proportion of each genome-of-origin-specific QTL genotype. The maximum likelihood implemented with the EM algorithm (8) has been employed to estimate QTL effect and position parameters. An extensive simulation study is used to examine the statistical behavior of our mapping model.
1094-8341/04 $5.00 Copyright © 2004 the American Physiological Society
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.3 on June 18, 2017
Zhang, Li, Mark C. K. Yang, Xuelu Wang, Brian A. Larkins, Maria Gallo-Meagher, and Rongling Wu. A model for estimating joint maternal-offspring effects on seed development in autogamous plants. Physiol Genomics 19: 262–269, 2004; doi:10.1152/physiolgenomics.00052.2004. We present a statistical model for testing and estimating the effects of maternal-offspring genome interaction on the embryo and endosperm traits during seed development in autogamous plants. Our model is constructed within the context of maximum likelihood implemented with the EM algorithm. Extensive simulations were performed to investigate the statistical properties of our approach. We have successfully identified a quantitative trait locus that exerts a significant maternal-offspring interaction effect on amino acid contents of the endosperm in maize, demonstrating the power of our approach. This approach will be broadly useful in mapping endosperm traits for many agriculturally important crop plants and also make it possible to study the genetic significance of double fertilization in the evolution of higher plants.
Table 1. Genetic values for the embryo and endosperm traits as affected by both the maternal and offspring
genomes in the F2 population of an autogamous plant
Maternal (t)   Embryo (t + 1)                            Endosperm (t + 1)
Genotype       Genotype    Symbol   Value                Genotype     Symbol   Value
QQ(t)          QQ(t + 1)   1        μ1 = μ + 2a          QQQ(t + 1)   1        μ1 = μ + (5/2)a
Qq(t)          QQ(t + 1)   2        μ2 = μ + a + β1      QQQ(t + 1)   2        μ2 = μ + (3/2)a + β1
               Qq(t + 1)   3        μ3 = μ + β2          QQq(t + 1)   3        μ3 = μ + (1/2)a + β2
               qQ(t + 1)   3        μ3 = μ + β2          Qqq(t + 1)   4        μ4 = μ − (1/2)a + β3
               qq(t + 1)   4        μ4 = μ − a + β3      qqq(t + 1)   5        μ5 = μ − (3/2)a + β4
qq(t)          qq(t + 1)   5        μ5 = μ − 2a          qqq(t + 1)   6        μ6 = μ − (5/2)a
THE GENETIC MODEL
Seed development in angiosperms includes two major components, the embryo and the endosperm. These two tissues
have different ploidy levels and are formed through different
inheritance mechanisms. Therefore, we consider their underlying genetic models separately.
The embryo model. For a QTL of two alleles (designated by
Q and q) affecting a seed trait, tissue of diploid origin can have
one of three possible genotypes, QQ, Qq and qq. Because a
seed-specific trait is under control of both the sporophytic
maternal genome and the offspring genome, its genetic value
should be described by a joint effect of the two genomes. More
specifically, modeling the overall genotypic value of an embryo trait in the seed needs to consider gene transition from the
sporophyte (generation t) to its zygotic offspring (generation
t + 1). For an autogamous species, the sporophytic genotype
QQ(t) generates one embryo genotype QQ(t + 1); the sporophytic genotype Qq(t) generates three embryo genotypes
QQ(t + 1), Qq(t + 1), and qq(t + 1) with the respective
probabilities of 1/4, 1/2, and 1/4; and the sporophytic genotype
qq(t) generates one embryo genotype qq(t + 1).
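The transmission pattern described above can be checked numerically. The following is a minimal Python sketch and not part of the original article: the function names are hypothetical, and the 1/4, 1/2, 1/4 frequencies of the maternal F2 genotypes are the usual Mendelian assumption for an F2 derived from two inbred lines.

```python
from fractions import Fraction

def selfed_embryo_probs(maternal):
    """Embryo genotype probabilities when a diploid plant is selfed.

    `maternal` is a two-letter genotype such as "Qq"; each gamete
    receives either maternal allele with probability 1/2.
    """
    probs = {}
    for egg in maternal:
        for sperm in maternal:
            genotype = "".join(sorted(egg + sperm))  # "qQ" -> "Qq"
            probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs

# Assumed F2 frequencies of the maternal (generation t) genotypes.
F2_FREQ = {"QQ": Fraction(1, 4), "Qq": Fraction(1, 2), "qq": Fraction(1, 4)}

def joint_embryo_freqs():
    """Frequencies of the joint (maternal, embryo) genotypes."""
    return {
        (mat, emb): F2_FREQ[mat] * p
        for mat in F2_FREQ
        for emb, p in selfed_embryo_probs(mat).items()
    }

joint = joint_embryo_freqs()
for pair, freq in joint.items():
    print(pair, freq)
```

Under these assumptions only five joint maternal-offspring genotypes can occur, which is why the embryo model needs five genotypic means.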
Our quantitative genetic model for seed development will be
constructed on the basis of the combination of two-generation
(maternal and offspring) QTL genotypes at the putative QTL.
Let a and d be the additive and dominant effects of the QTL,
respectively. Thus the genotypic values of three genotypes QQ,
Qq, and qq can be specified as μ + a, μ + d, and μ − a, where
μ is the overall mean. For a joint maternal-offspring QTL
homozygote, only the additive effects are involved; for example,
the genotypic values of QQ(t)QQ(t + 1) and qq(t)qq(t + 1)
can be denoted by μ + 2a and μ − 2a, respectively. For
joint maternal-offspring QTL heterozygotes, Qq(t)QQ(t + 1),
Qq(t)Qq(t + 1), and Qq(t)qq(t + 1), we need to model both the
additive and dominant effects, whose genotypic values are
expressed as μ + a + β1, μ + β2, and μ − a + β3 (Table 1).
The dominant effect of a joint heterozygote (β1, β2, or β3) can
be partitioned into two components due to the intra-locus
interaction within (d) and between the generations. For the
diploid embryo, we denote b1 and b2 to be the between-generation dominant effects between the maternal heterozygote
and offspring homozygote and between the maternal heterozygote and offspring heterozygote, respectively. We thus have
\[
\begin{cases}
\beta_1 = d + b_1 \\
\beta_2 = 2d + b_2 \\
\beta_3 = d - b_1
\end{cases}
\tag{1}
\]
The compositions of different joint maternal-offspring QTL
genotypes for the embryo are given in Table 1.
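The embryo parameterization in Eq. 1 and Table 1 can be written out as a small calculation. This is a minimal sketch rather than the authors' code: the function names are hypothetical, and the numerical values (μ = 14, a = 1.0, d = 0.8, b1 = b2 = 0.7) are used only for illustration; they match the simulation settings reported in Table 2. The second function inverts the five linear relations to recover the effects from the genotypic means.

```python
def embryo_means(mu, a, d, b1, b2):
    """Genotypic means mu1..mu5 of the five joint embryo genotypes."""
    beta1, beta2, beta3 = d + b1, 2 * d + b2, d - b1  # Eq. 1
    return [
        mu + 2 * a,          # QQ(t)QQ(t+1)
        mu + a + beta1,      # Qq(t)QQ(t+1)
        mu + beta2,          # Qq(t)Qq(t+1)
        mu - a + beta3,      # Qq(t)qq(t+1)
        mu - 2 * a,          # qq(t)qq(t+1)
    ]

def embryo_effects(means):
    """Recover (mu, a, d, b1, b2) from mu1..mu5 by solving the linear system."""
    m1, m2, m3, m4, m5 = means
    mu = (m1 + m5) / 2
    a = (m1 - m5) / 4
    d = (m2 + m4) / 2 - mu       # (beta1 + beta3) / 2 = d
    b1 = (m2 - m4) / 2 - a       # (beta1 - beta3) / 2 = b1
    b2 = m3 - mu - 2 * d         # beta2 - 2d = b2
    return mu, a, d, b1, b2

means = embryo_means(14.0, 1.0, 0.8, 0.7, 0.7)
recovered = embryo_effects(means)
print([round(m, 2) for m in means])
print([round(x, 2) for x in recovered])
```

Because five means determine five parameters, the embryo model is exactly identified, so the mapping between means and effects is one to one.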
The endosperm model. Three F2 sporophytic QTL genotypes
are self-crossed to form different endosperm genotypes, i.e.,
QQ(t) to QQQ(t + 1); Qq(t) to QQQ(t + 1), QQq(t + 1),
Qqq(t + 1), and qqq(t + 1), each with a probability of 1/4; and
qq(t) to qqq(t + 1).
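The four equally likely endosperm genotypes from a selfed heterozygote follow from the 2m:1p genome balance: the central cell carries two identical copies of one maternal allele, and one sperm cell adds a single allele. A minimal sketch under that assumption (the function name is hypothetical and the code is not from the original article):

```python
from fractions import Fraction

def selfed_endosperm_probs(maternal):
    """Endosperm genotype probabilities when a diploid plant is selfed.

    The central cell contributes two copies of the same maternal allele
    and the sperm contributes one allele; each is drawn from the two
    alleles of `maternal` with probability 1/2.
    """
    probs = {}
    for central in maternal:
        for sperm in maternal:
            genotype = "".join(sorted(central * 2 + sperm))  # "qqQ" -> "Qqq"
            probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs

print(selfed_endosperm_probs("Qq"))
```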
For the triploid endosperm, the within-generation dominant effect can be due to the interactions between different numbers of dominant vs. recessive alleles. Let d1 and d2 be the dominant effects of two Q vs. one q (QQq) and one Q vs. two q (Qqq), respectively. Thus four endosperm QTL genotypes, QQQ, QQq, Qqq, and qqq, can be modeled by μ + (3/2)a, μ + (1/2)a + d1, μ − (1/2)a + d2, and μ − (3/2)a. We use b1 to denote the between-generation dominant effect between the maternal heterozygote and the offspring homozygote, and we use b2 and b3 to denote the between-generation dominant effects between the maternal heterozygote and offspring heterozygotes QQq and Qqq, respectively. Thus the dominant effects of joint maternal-offspring endosperm heterozygotes, Qq(t)QQQ(t + 1), Qq(t)QQq(t + 1), Qq(t)Qqq(t + 1), and Qq(t)qqq(t + 1), are expressed, respectively, as
\[
\begin{cases}
\beta_1 = d + b_1 \\
\beta_2 = d + d_1 + b_2 \\
\beta_3 = d + d_2 + b_3 \\
\beta_4 = d - b_1
\end{cases}
\tag{2}
\]
See Table 1 for the compositions of different joint maternal-offspring QTL genotypes for the endosperm.
EXPERIMENTAL DESIGN
For tissue of ploidy level >2, genotypic characterization using molecular markers can be difficult. For this reason, the triploid endosperm is generally not genotyped in endosperm trait mapping. Marker genotypes can be derived from two different tissues, the sporophyte (generation t) and the embryo (generation t + 1). This establishes a two-stage hierarchical design for genotyping. Suppose there is an F2 population of size m, initiated with two contrasting inbred lines. A number of molecular markers (denoted by M) are genotyped both for the F2 samples and the embryos from their seeds. Let n_i denote the number of seeds collected from F2 plant i. From these sampled seeds, various phenotypes of interest are measured for the diploid embryo and triploid endosperm.
The conditional probabilities of joint QTL genotypes for the maternal plant and offspring, conditional upon the genotypes of two flanking markers M_η and M_{η+1} (with a recombination fraction of r) from the sporophytic plant and its embryo, can be derived on the basis of gene transition patterns for different F2 genotypes. We use r_η1 and r_η2 to denote the recombination fractions between the left marker M_η and the QTL, and between the QTL and the right marker M_{η+1}, respectively. A general expression of the conditional probabilities is written as
\[
\pi_{ijk} = \mathrm{Prob}\left[ g_k(t+1) \mid G_i(t),\ G_{ij}(t+1),\ M_{\eta} - M_{\eta+1},\ r_{\eta 1},\ r_{\eta 2} \right],
\quad i = 1, \ldots, m;\ j = 1, \ldots, n_i;\ k = 1, \ldots, L
\tag{3}
\]
where g_k(t + 1) is the joint maternal-offspring QTL genotype for an embryo (L = 5) or an endosperm (L = 6) (Table 1), G_i(t) is the marker genotype of F2 plant i (in generation t), and G_ij(t + 1) is the embryo marker genotype of the jth seed (in generation t + 1) which sporophytic plant i produces. The table in Fig. 1 materializes the conditional probabilities shown in Eq. 3 for the embryo. A similar conditional probability matrix can also be derived for the endosperm.
Fig. 1. Table shows joint probabilities of maternal-offspring QTL genotypes and marker genotypes in an F2 population (conditional probabilities of QTL given marker genotypes can be derived according to the Bayes theorem).
THE STATISTICAL MODEL
The mixture model. A fundamental statistical model for mapping QTL is based on a mixture model that has been previously developed (15, 28). In the mixture model, each observation y is assumed to have arisen from one of L components, with each component being modeled by a density from the parametric family f. In this study, the phenotype y_ij derived from the jth seed of the ith F2 plant is assumed to be determined by one of the L joint maternal-offspring QTL genotypes, plus a random error, with the likelihood function expressed as a mixture model as follows:
\[
l(y_{ij} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \sigma^2) = \pi_{ij1} f_1(y_{ij}; \mu_1, \sigma^2) + \cdots + \pi_{ijL} f_L(y_{ij}; \mu_L, \sigma^2)
\tag{4}
\]
where π_ij = (π_ij1, . . . , π_ijL)^T are the mixture proportions specified by conditional probabilities of the joint maternal-offspring QTL genotype given a two-stage hierarchical marker genotype for the jth seed from the ith F2 plant; μ = (μ1, . . . , μL)^T are the expected genotypic values of different QTL genotypes; and σ² is the residual variance within each QTL genotype.
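The mixture proportions are conditional probabilities of QTL genotypes given marker genotypes, obtained from joint probabilities by Bayes' theorem. The sketch below illustrates that step for a deliberately simplified case and is not the two-stage table of Fig. 1: it assumes a single marker, a single generation (the F2), an F1 parent of coupling phase MQ/mq, and a marker-QTL recombination fraction r. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def f2_qtl_given_marker(r):
    """P(QTL genotype | marker genotype) in an F2, one marker, one QTL.

    ASSUMPTION: the F1 is MQ/mq (coupling phase) and r is the
    recombination fraction between the marker and the QTL.
    """
    gametes = {
        ("M", "Q"): (1 - r) / 2, ("m", "q"): (1 - r) / 2,  # parental types
        ("M", "q"): r / 2, ("m", "Q"): r / 2,              # recombinant types
    }
    # Joint probabilities of (marker genotype, QTL genotype).
    joint = {}
    for (g1, p1), (g2, p2) in product(gametes.items(), repeat=2):
        marker = "".join(sorted(g1[0] + g2[0]))  # "MM", "Mm", or "mm"
        qtl = "".join(sorted(g1[1] + g2[1]))     # "QQ", "Qq", or "qq"
        joint[(marker, qtl)] = joint.get((marker, qtl), 0) + p1 * p2
    # Marginal probability of each marker genotype.
    marginal = {}
    for (marker, _), p in joint.items():
        marginal[marker] = marginal.get(marker, 0) + p
    # Bayes' theorem: conditional = joint / marginal.
    return {key: p / marginal[key[0]] for key, p in joint.items()}

cond = f2_qtl_given_marker(Fraction(1, 10))
for key in sorted(cond):
    print(key, cond[key])
```

The two-stage design applies the same joint-over-marginal calculation to the combined maternal and embryo marker genotypes and to two flanking markers instead of one.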
The EM algorithm. Under the two-stage hierarchical genotyping design, we have the likelihood of all observations as
\[
l(\mathbf{y} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \sigma^2)
= \prod_{i=1}^{m} \prod_{j=1}^{n_i} l(y_{ij} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \sigma^2)
= \prod_{i=1}^{m} \prod_{j=1}^{n_i} \left[ \sum_{k=1}^{L} \pi_{ijk} f_k(y_{ij}) \right]
\tag{5}
\]
We have formulated a procedure for implementing the EM algorithm to obtain the maximum likelihood estimates (MLEs) of the unknown parameters including the QTL effects and residual variance (μ_k, σ²) and the QTL position (r_η1) contained within π_ijk (table in Fig. 1). The EM algorithm is described as follows.
In the E step, the conditional probabilities (priors) of the QTL genotypes given the marker genotypes and the normal distribution function are used to calculate
\[
\Pi_{ijk} = \frac{\pi_{ijk} f_k(y_{ij})}{\sum_{k'=1}^{L} \pi_{ijk'} f_{k'}(y_{ij})}
\tag{6}
\]
which could be thought of as a posterior probability that the jth seed of the ith F2 plant has the kth joint maternal-offspring QTL genotype.
In the M step, the calculated posterior probabilities are used to solve the unknown parameters
\[
\hat{\mu}_k = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n_i} y_{ij} \Pi_{ijk}}{\sum_{i=1}^{m} \sum_{j=1}^{n_i} \Pi_{ijk}}
\tag{7}
\]
\[
\hat{\sigma}^2 = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n_i} \sum_{k=1}^{L} (y_{ij} - \hat{\mu}_k)^2 \Pi_{ijk}}{\sum_{i=1}^{m} n_i}
\tag{8}
\]
Iterations are repeated between Eqs. 6–8 until convergence. The values at convergence are the MLEs. With the MLEs of μ_k values, the MLEs of the overall mean, the additive effect, and within- and between-generation dominant effects of the QTL, as indicated in Table 1, can be obtained by solving a system of linear equations. It should be pointed out that the separation of within-generation from between-generation dominant effects for the endosperm has two difficulties. First, the endosperm model is overparameterized because six unknown dominant parameters (d, d1, d2, b1, b2, and b3 on the right side of Eq. 2) are contained within the estimated genotypic means of four joint maternal-offspring heterozygotes (μ2, . . ., μ5; Table 1). Second, μ3 and μ4 are indistinguishable because the conditional probabilities of the corresponding QTL genotypes Qq(t)QQq(t + 1) and Qq(t)Qqq(t + 1) given the marker genotypes are identical (results not shown).
The estimation of the QTL position can be obtained using a grid approach. This approach views r_η1 or r_η2 as a known parameter in the likelihood function (Eq. 4) by scanning the QTL over all marker intervals. The position corresponding to the maximum of the log-likelihood ratio across a linkage group is the MLE of the QTL position.
Hypothesis tests. A number of hypothesis tests can be formulated for our seed model proposed above. The first hypothesis test considers the existence of any QTL affecting the expression of an embryo or endosperm trait. For the embryo model, for example, we have the hypotheses,
\[
\begin{cases}
H_0: a = d = b_1 = b_2 = 0 \\
H_1: \text{at least one of the effects is not equal to zero.}
\end{cases}
\tag{9}
\]
The test statistic for testing the above hypotheses is calculated as the log-likelihood ratio of the full model (H1) over the reduced model (H0),
\[
LR = -2 \left[ \log l_0(a = d = b_1 = b_2 = 0, \tilde{\mu}, \tilde{\sigma}^2) - \log l_1(\hat{\mu}, \hat{a}, \hat{d}, \hat{b}_1, \hat{b}_2, \hat{\sigma}^2) \right]
\tag{10}
\]
where the tilde (~) and the caret (^) symbols denote the MLEs of the unknown parameters under H0 and H1, respectively. The log-likelihood ratio (LR) is asymptotically χ² distributed with 4 degrees of freedom. However, the critical threshold value for declaring the existence of a QTL is generally calculated on the basis of permutation tests (6).
After a significant QTL is found, any specific components of the genotypic values can be tested. For example, the maternal-offspring intra-locus interaction effect on the embryo trait can be tested by formulating the following hypotheses,
\[
\begin{cases}
H_0: b_1 = b_2 = 0 \\
H_1: \text{at least one of the effects is not equal to zero}
\end{cases}
\tag{11}
\]
whose log-likelihood ratio test statistic is asymptotically χ² distributed with 2 degrees of freedom. Testing b1 = 0 and b2 = 0 is equivalent to testing μ1 − μ5 = 2(μ2 − μ4) and μ1 + μ5 = 2(μ2 + μ4 − μ3), respectively.
Similar hypotheses can also be formulated to test whether there is a QTL affecting an endosperm trait and whether there is a significant intra-locus interaction between the maternal heterozygote and the offspring homozygote (b1; Table 1). The latter hypothesis test can be performed under constraint 5(μ2 − μ5) = 3(μ1 − μ6). One can also test whether one or both of the sums d1 + b2 and d2 + b3 (Table 1) are significantly different from zero. But the separation of d1 and b2 or d2 and b3 is not possible unless some particular constraints are used. The critical thresholds for all these hypotheses mentioned above can be obtained by simulation studies.
RESULTS
Monte Carlo simulation. We performed a series of simulation experiments to examine the statistical properties of the method proposed to map seed development. A linkage group of length 180 cM, comprising 10 equidistant markers ordered M1, . . ., M10, is simulated for an F2 population. We hypothesize a QTL affecting an embryo trait located at 5 cM from the left marker of the third interval, i.e., at 45 cM from the first marker of the linkage group. As a result of the nature of our approach, we simulate two-stage hierarchical marker genotypes for the F2 individuals (in generation t) and their autogamous progeny (in generation t + 1). The autogamous embryos derived from the F2 are affected by five joint maternal-offspring QTL genotypes, QQ(t)QQ(t + 1), Qq(t)QQ(t + 1), Qq(t)Qq(t + 1), Qq(t)qq(t + 1), and qq(t)qq(t + 1), with the frequencies of 1/4, 1/8, 1/4, 1/8, and 1/4, respectively. The
genetic variance due to this QTL is calculated using assumed genetic effect values (Table 2). Given a sample size of 400, our simulation scenarios include few families (10) each with large size (40) or many families (40) each with small size (10). These two different sampling strategies (10 × 40 and 40 × 10) are combined with different levels of heritability (H² = 0.2 vs. 0.6).
Table 2. Averaged MLEs of the QTL position and effects among 100 replicated simulations under different heritability
(H²) levels and sampling strategies (family number × family size)
                  H² = 0.2 (σ² = 14.42)               H² = 0.6 (σ² = 2.40)
True parameter    10 × 40          40 × 10            10 × 40          40 × 10
Position          44.92 (2.8920)   44.70 (2.3866)     44.76 (1.0442)   45.02 (1.0445)
μ = 14            13.84 (0.8735)   14.04 (0.4765)     14.06 (0.6933)   13.99 (0.1672)
a = 1.0           1.04 (0.8590)    0.99 (0.4746)      0.94 (0.6932)    1.00 (0.1672)
d = 0.8           0.97 (1.0035)    0.84 (0.6783)      0.76 (0.7608)    0.83 (0.2862)
b1 = 0.7          0.59 (0.7714)    0.69 (0.5475)      0.74 (0.3918)    0.66 (0.2442)
b2 = 0.7          0.47 (1.5725)    0.55 (1.3159)      0.70 (0.8587)    0.65 (0.5519)
σ²                14.11 (1.2995)   14.25 (1.1760)     2.38 (0.2175)    2.38 (0.1889)
Values are maximum likelihood estimates (MLEs); the numbers in the parentheses are the square roots of the mean square errors of the MLEs.
Our simulation for the marker genotypes includes a two-stage hierarchy. The upper level in the hierarchy is the F2 genotype, whereas the lower level in the hierarchy is the autogamous embryos from the F2. Let us consider the first marker that has three genotypes M1M1(t), M1m1(t), and m1m1(t), with a probability of 1/4, 1/2, and 1/4, at the upper level. Genotype M1M1(t) is self-pollinated to produce a single genotype M1M1(t + 1); genotype M1m1(t) to produce three genotypes M1M1(t + 1) (1/4), M1m1(t + 1) (1/2), and m1m1(t + 1) (1/4); and m1m1(t) to produce a single genotype m1m1(t + 1) at the lower level. Similarly, the second marker also has three genotypes at the higher level, each of which is self-pollinated to produce the corresponding genotypes at the lower level. At the higher level, three genotypes for the first marker are combined with three genotypes for the second marker to form nine 2-locus genotypes, each with a probability being a function of the recombination fraction between these two markers. Meanwhile, considering the difference of genotypes at the higher level, each marker should have five joint maternal-offspring genotypes, and thus, a pair of markers produces 25 such joint genotypes at the lower level. The probability of each of these joint genotypes depends on the probability of the corresponding two-locus genotype at the higher level and the Mendelian segregation ratios of a heterozygote, if any. This simulation strategy is extended to consider all markers. We use the Kosambi map function to convert the map distance into the recombination fraction.
The declaration for the existence of QTL is based on a critical threshold for the log-likelihood ratio test statistic that controls the chromosome-wide type I error rate. The characterization of the threshold for declaring the existence of a QTL is a difficult issue. The simulation test is regarded as a useful approach for calculating the threshold, because it is not dependent on the distribution of the test statistic. We simulate the marker genotype data and the phenotype data under the null hypothesis that there is no QTL. The simulated data are analyzed by the proposed model. The distribution of the log-likelihood ratio values over 1,000 simulation replicates can be approximated by a χ² distribution. The 99th percentiles of the distribution of the maximum are used as empirical critical values to declare the existence of a QTL on the linkage groups at the significance level α = 0.01.
Figures 2 and 3 illustrate the profiles of the log-likelihood ratio test statistics across the simulated linkage group under different sampling strategies and heritability levels. In all the situations, QTL can be detected given that the peaks of the profiles are greater than the critical threshold. But an increase of heritability from 0.2 to 0.6 can increase the power to detect QTL. It appears that the two sampling strategies provide similar power and accuracy to detect the QTL position (Figs. 2 and 3). In general, our model can provide reliable estimates of the QTL effects including the additive, within-generation dominant, and between-generation dominant effects (Table 2). As expected, the additive effect and residual variance can be better estimated than the dominant effects. The between-generation dominant effect involving the offspring homozygote (b1) can be more precisely estimated than that involving the offspring heterozygote (b2).
Fig. 2. The profile of the log-likelihood ratio (LR) test statistics calculated as a function of genome position for a simulated linkage group for sampling strategy 10 × 40 under different heritability levels: 0.2 (broken curve) and 0.6 (solid curve). The horizontal line refers to the critical threshold value at significance level α = 0.05. The vertical solid line refers to the genomic position at which the LR has the maximum value. The true position of the QTL on the linkage group is indicated by a triangle.
The MLEs of dominant effects have large sampling errors estimated from 100 simulation replicates, but the sampling errors can be reduced when an effective measure is taken to increase the heritability level or when sampling strategy 40 × 10 is used (Table 2). The increase of heritability from 0.2 to 0.6 markedly reduces the sampling errors of dominant effect estimation. We perform a hypothesis test for the significance of between-generation dominant effects based on the hypothesis described by Eq. 11. Given the data set simulated under the condition as given in Table 2, we estimate the log-likelihood ratios under the hypotheses by Eq. 11 and reject the null hypothesis b1 = b2 = 0 in all 100 simulations for different heritabilities and sampling strategies. This suggests that our model has great power to detect maternal-offspring dominant effects on an embryo trait.
Similar simulation designs were also made to study the statistical properties of the endosperm model. Because the endosperm model is overparameterized and because the genotypic means of two heterozygote QTL genotypes are indistinguishable, we cannot estimate all within-generation and between-generation dominant effects defined in Eq. 2 and Table 1. However, it is possible to make hypothesis tests for some of the dominant effects or the sum of them.
In practice, we may simplify the endosperm model and reduce the number of dominant effect parameters involved. For example, by letting d1 + b2 = e1 and d2 + b3 = e2 so that the number of the unknown parameters equals the number of equations, we can make the model more tractable. However, because the estimated genotypic values of joint maternal-offspring genotypes Qq(t)QQq(t + 1) and Qq(t)Qqq(t + 1) are unidentifiable, e1 and e2 still cannot be uniquely estimated. We simulated the endosperm data assuming that e1 and e2 are distinguishable. Results from the simulation suggest that the estimation precision of the endosperm parameters (Eq. 2 and Table 1) is broadly consistent with that of the embryo model.
We performed an additional simulation to test the sensitivity of our model to false positives. A data set for two-stage maternal-offspring marker genotypes and offspring phenotypes was simulated under the assumption that there is no maternal effect. This simulated data set was then analyzed with our (full) model incorporating the maternal-offspring interaction effects and a (reduced) model with no such effects. We did not find significant maternal and maternal-offspring interaction effects by the full model, although both the full and reduced models can detect offspring QTL (results not shown).
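The embryo simulation settings and the EM updates of Eqs. 6–8 can be sketched together in a few lines of Python. This is an illustrative sketch only, not the authors' implementation. Two simplifications are assumptions of this example: marker information is summarized by a per-seed prior that puts a fixed `certainty` on one genotype (instead of the flanking-marker probabilities of Eq. 3), and 1,000 seeds are simulated rather than 400 so that the estimates are stable. All function names are hypothetical.

```python
import math
import random

FREQS = [0.25, 0.125, 0.25, 0.125, 0.25]  # joint genotype frequencies

def embryo_means(mu, a, d, b1, b2):
    """Genotypic means of the five joint embryo genotypes (Table 1, Eq. 1)."""
    return [mu + 2 * a, mu + a + d + b1, mu + 2 * d + b2,
            mu - a + d - b1, mu - 2 * a]

def residual_variance(means, h2):
    """Residual variance that yields heritability h2 for this QTL."""
    mean = sum(f * m for f, m in zip(FREQS, means))
    vg = sum(f * (m - mean) ** 2 for f, m in zip(FREQS, means))
    return vg * (1.0 - h2) / h2

def normal_pdf(y, mean, var):
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def simulate(n_seeds, means, var, certainty, rng):
    """Phenotypes and per-seed priors (ASSUMED stand-in for marker data)."""
    n_geno = len(means)
    ys, priors = [], []
    for _ in range(n_seeds):
        indicated = rng.choices(range(n_geno), weights=FREQS)[0]
        prior = [(1.0 - certainty) / (n_geno - 1)] * n_geno
        prior[indicated] = certainty
        true = rng.choices(range(n_geno), weights=prior)[0]  # prior is calibrated
        ys.append(rng.gauss(means[true], math.sqrt(var)))
        priors.append(prior)
    return ys, priors

def em(ys, priors, n_iter=100):
    """EM for a normal mixture with known proportions and common variance."""
    n, n_geno = len(ys), len(priors[0])
    grand = sum(ys) / n
    mu = [grand] * n_geno
    var = sum((y - grand) ** 2 for y in ys) / n
    trace = []  # log-likelihood before each update
    for _ in range(n_iter):
        post, loglik = [], 0.0
        for y, prior in zip(ys, priors):                      # E step (Eq. 6)
            w = [p * normal_pdf(y, m, var) for p, m in zip(prior, mu)]
            total = sum(w)
            loglik += math.log(total)
            post.append([x / total for x in w])
        trace.append(loglik)
        mu = [sum(p[k] * y for p, y in zip(post, ys)) / sum(p[k] for p in post)
              for k in range(n_geno)]                         # M step (Eq. 7)
        var = sum(p[k] * (y - mu[k]) ** 2
                  for p, y in zip(post, ys) for k in range(n_geno)) / n  # Eq. 8
    return mu, var, trace

rng = random.Random(2004)
true_means = embryo_means(14.0, 1.0, 0.8, 0.7, 0.7)
true_var = residual_variance(true_means, 0.6)
ys, priors = simulate(1000, true_means, true_var, 0.95, rng)
mu_hat, var_hat, trace = em(ys, priors)
print(round(true_var, 2), [round(m, 2) for m in mu_hat], round(var_hat, 2))
```

With the effect values of Table 2, `residual_variance` gives the residual variances listed in the table header for both heritability levels, which is a useful consistency check on the genetic model. The log-likelihood recorded in `trace` should never decrease from one iteration to the next, a standard property of EM.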
A case study. We use an example of maize for two endosperm traits, elongation factor 1α (eEF1A) and free amino
acid (FAA) contents, to demonstrate the power of our statistical approach. An F2 population of 106 plants was derived from
a cross between two contrasting maize inbred lines, Oh51Ao2
(high eEF1A and low FAA content) and Oh545o2 (low eEF1A
and high FAA content). The F2 and F2:3 progeny from this
cross were prepared for genotypic and phenotypic analysis as
previously described by Wang and Larkins (22). DNA was
extracted from young leaves of the F2 plants, whereas grain
protein quality traits were measured from the F3 kernels of the
F2, as described in Wang and Larkins (22) and Wang et al.
(23). Simple sequence repeat (SSR) primers were selected
from the Maize Microsatellite-RFLP consensus map. The
primer sequences were described in the Maize Genome Database (22). A linkage map of 83 SSR markers of the F2 plants
was constructed, based on the known order of SSR markers on
maize chromosomes.
Our proposed autogamous model is used in allogamous
maize, because the F2:3 on which the endosperm traits were
measured were derived from artificial self-pollination. Given
the structure of the data used in this example, we modified our
model to map joint maternal-offspring QTL effects on endosperm traits based on the marker genotypes only derived
from the F2 (maternal) plants (29). We recognize that this
one-stage genotyping scheme provides limited information to
separate the estimated genotypic means of the six 2-generation
QTL genotypes (see Table 1). But this data set is still useful for
providing a test for the existence of a maternal-offspring QTL
affecting the endosperm trait using the hypothesis described by
Eq. 10. All 10 chromosomes were scanned for the existence
of QTL for two endosperm traits. We successfully detected a
QTL for eEF1A on chromosome 6, as indicated by an LR of 31.2,
which is greater than the genome-wide threshold of 22.4 at the significance level α = 0.001. The estimated position of the QTL is 28
cM from the first marker of chromosome 6 (Fig. 4). The
threshold for claiming the detection of a QTL was calculated
on the basis of the 99.9th percentile of the LR distribution from
1,000 permutation tests.
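The permutation threshold can be sketched as follows. This is a generic illustration, not the procedure's actual code: `max_lr_scan` is a hypothetical callable that would rescan the genome with the shuffled phenotypes and return the maximum LR, and `toy_scan` is only a stand-in so the example runs.

```python
import random

def empirical_threshold(null_stats, alpha):
    """Value exceeded by at most a fraction alpha of the null statistics."""
    ordered = sorted(null_stats)
    n_exceed = round(alpha * len(ordered))
    return ordered[max(len(ordered) - n_exceed - 1, 0)]

def permutation_null(phenotypes, max_lr_scan, n_perm, seed=0):
    """Maximum-LR values after shuffling phenotypes relative to genotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        null_stats.append(max_lr_scan(shuffled))
    return null_stats

def toy_scan(ys):
    """Hypothetical stand-in for a genome scan: a two-group contrast."""
    half = len(ys) // 2
    return abs(sum(ys[:half]) - sum(ys[half:]))

null = permutation_null([float(v) for v in range(20)], toy_scan, n_perm=50)
print(empirical_threshold(null, 0.05))
```

With 1,000 permutations and α = 0.001, `empirical_threshold` returns the 999th smallest value, i.e., the 99.9th percentile used in the case study.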
DISCUSSION
We have proposed a new statistical model for testing and
estimating the joint effects of maternal and offspring genomes
Fig. 4. The profile of the log-likelihood-ratio test statistics calculated as a
function of genome position for eEF1A on chromosome 6 (Wang and
Larkins, Ref. 22). The horizontal solid line refers to the critical threshold value
at significance level α = 0.05. The vertical solid line refers to the true position
of a QTL on the linkage group. The vertical dotted lines refer to the marker
positions.
Fig. 3. The profile of the LR test statistics calculated as a function of genome
position for a simulated linkage group for sampling strategy 40 ⫻ 10 under
different heritability levels: 0.2 (broken curve) and 0.6 (solid curve). The
horizontal line refers to the critical threshold value at significance level α =
0.05. The vertical broken and solid lines refer to the genomic positions at
which the LR has the maximum value for the two different heritabilities, 0.2
and 0.4, respectively. The true position of the QTL on the linkage group is
indicated by a triangle.
pears for allogamous and mixed-pollinated systems in which
all QTL genotypes can be uniquely determined by marker
genotypes.
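The contrast between self-fertilization and open pollination drawn in this discussion can be made concrete with a small calculation of endosperm genotype frequencies at a single QTL. The sketch below is illustrative only. It assumes that the two polar nuclei carry the same allele as the egg, so the triploid endosperm receives two maternal copies and one paternal copy. The function names and the `pollen_q_freq` parameter are hypothetical.

```python
from fractions import Fraction

def female_gamete_freq(mother):
    """Frequency of allele Q among the female gametes of a mother
    with genotype 'QQ', 'Qq' or 'qq'."""
    return Fraction(mother.count("Q"), 2)

def endosperm_freqs(mother, pollen_q_freq=None):
    """Distribution of Q-allele dosage in the triploid endosperm
    (3 = QQQ, 2 = QQq, 1 = Qqq, 0 = qqq) produced by one mother.

    pollen_q_freq=None models selfing: the male gamete frequencies
    equal the female ones. Otherwise it is the frequency of Q in the
    pollen pool (allogamous case)."""
    f = female_gamete_freq(mother)
    m = f if pollen_q_freq is None else Fraction(pollen_q_freq)
    freqs = {}
    for egg_is_q_upper, p_egg in ((1, f), (0, 1 - f)):
        for sperm_is_q_upper, p_sperm in ((1, m), (0, 1 - m)):
            # Two maternal copies (polar nuclei) plus one paternal copy.
            dosage = 2 * egg_is_q_upper + sperm_is_q_upper
            freqs[dosage] = freqs.get(dosage, 0) + p_egg * p_sperm
    return freqs

selfed = endosperm_freqs("Qq")                    # autogamous
crossed = endosperm_freqs("Qq", Fraction(7, 10))  # pollen pool with 70% Q
```

Under selfing of a heterozygous mother, the four endosperm genotypes are equally frequent. Under open pollination, the frequencies of the two endosperm heterozygotes (QQq and Qqq) depend on the composition of the pollen pool.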
Understanding the maternal and paternal genetic regulation
of seed development helps to answer many fundamental evolutionary questions in higher plants. There is a straightforward
application of our model to evolutionary genetic studies, but
this will need to consider the patterns of gene segregation and
transmission in natural plant populations. Additional parameters characterizing population structure and organization, such
as allele frequencies, linkage disequilibrium, and haplotype
frequencies (16), should be incorporated into our seed development mapping model. In addition, a considerable body of
literature has suggested different roles of the paternal and
maternal loci in seed development, a phenomenon called “parent-of-origin effects” (18, 19). Our model provides a fundamental platform for detecting and characterizing these so-called imprinted genes, whose expression depends on the
parent of origin. In sum, the model framework presented in
this article will bring us closer to unraveling the genetic basis of
embryogenesis and seed development in higher plants.
ACKNOWLEDGMENTS
The publication of this manuscript is approved as journal series R-10069 by
the Florida Agricultural Experiment Station.
GRANTS
This work is partially supported by an Outstanding Young Investigators
Award of the National Science Foundation of China (30128017), a University
of Florida Research Opportunity Fund (02050259), and a University of South
Florida Biodefense Grant (7222061-12) to R. Wu.
REFERENCES
1. Agrawal AF, Brodie ED III, and Brown J. Parent-offspring coadaptation and the dual genetic control of maternal care. Science 292: 1710–
1712, 2001.
2. Bazzaz FA. Plant biology in the future. Proc Natl Acad Sci USA 98:
5441–5445, 2001.
3. Berger F. Endosperm development. Curr Opin Plant Biol 2: 28–32, 1999.
4. Chaudhury AM, Koltunow A, Payne T, Luo M, Tucker MR, Dennis
ES, and Peacock WJ. Control of early seed development. Annu Rev Cell
Dev Biol 17: 677–699, 2001.
5. Chaudhury AM, Ming L, Miller C, Craig S, Dennis ES, and Peacock
WJ. Fertilization-independent seed development in Arabidopsis thaliana.
Proc Natl Acad Sci USA 94: 4223–4228, 1997.
6. Churchill GA and Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971, 1994.
7. Colombo L, Franken J, VanderKrol AR, Wittich PE, Dons HJM, and
Angenent GC. Downregulation of ovule-specific MADS box genes from
petunia results in maternally controlled defects in seed development. Plant
Cell 9: 703–715, 1997.
8. Dempster AP, Laird NM, and Rubin DB. Maximum likelihood from
incomplete data via the EM algorithm. J R Stat Soc Ser B 39: 1–38, 1977.
9. Dilkes BP, Dante RA, Coelho C, and Larkins BA. Genetic analyses of
endoreduplication in Zea mays endosperm: evidence of sporophytic and
zygotic maternal control. Genetics 160: 1163–1177, 2002.
10. Evans MMS and Kermicle JL. Interaction between maternal effect and
zygotic effect mutations during maize seed development. Genetics 159:
303–315, 2001.
11. Friedman WE. Double fertilization in Ephedra, a nonflowering seed
plant: its bearing on the origin of angiosperms. Science 247: 951–954,
1990.
12. Friedman WE. The evolution of double fertilization and endosperm: an
“historical” perspective. Sex Plant Reprod 11: 6–16, 1998.
13. Hager R and Johnstone RA. The genetic basis of family conflict
resolution in mice. Nature 421: 533–535, 2003.
14. Hulskamp M, Schneitz K, and Pruitt RE. Genetic evidence for a long
range activity that directs pollen tube guidance in Arabidopsis. Plant Cell
7: 57–64, 1995.
on quantitative traits expressed during seed development. Maternal inheritance has long been thought to affect animal traits
(1, 13), but based on recent observations (9), maternal effects
may be of greater importance in the seed formation of flowering plants than originally appreciated. The formation of seed
in flowering plants results from double fertilization (11), in
which one of the two sperm cells from a pollen tube fertilizes
the haploid egg cell to form a diploid zygote (embryo) and the
other sperm cell fertilizes the diploid central cell and fuses with
the central cell (polar) nuclei, thus giving rise to the triploid
endosperm. Both the embryo and endosperm affect seed size
and seed quality, including oil, protein, and carbohydrate
contents. An understanding of how these traits are genetically
determined through the integration of the formation mechanisms of the embryo and endosperm is of paramount importance to increase grain production and quality (2).
Wu et al. (29) have derived a maximum likelihood-based
one-QTL model for mapping endosperm-specific traits in autogamous plants. This endosperm model has proven to be more
powerful for detecting significant QTL than the usual diploid models. But the model of Wu et al. did not take into account
the effects due to the interaction between the maternal genome
and offspring. A considerable body of literature supports the
view that the maternal and offspring genomes interact to
determine the developmental processes of seeds, including
those in both the embryo and the endosperm. In a recent study
in Arabidopsis, the paternal genes were found not to be
expressed during early stages of seed development (21). But
this finding is not supported by other studies in the same plant
species or different species (24). Our model can be used to test
for the maternal-offspring interaction effect of a QTL on seed
development and separate the effects due to the maternal
genome from those due to the offspring genome. This
separation can be made by testing the hypothesis
of Eq. 11. Thus the approach proposed here can be used
to examine how and when the paternal genome exerts effects
on seed development and, ultimately, address the above-mentioned fundamentally important debate arising from Arabidopsis genetic research.
Studies of joint effects of maternal and offspring genomes
on offspring traits have received considerable attention in
animals (25–27), but such studies are surprisingly scarce
in plants. Our proposed model will provide a powerful tool for mapping specific genetic loci that trigger joint
maternal-offspring effects in plants. Our model is based on an
F2 population for an autogamous plant system. It is not difficult
to extend the model to other reproductive systems, such as
allogamous and mixed-pollinated systems, and to other mapping populations.
For an autogamous plant, the egg and the two polar nuclei
are self-fertilized so that the frequencies of male gamete
genotypes are identical to those of female gamete genotypes.
But in an allogamous plant, such as maize, each female gamete
from each mother plant will be pollinated by all possible male
gametes from the pollen pool. This difference should be
considered when the current model is used to study the genetics
of seed development in allogamous plants. As mentioned above, the
autogamous system has an inherent limitation in separating the
genotypic means of two joint maternal-offspring QTL heterozygotes for the endosperm because the conditional probabilities of these two QTL heterozygotes given two-stage hierarchical marker genotypes are identical. This problem disap-
23. Wang XL, Woo YM, Kim CS, and Larkins BA. Quantitative trait locus
mapping of loci influencing elongation factor 1 alpha content in maize
endosperm. Plant Physiol 125: 1271–1282, 2001.
24. Weijers D, Geldner N, Offringa R, and Jürgens G. Early paternal gene
activity in Arabidopsis. Nature 414: 709–710, 2001.
25. Wolf JB. Gene interactions from maternal effects. Evolution 54: 1882–
1898, 2000.
26. Wolf JB. Genetic architecture and evolutionary constraint when the
environment contains genes. Proc Natl Acad Sci USA 100: 4655–4660,
2003.
27. Wolf JB, Vaughn TT, Pletscher LS, and Cheverud JM. Contribution of
maternal effect QTL to genetic architecture of early growth in mice.
Heredity 89: 300–310, 2002.
28. Wu RL. Mapping quantitative trait loci by genotyping haploid tissues.
Genetics 152: 1741–1752, 1999.
29. Wu RL, Ma CX, Gallo-Meagher M, Littell RC, and Casella G.
Statistical methods for dissecting triploid endosperm traits using molecular markers: An autogamous model. Genetics 162: 875–892, 2002.
30. Wu RL, Lou XY, Ma CX, Wang X, Larkins BA, and Casella G. An
improved mapping model generates high-resolution mapping of protein
quality traits in maize endosperm. Proc Natl Acad Sci USA 99: 11281–
11286, 2002.
15. Lander ES and Botstein D. Mapping Mendelian factors underlying
quantitative traits using RFLP linkage maps. Genetics 121: 185–199, 1989.
16. Lou XY, Casella G, Littell RC, Yang MCK, Johnson JA, and Wu RL.
A haplotype-based algorithm for multilocus linkage disequilibrium mapping of quantitative trait loci with epistasis. Genetics 163: 1533–1548,
2003.
17. Mazur B, Krebbers E, and Tingey S. Gene discovery and product
development for grain quality traits. Science 285: 372–375, 1999.
18. Scott RJ, Spielman M, Bailey J, and Dickinson HG. Parent-of-origin
effects on seed development in Arabidopsis thaliana. Development 125:
3329–3341, 1998.
19. Spielman M, Vinkenoog R, Dickinson HG, and Scott RJ. The epigenetic basis of gender in flowering plants and mammals. Trends Genet 17:
705–711, 2001.
20. Springer PS, Holding DR, Groover A, Yordan C, and Martienssen RA.
The essential Mcm7 protein PROLIFERA is localized to the nucleus of
dividing cells during the G(1) phase and is required maternally for early
Arabidopsis development. Development 127: 1815–1822, 2000.
21. Vielle-Calzada JP, Baskar R, and Grossniklaus U. Delayed activation
of the paternal genome during seed development. Nature 404: 91–94,
2000.
22. Wang XL and Larkins BA. Genetic analysis of amino acid accumulation
in opaque-2 maize endosperm. Plant Physiol 125: 1766–1777, 2001.