Download Traversing the conceptual divide between biological and

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Genetic testing wikipedia , lookup

Neuronal ceroid lipofuscinosis wikipedia , lookup

Synthetic biology wikipedia , lookup

Gene expression programming wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Gene expression profiling wikipedia , lookup

Twin study wikipedia , lookup

Epigenetics of neurodegenerative diseases wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Genetic engineering wikipedia , lookup

History of genetic engineering wikipedia , lookup

Biology and consumer behaviour wikipedia , lookup

Heritability of IQ wikipedia , lookup

Behavioural genetics wikipedia , lookup

Medical genetics wikipedia , lookup

Genome-wide association study wikipedia , lookup

Quantitative trait locus wikipedia , lookup

Human genetic variation wikipedia , lookup

Designer baby wikipedia , lookup

Population genetics wikipedia , lookup

Genome (book) wikipedia , lookup

Microevolution wikipedia , lookup

Public health genomics wikipedia , lookup

Epistasis wikipedia , lookup

Transcript
Problems and paradigms
Traversing the conceptual divide
between biological and statistical
epistasis: systems biology and a
more modern synthesis
Jason H. Moore1,2,3* and Scott M. Williams4
Summary
Epistasis plays an important role in the genetic architecture of common human diseases and can be viewed from
two perspectives, biological and statistical, each derived
from and leading to different assumptions and research
strategies. Biological epistasis is the result of physical
interactions among biomolecules within gene regulatory
networks and biochemical pathways in an individual
such that the effect of a gene on a phenotype is dependent
on one or more other genes. In contrast, statistical
epistasis is defined as deviation from additivity in a
mathematical model summarizing the relationship between multilocus genotypes and phenotypic variation in a
population. The goal of this essay is to review definitions
and examples of biological and statistical epistasis and to
explore the relationship between the two. Specifically, we
present and discuss the following two questions in the
context of human health and disease. First, when does
statistical evidence of epistasis in human populations
imply underlying biomolecular interactions in the etiology of disease? Second, when do biomolecular interactions produce patterns of statistical epistasis in human
populations?Answers to these two reciprocal questions
will provide an important framework for using genetic
information to improve our ability to diagnose, prevent
and treat common human diseases. We propose that
systems biology will provide the necessary information
for addressing these questions and that model systems
1
Computational Genetics Laboratory, Department of Genetics, Department of Community and Family Medicine, Norris Cotton Cancer
Center, Dartmouth Medical School, Lebanon, NH.
2
Department of Biological Sciences, Dartmouth College, Hanover, NH.
3
Department of Computer Science, University of New Hampshire,
Durham, NH.
4
Center for Human Genetics Research, Department of Medicine,
Department of Molecular Physiology and Biophysics, Vanderbilt
University Medical School, Nashville, TN.
Funding agency: This work was supported by National Institutes of
Health grants HL65234, AI59694, HD047447, and RR000095.
*Correspondence to: Jason H. Moore, 706 Rubin Building, HB 7937,
One Medical Center Drive, Dartmouth-Hitchcock Medical Center,
Lebanon, NH 03756. E-mail: [email protected]
DOI 10.1002/bies.20236
Published online in Wiley InterScience (www.interscience.wiley.com).
BioEssays 27:637–646, ß 2005 Wiley Periodicals, Inc.
such as bacteria, yeast and digital organisms will be a
useful place to start. BioEssays 27:637–646, 2005.
ß 2005 Wiley Periodicals, Inc.
Introduction
It has been argued that understanding genetic susceptibility to
common human diseases, such as cardiovascular disease will
require a research strategy that acknowledges and confronts
head-on the complexity of the relationship between genotype
and phenotype.(1) Part of this complexity can be attributed to
epistasis or gene–gene interaction that occurs when the
phenotype for a genotype at one locus is dependent on genotypes at one or more other loci. Indeed, epistasis is arguably a
ubiquitous component of the genetic architecture of common
disease susceptibility.(2) Determining whether a particular
genetic variant is causative in humans is an elusive goal(3) and
the extent to which statistical evidence of epistasis from population studies is predictive of biological epistasis from experimental studies remains an open, yet important, question.
As biology evolves as a discipline, different aspects of
investigation have merged to bring us closer to understanding
the natural world. Genetics emerged in the 1800s with
Mendel’s work followed by the development of statistics by
Pearson, Fisher and others. It was the rediscovery of Mendel’s
work in the early 20th century that lead to the development of
modern genetics. Following this rediscovery, it was the merger
of genetics and statistics by Fisher, Wright and others that gave
rise to population genetics as a discipline. Prior to the merger,
epistasis had been described as a genetic phenomenon by
Bateson(4) and later as a statistical phenomenon by Fisher.(5)
Population genetics and the ability to measure DNA sequence
variations made it possible for biological and statistical
concepts of epistasis to converge in studies of model
organisms where statistical analyses can often be used to
accurately infer the presence and role of gene–gene interactions in determining a particular phenotype. However, the
ultimate goal is to understand how DNA sequence variations
influence phenotypes in an individual through a hierarchy of
biochemical, metabolic and physiological systems that are
driven by biomolecular interactions (i.e. biological epistasis).
While it is not possible to make the connection between
BioEssays 27.6
637
Problems and paradigms
biological and statistical epistasis in humans at this time, it is
important for understanding the genetic basis of many common diseases and we argue that it will be possible in the future.
This is the promise of what has been called systems biology,
the simultaneous study of all the biological pieces together in
the context of ecology and evolution to reveal etiology.
Epistasis has been defined primarily as either biological or
statistical. Biological epistasis is the result of physical interactions among biomolecules within gene regulatory networks
and biochemical pathways in an individual such that the effect
of a gene on a phenotype is dependent on one or more other
genes. In contrast, statistical epistasis is defined as the deviation from additivity in a mathematical model, where the relationship between multilocus genotypes and phenotypic
variation in a population is not predictable based solely on
the actions of the genes considered singly. The goal of this
essay is to review definitions and examples of biological and
statistical epistasis and to explore the relationship between
them. Specifically, we present and discuss the following two
questions in the context of human health and disease. First,
when does statistical evidence of epistasis in human populations imply underlying biomolecular interactions in the etiology
of disease? Second, when do biomolecular interactions produce patterns of statistical epistasis in human populations?
Answers to these two reciprocal questions will provide an
important framework for using genetic information to improve
our ability to diagnose, prevent and treat common human
diseases. We propose that systems biology will provide the
necessary information for addressing these questions and
that studies of model systems, such as bacteria, yeast and
digital organisms will be important in defining the kinds of
interactions that may be expected in human phenotypic
variation.
Defining epistasis
The concept of epistasis has been around for at least 100
years and was recognized as an explanation for deviations
from simple Mendelian ratios. William Bateson(4) has been
credited by Hollander(6) and more recently by Phillips(7) as the
first to use the term epistasis that literally translated means
‘standing upon’. A commonly used textbook definition of
epistasis is one gene masking the effects of another gene.(8,9)
A classic example of epistasis comes from studies of the shape
of seed capsules from crosses of a weedy plant, the
shepherd’s purse (Bursa bursa-pastoris) by Shull.(10) In this
study, crosses from doubly heterozygous plants yielded
Mendelian ratios of 15 triangular capsules to one oval capsule.
It is generally thought that there are two pathways that lead to
triangular shape, each with dominant alleles. Only when both
pathways are blocked by homozygous recessive alleles is an
oval-shaped seed capsule produced.
The most-compelling examples of epistasis in animals
come from studies of model organisms. For example,
638
BioEssays 27.6
Fedorowicz et al.(11) described epistatic interactions between
loci that impact olfactory behavior in Drosophila. This study
was based on crosses of 12 lines derived from a common
isogenic background that differed in the location of P-element
insertions, each of which has homozygous effects on olfactory
behavior. Eight of the 12 insertion variants defined an
interactive network of genes that significantly impact the
olfactory phenotype. A subsequent study that investigated
these same insertions, using transcriptional profiling,(12)
showed that a total of 530 genes were significantly coregulated
in response to one or more of these olfactory mutations. This
study clearly and not surprisingly documents epistasis in a
model organism. A logical next step will be to determine the
impact on olfactory phenotypes through the networks of
interacting proteins and other biomolecules (i.e. biological
epistasis).
Biological epistasis
At the heart of epistasis lie biomolecular interactions that drive
transcription, translation and signal transduction, and the
function of biochemical, metabolic and physiological systems.
Here, we define biological epistasis as physical interactions
among proteins or other molecules that impact phenotype.
This can occur at several levels from the interaction of transcription factors with each other and/or promoter sequence
variation to the non-linear interaction of enzymes through a
metabolic pathway. Biological epistasis has also been referred
to as physiological epistasis.(13) One of the best known
examples of biological epistasis in the genetics literature is
sickle cell disease. Individuals with sickle cell disease have bglobin molecules that have a neutral instead of a polar amino
acid on the outer surface. This neutral amino acid leads to an
increase in intermolecular adhesion (i.e. physical interaction)
leading to accumulation of deoxyhemoglobin, which in turn
deforms the red blood cells. Thus, healthy and sick individuals
differ with respect to their biomolecular interactions among bglobin proteins. Interestingly, epistasis also plays an important
role in sickle cell disease.(14–16) For example, Templeton(16)
reviewed work by Giblett(17) that describes how the protein
haptoglobin physically interacts with hemoglobin influencing
its excretion. Giblett(17) also noted that genetic variants in the
haptoglobin gene interact with the S allele of the hemoglobin
b-chain gene.
The ubiquity of protein–protein interactions in transcriptional regulation has allowed the development and use of the
yeast two-hybrid system for assessing such interactions.(18)
This approach allows a single protein to be used as bait
for other proteins that might physically interact with it. For
example, Bondos et al.(19) used the yeast two-hybrid assay to
identify proteins that physically interact with a developmental
Hox protein in Drosophila called Ultrabithorax (Ubx). Screens
of 0–12 hour embryo libraries identified the Disconnected
Interacting Protein 1 (DIP1) that was later confirmed to
Problems and paradigms
physically interact with Ubx by a variety of strategies, including
phage display, immunoprecipitation, pull-down assays and gel
retardation analysis. Interestingly, Ubx and DIP1 are coexpressed in the same embryonic tissues and are both localized
to the nucleus. Further, ectopic expression of DIP1 in wing and
haltere imaginal discs results in an abnormal developmental
phenotype in the form of small shriveled wings and excess
bristle numbers. Genetic studies with wild-type and mutant
Drosophila confirm the interaction.
The study by Bondos et al.(19) illustrates how powerful the
yeast two-hybrid system is for identifying proteins that interact
with a protein of interest. However, the two-hybrid system can
also be used on a very large scale to piece together protein–
protein interaction networks. For example, Rain et al.(20) use a
high-throughput version of the two-hybrid system to build a
large-scale map of protein–protein interaction in the human
gastric pathogen Helicobacter pylori. In this study, 261 proteins
used as bait revealed over 1200 interactions, connecting more
than 46% of all H. pylori proteins. Such assays have been very
successful in model organisms. However, similar studies in
humans are complicated by the fact that, among other things,
there are expected to be many more interactions than have
been observed for many model organisms. It has been
estimated that there are roughly 10,000–30,000 pairwise
interactions among yeast proteins and perhaps as many as
200,000 or more in humans.(21) Despite the daunting task, it is
clear that the detection and characterization of biological
epistasis is dramatically improving with technological advances in high-throughput assays and analysis.
Finally, it is important to note that there is a very close
relationship between biological epistasis and pleiotropy. A
common textbook definition of pleiotropy is one gene affecting
more than one trait or phenotype.(22) Gilbert(23) further defines
pleiotropy as mosaic if the phenotypes are independent or
relational if they are correlated. Hodgkin(24) provides an even
finer delineation of pleiotropic effects. As an example of
mosaic pleiotropy, consider the apolipoprotein E (ApoE)
polymorphism and its effects on human health. The ApoE
gene plays an important role in both cardiovascular disease
and Alzheimer disease even though these two diseases are
independent of one another. Where the lines are blurred is with
relational pleiotropy. Consider for example the study by Reilly
et al.(25) that examined the relationship between the ApoE
polymorphism and the correlation between pairs of nine
plasma lipid and apolipoprotein traits among 507 unrelated
human subjects. This study showed that the ApoE polymorphism had a significant effect on a large number of trait
correlations indicative of relational pleiotropy. Since many of
the apolipoproteins and lipids physically interact with one
another there is also biological epistasis in this system. Thus, it
is likely that ApoE genotypes are interacting with genotypes at
other loci through physical interactions of biomolecules in lipid
complexes. In fact, this may be one of many examples where
pleiotropy and epistasis are one and the same. The relationship between pleiotropy and epistasis will be important to
define but this is beyond the scope of this essay.
Statistical epistasis
Bateson’s(4) biological definition of epistasis is in contrast to
the concept of statistical epistasis or epistacy that was first
used by Fisher.(5) Fisher used the term to describe deviations
from additivity in a statistical model. It is common today to use
parametric statistical methods such as linear and logistic
regression (e.g. Cordell(26)) or nonparametric methods such
as combinatorial partitioning,(27–29) restricted partitioning,(30)
set association analysis,(31–35) genetic programming neural
networks(36,37) and multifactor dimensionality reduction(38–42)
to evaluate nonadditivity indicative of statistical epistasis.
Because statistical epistasis is difficult to model using parametric models that cannot be known a priori, nonparametric
data-mining methods based on computational strategies such
as machine learning(43) are gaining in popularity as it becomes
apparent that the study of complex traits requires alternative
research strategies.(44)
Statistical epistasis is perhaps best illustrated using
penetrance functions (P[DjG]) that model the probability (P)
of disease (D) given genotype (G). Table 1 illustrates a
penetrance function that models an interaction between two
single nucleotide polymorphisms (SNPs) in the absence of any
independent main effects for either SNP. This genetic model
specifies the probability of disease given a particular genotype
(on the margin) or combination of genotypes (in the table) and
their frequencies (in parentheses). This extreme model is
based in the nonlinear exclusive OR (XOR) function that
defines a genotype pattern that is not linearly separable.(45,46)
Here, individuals are at high-risk of disease (P ¼ 1) if they
inherit the Aa OR Bb genotypes but NOT both (i.e. the XOR
function). Otherwise, they are at low-risk (P ¼ 0). Here, the
marginal probabilities represent the disease prevalences
and are not different among single genotypes, indicating
that there is no main effect of each SNP. Thus, these SNPs
would never be detected using a single-locus analysis. It
Table 1. Penetrance values for combinations of
genotypes from two SNPs exhibiting interactions in
the absence of independent main effects
Table penetrance
BB (0.25)
Bb (0.50)
bb (0.25)
Margin penetrance
AA
(0.25)
Aa
(0.50)
aa
(0.25)
Margin
penetrance
0
1
0
0.5
1
0
1
0.5
0
1
0
0.5
0.5
0.5
0.5
BioEssays 27.6
639
Problems and paradigms
should be noted that even a relatively small change in allele
frequency at either locus may permit such a marginal effect to
be detected even though the model is epistatic.(16) Thus, the
statistical interpretation of whether or not the effect of a
genotype is dependent on one or more other genotypes
depends very much on genotype frequencies and the context
in which the genotype is found.(47) This differs substantially
from biological epistasis where the interaction is defined at the
level of the individual. Knowledge about statistical patterns of
epistasis could be the difference between successful and
unsuccessful gene mapping studies.(47,48) Using this argument, Wade(47) and others(49) have suggested that the lack of
replication observed for association studies(50) is a signature
of epistasis.
There are an ever-increasing number of examples of
statistical epistasis in complex disease research. This increase is tied to the growing number of analytical tools to detect
and characterize this phenomenon. For example, Zee et al.(51)
used set association analysis to identify a combination of
seven SNPs in seven genes that are significantly associated
with coronary artery restenosis following angioplasty in a
cohort of 342 cases and 437 controls. The multifactor
dimensionality reduction approach (MDR) has identified
significant evidence of epistasis in sporadic breast cancer,(38)
essential hypertension,(52) atrial fibrillation(53) and type II
diabetes.(54) In the study of atrial fibrillation,(53) the role of
renin-angiotensin system (RAS) gene polymorphisms in atrial
fibrillation (AF) was investigated using 250 patients with
documented nonfamilial structural AF and 250 controls
matched with regard to age, gender, presence of left ventricular dysfunction and presence of significant valvular heart
disease. A total of eight polymorphisms were measured in the
angiotensin converting enzyme (ACE), angiotensinogen
(AGT), and angiotensin II type I receptor (AT1) gene from
the RAS. Single-locus association analysis revealed significant results for three polymorphisms in the AGT gene. The
MDR approach was used to evaluate interactions among all
possible subsets of the eight polymorphisms. The best model
consisted of two polymorphisms from the AGT gene and a
single polymorphism from the ACE gene. This three-locus
model had a perfect cross validation consistency of 10 and a
prediction error of 37.26. Both were significant at the 0.001
level based on a 1000-fold permutation test. Fig. 1A illustrates
the distribution of cases and controls for each multilocus
genotype combination. Interestingly, only one of the polymorphisms in the MDR model had a significant main effect
reinforcing the importance of considering combinations of
polymorphisms. As would be predicted from a phenotype that
is affected by many interacting genes, documenting some
independent main effects only reveals part of the genetic
architecture, and can in fact reveal ‘incorrect’ information
regarding the underlying genetic model of disease.(47) As
suggested by Templeton,(16) statistical epistasis is often
640
BioEssays 27.6
detected when appropriate analytical methods are employed.
Further, if the underlying models are epistatic, failure to look for
them may lead to overly simplistic and misleading conclusions
about the etiology of disease.
When does statistical epistasis
imply biological epistasis?
The relationship between biological and statistical epistasis
can be difficult to comprehend. Fig. 2 provides a visual
summary of the two concepts. Genetic information impacts
phenotype through a hierarchy of proteins that are involved in
biological processes ranging from transcription to physiological homeostasis. It is the physical interactions among proteins
and other biomolecules and their impact on phenotype that
constitute biological epistasis. Additionally, the possibility
exists that molecules that do not interact directly can have
epistatic interactions if they impact the same phenotype via an
alternative path.
Differences in biological epistasis among individuals in a
population give rise to statistical epistasis. However, it is entirely possible for biological epistasis to occur in the absence of
statistical epistasis. This can happen when every individual
sampled from a population is the same with respect to their
DNA sequence variations and biomolecules. Thus, one can
argue that genetic and biological variation is sufficient for the
statistical detection of epistasis. However, does evidence of
statistical epistasis necessarily imply biological epistasis?
An important example of the difficulty in making inferences
about biological epistasis from statistical results is human type
I diabetes where consideration of epistasis helped identify a
two-locus interaction.(55,56) Interestingly, analysis of congenic
strains of nonobese diabetic (NOD) mice showed no evidence
of biological epistasis despite the significant evidence for
statistical epistasis in humans.(57) Cordell et al.(57) suggest that
our ability to infer biological mechanisms from statistical
results is limited. There are, however, counterexamples that
suggest biological inference from statistical models may be
possible. For example, Cox et al.(58) found statistical evidence
for an interaction between a locus on chromosome 2 and a
locus on chromosome 15 in families with type II diabetes. This
confirmed an earlier linkage result on chromosome 2(59) and
helped identify the calpain-10 gene by positional cloning. This
confirms the suggestion that investigations that use the
existence of epistasis can facilitate the mapping of complex
trait genes.(47) While statistical results in a variety of genetic
studies have been inconsistent, the calpain-10 gene clearly
has a functional role in glucose metabolism.(60) For example,
Johnson et al.(61) have shown experimentally that the calpain10 protein plays an important role in the apoptosis of pancreatic islets. It will be interesting to note in the coming years
the proportion of statistical results in human populations that
ultimately lead to new knowledge about biological function and
disease etiology.
Problems and paradigms
Figure 1. This figure illustrates the challenge of making inferences about biological
epistasis in a pathway (B) from statistical
epistasis in a population-based model (A).
A: The statistical model for a previously
reported multilocus association between polymorphisms in the AGT and ACE genes and
susceptibility to atrial fibrillation.(53) The distribution of cases (left bars) and controls (right
bars) are illustrated for each multilocus genotype combination. Dark-shaded cells are considered ‘high-risk’ for disease while lightshaded cells are considered ‘low-risk’. White
cells indicate no subjects with those genotype
combinations were observed in the dataset. B:
Summary of a computational model of the
renin–angiotensin system as adapted from
Takahashi and Smithies.(77,78) A more detailed
form of this model has been used to carry out
simulations of the changes in this pathway that
are involved in blood pressure regulation.(77)
Expansion of this model may facilitate computational thought experiments that can be used
to generate hypotheses about the relationship
between AGT, ACE and susceptibility to atrial
fibrillation.
Making inferences about biological function and causation
from any statistical result is a significant challenge. This is
particularly true in genetic studies of humans due to our
inability to conduct perturbation experiments that are possible
in model organisms.(62) This is further complicated by a lack of
concordance between findings in humans and those in model
organisms.(63) The difficulty in determining causation for
genetic results is outlined by Page et al.(3) Here, it is pointed
out that there are several explanations for an observed
statistical association between a DNA sequence variation
and a trait. First, the association could be a false-positive result
due to chance events. Second, variable sites within a gene
may be associated with alleles at the true causative site. Third,
the association is due to some systematic bias in the study
design or analysis. The final possibility is that the association is
real. Page et al.(3) suggest that proving causation will always
be a challenge due to our inability to randomly assign people
to genotypes as is possible with model organisms. However,
evidence in favor of an association can be significantly
strengthened through comprehensive efforts to address
sources of error and bias. Even in this case, we are still left
with only inferences about the nature of the underlying
biological epistasis architecture. Consider, for example, the
multilocus statistical model of atrial fibrillation susceptibility
depicted in Fig. 1A and described above. The AGT and ACE
genes in this model are part of the RAS pathway that is
summarized in Fig. 1B. While it is possible to speculate on the
role of these genes in atrial fibrillation,(53) inferring how
combinations of polymorphisms impact the biology of this
pathway and its relationship with human health and disease is
difficult. This is especially true without complete measurements of the pathway and all its factors.
Clearly, making inferences about biological function and
causation from any statistical result will always be a challenge if
the relevant biomolecular information has not been measured.
Thus, making the connection between statistical and biological
epistasis will only be feasible if the appropriate genetic,
genomic, proteomic and metabolic information is at hand.
Fortunately, we are in an era of rapid technological advancement with new tools for measuring the genome, transcriptome,
proteome, and metabolome surfacing each day. In the short
term, our best hope for making use of all this biological information is to study model organisms where there are fewer
biomolecules, the genetic background can be controlled and
BioEssays 27.6
641
Problems and paradigms
Figure 2. The conceptual relationship between biological and
statistical epistasis. Biological epistasis occurs at the level of the
individual and involves DNA sequence variations (vertical bars),
biomolecules (circle, square and triangle) and their physical
interactions (dashed lines) giving rise to a phenotype (star) at a
particular point in time and space (not shown). Statistical
epistasis is a population phenomenon that is made possible by
interindividual variability in genotypes, biomolecules and their
physical interactions.
the system can be perturbed experimentally. We propose that
studies of biological and statistical epistasis should be initially
focused on simple unicellular organisms such as bacteria and
yeast. Understanding the structure of interactions in unicellular
organisms will provide basic knowledge about epistasis that can
be used to guide studies of humans.
A role for unicellular organisms in
understanding the relationship between
biological and statistical epistasis
Not only is epistasis not unique to metazoans, it is also
generally easier to detect global examples of epistasis in
642
BioEssays 27.6
unicellular organism that are amenable to genetic manipulation. For example, Elena and Lenski(64) determined that
combinations of mutations in E. coli interact to influence
fitness. In their study, 225 different genetic strains of E. coli
were created using randomly inserted transposons with each
strain harboring one, two or three mutations. Relative fitness
was defined as the ratio of net growth rates of mutant and wildtype strains during competition for limited resources. This
study showed that both synergistic and antagonistic epistasis
was commonly observed in selected strains. A follow-up study
by Elena and Lenski(65) looking at similar mutations on several
different genetic backgrounds supports the idea that epistasis
is common.
In similar studies, Remold and Lenski(66) carried out a
series of experiments to evaluate the role of plasticity and
epistasis in determining fitness in E. coli. Eighteen random
insertion mutations were generated in E. coli in two different
resource environments and in five different genetic backgrounds. The fitness of each mutation in each context was
measured using competition assays. Interestingly, half of the
mutations had a significant effect on fitness that was dependent on genetic background. Some of these were also dependent on environmental context, suggesting plasticity in the
reaction norm. It is important to note that, in these relatively
simple systems, context-dependent genetic effects are the
norm. Extrapolating these kinds of findings to organisms with
more structural complexity and potential interactions, such as
humans, would suggest that the importance of context would
be even greater.
Studies in yeast support the hypothesis that epistasis is
ubiquitous. For example, Brem et al.(67) crossed a laboratory
strain of S. cerevisiae with a wild strain and then carried out a
genome-wide genetic linkage analysis of over 1500 differentially expressed genes. This study demonstrated that the
expression levels of 570 genes were linked to one or more
different loci with most exhibiting a complex mode of inheritance. Thus, gene expression in this ‘simple’ unicellular
organism, grown in controlled experimental conditions, has a
multifactorial etiology. These studies in bacteria and yeast are
important because they connect biological and statistical
epistasis in a meaningful way.
The findings of these studies raise the questions: what can
we learn about the relationship between biological and
statistical epistasis in simple organisms, and how can we
translate these results into our understanding of human
phenotypes? Statistical epistasis and biological epistasis can
be equated in these studies because experimental conditions
and background effects can to a large extent be controlled,
allowing simple statistical methods to tie the two together. It is
also true that the underlying biology is much easier to study in
unicellular organisms than in humans and other more complex
organisms. Therefore, it is not surprising that progress in
defining protein interaction networks has been much easier in
Problems and paradigms
unicellular organisms because there are likely to be fewer
interactions than in humans.(21) As the technology and
statistical methodology for measuring biological processes
matures, it will become increasingly easier to understand the
biology that is responsible for the functional connection
between biological epistasis and statistical epistasis observed
in population studies. We propose that making this connection
in an organism such as E. coli will shed substantial light on
similar processes in humans because it will produce knowledge about ‘rules’ that map statistical epistasis onto biology.
A role for digital organisms in
understanding the relationship between
biological and statistical epistasis
An alternative strategy that is being used to investigate
epistasis is digital biology. The goal of digital biology is to
generate data through computer experiments that can be
studied as if they were real data. This is important because a
digital experiment provides all the relevant information thus
making it feasible to carry out a completely informed analysis
of the relationship between biological and statistical epistasis.
For example, Lenski et al.(68) evolved artificial populations of
organisms each consisting of computer programs of varying
complexity. Simple computer programs were established with
the goal of rapid replication while the goal of more-complex
computer programs was to perform mathematical calculations
that increase replication in response to certain metabolic
rewards. These populations of simple and complex digital
organisms were subjected to millions of single and multiple
mutations and the resulting impact on fitness was measured.
The results of this large-scale digital experiment and others(69)
indicate that the complex organismal phenotypes were robust
to single mutations and that gene–gene interactions were
ubiquitous. These studies demonstrate that complex interactions evolve under relatively simple rules. They also suggest
that digital organisms may be useful in modeling and therefore
understanding epistasis by answering specific critical questions. For example, when can we conclude that biological
epistasis underlies statistical epistasis? When does observed
biological epistasis result in nonadditive statistical patterns?
These types of questions can be directly addressed using
digital organisms in a manner that is very difficult or even
impossible with real organisms no matter how ‘simple’.
A slightly different approach is to carry out thought experiments about the nature of biochemical and physiological systems that are consistent with a particular statistical model
relating DNA sequence variations to interindividual phenotypic
differences. Opaque thought experiments represent a happy
medium between the extreme position of digital biology
experiments representing real biological data and the extreme
position of them being completely worthless.(70) Thought
experiments have enormous utility in the study of complex
biological systems because they generate ideas that can be
used to construct explicit hypotheses that can then be tested in
model organisms. To this end, White and Moore(71) have
developed a computational approach for generating agentbased artificial life models that are consistent with a statistical
model defined by a penetrance function. Digital organisms or
agents in these simulations move and collide with each other
on a grid of fixed size. Agents begin in a random spatial
configuration with predefined move and collision behaviors
and end in a final state after a specified number of time steps.
Some agents move in an invariant manner, while others are
dependent on global conditions (e.g. genotype) specified at
the beginning of the simulation. Agent interaction (therefore
communication) happens via collisions and is thus analogous
to physical interactions among biomolecules in biological
epistasis. This approach and others(72) may be a useful
starting point for those hoping to carry out thought experiments about the role of biochemical and physiological systems
in the mapping between genotype and phenotype, although
the utility of these methods have yet to be fully tested.
What can be learned from these computational modeling
exercises? The studies by Lenski et al.(68,69) demonstrate that
epistasis is a ubiquitous property of digital organisms and
contributes to their robustness. These results are similar to
those observed for bacterial cells(64–66) and yeast,(67) reinforcing the usefulness of the digital experiments. The ability to
generate ‘experimental’ data in a computer will make it
possible to address questions about the nature of biological
and statistical epistasis in a completely controlled environment. Showing that digital organisms mimic real simple
organism data will reinforce the usefulness in this approach.
The study by White and Moore(71) provides a medium for
generating hypotheses that can then be tested experimentally
in real biological systems. Both approaches provide a starting
point for the study of epistasis that is not currently possible
using model organisms or humans. We anticipate that knowledge gained from the study of both digital and unicellular
biological organisms will play an important role in advancing
our understanding of epistasis as it relates to human health
and disease. Of course, the limitations of these approaches
are that we can only detect what we explicitly model. However,
this does not differ from laboratory-based research where we
only find what is examined.
Systems biology and a more modern synthesis
One of the greatest contributions to our understanding of
biological organisms was the merger of Darwin’s evolution
of species by natural selection and Mendel’s principles of
heredity. This merger was referred to as ‘the modern synthesis’
by Huxley(73) and others and paved the way for evolutionary
and population genetics as we know it today. We are undergoing a more modern synthesis that merges multiple disciplines into what has been referred to as systems biology.(74)
One goal of systems biology is to efficiently, accurately and
BioEssays 27.6
643
Problems and paradigms
inexpensively measure most, if not all, of the biomolecules
involved in one or more biochemical or physiological systems.
Only after all the relevant information is at hand will it be
possible to mathematically model biomolecules with respect to
interindividual phenotypic differences.
A recent study by Segrè et al.(75) gives us a taste of what
systems-level genetics has to offer. In this study, epistatic
effects on growth phenotypes were estimated from all single
and double knockouts of 890 metabolic genes in S. cerevisiae.
This study not only demonstrated widespread epistasis but
effectively documented the directional effects of gene pairs. It
further showed that metabolic systems could be organized into
functional modules defined by their epistastic interactions.
Combining the genetic and phenotypic measures from this
study with a complete profile of measures from the transcriptome, proteome and metabolome will provide a much more
accurate understanding of how epistasis maps onto phenotypic variation.(76)
The realization of this ‘more’ modern synthesis is still
distant but we are now in a position to begin developing it using
unicellular organisms, digital organisms and computational
thought experiments. In the near future, we may be in a
position to realize the promise of systems biology in unicellular
organisms such as bacteria or yeast that are less complex than
multicellular organisms such as humans. Our ability to do this
in humans will propel us, for the first time, into an era of making
significant progress toward understanding disease etiology
and, ultimately, personalized medicine. We anticipate that the
study of both digital organisms and simple unicellular organisms using systems biology approaches will help answer some
of the questions raised in this paper about the nature of
biological and statistical epistasis. The promise of this strategy
is that it will reveal some basic principles that will be applicable
to the study of common human diseases.
Previous work that has incorporated systems level measurements such as the studies cited above have clearly
demonstrated that organisms do not operate as collections of
unrelated gene products but that the development of phenotypes is the product of the action of many genes that interact in
so far unpredictable ways. This realization and the promise of
our improving ability to measure complex phenomena and to
model them suggests that we may in fact be entering a new
period of biological understanding that, although predicted by
others years ago, could not possibly be implemented in trying
to understand our own complex biology. We are still far from the
ultimate destination but at least we have a potential map for the
study of human health and disease that can help us find our
way.
Acknowledgments
Special thanks are extended to Marylyn D. Ritchie for her
thoughtful review and critique of the manuscript. Thanks are
also extended to two anonymous reviewers and Adam Wilkins
644
BioEssays 27.6
for their very thoughtful comments, criticisms, and suggestions that were taken to heart in preparing the final version of
the manuscript.
References
1. Sing CF, Stengard JH, Kardia SL. 2003. Genes, environment, and
cardiovascular disease. Arterioscler Thromb Vasc Biol 23:1190–1196.
2. Moore JH. 2003. The ubiquitous nature of epistasis in determining
susceptibility to common human diseases. Hum Hered 56:73–82.
3. Page GP, George V, Go RC, Page PZ, Allison DB. 2003. ‘Are we there
yet?’: Deciding when one has demonstrated specific genetic causation in
complex diseases and quantitative traits. Am J Hum Genet 73:711–719.
4. Bateson W. 1909. Mendel’s Principles of Heredity. Cambridge:
Cambridge University Press.
5. Fisher RA. 1918. The correlations between relatives on the supposition of
Mendelian inheritance. Trans R Soc Edinb 52:399–433.
6. Hollander WF. 1955. Epistasis and hypostasis. J Hered 46:222–225.
7. Phillips PC. 1998. The language of gene interaction. Genetics 149:1167–
1171.
8. Neel JV, Schull WJ. 1954. Human Heredity. Chicago: University of
Chicago Press.
9. Griffiths AJF, Miller JH, Suzuki DT, Lewontin RC, Gelbart WM. 2000.
An Introduction to Genetic Analysis. New York: WH Freeman.
10. Shull GH. 1914. Duplicate genes for capsule form in BURSA bursa
Bastoris. J Ind Abst Vererb 12:97–149.
11. Fedorowicz GM, Fry JD, Anholt RR, Mackay TF. 1998. Epistatic interactions between smell-impaired loci in Drosophila melanogaster.
Genetics 148:1885–1891.
12. Anholt RR, Dilda CL, Chang S, Fanara JJ, Kulkarni NH, et al. 2003. The
genetic architecture of odor-guided behavior in Drosophila: epistasis
and the transcriptome. Nat Genet 35:180–184.
13. Cheverud JM, Routman EJ. 1995. Epistasis and its contribution to
genetic variance components. Genetics 139:1455–1461.
14. Weatherall DJ. 2001. Phenotype–genotype relationships in monogenic
disease: lessons from the thalassaemias. Nat Rev Genet 2:245–255.
15. Nagel RL. 2001. Pleiotropic and epistatic effects in sickle cell anemia.
Curr Opin Hematol 8:105–110.
16. Templeton AR. 2000. Epistasis and complex traits. In: Wolf J, Brodie B III,
Wade M, editors. Epistasis and the Evolutionary Process. New York:
Oxford University Press.
17. Giblett ER. 1969. Genetic Markers in Human Blood. Philadelphia: FA
Davis Co.
18. Fields S, Song O. 1989. A novel genetic system to detect protein-protein
interactions. Nature 340:245–246.
19. Bondos SE, Catanese DJ Jr, Tan XX, Bicknell A, Li L, et al. 2004. Hox
transcription factor ultrabithorax Ib physically and genetically interacts
with disconnected interacting protein 1, a double-stranded RNA-binding
protein. J Biol Chem 279:26433–26444.
20. Rain JC, Selig L, De Reuse H, Battaglia V, Reverdy C, et al. 2001. The
protein-protein interaction map of Helicobacter pylori. Nature 409:211–
215.
21. Bork P, Jensen LJ, von Mering C, Ramani AK, Lee I, et al. 2004. Protein
interaction networks from yeast to human. Curr Opin Struct Biol 14:292–
299.
22. Hartl D, Clark A. 1998. Principles of Population Genetics. Sunderland
MA: Sinauer Associates, Inc.
23. Gilbert SF. 2000. Developmental Biology. Sunderland MA: Sinauer
Associates, Inc.
24. Hodgkin J. 1998. Seven types of pleiotropy. Int J Dev Biol 42:501–505.
25. Reilly SL, Ferrell RE, Sing CF. 1994. The gender-specific apolipoprotein
E genotype influence on the distribution of plasma lipids and
apolipoproteins in the population of Rochester, MN. III. Correlations
and covariances. Am J Hum Genet 55:1001–1018.
26. Cordell HJ. 2002. Epistasis: what it means, what it doesn’t mean, and
statistical methods to detect it in humans. Hum Mol Genet 11:2463–2468.
27. Nelson MR, Kardia SL, Ferrell RE, Sing CF. 2001. A combinatorial
partitioning method to identify multilocus genotypic partitions that predict
quantitative trait variation. Genome Res 11:458–470.
Problems and paradigms
28. Moore JH, Lamb JM, Brown NJ, Vaughan DE. 2002. A comparison of
combinatorial partitioning and linear regression for the detection of
epistatic effects of the ACE I/D and PAI-1 4G/5G polymorphisms on
plasma PAI-1 levels. Clin Genet 62:74–79.
29. Moore JH, Smolkin ME, Lamb JM, Brown NJ, Vaughan DE. 2002. The
relationship between plasma t-PA and PAI-1 levels is dependent on
epistatic effects of the ACE I/D and PAI-1 4G/5G polymorphisms. Clin
Genet 62:53–59.
30. Culverhouse R, Klein T, Shannon W. 2004. Detecting epistatic interactions contributing to quantitative traits. Genet Epidemiol 27:141–152.
31. Hoh J, Wille A, Ott J. 2001. Trimming, weighting, and grouping SNPs
in human case-control association studies. Genome Res 11:2115–
2119.
32. Ott J, Hoh J. 2003. Set association analysis of SNP case-control and
microarray data. J Comput Biol 10:569–574.
33. Wille A, Hoh J, Ott J. 2003. Sum statistics for the joint detection of
multiple disease loci in case-control association studies with SNP
markers. Genet Epidemiol 25:350–359.
34. Hoh J, Ott J. 2003. Mathematical multi-locus approaches to localizing
complex human trait genes. Nat Rev Genet 4:701–709.
35. Hoh J, Ott J. 2004. Genetic dissection of diseases: design and methods.
Curr Opin Genet Dev 14:229–232.
36. Ritchie MD, White BC, Parker JS, Hahn LW, Moore JH. 2003.
Optimization of neural network architecture using genetic programming
improves detection and modeling of gene-gene interactions in studies of
human diseases. BMC Bioinformatics 4:28.
37. Ritchie MD, Coffey CS, Moore JH. 2004. Genetic programming neural
networks as a bioinformatics tool in human genetics. In: Deb K, Poli R,
Banzhaf W, Beyer H-G, Burke E, Darwen P, Dasgupta D, Floreano D,
Foster J, Harman M, Holland O, Lanzi PL, Spector L, Tettamanzi
A, Thierens D, Tyrrell A, editors. Lecture Notes in Computer Science
3102, New York: Springer. pp 438–448.
38. Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, et al. 2001.
Multifactor-dimensionality reduction reveals high-order interactions
among estrogen-metabolism genes in sporadic breast cancer. Am J
Hum Genet 69:138–147.
39. Ritchie MD, Hahn LW, Moore JH. 2003. Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of
genotyping error, missing data, phenocopy, and genetic heterogeneity.
Genet Epidemiol 24:150–157.
40. Hahn LW, Ritchie MD, Moore JH. 2003. Multifactor dimensionality
reduction software for detecting gene-gene and gene-environment
interactions. Bioinformatics 19:376–382.
41. Hahn LW, Moore JH. 2004. Ideal discrimination of discrete clinical
endpoints using multilocus genotypes. In Silico Biol 4:0016.
42. Moore JH. 2004. Computational analysis of gene-gene interactions in
common human diseases using multifactor dimensionality reduction.
Expert Rev Mol Diagn 4:795–803.
43. McKinney B, Ritchie MD, Moore JH. 2005. Machine learning for detecting
gene–gene interactions. Appl Bioinformatics, in press.
44. Thornton-Wells TA, Moore JH, Haines JL. 2004. Genetics, statistics, and
human disease: Analytical retooling for complexity. Trends Genet 20:
640–647.
45. Li W, Reich J. 2000. A complete enumeration and classification of twolocus disease models. Hum Hered 50:334–349.
46. Moore JH, Hahn LW, Ritchie MD, Thornton TA, White BC. 2002.
Application of genetic algorithms to the discovery of complex models
for simulation studies in human genetics. In: Langdon WB, Cantu-Paz
E, Mathias K, Roy R, Davis D, Poli R, Balakrishnan K, Honavar V,
Rudolph G, Wegener J, Bull L, Potter MA, Schultz AC, Miller JF, Burke E,
Jonoska N, editors. Proceedings of the Genetic and Evolutionary
Computation Conference. San Francisco: Morgan Kaufmann Publishers.
pp 1150–55.
47. Wade MJ. 2001. Epistasis, complex traits, and mapping genes. Genetica
112–113:59–69.
48. Culverhouse R, Suarez BK, Lin J, Reich T. 2002. A perspective on
epistasis: limits of models displaying no main effect. Am J Hum Genet
70:461–471.
49. Moore JH, Williams SM. 2002. New strategies for identifying gene–gene
interactions in hypertension. Ann Med 34:88–95.
50. Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. 2002. A comprehensive review of genetic association studies. Genet Med 4:45–61.
51. Zee RY, Hoh J, Cheng S, Reynolds R, Grow MA, et al. 2002. Multi-locus
interactions predict risk for post-PTCA restenosis: an approach to the
genetic analysis of common complex disease. Pharmacogenomics J 2:
197–201.
52. Williams SM, Ritchie MD, Phillips JA 3rd, Dawson E, Prince M, et al. 2004.
Multilocus analysis of hypertension: a hierarchical approach. Hum Hered
57:28–38.
53. Tsai CT, Lai LP, Lin JL, Chiang FT, Hwang JJ, et al. 2004. Reninangiotensin system gene polymorphisms and atrial fibrillation. Circulation 109:1640–1646.
54. Cho YM, Ritchie MD, Moore JH, Park JY, Lee KU, et al. 2004. Multifactordimensionality reduction shows a two-locus interaction associated with
Type 2 diabetes mellitus. Diabetologia 47:549–554.
55. Cordell HJ, Todd JA, Bennett ST, Kawaguchi Y, Farrall M. 1995. Twolocus maximum lod score analysis of a multifactorial trait: joint consideration of IDDM2 and IDDM4 with IDDM1 in type 1 diabetes. Am J
Hum Genet 57:920–934.
56. Cordell HJ, Todd JA. 1995. Multifactorial inheritance in type 1 diabetes.
Trends Genet 11:499–504.
57. Cordell HJ, Todd JA, Hill NJ, Lord CJ, Lyons PA, et al. 2001. Statistical
modeling of interlocus interactions in a complex disease: rejection of the
multiplicative model of epistasis in type 1 diabetes. Genetics 158:357–
367.
58. Cox NJ, Frigge M, Nicolae DL, Concannon P, Hanis C, et al. 1999. Loci
on chromosomes 2 (NIDDM1) and 15 interact to increase susceptibility
to diabetes in Mexican Americans. Nat Genet 21:213–215.
59. Hanis CL, Boerwinkle E, Chakraborty R, Ellsworth DL, Concannon P, et al.
1996. A genome-wide search for human non-insulin-dependent (type 2)
diabetes genes reveals a major susceptibility locus on chromosome 2.
Nat Genet 13:161–166.
60. Cox NJ, Hayes MG, Roe CA, Tsuchiya T, Bell GI. 2004. Linkage of
calpain 10 to type 2 diabetes: the biological rationale. Diabetes 53:
S19–S25.
61. Johnson JD, Han Z, Otani K, Ye H, Zhang Y, et al. 2004. RyR2 and
calpain-10 delineate a novel apoptosis pathway in pancreatic islets.
J Biol Chem 279:24794–24802.
62. Jansen RC. 2003. Studying complex biological systems using multifactorial perturbation. Nat Rev Genet 4:145–151.
63. Williams SM, Haines JL, Moore JH. 2004. The use of animal models in
the study of complex disease: all else is never equal or why do so
many human studies fail to replicate animal findings? BioEssays 26:170–
179.
64. Elena SF, Lenski RE. 1997. Test of synergistic interactions among
deleterious mutations in bacteria. Nature 390:395–398.
65. Elena SF, Lenski RE. 2001. Epistasis between new mutations and
genetic background and a test of genetic canalization. Evolution Int J
Org Evolution 55:1746–1752.
66. Remold SK, Lenski RE. 2004. Pervasive joint influence of epistasis and
plasticity on mutational effects in Escherichia coli. Nat Genet 36:423–
426.
67. Brem RB, Yvert G, Clinton R, Kruglyak L. 2002. Genetic dissection of
transcriptional regulation in budding yeast. Science 296:752–755.
68. Lenski RE, Ofria C, Collier TC, Adami C. 1999. Genome complexity,
robustness and genetic interactions in digital organisms. Nature 400:
661–664.
69. Lenski RE, Ofria C, Pennock RT, Adami C. 2003. The evolutionary origin
of complex features. Nature 423:139–144.
70. Di Paolo EA, Noble J, Bullock S. 2000. Simulation models as opaque
thought experiments. In: Dedau MA, McCaskill JS, Packard NH,
Rasmussen S, editors. Artificial Life VII. Cambridge MA: The MIT Press.
pp 497–506.
71. White BC, Moore JH. 2004. Systems biology thought experiments in
human genetics using artificial life and grammatical evolution. In: Pollack
J, Bedau M, Husbands P, Ikegami T, Watson RA, editors, Artificial Life IX.
Cambridge MA: The MIT Press. pp 581–86.
72. Moore JH, Boczko E, Summar M. 2005. Connecting the dots between
genes, biochemistry, and disease susceptibility: systems biology modeling in human genetics. Mol Genet Metab 84:104–111.
BioEssays 27.6
645
Problems and paradigms
73. Huxley J. 1942. Evolution: The Modern Synthesis. London: Allen & Unwin.
74. Ideker T, Galitski T, Hood L. 2001. A new approach to decoding life:
systems biology. Ann Rev Genomics Hum Genet 2:343–372.
75. Segre D, Deluna A, Church GM, Kishony R. 2005. Modular epistasis in
yeast metabolism. Nat Genet 37:77–83.
76. Moore JH. 2005. A global view of epistasis. Nat Genet 37:13–14.
646
BioEssays 27.6
77. Takahashi N, Hagaman JR, Kim HS, Smithies O. 2003. Minireview:
computer simulations of blood pressure regulation by the reninangiotensin system. Endocrinology 144:2184–2190.
78. Takahashi N, Smithies O. 2004. Human genetics, animal models and
computer simulations for studying hypertension. Trends Genet 20:136–
145.