* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Traversing the conceptual divide between biological and
Genetic testing wikipedia , lookup
Neuronal ceroid lipofuscinosis wikipedia , lookup
Synthetic biology wikipedia , lookup
Gene expression programming wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Gene expression profiling wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Genetic engineering wikipedia , lookup
History of genetic engineering wikipedia , lookup
Biology and consumer behaviour wikipedia , lookup
Heritability of IQ wikipedia , lookup
Behavioural genetics wikipedia , lookup
Medical genetics wikipedia , lookup
Genome-wide association study wikipedia , lookup
Quantitative trait locus wikipedia , lookup
Human genetic variation wikipedia , lookup
Designer baby wikipedia , lookup
Population genetics wikipedia , lookup
Genome (book) wikipedia , lookup
Microevolution wikipedia , lookup
Problems and paradigms Traversing the conceptual divide between biological and statistical epistasis: systems biology and a more modern synthesis Jason H. Moore1,2,3* and Scott M. Williams4 Summary Epistasis plays an important role in the genetic architecture of common human diseases and can be viewed from two perspectives, biological and statistical, each derived from and leading to different assumptions and research strategies. Biological epistasis is the result of physical interactions among biomolecules within gene regulatory networks and biochemical pathways in an individual such that the effect of a gene on a phenotype is dependent on one or more other genes. In contrast, statistical epistasis is defined as deviation from additivity in a mathematical model summarizing the relationship between multilocus genotypes and phenotypic variation in a population. The goal of this essay is to review definitions and examples of biological and statistical epistasis and to explore the relationship between the two. Specifically, we present and discuss the following two questions in the context of human health and disease. First, when does statistical evidence of epistasis in human populations imply underlying biomolecular interactions in the etiology of disease? Second, when do biomolecular interactions produce patterns of statistical epistasis in human populations?Answers to these two reciprocal questions will provide an important framework for using genetic information to improve our ability to diagnose, prevent and treat common human diseases. We propose that systems biology will provide the necessary information for addressing these questions and that model systems 1 Computational Genetics Laboratory, Department of Genetics, Department of Community and Family Medicine, Norris Cotton Cancer Center, Dartmouth Medical School, Lebanon, NH. 2 Department of Biological Sciences, Dartmouth College, Hanover, NH. 3 Department of Computer Science, University of New Hampshire, Durham, NH. 4 Center for Human Genetics Research, Department of Medicine, Department of Molecular Physiology and Biophysics, Vanderbilt University Medical School, Nashville, TN. Funding agency: This work was supported by National Institutes of Health grants HL65234, AI59694, HD047447, and RR000095. *Correspondence to: Jason H. Moore, 706 Rubin Building, HB 7937, One Medical Center Drive, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756. E-mail: [email protected] DOI 10.1002/bies.20236 Published online in Wiley InterScience (www.interscience.wiley.com). BioEssays 27:637–646, ß 2005 Wiley Periodicals, Inc. such as bacteria, yeast and digital organisms will be a useful place to start. BioEssays 27:637–646, 2005. ß 2005 Wiley Periodicals, Inc. Introduction It has been argued that understanding genetic susceptibility to common human diseases, such as cardiovascular disease will require a research strategy that acknowledges and confronts head-on the complexity of the relationship between genotype and phenotype.(1) Part of this complexity can be attributed to epistasis or gene–gene interaction that occurs when the phenotype for a genotype at one locus is dependent on genotypes at one or more other loci. Indeed, epistasis is arguably a ubiquitous component of the genetic architecture of common disease susceptibility.(2) Determining whether a particular genetic variant is causative in humans is an elusive goal(3) and the extent to which statistical evidence of epistasis from population studies is predictive of biological epistasis from experimental studies remains an open, yet important, question. As biology evolves as a discipline, different aspects of investigation have merged to bring us closer to understanding the natural world. Genetics emerged in the 1800s with Mendel’s work followed by the development of statistics by Pearson, Fisher and others. It was the rediscovery of Mendel’s work in the early 20th century that lead to the development of modern genetics. Following this rediscovery, it was the merger of genetics and statistics by Fisher, Wright and others that gave rise to population genetics as a discipline. Prior to the merger, epistasis had been described as a genetic phenomenon by Bateson(4) and later as a statistical phenomenon by Fisher.(5) Population genetics and the ability to measure DNA sequence variations made it possible for biological and statistical concepts of epistasis to converge in studies of model organisms where statistical analyses can often be used to accurately infer the presence and role of gene–gene interactions in determining a particular phenotype. However, the ultimate goal is to understand how DNA sequence variations influence phenotypes in an individual through a hierarchy of biochemical, metabolic and physiological systems that are driven by biomolecular interactions (i.e. biological epistasis). While it is not possible to make the connection between BioEssays 27.6 637 Problems and paradigms biological and statistical epistasis in humans at this time, it is important for understanding the genetic basis of many common diseases and we argue that it will be possible in the future. This is the promise of what has been called systems biology, the simultaneous study of all the biological pieces together in the context of ecology and evolution to reveal etiology. Epistasis has been defined primarily as either biological or statistical. Biological epistasis is the result of physical interactions among biomolecules within gene regulatory networks and biochemical pathways in an individual such that the effect of a gene on a phenotype is dependent on one or more other genes. In contrast, statistical epistasis is defined as the deviation from additivity in a mathematical model, where the relationship between multilocus genotypes and phenotypic variation in a population is not predictable based solely on the actions of the genes considered singly. The goal of this essay is to review definitions and examples of biological and statistical epistasis and to explore the relationship between them. Specifically, we present and discuss the following two questions in the context of human health and disease. First, when does statistical evidence of epistasis in human populations imply underlying biomolecular interactions in the etiology of disease? Second, when do biomolecular interactions produce patterns of statistical epistasis in human populations? Answers to these two reciprocal questions will provide an important framework for using genetic information to improve our ability to diagnose, prevent and treat common human diseases. We propose that systems biology will provide the necessary information for addressing these questions and that studies of model systems, such as bacteria, yeast and digital organisms will be important in defining the kinds of interactions that may be expected in human phenotypic variation. Defining epistasis The concept of epistasis has been around for at least 100 years and was recognized as an explanation for deviations from simple Mendelian ratios. William Bateson(4) has been credited by Hollander(6) and more recently by Phillips(7) as the first to use the term epistasis that literally translated means ‘standing upon’. A commonly used textbook definition of epistasis is one gene masking the effects of another gene.(8,9) A classic example of epistasis comes from studies of the shape of seed capsules from crosses of a weedy plant, the shepherd’s purse (Bursa bursa-pastoris) by Shull.(10) In this study, crosses from doubly heterozygous plants yielded Mendelian ratios of 15 triangular capsules to one oval capsule. It is generally thought that there are two pathways that lead to triangular shape, each with dominant alleles. Only when both pathways are blocked by homozygous recessive alleles is an oval-shaped seed capsule produced. The most-compelling examples of epistasis in animals come from studies of model organisms. For example, 638 BioEssays 27.6 Fedorowicz et al.(11) described epistatic interactions between loci that impact olfactory behavior in Drosophila. This study was based on crosses of 12 lines derived from a common isogenic background that differed in the location of P-element insertions, each of which has homozygous effects on olfactory behavior. Eight of the 12 insertion variants defined an interactive network of genes that significantly impact the olfactory phenotype. A subsequent study that investigated these same insertions, using transcriptional profiling,(12) showed that a total of 530 genes were significantly coregulated in response to one or more of these olfactory mutations. This study clearly and not surprisingly documents epistasis in a model organism. A logical next step will be to determine the impact on olfactory phenotypes through the networks of interacting proteins and other biomolecules (i.e. biological epistasis). Biological epistasis At the heart of epistasis lie biomolecular interactions that drive transcription, translation and signal transduction, and the function of biochemical, metabolic and physiological systems. Here, we define biological epistasis as physical interactions among proteins or other molecules that impact phenotype. This can occur at several levels from the interaction of transcription factors with each other and/or promoter sequence variation to the non-linear interaction of enzymes through a metabolic pathway. Biological epistasis has also been referred to as physiological epistasis.(13) One of the best known examples of biological epistasis in the genetics literature is sickle cell disease. Individuals with sickle cell disease have bglobin molecules that have a neutral instead of a polar amino acid on the outer surface. This neutral amino acid leads to an increase in intermolecular adhesion (i.e. physical interaction) leading to accumulation of deoxyhemoglobin, which in turn deforms the red blood cells. Thus, healthy and sick individuals differ with respect to their biomolecular interactions among bglobin proteins. Interestingly, epistasis also plays an important role in sickle cell disease.(14–16) For example, Templeton(16) reviewed work by Giblett(17) that describes how the protein haptoglobin physically interacts with hemoglobin influencing its excretion. Giblett(17) also noted that genetic variants in the haptoglobin gene interact with the S allele of the hemoglobin b-chain gene. The ubiquity of protein–protein interactions in transcriptional regulation has allowed the development and use of the yeast two-hybrid system for assessing such interactions.(18) This approach allows a single protein to be used as bait for other proteins that might physically interact with it. For example, Bondos et al.(19) used the yeast two-hybrid assay to identify proteins that physically interact with a developmental Hox protein in Drosophila called Ultrabithorax (Ubx). Screens of 0–12 hour embryo libraries identified the Disconnected Interacting Protein 1 (DIP1) that was later confirmed to Problems and paradigms physically interact with Ubx by a variety of strategies, including phage display, immunoprecipitation, pull-down assays and gel retardation analysis. Interestingly, Ubx and DIP1 are coexpressed in the same embryonic tissues and are both localized to the nucleus. Further, ectopic expression of DIP1 in wing and haltere imaginal discs results in an abnormal developmental phenotype in the form of small shriveled wings and excess bristle numbers. Genetic studies with wild-type and mutant Drosophila confirm the interaction. The study by Bondos et al.(19) illustrates how powerful the yeast two-hybrid system is for identifying proteins that interact with a protein of interest. However, the two-hybrid system can also be used on a very large scale to piece together protein– protein interaction networks. For example, Rain et al.(20) use a high-throughput version of the two-hybrid system to build a large-scale map of protein–protein interaction in the human gastric pathogen Helicobacter pylori. In this study, 261 proteins used as bait revealed over 1200 interactions, connecting more than 46% of all H. pylori proteins. Such assays have been very successful in model organisms. However, similar studies in humans are complicated by the fact that, among other things, there are expected to be many more interactions than have been observed for many model organisms. It has been estimated that there are roughly 10,000–30,000 pairwise interactions among yeast proteins and perhaps as many as 200,000 or more in humans.(21) Despite the daunting task, it is clear that the detection and characterization of biological epistasis is dramatically improving with technological advances in high-throughput assays and analysis. Finally, it is important to note that there is a very close relationship between biological epistasis and pleiotropy. A common textbook definition of pleiotropy is one gene affecting more than one trait or phenotype.(22) Gilbert(23) further defines pleiotropy as mosaic if the phenotypes are independent or relational if they are correlated. Hodgkin(24) provides an even finer delineation of pleiotropic effects. As an example of mosaic pleiotropy, consider the apolipoprotein E (ApoE) polymorphism and its effects on human health. The ApoE gene plays an important role in both cardiovascular disease and Alzheimer disease even though these two diseases are independent of one another. Where the lines are blurred is with relational pleiotropy. Consider for example the study by Reilly et al.(25) that examined the relationship between the ApoE polymorphism and the correlation between pairs of nine plasma lipid and apolipoprotein traits among 507 unrelated human subjects. This study showed that the ApoE polymorphism had a significant effect on a large number of trait correlations indicative of relational pleiotropy. Since many of the apolipoproteins and lipids physically interact with one another there is also biological epistasis in this system. Thus, it is likely that ApoE genotypes are interacting with genotypes at other loci through physical interactions of biomolecules in lipid complexes. In fact, this may be one of many examples where pleiotropy and epistasis are one and the same. The relationship between pleiotropy and epistasis will be important to define but this is beyond the scope of this essay. Statistical epistasis Bateson’s(4) biological definition of epistasis is in contrast to the concept of statistical epistasis or epistacy that was first used by Fisher.(5) Fisher used the term to describe deviations from additivity in a statistical model. It is common today to use parametric statistical methods such as linear and logistic regression (e.g. Cordell(26)) or nonparametric methods such as combinatorial partitioning,(27–29) restricted partitioning,(30) set association analysis,(31–35) genetic programming neural networks(36,37) and multifactor dimensionality reduction(38–42) to evaluate nonadditivity indicative of statistical epistasis. Because statistical epistasis is difficult to model using parametric models that cannot be known a priori, nonparametric data-mining methods based on computational strategies such as machine learning(43) are gaining in popularity as it becomes apparent that the study of complex traits requires alternative research strategies.(44) Statistical epistasis is perhaps best illustrated using penetrance functions (P[DjG]) that model the probability (P) of disease (D) given genotype (G). Table 1 illustrates a penetrance function that models an interaction between two single nucleotide polymorphisms (SNPs) in the absence of any independent main effects for either SNP. This genetic model specifies the probability of disease given a particular genotype (on the margin) or combination of genotypes (in the table) and their frequencies (in parentheses). This extreme model is based in the nonlinear exclusive OR (XOR) function that defines a genotype pattern that is not linearly separable.(45,46) Here, individuals are at high-risk of disease (P ¼ 1) if they inherit the Aa OR Bb genotypes but NOT both (i.e. the XOR function). Otherwise, they are at low-risk (P ¼ 0). Here, the marginal probabilities represent the disease prevalences and are not different among single genotypes, indicating that there is no main effect of each SNP. Thus, these SNPs would never be detected using a single-locus analysis. It Table 1. Penetrance values for combinations of genotypes from two SNPs exhibiting interactions in the absence of independent main effects Table penetrance BB (0.25) Bb (0.50) bb (0.25) Margin penetrance AA (0.25) Aa (0.50) aa (0.25) Margin penetrance 0 1 0 0.5 1 0 1 0.5 0 1 0 0.5 0.5 0.5 0.5 BioEssays 27.6 639 Problems and paradigms should be noted that even a relatively small change in allele frequency at either locus may permit such a marginal effect to be detected even though the model is epistatic.(16) Thus, the statistical interpretation of whether or not the effect of a genotype is dependent on one or more other genotypes depends very much on genotype frequencies and the context in which the genotype is found.(47) This differs substantially from biological epistasis where the interaction is defined at the level of the individual. Knowledge about statistical patterns of epistasis could be the difference between successful and unsuccessful gene mapping studies.(47,48) Using this argument, Wade(47) and others(49) have suggested that the lack of replication observed for association studies(50) is a signature of epistasis. There are an ever-increasing number of examples of statistical epistasis in complex disease research. This increase is tied to the growing number of analytical tools to detect and characterize this phenomenon. For example, Zee et al.(51) used set association analysis to identify a combination of seven SNPs in seven genes that are significantly associated with coronary artery restenosis following angioplasty in a cohort of 342 cases and 437 controls. The multifactor dimensionality reduction approach (MDR) has identified significant evidence of epistasis in sporadic breast cancer,(38) essential hypertension,(52) atrial fibrillation(53) and type II diabetes.(54) In the study of atrial fibrillation,(53) the role of renin-angiotensin system (RAS) gene polymorphisms in atrial fibrillation (AF) was investigated using 250 patients with documented nonfamilial structural AF and 250 controls matched with regard to age, gender, presence of left ventricular dysfunction and presence of significant valvular heart disease. A total of eight polymorphisms were measured in the angiotensin converting enzyme (ACE), angiotensinogen (AGT), and angiotensin II type I receptor (AT1) gene from the RAS. Single-locus association analysis revealed significant results for three polymorphisms in the AGT gene. The MDR approach was used to evaluate interactions among all possible subsets of the eight polymorphisms. The best model consisted of two polymorphisms from the AGT gene and a single polymorphism from the ACE gene. This three-locus model had a perfect cross validation consistency of 10 and a prediction error of 37.26. Both were significant at the 0.001 level based on a 1000-fold permutation test. Fig. 1A illustrates the distribution of cases and controls for each multilocus genotype combination. Interestingly, only one of the polymorphisms in the MDR model had a significant main effect reinforcing the importance of considering combinations of polymorphisms. As would be predicted from a phenotype that is affected by many interacting genes, documenting some independent main effects only reveals part of the genetic architecture, and can in fact reveal ‘incorrect’ information regarding the underlying genetic model of disease.(47) As suggested by Templeton,(16) statistical epistasis is often 640 BioEssays 27.6 detected when appropriate analytical methods are employed. Further, if the underlying models are epistatic, failure to look for them may lead to overly simplistic and misleading conclusions about the etiology of disease. When does statistical epistasis imply biological epistasis? The relationship between biological and statistical epistasis can be difficult to comprehend. Fig. 2 provides a visual summary of the two concepts. Genetic information impacts phenotype through a hierarchy of proteins that are involved in biological processes ranging from transcription to physiological homeostasis. It is the physical interactions among proteins and other biomolecules and their impact on phenotype that constitute biological epistasis. Additionally, the possibility exists that molecules that do not interact directly can have epistatic interactions if they impact the same phenotype via an alternative path. Differences in biological epistasis among individuals in a population give rise to statistical epistasis. However, it is entirely possible for biological epistasis to occur in the absence of statistical epistasis. This can happen when every individual sampled from a population is the same with respect to their DNA sequence variations and biomolecules. Thus, one can argue that genetic and biological variation is sufficient for the statistical detection of epistasis. However, does evidence of statistical epistasis necessarily imply biological epistasis? An important example of the difficulty in making inferences about biological epistasis from statistical results is human type I diabetes where consideration of epistasis helped identify a two-locus interaction.(55,56) Interestingly, analysis of congenic strains of nonobese diabetic (NOD) mice showed no evidence of biological epistasis despite the significant evidence for statistical epistasis in humans.(57) Cordell et al.(57) suggest that our ability to infer biological mechanisms from statistical results is limited. There are, however, counterexamples that suggest biological inference from statistical models may be possible. For example, Cox et al.(58) found statistical evidence for an interaction between a locus on chromosome 2 and a locus on chromosome 15 in families with type II diabetes. This confirmed an earlier linkage result on chromosome 2(59) and helped identify the calpain-10 gene by positional cloning. This confirms the suggestion that investigations that use the existence of epistasis can facilitate the mapping of complex trait genes.(47) While statistical results in a variety of genetic studies have been inconsistent, the calpain-10 gene clearly has a functional role in glucose metabolism.(60) For example, Johnson et al.(61) have shown experimentally that the calpain10 protein plays an important role in the apoptosis of pancreatic islets. It will be interesting to note in the coming years the proportion of statistical results in human populations that ultimately lead to new knowledge about biological function and disease etiology. Problems and paradigms Figure 1. This figure illustrates the challenge of making inferences about biological epistasis in a pathway (B) from statistical epistasis in a population-based model (A). A: The statistical model for a previously reported multilocus association between polymorphisms in the AGT and ACE genes and susceptibility to atrial fibrillation.(53) The distribution of cases (left bars) and controls (right bars) are illustrated for each multilocus genotype combination. Dark-shaded cells are considered ‘high-risk’ for disease while lightshaded cells are considered ‘low-risk’. White cells indicate no subjects with those genotype combinations were observed in the dataset. B: Summary of a computational model of the renin–angiotensin system as adapted from Takahashi and Smithies.(77,78) A more detailed form of this model has been used to carry out simulations of the changes in this pathway that are involved in blood pressure regulation.(77) Expansion of this model may facilitate computational thought experiments that can be used to generate hypotheses about the relationship between AGT, ACE and susceptibility to atrial fibrillation. Making inferences about biological function and causation from any statistical result is a significant challenge. This is particularly true in genetic studies of humans due to our inability to conduct perturbation experiments that are possible in model organisms.(62) This is further complicated by a lack of concordance between findings in humans and those in model organisms.(63) The difficulty in determining causation for genetic results is outlined by Page et al.(3) Here, it is pointed out that there are several explanations for an observed statistical association between a DNA sequence variation and a trait. First, the association could be a false-positive result due to chance events. Second, variable sites within a gene may be associated with alleles at the true causative site. Third, the association is due to some systematic bias in the study design or analysis. The final possibility is that the association is real. Page et al.(3) suggest that proving causation will always be a challenge due to our inability to randomly assign people to genotypes as is possible with model organisms. However, evidence in favor of an association can be significantly strengthened through comprehensive efforts to address sources of error and bias. Even in this case, we are still left with only inferences about the nature of the underlying biological epistasis architecture. Consider, for example, the multilocus statistical model of atrial fibrillation susceptibility depicted in Fig. 1A and described above. The AGT and ACE genes in this model are part of the RAS pathway that is summarized in Fig. 1B. While it is possible to speculate on the role of these genes in atrial fibrillation,(53) inferring how combinations of polymorphisms impact the biology of this pathway and its relationship with human health and disease is difficult. This is especially true without complete measurements of the pathway and all its factors. Clearly, making inferences about biological function and causation from any statistical result will always be a challenge if the relevant biomolecular information has not been measured. Thus, making the connection between statistical and biological epistasis will only be feasible if the appropriate genetic, genomic, proteomic and metabolic information is at hand. Fortunately, we are in an era of rapid technological advancement with new tools for measuring the genome, transcriptome, proteome, and metabolome surfacing each day. In the short term, our best hope for making use of all this biological information is to study model organisms where there are fewer biomolecules, the genetic background can be controlled and BioEssays 27.6 641 Problems and paradigms Figure 2. The conceptual relationship between biological and statistical epistasis. Biological epistasis occurs at the level of the individual and involves DNA sequence variations (vertical bars), biomolecules (circle, square and triangle) and their physical interactions (dashed lines) giving rise to a phenotype (star) at a particular point in time and space (not shown). Statistical epistasis is a population phenomenon that is made possible by interindividual variability in genotypes, biomolecules and their physical interactions. the system can be perturbed experimentally. We propose that studies of biological and statistical epistasis should be initially focused on simple unicellular organisms such as bacteria and yeast. Understanding the structure of interactions in unicellular organisms will provide basic knowledge about epistasis that can be used to guide studies of humans. A role for unicellular organisms in understanding the relationship between biological and statistical epistasis Not only is epistasis not unique to metazoans, it is also generally easier to detect global examples of epistasis in 642 BioEssays 27.6 unicellular organism that are amenable to genetic manipulation. For example, Elena and Lenski(64) determined that combinations of mutations in E. coli interact to influence fitness. In their study, 225 different genetic strains of E. coli were created using randomly inserted transposons with each strain harboring one, two or three mutations. Relative fitness was defined as the ratio of net growth rates of mutant and wildtype strains during competition for limited resources. This study showed that both synergistic and antagonistic epistasis was commonly observed in selected strains. A follow-up study by Elena and Lenski(65) looking at similar mutations on several different genetic backgrounds supports the idea that epistasis is common. In similar studies, Remold and Lenski(66) carried out a series of experiments to evaluate the role of plasticity and epistasis in determining fitness in E. coli. Eighteen random insertion mutations were generated in E. coli in two different resource environments and in five different genetic backgrounds. The fitness of each mutation in each context was measured using competition assays. Interestingly, half of the mutations had a significant effect on fitness that was dependent on genetic background. Some of these were also dependent on environmental context, suggesting plasticity in the reaction norm. It is important to note that, in these relatively simple systems, context-dependent genetic effects are the norm. Extrapolating these kinds of findings to organisms with more structural complexity and potential interactions, such as humans, would suggest that the importance of context would be even greater. Studies in yeast support the hypothesis that epistasis is ubiquitous. For example, Brem et al.(67) crossed a laboratory strain of S. cerevisiae with a wild strain and then carried out a genome-wide genetic linkage analysis of over 1500 differentially expressed genes. This study demonstrated that the expression levels of 570 genes were linked to one or more different loci with most exhibiting a complex mode of inheritance. Thus, gene expression in this ‘simple’ unicellular organism, grown in controlled experimental conditions, has a multifactorial etiology. These studies in bacteria and yeast are important because they connect biological and statistical epistasis in a meaningful way. The findings of these studies raise the questions: what can we learn about the relationship between biological and statistical epistasis in simple organisms, and how can we translate these results into our understanding of human phenotypes? Statistical epistasis and biological epistasis can be equated in these studies because experimental conditions and background effects can to a large extent be controlled, allowing simple statistical methods to tie the two together. It is also true that the underlying biology is much easier to study in unicellular organisms than in humans and other more complex organisms. Therefore, it is not surprising that progress in defining protein interaction networks has been much easier in Problems and paradigms unicellular organisms because there are likely to be fewer interactions than in humans.(21) As the technology and statistical methodology for measuring biological processes matures, it will become increasingly easier to understand the biology that is responsible for the functional connection between biological epistasis and statistical epistasis observed in population studies. We propose that making this connection in an organism such as E. coli will shed substantial light on similar processes in humans because it will produce knowledge about ‘rules’ that map statistical epistasis onto biology. A role for digital organisms in understanding the relationship between biological and statistical epistasis An alternative strategy that is being used to investigate epistasis is digital biology. The goal of digital biology is to generate data through computer experiments that can be studied as if they were real data. This is important because a digital experiment provides all the relevant information thus making it feasible to carry out a completely informed analysis of the relationship between biological and statistical epistasis. For example, Lenski et al.(68) evolved artificial populations of organisms each consisting of computer programs of varying complexity. Simple computer programs were established with the goal of rapid replication while the goal of more-complex computer programs was to perform mathematical calculations that increase replication in response to certain metabolic rewards. These populations of simple and complex digital organisms were subjected to millions of single and multiple mutations and the resulting impact on fitness was measured. The results of this large-scale digital experiment and others(69) indicate that the complex organismal phenotypes were robust to single mutations and that gene–gene interactions were ubiquitous. These studies demonstrate that complex interactions evolve under relatively simple rules. They also suggest that digital organisms may be useful in modeling and therefore understanding epistasis by answering specific critical questions. For example, when can we conclude that biological epistasis underlies statistical epistasis? When does observed biological epistasis result in nonadditive statistical patterns? These types of questions can be directly addressed using digital organisms in a manner that is very difficult or even impossible with real organisms no matter how ‘simple’. A slightly different approach is to carry out thought experiments about the nature of biochemical and physiological systems that are consistent with a particular statistical model relating DNA sequence variations to interindividual phenotypic differences. Opaque thought experiments represent a happy medium between the extreme position of digital biology experiments representing real biological data and the extreme position of them being completely worthless.(70) Thought experiments have enormous utility in the study of complex biological systems because they generate ideas that can be used to construct explicit hypotheses that can then be tested in model organisms. To this end, White and Moore(71) have developed a computational approach for generating agentbased artificial life models that are consistent with a statistical model defined by a penetrance function. Digital organisms or agents in these simulations move and collide with each other on a grid of fixed size. Agents begin in a random spatial configuration with predefined move and collision behaviors and end in a final state after a specified number of time steps. Some agents move in an invariant manner, while others are dependent on global conditions (e.g. genotype) specified at the beginning of the simulation. Agent interaction (therefore communication) happens via collisions and is thus analogous to physical interactions among biomolecules in biological epistasis. This approach and others(72) may be a useful starting point for those hoping to carry out thought experiments about the role of biochemical and physiological systems in the mapping between genotype and phenotype, although the utility of these methods have yet to be fully tested. What can be learned from these computational modeling exercises? The studies by Lenski et al.(68,69) demonstrate that epistasis is a ubiquitous property of digital organisms and contributes to their robustness. These results are similar to those observed for bacterial cells(64–66) and yeast,(67) reinforcing the usefulness of the digital experiments. The ability to generate ‘experimental’ data in a computer will make it possible to address questions about the nature of biological and statistical epistasis in a completely controlled environment. Showing that digital organisms mimic real simple organism data will reinforce the usefulness in this approach. The study by White and Moore(71) provides a medium for generating hypotheses that can then be tested experimentally in real biological systems. Both approaches provide a starting point for the study of epistasis that is not currently possible using model organisms or humans. We anticipate that knowledge gained from the study of both digital and unicellular biological organisms will play an important role in advancing our understanding of epistasis as it relates to human health and disease. Of course, the limitations of these approaches are that we can only detect what we explicitly model. However, this does not differ from laboratory-based research where we only find what is examined. Systems biology and a more modern synthesis One of the greatest contributions to our understanding of biological organisms was the merger of Darwin’s evolution of species by natural selection and Mendel’s principles of heredity. This merger was referred to as ‘the modern synthesis’ by Huxley(73) and others and paved the way for evolutionary and population genetics as we know it today. We are undergoing a more modern synthesis that merges multiple disciplines into what has been referred to as systems biology.(74) One goal of systems biology is to efficiently, accurately and BioEssays 27.6 643 Problems and paradigms inexpensively measure most, if not all, of the biomolecules involved in one or more biochemical or physiological systems. Only after all the relevant information is at hand will it be possible to mathematically model biomolecules with respect to interindividual phenotypic differences. A recent study by Segrè et al.(75) gives us a taste of what systems-level genetics has to offer. In this study, epistatic effects on growth phenotypes were estimated from all single and double knockouts of 890 metabolic genes in S. cerevisiae. This study not only demonstrated widespread epistasis but effectively documented the directional effects of gene pairs. It further showed that metabolic systems could be organized into functional modules defined by their epistastic interactions. Combining the genetic and phenotypic measures from this study with a complete profile of measures from the transcriptome, proteome and metabolome will provide a much more accurate understanding of how epistasis maps onto phenotypic variation.(76) The realization of this ‘more’ modern synthesis is still distant but we are now in a position to begin developing it using unicellular organisms, digital organisms and computational thought experiments. In the near future, we may be in a position to realize the promise of systems biology in unicellular organisms such as bacteria or yeast that are less complex than multicellular organisms such as humans. Our ability to do this in humans will propel us, for the first time, into an era of making significant progress toward understanding disease etiology and, ultimately, personalized medicine. We anticipate that the study of both digital organisms and simple unicellular organisms using systems biology approaches will help answer some of the questions raised in this paper about the nature of biological and statistical epistasis. The promise of this strategy is that it will reveal some basic principles that will be applicable to the study of common human diseases. Previous work that has incorporated systems level measurements such as the studies cited above have clearly demonstrated that organisms do not operate as collections of unrelated gene products but that the development of phenotypes is the product of the action of many genes that interact in so far unpredictable ways. This realization and the promise of our improving ability to measure complex phenomena and to model them suggests that we may in fact be entering a new period of biological understanding that, although predicted by others years ago, could not possibly be implemented in trying to understand our own complex biology. We are still far from the ultimate destination but at least we have a potential map for the study of human health and disease that can help us find our way. Acknowledgments Special thanks are extended to Marylyn D. Ritchie for her thoughtful review and critique of the manuscript. Thanks are also extended to two anonymous reviewers and Adam Wilkins 644 BioEssays 27.6 for their very thoughtful comments, criticisms, and suggestions that were taken to heart in preparing the final version of the manuscript. References 1. Sing CF, Stengard JH, Kardia SL. 2003. Genes, environment, and cardiovascular disease. Arterioscler Thromb Vasc Biol 23:1190–1196. 2. Moore JH. 2003. The ubiquitous nature of epistasis in determining susceptibility to common human diseases. Hum Hered 56:73–82. 3. Page GP, George V, Go RC, Page PZ, Allison DB. 2003. ‘Are we there yet?’: Deciding when one has demonstrated specific genetic causation in complex diseases and quantitative traits. Am J Hum Genet 73:711–719. 4. Bateson W. 1909. Mendel’s Principles of Heredity. Cambridge: Cambridge University Press. 5. Fisher RA. 1918. The correlations between relatives on the supposition of Mendelian inheritance. Trans R Soc Edinb 52:399–433. 6. Hollander WF. 1955. Epistasis and hypostasis. J Hered 46:222–225. 7. Phillips PC. 1998. The language of gene interaction. Genetics 149:1167– 1171. 8. Neel JV, Schull WJ. 1954. Human Heredity. Chicago: University of Chicago Press. 9. Griffiths AJF, Miller JH, Suzuki DT, Lewontin RC, Gelbart WM. 2000. An Introduction to Genetic Analysis. New York: WH Freeman. 10. Shull GH. 1914. Duplicate genes for capsule form in BURSA bursa Bastoris. J Ind Abst Vererb 12:97–149. 11. Fedorowicz GM, Fry JD, Anholt RR, Mackay TF. 1998. Epistatic interactions between smell-impaired loci in Drosophila melanogaster. Genetics 148:1885–1891. 12. Anholt RR, Dilda CL, Chang S, Fanara JJ, Kulkarni NH, et al. 2003. The genetic architecture of odor-guided behavior in Drosophila: epistasis and the transcriptome. Nat Genet 35:180–184. 13. Cheverud JM, Routman EJ. 1995. Epistasis and its contribution to genetic variance components. Genetics 139:1455–1461. 14. Weatherall DJ. 2001. Phenotype–genotype relationships in monogenic disease: lessons from the thalassaemias. Nat Rev Genet 2:245–255. 15. Nagel RL. 2001. Pleiotropic and epistatic effects in sickle cell anemia. Curr Opin Hematol 8:105–110. 16. Templeton AR. 2000. Epistasis and complex traits. In: Wolf J, Brodie B III, Wade M, editors. Epistasis and the Evolutionary Process. New York: Oxford University Press. 17. Giblett ER. 1969. Genetic Markers in Human Blood. Philadelphia: FA Davis Co. 18. Fields S, Song O. 1989. A novel genetic system to detect protein-protein interactions. Nature 340:245–246. 19. Bondos SE, Catanese DJ Jr, Tan XX, Bicknell A, Li L, et al. 2004. Hox transcription factor ultrabithorax Ib physically and genetically interacts with disconnected interacting protein 1, a double-stranded RNA-binding protein. J Biol Chem 279:26433–26444. 20. Rain JC, Selig L, De Reuse H, Battaglia V, Reverdy C, et al. 2001. The protein-protein interaction map of Helicobacter pylori. Nature 409:211– 215. 21. Bork P, Jensen LJ, von Mering C, Ramani AK, Lee I, et al. 2004. Protein interaction networks from yeast to human. Curr Opin Struct Biol 14:292– 299. 22. Hartl D, Clark A. 1998. Principles of Population Genetics. Sunderland MA: Sinauer Associates, Inc. 23. Gilbert SF. 2000. Developmental Biology. Sunderland MA: Sinauer Associates, Inc. 24. Hodgkin J. 1998. Seven types of pleiotropy. Int J Dev Biol 42:501–505. 25. Reilly SL, Ferrell RE, Sing CF. 1994. The gender-specific apolipoprotein E genotype influence on the distribution of plasma lipids and apolipoproteins in the population of Rochester, MN. III. Correlations and covariances. Am J Hum Genet 55:1001–1018. 26. Cordell HJ. 2002. Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans. Hum Mol Genet 11:2463–2468. 27. Nelson MR, Kardia SL, Ferrell RE, Sing CF. 2001. A combinatorial partitioning method to identify multilocus genotypic partitions that predict quantitative trait variation. Genome Res 11:458–470. Problems and paradigms 28. Moore JH, Lamb JM, Brown NJ, Vaughan DE. 2002. A comparison of combinatorial partitioning and linear regression for the detection of epistatic effects of the ACE I/D and PAI-1 4G/5G polymorphisms on plasma PAI-1 levels. Clin Genet 62:74–79. 29. Moore JH, Smolkin ME, Lamb JM, Brown NJ, Vaughan DE. 2002. The relationship between plasma t-PA and PAI-1 levels is dependent on epistatic effects of the ACE I/D and PAI-1 4G/5G polymorphisms. Clin Genet 62:53–59. 30. Culverhouse R, Klein T, Shannon W. 2004. Detecting epistatic interactions contributing to quantitative traits. Genet Epidemiol 27:141–152. 31. Hoh J, Wille A, Ott J. 2001. Trimming, weighting, and grouping SNPs in human case-control association studies. Genome Res 11:2115– 2119. 32. Ott J, Hoh J. 2003. Set association analysis of SNP case-control and microarray data. J Comput Biol 10:569–574. 33. Wille A, Hoh J, Ott J. 2003. Sum statistics for the joint detection of multiple disease loci in case-control association studies with SNP markers. Genet Epidemiol 25:350–359. 34. Hoh J, Ott J. 2003. Mathematical multi-locus approaches to localizing complex human trait genes. Nat Rev Genet 4:701–709. 35. Hoh J, Ott J. 2004. Genetic dissection of diseases: design and methods. Curr Opin Genet Dev 14:229–232. 36. Ritchie MD, White BC, Parker JS, Hahn LW, Moore JH. 2003. Optimization of neural network architecture using genetic programming improves detection and modeling of gene-gene interactions in studies of human diseases. BMC Bioinformatics 4:28. 37. Ritchie MD, Coffey CS, Moore JH. 2004. Genetic programming neural networks as a bioinformatics tool in human genetics. In: Deb K, Poli R, Banzhaf W, Beyer H-G, Burke E, Darwen P, Dasgupta D, Floreano D, Foster J, Harman M, Holland O, Lanzi PL, Spector L, Tettamanzi A, Thierens D, Tyrrell A, editors. Lecture Notes in Computer Science 3102, New York: Springer. pp 438–448. 38. Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, et al. 2001. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet 69:138–147. 39. Ritchie MD, Hahn LW, Moore JH. 2003. Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Genet Epidemiol 24:150–157. 40. Hahn LW, Ritchie MD, Moore JH. 2003. Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions. Bioinformatics 19:376–382. 41. Hahn LW, Moore JH. 2004. Ideal discrimination of discrete clinical endpoints using multilocus genotypes. In Silico Biol 4:0016. 42. Moore JH. 2004. Computational analysis of gene-gene interactions in common human diseases using multifactor dimensionality reduction. Expert Rev Mol Diagn 4:795–803. 43. McKinney B, Ritchie MD, Moore JH. 2005. Machine learning for detecting gene–gene interactions. Appl Bioinformatics, in press. 44. Thornton-Wells TA, Moore JH, Haines JL. 2004. Genetics, statistics, and human disease: Analytical retooling for complexity. Trends Genet 20: 640–647. 45. Li W, Reich J. 2000. A complete enumeration and classification of twolocus disease models. Hum Hered 50:334–349. 46. Moore JH, Hahn LW, Ritchie MD, Thornton TA, White BC. 2002. Application of genetic algorithms to the discovery of complex models for simulation studies in human genetics. In: Langdon WB, Cantu-Paz E, Mathias K, Roy R, Davis D, Poli R, Balakrishnan K, Honavar V, Rudolph G, Wegener J, Bull L, Potter MA, Schultz AC, Miller JF, Burke E, Jonoska N, editors. Proceedings of the Genetic and Evolutionary Computation Conference. San Francisco: Morgan Kaufmann Publishers. pp 1150–55. 47. Wade MJ. 2001. Epistasis, complex traits, and mapping genes. Genetica 112–113:59–69. 48. Culverhouse R, Suarez BK, Lin J, Reich T. 2002. A perspective on epistasis: limits of models displaying no main effect. Am J Hum Genet 70:461–471. 49. Moore JH, Williams SM. 2002. New strategies for identifying gene–gene interactions in hypertension. Ann Med 34:88–95. 50. Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. 2002. A comprehensive review of genetic association studies. Genet Med 4:45–61. 51. Zee RY, Hoh J, Cheng S, Reynolds R, Grow MA, et al. 2002. Multi-locus interactions predict risk for post-PTCA restenosis: an approach to the genetic analysis of common complex disease. Pharmacogenomics J 2: 197–201. 52. Williams SM, Ritchie MD, Phillips JA 3rd, Dawson E, Prince M, et al. 2004. Multilocus analysis of hypertension: a hierarchical approach. Hum Hered 57:28–38. 53. Tsai CT, Lai LP, Lin JL, Chiang FT, Hwang JJ, et al. 2004. Reninangiotensin system gene polymorphisms and atrial fibrillation. Circulation 109:1640–1646. 54. Cho YM, Ritchie MD, Moore JH, Park JY, Lee KU, et al. 2004. Multifactordimensionality reduction shows a two-locus interaction associated with Type 2 diabetes mellitus. Diabetologia 47:549–554. 55. Cordell HJ, Todd JA, Bennett ST, Kawaguchi Y, Farrall M. 1995. Twolocus maximum lod score analysis of a multifactorial trait: joint consideration of IDDM2 and IDDM4 with IDDM1 in type 1 diabetes. Am J Hum Genet 57:920–934. 56. Cordell HJ, Todd JA. 1995. Multifactorial inheritance in type 1 diabetes. Trends Genet 11:499–504. 57. Cordell HJ, Todd JA, Hill NJ, Lord CJ, Lyons PA, et al. 2001. Statistical modeling of interlocus interactions in a complex disease: rejection of the multiplicative model of epistasis in type 1 diabetes. Genetics 158:357– 367. 58. Cox NJ, Frigge M, Nicolae DL, Concannon P, Hanis C, et al. 1999. Loci on chromosomes 2 (NIDDM1) and 15 interact to increase susceptibility to diabetes in Mexican Americans. Nat Genet 21:213–215. 59. Hanis CL, Boerwinkle E, Chakraborty R, Ellsworth DL, Concannon P, et al. 1996. A genome-wide search for human non-insulin-dependent (type 2) diabetes genes reveals a major susceptibility locus on chromosome 2. Nat Genet 13:161–166. 60. Cox NJ, Hayes MG, Roe CA, Tsuchiya T, Bell GI. 2004. Linkage of calpain 10 to type 2 diabetes: the biological rationale. Diabetes 53: S19–S25. 61. Johnson JD, Han Z, Otani K, Ye H, Zhang Y, et al. 2004. RyR2 and calpain-10 delineate a novel apoptosis pathway in pancreatic islets. J Biol Chem 279:24794–24802. 62. Jansen RC. 2003. Studying complex biological systems using multifactorial perturbation. Nat Rev Genet 4:145–151. 63. Williams SM, Haines JL, Moore JH. 2004. The use of animal models in the study of complex disease: all else is never equal or why do so many human studies fail to replicate animal findings? BioEssays 26:170– 179. 64. Elena SF, Lenski RE. 1997. Test of synergistic interactions among deleterious mutations in bacteria. Nature 390:395–398. 65. Elena SF, Lenski RE. 2001. Epistasis between new mutations and genetic background and a test of genetic canalization. Evolution Int J Org Evolution 55:1746–1752. 66. Remold SK, Lenski RE. 2004. Pervasive joint influence of epistasis and plasticity on mutational effects in Escherichia coli. Nat Genet 36:423– 426. 67. Brem RB, Yvert G, Clinton R, Kruglyak L. 2002. Genetic dissection of transcriptional regulation in budding yeast. Science 296:752–755. 68. Lenski RE, Ofria C, Collier TC, Adami C. 1999. Genome complexity, robustness and genetic interactions in digital organisms. Nature 400: 661–664. 69. Lenski RE, Ofria C, Pennock RT, Adami C. 2003. The evolutionary origin of complex features. Nature 423:139–144. 70. Di Paolo EA, Noble J, Bullock S. 2000. Simulation models as opaque thought experiments. In: Dedau MA, McCaskill JS, Packard NH, Rasmussen S, editors. Artificial Life VII. Cambridge MA: The MIT Press. pp 497–506. 71. White BC, Moore JH. 2004. Systems biology thought experiments in human genetics using artificial life and grammatical evolution. In: Pollack J, Bedau M, Husbands P, Ikegami T, Watson RA, editors, Artificial Life IX. Cambridge MA: The MIT Press. pp 581–86. 72. Moore JH, Boczko E, Summar M. 2005. Connecting the dots between genes, biochemistry, and disease susceptibility: systems biology modeling in human genetics. Mol Genet Metab 84:104–111. BioEssays 27.6 645 Problems and paradigms 73. Huxley J. 1942. Evolution: The Modern Synthesis. London: Allen & Unwin. 74. Ideker T, Galitski T, Hood L. 2001. A new approach to decoding life: systems biology. Ann Rev Genomics Hum Genet 2:343–372. 75. Segre D, Deluna A, Church GM, Kishony R. 2005. Modular epistasis in yeast metabolism. Nat Genet 37:77–83. 76. Moore JH. 2005. A global view of epistasis. Nat Genet 37:13–14. 646 BioEssays 27.6 77. Takahashi N, Hagaman JR, Kim HS, Smithies O. 2003. Minireview: computer simulations of blood pressure regulation by the reninangiotensin system. Endocrinology 144:2184–2190. 78. Takahashi N, Smithies O. 2004. Human genetics, animal models and computer simulations for studying hypertension. Trends Genet 20:136– 145.