Role of Systems Biology in the discovery of
genetic basis of complex cardiovascular diseases
Roozbeh Arshadi

Abstract—Detection of the genetic variants contributing to
complex, polygenic cardiovascular diseases is inherently difficult
since most forms of these diseases are a result of many genes with
small effects further complicated by the effects of environment
and other genes. This paper discusses recombination-mapping
techniques, both linkage and association studies, as current
prevalent methods in detection of genetic basis of complex disease
traits. However, despite some success, many problems and
obstacles are encountered. In recent years, systems biology
inspired tools and methodologies coupled with advances in
genomics have attempted to overcome some of these
shortcomings. These include, among others, integrative
approaches such as combination of linkage mapping with
physiological profiling, or novel pattern detection algorithms and
the use of neural networks as classifiers in association studies.
Finally, future progress requires not only genomic studies, but
also an integration of the transcriptome, proteome, and phenome
data to give a more complete picture of the complex interacting
networks contributing to the disease.
Index Terms— Association studies, Cardiovascular disease,
Linkage mapping, Systems Biology.
I. INTRODUCTION
Each year, Cardiovascular Diseases claim the lives of close to
one million people in the US alone [9]. Among Cardiovascular
Diseases (CVD), hypertension and Atherosclerosis have
received a great deal of attention. Hypertension, affecting
more than 50 million people in the US alone [7], is a multifactorial disease that develops as a consequence of errors in the
biological systems that determine blood pressure [10].
Atherosclerosis, a primary cause of coronary heart disease, is
best described by buildup of fatty substances, cholesterol,
cellular waste products, calcium and other substances in the
inner lining of an artery. The existence of a genetic basis for
these diseases has been well established. In the case of
hypertension, genetic determinants contribute between 30 and 50%
of blood pressure variation among individuals [5]. Also,
twin studies have confirmed that, with regard to coronary heart
disease, 40 to 60% of the variance of the disease correlates
with genetic differences [4].
These common cardiovascular diseases are complex in that
they involve an interplay of many genetic variations of
molecular and biochemical pathways and their interactions
with environmental factors [8]. Whereas in monogenic
diseases, a single gene or allele determines the disease
phenotype, in polygenic diseases, such as Atherosclerosis, the
phenotype is a product of many genes with small effects,
further complicated by environmental interactions. Studies of
mice have revealed that over 100 genes influence the
development of Atherosclerotic lesions [4].
Using traditional positional cloning approaches, over 2000
different Mendelian disease genes have been identified [4].
However, the non-Mendelian, polygenic nature of most CVD
creates difficulty in locating the influential genes. Hence,
there is a great need to tailor the methodology, using advances
in the area of genomics and systems biology, to increase its
effectiveness in dealing with such complexity.
Recombination mapping as a methodology has been
extensively used, with varying degrees of success, in
identifying disease causing genes and mutations [1]. It is
important to have a clear understanding of the fundamentals of
the methodology before delving into its strengths and
limitations in dealing with complex disease phenotypes, and
ultimately discussing its potential as a discovery tool within a
systems biology framework.
II. FUNDAMENTALS OF RECOMBINATION MAPPING
Human chromosomes exist in homologous pairs (with
corresponding DNA sequences each from a different parent).
A source of diversity between generations is the occurrence of
crossovers between homologous chromosomes during meiosis
[3]. The closer two loci are on a chromosome, the lower the
chance of crossover between them and the higher the chance
they will stay together in the next generation (i.e. cosegregate).
The above phenomenon forms the basis of recombination
mapping as a tool for determining the regions of the
chromosome or the alleles linked with a particular disease
phenotype. The over-simplified method is as follows: one first
genotypes (determines the specific alleles of) a sample
population at a number of locations on the chromosome
(markers) and then based on the statistical examination of the
results, determines whether a particular locus/allele cosegregates with the disease trait. If such evidence is found,
one can infer that a locus (location on the chromosome)
influencing the trait is near the locus that co-segregated with
the disease trait [1].
Large-scale experiments and statistical analysis are
important keys in recombination mapping. The identification
of genes underlying complex CVD requires a truly multidisciplinary approach involving geneticists, molecular
biologists (development of assays and markers),
bioinformaticists (to store and manipulate the data), and
statisticians (development of algorithms to assess cosegregation) to name a few [1].
There are two prevalent methods of recombination mapping:
Linkage Mapping and Association (linkage disequilibrium)
studies. Regardless of the methodology (linkage or
association), recombination mapping follows one of two
strategies: the candidate-gene approach or the total genome scan
[8]. The first is a hypothesis-testing approach in which a
suspected gene or region of chromosome is tested. The other,
the total genome scan, is a hypothesis-generating approach. In
this case, a great number of markers (polymorphisms on a
chromosome to be tested for linkage/association) along the
genome are used to locate regions which might contain genes
influencing the trait.
A. Linkage Mapping
Linkage between two locations on a chromosome is a function
of their distance: The closer they are, the higher the probability
that they will not be separated by recombination events.
Therefore, linkage of two loci can be tested by counting the
frequency of recombination between them [3]. The lower the
recombination frequency, the higher is the probability that they
are located close to each other on the chromosome.
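The relationship between recombination frequency and distance can be made concrete with a minimal sketch. It assumes phase-known meioses, so that recombinants can be counted directly, and it uses Haldane's mapping function (which assumes no crossover interference) and a simple LOD score. Neither is named in the text above, and the function names are illustrative only.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Observed fraction of meioses in which the two loci were separated."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need total > 0 and 0 <= recombinants <= total")
    return recombinants / total


def haldane_distance_cm(theta: float) -> float:
    """Map distance in centimorgans under Haldane's model (no interference)."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


def lod_score(recombinants: int, total: int, theta: float) -> float:
    """log10 likelihood ratio: linkage at theta versus free recombination (0.5)."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    non_recombinants = total - recombinants
    log_l_linked = (recombinants * math.log10(theta)
                    + non_recombinants * math.log10(1.0 - theta))
    log_l_unlinked = total * math.log10(0.5)
    return log_l_linked - log_l_unlinked


if __name__ == "__main__":
    # Made-up counts: 10 recombinants observed in 100 informative meioses.
    theta_hat = recombination_fraction(10, 100)
    print(theta_hat, haldane_distance_cm(theta_hat), lod_score(10, 100, theta_hat))
```

A lower recombination fraction gives a shorter map distance and, for the same sample size, a larger LOD score in favor of linkage.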
The above principle combined with the use of known
markers (variants on the chromosomes with known locations)
can be used to identify the chromosomal location of gene
variants related to a given disease [8].
Linkage analysis requires family-based sample collections
[3]. As an example, a common linkage study is performed in
the following manner [3]: affected sibling pairs are genotyped
and the degree of similarity between them at a specific number
of genetic markers is assessed. If the degree of similarity at a
specific marker is significantly different from that expected
from Mendelian segregation (where alleles do not cosegregate), then one can infer the disease is linked to that
marker (i.e. disease causing region is close to the marker and
therefore co-segregates with it).
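As a rough illustration of the affected-sibling-pair comparison, the sketch below tests observed allele sharing at one marker against the Mendelian expectation that siblings share 0, 1, or 2 alleles identical by descent (IBD) with probabilities 1/4, 1/2, and 1/4. It assumes the IBD state of every pair is known without ambiguity, which is a simplification, and the counts are made up.

```python
import math

# Expected IBD sharing for siblings at a marker unlinked to the disease.
NULL_IBD_PROBS = (0.25, 0.5, 0.25)


def sib_pair_ibd_test(counts):
    """counts = (pairs sharing 0, pairs sharing 1, pairs sharing 2 alleles IBD).

    Returns (chi-square statistic, p-value, mean proportion of alleles shared).
    """
    if len(counts) != 3 or sum(counts) <= 0:
        raise ValueError("counts must hold three values with a positive sum")
    n = sum(counts)
    expected = [p * n for p in NULL_IBD_PROBS]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Survival function of the chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-chi2 / 2.0)
    # Proportion of alleles shared IBD; equals 0.5 on average without linkage.
    mean_sharing = (counts[1] + 2 * counts[2]) / (2 * n)
    return chi2, p_value, mean_sharing


if __name__ == "__main__":
    # Hypothetical data: 100 affected sibling pairs with excess sharing.
    print(sib_pair_ibd_test((15, 45, 40)))
```

Excess sharing (a mean proportion well above 0.5, with a small p-value) is the pattern that would suggest a disease-influencing region near the marker.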
The result of a linkage study is typically the identification of
portions of the chromosome linked to a particular
trait/phenotype. These regions are referred to as QTL
(quantitative trait loci).
B. Association Studies
While linkage maps attempt to determine locations (loci) on
the chromosomes linked to a particular disease trait,
association studies attempt to determine the association of a
particular allele to the disease trait [3]. This is accomplished
by conducting a case (diseased)-control (non-diseased)
genotyping and examining the frequency of specific DNA
variants (polymorphisms) between the two groups.
Association studies by definition do not require family-based
sample collections [8].
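The case-control comparison of allele frequencies can be sketched as a test on a 2x2 table of allele counts. This is a minimal illustration under the assumption of a single biallelic marker and independent chromosomes. The chi-square test and odds ratio are standard choices rather than something prescribed by the sources cited here, and the counts are invented.

```python
import math


def allelic_association(case_a, case_other, ctrl_a, ctrl_other):
    """Compare counts of allele A versus other alleles in cases and controls.

    Returns (chi-square statistic, p-value, odds ratio).
    """
    n = case_a + case_other + ctrl_a + ctrl_other
    row_case, row_ctrl = case_a + case_other, ctrl_a + ctrl_other
    col_a, col_other = case_a + ctrl_a, case_other + ctrl_other
    if min(row_case, row_ctrl, col_a, col_other) == 0:
        raise ValueError("every row and column of the table must be non-empty")
    cross = case_a * ctrl_other - case_other * ctrl_a
    chi2 = n * cross ** 2 / (row_case * row_ctrl * col_a * col_other)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    denominator = case_other * ctrl_a
    odds_ratio = (case_a * ctrl_other) / denominator if denominator else math.inf
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical allele counts: 200 case chromosomes, 200 control chromosomes.
    print(allelic_association(120, 80, 90, 110))
```

An odds ratio above 1 with a small p-value indicates that allele A is more frequent among cases, which is the signal an association study looks for.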
III. OBSTACLES FACING RECOMBINATION STUDIES
Several fundamental issues have hampered the effectiveness of
linkage and association studies in recent years. Some of these
problems have been addressed to a certain extent by new
integrative approaches, some of which will be discussed in the
next section. Problems raised in the published
literature include the focus on a limited
number of complex phenotypes, population stratification and
non-homogeneity, lack of resolution resulting in false
negatives, and the inadequacy of one-locus approaches [1, 4, 10,
11].
One of the problems with many of the linkage studies has
been the focus on a limited number of complex, high-level
phenotypes [1]. For example, in many rat studies of
hypertension, blood pressure and heart rate have been used as
phenotypes. The result has been the identification of loci on
almost every rat chromosome, with confirmed locations on
chromosomes 1, 2, 3, 5, 10, 12 [10]. Therefore, for complex
phenotypes, where the contribution of any one gene/variant to
the phenotype can be obscured by others [1], results of linkage
mapping might point to wide QTL (regions that likely contain
genes that affect a trait) on many chromosomes.
Population stratification is another major concern casting
doubt on the validity of some association studies [3]. In
association studies, the homogeneity of samples is a very
important issue. For example, in a case-control association
study, suppose that the sample population is a mixture of two
populations, one high risk for the disease/phenotype in
question and one low, both with different allele frequencies for
a gene used as a marker. The high frequency of our marker
allele in the diseased (case) portion might lead the association
study to falsely associate the marker gene (although unrelated)
with the disease phenotype.
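The stratification scenario just described can be worked through with expected counts. The sketch below assumes two sub-populations in which the marker allele is, by construction, independent of disease. The population sizes, risks, and allele frequencies are invented for illustration.

```python
def pooled_allele_frequencies(strata):
    """Marker allele frequency among cases and among controls after pooling.

    Each stratum is a dict with 'size', 'disease_risk' and 'allele_freq'.
    Within a stratum the allele is independent of disease, so cases and
    controls of that stratum carry the allele at the same frequency.
    Expected counts are used instead of random sampling.
    """
    case_total = ctrl_total = case_alleles = ctrl_alleles = 0.0
    for stratum in strata:
        cases = stratum["size"] * stratum["disease_risk"]
        controls = stratum["size"] - cases
        case_total += cases
        ctrl_total += controls
        case_alleles += cases * stratum["allele_freq"]
        ctrl_alleles += controls * stratum["allele_freq"]
    return case_alleles / case_total, ctrl_alleles / ctrl_total


if __name__ == "__main__":
    mixed_sample = [
        {"size": 1000, "disease_risk": 0.20, "allele_freq": 0.80},  # high risk
        {"size": 1000, "disease_risk": 0.02, "allele_freq": 0.20},  # low risk
    ]
    print(pooled_allele_frequencies(mixed_sample))
```

Although the allele has no effect in either sub-population, the pooled cases carry it far more often than the pooled controls, simply because most cases come from the high-risk group. This is the spurious association that stratification produces.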
Despite the success in dealing with monogenic phenotypes,
studying complex disease phenotypes by studying one or few
polymorphisms has shown its limitations [11]. In many cases,
a single genetic variant (single locus) might not show
observable coinheritance (co-segregation) with the phenotype
[1]. To detect the combinatorial effects of multiple variants
(loci) simultaneously - as the variants in certain combinations
might influence the phenotype - a multi-locus approach is
preferred. This approach requires the development of
statistical methods that are able to handle multiple variable
loci [11].
Another issue with association studies is the presence of
false negatives (missing actual associations) due to lack of
resolution of linkage disequilibrium. As discussed previously,
random recombination events from generation to generation
tend to separate regions of the chromosome. Therefore, in
many cases, the disease-influencing gene will have to be very
close to the marker allele, not to be affected by recombination,
and hence show linkage disequilibrium or association. If there
exists a high degree of recombination and not enough markers
in the region, the association might not be detected, leading to
a false negative. It may be necessary to detect and genotype
every variant in a particular gene to eliminate the possibility of
a false negative [3].
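How quickly recombination erodes the association between a marker and a disease variant can be illustrated with the textbook decay model D_t = D_0 (1 - theta)^t. This model assumes random mating and a constant recombination fraction theta per generation. It is a standard population-genetics simplification, not a result taken from the cited papers.

```python
import math


def ld_after_generations(d0: float, theta: float, generations: int) -> float:
    """Linkage disequilibrium remaining after a number of generations."""
    if not 0 <= theta <= 0.5 or generations < 0:
        raise ValueError("need 0 <= theta <= 0.5 and generations >= 0")
    return d0 * (1.0 - theta) ** generations


def generations_to_halve(theta: float) -> float:
    """Number of generations after which disequilibrium has dropped by half."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    return math.log(0.5) / math.log(1.0 - theta)


if __name__ == "__main__":
    # Illustrative values: a tightly linked and a loosely linked marker.
    for theta in (0.001, 0.01, 0.1):
        print(theta, ld_after_generations(0.25, theta, 100), generations_to_halve(theta))
```

Under this model only markers very close to the causal variant (small theta) retain detectable disequilibrium after many generations, which is why sparse marker sets can miss true associations.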
IV. INTEGRATIVE GENOMIC SYSTEMS-BIOLOGY
SOLUTIONS
Considering the obstacles discussed previously, the discovery
of the genetic variants contributing to complex CVD seems
like a daunting task. To address some of these problems,
more integrative approaches to the disease-gene
identification problem have been attempted.
One of the solutions proposed is to refine the definition of
phenotype. The use of intermediate phenotypes has been
advocated by several publications [1,10]. For example, in
linkage/association studies of hypertension, instead of a broad
phenotype such as blood pressure, an intermediate phenotype
such as catecholamine levels should be used [1], increasing the
chance that a specific locus will be linked to it. Also, a joint
analysis of a group of functionally related intermediate
phenotypes would increase the power to detect a contributing
gene since a gene might affect a network of correlated
phenotypes [1].
A useful integrative strategy is to combine linkage maps
relating to a multitude of intermediate phenotypes with
patterns of correlations between these phenotypes
(physiological profiling) [10]. In a particular study [10], 239
cardiovascular and renal phenotypes were measured in normal
and stressed rats, and 125 of these phenotypes were mapped to
the regions on chromosomes using linkage analysis. At each
marker allele, these 125 phenotypes were incorporated into a
visual profile of correlation coefficients between the traits.
Figure 1 demonstrates the methodology in a simplified manner
(Note: the figure serves to demonstrate the integrative
methodology and not published results of the study in
question). Essentially, what has been created is a ‘systems
biology map for cardiovascular traits’ (in F2 rats) and a
physiological profiling tool to assess the complex relationships
between the phenotypes as a function of genotype [10].
Combining the genetic linkage maps and the physiological
profiles facilitates relating genetic information with functional
pathways.
One of the conclusions inferred from this
methodology was the relationship between alleles of nitric
oxide synthase (NOS) and arterial pressure response [10].
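The idea of a correlation profile as a function of genotype can be sketched as follows: animals are grouped by their genotype at one marker, and pairwise correlations between traits are computed within each group. This is only a schematic of the approach, not the procedure used in [10]. The record layout, the marker name, and the trait names are hypothetical.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant trait")
    return sxy / math.sqrt(sxx * syy)


def correlation_profile(animals, marker, traits):
    """Return {genotype: {(trait_i, trait_j): r}} for one marker.

    Each animal is a dict with 'genotypes' (marker -> genotype) and
    'phenotypes' (trait -> measured value).
    """
    groups = defaultdict(list)
    for animal in animals:
        groups[animal["genotypes"][marker]].append(animal["phenotypes"])
    profile = {}
    for genotype, rows in groups.items():
        profile[genotype] = {
            (a, b): pearson([r[a] for r in rows], [r[b] for r in rows])
            for i, a in enumerate(traits)
            for b in traits[i + 1:]
        }
    return profile
```

Comparing the profiles of different genotype groups shows whether the relationship between two traits changes with the allele carried at that marker, which is the kind of pattern the physiological profiling tool is meant to expose.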
As discussed previously, the heterogeneity of the sample
population used for a study can cast doubt on the results. One
approach is to use younger, genetically isolated populations in these
studies. In these populations, greater environmental homogeneity
lessens the effect of environmental factors, genetic
homogeneity is greater, and the smaller number of
generations implies fewer recombination events
and hence stronger association [1]. Using an isolate
population, one can then combine linkage and association
studies. For example, at deCODE, the gene encoding PDE4D
in ischemic stroke (caused by Atherosclerosis) was identified
by a combination of linkage and association studies [4]. First,
linkage analysis of families was used to map the gene to a
portion of chromosome 5 and then association of the gene was
confirmed by saturating the region with genetic markers [4].
One of the problems discussed previously was that of
missed associations or false negatives due to lack of resolution
in association studies. This problem could be alleviated by an
increase in the number of markers (known genetic segments
used as comparison points in linkage and association studies).
The larger number of markers would also allow for genome-wide association studies and a hypothesis-generating approach.
Until recently, many association studies have been restricted to
candidate genes (regions suspected by biochemists to be
involved in a particular pathway) [4]. Hence, the association
study would be a hypothesis-testing approach attempting to
associate a particular allele at the suspected locus with the
disease phenotype in question. The identification of genetic
differences (especially SNPs, single nucleotide
polymorphisms) throughout the genome in recent years
will create a larger marker pool, which, together with high-throughput
genotyping techniques, will allow for whole-genome association studies.
Figure 1 - Integrating Linkage Mapping with physiological profiling
Another effort with the aim to facilitate gene discovery
through recombination mapping is the Human Haplotype Map
Initiative. A haplotype is a sparse representation of DNA
representing the alleles on a chromosome [4]. The goal of the
initiative is to determine the size and structure of these
common chromosome segments across any set of individuals
[4]. The haplotype map would ideally allow association
studies to be performed by counting how often diseased vs.
non-diseased individuals carry a certain haplotype. Those
haplotypes that show a statistically significant difference in
frequency between the diseased and non-diseased are likely to
contain the disease causing gene or mutation [4].
It was previously discussed that approaches ignoring the
combinatorial effects of genes on a complex phenotype have
shown significant limitations [11]. To that effect, multi-locus
approaches have been advocated with the aim to detect, among
all measured polymorphisms, the ones that individually or in
combination with others, influence the complex phenotype
[11]. Such large-scale, multi-locus association studies require
sophisticated data mining tools, statistical analysis and pattern
detection algorithms. For example, the Combinatorial
Partitioning Method (CPM) [6] identifies partitions of two-locus genotypes that are most predictive of the phenotype
variability. Tahri-Daizadeh et al. have proposed an automated
Detection of Informative Combined Effects (DICE) algorithm
to be used in association studies in extracting combinatorial
effects of several polymorphisms and non-genetic covariates
[11]. Curtis et al. outline the use of artificial neural networks
and their pattern-recognition properties in detection of
association between disease phenotypes and ‘multiple marker
genotypes’ [2]. The goal is for the neural network to be able to
classify the subjects in a case-control study based on their
marker genotypes.
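To give a feel for why a multi-locus view can reveal what single-locus tests miss, the sketch below groups subjects by their joint genotype at two loci and reports how much of the phenotype variance the grouping explains. It is not the CPM, DICE, or neural-network method described above: it does not search over partitions or validate them, and the data layout and locus names are assumptions made for illustration.

```python
from collections import defaultdict


def two_locus_variance_explained(samples, locus_a, locus_b):
    """Group samples by two-locus genotype and measure variance explained.

    samples: list of (genotypes, phenotype) pairs, where genotypes maps a
    locus name to a genotype label and phenotype is a number.
    Returns (cell_means, r_squared), with r_squared equal to the
    between-cell sum of squares divided by the total sum of squares.
    """
    cells = defaultdict(list)
    for genotypes, phenotype in samples:
        cells[(genotypes[locus_a], genotypes[locus_b])].append(phenotype)
    values = [phenotype for _, phenotype in samples]
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        raise ValueError("phenotype does not vary in this sample")
    cell_means = {cell: sum(v) / len(v) for cell, v in cells.items()}
    between_ss = sum(
        len(v) * (cell_means[cell] - grand_mean) ** 2 for cell, v in cells.items()
    )
    return cell_means, between_ss / total_ss
```

When the phenotype is elevated only for one specific combination of genotypes, the two-locus grouping explains far more variance than either locus would on its own. That is the kind of combined effect multi-locus methods are designed to detect.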
V. FUTURE – BEYOND THE GENOME
The explosion of genome information in recent years has
empowered many of the aforementioned techniques for
discovery of the genetic basis of complex CVD phenotypes. A
milestone was the completion of the human genome sequence
in April 2003 [12]. In addition, the ongoing discovery of SNP
markers and high-throughput genotyping technology pave the
way for informative genome-wide association studies.
There is growing recognition that emergent, integrative
behavior, which applies to most complex CVD phenotypes, is a
result of dynamic interactions between many components. It is
apparent that understanding integrative behavior is essential
for progress, a feat which cannot be accomplished by genomic
studies alone. Current literature [8, 12] advocates exploring the
transcriptome (messenger RNA associated with cellular
response to disease) using expression profiling, to give us a
better understanding of patterns of altered gene and protein
expression during disease development or progression. The
hope is that the integration of expression profiling information
at a variety of time points, phenotypes, and environmental
conditions, with genomic information will give us a better
understanding of the gene regulatory networks [12]. Yet, even
the transcriptome is not fully representative of the set of
proteins encoded by the genome (proteome) [12]. Figure 2 is
an illustration of the various layers of integration starting from
the genome and leading to the phenome. The integration of
data obtained from one layer with another using a systems
biology approach has been proposed [12]. In fact, in an
example of such a study [10], discussed previously, the
integration of genetic linkage maps (genome) with
physiological profiling (phenome) in model rats revealed
functional interactions between traits not apparent from
linkage analysis alone. Hence a multi-layered approach,
guided by systems biology principles, will likely dominate the
future landscape of cardiovascular research.
[Figure 2 layers, top to bottom: quantitative description of integrated functions of organism; regulatory networks and signaling pathways; collection of all encoded proteins; messenger RNA; DNA]
Figure 2 - Multi-layered approach to cardiovascular studies [Adapted
from 8]
REFERENCES
[1] Broeckel U, Schork NJ. Identifying genes and genetic variation
underlying human diseases and complex phenotypes via recombination
mapping. J Physiol 2003 554(1): 40-45
[2] Curtis D, North BV, Sham PC. Use of an artificial neural network to
detect association between a disease and multiple marker genotypes.
Ann. Hum. Genet. 65: 95-107 Part 1, JAN 2001
[3] Keavney B., Genetic association studies in complex diseases. J. Hum.
Hypertens. 14 (2000), pp. 361–367
[4] Lusis, A.J, et al. Genetic basis of atherosclerosis: part I: new genes and
pathways. Circulation. 2004 Sep 28; 110(13): 1868-73
[5] McBride, Martin W., et al. Functional genomics in rodent models of
hypertension, J Physiol 2003 554(1): 56-63
[6] Nelson, M.R., et al. 2001. A combinatorial partitioning method to
identify multilocus genotypic partitions that predict quantitative trait
variation. Genome Res. 11: 458-470
[7] NHLBI Working Group. Future Directions for Hypertension Research
Executive Summary, 2004
[8] Podgoreanu M.V. and Schwinn D.A., 2004. Genomics and the
circulation. Br J Anaesth 93 (1): 140-148 JUL 2004
[9] Smith, I.K. Protect Your Heart. Newsweek, July 19, 2002.
[10] Stoll M, Cowley AW, Jr, Tonellato PJ, et al. A genomic-systems biology
map for cardiovascular function. Science 2001; 294: 1723 – 6
[11] Tahri-Daizadeh N, et al. Automated Detection of Informative Combined
Effects in Genetic Association Studies of Complex Traits. Genome Res.
2003 Aug; 13(8): 1952-6
[12] Winslow, R.L. and Boguski, M.S., 2003. Genome informatics: current
status and future prospects. Circ. Res. 92, pp. 953–961