Download Document

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

X-inactivation wikipedia , lookup

Epigenetics of diabetes Type 2 wikipedia , lookup

Transposable element wikipedia , lookup

Behavioural genetics wikipedia , lookup

Population genetics wikipedia , lookup

Copy-number variation wikipedia , lookup

Oncogenomics wikipedia , lookup

NEDD9 wikipedia , lookup

Pharmacogenomics wikipedia , lookup

Epistasis wikipedia , lookup

Polycomb Group Proteins and Cancer wikipedia , lookup

Neuronal ceroid lipofuscinosis wikipedia , lookup

Heritability of IQ wikipedia , lookup

Human genetic variation wikipedia , lookup

Gene nomenclature wikipedia , lookup

Essential gene wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Gene therapy wikipedia , lookup

Pathogenomics wikipedia , lookup

Gene desert wikipedia , lookup

Epigenetics of neurodegenerative diseases wikipedia , lookup

Genetic engineering wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Genomic imprinting wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Quantitative trait locus wikipedia , lookup

History of genetic engineering wikipedia , lookup

RNA-Seq wikipedia , lookup

Gene expression programming wikipedia , lookup

Gene wikipedia , lookup

Ridge (biology) wikipedia , lookup

Minimal genome wikipedia , lookup

Epigenetics of human development wikipedia , lookup

Genome evolution wikipedia , lookup

Biology and consumer behaviour wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Public health genomics wikipedia , lookup

Gene expression profiling wikipedia , lookup

Microevolution wikipedia , lookup

Designer baby wikipedia , lookup

Genome (book) wikipedia , lookup

Transcript
Genetic-linkage mapping of complex
hereditary disorders to a whole-genome
molecular-interaction network
Ivan Iossifov, Tian Zheng, Miron Baron, T. Conrad
Gilliam and Andrey Rzhetsky
Genome Research (2008) 18: 1150-1162
Common neurodevelopmental disorders:
Autism, Bipolar disorder and Schizophrenia
– hereditary disorders
– hard to explain by single environmental or genetic cause
Difficulties in detection of heritable patterns of disease susceptibility:
1) Dimensionality – the exponentially expanding search space
required to explore all combinations of m genes or m genetic loci.
2) How variations within the genes jointly affect the susceptibility to
disease remains unknown even after identification of a set of genes
related to the disease – genetic causes of a given disorder can be
completely or partially different for different affected families.
Proposal of a probabilistic model:
Extension of multipoint genetic linkage model.
Multipoint linkage analysis is a mathematical model that detects
correlations between disease phenotypes within human families and
states of multiple polymorphic genetic markers that are
experimentally probed in every family member.
Two additional major assumptions here:
1) Genetic variation within multiple genes (rather than a single gene that
is common to all affected individuals) can influence the disease
phenotype and that potential disease-predisposing genes are grouped
into a compact gene cluster within a molecular network.
2) Heterogeneity of genetic variations underlie the same phenotype in
different affected families.
Molecular Network:
GeneWays 6.0 database (Rzhetsky et. al, 2004)
– Generated by the large-scale text-mining of nearly 250,000 full-text
articles from 78 leading biochemical journals.
– Removed all non-human specific interactions
– Used only direct interactions between genes or their products (a total
of 47 distinct types, such as binding, phosphorylation, and methylation)
as opposed to regulatory interactions between pairs of genes that are
indirect (activation or inhibition).
– Used only those molecular interactions for which all genes have NCBI
geneIDs and only those genes were used for which physical coordinates
are available in UCSC database.
Data:
Molecular Network: Two different networks depending on whether the Xchromosome is included in the analysis.
Resulting molecular-interaction network comprises of
1) Without X-chromosome: 4,374 genes and 13,612 interactions including
1,120 self-interactions. It contains one large and several small connected
components. Used only large connected componenet which includes
3,829 genes and 12,446 interactions (of the latter 985 are selfinteractions, excluded from the analysis).
2) With X-chromosome: 4,019 genes and 13,256 interactions, including
1,025 self-interactions.
Linkage Datasets: Genotypic and phenotypic data :
Autism: 336 families, Bipolar: 414 families & Schizophrenia: 87 families.
All are multiplex families – a family is ‘multiplex’ if it includes more than
one affected person with respect to a specific disease.
A hypothetical family. Each pedigree (family) described in our data is
provided with a kinship structure and phenotypic information. Circles and
squares represent female and male individuals, respectively. By convention,
the filled shapes represent affected (sick) individuals; the open shapes
represent unaffected (healthy) ones.
Gene clusters:
Assumption: A set of genes that can contribute to the risk of contracting a
given disease phenotype when the sequence of at least one of them is
critically modified.
Under this assumption, gene cluster, C, is defined as a set of genes with
members that are grouped by their ability to harbor genetic polymorphisms
that predispose the bearer to a specific disease.
For every gene within a gene cluster C,
c is the size of the cluster
c
p
i 1
i
1
Cluster probability (pi) as the share of guilt attributable to variations at the
locus of the i-th gene for the disease phenotype in a large group of
randomly selected disease-affected families.
Explicit assumption: gene clusters are set of genes that are joined through
direct molecular interactions into a “connected subgraph”.
Genes known to participate in the same polygenic disorder tend to be close to
each other in a molecular network.
A gene with zero cluster probability:
It may not be directly involved in the disease etiology even though
it is a member of the gene cluster that has the highest likelihood value.
These zero contribution genes can serve as connectors for genes that are
strongly linked to the disease.
Genes with higher cluster probabilities are more likely to bear diseasepredisposing polymorphisms at the corresponding loci.
For formulation of mathematical model two more assumptions:
1) Only those genes that are within a gene cluster, C, harbor a diseasepredisposing variation.
2) For every family under analysis, exactly one gene from cluster C is a
disease-predisposing gene. That is, the phenotype status of every
individual is determined by the state (allele) of the family-specific gene
in the individual’s genome.
This disease model is a mixture of probabilistic models, each of which is
determined by one disease-predisposing gene in C, with the mixing
coefficients being the cluster probabilities. The set of families affected by
the same disease under this model is a mixture of families that are
predisposed to the disease via mutations at different genes that belong to C.
Y = union of genotypic and phenotypic data
Yf = portion of Y associated with the f-th family (pedigree)
= all of the linkage related parameters, including genetic penetrance,
background frequencies of marker alleles, and genetic distances between the
markers.
Assumptions:
1) Every gene in C has only one healthy and one disease-predisposing allele
and that the expected frequencies of these alleles are the same for all genes
in the cluster.
2) The genetic-penetrance parameters are the same for every gene in C.
Under this model, given the state of the chosen gene, the disease-phenotype
state of the individual is independent of the rest of the individual’s genome
and of the genotypes and phenotypes of his/her family members.
This assumption of independence leads to a “gene mixture generative model”
of the data:
The i-th disease-predisposing gene is assigned to a family by a random draw
from C with probability pi.
Once a gene is assigned to a family, the disease-related phenotype variation in
this family is probabilistically dependent on the state of the i-th gene and is
independent of the states of all other genes in C and in the rest of the
genome.
Gene-specific significance tests:
Optimum gene cluster of size c : which achieves maximum cluster LOD
score, gene cluster probabilities are estimated using the maximumlikelihood method for each cluster.
Identification of optimum c-gene cluster is done by simulated annealing
method together with bootstrap aggregation technique.
For a given gene, the statistic measures the relative strength of this gene’s
linkage signal compared to all genes under study.
Evaluation of the significance of each gene-specific test statistic using
simulated data was done under the null hypothesis that the disease
phenotype is completely unlinked to any part of the genome, preserving
the real pedigree topologies encoded in data sets and the real frequencies
of genetic markers.
One important point:
This method analyzes “groups” of functionally related genes rather than
individual genes and, as a result, the linkage signal is magnified through
summation over multiple genes within the same gene cluster, while each
individual gene may exhibit only a weak evidence of linkage.
Example: TJP1 or UBE3A – on its own yields a weak empirical P-value
because of the genetic heterogeneity of the families: In addition to
families with positive LOD scores there are families that show strong
negative signals for both genes. It is only when we consider gene clusters
that the linkage signal becomes strong and obvious.
Matching of LOD scores.
14 top-ranking gene clusters in all three disorders.
Closer look at candidate genes:
1) Some genes are regulators of cell cycle and cell death (EDAR, BCL2L11, NEK6,
SFRP1, MAPK7).
2) Some genes form intercellular contacts (TJP1, LGALS4, MMRN1, IBSP,
NPHP1)
3) A few genes are brain-specific growth and signal-transduction receptors and
small molecule transporters (RAPSN, APBA2, UBE3A, ALK, KCNB1).
4) A few others are related to immune response (CCL15, CSF2, CD55, IL10).
In a recent study (Bustamante et al, 2005) of positive selection patterns in the
human genome, it has been discovered that many genes involved in cell-cycle and
cell-death regulation appear to have undergone recent positive selection in the
lineage leading to hominid primates.
Thus, it is plausible that phenotypic effects that we classify as neurological
disorders are artifacts of a mosaic of small genetic changes that occurred during
evolutionary optimization of multiple physiological systems involved in the
unusually prolonged individual development of a human brain.
In this study, top disease-predisposing candidate loci are enriched with these cell
death and cell cycle related genes as computed by a test of randomized gene
sampling using gene ontology categories.
However, this significance of enrichment is likely to be a mere reflection of the
overall abundance of cell cycle and cell death related genes in our text-mined
network. (Possible weak point of this method - biasness).
This analysis does not identify as significant any of the previously suggested 16
genes most likely related to schizophrenia susceptibility – consistent with the
hypothesis of genetic heterogeneity, majority of the 16 genes has no or very weak
linkage support in 94 families, while some genes provide good support – but,
unfortunately those genes have few known direct physical interactions and are
either disconnected or poorly connected in molecular-interaction network. (another
weak point of the method – incompleteness of the network).