International Biometric Society
MIXED GRAPHICAL MARKOV MODELS AND HIGHER-ORDER CONDITIONING
FOR INVESTIGATING THE GENETICS OF GENE EXPRESSION
Inma Tur¹, Alberto Roverato² and Robert Castelo¹
¹ Universitat Pompeu Fabra, Barcelona, Spain
² Università di Bologna, Italy
The parallel measurement of the concentration, or expression, of RNA molecules for
thousands of genes enables a large-scale unbiased profiling of a heritable cellular trait that
mediates the genetic basis of complex phenotypes. The resulting data forms a high-dimensional
multivariate sample which, to a large extent, reflects the entire phenotypic state
of cells, tissues and, sometimes, even whole organisms. Unfortunately, expression-profiling
technology also incorporates into these measurements additional sources of non-biological
variability. Next to the heterogeneity produced by these sources of unwanted variation,
indirect effects spread throughout genes as a result of genetic, molecular and environmental
perturbations. From a multivariate perspective, one would like to adjust for the effect of each
of these factors to end up with a network model of direct associations connecting the path
from genotype to phenotype through the intervening genes to study the genetics of gene
expression and higher-level phenotypes. However, the large number p of genes and genetic
loci to analyse as random variables exceeds by far the available number of multivariate
observations n, precluding the direct application of classical multivariate techniques that
start with a saturated model. Moreover, genetic effects emanating from discrete genotypes
may act non-additively through allele dominance and/or mask each other between different
loci, a phenomenon known as epistasis.
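
The obstacle posed by p > n can be illustrated numerically. The following is a minimal sketch on simulated standard normal data, with hypothetical sizes n = 20 and p = 100 that are not taken from the study: the sample covariance matrix has rank at most n - 1, so it cannot be inverted and a saturated Gaussian model cannot be fitted, whereas the covariance of a small subset of k variables (k is also a hypothetical choice) remains of full rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only to illustrate p >> n.
n, p = 20, 100
X = rng.standard_normal((n, p))  # rows: observations, columns: variables

# p x p sample covariance; centring removes one degree of freedom,
# so its rank cannot exceed n - 1.
S = np.cov(X, rowvar=False)
rank_full = np.linalg.matrix_rank(S)

# A low-dimensional marginal (first k variables, k < n) is still full rank.
k = 5
S_sub = S[:k, :k]
rank_sub = np.linalg.matrix_rank(S_sub)

print("rank of full covariance:", rank_full, "out of", p)
print("rank of marginal covariance:", rank_sub, "out of", k)
```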
We use the framework of mixed graphical Markov models (GMMs) and conditional Gaussian
distributions to approach the analysis of the genetics of gene expression, whose primary
purpose is to identify genomic regions responsible for expression variability, also known as
expression quantitative trait loci (eQTL). By simulating models of this type we can learn how
additive genetic effects on mixed genotype-gene interactions (eQTL) propagate through
genes as a function of the magnitude of the correlation in pure continuous gene-gene
associations. Standard linear theory coupled with decomposability in mixed GMMs enables
us to perform an exact likelihood ratio test for the presence of mixed interactions between
genotypes and gene expression profiles. We show that testing these associations exactly is
critical when using higher-order conditional independences because the asymptotic
approximation by which classical deviance tests follow a chi-squared distribution under the null
breaks down with decreasing sample sizes and increasing interaction orders and conditioning
sizes. We exploit the use of mixed GMMs and higher-order conditioning by means of limited-order
correlations and marginal distributions of dimension (q+2) < n that enable the analysis
of this kind of data with p >> n. Applying these procedures to data from an experimental
cross between two strains of yeast allows us to learn that the larger genetic effects, caused
by the engineered deletions in the genome of one of the strains, act on genes with a high
number of gene-gene associations in the resulting estimate of the mixed GMM.
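
The propagation of an additive eQTL effect through gene-gene associations, and its removal by conditioning, can be sketched with a small simulation. The code below is an illustration under stated assumptions and not the procedure used in this work: it simulates a binary genotype g acting additively on gene x1, which in turn drives gene x2, and it substitutes the classical F statistic for nested linear models in place of the exact likelihood ratio test described above. The function name f_stat and the values of n, a and b are hypothetical.

```python
import numpy as np

def f_stat(y, X_null, X_full):
    """F statistic comparing two nested least-squares fits of y."""
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)
    rss0, rss1 = rss(X_null), rss(X_full)
    df_num = X_full.shape[1] - X_null.shape[1]
    df_den = len(y) - X_full.shape[1]
    return ((rss0 - rss1) / df_num) / (rss1 / df_den)

rng = np.random.default_rng(1)

# Hypothetical settings: sample size, additive eQTL effect, gene-gene effect.
n, a, b = 200, 2.0, 0.8

g = rng.integers(0, 2, size=n).astype(float)  # binary genotype
x1 = a * g + rng.standard_normal(n)           # gene directly affected by g
x2 = b * x1 + rng.standard_normal(n)          # gene affected only through x1

ones = np.ones(n)

# Marginal (order 0) test of g on x2: picks up the indirect effect.
f_marginal = f_stat(x2,
                    np.column_stack([ones]),
                    np.column_stack([ones, g]))

# First-order test of g on x2 given x1: the indirect effect is adjusted away.
f_conditional = f_stat(x2,
                       np.column_stack([ones, x1]),
                       np.column_stack([ones, x1, g]))

print("F for g -> x2, marginal:", round(f_marginal, 2))
print("F for g -> x2, given x1:", round(f_conditional, 2))
```

In this sketch each test involves only the genotype, one gene and the conditioning gene, so the fitted marginal has dimension q + 2 with q = 1, which stays far below n regardless of how many other genes are measured.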
International Biometric Conference, Florence, ITALY, 6 – 11 July 2014