International Biometric Society

MIXED GRAPHICAL MARKOV MODELS AND HIGHER-ORDER CONDITIONING FOR INVESTIGATING THE GENETICS OF GENE EXPRESSION

Inma Tur (1), Alberto Roverato (2) and Robert Castelo (1)
(1) Universitat Pompeu Fabra, Barcelona, Spain
(2) Università di Bologna, Italy

The parallel measurement of the concentration, or expression, of RNA molecules for thousands of genes enables a large-scale unbiased profiling of a heritable cellular trait that mediates the genetic basis of complex phenotypes. The resulting data form a high-dimensional multivariate sample which, to a large extent, reflects the entire phenotypic state of cells, tissues and, sometimes, even whole organisms. Unfortunately, expression-profiling technology also introduces additional sources of non-biological variability into these measurements. Besides the heterogeneity produced by these sources of unwanted variation, indirect effects spread throughout genes as a result of genetic, molecular and environmental perturbations. From a multivariate perspective, one would like to adjust for the effect of each of these factors to obtain a network model of direct associations that traces the path from genotype to phenotype through the intervening genes, and to use it to study the genetics of gene expression and higher-level phenotypes. However, the number p of genes and genetic loci to analyse as random variables far exceeds the available number n of multivariate observations, precluding the direct application of classical multivariate techniques that start with a saturated model. Moreover, genetic effects emanating from discrete genotypes may act non-additively through allele dominance and/or mask each other between different loci, a phenomenon known as epistasis.
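The obstacle posed by p exceeding n can be seen numerically. The following is a minimal sketch on simulated Gaussian data, not on expression data: the sizes `n` and `p`, the random seed and the chosen subset `idx` are all assumptions made for illustration. It shows that the full sample covariance matrix is rank-deficient, so it cannot be inverted to fit a saturated Gaussian model, while the covariance of a small marginal stays full rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only to illustrate p >> n.
n, p = 20, 100
X = rng.standard_normal((n, p))  # n observations of p variables

# Sample covariance of all p variables; its rank cannot exceed n - 1,
# so it is singular whenever p >= n.
S = np.cov(X, rowvar=False)
rank_full = np.linalg.matrix_rank(S)

# Covariance of a small marginal (5 variables, fewer than n):
# this submatrix is generically full rank and therefore invertible.
idx = [0, 1, 2, 3, 4]
S_marg = S[np.ix_(idx, idx)]
rank_marg = np.linalg.matrix_rank(S_marg)

print(f"full covariance: rank {rank_full} of {p}")
print(f"marginal covariance: rank {rank_marg} of {len(idx)}")
```

Any method that needs the inverse of the full covariance matrix fails in this setting, whereas computations restricted to low-dimensional marginals remain well defined.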
We use the framework of mixed graphical Markov models (GMMs) and conditional Gaussian distributions to approach the analysis of the genetics of gene expression, whose primary purpose is to identify genomic regions responsible for expression variability, also known as expression quantitative trait loci (eQTL). By simulating this type of model we can learn how additive genetic effects on mixed genotype-gene interactions (eQTL) propagate through genes as a function of the magnitude of the correlation in purely continuous gene-gene associations. Standard linear theory coupled with decomposability in mixed GMMs enables an exact likelihood ratio test for the presence of mixed interactions between genotypes and gene expression profiles. We show that testing these associations exactly is critical when using higher-order conditional independences, because the asymptotic chi-squared distribution of classical deviance tests under the null breaks down with decreasing sample sizes and with increasing interaction orders and conditioning sizes. We exploit the use of mixed GMMs and higher-order conditioning by means of limited-order correlations and marginal distributions of dimension (q+2) < n, which enable the analysis of this kind of data with p >> n. Applying these procedures to data from an experimental cross between two strains of yeast, we learn that the larger genetic effects, caused by the engineered deletions in the genome of one of the strains, act on genes with a high number of gene-gene associations in the resulting estimate of the mixed GMM.

International Biometric Conference, Florence, ITALY, 6 – 11 July 2014
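The abstract's point about exact versus asymptotic testing can be illustrated with a small simulation. This is a simplified sketch, not the authors' implementation: it reduces the test of a genotype-gene interaction to comparing two nested linear models for one gene, with and without the genotype, given q conditioning genes. Under the null, the ratio of residual sums of squares follows a Beta((n-q-k)/2, (k-1)/2) distribution, a standard linear-model result, whereas the deviance is only asymptotically chi-squared. The sizes `n`, `q`, `k`, the number of simulations and the helper `rss` are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical settings: few samples, a large conditioning set.
n, q, k = 20, 10, 2          # samples, conditioning genes, genotype levels
n_sim, alpha = 2000, 0.05

def rss(y, design):
    """Residual sum of squares of the least-squares fit of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

reject_exact = 0
reject_chi2 = 0
for _ in range(n_sim):
    genotype = rng.permutation(np.arange(n) % k)  # balanced discrete genotype
    Q = rng.standard_normal((n, q))               # conditioning gene profiles
    y = rng.standard_normal(n)                    # null: no genotype effect on y

    ones = np.ones((n, 1))
    dummies = np.eye(k)[genotype][:, 1:]          # k - 1 indicator columns
    X0 = np.hstack([ones, Q])                     # model without the genotype
    X1 = np.hstack([ones, Q, dummies])            # model with the genotype

    lam = rss(y, X1) / rss(y, X0)                 # likelihood ratio on the RSS scale

    # Exact null distribution: small values of lam are evidence against the null.
    p_exact = stats.beta.cdf(lam, (n - q - k) / 2, (k - 1) / 2)
    # Asymptotic approximation: deviance -n*log(lam) compared with chi2(k - 1).
    p_chi2 = stats.chi2.sf(-n * np.log(lam), k - 1)

    reject_exact += p_exact < alpha
    reject_chi2 += p_chi2 < alpha

rate_exact = reject_exact / n_sim
rate_chi2 = reject_chi2 / n_sim
print(f"type I error, exact test:      {rate_exact:.3f}")
print(f"type I error, chi-squared test: {rate_chi2:.3f}")
```

With a small n and a conditioning set of this size, the exact test should hold its nominal level while the chi-squared approximation rejects noticeably more often than the nominal 5%, which is the behaviour the abstract describes.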