An introduction to Genetical Genomics and Systems
Inference of Regulatory Networks via Systems Genetics
Ina Hoeschele

Systems Genetics
Systems genetics links systems biology, which aims to infer the cell's regulatory structure, with complex trait biology, which aims to infer the molecular basis of phenotypes and diseases.

Systems genetics studies measure DNA sequence polymorphisms (e.g., SNPs) covering the entire genome on a group of related individuals (from fewer than 100 to 2000 or more). Each polymorphism has several genotypes (e.g., two, coded 0/1). Such genetically randomized populations provide multi-factorial perturbations of the system. Molecular and organismal variables are measured on the same individuals, for example:
- expression profiling (etraits);
- expression profiling and disease phenotypes;
- expression profiling, methylation profiling, and disease;
- metabolite and protein profiling …

The genotypes at some polymorphisms directly influence the expression of certain genes:
- in cis: polymorphism A in the promoter region of gene A influences the transcript abundance of gene A;
- in trans: polymorphism A in the coding region of gene A influences the function of protein A. If gene A is a regulator of gene B, then both polymorphism A and gene A influence the expression of gene B.

The genes' expression profiles (etraits) therefore have both polymorphism regulators and gene (etrait) regulators. This raises several challenges:
- there is a very large number of targets (regulated genes, etc.);
- there is a very large number of potential regulators for each target;
- the sample size (n) is much smaller than the number of potential regulators (p);
- targets are co-regulated;
- regulators are correlated;
- regulatory networks are cyclic.
Analyses of regulatory programs should account for all of the above.

One target – one regulator approach
Fit $Y_T = \mu + b P_R + e$ separately for each target T and each regulator R (except in cis analysis). This approach has low power.
For trans effects, fitting $Y_T = \mu + b_1 Y_R + b_2 P_R + e$ (optionally plus a cis polymorphism term) gives better power, but it still does not account for co-regulation of multiple targets. Here $Y_T$ is the expression of target T, $P_R$ is the genotype at a candidate regulator polymorphism, and $Y_R$ is the expression of a candidate regulator gene.

One target – all regulators approach
Fit $Y_T = \mu + \sum_R b_{1R} P_R \; \left(+ \sum_R b_{2R} Y_R\right) + e$ for each target T. This still does not account for co-regulation. Standard variable selection and regularization methods tend not to perform well here, because n << p and the regulators are correlated. Interactions among loci may also need to be considered; these are often ignored or limited to two-way interactions.
Penalization/regularization methods:
- constrained OLS, with bounds on the $L_t$ norm(s) of the coefficients (t = 1, 2, …);
- elastic net variable selection (Zou and Hastie 2005), an extension of the lasso (a compromise with ridge regression) suited to n << p and to joint selection of correlated predictors.
Bayesian variable selection:
- priors on b;
- fitting by MCMC? By deterministic (e.g., variational) methods? (open questions)

Clustering of targets
Analyze the targets in a cluster jointly:
- a single-regulator model with multivariate analysis is costly;
- alternatively, perform PCA within clusters and analyze the PCs separately;
- or analyze the cluster with an all-regulators model (an individual model for each Y, but with joint variable selection).
Geronemo iteratively performs clustering and selection of cluster (module) regulators using regression trees (Lee et al. 2006).

Biclustering, two-group association
Find groups of targets regulated by groups of polymorphisms.
- Biclustering based on the matrix of associations between targets and polymorphisms is efficient, but are the results meaningful?
- Various approaches exist for two-group association: penalized canonical correlation analysis (CCA), which represents CCA in a regression framework, and Bayesian CCA, which has a probabilistic interpretation as a joint latent factor model for both groups of variables and can be fitted by MCMC (with convergence issues, as in factor analysis) or by deterministic (variational) methods.

Two-step regulatory network inference
1a) Construct an undirected dependency graph (UDG) using target data (e.g., expression) only.
1b) Determine which polymorphisms affect which targets, and use this information to direct the edges (e.g., Neto et al. 2008).
2a) Perform cis and trans polymorphism analyses and combine them into an encompassing network (Liu et al. 2008).
2b) Sparsify the network using structural equation modeling (SEM). SEM extends linear regression (LR) by allowing a variable to be both a response and a predictor; for cyclic networks, the SEM and LR likelihoods are not the same.

Toward one-step regulatory network inference
Geronemo (etraits; a small list of about 300 candidate regulator genes).
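The one target – one regulator approach fits $Y_T = \mu + b P_R + e$ separately for every target/polymorphism pair. The sketch below shows that scan in plain Python. The function names (`ols_single`, `marginal_scan`), the use of a per-pair t-statistic as the test, and the simulated data (a hypothetical polymorphism P1 acting on target A, with target B unaffected) are illustrative assumptions, not the author's pipeline.

```python
import math
import random


def ols_single(y, x):
    """Fit y = mu + b*x + e by ordinary least squares.

    Returns (mu, b, t), where t is the t-statistic for the slope b.
    """
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    mu = y_bar - b * x_bar
    rss = sum((yi - mu - b * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)           # residual variance on n - 2 degrees of freedom
    se_b = math.sqrt(sigma2 / sxx)   # standard error of the slope
    return mu, b, b / se_b


def marginal_scan(targets, polymorphisms):
    """Test every (target T, polymorphism R) pair with its own model.

    targets: dict name -> expression values Y_T
    polymorphisms: dict name -> 0/1 genotypes P_R
    Returns dict (T, R) -> t-statistic for b in Y_T = mu + b * P_R + e.
    """
    return {
        (t_name, r_name): ols_single(y, p)[2]
        for t_name, y in targets.items()
        for r_name, p in polymorphisms.items()
    }


# Hypothetical simulated data: P1 acts on target A; target B has no genetic effect.
rng = random.Random(1)
n = 200
P = {"P1": [rng.randint(0, 1) for _ in range(n)],
     "P2": [rng.randint(0, 1) for _ in range(n)]}
Y = {"A": [1.5 * g + rng.gauss(0, 1) for g in P["P1"]],
     "B": [rng.gauss(0, 1) for _ in range(n)]}
stats = marginal_scan(Y, P)
for pair, t in sorted(stats.items()):
    print(pair, round(t, 2))
```

The number of fitted models is (number of targets) x (number of polymorphisms), and each is fitted in isolation, so no information is shared across co-regulated targets.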
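For the one target – all regulators model, the elastic net (Zou and Hastie 2005) combines lasso (L1) and ridge (L2) penalties. Below is a minimal sketch that fits one target by cyclic coordinate descent. It assumes the glmnet-style parameterization of the penalty, which corresponds to what Zou and Hastie call the naive elastic net (no (1 + lambda2) rescaling). The values of `lam` and `alpha`, the helper names, and the simulated data are hypothetical.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used by lasso-type updates."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardize(col):
    """Center a column and scale it so that (1/n) * sum(x**2) == 1."""
    n = len(col)
    m = sum(col) / n
    sd = (sum((v - m) ** 2 for v in col) / n) ** 0.5
    return [(v - m) / sd for v in col]


def elastic_net(X_cols, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for the objective
        (1/(2n)) ||y - X b||^2 + lam * (alpha ||b||_1 + (1 - alpha)/2 ||b||_2^2)
    with standardized predictors and centered y (so no intercept is needed).
    Coefficients are returned on the standardized scale.
    """
    n = len(y)
    X = [standardize(c) for c in X_cols]
    y_bar = sum(y) / n
    r = [v - y_bar for v in y]   # residuals for b = 0
    b = [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # (1/n) x_j'(r + x_j b_j) simplifies to (1/n) x_j'r + b_j after standardizing
            rho = sum(xi * ri for xi, ri in zip(xj, r)) / n + b[j]
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            if new != b[j]:
                delta = new - b[j]
                r = [ri - delta * xi for ri, xi in zip(r, xj)]  # keep residuals in sync
                b[j] = new
    return b


# Hypothetical data: 20 candidate regulator polymorphisms, only the first two act.
rng = random.Random(7)
n, p = 100, 20
X = [[rng.randint(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * X[0][i] + 2.0 * X[1][i] + rng.gauss(0, 1) for i in range(n)]
coef = elastic_net(X, y, lam=0.2, alpha=0.5)
print([round(c, 2) for c in coef])
```

In this parameterization alpha = 1 gives the lasso and alpha = 0 gives ridge regression; in practice lam would be chosen over a grid, for example by cross-validation.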
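One option listed for clustered targets is to run PCA within a cluster and analyze the PCs separately. The sketch below computes the first PC of a cluster by power iteration on the k x k sample covariance matrix. The three simulated targets sharing a hypothetical latent regulator are assumptions for illustration.

```python
import math
import random


def first_pc(cluster, n_iter=500):
    """First principal component of a cluster of co-expressed targets.

    cluster: list of k expression profiles, each a list of n values.
    Returns (loadings, scores): the leading eigenvector of the k x k sample
    covariance matrix and the per-individual PC scores.
    """
    k, n = len(cluster), len(cluster[0])
    centered = []
    for g in cluster:
        m = sum(g) / n
        centered.append([v - m for v in g])
    cov = [[sum(a * b for a, b in zip(gi, gj)) / (n - 1) for gj in centered]
           for gi in centered]
    v = [1.0] * k                       # starting vector for power iteration
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(v[i] * centered[i][s] for i in range(k)) for s in range(n)]
    return v, scores


# Hypothetical data: a polymorphism drives a latent regulator shared by three targets.
rng = random.Random(3)
n = 1000
P = [rng.randint(0, 1) for _ in range(n)]
Z = [g + rng.gauss(0, 0.5) for g in P]
cluster = [[z + rng.gauss(0, 0.5) for z in Z] for _ in range(3)]
loadings, scores = first_pc(cluster)
print([round(x, 3) for x in loadings])
```

The PC scores can then be treated as a single response, for example in the one target – one regulator scan, in place of the individual targets.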
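Steps 1a and 1b use polymorphisms as causal anchors for edges found in expression data alone. The sketch below is a deliberately simplified stand-in, not the likelihood-based procedure of Neto et al. (2008). Step 1a is approximated by thresholding absolute correlations. An edge a - b is oriented a -> b when a's cis polymorphism is associated with b but the association disappears after conditioning on a (first-order partial correlation). The thresholds, function names and simulated data are hypothetical.

```python
import math
import random


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def undirected_graph(expr, threshold=0.3):
    """Simplified step 1a: link targets whose |correlation| exceeds a threshold."""
    names = sorted(expr)
    return {(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(corr(expr[a], expr[b])) > threshold}


def orient_edges(edges, expr, cis, min_assoc=0.2, max_partial=0.1):
    """Simplified step 1b: orient a - b as a -> b if a's cis polymorphism is
    associated with b but not once a is conditioned on."""
    directed = set()
    for a, b in edges:
        for src, dst in ((a, b), (b, a)):
            p = cis.get(src)
            if p is None:
                continue
            if (abs(corr(p, expr[dst])) > min_assoc
                    and abs(partial_corr(p, expr[dst], expr[src])) < max_partial):
                directed.add((src, dst))
    return directed


# Hypothetical data: each gene has a cis polymorphism; A regulates B; C is unlinked.
rng = random.Random(11)
n = 2000
cis = {g: [rng.randint(0, 1) for _ in range(n)] for g in "ABC"}
A = [2 * p + rng.gauss(0, 1) for p in cis["A"]]
B = [a + 2 * p + rng.gauss(0, 1) for a, p in zip(A, cis["B"])]
C = [2 * p + rng.gauss(0, 1) for p in cis["C"]]
expr = {"A": A, "B": B, "C": C}
edges = undirected_graph(expr)
directed = orient_edges(edges, expr, cis)
print(edges, directed)
```

B's own cis polymorphism is not associated with A, so the reverse orientation B -> A is not proposed.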
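Why the SEM and LR likelihoods differ for cyclic networks (step 2b) can be seen in a short derivation. It assumes a Gaussian linear SEM with diagonal error covariance $\Psi$, $I - B$ invertible, and zero diagonal in $B$; these modeling choices are assumptions, since the slides do not specify the model. Here $\mathbf{y}_i$ are the expression values and $\mathbf{p}_i$ the genotypes of individual $i$, and $\ell_{\mathrm{LR}}$ is the sum of the per-equation Gaussian regression log-likelihoods.

```latex
\begin{aligned}
\mathbf{y}_i &= B\,\mathbf{y}_i + \Gamma\,\mathbf{p}_i + \mathbf{e}_i,
  \qquad \mathbf{e}_i \sim N(\mathbf{0}, \Psi) \\
(I - B)\,\mathbf{y}_i &= \Gamma\,\mathbf{p}_i + \mathbf{e}_i
  \;\Rightarrow\; \mathbf{y}_i = (I - B)^{-1}\left(\Gamma\,\mathbf{p}_i + \mathbf{e}_i\right) \\
\ell_{\mathrm{SEM}}(B, \Gamma, \Psi) &= n \log\lvert\det(I - B)\rvert
  - \frac{n}{2}\log\det\Psi
  - \frac{1}{2}\sum_{i=1}^{n} \mathbf{r}_i^{\top}\Psi^{-1}\mathbf{r}_i + \text{const},
  \qquad \mathbf{r}_i = (I - B)\,\mathbf{y}_i - \Gamma\,\mathbf{p}_i \\
\ell_{\mathrm{LR}}(B, \Gamma, \Psi) &= \ell_{\mathrm{SEM}}(B, \Gamma, \Psi) - n \log\lvert\det(I - B)\rvert
\end{aligned}
```

The difference is the Jacobian term $n \log\lvert\det(I - B)\rvert$. In an acyclic network, $B$ can be permuted to strictly triangular form, so $\det(I - B) = 1$ and the two likelihoods coincide. In a two-gene cycle with $B = \begin{pmatrix} 0 & b_{12} \\ b_{21} & 0 \end{pmatrix}$, $\det(I - B) = 1 - b_{12} b_{21}$, which differs from 1 whenever both $b_{12}$ and $b_{21}$ are nonzero.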