* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download PowerPoint
History of genetic engineering wikipedia , lookup
Long non-coding RNA wikipedia , lookup
Genetic engineering wikipedia , lookup
Public health genomics wikipedia , lookup
Copy-number variation wikipedia , lookup
Genomic imprinting wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Genome evolution wikipedia , lookup
Saethre–Chotzen syndrome wikipedia , lookup
Neuronal ceroid lipofuscinosis wikipedia , lookup
Genome (book) wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
The Selfish Gene wikipedia , lookup
Epigenetics of diabetes Type 2 wikipedia , lookup
Helitron (biology) wikipedia , lookup
Gene therapy wikipedia , lookup
Gene desert wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Gene therapy of the human retina wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Gene nomenclature wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Microevolution wikipedia , lookup
Gene expression profiling wikipedia , lookup
Gene expression programming wikipedia , lookup
Parametric Empirical Bayes Methods for Microarrays 3/7/2011 Copyright © 2011 Dan Nettleton 1 Parametric Empirical Bayes Methods for Microarrays • Kendziorski, C. M., Newton, M. A., Lan, H., Gould, M. N. (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine. 22, 3899-3914. • Newton, M. A. and Kendziorski, C. M. (2003). Parametric empirical Bayes methods for microarrays. Chapter 11 of The Analysis of Gene Expression Data. Springer. New York. 2 The Gamma Distribution • X ~ Gamma(α, β) α=1 β=.0001 • E(X)=α / β • Var(X)= α / β2 density βα α-1 -βx • f(x)= x e for x>0. Г(α) α=340 β=.006 α=33 β=.0008 α=2 β=.00008 x 3 A Model for the Data from a Two-Treatment Experiment • Assume there are J genes indexed by j=1, 2, ..., J. • Data for gene j is xj = (xj1, xj2, ..., xjI) where xji is the normalized measure of expression on the original scale for the jth gene and ith experimental unit. • Let s1 denote the subset of the indices {1,...,I} corresponding to treatment 1. • Let s2 denote the subset of the indices {1,...,I} corresponding to treatment 2. 4 The Model (continued) • Assume that each gene is differentially expressed (DE) with an unknown probability p, and equivalently expressed (EE) with probability 1-p. • If gene j is equivalently expressed, then i.i.d. xj1, xj2, ..., xjI | λj ~ Gamma(α , λj) with mean α / λj , where λj ~ Gamma(α0 , ν) 5 The Model (continued) • If gene j is differentially expressed, then i.i.d. {xji : i in s1} | λj1 ~ Gamma(α , λj1) with mean α / λj1 , where λj1 ~ Gamma(α0 , ν), and i.i.d. {xji : i in s2} | λj2 ~ Gamma(α , λj2) with mean α / λj2 , where λj2 ~ Gamma(α0 , ν). • All random variables are assumed to be independent. • p, α, α0 , and ν are unknown parameters to be estimated from the data. 6 An example of how the model is imagined to generate the data for the jth gene. • Suppose p=0.05, α=12, α0=0.9, and v=36. • Generate a Bernoulli random variable with success probability 0.05. If the result is a success the gene is DE, otherwise the gene is EE. • If EE, generate λj from Gamma(α0=0.9, v=36). • Then generate i.i.d. expression values from Gamma(α=12, λj). 7 If gene is EE... Gamma(α=12, λj=0.05) Density density density Gamma(α0=0.9, v=36) Density λj x Expression values for the jth gene. Trt 1 and Trt 2 8 Example Continued • If the gene is DE, generate λj1 and λj2 independently from Gamma(α0=0.9, v=36) . • Then generate treatment 1 expression values i.i.d. from Gamma(α=12, λj1), and • generate treatment 2 expression values i.i.d. from Gamma(α=12, λj2). 9 If gene is DE... Gamma(α0=0.9, v=36) Density density density Gamma(α=12, λj2=0.07) λ Gamma(α=12, λj1=0.02) x 10 If gene is DE... Gamma(α0=0.9, v=36) Density density density Gamma(α=12, λj2=0.07) λ Gamma(α=12, λj1=0.02) x Trt 2 Data Trt 1 Data 11 Joint Density of xj for an EE Gene 12 Joint Density of xj for an EE Gene (continued) fEE(xj) 13 Joint Density for a DE Gene fDE(xj) = where = the number of treatment k observations. 14 Marginal Density for Gene j f(xj) = p fDE(xj) + (1-p) fEE(xj) Marginal Likelihood for the Observed Data f(x1) f(x2) ...f(xJ) Use the EM algorithm to find values of p, α, α0, and v that make the log likelihood as large as possible. 15 The posterior probability of differential expression for gene j is obtained by replacing p, α, α0, and v in p fDE(xj) p fDE(xj) + (1-p) fEE(xj) with their maximum likelihood estimates. Software for EBArrays is available at http://www.biostat.wisc.edu/~kendzior. 16 Extension to Multiple Treatment Groups • If there are 3 treatment groups, each gene can be classified into 5 categories rather than just the two categories EE and DE: a) 1=2=3 d) 1=3≠2 b) 1=2≠3 c) 1≠2=3 e) 1≠2 , 2≠3, 1≠3. • Extensions to more than 3 groups can be handled similarly. 17 Potential Drawbacks • Coefficient of variation is assumed constant across gene-treatment combinations. This is analogous to assuming constant error variance across all gene-treatment combinations in the analysis of log-scale expression data. • Between-gene difference are assumed to have the same distribution as within-gene betweentreatment differences for differentially expressed genes. 18 Coefficient of Variation is Constant across Gene-Treatment Combinations • Coefficient of Variation = CV = sd / mean • Conditional on the mean for a gene-treatment combination, say α / λjk, the CV for the expression data is the CV of Gamma(α, λjk). • CV of Gamma(α, λjk) is (α1/2/λjk)/(α/λjk)=1/α1/2. • Note that α is assumed to be the same for all gene-treatment combinations. 19 d Between-gene diffs = within-gene between-trt diffs • In real data, differences between genes tend to be larger than treatment differences within a DE gene. • Thus, assuming the same distribution for both types of differences leads to conservative inferences. 20