Statistical Analysis of Gene Expression Data

Estimating parameters in a statistical model
• Likelihood and Maximum likelihood estimation
• Bayesian point estimates
• Maximum a posteriori point estimation
• Empirical Bayes estimation
Random Sample
• Random sample: a set of observations generated independently by the statistical model.
• For example, n replicated measurements of the differences in expression levels for a gene under two different treatments: $x_1, x_2, \ldots, x_n \sim \text{iid } N(\mu, \sigma^2)$
• Given the parameters, the statistical model defines the probability of observing any particular combination of values in this sample.
• Since the observations are independent, the probability distribution function describing the probability of observing a particular combination is the product of the individual probability distribution functions.
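The product form of the joint pdf is easy to check numerically. The sketch below is an illustration added here, not part of the original slides; the sample values and the parameters μ and σ are made up. It evaluates the joint density of a small iid normal sample as a product of univariate normal densities.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical replicated log-ratio measurements for one gene (made-up numbers)
x = np.array([0.8, 1.1, 0.5, 0.9])
mu, sigma = 1.0, 0.4          # assumed model parameters

# Joint pdf of the iid sample = product of the individual normal pdfs
joint_pdf = np.prod(norm.pdf(x, loc=mu, scale=sigma))

# Equivalent (and numerically safer) computation on the log scale
log_joint = np.sum(norm.logpdf(x, loc=mu, scale=sigma))

print(joint_pdf, np.exp(log_joint))   # the two agree
```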
Probability distribution function vs probability
• In the case of discrete random variables, which have a countable number of potential values (assume finitely many for now), the probability distribution function is equal to the probability of each value (outcome).
• In the case of a continuous random variable, which can yield any number in a particular interval, the probability distribution function is different from the probability.
• The probability of any particular number for a continuous random variable is equal to zero.
• The probability density function defines the probability of the number falling into any particular sub-interval as the area under the curve defined by the probability density function.
Probability distribution function vs probability
• Example: the assumption of our Normal model is that the outcome can be pretty much any real number. This is obviously a wrong assumption, but it turns out that this model is a good approximation of reality.
• We could "discretize" this random variable. Define the r.v. y = {1 if |x| > c and 0 otherwise} for some constant c.
• This random variable can assume 2 different values, and its probability distribution function is defined by p(y = 1).
• Although the probability distribution function in the case of a continuous random variable does not give probabilities, it satisfies the key properties of a probability.
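A small simulation makes the discretization concrete (illustrative only; the choice of c and of a standard normal x is arbitrary): p(y = 1) is just the area under the normal density outside [-c, c].

```python
import numpy as np
from scipy.stats import norm

c = 1.5
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)   # continuous r.v.
y = (np.abs(x) > c).astype(int)                    # discretized r.v.

# Empirical estimate of p(y = 1) vs. the area under the density beyond +/- c
print(y.mean(), 2 * norm.sf(c))
```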
Back to basics – Probability, Conditional Probability and Independence
• Discrete pdf p(y):
1) $p(y = i) \ge 0$
2) $p(y = i) \le 1$
3) $\sum_i p(y = i) = 1$
4) $p(y_1 \mid y_2) = \dfrac{p(y_1, y_2)}{p(y_2)} = \dfrac{p(y_2 \mid y_1)\, p(y_1)}{p(y_2)}$
5) For $y_1, \ldots, y_n$ iid from p(y): $p(y_1, \ldots, y_n) = p(y_1) \cdots p(y_n)$
• Continuous pdf f(x):
1) $f(x) \ge 0$
2) $\int_a^b f(x)\,dx \le 1$
3) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
4) $f(x_1 \mid x_2) = \dfrac{f(x_1, x_2)}{f(x_2)} = \dfrac{f(x_2 \mid x_1)\, f(x_1)}{f(x_2)}$
5) For $x_1, \ldots, x_n$ iid from f(x): $f(x_1, \ldots, x_n) = f(x_1) \cdots f(x_n)$
[Figure: normal density over the log-ratio (LR) axis; the shaded area between a and b is $\int_a^b f(x)\,dx$]
• From now on, we will talk in terms of just a pdf, and things will hold for both discrete and continuous random variables.
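The properties above can be verified numerically. The sketch below is illustrative only; the chosen density and the toy discrete joint distribution are not from the slides. It checks that a continuous pdf integrates to 1 and that the conditional-probability identity 4) holds for a small discrete example.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Property 3) for a continuous pdf: the density integrates to 1
total, _ = quad(lambda x: norm.pdf(x, loc=0.0, scale=1.0), -np.inf, np.inf)
print(total)            # ~1.0

# Property 4) for a discrete pdf: p(y1|y2) = p(y1,y2)/p(y2) = p(y2|y1)p(y1)/p(y2)
p_joint = np.array([[0.10, 0.20],     # rows: y1 = 0, 1; columns: y2 = 0, 1
                    [0.30, 0.40]])
p_y1 = p_joint.sum(axis=1)
p_y2 = p_joint.sum(axis=0)
lhs = p_joint[1, 0] / p_y2[0]                        # p(y1=1 | y2=0)
rhs = (p_joint[1, 0] / p_y1[1]) * p_y1[1] / p_y2[0]  # p(y2=0|y1=1) p(y1=1) / p(y2=0)
print(lhs, rhs)         # identical
```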
Expectation, Expected value and Variance
• Discrete pdf p(y):
Expectation of any function g of the random variable y (the average value of the function after a large number of experiments): $E[g(y)] = \sum_i g(i)\, p(y = i)$
Expected value - the average y after a very large number of experiments: $E[y] = \sum_i i\, p(y = i)$
Variance - the expected value of $(y - E[y])^2$: $E[(y - E[y])^2] = \sum_i (i - E[y])^2\, p(y = i)$
• Continuous pdf f(x):
Expectation of any function g of the random variable x: $E[g(x)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$
Expected value - the average x after a very large number of experiments: $E[x] = \int_{-\infty}^{\infty} x\, f(x)\,dx$
Variance - the expected value of $(x - E[x])^2$: $E[(x - E[x])^2] = \int_{-\infty}^{\infty} (x - E[x])^2 f(x)\,dx$
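The "average after a large number of experiments" reading of the expectation can be illustrated by simulation (a sketch with arbitrary parameters and an arbitrary function g, not part of the slides): the sample mean of g(x) over many draws approaches E[g(x)] computed by integration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 1.0, 0.5
g = lambda x: x**2                      # any function of the random variable

# E[g(x)] by integrating g(x) f(x) over the real line
expectation, _ = quad(lambda x: g(x) * norm.pdf(x, mu, sigma), -np.inf, np.inf)

# The same expectation as a long-run average over simulated experiments
rng = np.random.default_rng(1)
draws = rng.normal(mu, sigma, size=200_000)
print(expectation, g(draws).mean())     # both close to mu^2 + sigma^2 = 1.25
```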
Expected Value and Variance of a Normal Random Variable
• Normal pdf: $f_N(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• Expected value - the average x after a very large number of experiments: $E[x] = \int_{-\infty}^{\infty} x\, f_N(x \mid \mu, \sigma^2)\,dx = \mu$
• Variance - the expected value of $(x - E[x])^2$: $E[(x - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f_N(x \mid \mu, \sigma^2)\,dx = \sigma^2$
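A quick numerical check (illustrative only; μ and σ are arbitrary choices) that the two integrals above really return μ and σ² for the normal pdf:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 2.0, 1.5
pdf = lambda x: norm.pdf(x, loc=mu, scale=sigma)

mean, _ = quad(lambda x: x * pdf(x), -np.inf, np.inf)
var, _ = quad(lambda x: (x - mu)**2 * pdf(x), -np.inf, np.inf)
print(mean, var)   # ~2.0 and ~2.25 = sigma^2
```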
Maximum Likelihood
• $x_1, x_2, \ldots, x_n \sim \text{iid } N(\mu, \sigma^2)$, with $f_N(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• Joint pdf for the whole random sample:
$f(x_1, x_2, \ldots, x_n \mid \mu, \sigma^2) = f(x_1 \mid \mu, \sigma^2)\, f(x_2 \mid \mu, \sigma^2) \cdots f(x_n \mid \mu, \sigma^2)$
• The likelihood function is basically the joint pdf evaluated at the fixed sample:
$l(\mu, \sigma \mid x_1, x_2, \ldots, x_n) = f(x_1 \mid \mu, \sigma)\, f(x_2 \mid \mu, \sigma) \cdots f(x_n \mid \mu, \sigma)$
• The maximum likelihood estimates of the model parameters $\mu$ and $\sigma^2$ are the numbers that maximize the joint pdf for the fixed sample, i.e. the likelihood function:
$\hat{\mu} = \dfrac{\sum_i x_i}{n}, \qquad \hat{\sigma}^2 = \dfrac{\sum_i (x_i - \hat{\mu})^2}{n}$
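A minimal sketch of the maximum likelihood estimates for the normal model (the sample values are made up), both from the closed-form formulas above and by numerically maximizing the log-likelihood; note that the MLE of σ² divides by n, not n - 1.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.array([0.8, 1.1, 0.5, 0.9, 1.3])    # hypothetical expression log-ratios

# Closed-form MLEs
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat)**2)       # divides by n, not n-1

# The same estimates by numerically maximizing the log-likelihood
def neg_log_lik(theta):
    mu, log_sigma = theta
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_log_lik, x0=[0.0, 0.0])
print(mu_hat, sigma2_hat, res.x[0], np.exp(res.x[1])**2)
```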
Bayesian Inference
• Assumes parameters are random variables - key difference
• Inference based on the posterior distribution of the parameter given the data
• Prior distribution: defines prior knowledge or ignorance about the parameter
• Posterior distribution: prior belief modified by the data
Prior: $p(\mu)$
Likelihood: $l(x_1, \ldots, x_n \mid \mu)$
Posterior: $f(\mu \mid x_1, \ldots, x_n) = \dfrac{l(x_1, \ldots, x_n \mid \mu)\, p(\mu)}{D(x_1, \ldots, x_n)}$
Bayesian Inference
Prior distribution of μ:
Prior: $\mu \mid \mu_0, \tau^2 \sim N(\mu_0, \tau^2)$
[Figure: prior density of μ plotted over the log-ratio axis]
Data model given μ:
Likelihood: $x \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$
[Figure: data density plotted over the log-ratio axis]
Posterior distribution of μ given the data (Bayes theorem):
Posterior: $\mu \mid \mu_0, \tau^2, x_1, \ldots, x_n \sim N\!\left( \dfrac{\frac{\mu_0}{\tau^2} + \frac{n\bar{x}}{\sigma^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}},\ \dfrac{1}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}} \right)$
[Figure: posterior density over the LogRatio axis; the shaded area gives P(μ > 0 | data)]
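The conjugate normal-normal posterior above is simple enough to compute directly. The sketch below is illustrative: the prior settings μ₀ and τ² and the data are invented, σ² is treated as known, and the μ₀/τ² notation follows the reconstruction above. It computes the posterior mean and variance and the tail probability P(μ > 0 | data).

```python
import numpy as np
from scipy.stats import norm

x = np.array([0.6, 1.0, 0.4, 0.8])     # hypothetical log-ratios for one gene
sigma2 = 0.25                           # assumed known data variance
mu0, tau2 = 0.0, 1.0                    # prior mean and variance for mu

n = len(x)
post_var = 1.0 / (1.0 / tau2 + n / sigma2)
post_mean = post_var * (mu0 / tau2 + n * x.mean() / sigma2)

# Probability that the gene is up-regulated, P(mu > 0 | data)
p_positive = norm.sf(0.0, loc=post_mean, scale=np.sqrt(post_var))
print(post_mean, post_var, p_positive)
```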
Bayesian Estimation
• The Bayesian point estimate is the expected value of the parameter under its posterior distribution given the data:
Posterior: $\mu \mid \mu_0, \tau^2, x_1, \ldots, x_n \sim N\!\left( \dfrac{\frac{\mu_0}{\tau^2} + \frac{n\bar{x}}{\sigma^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}},\ \dfrac{1}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}} \right)$
$E[\mu \mid \mu_0, \tau^2, x_1, \ldots, x_n] = \dfrac{\frac{\mu_0}{\tau^2} + \frac{n\bar{x}}{\sigma^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}}$
• In some cases the expectation of the posterior distribution can be difficult to assess; it is easier to find the value of the parameter that maximizes the posterior distribution given the data - the Maximum a Posteriori (MAP) estimate.
• Since the denominator of the posterior distribution in the Bayes theorem is constant in the parameter, this is equivalent to maximizing the product of the likelihood and the prior pdf:
Posterior: $f(\mu \mid x_1, \ldots, x_n) = \dfrac{l(x_1, \ldots, x_n \mid \mu)\, p(\mu)}{D(x_1, \ldots, x_n)}$
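For this normal-normal model the posterior is symmetric, so the MAP estimate coincides with the posterior mean. The sketch below (same invented data and prior settings as the previous example) finds the MAP by numerically maximizing log-likelihood + log-prior and compares it to the closed-form posterior mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

x = np.array([0.6, 1.0, 0.4, 0.8])
sigma2, mu0, tau2 = 0.25, 0.0, 1.0
n = len(x)

# Unnormalized log-posterior: log-likelihood + log-prior (the denominator is constant in mu)
def neg_log_post(mu):
    return -(np.sum(norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)))
             + norm.logpdf(mu, loc=mu0, scale=np.sqrt(tau2)))

map_est = minimize_scalar(neg_log_post).x
post_mean = (mu0 / tau2 + n * x.mean() / sigma2) / (1.0 / tau2 + n / sigma2)
print(map_est, post_mean)   # essentially identical
```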
Alternative prior for the normal model
• Degenerate uniform prior for μ, assuming that any prior value is equally likely - this is clearly unrealistic - we know more than that
Prior: $p(\mu) \propto 1$
Posterior: $f(\mu \mid x_1, \ldots, x_n) = \dfrac{l(x_1, \ldots, x_n \mid \mu)\, p(\mu)}{P(x_1, \ldots, x_n)} = \text{const} \cdot l(x_1, \ldots, x_n \mid \mu)$
• The MAP estimate for μ is identical to the maximum likelihood estimate
• Bayesian point estimation and maximum likelihood are very closely related
Hierarchical Bayesian Models and Empirical Bayes Inference
• $x_i \sim \text{ind } N(\mu_i, \sigma^2)$, $i = 1, \ldots, n$; assume that the variance is known
• Need to estimate $\mu_i$, $i = 1, \ldots, n$
• The simplest estimate is $\hat{\mu}_i = x_i$
• Assuming that $\mu_i \sim \text{iid } N(\mu_0, \tau^2)$, $i = 1, \ldots, n$:
Posterior: $\mu_i \mid \mu_0, \tau^2, x_1, \ldots, x_n \sim N\!\left( \dfrac{\frac{\mu_0}{\tau^2} + \frac{x_i}{\sigma^2}}{\frac{1}{\tau^2} + \frac{1}{\sigma^2}},\ \dfrac{1}{\frac{1}{\tau^2} + \frac{1}{\sigma^2}} \right)$
$E[\mu_i \mid \mu_0, \tau^2, x_1, \ldots, x_n] = \dfrac{\frac{\mu_0}{\tau^2} + \frac{x_i}{\sigma^2}}{\frac{1}{\tau^2} + \frac{1}{\sigma^2}}$
• If we are not happy with pre-specifying $\mu_0$ and $\tau^2$, we can estimate them based on the "marginal" distribution of the data given $\mu_0$ and $\tau^2$ and plug them back into the formula for the Bayesian estimate - the result is the Empirical Bayes estimate
Hierarchical Bayesian Models and Empirical Bayes Inference
• If $x_i \sim \text{ind } N(\mu_i, \sigma^2)$ and $\mu_i \sim \text{iid } N(\mu_0, \tau^2)$, $i = 1, \ldots, n$,
• the "marginal" distribution of each $x_i$, with the $\mu_i$'s "factored out", is $N(\mu_0, \sigma^2 + \tau^2)$, $i = 1, \ldots, n$
• Now we can estimate $\hat{\mu}_0$ and $\hat{\tau}^2$ using, say, maximum likelihood and plug them back into the formula for the Bayesian estimates of the $\mu_i$'s:
Empirical Bayes estimate of $\mu_i$: $\ \dfrac{\frac{\hat{\mu}_0}{\hat{\tau}^2} + \frac{x_i}{\sigma^2}}{\frac{1}{\hat{\tau}^2} + \frac{1}{\sigma^2}}$
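A sketch of the plug-in empirical Bayes procedure described above (illustrative; the simulated gene-specific means and the known σ² are assumptions made for the example, and simple moment estimates stand in for maximum likelihood): the hyperparameters μ₀ and τ² are estimated from the marginal N(μ₀, σ² + τ²) of the xᵢ and substituted into the shrinkage formula.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 0.25                                   # assumed known measurement variance
true_mu = rng.normal(0.0, 1.0, size=200)        # simulated gene-specific means
x = rng.normal(true_mu, np.sqrt(sigma2))        # one observation per gene

# Estimate the hyperparameters from the marginal distribution N(mu0, sigma2 + tau2)
mu0_hat = x.mean()
tau2_hat = max(x.var() - sigma2, 1e-8)          # marginal variance minus sigma2

# Plug them into the Bayesian estimate: shrink each x_i towards mu0_hat
eb = (mu0_hat / tau2_hat + x / sigma2) / (1.0 / tau2_hat + 1.0 / sigma2)

# Shrunken estimates have smaller overall error than the raw x_i (Stein effect)
print(np.mean((x - true_mu)**2), np.mean((eb - true_mu)**2))
```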
Hierarchical Bayesian Models and Empirical Bayes Inference
• The estimates for the individual means are "shrunk" towards the mean of all the means
• It turns out that such estimates are better overall than estimates based on the individual observations alone (the "Stein effect")
• Individual observations in our model can be replaced with groups of observations: $x_{1i}, x_{2i}, \ldots, x_{ki} \sim \text{ind } N(\mu_i, \sigma^2)$
• Limma does a similar thing, only with the variances
• The data for each gene i are assumed to be distributed as $x_{1i}, x_{2i}, \ldots, x_{ki} \sim \text{iid } N(\mu_i, \sigma_i^2)$; the means are estimated in the usual way, while an additional hierarchy is placed on the variances, describing how the variances are expected to vary across genes:
Prior: $\dfrac{1}{\sigma_i^2} \,\Big|\, d_0, s_0^2 \sim \dfrac{1}{d_0 s_0^2}\, \chi^2_{d_0}$
+ some minor assumptions
$\tilde{s}_i^2 = E[\sigma_i^2 \mid \hat{\sigma}_i^2, d_0, s_0^2] = \dfrac{d_0 s_0^2 + (n - 1)\hat{\sigma}_i^2}{d_0 + n - 1}$
Hierarchical Bayesian Models and Empirical Bayes Inference
• Testing the hypothesis $\mu_i = 0$ by calculating the modified t-statistic:
$t^* = \dfrac{\hat{\mu}_i}{\tilde{s}_i / \sqrt{n}} \sim t_{d_0 + n - 1}$
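The sketch below is not limma itself, just an illustration of the formulas on these last two slides: d₀ and s₀² are taken as given rather than estimated from the data as limma does, and the data are simulated. It computes the moderated variance and the modified t-statistic for a set of genes, with p-values from the t distribution with d₀ + n - 1 degrees of freedom.

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(3)
n_genes, n = 500, 4
data = rng.normal(0.0, 0.5, size=(n_genes, n))   # simulated log-ratios, one row per gene

d0, s0_2 = 4.0, 0.25            # prior degrees of freedom and prior variance (assumed given)

mu_hat = data.mean(axis=1)                       # usual estimate of each gene's mean
s2_hat = data.var(axis=1, ddof=1)                # usual sample variances

# Moderated variance: posterior mean of sigma_i^2 given the prior and the sample variance
s2_tilde = (d0 * s0_2 + (n - 1) * s2_hat) / (d0 + n - 1)

# Modified t-statistic and two-sided p-value with d0 + n - 1 degrees of freedom
t_star = mu_hat / np.sqrt(s2_tilde / n)
p_values = 2 * t_dist.sf(np.abs(t_star), df=d0 + n - 1)
print(t_star[:5], p_values[:5])
```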