Statistical Analysis of Gene Expression Data

Estimating parameters in a statistical model
• Likelihood and Maximum likelihood estimation
• Bayesian point estimates
• Maximum a posteriori point estimation
• Empirical Bayes estimation
Random Sample
• Random sample: a set of observations generated independently by the statistical model.
• For example, n replicated measurements of the differences in expression levels for a gene under two different treatments: $x_1, x_2, \ldots, x_n \sim \text{iid } N(\mu, \sigma^2)$
• Given the parameters, the statistical model defines the probability of observing any particular combination of values in this sample.
• Since the observations are independent, the probability distribution function describing the probability of observing a particular combination is the product of the individual probability distribution functions.
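The product form of the joint pdf is easy to check numerically. The sketch below is an illustration added here, not part of the original slides; the sample values and the parameters μ and σ are made up. It evaluates the joint density of a small iid normal sample as a product of univariate normal densities.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical replicated log-ratio measurements for one gene (made-up numbers)
x = np.array([0.8, 1.1, 0.5, 0.9])
mu, sigma = 1.0, 0.4          # assumed model parameters

# Joint pdf of the iid sample = product of the individual normal pdfs
joint_pdf = np.prod(norm.pdf(x, loc=mu, scale=sigma))

# Equivalent (and numerically safer) computation on the log scale
log_joint = np.sum(norm.logpdf(x, loc=mu, scale=sigma))

print(joint_pdf, np.exp(log_joint))   # the two agree
```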
Probability distribution function vs probability
• In the case of discrete random variables, which have a countable number of potential values (assume finitely many for now), the probability distribution function is equal to the probability of each value (outcome).
• In the case of a continuous random variable, which can yield any number in a particular interval, the probability distribution function is different from the probability.
• The probability of any particular number for a continuous random variable is equal to zero.
• The probability density function defines the probability of the number falling into any particular sub-interval as the area under the curve defined by the probability density function.
Probability distribution function vs probability
• Example: the assumption of our Normal model is that the outcome can be pretty much any real number. This is obviously a wrong assumption, but it turns out that this model is a good approximation of reality.
• We could "discretize" this random variable. Define the r.v. y = {1 if |x| > c and 0 otherwise} for some constant c.
• This random variable can assume 2 different values, and its probability distribution function is defined by p(y = 1).
• Although the probability distribution function in the case of a continuous random variable does not give probabilities, it satisfies the key properties of a probability.
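A small simulation makes the discretization concrete (illustrative only; the choice of c and of a standard normal x is arbitrary): p(y = 1) is just the area under the normal density outside [-c, c].

```python
import numpy as np
from scipy.stats import norm

c = 1.5
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)   # continuous r.v.
y = (np.abs(x) > c).astype(int)                    # discretized r.v.

# Empirical estimate of p(y = 1) vs. the area under the density beyond +/- c
print(y.mean(), 2 * norm.sf(c))
```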
Back to basics – Probability, Conditional Probability and Independence
• Discrete pdf p(y):
1) $p(y = i) \ge 0$
2) $p(y = i) \le 1$
3) $\sum_i p(y = i) = 1$
4) $p(y_1 \mid y_2) = \dfrac{p(y_1, y_2)}{p(y_2)} = \dfrac{p(y_2 \mid y_1)\, p(y_1)}{p(y_2)}$
5) For $y_1, \ldots, y_n$ iid from p(y): $p(y_1, \ldots, y_n) = p(y_1) \cdots p(y_n)$
• Continuous pdf f(x):
1) $f(x) \ge 0$
2) $\int_a^b f(x)\,dx \le 1$
3) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
4) $f(x_1 \mid x_2) = \dfrac{f(x_1, x_2)}{f(x_2)} = \dfrac{f(x_2 \mid x_1)\, f(x_1)}{f(x_2)}$
5) For $x_1, \ldots, x_n$ iid from f(x): $f(x_1, \ldots, x_n) = f(x_1) \cdots f(x_n)$
[Figure: normal density over the log-ratio (LR) axis; the shaded area between a and b is $\int_a^b f(x)\,dx$]
• From now on, we will talk in terms of just a pdf, and things will hold for both discrete and continuous random variables.
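The properties above can be verified numerically. The sketch below is illustrative only; the chosen density and the toy discrete joint distribution are not from the slides. It checks that a continuous pdf integrates to 1 and that the conditional-probability identity 4) holds for a small discrete example.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Property 3) for a continuous pdf: the density integrates to 1
total, _ = quad(lambda x: norm.pdf(x, loc=0.0, scale=1.0), -np.inf, np.inf)
print(total)            # ~1.0

# Property 4) for a discrete pdf: p(y1|y2) = p(y1,y2)/p(y2) = p(y2|y1)p(y1)/p(y2)
p_joint = np.array([[0.10, 0.20],     # rows: y1 = 0, 1; columns: y2 = 0, 1
                    [0.30, 0.40]])
p_y1 = p_joint.sum(axis=1)
p_y2 = p_joint.sum(axis=0)
lhs = p_joint[1, 0] / p_y2[0]                        # p(y1=1 | y2=0)
rhs = (p_joint[1, 0] / p_y1[1]) * p_y1[1] / p_y2[0]  # p(y2=0|y1=1) p(y1=1) / p(y2=0)
print(lhs, rhs)         # identical
```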
Expectation, Expected value and Variance
• Discrete pdf p(y):
Expectation of any function g of the random variable y (the average value of the function after a large number of experiments): $E[g(y)] = \sum_i g(i)\, p(y = i)$
Expected value - the average y after a very large number of experiments: $E[y] = \sum_i i\, p(y = i)$
Variance - the expected value of $(y - E[y])^2$: $E[(y - E[y])^2] = \sum_i (i - E[y])^2\, p(y = i)$
• Continuous pdf f(x):
Expectation of any function g of the random variable x: $E[g(x)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$
Expected value - the average x after a very large number of experiments: $E[x] = \int_{-\infty}^{\infty} x\, f(x)\,dx$
Variance - the expected value of $(x - E[x])^2$: $E[(x - E[x])^2] = \int_{-\infty}^{\infty} (x - E[x])^2 f(x)\,dx$
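The "average after a large number of experiments" reading of the expectation can be illustrated by simulation (a sketch with arbitrary parameters and an arbitrary function g, not part of the slides): the sample mean of g(x) over many draws approaches E[g(x)] computed by integration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 1.0, 0.5
g = lambda x: x**2                      # any function of the random variable

# E[g(x)] by integrating g(x) f(x) over the real line
expectation, _ = quad(lambda x: g(x) * norm.pdf(x, mu, sigma), -np.inf, np.inf)

# The same expectation as a long-run average over simulated experiments
rng = np.random.default_rng(1)
draws = rng.normal(mu, sigma, size=200_000)
print(expectation, g(draws).mean())     # both close to mu^2 + sigma^2 = 1.25
```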
Expected Value and Variance of a Normal Random Variable
• Normal pdf: $f_N(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• Expected value - the average x after a very large number of experiments: $E[x] = \int_{-\infty}^{\infty} x\, f_N(x \mid \mu, \sigma^2)\,dx = \mu$
• Variance - the expected value of $(x - E[x])^2$: $E[(x - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f_N(x \mid \mu, \sigma^2)\,dx = \sigma^2$
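A quick numerical check (illustrative only; μ and σ are arbitrary choices) that the two integrals above really return μ and σ² for the normal pdf:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 2.0, 1.5
pdf = lambda x: norm.pdf(x, loc=mu, scale=sigma)

mean, _ = quad(lambda x: x * pdf(x), -np.inf, np.inf)
var, _ = quad(lambda x: (x - mu)**2 * pdf(x), -np.inf, np.inf)
print(mean, var)   # ~2.0 and ~2.25 = sigma^2
```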
Maximum Likelihood
• $x_1, x_2, \ldots, x_n \sim \text{iid } N(\mu, \sigma^2)$, with $f_N(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• Joint pdf for the whole random sample:
$f(x_1, x_2, \ldots, x_n \mid \mu, \sigma^2) = f(x_1 \mid \mu, \sigma^2)\, f(x_2 \mid \mu, \sigma^2) \cdots f(x_n \mid \mu, \sigma^2)$
• The likelihood function is basically the joint pdf evaluated at the fixed sample:
$l(\mu, \sigma \mid x_1, x_2, \ldots, x_n) = f(x_1 \mid \mu, \sigma)\, f(x_2 \mid \mu, \sigma) \cdots f(x_n \mid \mu, \sigma)$
• The maximum likelihood estimates of the model parameters $\mu$ and $\sigma^2$ are the numbers that maximize the joint pdf for the fixed sample, i.e. the likelihood function:
$\hat{\mu} = \dfrac{\sum_i x_i}{n}, \qquad \hat{\sigma}^2 = \dfrac{\sum_i (x_i - \hat{\mu})^2}{n}$
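A minimal sketch of the maximum likelihood estimates for the normal model (the sample values are made up), both from the closed-form formulas above and by numerically maximizing the log-likelihood; note that the MLE of σ² divides by n, not n - 1.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.array([0.8, 1.1, 0.5, 0.9, 1.3])    # hypothetical expression log-ratios

# Closed-form MLEs
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat)**2)       # divides by n, not n-1

# The same estimates by numerically maximizing the log-likelihood
def neg_log_lik(theta):
    mu, log_sigma = theta
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_log_lik, x0=[0.0, 0.0])
print(mu_hat, sigma2_hat, res.x[0], np.exp(res.x[1])**2)
```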
Bayesian Inference
• Assumes parameters are random variables - key difference
• Inference based on the posterior distribution of the parameter given the data
• Prior distribution: defines prior knowledge or ignorance about the parameter
• Posterior distribution: prior belief modified by the data
Prior: $p(\mu)$
Likelihood: $l(x_1, \ldots, x_n \mid \mu)$
Posterior: $f(\mu \mid x_1, \ldots, x_n) = \dfrac{l(x_1, \ldots, x_n \mid \mu)\, p(\mu)}{D(x_1, \ldots, x_n)}$
Bayesian Inference
Prior distribution of μ:
Prior: $\mu \mid \mu_0, \tau^2 \sim N(\mu_0, \tau^2)$
[Figure: prior density of μ plotted over the log-ratio axis]
Data model given μ:
Likelihood: $x \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$
[Figure: data density plotted over the log-ratio axis]
Posterior distribution of μ given the data (Bayes theorem):
Posterior: $\mu \mid \mu_0, \tau^2, x_1, \ldots, x_n \sim N\!\left( \dfrac{\frac{\mu_0}{\tau^2} + \frac{n\bar{x}}{\sigma^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}},\ \dfrac{1}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}} \right)$
[Figure: posterior density over the LogRatio axis; the shaded area gives P(μ > 0 | data)]
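The conjugate normal-normal posterior above is simple enough to compute directly. The sketch below is illustrative: the prior settings μ₀ and τ² and the data are invented, σ² is treated as known, and the μ₀/τ² notation follows the reconstruction above. It computes the posterior mean and variance and the tail probability P(μ > 0 | data).

```python
import numpy as np
from scipy.stats import norm

x = np.array([0.6, 1.0, 0.4, 0.8])     # hypothetical log-ratios for one gene
sigma2 = 0.25                           # assumed known data variance
mu0, tau2 = 0.0, 1.0                    # prior mean and variance for mu

n = len(x)
post_var = 1.0 / (1.0 / tau2 + n / sigma2)
post_mean = post_var * (mu0 / tau2 + n * x.mean() / sigma2)

# Probability that the gene is up-regulated, P(mu > 0 | data)
p_positive = norm.sf(0.0, loc=post_mean, scale=np.sqrt(post_var))
print(post_mean, post_var, p_positive)
```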
Bayesian Estimation
• The Bayesian point estimate is the expected value of the parameter under its posterior distribution given the data:
Posterior: $\mu \mid \mu_0, \tau^2, x_1, \ldots, x_n \sim N\!\left( \dfrac{\frac{\mu_0}{\tau^2} + \frac{n\bar{x}}{\sigma^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}},\ \dfrac{1}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}} \right)$
$E[\mu \mid \mu_0, \tau^2, x_1, \ldots, x_n] = \dfrac{\frac{\mu_0}{\tau^2} + \frac{n\bar{x}}{\sigma^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}}$
• In some cases the expectation of the posterior distribution can be difficult to assess; it is easier to find the value of the parameter that maximizes the posterior distribution given the data - the Maximum a Posteriori (MAP) estimate.
• Since the denominator of the posterior distribution in the Bayes theorem is constant in the parameter, this is equivalent to maximizing the product of the likelihood and the prior pdf:
Posterior: $f(\mu \mid x_1, \ldots, x_n) = \dfrac{l(x_1, \ldots, x_n \mid \mu)\, p(\mu)}{D(x_1, \ldots, x_n)}$
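For this normal-normal model the posterior is symmetric, so the MAP estimate coincides with the posterior mean. The sketch below (same invented data and prior settings as the previous example) finds the MAP by numerically maximizing log-likelihood + log-prior and compares it to the closed-form posterior mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

x = np.array([0.6, 1.0, 0.4, 0.8])
sigma2, mu0, tau2 = 0.25, 0.0, 1.0
n = len(x)

# Unnormalized log-posterior: log-likelihood + log-prior (the denominator is constant in mu)
def neg_log_post(mu):
    return -(np.sum(norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)))
             + norm.logpdf(mu, loc=mu0, scale=np.sqrt(tau2)))

map_est = minimize_scalar(neg_log_post).x
post_mean = (mu0 / tau2 + n * x.mean() / sigma2) / (1.0 / tau2 + n / sigma2)
print(map_est, post_mean)   # essentially identical
```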
Alternative prior for the normal model
• Degenerate uniform prior for μ, assuming that any prior value is equally likely - this is clearly unrealistic - we know more than that
Prior: $p(\mu) \propto 1$
Posterior: $f(\mu \mid x_1, \ldots, x_n) = \dfrac{l(x_1, \ldots, x_n \mid \mu)\, p(\mu)}{P(x_1, \ldots, x_n)} = \text{const} \cdot l(x_1, \ldots, x_n \mid \mu)$
• The MAP estimate for μ is identical to the maximum likelihood estimate
• Bayesian point estimation and maximum likelihood are very closely related
Hierarchical Bayesian Models and Empirical Bayes Inference
• $x_i \sim \text{ind } N(\mu_i, \sigma^2)$, $i = 1, \ldots, n$; assume that the variance is known
• Need to estimate $\mu_i$, $i = 1, \ldots, n$
• The simplest estimate is $\hat{\mu}_i = x_i$
• Assuming that $\mu_i \sim \text{iid } N(\mu_0, \tau^2)$, $i = 1, \ldots, n$:
Posterior: $\mu_i \mid \mu_0, \tau^2, x_1, \ldots, x_n \sim N\!\left( \dfrac{\frac{\mu_0}{\tau^2} + \frac{x_i}{\sigma^2}}{\frac{1}{\tau^2} + \frac{1}{\sigma^2}},\ \dfrac{1}{\frac{1}{\tau^2} + \frac{1}{\sigma^2}} \right)$
$E[\mu_i \mid \mu_0, \tau^2, x_1, \ldots, x_n] = \dfrac{\frac{\mu_0}{\tau^2} + \frac{x_i}{\sigma^2}}{\frac{1}{\tau^2} + \frac{1}{\sigma^2}}$
• If we are not happy with pre-specifying $\mu_0$ and $\tau^2$, we can estimate them based on the "marginal" distribution of the data given $\mu_0$ and $\tau^2$ and plug them back into the formula for the Bayesian estimate - the result is the Empirical Bayes estimate
Hierarchical Bayesian Models and Empirical Bayes Inference
• If $x_i \sim \text{ind } N(\mu_i, \sigma^2)$ and $\mu_i \sim \text{iid } N(\mu_0, \tau^2)$, $i = 1, \ldots, n$,
• the "marginal" distribution of each $x_i$, with the $\mu_i$'s "factored out", is $N(\mu_0, \sigma^2 + \tau^2)$, $i = 1, \ldots, n$
• Now we can estimate $\hat{\mu}_0$ and $\hat{\tau}^2$ using, say, maximum likelihood and plug them back into the formula for the Bayesian estimates of the $\mu_i$'s:
Empirical Bayes estimate of $\mu_i$: $\ \dfrac{\frac{\hat{\mu}_0}{\hat{\tau}^2} + \frac{x_i}{\sigma^2}}{\frac{1}{\hat{\tau}^2} + \frac{1}{\sigma^2}}$
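A sketch of the plug-in empirical Bayes procedure described above (illustrative; the simulated gene-specific means and the known σ² are assumptions made for the example, and simple moment estimates stand in for maximum likelihood): the hyperparameters μ₀ and τ² are estimated from the marginal N(μ₀, σ² + τ²) of the xᵢ and substituted into the shrinkage formula.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 0.25                                   # assumed known measurement variance
true_mu = rng.normal(0.0, 1.0, size=200)        # simulated gene-specific means
x = rng.normal(true_mu, np.sqrt(sigma2))        # one observation per gene

# Estimate the hyperparameters from the marginal distribution N(mu0, sigma2 + tau2)
mu0_hat = x.mean()
tau2_hat = max(x.var() - sigma2, 1e-8)          # marginal variance minus sigma2

# Plug them into the Bayesian estimate: shrink each x_i towards mu0_hat
eb = (mu0_hat / tau2_hat + x / sigma2) / (1.0 / tau2_hat + 1.0 / sigma2)

# Shrunken estimates have smaller overall error than the raw x_i (Stein effect)
print(np.mean((x - true_mu)**2), np.mean((eb - true_mu)**2))
```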
Hierarchical Bayesian Models and Empirical Bayes Inference
• The estimates for the individual means are "shrunk" towards the mean of all the means
• It turns out that such estimates are better overall than estimates based on the individual observations alone (the "Stein effect")
• Individual observations in our model can be replaced with groups of observations: $x_{1i}, x_{2i}, \ldots, x_{ki} \sim \text{ind } N(\mu_i, \sigma^2)$
• Limma does a similar thing, only with the variances
• The data for each gene i are assumed to be distributed as $x_{1i}, x_{2i}, \ldots, x_{ki} \sim \text{iid } N(\mu_i, \sigma_i^2)$; the means are estimated in the usual way, while an additional hierarchy is placed on the variances, describing how the variances are expected to vary across genes:
Prior: $\dfrac{1}{\sigma_i^2} \,\Big|\, d_0, s_0^2 \sim \dfrac{1}{d_0 s_0^2}\, \chi^2_{d_0}$
+ some minor assumptions
$\tilde{s}_i^2 = E[\sigma_i^2 \mid \hat{\sigma}_i^2, d_0, s_0^2] = \dfrac{d_0 s_0^2 + (n - 1)\hat{\sigma}_i^2}{d_0 + n - 1}$
Hierarchical Bayesian Models and Empirical Bayes Inference
• Testing the hypothesis $\mu_i = 0$ by calculating the modified t-statistic:
$t^* = \dfrac{\hat{\mu}_i}{\tilde{s}_i / \sqrt{n}} \sim t_{d_0 + n - 1}$
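The sketch below is not limma itself, just an illustration of the formulas on these last two slides: d₀ and s₀² are taken as given rather than estimated from the data as limma does, and the data are simulated. It computes the moderated variance and the modified t-statistic for a set of genes, with p-values from the t distribution with d₀ + n - 1 degrees of freedom.

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(3)
n_genes, n = 500, 4
data = rng.normal(0.0, 0.5, size=(n_genes, n))   # simulated log-ratios, one row per gene

d0, s0_2 = 4.0, 0.25            # prior degrees of freedom and prior variance (assumed given)

mu_hat = data.mean(axis=1)                       # usual estimate of each gene's mean
s2_hat = data.var(axis=1, ddof=1)                # usual sample variances

# Moderated variance: posterior mean of sigma_i^2 given the prior and the sample variance
s2_tilde = (d0 * s0_2 + (n - 1) * s2_hat) / (d0 + n - 1)

# Modified t-statistic and two-sided p-value with d0 + n - 1 degrees of freedom
t_star = mu_hat / np.sqrt(s2_tilde / n)
p_values = 2 * t_dist.sf(np.abs(t_star), df=d0 + n - 1)
print(t_star[:5], p_values[:5])
```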