The Likelihood Function
Consistency
• An estimator \(\hat{\theta}_n\) is a consistent estimator of θ if \(\hat{\theta}_n \xrightarrow{p} \theta\), i.e., if \(\hat{\theta}_n\) converges in probability to θ.
week 3
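The definition above can be checked by simulation. A minimal sketch, assuming a Uniform(0, 1) model (the model, seed, and tolerance are illustrative choices, not from the notes): the sample mean is consistent for the true mean θ = 0.5, so the estimated probability of landing farther than ε from θ should shrink as n grows.

```python
import numpy as np

# Monte Carlo sketch of convergence in probability (illustrative example):
# for i.i.d. Uniform(0, 1) data, the sample mean estimates theta = 0.5,
# and P(|mean - theta| > eps) should tend to 0 as n increases.
rng = np.random.default_rng(0)
theta, eps, reps = 0.5, 0.05, 500

p_far_by_n = {}
for n in [10, 100, 1000, 10000]:
    means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    # fraction of replications where the estimate is more than eps from theta
    p_far_by_n[n] = np.mean(np.abs(means - theta) > eps)
    print(n, p_far_by_n[n])
```

The printed fractions decrease toward 0 as n grows, which is exactly what convergence in probability asserts for each fixed ε.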
Theorem
• An unbiased estimator \(\hat{\theta}_n\) of θ is a consistent estimator of θ if
\[ \lim_{n \to \infty} \operatorname{Var}\big( \hat{\theta}_n \big) = 0. \]
• Proof: Since \(\hat{\theta}_n\) is unbiased, \(E(\hat{\theta}_n) = \theta\), so by Chebyshev's inequality, for any ε > 0,
\[ P\big( \big| \hat{\theta}_n - \theta \big| \ge \varepsilon \big) \le \frac{\operatorname{Var}(\hat{\theta}_n)}{\varepsilon^2} \to 0 \quad \text{as } n \to \infty. \]
Hence \(\hat{\theta}_n \xrightarrow{p} \theta\).
Example
• Suppose X1, X2, …, Xn are i.i.d. Poisson(λ). Let \(\hat{\lambda} = \bar{X}\). Then \(E(\hat{\lambda}) = \lambda\) and \(\operatorname{Var}(\hat{\lambda}) = \lambda / n \to 0\) as n → ∞, so by the theorem above \(\hat{\lambda}\) is a consistent estimator of λ.
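The Poisson example can be illustrated numerically; a sketch with an assumed λ = 3 (the seed and replication counts are arbitrary): the empirical variance of the sample mean should track λ/n and shrink toward 0.

```python
import numpy as np

# Numerical check (illustrative): for i.i.d. Poisson(lam) data, the sample
# mean is unbiased for lam and Var(mean) = lam / n, which tends to 0 --
# the condition in the theorem above.
rng = np.random.default_rng(1)
lam, reps = 3.0, 5000

var_by_n = {}
for n in [10, 100, 1000]:
    means = rng.poisson(lam, size=(reps, n)).mean(axis=1)
    var_by_n[n] = means.var()
    print(n, var_by_n[n], lam / n)  # empirical variance vs. theoretical lam/n
```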
Important comment
• Consistency is an asymptotic property so we can have a consistent
estimator that is biased as long as it is asymptotically unbiased.
• Example: Uniform example above.
The Likelihood Function - Introduction
• Recall: a statistical model for some data is a set \(\{ f_\theta : \theta \in \Omega \}\) of distributions, one of which corresponds to the true unknown distribution that produced the data.
• The distribution fθ can be either a probability density function or a probability mass function.
• The joint probability density function or probability mass function of i.i.d. random variables X1, …, Xn is
\[ f_\theta(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_\theta(x_i). \]
The Likelihood Function
• Let x1, …, xn be sample observations taken on corresponding random variables X1, …, Xn whose distribution depends on a parameter θ. The likelihood function defined on the parameter space Ω is given by
\[ L(\theta \mid x_1, \ldots, x_n) = f_\theta(x_1, \ldots, x_n). \]
• Note that for the likelihood function we are fixing the data, x1, …, xn, and varying the value of the parameter.
• The value L(θ | x1, …, xn) is called the likelihood of θ. It is the probability (or density) of observing the data values we observed given that θ is the true value of the parameter. It is not the probability of θ given that we observed x1, …, xn.
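A minimal sketch of "fix the data, vary the parameter" (the Normal(μ, 1) model and the three data values are assumed for illustration, not from the notes): the same data are plugged in each time, and only the candidate parameter value changes.

```python
import math

# Illustrative likelihood for an assumed Normal(mu, 1) model with fixed data:
# L(mu | x1..xn) is the product of N(mu, 1) densities at the observed values.
data = [1.2, 0.8, 1.5]

def likelihood(mu):
    return math.prod(
        math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi) for x in data
    )

# Same data, different candidate values of mu -- the likelihood is largest
# for mu near the sample mean.
for mu in [0.0, 0.5, 1.0, 1.5]:
    print(mu, likelihood(mu))
```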
Examples
• Suppose we toss a coin n = 10 times and observe 4 heads. With no knowledge whatsoever about the probability of getting a head on a single toss, the appropriate statistical model for the data is the Binomial(10, θ) model. The likelihood function is given by
\[ L(\theta \mid 4) = \binom{10}{4} \theta^4 (1 - \theta)^6, \quad \theta \in [0, 1]. \]
• Suppose X1, …, Xn is a random sample from an Exponential(θ) distribution with density \(f_\theta(x) = \frac{1}{\theta} e^{-x/\theta}\). The likelihood function is
\[ L(\theta \mid x_1, \ldots, x_n) = \theta^{-n} \exp\!\left( -\frac{1}{\theta} \sum_{i=1}^{n} x_i \right), \quad \theta > 0. \]
Sufficiency - Introduction
• A statistic that summarizes all the information in the sample about the target parameter is called a sufficient statistic.
• An estimator \(\hat{\theta}\) is sufficient if we get as much information about θ from \(\hat{\theta}\) as we would from the entire sample X1, …, Xn.
• A sufficient statistic T(x1, …, xn) for a model is any function of the data x1, …, xn such that once we know the value of T(x1, …, xn), then we can determine the likelihood function.
Sufficient Statistic
• A sufficient statistic is a function T(x1, …, xn) defined on the sample space, such that whenever T(x1, …, xn) = T(y1, …, yn), then
\[ L(\theta \mid x_1, \ldots, x_n) = c \, L(\theta \mid y_1, \ldots, y_n) \]
for some constant c (which may depend on the data but not on θ).
• Typically, T(x1, …, xn) will be of lower dimension than x1, …, xn, so we can consider replacing x1, …, xn by T(x1, …, xn) as a data reduction, which simplifies the analysis.
• Example…
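A sketch of this definition for an assumed i.i.d. Bernoulli(θ) model (not one of the lecture's examples): T = Σ xi is sufficient, so two samples with the same total give likelihoods that agree up to a constant — here the constant is 1, since the likelihood depends on the data only through T.

```python
import math

# Illustrative sufficiency check for an assumed Bernoulli(theta) model:
# L(theta | x) = theta**T * (1 - theta)**(n - T) with T = sum(x), so any two
# samples with equal T give (essentially) identical likelihood functions.
def bern_likelihood(theta, xs):
    return math.prod(theta if x == 1 else 1 - theta for x in xs)

x = [1, 0, 1, 1, 0]  # T = 3
y = [0, 1, 1, 0, 1]  # T = 3, different ordering of the same total

for theta in [0.2, 0.5, 0.8]:
    # equal up to floating-point rounding
    print(theta, bern_likelihood(theta, x), bern_likelihood(theta, y))
```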
Minimal Sufficient Statistics
• A minimal sufficient statistic T for a model is any sufficient statistic such that once we know a likelihood function L(θ | x1, …, xn) for the model and data, then we can determine T(x1, …, xn).
• A relevant likelihood function can always be obtained from the value of any sufficient statistic T, but if T is minimal sufficient as well, then we can also obtain the value of T from any likelihood function.
• It can be shown that a minimal sufficient statistic gives the maximal reduction of the data.
• Example…
Alternative Definition of Sufficient Statistic
• Let X1, …, Xn be a random sample from a distribution with unknown parameter θ. The statistic T(x1, …, xn) is said to be sufficient for θ if the conditional distribution of X1, …, Xn given T does not depend on θ.
• This definition is much harder to work with, as the conditional distribution of the sample X1, …, Xn given the sufficient statistic T is often hard to derive.
Factorization Theorem
• Let T be a statistic based on a random sample X1, …, Xn. Then T is a sufficient statistic for θ if
\[ L(\theta \mid x_1, \ldots, x_n) = g\big( T(x_1, \ldots, x_n); \theta \big) \, h(x_1, \ldots, x_n), \]
i.e., if the likelihood function can be factored into two nonnegative functions: one that depends on T(x1, …, xn) and θ, and one that depends only on the data x1, …, xn.
• Proof:
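A sketch of the factorization for an assumed i.i.d. Poisson(λ) model (an illustrative choice, not necessarily the example worked in lecture): with T = Σ xi, the likelihood splits as g(T; λ) = e^(−nλ) λ^T times h(x) = 1/Π xi!, so T is sufficient.

```python
import math

# Illustrative factorization for an assumed Poisson(lam) sample:
#   L(lam | x) = prod_i exp(-lam) * lam**x_i / x_i!
#              = [exp(-n*lam) * lam**T] * [1 / prod_i x_i!]
#                 \---- g(T; lam) ----/   \---- h(x) ----/
def poisson_likelihood(lam, xs):
    return math.prod(math.exp(-lam) * lam**x / math.factorial(x) for x in xs)

def g(t, lam, n):
    return math.exp(-n * lam) * lam**t

def h(xs):
    return 1 / math.prod(math.factorial(x) for x in xs)

xs = [2, 0, 3, 1]
for lam in [0.5, 1.0, 2.5]:
    lhs = poisson_likelihood(lam, xs)
    rhs = g(sum(xs), lam, len(xs)) * h(xs)
    print(lam, lhs, rhs)  # equal up to floating-point rounding
```

Note that g depends on the data only through T = sum(xs), while h does not involve λ at all, matching the two factors in the theorem.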
Examples