MAXIMUM LIKELIHOOD ESTIMATES
The Maximum Likelihood Estimate is one estimate of a parameter's value. It answers the question: "Given my data, what is the most likely value of the unknown parameter?" To use it, we compute the likelihood function: take a sample of n observations (x1, …, xn) from the same distribution, evaluate the pdf pX(x; θ) at each of the n observations, and take the product of the results.
Likelihood Function:   L(\theta) = \prod_{i=1}^{n} p_X(x_i; \theta)
Note: The expression pX(x; θ) is just the distribution of the population from which we're sampling. The unknown parameter θ is usually written after a semicolon to emphasize that the distribution is characterized by that parameter.
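As a quick illustration of the formula above, here is a minimal Python sketch. It assumes a hypothetical exponential population with pdf pX(x; θ) = θe^(−θx) and made-up sample values; any other pdf could be substituted.

    import numpy as np

    def likelihood(theta, data):
        # L(theta): product of the exponential pdf theta * exp(-theta * x)
        # evaluated at each of the n observations
        return np.prod(theta * np.exp(-theta * np.asarray(data)))

    sample = [0.5, 1.2, 0.8]         # hypothetical observations, n = 3
    print(likelihood(1.0, sample))   # L(1.0) for this sample
    print(likelihood(0.4, sample))   # compare a different candidate theta

Comparing the output for several candidate values of θ is exactly the comparison the Maximum Likelihood Estimate formalizes.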
The likelihood function is just the joint density of the random sample. Since the sample observations are independent and identically distributed (iid), the joint pdf of all n sample observations is the product of the individual densities. This is the same principle that lets us multiply P(A), P(B), and P(C) together to find P(A ∩ B ∩ C) when events A, B, and C are independent. Suppose we take a sample of size n = 3 from the distribution, and the resulting values are x1, x2, and x3. What's the probability associated with the three sample values? That is, what's the joint density of the three sample values, pX(x1, x2, x3; θ)?
p_X(x_1, x_2, x_3; \theta) = p_X(x_1; \theta) \cdot p_X(x_2; \theta) \cdot p_X(x_3; \theta)
Generalizing this to all n gives the result that the joint density of n randomly drawn sample values is the product of the individual densities, and the likelihood function is nothing more than the joint pdf of the sample: a multivariate probability density function of the values taken on by the n random variables in the sample.
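As a concrete instance, suppose the population were exponential with pdf pX(x; θ) = θe^(−θx) (an illustrative assumption, not something fixed by the text). The product of three densities then collapses into a single function of θ:

p_X(x_1, x_2, x_3; \theta) = \theta e^{-\theta x_1} \cdot \theta e^{-\theta x_2} \cdot \theta e^{-\theta x_3} = \theta^3 e^{-\theta (x_1 + x_2 + x_3)}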
The likelihood function is a function of the unknown parameter. The Maximum Likelihood Estimate
for the unknown parameter is the parameter value that maximizes the likelihood function:
Maximum Likelihood Estimate:   \theta^* such that L(\theta^*) \ge L(\theta) for all \theta
We use calculus to find this value: take the derivative of the likelihood function with respect to the unknown parameter, set it equal to 0, and solve for the parameter. Don't forget to verify the second-order condition (the second derivative should be negative at the solution) to make sure you are indeed finding a maximum.
Solve   \frac{d}{d\theta} L(\theta) = 0   for \theta
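When the algebra is intractable, the same maximization can be carried out numerically. Here is a minimal sketch using the hypothetical exponential sample from above and SciPy's minimize_scalar; since SciPy minimizes rather than maximizes, we negate the likelihood. The search bounds are an assumption chosen for illustration.

    import numpy as np
    from scipy.optimize import minimize_scalar

    sample = np.array([0.5, 1.2, 0.8])   # hypothetical observations

    def neg_likelihood(theta):
        # negative likelihood under the exponential pdf theta * exp(-theta * x)
        return -np.prod(theta * np.exp(-theta * sample))

    result = minimize_scalar(neg_likelihood, bounds=(1e-6, 10.0), method="bounded")
    print(result.x)   # ~1.2, matching the analytic MLE n / sum(x_i) for this pdf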
This will usually involve complicated, messy math. To mitigate this, we sometimes work with the logarithm of the likelihood function and use properties of logs to simplify computations. This won't change our answer: because the logarithm is a monotonically increasing function, taking the logarithm of a function doesn't change the point at which its maximum value is achieved.
Solve   \frac{d}{d\theta} \ln[L(\theta)] = 0   for \theta
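To see the simplification at work, return to the illustrative exponential pdf, for which L(θ) = θ^n e^{−θ Σxᵢ}. The logarithm turns the product into a sum:

\ln L(\theta) = n \ln \theta - \theta \sum_{i=1}^{n} x_i

\frac{d}{d\theta} \ln L(\theta) = \frac{n}{\theta} - \sum_{i=1}^{n} x_i = 0 \quad \Rightarrow \quad \theta^* = \frac{n}{\sum_{i=1}^{n} x_i}

The second derivative, −n/θ², is negative for all θ > 0, confirming that this stationary point is indeed a maximum.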
The parameter value you end up with maximizes the probability (density) of your sample values x1, …, xn. You could say it's the value "most consistent" with the observed sample: the Maximum Likelihood Estimate.