Probability and Statistics for Computer Scientists, Second Edition, by Michael Baron
Section 9.1: Parameter estimation
CIS 2033: Computational Probability and Statistics, Pei Wang

Parameters of distributions

After determining the family of distributions, the next step is to estimate its parameters.

Example 9.1: The number of defects on each chip is believed to follow Pois(λ). Since λ = E(X) is the expectation of a Poisson variable, it can be estimated with the sample mean X̄.

Method of moments

The k-th population moment is μk = E(X^k), and the k-th sample moment is mk = (1/n) Σ Xi^k. Sample moments are used to estimate the corresponding population moments.

Method of moments (2)

The central versions are μ'k = E[(X - μ1)^k] and m'k = (1/n) Σ (Xi - X̄)^k. Special cases: μ'2 = Var(X), and m'2 = s²(n - 1)/n, where s² is the sample variance.

Method of moments (3)

To estimate k parameters, we equate the first k population and sample moments (or their central versions):
μ1 = m1, ..., μk = mk
The left-hand sides of these equations depend on the parameters, while the right-hand sides can be computed from the data. The method of moments estimator is the solution of this system of equations.

Moments method example

The CPU times for 30 randomly chosen tasks of a certain type are (in seconds):
9 15 19 22 24 25 30 34 35 35 36 36 37 38 42 43 46 48 54 55 56 56 59 62 69 70 82 82 89 139
If they are considered to be values of a random variable X, what is the model?

Moments method example (2)

The histogram of the data suggests a Gamma distribution, X ~ Gamma(α, λ).

Moments method example (3)

For the Gamma distribution, the first moment and the second central moment are
μ1 = α/λ,   μ'2 = α/λ²

Moments method example (4)

From the data we compute X̄ and m'2 and write two equations,
α/λ = X̄,   α/λ² = m'2
Solving this system in terms of α and λ, we get the method of moments estimates
α̂ = X̄² / m'2,   λ̂ = X̄ / m'2
(A numerical sketch of this computation appears after the Exercise below.)

Method of maximum likelihood

The maximum likelihood estimator is the parameter value that maximizes the likelihood of the observed sample, L(X1, …, Xn). L(X1, …, Xn) is defined as the joint pmf p(X1, …, Xn) for a discrete distribution and the joint pdf f(X1, …, Xn) for a continuous distribution. When the variables X1, …, Xn are independent, L(X1, …, Xn) is obtained by multiplying the marginal pmfs or pdfs.

Maximum likelihood

The maximum likelihood estimator is the parameter value that maximizes the likelihood L(θ) of the observed sample x1, …, xn. When the observations are independent of each other,
L(θ) = pθ(x1) × ... × pθ(xn) for a discrete variable,
L(θ) = fθ(x1) × ... × fθ(xn) for a continuous variable,
which is a function of θ.

Maximum likelihood (2)

Here we consider two types of L(θ):
1. If the function always increases or decreases over its range, the maximum is attained at a boundary of the range, i.e., at the smallest or largest admissible θ.
2. If the function first increases and then decreases, the maximum is attained where its derivative L'(θ) is zero.

Example of Type 1

To estimate θ in U(0, θ) given positive data x1, …, xn: L(θ) = θ^(-n) when θ ≥ max(x1, …, xn), and 0 otherwise. Since L(θ) is a decreasing function on θ ≥ max(x1, …, xn), the maximum likelihood estimator of θ is max(x1, …, xn). Similarly, if x1, …, xn are generated by U(a, b), the maximum likelihood estimates are â = min(x1, …, xn) and b̂ = max(x1, …, xn).

Example of Type 2

If the distribution is Ber(p), and m of the n sample values are 1, then
L(p) = p^m (1 - p)^(n-m)
L'(p) = m p^(m-1) (1 - p)^(n-m) - (n - m) p^m (1 - p)^(n-m-1) = (m - np) p^(m-1) (1 - p)^(n-m-1)
L'(p) is 0 when p = m/n, which also covers the boundary situations where the estimate is 0 or 1 (m = 0 or m = n). So the sample mean is a maximum likelihood estimator of p in Ber(p).

Exercise

If a probability mass function is partially known, how can the missing part be estimated from test data? Take the following die as an instance:

a        1     2     3     4     5     6
p(a)     0.1   0.1   0.2   0.2   ?     ?
count    12    10    19    23    9     27
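As a worked follow-up to the exercise (not part of the original slides), the sketch below treats the two unknown probabilities as a single free parameter: the known entries sum to 0.6, so p(5) + p(6) = 0.4, and the unknown part contributes the factor p(5)^9 × p(6)^27 to the likelihood. A simple grid search over p(5) locates the maximum.

```python
# Constrained maximum likelihood for the two missing pmf entries (sketch).
# Known probabilities sum to 0.6, so p5 + p6 = 0.4; the counts 9 and 27
# contribute the factor p5^9 * p6^27 to the likelihood.
import math

def log_lik(p5: float) -> float:
    p6 = 0.4 - p5
    return 9 * math.log(p5) + 27 * math.log(p6)

# Grid search over the open interval (0, 0.4).
candidates = [i / 10000 for i in range(1, 4000)]
p5_hat = max(candidates, key=log_lik)
print(f"p5_hat ~ {p5_hat:.3f}, p6_hat ~ {0.4 - p5_hat:.3f}")
```

The grid maximum lands near p(5) = 0.1 and p(6) = 0.3, i.e., the remaining probability 0.4 is split in proportion to the observed counts 9 and 27.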
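The method of moments computation for the Gamma example above can also be checked numerically. The sketch below is not from the slides; it is a minimal Python illustration that computes X̄ and m'2 from the 30 CPU times and solves the two moment equations for α̂ and λ̂.

```python
# Method of moments for the Gamma CPU-time example (illustrative sketch).
# Solves  alpha/lambda = x_bar  and  alpha/lambda^2 = m2_central.

cpu_times = [9, 15, 19, 22, 24, 25, 30, 34, 35, 35, 36, 36, 37, 38, 42,
             43, 46, 48, 54, 55, 56, 56, 59, 62, 69, 70, 82, 82, 89, 139]

n = len(cpu_times)
x_bar = sum(cpu_times) / n                                   # first sample moment
m2_central = sum((x - x_bar) ** 2 for x in cpu_times) / n    # second central sample moment

alpha_hat = x_bar ** 2 / m2_central    # alpha-hat = X-bar^2 / m'2
lambda_hat = x_bar / m2_central        # lambda-hat = X-bar / m'2

print(f"x_bar = {x_bar:.4f}, m'2 = {m2_central:.4f}")
print(f"alpha_hat = {alpha_hat:.4f}, lambda_hat = {lambda_hat:.4f}")
```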
Log-likelihood

The log function turns multiplication into addition and powers into multiplication, e.g.
ln(f × g) = ln(f) + ln(g),   ln(f^g) = g × ln(f)
The log-likelihood function and the likelihood function reach their maximum at the same parameter value, so ln(L(θ)) is often easier to work with when finding the maximum likelihood estimate.

Log-likelihood (2)

E.g., L(p) = p^m (1 - p)^(n-m), so
ln(L(p)) = m ln(p) + (n - m) ln(1 - p)
[ln(L(p))]' = m/p - (n - m)/(1 - p)
Setting the derivative to zero:
m/p = (n - m)/(1 - p)
m - mp = np - mp
m = np
p = m/n

Estimation of standard errors

The standard error of an estimator T equals Std(T), so it can be estimated from the data, e.g., through sample variances.

Mean squared error

When both the bias and the variance of estimators can be obtained, we usually prefer the one with the smallest mean squared error (MSE). For an estimator T of a parameter θ,
MSE(T) = E[(T - θ)²] = E[T²] - 2θE[T] + θ² = Var(T) + (E[T] - θ)² = Var(T) + Bias(T)²
So MSE summarizes both variance and bias.

MSE example

Let T1 and T2 be two unbiased estimators of the same parameter θ based on a sample of size n, with
Var(T1) = (θ + 1)(θ - n) / (3n)
Var(T2) = (θ + 1)(θ - n) / [(n + 2)n]
Since both estimators are unbiased, their MSEs equal their variances. Since n + 2 > 3 when n > 1, MSE(T1) > MSE(T2), so T2 is the better estimator for all values of θ.

MSE example (2)

Let T1 and T2 be two estimators of the same parameter, with
Var(T1) = 5/n², Bias(T1) = -2/n
Var(T2) = 1/n², Bias(T2) = 3/n
Then MSE(T1) = (5 + 4)/n² and MSE(T2) = (1 + 9)/n². Since 5 + 4 < 1 + 9, MSE(T1) < MSE(T2), so T1 is the better estimator of the parameter.
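To tie the estimation methods to the MSE criterion, here is a minimal Monte Carlo sketch (not from the slides) comparing two estimators of θ in U(0, θ): the method of moments estimator 2X̄ and the maximum likelihood estimator max(x1, …, xn) from the Type 1 example. The values θ = 10, n = 30, and the number of replications are arbitrary illustration choices; the simulation estimates the bias, variance, and MSE of each estimator empirically.

```python
# Monte Carlo comparison of two estimators of theta in U(0, theta):
#   T1 = 2 * sample mean      (method of moments, unbiased)
#   T2 = max of the sample    (maximum likelihood, biased)
# Illustrative sketch; theta, n, and the number of replications are arbitrary.
import random

random.seed(2033)
theta, n, reps = 10.0, 30, 100_000

def summarize(estimates, true_value):
    mean_est = sum(estimates) / len(estimates)
    bias = mean_est - true_value
    var = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)
    return bias, var, var + bias ** 2   # MSE = Var + Bias^2

t1_vals, t2_vals = [], []
for _ in range(reps):
    sample = [random.uniform(0, theta) for _ in range(n)]
    t1_vals.append(2 * sum(sample) / n)   # method of moments
    t2_vals.append(max(sample))           # maximum likelihood

for name, vals in [("T1 = 2*mean", t1_vals), ("T2 = max", t2_vals)]:
    bias, var, mse = summarize(vals, theta)
    print(f"{name:12s} bias={bias:+.4f}  var={var:.4f}  mse={mse:.4f}")
```

In this setting the biased maximum likelihood estimator typically shows a much smaller MSE than the unbiased moment estimator, which illustrates why MSE, rather than bias alone, is used to compare estimators.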