Chapter 7. Statistical Estimation and Sampling Distributions

7.1 Point Estimates
7.2 Properties of Point Estimates
7.3 Sampling Distributions
7.4 Constructing Parameter Estimates
7.5 Supplementary Problems

7.1 Point Estimates

7.1.1 Parameters
• In statistical inference, the term parameter denotes a quantity θ that is a property of an unknown probability distribution — for example, the mean, the variance, or a particular quantile of the distribution.
• Parameters are unknown, and one of the goals of statistical inference is to estimate them.

Figure 7.1 The relationship between a point estimate and an unknown parameter θ
Figure 7.2 Estimation of the population mean by the sample mean

7.1.2 Statistics
• In statistical inference, the term statistic denotes a quantity that is a property of a sample.
• Statistics are functions of a random sample — for example, the sample mean, the sample variance, or a particular sample quantile.
• Statistics are random variables whose observed values can be calculated from a set of observed data.

Examples:

    sample mean: \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}

    sample variance: S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}

7.1.3 Estimation
• Estimation is the procedure of "guessing" properties of the population from which the data are collected.
• A point estimate of an unknown parameter θ is a statistic θ̂ that represents a "guess" at the value of θ.
• Example 1 (Machine breakdowns): How should P(machine breakdown due to operator misuse) be estimated?
• Example 2 (Rolling mill scrap): How should the mean and variance of the probability distribution of % scrap be estimated?

7.2 Properties of Point Estimates

7.2.1 Unbiased Estimates (1/5)
• A point estimate θ̂ for a parameter θ is said to be unbiased if

    E(\hat{\theta}) = \theta

• If a point estimate is not unbiased, then its bias is defined to be

    \text{bias} = E(\hat{\theta}) - \theta

7.2.1 Unbiased Estimates (2/5)
• Point estimate of a success probability
If X ~ B(n, p), then the sample proportion p̂ = X/n is an unbiased estimate of p, since E(X) = np and hence

    E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n} E(X) = \frac{np}{n} = p

7.2.1 Unbiased Estimates (3/5)
• Point estimate of a population mean
If X_1, X_2, ..., X_n are sample observations from a probability distribution with mean μ, then the sample mean μ̂ = X̄ is an unbiased point estimate of the population mean μ, since E(X_i) = μ for 1 ≤ i ≤ n and

    E(\hat{\mu}) = E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{n\mu}{n} = \mu

7.2.1 Unbiased Estimates (4/5)
• Point estimate of a population variance
If X_1, X_2, ..., X_n are sample observations from a probability distribution with variance σ², then the sample variance

    \hat{\sigma}^2 = S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}

is an unbiased point estimate of the population variance σ².

7.2.1 Unbiased Estimates (5/5)
To verify this, expand the sum of squares:

    \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} \big((X_i - \mu) - (\bar{X} - \mu)\big)^2
    = \sum_{i=1}^{n} (X_i - \mu)^2 - 2(\bar{X} - \mu)\sum_{i=1}^{n}(X_i - \mu) + n(\bar{X} - \mu)^2
    = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2

where the last step uses ∑(X_i − μ) = n(X̄ − μ). Note that E(X_i) = μ, E((X_i − μ)²) = Var(X_i) = σ², E(X̄) = μ, and E((X̄ − μ)²) = Var(X̄) = σ²/n. Therefore

    E(S^2) = \frac{1}{n-1} E\left(\sum_{i=1}^{n} (X_i - \bar{X})^2\right) = \frac{1}{n-1}\left(n\sigma^2 - n\cdot\frac{\sigma^2}{n}\right) = \frac{(n-1)\sigma^2}{n-1} = \sigma^2
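This unbiasedness is easy to check numerically. Below is a minimal simulation sketch (not from the text; the population N(5, 4), the sample size n = 10, and the number of trials are arbitrary choices) comparing the n − 1 divisor with the n divisor:

```python
import numpy as np

# Simulation sketch: draw many samples of size n from N(mu, sigma^2) and
# average the two versions of the sample variance across samples.
rng = np.random.default_rng(0)
mu, sigma2, n, trials = 5.0, 4.0, 10, 100_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
s2_unbiased = samples.var(axis=1, ddof=1)  # divisor n - 1
s2_biased = samples.var(axis=1, ddof=0)    # divisor n

print(s2_unbiased.mean())  # close to sigma^2 = 4.0
print(s2_biased.mean())    # close to (n-1)/n * sigma^2 = 3.6
```

The n divisor systematically underestimates σ² by the factor (n − 1)/n, which is exactly the deficit the proof above accounts for.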
7.2.2 Minimum Variance Estimates (1/4)
• Which is the better of two unbiased point estimates?

[Figure: probability density functions of two unbiased estimates θ̂₁ and θ̂₂, with θ̂₂ more concentrated around θ]

7.2.2 Minimum Variance Estimates (2/4)
Since Var(θ̂₁) > Var(θ̂₂), θ̂₂ is a better point estimate than θ̂₁. This can be written

    P(|\hat{\theta}_1 - \theta| \le \delta) \le P(|\hat{\theta}_2 - \theta| \le \delta) \quad \text{for any value of } \delta > 0

[Figure: probability density functions of θ̂₁ and θ̂₂]

7.2.2 Minimum Variance Estimates (3/4)
• An unbiased point estimate whose variance is smaller than that of any other unbiased point estimate is called a minimum variance unbiased estimate (MVUE).
• Relative efficiency: the relative efficiency of an unbiased point estimate θ̂₁ to an unbiased point estimate θ̂₂ is

    \frac{Var(\hat{\theta}_2)}{Var(\hat{\theta}_1)}

• Mean squared error (MSE):

    MSE(\hat{\theta}) = E\big((\hat{\theta} - \theta)^2\big)

How is it decomposed? Why is it useful?

7.2.2 Minimum Variance Estimates (4/4)

[Figure: probability density functions of two biased estimates θ̂₁ and θ̂₂, indicating the bias of each]

Example: two independent measurements

    X_A \sim N(C, 2.97) \quad \text{and} \quad X_B \sim N(C, 1.62)

Point estimates of the unknown C are Ĉ_A = X_A and Ĉ_B = X_B. They are both unbiased estimates since E(Ĉ_A) = C and E(Ĉ_B) = C. The relative efficiency of Ĉ_A to Ĉ_B is

    \frac{Var(\hat{C}_B)}{Var(\hat{C}_A)} = \frac{1.62}{2.97} \approx 0.55

Now consider a new estimate Ĉ = pĈ_A + (1 − p)Ĉ_B. This estimate is unbiased since E(Ĉ) = pE(Ĉ_A) + (1 − p)E(Ĉ_B) = C. What is the optimal value of p that results in Ĉ having the smallest possible mean squared error (MSE)?

Writing Var(Ĉ_A) = σ₁² and Var(Ĉ_B) = σ₂², the variance of Ĉ is

    Var(\hat{C}) = p^2 Var(\hat{C}_A) + (1-p)^2 Var(\hat{C}_B) = p^2 \sigma_1^2 + (1-p)^2 \sigma_2^2

Differentiating with respect to p yields

    \frac{d\,Var(\hat{C})}{dp} = 2p\sigma_1^2 - 2(1-p)\sigma_2^2

Setting the derivative to zero, the value of p that minimizes Var(Ĉ) is

    p^* = \frac{1/\sigma_1^2}{1/\sigma_1^2 + 1/\sigma_2^2}

Therefore, in this example,

    p^* = \frac{1/2.97}{1/2.97 + 1/1.62} \approx 0.35

and the variance of Ĉ is

    Var(\hat{C}) = \frac{1}{1/2.97 + 1/1.62} \approx 1.05

The relative efficiency of Ĉ_B to Ĉ is

    \frac{Var(\hat{C})}{Var(\hat{C}_B)} = \frac{1.05}{1.62} \approx 0.65

In general, given n independent and unbiased estimates θ̂_i, i = 1, ..., n, with variances σᵢ², i = 1, ..., n, respectively, for a parameter θ, an unbiased combined estimator is

    \hat{\theta} = \frac{\sum_{i=1}^{n} \hat{\theta}_i / \sigma_i^2}{\sum_{i=1}^{n} 1/\sigma_i^2}

The variance of this estimator is

    Var(\hat{\theta}) = \frac{1}{\sum_{i=1}^{n} 1/\sigma_i^2}

Mean squared error (MSE): For a point estimate θ̂, the mean squared error is defined by MSE(θ̂) = E[(θ̂ − θ)²]. Moreover,

    MSE(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big]
    = E\big[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2\big]
    = E\big[(\hat{\theta} - E[\hat{\theta}])^2\big] + 2\,E\big[\hat{\theta} - E[\hat{\theta}]\big]\,(E[\hat{\theta}] - \theta) + (E[\hat{\theta}] - \theta)^2
    = Var(\hat{\theta}) + \text{bias}^2

since E[θ̂ − E[θ̂]] = 0, so the cross term vanishes.
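Returning to the two-measurement example above, the following short snippet (a sketch; the variable names are mine, the variances 2.97 and 1.62 are from the text) reproduces the optimal weight, the combined variance, and the relative efficiency:

```python
# Inverse-variance weighting: the optimal weight p on C_A that minimizes
# Var(p*C_A + (1 - p)*C_B), and the resulting combined variance.
var_a, var_b = 2.97, 1.62

p = (1 / var_a) / (1 / var_a + 1 / var_b)
var_c = 1 / (1 / var_a + 1 / var_b)

print(round(p, 2))              # 0.35
print(round(var_c, 2))          # 1.05
print(round(var_c / var_b, 2))  # 0.65, relative efficiency of C_B to C
```

Note that the combined variance 1.05 is smaller than either individual variance, which is the point of the general inverse-variance formula.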
7.3 Sampling Distributions

7.3.1 Sample Proportion (1/2)
If X ~ B(n, p), then the sample proportion p̂ = X/n has the approximate distribution

    \hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)

since

    E(\hat{p}) = p, \qquad Var(\hat{p}) = Var\left(\frac{X}{n}\right) = \frac{Var(X)}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}

7.3.1 Sample Proportion (2/2)
• Standard error of the sample proportion
The standard error of the sample proportion is defined as

    \text{s.e.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}

but since p is usually unknown, we replace it by the observed value p̂ = x/n to obtain

    \text{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{1}{n}\sqrt{\frac{x(n-x)}{n}}

7.3.2 Sample Mean (1/3)
• Distribution of the sample mean
If X_1, ..., X_n are observations from a population with mean μ and variance σ², then the central limit theorem gives the approximate distribution

    \hat{\mu} = \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)

7.3.2 Sample Mean (2/3)
• Standard error of the sample mean
The standard error of the sample mean is defined as

    \text{s.e.}(\bar{X}) = \frac{\sigma}{\sqrt{n}}

but since σ is usually unknown, we may safely replace it, when n is large, by the observed value s:

    \text{s.e.}(\bar{X}) = \frac{s}{\sqrt{n}}

7.3.2 Sample Mean (3/3)
If n = 20, the probability that μ̂ = X̄ lies within σ/4 of μ is

    P\left(\mu - \frac{\sigma}{4} \le \bar{X} \le \mu + \frac{\sigma}{4}\right)
    = P\left(-\frac{\sigma}{4} \le N\left(0, \frac{\sigma^2}{20}\right) \le \frac{\sigma}{4}\right)
    = P\left(-\frac{\sqrt{20}}{4} \le N(0,1) \le \frac{\sqrt{20}}{4}\right)
    = \Phi(1.12) - \Phi(-1.12) \approx 0.7372

[Figure: probability density function of μ̂ = X̄ when n = 20, with about 74% of the probability within μ ± σ/4]

7.3.3 Sample Variance (1/2)
• Distribution of the sample variance
If X_1, ..., X_n are normally distributed with mean μ and variance σ², then the sample variance S² has the distribution

    S^2 \sim \frac{\sigma^2\,\chi^2_{n-1}}{n-1}

Theorem: If X_i, i = 1, ..., n, is a sample from a normal population having mean μ and variance σ², then X̄ and S² are independent random variables, with X̄ being normal with mean μ and variance σ²/n and (n−1)S²/σ² being chi-square with n − 1 degrees of freedom.

(Proof sketch)
Let Y_i = X_i − μ. Then

    \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2

or equivalently

    \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2

Dividing this equation by σ², we get

    \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 + \left(\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma}\right)^2

Cf. Let X and Y be independent chi-square random variables with m and n degrees of freedom, respectively. Then Z = X + Y is a chi-square random variable with m + n degrees of freedom.

In the previous equation,

    \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n \quad \text{and} \quad \left(\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma}\right)^2 \sim \chi^2_1

Therefore

    \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}

7.3.3 Sample Variance (2/2)
• t-statistic
Since

    \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \Rightarrow \quad \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1)

and (n−1)S²/σ² ~ χ²_{n−1}, it follows that

    \frac{\sqrt{n}(\bar{X} - \mu)}{S} = \frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim t_{n-1}
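The t distribution of this statistic can be confirmed by simulation. The sketch below is not from the text; μ, σ, n, and the comparison point t = 2 are arbitrary choices:

```python
import numpy as np
from scipy import stats

# Simulation sketch: compute sqrt(n)*(Xbar - mu)/S over many normal samples
# and compare an empirical tail probability with the t_{n-1} distribution.
rng = np.random.default_rng(1)
mu, sigma, n, trials = 10.0, 3.0, 8, 200_000

samples = rng.normal(mu, sigma, size=(trials, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)
t_stat = np.sqrt(n) * (xbar - mu) / s

print(np.mean(t_stat > 2.0))      # empirical P(T > 2)
print(stats.t.sf(2.0, df=n - 1))  # theoretical P(T > 2), about 0.043
```

The two printed values should agree closely, and the match improves as the number of trials grows.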
7.4 Constructing Parameter Estimates

7.4.1 The Method of Moments (1/3)
• Method of moments point estimate for one parameter
If a data set of observations x_1, ..., x_n comes from a probability distribution that depends on one unknown parameter θ, the method of moments point estimate θ̂ of the parameter is found by solving the equation

    \bar{x} = E(X)

7.4.1 The Method of Moments (2/3)
• Method of moments point estimates for two parameters
For two unknown parameters θ₁ and θ₂, the method of moments point estimates are found by solving the pair of equations

    \bar{x} = E(X) \quad \text{and} \quad s^2 = Var(X)

7.4.1 The Method of Moments (3/3)
• Examples
– For normally distributed data, since E(X) = μ and Var(X) = σ² for N(μ, σ²), the method of moments gives μ̂ = x̄ and σ̂² = s².
– Suppose the data observations 2.0, 2.4, 3.1, 3.9, 4.5, 4.8, 5.7, 9.9 are obtained from a U(0, θ) distribution. Then x̄ = 4.5375 and E(X) = θ/2, so

    \hat{\theta} = 2 \times 4.5375 = 9.075

– What if the distribution is exponential with parameter λ?

7.4.2 Maximum Likelihood Estimates (1/4)
• Maximum likelihood estimate for one parameter
If a data set consists of observations x_1, ..., x_n from a probability distribution f(x; θ) depending on one unknown parameter θ, the maximum likelihood estimate θ̂ of the parameter is found by maximizing the likelihood function

    L(\theta; x_1, \ldots, x_n) = f(x_1; \theta) \cdots f(x_n; \theta)

7.4.2 Maximum Likelihood Estimates (2/4)
• Example
If x_1, ..., x_n are a set of Bernoulli observations, with f(1; p) = p and f(0; p) = 1 − p, i.e.

    f(x_i; p) = p^{x_i} (1-p)^{1-x_i}

then the likelihood function is

    L(p; x_1, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i} (1-p)^{1-x_i} = p^x (1-p)^{n-x}

where x = x_1 + ⋯ + x_n, and the MLE p̂ is the value that maximizes this. The log-likelihood is

    \ln L = x \ln p + (n - x) \ln(1 - p)

and

    \frac{d \ln L}{dp} = \frac{x}{p} - \frac{n-x}{1-p} = 0 \quad \Rightarrow \quad \hat{p} = \frac{x}{n}

7.4.2 Maximum Likelihood Estimates (3/4)
• Maximum likelihood estimates for two parameters
For two unknown parameters θ₁ and θ₂, the maximum likelihood estimates θ̂₁ and θ̂₂ are found by maximizing the likelihood function

    L(\theta_1, \theta_2; x_1, \ldots, x_n) = f(x_1; \theta_1, \theta_2) \cdots f(x_n; \theta_1, \theta_2)

7.4.2 Maximum Likelihood Estimates (4/4)
• Example
The normal distribution has density

    f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / 2\sigma^2}

so the likelihood is

    L(\mu, \sigma^2; x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\sum_{i=1}^{n} (x_i - \mu)^2 / 2\sigma^2\right)

and the log-likelihood is

    \ln L = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}

Then

    \frac{\partial \ln L}{\partial \mu} = \frac{\sum_{i=1}^{n}(x_i - \mu)}{\sigma^2}, \qquad \frac{\partial \ln L}{\partial (\sigma^2)} = -\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^4}

Setting both equations to zero gives

    \hat{\mu} = \bar{x}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(x_i - \hat{\mu})^2}{n} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}

7.4.3 Examples (1/6)
• Glass sheet flaws
At a glass manufacturing company, 30 randomly selected sheets of glass are inspected. If the distribution of the number of flaws per sheet is taken to be Poisson with parameter λ, how should λ be estimated?
– Method of moments: E(X) = λ, so λ̂ = x̄.

7.4.3 Examples (2/6)
• Maximum likelihood estimate:

    f(x_i; \lambda) = \frac{e^{-\lambda} \lambda^{x_i}}{x_i!}

so the likelihood is

    L(\lambda; x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \lambda) = \frac{e^{-n\lambda} \lambda^{x_1 + \cdots + x_n}}{x_1! \cdots x_n!}

The log-likelihood is therefore

    \ln L = -n\lambda + (x_1 + \cdots + x_n)\ln\lambda - \ln(x_1! \cdots x_n!)

so that

    \frac{d \ln L}{d\lambda} = -n + \frac{x_1 + \cdots + x_n}{\lambda} = 0 \quad \Rightarrow \quad \hat{\lambda} = \bar{x}

7.4.3 Examples (3/6)
• Example 26: Fish tagging and recapture
Suppose a fisherman wants to estimate the fish stock N of a lake, and 34 fish have been tagged and released back into the lake. If, over a period of time, the fisherman catches 50 fish and 9 of them are tagged, then an intuitive estimate of the total number of fish is

    \hat{N} = \frac{34 \times 50}{9} \approx 189

This assumes that the proportion of tagged fish in the lake is roughly equal to the proportion of the fisherman's catch that is tagged.

7.4.3 Examples (4/6)
Under the assumption that all the fish are equally likely to be caught, the distribution of the number of tagged fish X in the fisherman's catch of 50 fish is hypergeometric with r = 34, n = 50, and N unknown:

    P(X = x \mid N, n, r) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}, \qquad E(X) = \frac{nr}{N} = \frac{50 \times 34}{N}

Since there is only one observation x, we have x̄ = x, and equating E(X) = x gives

    \hat{N} = \frac{50 \times 34}{9} \approx 188.89

7.4.3 Examples (5/6)
Under a binomial approximation, the success probability p = r/N is estimated to be

    \hat{p} = \frac{x}{n} = \frac{9}{50}, \quad \text{hence} \quad \hat{N} = \frac{r}{\hat{p}} = \frac{34 \times 50}{9}

• Example 36: Bee colonies
The data on the proportion of worker bees that leave a colony with a queen bee are
0.28 0.32 0.09 0.35 0.45 0.41 0.06 0.16 0.16 0.46 0.35 0.29 0.31
If an entomologist wishes to model this proportion with a beta distribution, how should the parameters be estimated?

7.4.3 Examples (6/6)
The beta distribution has

    f(x \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1} (1-x)^{b-1}, \quad 0 \le x \le 1, \; a > 0, \; b > 0

    E(X) = \frac{a}{a+b}, \qquad Var(X) = \frac{ab}{(a+b)^2 (a+b+1)}

Here x̄ = 0.3007 and s² = 0.01966, so the point estimates â and b̂ are the solutions to the equations

    \frac{a}{a+b} = 0.3007 \quad \text{and} \quad \frac{ab}{(a+b)^2 (a+b+1)} = 0.01966

which are â = 2.92 and b̂ = 6.78.

MLE for U(0, θ)
• For some distributions, the MLE cannot be found by differentiation; you have to look at the curve of the likelihood function itself.
• The MLE of θ is θ̂ = max{X_1, ..., X_n}, since the likelihood L(θ) = θ^{−n} is decreasing in θ and is zero for θ < max{X_1, ..., X_n}.
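The sketch below illustrates this using the U(0, θ) data from Section 7.4.1 (the grid range and resolution are arbitrary choices): evaluating the likelihood on a grid shows it peaks at the sample maximum rather than at a stationary point.

```python
import numpy as np

# For U(0, theta), L(theta) = theta**(-n) when theta >= max(x), and 0
# otherwise, so the likelihood jumps up at max(x) and then decreases.
x = np.array([2.0, 2.4, 3.1, 3.9, 4.5, 4.8, 5.7, 9.9])
n = len(x)

def likelihood(theta):
    return theta ** (-n) if theta >= x.max() else 0.0

grid = np.linspace(8.0, 14.0, 601)
L = np.array([likelihood(t) for t in grid])

print(grid[L.argmax()])  # about 9.9 = max(x), up to the grid resolution
print(x.max())           # the MLE itself: 9.9
print(2 * x.mean())      # 9.075, the method of moments estimate, for contrast
```

Note that here the MLE (9.9) and the method of moments estimate (9.075) differ, and the MLE of θ can never be smaller than the largest observation.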