CHAPTER: POINT ESTIMATION AND CONFIDENCE INTERVALS

Contents
0 Introduction
1 Point Estimation and Unbiasedness
2 Unbiased Estimators for Population Mean and Variance
3 Unbiased Estimator for Population Proportion
4 Confidence Interval for Population Mean, $\mu$, when Population is Normal and $\sigma^2$ known
5 Confidence Interval for Population Mean, $\mu$, when Population is Normal and $\sigma^2$ unknown
6 Confidence Interval for Population Proportion
7 Miscellaneous Examples

0 Introduction

Suppose that a population has an unknown parameter, such as the mean, the variance, or the proportion of 'successes'. An estimate of the unknown parameter can then be made from the information supplied by a random sample (or samples) taken from the population.

A statistic used to estimate the value of a parameter is called an estimator and is denoted by a capital letter (e.g. U, T, ...). The numerical value taken by the estimator in a particular instance is called an estimate and is denoted by a small letter (e.g. u, t, ...).

For example, a coin is known to be biased but the probability of obtaining a head, $p$, is unknown. If this coin is tossed 100 times, we know that $X \sim \mathrm{Bin}(100, p)$, where $X$ is the random variable 'the number of heads in 100 tosses'. If in this experiment we observed 65 heads, then we could estimate $p$ by $\frac{65}{100} = 0.65$. In general, if $x$ heads are observed, we can use $\frac{x}{100}$ as an estimate of $p$.

1 Point Estimation and Unbiasedness

An estimate of a population parameter given by a single number is called a point estimate.

An estimator is unbiased if the expected value of its sampling distribution equals the parameter it is estimating.

For example, $\bar{X} = \dfrac{X_1 + X_2 + \dots + X_n}{n}$ is an unbiased estimator of the population mean $\mu$ because $E(\bar{X}) = \mu$.

We could form many estimators. The most efficient estimator is one which
a) is unbiased, and
b) has the smallest variance, i.e. as $n \to \infty$, $\mathrm{Var}(\bar{X}) \to 0$.
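The two properties above can be illustrated with a small simulation of the biased-coin experiment. This is only a sketch under stated assumptions: the value `TRUE_P = 0.65`, the number of repetitions, the seed, and the function names are all hypothetical choices made for illustration, not part of the notes. Each toss is treated as a 0/1 observation, so the proportion of heads is a sample mean.

```python
import random
from statistics import fmean, pvariance

TRUE_P = 0.65  # hypothetical value, assumed only for this illustration


def sample_proportion(n, p, rng):
    """Toss a coin with P(head) = p a total of n times; return heads / n."""
    heads = sum(1 for _ in range(n) if rng.random() < p)
    return heads / n


def simulate(n, p, repeats=2000, seed=0):
    """Repeat the n-toss experiment and summarise the resulting estimates.

    Returns (average of the estimates, variance of the estimates).
    """
    rng = random.Random(seed)
    estimates = [sample_proportion(n, p, rng) for _ in range(repeats)]
    return fmean(estimates), pvariance(estimates)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        average, spread = simulate(n, TRUE_P)
        print(f"n={n:5d}  average estimate={average:.3f}  variance={spread:.5f}")
```

If the estimator is unbiased, the average estimate should stay close to the assumed value of $p$ for every $n$, while the variance of the estimates should shrink as $n$ grows.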
Notations for         Mean          Variance           Proportion
Population            $\mu$         $\sigma^2$         $p$
Sample                $\bar{x}$     $s^2$              $p_s$
Unbiased estimator    $\hat{\mu}$   $\hat{\sigma}^2$   $\hat{p}$

Example 1.1
An educational psychologist at a major university wants to estimate the mean IQ of the students. To this end, n = 30 students are to be randomly selected and given an IQ test. Suppose the IQ scores of the students selected are as follows:

107  99 101  93  99 103 134 132 103 109
104 103 101 128 113 106 126 103 131 106
119 102  98 116 108 103 111 119 112 105

Compute $\bar{x}$. [Note: $\sum x = 3294$] [109.8]
Solution

Example 1.2
If $X_1, X_2, X_3$ is a random sample taken from a population with mean $\mu$ and variance $\sigma^2$, find which of the following estimators for $\mu$ are unbiased, and which is the most efficient of these.

$$T_1 = \frac{X_1 + X_2 + X_3}{3}, \qquad T_2 = \frac{X_1 + 2X_2}{3}, \qquad T_3 = \frac{X_1 + 2X_2 + 3X_3}{3}$$

[$T_1$, $T_2$; $T_1$]
Solution

2 Unbiased Estimators for Population Mean and Variance

From a population with unknown mean $\mu$ and unknown variance $\sigma^2$, take a random sample of size $n$.

Then the most efficient unbiased estimator for the population mean $\mu$ is

$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad\text{or}\quad a + \frac{\sum (x_i - a)}{n},$$

and the most efficient unbiased estimator for the population variance $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{n s^2}{n-1},$$

where $a$ is any constant and

$$s^2 = \frac{\sum (x_i - a)^2}{n} - \left(\frac{\sum (x_i - a)}{n}\right)^2 \quad\text{or}\quad \frac{1}{n}\sum (x_i - \bar{x})^2 \quad\text{or}\quad \frac{1}{n}\sum x^2 - (\bar{x})^2 \quad\text{or}\quad \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 .$$

Hence, it can also be written as $\hat{\sigma}^2 = \dfrac{1}{n-1}\sum (x_i - \bar{x})^2$.

Example 2.1 (YJC 96/2/10a modified)
A random sample of 100 observations of a random variable X gives the following results: $\sum (x - 49) = -25$ and $\sum (x - 49)^2 = 195$. Calculate the unbiased estimate of the population variance and population mean.
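As a numerical check of the coded-sum formulas of this section on the data of Example 2.1, the following minimal sketch may help. The function name and its parameters are hypothetical, introduced only for this illustration; `a` is the constant subtracted from each observation.

```python
def unbiased_from_coded_sums(n, a, sum_d, sum_d2):
    """Estimate the population mean and variance from coded sums.

    sum_d  is the sum of (x - a) over the sample,
    sum_d2 is the sum of (x - a)^2 over the sample.
    Returns (estimate of mean, unbiased estimate of variance).
    """
    mean = a + sum_d / n
    sample_var = sum_d2 / n - (sum_d / n) ** 2  # s^2, the sample variance
    return mean, n * sample_var / (n - 1)       # n s^2 / (n - 1)


mean_hat, var_hat = unbiased_from_coded_sums(n=100, a=49, sum_d=-25, sum_d2=195)
print(round(mean_hat, 2), round(var_hat, 2))  # prints: 48.75 1.91
```

The same function applies to part d) of Example 2.2 by passing the coded sums about 300.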
[48.8, 1.91]
Solution

Example 2.2
Find an unbiased estimate of the population mean and variance from which each of the following samples is drawn:
a) 35, 42, 38, 55, 70, 69
b) $\sum x = 120$, $\sum (x - \bar{x})^2 = 302$, $n = 8$
c) $\sum x = 120$, $\sum x^2 = 2102$, $n = 8$
d) $\sum (t - 300) = 2012$, $\sum (t - 300)^2 = 525\,262$, $n = 200$
[51.5, 241, 15.0, 43.1, 15.0, 43.1, 310, 2537]
Solution

3 Unbiased Estimator for Population Proportion

Take a binomial population in which an unknown $p$ is the proportion of successes. A random sample of size $n$ is taken.

Let $X$ be the random variable 'the number of successes obtained from a sample of size $n$', and $P_s$ be the random variable 'the proportion of successes in the sample'.

Then an unbiased estimator for the population proportion $p$, denoted $\hat{p}$, is $P_s$, where $P_s = \dfrac{X}{n}$, i.e. $E(P_s) = p$.

Note: with $q = 1 - p$,

$$E(P_s) = E\!\left(\frac{X}{n}\right) = \frac{E(X)}{n} = \frac{np}{n} = p$$

$$\mathrm{Var}(P_s) = \mathrm{Var}\!\left(\frac{X}{n}\right) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{npq}{n^2} = \frac{pq}{n}$$

When $n \to \infty$, $\mathrm{Var}(P_s) \to 0$. Hence $P_s$ is an efficient estimator.

Example 3.1
A random sample of 50 children from a large school is chosen and the number who are left handed is noted. It is found that 6 are left handed. Obtain an unbiased estimate of the proportion of children in the school who are left handed. [0.12]
Solution

Example 3.2
A drawing pin is tossed 100 times. It lands 'point-up' 64 times. Obtain an unbiased estimate of the proportion of 'point-up' tosses. [0.64]
Solution

4 Confidence Interval for Population Mean, $\mu$, when Population is Normal and $\sigma^2$ known

When we have a population with unknown parameters (such as the mean, the variance, etc.), we can estimate these parameters by using either point estimates (discussed above) or interval estimates.

A confidence-interval estimate of an unknown population parameter is a random interval constructed so that it has a given probability of including the parameter.

Consider a population with unknown parameter $\theta$. If we can find an interval $(a, b)$ such that $P(a < \theta < b) = 0.95$, we say that $(a, b)$ is a 95% confidence interval for $\theta$.
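In this case, 0.95 is the probability that the interval includes $\theta$. (It is not the probability that $\theta$ lies in the interval: $\theta$ is a fixed value, and it is the interval that varies from sample to sample.) The values $a$ and $b$ are the 95% confidence limits.

Note: In interval estimation
a) The shorter the length of the interval $(a, b)$, the better the estimation.
b) The higher the confidence level, the better the estimation.
c) The end values of the interval, $a$ and $b$, are called the confidence limits.

Consider a normal population with mean $\mu$ and variance $\sigma^2$, i.e. $X \sim N(\mu, \sigma^2)$.

Now take a random sample of size $n$ ($n$ can be big or small) from the population, $X_1, X_2, \dots, X_n$, and consider the distribution of the sample mean $\bar{X}$, where $\bar{X} = \dfrac{1}{n}\sum X_i$.

Since $X$ is normally distributed, $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$.

Standardising, we have $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, where $Z \sim N(0, 1)$.

From the standard normal table, we know that the central 95% of $N(0, 1)$ lies between the values $\pm 1.96$.

[Figure: standard normal curve N(0, 1) with the central 95% area lying between -1.96 and 1.96]

So,

$$P(-1.96 \le Z \le 1.96) = 0.95$$

$$P\!\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$

$$P\!\left(-1.96\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

$$P\!\left(-1.96\frac{\sigma}{\sqrt{n}} - \bar{X} \le -\mu \le 1.96\frac{\sigma}{\sqrt{n}} - \bar{X}\right) = 0.95$$

$$P\!\left(1.96\frac{\sigma}{\sqrt{n}} + \bar{X} \ge \mu \ge \bar{X} - 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95 \quad\text{(multiplying by } -1\text{)}$$

$$P\!\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95 \quad\text{(rearranging)}$$

So we have found an interval $\left(\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$ such that the probability that the interval includes $\mu$ is 0.95. This is called the 95% confidence interval for $\mu$.

Similarly, a 90% confidence interval for $\mu$ is given by $\left(\bar{x} - 1.645\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.645\dfrac{\sigma}{\sqrt{n}}\right)$ or $\left(\bar{x} \pm 1.645\dfrac{\sigma}{\sqrt{n}}\right)$,

a 98% confidence interval for $\mu$ is given by $\left(\bar{x} - 2.326\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 2.326\dfrac{\sigma}{\sqrt{n}}\right)$ or $\left(\bar{x} \pm 2.326\dfrac{\sigma}{\sqrt{n}}\right)$,

and a 99% confidence interval for $\mu$ is given by $\left(\bar{x} - 2.575\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 2.575\dfrac{\sigma}{\sqrt{n}}\right)$ or $\left(\bar{x} \pm 2.575\dfrac{\sigma}{\sqrt{n}}\right)$.

Note: If the population is not normally distributed, then we require $n$ to be large ($n \ge 30$) for the result to be used. This is because, by the Central Limit Theorem, if $X$ is not normally distributed then $\bar{X}$ is approximately normally distributed, i.e. $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ approximately, only when $n$ is large.

Example 4.1
A random sample of six items with sample mean 12.45 cm is taken from a normal population with variance 4.5 cm$^2$.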
Find the 95% confidence interval for the population mean $\mu$. [(10.8, 14.2)]
Solution

Example 4.2
On the basis of the results obtained from a random sample of 100 men from a particular district, the 95% confidence interval for the mean height of the men in the district is found to be (177.22 cm, 179.18 cm). Find the value of $\bar{x}$, the mean of the sample, and $\sigma$, the standard deviation of the normal population from which the sample is drawn. Calculate the 98% confidence interval for the mean height. [178.2, 5, (177.04, 179.36)]
Solution

Example 4.3
A machine produces washers whose diameter has a standard deviation of 0.04 cm. In order to find the mean diameter of the washers produced, a random sample of 9 washers is taken, whose mean diameter is found to be 3.14 cm. Calculate symmetric a) 95% b) 98% confidence intervals for the mean diameter of washers produced by the machine. [(3.114 cm, 3.166 cm), (3.109 cm, 3.171 cm)]
Solution

5 Confidence Interval for Population Mean, $\mu$, when Population is Normal and $\sigma^2$ unknown

When the population is normally distributed and $\sigma^2$ is unknown, it is necessary to use an unbiased estimator, $\hat{\sigma}^2 = \dfrac{n s^2}{n-1}$, where $s^2$ is the sample variance.

Then a 95% confidence interval for $\mu$ is given by

$$\left(\bar{x} \pm 1.96\frac{\hat{\sigma}}{\sqrt{n}}\right) = \left(\bar{x} \pm 1.96\,\frac{1}{\sqrt{n}}\sqrt{\frac{n s^2}{n-1}}\right) = \left(\bar{x} \pm 1.96\frac{s}{\sqrt{n-1}}\right).$$

Similarly, a 98% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 2.326\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$ or $\left(\bar{x} \pm 2.326\dfrac{s}{\sqrt{n-1}}\right)$,

and a 99% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 2.575\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$ or $\left(\bar{x} \pm 2.575\dfrac{s}{\sqrt{n-1}}\right)$.

These intervals use standard normal critical values, which is appropriate when $n$ is large. For a small sample from a normal population with unknown variance, the critical value should instead be taken from the t-distribution with $n - 1$ degrees of freedom.

Note: When $n$ is large ($n \ge 30$), $\dfrac{n}{n-1} \approx 1$. So $\hat{\sigma}^2 \approx s^2$ and $\hat{\sigma} \approx s$. Then a 95% confidence interval for $\mu$ can be given approximately by $\left(\bar{x} \pm 1.96\dfrac{s}{\sqrt{n}}\right)$.

Example 5.1
A random sample of 120 measurements taken from a normal population gave the following data: $n = 120$, $\sum x = 1008$, $\sum (x - \bar{x})^2 = 172.8$. Find a) a 97% and b) a 99% confidence interval for the population mean $\mu$.
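The interval $\bar{x} \pm z\,\hat{\sigma}/\sqrt{n}$ of this section can be evaluated for the data of Example 5.1 with the sketch below. It is a minimal illustration under stated assumptions: the function name and parameters are hypothetical, and the critical value $z$ is computed with `statistics.NormalDist().inv_cdf` from the Python standard library rather than read from the standard normal table used in the notes, so the third decimal of $z$ may differ slightly from tabulated values.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_unknown_variance(n, sum_x, sum_sq_dev, level):
    """Central confidence interval for the mean, large-sample normal approach.

    sum_x      is the sum of the observations,
    sum_sq_dev is the sum of (x - x_bar)^2,
    level      is the confidence level as a fraction, e.g. 0.97.
    """
    x_bar = sum_x / n
    var_hat = sum_sq_dev / (n - 1)             # unbiased estimate of sigma^2
    z = NormalDist().inv_cdf(0.5 + level / 2)  # central critical value
    half_width = z * sqrt(var_hat / n)
    return x_bar - half_width, x_bar + half_width


for level in (0.97, 0.99):
    low, high = mean_ci_unknown_variance(120, 1008, 172.8, level)
    print(f"{level:.0%}: ({low:.2f}, {high:.2f})")
```

Because $n = 120$ is large here, using the normal critical value in place of a t value is the approximation described in the Note above.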
[(8.16, 8.64), (8.12, 8.68)]
Solution

6 Confidence Interval for the Population Proportion ($n \ge 30$)

Consider a binomial population where $p$, the proportion of 'successes' in the population, is unknown.

Take a random sample of size $n$ from the population. Let $P_s$ be the random variable 'the proportion of successes in the sample'.

Then $P_s \sim N\!\left(p, \dfrac{pq}{n}\right)$, where $q = 1 - p$. Now, as $p$ is unknown, we use an estimator for it.

An unbiased estimator for $p$ is $p_s$. So, assume that an estimator for $\dfrac{pq}{n}$ is $\dfrac{p_s q_s}{n}$, where $q_s = 1 - p_s$.

Since $n$ is large ($n \ge 30$), by the Central Limit Theorem, we have $P_s \sim N\!\left(p, \dfrac{p_s q_s}{n}\right)$ approximately.

Standardising, we have $Z = \dfrac{P_s - p}{\sqrt{p_s q_s / n}}$, where $Z \sim N(0, 1)$.

From the standard normal table, $P(-1.96 < Z < 1.96) = 0.95$, so

$$P\!\left(-1.96 < \frac{p_s - p}{\sqrt{p_s q_s / n}} < 1.96\right) = 0.95.$$

Rewriting,

$$P\!\left(p_s - 1.96\sqrt{\frac{p_s q_s}{n}} < p < p_s + 1.96\sqrt{\frac{p_s q_s}{n}}\right) = 0.95.$$

If, in a random sample of size $n$ ($n \ge 30$), the proportion with a particular property is $p_s$, the 95% confidence interval for the population proportion $p$ is given by

$$\left(p_s - 1.96\sqrt{\frac{p_s q_s}{n}},\ p_s + 1.96\sqrt{\frac{p_s q_s}{n}}\right), \quad\text{where } q_s = 1 - p_s.$$

This can be written as $\left(p_s \pm 1.96\sqrt{\dfrac{p_s q_s}{n}}\right)$.

Similarly, a 98% confidence interval for $p$ is $\left(p_s \pm 2.326\sqrt{\dfrac{p_s q_s}{n}}\right)$

and a 99% confidence interval for $p$ is $\left(p_s \pm 2.575\sqrt{\dfrac{p_s q_s}{n}}\right)$.

Example 6.1
A manufacturer wants to assess the proportion of defective items in a large batch produced by a particular machine. He tests a random sample of 300 items and finds that 45 are defective. Calculate a) a 95% b) a 98% confidence interval for the proportion of defective items in a complete batch. [(0.110, 0.190), (0.101, 0.198)]
Solution

Example 6.2
In a sample of 400 carpet shops taken in 1998, it was discovered that 136 of them sold carpets at below the list prices which had been recommended by manufacturers.
a) Estimate the percentage of all carpet-selling shops selling below list prices.
b) Calculate the 95% confidence interval for this estimate, and explain briefly what it means.
c) What size of sample would have to be taken in order to estimate the percentage to within 2%?
[34%, (29.4%, 38.6%), 2156]
Solution

Example 6.3
An opinion poll is taken as to how an electorate of 20 million will vote in a forthcoming referendum. Out of a random sample of 100, 40 say 'yes' and 60 say 'no'. What is the 95% confidence interval for the proportion who will vote 'yes'? [between 30.4% and 49.6%]
Solution

7 Miscellaneous Examples

Example 7.1
The weights, x kg, of a random sample of 100 fifteen-year-old girls from a school were taken and the data obtained is summarised by $\sum (x - 40) = 82$ and $\sum (x - 40)^2 = 362$.
a) Calculate the unbiased estimate of the population variance.
b) Construct a symmetric 98% confidence interval for the population mean.
c) Fifty schools are taken and the symmetric 98% confidence interval for the population mean is determined for each school. Find the expected number of these intervals that would contain the population mean.
[2.977, (40.4, 41.2), 49]
Solution

SUMMARY

Notations for         Mean          Variance              Proportion
Population            $\mu$         $\sigma^2$            $p$
Sample                $\bar{x}$     $s^2$                 $p_s$
Unbiased estimator    $\hat{\mu}$   $\hat{\sigma}^2$      $\hat{p}$
Estimator formula     $\bar{x}$     $\dfrac{n s^2}{n-1}$  $p_s$

Formulas:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad\text{or}\quad a + \frac{\sum (x_i - a)}{n}$$

$$s^2 = \frac{\sum (x_i - a)^2}{n} - \left(\frac{\sum (x_i - a)}{n}\right)^2 \quad\text{or}\quad \frac{1}{n}\sum (x_i - \bar{x})^2 \quad\text{or}\quad \frac{1}{n}\sum x^2 - (\bar{x})^2 \quad\text{or}\quad \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$

If $\bar{X}$ is the mean of a random sample of size $n$ taken from a normal population with known variance $\sigma^2$:
A central 90% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 1.645\dfrac{\sigma}{\sqrt{n}}\right)$.
A central 95% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}\right)$.
A central 98% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 2.326\dfrac{\sigma}{\sqrt{n}}\right)$.
A central 99% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 2.575\dfrac{\sigma}{\sqrt{n}}\right)$.

If $\bar{X}$ and $s^2$ are the mean and variance of a random sample of size $n$ from a normal population with unknown variance $\sigma^2$, then a central 95% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 1.96\dfrac{s}{\sqrt{n-1}}\right)$.

If, in a random sample of size $n$ ($n \ge 30$), the proportion with a particular property is $p_s$, the 95% confidence interval for the population proportion $p$ is $\left(p_s \pm 1.96\sqrt{\dfrac{p_s q_s}{n}}\right)$.
Similarly, a 98% confidence interval for $p$ is $\left(p_s \pm 2.326\sqrt{\dfrac{p_s q_s}{n}}\right)$

and a 99% confidence interval for $p$ is $\left(p_s \pm 2.575\sqrt{\dfrac{p_s q_s}{n}}\right)$.

Must it be a Polar Bear?

There is a familiar story about a hunter who travelled one mile south, one mile east and one mile north, ended up in the same spot where he started, and shot a bear. The colour of the bear he shot was white, because he could only have been at the North Pole. But with more thought, there are actually many places, in fact infinitely many, where one can travel one mile south, one mile east and one mile north and yet end up at the starting point. Can you find out where these places are? [This is not a trick question. It can actually be proven mathematically.]

Puzzle taken from lectures by Prof Tan Eng Chye, NUS Maths Dept