Chapter 9
Estimation Using a Single Sample

Point Estimation

A point estimate of a population characteristic is a single number that is based on sample data and represents a plausible value of the characteristic.

Copyright (c) 2001 Brooks/Cole, a division of Thomson Learning, Inc.

Example

A sample of 200 students at a large university is selected to estimate the proportion of students who wear contact lenses. In this sample, 47 wear contact lenses.

Let p = the true proportion of all students at this university who wear contact lenses. Count a student who wears contact lenses as a "success."

The statistic

$$\hat{p} = \frac{\text{number of successes in the sample}}{n}$$

is a reasonable choice of formula for obtaining a point estimate of p. Here the point estimate is

$$\hat{p} = \frac{47}{200} = 0.235$$

Example

A sample of the weights of 34 male freshman students was obtained.

185 170 188 184 161 151 170 179 174 176 207 155
175 197 180 148 202 214 167 180 178 283 177 194
202 184 166 176 139 189 231 177 168 176

To estimate the true mean weight of all male freshman students, you might use the sample mean as a point estimate:

sample mean = $\bar{x} = 182.44$

Example

After looking at a histogram and boxplot of the data, you might notice that the data seem reasonably symmetric with an outlier, so you might use either the sample median or a sample trimmed mean as a point estimate.

[Figure: histogram and boxplot of the weights; horizontal axis from 140 to 260]

5% trimmed mean (calculated with Minitab) = 180.07

sample median = $\dfrac{177 + 178}{2} = 177.5$

Bias

A statistic whose mean value equals the value of the population characteristic being estimated is said to be an unbiased statistic. A statistic that is not unbiased is said to be biased.
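The point estimates in the examples above can be checked with a few lines of Python. This is only a sketch using the standard library. The function names are hypothetical, and the trimming rule (drop 5% of n from each end, rounded to the nearest integer) is an assumption made here because it reproduces the Minitab figure quoted in the example.

```python
from statistics import mean, median

# Weights of the 34 male freshman students from the example.
weights = [
    185, 170, 188, 184, 161, 151, 170, 179, 174, 176, 207, 155,
    175, 197, 180, 148, 202, 214, 167, 180, 178, 283, 177, 194,
    202, 184, 166, 176, 139, 189, 231, 177, 168, 176,
]


def sample_proportion(successes, n):
    """Point estimate of a population proportion."""
    return successes / n


def trimmed_mean(data, fraction=0.05):
    """Mean after dropping `fraction` of the values from each end.

    Assumption: the number dropped per end is fraction * n,
    rounded to the nearest integer.
    """
    k = round(fraction * len(data))
    ordered = sorted(data)
    kept = ordered[k:len(ordered) - k]
    return mean(kept)


p_hat = sample_proportion(47, 200)   # contact lens example
x_bar = mean(weights)                # sample mean
x_med = median(weights)              # sample median
x_trim = trimmed_mean(weights)       # 5% trimmed mean

print(p_hat, round(x_bar, 2), x_med, round(x_trim, 2))
```

The outlier at 283 pulls the sample mean above both the trimmed mean and the median, which is why the more resistant estimates are worth considering for these data.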
Criteria

Given a choice among several unbiased statistics that could be used for estimating a population characteristic, the best statistic to use is the one with the smallest standard deviation.

Large-Sample Confidence Interval for a Population Proportion

A confidence interval for a population characteristic is an interval of plausible values for the characteristic. It is constructed so that, with a chosen degree of confidence, the value of the characteristic will be captured inside the interval.

Confidence Level

The confidence level associated with a confidence interval estimate is the success rate of the method used to construct the interval.

Recall

For the sampling distribution of $\hat{p}$,

$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

and for large* n the sampling distribution of $\hat{p}$ is approximately normal.

Specifically, when n is large*, the statistic $\hat{p}$ has a sampling distribution that is approximately normal with mean p and standard deviation $\sqrt{p(1-p)/n}$.

* $np \ge 10$ and $n(1-p) \ge 10$

Some considerations

Approximately 95% of all large samples will result in a value of $\hat{p}$ that is within

$$1.96\,\sigma_{\hat{p}} = 1.96\sqrt{\frac{p(1-p)}{n}}$$

of the true population proportion p.

Some considerations

Equivalently, this means that for 95% of all possible samples, p will be in the interval

$$\hat{p} - 1.96\sqrt{\frac{p(1-p)}{n}} \quad\text{to}\quad \hat{p} + 1.96\sqrt{\frac{p(1-p)}{n}}$$

Since p is unknown and n is large, we estimate $\sqrt{p(1-p)/n}$ with $\sqrt{\hat{p}(1-\hat{p})/n}$.

This interval can be used as long as $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$.
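The 95% claim above can be checked by simulation. The sketch below is illustrative only: the population proportion p = 0.3, the sample size n = 200, the number of repetitions, and the random seed are arbitrary assumptions, not values from the slides, and the function name is hypothetical.

```python
import math
import random


def fraction_within_bound(p, n, trials, seed=0):
    """Fraction of simulated samples whose p-hat lies within
    1.96 * sqrt(p(1 - p)/n) of the true proportion p."""
    rng = random.Random(seed)
    bound = 1.96 * math.sqrt(p * (1 - p) / n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        if abs(p_hat - p) <= bound:
            hits += 1
    return hits / trials


# Hypothetical settings; np = 60 and n(1 - p) = 140 both exceed 10.
coverage = fraction_within_bound(p=0.3, n=200, trials=2000)
print(f"fraction within bound: {coverage:.3f}")
```

The printed fraction should land close to 0.95, though not exactly on it, because the number of successes is discrete and the normal distribution is only an approximation.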
The 95% Confidence Interval

When n is large, a 95% confidence interval for p is

$$\left( \hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \ \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right)$$

The endpoints of the interval are often abbreviated as

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where - gives the lower endpoint and + the upper endpoint.

Example

For a project, a student randomly sampled 182 other students at a large university to determine whether a majority of students favored a proposal to build a field house. He found that 75 were in favor of the proposal.

Let p = the true proportion of students who favor the proposal.

Example - continued

$$\hat{p} = \frac{75}{182} = 0.4121$$

Since $n\hat{p} = 182(0.4121) = 75 > 10$ and $n(1-\hat{p}) = 182(0.5879) = 107 > 10$, we can use the formula given on the previous slide to find a 95% confidence interval for p.

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.4121 \pm 1.96\sqrt{\frac{0.4121(0.5879)}{182}} = 0.4121 \pm 0.07151$$

The 95% confidence interval for p is (0.341, 0.484).

The General Confidence Interval

The general formula for a confidence interval for a population proportion p when

1. $\hat{p}$ is the sample proportion from a random sample, and
2. the sample size n is large ($n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$)

is

$$\hat{p} \pm (z\text{ critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Finding a z Critical Value

Finding a z critical value for a 98% confidence interval: a central area of 0.98 leaves 0.01 in each tail, so the cumulative area to look up is 0.9900. Looking up the cumulative area of 0.9900 in the body of the table, we find z = 2.33.

Some Common Critical Values

Confidence level    z critical value
80%                 1.28
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.58
99.8%               3.09
99.9%               3.29

Terminology

The standard error of a statistic is the estimated standard deviation of the statistic.
For sample proportions, the standard deviation is

$$\sqrt{\frac{p(1-p)}{n}}$$

This means that the standard error of the sample proportion is

$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Terminology

The bound on error of estimation, B, associated with a 95% confidence interval is (1.96)·(standard error of the statistic).

The bound on error of estimation, B, associated with a confidence interval is (z critical value)·(standard error of the statistic).

Sample Size

The sample size required to estimate a population proportion p to within an amount B with 95% confidence is

$$n = p(1-p)\left(\frac{1.96}{B}\right)^2$$

The value of p may be estimated from prior information. If no prior information is available, use p = 0.5 in the formula to obtain a conservatively large value for n.

Generally one rounds the result up to the nearest integer.

Sample Size Calculation Example

A TV executive would like a 95% confidence interval estimate, accurate to within 0.03, for the proportion of all households that watch NYPD Blue regularly. How large a sample is needed if a prior estimate for p is 0.15?

We have B = 0.03 and the prior estimate p = 0.15.

$$n = p(1-p)\left(\frac{1.96}{B}\right)^2 = (0.15)(0.85)\left(\frac{1.96}{0.03}\right)^2 = 544.2$$

A sample of 545 or more would be needed.

Sample Size Calculation Example Revisited

Suppose a TV executive would like a 95% confidence interval estimate, accurate to within 0.03, for the proportion of all households that watch NYPD Blue regularly. How large a sample is needed if we have no reasonable prior estimate for p?

We have B = 0.03 and should use p = 0.5 in the formula.

$$n = p(1-p)\left(\frac{1.96}{B}\right)^2 = (0.5)(0.5)\left(\frac{1.96}{0.03}\right)^2 = 1067.1$$

The required sample size is now 1068.

Notice that a reasonable ballpark estimate for p can lower the needed sample size.
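The interval, bound on error, and sample-size formulas for a proportion can be collected into a short Python sketch. The function names are hypothetical, chosen here for illustration, and the z critical value is passed in from the table of common critical values rather than computed.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Large-sample interval p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    # Large-sample check from the slides, applied with p_hat.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("sample too small for the large-sample interval")
    bound = z * math.sqrt(p_hat * (1 - p_hat) / n)  # bound on error, B
    return p_hat - bound, p_hat + bound


def required_sample_size(B, p=0.5, z=1.96):
    """n = p(1 - p)(z / B)^2, rounded up to the next integer.

    p = 0.5 is the conservative choice when no prior estimate exists.
    """
    return math.ceil(p * (1 - p) * (z / B) ** 2)


low, high = proportion_interval(75, 182)        # field house survey
n_prior = required_sample_size(0.03, p=0.15)    # prior estimate of p
n_conservative = required_sample_size(0.03)     # no prior estimate

print(round(low, 3), round(high, 3), n_prior, n_conservative)
```

With the slide data this reproduces the interval (0.341, 0.484) and the sample sizes 545 and 1068 from the worked examples.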
Another Example

A college professor wants to estimate the proportion of students at a large university who favor building a field house, using a 99% confidence interval accurate to within 0.02. If one of his students performed a preliminary study and estimated p to be 0.412, how large a sample should he take?

We have B = 0.02 and a prior estimate p = 0.412, and we should use the z critical value 2.58 (for a 99% confidence interval).

$$n = p(1-p)\left(\frac{2.58}{B}\right)^2 = (0.412)(0.588)\left(\frac{2.58}{0.02}\right)^2 = 4031.4$$

The required sample size is 4032.

One-Sample z Confidence Interval for μ

The general formula for a confidence interval for a population mean μ when

1. $\bar{x}$ is the sample mean from a random sample,
2. the sample size n is large (generally n ≥ 30), and
3. σ, the population standard deviation, is known

is

$$\bar{x} \pm (z\text{ critical value})\frac{\sigma}{\sqrt{n}}$$

One-Sample z Confidence Interval for μ

If n is small (generally n < 30) but it is reasonable to believe that the distribution of values in the population is normal, a confidence interval for μ (when σ is known) is

$$\bar{x} \pm (z\text{ critical value})\frac{\sigma}{\sqrt{n}}$$

Notice that this formula works when σ is known and either

1. n is large (generally n ≥ 30), or
2. the population distribution is normal (any sample size).

Example

A certain filling machine has a true population standard deviation σ = 0.228 ounces when used to fill catsup bottles. A random sample of 36 "6 ounce" bottles of catsup was selected from the output of this machine, and the sample mean was $\bar{x} = 6.018$ ounces.

Find a 90% confidence interval estimate for the true mean fill of catsup from this machine.
Example I (continued)

$\bar{x} = 6.018$, $\sigma = 0.228$, $n = 36$

The z critical value is 1.645.

$$\bar{x} \pm (z\text{ critical value})\frac{\sigma}{\sqrt{n}} = 6.018 \pm 1.645\,\frac{0.228}{\sqrt{36}} = 6.018 \pm 0.063$$

90% confidence interval: (5.955, 6.081)

Unknown σ - Small Size Samples [All Size Samples]

W. S. Gosset, an English statistician working at the Guinness brewery in Dublin, developed the techniques and derived the Student's t distributions that describe the behavior of

$$\frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

t Distributions

If X is a normally distributed random variable with mean $\mu_0$, the statistic

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

follows a t distribution with df = n - 1 (degrees of freedom).

t Distributions

This statistic

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

is fairly robust, and the results are reasonable for moderate sample sizes (15 and up) if the distribution of x is just reasonably centrally weighted. It is also quite reasonable for large sample sizes for distributional patterns (of x) that are not extremely skewed.

t distribution

[Figure: Comparison of normal and t distributions; curves for df = 2, df = 5, df = 10, df = 25, and the normal distribution; horizontal axis from -4 to 4]

t Distributions Continued

Notice: As df increases, t distributions approach the standard normal distribution.

Since each t distribution would require a table similar to the standard normal table, we usually only create a table of critical values for the t distributions.
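The convergence of the t distributions to the standard normal can be seen numerically. The sketch below evaluates the standard formula for the t density, $f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + t^2/\nu\right)^{-(\nu+1)/2}$ with $\nu$ = df, and compares it with the standard normal density. That formula does not appear in the slides and is brought in here as an outside assumption; the function names are hypothetical.

```python
import math


def t_density(t, df):
    """Density of the t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def normal_density(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


# Compare at the center (0) and in the tail (3) for the df values
# shown in the comparison figure.
for df in (2, 5, 10, 25):
    print(df, round(t_density(0, df), 4), round(t_density(3, df), 4))
print("normal", round(normal_density(0), 4), round(normal_density(3), 4))
```

Each t density is lower than the normal density at the center and higher in the tail, and both gaps shrink as df grows. The heavier tails are the reason t critical values are larger than the corresponding z critical values.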
Central area captured:    0.80    0.90    0.95    0.98    0.99   0.998   0.999

Degrees of freedom
  1                       3.08    6.31   12.71   31.82   63.66  318.29  636.58
  2                       1.89    2.92    4.30    6.96    9.92   22.33   31.60
  3                       1.64    2.35    3.18    4.54    5.84   10.21   12.92
  4                       1.53    2.13    2.78    3.75    4.60    7.17    8.61
  5                       1.48    2.02    2.57    3.36    4.03    5.89    6.87
  6                       1.44    1.94    2.45    3.14    3.71    5.21    5.96
  7                       1.41    1.89    2.36    3.00    3.50    4.79    5.41
  8                       1.40    1.86    2.31    2.90    3.36    4.50    5.04
  9                       1.38    1.83    2.26    2.82    3.25    4.30    4.78
  10                      1.37    1.81    2.23    2.76    3.17    4.14    4.59
  11                      1.36    1.80    2.20    2.72    3.11    4.02    4.44
  12                      1.36    1.78    2.18    2.68    3.05    3.93    4.32
  13                      1.35    1.77    2.16    2.65    3.01    3.85    4.22
  14                      1.35    1.76    2.14    2.62    2.98    3.79    4.14
  15                      1.34    1.75    2.13    2.60    2.95    3.73    4.07
  16                      1.34    1.75    2.12    2.58    2.92    3.69    4.01
  17                      1.33    1.74    2.11    2.57    2.90    3.65    3.97
  18                      1.33    1.73    2.10    2.55    2.88    3.61    3.92
  19                      1.33    1.73    2.09    2.54    2.86    3.58    3.88
  20                      1.33    1.72    2.09    2.53    2.85    3.55    3.85
  21                      1.32    1.72    2.08    2.52    2.83    3.53    3.82
  22                      1.32    1.72    2.07    2.51    2.82    3.50    3.79
  23                      1.32    1.71    2.07    2.50    2.81    3.48    3.77
  24                      1.32    1.71    2.06    2.49    2.80    3.47    3.75
  25                      1.32    1.71    2.06    2.49    2.79    3.45    3.73
  26                      1.31    1.71    2.06    2.48    2.78    3.43    3.71
  27                      1.31    1.70    2.05    2.47    2.77    3.42    3.69
  28                      1.31    1.70    2.05    2.47    2.76    3.41    3.67
  29                      1.31    1.70    2.05    2.46    2.76    3.40    3.66
  30                      1.31    1.70    2.04    2.46    2.75    3.39    3.65
  40                      1.30    1.68    2.02    2.42    2.70    3.31    3.55
  60                      1.30    1.67    2.00    2.39    2.66    3.23    3.46
  120                     1.29    1.66    1.98    2.36    2.62    3.16    3.37
z critical values         1.28   1.645    1.96    2.33    2.58    3.09    3.29

Confidence level:          80%     90%     95%     98%     99%   99.8%   99.9%

One-Sample t Procedures

Suppose that an SRS of size n is drawn from a population having unknown mean μ. The general confidence limits are

$$\bar{x} \pm (t\text{ critical value})\frac{s}{\sqrt{n}}$$

and the general confidence interval for μ is

$$\left( \bar{x} - (t\text{ critical value})\frac{s}{\sqrt{n}},\ \ \bar{x} + (t\text{ critical value})\frac{s}{\sqrt{n}} \right)$$

Confidence Interval Example

Ten randomly selected shut-ins were each asked to list how many hours of television they watched per week.
The results are

82 66 90 84 75 88 80 94 110 91

Find a 90% confidence interval estimate for the true mean number of hours of television watched per week by shut-ins.

Confidence Interval Example Continued

Calculating the sample mean and standard deviation, we have

$n = 10$, $\bar{x} = 86$, and $s = 11.842$

We find the critical t value of 1.833 by looking in the t table in the row corresponding to df = 9, in the column with bottom label 90%.

Computing the confidence interval for μ gives

$$\bar{x} \pm t^{*}\frac{s}{\sqrt{n}} = 86 \pm (1.833)\frac{11.842}{\sqrt{10}} = 86 \pm 6.86$$

(79.14, 92.86)

Confidence Interval Example Continued

To calculate the confidence interval, we had to assume that the distribution of weekly viewing times is normal. Consider the following normal plot of the 10 data points.

Confidence Interval Example Continued

Notice that the normal plot looks reasonably linear, so it is reasonable to assume that the number of hours of television watched per week by shut-ins is normally distributed.

[Figure: Normal Probability Plot; vertical axis Probability (.001 to .999), horizontal axis Hours (70 to 110)]

Anderson-Darling Normality Test
A-Squared: 0.226
P-Value: 0.753
Average: 86
StDev: 11.8415
N: 10

The output comes from Minitab. Typically, if the p-value is more than 0.05, we assume that the distribution is normal.
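The shut-in interval can be reproduced with a short Python sketch. The helper name `mean_interval` is hypothetical, and the t critical value 1.833 is copied from the example rather than computed, since the Python standard library has no t quantile function.

```python
import math
from statistics import mean, stdev


def mean_interval(center, sd, n, critical_value):
    """Interval center +/- critical_value * sd / sqrt(n)."""
    margin = critical_value * sd / math.sqrt(n)
    return center - margin, center + margin


hours = [82, 66, 90, 84, 75, 88, 80, 94, 110, 91]
n = len(hours)
x_bar = mean(hours)     # sample mean
s = stdev(hours)        # sample standard deviation (n - 1 divisor)
t_star = 1.833          # from the t table: df = 9, 90% confidence

low, high = mean_interval(x_bar, s, n, t_star)
print(round(low, 2), round(high, 2))

# The same helper covers the known-sigma z interval from the
# filling machine example (sigma = 0.228, n = 36, z = 1.645).
z_low, z_high = mean_interval(6.018, 0.228, 36, 1.645)
print(round(z_low, 3), round(z_high, 3))
```

The two intervals differ only in which standard deviation and which critical value are supplied: s with a t critical value when σ is unknown, σ with a z critical value when it is known.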