Chapter 9 Estimating the Value of a Parameter KEY

Review on symbols:
$\bar{x}$ the mean of a sample
$\hat{p}$ the proportion of a sample
$\mu$ the true population mean
$p$ the true population proportion
$\sigma$ the true population standard deviation
$s$ the sample standard deviation
$n$ the size of the sample (number of data values collected)
Which of the above is a parameter? Which of the above is a statistic?

Chapter 9.1 Estimating a Population Proportion

Objective A: Point Estimate
A point estimate is the value of a statistic that estimates the value of a parameter.
The best point estimate of the population proportion is the sample proportion, $\hat{p} = \frac{x}{n}$.
The best point estimate of the population mean is the sample mean, $\bar{x} = \frac{\sum x}{n}$.
Since $\hat{p}$ varies from sample to sample, we use an interval based on $\hat{p}$ to capture the unknown population proportion with a stated level of confidence.

Objective B: Confidence Interval
A confidence interval for an unknown parameter consists of an interval of numbers based on a point estimate. The level of confidence represents the expected proportion of intervals that will contain the parameter if a large number of different samples is obtained. The level of confidence is denoted $(1-\alpha)\cdot 100\%$. The level of confidence controls the width of the interval.
Confidence interval estimates for a parameter are of the form: point estimate ± margin of error.

Confidence interval for $p$:
$\hat{p} \pm Z_{\alpha/2}\cdot\sigma_{\hat{p}}$, where $\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, provided that $n\hat{p}(1-\hat{p}) \ge 10$.
This is used when $p$ (the true proportion) is not known. It can be written as $\hat{p} \pm E$.
The value $Z_{\alpha/2}$ is called the critical value of the distribution.
The margin of error, $E$, in a $(1-\alpha)\cdot 100\%$ confidence interval for a population proportion is given by $E = Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$. The width of the interval is determined by the margin of error.
Note: More confidence leads to a wider interval. Ex.: 100% confidence vs 50% confidence.

Example 1: Use StatCrunch to determine the critical value $Z_{\alpha/2}$ that corresponds to the given level of confidence.
StatCrunch: Stat - Calc - Normal - Between, with μ = 0, σ = 1
(a) 90%. Diagram: P(____ < x < ____) = 0.90. Compute: z1 = -1.645, z2 = 1.645
(b) 95%. Diagram: P(____ < x < ____) = 0.95. Compute: z1 = -1.96, z2 = 1.96
(c) 98%. Diagram: P(____ < x < ____) = 0.98. Compute: z1 = -2.33, z2 = 2.33
(d) 92%. Diagram: P(____ < x < ____) = 0.92. Compute: z1 = -1.75, z2 = 1.75
NOTE: for 95%, z is close to 2 SD.

Example 2: Determine the margin of error (E) for p with x = 540 and n = 900 at a 99% level of confidence. Assume x is the number of people, out of a sample of 900, who admitted to having texted while driving in the last month.
$\hat{p} = \frac{x}{n} = \frac{540}{900} = 0.60$
StatCrunch gives z = 2.58 for a 99% level of confidence.
$E = Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2.58\sqrt{\frac{0.60(0.40)}{900}} = 0.0421$
Note: the 99% CI is 0.60 ± 0.0421.

Example 3: A Rasmussen Reports national survey of 1000 adult Americans found that 18% dreaded Valentine's Day. Construct a 95% confidence interval for the population proportion of adult Americans who dread Valentine's Day. Explain what the interval means.
n = 1000, $\hat{p}$ = 0.18
95% CI: 0.18 ± 2SE (You can use 1.96 for z to be more exact. For 68% you can use z = 1; for 99.7% use z = 3.)
Standard deviation of the sample proportion (the standard error): $\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.18(0.82)}{1000}} = 0.01215$
So 0.18 ± 2SE = 0.18 ± 2(0.01215) = 0.18 ± 0.024
CI: (0.18 - 0.024, 0.18 + 0.024) = (0.156, 0.204), or between 15.6% and 20.4%
We are 95% confident that the true proportion of adults who dread Valentine's Day is between 15.6% and 20.4%.

Summary of CIs:
68%: z = 1
90%: z = 1.65
95%: z = 2 (or 1.96)
98%: z = 2.33
99%: z = 2.58
99.7%: z = 3
95% confidence interval: About 95 samples out of 100 will produce an interval that captures the true proportion, and about 5 samples will not.
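The by-hand steps in Examples 2 and 3 can be sketched in Python. This is only an illustration and not part of the original notes: the helper name `proportion_ci` is hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist` instead of StatCrunch, so the 95% case uses z ≈ 1.96 rather than the rounded value 2.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Return (p_hat, margin_of_error, lower, upper) for a one-sample proportion.

    Hypothetical helper for illustration; it refuses to run unless the
    condition n * p_hat * (1 - p_hat) >= 10 from the notes is met.
    """
    p_hat = x / n
    if n * p_hat * (1 - p_hat) < 10:
        raise ValueError("condition n * p_hat * (1 - p_hat) >= 10 is not met")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value Z_{alpha/2}
    e = z * sqrt(p_hat * (1 - p_hat) / n)    # margin of error E
    return p_hat, e, p_hat - e, p_hat + e


# Example 2: x = 540, n = 900, 99% confidence
_, e2, _, _ = proportion_ci(540, 900, 0.99)
print(round(e2, 4))

# Example 3: 18% of 1000 adults (x = 180), 95% confidence
_, _, lo3, hi3 = proportion_ci(180, 1000, 0.95)
print(round(lo3, 3), round(hi3, 3))
```

Because the critical value is not rounded here, results can differ from the hand computations in the last reported digit.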
Example 4: Construct a confidence interval for the population proportion at the given level of confidence, where 80 students out of 200 came late to class for a lecture on a randomly selected day.
x = 80, n = 200, 96% confidence
StatCrunch: Stat - Proportion Stats - One Sample - With Summary - 80 successes - 200 observations - confidence interval 96% - Compute
Lower limit, upper limit = (0.329, 0.471)
*We are 96% confident that the true population proportion of students who come late is between 32.9% and 47.1%.

Example 5: In a study of 1228 randomly selected medical malpractice lawsuits, it is found that 856 of them were later dropped or dismissed.
(a) What is the best point estimate of the proportion of medical malpractice lawsuits that are dropped or dismissed?
$\hat{p} = \frac{856}{1228} = 0.697$
(b) Construct a 99% confidence interval (by hand) for the population proportion of medical malpractice lawsuits that are dropped or dismissed.
99%: use z = 2.58
CI = $\hat{p}$ ± 2.58 SE, where SE = $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.697(0.303)}{1228}} = 0.0131$
CI = 0.697 ± 2.58(0.0131) = 0.697 ± 0.0338
CI: (0.697 - 0.034, 0.697 + 0.034) = (0.663, 0.731)
(c) Interpret the interval.
We are 99% confident that the true proportion of medical malpractice lawsuits that are dropped or dismissed is between 66.3% and 73.1%.

Objective C: Sample Size Needed for Estimating the Population Proportion p
The sample size required to obtain a $(1-\alpha)\cdot 100\%$ confidence interval for $p$ with a margin of error $E$ is given by
$n = \hat{p}(1-\hat{p})\left(\frac{Z_{\alpha/2}}{E}\right)^2$, rounded up to the next integer,
where $\hat{p}$ is a prior estimate of $p$.
If a prior estimate of $p$ is unavailable, the sample size required is
$n = 0.25\left(\frac{Z_{\alpha/2}}{E}\right)^2$, rounded up to the next integer.
So you can use $\hat{p}$ = 0.50 if a proportion is not given.

Example 1: An urban economist wishes to estimate the proportion of Americans who own their homes.
What size sample should be obtained if he wishes the estimate to be within 0.02 with 90% confidence if:
(a) he uses a 2010 estimate of 0.669 obtained from the U.S. Census Bureau?
He wants $\hat{p}$ ± 0.02 = 0.669 ± 0.02 with 90% confidence, so use z = 1.65.
0.02 = 1.65 SE
$0.02 = 1.65\sqrt{\frac{0.669(0.331)}{n}}$
$\frac{0.02}{1.65} = \sqrt{\frac{0.669(0.331)}{n}}$
$0.0121 \approx \sqrt{\frac{0.669(0.331)}{n}}$
$(0.0121)^2 \approx \frac{0.669(0.331)}{n}$
$n = \frac{0.669(0.331)}{(0.02/1.65)^2} \approx 1507.17$ (carrying 0.02/1.65 unrounded), round up to 1508.
He should sample 1508 Americans so that the estimate is within a 0.02 margin of error at a 90% confidence level.
Note: For MML you might have to be more precise and use z = 1.645 for a 90% CI and z = 1.96 for a 95% CI.
(b) he does not use any prior estimates?
If no estimate is given for the proportion, then we use $\hat{p}$ = 0.50.
$0.02 = 1.65\sqrt{\frac{0.50(0.50)}{n}}$
$\frac{0.02}{1.65} = \sqrt{\frac{0.50(0.50)}{n}}$
$0.0121 \approx \sqrt{\frac{0.50(0.50)}{n}}$
$(0.0121)^2 \approx \frac{0.50(0.50)}{n}$
$n = \frac{0.50(0.50)}{(0.0121)^2} \approx 1707.5$, round up to 1708.
He should sample 1708 Americans.
Note: if you do not round 0.02/1.65 to 0.0121, then n ≈ 1701.6, which rounds up to 1702.

Example 2: In a Gallup poll conducted in October 2010, 64% of the people polled answered "more strict" to the following question: "Do you feel that the laws covering the sale of firearms should be made more strict as they are now?" Suppose the margin of error in the poll was 3.5% and the estimate was made with 95% confidence. At least how many people were surveyed?
Use the poll's estimate: $\hat{p}$ ± E = 0.64 ± 0.035 with 95% confidence, so use z = 1.96 (or 2).
0.035 = 1.96 SE
$0.035 = 1.96\sqrt{\frac{0.64(0.36)}{n}}$
$\frac{0.035}{1.96} = \sqrt{\frac{0.64(0.36)}{n}}$
$0.0179 \approx \sqrt{\frac{0.64(0.36)}{n}}$
$(0.0179)^2 \approx \frac{0.64(0.36)}{n}$
$n = \frac{0.64(0.36)}{(0.035/1.96)^2} \approx 722.5$, round up to 723.
So at least 723 people were surveyed.
Note: If z = 2 is used instead, then n = 753 people.
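The sample-size formula from Objective C can be checked with a short Python sketch. The function name `sample_size_for_proportion` is hypothetical, and the critical value is passed in by hand so that the rounded values used in these notes (1.65, 1.96, 2) can be reproduced; nothing here comes from StatCrunch.

```python
from math import ceil


def sample_size_for_proportion(margin, z, p_prior=0.5):
    """Smallest n giving the requested margin of error for a proportion.

    Hypothetical helper: p_prior defaults to 0.5, the value the notes use
    when no prior estimate is available. z is supplied by the caller.
    """
    return ceil(p_prior * (1 - p_prior) * (z / margin) ** 2)


n_census = sample_size_for_proportion(0.02, 1.65, 0.669)  # Example 1(a)
n_no_prior = sample_size_for_proportion(0.02, 1.65)       # Example 1(b)
n_gallup = sample_size_for_proportion(0.035, 1.96, 0.64)  # Example 2
print(n_census, n_no_prior, n_gallup)
```

The sketch carries full precision through the calculation, so for Example 1(b) it returns 1702 (the unrounded answer mentioned in the note) rather than 1708.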
Example 3: A Gallup poll conducted in November 2010 found that 493 of 1050 adult Americans believe it is the responsibility of the federal government to make sure all Americans have healthcare coverage.
(a) Obtain a point estimate for the proportion of adult Americans who believe it is the responsibility of the federal government to make sure all Americans have healthcare coverage.
$\hat{p} = \frac{493}{1050} = 0.470$
(b) Verify that the requirements for constructing a confidence interval for p are satisfied.
$n\hat{p}(1-\hat{p}) \ge 10$
1050(0.47)(0.53) ≈ 261.6 ≥ 10, yes
The condition is met, so the sampling distribution of $\hat{p}$ will be approximately normal.
(c) Construct a 95% confidence interval for the proportion of adult Americans who believe it is the responsibility of the federal government to make sure all Americans have healthcare coverage. Interpret the interval.
By hand: for 95%, use z = 2.
CI = $\hat{p}$ ± 2 SE, where SE = $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.47(0.53)}{1050}} \approx 0.0154$
CI = 0.47 ± 2(0.0154) = 0.47 ± 0.0308 ≈ 0.47 ± 0.031
CI: (0.47 - 0.031, 0.47 + 0.031) = (0.439, 0.501)
We are 95% confident that the true proportion of adults who believe the federal government should make sure all Americans have healthcare coverage is between 43.9% and 50.1%.
Using StatCrunch (which will use z = 1.96): Stat - Proportion Stats - One Sample - With Summary - 493 successes - 1050 observations - confidence interval 95% - Compute
CI = (0.439, 0.500), or between 43.9% and 50.0%
(d) You wish to conduct your own study for the proportion of adult Americans who believe it is the responsibility of the federal government to make sure all Americans have healthcare coverage. What sample size would be needed for the estimate to be within 3 percentage points with 90% confidence if you use the estimate obtained in part (a)? (Use StatCrunch.)
StatCrunch: Stat - Proportion Stats - One Sample - Power/Sample Size - confidence interval 0.90 - target proportion 0.47 - width 0.06 - Compute
n = 749 (rounding up). (The total width is 0.06 since it is 3% on each side of the point estimate.)
By hand it would be: 0.47 ± 0.03, setting $0.03 = 1.65\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ with $\hat{p}$ = 0.47, since z = 1.65 for 90% confidence.
(e) You wish to conduct your own study for the proportion of adult Americans who believe it is the responsibility of the federal government to make sure all Americans have healthcare coverage. What sample size would be needed for the estimate to be within 3 percentage points with 90% confidence if you do not have a prior estimate?
By hand it would be: 0.50 ± 0.03, setting $0.03 = 1.65\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, since z = 1.65 for 90% confidence, and we would use $\hat{p}$ = 0.50.
StatCrunch: Stat - Proportion Stats - One Sample - Power/Sample Size - confidence interval 0.90 - target proportion 0.50 - width 0.06 - Compute
n = 752 (rounded up)
Note that this is close to our previous answer, but larger.

Chapter 9.2 Estimating a Population Mean

Objective A: Point Estimate
The best point estimate of the population mean, $\mu$, is the sample mean, $\bar{x}$.

Objective B: Student's t-distribution
Properties of the t-distribution
1. The t-distribution is different for different degrees of freedom ($df = n - 1$).
2. The t-distribution has the same general symmetric bell shape as the standard normal distribution, but its area in the tails is a little greater than the area in the tails of the standard normal distribution, due to the greater variability that is expected with small samples.
3. The t-distribution has a mean of $t = 0$ at the center of the distribution.
4. As the sample size n gets larger, the t-distribution gets closer to the standard normal distribution.

Example 1: Use StatCrunch to determine the t-value.
*Note: There is more variability for smaller sample sizes.
(a) Using StatCrunch, find the t-value such that the area in the right tail is 0.05 with 19 degrees of freedom.
StatCrunch: Stat - Calc - T - degrees of freedom 19 - P(x ≥ ___) = 0.05 - Compute
t-value = 1.73
Note n = 20 (sample size). You could also use P(x ≤ ___) = 0.95.
(b) Find the t-value such that the area left of the t-value is 0.02 with 6 degrees of freedom.
StatCrunch: Stat - Calc - T - degrees of freedom 6 - P(x ≤ ___) = 0.02 - Compute
t-value = -2.61
(c) Find the critical t-value that corresponds to 95% confidence. Assume 12 degrees of freedom.
StatCrunch: Stat - Calc - T - Between - degrees of freedom 12 - P(___ ≤ x ≤ ___) = 0.95 - Compute (note sample size n = 13)
t-value = ± 2.18
(d) Find the critical t-value that corresponds to 95% confidence. Assume 50 degrees of freedom.
StatCrunch: Stat - Calc - T - Between - degrees of freedom 50 - P(___ ≤ x ≤ ___) = 0.95 - Compute (note sample size n = 51)
t-value = ± 2.0086
(e) What happened to the width of the interval as the sample size increased from 13 to 51?
As the sample size increased, the critical t-value decreased from 2.18 to 2.0086, so the interval became narrower. The t-distribution moved closer to the standard normal distribution.

In general, the population standard deviation is unknown when estimating a population mean based on a sample mean. The t-distribution is used to offset the additional variability introduced by using $s$ in place of $\sigma$.

Objective C: Confidence Interval for a Population Mean
Constructing a $(1-\alpha)\cdot 100\%$ Confidence Interval for $\mu$
Point estimate ± margin of error:
$\bar{x} \pm t_{\alpha/2}\cdot\frac{s}{\sqrt{n}}$, where $E = t_{\alpha/2}\cdot\frac{s}{\sqrt{n}}$,
provided the data come from a population that is normally distributed, or the sample size is large.

Example 1: A simple random sample of size n < 30 has been obtained. From the normal probability plot and boxplot, judge whether a t-interval should be constructed.
(a) [Normal probability plot and boxplot not reproduced in this transcript.]
Yes: there are no outliers in the normal probability plot and the boxplot is roughly symmetrical. Thus we can use the t-distribution regardless of the sample size.
(b) [Normal probability plot and boxplot not reproduced in this transcript.]
No: there is one slight outlier in the normal probability plot and the boxplot is skewed to the left. Since the sample size is less than 30, we cannot use a t-distribution.

Example 2: A simple random sample of size n is drawn from a population that is normally distributed to investigate the age when working people start thinking about retirement. The sample mean, $\bar{x}$, is found to be 50, and the sample standard deviation, $s$, is found to be 8.
(a) Construct a 98% confidence interval for $\mu$ if the sample size, n, is 20. (That is, 20 people are surveyed.)
By hand: for 98%, use t = 2.54 from StatCrunch (Stat - Calc - T).
CI: $\bar{x}$ ± E = $\bar{x}$ ± 2.54 SE, where SE = $\frac{s}{\sqrt{n}} = \frac{8}{\sqrt{20}}$
CI = $50 \pm 2.54\left(\frac{8}{\sqrt{20}}\right)$ = 50 ± 4.54
CI = (50 - 4.54, 50 + 4.54) ≈ (45.46, 54.54) ≈ (45.5, 54.5)
One can be 98% confident that the true mean age when people start thinking about retirement is between 45.5 and 54.5 years of age.
StatCrunch: Stat - T Stats - One Sample - With Summary - mean 50, SD 8, n = 20, CI 0.98 - Compute
CI = (45.46, 54.54) ≈ (45.5, 54.5)
(b) Use StatCrunch to construct a 98% confidence interval for $\mu$ if the sample size, n, is 15. How does decreasing the sample size affect the margin of error, E?
StatCrunch: Stat - T Stats - One Sample - With Summary - mean 50, SD 8, n = 15, CI 0.98 - Compute
CI = (44.6, 55.4)
This interval is wider. So when the sample size is decreased, the CI widens.
By hand, the critical value must be looked up again for 14 degrees of freedom (t ≈ 2.62): E = $2.62\left(\frac{8}{\sqrt{15}}\right)$ ≈ 5.4. The margin of error increased when the sample size decreased (compared to 4.54 in part (a)).
You can also find E from the CI: E = $\frac{UL - LL}{2} = \frac{55.4 - 44.6}{2}$ = 5.4
(c) Construct a 95% confidence interval for $\mu$ if the sample size, n, is 20. Compare the results to those obtained in part (a). How does decreasing the level of confidence affect the margin of error, E?
StatCrunch: Stat - T Stats - One Sample - With Summary - mean 50, SD 8, n = 20, CI 0.95 - Compute
CI = (46.3, 53.7)
This CI is narrower than the one in part (a): (45.5, 54.5). So as the level of confidence increases, the interval becomes wider.
E for the 95% CI is E = $\frac{UL - LL}{2} = \frac{53.7 - 46.3}{2}$ = 3.7. This is smaller than E in part (a), 4.5. So the margin of error decreases as the level of confidence decreases.
In part (a): about 98 samples out of 100 will produce an interval that captures the true mean.
In part (c): about 95 samples out of 100 will produce an interval that captures the true mean.
(d) Could we have computed the confidence intervals in parts (a) to (c) if the population had not been normally distributed? Why?
No, because the conditions to apply this formula would not have been met. The sample size, n, was less than 30 in every part.

Example 3: Determine the point estimate of the population mean and margin of error for the following confidence interval.
Lower bound: 5   Upper bound: 23
E = $\frac{UL - LL}{2} = \frac{23 - 5}{2}$ = 9 (so 5 + 9 = 14 and 23 - 9 = 14)
The sample mean is $\bar{x}$ = 14. You can also compute $\bar{x}$ by finding the midpoint of the interval: $\frac{23 + 5}{2}$ = 14
Margin of error, E = 9
Point estimate of the population mean, $\bar{x}$ = 14
Note CI = 14 ± 9

Example 4: How much time do Americans spend eating or drinking? Suppose for a random sample of 1001 Americans age 15 or older, the mean amount of time spent eating or drinking per day is 1.22 hours with a standard deviation of 0.65 hour.
(a) A histogram of time spent eating and drinking each day is skewed right. Use this result to explain why a large sample size is needed to construct a confidence interval for the mean time spent eating and drinking each day.
Since the population is not normally distributed, you need a large sample size for the sampling distribution of $\bar{x}$ to be approximately normal. Thus, use n > 30.
(b) Determine a 95% confidence interval for the mean amount of time Americans age 15 or older spend eating and drinking each day. Interpret the interval.
By hand: First find the t-value using StatCrunch for 95% confidence: Stat - Calc - T - Between, df 1000, P(___ ≤ x ≤ ___) = 0.95 - Compute
t = 1.96 (this is close to the z-value for 95% since the sample size is large, so we could also use 2 SEs)
CI: $\bar{x}$ ± E = $\bar{x}$ ± 1.96 SE, where SE = $\frac{s}{\sqrt{n}} = \frac{0.65}{\sqrt{1001}}$
CI = $1.22 \pm 1.96\left(\frac{0.65}{\sqrt{1001}}\right)$ = 1.22 ± 0.040
CI = (1.22 - 0.040, 1.22 + 0.040) ≈ (1.18, 1.26)
Using StatCrunch: Stat - T Stats - One Sample - With Summary - mean 1.22, SD 0.65, n = 1001, CI 0.95 - Compute
CI = (1.18, 1.26)
*One can be 95% confident that the true mean time spent eating and drinking each day is between 1.18 and 1.26 hours.
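The by-hand t-interval computations in Example 2(a) and Example 4(b) can be sketched in Python as follows. The helper `t_interval` is hypothetical, and it assumes the critical t-value has already been looked up for n - 1 degrees of freedom (for instance with StatCrunch, as in these notes); the Python standard library does not provide t critical values.

```python
from math import sqrt


def t_interval(mean, s, n, t_crit):
    """Confidence interval mean +/- t_crit * s / sqrt(n).

    Hypothetical helper: t_crit is supplied by the caller and must match
    the confidence level and n - 1 degrees of freedom.
    """
    e = t_crit * s / sqrt(n)  # margin of error E
    return mean - e, mean + e


# Example 2(a): mean 50, s = 8, n = 20, t = 2.54 for 98% confidence
lo_a, hi_a = t_interval(50, 8, 20, 2.54)
print(round(lo_a, 2), round(hi_a, 2))

# Example 4(b): mean 1.22, s = 0.65, n = 1001, t = 1.96 for 95% confidence
lo_b, hi_b = t_interval(1.22, 0.65, 1001, 1.96)
print(round(lo_b, 2), round(hi_b, 2))
```

Passing the critical value in explicitly makes it harder to reuse a t-value from a different sample size by accident, since each call has to state which one it is using.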
(c) Could the interval be used to estimate the mean amount of time a 9-year-old American spends eating and drinking each day? Explain.
No. The study was conducted using people who were 15 years old or older. Therefore, the point estimate of the mean applies only to the population age 15 or older.

Objective D: Determining the Sample Size n
The sample size required to estimate the population mean, $\mu$, with a level of confidence $(1-\alpha)\cdot 100\%$ within a specified margin of error, $E$, is given by
$n = \left(\frac{Z_{\alpha/2}\cdot s}{E}\right)^2$
where n is rounded up to the nearest whole number.
Note: *The t-distribution approaches the standard normal z-distribution as the sample size increases. Z is used in this formula instead of t to approximate n.

Example 1: A researcher wanted to determine the mean number of hours per week (Sunday through Saturday) the typical person watches television. Results from the Sullivan Statistics Survey indicate that s = 7.5 hours.
(a) How many people are needed to estimate the number of hours people watch television per week within 2 hours with 95% confidence?
The standard deviation is s = 7.5 hours. Note that the mean was not provided.
Want CI: $\bar{x}$ ± E = $\bar{x}$ ± 2
$2 = t\cdot\frac{s}{\sqrt{n}}$
We can use z instead of t: z = 1.96 (or approximately 2).
$2 = 1.96\cdot\frac{7.5}{\sqrt{n}}$
$2\sqrt{n} = 1.96(7.5)$
$\sqrt{n} = \frac{1.96(7.5)}{2}$
$n = \left[\frac{1.96(7.5)}{2}\right]^2 = 54.02$, round up to 55 people.
55 people need to be surveyed so that the margin of error is within 2 hours at a 95% confidence level.
(b) How many people are needed to estimate the number of hours people watch television per week within 1 hour with 95% confidence?
Let's do this one using StatCrunch: Stat - Z Stats - One Sample - Power/Sample Size - select "Confidence interval width" - confidence level 0.95, SD 7.5, width 2 - Compute
n = 217
(c) What effect does doubling the required accuracy have on the sample size?
If you want to be more accurate (within 1 hour instead of within 2 hours), increase the sample size.
In this case it was increased by a ratio of $\frac{217}{55} \approx 4$. If you double the accuracy, the sample has to be 4 times as large.

Chapter 9 Estimating a Population Standard Deviation (Supplementary Materials)
Finding CIs for standard deviations

Objective A: Point Estimate
The best point estimate of the population variance, $\sigma^2$, is the sample variance, $s^2$.

Objective B: Chi-Square Distribution
Example 1: Use StatCrunch to find the critical values $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ for the given level of confidence and sample size.
(a) 90% confidence, n = 23
StatCrunch: Stat → Calculators → Chi-Square → DF 22 (n - 1) → Between → P(___ ≤ x ≤ ___) = 0.90 → Compute
The critical values are 12.338 and 33.924. (These play the role that the "z" values played earlier.)

Objective C: Confidence Interval for a Population Variance or Standard Deviation
$(1-\alpha)\cdot 100\%$ of the values of $\chi^2$ will lie between $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$. (Recall: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$.)
To find a $(1-\alpha)\cdot 100\%$ confidence interval about $\sigma$, take the square root of the lower bound and upper bound of the interval for $\sigma^2$.

Example 1: A simple random sample of size n is drawn from a population that is known to be normally distributed. The sample variance, $s^2$, is determined to be 19.8. (Thus the sample standard deviation is $\sqrt{19.8} \approx 4.45$.)
(a) Use StatCrunch to construct a 95% confidence interval for $\sigma^2$ if the sample size, n, is 10.
StatCrunch: Stat → Variance Stats → One Sample → With Summary → Sample variance: 19.8, sample size: 10 → Confidence interval for $\sigma^2$: 0.95 → Compute, and record the results.
95% confidence interval results:
$\sigma^2$: Variance of population
Variance     Sample Var.   DF   L. Limit   U. Limit
$\sigma^2$   19.8          9    9.367722   65.99048
*One can be 95% confident that the true variance is between 9.37 and 65.99.
(b) If the sample size is increased to n = 25, how does increasing the sample size affect the width of the interval?
It will decrease the width (the interval becomes narrower).
(c) If the confidence level is increased to 99%, how does increasing the level of confidence affect the width of the confidence interval?
The interval becomes wider if you want higher confidence.

Example 2: Travelers pay taxes for flying, car rentals, and hotels. The following data represent the total travel tax for a 3-day business trip in eight randomly selected cities. It was verified that the data are normally distributed. Use StatCrunch to construct a 90% confidence interval for the standard deviation of the travel tax for a 3-day business trip. Interpret the interval.
First we need to compute the variance, since raw data have been provided.
StatCrunch: Stat → input the given data → Summary Statistics → Columns → var1 → Variance → Compute
Summary statistics:
Column   Variance
var1     151.87187
Now we will find the interval:
StatCrunch: Stat → Variance Stats → One Sample → With Summary → Sample variance: 151.87187, sample size: 8 → Confidence interval for $\sigma^2$: 0.90 → Compute
Alternate way: Stat → input the given data → Variance Stats → One Sample → With Data → Columns → var1 → Confidence interval for $\sigma^2$: 0.90 → Compute, and record the results.
90% confidence interval results:
$\sigma^2$: Variance of population
Variance     Sample Var.   DF   L. Limit    U. Limit
$\sigma^2$   151.87187     7    75.573504   490.50829
Manually compute the square root of each limit to change from variance to standard deviation.
$\sqrt{75.573504}$ = 8.69330224943 (lower limit)
$\sqrt{490.50829}$ = 22.1474217461 (upper limit)
We are 90% confident that the standard deviation of the travel tax for a 3-day business trip is between $8.69 and $22.15.
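The StatCrunch result in Example 2 can be cross-checked by hand. Rearranging the relation $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ gives the interval $\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right)$ for $\sigma^2$. The Python sketch below applies it; the helper `variance_ci` is hypothetical, and the critical values 2.167 and 14.067 for 7 degrees of freedom are standard chi-square table values that are assumed here, not quoted in the notes.

```python
from math import sqrt


def variance_ci(sample_var, n, chi2_lower, chi2_upper):
    """Confidence interval for a population variance.

    Hypothetical helper: chi2_lower and chi2_upper are the chi-square
    critical values for n - 1 degrees of freedom, supplied by the caller.
    Dividing by the larger critical value gives the lower limit.
    """
    scaled = (n - 1) * sample_var
    return scaled / chi2_upper, scaled / chi2_lower


# Example 2: s^2 = 151.87187, n = 8, 90% confidence.
# 2.167 and 14.067 are assumed table values for 7 degrees of freedom.
var_lo, var_hi = variance_ci(151.87187, 8, 2.167, 14.067)
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)
print(round(sd_lo, 2), round(sd_hi, 2))
```

Because the table values are rounded to three decimals, the variance limits agree with the StatCrunch output only to about three significant figures.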
Summary of CI of means and proportions
To summarize:
Confidence intervals
As sample size n increases (you get better results) → CI narrows
As standard error (SE or SD) decreases (you get better results) → CI narrows
As % of confidence level increases → CI widens
Sample size
As n increases (better results) → SE/SD decreases
As n increases (better results) → shape of the sampling distribution becomes more normal (symmetric)
As n increases (better results) → sample mean $\bar{x}$ approaches the true population mean, $\mu$ (or sample proportion $\hat{p}$ approaches the true population proportion, p)
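The trends in this summary can be checked numerically with the television example from Objective D (s = 7.5 hours). The Python sketch below is illustrative only: the function names are hypothetical, z is used in place of t as the notes do for sample-size work, and the critical value comes from `statistics.NormalDist` rather than StatCrunch.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Critical value Z_{alpha/2} for a two-sided confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(s, n, confidence):
    """Margin of error z * s / sqrt(n), with z approximating t."""
    return z_critical(confidence) * s / sqrt(n)


def sample_size_for_mean(s, margin, confidence):
    """Smallest n whose margin of error is at most `margin`."""
    return ceil((z_critical(confidence) * s / margin) ** 2)


# Larger n narrows the interval; higher confidence widens it.
e_small_n = margin_of_error(7.5, 55, 0.95)
e_large_n = margin_of_error(7.5, 217, 0.95)
e_high_conf = margin_of_error(7.5, 55, 0.99)
print(e_small_n > e_large_n, e_high_conf > e_small_n)

# Halving the margin of error roughly quadruples the required sample size.
n_within_2 = sample_size_for_mean(7.5, 2, 0.95)
n_within_1 = sample_size_for_mean(7.5, 1, 0.95)
print(n_within_2, n_within_1)
```

The two sample sizes match the 55 and 217 found in Objective D, Example 1, and their ratio is close to 4.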