Chapter 9 Estimating the Value of a Parameter KEY

Review on symbols:
𝑥̅ the mean of a sample
𝑝̂ the proportion of a sample
µ the true population mean
p the true population proportion
𝜎 the true population standard deviation
s the sample standard deviation
n the size of the sample (number of data values collected)

Which of the above is a parameter? Which of the above is a statistic?
(Parameters: µ, p, 𝜎. Statistics: 𝑥̅, 𝑝̂, s.)

Chapter 9.1 Estimating a Population Proportion

Objective A: Point Estimate
A point estimate is the value of a statistic that estimates the value of a parameter.
The best point estimate of the population proportion is the sample proportion, $\hat{p} = \frac{x}{n}$.
The best point estimate of the population mean is the sample mean, $\bar{x} = \frac{\sum x}{n}$.
Since $\hat{p}$ varies from sample to sample, we use an interval based on $\hat{p}$ to capture the unknown population proportion with a stated level of confidence.

Objective B: Confidence Interval
A confidence interval for an unknown parameter consists of an interval of numbers based on a point estimate. The level of confidence represents the expected proportion of intervals that will contain the parameter if a large number of different samples is obtained. The level of confidence is denoted $(1-\alpha)\cdot 100\%$, and it controls the width of the interval.

Confidence interval estimates for a parameter are of the form: point estimate ± margin of error.

Confidence interval for p (used when the true proportion p is not known):
$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}}$, where $\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$,
provided that $n\hat{p}(1-\hat{p}) \ge 10$. It can be written as $\hat{p} \pm E$.

The value $z_{\alpha/2}$ is called the critical value of the distribution. The margin of error, E, in a $(1-\alpha)\cdot 100\%$ confidence interval for a population proportion is given by
$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
The width of the interval is determined by the margin of error.

Note: More confidence leads to a wider interval. Ex. a 100% confidence interval is wider than a 50% confidence interval.

Example 1: Use StatCrunch to determine the critical value $z_{\alpha/2}$ that corresponds to the given level of confidence.
Stat → Calculators → Normal → Between, with μ = 0 and σ = 1
(a) 90%: P(−1.645 < x < 1.645) = 0.90, so $z_1$ = −1.645 and $z_2$ = 1.645
(b) 95%: P(−1.96 < x < 1.96) = 0.95, so $z_1$ = −1.96 and $z_2$ = 1.96
(c) 98%: P(−2.33 < x < 2.33) = 0.98, so $z_1$ = −2.33 and $z_2$ = 2.33
(d) 92%: P(−1.75 < x < 1.75) = 0.92, so $z_1$ = −1.75 and $z_2$ = 1.75
NOTE: For 95% confidence, z is close to 2 (about 2 standard deviations).

Example 2: Determine the margin of error (E) for p with x = 540 and n = 900 at a 99% level of confidence. Assume x is the number of people, in a sample of 900, who admitted to having texted while driving in the last month.
$\hat{p} = \frac{x}{n} = \frac{540}{900} = 0.60$
StatCrunch gives z = 2.58 for a 99% level of confidence.
$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2.58\sqrt{\frac{0.60(0.40)}{900}} \approx 0.0421$
Note: the 99% CI is 0.60 ± 0.0421.

Example 3: A Rasmussen Reports national survey of 1000 adult Americans found that 18% dreaded Valentine's Day. Construct a 95% confidence interval for the population proportion of adult Americans who dread Valentine's Day. Explain what the interval means.
n = 1000, $\hat{p}$ = 0.18
95% CI: 0.18 ± 2SE (You can use z = 1.96 to be more exact. For 68% you can use z = 1; for 99.7% use z = 3.)
Standard error of $\hat{p}$ (the standard deviation of the sampling distribution of $\hat{p}$):
$SE = \sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.18(0.82)}{1000}} \approx 0.01215$
So 0.18 ± 2SE = 0.18 ± 2(0.01215) = 0.18 ± 0.024
CI: (0.18 − 0.024, 0.18 + 0.024) = (0.156, 0.204), or between 15.6% and 20.4%.
We are 95% confident that the true proportion of adults who dread Valentine's Day is between 15.6% and 20.4%.

Summary of CIs:
68%: z = 1
90%: z = 1.65
95%: z = 2 (or 1.96)
98%: z = 2.33
99%: z = 2.58
99.7%: z = 3

95% confidence interval: if 100 samples were taken and an interval were built from each, about 95 of the intervals would capture the true proportion and about 5 would not.
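The critical values in Example 1 and the margin-of-error arithmetic in Examples 2 and 3 can also be checked outside StatCrunch. The sketch below is a minimal version assuming Python 3.8+ (for `statistics.NormalDist`); the helper names `z_critical` and `margin_of_error` are hypothetical and not part of StatCrunch or this key.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}: the area between -z and z equals `confidence`."""
    alpha = 1 - confidence
    # Leaving alpha/2 in each tail means z is the (1 - alpha/2) quantile.
    return NormalDist(mu=0, sigma=1).inv_cdf(1 - alpha / 2)

def margin_of_error(p_hat, n, z):
    """E = z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Example 1: critical values for 90%, 95%, 98%, 92%
for level in (0.90, 0.95, 0.98, 0.92):
    print(level, round(z_critical(level), 3))   # 1.645, 1.96, 2.326, 1.751

# Example 2: x = 540, n = 900, 99% confidence with z = 2.58
print(round(margin_of_error(540 / 900, 900, 2.58), 4))  # 0.0421

# Example 3: p_hat = 0.18, n = 1000, 95% confidence with z = 2
E = margin_of_error(0.18, 1000, 2)
print(round(0.18 - E, 3), round(0.18 + E, 3))  # 0.156 0.204
```

The unrounded 98% value is 2.326, which the table above rounds to 2.33.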
Example 4: Construct a confidence interval for the population proportion at the given level of confidence, where 80 students out of 200 came late to class for a lecture on a randomly selected day.
x = 80, n = 200, 96% confidence
Stat → Proportion Stats → One Sample → With Summary → 80 successes, 200 observations → Confidence interval 0.96 → Compute
Lower limit, upper limit = (0.329, 0.471)
*We are 96% confident that the true population proportion of students who come late is between 32.9% and 47.1%.

Example 5: In a study of 1228 randomly selected medical malpractice lawsuits, it is found that 856 of them were later dropped or dismissed.
(a) What is the best point estimate of the proportion of medical malpractice lawsuits that are dropped or dismissed?
$\hat{p} = \frac{856}{1228} \approx 0.697$
(b) Construct a 99% confidence interval (by hand) for the population proportion of medical malpractice lawsuits that are dropped or dismissed.
99%: use z = 2.58
CI = $\hat{p} \pm 2.58\,SE$, where $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.697(0.303)}{1228}} \approx 0.0131$
CI = 0.697 ± 2.58(0.0131) = 0.697 ± 0.0338
CI: (0.697 − 0.034, 0.697 + 0.034) = (0.663, 0.731)
(c) Interpret the interval.
We are 99% confident that the true proportion of medical malpractice lawsuits that are dropped or dismissed is between 66.3% and 73.1%.

Objective C: Sample Size Needed for Estimating the Population Proportion p
The sample size required to obtain a $(1-\alpha)\cdot 100\%$ confidence interval for p with a margin of error E is given by
$n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2$, rounded up to the next integer,
where $\hat{p}$ is a prior estimate of p. If a prior estimate of p is unavailable, the sample size required is
$n = 0.25\left(\frac{z_{\alpha/2}}{E}\right)^2$, rounded up to the next integer.
So you can use $\hat{p}$ = 0.50 if no proportion is given (then $\hat{p}(1-\hat{p})$ = 0.25).

Example 1: An urban economist wishes to estimate the proportion of Americans who own their homes. What size sample should be obtained if he wishes the estimate to be within 0.02 with 90% confidence if:
(a) he uses a 2010 estimate of 0.669 obtained from the U.S. Census Bureau?
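StatCrunch's result in Example 4 uses the unrounded 96% critical value (about 2.054) instead of a table value. This is a minimal sketch, assuming the same normal-approximation interval from Objective B; the function name `one_prop_ci` is hypothetical. Run on Example 5, it also reproduces the by-hand answer.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_ci(successes, n, confidence):
    """Normal-approximation CI for p: p_hat +/- z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    if n * p_hat * (1 - p_hat) < 10:
        raise ValueError("requirement n * p_hat * (1 - p_hat) >= 10 is not met")
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # unrounded critical value
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

ex4 = one_prop_ci(80, 200, 0.96)     # Example 4
ex5 = one_prop_ci(856, 1228, 0.99)   # Example 5(b)
print([round(v, 3) for v in ex4])    # [0.329, 0.471]
print([round(v, 3) for v in ex5])    # [0.663, 0.731]
```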
He wants $\hat{p} \pm 0.02 = 0.669 \pm 0.02$ with 90% confidence, so use z = 1.65.
$0.02 = 1.65\,SE = 1.65\sqrt{\frac{0.669(0.331)}{n}}$
$\sqrt{\frac{0.669(0.331)}{n}} = \frac{0.02}{1.65} \approx 0.0121$
$n = \frac{0.669(0.331)}{(0.02/1.65)^2} \approx 1507.17$, round up to 1508.
He should sample 1508 Americans so that the estimate is within a 0.02 margin of error at a 90% confidence level.
Note: For MML you might have to be more precise and use z = 1.645 for a 90% CI and z = 1.96 for a 95% CI.

(b) he does not use any prior estimate?
If no estimate is given for the proportion, then we use $\hat{p}$ = 0.50.
$0.02 = 1.65\sqrt{\frac{0.50(0.50)}{n}}$
$\sqrt{\frac{0.50(0.50)}{n}} = \frac{0.02}{1.65} \approx 0.0121$
$n = \frac{0.50(0.50)}{(0.02/1.65)^2} \approx 1701.56$, round up to 1702.
He should sample 1702 Americans.
Note: if you round the intermediate value 0.02/1.65 to 0.0121 before squaring, you get $\frac{0.25}{(0.0121)^2} \approx 1707.5$, which rounds up to 1708; carry full precision until the last step.

Example 2: In a Gallup poll conducted in October 2010, 64% of the people polled answered "more strict" to the following question: "Do you feel that the laws covering the sale of firearms should be made more strict, less strict, or kept as they are now?" Suppose the margin of error in the poll was 3.5% and the estimate was made with 95% confidence. At least how many people were surveyed?
Use the poll's estimate, $\hat{p}$ = 0.64, as the prior estimate.
$\hat{p} \pm E = 0.64 \pm 0.035$ with 95% confidence, so use z = 1.96 (or 2).
$0.035 = 1.96\,SE = 1.96\sqrt{\frac{0.64(0.36)}{n}}$
$\sqrt{\frac{0.64(0.36)}{n}} = \frac{0.035}{1.96} \approx 0.0179$
$n = \frac{0.64(0.36)}{(0.035/1.96)^2} \approx 722.5$, round up to 723.
So at least 723 people were surveyed.
Note: If z = 2 is used instead, then n = 753 people.

Example 3: A Gallup poll conducted in November 2010 found that 493 of 1050 adult Americans believe it is the responsibility of the federal government to make sure all Americans have healthcare coverage.
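The sample-size formula from Objective C is a one-liner once the rounding-up step is handled with `math.ceil`. This is a minimal sketch; the function name `sample_size_proportion` is hypothetical. It carries full precision throughout (no intermediate rounding), which is why Example 1(b) comes out as 1702.

```python
import math

def sample_size_proportion(E, z, p_hat=None):
    """n = p_hat(1 - p_hat) * (z / E)^2, rounded up to the next integer.

    With no prior estimate, p_hat = 0.5, so p_hat(1 - p_hat) = 0.25.
    """
    if p_hat is None:
        p_hat = 0.5
    return math.ceil(p_hat * (1 - p_hat) * (z / E) ** 2)

print(sample_size_proportion(0.02, 1.65, 0.669))  # Example 1(a): 1508
print(sample_size_proportion(0.02, 1.65))         # Example 1(b): 1702
print(sample_size_proportion(0.035, 1.96, 0.64))  # Example 2: 723
print(sample_size_proportion(0.035, 2, 0.64))     # Example 2 with z = 2: 753
```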
(a) Obtain a point estimate for the proportion of adult Americans who believe it is the responsibility of the federal government to make sure all Americans have healthcare coverage.
$\hat{p} = \frac{493}{1050} \approx 0.470$
(b) Verify the requirements for constructing a confidence interval for p are satisfied.
$n\hat{p}(1-\hat{p}) \ge 10$: $1050(0.47)(0.53) \approx 261.6 \ge 10$. Yes.
The condition is met, so the sampling distribution of $\hat{p}$ is approximately normal.
(c) Construct a 95% confidence interval for the proportion of adult Americans who believe it is the responsibility of the federal government to make sure all Americans have healthcare coverage. Interpret the interval.
By hand: 95%: use z = 2
CI = $\hat{p} \pm 2\,SE$, where $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.47(0.53)}{1050}} \approx 0.0154$
CI = 0.47 ± 2(0.0154) = 0.47 ± 0.0308 ≈ 0.47 ± 0.031
CI: (0.47 − 0.031, 0.47 + 0.031) = (0.439, 0.501)
We are 95% confident that the true proportion of adults who believe the federal government should make sure all Americans have healthcare coverage is between 43.9% and 50.1%.
Using StatCrunch (which uses z = 1.96):
Stat → Proportion Stats → One Sample → With Summary → 493 successes, 1050 observations → Confidence interval 0.95 → Compute
CI = (0.439, 0.500), or between 43.9% and 50.0%.
(d) You wish to conduct your own study for the proportion of adult Americans who believe it is the responsibility of the federal government to make sure all Americans have healthcare coverage. What sample size would be needed for the estimate to be within 3 percentage points with 90% confidence if you use the estimate obtained in part (a)? (Use StatCrunch.)
By hand it would be: 0.47 ± 0.03, setting $0.03 = 1.65\sqrt{\frac{0.47(0.53)}{n}}$ (z = 1.65 for 90% confidence).
StatCrunch: Stat → Proportion Stats → One Sample → Power/Sample Size → Confidence interval 0.90 → target proportion 0.47 → width 0.06 → Compute
n = 749 (rounded up). (The total width is 0.06 because the margin of error is 0.03 on each side of $\hat{p}$.)
Solving by hand with the rounded value z = 1.65 gives n ≈ 753.5, so 754; StatCrunch uses the more precise z ≈ 1.645.
(e) You wish to conduct your own study for the proportion of adult Americans who believe it is the responsibility of the federal government to make sure all Americans have healthcare coverage. What sample size would be needed for the estimate to be within 3 percentage points with 90% confidence if you do not have a prior estimate?
By hand it would be: 0.50 ± 0.03, setting $0.03 = 1.65\sqrt{\frac{0.50(0.50)}{n}}$, using z = 1.65 for 90% confidence and $\hat{p}$ = 0.50.
StatCrunch: Stat → Proportion Stats → One Sample → Power/Sample Size → Confidence interval 0.90 → target proportion 0.50 → width 0.06 → Compute
n = 752 (rounded up)
Note this is close to the previous answer but larger: $\hat{p}$ = 0.50 gives the largest possible value of $\hat{p}(1-\hat{p})$, so it gives the most conservative sample size.

Chapter 9.2 Estimating a Population Mean

Objective A: Point Estimate
The best point estimate of the population mean, µ, is the sample mean, 𝑥̅.

Objective B: Student's t-Distribution
Properties of the t-distribution
1. The t-distribution is different for different degrees of freedom (df = n − 1).
2. The t-distribution has the same general symmetric bell shape as the standard normal distribution, but its tails have a little more area than the tails of the standard normal distribution because of the greater variability expected with small samples.
3. The t-distribution has a mean of t = 0 at the center of the distribution.
4. As the sample size n gets larger, the t-distribution gets closer to the standard normal distribution.

Example 1: Use StatCrunch to determine the t-value.
*Note: There is more variability for smaller sample sizes.
(a) Using StatCrunch, find the t-value such that the area in the right tail is 0.05 with 19 degrees of freedom.
Stat → Calculators → T → degrees of freedom 19 → P(x ≥ ___) = 0.05 → Compute
t-value = 1.73 (Note: the sample size is n = 20.) You could also use P(x ≤ ___) = 0.95.
(b) Find the t-value such that the area to the left of the t-value is 0.02 with 6 degrees of freedom.
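Parts (d) and (e) give different answers by hand and in StatCrunch only because of the critical value. This sketch (the helper `sample_size` is hypothetical) checks the requirement from part (b) and compares the rounded z = 1.65 with the unrounded 90% value from `statistics.NormalDist`; the unrounded value reproduces StatCrunch's 749 and 752.

```python
import math
from statistics import NormalDist

def sample_size(p_hat, z, E):
    """Smallest n with z * sqrt(p_hat(1 - p_hat)/n) <= E."""
    return math.ceil(p_hat * (1 - p_hat) * (z / E) ** 2)

x, n = 493, 1050
p_hat = x / n
check = n * p_hat * (1 - p_hat)            # part (b): must be at least 10
print(round(p_hat, 3), round(check, 1))    # 0.47 261.5

z_exact = NormalDist().inv_cdf(0.95)       # 90% confidence, about 1.645
for prior in (0.47, 0.50):                 # part (d), then part (e)
    print(prior, sample_size(prior, z_exact, 0.03), sample_size(prior, 1.65, 0.03))
# 0.47 749 754
# 0.5 752 757
```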
Stat → Calculators → T → degrees of freedom 6 → P(x ≤ ___) = 0.02 → Compute
t-value = −2.61
(c) Find the critical t-value that corresponds to 95% confidence. Assume 12 degrees of freedom.
Stat → Calculators → T → Between → degrees of freedom 12 → P(___ ≤ x ≤ ___) = 0.95 → Compute (note the sample size is n = 13)
t-value = ±2.18
(d) Find the critical t-value that corresponds to 95% confidence. Assume 50 degrees of freedom.
Stat → Calculators → T → Between → degrees of freedom 50 → P(___ ≤ x ≤ ___) = 0.95 → Compute (note the sample size is n = 51)
t-value = ±2.0086
(e) What happened to the width of the interval as the sample size increased from 13 to 51?
As the sample size increased, the critical value dropped from 2.18 to 2.01, so the interval became narrower. The t-distribution became closer to the standard normal distribution.

In general, the population standard deviation is unknown when estimating a population mean from a sample mean. The t-distribution is used to offset the additional variability introduced by using s in place of σ.

Objective C: Confidence Interval for a Population Mean
Constructing a $(1-\alpha)\cdot 100\%$ confidence interval for µ: point estimate ± margin of error,
$\bar{x} \pm t_{\alpha/2}\cdot\frac{s}{\sqrt{n}}$, where $E = t_{\alpha/2}\cdot\frac{s}{\sqrt{n}}$ and $t_{\alpha/2}$ has n − 1 degrees of freedom,
provided the data come from a population that is normally distributed, or the sample size is large.

Example 1: A simple random sample of size n < 30 has been obtained. From the normal probability plot and boxplot, judge whether a t-interval should be constructed.
(a) [Normal probability plot and boxplot not reproduced.]
Yes. There are no outliers in the normal probability plot, and the boxplot is roughly symmetric. Thus we can use the t-distribution even though the sample size is small.
(b) [Normal probability plot and boxplot not reproduced.]
No. There is one slight outlier in the normal probability plot, and the boxplot is skewed to the left. Since the sample size is less than 30, we cannot use a t-distribution.

Example 2: A simple random sample of size n is drawn from a population that is normally distributed to investigate the age at which working people start thinking about retirement. The sample mean, 𝑥̅, is found to be 50, and the sample standard deviation, s, is found to be 8.
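Python's standard library has no t-distribution, so to reproduce the StatCrunch T calculator in Example 1 this sketch integrates the t density numerically (composite Simpson's rule) and inverts the CDF by bisection. The names `t_cdf` and `t_quantile` are hypothetical, and the search range of ±50 is an assumption that covers ordinary confidence levels. If SciPy is available, `scipy.stats.t.ppf(p, df)` gives the same quantiles directly.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t, via composite Simpson's rule on [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    c = math.exp(log_c)  # normalizing constant of the t density

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    h = x / steps                      # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric about 0, so P(T <= 0) = 0.5.
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Value x with P(T <= x) = p, found by bisection on the increasing CDF."""
    lo, hi = -50.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_quantile(0.95, 19), 3))    # (a) right-tail area 0.05, df = 19: 1.729
print(round(t_quantile(0.02, 6), 3))     # (b) left-tail area 0.02, df = 6: -2.612
print(round(t_quantile(0.975, 12), 3))   # (c) 95% confidence, df = 12: 2.179
print(round(t_quantile(0.975, 50), 4))   # (d) 95% confidence, df = 50: 2.0086
```

For a two-sided confidence level C, the upper critical value is the (1 + C)/2 quantile, which is why parts (c) and (d) use 0.975.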
(a) Construct a 98% confidence interval for µ if the sample size, n, is 20. (That is, 20 people are surveyed.)
By hand: for 98%, use t = 2.54 from StatCrunch (Stat → Calculators → T, df = 19).
CI: $\bar{x} \pm E = \bar{x} \pm 2.54\,SE$, where $SE = \frac{s}{\sqrt{n}} = \frac{8}{\sqrt{20}}$
CI = $50 \pm 2.54\left(\frac{8}{\sqrt{20}}\right) = 50 \pm 4.54$
CI = (50 − 4.54, 50 + 4.54) ≈ (45.46, 54.54) ≈ (45.5, 54.5)
One can be 98% confident that the true mean age at which people start thinking about retirement is between 45.5 and 54.5 years of age.
StatCrunch: Stat → T Stats → One Sample → With Summary → mean 50, SD 8, n = 20, CI 0.98 → Compute
CI = (45.46, 54.54) ≈ (45.5, 54.5)

(b) Use StatCrunch to construct a 98% confidence interval for µ if the sample size, n, is 15. How does decreasing the sample size affect the margin of error, E?
Stat → T Stats → One Sample → With Summary → mean 50, SD 8, n = 15, CI 0.98 → Compute
CI = (44.6, 55.4)
This interval is wider. So when the sample size is decreased, the interval widens.
With n = 15 there are 14 degrees of freedom, so the critical value is t ≈ 2.624 (not 2.54):
$E = 2.624\left(\frac{8}{\sqrt{15}}\right) \approx 5.42$
The margin of error increased when the sample size decreased (compared to 4.54 in part a).
You can also find E from the CI: $E = \frac{UL - LL}{2} = \frac{55.4 - 44.6}{2} = 5.4$

(c) Construct a 95% confidence interval for µ if the sample size, n, is 20. Compare the results to those obtained in part (a). How does decreasing the level of confidence affect the margin of error, E?
Stat → T Stats → One Sample → With Summary → mean 50, SD 8, n = 20, CI 0.95 → Compute
CI = (46.3, 53.7)
This CI is narrower than the one in part (a), (45.5, 54.5). So as the level of confidence increases, the interval becomes wider.
E for the 95% CI is $E = \frac{UL - LL}{2} = \frac{53.7 - 46.3}{2} = 3.7$
This is smaller than E in part (a), 4.5, so the margin of error decreases as the level of confidence decreases.
In part (a): about 98 out of 100 such intervals will capture the true mean. In part (c): about 95 out of 100 such intervals will capture the true mean.

(d) Could we have computed the confidence intervals in parts (a) to (c) if the population had not been normally distributed? Why?
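The three intervals in Example 2 follow from the formula in Objective C once the right critical value is known. In this minimal sketch the t values (2.539 for df = 19 at 98%, 2.624 for df = 14 at 98%, 2.093 for df = 19 at 95%) are assumed inputs read from a t table, and `t_interval` is a hypothetical helper. Part (b) shows the margin of error of about 5.42 that matches the StatCrunch interval.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """x-bar +/- t_{alpha/2} * s / sqrt(n); t_crit must use df = n - 1."""
    E = t_crit * s / sqrt(n)
    return xbar - E, xbar + E, E

# Critical values assumed from a t table (3 decimals).
cases = {
    "(a) 98%, n = 20": (20, 2.539),   # df = 19
    "(b) 98%, n = 15": (15, 2.624),   # df = 14
    "(c) 95%, n = 20": (20, 2.093),   # df = 19
}
results = {}
for label, (n, t_crit) in cases.items():
    lo, hi, E = t_interval(50, 8, n, t_crit)
    results[label] = (round(lo, 2), round(hi, 2), round(E, 2))
    print(label, results[label])
# (a) (45.46, 54.54, 4.54)
# (b) (44.58, 55.42, 5.42)
# (c) (46.26, 53.74, 3.74)
```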
No, because the conditions for using this formula would not have been met: the population would not be normal, and the sample size n was less than 30 in every part (n = 20 and n = 15).

Example 3: Determine the point estimate of the population mean and the margin of error for the following confidence interval.
Lower bound: 5, Upper bound: 23
$E = \frac{UL - LL}{2} = \frac{23 - 5}{2} = 9$ (so 5 + 9 = 14 and 23 − 9 = 14)
You can also compute 𝑥̅ by finding the midpoint of the interval: $\bar{x} = \frac{23 + 5}{2} = 14$
Margin of error: E = 9
Point estimate of the population mean: 𝑥̅ = 14
Note: CI = 14 ± 9

Example 4: How much time do Americans spend eating or drinking? Suppose for a random sample of 1001 Americans age 15 or older, the mean amount of time spent eating or drinking per day is 1.22 hours with a standard deviation of 0.65 hour.
(a) A histogram of time spent eating and drinking each day is skewed right. Use this result to explain why a large sample size is needed to construct a confidence interval for the mean time spent eating and drinking each day.
Since the population is not normally distributed, you need a large sample size for the sampling distribution of 𝑥̅ to be approximately normal. Thus, use n > 30.
(b) Determine a 95% confidence interval for the mean amount of time Americans age 15 or older spend eating and drinking each day. Interpret the interval.
By hand: First find the t-value using StatCrunch for 95% confidence: Stat → Calculators → T → Between, df = 1000, P(___ ≤ x ≤ ___) = 0.95 → Compute
t = 1.96 (this is close to the z-value for 95% because the sample size is large, so we could also use 2 SEs)
CI: $\bar{x} \pm E = \bar{x} \pm 1.96\,SE$, where $SE = \frac{s}{\sqrt{n}} = \frac{0.65}{\sqrt{1001}}$
CI = $1.22 \pm 1.96\left(\frac{0.65}{\sqrt{1001}}\right) = 1.22 \pm 0.040$
CI = (1.22 − 0.040, 1.22 + 0.040) ≈ (1.18, 1.26)
Using StatCrunch: Stat → T Stats → One Sample → With Summary → mean 1.22, SD 0.65, n = 1001, CI 0.95 → Compute
CI = (1.18, 1.26)
*One can be 95% confident that the true mean time Americans age 15 or older spend eating and drinking each day is between 1.18 and 1.26 hours.
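Example 3 runs the interval formula backwards: the point estimate is the midpoint and the margin of error is half the width. This sketch (hypothetical helper `estimate_from_bounds`) does that, then repeats the Example 4 arithmetic; the value 1.962 for t with df = 1000 at 95% is an assumed table value.

```python
from math import sqrt

def estimate_from_bounds(lower, upper):
    """Recover (point estimate, margin of error) from a symmetric interval."""
    point = (lower + upper) / 2    # midpoint = point estimate
    margin = (upper - lower) / 2   # half the width = margin of error
    return point, margin

print(estimate_from_bounds(5, 23))              # Example 3: (14.0, 9.0)

# Example 4: t_{0.025} with df = 1000 is about 1.962 (assumed table value)
E4 = 1.962 * 0.65 / sqrt(1001)
print(round(1.22 - E4, 2), round(1.22 + E4, 2))  # 1.18 1.26
```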
(c) Could the interval be used to estimate the mean amount of time a 9-year-old American spends eating and drinking each day? Explain.
No. The study was conducted using people who were 15 years old or older. Therefore, the estimate of the mean applies only to the population of Americans age 15 or older.

Objective D: Determining the Sample Size n
The sample size required to estimate the population mean, µ, with a level of confidence $(1-\alpha)\cdot 100\%$ within a specified margin of error, E, is given by
$n = \left(\frac{z_{\alpha/2}\cdot s}{E}\right)^2$
where n is rounded up to the next whole number.
Note: The t-distribution approaches the standard normal z-distribution as the sample size increases. z is used in this formula instead of t to approximate n, because the t critical value depends on n, which is not yet known.

Example 1: A researcher wanted to determine the mean number of hours per week (Sunday through Saturday) the typical person watches television. Results from the Sullivan Statistics Survey indicate that s = 7.5 hours.
(a) How many people are needed to estimate the number of hours people watch television per week within 2 hours with 95% confidence?
The standard deviation is s = 7.5 hours. Note the mean was not provided.
We want the CI $\bar{x} \pm E = \bar{x} \pm 2$, so $2 = z\cdot\frac{s}{\sqrt{n}}$ (we can use z instead of t: z = 1.96, or about 2).
$2 = 1.96\cdot\frac{7.5}{\sqrt{n}}$
$2\sqrt{n} = 1.96(7.5)$
$\sqrt{n} = \frac{1.96(7.5)}{2}$
$n = \left(\frac{1.96(7.5)}{2}\right)^2 \approx 54.02$, round up to 55 people.
55 people need to be surveyed so that the margin of error is within 2 hours at a 95% confidence level.
(b) How many people are needed to estimate the number of hours people watch television per week within 1 hour with 95% confidence?
Let's do this one using StatCrunch: Stat → Z Stats → One Sample → Power/Sample Size → select "confidence interval width" → confidence level 0.95, SD 7.5, width 2 → Compute
n = 217 (the width is 2 because the margin of error is 1 hour on each side of 𝑥̅)
(c) What effect does doubling the required accuracy have on the sample size?
To be more accurate (within 1 hour instead of within 2 hours), you must increase the sample size. In this case it increased by a factor of $\frac{217}{55} \approx 4$.
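The Objective D formula can be checked the same way as the proportion version. This minimal sketch uses z = 1.96 as in the key; `sample_size_mean` is a hypothetical name. Halving E from 2 hours to 1 hour multiplies n by roughly 4, as part (c) observes.

```python
import math

def sample_size_mean(z, s, E):
    """n = (z * s / E)^2, rounded up to the next whole number."""
    return math.ceil((z * s / E) ** 2)

n2 = sample_size_mean(1.96, 7.5, 2)   # within 2 hours
n1 = sample_size_mean(1.96, 7.5, 1)   # within 1 hour
print(n2, n1, round(n1 / n2, 2))      # 55 217 3.95
```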
If you double the accuracy (cut the margin of error in half), the sample has to be about 4 times as large, because n depends on $\frac{1}{E^2}$.

Chapter 9 Estimating a Population Standard Deviation (Supplementary Materials)
Finding CIs for standard deviations

Objective A: Point Estimate
The best point estimate of the population variance, σ², is the sample variance, s².

Objective B: Chi-Square Distribution
Example 1: Use StatCrunch to find the critical values $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ for the given level of confidence and sample size.
(a) 90% confidence, n = 23
Stat → Calculators → Chi-Square → DF 22 (n − 1) → Between → P(___ ≤ x ≤ ___) = 0.90 → Compute
The critical values are 12.338 and 33.924. (These play the role the z-values played for proportions.)

Objective C: Confidence Interval for a Population Variance or Standard Deviation
$(1-\alpha)\cdot 100\%$ of the values of $\chi^2$ will lie between $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$. (Recall: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$.)
Solving for σ² gives the $(1-\alpha)\cdot 100\%$ confidence interval for σ²:
$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
To find a $(1-\alpha)\cdot 100\%$ confidence interval about σ, take the square root of the lower bound and upper bound.

Example 1: A simple random sample of size n is drawn from a population that is known to be normally distributed. The sample variance, s², is determined to be 19.8. (Thus the standard deviation is √19.8 ≈ 4.45.)
(a) Use StatCrunch to construct a 95% confidence interval for σ² if the sample size, n, is 10.
Stat → Variance Stats → One Sample → With Summary → Sample variance: 19.8, Sample size: 10 → Confidence interval for σ²: 0.95 → Compute, and record the results.
95% confidence interval results (σ²: variance of population):
Parameter | Sample Var. | DF | L. Limit | U. Limit
σ² | 19.8 | 9 | 9.367722 | 65.99048
*One can be 95% confident that the true variance is between 9.37 and 65.99.
(b) If the sample size is increased to n = 25, how does increasing the sample size affect the width of the interval?
It will decrease the width (the interval becomes narrower).
(c) If the confidence level is increased to 99%, how does increasing the level of confidence affect the width of the confidence interval?
The interval becomes wider if you want higher confidence.
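The interval for σ² in Objective C only needs the two chi-square critical values. Python's standard library has no chi-square quantile function, so in this sketch the values for df = 9 at 95% (2.70039 and 19.02277) are assumed inputs from a chi-square table; `variance_ci` is a hypothetical helper. Taking square roots of the limits gives the interval for σ.

```python
from math import sqrt

def variance_ci(s2, n, chi2_lower, chi2_upper):
    """CI for sigma^2: ((n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower).

    chi2_lower = chi^2_{1-alpha/2} (left critical value),
    chi2_upper = chi^2_{alpha/2} (right critical value), both with df = n - 1.
    """
    numerator = (n - 1) * s2
    return numerator / chi2_upper, numerator / chi2_lower

# Example 1(a): s^2 = 19.8, n = 10, 95% confidence (table values assumed).
lo, hi = variance_ci(19.8, 10, 2.70039, 19.02277)
print(round(lo, 2), round(hi, 2))               # sigma^2: 9.37 65.99
print(round(sqrt(lo), 2), round(sqrt(hi), 2))   # sigma:   3.06 8.12
```

Because the larger critical value goes in the denominator of the lower limit, the interval is not symmetric about s².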
Example 2: Travelers pay taxes for flying, car rentals, and hotels. The following data represent the total travel tax for a 3-day business trip in eight randomly selected cities. It was verified that the data are normally distributed. Use StatCrunch to construct a 90% confidence interval for the standard deviation of the travel tax for a 3-day business trip. Interpret the interval.
[Data table of travel taxes for the eight cities not reproduced.]
First we need to compute the variance, since the raw data have been provided.
Enter the data in column var1, then Stat → Summary Stats → Columns → var1, statistic: Variance → Compute
Summary statistics: Column var1, Variance 151.87187
Now we will find the interval:
Stat → Variance Stats → One Sample → With Summary → Sample variance: 151.87187, Sample size: 8 → Confidence interval for σ²: 0.90 → Compute
Alternate way: Stat → Variance Stats → One Sample → With Data → Columns: var1 → Confidence interval for σ²: 0.90 → Compute, and record the results.
90% confidence interval results (σ²: variance of population):
Parameter | Sample Var. | DF | L. Limit | U. Limit
σ² | 151.87187 | 7 | 75.573504 | 490.50829
Manually compute the square root of each limit to change from variance to standard deviation:
Lower limit: √75.573504 ≈ 8.6933
Upper limit: √490.50829 ≈ 22.1474
We are 90% confident that the standard deviation of the travel tax for a 3-day business trip is between 8.69 and 22.15 dollars.

Summary of CIs of means and proportions:
To summarize:
Confidence intervals
As sample size n increases (you get better results) → CI narrows
As the standard error (SE or SD) decreases (you get better results) → CI narrows
As the confidence level increases → CI widens
Sample size
As n increases (better results) → SE decreases
As n increases (better results) → the shape of the sampling distribution becomes more normal (symmetric)
As n increases (better results) → the sample mean 𝑥̅ tends to get closer to the true population mean, µ (and the sample proportion 𝑝̂ to the true population proportion, p)
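The summary rules can be seen numerically by tabulating the margin of error for a proportion across sample sizes and confidence levels. In this sketch p̂ = 0.5 is a hypothetical value chosen only to show the trends, and `margin_for_level` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def margin_for_level(p_hat, n, confidence):
    """E = z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n) for a proportion."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)

P_HAT = 0.5   # hypothetical value, used only to show the trends
for confidence in (0.90, 0.95, 0.99):
    row = [round(margin_for_level(P_HAT, n, confidence), 4) for n in (100, 400, 1600)]
    print(confidence, row)
```

Reading across a row, each fourfold increase in n halves E (the CI narrows); reading down a column, higher confidence gives a larger E (the CI widens).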
