Chapter 21 What Is a Confidence Interval?

Thought Question 1
Suppose that 40% of a certain population favor the use of nuclear power for energy. (a) If you randomly sample 10 people from this population, will exactly four (40%) of them be in favor of the use of nuclear power? Would you be surprised if only two (20%) of them are in favor? How about if none of the sample are in favor?

Thought Question 2
Suppose that 40% of a certain population favor the use of nuclear power for energy. (b) Now suppose you randomly sample 1000 people from this population. Will exactly 400 (40%) of them be in favor of the use of nuclear power? Would you be surprised if only 200 (20%) of them are in favor? How about if none of the sample are in favor?

Thought Question 3
A 95% confidence interval for the proportion of adults in the U.S. who have diabetes extends from .07 to .11, or 7% to 11%. What does it mean to say that the interval from .07 to .11 represents a 95% confidence interval for the proportion of adults in the U.S. who have diabetes?

Thought Question 4
Would a 99% confidence interval for the proportion described in Question 3 be wider or narrower than the 95% interval given? Explain. (Hint: what is the difference between a 68% interval and a 95% interval?)

Thought Question 5
In a May 2006 Zogby America poll of 1000 adults, 70% said that past efforts to enforce immigration laws have been inadequate. Based on this poll, a 95% confidence interval for the proportion in the population who feel this way is about 67% to 73%. If this poll had been based on 5000 adults instead, would the 95% confidence interval be wider or narrower than the interval given? Explain.
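Thought Questions 1 and 2 can be explored numerically. The sketch below is not part of the original slides; it assumes the population is large enough that each sampled person independently favors nuclear power with probability 0.40 (a binomial model), and the helper name binomial_pmf is hypothetical. It prints the chance of seeing exactly 40%, 20%, or 0% in favor for samples of 10 and of 1000.

```python
import math

def binomial_pmf(n: int, k: int, p: float) -> float:
    """P(exactly k successes in n independent draws), computed in log space
    so that large n does not overflow or underflow prematurely."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + k * math.log(p) + (n - k) * math.log(1 - p))

p = 0.40  # assumed true population proportion, as in the thought questions
for n in (10, 1000):
    for share in (0.40, 0.20, 0.0):
        k = round(n * share)  # number in favor corresponding to that share
        prob = binomial_pmf(n, k, p)
        print(f"n={n:4d}  P(exactly {k:3d} in favor) = {prob:.3g}")
```

Under this model, getting exactly 40% is far from guaranteed at either sample size, while a result as low as 20% is plausible for 10 people and essentially impossible for 1000.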
Recall from previous chapters:
– Parameter: fixed, unknown number that describes the population
– Statistic: known value calculated from a sample; a statistic is used to estimate a parameter
– Sampling Variability: different samples from the same population may yield different values of the sample statistic; estimates from samples will be closer to the true values in the population if the samples are larger

Recall from previous chapters:
– Example: The amount by which the proportion obtained from the sample (p̂) will differ from the true population proportion (p) rarely exceeds the margin of error.
– Sampling Distribution: tells what values a statistic takes and how often it takes those values in repeated sampling.
– Example: sample proportions (p̂’s) from repeated sampling would have a normal distribution with a certain mean and standard deviation.

Case Study
Comparing Fingerprint Patterns
Science News, Jan. 27, 1995, p. 451.

Case Study: Fingerprints
Fingerprints are a “sexually dimorphic trait…which means they are among traits that may be influenced by prenatal hormones.”
It is known…
– Most people have more ridges in the fingerprints of the right hand. (People with more ridges in the left hand have “leftward asymmetry.”)
– Women are more likely than men to have leftward asymmetry.
Compare fingerprint patterns of heterosexual and homosexual men.

Case Study: Fingerprints
Study Results
66 homosexual men were studied.
• 20 (30%) of the homosexual men showed left asymmetry.
186 heterosexual men were also studied.
• 26 (14%) of the heterosexual men showed left asymmetry.

Case Study: Fingerprints
A Question
Assume that the proportion of all men who have leftward asymmetry is 15%. Is it unusual to observe a sample of 66 men with a sample proportion (p̂) of 30% if the true population proportion (p) is 15%?
Twenty Simulated Samples (n=66)
[Figure: twenty simulated samples; vertical axis from 0 to 1, horizontal axis labeled "Sample Size"]

The Rule for Sample Proportions
If numerous simple random samples of size n are taken from the same population, the sample proportions (p̂) from the various samples will have an approximately normal distribution. The mean of the sample proportions will be p (the true population proportion). The standard deviation will be: $\sqrt{p(1-p)/n}$

Rule Conditions and Illustration
For rule to be valid, must have
– Random sample
– ‘Large’ sample size

Case Study: Fingerprints
Sampling Distribution
p = 0.15 (mean); n = 66
$\sqrt{p(1-p)/n} = \sqrt{0.15(1-0.15)/66} = 0.044$ (s.d.)

Case Study: Fingerprints
Answer to Question
Where should about 95% of the sample proportions lie? mean plus or minus two standard deviations
0.15 − 2(0.044) = 0.062
0.15 + 2(0.044) = 0.238
95% should fall between 0.062 & 0.238

1000 Simulated Samples (n=66)
[Figure: histogram, "Simulated Data: p=0.15"; horizontal axis "Proportion of Successes"; annotated p = 0.15, n = 66, $\sqrt{0.15(1-0.15)/66} = 0.044$]

1000 Simulated Samples (n=66)
[Figure: the same histogram, "Simulated Data: p=0.15"; horizontal axis "Proportion of Successes"]
Approximately 95% of sample proportions fall in this interval (0.062 to 0.238). Is it likely we would observe a sample proportion 0.30?
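The histogram of 1000 simulated samples can be approximated with a short simulation. This is a minimal sketch rather than the procedure used to make the slides: the function name simulate_sample_proportions and the fixed seed are assumptions, and each of the 66 men is modeled as an independent draw with probability 0.15 of leftward asymmetry.

```python
import math
import random

def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` random samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.15, 66
sd = math.sqrt(p * (1 - p) / n)        # Rule for Sample Proportions
low, high = p - 2 * sd, p + 2 * sd     # mean plus or minus two s.d.

props = simulate_sample_proportions(p, n, reps=1000)
inside = sum(low <= x <= high for x in props) / len(props)
at_least_30 = sum(x >= 0.30 for x in props) / len(props)

print(f"s.d. of sample proportions: {sd:.3f}")
print(f"interval: {low:.3f} to {high:.3f}")
print(f"share of simulated proportions inside the interval: {inside:.3f}")
print(f"share of simulated proportions of 0.30 or more: {at_least_30:.3f}")
```

The share inside the interval should land near 95%, and proportions as large as 0.30 should be rare, which is the point the slide's question is driving at.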
1000 Simulated Samples (n=30)
[Figure: histogram, "Simulated Data: p=0.15"; horizontal axis "Proportion of Successes"; annotated p = 0.15, n = 30, $\sqrt{0.15(1-0.15)/30} = 0.065$]

1000 Simulated Samples (n=30)
[Figure: the same histogram, "Simulated Data: p=0.15"; horizontal axis "Proportion of Successes"]
Approximately 95% of sample proportions fall in this interval. Is it likely we would observe a sample proportion 0.30?

Confidence Interval for a Population Proportion
An interval of values, computed from sample data, that is almost sure to cover the true population proportion.
“We are ‘highly confident’ that the true population proportion is contained in the calculated interval.”
Statistically (for a 95% C.I.): in repeated samples, 95% of the calculated confidence intervals should contain the true proportion.

Formula for a 95% Confidence Interval for the Population Proportion (Empirical Rule)
sample proportion plus or minus two standard deviations of the sample proportion:
$\hat{p} \pm 2\sqrt{p(1-p)/n}$
Since we do not know the population proportion p (needed to calculate the standard deviation), we will use the sample proportion p̂ in its place.

Formula for a 95% Confidence Interval for the Population Proportion (Empirical Rule)
$\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$
where $\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error (estimated standard deviation of p̂)

Margin of Error (plus or minus part of C.I.)
$2\sqrt{\hat{p}(1-\hat{p})/n} \le 2\sqrt{0.5(1-0.5)/n} = 1/\sqrt{n}$

Formula for a C-level (%) Confidence Interval for the Population Proportion
$\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$
where z* is the critical value of the standard normal distribution for confidence level C

Common Values of z*
Confidence Level C    Critical Value z*
50%                   0.67
60%                   0.84
68%                   1
70%                   1.04
80%                   1.28
90%                   1.64
95%                   1.96 (or 2)
99%                   2.58
99.7%                 3
99.9%                 3.29

Case Study
Parental Discipline
Brown, C. S., (1994) “To spank or not to spank.” USA Weekend, April 22-24, pp. 4-7.
What are parents’ attitudes and practices on discipline?

Case Study: Survey
Parental Discipline
Nationwide random telephone survey of 1,250 adults.
– 474 respondents had children under 18 living at home
– results on behavior based on the smaller sample
Reported margin of error
– 3% for the full sample
– 5% for the smaller sample

Case Study: Results
Parental Discipline
“The 1994 survey marks the first time a majority of parents reported not having physically disciplined their children in the previous year. Figures over the past six years show a steady decline in physical punishment, from a peak of 64 percent in 1988”
– The 1994 proportion who did not spank or hit was 51%!

Case Study: Results
Parental Discipline
Disciplining methods over the past year:
– denied privileges: 79%
– confined child to his/her room: 59%
– spanked or hit: 49%
– insulted or swore at child: 45%
Margin of error: 5%
– Which of the above appear to show a true value different from 50%?
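The margin-of-error bound and the interval formula above can be tried on the survey figures. This is a sketch under stated assumptions: the function names proportion_ci and conservative_margin are hypothetical, the multiplier defaults to 2 as in the empirical-rule version of the formula, and the percentages are treated as exact sample proportions from the 474 parents.

```python
import math

def proportion_ci(p_hat, n, z_star=2.0):
    """Interval p_hat +/- z* x standard error, with SE = sqrt(p_hat(1 - p_hat)/n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

def conservative_margin(n):
    """Upper bound 1/sqrt(n) on the 95% margin of error (worst case p_hat = 0.5)."""
    return 1 / math.sqrt(n)

# The reported margins of error are consistent with 1/sqrt(n).
print(f"full sample (n=1250): {conservative_margin(1250):.3f}")
print(f"parents only (n=474): {conservative_margin(474):.3f}")

# Which reported percentages appear to differ from 50%?
results = {
    "denied privileges": 0.79,
    "confined child to his/her room": 0.59,
    "spanked or hit": 0.49,
    "insulted or swore at child": 0.45,
}
for label, p_hat in results.items():
    lo, hi = proportion_ci(p_hat, 474)
    differs = not (lo <= 0.50 <= hi)
    print(f"{label}: ({lo:.3f}, {hi:.3f})  differs from 50%: {differs}")
```

Passing z_star=1.96, or another value from the table of critical values, gives the C-level interval instead; the endpoints shift only slightly.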
Case Study: Confidence Intervals
Parental Discipline
denied privileges: 79%
– p̂: 0.79
– standard error of p̂: $\sqrt{.79(1-.79)/474} = 0.019$
– 95% C.I.: .79 ± 2(.019): (.752, .828)
confined child to his/her room: 59%
– p̂: 0.59
– standard error of p̂: $\sqrt{.59(1-.59)/474} = 0.023$
– 95% C.I.: .59 ± 2(.023): (.544, .636)

Case Study: Confidence Intervals
Parental Discipline
spanked or hit: 49%
– p̂: 0.49
– standard error of p̂: $\sqrt{.49(1-.49)/474} = 0.023$
– 95% C.I.: .49 ± 2(.023): (.444, .536)
insulted or swore at child: 45%
– p̂: 0.45
– standard error of p̂: $\sqrt{.45(1-.45)/474} = 0.023$
– 95% C.I.: .45 ± 2(.023): (.404, .496)

Case Study: Results
Parental Discipline
Asked of the full sample (n=1,250): “How often do you think repeated yelling or swearing at a child leads to long-term emotional problems?”
– very often or often: 74%
– sometimes: 17%
– hardly ever or never: 7%
– no response: 2%
Margin of error: 3%

Case Study: Confidence Intervals
Parental Discipline
hardly ever or never: 7%
– p̂: 0.07
– standard error of p̂: $\sqrt{.07(1-.07)/1250} = 0.007$
– 95% C.I.: .07 ± 2(.007): (.056, .084)
Few people believe such behavior is harmless, but almost half (45%) of parents engaged in it!

Key Concepts (1st half of Ch. 21)
– Different samples (of the same size) will generally give different results.
– We can specify what these results look like in the aggregate.
– Rule for Sample Proportions
– Compute and interpret Confidence Intervals for population proportions based on sample proportions

Inference for Population Means
Sampling Distribution, Confidence Intervals
The remainder of this chapter discusses the situation when interest is in making conclusions about population means rather than population proportions
– includes the rule for the sampling distribution of sample means (X̄’s)
– includes confidence intervals for one mean or a difference in two means

Thought Question 6
(from Seeing Through Statistics, 2nd Edition, by Jessica M. Utts, p. 316)
Suppose the mean weight of all women at a university is 135 pounds, with a standard deviation of 10 pounds.
• Recalling the material from Chapter 13 about bell-shaped curves, in what range would you expect 95% of the women’s weights to fall? 115 to 155 pounds

Thought Question 6 (cont.)
• If you were to randomly sample 10 women at the university, how close do you think their average weight would be to 135 pounds?
• If you randomly sample 1000 women, would you expect the average to be closer to 135 pounds than it would be for the sample of 10 women?

Thought Question 7
A study compared the serum HDL cholesterol levels in people with low-fat diets to people with diets high in fat intake. From the study, a 95% confidence interval for the mean HDL cholesterol for the low-fat group extends from 43.5 to 50.5...
a. Does this mean that 95% of all people with low-fat diets will have HDL cholesterol levels between 43.5 and 50.5? Explain.

Thought Question 7 (cont.)
… a 95% confidence interval for the mean HDL cholesterol for the low-fat group extends from 43.5 to 50.5. A 95% confidence interval for the mean HDL cholesterol for the high-fat group extends from 54.5 to 61.5.
[Figure: the two intervals drawn on a number line running from 40 to 65]
b.
Based on these results, would you conclude that people with low-fat diets have lower HDL cholesterol levels, on average, than people with high-fat diets?

Thought Question 8
The first confidence interval in Question 7 was based on results from 50 people. The confidence interval spans a range of 7 units. If the results had been based on a much larger sample, would the confidence interval for the mean cholesterol level have been wider, narrower, or about the same? Explain.

Thought Question 9
In Question 7, we compared average HDL cholesterol levels for two diet groups by computing separate confidence intervals for the two means. Is there a more direct value (and single C.I.) to examine in order to make the comparison between the two groups?

Case Study
Weights of Females at a Large University
Hypothetical (from Seeing Through Statistics, 2nd Edition, by Jessica M. Utts, p. 316)
Suppose the mean weight of all women is μ = 135 pounds with a standard deviation of σ = 10 pounds and the weight values follow a bell-shaped curve.

Case Study: Weights
Questions
Where should 95% of all women’s weights fall? mean plus or minus two standard deviations
135 − 2(10) = 115
135 + 2(10) = 155
95% should fall between 115 & 155
What about the mean (average) of a sample of n women? What values would be expected?

Twenty Simulated Samples (n=1000)
[Figure: twenty simulated samples; vertical axis from 130 to 140, horizontal axis labeled "Sample Size" running from 1 to 1000]

The Rule for Sample Means
If numerous simple random samples of size n are taken from the same population, the sample means (X̄) from the various samples will have an approximately normal distribution. The mean of the sample means will be μ (the population mean). The standard deviation will be: $\sigma/\sqrt{n}$ (σ is the population s.d.)

Conditions for the Rule for Sample Means
Random sample
Population of measurements…
– Follows a bell-shaped curve
- or -
– Not bell-shaped, but sample is ‘large’

Case Study: Weights
Sampling Distribution (for n = 10)
μ = 135 (mean for population and X̄)
σ = 10 (s.d. for population)
n = 10
$\sigma/\sqrt{n} = 10/\sqrt{10} = 3.16$ (s.d. for X̄)

Case Study: Weights
Answer to Question (for n = 10)
Where should 95% of the sample mean weights fall (from samples of size n=10)? mean plus or minus two standard deviations
135 − 2(3.16) = 128.68
135 + 2(3.16) = 141.32
95% should fall between 128.68 & 141.32

Sampling Distribution of Mean (n=10)
[Figure: histogram, "Simulated Data: Sample Size=10"; horizontal axis "Sample Means" from 120 to 150]

Case Study: Weights
Sampling Distribution (for n = 25)
μ = 135
σ = 10
n = 25
$\sigma/\sqrt{n} = 10/\sqrt{25} = 2$

Case Study: Weights
Answer to Question (for n = 25)
Where should 95% of the sample mean weights fall (from samples of size n=25)? mean plus or minus two standard deviations
135 − 2(2) = 131
135 + 2(2) = 139
95% should fall between 131 & 139

Sampling Distribution of Mean (n=25)
[Figure: histogram, "Simulated Data: Sample Size=25"; horizontal axis "Sample Means" from 120 to 150]

Case Study: Weights
Sampling Distribution (for n = 100)
μ = 135
σ = 10
n = 100
$\sigma/\sqrt{n} = 10/\sqrt{100} = 1$

Case Study: Weights
Answer to Question (for n = 100)
Where should 95% of the sample mean weights fall (from samples of size n=100)?
mean plus or minus two standard deviations
135 − 2(1) = 133
135 + 2(1) = 137
95% should fall between 133 & 137

Sampling Distribution of Mean (n=100)
[Figure: histogram, "Simulated Data: Sample Size=100"; horizontal axis "Sample Means" from 120 to 150]

Case Study
Exercise and Pulse Rates
Hypothetical
Is the mean resting pulse rate of adult subjects who regularly exercise different from the mean resting pulse rate of those who do not regularly exercise?
Find Confidence Intervals for the means

Case Study: Results
Exercise and Pulse Rates
A random sample of n1=31 nonexercisers yielded a sample mean of X̄1=75 beats per minute (bpm) with a sample standard deviation of s1=9.0 bpm. A random sample of n2=29 exercisers yielded a sample mean of X̄2=66 bpm with a sample standard deviation of s2=8.6 bpm.

                 n     mean    std. dev.
Nonexercisers    31    75      9.0
Exercisers       29    66      8.6

The Rule for Sample Means
If numerous simple random samples of size n are taken from the same population, the sample means (X̄) from the various samples will have an approximately normal distribution. The mean of the sample means will be μ (the population mean). The standard deviation will be: $\sigma/\sqrt{n}$
We do not know the value of σ!

Standard Error of the (Sample) Mean
SEM = standard error of the mean
= (standard deviation from the sample) divided by (square root of the sample size)
= $s/\sqrt{n}$

Case Study: Results
Exercise and Pulse Rates

                 n     mean    std. dev.    std. err.
Nonexercisers    31    75      9.0          1.6
Exercisers       29    66      8.6          1.6

Typical deviation of an individual pulse rate (for Exercisers) is s = 8.6
Typical deviation of a mean pulse rate (for Exercisers) is $s/\sqrt{n} = 8.6/\sqrt{29} = 1.6$

Case Study: Confidence Intervals
Exercise and Pulse Rates
95% C.I. for the population mean: sample mean ± 2 (standard error), that is, $\bar{X} \pm 2\,s/\sqrt{n}$
Nonexercisers: 75 ± 2(1.6) = 75 ± 3.2 = (71.8, 78.2)
Exercisers: 66 ± 2(1.6) = 66 ± 3.2 = (62.8, 69.2)
Do you think the population means are different? Yes, because the intervals do not overlap

Formula for a C-level (%) Confidence Interval for the Population Mean
$\bar{x} \pm z^{*}\,s/\sqrt{n}$
where z* is the critical value of the standard normal distribution for confidence level C

Careful Interpretation of a Confidence Interval
“We are 95% confident that the mean resting pulse rate for the population of all exercisers is between 62.8 and 69.2 bpm.” (We feel that plausible values for the population of exercisers’ mean resting pulse rate are between 62.8 and 69.2.)
** This does not mean that 95% of all people who exercise regularly will have resting pulse rates between 62.8 and 69.2 bpm. **
Statistically: 95% of all samples of size 29 from the population of exercisers should yield a sample mean within two standard errors of the population mean; i.e., in repeated samples, 95% of the C.I.s should contain the true population mean.

Case Study: Confidence Intervals
Exercise and Pulse Rates
95% C.I. for the difference in population means (nonexercisers minus exercisers):
(difference in sample means) ± 2 (SE of the difference)
Difference in sample means: X̄1 − X̄2 = 9
SE of the difference = 2.26 (given)
95% confidence interval: (4.48, 13.52)
– interval does not include zero (⇒ means are different)

Case Study
An Experiment Testing a Vaccine for Those with Genital Herpes
Adler, T., (1994) “Therapeutic vaccine fights herpes.” Science News, Vol. 145, June 18, p. 388.
Does a new vaccine prevent the outbreak of herpes in people already infected?
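The pulse-rate intervals can be reproduced in a few lines. This is a sketch, not the slides' own computation: the function names are hypothetical, and the formula used for the standard error of the difference (the square root of the sum of the two squared standard errors, appropriate for independent samples) is an assumption, since the slides only state that value as given.

```python
import math

def sem(s, n):
    """Standard error of the mean: sample s.d. divided by sqrt(sample size)."""
    return s / math.sqrt(n)

def mean_ci(x_bar, s, n, z_star=2.0):
    """Interval x_bar +/- z* x SEM; z* = 2 matches the 95% rule of thumb."""
    margin = z_star * sem(s, n)
    return x_bar - margin, x_bar + margin

def diff_ci(x1, s1, n1, x2, s2, n2, z_star=2.0):
    """Interval for mean1 - mean2, ASSUMING independent samples so that
    SE(diff) = sqrt(SEM1^2 + SEM2^2). The slides do not state this formula."""
    se_diff = math.sqrt(sem(s1, n1) ** 2 + sem(s2, n2) ** 2)
    diff = x1 - x2
    return diff - z_star * se_diff, diff + z_star * se_diff, se_diff

# Sample summaries from the case study (bpm)
non_ci = mean_ci(75, 9.0, 31)
exe_ci = mean_ci(66, 8.6, 29)
lower, upper, se_diff = diff_ci(75, 9.0, 31, 66, 8.6, 29)

print(f"SEM nonexercisers: {sem(9.0, 31):.2f}, exercisers: {sem(8.6, 29):.2f}")
print(f"nonexercisers: ({non_ci[0]:.1f}, {non_ci[1]:.1f})")
print(f"exercisers:    ({exe_ci[0]:.1f}, {exe_ci[1]:.1f})")
print(f"difference:    ({lower:.2f}, {upper:.2f}), SE = {se_diff:.2f}")
```

Because this sketch carries unrounded standard errors through the calculation, the difference interval comes out marginally wider than the one on the slide, which is consistent with the slide having used the rounded value 1.6 for both groups. The conclusion is the same: the interval lies entirely above zero.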
Case Study: Sample
An Experiment Testing a Vaccine for Those with Genital Herpes
98 men and women aged 18 to 55
Experience between 4 and 14 outbreaks per year
Experiment
– Double-blind experiment
– Randomized to vaccine or placebo

Case Study: Report
An Experiment Testing a Vaccine for Those with Genital Herpes
“The vaccine was well tolerated. gD2 recipients reported fewer recurrences per month than placebo recipients (mean 0.42 [sem 0.05] vs 0.55 [0.05]…)…”

Case Study: Confidence Intervals
An Experiment Testing a Vaccine for Those with Genital Herpes
95% C.I. for population mean recurrences:
– Vaccine group: 0.42 ± 2(0.05): (.32, .52)
– Placebo group: 0.55 ± 2(0.05): (.45, .65)
95% C.I. for the difference in population means:
– Difference = −0.13, SE = 0.07 (given)
– C.I.: (−0.27, 0.01) (contains 0 ⇒ means not different)

Key Concepts (2nd half of Ch. 21)
– Rule for Sample Means
– Compute confidence intervals for means based on one sample
– Compute confidence intervals for means based on two samples
– Interpret Confidence Intervals for Means
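The repeated-sampling interpretation of a 95% confidence interval for a mean can be checked by simulation. The sketch below is an illustration added here, not material from the slides: it borrows the hypothetical weights population (mean 135, s.d. 10, bell-shaped), assumes samples of size 100, and uses a hypothetical function name and a fixed seed. Each repetition builds the interval "sample mean ± 2 standard errors" and records whether it contains the true mean.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, reps, seed=1):
    """Share of repeated samples whose interval x_bar +/- 2*s/sqrt(n) contains mu."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable output
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
        if x_bar - 2 * se <= mu <= x_bar + 2 * se:
            hits += 1
    return hits / reps

cov = coverage(mu=135, sigma=10, n=100, reps=1000)
print(f"share of intervals containing the true mean: {cov:.3f}")
```

The printed share should come out close to 0.95. Any single interval either contains the true mean or does not; the 95% describes how often the procedure succeeds over many samples.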