Stat 651 Lecture 5
Copyright (c) Bani Mallick

Topics in Lecture #5
Confidence intervals for a population mean $\mu$ when the population standard deviation $\sigma$ is known.
Properties of confidence intervals: what things make them longer and shorter.
Sample size calculation for a population mean $\mu$ when the population standard deviation $\sigma$ is known: a simple illustration of a method.

Book Sections Covered in Lecture #5
Chapter 5.1
Chapter 5.2
Chapter 5.3

Lecture 4 Review: Pr(X < c) for Normal Populations
Compute the z-score $z = (c - \mu)/\sigma$.
Look up the value for $z$ in Table 1.

Lecture 4 Review: Pr(X > c) for Normal Populations
Compute the z-score $z = (c - \mu)/\sigma$.
Look up the value for $z$ in Table 1.
Subtract this value from 1.0.

Lecture 4 Review: Inference
The sample mean $\bar{X}$ is a random variable.
Its own "population" mean is $\mu$.
Its standard deviation is $\sigma/\sqrt{n}$.
Note how the standard deviation of the sample mean becomes smaller as the sample size becomes larger.
More data = more precision!

Lecture 4 Review: Central Limit Theorem
The sample mean $\bar{X}$ is a random variable.
Its own "population" mean is $\mu$.
Its standard deviation is $\sigma/\sqrt{n}$.
In "large enough" samples, the sample mean is very nearly normally distributed, i.e., it has a bell-shaped histogram.

Confidence Interval for a Population Mean
A considerable part of basic statistics is making inferences about the population mean $\mu$.
It is impossible to know the value of $\mu$ exactly. This is a key factoid: why do I say this with such certainty?
Because (almost) every sample will give you a different sample mean, and that sample mean will not equal the population mean.
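The Lecture 4 review facts above (the sample mean has mean equal to the population mean, has standard deviation $\sigma/\sqrt{n}$, and essentially never equals the population mean exactly) can be checked with a small simulation. The Python sketch below is an illustration only and is not part of the lecture: the population values, the number of repetitions, the seed and the function name `simulate_sample_means` are all assumptions chosen for the example.

```python
import math
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from Normal(mu, sigma); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]


# Illustrative population values (assumed, not taken from the lecture).
MU, SIGMA, N, REPS = 10.0, 4.0, 25, 4000

means = simulate_sample_means(MU, SIGMA, N, REPS)
observed_sd = statistics.stdev(means)      # spread of the simulated sample means
theoretical_sd = SIGMA / math.sqrt(N)      # sigma / sqrt(n) from the review slide

print(f"mean of sample means: {statistics.fmean(means):.3f} (population mean {MU})")
print(f"sd of sample means:   {observed_sd:.3f} (sigma/sqrt(n) = {theoretical_sd:.3f})")
print(f"samples whose mean equals mu exactly: {sum(m == MU for m in means)}")
```

Rerunning with a larger `N` should shrink the spread of the sample means, which is the "more data = more precision" point in numerical form.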
Confidence Interval for a Population Mean
What we can do is construct an interval of possible values for the population mean $\mu$.
The interval is determined by how much "confidence" we want in saying that the population mean $\mu$ is in the interval.
The interval is always of the form: sample mean $\bar{X} \pm$ confidence factor.

Confidence Interval for a Population Mean
Interval: sample mean $\bar{X} \pm$ confidence factor.
The confidence factor is determined by how much confidence we want in concluding that the population mean is actually in the interval.
Which interval has higher confidence of including the population mean: -100 to -50, or -150 to 0?

Confidence Interval for a Population Mean: Formal Method
The first method assumes that the population standard deviation $\sigma$ is known.
Suppose we want to be 95% confident that our interval includes the population mean $\mu$, i.e., the probability is 95% that the population mean $\mu$ is in the interval.
Here is the interval: $\bar{X} - 1.96\,\sigma/\sqrt{n}$ to $\bar{X} + 1.96\,\sigma/\sqrt{n}$.

WOMEN'S INTERVIEW SURVEY OF HEALTH (WISH)
I computed the mean reported caloric intake at the start of the study, and the mean reported caloric intake at the end.
My random variable X was the change (difference).
My hypothesis is that the population mean of X is < 0.
In other words, I think women report fewer calories the more they are asked about their diet (Hawthorne Effect).

WISH: Change in Caloric Intake
[Figure: box plot of the change in mean energy intake, N = 271.]
Does it look like a big change? Note that the scale of the box plot is -3000 to 2000.

WISH
The sample size is $n = 271$.
The sample mean change is $\bar{X} = -180$.
I am going to pretend that the population standard deviation is $\sigma = 600$.
The 95% interval is $\bar{X} - 1.96\,\sigma/\sqrt{n}$ to $\bar{X} + 1.96\,\sigma/\sqrt{n}$, where the sample mean change is $\bar{X} = -180$.

WISH: Change in Reported Caloric Intake
$n = 271$, $\sigma = 600$, $\bar{X} = -180$
$1.96\,\sigma/\sqrt{n} \approx 71$
$\bar{X} - 1.96\,\sigma/\sqrt{n} = -180 - 71 = -251$
$\bar{X} + 1.96\,\sigma/\sqrt{n} = -180 + 71 = -109$
95% CI = -251 to -109

Review
$\sigma = 600$, $n = 271$, $\bar{X} = -180$
Then, with 95% probability, the true population mean change is in the interval from -251 to -109.
The chance is 95% that the population mean change is between 251 and 109 calories lower.
Is there a Hawthorne effect?

Confidence Intervals
You can construct a confidence interval for the population mean with any level of confidence.
Generally, people report the 95% CI, but sometimes they report the 90% and 99% confidence intervals.
This is easy to do via a formula, and even easier to do via SPSS.

Confidence Interval for a Population Mean $\mu$ when $\sigma$ is Known
Want a 90%, 95% or 99% chance of the interval including $\mu$.
90%: $\bar{X} - 1.645\,\sigma/\sqrt{n}$ to $\bar{X} + 1.645\,\sigma/\sqrt{n}$
95%: $\bar{X} - 1.96\,\sigma/\sqrt{n}$ to $\bar{X} + 1.96\,\sigma/\sqrt{n}$
99%: $\bar{X} - 2.58\,\sigma/\sqrt{n}$ to $\bar{X} + 2.58\,\sigma/\sqrt{n}$

Confidence Intervals
There is a general formula given on page 200.
If you want a $(1-\alpha)100\%$ confidence interval for the population mean $\mu$ when the population s.d. $\sigma$ is known, use the formula
$\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n}$ to $\bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}$.
The term $z_{\alpha/2}$ is the value in Table 1 that gives probability $1 - \alpha/2$.
$\alpha = 0.10$: $z_{\alpha/2} = 1.645$; $\alpha = 0.05$: $z_{\alpha/2} = 1.96$; $\alpha = 0.01$: $z_{\alpha/2} = 2.58$.

WISH
The sample size is $n = 271$.
The sample mean change is $\bar{X} = -180$.
I am going to pretend that the population standard deviation is $\sigma = 600$.
I want a 99% confidence interval: $z_{\alpha/2} = 2.58$, so the interval is $\bar{X} - 2.58\,\sigma/\sqrt{n}$ to $\bar{X} + 2.58\,\sigma/\sqrt{n}$, with $\bar{X} = -180$.

WISH: Change in Reported Caloric Intake
$n = 271$, $\sigma = 600$, $\bar{X} = -180$
$2.58\,\sigma/\sqrt{n} \approx 94$
$\bar{X} - 2.58\,\sigma/\sqrt{n} = -180 - 94 = -274$
$\bar{X} + 2.58\,\sigma/\sqrt{n} = -180 + 94 = -86$
99% CI = -274 to -86

WISH: Change in Reported Caloric Intake
99% CI = -274 to -86
The chance is 99% that the population mean change in reported caloric intake is between 274 and 86 calories lower.
The chance is less than 1% that there is no change in the population mean.

WISH: Change in Reported Caloric Intake
99% CI = -274 to -86
95% CI = -251 to -109
Note that the 99% CI is longer than the 95% CI.
This is clear(!): the more confidence you want, the longer the CI has to be.
Put another way, the less willing you are to be wrong, the more conservative your claims.

Effect of Sample Size
95% CI = -251 to -109 with $n = 271$.
If $n = 1000$, the 95% CI would be from -217 to -143.
Note how the CI gets shorter as the sample size gets larger.
This is a general fact: the larger the sample size, the shorter the CI.

Effect of Population Standard Deviation
95% CI = -251 to -109 with $\sigma = 600$.
If $\sigma = 2000$, the 95% CI would be from -418 to +58.
Note how the CI gets longer as the population standard deviation $\sigma$ gets larger.
This is a general fact: the larger the population standard deviation $\sigma$, the longer the CI.

Using SPSS to Construct CI
SPSS actually assumes that the population standard deviation is unknown: we will consider this case later.
Its default is a 95% CI.
You can easily change to any level of confidence.
SPSS demo using WISH data.

Sample Size Determination
In general, this is a relatively complex issue, depending very heavily on the experiment.
I will show you a simple calculation in the special case that the population standard deviation $\sigma$ is known.
Of course, $\sigma$ is not known in practice, and more complex methods are required, but this will give you a feel for the process.

Sample Size Determination
The usual answer to "what sample size should I take" is "what can you afford".
Remember: more precision with larger sample sizes, less precision with smaller sample sizes.

Sample Size Determination
Interval: sample mean $\bar{X} \pm$ confidence factor.
The length of a confidence interval is 2 x confidence factor.
Thus, our 95% CI for WISH was -251 to -109, so that the length was 142 calories.
What if I wanted the length to be 100 calories? Then the confidence factor would have to be 50.

Sample Size Determination
The confidence factor is $z_{\alpha/2}\,\sigma/\sqrt{n}$.
The length of the CI is $2\,z_{\alpha/2}\,\sigma/\sqrt{n}$.
If I want the length of a confidence interval to be $2 \times E$, then I have to set $2E = 2\,z_{\alpha/2}\,\sigma/\sqrt{n}$.
Now I do some algebra.

Sample Size Determination
If I want the length of a confidence interval to be $2 \times E$, then the sample size I need is
$n = z_{\alpha/2}^2 \left(\sigma/E\right)^2$.

Sample Size Determination
Consider WISH, where $\sigma = 600$.
Suppose I want the length of the 95% CI to be $2 \times E = 100$.
$E = 50$, $z_{\alpha/2} = 1.96$
$n = z_{\alpha/2}^2 \left(\sigma/E\right)^2 = 1.96^2 \left(600/50\right)^2 \approx 553$

Sample Size Determination
Consider WISH, where $\sigma = 600$.
Suppose I want the length of the 95% CI to be $2 \times E = 60$.
$E = 30$, $z_{\alpha/2} = 1.96$
$n = z_{\alpha/2}^2 \left(\sigma/E\right)^2 = 1.96^2 \left(600/30\right)^2 \approx 1{,}537$

Sample Size Determination
95% confidence:
Length = 100, $E = 50$, $n = 553$
Length = 60, $E = 30$, $n = 1{,}537$
General fact: the more precise you want to be (shorter CI), the larger the sample size you will need.
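The sample size calculation above, $n = z_{\alpha/2}^2 (\sigma/E)^2$, can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's own tooling (the course uses Table 1 and SPSS): the function names `z_critical` and `required_sample_size` are hypothetical, and $z_{\alpha/2}$ is taken from the standard library's `statistics.NormalDist` instead of the table, so it is 1.95996... where the slides use 1.96.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """z_{alpha/2}: the Normal(0, 1) value with probability 1 - alpha/2 below it."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def required_sample_size(sigma, half_length, confidence=0.95):
    """Unrounded n = (z_{alpha/2} * sigma / E)^2 for a CI of length 2 * E."""
    return (z_critical(confidence) * sigma / half_length) ** 2


# WISH example from the slides: sigma = 600, desired CI lengths 100 and 60.
for length in (100, 60):
    n_exact = required_sample_size(sigma=600, half_length=length / 2)
    print(f"length {length}: n = {n_exact:.1f}, rounded up: {math.ceil(n_exact)}")
```

The slide values 553 and 1,537 are the unrounded results rounded to the nearest integer. Rounding up with `math.ceil` is the more cautious choice, since only a sample size at or above the unrounded value gives a length of at most $2 \times E$.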
Sample Size Determination
General fact: the larger the population standard deviation, the larger the sample size you will need to have a CI of length $2 \times E$.

Reactive Oxygen Species (ROS) Data
Rats fed with fish oil enhanced diets.
Response is the change in ROS for an animal when the cells are exposed to butyrate.

ROS Data
[Figure: box plot of the change in response, N = 20.]

ROS Data
Sample mean = 3.21
Sample size is $n = 20$
Pretend $\sigma = 3.33$
$\sigma/\sqrt{n} = 0.74$
Then the 95% interval for the population mean change is [3.21 - 0.74 * 1.96, 3.21 + 0.74 * 1.96] = [1.76, 4.66].
Does butyrate increase ROS? How certain are we?

ROS Data
$\sigma = 3.33$, $n = 20$
The 95% interval for the population mean change is [1.76, 4.66].
The length of the CI is $2 \times E = 2.90$.
What sample size would I need to make the length of the CI = 1.00?
Here $2 \times E = 1.00$, $E = 0.50$, and
$n = z_{\alpha/2}^2 \left(\sigma/E\right)^2 = 1.96^2 \left(3.33/0.50\right)^2 \approx 170$
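The ROS interval and the follow-up sample size question can be checked numerically. The Python sketch below is an illustration only: `z_interval` is a hypothetical helper name, the known-$\sigma$ formula $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ is the one from the slides, and $z_{\alpha/2}$ comes from `statistics.NormalDist` instead of Table 1. Because the slide rounds $\sigma/\sqrt{n}$ to 0.74 before multiplying by 1.96, the endpoints computed here can differ from [1.76, 4.66] in the second decimal place.

```python
import math
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Known-sigma CI: sample_mean -/+ z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    factor = z * sigma / math.sqrt(n)  # the "confidence factor" E
    return sample_mean - factor, sample_mean + factor


# ROS example from the slides: sample mean 3.21, sigma = 3.33 (pretended), n = 20.
low, high = z_interval(sample_mean=3.21, sigma=3.33, n=20)
print(f"95% CI with n = 20: [{low:.2f}, {high:.2f}], length {high - low:.2f}")

# Interval length at the sample size the slide arrives at, and at one more.
for n in (170, 171):
    lo_n, hi_n = z_interval(sample_mean=3.21, sigma=3.33, n=n)
    print(f"n = {n}: length {hi_n - lo_n:.4f}")
```

The unrounded sample size is about 170.4, so the slide's 170 is that value rounded to the nearest integer, and $n = 171$ is the smallest sample size for which the length is at most 1.00.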