Confidence Intervals
May 2, 2011

What is a confidence interval?
- Being 95% certain that the average height of a male student at SUNY Cortland, in inches, is 71 ± 3; that is, $\mu \in (68, 74)$ with confidence level .95.
- This is the central interval of $\bar{X}$ values that has probability .95.

Let $\bar{X}$ be the random variable for the sample mean with a sample size of $n$. Then, if $\bar{x}$ is a sample mean, we can say
$P(\bar{x} - \mathrm{EBM} < \bar{X} < \bar{x} + \mathrm{EBM}) = 1 - \alpha$
where
- $\alpha$ is the error probability
- $1 - \alpha$ is the confidence level CL
- EBM is the error bound for the mean

It is easy to sketch a general graph of this situation.
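One common reading of the confidence level is as a long-run success rate: if many samples are drawn and an interval is built from each, about a fraction $1 - \alpha$ of those intervals should contain $\mu$. The sketch below simulates that reading under assumptions that are not in the slides: a normal population with $\mu = 71$, a known $\sigma = 12$, and samples of size $n = 64$ (hypothetical values chosen so the error bound comes out near 3). Because $\sigma$ is treated as known here, the sketch uses a z value; the function name `coverage` is illustrative only.

```python
import random
from statistics import NormalDist, mean

def coverage(mu=71.0, sigma=12.0, n=64, cl=0.95, trials=4000, seed=1):
    """Fraction of simulated intervals x_bar +/- EBM that contain mu."""
    rng = random.Random(seed)                    # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)   # z value with alpha/2 in the right tail
    ebm = z * sigma / n ** 0.5                   # error bound; sigma assumed known
    hits = 0
    for _ in range(trials):
        x_bar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - ebm < mu < x_bar + ebm:
            hits += 1
    return hits / trials

rate = coverage()
print(f"fraction of intervals containing mu: {rate:.3f}")
```

With these assumed values the printed fraction should land close to .95, and rerunning with a different `cl` should move it accordingly.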
Constructing a Confidence Interval
- The tolerated error $\alpha$, and thus the confidence level $CL = 1 - \alpha$, is determined before obtaining a sample. But how do we find the error bound for the mean, the EBM?
- The CLT tells us that the standardized scores should approximately follow the standard normal distribution.
- So we can make the approximate probability statement
  $P(\bar{x} - \mathrm{EBM} < \bar{X} < \bar{x} + \mathrm{EBM}) \approx P\left(Z < \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}\right)$, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
- But we don't know $\mu$ or $\sigma$, so a better approximate distribution for our standardized scores is what is called the Student-t distribution.
The Student-t Distribution
- The Student-t distribution looks similar to the normal distribution, but has a bit more probability in the tails.
- The random variable for the distribution is $T$, and a value of the random variable is called a t-score,
  $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$,
  which is similar to a z-score and measures the number of standard deviations (of the sampling distribution) a sample mean $\bar{x}$ is from the theoretical mean $\mu$.
- We do not need to assume that we know $\sigma$, but we do assume that the original population is approximately normal.
- There is one parameter for the Student-t distribution, called the degrees of freedom: $df = n - 1$, where $n$ is the sample size.

Degrees of Freedom
- There is a different Student-t distribution for each possible value of $df$. The notation for the Student-t distribution is $T \sim t_{df}$.
- As the degrees of freedom diverge to $\infty$, the Student-t distribution approaches the standard normal distribution. Loosely speaking, with $df = n - 1$, we have $\lim_{n \to \infty} t_{df} = N(0, 1)$.

A Bit of History About the Student-t
In the golden-olden days of yore, we used tables to calculate probabilities with the Student-t, and we needed a different table for each value of $df$. Thanks to the proliferation of calculators and computers, we can use any Student-t we want. Still, the table perspective helps us to understand the process. We can sketch a generic t distribution and the corresponding probabilities that will allow us to find the EBM.

Confidence Intervals With a Table of Student-t Values
Suppose we are given some data and we want to construct a confidence interval for the population mean $\mu$.
1. Calculate $\bar{x}$ and $s$.
2. Look up $t_{\alpha/2}$ in a table with the appropriate $df$, where $t_{\alpha/2}$ is the t value with probability $\alpha/2$ in the tail of $t_{df}$ to its right.
3. Calculate the error bound $\mathrm{EBM} = t_{\alpha/2} \frac{s}{\sqrt{n}}$.
4. The confidence interval is $(\bar{x} - \mathrm{EBM}, \bar{x} + \mathrm{EBM})$.
Qualitative Analysis
Consider the equation
$\mathrm{EBM} = t_{\alpha/2} \frac{s}{\sqrt{n}}$.
From this it follows that
- Larger $s$ means wider interval for the same CL (confidence level).
- Larger sample size $n$ means narrower interval for the same CL (confidence level).
- Higher confidence means smaller $\alpha$, which means a wider interval.

Calculating a Confidence Interval Given Statistics
From a stack of IEEE Spectrum magazines, announcements for 84 upcoming engineering conferences were randomly picked. The average length of the conferences was 3.94 days, with a standard deviation of 1.28 days. Assume the underlying population is normal.
- Define the random variables $X$ and $\bar{X}$, in words.
  Solution: $X$ = the length of a randomly chosen conference. $\bar{X}$ = the average length of stay for a random sample of 84 conferences.
- Which distribution should you use to study the length of a randomly chosen conference? Explain your choice.
  Solution: We are told to assume that the underlying distribution, i.e., the distribution of $X$, is normal. Our best estimates for the mean and standard deviation of $X$ are $\bar{x} = 3.94$ and $s = 1.28$, so we assume $X$ should be approximately distributed as $N(3.94, 1.28)$.
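Plugging the conference statistics from this example ($n = 84$, $s = 1.28$) into $\mathrm{EBM} = t_{\alpha/2}\, s/\sqrt{n}$ gives a quick numerical check of the three qualitative claims. The sketch below is one way to do it with only the Python standard library. The helper names (`t_pdf`, `t_cdf`, `t_upper`, `ebm`) are hypothetical, and the t value is found by numerical integration and bisection, which is an assumption of this sketch and not how a calculator or table does it.

```python
import math

def t_pdf(t, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=1000):
    """P(T <= t), using Simpson's rule on [0, |t|] and symmetry about 0."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_upper(tail, df):
    """t value with probability `tail` to its right (assumes the value is below 100)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > tail:
            lo = mid          # right tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2

def ebm(s, n, cl):
    """Error bound for the mean: t_{alpha/2} * s / sqrt(n), with df = n - 1."""
    alpha = 1 - cl
    return t_upper(alpha / 2, n - 1) * s / math.sqrt(n)

base = ebm(1.28, 84, 0.95)
print(f"baseline (s=1.28, n=84, CL=.95): EBM = {base:.3f}")
print(f"larger s (s=2.56):               EBM = {ebm(2.56, 84, 0.95):.3f}")
print(f"larger n (n=336):                EBM = {ebm(1.28, 336, 0.95):.3f}")
print(f"higher CL (CL=.99):              EBM = {ebm(1.28, 84, 0.99):.3f}")
```

Doubling $s$ should double the error bound, quadrupling $n$ should roughly halve it, and raising the confidence level should widen it.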
Calculating a Confidence Interval Given Statistics
- Which distribution should you use to study the average length of another sample of randomly chosen conferences? Explain your choice.
  Solution: By the CLT, the distribution of the sample average $\bar{X}$ is approximately normal with mean $\mu_{\bar{X}}$ equal to the population mean $\mu_X$ and standard deviation $\sigma_{\bar{X}}$ equal to $\sigma_X/\sqrt{n}$. In our case this means that $\bar{X}$ is approximately distributed as $N(3.94, .140)$, since $1.28/\sqrt{84} \approx .140$. However, since $\mu_X$ and $\sigma_X$ are unknown, we are better off assuming that the t-scores follow the $t_{83}$ distribution.
- Construct a 95% confidence interval for the population average length of engineering conferences.
  Solution: We will use a TI-83 or TI-84 to construct a T interval and complete the following.
  - State the confidence interval.
  - Sketch the graph.
  - Calculate the error bound.
TI-83+ or TI-84 Confidence Interval: From Statistics
- Use the function 8:TInterval in STAT TESTS.
- Once you are in TESTS, press 8:TInterval and arrow to Stats. Press ENTER.
- Arrow down and enter the sample mean. Press ENTER.
- Arrow down and enter the sample standard deviation. Press ENTER.
- Arrow down and enter the sample size. Press ENTER.
- Enter the C-level.
- Arrow down to Calculate and press ENTER.

Using the Student-t Distribution
In this example
- $n = 84$ is the sample size.
- $\bar{x} = 3.94$ is the sample mean.
- $s = 1.28$ is the sample standard deviation.
- $\alpha = .05$ is the probability that the confidence interval does not contain the true mean (or proportion). OR $\alpha$ is the probability that the random variable $\bar{X}$ will take on a value outside the interval.
- $CL = 1 - .05 = .95$ is the Confidence Level, i.e., the probability that the true value lies inside the interval.
- $(3.66, 4.22) = (3.94 - .28, 3.94 + .28) = (3.94 - \mathrm{EBM}, 3.94 + \mathrm{EBM})$ is the Confidence Interval.
- $\mathrm{EBM} = t_{\alpha/2} \cdot s/\sqrt{n} = .28$ is the Error Bound for the Mean.
Calculating a Confidence Interval Given Data
e.g. Suppose the data 3.1; 3.3; 3.2; 3.4; 3.6; 3.3 were drawn at random from some population. Find a 90% confidence interval for the population mean $\mu$.

TI-83+ or TI-84 Confidence Interval: From Data
- Use the function 8:TInterval in STAT TESTS.
- Once you are in TESTS, press 8:TInterval and arrow to Data. Press ENTER.
- Arrow down and enter the list name where you put the data for List.
- Enter 1 for Freq. if the data is in a single list, or enter the list name of the frequencies associated to the data values in the first list.
- Enter the C-level.
- Arrow down to Calculate and press ENTER.

Solution: The 90% confidence interval for the population mean $\mu$ is $(3.175, 3.458)$, with $\bar{x} = 3.317$ and $\mathrm{EBM} = .142$.
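The two TInterval input modes used in the worked examples (Stats for the 84 conferences, Data for the six raw values) can be mirrored in a few lines of Python. This is a minimal standard-library sketch and not the calculator's algorithm: the function names are hypothetical, and the t value comes from numerical integration plus bisection, which is an assumption made here to avoid external packages.

```python
import math
from statistics import mean, stdev

def t_pdf(t, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_upper(tail, df, steps=1000):
    """t value with probability `tail` to its right (assumes the value is below 100)."""
    def right_tail(b):
        # Simpson's rule for the area on [0, b]; the right tail is 0.5 minus that area.
        h = b / steps
        total = t_pdf(0.0, df) + t_pdf(b, df)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * t_pdf(i * h, df)
        return 0.5 - total * h / 3
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if right_tail(mid) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval_stats(x_bar, s, n, cl):
    """(lower, upper, EBM) from summary statistics, like the Stats input."""
    ebm = t_upper((1 - cl) / 2, n - 1) * s / math.sqrt(n)
    return x_bar - ebm, x_bar + ebm, ebm

def t_interval_data(data, cl):
    """Same interval from raw data, like the Data input with Freq = 1."""
    return t_interval_stats(mean(data), stdev(data), len(data), cl)

conferences = t_interval_stats(3.94, 1.28, 84, 0.95)
sample = t_interval_data([3.1, 3.3, 3.2, 3.4, 3.6, 3.3], 0.90)
print("conferences: (%.2f, %.2f), EBM = %.2f" % conferences)
print("data:        (%.3f, %.3f), EBM = %.3f" % sample)
```

Under these assumptions the two printed intervals should agree, to the rounding shown, with the calculator results quoted in the slides.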
Modifications for the Sample Proportion
Let $P'$ be the random variable that measures a sample proportion $p'$. Then the Error Bound for the Proportion is
$\mathrm{EBP} = z_{\alpha/2} \sqrt{\frac{p'(1 - p')}{n}}$
e.g. If $n = 1000$ and $p' = .52$, then
$\mathrm{EBP} = 1.96 \sqrt{\frac{.52(1 - .52)}{1000}} = .03$
The "variance" $p'(1 - p')$ comes from the variance of the binomial distribution. The proportion random variable is the random variable that measures the proportion of successes in a sample.

Margin of Error In Pictures
Figure: Source: http://en.wikipedia.org/wiki/Margin_of_error

An Example Confidence Interval
Let $P'$ be the random variable that measures the percent of a sample that says they will vote for Barack Obama. The Gallup poll from October 27 – November 2 found $p' = .52$. To the best of my knowledge, this sample proportion was based on a sample of size $n = 1000$ and the margin of error is ± 3% at 95% confidence.
This means $P(49 < P' < 55) = .95$.
The nation voted and Obama earned 52% of the vote. Should we be surprised?

Learning the Terms by Example
In the last example
- $n = 1000$ is the sample size.
- $p' = .52$ is the sample proportion.
- $\mathrm{EBP} = 3$ (percentage points) is the Error Bound for the Proportion.
- $(52 - 3, 52 + 3) = (49, 55)$ is the Confidence Interval.
- $\alpha = .05$ is the probability that the confidence interval does not contain the true proportion. OR $\alpha$ is the probability that the random variable will take on a value outside the interval.
- $CL = 1 - .05 = .95$ is the Confidence Level, i.e., the probability that the true value lies inside the interval.

TI-83+ and TI-84 Proportion Confidence Interval
- Press STAT and arrow over to TESTS.
- Arrow down to A:1-PropZInt. Press ENTER.
- Arrow down and enter the number of successes for x. Press ENTER.
- Arrow down and enter the sample size for n. Press ENTER.
- Arrow down and enter the C-Level. Press ENTER.
- Arrow down to Calculate. Press ENTER.

From the previous example we get the interval $(.489, .551)$ if we assume $p' = .52$ means $x = 520$.
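The same proportion interval can be checked without a calculator. The sketch below applies the EBP formula from the slides with $x = 520$ successes out of $n = 1000$; the function name `prop_z_interval` is hypothetical, and the z value comes from Python's `statistics.NormalDist`, not from a table.

```python
from statistics import NormalDist

def prop_z_interval(x, n, cl=0.95):
    """Return (lower, upper, EBP) for a one-proportion z interval."""
    p_prime = x / n                                  # sample proportion p'
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)       # z value with alpha/2 in the right tail
    ebp = z * (p_prime * (1 - p_prime) / n) ** 0.5   # error bound for the proportion
    return p_prime - ebp, p_prime + ebp, ebp

lower, upper, ebp = prop_z_interval(520, 1000, 0.95)
print(f"({lower:.3f}, {upper:.3f}), EBP = {ebp:.3f}")
# prints: (0.489, 0.551), EBP = 0.031
```

The unrounded error bound is slightly above .03, which is why the interval is (.489, .551) and not exactly (.49, .55).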