Confidence Intervals
May 2, 2011

What is a confidence interval?
* Being 95% certain that the average height of a male student at SUNY Cortland, in inches, is 71 ± 3: $\mu \in (68, 74)$ with confidence level .95.
* This is the central interval of $\bar{X}$ values that has probability .95.

Let $\bar{X}$ be the random variable for the sample mean with a sample size of $n$. Then, if $\bar{x}$ is a sample mean, we can say

$$P(\bar{x} - EBM < \bar{X} < \bar{x} + EBM) = 1 - \alpha$$

where
* $\alpha$ is the error probability,
* $1 - \alpha$ is the confidence level CL,
* EBM is the error bound for the mean.

It is easy to sketch a general graph of this situation.
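One way to make the confidence level concrete is to simulate it. The sketch below is only an illustration under stated assumptions: it assumes a normal population with a known standard deviation (so it uses a z value rather than anything introduced later), and the values `mu = 71`, `sigma = 9`, and `n = 36` are hypothetical, chosen so that the error bound comes out near 3. It counts how often the interval $\bar{x} \pm EBM$ captures the true mean over many repeated samples.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=71.0, sigma=9.0, n=36, alpha=0.05, trials=4000, seed=1):
    """Fraction of simulated intervals x_bar +/- EBM that contain mu.

    mu, sigma and n are hypothetical values, not data from the lecture.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z value with alpha/2 in the right tail
    ebm = z * sigma / n ** 0.5               # error bound, sigma assumed known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - ebm < mu < x_bar + ebm:
            hits += 1
    return hits / trials

print(coverage())            # long-run coverage at alpha = .05
print(coverage(alpha=0.10))  # a larger alpha gives lower coverage
```

With $\alpha = .05$ the printed fraction should sit close to .95, which is the long-run reading of "95% confident".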
Constructing a Confidence Interval
* The tolerated error $\alpha$, and thus the confidence level $CL = 1 - \alpha$, is determined before obtaining a sample. But how do we find the error bound for the mean, the EBM?
* The CLT tells us that the standardized scores should approximately follow the standard normal distribution.
* So we can make the approximate probability statement
  $$P(\bar{x} - EBM < \bar{X} < \bar{x} + EBM) \approx P\left(Z < \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}\right)$$
  where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
* But we don't know $\mu$ or $\sigma$, so a better approximate distribution for our standardized scores is what is called the Student-t distribution.

The Student-t Distribution
* The Student-t distribution looks similar to the normal distribution, but has a bit more probability in the tails.
* The random variable for the distribution is $T$, and a value of the random variable is called a t-score
  $$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
  which is similar to a z-score, and measures the number of standard deviations (of the sampling distribution) a sample mean $\bar{x}$ is from the theoretical mean $\mu$.
* We do not need to assume that we know $\sigma$, but we do assume that the original population is approximately normal.
* There is one parameter for the Student-t distribution, called the degrees of freedom: $df = n - 1$, where $n$ is the sample size.

Degrees of Freedom
* There is a different Student-t distribution for each possible value of $df$. The notation for the Student-t distribution is $T \sim t_{df}$.
* As the degrees of freedom diverge to $\infty$, the Student-t distribution approaches the standard normal distribution. Loosely speaking, with $df = n - 1$, we have
  $$\lim_{n \to \infty} t_{df} = N(0, 1).$$

A Bit of History About the Student-t
In the golden-olden days of yore, we used tables to calculate probabilities with the Student-t, and we needed a different table for each value of $df$. Due to the proliferation of calculators and computers we can use any Student-t we want. Still, the table perspective helps us to understand the process. We can sketch a generic t distribution and the corresponding probabilities that will allow us to find the EBM.

Confidence Intervals With a Table of Student-t Values
Suppose we are given some data and we want to construct a confidence interval for the population mean $\mu$.
1. Calculate $\bar{x}$ and $s$.
2. Look up $t_{\alpha/2}$ in a table with the appropriate $df$, where $t_{\alpha/2}$ is the t value with probability $\alpha/2$ in the tail of $t_{df}$ to its right.
3. Calculate the error bound
   $$EBM = t_{\alpha/2} \frac{s}{\sqrt{n}}$$
4. The confidence interval is $(\bar{x} - EBM, \bar{x} + EBM)$.

Qualitative Analysis
Consider the equation
$$EBM = t_{\alpha/2} \frac{s}{\sqrt{n}}.$$
From this it follows that
* Larger $s$ means a wider interval for the same CL (confidence level).
* Larger sample size $n$ means a narrower interval for the same CL.
* Higher confidence means smaller $\alpha$, which means a wider interval.

Calculating a Confidence Interval Given Statistics
From a stack of IEEE Spectrum magazines, announcements for 84 upcoming engineering conferences were randomly picked. The average length of the conferences was 3.94 days, with a standard deviation of 1.28 days. Assume the underlying population is normal.
* Define the random variables $X$ and $\bar{X}$, in words.
  Solution: $X$ = the length of a randomly chosen conference. $\bar{X}$ = the average length of a random sample of 84 conferences.
* Which distribution should you use to study the length of a randomly chosen conference? Explain your choice.
  Solution: We are told to assume that the underlying distribution, i.e., the distribution of $X$, is normal. Our best estimates for the mean and standard deviation of $X$ are $\bar{x} = 3.94$ and $s = 1.28$, so we assume $X$ is approximately distributed as $N(3.94, 1.28)$.
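The table procedure and the qualitative claims above can be checked numerically. The sketch below stands in for the table lookup by integrating the Student-t density and bisecting for $t_{\alpha/2}$; the helper names (`t_pdf`, `t_cdf`, `t_crit`, `t_interval`) are hypothetical, and the numerical method is an assumption of this example, not how a calculator necessarily does it. It then applies steps 2 to 4 to the conference statistics ($n = 84$, $\bar{x} = 3.94$, $s = 1.28$).

```python
import math

def t_pdf(t, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0: 0.5 plus Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_crit(alpha, df):
    """t value with probability alpha/2 in the right tail, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha / 2:
            lo = mid   # tail still too heavy, move right
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(x_bar, s, n, cl):
    """Steps 2-4 of the table procedure; x_bar and s come from step 1."""
    ebm = t_crit(1 - cl, n - 1) * s / math.sqrt(n)
    return x_bar - ebm, x_bar + ebm, ebm

lo, hi, ebm = t_interval(3.94, 1.28, 84, 0.95)
print(f"95% interval: ({lo:.2f}, {hi:.2f}), EBM = {ebm:.2f}")

# Qualitative analysis: vary one quantity at a time and compare the EBM.
print("larger s:   ", t_interval(3.94, 2.56, 84, 0.95)[2] > ebm)
print("larger n:   ", t_interval(3.94, 1.28, 400, 0.95)[2] < ebm)
print("higher CL:  ", t_interval(3.94, 1.28, 84, 0.99)[2] > ebm)
```

The three comparison lines should each print `True`, matching the three bullets of the qualitative analysis.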
Calculating a Confidence Interval Given Statistics (continued)
* Which distribution should you use to study the average length of another sample of randomly chosen conferences? Explain your choice.
  Solution: By the CLT, the distribution of the sample average $\bar{X}$ is approximately normal with mean $\mu_{\bar{X}}$ equal to the population mean $\mu_X$ and standard deviation $\sigma_{\bar{X}}$ equal to $\sigma_X/\sqrt{n}$. In our case this means that $\bar{X}$ is approximately distributed as $N(3.94, .140)$, since $1.28/\sqrt{84} \approx .140$. However, since $\mu_X$ and $\sigma_X$ are unknown, we are better off assuming that the t-scores follow the $t_{83}$ distribution.
* Construct a 95% confidence interval for the population average length of engineering conferences.
  Solution: We will use a TI-83 or TI-84 to construct a T interval and complete the following.
  * State the confidence interval.
  * Sketch the graph.
  * Calculate the error bound.

TI-83+ or TI-84 Confidence Interval: From Statistics
* Use the function 8:TInterval in STAT TESTS.
* Once you are in TESTS, press 8:TInterval and arrow to Stats. Press ENTER.
* Arrow down and enter the sample mean. Press ENTER.
* Arrow down and enter the sample standard deviation. Press ENTER.
* Arrow down and enter the sample size. Press ENTER.
* Enter the C-level.
* Arrow down to Calculate and press ENTER.

Using the Student-t Distribution
In this example
* $n = 84$ is the sample size.
* $\bar{x} = 3.94$ is the sample mean.
* $s = 1.28$ is the sample standard deviation.
* $\alpha = .05$ is the probability that the confidence interval does not contain the true mean. Or: $\alpha$ is the probability that the random variable $\bar{X}$ will take on a value outside the interval.
* $CL = 1 - .05 = .95$ is the confidence level, i.e., the probability that the true value lies inside the interval.
* $(3.66, 4.22) = (3.94 - .28, 3.94 + .28) = (3.94 - EBM, 3.94 + EBM)$ is the confidence interval.
* $EBM = t_{\alpha/2} \cdot s/\sqrt{n} = .28$ is the error bound for the mean.

Calculating a Confidence Interval Given Data
e.g. Suppose the data 3.1; 3.3; 3.2; 3.4; 3.6; 3.3 were drawn at random from some population. Find a 90% confidence interval for the population mean $\mu$.

TI-83+ or TI-84 Confidence Interval: From Data
* Use the function 8:TInterval in STAT TESTS.
* Once you are in TESTS, press 8:TInterval and arrow to Data. Press ENTER.
* Arrow down and enter the name of the list where you put the data for List.
* Enter 1 for Freq if the data is in a single list, or enter the name of the list holding the frequencies associated with the data values in the first list.
* Enter the C-level.
* Arrow down to Calculate and press ENTER.

Calculating a Confidence Interval Given Data (continued)
Solution: The 90% confidence interval for the population mean $\mu$ is $(3.175, 3.458)$, with $\bar{x} = 3.317$ and $EBM = .142$.

Modifications for the Sample Proportion
Let $P'$ be the random variable that measures a sample proportion $p'$.
Then the Error Bound for the Proportion is
$$EBP = z_{\alpha/2} \sqrt{\frac{p'(1 - p')}{n}}$$
e.g. If $n = 1000$ and $p' = .52$, then
$$EBP = 1.96 \sqrt{\frac{.52(1 - .52)}{1000}} \approx .03$$
The "variance" $p'(1 - p')$ comes from the variance of the binomial distribution. The proportion random variable $P'$ measures the proportion of successes in a sample.

Margin of Error In Pictures
Figure: Source: http://en.wikipedia.org/wiki/Margin_of_error

An Example Confidence Interval
Let $P'$ be the random variable that measures the percent of a sample that says they will vote for Barack Obama. The Gallup poll from October 27 – November 2 found $p' = .52$. To the best of my knowledge, this sample proportion was based on a sample of size $n = 1000$, and the margin of error is ± 3% at 95% confidence.
This means
$$P(49\% < P' < 55\%) = .95$$
The nation voted and Obama earned 52% of the vote. Should we be surprised?

Learning the Terms by Example
In the last example
* $n = 1000$ is the sample size.
* $p' = .52$ is the sample proportion.
* $EBP = 3\%$ is the Error Bound for the Proportion.
* $(52 - 3, 52 + 3) = (49, 55)$, in percent, is the confidence interval.
* $\alpha = .05$ is the probability that the confidence interval does not contain the true proportion. Or: $\alpha$ is the probability that the random variable will take on a value outside the interval.
* $CL = 1 - .05 = .95$ is the confidence level, i.e., the probability that the true value lies inside the interval.

TI-83+ and TI-84 Proportion Confidence Interval
* Press STAT and arrow over to TESTS.
* Arrow down to A:1-PropZInt. Press ENTER.
* Arrow down and enter the number of successes for x. Press ENTER.
* Arrow down and enter the sample size for n. Press ENTER.
* Arrow down and enter the C-Level. Press ENTER.
* Arrow down to Calculate. Press ENTER.

From the previous example we get the interval $(.489, .551)$ if we assume $p' = .52$ means $x = 520$.
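The same interval can be reproduced without a calculator. The following is a minimal sketch of the error bound for the proportion, assuming the normal approximation used in the EBP formula; the function name `proportion_interval` is hypothetical, and `NormalDist().inv_cdf` from the Python standard library supplies $z_{\alpha/2}$ in place of a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, cl):
    """Interval p' +/- EBP for a sample proportion, via the normal approximation."""
    p_prime = successes / n
    alpha = 1 - cl
    z = NormalDist().inv_cdf(1 - alpha / 2)        # z value with alpha/2 in the right tail
    ebp = z * sqrt(p_prime * (1 - p_prime) / n)    # error bound for the proportion
    return p_prime - ebp, p_prime + ebp, ebp

# Poll example: x = 520 successes out of n = 1000 at 95% confidence.
lo, hi, ebp = proportion_interval(520, 1000, 0.95)
print(f"({lo:.3f}, {hi:.3f}), EBP = {ebp:.3f}")
```

Entering the count of successes rather than the rounded proportion mirrors the calculator inputs, which is why the example assumes $x = 520$.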