Confidence Intervals
Dennis Sun
Data 301

Statistical Inference
[Diagram: Population / Box and Sample / Data, linked by arrows labeled "probability" and "statistics"]
The goal of statistics is to infer the unknown population from the sample.
We’ve already seen one mode of statistical inference: hypothesis testing.

Probability vs. Statistics
Probability: I have a fair coin. If I toss it 100 times, how many heads will I get?
[Diagram: a box containing 0 and 1; 100 draws with replacement; the outcome is unknown (???)]
Statistics: I have a coin. I do not know if it is fair or not. I toss it 100 times and get 60 heads. Is the coin fair or not?
[Diagram: a box with unknown contents (? ... ?); 100 draws with replacement give 0, 1, 1, ..., 0, of which 60 are 1s]

Confidence Intervals
Another mode of statistical inference is interval estimation.
Idea: See which box models are compatible with the given data.
Example: Let’s vary the proportion p of 1s in the box, and see which boxes are compatible with the observed data: 60 heads in 100 tosses.
[Slide sequence: as p varies, the boxes are labeled in order NOT COMPATIBLE, NOT COMPATIBLE?, COMPATIBLE, COMPATIBLE, COMPATIBLE, NOT COMPATIBLE?, NOT COMPATIBLE]
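The compatibility check can be sketched in Python. This is a minimal illustration, not the code behind the slides. It assumes that "compatible" means the one-sided P-value (the smaller of the two tail probabilities for 60 heads) is above 2.5%, and that a grid of candidate p values in steps of .01 is fine enough. The function names, the grid, and the number of simulations are all hypothetical choices.

```python
import numpy as np

def one_sided_p_value(p, observed_heads=60, n_tosses=100, n_sims=200_000, seed=0):
    """Estimate by simulation how surprising the observed count is
    when the box contains a proportion p of 1s.

    Assumption: the P-value is the smaller of the two tail probabilities,
    P(heads >= observed) and P(heads <= observed).
    """
    rng = np.random.default_rng(seed)
    # Each simulated value is the number of 1s in n_tosses draws with replacement.
    heads = rng.binomial(n_tosses, p, size=n_sims)
    upper_tail = np.mean(heads >= observed_heads)
    lower_tail = np.mean(heads <= observed_heads)
    return min(upper_tail, lower_tail)

def compatible_values(grid, threshold=0.025):
    """Keep every candidate p whose simulated P-value is above the threshold."""
    return [float(p) for p in grid if one_sided_p_value(p) > threshold]

# Hypothetical grid of box models: p = .40, .41, ..., .80
grid = np.round(np.linspace(0.40, 0.80, 41), 2)
compatible = compatible_values(grid)
print("compatible p values range from", min(compatible), "to", max(compatible))
```

On a grid this coarse the endpoints can only match an interval to the nearest .01. A finer grid, or a search for the exact p at which the P-value crosses 2.5%, would be needed to pin the endpoints down further.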
Confidence Interval Interpretation #1
For any p (the proportion of 1s in the box) between .497 and .697, the P-value for the observed data (60 heads) is above 2.5%.
In other words, all values of p in the interval (.497, .697) are “compatible” with observing 60 heads. We call this a 95% confidence interval for p.
In this class, we will only deal with 95% confidence intervals. But you can obtain intervals with other confidence levels by adjusting the minimum P-value you are willing to tolerate.

Confidence Intervals by Theory
Confidence intervals are easier to understand by theory than by simulation.
Remember, if the mean of the box was µ, we compared
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
to a Normal(0, 1) distribution. (Or $T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$ to $t_{n-1}$.)
If Z is too large or too small, then the P-value will be small. We need to find the cutoffs that make the P-value exactly 2.5%.
The cutoff is close to 2. (It’s actually a bit lower for Z and usually a bit higher for T.)
So if we observe $\bar{X}$, we need to find all values µ such that
$\left| \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \right| < 2.$
In other words, the interval contains all µ within $\bar{X} \pm 2 \frac{\sigma}{\sqrt{n}}$.

Theory-Based Interval for p
We observed 60 heads in 100 tosses. Let’s find a 95% confidence interval.
The mean of the sample is $\bar{X} = \frac{60}{100} = .60$.
We don’t know σ. Approximate it by the SD of the sample, S ≈ .49.
So a 95% confidence interval for p is:
$.60 \pm 2 \cdot \frac{.49}{\sqrt{100}} = (.502, .698).$
Compare this with the simulation-based interval we obtained earlier, (.497, .697).

Confidence Interval Interpretation #2
We just saw that the interval $\bar{X} \pm 2 \frac{\sigma}{\sqrt{n}}$ contains all values µ that are “compatible” with the observed data $\bar{X}$. (If we were to test that the mean of the box is any of these values, the P-value would be at least 2.5%.)
Another way to interpret this is to imagine the interval as random. If we were to collect another sample of the same size, we would obtain a new $\bar{X}$ and thus a new interval.
A 95% confidence interval means that about 95 of every 100 intervals will cover the true mean µ.
But of course, we typically only get to observe one interval.
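Both the theory-based calculation and the "random interval" interpretation can be checked with a short Python sketch. It is illustrative only: the helper names are hypothetical, the cutoff is taken to be exactly 2 as in the slides, and the coverage simulation has to assume a known true proportion (here `true_p = 0.6`, chosen arbitrarily), which we would not know in practice.

```python
import numpy as np

def theory_interval(sample, cutoff=2.0):
    """Return (low, high) for: sample mean +/- cutoff * sample SD / sqrt(n)."""
    sample = np.asarray(sample, dtype=float)
    half_width = cutoff * sample.std() / np.sqrt(sample.size)
    return sample.mean() - half_width, sample.mean() + half_width

# The observed data: 60 ones (heads) and 40 zeros (tails).
observed = np.array([1] * 60 + [0] * 40)
low, high = theory_interval(observed)
print(f"observed interval: ({low:.3f}, {high:.3f})")  # (0.502, 0.698)

def coverage_rate(true_p=0.6, n_tosses=100, n_intervals=10_000, seed=0):
    """Draw many samples from a box with a known (assumed) proportion of 1s
    and report the fraction of intervals that cover that proportion."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_intervals):
        sample = rng.binomial(1, true_p, size=n_tosses)
        lo, hi = theory_interval(sample)
        hits += int(lo <= true_p <= hi)
    return hits / n_intervals

rate = coverage_rate()
print(f"fraction of intervals covering true_p: {rate:.3f}")
```

The printed fraction should be in the neighborhood of 95%, though not exactly, since the counts are discrete and the cutoff of 2 is itself an approximation.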