Significance, Meaning and Confidence Intervals
Paul Cohen
ISTA 370
April, 2012

Significance vs. Meaning: Significance isn’t Importance

You can usually get a significant result with a big sample.
Saying a result is “statistically significant” only matters if it also is important or meaningful or interesting.
p values measure significance; what measures importance or meaning?

Significance vs. Meaning: Importance and Effect Size

Only you can decide whether a result is important or meaningful. Effect size can help.
Recall that our test statistic almost always has the form:

$$\frac{\text{SampleStatistic} - \text{PopulationParameterUnder}H_0}{\text{SampleStandardDeviation}/\sqrt{N}}$$

Effect size is just

$$\frac{\text{SampleStatistic} - \text{PopulationParameterUnder}H_0}{\text{SampleStandardDeviation}}$$

Effect size is the “effect” expressed in standard deviation units, so that effects across experiments are comparable.

Significance vs. Meaning: Significance Tells You What a Parameter is Not

Significance says H0 is probably false.
Significance tells you that a sample comes from a population that does not have the H0 parameter value.
Significance tells you what the parameter probably isn’t; what tells you what it probably is?

Confidence Intervals: Confidence Intervals

Wouldn’t it be nice to say, “I drew a sample of size N, and the statistic value for that sample is f, so I can infer that in the population the corresponding parameter, φ, is bounded by an interval −g(f) ≤ φ ≤ g(f) with high probability.”
The expression −g(f) ≤ φ ≤ g(f) is a confidence interval.
Confidence intervals put probabilities on estimates of population parameters, given sample statistics.
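
To make the distinction between significance, effect size, and a confidence interval concrete, here is a minimal sketch in R. The data are simulated and every number in it (the H0 value of 100, the true mean of 100.2, the sample size) is an assumption chosen for illustration, not something taken from the course data.

```r
# Hypothetical illustration: a tiny true effect measured with a very big sample.
set.seed(1)                              # arbitrary seed so the run is repeatable
mu0 <- 100                               # population parameter under H0 (assumed)
N   <- 100000                            # a big sample
x   <- rnorm(N, mean = 100.2, sd = 10)   # simulated data; true effect is 0.02 sd

# Test statistic: (sample statistic - H0 parameter) / (sample sd / sqrt(N))
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(N))
p_val  <- 2 * pt(-abs(t_stat), df = N - 1)   # two-sided p value

# Effect size: the same numerator expressed in standard deviation units
effect_size <- (mean(x) - mu0) / sd(x)

# 95% confidence interval for the mean, as computed by t.test
ci <- as.numeric(t.test(x, mu = mu0)$conf.int)

cat("t =", t_stat, " p =", p_val, "\n")
cat("effect size =", effect_size, "\n")
cat("95% CI for the mean:", ci[1], ci[2], "\n")
```

With a sample this large the p value should come out very small even though the effect size is only a few hundredths of a standard deviation; the confidence interval then says where the mean probably is, not just where it isn't.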
[Figure: a sample statistic with 70%, 80%, and 95% confidence intervals that may contain the population parameter]

Confidence Intervals: Examples of Confidence Intervals

The average midterm grade in ISTA 370 was 15.54 with a standard deviation of 3.709. The 95% confidence interval around this mean grade is [14.25, 16.83].

The mean difference between ISTA100 scores in 2010 and 2011 was 9.4 points (2011 minus 2010). The 95% confidence interval around this difference was [−0.58, 19.36]. With 95% confidence the true difference between the classes is no more than about 19 points, and because the interval contains zero, a true difference of zero remains plausible.

The slope of the line relating body mass index of Miss America to year is −0.02: each contestant (on average) has 98% of the BMI of her predecessor. The 95% confidence interval around this slope is [−0.036, −0.015]. We can be confident that BMI is decreasing, and we have some uncertainty about the rate.

Confidence Intervals: Confidence Intervals and “Accepting” H0

Two samples each have N = 100, with means 99.79 and 100.07 and standard deviations 5.55 and 4.879, respectively.
The 95% confidence interval around the difference is [−1.18, 1.73].
This interval is narrow and contains zero, so with high confidence the true difference between the population means is “nearly zero.”
This is as close to “accepting” H0 as we ever get.
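
The two-sample interval just quoted can be approximately rebuilt from the summary statistics alone. The sketch below assumes the interval is a Welch (unequal variance) interval for the second mean minus the first; the slides do not say which method produced it, so treat that choice as an assumption.

```r
# Summary statistics quoted in the text
n1 <- 100; m1 <- 99.79;  s1 <- 5.55
n2 <- 100; m2 <- 100.07; s2 <- 4.879

d  <- m2 - m1                           # difference between the sample means
se <- sqrt(s1^2 / n1 + s2^2 / n2)       # standard error of the difference

# Welch-Satterthwaite approximation to the degrees of freedom (assumed method)
df <- se^4 / ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))

# 95% confidence interval: difference plus the 2.5% and 97.5% t quantiles times se
ci <- d + qt(c(0.025, 0.975), df) * se
print(round(ci, 2))
```

The result should land within rounding of the interval reported above. The interval straddles zero and is narrow relative to the standard deviations of about 5, which is what justifies calling the difference "nearly zero."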
Confidence Intervals: How to Get Confidence Intervals

> t.test(Scores2010,Scores2011)

        Welch Two Sample t-test

data:  Scores2010 and Scores2011
t = -1.8898, df = 51.806, p-value = 0.06438
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -19.362186   0.581351
sample estimates:
mean of x mean of y
 71.36765  80.75806

Better answer: Understand what a CI is, then ask R or run Monte Carlo or Bootstrap.

Confidence Intervals: How to Get Confidence Intervals

You have a statistic f and you want to infer the corresponding parameter φ:
Get the sampling distribution of f.
The confidence interval around φ is bounded by particular quantiles of the sampling distribution.
You just have to know which quantiles and how to use them.

Confidence Intervals: How to Get Confidence Intervals

> MT370<-c(18,15.5,16.5,19.5,17,14.5,12.5,6.5,17,22,11.5,15,1...
> Mean370<-mean(MT370)
> sd370<-sd(MT370)
> df370<-length(MT370)-1

The confidence interval is the 0.025 and 0.975 quantiles (dotted lines). But why?

[Figure: Sampling Distribution of Mean Midterm Score, with dotted lines at the 0.025 and 0.975 quantiles]

Confidence Intervals: How to Get Confidence Intervals: Intuition

If the true mean were the upper CI bound, then we’d see a sample mean this low or lower 2.5% of the time.
If the true mean were the lower CI bound, then we’d see a sample mean this high or higher 2.5% of the time.
If the true mean were between the upper and lower CI bounds, then we’d see a sample mean at least this far from the true mean at least 5% of the time.
So with 95% confidence, the CI around the sample mean “captures” the true mean.
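
The "captures the true mean" intuition can be checked by simulation. The following is a minimal sketch, assuming a normal population whose mean and standard deviation are invented for the purpose (loosely modeled on the midterm numbers); because the true mean is known here, we can count how often the 95% interval contains it.

```r
set.seed(370)                          # arbitrary seed so the run is repeatable
true_mean <- 15.5                      # hypothetical population mean
true_sd   <- 3.7                       # hypothetical population sd
n         <- 34                        # sample size, as in the midterm example
reps      <- 2000                      # number of simulated samples

captured <- replicate(reps, {
  x  <- rnorm(n, true_mean, true_sd)   # draw one sample from the known population
  se <- sd(x) / sqrt(n)                # estimated standard error of the mean
  lo <- mean(x) + qt(0.025, n - 1) * se
  hi <- mean(x) + qt(0.975, n - 1) * se
  lo <= true_mean && true_mean <= hi   # did this interval capture the true mean?
})

coverage <- mean(captured)
cat("Fraction of intervals capturing the true mean:", coverage, "\n")
```

The fraction should come out close to 0.95. Note that the statement is about the procedure: about 95% of intervals built this way contain the true mean, while any single interval either does or does not.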
[Figure: Sampling Distribution of Mean Midterm Score]

Confidence Intervals: How to Get Confidence Intervals: Math

For an α/2 critical value k:

$$P(\bar{x} \ge \mu + k) \le \alpha/2$$

Rearrange terms:

$$P(\mu \le \bar{x} - k) \le \alpha/2$$

Similarly:

$$P(\bar{x} \le \mu - k) = P(\mu \ge \bar{x} + k) \le \alpha/2$$

Combining these:

$$P(\mu \le \bar{x} - k \ \text{or}\ \mu \ge \bar{x} + k) \le \alpha$$

$$P(\bar{x} - k \le \mu \le \bar{x} + k) \ge 1 - \alpha$$

So if x̄ − k and x̄ + k each have a p value of less than α/2 = 0.025, then x̄ ± k is the 1 − α (here 95%) confidence interval.

Confidence Intervals: How to Get Confidence Intervals - By hand

> MT370<-c(18,15.5,16.5,19.5,17,14.5,12.5,6.5,17,22,11.5,15,1...
> sd370<-sd(MT370) ; N370<-length(MT370) ; Mean370<- mean(MT370)
> # Standard error of the sampling distribution:
> se370<- sd370/sqrt(N370)
> # Critical values of t dist with N370-1 df
> lc<-qt(.025,N370-1) ; uc<-qt(.975,N370-1)
> # Confidence interval:
> Mean370 + (lc * se370)
[1] 14.24968
> Mean370 + (uc * se370)
[1] 16.83855

[Figure: Sampling Distribution of Mean Midterm Score]

Confidence Intervals: How to Get Confidence Intervals - By hand

For the mean, the confidence interval is read from a t distribution:

$$\bar{x} + t_{crit,0.025}\,\text{s.e.} \le \mu \le \bar{x} + t_{crit,0.975}\,\text{s.e.}$$

So for x̄ = 15.54, t_crit,0.025 = −2.034, t_crit,0.975 = 2.034, and s.e. = 3.709/√34 = 0.636:

$$15.54 + (-2.034 \times 0.636) \le \mu \le 15.54 + (2.034 \times 0.636)$$

$$14.249 \le \mu \le 16.838$$

Confidence Intervals: How to Get Confidence Intervals - Ask R

> t.test(MT370)

        One Sample t-test

data:  MT370
t = 24.4313, df = 33, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 14.24968 16.83855
sample estimates:
mean of x
 15.54412

Confidence Intervals: How to Get Confidence Intervals - Quantiles

Note that

$$\bar{x} + t_{crit,0.025}\,\text{s.e.} \le \mu \le \bar{x} + t_{crit,0.975}\,\text{s.e.}$$

is just another way of asking for the 2.5% and 97.5% quantiles of the t distribution.
If we got the sampling distribution by bootstrapping, then we’d just read off these quantiles as the confidence interval.
Why in general wouldn’t we get the sampling distribution by Monte Carlo? What is a confidence interval telling you about? What do you need to get the sampling distribution by Monte Carlo?

Confidence Intervals: How to Get Confidence Intervals - Bootstrap

The bootstrap is used frequently to estimate the standard error of the sampling distribution, in which case the confidence interval is obtained by:

$$\bar{x} + t_{crit,0.025}\,\text{s.e.} \le \mu \le \bar{x} + t_{crit,0.975}\,\text{s.e.}$$

Alternatively, use the bootstrap sampling distribution directly and read off its quantiles to get the confidence interval.
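
The first of these two options, using the bootstrap only to estimate the standard error and then applying the t critical values, might be sketched as follows. The full MT370 vector is not available here, so `scores` is a simulated stand-in (34 hypothetical half-point scores), and the variable names are illustrative.

```r
set.seed(2012)                                   # arbitrary seed
# Hypothetical stand-in for MT370: 34 simulated scores rounded to half points
scores <- round(rnorm(34, mean = 15.5, sd = 3.7) * 2) / 2
n <- length(scores)

# Bootstrap sampling distribution of the mean: resample with replacement
boot_means <- replicate(2000, mean(sample(scores, replace = TRUE)))

# Bootstrap estimate of the standard error: sd of the bootstrap means
boot_se <- sd(boot_means)

# Confidence interval from t critical values and the bootstrap standard error
ci_boot_se <- mean(scores) + qt(c(0.025, 0.975), n - 1) * boot_se

# For comparison: the conventional t interval using sd / sqrt(n)
ci_t <- as.numeric(t.test(scores)$conf.int)

print(ci_boot_se)
print(ci_t)
```

The two intervals should nearly coincide for a statistic like the mean, where the standard error has a known formula; the bootstrap estimate earns its keep when no such formula exists.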
> BootMT370<-replicate(1000,mean(sample(MT370,replace=TRUE)))
> quantile(BootMT370,.025)
    2.5%
14.29338
> quantile(BootMT370,.975)
   97.5%
16.64743

Interval based on t distribution was [14.249,16.838]

Confidence Intervals: How to Get Confidence Intervals - Bootstrap

The real advantage of the bootstrap is that you can get confidence intervals for unconventional statistics.
In a sample of N stockbrokers, you don’t know how many stocks each holds, so no Monte Carlo.
Each reports a proportion of their stocks “up.”
Bootstrap confidence intervals around the MAXIMUM “up” of all N stockbrokers.

> N<-827 ; pStockUp<-.5 # For N brokers and pStockUp
> BrokerSample<-replicate(N,GetOneStockbrokerProportionUp(pStockUp))
> BootMax<-replicate(10000,max(sample(BrokerSample,N,replace=TRUE)))
> quantile(BootMax,.025) ; quantile(BootMax,.975)
     2.5%
0.7115385
97.5%
 0.75

[Figure: histogram of BootMax, x axis from 0.70 to 0.75, y axis Frequency]
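
The stockbroker example calls GetOneStockbrokerProportionUp, which is not defined in these slides. The sketch below supplies a hypothetical version so the example can be run end to end: it assumes each broker holds between 20 and 60 stocks (an invented range) and reports only the proportion that went up. The quantiles it produces will therefore differ from those shown above.

```r
set.seed(18)                                   # arbitrary seed

# Hypothetical stand-in: the original function is not shown in the slides.
# Each broker holds an unknown number of stocks (20 to 60 is an assumption)
# and reports only the proportion of them that went up.
GetOneStockbrokerProportionUp <- function(pStockUp) {
  nStocks <- sample(20:60, 1)                  # holdings hidden from the analyst
  rbinom(1, nStocks, pStockUp) / nStocks       # reported proportion "up"
}

N <- 827                                       # number of brokers
pStockUp <- 0.5                                # chance that any one stock is up
BrokerSample <- replicate(N, GetOneStockbrokerProportionUp(pStockUp))

# Bootstrap the maximum: resample the N reports with replacement, take the max
# (2000 resamples here to keep the run short; the slides use 10000)
BootMax <- replicate(2000, max(sample(BrokerSample, N, replace = TRUE)))

# Percentile interval read directly off the bootstrap distribution
ci <- quantile(BootMax, c(0.025, 0.975))
print(ci)
```

One property worth noticing: a bootstrap maximum can never exceed the largest value in the original sample, so the upper end of this interval is capped at the observed maximum.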