Chapter 21: More About Tests
AP Statistics

Null Hypothesis
Stating it can sometimes be tricky
– If the event is random or the result of "guessing," the null often looks unusual, such as H0: p = 0.5 or H0: p = 1/6
– Especially when testing whether a "coin is fair," whether "someone has ESP and can predict which closed hand contains a prize," or whether a "die is fair"

Alpha Levels (Significance Levels)
• If the P-value is small, it tells us that our data are rare given the null hypothesis
• How rare is rare? How low must the P-value be to reject the null hypothesis?
• We arbitrarily set a threshold for the P-value; if the P-value falls below it, we reject the null hypothesis
• This threshold is called the ALPHA LEVEL

Alpha Levels (Significance Levels)
• We denote the alpha level α
• It is also called the significance level because values under α are considered statistically significant
• When we reject the null, we say "the test is significant at the α = .05 level"
• Always select α before you look at the data
• Always report the P-value and the α level in your conclusion

Common Alpha Levels
  α       1-sided   2-sided
  0.05    1.645     1.96
  0.01    2.33      2.576
  0.001   3.09      3.29
• When the alternative is one-sided, the critical value puts all of α on one side
• When the alternative is two-sided, the critical value splits α equally into two tails (a computational check of this table appears after the error review below)

Practical vs. Statistical Significance
• For larger sample sizes, small, unimportant deviations from the null can be statistically significant.
• For smaller sample sizes, large, seemingly important deviations from the null may not be statistically significant.
• Also, always do a reality check: what is the big deal about that difference?

Confidence Intervals
• You can approximate a hypothesis test by examining a confidence interval
– Just ask whether the null value is consistent with a confidence interval for the parameter at the corresponding confidence level (a worked sketch follows the error review below)
• A 95% confidence interval corresponds to a two-sided hypothesis test at α = .05 (the sum of the two tails)
• A 95% confidence interval corresponds to a one-sided hypothesis test at α = .025

Errors
• Here's some shocking news for you: nobody's perfect. Even with lots of evidence we can still make the wrong decision. When we perform a hypothesis test, we can make mistakes in two ways:
I. The null hypothesis is true, but we mistakenly reject it. (Type I error)
II. The null hypothesis is false, but we fail to reject it. (Type II error)

Errors
Type I: You are healthy, but a test says you have a disease (false positive)
Type II: You are not healthy, but the test says you do not have a disease (false negative)
___________________________________
Type I: A jury convicts an innocent person
Type II: A jury fails to convict a guilty person

Type I Errors
How often does a Type I error occur?
• It happens when the null hypothesis is true, but you have the misfortune to draw an unusual sample.
• To reject the null hypothesis, the P-value must fall below α
• When the null hypothesis is true, the probability that we mistakenly reject it is exactly α (a simulation sketch follows the review below)
• When you set α, you are setting the probability of a Type I error
• Remember, you can only make a Type I error if the null hypothesis is true

Type II Error
What happens if the null hypothesis is not true?
• If the null hypothesis is false and we reject it, we have done the correct thing.
• If the null hypothesis is false and we fail to reject it, we have committed a Type II error.
• The probability that this error occurs is denoted by β

Errors in General Review
• Probability of Type I error = α
• Probability of Type II error = β
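The critical values in the table above come directly from the normal model. Below is a minimal sketch (Python with scipy is an assumption of this write-up, not anything the slides use) that reproduces the table and applies the basic reject/fail-to-reject rule; the decide helper and the example P-values are hypothetical.

```python
from scipy.stats import norm

# Reproduce the "Common Alpha Levels" table: a one-sided test puts
# all of alpha in one tail; a two-sided test splits alpha in half.
for alpha in (0.05, 0.01, 0.001):
    one_sided = norm.ppf(1 - alpha)      # z* with upper-tail area alpha
    two_sided = norm.ppf(1 - alpha / 2)  # z* with alpha/2 in each tail
    print(f"alpha={alpha}: 1-sided z*={one_sided:.3f}, 2-sided z*={two_sided:.3f}")

def decide(p_value, alpha=0.05):
    """Reject H0 exactly when the P-value falls below the chosen alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.032))              # significant at the alpha = .05 level
print(decide(0.032, alpha=0.01))  # not significant at the alpha = .01 level
```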
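The confidence-interval connection can also be shown numerically. This is a hedged sketch, not the slides' own example: the data (54 heads in 100 flips) and H0: p = 0.5 are invented, and the correspondence is approximate because the interval uses the standard error from p-hat while the test uses the standard deviation from p0.

```python
import math
from scipy.stats import norm

# Hypothetical data: 54 heads in 100 flips, testing H0: p = 0.5
n, successes, p0 = 100, 54, 0.5
p_hat = successes / n

# 95% confidence interval for p (standard error based on p_hat)
se_ci = math.sqrt(p_hat * (1 - p_hat) / n)
z_star = norm.ppf(0.975)
ci = (p_hat - z_star * se_ci, p_hat + z_star * se_ci)

# Two-sided z-test at alpha = .05 (standard deviation based on p0)
sd0 = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / sd0
p_value = 2 * norm.sf(abs(z))

print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")  # contains 0.5
print(f"P-value: {p_value:.3f}")              # > .05, so fail to reject
```

Both views agree here: the interval contains the null value 0.5, and the two-sided P-value exceeds .05.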
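Finally, the claim that setting α sets the probability of a Type I error can be checked by simulation. A hedged sketch with invented numbers: draw many samples from a world where the null hypothesis really is true and count how often a two-sided z-test rejects it. (With discrete counts the empirical rate lands near, not exactly on, α.)

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha, p0, n, trials = 0.05, 0.5, 100, 10_000

# Counts drawn from a world where H0: p = 0.5 is TRUE
x = rng.binomial(n, p0, size=trials)
z = (x / n - p0) / np.sqrt(p0 * (1 - p0) / n)
p_values = 2 * norm.sf(np.abs(z))

# Fraction of true-null samples we mistakenly reject: about alpha
print((p_values < alpha).mean())
```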
Reducing Error
• Neither error is good
• The difficulty is reducing one without increasing the other
• Imagine: to reduce Type I error, I reduce my α; but then my β, the chance of a Type II error, would increase
• The only way to reduce both errors is to collect more evidence: more data
– Many times studies fail because sample sizes are too small to detect the change they are looking for
– When designing a survey or experiment, it is a good idea to calculate β for a reasonable α

Power: Related to Reducing Error
• It is natural to think that if we failed to reject the null hypothesis, we did not look hard enough and made the wrong decision.
– Is the null hypothesis really false, and was our test too weak to detect the strength of the difference?
• We want a test that is strong enough to make the right decision: rejecting the null hypothesis when it really is false
• The POWER of the test tells us how strong our test is at rejecting a false null hypothesis

Power
• When power is high, we can be confident that we looked hard enough.
• High power tells us that our test is strong and has a very good chance of detecting a false null hypothesis (a very good chance of NOT making a Type II error)
• Power is calculated by: Power = 1 - β
– This is the complement of making a Type II error

Power
• Whenever a study fails to reject the null hypothesis, the power of the test comes into play.
• When we calculate power, we imagine the null hypothesis is FALSE.
• The value of power depends on how far the truth lies from the null hypothesis; this distance is called the "effect size":
  effect size = |p0 - p|

Power
Notice from the visual (figure not shown):
• Power = 1 - β
• Reducing α to lower the Type I error will move the critical value p* to the right, which increases the probability of a Type II error and consequently reduces power
• The larger the effect size, the smaller the chance of making a Type II error and the greater the power of the test.

Reducing Both Type I and Type II Error
• This was discussed earlier: increase the sample size. An increased sample size reduces the standard deviation, making the curves narrower. (A power sketch showing this appears at the end of the section.)

Reducing Both Type I and Type II Error (cont.)
• [Figure: original comparison of errors]
• [Figure: comparison of errors with a larger sample size]
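As a hedged sketch of the power ideas above (the function name and all numbers are invented for illustration), the following computes Power = 1 - β for a one-sided one-proportion z-test and demonstrates the three effects the slides describe: more data raises power, a smaller α lowers it, and a bigger effect size raises it.

```python
import math
from scipy.stats import norm

def power_one_prop(p0, p_true, n, alpha=0.05):
    """Power of a one-sided (upper-tail) one-proportion z-test.

    Reject H0: p = p0 when p_hat exceeds the critical value p*.
    Power = P(p_hat > p* | p = p_true) = 1 - beta.
    """
    # Critical value p* under the null model
    p_star = p0 + norm.ppf(1 - alpha) * math.sqrt(p0 * (1 - p0) / n)
    # Sampling distribution of p_hat under the truth
    sd_true = math.sqrt(p_true * (1 - p_true) / n)
    return norm.sf((p_star - p_true) / sd_true)

# Hypothetical effect size: |p0 - p| = 0.10
print(power_one_prop(0.5, 0.6, n=50))               # modest power
print(power_one_prop(0.5, 0.6, n=200))              # larger n -> higher power
print(power_one_prop(0.5, 0.6, n=200, alpha=0.01))  # smaller alpha -> lower power
print(power_one_prop(0.5, 0.7, n=50))               # larger effect size -> higher power
```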