Download pptx

Three Views of Hypothesis Testing What are we doing? • In science, we run lots of experiments • In some cases, there's an "effect”; in others there's not – Some manipulations have an impact; some don't – Some drugs work; some don't • Goal: Tell these situations apart, using the sample data – If there is an effect, reject the null hypothesis – If there’s no effect, retain the null hypothesis • Challenge – Sample is imperfect reflection of population, so we will make mistakes – How to minimize those mistakes? Analogy: Prosecution • Think of cases where there is an effect as "guilty” – No effect: "innocent" experiments • Scientists are prosecutors – We want to prove guilt – Hope we can reject null hypothesis • Can't be overzealous – Minimize convicting innocent people – Can only reject null if we have enough evidence • How much evidence to require to convict someone? • Tradeoff – Low standard: Too many innocent people get convicted (Type I Error) – High standard: Too many guilty people get off (Type II Error) Binomial Test • Example: Halloween candy – Sack holds boxes of raisins and boxes of jellybeans – Each kid blindly grabs 10 boxes – Some kids cheat by peeking • Want to make cheaters forfeit candy – How many jellybeans before we call kid a cheater? Binomial Test • How many jellybeans before we call kid a cheater? • Work out probabilities for honest kids – Binomial distribution – Don’t know distribution of cheaters • Make a rule so only 5% of honest kids lose their candy • For each individual above cutoff Honest Cheaters – Don’t know for sure they cheated – Only know that if they were honest, chance would have been less than 5% that they’d have so many jellybeans 0.20 ? 0.00 0.10 • Therefore, our rule catches as many cheaters as possible while limiting Type I errors (false convictions) to 5% 0 1 2 3 4 5 6 7 8 9 10 Jellybeans Testing the Mean • Example: IQs at various universities – Some schools have average IQs (mean = 100) – Some schools are above average – Want to decide based on n students at each school • Need a cutoff for the sample mean – Find distribution of sample means for Average schools – Set cutoff so that only 5% of Average schools will be mistaken as Smart Smart 0.4 0.8 p(M) s ? n 0.0 exp(-z^2) Average -4 -2 0 100 z 2 4 Unknown Variance: t-test • Want to know whether population mean equals m0 – H0: m = m0 H1: m ≠ m0 • Don’t know population variance – Don’t know distribution of M under H0 – Don’t know Type I error rate for any cutoff • Use sample variance to estimate standard error of M -4 4e-04 Distribution of t when H0 is True 2e-04 Critical value 0e+00 pt(z[2:8001], 10) - pt(z[1:8000], 10) 0.4 p(M) p(t) 0.8 0.0 exp(-z^2) – Divide M – m0 by standard error to get t – We know distribution of t exactly -4 0 -2 -2 2000 m00 0 4000 0 z[1:8000] Index z 2 6000 a 2 4 8000 Critical Region 4 Two-tailed Tests • Often we want to catch effects on either side • Split Type I Errors into two critical regions 3e-04 H1 ? H0 Critical value 0e+00 a/2 pt(z[2:8001], 10) - pt(z[1:8000], 10) – Each must have probability a/2 -4 -2 Test 0 statistic z[1:8000] Critical value 2 H1 ? 4 a/2 An Alternative View: p-values • p-value – Probability of a value equal to or more extreme than what you actually got – Measure of how consistent data are with H0 • p>a – t is within tcrit – Retain null hypothesis • p<a 0.002 0.001 tcrit for a = .05 p = .03 0.000 p 0.003 0.004 – t is beyond tcrit – Reject null hypothesis; accept alternative hypothesis -4 -2 0 t t2 tt= 2.57 4 Confidence interval 4e-04 pt(z[2:8001], 10) - pt(z[1:8000], 10) Three Views of Inferential Statistics 0e+00 • p-value & a – Type I error rate • Can answer hypothesis test at any level – Result predicted by H0 vs. confidence interval – Test statistic vs. critical value – p vs. m0 M m0 -2 0 2 z[1:8000] -4 Test 0 statistic -2 2 4 z[1:8000] 0 a m0 Critical value Critical value 3e-04 – Measure of consistency with H0 m0 -4 0e+00 • Test statistic & critical value pt(z[2:8001], 10) - pt(z[1:8000], 10) – Values of m0 that don’t lead to rejecting H0 2e-04 • Effect size & confidence interval p 1 4

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download pptx