Inferential Statistics 2
Maarten Buis
January 11, 2006

Outline
• Student Recap
• Sampling distribution
• Hypotheses
• Type I and II errors and power
• testing means
• testing correlations

Sampling distribution
• PrdV example from last lecture.
• If H0 is true, then the population consists of 16 million persons, of which 41% (= 6.56 million persons) support the PrdV.
• I have drawn 100,000 random samples of 2,598 persons each and computed the average support in each sample.

Sampling distribution
• 5%, or 5,000 samples, have a mean of 39% or less.
• So if we reject H0 when we find a support of 39% or less, then we will have a 5% chance of making an error.
• Notice: We assume that the only reason we would make an error is random sampling error.

[Figure: histogram of the sampling distribution of support for PrdV. Horizontal axis: % support for PrdV, from .36 to .46. Vertical axis: number of samples, from 0 to 10,000.]

More precise approach
• We want to know the score below which only 5% of the samples lie.
• Drawing lots of random samples is a rather rough approach; an alternative is to use the theoretical sampling distribution.
• The proportion is a mean, and the sampling distribution of a mean is the normal distribution with a mean equal to the H0 value and a standard deviation (called the standard error) of $\sigma / \sqrt{N}$.

More precise approach
• For a standard normal distribution we know the z-score below which 5% of the samples lie (Appendix 2, table A): -1.645
• So if we compute a z-score for the observed value (.31) and it is below -1.645, we can reject H0, and we will do so wrongly in only 5% of the cases.
• $z = \frac{\bar{x} - \mu}{se}$

More precise approach
• $\mu$ is the mean of the sampling distribution, so .41 (H0)
• se is $\sigma / \sqrt{N}$; $\sigma$ of a proportion is $\sqrt{p(1 - p)}$
• so the se is $\sqrt{\frac{.41 (1 - .41)}{2598}} \approx .0096$
• so the z-score is $z = \frac{\bar{x} - \mu}{se} = \frac{.31 - .41}{.0096} \approx -10.4$
• -10.4 is less than -1.645, so we reject H0

Null Hypothesis
• A sampling distribution requires you to imagine what the population would look like if H0 is true.
• This is possible if H0 is one value (41%)
• This is impossible if H0 is a range (<41%)
• So H0 should always contain an equal sign (either = or ≤ or ≥)

Null hypothesis
• In practice the H0 is almost always 0, e.g.:
  – difference between two means is 0
  – correlation between two variables is 0
  – regression coefficient is 0
• This is so common that SPSS always assumes that this is the H0.

Undirected Alternative Hypotheses
• Often we have an undirected alternative hypothesis, e.g.:
  – the difference between two means is not zero (could be either positive or negative)
  – the correlation between two variables is not zero (could be either positive or negative)
  – the regression coefficient is not zero (could be either positive or negative)

Directed alternative hypothesis
• In the PrdV example we had a directed alternative hypothesis: support for PrdV is less than 41%, since PrdV would still have participated if his support were more than 41%.

Type I and Type II errors

                      actual situation
decision              H0 is True                                H0 is False
reject H0             Type I error (probability = α)            correct decision (probability = 1 - β, power)
do not reject H0      correct decision (probability = 1 - α)    Type II error (probability = β)

Type I error rate
• You choose the type I error rate (α)
• It is independent of sample size, type of alternative hypothesis, or model assumptions.

Type I versus type II error rate
• A low probability of rejecting H0 when H0 is true (type I error) is obtained by:
• rejecting the H0 less often,
• which also means a higher probability of not rejecting H0 when H0 is false (type II error).
• In other words: a lower probability of finding a significant result when you should (power).
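
The z-test for the PrdV example can be retraced in a few lines of Python. This is only a sketch under the numbers given on the slides (H0 support of .41, sample size 2,598, observed support of .31, α of 5%); the variable names are made up for illustration, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist

# Numbers taken from the PrdV example on the slides.
p0 = 0.41        # support for PrdV if H0 is true
n = 2598         # sample size
observed = 0.31  # observed support in the sample
alpha = 0.05     # chosen type I error rate

# Standard error of a proportion under H0: sqrt(p(1 - p) / N).
se = math.sqrt(p0 * (1 - p0) / n)

# z-score of the observed value in the sampling distribution under H0.
z = (observed - p0) / se

# Directed (one-sided) test: the z-score below which a share alpha
# of the samples lie, roughly -1.645 for alpha = 0.05.
z_crit = NormalDist().inv_cdf(alpha)

reject_h0 = z < z_crit

print(f"se = {se:.4f}, z = {z:.1f}, critical z = {z_crit:.3f}, reject H0: {reject_h0}")
```

Because the alternative hypothesis is directed, all of α sits in the lower tail; for an undirected alternative the cutoff would instead be taken at α/2 in each tail.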
How to increase your power:
• Accept a higher type I error rate (α)
• Larger sample size
• Use directed instead of undirected alternative hypothesis
• Use more assumptions in your model (nonparametric tests make fewer assumptions, but also have less power)

Testing means
• What kind of hypotheses might we want to test:
  – Average rent of a room in Amsterdam is 300 euros
  – Average income of males is equal to the average income of females

Z versus t
• In the PrdV example we knew everything about the sampling distribution with only a hypothesis about the mean.
• In the rent example we don't: we have to estimate the standard deviation.
• This adds uncertainty, which is why we use the t distribution instead of the normal.
• Uncertainty declines when the sample size becomes larger.
• In large samples (N>30) we can use the normal.

t-distribution
• It has a mean and standard error like the normal distribution.
• It also has degrees of freedom, which depend on the sample size.
• The larger the degrees of freedom, the closer the t-distribution is to the normal distribution.

Data: rents of rooms

room      rent      room      rent
room 1    175       room 11   240
room 2    180       room 12   250
room 3    185       room 13   250
room 4    190       room 14   280
room 5    200       room 15   300
room 6    210       room 16   300
room 7    210       room 17   310
room 8    210       room 18   325
room 9    230       room 19   620
room 10   240

Rent example
• H0: μ = 300, HA: μ ≠ 300
• We choose α to be 5%
• N = 19, so df = 18
• We reject H0 if we find a t less than -2.101 or more than 2.101 (appendix B, table 2)
• We do not reject H0 if we find a t between -2.101 and 2.101.

Rent example
• $t = \frac{\bar{x} - \mu}{se}$, $se = \frac{\sigma}{\sqrt{N}}$
• We use $s^2$ as an estimate of $\sigma^2$
• $\bar{x} = 258$, $\mu = 300$, $s = 99$, $N = 19$
• So $t = \frac{258 - 300}{99 / \sqrt{19}} = -1.85$
• -1.85 is between -2.101 and 2.101, so we do not reject H0

Compare means in SPSS

Group Statistics (incmid: household income in guilders)

sex respondent    N       Mean         Std. Deviation    Std. Error Mean
1 male            1131    2833,2228    1530,70376        45,51556
2 female          1121    2199,2640    1366,42170        40,81144

Independent Samples Test (incmid: household income in guilders)

Levene's Test for Equality of Variances: F = 19,012, Sig. = ,000

t-test for Equality of Means
                               t         df          Sig. (2-tailed)    Mean Difference    Std. Error Difference    95% CI Lower    95% CI Upper
Equal variances assumed        10,365    2250        ,000               633,95876          61,16368                 514,01564       753,90189
Equal variances not assumed    10,370    2225,825    ,000               633,95876          61,13297                 514,07516       753,84236

The t statistic for two means:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{se_{\bar{x}_1 - \bar{x}_2}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{se_{\bar{x}_1 - \bar{x}_2}} = \frac{\bar{x}_1 - \bar{x}_2}{se_{\bar{x}_1 - \bar{x}_2}}$$

The standard error of the difference uses the pooled standard deviation:

$$se_{\bar{x}_1 - \bar{x}_2} = s_{pool} \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}, \qquad s_{pool} = \sqrt{s^2_{pool}}, \qquad s^2_{pool} = \frac{(N_1 - 1) s_1^2 + (N_2 - 1) s_2^2}{N_1 + N_2 - 2}$$

Substituting:

$$se_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{(N_1 - 1) s_1^2 + (N_2 - 1) s_2^2}{N_1 + N_2 - 2} \left( \frac{1}{N_1} + \frac{1}{N_2} \right)}$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(N_1 - 1) s_1^2 + (N_2 - 1) s_2^2}{N_1 + N_2 - 2} \left( \frac{1}{N_1} + \frac{1}{N_2} \right)}}$$

Do before Monday
• Read Chapter 9 and 10
• Do the "For solving Problems"
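
For practice, the rent example and the SPSS comparison of means from the slides can be retraced in Python. This is a minimal sketch, not the SPSS procedure itself: the function names `one_sample_t` and `pooled_t` are invented for illustration, the critical value 2.101 is simply copied from the slides, and the two-sample version assumes equal variances (the "Equal variances assumed" row).

```python
import math
from statistics import mean, stdev

# Rents of the 19 rooms from the data slide.
rents = [175, 180, 185, 190, 200, 210, 210, 210, 230, 240,
         240, 250, 250, 280, 300, 300, 310, 325, 620]


def one_sample_t(data, mu0):
    """Return (t, df) for H0: population mean equals mu0."""
    n = len(data)
    se = stdev(data) / math.sqrt(n)  # s estimates sigma, so se = s / sqrt(N)
    return (mean(data) - mu0) / se, n - 1


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Return (t, df) for H0: mu1 - mu2 = 0, assuming equal variances."""
    df = n1 + n2 - 2
    s2_pool = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(s2_pool * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Rent example: H0 mu = 300, undirected alternative, alpha = 5%.
t_rent, df_rent = one_sample_t(rents, 300)
T_CRIT_DF18 = 2.101  # two-sided 5% critical value for df = 18, from the slides
reject_rent = abs(t_rent) > T_CRIT_DF18

# Income example: group means, standard deviations and sizes
# copied from the SPSS Group Statistics table (male, then female).
t_income, df_income = pooled_t(2833.2228, 1530.70376, 1131,
                               2199.2640, 1366.42170, 1121)

print(f"rent:   t = {t_rent:.2f}, df = {df_rent}, reject H0: {reject_rent}")
print(f"income: t = {t_income:.3f}, df = {df_income}")
```

The rent t computed from the unrounded mean and standard deviation differs slightly in the second decimal from the hand calculation with the rounded values 258 and 99; the decision not to reject H0 is the same.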