Survey


Inferential Statistics 4
Maarten Buis
18/01/2006

Outline
• interpretation of confidence interval
• confidence interval and testing
• Analysis of Variance

Interpreting confidence intervals
• If you draw a hundred samples and compute a 95% confidence interval of the mean in each of these samples, then the population mean will be inside the interval in about 95 of these samples.
• If you draw one sample and compute the confidence interval, then the population mean is either within that interval or it is not.
• So you are not "95% sure" that the population mean is in that particular interval.

Confidence vs. Probability
• The procedure gives the correct conclusion in 95% of the cases in which it is used.
• When you have drawn one sample and computed a confidence interval, you have no way of knowing whether you are one of the 95% 'lucky ones' or one of the 5% 'unlucky ones'.
• All you can say is that you have used a high-quality method to construct the interval.

[Figure: 50 95% confidence intervals; x-axis: value; 50 samples of 250 observations each from a normal distribution with mean 0 and SD 1]

Confidence interval and the sampling distribution
• If we have an estimate of the sampling distribution, then its 2.5th and 97.5th percentiles form the 95% confidence interval.
• These percentiles are found with the critical values, which can be looked up in the appropriate table.
• In 5% of the samples the true parameter will lie outside the interval computed from that sample.
• Notice that the true parameter remains fixed, while the estimated lower and upper bounds change from sample to sample.
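The repeated-sampling interpretation above, and the figure of 50 intervals, can be checked with a small simulation. The sketch below is a hypothetical Python illustration, not the code behind the original figure; the function names `confidence_interval` and `simulate_coverage` are made up for this example. It draws 50 samples of 250 observations from a normal distribution with mean 0 and SD 1, builds a 95% interval for the mean from each sample as sample mean ± critical value × s/√N, and counts how many intervals contain the true mean 0. As a simplifying assumption it uses the normal critical value (about 1.96) from the standard library instead of the t critical value; with 249 degrees of freedom the two differ only slightly.

```python
import random
import statistics


def confidence_interval(sample, crit):
    """Return (lower, upper): sample mean -/+ crit * standard error."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # statistics.stdev divides by N - 1; the standard error is s / sqrt(N)
    se = statistics.stdev(sample) / n ** 0.5
    return mean - crit * se, mean + crit * se


def simulate_coverage(n_samples=50, n_obs=250, mu=0.0, sigma=1.0, seed=2006):
    """Draw repeated samples and count intervals that contain the true mean."""
    rng = random.Random(seed)
    # Assumption: normal critical value instead of t (df = 249 makes them close)
    crit = statistics.NormalDist().inv_cdf(0.975)
    intervals = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n_obs)]
        intervals.append(confidence_interval(sample, crit))
    # The true mean stays fixed; only the bounds change between samples
    hits = sum(lo <= mu <= hi for lo, hi in intervals)
    return intervals, hits


if __name__ == "__main__":
    intervals, hits = simulate_coverage()
    print(f"{hits} of {len(intervals)} intervals contain the true mean 0")
    # Over many repetitions the share of hits should be close to 0.95
    _, hits = simulate_coverage(n_samples=1000, seed=1)
    print(f"coverage over 1000 samples: {hits / 1000:.3f}")
```

Which particular intervals miss 0 changes with the seed, but over many repetitions the share of hits stays close to 0.95: the 95% describes the procedure, not any single interval.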
Best estimate of the sampling distribution of a mean
• Our best estimate of the mean in the population is the mean in the sample.
• So our best estimate of the mean of the sampling distribution is the mean of the sample.
• Our best estimate of the standard error is the sample standard deviation divided by the square root of N.
• So our best estimate of the sampling distribution of the mean is a t-distribution with mean equal to the sample mean, a standard deviation equal to the standard error, and N-1 degrees of freedom.

Confidence interval for mean rent
• $lb = \bar{x} - se \cdot t$
• $ub = \bar{x} + se \cdot t$
• N = 19, so df = 18
• Look up the two-sided critical t-value in Appendix B, table 2: 2.101
• The mean is 258 and s = 99, so $se = 99/\sqrt{19} \approx 22.7$
• $lb = 258 - 22.7 \times 2.101 \approx 210$
• $ub = 258 + 22.7 \times 2.101 \approx 306$

Comparing means of more than two groups
• Until now we have only compared the means of two groups; we have not yet:
  – compared the means of more than two groups, or
  – compared means across a continuous x-variable (regression).
• In these cases we use analysis of variance (ANOVA) and the F-test.

The Null Hypothesis
• The null hypothesis is that the means of all groups are equal: $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
• We observe the means of groups 1 to k, $M_1, M_2, M_3, \dots, M_k$; even if $H_0$ is true, these differ due to sampling error.
• Are these deviations large enough to reject $H_0$?

Decomposition of Sum of Squares
• McCall p. 358
• $Y_i$ is the score of observation i, $M_k$ the mean of the group k that observation i belongs to, and M the overall mean.
• $(Y_i - M) = (Y_i - M_k) + (M_k - M)$
• The deviation of a score from the overall mean consists of the deviation of the score from its group mean plus the deviation of that group mean from the overall mean.
• Square and sum over all observations (the cross-products sum to zero within each group): $SS_{total} = SS_{within} + SS_{between}$

Mean Sum of Squares
• Estimates of the Mean Sum of Squares (a variance) are obtained by dividing a Sum of Squares by its number of degrees of freedom:
  – $MS_{total} = SS_{total}/(N-1)$
  – $MS_{within} = SS_{within}/(N-k)$
  – $MS_{between} = SS_{between}/(k-1)$
• N is the sample size and k is the number of groups.

Old friends
• $MS_{total}$ = variance
• $MS_{within}$ = (standard error of the estimate)$^2$
• $SS_{between}/SS_{total} = R^2$, the proportion of variance explained, so:
• $MS_{between}$ = variance explained

F-test
• The F statistic is just an estimate, like the mean or the correlation, so it has a sampling distribution: the F-distribution (appendix 2, table E).
• The F-distribution has two types of degrees of freedom:
  – for the numerator ($MS_{between}$): k-1, and
  – for the denominator ($MS_{within}$): N-k.

F-test
• If $H_0$ is true (all group means are equal), then $MS_{within}$ and $MS_{between}$ both estimate the same population variance, so $MS_{between} = MS_{within}$.
• Otherwise $MS_{between} > MS_{within}$.
• $F = MS_{between}/MS_{within}$
• So $H_0$ can be rewritten as: F = 1
• And $H_A$: F > 1
• This is not a directional hypothesis, since F > 1 only implies that not all of $\mu_1, \mu_2, \mu_3, \dots, \mu_k$ are equal; it does not say which means differ or in which direction.

To do before Monday
• Read chapter 14; pay special attention to pp. 356-360.
• Skip:
  – pp. 367-375 (computational procedure)
  – pp. 375-385
• Use SPSS when doing the exercises with the example data.
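The ANOVA steps above (decomposing the sum of squares, dividing by the degrees of freedom, and forming F) can be sketched in Python. This is a minimal illustration, not an SPSS procedure: `one_way_anova` is a hypothetical helper, and the scores for three groups are made up for the example. It stops at F and $R^2$; the p-value would still come from the F table (or from SPSS).

```python
from statistics import fmean


def one_way_anova(groups):
    """Sums of squares, mean squares, F and R^2 for a one-way ANOVA.

    `groups` is a list of groups, each a list of scores (hypothetical helper).
    """
    scores = [y for group in groups for y in group]
    n, k = len(scores), len(groups)
    grand_mean = fmean(scores)                        # M
    group_means = [fmean(group) for group in groups]  # M_1 ... M_k

    # (Y_i - M)^2 summed over all scores
    ss_total = sum((y - grand_mean) ** 2 for y in scores)
    # (Y_i - M_k)^2: each score against its own group mean
    ss_within = sum((y - m) ** 2
                    for group, m in zip(groups, group_means) for y in group)
    # (M_k - M)^2 counted once for every score in group k
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))

    ms_within = ss_within / (n - k)
    ms_between = ss_between / (k - 1)
    return {
        "ss_total": ss_total,
        "ss_within": ss_within,
        "ss_between": ss_between,
        "ms_total": ss_total / (n - 1),
        "ms_within": ms_within,
        "ms_between": ms_between,
        "df": (k - 1, n - k),  # (numerator, denominator)
        "F": ms_between / ms_within,
        "R2": ss_between / ss_total,
    }


if __name__ == "__main__":
    # Made-up scores for three groups, for illustration only
    groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
    res = one_way_anova(groups)
    print(f"SS_total = {res['ss_total']:.2f} = "
          f"{res['ss_within']:.2f} + {res['ss_between']:.2f}")
    df1, df2 = res["df"]
    print(f"F({df1}, {df2}) = {res['F']:.2f}, R^2 = {res['R2']:.3f}")
```

With these made-up scores the group means are 5, 7 and 10, the sums of squares are 44 = 6 + 38, so $F(2, 6) = 19/1 = 19$ and $R^2 = 38/44 \approx 0.86$.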