Inferential Statistics 4
Maarten Buis
• interpretation of confidence interval
• confidence interval and testing
• Analysis of Variance
Interpreting confidence intervals
• If you draw a hundred samples and compute
a 95% confidence interval of the mean in
each of these samples, then the population
mean will be inside the interval in about 95
of them.
• If you draw one sample and compute the
confidence interval, then the population
mean is either within that interval or it is
not.
• So you are not 95% sure that the population
mean is in that interval.
Confidence vs. Probability
• The procedure will give the correct
conclusion in 95% of the cases in which it is used.
• You have no way of knowing if you are one
of the 95% ‘lucky ones’ or the 5% ‘unlucky
ones’ when you have drawn one sample and
computed a confidence interval.
• All you can say is that you have used a high
quality method to construct the interval.
[Figure: 50 95% confidence intervals, computed from 50 samples of 250 observations each from a normal distribution with mean 0 and SD 1]
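A figure like this one can be approximated with a short simulation. The sketch below is an illustration only, not the procedure used to draw the original figure: the seed, the function name `simulate_intervals`, and the use of the normal critical value (about 1.96) in place of the t-value with 249 degrees of freedom are all assumptions made here.

```python
import math
import random
import statistics


def simulate_intervals(n_samples=50, n_obs=250, mu=0.0, sigma=1.0, seed=1):
    """Draw n_samples samples and return one (lower, upper) 95% interval per sample."""
    rng = random.Random(seed)  # fixed seed (assumption) so the run is repeatable
    # Large-sample approximation: normal critical value instead of t with n_obs - 1 df.
    crit = statistics.NormalDist().inv_cdf(0.975)
    intervals = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n_obs)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n_obs)
        intervals.append((mean - crit * se, mean + crit * se))
    return intervals


intervals = simulate_intervals()
# The population mean (0) stays fixed; the bounds change from sample to sample.
covered = sum(lower <= 0.0 <= upper for lower, upper in intervals)
print(f"{covered} of {len(intervals)} intervals contain the population mean")
```

The count will usually be close to, but not exactly, 95% of the intervals, and it changes with the seed.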
confidence interval and the
sampling distribution
• If we have an estimate of the sampling
distribution, then the 2.5th and the 97.5th
percentiles will form the 95% confidence interval.
• These percentiles are the critical values and can be
looked up in the appropriate table.
• In 5% of the samples the true parameter will be
outside that interval
• Notice that the true parameter remains fixed and
the estimates of the lower and upper bound change
between samples.
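The link between percentiles and critical values can be checked by simulation. In this sketch the population (normal, mean 0, SD 1), the sample size of 100, the 5000 repeated samples, and the seed are all hypothetical choices made for illustration.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed (assumption) for repeatability
n_obs = 100              # hypothetical sample size
n_draws = 5000           # hypothetical number of repeated samples

# Simulated sampling distribution of the mean: one mean per repeated sample.
means = [
    statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
    for _ in range(n_draws)
]

# quantiles(..., n=40) returns 39 cut points in steps of 2.5%,
# so the first is the 2.5th percentile and the last is the 97.5th.
cuts = statistics.quantiles(means, n=40)
p2_5, p97_5 = cuts[0], cuts[-1]

# Theoretical counterpart: 0 +/- z * sigma / sqrt(n_obs), with z about 1.96.
theory = statistics.NormalDist().inv_cdf(0.975) * 1.0 / math.sqrt(n_obs)

print(f"simulated percentiles: {p2_5:.3f}, {p97_5:.3f}")
print(f"theoretical bounds:    {-theory:.3f}, {theory:.3f}")
```

The simulated percentiles should land close to the theoretical bounds. In practice we have only one sample, so the sampling distribution has to be estimated rather than simulated.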
Best estimate of the sampling
distribution of a mean
• Our best estimate of the mean in the population is
the mean in the sample
• So, our best estimate of the mean of the sampling
distribution is the mean of the sample
• Our best estimate of the standard error is the
standard deviation divided by the square root of N
• So our best estimate of the sampling distribution
of the mean is a t-distribution with mean equal to
the sample mean, a standard deviation of the
standard error, and N-1 degrees of freedom
confidence interval for mean rent
lb  x  se  t
ub  x  se  t
• N=19, so df =18
• look up the two sided critical t-value in Appendix
B, table 2: 2.101
• mean is 258, s = 99, so $se = 99/\sqrt{19} \approx 22.7$
• lb = 258 - 22.7*2.101 = 210
• ub = 258 + 22.7*2.101 = 306
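The same arithmetic as a small sketch. The function name `t_confidence_interval` is made up for this example, and the critical value 2.101 is simply passed in as read from the table, not computed.

```python
import math


def t_confidence_interval(mean, sd, n, t_crit):
    """Return (lower bound, upper bound, standard error) for a mean."""
    se = sd / math.sqrt(n)  # standard error = s / sqrt(N)
    return mean - t_crit * se, mean + t_crit * se, se


# Values from the mean rent example: N = 19, mean 258, s = 99, t = 2.101 (df = 18).
lb, ub, se = t_confidence_interval(mean=258, sd=99, n=19, t_crit=2.101)
print(f"se = {se:.1f}, lb = {lb:.0f}, ub = {ub:.0f}")  # se = 22.7, lb = 210, ub = 306
```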
Comparing means of more than
two groups
• Until now we have compared the means of
two groups; we have not yet
– compared the means of more than two groups, or
– compared means across a continuous x-variable
• In these cases we use analysis of variance
(ANOVA) and the F-test
The Null Hypothesis
• The null hypothesis is that the means of all
groups are equal: $m_1 = m_2 = m_3 = \dots = m_k$
• We observe the means of groups 1 to k: M1,
M2, M3, ..., Mk, and these differ due to
sampling error
• Are these deviations large enough to reject
the null hypothesis?
Decomposition of Sum of Squares
McCall p. 358
Yi = individual score, Mk = group mean, M = overall mean
(Yi-M) = (Yi-Mk) + (Mk-M)
Deviation of a score from the overall mean
consists of a deviation of the score to the
group mean plus a deviation of the group
mean to the overall mean.
• Square and sum: SStotal=SSwithin + SSbetween
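The decomposition can be verified numerically. The three small groups below are made-up data, and `sums_of_squares` is a name chosen for this sketch. The identity holds because the cross term, the sum of (Yi-Mk)(Mk-M), is zero within every group.

```python
import statistics


def sums_of_squares(groups):
    """Return (SStotal, SSwithin, SSbetween) for a list of groups of scores."""
    scores = [y for g in groups for y in g]
    grand_mean = statistics.fmean(scores)  # M
    # Deviation of each score from the overall mean.
    ss_total = sum((y - grand_mean) ** 2 for y in scores)
    # Deviation of each score from its own group mean Mk.
    ss_within = sum((y - statistics.fmean(g)) ** 2 for g in groups for y in g)
    # Deviation of each group mean from the overall mean, once per score in the group.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    return ss_total, ss_within, ss_between


groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]  # hypothetical data
ss_total, ss_within, ss_between = sums_of_squares(groups)
print(ss_total, ss_within, ss_between)  # 48.0 6.0 42.0
```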
Mean Sum of Squares
• Estimates of the Mean Sum of Squares
(variance) are obtained by dividing the Sum
of Squares by the number of degrees of
freedom:
– MStotal = SStotal/(N-1)
– MSwithin = SSwithin/(N-k)
– MSbetween = SSbetween/(k-1)
• N is the sample size and k is the number of
groups
old friends
• MStotal = variance
• MSwithin = (standard error of the estimate)²
• SSbetween/SStotal = R² or proportion of
variance explained, so:
• MSbetween = variance explained
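A quick numerical check of these relations, using made-up data for three groups. The proportion of variance explained is computed here with the standard definition, SSbetween divided by SStotal.

```python
import statistics

groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]  # hypothetical data
scores = [y for g in groups for y in g]
N, k = len(scores), len(groups)
grand_mean = statistics.fmean(scores)

ss_total = sum((y - grand_mean) ** 2 for y in scores)
ss_within = sum((y - statistics.fmean(g)) ** 2 for g in groups for y in g)
ss_between = ss_total - ss_within  # uses SStotal = SSwithin + SSbetween

ms_total = ss_total / (N - 1)
ms_within = ss_within / (N - k)
ms_between = ss_between / (k - 1)

# MStotal is the ordinary sample variance of all scores taken together.
sample_variance = statistics.variance(scores)
r_squared = ss_between / ss_total  # proportion of variance explained

print(f"MStotal = {ms_total}, sample variance = {sample_variance}")
print(f"MSwithin = {ms_within}, MSbetween = {ms_between}, R^2 = {r_squared}")
```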
• The F statistic is just an estimate like the
mean, or the correlation, so it has a
sampling distribution: the F-distribution,
appendix 2, table E.
• The F-distribution has two types of degrees
of freedom:
– for the numerator (MSbetween): k-1, and
– for the denominator (MSwithin): N-k
• If H0 is true (all group means are equal)
then MSwithin and MSbetween estimate the same
variance, so MSbetween ≈ MSwithin
• Otherwise MSbetween tends to be larger than MSwithin
• F = MSbetween / MSwithin
• So H0 can be rewritten as: F = 1
• And HA: F > 1
• This is not a directional hypothesis, since
F > 1 implies only that the group means are not all
equal ($m_i \neq m_j$ for at least one pair of groups),
not which mean is larger
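The whole test can be sketched for a small made-up data set. The function name `f_statistic` is an assumption of this example. So is the critical value 5.14, which is what a standard F table lists for a 5% level with 2 and 6 degrees of freedom; it is not taken from the appendix referred to above.

```python
import statistics


def f_statistic(groups):
    """Return (F, numerator df, denominator df) for a one-way ANOVA."""
    scores = [y for g in groups for y in g]
    n_total, k = len(scores), len(groups)
    grand_mean = statistics.fmean(scores)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - statistics.fmean(g)) ** 2 for g in groups for y in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]  # hypothetical data
f_value, df1, df2 = f_statistic(groups)

F_CRIT = 5.14  # assumed table value: alpha = 0.05, df = (2, 6)
reject_h0 = f_value > F_CRIT
print(f"F({df1}, {df2}) = {f_value:.1f}, reject H0: {reject_h0}")  # F(2, 6) = 21.0, reject H0: True
```

A large F says only that the group means are unlikely to be all equal; it does not say which groups differ.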
To do before Monday
• read chapter 14, pay special attention to pp.
• Skip:
– pp. 367-375 computational procedure
– pp. 375-385
• Use SPSS when making sums with example