Inferential Statistics 4
Maarten Buis
18/01/2006
Outline
• interpretation of confidence interval
• confidence interval and testing
• Analysis of Variance
Interpreting confidence intervals
• If you draw a hundred samples and compute
a 95% confidence interval of the mean in
each of these samples, then the population
mean will be inside the interval in about 95
of these samples.
• If you draw one sample and compute the
confidence interval, then the population
mean is either within that interval or it is
not.
• So, strictly speaking, you cannot say that there is a
95% probability that the population mean is in that interval.
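The "hundred samples" argument can be checked by simulation. The sketch below is our own illustration, not part of the course material: it draws repeated samples from a normal population with a known mean, computes a 95% interval in each, and counts how many intervals contain the true mean. The function names `ci_mean` and `coverage` are hypothetical, and for simplicity it uses the normal critical value 1.96 instead of the t-value, which is only slightly larger for samples of 250 observations.

```python
import math
import random
import statistics
from statistics import NormalDist


def ci_mean(sample, crit):
    """Confidence interval for the mean: sample mean +/- crit * standard error."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - crit * se, m + crit * se


def coverage(n_samples=100, n_obs=250, mu=0.0, sd=1.0, seed=1):
    """Draw n_samples samples and count how many intervals contain mu."""
    rng = random.Random(seed)
    # Normal approximation to the critical value (about 1.96); the t-value
    # for df = 249 is only slightly larger (about 1.97).
    crit = NormalDist().inv_cdf(0.975)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sd) for _ in range(n_obs)]
        lb, ub = ci_mean(sample, crit)
        if lb <= mu <= ub:
            hits += 1
    return hits


print(coverage())  # typically close to 95; the exact count depends on the seed
```

Note that the population mean `mu` stays fixed in every run; only the interval bounds change from sample to sample.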
Confidence vs. Probability
• The procedure gives the correct conclusion
95% of the times it is used.
• You have no way of knowing if you are one
of the 95% ‘lucky ones’ or the 5% ‘unlucky
ones’ when you have drawn one sample and
computed a confidence interval.
• All you can say is that you have used a high
quality method to construct the interval.
[Figure: 50 95% confidence intervals, computed from 50 samples of 250 observations each from a normal distribution with mean 0 and SD 1 (axis: value)]
confidence interval and the
sampling distribution
• If we have an estimate of the sampling
distribution, then its 2.5th and 97.5th
percentiles form the 95% confidence interval.
• These percentiles follow from the critical values,
which can be looked up in the appropriate table.
• In 5% of the samples the true parameter will be
outside that interval
• Notice that the true parameter remains fixed and
the estimates of the lower and upper bound change
between samples.
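As an illustration of taking percentiles of an estimated sampling distribution, the sketch below assumes the sampling distribution is normal with mean equal to the estimate and standard deviation equal to the standard error (an assumption made for the example; the next slide uses a t-distribution for the mean). The function name `percentile_interval` and the numbers 10.0 and 2.0 are hypothetical.

```python
from statistics import NormalDist


def percentile_interval(estimate, std_error, level=0.95):
    """Central `level` part of an (assumed normal) sampling distribution
    centred on the estimate, with SD equal to the standard error."""
    dist = NormalDist(mu=estimate, sigma=std_error)
    tail = (1 - level) / 2  # 0.025 for a 95% interval
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


lb, ub = percentile_interval(10.0, 2.0)  # hypothetical estimate and standard error
print(round(lb, 2), round(ub, 2))  # 6.08 13.92
```

The result is the familiar estimate ± 1.96 × standard error: the critical value 1.96 is just the 97.5th percentile of the standard normal distribution.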
Best estimate of the sampling
distribution of a mean
• Our best estimate of the mean in the population is
the mean in the sample
• So, our best estimate of the mean of the sampling
distribution is the mean of the sample
• Our best estimate of the standard error is the
sample standard deviation divided by the square root of N
• So our best estimate of the sampling distribution
of the mean is a t-distribution with mean equal to
the sample mean, a standard deviation equal to the
standard error, and N-1 degrees of freedom
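The three ingredients of this estimated sampling distribution can be computed directly from a sample. A minimal sketch; the function name and the five data values are hypothetical. `statistics.stdev` divides by N-1, which matches the sample standard deviation used here.

```python
import math
import statistics


def sampling_distribution_of_mean(sample):
    """Estimated t sampling distribution of the mean: (centre, SE, df)."""
    n = len(sample)
    centre = statistics.fmean(sample)  # best estimate of the population mean
    se = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(N)
    return centre, se, n - 1


centre, se, df = sampling_distribution_of_mean([3, 5, 7, 9, 11])  # hypothetical data
print(centre, round(se, 4), df)  # 7.0 1.4142 4
```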
confidence interval for mean rent
$lb = \bar{x} - se \cdot t$
$ub = \bar{x} + se \cdot t$
• N = 19, so df = 18
• look up the two-sided critical t-value in Appendix
B, table 2: 2.101
• the mean is 258 and s = 99, so $se = 99/\sqrt{19} \approx 22.7$
• lb = 258 - 22.7*2.101 ≈ 210
• ub = 258 + 22.7*2.101 ≈ 306
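The same arithmetic as a small function, so it can be reused with other summary statistics. A sketch only; the name `t_interval` is hypothetical, and the critical value still has to be looked up in the table (2.101 for df = 18, as above).

```python
import math


def t_interval(mean, sd, n, t_crit):
    """Confidence interval for a mean from summary statistics.

    t_crit is the two-sided critical t-value for n - 1 degrees of freedom,
    taken from a table."""
    se = sd / math.sqrt(n)
    return mean - se * t_crit, mean + se * t_crit


lb, ub = t_interval(mean=258, sd=99, n=19, t_crit=2.101)
print(round(lb), round(ub))  # 210 306
```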
Comparing means of more than
two groups
• Until now we have compared the means of
two groups; we have not yet
– compared the means of more than two groups, or
– compared means across the values of a continuous x-variable
(regression)
• In these cases we use analysis of variance
(ANOVA) and the F-test
The Null Hypothesis
• The null hypothesis is that the means of all
groups are equal: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
• We observe the means of groups 1 through k: M1,
M2, M3, ..., Mk, and these differ from each other
at least because of sampling error
• Are these differences large enough to reject
H0?
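To see why observed group means differ even when H0 is true, the sketch below draws k groups from one and the same normal population and prints their means. It is an illustration under assumed settings (4 groups of 20, population mean 50, SD 10); the function name and parameter values are hypothetical.

```python
import random
import statistics


def group_means_under_h0(k=4, n_per_group=20, mu=50.0, sd=10.0, seed=3):
    """Draw k groups from one and the same normal population, so H0 holds,
    and return the k observed group means M1, ..., Mk."""
    rng = random.Random(seed)
    means = []
    for _ in range(k):
        group = [rng.gauss(mu, sd) for _ in range(n_per_group)]
        means.append(statistics.fmean(group))
    return means


means = group_means_under_h0()
print([round(m, 1) for m in means])  # all near 50, yet not equal to each other
```

The question the F-test answers is whether the spread among the observed means is larger than this kind of sampling variation would produce.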
Decomposition of Sum of
Squares
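McCall p. 358
• Notation: Yi is a score, Mk is the mean of the
group k that the score belongs to, and M is the overall mean
• (Yi-M) = (Yi-Mk) + (Mk-M)
• The deviation of a score from the overall mean
consists of the deviation of the score from its
group mean plus the deviation of that group
mean from the overall mean.
• Square and sum: SStotal = SSwithin + SSbetween
(the cross-products drop out, because within each group
the deviations from the group mean sum to zero)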
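The decomposition can be verified numerically. A minimal sketch with three small hypothetical groups; the function name `sums_of_squares` is our own. SSbetween is computed by adding (Mk-M)² once for every score in group k, which is why it is weighted by the group size.

```python
import statistics


def sums_of_squares(groups):
    """Return (SS_total, SS_within, SS_between) for a list of groups."""
    scores = [y for g in groups for y in g]
    grand_mean = statistics.fmean(scores)  # M
    ss_total = sum((y - grand_mean) ** 2 for y in scores)  # sum of (Yi - M)^2
    ss_within = 0.0
    ss_between = 0.0
    for g in groups:
        group_mean = statistics.fmean(g)  # Mk
        ss_within += sum((y - group_mean) ** 2 for y in g)  # sum of (Yi - Mk)^2
        # (Mk - M)^2 counted once for each score in the group
        ss_between += len(g) * (group_mean - grand_mean) ** 2
    return ss_total, ss_within, ss_between


groups = [[2, 4, 6], [5, 7, 9], [8, 10, 12]]  # hypothetical scores
print(sums_of_squares(groups))  # (78.0, 24.0, 54.0)
```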
Mean Sum of Squares
• Estimates of the Mean Sum of Squares
(variance) are obtained by dividing the Sum
of Squares by the number of degrees of
freedom:
– MStotal = SStotal/(N-1)
– MSwithin = SSwithin/(N-k)
– MSbetween = SSbetween/(k-1)
• N is the sample size and k is the number of
groups
old friends
• MStotal = variance
• MSwithin = (standard error of the estimate)²
• SSbetween/SStotal = R², the proportion of
variance explained, so:
• SSbetween is the explained part of SStotal, and
MSbetween is its mean square
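A sketch of the mean squares and R² for the same kind of one-way layout, using hypothetical data. It shows that MStotal equals the ordinary sample variance of all scores, and it computes R² as a ratio of sums of squares, SSbetween/SStotal. The function name `anova_summary` is hypothetical; SSbetween is obtained from the decomposition SStotal = SSwithin + SSbetween.

```python
import statistics


def anova_summary(groups):
    """Mean squares (SS / df) and R^2 for a one-way layout."""
    scores = [y for g in groups for y in g]
    n, k = len(scores), len(groups)
    m = statistics.fmean(scores)
    ss_total = sum((y - m) ** 2 for y in scores)
    ss_within = sum((y - statistics.fmean(g)) ** 2 for g in groups for y in g)
    ss_between = ss_total - ss_within  # from SStotal = SSwithin + SSbetween
    return {
        "MS_total": ss_total / (n - 1),
        "MS_within": ss_within / (n - k),
        "MS_between": ss_between / (k - 1),
        "R2": ss_between / ss_total,  # ratio of sums of squares
    }


summary = anova_summary([[2, 4, 6], [5, 7, 9], [8, 10, 12]])  # hypothetical data
print({name: round(value, 4) for name, value in summary.items()})
# {'MS_total': 9.75, 'MS_within': 4.0, 'MS_between': 27.0, 'R2': 0.6923}
```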
F-test
• The F statistic is a sample statistic, just like the
mean or the correlation, so it has a
sampling distribution: the F-distribution
(appendix 2, table E).
• The F-distribution has two degrees of freedom:
– for the numerator (MSbetween): k-1, and
– for the denominator (MSwithin): N-k
F-test
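• If H0 is true (all group means are equal),
then MSwithin and MSbetween estimate the same
population variance, so we expect them to be about equal
• Otherwise MSbetween tends to be larger than MSwithin
• F = MSbetween / MSwithin
• So H0 can be rewritten as: F = 1
• And HA: F > 1
• This is not a directional hypothesis, since
F > 1 only implies that not all group means are equal;
it does not say which means are larger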
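Putting the pieces together, the sketch below computes F = MSbetween / MSwithin and its two degrees of freedom for hypothetical data. It is our own illustration, not the procedure from the book, and it stops at the F value: the decision still requires the critical value from the F table.

```python
import statistics


def f_statistic(groups):
    """One-way ANOVA: F = MS_between / MS_within, with (k - 1, N - k) df."""
    scores = [y for g in groups for y in g]
    n, k = len(scores), len(groups)
    grand_mean = statistics.fmean(scores)
    ss_within = 0.0
    ss_between = 0.0
    for g in groups:
        gm = statistics.fmean(g)
        ss_within += sum((y - gm) ** 2 for y in g)
        ss_between += len(g) * (gm - grand_mean) ** 2
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k


f, df1, df2 = f_statistic([[2, 4, 6], [5, 7, 9], [8, 10, 12]])  # hypothetical data
print(f, df1, df2)  # 6.75 2 6
```

With (2, 6) degrees of freedom the 5% critical value is about 5.14, so for this toy data H0 would be rejected at the 5% level. If SciPy happens to be available, `scipy.stats.f.sf(f, df1, df2)` gives the p-value and `scipy.stats.f_oneway` runs the whole test, which can serve as a cross-check.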
To do before Monday
• read chapter 14 and pay special attention to pp.
356-360
• Skip:
– pp. 367-375 (computational procedure)
– pp. 375-385
• Use SPSS when doing the exercises with the example
data