 How do we use confidence intervals &
significance tests to make inferences
from a random sample about a
population mean?
 How do we use confidence intervals &
significance tests to compare the means
of two populations?
 Standard error: when the standard
deviation of a statistic is estimated
from the data (i.e. from a sample), the
result is called the standard error of
the statistic.
 Standard error: the estimated
average deviation of the sample mean
from its expected value if the sampling
were repeated over & over.
 Standard error is based on the
t-distribution, not the standard
normal (z) distribution.
 Because it’s based on sample
data, the t-distribution is less
certain, less precise, & thus more
variable than the z-distribution.
 Hence the t-distribution is flatter, or
wider, than the z-distribution, when
N<=1000.
 But the t-distribution closely &
increasingly approximates the
z-distribution once sample size reaches
N=120.
 When N>1000, the t- and
z-distributions are effectively identical.
 Put differently, the smaller the sample
size (i.e. the fewer the degrees of
freedom*), the wider (i.e. the less
precise) the t-distribution is relative to
the z-distribution.
* Recall that ‘degrees of freedom’ are
the amount of information available to
estimate a statistic. The more df’s, the
better.
 This, then, is another reason to
have larger samples: so that the
t-distribution becomes more
precise, & thus so hypothesis
tests can be more accurate.
 The z-distribution is used when we
know the population’s standard
deviation—which, however, we virtually
never know.
 Almost always, then, we use the
t-distribution, because we are
estimating a statistic from sample
data.
Confirm that there’s a different
t-distribution for each value of n – 1
degrees of freedom:
 Check the t-distribution critical
values in Moore/McCabe/Craig
(Table D, page T-11) for each df.
 N>=120: the t-distribution closely &
increasingly approximates the
z-distribution.
 N>1000: the t-distribution &
z-distribution are effectively identical.
 See Table D (page T-11).
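 A quick way to confirm this in Stata itself (a minimal sketch: invttail() returns the upper-tail t* critical value for a given df, & invnormal() the corresponding z*):
. display invttail(14, .025)    // t*, df = 14 (N = 15)
. display invttail(119, .025)   // t*, df = 119 (N = 120)
. display invttail(999, .025)   // t*, df = 999 (N = 1000)
. display invnormal(.975)       // z* = 1.96
 As df grows, the t* values shrink toward z* = 1.96, just as Table D shows.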
 Standard error of the mean:
when the standard deviation of the
mean is estimated from sample data
(& thus the t-distribution is used).
 Formula for the standard error of
the mean:
se = s / √n
 We’ve already been using this
formula, but we’ve generally
been using the z-distribution.
 From now on, when we refer to
the standard error of the
mean, we’ll use the t-distribution.
 A sample mean will deviate
from the population mean due to
sampling error (not to mention
non-sampling error).
 The standard error of the mean
gives the estimated size of this
deviation.
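 A minimal Stata sketch of the standard error formula above, assuming a hypothetical quantitative variable sat_score is in memory (summarize stores the sample sd in r(sd) & the sample size in r(N)):
. summarize sat_score
. display "se of the mean = " r(sd)/sqrt(r(N))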
From now on, then, think
standard error & t-distribution.
 Here’s the t-confidence interval
for the mean of a quantitative
variable:
x̄ ± t* (s / √n)
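 A hedged sketch of computing this interval by hand in Stata, again with the hypothetical sat_score (the last line uses Stata’s built-in ci command, which reports the same interval; newer releases spell it ci means sat_score):
. summarize sat_score
. display r(mean) - invttail(r(N)-1, .025)*r(sd)/sqrt(r(N))   // lower 95% limit
. display r(mean) + invttail(r(N)-1, .025)*r(sd)/sqrt(r(N))   // upper 95% limit
. ci sat_score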
How to use the t-distribution in
hypothesis tests
 We can use t-value
confidence intervals to make
inferences from a sample
mean about a benchmark
mean (i.e. some
hypothesized parameter from
the present or past).
The One-Sample t-Test
 The one-sample t-test uses the
t-confidence interval to compare the
mean of a random sample to some
benchmark parameter (from the
present or past).
 E.g., compare the mean SAT
score of a random sample of FIU
undergrads to some other, ‘ideal’
score (e.g., 500).
 Is the difference large enough
relative to the standard error of
the difference to be statistically
significant?
 E.g., compare the mean SAT score
of a random sample of FIU undergrads
today to that of FIU undergrads a
decade ago.
 Is the difference large enough
relative to the standard error of the
difference to be statistically
significant?
 E.g., compare the mean SAT
score of a random sample of FIU
students to the national SAT mean.
 Is the difference large enough
relative to the standard error of
the difference to be statistically
significant?
 The one-sample t-test compares
the mean of a quantitative
variable from a random sample
to some benchmark parameter.
This benchmark parameter may
be:
 some measurement ideal
 some independent, comparison
group
 a parameter from the past or
present
The one-sample t-test requires:
 a probability sample of independent
observations
 a quantitative variable
 a graphic check for pronounced
skewness & outliers
 a benchmark comparison mean
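 A rough sketch of the graphic check just listed, in Stata, assuming the hypothetical variable sat_score:
. histogram sat_score, normal    // skewness check, with a normal curve overlaid
. graph box sat_score            // boxplot check for outliers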
t-tests of all sorts can be used safely:
 When the probability sample has N<15, if the data
distribution is close to normal (i.e. no more than minimal
skewness & no pronounced outliers, because the mean & sd
are not resistant).
 When 15<N<40 & there is no pronounced skewness &
no outliers.
 When N>=40 (more or less), if there are no outliers,
even if there is pronounced skewness (although
transforming the variable may be safer), due to the central limit
theorem & the law of large numbers.
 What if the sample is too small & the distribution non-normal,
&/or contains pronounced outliers?
 One possible option: transform the variable &/or
eliminate the outliers (in Stata see ‘help ladder’).
 Alternatively, use a non-parametric (i.e. distribution free)
statistic: the sign rank test or the sign test—though these
are much less precise & are weaker than parametric
procedures for testing hypotheses.
 Stata: see ‘help signrank’ or ‘help signtest.’ See
Moore/McCabe chap. 7 & the CD-ROM chapter on non-parametric statistics.
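 A minimal sketch of those non-parametric alternatives, again with the hypothetical sat_score & a benchmark of 500:
. signrank sat_score = 500    // Wilcoxon signed-rank test
. signtest sat_score = 500    // sign test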
If the distribution is acceptable or
becomes so after you’ve intervened,
then use the one-sample t-test:
 Ho: there’s no difference: μ = μ0
 Ha: there is a difference: μ ≠ μ0
 Or a one-sided alternative hypothesis.
Put differently:
Ho: difference = 0
Ha: difference ~= 0
 One-sided hypothesis: difference >
0; or difference < 0
 E.g., compare the mean SAT
score of a random sample of FIU
undergrads to some other, ‘ideal’
score (e.g., 500): is the difference
statistically significant?
 First, check that the sample
assumption is fulfilled.
 Second, do a graphic check for
pronounced skewness (if sample size
<40) & for outliers, taking action to
minimize the problems if necessary.
 Third, state the hypotheses, e.g.:
Ho: FIU mean SAT = 500
Ha: FIU mean SAT ≠ 500
 Put differently: Ho: diff = 0. Ha: diff ≠ 0.
 Fourth, test the hypothesis.
 These data aren’t in memory, so the
Stata command is ttesti rather than ttest.
. ttesti sample-n sample-mean sample-sd benchmark-mean
. ttesti 400 512 73 500
One-sample t test

Variable |  Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+-----------------------------------------------------------
       x |  400   512    3.65        73          504.8244    519.1756

Degrees of freedom: 399

Ho: mean(x) = 500

Ha: mean < 500            Ha: mean ~= 500           Ha: mean > 500
   t = 3.2877                t = 3.2877                t = 3.2877
P < t = 0.9995           P > |t| = 0.0011          P > t = 0.0005
 Conclusion: Reject the null hypothesis
(p=0.001 for a two-tailed test, df=399).
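 To see where that t-value comes from, plug the reported numbers into t = (sample mean - benchmark) / (s/√n):
. display (512 - 500)/(73/sqrt(400))    // = 12/3.65 = 3.29, matching t = 3.2877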
 Note: if the data are in memory, modify
the Stata command as follows:
. ttest FIU_SAT = 500
Before we move on to another variety
of t-tests:
 What’s the purpose of the one-sample t-test?
 What kind of data does it require?
 How do we conduct the test?
 Under what conditions does the test
come out significant or insignificant?
Example:
 There is evidence that 51% of a
specific graduate program’s student
admissions are women, but your
program has admitted just 43%
women.
 Should you use a one-sample t-test
to assess whether or not this
difference is statistically significant?
Caution
 One-sample t-test requires a
probability sample.
 All conclusions are uncertain.
 Sampling & non-sampling
sources of error.
 The next variety of t-test—matched
pairs—applies the one-sample t-test
to an after vs. before ‘difference’
score for comparing means for a
random sample of matched after vs.
before observations.
 E.g., the mean SAT score of a
random sample of FIU students before
they received SAT-training versus
after they received such training
 Is the difference in scores large
enough relative to the standard error
of the difference to be statistically
significant?
 E.g., the mean cholesterol level of a
random sample of adults before they
went on a low-fat diet versus after they
went on the diet.
 Is the difference large enough
relative to the standard error of the
difference to be statistically
significant?
 E.g., the mean earnings of a random
sample of inner-city women workers
before they received skill-training
versus after they received such
training
 Is the difference large enough
relative to the standard error of the
difference to be statistically
significant?
This is called the matched pairs
(or dependent sample) t-test:
 Ho: μdiff = 0
(i.e. there’s no after vs. before effect)
 Ha: μdiff > 0
(i.e. there is an after vs. before effect: the
after-mean is greater than the before-mean)
 The after vs. before matched
pairs, of course, are not
independent of each other.
 Is the after vs. before difference
in sample means large enough
relative to the standard error of
the difference to test statistically
significant?
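 A minimal sketch of this test in Stata, assuming hypothetical variables after_score & before_score are in memory: the matched pairs test is simply the one-sample t-test applied to the after-minus-before difference score (newer Stata releases write these commands with == rather than =):
. generate diff = after_score - before_score
. ttest diff = 0                       // one-sample t-test on the difference score
. ttest after_score = before_score     // equivalent paired form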
What kind of data does the matched
pairs (or dependent sample) t-test
require?
 a random sample involving the same,
matched observations (i.e. individuals or
subjects) before & after the treatment
 a quantitative variable
 recall the previous discussion of sample
size.
 a graphic check for pronounced skewness &
outliers
 And something else: the sd of
the ‘before’ group can’t be more
than two times larger/smaller
than that of the ‘after’ group.
 If it is, then use an adjusted
version of the t-test (e.g., Stata’s
‘unequal’ option).
 What if the sample distribution is too
skewed &/or contains pronounced outliers?
 Consider transforming the data &/or
eliminating the outliers.
 Alternatively, use a non-parametric
(i.e. distribution-free) test: the sign test or
the sign rank test.
 Here’s an after vs. before example
concerning a hypothetical test to
improve standardized reading scores.
 List the ‘after’ data first.
 For the ‘after’ & ‘before’ data, list:
sample size, sample mean, sd.
. ttesti 22 520 46.1 23 501 44.7
Two-sample t test with equal variances

         |  Obs   Mean       Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+----------------------------------------------------------------
       x |   22   520        9.828553    46.1        499.5604    540.4396
       y |   23   501        9.320594    44.7        481.6703    520.3297
---------+----------------------------------------------------------------
combined |   45   510.2889   6.840412    45.88688    496.5029    524.0748
---------+----------------------------------------------------------------
    diff |        19         13.53576                -8.297466   46.29747

Degrees of freedom: 43

Ho: mean(x) - mean(y) = diff = 0

Ha: diff < 0              Ha: diff != 0             Ha: diff > 0
   t = 1.4037                t = 1.4037                t = 1.4037
P < t = 0.9162            P > |t| = 0.1676          P > t = 0.0838
 Note: if the data are in memory, modify
the Stata command as follows:
. ttest after_score = before_score
Example:
 You gain access to an entire class of
third graders for your curriculum
experiment. You submit them to a new
curriculum to promote scientific thinking.
 You give them an after vs. before t-test
& assess the magnitude of the effect &
the test of significance.
 Correct, or not?
Another kind: t-test for
comparing the means of
two groups:
 What if we want to compare
the means of a quantitative
variable for two groups within
the same random sample?
 E.g., we want to know if there’s a
statistically significant difference between
female & male mean SAT scores among
college students.
 Restricting the population to FIU, we
could randomly sample female & male
SAT scores at FIU: is the difference
between the two groups statistically
significant?
 This variety of t-test, then, compares
the mean value on a quantitative
variable between two groups (e.g.,
female vs. male) within the same
random sample.
 Is the difference between the two
groups relative to the standard error
of the difference statistically
significant?
The t-test for comparing the
means of two groups requires:
 a random sample of independent
observations
 a quantitative response variable
 a binary categorical explanatory
variable (e.g., females vs. males)
 a graphic check for pronounced
skewness & outliers, & a comparison of
the distributions (boxplot)
 the standard deviation of neither
group can be more than twice that of
the other group (or else an adjusted
version of the t-test must be used: e.g.,
Stata’s ‘unequal’ option)
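 A hedged sketch of that check, using the reading-score example that appears below (variable read & binary variable female from hsb2.dta):
. sdtest read, by(female)          // formal comparison of the two groups’ sds
. ttest read, by(female) unequal   // adjusted t-test if the sds are too far apart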
 What if the sample distribution is too
skewed &/or contains pronounced outliers?
 Consider transforming the data &/or
eliminating the outliers.
 Alternatively, use a non-parametric test
such as the median test or rank sum test.
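 A minimal sketch of those two alternatives for the same two-group setup (again the read & female variables):
. ranksum read, by(female)    // Wilcoxon rank-sum (Mann-Whitney) test
. median read, by(female)     // non-parametric median test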
 Non-parametric tests: premised on
ranking of sampled observations.
 Parametric tests: premised on Central
Limit Theorem—approximately normal
sampling distribution of sample
means.
 The t-test for comparing the means of two
‘independent’ groups:
Ho: μ1 = μ2
(i.e. there’s no difference)
Ha: μ1 ≠ μ2
(or one-sided: > or <)
 Is the difference large enough relative to the
combined standard error of the two groups to
be statistically significant?
 Here’s an example concerning
female vs. male standardized
reading test scores among a sample
of California high school students
(hsb2.dta).
 These data are in memory, so the
Stata command is ttest:
Ho: mean(male) = mean(female)
Ha: mean(male) ≠ mean(female)
(see the two-tailed test below)
. ttest read, by(female)
Two-sample t test with equal variances

   Group |  Obs   Mean       Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+----------------------------------------------------------------
    male |   91   52.82418   1.101403    10.50671    50.63605    55.0123
  female |  109   51.73394   .9633659    10.05783    49.82439    53.6435
---------+----------------------------------------------------------------
combined |  200   52.23      .7249921    10.25294    50.80035    53.65965
---------+----------------------------------------------------------------
    diff |        1.090231   1.457507                -1.783997   3.964459

Degrees of freedom: 198

Ho: mean(male) - mean(female) = diff = 0

Ha: diff < 0              Ha: diff ~= 0             Ha: diff > 0
   t = 0.7480                t = 0.7480                t = 0.7480
P < t = 0.7723            P > |t| = 0.4553          P > t = 0.2277
 The test conclusion?
A Brief Review
 What’s the difference between the
z-distribution & the t-distribution?
 At what sample size do they
become nearly identical? At what
sample size do they become
effectively identical?
In conducting the various kinds of
t-test:
 Ask if the data are drawn from a
random sample of independent
observations.
 Check the sample size.
 Graphically check the data for
pronounced skewness (if sample size < 40)
& outliers, & if there are two
distributions compare their standard
deviations.
 Conclusions are always
uncertain: sampling & non-sampling
sources of error.
 If N<40 & the data do not have
pronounced skewness &/or outliers, use the
t-procedures to make inferences about a
one-sample mean or two-sample means.
 If at this sample size there are problems
with skewness &/or outliers, consider
transforming the data &/or eliminating
outliers.
 Alternatively, use non-parametric (i.e.
distribution-free) procedures.
 If N>=40, the central limit theorem & the law of
large numbers kick in, but check for outliers.
Summary
 What’s a standard error?
 What’s the difference between
standard error of the mean &
standard deviation?
 Why is the difference important?
 What kinds of confidence intervals &
what tests do we use to make
inferences about one-sample & two-sample
means of random variables?
 What are the premises of these
tests, & how do we assess the
premises in any given case?