Testing Hypotheses and The Standard Error
• The standard error, as an estimate of chance
fluctuation, is the measure against which the
outcomes of experiments are checked. Is an observed
difference a “real” difference, or merely a
consequence of the many relatively small
differences that could have arisen by chance?
• To answer this question, the standard error of the
differences between means is calculated and the
obtained difference is compared to this standard error.
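The comparison just described can be sketched numerically. This is a minimal illustration, not from the text; the group means, SDs, and sample sizes are hypothetical, and the standard error of the difference is the usual independent-samples formula.

```python
import math

def se_of_difference(sd1, n1, sd2, n2):
    # Standard error of the difference between two independent means:
    # sqrt(SD1^2/n1 + SD2^2/n2)
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

def critical_ratio(mean1, mean2, sd1, n1, sd2, n2):
    # The obtained difference compared to its standard error
    return (mean1 - mean2) / se_of_difference(sd1, n1, sd2, n2)

# Hypothetical groups: means 112 vs 108, SD = 10, n = 100 each
se = se_of_difference(10, 100, 10, 100)             # sqrt(2), about 1.41
ratio = critical_ratio(112, 108, 10, 100, 10, 100)  # about 2.83
```

A ratio well above the chosen critical value suggests the difference is unlikely to be mere chance fluctuation.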
Examples: Differences Between Means and
Absolute and Relative Differences
• Two problems arise: one of the absolute versus
relative size of differences, and one of
practical or “real” significance versus
statistical significance.
• The difference of 0.164 is probably trivial
even though statistically significant. The
0.164 was derived from a 7-point scale of
smoking frequency, and is thus really
small.
• One should ordinarily not be enthusiastic
about mean differences like 0.2, 0.15, 0.05,
and so on, but one has to be intelligent
about it.
Correlation Coefficients
• How low is low? At what point is a correlation
coefficient too low to warrant treating it seriously?
• The problem is complex. In basic research, low
correlations (provided, of course, they are statistically
significant) may enrich theory and research. It is in
applied research, where prediction is important, that
value judgments about low correlations and the trivial
amounts of variance they share have grown up. In basic
research, however, the picture is more complicated.
One conclusion is fairly sure: correlation coefficients,
like other statistics, must be tested for statistical
significance.
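As a small illustration of testing a correlation for significance, the usual t test of H0: ρ = 0 can be sketched as below. The values r = 0.15 and the sample sizes are hypothetical, not from the text.

```python
import math

def correlation_t(r, n):
    # t statistic for H0: rho = 0, with n - 2 degrees of freedom
    return r * math.sqrt((n - 2) / (1 - r**2))

# The same low correlation (sharing only ~2% of variance) is
# non-significant with a small n but highly significant with a large n:
t_small_n = correlation_t(0.15, 30)    # about 0.80
t_large_n = correlation_t(0.15, 1000)  # about 4.79
```

This is the point of the slide: with a large enough sample, even a correlation that shares a trivial amount of variance is statistically significant, so significance alone does not settle the question of practical importance.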
Hypothesis Testing: Substantive
and Null Hypotheses
• The main research purpose of inferential
statistics is to test research hypotheses by
testing statistical hypotheses.
• Broadly speaking, scientists use two types
of hypotheses: substantive and statistical.
A substantive hypothesis is the usual type
of hypothesis discussed in Chapter 2,
where a conjectural statement of the
relation between two or more variables is
expressed.
• Statistical hypotheses must be tested
against something, however. It is not
possible to simply test a stand-alone
statistical hypothesis. That is, we do not
directly test the statistical proposition μA ≠ μB
in and of itself. We test it against an
alternative proposition. Naturally, there can
be several alternatives to μA ≠ μB. The
alternative usually selected is the null
hypothesis, which was invented by Sir
Ronald Fisher.
• The null hypothesis is a statistical
proposition that states, essentially, that
there is no relation between the variables.
The null hypothesis says, “You’re wrong,
there is no relation; disprove me if you
can.”
• Researchers sometimes unwittingly use
null hypotheses as substantive hypotheses.
The trouble with this is that it places the
investigator in a difficult position logically,
because it is extremely difficult to
demonstrate the empirical “validity” of a null
hypothesis. After all, if the hypothesis μA = μB
is supported, it could well be one of the
many chance results that are possible,
rather than a meaningful nondifference!
• Fisher (1950) says, “Every experiment
may be said to exist only in order to give
the facts a chance of disproving the null
hypothesis.”
• Although as researchers we want to
demonstrate that H1: μA ≠ μB is true, it cannot
easily be done in a direct way. If we wanted to
test this hypothesis directly, we would need
to test an infinite number of values. That is,
we would need to test each and every
situation where μA − μB is not equal to zero.
• In hypothesis testing, the procedure dictates
that we test the null hypothesis. The null
hypothesis is written as H0: μA − μB = 0. Note
that it points directly to a value, namely zero.
What we need is to gather enough empirical
data to show that the null hypothesis is not
tenable.
• In statistical terms, we would “reject H0.”
Rejecting H0 would indicate to us that we
have a significant result. Rejecting H0
leads us toward supporting H1.
Supporting H1, in turn, leads to support for
our substantive hypothesis.
• If there are not enough empirical data to refute the
null hypothesis, we would not be able to reject it.
Statistically, we would say we “failed to
reject H0” or “do not reject H0”; one can never
“accept” H0. To “accept” H0 would require
repeating the study an infinite number of times
and getting exactly zero each time. On the other
hand, we can “fail to reject” H0 because the results
are not sufficiently different from what one would
predict (under the assumption that H0 is true) to
warrant the conclusion that it is false.
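The reject / fail-to-reject decision can be sketched in code. This is an illustrative large-sample z-ratio version with hypothetical means, SDs, and ns; the critical value 1.96 corresponds to a two-tailed test at the 0.05 level.

```python
import math

def two_sample_z(mean_a, mean_b, sd_a, n_a, sd_b, n_b):
    # z ratio for H0: muA - muB = 0 (large-sample approximation)
    se_diff = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / se_diff

def decide(z, critical=1.96):
    # Two-tailed decision at the 0.05 level: note the wording is
    # "fail to reject", never "accept"
    return "reject H0" if abs(z) > critical else "fail to reject H0"

z = two_sample_z(105, 100, 15, 200, 15, 200)  # 5 / 1.5, about 3.33
decision = decide(z)                          # "reject H0"
```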
• The status of H0 is akin to that of the defendant in a trial
who is deemed to be “innocent” until proved “guilty.”
If the trial results in a verdict of “not guilty”, this
does not mean the defendant is “innocent.” It
merely means that guilt could not be demonstrated
beyond a reasonable doubt.
• When the investigator fails to reject H0 it does not
mean H0 is true, merely that H0 cannot be shown
to be false beyond a “reasonable” doubt.
The General Nature of a Standard
Error
• If there were no random error, there would
be no need for statistical tests of
significance. Any difference at all would be
a “real” difference. But alas, such is never
the case.
• There are always chance errors (and
biased errors, too), and standard errors are
measures of this error. The standard error
is the standard deviation of the sampling
distribution of any given measure.
• Suppose we draw a random sample of 100 children
from eighth-grade classes in such-and-such a
school system, and we find the mean=110 and
SD=10. How accurate is this mean?
• What we do is set up a hypothetical distribution
of sample means, all calculated from samples of
100 pupils, each drawn from the parent population
of eighth-grade pupils. If we knew the mean of this
population of means, everything would be simple.
In fact, we cannot obtain it. The best we can do is
to estimate it with our sample value, the sample
mean. We simply say, in this case, “Let the sample
mean equal the mean of the population of means”
and hope we are right. Then we must test our
equation. We do this with the standard error.
• The formula for the standard error of the mean:

  SEM = SD / √n

• This is also called the sampling error.
• Just as the standard deviation is a measure of
the dispersion of the original scores, the
standard error of the mean is a measure of the
dispersion of the distribution of sample means.
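A quick check of the formula with the example from the text (SD = 10, n = 100):

```python
import math

def standard_error_of_mean(sd, n):
    # SEM = SD / sqrt(n): the dispersion of the distribution of sample means
    return sd / math.sqrt(n)

sem = standard_error_of_mean(10, 100)  # 10 / 10 = 1.0
```

So the sample mean of 110 is accurate to within about one point of standard error.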
A Monte Carlo Demonstration
• Table 12.1, 12.2
• The procedure
• Generalizations
• The Central Limit Theorem
• The Standard Error of the Differences Between Means

  Statistic / Standard Error of the Statistic
The Central Limit Theorem
• If samples are drawn from a population at
random, the means of the samples will tend
to be normally distributed. The larger the Ns,
the more this is so. And the shape and kind
of distribution of the original population
makes no difference.
• Why is it important to show that distributions
of means approximate normality? We work
with means a great deal in data analysis, and
if they are normally distributed then one can
use the known properties of the normal curve
to interpret obtained research data.
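The Monte Carlo demonstration can be reproduced in miniature. The parent population below is a deliberately non-normal (uniform) one assumed for illustration; by the central limit theorem, the repeated sample means should still cluster roughly normally around the population mean (4.5), with spread near SD/√n.

```python
import random
import statistics

def sample_means(population, n, trials, seed=0):
    # Draw `trials` random samples of size n (with replacement)
    # and return the mean of each sample.
    rng = random.Random(seed)
    return [statistics.mean(rng.choices(population, k=n))
            for _ in range(trials)]

# A deliberately non-normal (uniform) parent population: digits 0..9
population = list(range(10))
means = sample_means(population, n=100, trials=2000)

center = statistics.mean(means)   # near the population mean, 4.5
spread = statistics.stdev(means)  # near SD / sqrt(n) = 2.87 / 10, about 0.29
```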
Statistical Inference
• To infer is to derive a conclusion from
premises or from evidence. To infer
statistically is to derive probabilistic
conclusions from probabilistic premises.
We conclude probabilistically; that is, at a
specified level of significance.
• Another form of inference, discussed at
length in the chapter on sampling, is that
from a sample to a population.
Statistical Inference
• One of the gravest dangers of research is
the inferential leap from sample data to
population fact.
• It can be said, in sum, that statistics enable
scientists to test substantive hypotheses
indirectly by enabling them to test statistical
hypotheses directly. They test the “truth” of
substantive hypotheses by subjecting null
hypotheses to statistical tests on the bases
of probabilistic reasoning.
Testing Hypotheses and the Two Types of
Errors
• Figure 12.3, type one and type two errors
• H0 is rejected with the awareness that an error
might have been made, but the chances of that
happening are less than 0.05. The conclusion of
rejecting H0 on an average is correct more than
95% of the time.
• As a rule, in selecting a significance level one must
decide which type of error is more important to
avoid or minimize. To be certain that an event of
some importance has been identified before
reporting it, use a fairly stringent criterion of
significance, such as 0.01. On the other hand, if
there is greater concern not to miss something, use
a less stringent level, such as 0.05.
• Table 12.3.
• The size of the sample is related to both types of
errors. With a fixed value of type one error and a
fixed sample size n, the value of type two error is
predetermined. If type two error is too large, it
can be reduced by either raising the level of type
one error for fixed n, or by increasing n for a
fixed level of type one error. Although type two
error is seldom determined in an experiment,
researchers can be assured that it is reasonably
small by collecting a large sample.
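The relation between sample size and type two error can be sketched under the usual normal approximation for a two-group comparison. The true difference d, the SD, and the group sizes below are hypothetical.

```python
import math

def normal_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def type_two_error(d, sd, n, z_alpha=1.96):
    # Probability of failing to reject H0 in a two-tailed z test when the
    # true difference between two group means is d (common SD, n per group)
    se = sd * math.sqrt(2 / n)  # standard error of the difference
    shift = d / se
    return normal_cdf(z_alpha - shift) - normal_cdf(-z_alpha - shift)

# Larger n shrinks the type two error for a fixed type one error:
beta_small = type_two_error(d=5, sd=15, n=50)   # about 0.62
beta_large = type_two_error(d=5, sd=15, n=200)  # about 0.08
```

This is the trade-off stated on the slide: with α fixed, the only way to drive β down is to increase n (or accept a larger α).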
The Five Steps of Hypothesis
Testing
• Using our substantive hypothesis we can
state it statistically. Even though we have
referred to it as our statistical hypothesis,
many statisticians refer to it as the
research, experimental, or alternative
hypothesis.
• Table 12.4.
Sample Size Determination
• A sample that is too large is a waste of resources. A
sample that is too small is also a wasted effort
since it will not be large enough to detect a
significant effect (difference).
• By increasing the sample size, the sampling
distribution becomes narrower and the standard
error becomes smaller. As a result, a large sample
increases the likelihood of detecting a difference.
However, too large of a sample will make a very
small difference statistically significant, but not
necessarily of practical significance.
• The formula to estimate sample size for
each group for a simple random sample is

  n = Z²σ² / d²

• If sampling is from a finite population of
size N, and the sampling is done without
replacement,

  n′ = n / (1 + n/N)
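A quick sketch of the two formulas, with hypothetical values for Z, σ, d, and N:

```python
import math

def sample_size(z, sigma, d):
    # n = Z^2 * sigma^2 / d^2, rounded up to a whole subject
    return math.ceil((z**2 * sigma**2) / d**2)

def finite_population_size(n, big_n):
    # n' = n / (1 + n/N) for sampling without replacement
    # from a finite population of size N
    return math.ceil(n / (1 + n / big_n))

n = sample_size(z=1.96, sigma=15, d=3)    # 97 per group
n_prime = finite_population_size(n, 500)  # reduced to 82
```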
• Researchers who want to protect
themselves against both type one error and
type two error can use the following
formula for each group:

  n = (Zα + Zβ)² σ² / d²
• The procedure described above is for a
one-tailed test. For a two-tailed test, only
Zα will change: it becomes Zα/2.
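The two-error formula can be sketched with commonly used (but here assumed) values: Zα = 1.645 for a one-tailed test at the 0.05 level, Zα/2 = 1.96 for the two-tailed version, and Zβ = 0.84 for power 0.80.

```python
import math

def sample_size_two_errors(z_alpha, z_beta, sigma, d):
    # n per group = (Z_alpha + Z_beta)^2 * sigma^2 / d^2, rounded up
    return math.ceil(((z_alpha + z_beta)**2 * sigma**2) / d**2)

# One-tailed test at the 0.05 level (Z_alpha = 1.645), power 0.80:
n_one_tailed = sample_size_two_errors(1.645, 0.84, sigma=15, d=5)  # 56
# Two-tailed test: only Z_alpha changes, to Z_alpha/2 = 1.96:
n_two_tailed = sample_size_two_errors(1.96, 0.84, sigma=15, d=5)   # 71
```

As the slide notes, switching from one-tailed to two-tailed changes only the Z for the type one error, which raises the required n.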