SPECIFY, HYPOTHESIZE, ASSUME, OBTAIN, TEST, and PROVE
Thomas R. Knapp
© 2013
I am constantly amazed that many researchers don't understand the differences
among the six verbs "specify", "hypothesize", "assume", "obtain", "test", and
"prove".
An example
Consider the following example: You're interested in the relationship between
height and weight, and you would like to carry out a study of that relationship for
a simple random sample of some large population.
What do you SPECIFY? If you plan to use traditional statistical inference
(significance testing) you need to specify the magnitudes of tolerable probabilities
of Type I (alpha) and Type II (beta) errors before you see the data. (For the latter
you can specify the power you want rather than the tolerable probability of a
Type II error, where power = 1 - beta.) If you plan to use interval estimation you
need to specify how confident you want to be with the finding you'll get and the
tolerable margin of error (half-width of the confidence interval), also before you
see the data.
What do you HYPOTHESIZE? If you plan to use significance testing you need to
hypothesize both a null value (or set of values) for a particular parameter and an
alternative value (or set of values) for that parameter. If you plan to use interval
estimation you need not, nay cannot, hypothesize any values beforehand.
What do you ASSUME? For significance testing you need to assume the
independence of the observations and random sampling (which you have), and
you might need to assume a normal distribution of the observations in the
population from which the sample is to be drawn. You might also need to
assume homogeneity of variance, homogeneity of regression, and/or other
things. For interval estimation the assumptions are the same. For Bayesian
inference you need to consult your local friendly statistician.
What do you OBTAIN? For both significance testing and interval estimation the
first thing you obtain is the appropriate sample size necessary for your
specifications, before you embark upon the study. Upon completion of the study
you obtain the relevant descriptive statistics, p-values, actual confidence
intervals, and the like.
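As a concrete illustration of the first OBTAIN, here is a minimal Python sketch, assuming a two-sided test of a zero correlation against a non-zero alternative r1 and the common Fisher z approximation for the required sample size; the function name and the example alternative of .30 are hypothetical choices, not part of the original.

```python
# A sketch of obtaining n from the specifications (alpha, power) and the
# hypothesized alternative correlation r1, via the Fisher z approximation:
# n = ((z_alpha + z_beta) / C(r1))^2 + 3, where C is the Fisher z transform.
import math
from scipy.stats import norm

def sample_size_for_correlation(r1, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)        # two-sided critical value
    z_beta = norm.ppf(power)                 # quantile for power = 1 - beta
    c = 0.5 * math.log((1 + r1) / (1 - r1))  # Fisher z transform of r1
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)

print(sample_size_for_correlation(0.30))  # roughly 85 for r1 = .30
```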
What do you TEST? For significance testing you test the null hypothesis against
the alternative hypothesis. For interval estimation there is nothing to test per se.
What do you PROVE? Nothing.
So what's the problem?
1. Some people say you calculate (obtain) power for a study. No, you specify
the power you want (directly; or indirectly by specifying the tolerable probability of
a Type II error, which is 1 minus power). There is such a thing as post hoc
power in which power is calculated after the fact for the effect size actually
obtained, but it is a worthless concept. See below for more about post hoc
power.
2. Some people say you specify the sample size. No, unless you're stuck with a
particular sample size. As indicated above, you determine (calculate, obtain) the
appropriate sample size.
3. Some people say you assume the null hypothesis to be true until, or unless,
rejected. No, you hypothesize it to be true (although you usually hope that it
isn't!), along with an alternative hypothesis, which you usually hope to be true.
4. Some people say you hypothesize that the population distribution is normal.
No, you assume that (sometimes).
5. Some people say you prove the null hypothesis to be true if you don't reject it.
No, you calculate the probability of getting the statistic you got, or anything more
discrepant from the null-hypothesized parameter, if the null hypothesis is true. If
that conditional probability is greater than your pre-specified alpha level, you
cannot reject the null-hypothesized parameter. But that doesn't mean you've
proven it to be true.
Some of those same people say you prove the null hypothesis to be false if you
reject it. No; if the conditional probability is less than your pre-specified alpha,
you reject the null-hypothesized parameter. But that doesn't mean you've proven
it to be false.
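As a concrete illustration of point 5, here is a small simulation sketch in Python (the population correlation of .15, the sample size of 30, and all names are hypothetical, chosen only for illustration): the null hypothesis of zero correlation is false, yet the test usually fails to reject it, which obviously proves nothing about the null being true.

```python
# A sketch of point 5: not rejecting H0: rho = 0 does not prove it true.
# The true population correlation here is .15, yet with n = 30 the test
# fails to reject most of the time.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
rho, n, alpha, reps = 0.15, 30, 0.05, 2000
cov = [[1, rho], [rho, 1]]          # bivariate normal with correlation rho

fail_to_reject = 0
for _ in range(reps):
    x, y = rng.multivariate_normal([0, 0], cov, size=n).T
    _, p = pearsonr(x, y)
    fail_to_reject += p > alpha

print(f"Failed to reject in {fail_to_reject / reps:.0%} of samples")  # ~85-90%
```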
Back to the example
What should you do?
a. If you're going to use significance testing, you should first SPECIFY alpha and
beta. The conventional specifications are .05 for alpha and .20 for beta (power of
.80), but you should preferably base your choices on the consequences of
making Type I and Type II errors. For example, suppose your null hypothesis will
be that the population correlation is equal to zero and you subsequently reject
that hypothesis but it's true. There should be no serious consequence of being
wrong, other than your running around thinking that there is a non-zero
relationship between height and weight when there isn't. In that case you should
feel free to specify a value for alpha that is more liberal than the traditional .05
(perhaps .10, double that probability). If the null hypothesis of zero is pitted
against an alternative hypothesis of, say, .90 (a strong relationship) and you
subsequently do not reject the null but it's false, you will have missed a golden
opportunity to be able to accurately predict weight from height. Therefore, you
should feel free to decrease beta to .05 (increase power to .95) or even less.
b. If you're going to use interval estimation, you should first SPECIFY the
maximum margin of error you will be able to tolerate when you make your
inference from sample to population, along with the associated specification of
how confident you want to be in making that inference. The former might be
something like .10 (you'd like to come that close to the population correlation).
The latter is conventionally taken to be 95% but, like alpha and beta in
significance testing, is always "researcher's choice".
c. Once those specifications have been made, the next step is to use one of the
various formulas and tables that are available for determining (OBTAINING) the
sample size that will satisfy the specifications. If you've intellectualized things
properly, it will be a "Goldilocks sample" (not too large, not too small, but just
right).
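For the interval-estimation route, here is a companion sketch under the same assumptions: it searches for the smallest n whose 95% Fisher z confidence interval, back-transformed to the correlation scale, has a half-width no larger than the specified margin of error. The planning value r_guess is hypothetical; r_guess = 0 is conservative because the interval is widest there.

```python
# A sketch of obtaining n from the interval-estimation specifications
# (margin of error, confidence) for a correlation, via the Fisher z interval.
import math
from scipy.stats import norm

def n_for_margin(margin=0.10, confidence=0.95, r_guess=0.0):
    z_crit = norm.ppf(1 - (1 - confidence) / 2)
    zr = math.atanh(r_guess)              # Fisher z of the planning value
    for n in range(4, 100_000):
        h = z_crit / math.sqrt(n - 3)     # half-width on the z scale
        lo, hi = math.tanh(zr - h), math.tanh(zr + h)
        if (hi - lo) / 2 <= margin:       # back-transformed half-width
            return n
    return None

print(n_for_margin())  # about 385 for a +/- .10 margin at 95% confidence
```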
d. For significance testing you are now ready to HYPOTHESIZE: one value (or
set of values) for a parameter for the null hypothesis, and a competing value (or
set of values) for the alternative hypothesis. For a study of the relationship
between two variables (in your case, height and weight), the null-hypothesized
parameter is almost always zero, i.e., the conservative claim that there is no
relationship. Things are much trickier for the alternative hypothesis. You might
want to hypothesize a particular value other than zero, e.g., .60, if you believe
that the relationship is positive and reasonably large. (You probably would not
want to hypothesize something like .98 because you can't imagine the
relationship to be that strong.) Or you might not want to stick your neck out that
far, so you might merely hypothesize that the correlation in the population is
positive. (That is the conventional alternative hypothesis for a relationship study,
whether or not it is actually stated.) There are other possibilities for the
alternative hypothesis, but those should do for the present.
e. For interval estimation you get off easy, because there are no values to
hypothesize. You have made your specifications regarding tolerable margin of
error and degree of confidence, but you are uninterested, unwilling, or unable to
speculate about what the direction or the magnitude of the relationship might be.
f. No matter whether you choose to use significance testing or interval
estimation, if the Pearson product-moment correlation coefficient is to be the
statistic of principal interest you will need to ASSUME that in the population there
is a bivariate normal distribution. If you prefer to rank-order the heights and
weights (and lose some information) and use Spearman's rank correlation, that
assumption is not necessary.
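Here is a small sketch of the choice item f describes, using simulated stand-ins for heights and weights (the means, spreads, and slope are hypothetical): Pearson's r, whose usual inference leans on the bivariate-normality assumption, next to Spearman's rank correlation, which does not require it.

```python
# A sketch contrasting Pearson's r (bivariate-normality assumption for the
# usual inference) with Spearman's rank correlation (no such assumption,
# at the cost of some information). The data are simulated stand-ins.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(1)
height = rng.normal(170, 10, size=100)              # cm, hypothetical
weight = 0.9 * height + rng.normal(0, 8, size=100)  # kg, hypothetical

r, p_r = pearsonr(height, weight)
rho_s, p_s = spearmanr(height, weight)
print(f"Pearson r = {r:.2f} (p = {p_r:.3g})")
print(f"Spearman rho = {rho_s:.2f} (p = {p_s:.3g})")
```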
g. You're now ready to draw (OBTAIN) your sample, collect (OBTAIN) the actual
heights and weights, and calculate (OBTAIN) the sample correlation. If you've
chosen the significance testing approach you can TEST the null hypothesis of no
relationship against whatever alternative hypothesis you thought to be relevant,
and see whether the p-value corresponding to the sample correlation is less than
or greater than your pre-specified alpha. If it is less, the sample correlation is
statistically significant; if it is greater, the sample correlation is not. If you've
chosen the interval estimation approach, you can construct (OBTAIN) the
confidence interval around the sample correlation and make the inference that
you are X% confident that the interval "captures" the unknown population
correlation.
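Here is a sketch of step g with simulated stand-in data (all numbers are hypothetical). It shows the two routes side by side purely for illustration; as noted below, you should pick one or the other, not both.

```python
# A sketch of step g: TEST the null of zero correlation, or OBTAIN a 95%
# Fisher z confidence interval. The data are simulated stand-ins.
import math
import numpy as np
from scipy.stats import pearsonr, norm

rng = np.random.default_rng(2)
height = rng.normal(170, 10, size=85)
weight = 0.9 * height + rng.normal(0, 8, size=85)

# Significance-testing route.
r, p = pearsonr(height, weight)
alpha = 0.05
verdict = "reject" if p < alpha else "do not reject"
print(f"r = {r:.2f}, p = {p:.3g}; {verdict} H0 at alpha = {alpha}")

# Interval-estimation route: Fisher z interval, back-transformed to r.
n = len(height)
zr, h = math.atanh(r), norm.ppf(0.975) / math.sqrt(n - 3)
print(f"95% CI: ({math.tanh(zr - h):.2f}, {math.tanh(zr + h):.2f})")
```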
h. You will not have PROVEN anything, but if you've chosen the significance
testing route you will have made the correct inference or you will have made
either a Type I error (by rejecting a true null) or a Type II error (by not
rejecting a false null), but there's no way you could make both (you can't
both reject and not reject the null). Alas, you will
never know for sure whether you're right or not, but "the odds" will usually be in
your favor. Similarly, if you've chosen interval estimation your inference that the
parameter has been captured or has not been captured can be either right or
wrong and you won't know which. But once again "the odds" will be in your
favor. That should be comforting.
What you should not do
The first thing you should not do is use both significance testing AND interval
estimation. As you might already know, a confidence interval consists of all of
the values of a parameter that are "unrejectable" with a significance test, so
the two convey essentially the same inferential information. There is
nonetheless an unfortunate tendency these days to report the actual p-value,
e.g., .003, from a significance test ALONG WITH a confidence interval (usually
95%) around the obtained statistic.
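Here is a sketch of the duality just described, under the Fisher z procedure (the numbers are illustrative): a null value rho0 is rejected at alpha = .05 exactly when it falls outside the 95% confidence interval, which is why reporting both is redundant.

```python
# A sketch of test/interval duality for a correlation via Fisher z:
# rho0 is rejected at alpha exactly when it lies outside the CI.
import math
from scipy.stats import norm

r, n, alpha = 0.32, 85, 0.05                 # illustrative values
z_crit = norm.ppf(1 - alpha / 2)
zr, se = math.atanh(r), 1 / math.sqrt(n - 3)
lo, hi = math.tanh(zr - z_crit * se), math.tanh(zr + z_crit * se)

for rho0 in (0.0, 0.25, 0.55):
    z = (zr - math.atanh(rho0)) / se
    rejected = abs(z) > z_crit
    outside = not (lo <= rho0 <= hi)
    print(f"rho0 = {rho0}: rejected = {rejected}, outside CI = {outside}")
```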
The second thing you should not do is report the so-called post hoc (or
retrospective or observed) power, along with or (worse yet) instead of the a priori
(ordinary) power. Post hoc power adds no important information, but has
unfortunately been incorporated into some computer packages, e.g., SPSS's
Analysis of Variance routines. It is perfectly inversely related to the p-value,
as the sketch below illustrates.
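To see why post hoc power adds nothing, consider this sketch (the framing as a two-sided z test is an illustrative simplification): once the p-value is known, "observed" power is completely determined by it, falling as p rises and equaling about .5 when p equals alpha.

```python
# A sketch showing that post hoc ("observed") power is a fixed, strictly
# decreasing function of the p-value for a two-sided z test.
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)

print("p-value   observed power")
for p in (0.001, 0.01, 0.05, 0.20, 0.50):
    z_obs = norm.ppf(1 - p / 2)              # |z| implied by the p-value
    power = norm.sf(z_crit - z_obs) + norm.cdf(-z_crit - z_obs)
    print(f"{p:7.3f}   {power:14.3f}")
```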
Both things drive me up a wall. Please don't do either of them. Thank you.