Abbreviated sample size tables for multiple regression, t
Sample size and MCA
Always use the largest samples possible, regardless of the research design or
statistical analyses to be applied. Large samples yield statistics that are closer in value
to true, though unknown, population values, a reflection of The Law of Large Numbers. Large
samples contain more members of the population, and therefore have potential to harbor
better approximations to the population values of quantities under study than do small
samples.
Sample size for MCA in particular can be considered from several different
perspectives. It is known that MCA results (the values of Beta and R) are more trustworthy
when n is 200 or more. It is said that "the Betas bounce", that is, are unstable from sample
to sample, when n is smaller than 200, and especially so when n is trivially small. When true
R is known (as it can be when a simulation is conducted or a population has been surveyed),
small samples will yield values of R that fluctuate farther from true R than do
samples of, say, 200 and larger.
The number of people (n) must exceed the number of independent variables (k) in the
analysis. As k approaches n, the value of R becomes artificially large, reaching 1.0 when k
= n. Plan to have no fewer than three people for every variable in the analysis, or
n ≥ k × 3. This number may not provide adequate statistical "power", however.
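The inflation of R as k approaches n can be seen in a small simulation. The sketch below is illustrative only: it regresses pure random noise on k random predictors, so the true R is zero, and reports the average sample R². The function names (`r_squared`, `mean_null_r_squared`), the sample size of 20, and the convention that k counts predictors only with an intercept fitted (so the fit becomes perfect at k = n - 1 rather than k = n) are assumptions of this sketch, not part of the original text.

```python
import random


def r_squared(y, predictors):
    """R^2 from a least-squares regression of y on the predictor columns.

    An intercept is handled by centering every variable. The predictors are
    orthogonalized one at a time (modified Gram-Schmidt), and the projection
    of y on each orthogonal direction is removed to leave the residual.
    """
    n = len(y)

    def center(v):
        m = sum(v) / n
        return [x - m for x in v]

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    resid = center(y)
    total_ss = dot(resid, resid)
    basis = []
    for col in predictors:
        v = center(col)
        for b in basis:
            c = dot(v, b)
            v = [x - c * bx for x, bx in zip(v, b)]
        norm_sq = dot(v, v)
        if norm_sq < 1e-12:          # column adds nothing new; skip it
            continue
        norm = norm_sq ** 0.5
        b = [x / norm for x in v]
        basis.append(b)
        c = dot(resid, b)
        resid = [r - c * bx for r, bx in zip(resid, b)]
    return 1.0 - dot(resid, resid) / total_ss


def mean_null_r_squared(n, k, reps=200, seed=1):
    """Average R^2 over `reps` samples in which y is unrelated to all k predictors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]
        xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
        total += r_squared(y, xs)
    return total / reps


if __name__ == "__main__":
    n = 20
    for k in (1, 5, 10, 15, 19):
        print(f"n={n}, k={k}: mean R^2 = {mean_null_r_squared(n, k):.3f}")
```

With 20 cases the average R² climbs steadily with k, roughly as k/(n - 1), even though no predictor is related to y, and with 19 predictors every sample gives an R² of essentially 1.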
Another consideration pertains to "power", or the so-called probability of
rejecting a false null hypothesis. The topics of Type-II Error, accepting a false null
hypothesis when R ≠ 0, and Type-I Error, rejecting a true null hypothesis when R = 0,¹
can be seen as controversial, because (a) lacking a population census the value of R
is unknowable, and (b) p-levels do not indicate probabilities that null hypotheses are
either true or false. Assigning mathematical probabilities to these two Errors may not be as
important as using a large sample to provide as high a likelihood as possible of a sample
that yields an R value close to that in the population. However, we present tables of
required n's for achieving statistical power to detect R of varying sizes.
Each of the tables below presents the minimum required n to achieve adequate (.70) to
excellent (.90) statistical power for varying numbers of variables (k). Select the table
that provides the information for the minimum value of R to detect; that is, decide upon the
smallest R that would be of practical interest to discover. The indicated sample size will
provide statistical power to detect R of the minimum size or larger. For example, to detect
R no smaller than .20, with power of .70, and 2 variables in the analysis, n must be no less
than 187.

¹ Actually, the null hypothesis value of R² is equal to (k-1)/(n-1), not zero, unless there
is only one predictor variable, i.e., a simple Pearson r between two variables.

Sample size table for R, k, and power
                  Power
 R     k     .70    .80    .90
----------------------------------------
.20    1     150    190    254
.20    2     187    234    306
.20    3     214    265    344
.20    4     237    291    374
.20    5     256    313    401
.20   10     332    400    503
.20   15     391    467    581

Sample size table for R, k, and power
                  Power
 R     k     .70    .80    .90
----------------------------------------
.30    1      64     81    108
.30    2      80    100    130
.30    3      92    114    147
.30    4     102    125    160
.30    5     111    135    172
.30   10     146    175    218
.30   15     174    206    254

Sample size table for R, k, and power
                  Power
 R     k     .70    .80    .90
----------------------------------------
.40    1      34     43     57
.40    2      43     53     69
.40    3      50     61     78
.40    4      55     67     85
.40    5      60     73     92
.40   10      81     96    118
.40   15      98    114    139

Sample size table for R, k, and power
                  Power
 R     k     .70    .80    .90
----------------------------------------
.50    1      20     25     33
.50    2      26     31     40
.50    3      30     36     46
.50    4      34     40     51
.50    5      37     44     55
.50   10      51     59     72
.50   15      62     72     86

Sample size table for R, k, and power
                  Power
 R     k     .70    .80    .90
----------------------------------------
.60    1      12     15     20
.60    2      16     20     25
.60    3      19     23     29
.60    4      22     26     32
.60    5      24     28     35
.60   10      34     39     47
.60   15      43     49     57

Sample size table for R, k, and power
                  Power
 R     k     .70    .80    .90
----------------------------------------
.70    1       8     10     12
.70    2      11     13     16
.70    3      13     15     18
.70    4      15     17     21
.70    5      16     19     23
.70   10      24     27     32
.70   15      32     35     40

Sample size table for R, k, and power
                  Power
 R     k     .70    .80    .90
----------------------------------------
.80    1       5      6      7
.80    2       7      8     10
.80    3       8     10     11
.80    4      10     11     13
.80    5      11     13     15
.80   10      18     20     22
.80   15      24     26     29

About statistical significance and statistical power
The descriptive statistic that we have been considering is the group mean. It is used
to characterize a group or sample overall as having a small to large amount of a
characteristic. Other descriptive statistics are frequencies or proportions, standard
deviations, and correlation coefficients. ANOVA and the t-test are vehicles for assessing
the statistical significance of differences between two or more means. Statistical
significance testing (SST) is a mathematical simulation that we use as an aid in making a
decision to report that observed mean differences are or are not representative of "real"
differences. Upon obtaining a set of group means, the researcher will observe either
manifestly trivial differences or those that she "feels" and believes are substantial. SST
offers the researcher one vehicle for modifying or solidifying her beliefs about observed
mean differences.
The enterprise of statistics is firstly about organizing and summarizing data so that
states, trends, and relationships can be revealed. The second concern of statistics is
testing mathematical hypotheses about what we see in the data--doing SST--so that we can
form beliefs that we report to the public; beliefs that what we see is fluky, or beliefs
that what we see is "real". SST, like all other simulations, yields information that in this
case is used by the researcher in deciding to report either that she has found "something",
or that "nothing" has been found. SST does not determine or define this decision.
SST asserts that the observed differences are due to sampling error, while the true
mean difference is precisely zero. A mathematical model is used as a hypothetical population
in which the true mean difference is zero, though from which a virtual infinitude of
nonzero differences can and will be sampled if the sample size is less than extremely large
(infinite!). When using SST, the researcher temporarily asserts that a no-difference
universe is the origin of her set of observed differences; that her differences are fluky,
random noise, insubstantial. SST reveals the probability of sampling a set of differences of
the size one has observed, or larger, when, not if, the true difference is zero; or when the
null hypothesis is true, as it is said.
When an observed set of differences has a small probability of being sampled when it
has been in fact sampled from a truly null population, the researcher concludes that such a
population is a poor model for the origin of the sampled differences. She rejects the null
hypothesis. When the observed set of differences has a relatively large probability of being
sampled when it has been sampled in fact from the null population, the researcher will
accept the null model as a tenable origin of the sampled differences; she accepts the null
hypothesis. SST deals with probabilities for samples from the null population, not with
probabilities that the null population was sampled!
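The idea of sampling from a null population can be made concrete with a short Monte Carlo sketch. It is an illustration under stated assumptions rather than the procedure behind any particular test: both groups are drawn from the same normal population with a known standard deviation, and the function name `null_probability`, the group size of 20, and the observed differences tried are all hypothetical.

```python
import random


def null_probability(observed_diff, n_per_group, sd=1.0, reps=5000, seed=0):
    """Estimate how often a null population yields a mean difference at least
    as large (in absolute value) as the one observed.

    Both groups are drawn from the SAME normal population, so the true
    difference is exactly zero in every replication.
    """
    rng = random.Random(seed)
    target = abs(observed_diff)
    hits = 0
    for _ in range(reps):
        group_a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        diff = sum(group_a) / n_per_group - sum(group_b) / n_per_group
        if abs(diff) >= target:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for d in (0.2, 0.5, 1.0):
        print(f"observed difference {d}: p is about {null_probability(d, 20):.3f}")
```

The value returned is a probability about samples from the null population. It says nothing about the probability that the null population was in fact the one sampled.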
SST is never an attempt to prove any hypothesis whatsoever. SST is an attempt to
obtain information that can assist in making a decision about the origin of one's sample
findings. Are they fluky or are they not? The researcher must balance and consider the
quality of the research design, the size of a given difference, and the probability of
sampling such differences even though (when, not if) the true difference is zero, in finally
deciding to report "a significant difference" or not.
The coefficient of statistical significance is the probability or p-level. The most
commonly used value is p ≤ .05; decide to declare a statistically significant--that is,
real--difference if the probability of sampling one's difference(s) when the universe is
null is no greater than five percent. Large samples will yield "significant" findings more
often than small samples. This is so because sample size appears in the mathematical
equations used in calculating F- and t-ratios in a manner that tends to make these ratios
large when n is large, and make these ratios small when n is small. The larger the F- or
t-ratio, the more likely it will have a small associated value of p; perhaps as small as .05
or .01.
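The mechanical effect of n on the t-ratio can be shown with a few lines of arithmetic. This is a minimal sketch, assuming two equal-sized groups with a common standard deviation and using the normal curve as a stand-in for the t distribution when converting the ratio to a p-level. The function names and the mean difference of 0.3 are hypothetical choices for illustration.

```python
from statistics import NormalDist


def t_ratio(mean_diff, sd, n_per_group):
    """t-ratio for two equal-sized groups that share a common standard deviation."""
    standard_error = sd * (2.0 / n_per_group) ** 0.5
    return mean_diff / standard_error


def approx_two_tailed_p(t):
    """Two-tailed p from the normal curve, standing in for the t distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


if __name__ == "__main__":
    # The same observed mean difference, evaluated at increasing sample sizes.
    for n in (8, 32, 128, 512):
        t = t_ratio(mean_diff=0.3, sd=1.0, n_per_group=n)
        print(f"n per group = {n:3d}: t = {t:.2f}, approximate p = {approx_two_tailed_p(t):.4f}")
```

The observed difference never changes in this loop; only n does. The ratio grows with the square root of n, and the associated p shrinks accordingly.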
By contrast to the mechanical impact of large n on the value of p, there is a theory
that large samples tend to give studies greater statistical power than do small samples.
This is the theory that if the origin of one's findings was not a null universe, but instead
was a universe in which the true difference is not zero, then a large sample provides a
better chance of sampling sufficient data points from the region of that universe, namely
its center, in which the population means actually differ. This is an expression of The Law
of Large Numbers, which states that the larger the sample, the more closely the sample
statistics will resemble the population statistics. Researchers wish to maximize this
"power" by using a sample of a certain minimum size or larger. "Power" is said to be the
probability of rejecting a false null hypothesis, whereas α (alpha) is said to be the
probability of rejecting a true null hypothesis.
The arithmetic of SST can guarantee p ≤ .05 with a certain value for n, when a
specific minimum "effect size" is being sought. This is not to say that a certain sample
size will cause a desired minimum effect size to emerge from the data. If the data do
contain this minimum set of differences (or larger), then the p-level will be .05 or smaller
if and only if the sample size is no less than a predetermined number.
Below are abbreviated sample size tables for t- and F- tests.
Find the minimum sample size per group that will provide a certain level of power for
detecting an effect of a certain minimum size or larger by locating n at the intersection of
a given power row and effect size column.
Sample size for t-tests according to desired effect size and power
               Effect Size
Power    small   medium   large
.70       233      38       16
.80       313      50       20
.90       425      68       28

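For readers who want to see where per-group sample sizes of this kind come from, the following sketch uses the common normal-curve approximation n = 2((z_alpha + z_power) / d)². Several things here are assumptions not stated in the text: that "small", "medium", and "large" mean standardized mean differences of .2, .5, and .8, that the test is one-tailed with alpha = .05, and the function name `n_per_group` itself. The approximation is not claimed to be the method used to build the table.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, power, alpha=0.05):
    """Approximate cases per group for a one-tailed, two-group comparison.

    Uses the normal-curve formula n = 2 * ((z_alpha + z_power) / d) ** 2,
    rounded up. A t-based calculation would give slightly different values.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha)
    z_power = z.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Assumed effect sizes: these labels are not defined in the text.
    sizes = {"small": 0.2, "medium": 0.5, "large": 0.8}
    for power in (0.70, 0.80, 0.90):
        row = ", ".join(f"{name}: {n_per_group(d, power)}" for name, d in sizes.items())
        print(f"power {power:.2f} -> {row}")
```

Under these assumptions the results fall close to the tabled values (for example, 50 per group for a medium effect at power .80), though not every entry agrees exactly, as is to be expected of an approximation.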
Sample size for F-tests relative to numerator Degrees of Freedom,² effect size, and power

DF = 2
               Effect Size
Power    small   medium   large
.70       258      42       17
.80       322      52       21
.90       421      68       27

DF = 3
               Effect Size
Power    small   medium   large
.70       221      36       15
.80       274      45       18
.90       354      58       23

DF = 4
               Effect Size
Power    small   medium   large
.70       195      32       13
.80       240      39       16
.90       309      50       20

² DF are generally equal to the number of means being compared minus one. DF = 2 implies
that three means are being compared.

DF = 5
               Effect Size
Power    small   medium   large
.70       175      29       12
.80       215      35       14
.90       275      45       18

DF = 10
               Effect Size
Power    small   medium   large
.70       123      20        8
.80       148      24       10
.90       187      31       13

DF = 15
               Effect Size
Power    small   medium   large
.70        98      16        7
.80       118      20        8
.90       148      24       10