Download as a PDF

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
invited review
Fundamental concepts in statistics:
elucidation and illustration
DOUGLAS CURRAN-EVERETT, SUE TAYLOR, AND KAREN KAFADAR
Departments of Pediatrics and of Preventive Medicine and Biometrics, School of Medicine,
University of Colorado Health Sciences Center, Denver, 80262; and Department of Mathematics,
University of Colorado at Denver, Denver, Colorado 80217-3364
confidence interval; estimation; tolerance interval; uncertainty; variability
There are very few things which we know, which are not
capable of being reduc’d to a Mathematical Reasoning,
. . . and where a Mathematical Reasoning can be had,
it’s as great folly to make use of any other, as to grope for
a thing in the dark when you have a Candle standing
by you.
John Arbuthnot (1692)
STATISTICS IS ONE KIND of mathematical reasoning. Its
concepts and principles are ubiquitous in science: as
researchers, we use them to design experiments, analyze data, report results, and interpret the published
findings of others. Indeed, it is from this foundation of
statistical concepts and principles that scientific knowledge is accumulated. If we fail to understand fully these
fundamental statistical concepts and principles—if our
statistical reasoning is faulty—then we are more likely
to reach wrong scientific conclusions. Wrong conclusions based on faulty reasoning is shoddy science; it is
also unethical (1, 21, 30).
Regrettably, faulty reasoning in statistics rears its
head in the practice of science: for 60 years, statisticians have documented statistical errors in the scientific literature (3, 4, 17, 33, 50). In part, these errors
exist because many introductory textbooks of statistics
paradoxically hinder literacy in statistics: they empha-
http://www.jap.org
size methods rather than concepts, they contain glaring
errors, or they perpetuate misconceptions (4, 11, 12).
In his editorial prelude to a series of statistical
papers, Yates (51) wrote that the papers were designed
to raise statistical consciousness and thereby reduce
statistical errors in journals published by the American
Physiological Society. Rather than reinforce concepts,
these papers reviewed methods: analysis of variance
(20), linear regression (37, 46), mathematical modeling
(22, 29, 40), risk assessment (36), and statistical packages (34). The proper use of any statistical technique,
however, requires an understanding of the fundamental statistical concepts behind the technique.
How well do physiologists understand fundamental
concepts in statistics? One way to answer this question
is to examine the empirical incidence of basic statistical
quantities such as standard deviations, standard errors, and confidence intervals. These quantities characterize different statistical features: standard deviations
characterize variability in the population, whereas
standard errors and confidence intervals characterize
uncertainty about the estimated values of population
parameters, e.g., means. Of the original articles published in 1996 by the American Physiological Society,
the overwhelming majority (69–93%, range) report
standard errors, apparently not as estimates of uncer-
8750-7587/98 $5.00 Copyright r 1998 the American Physiological Society
775
Downloaded from http://jap.physiology.org/ by 10.220.32.247 on June 12, 2017
Curran-Everett, Douglas, Sue Taylor, and Karen Kafadar. Fundamental concepts in statistics: elucidation and illustration. J. Appl.
Physiol. 85(3): 775–786, 1998.—Fundamental concepts in statistics form
the cornerstone of scientific inquiry. If we fail to understand fully these
fundamental concepts, then the scientific conclusions we reach are more
likely to be wrong. This is more than supposition: for 60 years, statisticians have warned that the scientific literature harbors misunderstandings about basic statistical concepts. Original articles published in 1996
by the American Physiological Society’s journals fared no better in their
handling of basic statistical concepts. In this review, we summarize the
two main scientific uses of statistics: hypothesis testing and estimation.
Most scientists use statistics solely for hypothesis testing; often, however,
estimation is more useful. We also illustrate the concepts of variability
and uncertainty, and we demonstrate the essential distinction between
statistical significance and scientific importance. An understanding of
concepts such as variability, uncertainty, and significance is necessary,
but it is not sufficient; we show also that the numerical results of
statistical analyses have limitations.
776
INVITED REVIEW
Table 1. Manuscripts for the American Physiological Society’s journals in 1996: use of statistics and statisticians
%Research Manuscripts That Report
Am. J. Physiol.
Cell Physiol.
Endocrinol. Metab.
Gastrointest. Liver Physiol.
Heart Circ. Physiol.
Lung Cell. Mol. Physiol.
Regulatory Integrative Comp. Physiol.
Renal Fluid Electrolyte Physiol.
J. Appl. Physiol.
J. Neurophysiol.
n
Standard
deviation
Standard
error
Confidence
interval
P
value*
Statistician†
43
28
26
60
25
41
27
62
58
21
18
8
17
20
17
15
24
36
88
86
92
87
84
88
93
79
69
0
0
0
0
0
0
0
0
2
7
4
4
10
4
15
7
6
5
0
4
12
3
4
12
4
10
7
n, No. of research manuscripts reviewed. In 1996, these journals published a total of 3,693 original articles. No. of articles reviewed
represents a 10% sample (selected by systematic random sampling, fixed start) of articles published by each journal. * Precise P value: for
example, P 5 0.02 (rather than P , 0.05) or P 5 0.13 (rather than P $ 0.05 or P 5 not significant). † We assessed collaboration with a
statistician using author affiliation and acknowledgments. We recognize that a statistician may be affiliated with another department, e.g.,
medicine. Using our criterion, however, few articles (0–12%, range) report formal collaboration of a physiologist with a statistician, a
partnership that typically reaps great rewards.
Glossary
a
Ave 5q6
µ
n
n
N (µ, s2 )
P
Pr 5A6
s
sy
s
s2
s2
SE 5q6
Var 5q6
Y
yi
y
Critical significance level
Average of the quantity q
Population mean
Degrees of freedom
Number of observations
Normal (Gaussian) distribution with mean
µ and variance s2
Achieved significance level
Probability of event A
Population standard deviation
Standard deviation of the sampling distribution of the sample mean
Sample standard deviation
Population variance
Sample variance
Standard error of the quantity q
Variance of the quantity q
Random variable Y
Sample observation i, where i 5 1, 2, . . . , n
Sample mean
SCIENTIFIC USES OF STATISTICS
In science, there are two main uses of statistics:
hypothesis testing and estimation. Most researchers
use statistics solely for hypothesis testing. In many
situations, statisticians play down hypothesis testing
and prefer estimation instead.
Hypothesis testing. To test a scientific hypothesis, a
researcher must formulate the hypothesis before any
data are collected, then design and execute an experiment that is relevant to it. Because the hypothesis is
most often one of no difference, the hypothesis is called,
by tradition, the null hypothesis.1 Using data from the
experiment, the researcher must next compute the
observed value T of a test statistic. Finally, the researcher must compare the observed value T with some
critical value T *, chosen from the distribution of the
test statistic that is based on the null hypothesis. If T is
more extreme than T *, then that is a surprising result
if the null hypothesis is true, and the researcher is
entitled, on statistical grounds, to become skeptical
about the scientific validity of the null hypothesis.
The statistical test of a null hypothesis is useful
because it assesses the strength of the evidence: it helps
guard against an unwarranted conclusion, or it helps
argue for a real experimental effect (19, 48). Nevertheless, a null hypothesis is often an artificial construct:
before any data are recorded, the investigator knows—at
least, suspects—that the null hypothesis is not exactly
true. Moreover, the only question a hypothesis test can
answer is a trivial one: is there anything other than
random variation here?2
Statisticians have emphasized repeatedly the limited
value of hypothesis testing (2, 4, 9, 18, 24, 28, 31, 38,
50). In fact, the P values that result from hypothesis
tests have been described as ‘‘absurdly academic’’3 (25)
and as having a ‘‘strictly limited role’’ (19) in data
analysis. Within the scientific community, unwar1 The adjective ‘‘null’’ can be misleading: this hypothesis need not
be one of no difference. The use of null persists because of historical
inertia.
2 Kruskal (38) reviews other drawbacks to hypothesis testing.
3 Sir Ronald Fisher, the author of this phrase, developed many
statistical procedures, including the analysis of variance.
Downloaded from http://jap.physiology.org/ by 10.220.32.247 on June 12, 2017
tainty but as estimates of variability (Table 1). Virtually no articles (0–2%, range) report confidence intervals, recommended by statisticians (2, 5, 9, 10, 28, 39)
as interval estimates of uncertainty about the values of
population parameters. Moreover, few articles (4–15%,
range) report precise P values, which precludes personal assessment of statistical significance.
In this review, we summarize the primary scientific
uses of statistics. Then, we illustrate several fundamental concepts: variability, uncertainty, and significance.
Last, we illustrate that although an understanding of
concepts such as variability, uncertainty, and significance is necessary, it is not sufficient: it is essential to
realize also that the numerical results of statistical
analyses have limitations.
777
INVITED REVIEW
ranted focus on hypothesis testing has blurred the
distinction between statistical significance and scientific importance (3, 13, 19). Most investigators appear
to reach scientific conclusions that are based not on
their knowledge of science but solely on the probabilities of test statistics (16); this is an untenable approach
to scientific discovery.
The limited utility of hypothesis testing can be
demonstrated with an example. Suppose a clinician
wants to assess the impact of a placebo and the
b-blockers bisoprolol and metoprolol on heart rate
variability in patients with left heart failure. Suppose
also that the clinician constructs the null and alternative hypotheses, H0 and H1, as
H0: treatments have identical effects
on heart rate variability
The result of this hypothesis test will fail to convey any
information about the direction or magnitude of the
treatment effects on heart rate variability. Direction
and magnitude are important: in patients with left
heart failure, decreases in heart rate variability are
associated with increases in the risk of sudden cardiac
catastrophe (49). Direction and magnitude of an effect
reflect scientific importance; they are obtained by estimation.
Estimation. Regardless of the statistical result of a
hypothesis test, the crucial question concerns the scientific result: is the experimental effect big enough to be
relevant? A point estimate of a population parameter4
and an interval estimate of the uncertainty about the
value of that parameter help answer this question. For
example, one point estimate of a population mean is the
sample mean; one interval estimate of the uncertainty
about the value of the populations mean is a confidence
interval. Interval estimates circumvent the drawbacks
inherent to hypothesis testing, yet they provide the
same statistical information as a hypothesis test (15,
18, 28, 38). More important, point and interval estimates convey information about scientific importance.
Practical considerations. Estimation focuses attention on the magnitude and uncertainty of the experimental results. We must emphasize that hypothesis testing
can have value beyond assessing the strength of the
experimental evidence: for example, hypothesis testing
is useful if an investigator wants to evaluate the
importance of between-subject variability in an experiment. In practice, estimation should be done whenever
it is relevant and feasible; the precise P value from the
associated hypothesis test should be reported with the
point and interval estimates. When more than one
hypothesis is tested in an experiment, the problem of
multiple comparisons becomes relevant. Nevertheless,
a discussion of the issues involved in multiple-compari4 A parameter is a numerical constant: for example, the population
mean.
USING SAMPLES TO LEARN ABOUT POPULATIONS
As researchers, we use samples to make inferences
about populations. A sample interests us not because of
its own merits but because it helps us estimate selected
characteristics of the underlying population: for example, the sample mean y estimates the population
mean µ.5
As an illustration, suppose the random variable Y
represents the change in systolic blood pressure after
some intervention. Suppose also that the distribution of
Y conforms to a normal distribution. A normal distribution is specified completely by two parameters: the
mean and variance. The population mean µ conveys the
location of the center of the distribution; the population
standard deviation s, the square root of the population
variance s2, conveys the spread of the distribution. The
distribution of possible outcomes of the random variable Y is described by the normal probability density
function ( f ), which incorporates µ and s2
f(y) 5
1
sÎ2p
·exp 52(y 2 µ) 2/(2·s2 )6,
(1)
for 2` , y , 1`
In Fig. 1, the distributions for three different populations are theoretical: each depicts the distribution of
population values as if we had observed the entire
population.6
Suppose we want to estimate µ1 5 215, the mean of
population 1, in Fig. 1. To do this, we would measure
the change in systolic blood pressure in a sample of n
independent observations, y1, y2, . . . , yn, from the
population. For simplicity, assume we limit the sample
to 10 observations. One random sample is
233, 215, 26, 0, 18, 23, 8, 222, 222, 27
The average of these sample observations is the sample
mean y
y5
1
10
10
·
o y 5 28.2
i
(2)
i51
Because of intrinsic variability in the population, the
sample mean y differs from the population mean µ1;
only because this is a contrived example do we know
the true magnitude of the discrepancy.7 Next, we review measures that estimate variability in the population.
5
References 2, 9, 42, and 48 discuss other aspects of sampling.
Statistical calculations and exercises were executed by using SAS
Release 6.04 (SAS Institute, Cary, NC, 1987).
7 We address the discrepancy between the value of the sample
estimate of a population parameter and the value of the population
parameter itself in ESTIMATING UNCERTAINTY ABOUT A POPULATION
PARAMETER.
6
Downloaded from http://jap.physiology.org/ by 10.220.32.247 on June 12, 2017
H1: treatments have different effects
on heart rate variability
son procedures is beyond the scope of this review; Refs.
2, 9, 42, and 48 summarize these issues.
For the rest of this review, we focus our attention on
several aspects of estimation.
778
INVITED REVIEW
Fig. 1. Using samples to learn about populations: 3
normal distributions. These distributions differ in
location, reflected in the mean µ, or spread, reflected
in the standard deviation s. A normal probability
density function (Eq. 1) describes the distribution of
each population.
The preceding sample observations, 233, 215, . . . ,
27, differ because the population from which they were
drawn is distributed over a range of possible values.
This intrinsic variability is more than a distraction: it is
an integral part of statistics, and the careful study of
variability may reveal something about underlying
scientific processes (25). The most common measure of
the variability among sample observations is the sample standard deviation s, the square root of the sample
variance s 2
s 5
2
1
n21
n
·
o ( y 2 y)
i
2
i51
(See also Refs. 2, 9, 42, and 48.) The sample standard
deviation characterizes the typical distance of an observation from the distribution center; in other words, it
reflects the dispersion of individual sample observations about the sample mean. The sample standard
deviation s also estimates the population standard
deviation s: the standard deviation of the sample
observations 233, 215, . . . , 27 is s 5 15.2, which
estimates s 5 20.
Most journals would publish the preceding sample
mean and standard deviation as
28.2 mmHg 6 15.2
The 6 symbol, however, is superfluous: the standard
deviation is a single positive number. A standard
deviation can be reported clearly with notation of this
form
28.2 mmHg (SD 15.2)
In a table, the symbol SD can be omitted without loss of
clarity as long as the table legend identifies the parenthetical value as a standard deviation.
The standard deviation is often a useful index of
variability, but in many experimental situations it may
be a deceptive one: even subtle departures from a
normal distribution can render useless the standard
deviation as an index of variability (43); often, the
distribution of a biological variable differs grossly from
a normal distribution. As one example, the distribution
of values for plasma creatinine (26) resembles the
skewed distribution depicted in Fig. 2. When the tails of
a distribution are elongated, as is the right tail of this
skewed distribution, the sample standard deviation
will be an inflated measure of variability in the population (43, 48). There are two remedies to this misrepresentation of variability by the standard deviation: use
another measure of variability, or transform the data.
Alternative measures of variability. Two measures of
variability that are useful with a variety of distributions are the mean absolute deviation and the interquartile range. The mean absolute deviation (Ave 5 0 dev0 6) is
the average distance of the sample observations from
the sample mean
1 n
0y 2 y 0
Ave 50dev 06 5 ·
n i51 i
o
The interquartile range (often designated as IQR)
encompasses the middle 50% of a distribution and is
the difference between the 75th and 25th percentiles.
For 0 , w , 1, the 100wth percentile is the value below
which 100w% of the distribution is found.
Data transformation. When the sample observations
happen to be drawn from a population that has a
skewed distribution (e.g., a constituent of blood or the
growth rate of a tumor), a transformation may change
the shape of their distribution so that the distribution
of the transformed observations is more symmetric (14,
23, 26, 32, 48). Common transformations include the
logarithmic, inverse, square root, and arc sine transformations. The APPENDIX reviews a useful family of data
transformations.
Downloaded from http://jap.physiology.org/ by 10.220.32.247 on June 12, 2017
ESTIMATING VARIABILITY IN THE POPULATION
779
INVITED REVIEW
Fig. 2. Estimating variability in the population: a
skewed distribution. The lognormal probability density function (Eq. A1) describes this skewed distribution in which the Pr 5Y # 6.16 5 0.50 and the Pr 52.1 #
Y # 16.46 5 0.68 (gray area). For a normal distribution with the same mean and variance (inset), the
Pr 5Y # 10.06 5 0.50, and the Pr 523.1 # Y # 23.16 5
0.68 (gray area). See APPENDIX for further explanation.
ESTIMATING UNCERTAINTY ABOUT
A POPULATION PARAMETER
In the sampling exercise from USING SAMPLES TO
the sample mean y 5 28.2
(Eq. 2) estimated the population mean µ1 5 215. If we
had calculated this sample mean from experimental
observations, then we would be uncertain about the
magnitude of the discrepancy between the sample
estimate y and the population parameter µ1. The ability
to estimate the level of uncertainty about the value of a
population parameter by using the sample estimate of
that parameter is a powerful aspect of statistics (47).
Suppose we measure the same response variable, the
change in systolic blood pressure, in a second sample of
10 independent observations drawn from the same
population. We know beforehand that because of random sampling the mean of the second sample, y2, will
differ from the mean of the first sample, y1 5 28.2. If we
measure the change in systolic blood pressure in 100
samples of 10 independent observations, then we expect 100 different estimates of the population mean µ1;
for example
LEARN ABOUT POPULATIONS,
y1 5 28.2,
y2 5 28.1,
· · ·,
y100 5 222.5
If we treat these 100 observed sample means as 100
observations, then we can calculate their mean and
standard deviation, designated as Ave5y6 and SD5y6
Ave 5y6 5 214.5
and
and variance s2, which are known; the notation for this
normal distribution is Y , N(µ, s2 ). If an infinite
number of samples, each with n independent observations, is drawn from this normal distribution, then the
sample means y1, y2, . . . , y` will also be distributed
normally.8 The average of the sample means, Ave 5y6, is
the population mean µ, but the variance of the sample
means (Var 5y6) is smaller than the population variance
s2 by a factor of 1/n
Ave 5y6 5 µ
and
Var 5y6 5 s2y 5 s2/n
(The APPENDIX derives these expressions. Figure 3
develops these expressions using empirical examples.)
Therefore, the standard deviation of the theoretical
distribution of the sample mean, sy, is
sy 5 s/În
If the sample size n increases, then the standard
deviation sy will decrease: that is, the more sample
observations we have, the more certain we will be that
the point estimate y is near the actual population
mean µ.
The standard deviation of the theoretical distribution of the sample mean is known also as the standard
error of the sample mean, SE 5y6; that is
SE 5y6 5 s/În
In estimation, the standard error of the mean has no
particular value; instead, it is useful because of its role
SD 5y6 5 6.07
We can generalize from this empirical distribution of
sample means to a theoretical distribution of the sample mean for a sample of size n. Consider a random
variable Y that is distributed normally with mean µ
8 The Central Limit Theorem states that the theoretical distribution of the sample mean will be approximately normal, regardless of
the distribution of the original observations (35, 42). If the distribution of the original observations happens to be normal, then the
theoretical distribution of the sample mean will be exactly normal.
Downloaded from http://jap.physiology.org/ by 10.220.32.247 on June 12, 2017
In the next section, we revisit the unknown discrepancy between the sample estimate of a population
parameter and the population parameter itself.
780
INVITED REVIEW
Fig. 3. Estimating uncertainty about a population parameter: empirical distributions of sample means. These
distributions are based on 1,000 samples of 5 (A), 10 (B),
20 (C), or 40 (D) observations drawn at random from
population 1, for which the mean µ 5 215 and the
variance s2 5 400. For each empirical distribution, the
average of the sample means, Ave 5 y6, happens to be
215.1. As sample size increases, however, the sample
means become concentrated more closely about Ave 5 y6.
When sample size doubles, the variance of the sample
means, Var 5 y6, is approximately halved.
[µ 2 a, µ 1 a]
(4)
where the allowance a is
a 5 za/2 · SE 5y6
(5)
In Eq. 5, za/2 is the 100[1 2 (a/2)]th percentile from the
standard normal distribution, i.e., a normal distribution with mean 0 and variance 1, and SE 5y6 is defined
by Eq. 3. Therefore, when the population standard
deviation s is known, 95% of the possible sample means
are within 1.96·SE 5y6 of the population mean µ.
The interval in Eq. 4 can be written as the probability
expression
Pr 5µ 2 a # y # µ 1 a6 5 1 2 a
which declares that the probability is 1 2 a that a
sample mean lies within the interval [µ 2 a, µ 1 a].
After algebraic rearrangement, this expression can be
written
the interval
[y 2 a, y 1 a]
(6)
is called the 100(1 2 a)% confidence interval for the
population mean µ.
In practice, the sample standard deviation s estimates the population standard deviation s, which
means that s/În estimates the standard error of the
mean (Eq. 3). In calculating a 100(1 2 a)% confidence
interval for the mean µ, this uncertainty about the
actual value of s is handled by replacing za/2 in Eq. 5
with ta/2,n, the 100[1 2 (a/2)]th percentile from a Student t distribution with n 5 n 2 1 degrees of freedom.
Therefore, the allowance applied to the sample mean to
obtain the 100(1 2 a)% confidence interval for the
population mean (Eq. 6) is
a 5 ta/2,n · SE 5y6
where SE 5y6 5 s/În. Note that this allowance exceeds
the allowance in Eq. 5: there is greater uncertainty
about the value of the population mean µ. This happens
because if n , `, then ta/2,n . za/2 for all values of a.
Suppose we want to calculate a confidence interval
for the population mean µ1 5 215 by using the observations 233, 215, . . . , 27 of the first sample. The mean
and standard deviation of these 10 observations are y 5
28.2 and s 5 15.2. Therefore, the estimated standard
error of the mean is
SE 5y6 5 s/În 5 15.2/Î10 5 4.81
Pr 5y 2 a # µ # y 1 a6 5 1 2 a
Because n 5 10, there are n 5 n 2 1 5 9 degrees of
freedom. If we want a 95% confidence interval, then a 5
0.05, ta/2,n 5 2.26, and the allowance a 5 2.26 3 4.81 5
10.9. Therefore, the 95% confidence interval is
but note that the randomness resides in the parameter
estimate y, not in the actual parameter µ. In this form,
[219.1, 12.7]
9 References 2, 9, 42, and 48 discuss the calculation of confidence
intervals for other population parameters.
10 Moses (Ref. 42, p. 113–117) illustrates further the concept of a
confidence interval by using empirical examples.
In other words, we can declare, with 95% confidence,
that the population mean is included in the interval
[219.1, 12.7].
Bear in mind that a single confidence interval either
does or does not include the value of the population
Downloaded from http://jap.physiology.org/ by 10.220.32.247 on June 12, 2017
in the calculation of a confidence interval for the
population mean µ.9
Confidence intervals. When we construct a confidence
interval for the population mean, we assign numerical
bounds to the expected discrepancy between the sample mean y and the population mean µ. In essence, a
confidence interval is a range that we expect, with some
level of confidence, to include the actual value of the
population mean. Below, we use the theoretical distribution of the sample mean to derive the confidence
interval for the population mean µ.10
In the theoretical distribution of the sample mean,
100(1 2 a)% of the possible sample means is included
in the interval
781
INVITED REVIEW
Fig. 4. Estimating uncertainty about a population parameter: 95% confidence intervals for a population
mean. These confidence intervals are for 100 samples of
10 observations drawn at random from population 1 in
Fig. 1. It is because of the random sampling that the
position and length of the confidence interval vary from
sample to sample. About 95 of these intervals—the
actual number will vary—are expected to cover the
population mean of 215 mmHg. In this example, 98 of
the confidence intervals cover the population mean µ;
the 2 exceptions are highlighted (heavy black lines
numbered 1 and 2).
STATISTICAL AND SCIENTIFIC SIGNIFICANCE DIFFER
Hypothesis testing, as the primary scientific use of
statistics, has a drawback: the result of a hypothesis
test conveys mere statistical significance. In contrast,
estimation conveys scientific significance.11 This distinction is obvious if we use the results of a recent clinical
trial. In this trial, the Systolic Hypertension in the
Elderly Program (SHEP) Cooperative Research Group
11 The word ‘‘significance,’’ when used to refer to scientific consequence, is ambiguous. Hereafter, we use the word ‘‘importance.’’
(45) evaluated the impact of antihypertensive drugs on
the incidence of stroke in persons with isolated systolic
hypertension. When compared with placebo, these drugs
reduced by 36% (P 5 0.0003) the incidence of stroke.
Associated with this reduced incidence of stroke was a
greater decrease in systolic blood pressure.
To appreciate the distinction between statistical significance and scientific importance, consider two populations that represent the theoretical distributions of
the decreases in systolic blood pressure for the two
groups. Let the decrease in systolic blood pressure of
the placebo group be designated Y1 and that of the drug
treatment group be designated Y2. Assume that Y1 and
Y2 are distributed normally
Y1 , N (µ1, s21 )
and Y2 , N (µ2, s22 )
The normal probability density function (Eq. 1), in
which approximate values for the observed sample
means and variances from the SHEP trial, yi and s2i , are
substituted for the population means and variances,
generates the population distributions depicted in Fig. 5
y1 5 215 ⇒ µ1, s 21 5 400 ⇒ s21
and y2 5 225 ⇒ µ2, s 22 5 400 ⇒ s22
Suppose our objective is to estimate the difference
between population means
µ2 2 µ1 5 225 2 (215) 5 210 mmHg
The SHEP group established convincingly that the
difference µ2 2 µ1, which represents the greater decrease in systolic blood pressure after drug therapy,
was important. To estimate µ2 2 µ1, we would sample at
random from each population: the difference between
sample means, y2 2 y1, estimates the difference between population means, µ2 2 µ1.
Downloaded from http://jap.physiology.org/ by 10.220.32.247 on June 12, 2017
parameter; in experimental situations, we are uncertain about which of these outcomes has occurred.
Instead, the level of confidence in a confidence interval
is based on the concept of drawing a large number of
samples, each with n observations, from the population. When we measured the change in systolic blood
pressure in 100 random samples, we obtained 100
different sample means and 100 different sample standard deviations. As a consequence, we will calculate
100 different 100(1 2 a)% confidence intervals; we expect ,100(1 2 a)% of these observed confidence intervals to include the actual value of the population mean
(see Fig. 4).
A confidence interval characterizes the uncertainty
about the estimated value of a population parameter.
Sometimes, an investigator may be interested less in
the value of the population parameter and more in the
distribution of individual observations. A tolerance
interval characterizes the uncertainty about the estimated distribution of those individual observations
(see APPENDIX).
Next, we illustrate the distinction between statistical
significance and scientific importance. Last, we show
that the numerical results of statistical analyses have
limitations.
782
INVITED REVIEW
Fig. 5. Statistical and scientific significance differ: placebo (black) and drug-treatment (gray) populations. The
populations represent theoretical distributions of
changes in systolic blood pressure during year 5 of the
Systolic Hypertension in the Elderly Program clinical
trial (see Ref. 45). The distributions are described by the
normal probability density function (Eq. 1) in which the
sample means and variances, yi and s2i , are substituted
for the population means and variances. To generate
samples of size n from each population, observations
(Obs) were drawn at random from the placebo population; corresponding observations from the drug-treatment population were obtained by subtracting 10 from
each placebo observation. The sampling procedure is
illustrated for n 5 2.
Table 2. Statistical and scientific significance differ:
statistical results
n
y2 2 y1
SE 5 y2 2 y1 6
95% Confidence
Interval*
t†
Pr 5µ2 2 µ1 5 06‡
2
4
8
10
15
20
25
32
64
128
210
210
210
210
210
210
210
210
210
210
23.1
22.6
12.3
10.1
7.3
5.8
5.3
4.7
3.7
2.4
2110 to 190
265 to 145
236 to 116
231 to 111
225 to 15
222 to 12
221 to 11
219 to 21
217 to 23
215 to 25
20.43
20.44
20.81
20.99
21.38
21.74
21.88
22.13
22.70
24.25
0.71
0.67
0.43
0.34
0.18
0.09
0.07
0.04
,0.01
,0.001
n, Sample size drawn from placebo ( population 1) and drug
treatment (population 2) populations (see Fig. 5). * Confidence interval for the difference between population means, µ2 2 µ1 (see Eq. A2).
† Test statistic used to evaluate statistical significance of the difference y2 2 y1 (see Eq. A3). ‡ Probability (2-tailed) that µ2 2 µ1 5 0; this
is the significance level P for the null hypothesis H0 : µ2 2 µ1 5 0. The
difference y2 2 y1 and the 95% confidence interval for the difference
µ2 2 µ1 reflect the magnitude and uncertainty of the experimental
results. The test statistic t and its associated P value reflect statistical significance. An increase in the no. of observations drawn from
each population decreases SE 5 y2 2 y1 6: as a consequence, the statistical significance increases (irregularly, because of random sampling),
but the estimated difference between population means remains
constant at y2 2 y1 5 210. The APPENDIX details the statistical
equations required to perform this sampling exercise.
scientific importance can be maintained by routinely
addressing two questions: how likely is it that the
experimental effect is real, and is the experimental
effect large enough to be relevant? The first question
can be answered simply: compare the P value, obtained
in the hypothesis test, with the critical significance
level a, chosen before any data are collected; if P , a,
then the experimental effect is likely to be real. The
second question can be answered in two steps: calculate
a confidence interval for the population parameter, and
then assess the numerical bounds of that confidence
interval for scientific importance; if either bound of the
confidence interval is important from a scientific perspective, then the experimental effect may be large
enough to be relevant.
Consider the results when 15 sample observations
were drawn from the placebo and drug treatment
populations: when compared with placebo, the greater
decrease in systolic blood pressure after drug therapy
was unconvincing from a statistical perspective
(P 5 0.18). Because the 95% confidence interval was
[225, 15], uncertainty about the actual impact of drug
treatment on systolic blood pressure is relatively large.
Note, however, that the additional decrease in systolic
blood pressure gained by drug treatment may have
been as pronounced as 25 mmHg. From a scientific
perspective, further studies, designed with greater
statistical power, are warranted.
To illustrate that a significant statistical result may
have little scientific importance, imagine that systolic
blood pressure had been measured in mmH2O rather
than in mmHg. Consider the results when 128 sample
observations were drawn from the two populations: the
greater decrease in systolic blood pressure after drug
therapy was compelling from a statistical perspective
(P , 0.001). If the confidence interval [215, 25] is
expressed in mmHg (by dividing each bound by 13.6),
then the investigator can declare, with 95% confidence,
that the magnitude of the greater decrease in systolic
blood pressure was 0.4–1.1 mmHg. In this example, the
Downloaded from http://jap.physiology.org/ by 10.220.32.247 on June 12, 2017
By drawing samples of 2–128 observations from each
population (Table 2) and by forcing y2 2 y1 5 210 (see
Fig. 5), the distinction between statistical significance
and scientific importance becomes clear. As sample size
n grows, the statistical significance increases, from P 5
0.71 for n 5 2 to P , 0.001 for n 5 128. Regardless of
sample size, one aspect of scientific importance, that
reflected by the difference y2 2 y1, remains constant. As
sample size increases, uncertainty about the actual
difference µ2 2 µ1, another aspect of scientific importance characterized by the numerical bounds of the
confidence interval, decreases.
Practical considerations. In experimental situations,
the distinction between statistical significance and
783
INVITED REVIEW
investigator can be quite certain of a trivial experimental effect.
Whatever the statistical result of a hypothesis test,
assessment of the corresponding confidence interval
incorporates the scientific importance of the experimental result.
LIMITATIONS OF STATISTICS
Fig. 6. Limitations of statistics: solar radiation and New York stock
market prices during 1929 (after Ref. 27). In general, increases in
stock prices were associated with decreases in solar radiation. This
nonsensical association illustrates the phenomenon of spurious correlation.
Drug A
x
10
8
13
9
11
14
6
4
12
7
5
y
Drug B
x
y
For each drug:
8.04 10 9.14 No. of observations (n) 5 11
6.95 8 8.14 Average of x values (x) 5 9.0
7.58 13 8.74 Average of y values (y) 5 7.5
8.81 9 8.77 Equation of regression line: ŷ 5 3 1 0.5x
8.33 11 9.26 Standard error of estimate of slope
[SE5b16] 5 0.118
9.96 14 8.10 t5H0 : slope (b1 ) 5 06 5 4.24
7.24 6 6.13 Sum of squares of x values [S(x 2 x)2] 5 110.0
4.26 4 3.10 Regression sum of squares 5 27.50
10.84 12 9.13 Residual sum of squares 5 13.75
4.82 7 7.26 Correlation coefficient (r) 5 0.82
5.68 5 4.74 %Total sum of squares explained by regression
(R 2 ) 5 67%
For drugs A and B, values are administered drug concentration x,
measured increase in neurotransmitter production y, and statistics
from regression analysis of the first-order model Y 5 b0 1 b1X 1 e,
where e is error. Additional regression analyses (23) reveal that this
model is inappropriate for drug B (see Fig. 7). Data are from
Anscombe (6).
statistical technique are to be verified. For examples in
regression, see chapt. 3 in Ref. 23.
SUMMARY
It is depressing to find how much good biological work is
in danger of being wasted through incompetent and
misleading analysis . . .
Frank Yates and Michael J. R. Healy (1964)
This scathing remark, written almost 35 years ago
(50) but relevant even now (4), reflects the frustrations
felt by statisticians over the statistical misconceptions
held by scientists. These misconceptions exist in large
part because of shortcomings in the cursory statistics
education we received in graduate or medical school (4,
11, 12). The major defect in most introductory courses
in statistics is that fundamental concepts in statistics,
the cornerstone of scientific inquiry (47), are neglected
rather than emphasized (4, 7, 17, 44, 50). Statisticians
share responsibility with other faculty for ensuring
Fig. 7. Limitations of statistics: scatterplots of drug concentration x
and increase in neurotransmitter production y. For each drug, the
fitted first-order model ŷ 5 3 1 0.5x and corresponding regression
statistics are identical (see Table 3). For only drug A, however, is this
first-order relationship plausible. For drug B, a second-order model of
the form Y 5 b0 1 b1 X 1 b2 X 2 1 e is required.
Downloaded from http://jap.physiology.org/ by 10.220.32.247 on June 12, 2017
Although the process of scientific discovery requires
an understanding of fundamental concepts in statistics, the use of statistics does have limitations. For
example, not many of us would accept, solely on the
basis of a close temporal relationship, that solar radiation governs stock market prices (Fig. 6). The limitations of statistics are more subtle if an association is
plausible.
Imagine this scenario: a neurological syndrome results from impaired production of some neurotransmitter. Drugs A and B, derivatives of the same parent
compound, both stimulate production of this neurotransmitter. Just one of the drugs, however, continues to
increase neurotransmitter production over its entire
therapeutic range. At higher doses, the second drug
becomes less effective at boosting neurotransmitter
production and causes neurotoxicity. For each drug,
Table 3 lists administered drug concentrations and
measured increases in neurotransmitter production. If
you rely on only the regression statistics in Table 3,
which drug is which? If you are unfortunate and
happen to have this hypothetical syndrome, then your
choice assumes added importance.
From the regression statistics alone, it is impossible
to differentiate the drugs. Their identities are plain,
however, when the data are plotted (Fig. 7): drug A
increases neurotransmitter production over the entire
range of drug concentrations; the increase in neurotransmitter production begins to fall at higher concentrations of drug B.
Practical considerations. Data graphics are essential
also if the requisite assumptions behind a particular
Table 3. Limitations of statistics: raw data and
regression statistics
784
INVITED REVIEW
that introductory courses in statistics are relevant and
sound (7, 44, 50).
In this review, we have reiterated the primary role of
statistics within science to be one of estimation: estimation of a population parameter or estimation of the
uncertainty about the value of that parameter. Moreover, we have demonstrated the essential distinction
between statistical significance and scientific importance; of the two, scientific importance merits more
consideration. We have shown also that without data
graphics, data analysis is a game of chance. And last,
that this review was written by a physiologist and two
statisticians embodies one of the most basic notions in
all science: collaboration.
APPENDIX
g( y) 5
1
yjÎ2p
· exp 52ln2 ( y/e t )/(2 · j2 )6, for y . 0
(A1)
The mean µg and variance s2g of the lognormal distribution
specified by Eq. A1 are
µg 5 e t1(j
2
/2)
and
s2g
2
2
5 e 2t1j · (e j 2 1)
j2
For the distribution in Fig. 2, t 5 1.803 and
5 1; therefore, µg 5 10 and s2g 5 172.
A family of data transformations. Box and Cox (14) have
described a family of power transformations in which an
observed variable y is transformed into the variable w by
using the parameter l
w5
5
( y l 2 1)/l
for l Þ 0, and
ln y
for l 5 0
The inverse (l 5 21) and square root transformations
(l 5 0.5) are members of this family. Draper and Smith (Ref.
23, p. 225–226) summarize the steps required to estimate the
parameter l so that the distribution of w is as normal
(Gaussian) as possible.
Theoretical distribution of the sample mean. Suppose some
random variable X is distributed normally with mean µ and
variance s2: that is, X , N(µ, s2 ). When a sample of n
independent observations, x1, x2, . . . , xn, is drawn repeatedly
from this distribution, the observed sample means can be
treated as observations. These sample means will be distributed normally with mean µ and variance s2/n, or
Ave 5x6 5 µ
and
Var 5x6 5 s2x 5 s2/n
As you might expect, there is a mathematical foundation to
these relationships.
L 5 k1X11 k2X2 1 · · · 1 kmXm
For i 5 1, 2, . . . , m, each ki is a real constant, and each Xi ,
N(µi, s2i ). The mean of L, Ave 5L6, is
m
Ave 5L 6 5 k1µ1 1 k2µ2 1 · · · 1 kmµm 5
okµ
i i
i51
If X1, X2, . . . , Xm are mutually independent, then the variance of L, Var 5L6, is
m
Var 5L 6 5 k 21s21 1 k 22s22 1 · · · 1 k 2ms2m 5
ok s
i51
2 2
i i
If the function L is x, the mean of the n sample observations
x1, x2, . . . , xn, then m 5 n, and furthermore, for i 5 1, 2, . . . , n
ki 5 1/n and Xi , N (µ, s2 )
Therefore
n
Ave 5L 6 5
n
o k µ 5 o µ/n 5 n · (µ/n) 5 µ 5 Ave 5x6
i i
i51
i51
and
n
Var 5L 6 5
o
i51
n
k 2i s2i 5
o s /n
2
2
5 n · (s2/n 2 ) 5 s2/n 5 Var 5x6
i51
Tolerance intervals. A tolerance interval identifies the
bounds that are expected to contain some percentage of a
population, not just a single population parameter such as
the mean (41). If a normal distribution has mean µ and
variance s2, which are known, then the 100w% tolerance
interval is
[µ 2 5z(12w)/2 · s6, µ 1 5z(12w)/2 · s6]
where z(12w)/2 is the 100[1 2 5(1 2 w)/26]th percentile from the
standard normal distribution, i.e., N(0, 1). This tolerance
interval covers exactly 100w% of the distribution. If w 5 0.95,
then z(12w)/2 5 1.96. For the population that represented the
change in systolic blood pressure after some intervention (see
USING SAMPLES TO LEARN ABOUT POPULATIONS), µ 5 215 and
s 5 20; therefore, the exact 95% tolerance interval is
[254, 124]
In practice, the sample statistics y and s are used to
estimate the population parameters µ and s. This element of
uncertainty about the values of µ and s is handled by
replacing z(12w)/2 with the confidence coefficient k, where k
depends on w as well as the sample size n. Therefore, the
estimated 100w% tolerance interval is
[ y 2 ks, y 1 ks]
[If w 5 0.95 and n 5 `, then k 5 z(12w)/2 5 1.96 as above, when
µ and s were known.] The coefficient k is chosen to enable the
declaration, with 100(1 2 a)% confidence, that the estimated
tolerance interval covers 100w% of the distribution (see Table
XIV in Ref. 41).
For the observations listed in USING SAMPLES TO LEARN
ABOUT POPULATIONS, y 5 28.2 and s 5 15.2. Suppose we want
to estimate with 95% confidence a 90% tolerance interval
based on these results. When we use these percentages and
the sample size of 10, the coefficient k 5 2.839. Therefore, the
Downloaded from http://jap.physiology.org/ by 10.220.32.247 on June 12, 2017
This APPENDIX reviews the lognormal distribution (a distribution that reveals limitations of the standard deviation as
an estimate of variability), a versatile family of data transformations, the theoretical distribution of the sample mean,
tolerance intervals, the statistical equations required to
perform the significance sampling exercise, and the confidence interval for the difference between two population
means.
Lognormal distribution. The lognormal distribution is a
common probability distribution model for skewed data. The
random variable Y is distributed lognormally if the logarithm
of Y is distributed normally with mean t and variance j2, or
ln Y , N(t, j2 ). Formally, the lognormal probability density
function g is
Consider the linear function L
785
INVITED REVIEW
tolerance interval is
[251, 135]
In other words, we can declare, with 95% confidence, that 90%
of persons will have a change in systolic blood pressure of
between 251 and 135 mmHg after the intervention. Note
that this statement differs markedly from our previous
assertion, made also with 95% confidence, that the population
mean µ was included in the interval [219.1, 12.7].
The tolerance intervals outlined above are appropriate
only if the distribution of the underlying population is
normal; other formulas exist to construct tolerance intervals
when the population is distributed nonnormally.
Equations for the significance sampling exercise. For two
samples of equal size n, the standard error of the difference
between sample means, SE 5 y2 2 y1 6, is estimated as
SE 5 y2 2 y1 6 5
Î(s
2
2
1 s 21 )/n
[5( y2 2 y1 ) 2 a6, 5( y2 2 y1 ) 1 a6]
(A2)
The allowance a applied to the difference y2 2 y1 is
a 5 ta/2,n · SE 5 y2 2 y1 6
where ta/2,n is the 100[1 2 (a/2)]th percentile from a Student t
distribution with n 5 2n 2 2 degrees of freedom. In this
sampling exercise, we use the t distribution because we
assume the standard deviations of the populations are unknown (42).
The test statistic used to evaluate statistical significance of
the difference y2 2 y1 is
t 5H0 : µ2 2 µ1 5 06 5
( y2 2 y1 ) 2 0
SE 5 y2 2 y1 6
(A3)
Confidence interval for the difference between population
means. In the significance sampling exercise (see STATISTICAL
AND SCIENTIFIC SIGNIFICANCE DIFFER), we calculated a confidence interval for the difference between two population
means. Rather than construct a confidence interval for this
difference, a researcher could construct a confidence interval
for each population mean: if the two confidence intervals fail
to overlap, the researcher would conclude that the population
means differ. This approach is conservative.
Consider the results when 32 sample observations were
drawn from the placebo and drug treatment populations:
when compared with placebo, drug therapy was associated
with a greater decrease in systolic blood pressure (P 5 0.04),
and the 95% confidence interval for the difference between
population means was [219, 21]. That this confidence interval excludes 0 corroborates that the population means differ
at the a 5 0.05 level.
The observed sample means for the placebo and drug
treatment groups were
y1 5 29.9
and y2 5 219.9
the standard errors of these sample means were
SE 5 y1 6 5 SE 5 y2 6 5 3.32
Because n 5 32, each sample has n 5 n 2 1 5 31 degrees of
freedom.
[217, 23]
the 95% confidence interval for the mean of the drug treatment population is
[227, 213]
Because these individual confidence intervals overlap, we
might conclude that there is insufficient evidence to declare
that the two population means differ. In this example, 86%
confidence intervals for the population means would just fail
to overlap: that is, we could declare that the population
means differ at the P 5 0.14 level. Note that when we
calculate a confidence interval for the difference between
these population means, we are more confident that an actual
difference exists.
We thank Brenda B. Rauner, Publications Manager and Executive
Editor, APS Publications, for providing the information about research manuscripts published by the American Physiological Society.
This review was supported in part by the Dept. of Pediatrics (M.
Douglas Jones, Jr., Chair); by a Grant-in-Aid from the American
Heart Association of Colorado and Wyoming (to D. Curran-Everett);
and by the National Science Foundation Grant DMS 95-10435 (to K.
Kafadar).
Address for reprint requests: D. Curran-Everett, Dept. of Pediatrics, B-195, Univ. of Colorado Health Sciences Center, 4200 East 9th
Ave., Denver, CO 80262 (E-mail: [email protected]).
REFERENCES
1. Altman, D. G. Misuse of statistics is unethical. In: Statistics in
Practice, edited by S. M. Gore and D. G. Altman. London: Br.
Med. Assoc., 1982, p. 1–2.
2. Altman, D. G. Practical Statistics for Medical Research. New
York: Chapman & Hall, 1991.
3. Altman, D. G. Statistics in medical journals: developments in
the 1980s. Stat. Med. 10: 1897–1913, 1991.
4. Altman, D. G., and J. M. Bland. Improving doctors’ understanding of statistics. J. R. Stat. Soc. Ser. A 154: 223–267, 1991.
5. Altman, D. G., S. M. Gore, M. J. Gardner, and S. J. Pocock.
Statistical guidelines for contributors to medical journals. Br.
Med. J. 286: 1489–1493, 1983.
6. Anscombe, F. J. Graphs in statistical analysis. Am. Statistician
27: 17–21, 1973.
7. Appleton, D. R. What statistics should we teach medical
undergraduates and graduates? Stat. Med. 9: 1013–1021, 1990.
8. Arbuthnot, J. Of the Laws of Chance. London: Benj. Motte,
1692.
9. Armitage, P., and G. Berry. Statistical Methods in Medical
Research (3rd ed.). Cambridge, MA: Blackwell Scientific, 1994.
10. Bailar, J. C., III, and F. Mosteller. Guidelines for statistical
reporting in articles for medical journals. Ann. Intern. Med. 108:
266–273, 1988.
11. Bland, J. M., and D. G. Altman. Caveat doctor: a grim tale of
medical statistics textbooks. Br. Med. J. 295: 979, 1987.
12. Bland, J. M., and D. G. Altman. Misleading statistics: errors in
textbooks, software and manuals. Int. J. Epidemiol. 17: 245–247,
1988.
13. Boring, E. G. Mathematical vs. scientific significance. Psychol.
Bull. 16: 335–338, 1919.
14. Box, G. E. P., and D. R. Cox. An analysis of transformations. J.
R. Stat. Soc. Ser. B 26: 211–243, 1964.
15. Burkholder, D. L., and J. Pfanzagl. Estimation. In: International Encyclopedia of the Social Sciences, edited by D. L. Sills.
New York: Macmillan & The Free Press, 1968, vol. 5, p. 142–157.
16. Burnand, B., W. N. Kernan, and A. R. Feinstein. Indexes and
boundaries for ‘‘quantitative significance’’ in statistical decisions.
J. Clin. Epidemiol. 43: 1273–1284, 1990.
Downloaded from http://jap.physiology.org/ by 10.220.32.247 on June 12, 2017
where s 2j is sample variance.
The 100(1 2 a)% confidence interval for µ2 2 µ1, the
difference between population means, is
If we want a 95% confidence interval for each population
mean (Eq. 6), then a 5 0.05, ta/2,n 5 2.04, and the allowance
a 5 2.04 3 3.32 5 6.8. Therefore, the 95% confidence interval
for the mean of the placebo population is
786
INVITED REVIEW
35. Hogg, R. V., and A. T. Craig. Introduction to Mathematical
Statistics (4th ed.). New York: Macmillan, 1978.
36. Iberall, A. S. The problem of low-dose radiation toxicity. Am. J.
Physiol. 244 (Regulatory Integrative Comp. Physiol. 13): R7–R13,
1983.
37. Jackson, T. E. Comparison of a class of regression equations.
Am. J. Physiol. 246 (Regulatory Integrative Comp. Physiol. 15):
R271–R276, 1984.
38. Kruskal, W. H. Tests of significance. In: International Encyclopedia of the Social Sciences, edited by D. L. Sills. New York:
Macmillan & The Free Press, 1968, vol. 14, p. 238–250.
39. Land, T. A., and M. Secic. How to Report Statistics in Medicine.
Philadelphia, PA: Am. College Physicians, 1997.
40. Landaw, E. M., and J. J. DiStefano III. Multiexponential,
multicompartmental, and noncompartmental modeling. II. Data
analysis and statistical considerations. Am. J. Physiol. 246
(Regulatory Integrative Comp. Physiol. 15): R665–R677, 1984.
41. Montgomery, D. C., and G. C. Runger. Applied Statistics and
Probability for Engineers. New York: Wiley, 1994, p. 361–363.
42. Moses, L. E. Think and Explain with Statistics. Reading, MA:
Addison-Wesley, 1986.
43. Mosteller, F., and J. W. Tukey. Data Analysis and Regression.
Reading, MA: Addison-Wesley, 1977.
44. Murray, G. D. How we should approach the future. Stat. Med. 9:
1063–1068, 1990.
45. SHEP Cooperative Research Group. Prevention of stroke by
antihypertensive drug treatment in older persons with isolated
systolic hypertension. Final results of the systolic hypertension
in the elderly program (SHEP). JAMA 265: 3255–3264, 1991.
46. Slinker, B. K., and S. A. Glantz. Multiple regression for
physiological data analysis: the problem of multicollinearity. Am.
J. Physiol. 249 (Regulatory Integrative Comp. Physiol. 18):
R1–R12, 1985.
47. Snedecor, G. W. The statistical part of the scientific method.
Ann. NY Acad. Sci. 52: 792–799, 1950.
48. Snedecor, G. W., and W. G. Cochran. Statistical Methods (7th
ed.). Ames: Iowa State Univ. Press, 1980.
49. Tuininga, Y. S., D. J. van Veldhuisen, J. Brouwer, J.
Haaksma, H. J. G. M. Crijns, A. J. Man in’t Veld, and K. I.
Lie. Heart rate variability in left ventricular dysfunction and
heart failure: effects and implications of drug treatment. Br.
Heart J. 72: 509–513, 1994.
50. Yates, F., and M. J. R. Healy. How should we reform the
teaching of statistics? J. R. Stat. Soc. Ser. A 127: 199–210, 1964.
51. Yates, F. E. Contribution of statistics to ethics of science. Am. J.
Physiol. 244 (Regulatory Integrative Comp. Physiol. 13): R3–R5,
1983.
Downloaded from http://jap.physiology.org/ by 10.220.32.247 on June 12, 2017
17. Colditz, G. A., and J. D. Emerson. The statistical content of
published medical research: some implications for biomedical
education. Med. Educ. 19: 248–255, 1985.
18. Colton, T. Statistics in Medicine. Boston, MA: Little, Brown,
1974.
19. Cox, D. R. Statistical significance tests. Br. J. Clin. Pharmacol.
14: 325–331, 1982.
20. Denenberg, V. H. Some statistical and experimental considerations in the use of the analysis-of-variance procedure. Am. J.
Physiol. 246 (Regulatory Integrative Comp. Physiol. 15): R403–
R408, 1984.
21. Denham, M. J., A. Foster, and D. A. J. Tyrrell. Work of a
district ethical committee. Br. Med. J. 2: 1042–1045, 1979.
22. DiStefano, J. J., III, and E. M. Landaw. Multiexponential,
multicompartmental, and noncompartmental modeling. I. Methodological limitations and physiological interpretations. Am. J.
Physiol. 246 (Regulatory Integrative Comp. Physiol. 15): R651–
R664, 1984.
23. Draper, N. R., and H. Smith. Applied Regression Analysis (2nd
ed.). New York: Wiley, 1981.
24. Evans, S. J. W., P. Mills, and J. Dawson. The end of the p
value? Br. Heart J. 60: 177–180, 1988.
25. Fisher, R. A. Statistical Methods and Scientific Inference (3rd
ed.). New York: Hafner, 1973.
26. Flynn, F. V., K. A. J. Piper, P. Garcia-Webb, K. McPherson,
and M. J. R. Healy. The frequency distributions of commonly
determined blood constituents in healthy blood donors. Clin.
Chim. Acta 52: 163–171, 1974.
27. Garcia-Mata, C., and F. I. Shaffner. Solar and economic
relationships: a preliminary report. Q. J. Economics 49: 1–51,
1934.
28. Gardner, M. J., and D. G. Altman. Confidence intervals rather
than P values: estimation rather than hypothesis testing. Br.
Med. J. 292: 746–750, 1986.
29. Garfinkel, D., and K. A. Fegley. Fitting physiological models to
data. Am. J. Physiol. 246 (Regulatory Integrative Comp. Physiol.
15): R641–R650, 1984.
30. Gray, B. H., R. A. Cooke, and A. S. Tannenbaum. Research
involving human subjects. Science 201: 1094–1101, 1978.
31. Healy, M. J. R. Significance tests. Arch. Dis. Child. 66: 1457–
1458, 1991.
32. Healy, M. J. R. Data transformations. Arch. Dis. Child. 69:
260–264, 1993.
33. Hill, A. B. Principles of medical statistics. XII—Common fallacies and difficulties. Lancet i: 706–708, 1937.
34. Hofacker, C. F. Abuse of statistical packages: the case of the
general linear model. Am. J. Physiol. 245 (Regulatory Integrative
Comp. Physiol. 14): R299–R302, 1983.