ROBUST STATISTICAL
METHODS:
A VIABLE ALTERNATIVE?
BY
PROF. B.A. OYEJOLA
Department of Statistics
University of Ilorin
INTRODUCTION
 Classic parametric tests produce accurate results when
assumptions underlying them are sufficiently satisfied.
 Violations lead to:
- inaccurate calculation of p values,
- increased risk of falsely rejecting the null hypothesis
(i.e., concluding that real effects exist when they do not),
- loss of power to detect genuine effects,
- inaccurate estimates of common measures of effect size
(e.g., Cohen’s d) and of confidence intervals,
- errors in the interpretation of data.
(See Keselman et al., 1998; Leech & Onwuegbuzie, 2002; Wilcox,
2001; Zimmerman, 1998.)
PROBLEMS WITH CLASSIC PARAMETRIC
METHODS
Classic parametric methods are based on certain
assumptions.
Data being analyzed are normally distributed.
“Normality is a myth; there never was, and never
will be, a normal distribution” (Geary, 1947).
In practice, few distributions remotely resemble the
normal curve. Instead, real distributions are frequently
multimodal, skewed, and heavy-tailed (Micceri, 1989).
Studies have indicated that real data are more likely to
resemble the exponential distribution.
PROBLEMS WITH CLASSIC PARAMETRIC
METHODS (CONTD.)
 Equal population variances - homogeneity of
variance, or homoscedasticity.
 When classic parametric tests are used to analyze
non-normal or heteroscedastic data, the true risk of making
a Type I error may be much higher (or lower) than the
obtained p value suggests.
 Equal sample sizes do not always offer protection
against inflated Type I error when variances are
heterogeneous (Harwell, Rubinstein, Hayes, & Olds,
1992).
 The probability of a Type I error when testing at α = 0.05
can exceed 50% when data are non-normal and
heteroscedastic (Wilcox, 2003).
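The scale of the problem is easy to see by simulation. Below is a minimal sketch (the group sizes, standard deviations, and seed are illustrative assumptions, not figures from the talk): the null hypothesis is true in every run, yet the pooled t test rejects far more often than the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, alpha = 10_000, 0.05
rejections = 0
for _ in range(n_sims):
    # Null hypothesis is true: both means are 0, but variances differ
    # and the smaller group has the larger variance.
    x = rng.normal(0, 4, size=10)
    y = rng.normal(0, 1, size=40)
    _, p = stats.ttest_ind(x, y, equal_var=True)  # classic pooled t test
    rejections += p < alpha

print(f"Empirical Type I error: {rejections / n_sims:.3f}")  # well above 0.05
```

Pairing the larger variance with the smaller group, as here, is the classic worst case; reversing the pairing makes the pooled test conservative instead.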
Definition
The term robust statistics refers to procedures
that are able to maintain the Type I error rate of
a test at its nominal level and also maintain the
power of the test, even when data are non-normal
and heteroscedastic.
WHY ARE MODERN METHODS
UNDERUSED?
Lack of Familiarity With Modern Methods
• This is largely due to lack of exposure to the new
methods. The field of statistics has progressed
markedly since 1960, yet most researchers and
many statisticians rely on outdated methods.
Assumption Testing Issues
 Researchers frequently fail to check whether the
data they are analyzing meet the assumptions
underlying classic parametric tests.
This may be due to forgetfulness or to not knowing
how to check assumptions.
 Statistical assumption tests built into software
such as SPSS often do a poor job of detecting
violations of normality and homoscedasticity.
E.g., Levene’s test, often used to test the
homoscedasticity assumption, can yield a p value
greater than α even when variances are unequal to
a degree that could significantly affect the results
of a classic parametric test (see the simulation
sketch below).
 Another problem is that assumption tests have
their own assumptions. Normality tests usually
assume that data are homoscedastic while tests
of homoscedasticity assume that data are
normally distributed.
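A small simulation makes the point about pre-tests concrete. In the sketch below (sample sizes, variance ratio, and seed are illustrative assumptions), Levene's test is asked to detect a fourfold variance difference with only 10 observations per group, and it misses the violation in a large share of runs:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, misses = 10_000, 0
for _ in range(n_sims):
    a = rng.normal(0, 1, size=10)   # sd 1
    b = rng.normal(0, 2, size=10)   # sd 2 -> variance ratio of 4
    _, p = stats.levene(a, b)
    misses += p > 0.05              # violation goes undetected
print(f"Proportion of undetected violations: {misses / n_sims:.2f}")
```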
The Robustness Argument
Researchers often claim that classic parametric
tests are robust (i.e., that they maintain rates of
Type I error close to the nominal level).
Note: Robust statistics control Type I error and also
maintain adequate statistical power.
Even if researchers insist that classic parametric
tests are robust, this does not preclude the use of
alternate procedures.
Modern methods are also robust and more
powerful when data are not normally distributed
and/or heteroscedastic.
Transformations
The use of transformations is problematic for
several reasons, including:
• transformations often fail to restore normality
and homoscedasticity;
• they do not deal with outliers;
• they can reduce power;
• they sometimes rearrange the order of the
means from what they were originally; and
• they make the interpretation of results
difficult.
Nonparametric Statistics
• Classic nonparametric tests are not robust when
used to analyze heteroscedastic data.
• Classic nonparametric tests are only
appropriate for analyzing simple, one-way
layouts.
Modern robust methods (which include modern
nonparametric procedures) are not susceptible
to these limitations.
Misconceptions About Modern Methods
• One misconception is that software to perform
modern statistical analyses is not readily
available.
Fortunately, proponents of modern methods have
created special software add-ons that allow
analyses using SPSS and SAS. Furthermore, a vast
array of alternative, free software such as R is
available that can conduct modern analyses.
• Another misconception is that the trimming or
ranking procedures some modern methods employ
discard valuable information.
ROBUST MEASURES OF CENTRAL
TENDENCY
Trimmed Mean
- An alternative to the median when dealing with distributions
with rare (few) outliers.
Definition: An α-trimmed mean is given as

\bar{X}_t = \frac{1}{n - 2g}\left(X_{(g+1)} + X_{(g+2)} + \cdots + X_{(n-g)}\right)

where 0 ≤ α ≤ 0.5; g = αn, rounded down to the nearest whole
number; and X_{(i)} denotes the ith ordered value.
• The value α = 0.2 is often used because under normality
the trimmed mean performs nearly as well as the mean.
• When α = 0.5 the trimmed mean is the median.
Example:
Given the following data, obtain the 20%
trimmed mean
3, 1, 75, 10, 5, 6, 11, 7, 75, 12.
Solution: Rearrange the data to give
1, 3, 5, 6, 7, 10, 11, 12, 75, 75.
Remove the lowest and highest 20% of the
values to give
5, 6, 7, 10, 11, 12.
The trimmed mean is then (5 + 6 + 7 + 10 + 11 + 12)/6 = 8.5.
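For reference, the same figure can be reproduced with SciPy's built-in trimmed mean; this is just a sketch of one available tool (`proportiontocut` plays the role of α above).

```python
from scipy import stats

data = [3, 1, 75, 10, 5, 6, 11, 7, 75, 12]
# trim_mean sorts internally, drops 20% from each tail, and averages the rest
print(stats.trim_mean(data, proportiontocut=0.2))  # 8.5
```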
Winsorized Mean
Procedure:
Given 3, 1, 75, 10, 5, 6, 11, 7, 75, 12.
Reorder the scores from lowest to highest:
1, 3, 5, 6, 7, 10, 11, 12, 75, 75.
For 20% trimming, remove the lowest and highest 20% of scores
 the scores 1, 3, 75, and 75 will be removed,
 Leaving 5, 6, 7, 10, 11, 12.
Next, replace the removed scores in the lower tail of the
distribution by the smallest untrimmed score, and the removed
scores in the upper tail of the distribution by the highest
untrimmed score.
The retained and replacement scores are known as Winsorized
scores.
=> Winsorized scores are 5, 5, 5, 6, 7, 10, 11, 12, 12, 12.
The Winsorized mean is the ordinary mean of these scores:
85/10 = 8.5.
ROBUST MEASURES OF VARIATION
Winsorized Variance
Using the Winsorized scores
5, 5, 5, 6, 7, 10, 11, 12, 12, 12, with mean 8.5:
The variance of the Winsorized scores is calculated by
applying the conventional variance formula to the
Winsorized scores and the Winsorized mean.
The sum of squared deviations is 90.5, so the Winsorized
variance is 90.5/9 ≈ 10.06 (using the n − 1 divisor).
Note: Any software program that can calculate a variance
can also be used to calculate the Winsorized variance.
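A short sketch reproducing the Winsorized scores, mean, and variance with SciPy (`ddof=1` applies the conventional n − 1 divisor):

```python
import numpy as np
from scipy.stats import mstats

data = np.array([3, 1, 75, 10, 5, 6, 11, 7, 75, 12])
# Replace the lowest and highest 20% of scores by the nearest retained scores
w = np.asarray(mstats.winsorize(np.sort(data), limits=(0.2, 0.2)))
print(w)               # [ 5  5  5  6  7 10 11 12 12 12]
print(w.mean())        # 8.5   (Winsorized mean)
print(w.var(ddof=1))   # ~10.06 = 90.5 / 9  (Winsorized variance)
```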
Median Absolute Deviation (MAD)
For a data set X1, X2, ..., Xn, the MAD is defined
as the median of the absolute deviations from
the data’s median (sometimes abbreviated MADAM,
the median absolute deviation about the median),
i.e., MAD = medianᵢ(|Xᵢ − medianⱼ(Xⱼ)|).
Trimmed range (IQR)
The IQR is the difference between the 75th
percentile and the 25th percentile of a sample.
This is the same as the 25% trimmed range.
Other trimmed ranges include the interdecile
range (10% trimmed range).
The median/MAD method
Procedure (illustrated with an example):
4.2 4.5 4.9 5.2 5.6 6.2 9.9
 Take the median (m = 5.2) as a robust estimate of location.
 Calculate the differences between the values and the median:
-1.0 -0.7 -0.3 0.0 0.4 1.0 4.7
 Find the median of the absolute differences (the MAD):
0.0 0.3 0.4 0.7 1.0 1.0 4.7
i.e., MAD = 0.7.
 Obtain the robust standard deviation estimate by multiplying
the MAD by a consistency factor, 1.4826 under normality, often
rounded to 1.5. That is, s = 1.5 × 0.7 = 1.05.
Note: if there are no outliers, the method provides estimates
that are close to the conventional 𝑥 and s.
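The whole method takes a few lines of code; a sketch on the same data, using the rounded factor 1.5 as in the slides:

```python
import numpy as np

x = np.array([4.2, 4.5, 4.9, 5.2, 5.6, 6.2, 9.9])
m = np.median(x)                  # robust location: 5.2
mad = np.median(np.abs(x - m))    # MAD: 0.7
s = 1.5 * mad                     # robust scale: 1.05 (1.4826 * mad is exact)
print(m, mad, s)
```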
Huber’s method
An iterative method based on winsorization.
Procedure:
 Assume initial estimates m0 and s0 (e.g., the median/MAD
estimates, or simply 𝑥 and s).
 If a value xi falls above m0 + 1.5s0, change it to
x’i = m0 + 1.5s0; if it falls below m0 - 1.5s0, change it to
x’i = m0 - 1.5s0. Otherwise, let x’i = xi.
 Calculate an improved estimate of the mean as
m = mean(x’i), and of the standard deviation as s = 1.134 ×
stdev(x’i). (The factor 1.134 is derived from the normal
distribution, given the value 1.5 for the multiplier most
often used in the winsorization process.)
Example:
Using the earlier data, assume m0 = 5.2, s0 = 1.05.
• No value is below m0 - 1.5s0, and 9.9 is the only
value above m0 + 1.5s0.
• This high value is replaced by m0 + 1.5s0 = 6.775.
• The dataset becomes
4.2 4.5 4.9 5.2 5.6 6.2 6.775
The improved estimates are m1 = 5.34 and s1 = 1.04.
• The procedure is now iterated, using the
current improved estimates for the winsorization
at each cycle.
• Check that the values converge to mhub =
5.36, shub = 1.15.
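A minimal sketch of the iteration, re-winsorizing the original data with the current estimates at each cycle (one of several reasonable conventions; the exact converged values depend slightly on such implementation details, so small differences from the quoted 5.36 and 1.15 are expected):

```python
import numpy as np

x = np.array([4.2, 4.5, 4.9, 5.2, 5.6, 6.2, 9.9])
m = np.median(x)                                 # m0 = 5.2
s = 1.5 * np.median(np.abs(x - m))               # s0 = 1.05 (median/MAD start)
for _ in range(100):
    xw = np.clip(x, m - 1.5 * s, m + 1.5 * s)    # winsorize at m +/- 1.5s
    m_new, s_new = xw.mean(), 1.134 * xw.std(ddof=1)
    if abs(m_new - m) < 1e-9 and abs(s_new - s) < 1e-9:
        break                                    # converged
    m, s = m_new, s_new
print(round(m, 2), round(s, 2))  # near the slides' mhub = 5.36, shub = 1.15
```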
ROBUST REGRESSION
Robust regression methods are designed not to
be unduly affected by:
Heteroscedastic errors
Presence of outliers
Note: In many situations, including some areas of
geostatistics and medical statistics, it is precisely the
outliers that are of interest, so they cannot simply be
removed from the dataset.
METHODS FOR ROBUST REGRESSION
Least squares alternatives
One of the simplest methods of estimating parameters in a
regression model is least absolute deviations (LAD).
Gross outliers can still have a considerable impact on
the model.
M-estimation (by Huber)
- The most common robust method.
- The M in M-estimation stands for "maximum
likelihood type". The method is robust to outliers in
the response variable, but not resistant to outliers in
the explanatory variables (leverage points).
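In practice M-estimation is rarely coded by hand. A sketch using statsmodels' RLM with Huber's ψ on simulated data (the regression coefficients, noise level, and outlier are made up for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, size=30)
y[15] += 15                           # one gross outlier in the response

X = sm.add_constant(x)                # design matrix with intercept
ols = sm.OLS(y, X).fit()              # least squares: pulled toward the outlier
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # Huber M via IRLS
print("OLS:    ", ols.params)
print("Huber M:", rlm.params)         # much closer to (2.0, 0.5)
```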
METHODS FOR ROBUST REGRESSION (CONTD.)
S-estimation. This method finds a line (plane or
hyperplane) that minimizes a robust estimate of the scale
(from which the method gets the S in its name) of the
residuals. This method is highly resistant to leverage points,
and is robust to outliers in the response. However, this
method was also found to be inefficient.
MM-estimation attempts to retain the robustness
and resistance of S-estimation, whilst gaining the efficiency
of M-estimation. The method proceeds by finding a highly
robust and resistant S-estimate that minimizes an M-estimate
of the scale of the residuals (the first M in the
method's name). The estimated scale is then held constant
whilst a close-by M-estimate of the parameters is located
(the second M).
Least trimmed squares (LTS) is a viable
alternative (Rousseeuw and Ryan, 1997,
2008).
Theil-Sen (non-parametric) estimator has
a lower breakdown point than LTS but is
statistically efficient and popular.
M-Estimation
Consider the linear model
yi = α + β1xi1 + β2xi2 + ⋯ + βkxik + εi = x’iβ + εi.
The fitted model is
yi = a + b1xi1 + b2xi2 + ⋯ + bkxik + ei = x’ib + ei.
The general M-estimator minimizes the objective
function
\sum_{i=1}^{n} \rho(e_i) = \sum_{i=1}^{n} \rho(y_i - x_i' b)
where the function ρ gives the contribution of each
residual to the objective function.
A reasonable ρ should have the following
properties:
• ρ(e) ≥ 0
• ρ(0) = 0
• ρ(e) = ρ(−e)
• ρ(ei) ≥ ρ(ei′) for |ei| > |ei′|
For example, for least-squares estimation, ρ(ei) = ei².
Procedure
Let ψ = ρ’ be the derivative of ρ.
Differentiating the objective function with respect
to the coefficients, b, and setting the partial
derivatives to 0, produces a system of k +1
estimating equations for the coefficients:
\sum_{i=1}^{n} \psi(y_i - x_i' b)\, x_i' = 0
• Define the weight function w(e) = ψ(e)/e, and let
wi = w(ei). Then the estimating equations may be
written as
\sum_{i=1}^{n} w_i (y_i - x_i' b)\, x_i' = 0
Solving the estimating equations is a weighted
least-squares problem, minimizing \sum_{i=1}^{n} w_i e_i^2.
The weights, however, depend upon the residuals,
the residuals depend upon the estimated
coefficients, and the estimated coefficients
depend upon the weights. An iterative solution
(called iteratively reweighted least squares,
IRLS) is therefore required:
- Select initial estimates b^(0), such as the least-squares
estimates.
- At each iteration t, calculate residuals e_i^(t−1) and associated
weights w_i^(t−1) from the previous iteration.
- Solve for new weighted-least-squares estimates
b^(t) = [X’W^(t−1)X]^{−1} X’W^(t−1)y,
where X is the model matrix, with x_i’ as its ith row, and W^(t−1) =
diag{w_i^(t−1)} is the current weight matrix.
- Steps 2 and 3 are repeated until the estimated coefficients
converge.
The asymptotic covariance matrix of b is
V(b) = {E(ψ²)/[E(ψ′)]²}(X’X)⁻¹.
Using \sum [ψ(e_i)]²/n to estimate E(ψ²), and [\sum ψ′(e_i)/n]²
to estimate [E(ψ′)]², produces the estimated asymptotic
covariance matrix, 𝑉(b) (which is not reliable in small samples).
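A compact sketch of the IRLS loop just described, with Huber's ψ (the tuning constant k = 1.345 and the MAD-based rescaling of residuals are standard practical choices that the slides do not spell out):

```python
import numpy as np

def huber_w(u, k=1.345):
    """Weight function w(u) = psi(u)/u for Huber's psi."""
    au = np.abs(u)
    return np.where(au <= k, 1.0, k / np.maximum(au, 1e-12))

def irls(X, y, n_iter=100, tol=1e-10):
    b = np.linalg.lstsq(X, y, rcond=None)[0]        # b(0): least-squares start
    for _ in range(n_iter):
        e = y - X @ b                               # residuals from b(t-1)
        scale = np.median(np.abs(e)) / 0.6745       # robust residual scale
        w = huber_w(e / max(scale, 1e-12))          # weights w_i(t-1)
        WX = X * w[:, None]                         # diag{w} X without forming W
        b_new = np.linalg.solve(X.T @ WX, WX.T @ y) # weighted LS estimate b(t)
        if np.max(np.abs(b_new - b)) < tol:
            break
        b = b_new
    return b

# Example: one corrupted response value barely moves the estimates.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), np.arange(30.0)])
y = X @ np.array([1.0, 0.5]) + rng.normal(0, 0.2, size=30)
y[15] += 50
print(irls(X, y))                                   # close to [1.0, 0.5]
```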
MODERN RANK STATISTICS
Some modern rank-based procedures are
robust to non-normality and/or
heteroscedasticity:
 Rank Transform
Conover and Iman (1981) proposed a simple,
two-step procedure known as the rank
transform (RT):
(a) convert the data to ranks, and
(b) perform a standard parametric analysis on
the ranked data instead of the original scores.
(A minimal sketch appears after this list.)
 ANOVA-Type Statistic
An alternative to the RT is the ANOVA-type Statistic
(ATS) (Brunner, Domhof, & Langer, 2002; Brunner &
Puri, 2001; Shah & Madden, 2004). It is also called
the Brunner, Dette, and Munk (BDM) method.
 Wilcoxon analysis (WA)
Evaluates hypotheses analogous to those assessed
by classic parametric methods.
 Weighted Wilcoxon techniques (WW)
A modified version of WA which ensures that
analyses are robust to outliers in both the x- and
y-spaces.
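As a concrete illustration of the first of these, here is a minimal sketch of the rank transform on two made-up groups (for two groups this is closely related to the Wilcoxon rank-sum test):

```python
import numpy as np
from scipy import stats

g1 = [3.1, 4.2, 2.8, 5.0, 3.7]                    # made-up groups
g2 = [6.5, 5.9, 7.2, 6.8, 40.0]                   # note the outlier
ranks = stats.rankdata(np.concatenate([g1, g2]))  # step (a): rank all data
r1, r2 = ranks[:len(g1)], ranks[len(g1):]
print(stats.f_oneway(r1, r2))                     # step (b): ANOVA on ranks
```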
BREAKDOWN POINT
The breakdown point of an estimator is the proportion of
incorrect observations (e.g., arbitrarily large observations)
an estimator can handle before giving an incorrect result.
• For example, the arithmetic mean as an estimator has a
breakdown point of 0, because we can make 𝑋𝑛 arbitrarily
large just by changing any one of the values.
• The higher the breakdown point of an estimator, the
more robust it is. The breakdown point cannot exceed 50%
(0.5).
• For example, the median has a breakdown point of 0.5.
• The X% trimmed mean has a breakdown point of X%, for
the chosen level of X.
• Statistics with high breakdown points are sometimes
called resistant statistics.
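A toy sketch of breakdown in action (the data are made up): corrupting a single value drags the mean without bound, while the median and the 20% trimmed mean barely move.

```python
import numpy as np
from scipy import stats

x = np.array([3.0, 1, 10, 5, 6, 11, 7, 12, 9, 8])
for bad in (12, 1_000, 1_000_000):
    y = x.copy()
    y[-1] = bad                                   # corrupt one observation
    print(bad, y.mean(), np.median(y), stats.trim_mean(y, 0.2))
```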
Empirical influence function
The empirical influence function is a measure of
the dependence of the estimator on the value of
one of the points in the sample.
It is a model-free measure in the sense that it
simply relies on calculating the estimator again
with a different sample.
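A sketch of that idea: hold the sample fixed except for one point, vary that point, and recompute the estimator (the replacement values are arbitrary illustrations):

```python
import numpy as np

x = np.array([4.2, 4.5, 4.9, 5.2, 5.6, 6.2, 9.9])
for z in (0.0, 5.0, 100.0):
    y = np.append(x[:-1], z)          # replace the last point by z
    print(z, round(y.mean(), 2), np.median(y))
# The mean changes linearly in z (unbounded influence);
# the median stays essentially constant (bounded influence).
```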
Influence function and sensitivity curve
Instead of relying solely on the data, we could
use the distribution of the random variables.
What we are now trying to do is to see what
happens to an estimator when we change the
distribution of the data slightly: the influence
function assumes a distribution, and measures
sensitivity to a change in this distribution.
By contrast, the empirical influence function
assumes a sample, and measures sensitivity to a
change in the sample.
MEASURES OF ROBUSTNESS
The basic tools used to describe and measure
robustness are:
The breakdown point,
The influence function and
The sensitivity curve.
BIBLIOGRAPHY
• Acion, L., Peterson, J. J., Temple, S., & Arndt, S. (2006). Probabilistic index: An intuitive nonparametric approach to measuring the size of treatment effects. Statistics in Medicine, 25, 591–602.
• Akritas, M. G., Arnold, S. F., & Brunner, E. (1997). Nonparametric hypotheses and rank statistics for unbalanced factorial designs. Journal of the American Statistical Association, 92, 258–265.
• Algina, J., Keselman, H. J., & Penfield, R. D. (2005a). An alternative to Cohen’s standardized mean difference effect size: A robust parameter and confidence interval in the two independent groups case. Psychological Methods, 10, 317–328.
• Algina, J., Keselman, H. J., & Penfield, R. D. (2005b). Effect sizes and their intervals: The two-level repeated measures case. Educational and Psychological Measurement, 65, 241–258.
• Algina, J., Keselman, H. J., & Penfield, R. D. (2006a). Confidence interval coverage for Cohen’s effect size statistic. Educational and Psychological Measurement, 66, 945–960.
• Algina, J., Keselman, H. J., & Penfield, R. D. (2006b). Confidence intervals for an effect size when variances are not equal. Journal of Modern Applied Statistical Methods, 5, 2–13.
• American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
• Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152.
• Bradley, J. V. (1980). Nonrobustness in Z, t, and F tests at large sample sizes. Bulletin of the Psychonomic Society, 16, 333–336.
• Brunner, E., & Puri, M. L. (2001). Nonparametric methods in factorial designs. Statistical Papers, 42, 1–52.
• Crimin, K., Abebe, A., & McKean, J. W. (in press). Robust general linear models and graphics via a user interface. Journal of Modern Applied Statistical Methods. (Available from Ash Abebe at abebeas@auburn.edu or from Joe McKean at [email protected])
• D’Agostino, R. (1986). Tests for the normal distribution. In R. B. D’Agostino
& M. A. Stephens (Eds.), Goodness-of-fit techniques (pp. 367–420). New
York: Dekker.
• Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in education and
psychology (3rd ed.). Boston: Allyn & Bacon.
• Grissom, R. J. (1994). Probability of the superior outcome of one
treatment over another. Journal of Applied Psychology, 79, 314–316.
• Grissom, R. J. (2000). Heterogeneity of variance in clinical data. Journal of
Consulting and Clinical Psychology, 68, 155–165.
• Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad
practical approach. Mahwah, NJ: Erlbaum.
• Harwell, M. R., Rubinstein, E. N., Hayes, W. S., & Olds, C. C. (1992).
Summarizing Monte Carlo results in methodological research: The one
and two-factor fixed effects ANOVA cases. Journal of Educational
Statistics, 17, 315–339.
• Hettmansperger, T. P., & McKean, J. W. (1998). Robust nonparametric
statistical methods. London: Arnold.
• Higgins, J. J. (2004). Introduction to modern nonparametric statistics.
Pacific Grove, CA: Brooks/Cole.
• Keselman, H. J., Algina, J., Lix, L. M., Wilcox, R. R., & Deering, K.
(2008). A generally robust approach for testing hypotheses and
setting confidence intervals for effect sizes. Psychological
Methods, 13, 110–129.
• Kraemer, H. C., & Kupfer, D. J. (2006). Size of treatment effects
and their importance to clinical research and practice. Biological
Psychiatry, 59, 990–996.
• Kromrey, J. D., & Coughlin, K. B. (2007, November). ROBUST_ES:
A SAS macro for computing robust estimates of effect size. Paper
presented at the annual meeting of the SouthEast SAS Users
Group, Hilton Head, SC. Retrieved from
http://analytics.ncsu.edu/sesug/2007/PO19.pdf
• Leech, N. L., & Onwuegbuzie, A. J. (2002, November). A call for
greater use of nonparametric statistics. Paper presented at the
Annual Meeting of the Mid-South Educational Research
Association. Retrieved from
http://www.eric.ed.gov/ERICWebPortal/contentdelivery/servlet/
ERICServlet?accno=ED471346
• Lix, L. M., Keselman, J. C., & Keselman, H. J. (1996).
Consequences of assumption violations revisited: A
quantitative review of alternatives to the one-way
analysis of variance “F” test. Review of Educational
Research, 66, 579–619.
• McGraw, K. O., & Wong, S. P. (1992). A common
language effect size statistic. Psychological Bulletin,
111, 361–365.
• McKean, J. W. (2004). Robust analysis of linear models.
Statistical Science, 19, 562–570.
• Micceri, T. (1989). The unicorn, the normal curve, and
other improbable creatures. Psychological Bulletin,
105, 156–166.
• Miller, J. (1988). A warning about median reaction
time. Journal of Experimental Psychology: Human
Perception and Performance, 14, 539–543.
• Ramsey, P. H. (1980). Exact Type I error rates for robustness of Student’s t
test with unequal variances. Journal of Educational Statistics, 5, 337–349.
• Sawilowsky, S. S., & Blair, R. C. (1992). A more realistic look at the
robustness and Type II error properties of the t test to departures from
population normality. Psychological Bulletin, 111, 352–360.
• Serlin, R. C., & Harwell, M. R. (2004). More powerful tests of predictor
subsets in regression analysis under nonnormality. Psychological Methods,
9, 492–509.
• Terpstra, J. T., & McKean, J. W. (2005). Rank-based analyses of linear
models using R. Journal of Statistical Software, 14. Retrieved from
http://www.jstatsoft.org/v14/i07
• Toothaker, L. E., & Newman, D. (1994). Nonparametric competitors to the
two-way ANOVA. Journal of Educational and Behavioral Statistics, 19, 237–
273.
• Wilcox, R. R. (1998). How many discoveries have been lost by ignoring
modern statistical methods? American Psychologist, 53, 300–314.
• Wilcox, R. R. (2001). Fundamentals of modern statistical methods. New
York: Springer.
• Wilcox, R. R. (2003). Applying contemporary statistical techniques. San
Diego, CA: Academic Press.
• Wilcox, R. R. (2005). Introduction to robust estimation and
hypothesis testing (2nd ed.). San Diego, CA: Academic Press.
• Wilcox, R. R., Charlin, V. L., & Thompson, K. L. (1986). New
Monte Carlo results on the robustness of the ANOVA F, W and
F* statistics. Communications in Statistics: Simulation and
Computation, 15, 933–943.
• Wilcox, R. R., & Keselman, H. J. (2003). Modern robust data
analysis methods: Measures of central tendency.
Psychological Methods, 8, 254–274.
• Zimmerman, D. W. (1998). Invalidation of parametric and
nonparametric statistical tests by concurrent violation of two
assumptions. Journal of Experimental Education, 67, 55–68.
• Zimmerman, D. W. (2000). Statistical significance levels of
nonparametric tests biased by heterogeneous variances of
treatment groups. Journal of General Psychology, 127, 354–
364.
THANK YOU FOR LISTENING