ROBUST STATISTICAL METHODS: A VIABLE ALTERNATIVE?
BY PROF. B.A. OYEJOLA
Department of Statistics, University of Ilorin

INTRODUCTION
Classic parametric tests produce accurate results when the assumptions underlying them are sufficiently satisfied. Violations lead to:
- inaccurate calculation of p values,
- increased risk of falsely rejecting the null hypothesis (i.e., concluding that real effects exist when they do not),
- loss of power to detect genuine effects,
- inaccurately estimated measures of effect size (e.g., Cohen's d) and confidence intervals, and
- errors in the interpretation of data.
(See Keselman et al., 1998; Leech & Onwuegbuzie, 2002; Wilcox, 2001; Zimmerman, 1998.)

PROBLEMS WITH CLASSIC PARAMETRIC METHODS
Classic parametric methods are based on certain assumptions.
1. The data being analyzed are normally distributed.
"Normality is a myth; there never was, and never will be, a normal distribution." Few real distributions remotely resemble the normal curve. Instead, distributions are frequently multimodal, skewed, and heavy-tailed, and studies have indicated that real data are more likely to resemble the exponential distribution.
2. Equal population variances (homogeneity of variance, or homoscedasticity).
When classic parametric tests are used to analyze non-normal or heteroscedastic data, the true risk of making a Type I error may be much higher (or lower) than the obtained p value suggests. Equal sample sizes do not always protect against inflated Type I error when variances are heterogeneous (Harwell, Rubinstein, Hayes, & Olds, 1992). The probability of a Type I error when testing at α = 0.05 can exceed 50% when data are non-normal and heteroscedastic (Wilcox, 2003).

Definition
The term robust statistics refers to procedures that maintain the Type I error rate of a test at its nominal level and also maintain the power of the test, even when data are non-normal and heteroscedastic.

WHY ARE MODERN METHODS UNDERUSED?
Lack of Familiarity With Modern Methods
• This is largely due to lack of exposure to the new methods. The field of statistics has progressed markedly since 1960, yet most researchers and many statisticians rely on outdated methods.
Assumption Testing Issues
• Researchers frequently fail to check whether the data they are analyzing meet the assumptions underlying classic parametric tests. This may be due to forgetfulness or not knowing how to check the assumptions.
• Statistical assumption tests built into software such as SPSS often do a poor job of detecting violations of normality and homoscedasticity. For example, Levene's test, often used to test the homoscedasticity assumption, can yield a p value greater than α even when variances are unequal to a degree that could substantially affect the results of a classic parametric test.
• Another problem is that assumption tests have their own assumptions: normality tests usually assume that data are homoscedastic, while tests of homoscedasticity assume that data are normally distributed.
The Robustness Argument
• Researchers often claim that classic parametric tests are robust (i.e., that the tests maintain rates of Type I error close to the nominal level). Note, however, that robust statistics control Type I error and also maintain adequate statistical power.
• Even if researchers insist that classic parametric tests are robust, this does not preclude the use of alternative procedures. Modern methods are also robust, and they are more powerful when data are not normally distributed and/or heteroscedastic.
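The Type I error claims above are easy to check by simulation. The following is an illustrative sketch, not part of the original slides: it estimates the empirical Type I error rate of the classic (pooled) t-test when the two groups have equal means but are skewed and heteroscedastic with unequal sample sizes. The distributions, sample sizes, and number of replications are arbitrary choices made for illustration.

```python
# Illustrative Monte Carlo sketch (assumed setup, not from the slides).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, alpha = 10_000, 0.05
n1, n2 = 40, 15                 # the smaller group is given the larger variance
rejections = 0

for _ in range(n_sims):
    g1 = rng.normal(loc=0.0, scale=1.0, size=n1)        # mean 0, variance 1
    g2 = rng.exponential(scale=4.0, size=n2) - 4.0       # mean 0, variance 16, skewed
    _, p = stats.ttest_ind(g1, g2, equal_var=True)       # classic pooled t-test
    rejections += p < alpha

print(f"Empirical Type I error at nominal alpha = {alpha}: {rejections / n_sims:.3f}")
# Both population means are 0, so every rejection is a false positive; under
# these conditions the empirical rate is typically well above the nominal 5%.
```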
Transformations
The use of transformations is problematic for several reasons, including:
• transformations often fail to restore normality and homoscedasticity;
• they do not deal with outliers;
• they can reduce power;
• they sometimes rearrange the order of the means from what they were originally; and
• they make the interpretation of results difficult.

Nonparametric Statistics
• Classic nonparametric tests are not robust when used to analyze heteroscedastic data.
• Classic nonparametric tests are only appropriate for analyzing simple, one-way layouts.
Modern robust methods (which include modern nonparametric procedures) are not susceptible to these limitations.

Misconceptions About Modern Methods
• One misconception is that software to perform modern statistical analyses is not readily available. Fortunately, proponents of modern methods have created special add-ons that allow such analyses in SPSS and SAS, and a vast array of free software such as R can conduct modern analyses.
• Another misconception is that modern methods, because they sometimes involve trimming or ranking, discard valuable information.

ROBUST MEASURES OF CENTRAL TENDENCY
Trimmed Mean
• An alternative to the median when dealing with distributions with rare (few) outliers.
Definition: the α-trimmed mean is
X̄_t = (1/(n − 2g)) (X_(g+1) + X_(g+2) + ⋯ + X_(n−g)),
where 0 ≤ α ≤ 0.5, g = αn (rounded up to a whole number), and X_(1) ≤ X_(2) ≤ ⋯ ≤ X_(n) are the ordered values.
• The value α = 0.2 is often used because under normality the trimmed mean performs nearly as well as the mean.
• When α = 0.5 the trimmed mean is the median.
Example: Given the data 3, 1, 75, 10, 5, 6, 11, 7, 75, 12, obtain the 20% trimmed mean.
Solution: Reorder the data to give 1, 3, 5, 6, 7, 10, 11, 12, 75, 75. Remove the lowest and highest 20% of the values to give 5, 6, 7, 10, 11, 12. The trimmed mean is then 8.5.

Winsorized Mean
Procedure: Given 3, 1, 75, 10, 5, 6, 11, 7, 75, 12, reorder the scores from lowest to highest: 1, 3, 5, 6, 7, 10, 11, 12, 75, 75. For 20% trimming, the lowest and highest 20% of scores (1, 3, 75, and 75) would be removed, leaving 5, 6, 7, 10, 11, 12. Instead, replace the removed scores in the lower tail of the distribution by the smallest untrimmed score, and the removed scores in the upper tail by the largest untrimmed score. The resulting scores are known as Winsorized scores: 5, 5, 5, 6, 7, 10, 11, 12, 12, 12. Their mean, the Winsorized mean, is 8.5.

ROBUST MEASURES OF VARIATION
Winsorized Variance
Using the Winsorized scores 5, 5, 5, 6, 7, 10, 11, 12, 12, 12, with Winsorized mean 8.5, the Winsorized variance is calculated with the conventional variance formula applied to the Winsorized scores and the Winsorized mean. The sum of squared deviations from the Winsorized mean is 90.5, so the Winsorized sample variance is 90.5/9 ≈ 10.06.
Note: any software program that can calculate a variance can also be used to calculate a Winsorized variance.

Median Absolute Deviation (MAD)
For a data set X_1, X_2, ..., X_n, the MAD is defined as the median of the absolute deviations from the data's median (sometimes written MADAM), i.e.
MAD = median_i ( |X_i − median_j(X_j)| ).
That is, the MAD is the median of the absolute values of the deviations from the median.

Trimmed range (IQR)
The IQR is the difference between the 75th percentile and the 25th percentile of a sample. This is the same as the 25% trimmed range. Other trimmed ranges include the interdecile range (the 10% trimmed range).
A code sketch computing these robust estimates for the example data appears below.
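The sketch below is not part of the original slides; it reproduces the worked examples above using SciPy's built-in robust estimators (variable names are illustrative).

```python
# Minimal sketch reproducing the trimmed-mean, Winsorized and MAD examples.
import numpy as np
from scipy import stats
from scipy.stats import mstats

x = np.array([3, 1, 75, 10, 5, 6, 11, 7, 75, 12], dtype=float)

# 20% trimmed mean: drops the lowest and highest 20% of the ordered values.
print(stats.trim_mean(x, proportiontocut=0.2))           # 8.5

# 20% Winsorized scores: extreme values are replaced rather than removed.
xw = np.asarray(mstats.winsorize(x, limits=(0.2, 0.2)), dtype=float)
print(np.sort(xw))                                        # 5 5 5 6 7 10 11 12 12 12
print(xw.mean(), xw.var(ddof=1))                          # Winsorized mean 8.5, variance ~10.06

# Median absolute deviation and IQR, shown for the data used in the
# median/MAD example on the next slide.
y = np.array([4.2, 4.5, 4.9, 5.2, 5.6, 6.2, 9.9])
print(stats.median_abs_deviation(y))                      # MAD = 0.7
print(stats.median_abs_deviation(y, scale="normal"))      # ~1.04, a robust estimate of s
print(stats.iqr(y))                                       # interquartile range (~1.2 here)
```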
The median/MAD method
Procedure (example): 4.2, 4.5, 4.9, 5.2, 5.6, 6.2, 9.9.
• Take the median (m = 5.2) as a robust estimator of the mean.
• Calculate the differences between the values and the median: −1.0, −0.7, −0.3, 0.0, 0.4, 1.0, 4.7.
• Find the median of the absolute differences: 0.0, 0.3, 0.4, 0.7, 1.0, 1.0, 4.7, i.e. MAD = 0.7.
• Obtain the robust standard deviation estimate by multiplying the MAD by a factor (≈1.4826, a value close to 1.5), giving s = 1.05.
Note: if there are no outliers, the method provides estimates that are close to the conventional x̄ and s.

Huber's method
An iterative method based on winsorization.
Procedure:
• Assume initial estimates m0 and s0 (the median/MAD estimates, or simply x̄ and s).
• If a value x_i falls above m0 + 1.5 s0, change it to x'_i = m0 + 1.5 s0; if it falls below m0 − 1.5 s0, change it to x'_i = m0 − 1.5 s0. Otherwise, let x'_i = x_i.
• Calculate an improved estimate of the mean as m = mean(x'_i), and of the standard deviation as s' = 1.134 × stdev(x'_i). (The factor 1.134 is derived from the normal distribution, given the value 1.5 for the multiplier most often used in the winsorization step.)
Example: Using the earlier data, take m0 = 5.2 and s0 = 1.05.
• No value lies below m0 − 1.5 s0, and 9.9 is the only value above m0 + 1.5 s0.
• This high value is replaced by m0 + 1.5 s0 = 6.775, and the data set becomes 4.2, 4.5, 4.9, 5.2, 5.6, 6.2, 6.775.
• The improved estimates are m1 = 5.34 and s1 = 1.04.
• The procedure is iterated, using the current improved estimates for the winsorization at each cycle. Check that the values converge to m_hub = 5.36 and s_hub = 1.15.
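The iteration described above is straightforward to program. Below is a minimal sketch, not part of the original slides; the function name, tolerance, and stopping rule are illustrative choices. With the example data it converges to roughly m ≈ 5.39 and s ≈ 1.15, close to the values quoted above (small differences can arise from exactly how the scale is updated at each cycle).

```python
# Huber-style iterative winsorization for a robust mean and standard deviation.
import numpy as np

def huber_location_scale(x, k=1.5, c=1.134, tol=1e-6, max_iter=100):
    """Iteratively winsorize x at m +/- k*s and update (m, s) until convergence."""
    x = np.asarray(x, dtype=float)
    m = np.median(x)
    s = 1.5 * np.median(np.abs(x - m))          # MAD-based starting scale, as in the slides
    for _ in range(max_iter):
        x_w = np.clip(x, m - k * s, m + k * s)  # winsorize at the current limits
        m_new = x_w.mean()
        s_new = c * x_w.std(ddof=1)             # 1.134 correction factor from the slides
        if abs(m_new - m) < tol and abs(s_new - s) < tol:
            return m_new, s_new
        m, s = m_new, s_new
    return m, s

data = [4.2, 4.5, 4.9, 5.2, 5.6, 6.2, 9.9]
print(huber_location_scale(data))   # roughly (5.39, 1.15) with this implementation
```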
ROBUST REGRESSION
Robust regression methods are designed not to be unduly affected by:
• heteroscedastic errors, and
• the presence of outliers.
Note: in many situations, including some areas of geostatistics and medical statistics, it is precisely the outliers that are of interest, so they cannot simply be removed from the dataset.

METHODS FOR ROBUST REGRESSION
Least squares alternatives
• The simplest alternative method of estimating the parameters of a regression model is least absolute deviations. Gross outliers can still have a considerable impact on the model.
M-estimation (Huber)
• The most common robust method. The M in M-estimation stands for "maximum likelihood type". The method is robust to outliers in the response variable, but not resistant to outliers in the explanatory variables (leverage points).
S-estimation
• This method finds a line (plane or hyperplane) that minimizes a robust estimate of the scale of the residuals (from which the method gets the S in its name). It is highly resistant to leverage points and robust to outliers in the response; however, it is inefficient.
MM-estimation
• Attempts to retain the robustness and resistance of S-estimation while gaining the efficiency of M-estimation. The method proceeds by finding a highly robust and resistant S-estimate that minimizes an M-estimate of the scale of the residuals (the first M in the method's name). The estimated scale is then held constant while a nearby M-estimate of the parameters is located (the second M).
Other estimators
• Least trimmed squares (LTS) is a viable alternative (Rousseeuw and Ryan, 1997, 2008).
• The Theil-Sen (nonparametric) estimator has a lower breakdown point than LTS but is statistically efficient and popular.

M-Estimation
Consider the linear model
y_i = α + β_1 x_i1 + β_2 x_i2 + ⋯ + β_k x_ik + ε_i = x_i'β + ε_i.
The fitted model is
y_i = a + b_1 x_i1 + b_2 x_i2 + ⋯ + b_k x_ik + e_i = x_i'b + e_i.
The general M-estimator minimizes the objective function
Σ_{i=1}^{n} ρ(e_i) = Σ_{i=1}^{n} ρ(y_i − x_i'b),
where the function ρ gives the contribution of each residual to the objective function. A reasonable ρ should have the following properties:
• ρ(e) ≥ 0,
• ρ(0) = 0,
• ρ(e) = ρ(−e),
• ρ(e_i) ≥ ρ(e_i') for |e_i| > |e_i'|.
For example, for least-squares estimation, ρ(e_i) = e_i².

Procedure
Let ψ = ρ′ be the derivative of ρ. Differentiating the objective function with respect to the coefficients b and setting the partial derivatives to 0 produces a system of k + 1 estimating equations for the coefficients:
Σ_{i=1}^{n} ψ(y_i − x_i'b) x_i' = 0.
• Define the weight function w(e) = ψ(e)/e, and let w_i = w(e_i). Then the estimating equations may be written as
Σ_{i=1}^{n} w_i (y_i − x_i'b) x_i' = 0.
Solving the estimating equations is a weighted least-squares problem, minimizing Σ w_i e_i². The weights, however, depend on the residuals, the residuals depend on the estimated coefficients, and the estimated coefficients depend on the weights. An iterative solution (called iteratively reweighted least squares, IRLS) is therefore required:
1. Select initial estimates b^(0), such as the least-squares estimates.
2. At each iteration t, calculate residuals e_i^(t−1) and associated weights w_i^(t−1) from the previous iteration.
3. Solve for new weighted-least-squares estimates
b^(t) = [X'W^(t−1) X]^{−1} X'W^(t−1) y,
where X is the model matrix, with x_i' as its ith row, and W^(t−1) = diag{w_i^(t−1)} is the current weight matrix.
4. Repeat steps 2 and 3 until the estimated coefficients converge.
The asymptotic covariance matrix of b is
V(b) = {E(ψ²)/[E(ψ′)]²} (X'X)^{−1}.
Using Σ[ψ(e_i)]²/n to estimate E(ψ²), and [Σψ′(e_i)/n]² to estimate [E(ψ′)]², produces the estimated asymptotic covariance matrix V̂(b) (which is not reliable in small samples).
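As an illustration of M-estimation in practice, the sketch below (not part of the original slides) fits a Huber M-estimator by IRLS using the RLM class in the Python package statsmodels and compares it with ordinary least squares on simulated data containing a few outliers in the response. The data, seed, and contamination scheme are arbitrary choices for illustration.

```python
# Huber M-estimation via IRLS (statsmodels RLM) versus ordinary least squares.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)     # true intercept 2, true slope 0.5
y[:5] += 25                                  # contaminate a few responses with outliers

X = sm.add_constant(x)                       # model matrix with an intercept column
ols_fit = sm.OLS(y, X).fit()
rlm_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()   # IRLS with Huber weights

print("OLS estimates:", ols_fit.params)      # pulled toward the outliers
print("Huber M-estimates:", rlm_fit.params)  # typically much closer to the true (2, 0.5)
print("Final IRLS weights of the contaminated points:", rlm_fit.weights[:5])
```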
MODERN RANK STATISTICS
Some modern rank-based procedures are robust to non-normality and/or heteroscedasticity.
Rank transform (RT)
• Conover and Iman (1981) proposed a simple two-step procedure known as the rank transform: (a) convert the data to ranks and (b) perform a standard parametric analysis on the ranked data instead of the original scores.
ANOVA-type statistic (ATS)
• An alternative to the RT is the ANOVA-type statistic (Brunner, Domhof, & Langer, 2002; Brunner & Puri, 2001; Shah & Madden, 2004), also called the Brunner, Dette, and Munk (BDM) method.
Wilcoxon analysis (WA)
• Evaluates hypotheses analogous to those assessed by classic parametric methods.
Weighted Wilcoxon techniques (WW)
• A modified version of WA that ensures analyses are robust to outliers in both the x- and y-spaces.

BREAKDOWN POINT
The breakdown point of an estimator is the proportion of incorrect observations (e.g. arbitrarily large observations) the estimator can handle before giving an arbitrarily incorrect result.
• For example, the arithmetic mean has a breakdown point of 0, because X̄_n can be made arbitrarily large by changing just one of the values (a small code sketch at the end of this section illustrates this).
• The higher the breakdown point of an estimator, the more robust it is. The breakdown point cannot exceed 50% (0.5).
• For example, the median has a breakdown point of 0.5, and the X% trimmed mean has a breakdown point of X%, for the chosen level of X.
• Statistics with high breakdown points are sometimes called resistant statistics.

Empirical influence function
The empirical influence function is a measure of the dependence of the estimator on the value of one of the points in the sample. It is a model-free measure in the sense that it simply relies on calculating the estimator again with a modified sample.

Influence function and sensitivity curve
Instead of relying solely on the data, we can use the distribution of the random variables: the influence function examines what happens to an estimator when the distribution of the data is changed slightly. It assumes a distribution and measures sensitivity to a change in this distribution. By contrast, the empirical influence function assumes a sample and measures sensitivity to a change in the sample.

MEASURES OF ROBUSTNESS
The basic tools used to describe and measure robustness are:
• the breakdown point,
• the influence function, and
• the sensitivity curve.
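The minimal sketch below is not part of the original slides; it illustrates the breakdown point and the empirical influence of a single observation by corrupting one value: the mean is dragged arbitrarily far, while the median and the 20% trimmed mean barely move. It reuses the data from the median/MAD example.

```python
# Breakdown point illustration: corrupt one observation and recompute estimators.
import numpy as np
from scipy import stats

x = np.array([4.2, 4.5, 4.9, 5.2, 5.6, 6.2, 9.9])

for bad in [9.9, 100.0, 10_000.0]:           # make the largest value ever more extreme
    x_mod = x.copy()
    x_mod[-1] = bad
    print(f"largest value = {bad:>8}: "
          f"mean = {x_mod.mean():10.2f}, "
          f"median = {np.median(x_mod):.2f}, "
          f"20% trimmed mean = {stats.trim_mean(x_mod, 0.2):.2f}")

# The mean grows without bound (breakdown point 0), while the median and the
# trimmed mean stay near 5, reflecting their positive breakdown points.
```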
BIBLIOGRAPHY
• Acion, L., Peterson, J. J., Temple, S., & Ardnt, S. (2006). Probabilistic index: An intuitive nonparametric approach to measuring the size of treatment effects. Statistics in Medicine, 25, 591–602.
• Akritas, M. G., Arnold, S. F., & Brunner, E. (1997). Nonparametric hypotheses and rank statistics for unbalanced factorial designs. Journal of the American Statistical Association, 92, 258–265.
• Algina, J., Keselman, H. J., & Penfield, R. D. (2005a). An alternative to Cohen's standardized mean difference effect size: A robust parameter and confidence interval in the two independent groups case. Psychological Methods, 10, 317–328.
• Algina, J., Keselman, H., & Penfield, R. (2005b). Effect sizes and their intervals: The two-level repeated measures case. Educational and Psychological Measurement, 65, 241–258.
• Algina, J., Keselman, H. J., & Penfield, R. D. (2006a). Confidence interval coverage for Cohen's effect size statistic. Educational and Psychological Measurement, 66, 945–960.
• Algina, J., Keselman, H., & Penfield, R. (2006b). Confidence intervals for an effect size when variances are not equal. Journal of Modern Applied Statistical Methods, 5, 2–13.
• American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
• Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152.
• Bradley, J. V. (1980). Nonrobustness in Z, t, and F tests at large sample sizes. Bulletin of the Psychonomic Society, 16, 333–336.
• Brunner, E., & Puri, M. L. (2001). Nonparametric methods in factorial designs. Statistical Papers, 42, 1–52.
• Crimin, K., Abebe, A., & McKean, J. W. (in press). Robust general linear models and graphics via a user interface. Journal of Modern Applied Statistical Methods. (Available from Ash Abebe at abebeas@auburn.edu or from Joe McKean at [email protected])
• D'Agostino, R. (1986). Tests for the normal distribution. In R. B. D'Agostino & M. A. Stephens (Eds.), Goodness-of-fit techniques (pp. 367–420). New York: Dekker.
• Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.). Boston: Allyn & Bacon.
• Grissom, R. J. (1994). Probability of the superior outcome of one treatment over another. Journal of Applied Psychology, 79, 314–316.
• Grissom, R. J. (2000). Heterogeneity of variance in clinical data. Journal of Consulting and Clinical Psychology, 68, 155–165.
• Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Erlbaum.
• Harwell, M. R., Rubinstein, E. N., Hayes, W. S., & Olds, C. C. (1992). Summarizing Monte Carlo results in methodological research: The one- and two-factor fixed effects ANOVA cases. Journal of Educational Statistics, 17, 315–339.
• Hettmansperger, T. P., & McKean, J. W. (1998). Robust nonparametric statistical methods. London: Arnold.
• Higgins, J. J. (2004). Introduction to modern nonparametric statistics. Pacific Grove, CA: Brooks/Cole.
• Keselman, H. J., Algina, J., Lix, L. M., Wilcox, R. R., & Deering, K. (2008). A generally robust approach for testing hypotheses and setting confidence intervals for effect sizes. Psychological Methods, 13, 110–129.
• Kraemer, H. C., & Kupfer, D. J. (2006). Size of treatment effects and their importance to clinical research and practice. Biological Psychiatry, 59, 990–996.
• Kromrey, J. D., & Coughlin, K. B. (2007, November). ROBUST_ES: A SAS macro for computing robust estimates of effect size. Paper presented at the annual meeting of the SouthEast SAS Users Group, Hilton Head, SC. Retrieved from http://analytics.ncsu.edu/sesug/2007/PO19.pdf
• Leech, N. L., & Onwuegbuzie, A. J. (2002, November). A call for greater use of nonparametric statistics. Paper presented at the Annual Meeting of the Mid-South Educational Research Association. Retrieved from http://www.eric.ed.gov/ERICWebPortal/contentdelivery/servlet/ERICServlet?accno_ED471346
• Lix, L. M., Keselman, J. C., & Keselman, H. J. (1996). Consequences of assumption violations revisited: A quantitative review of alternatives to the one-way analysis of variance "F" test. Review of Educational Research, 66, 579–619.
• McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361–365.
• McKean, J. W. (2004). Robust analysis of linear models. Statistical Science, 19, 562–570.
• Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.
• Miller, J. (1988). A warning about median reaction time. Journal of Experimental Psychology: Human Perception and Performance, 14, 539–543.
• Ramsey, P. H. (1980). Exact Type I error rates for robustness of Student's t test with unequal variances. Journal of Educational Statistics, 5, 337–349.
• Sawilowsky, S. S., & Blair, R. C. (1992). A more realistic look at the robustness and Type II error properties of the t test to departures from population normality. Psychological Bulletin, 111, 352–360.
• Serlin, R. C., & Harwell, M. R. (2004). More powerful tests of predictor subsets in regression analysis under nonnormality. Psychological Methods, 9, 492–509.
• Terpstra, J. T., & McKean, J. W. (2005). Rank-based analyses of linear models using R. Journal of Statistical Software, 14. Retrieved from http://www.jstatsoft.org/v14/i07
• Toothaker, L. E., & Newman, D. (1994). Nonparametric competitors to the two-way ANOVA. Journal of Educational and Behavioral Statistics, 19, 237–273.
• Wilcox, R. R. (1998). How many discoveries have been lost by ignoring modern statistical methods? American Psychologist, 53, 300–314.
• Wilcox, R. R. (2001). Fundamentals of modern statistical methods. New York: Springer.
• Wilcox, R. R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press.
• Wilcox, R. R. (2005). Introduction to robust estimation and hypothesis testing (2nd ed.). San Diego, CA: Academic Press.
• Wilcox, R. R., Charlin, V. L., & Thompson, K. L. (1986). New Monte Carlo results on the robustness of the ANOVA f, w and f statistics. Communications in Statistics: Simulation and Computation, 15, 933–943.
• Wilcox, R. R., & Keselman, H. J. (2003). Modern robust data analysis methods: Measures of central tendency. Psychological Methods, 8,254–274. • Zimmerman, D. W. (1998). Invalidation of parametric and nonparametric statistical tests by concurrent violation of two assumptions. Journal of Experimental Education, 67, 55–68. • Zimmerman, D. W. (2000). Statistical significance levels of nonparametric tests biased by heterogeneous variances of treatment groups. Journal of General Psychology, 127, 354– 364. THANK YOU FOR LISTENING