Chapter 11 Tests of Comparison
Copyright © 2011 Wolters Kluwer Health | Lippincott Williams & Wilkins

Chapter Overview
• Statistical procedures used to test hypotheses and investigate differences between groups or within groups across time.
• Types of data and the differences between parametric and nonparametric statistics.
• Concepts of Type I and II error, statistical power, and interaction between independent variables.
• Introduction to planned and post-hoc comparisons and analysis of covariance.
• The use of t-tests and the link between t-tests and ANOVA.
• Introduction to issues of statistical significance, clinical meaningfulness, confidence interval analysis, and effect size, which provide a context for the critical appraisal of clinical research.
• Overview of nonparametric tests of comparison and a working example of a Mann–Whitney U test.
• All practitioners claiming to practice from an evidence base must also understand the principles of statistics.

Selecting Statistics and Types of Data
• Data are categorized as:
– Nominal
– Ordinal
– Interval
– Ratio
• Nominal simply means “to name.” The assignment of numeric values for analysis of nominal data is arbitrary.
• Ordinal data are ordered in a particular and meaningful manner (e.g., numeric pain scales).
• Nonparametric statistical methods of comparison are used to analyze nominal and ordinal data.
• Parametric statistics are appropriate for analyzing interval and ratio data under most circumstances.

Interval and Ratio Data
• Interval and ratio data permit the calculation and useful understanding of a mean or average value and a standard deviation.
• The mode is the most useful measure of central tendency when analyzing nominal data.
• For ordinal data, the appropriate measure of central tendency is the median, and the range can be provided as a measure of dispersion for some ordinal scales.
• Interval data: “Interval” implies that the differences between points of measure are consistent and meaningful.
• Ratio data: Similar to interval data, but the scale has an absolute 0, so the values can yield meaningful ratios.
• In interval data, the absence of an absolute 0 precludes the calculation of meaningful ratios.
• In all other respects, interval and ratio data are similar. Both types of data are analyzed with the same statistical procedures.

Differences Between Nonparametric and Parametric Procedures
• Variance and standard deviation are measures of dispersion for interval and ratio data.
• Median and range values are reported for ordinal data and the mode for nominal data.
• Parametric statistics analyze the distribution of variance, hence the term “analysis of variance (ANOVA).”
• Variance is based on the squared differences between each score and the mean: the sum of those squared differences divided by the number of scores minus 1.
• Standard deviation is the square root of variance.
• Variance from the mean can be used to compare sets of data.

Steps to Complete ANOVA
Steps in Preparing and Completing Analysis of Variance
1. Formulate an answerable question that includes identifying independent and dependent variables from a research idea.
2. Write the research question in a null form. Abbreviate the null as “NES = no NES.”
3. Collect data.
4. Organize data, and perform analysis of variance.
5. Interpret the results and report findings.

Analysis of Variance
• The purpose of most research involving comparisons is to generalize the results from the sample to the population.
• The analysis estimates the probability that the differences found reflect real population differences.
• Statistical analysis asks whether the data collected provide sufficient evidence to conclude that a difference exists in an entire population.
• Statistical tests of comparison are used to test a null hypothesis: that two sets of data are drawn from the same population.
• Rejecting the null means concluding that the groups are different.
• It is not possible to accept a null, since two groups will never be exactly equal. If we fail to reject a null, then we must suspend judgment as to whether the groups differ.

F-Value and Unexplained Variance
• The result of ANOVA is an F value, which is a point on an F distribution that permits estimates of probability.
• The formula for an F is a ratio of variance estimates; thus the term “analysis of variance.”
• $F = \dfrac{\text{mean square explained}}{\text{mean square unexplained}}$ (the denominator is also sometimes referred to as “ms error”).
• A mean square is essentially the sum of the squared differences between each score and a mean, divided by the number of scores minus 1.
• Unexplained variance is variation from the mean that is attributed to factors beyond the scope of the research design.

Interpreting F
• F is a point on a distribution.
• There are an infinite number of F distributions, each defined by the number of degrees of freedom in the numerator and denominator.
• The larger the F (ratio of explained / unexplained variance), the less likely that the differences observed were chance occurrences.
• By convention, researchers are generally willing to accept less than a 5% risk that an F value obtained is a chance occurrence.
• When the F value is large enough that this probability falls below the accepted level of risk, we reject the null hypothesis and attribute the differences observed to the effects of our intervention.
• The alpha value specifies the level of accepted risk of incorrectly concluding that observed differences reflect true differences in a population, expressed as chances in 100 (e.g., 5 in 100 for an alpha of 0.05).
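
The F ratio described above can be sketched in a few lines of Python. This is a minimal illustration for a one-way, between-subjects design, not the chapter's own worked example; the function name `one_way_anova_f` and the three small groups of scores are hypothetical and were chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_explained, df_unexplained) for a one-way ANOVA.

    `groups` is a list of lists, one list of scores per group.
    """
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)

    # Explained (between-groups) sum of squares: how far each group
    # mean lies from the grand mean, weighted by group size.
    ss_explained = sum(
        len(group) * (mean(group) - grand_mean) ** 2 for group in groups
    )

    # Unexplained (within-groups, "error") sum of squares: how far each
    # score lies from its own group mean.
    ss_unexplained = sum(
        (score - mean(group)) ** 2 for group in groups for score in group
    )

    df_explained = len(groups) - 1
    df_unexplained = len(all_scores) - len(groups)

    # A mean square is a sum of squares divided by its degrees of freedom.
    ms_explained = ss_explained / df_explained
    ms_unexplained = ss_unexplained / df_unexplained

    return ms_explained / ms_unexplained, df_explained, df_unexplained


# Hypothetical scores for three groups (illustration only).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_explained, df_unexplained = one_way_anova_f(groups)
print(f_value, df_explained, df_unexplained)
```

The F value is then compared with the F distribution for the two degrees-of-freedom values to obtain a probability; a statistics package or a published F table would supply that step, which is left out of this sketch.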

Alpha Values and Types of Error
• Type I error occurs when a null is rejected when in fact population differences do not exist. The alpha value is really the level of risk of Type I error.
• Type II error occurs when a null is not rejected, yet a study of the population would reveal differences between groups.
• Researchers guard against Type I error by selecting the alpha level.
• Statistical power is required to decrease the risk of Type II error.
• Power is influenced by 3 factors:
– The mean difference between groups
– The variance within groups
– Sample size
• In reality, the only factor investigators can control is sample size.

Complexity and Interaction
• When a between-subjects variable and a within-subjects variable exist in a research design, the design may be referred to as a “mixed model” (very common in health care research).
• It is possible to have multiple between-subjects and within-subjects variables within a research design.
• Greater complexity in research designs and data analysis is not necessarily an indicator of better research.
• Significant interaction: “significant” suggests that the finding is a reflection of a population phenomenon.
• To better understand how variables interact, you can turn to tables that include “cell” means and standard deviations.
• To interpret the meaning of interactions between variables, use graphic representation.

Levels of Variables, Planned Comparison, and Post-Hoc Analysis
• There may be multiple levels within a variable.
– Example: Time as a within-subject variable.
• Adding levels of independent variables can maximize efficiency and yield greater insight into the interactions between the variables of interest.
• Comparisons between pairs of means: pre-planned pairwise comparison.
• Post-hoc tests: Tukey, Scheffe, and Bonferroni procedures.
• A reference to post-hoc testing conveys that the investigator performed additional analyses to isolate the sources of significant differences between sets of scores.
• Risk of Type I error exists with each analysis performed.

Analysis of Covariance
• ANCOVA signifies analysis of covariance.
• ANCOVA is a special case of ANOVA in which a variable (a covariate) is introduced for the purpose of accounting for unexplained variance.
• ANCOVA can increase statistical power.
• MANOVA refers to multivariate analysis of variance, or cases where more than one dependent measure is analyzed simultaneously.
• MANOVA is best applied when the investigator is interested in the effect of the independent variable(s) on the collection of dependent variables.

T-Tests
• T-tests are a special case of ANOVA in which there are only two sets of data in the comparison.
• $t^2 = F$
• t values are points on a curve, and there are an infinite number of t distributions.
• Each t distribution corresponds to the degrees of freedom (DF) associated with unexplained variance. The DF associated with explained variance is always 1.
• $t = \dfrac{\text{mean}_A - \text{mean}_B}{S_{pool}\sqrt{\dfrac{1}{n_a} + \dfrac{1}{n_b}}}$
• Standard deviation (SD) is the square root of variance. Thus, it is the link between the formula for t and ANOVA.
• t values may be positive or negative. (F values are always positive.)
• With t there is a choice of null: A = B, A ≥ B, or, vice versa, A ≤ B.

Significance and Confidence Intervals
• ANOVA, t-tests, and the nonparametric procedures address the probability that differences observed reflect population differences but do not address the magnitude of differences.
• It is possible to reject a null hypothesis when the magnitude of the difference is of little clinical consequence or, conversely, to fail to reject a null when the possibility of clinically meaningful differences exists.
• The solution to this problem is the reporting of confidence intervals.
• The focus here is on the interpretation of confidence intervals, with only one example of the calculation process.
• A statistically significant difference may not reflect a clinically meaningful difference; thus, additional information may be needed before deciding on a plan of care.

Effect Size
• Effect size is a calculation that shows the typical response to intervention. It is a useful approach to understanding what the observed differences between groups mean in terms of magnitude of effect.
• Effect size calculations place the magnitude of differences between groups in the context of group variance.
• Jacob Cohen (1988): Cohen’s d is one of the most commonly referenced methods of calculating effect size:
• $d = \dfrac{\text{mean}_a - \text{mean}_b}{s}$
• s is the pooled standard deviation estimate:
• $s = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2}}$

Effect Size
• Hedges (1981): Similar equation to Cohen’s, but the denominator of the pooled estimate is based on the degrees of freedom: $s = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$. Because this s is slightly larger than the one above, the resulting effect size estimates are slightly lower.
• Effect size of:
– 0.2 represents a small effect.
– 0.5 represents a moderate effect.
– > 0.8 represents a large effect.
• These values are based in social science rather than in biomedical research.

Nonparametric Statistics
• The terms nonparametric and distribution-free can be used interchangeably.
• When parametric analyses are completed, it is assumed that the data are based on observations of a normally distributed population with similar variance, and that samples are drawn at random.
• If these assumptions are not met, nonparametric procedures may be the appropriate analytical methods. Nonparametric statistics test hypotheses about medians or nominal data distribution.
• However, violation of the assumptions is unlikely to have a substantial impact on the statistical outcome, as procedures such as ANOVA are robust.
• Three of the most common nonparametric procedures:
– Mann–Whitney U
– Kruskal–Wallis One-Way Analysis of Variance by Ranks
– Friedman Two-Way Analysis of Variance by Ranks

Mann–Whitney U Test
• The Mann–Whitney U test is analogous to the independent-samples (unpaired) t-test.
• The analysis tests the null hypothesis that the median score in one group (A) is less than or equal to the median score of a second group (B) (A ≤ B).
• If the analysis reveals that the median of A is greater than that of B, then one might reject the null hypothesis.
• The Mann–Whitney U result is designated as a T.
• As with parametric tests, the null hypothesis (A ≤ B) is rejected only if the probability of obtaining the T value is sufficiently small (e.g., less than 5%).

Kruskal–Wallis One-Way and Friedman Two-Way
• A Kruskal–Wallis One-Way Analysis of Variance by Ranks is appropriate when there are more than two groups.
– The result is an H value.
– The probability of H can be found by consulting a table specific to this analysis.
• A Friedman Two-Way Analysis of Variance by Ranks is appropriate for analyses in which there are repeated measures within one group.
• None of these nonparametric tests allows for the analysis of repeated measures from multiple groups, known as a mixed model design.
• This represents one of the major limitations of these statistical tests in clinical research.

Chapter Summary and Key Points
• Statistics do not prove anything.
• Do not read a research report expecting to accept its conclusions as an absolute or final answer.
• Numbers can lie, and the misinterpretation of data and statistical analyses can mislead.
• Most students preparing for careers in health care are not fond of statistics.
• Careful consideration and critical appraisal inform quality clinical practice; thus, it is necessary to understand the principles of statistics.
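
As a closing illustration, the t and effect-size formulas from the T-Tests and Effect Size slides can be checked numerically. The sketch below is not taken from the chapter: the function names (`pooled_sd`, `t_independent`, `effect_size`) and the two small samples are hypothetical, and the t calculation assumes that the pooled standard deviation uses the degrees-of-freedom denominator (n1 + n2 - 2).

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b, use_df=False):
    """Pooled standard deviation of two samples.

    use_df=False divides by n1 + n2 (the Cohen form on the slides);
    use_df=True divides by n1 + n2 - 2 (the Hedges form on the slides).
    """
    n1, n2 = len(a), len(b)
    # statistics.variance is the sample variance (divides by n - 1).
    weighted = (n1 - 1) * variance(a) + (n2 - 1) * variance(b)
    return sqrt(weighted / (n1 + n2 - 2 if use_df else n1 + n2))


def t_independent(a, b):
    """t for two independent samples, using the pooled SD (df form)."""
    s_pool = pooled_sd(a, b, use_df=True)
    return (mean(a) - mean(b)) / (s_pool * sqrt(1 / len(a) + 1 / len(b)))


def effect_size(a, b, use_df=False):
    """Mean difference expressed in pooled standard deviation units."""
    return (mean(a) - mean(b)) / pooled_sd(a, b, use_df)


# Hypothetical scores for two groups (illustration only).
group_a = [4, 5, 6]
group_b = [3, 4, 5]

t_value = t_independent(group_a, group_b)
d_cohen = effect_size(group_a, group_b)
d_hedges = effect_size(group_a, group_b, use_df=True)
print(t_value, t_value ** 2, d_cohen, d_hedges)
```

Squaring the t value gives the F that a one-way ANOVA on the same two groups would produce, and the degrees-of-freedom form of the pooled estimate gives the smaller of the two effect sizes because its denominator is larger.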