Review of Methods from Prerequisite Course
Assuming exposure to all of the content from STAT 601 – Statistical Methods for Healthcare Research

Presentation Outline
• Review of variable types
• Review will cover both descriptive and inferential methods
• Methods for numeric (or possibly ordinal) response variables
• Methods for categorical (or possibly ordinal) response variables
* Before viewing this presentation download and print the supplements!

Brief Review of Data Types
There are three main data types, with further subclasses within some of them.
• Continuous – measurements or counts
  Important subclasses – discrete, continuous, ratio scale, & interval scale (Wiki these scales)
• Ordinal – ordered categories
  May be coded numerically and could be treated as such.
• Nominal – unordered categories
  May also be coded numerically, BUT cannot be treated as such.
In JMP (and SPSS) these are the three classifications. In JMP (which we'll use), continuous, ordinal, and nominal variables are each denoted by their own icon (icons not reproduced here).

ICU Study – used in most examples
• This study consists of 200 subjects who were admitted to an adult intensive care unit (ICU). A major goal of this study was to predict the probability of survival to hospital discharge of these patients. (Lemeshow, Teres, Avrunin & Pastides, 1988)
• Several measurements were taken at the time of admission and the ultimate survival of the patients was recorded.
The variable descriptions and coding are found in this table.
Comments:
• Notice that most of the information has been coded numerically, although only Age, Systolic BP, and Heart Rate are continuous.
• Some of the dichotomous variables have been created using continuous measurements (e.g. PO2, PH, PCO, etc.).
• The Level of Consciousness variable (LOC) could be treated as ordinal, as the levels indicate increasing states of unresponsiveness.
Methods for a Numeric Response
Print this flowchart for reference (see website)
• One population inference
• Two population inference
• More than two population inference
Covers both parametric and nonparametric methods.

One or Two or More Populations?
• Is the study comparative in nature or are we making an inference about a single population?
• Most studies are certainly comparative (i.e. multivariable) in nature!
• However, we will review methods for a single numeric variable first.

Methods for a Single Numeric Variable
Descriptive Methods
Visual Descriptions
• Histogram
• Boxplots
• Stem Leaf Plots (archaic)
• Cumulative Distribution Plots (CDF)
• Normal Quantile Plots
Numeric Descriptions
• Measures of central tendency
• Measures of variation
• Measures of relative standing
• Measures of distributional shape

Plots for a Single Numeric Variable
CDF Plot – shows P(X ≤ x) vs. x, e.g. P(X ≤ 100) = .60, or a 60% chance that a patient's heart rate is less than or equal to 100 bpm at admission to the ICU.

Visual Summaries of Heart Rate @ Admission (ICU Study)
• Histogram
• Boxplots (outlier and quantile)
• Normal quantile plot
• CDF plot

Summary Statistics for a Numeric Variable
Measures of Central Tendency
• Mean, Median, Mode (3 M's) – the mode may not be unique!
• Trimmed Mean (5%) – mean with 5% of the observations trimmed off the tails.
• Geometric Mean – mean in the log scale transformed back to the original scale. Good measure for right-skewed data!
Measures of Relative Standing
• Quantiles/Percentiles – values such that k% of the observations are less and (100 − k)% are greater.
• Quartiles – specific percentiles
  Q1 – first quartile (25th percentile)
  Q2 – second quartile (median)
  Q3 – third quartile (75th percentile)
Measures of Shape
• Skewness – measures the degree of skewness of the distribution. If the distribution is symmetric (e.g. normal) then Skewness is 0.
If Skewness > 0 then the distribution is skewed to the right; if Skewness < 0 then the distribution is skewed to the left.
• Kurtosis – measures the degree of kurtosis. If the distribution is approximately normal the kurtosis is zero. If it is positive the distribution has heavier tails than a normal distribution (outliers on each end), and if it is negative the distribution has thinner tails than a normal distribution and more observations near the mean. (Wiki kurtosis for pictures)

Parametric Inference for the Population Mean (μ)
Assuming either that the outcome comes from a normally distributed population or that the sample size is sufficiently "large".
Test Statistic
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t\text{-distribution}, \quad df = n - 1$
Confidence Interval for μ
$\bar{x} \pm t \cdot \dfrac{s}{\sqrt{n}}$
Sample size required for margin of error (E) with 95% confidence
$n = \left(\dfrac{1.96\,\sigma}{E}\right)^2$

Example: Heart Rate of ICU patients
Output from JMP (output not reproduced here)
The upper-tail test p-value = .00000238 (p < .0001), thus we have strong evidence to suggest that patients admitted to the adult ICU have a mean heart rate that would be considered high (i.e. μ > 90 bpm). Furthermore, we estimate that the mean resting heart rate of adults admitted to the ICU is between 95.18 bpm and 102.67 bpm with 95% confidence.

Nonparametric Inference for a Single Numeric Variable
Use if the outcome/response does NOT come from a normally distributed population or if the sample size is NOT sufficiently "large".
To test the general hypothesis that patients admitted to the adult ICU have elevated/high resting heart rates we could use the Wilcoxon Signed-Rank Test as an alternative to the t-Test.
1) Form differences $d_i = y_i - 90$ and drop any that are 0.
2) Compute the signed rank statistics $T_+$ and $T_-$.
3) Compare the smaller of these to the critical values from a Wilcoxon Signed-Rank Test table.
4) Better yet, use statistical software!
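
Steps 1 and 2 of the signed-rank procedure can be sketched in Python as follows. This is only an illustration, not JMP's implementation: the function name `signed_rank_statistics` and the heart-rate values are hypothetical, and tied absolute differences are given their average rank, a common convention that the slides do not spell out.

```python
def signed_rank_statistics(values, hypothesized):
    """Return (T_plus, T_minus) for the differences values - hypothesized."""
    # Step 1: form the differences and drop any that are exactly 0.
    diffs = [v - hypothesized for v in values if v != hypothesized]

    # Rank the absolute differences (1 = smallest), averaging ranks for ties.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1

    # Step 2: sum the ranks of the positive and of the negative differences.
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus


# Hypothetical heart rates (bpm) for illustration; NOT the ICU study data.
heart_rates = [88, 96, 104, 90, 112, 85, 99, 121]
t_plus, t_minus = signed_rank_statistics(heart_rates, hypothesized=90)
print(t_plus, t_minus)
```

With m nonzero differences the two statistics always add up to m(m + 1)/2, which is a handy check on a hand calculation.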
Nonparametric Inference for a Single Numeric Variable
The upper-tail p-value from the Wilcoxon Signed-Rank Test is p < .0001, thus we conclude that the median heart rate of the population of patients admitted to the adult ICU is high (above 90 bpm). The Wilcoxon Signed-Rank Test is used to make inferences about the population median rather than the mean.

Comparing a Continuous Response Between Two Populations
• When comparing a numeric response between two populations we must first consider the sampling scheme or experiment that generated the data: were the two samples drawn independently or dependently?
• For dependent samples, there is a one-to-one correspondence between an individual in one population and an individual in the other, e.g. Pre-test vs. Post-test situations.

More on Dependent Samples
• Pre-test vs. Post-test, e.g. Before treatment vs. After treatment (i.e. subjects = blocks)
• Comparing different treatments using the same subjects, e.g. pain relievers used on the same subjects (again subjects = blocks)
• Matched subjects in the two populations according to some criteria, e.g. patients matched on the basis of age, race, gender, socioeconomic status, weight, height, existing health conditions, etc. (Note: Need to be careful here!)

Example 1: Captopril & Systolic Blood Pressure
• Research Question: Is there evidence that patients will experience a mean decrease in systolic blood pressure of more than 10 mmHg?
• Experiment: Measure the blood pressure of 15 patients before and after taking Captopril. Our interest is in the measured changes in blood pressure and whether or not we believe that those changes have a mean greater than 10 mmHg.
Summary Statistics
$\bar{d} = 18.93$ mmHg, $s_d = 9.03$ mmHg, $n = 15$
Once the paired differences have been formed we simply treat them as a single numeric response and make inferences accordingly.
Parametric Inference for the Mean Paired Difference (μd)
Assuming either that the paired differences come from a normally distributed population or that the sample size (i.e. # of pairs) is sufficiently "large".
Test Statistic
$t = \dfrac{\bar{d} - \mu_d}{s_d/\sqrt{n}} \sim t\text{-distribution}, \quad df = n - 1$
Confidence Interval for μd
$\bar{d} \pm t_{\alpha/2} \cdot \dfrac{s_d}{\sqrt{n}}$
$\mu_d$ = the hypothesized difference under the null hypothesis. Typically this will be 0!
Note: These formulae are the same as those for a single population mean (μ)!

Example 1: Captopril & Systolic Blood Pressure
• Research Question: Is there evidence that patients will experience a mean decrease in systolic blood pressure of more than 10 mmHg?
• HYPOTHESES
  $H_o: \mu_d \le 10$ mmHg, the mean decrease in systolic blood pressure 30 minutes after taking Captopril is not greater than 10 mmHg.
  $H_a: \mu_d > 10$ mmHg, the mean decrease in systolic blood pressure 30 minutes after taking Captopril is greater than 10 mmHg.
We have evidence to suggest that the mean decrease in systolic blood pressure 30 minutes after taking Captopril is more than 10 mmHg (p = .0009). Furthermore, we estimate the mean decrease is between 13.93 mmHg and 23.93 mmHg with 95% confidence.

Nonparametric Inference for Paired Differences
Use if the paired differences do NOT come from a normally distributed population or if the sample size (# of pairs) is NOT sufficiently "large".
To test the general hypothesis that the change in systolic blood pressure is more than 10 mmHg we could use the Wilcoxon Signed-Rank Test as an alternative to the paired t-Test.
1) Form paired differences $d_i$ and subtract 10, dropping any that are 0. If simply testing for a difference we would not subtract 10.
2) Compute the signed rank statistics $T_+$ and $T_-$.
3) Compare the smaller of these to the critical values from a Wilcoxon Signed-Rank Test table.
4) Better yet, use statistical software!
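
As a rough check on the Captopril example, the paired t statistic and the 95% confidence interval can be recomputed from the reported summary statistics (mean difference 18.93 mmHg, standard deviation 9.03 mmHg, n = 15). The sketch below is not JMP output. The function name is hypothetical, and the critical value 2.145 is the usual tabled t value for df = 14, hard-coded here as an assumption so that only the standard library is needed.

```python
import math


def paired_t_from_summary(d_bar, s_d, n, mu_d0=0.0):
    """Return (t statistic, standard error, df) for a mean paired difference."""
    se = s_d / math.sqrt(n)
    t_stat = (d_bar - mu_d0) / se
    return t_stat, se, n - 1


# Summary statistics reported for the Captopril example.
d_bar, s_d, n = 18.93, 9.03, 15

# Test H_o: mu_d <= 10 against H_a: mu_d > 10.
t_stat, se, df = paired_t_from_summary(d_bar, s_d, n, mu_d0=10)

# Assumption: tabled two-sided 95% critical value of the t-distribution, df = 14.
T_CRIT_DF14 = 2.145
ci = (d_bar - T_CRIT_DF14 * se, d_bar + T_CRIT_DF14 * se)

print(f"t = {t_stat:.2f} on {df} df")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}) mmHg")
```

The interval reproduces the reported (13.93, 23.93) mmHg. The p-value itself needs a t-distribution tail area, which would come from a table or statistical software.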
Nonparametric Inference for Paired Differences
We have evidence to suggest the median change in systolic blood pressure 30 minutes after taking Captopril is more than 10 mmHg (p = .0010).
• Another nonparametric option is to use the Sign Test.
• For the Sign Test we simply look at the number of positive and negative paired differences and compute the p-value using a binomial distribution with n = # of pairs and p = .50.
• This should only be used if the response is difficult to measure or is ordinal!

Independent Samples Comparison of Two Population Means
• For independent samples we are either:
  - drawing samples from two existing populations (i.e. observational study), e.g. males & females, smokers & non-smokers.
  - randomly allocating subjects into two populations (i.e. experiment), e.g. treatment vs. placebo, therapy A vs. therapy B, etc.
• Analysis of these two situations is the same, although the conclusions reached may differ (i.e. association vs. causation).
• This is an example of a bivariate analysis:
  Y = response (continuous, possibly ordinal)
  X = population identifier (nominal)
• If the response is normally distributed or if both sample sizes are "large" we can use a parametric approach.

Example: Heart Rate and Type of Admission
Type of admission (TYP): 1 = ER, 0 = non-ER
The heart rate at admission appears higher for those admitted through the ER, about 10 bpm higher on average. This apparent difference could be due to chance variation, however! Heart rate is approximately normally distributed for both samples. Variation in the heart rates appears to be similar.
The separation between the CDF plots suggests a potential difference in the heart rate distributions for patients admitted to the adult ICU through the ER and those that were not.
In particular, it looks like patients admitted through the ER tend to have higher heart rates.

Independent Samples Comparison of Two Population Means
For testing equality of means
Ho: μ1 = μ2 or (μ1 – μ2) = 0
The possible alternatives are:
Ha: μ1 > μ2 or (μ1 – μ2) > 0 (upper-tailed)
Ha: μ1 < μ2 or (μ1 – μ2) < 0 (lower-tailed)
Ha: μ1 ≠ μ2 or (μ1 – μ2) ≠ 0 (two-tailed)
Note: If we wanted to establish that one mean was, e.g., at least 10 units larger than the other, we could replace 0 in these statements by 10. In general, to establish a difference of at least Δ units we replace 0 by Δ.
Test statistic
$t = \dfrac{(\bar{y}_1 - \bar{y}_2) - \Delta}{SE(\bar{y}_1 - \bar{y}_2)} \sim t\text{-distribution (df)}$
The standard error of the difference in the sample means and the degrees of freedom (df) are calculated in two different ways, depending on whether or not we assume the population variances are equal.
Rule O' Thumb: Assume variances are equal only if neither sample variance is more than twice the other sample variance.

Independent Samples Comparison of Two Population Means – Pooled t-Test
Assuming $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (common variance)
$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
$s_p^2$ is the pooled estimate of the variance common to both populations; it is essentially a weighted average of the two sample variances. It is called pooled because both samples are combined (or pooled) to estimate the variance common to both populations.
Test statistic
$t = \dfrac{(\bar{y}_1 - \bar{y}_2) - \Delta}{SE(\bar{y}_1 - \bar{y}_2)} \sim t\text{-distribution (df)}$
Confidence Interval for $(\mu_1 - \mu_2)$
$(\bar{y}_1 - \bar{y}_2) \pm t \cdot SE(\bar{y}_1 - \bar{y}_2)$
The degrees of freedom for the associated test statistic is $df = n_1 + n_2 - 2$.

Independent Samples Comparison of Two Population Means – Welch's t-Test
Assuming $\sigma_1^2 \ne \sigma_2^2$, i.e. unequal variances
$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
Test statistic
$t = \dfrac{(\bar{y}_1 - \bar{y}_2) - \Delta}{SE(\bar{y}_1 - \bar{y}_2)} \sim t\text{-distribution (df)}$
Confidence Interval for $(\mu_1 - \mu_2)$
$(\bar{y}_1 - \bar{y}_2) \pm t \cdot SE(\bar{y}_1 - \bar{y}_2)$
The degrees of freedom for the associated test statistic is given by an ugly formula:
$df \approx \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
Always round down!

Independent Samples Comparison of Two Population Means – Formally Testing the Equality of Population Variances Assumption
• We can formally test the equality of the population variances rather than use the Rule O' Thumb.
• In some situations it may also be of interest to compare the population variances in addition to the population means.
• HYPOTHESES
  $H_o: \sigma_1^2 = \sigma_2^2$
  $H_a: \sigma_1^2 \ne \sigma_2^2$ (or we could use a one-tailed alternative)
Test Statistic (for comparing two population variances)
$F = \max\left(\dfrac{s_2^2}{s_1^2}, \dfrac{s_1^2}{s_2^2}\right) \sim F\text{-distribution}$
with num df = $n_2 - 1$, den df = $n_1 - 1$ or num df = $n_1 - 1$, den df = $n_2 - 1$, respectively.
Large F statistic value ⇒ small p-value (Reject Ho)
• There are several other tests for equality of variance.

Example: Heart Rate and Type of Admission
Type of admission (TYP): 1 = ER, 0 = non-ER
The F-test for comparing population variances does not provide evidence of a significant difference in heart rate variation between the two groups of patients (p = .3992). None of the other tests (O'Brien, Brown-Forsythe, Levene, Bartlett) have significant p-values either. Given these results we could conduct a pooled t-Test to compare the mean heart rates.
The two-tailed p-value = .0131, thus we conclude there is a statistically significant difference in the population mean heart rates between these two populations of patients admitted to the adult ICU. Furthermore, we estimate that the mean heart rate for patients admitted to the adult ICU through the emergency room is anywhere from 2.26 bpm to 19 bpm larger than the mean for those who were not admitted to the ICU through the emergency room.
Note: the order of subtraction is 1 − 0, i.e. $\mu_1 - \mu_0$, i.e. ER mean – non-ER mean.
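
The pooled and Welch standard errors and degrees of freedom defined earlier can be compared side by side with a short sketch. The function names and the summary statistics below are hypothetical (they are not the ICU heart-rate values), and the Welch degrees of freedom are rounded down as the slides recommend.

```python
import math


def pooled_se_df(s1, n1, s2, n2):
    """Standard error and df for the pooled t-Test (equal variances assumed)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return se, n1 + n2 - 2


def welch_se_df(s1, n1, s2, n2):
    """Standard error and df for Welch's t-Test (unequal variances)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, math.floor(df)  # always round down


def variances_look_equal(s1, s2):
    """Rule O' Thumb: neither sample variance is more than twice the other."""
    ratio = max(s1**2, s2**2) / min(s1**2, s2**2)
    return ratio <= 2


# Hypothetical summary statistics: sample SDs and sample sizes.
s1, n1, s2, n2 = 4.0, 10, 6.0, 15

pooled = pooled_se_df(s1, n1, s2, n2)
welch = welch_se_df(s1, n1, s2, n2)
print("pooled:", pooled)
print("Welch: ", welch)
print("equal variances plausible?", variances_look_equal(s1, s2))
```

In this made-up case the variance ratio is 36/16 = 2.25, so the rule of thumb would point toward Welch's t-Test.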
The results from the confidence interval lend themselves to a brief discussion of the concept of practical significance and/or effect size (ES). While a difference in the means of 19 bpm seems physiologically meaningful, the same could not be said for the lower confidence limit, which is roughly 2 bpm. We will examine the concepts of practical significance and effect size in more detail later in the course.
The output from the non-pooled option (t-Test) is presented in exactly the same format.

Nonparametric Testing for Two Independent Samples
• If the population distributions do not appear to be normally distributed or if the sample sizes are "small", we may choose to use a nonparametric test to compare the size of the values from the two populations.
• There are a few options available, but by far the most frequently used nonparametric test for comparing a numeric response across two populations is the Wilcoxon Rank Sum Test (also known as the Mann-Whitney Test).
• The test utilizes the sum of the ranks assigned to observations from the two populations when the two samples are combined. Essentially, the larger the difference in the rank sums when taking the sample sizes into account, the more evidence we have against equality of the two distributions in terms of the size of the values.
HYPOTHESES
$H_o$: Population 1 = Population 2, i.e. the distributions of the two populations are essentially the same, particularly in terms of the size of the values.
$H_a$: Population 1 ≠ Population 2, i.e. the distributions of the two populations are different; specifically, we believe one distribution is shifted to the right or left of the other.
Note: One-tailed alternatives are fine also, meaning we can specify in the alternative which population has larger values than the other. Here the alternative hypothesis states population A is shifted to the right of population B, i.e. population A has larger values than population B.
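
To make the rank-sum idea concrete, the sketch below combines two samples, ranks them, and sums the ranks of the first sample. It then applies the large-sample normal approximation without a tie or continuity correction. That approximation is an assumption of this sketch, not something stated in the slides, and it is rough for samples this small. The function names and heart-rate values are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_sum_test(sample1, sample2):
    """Return (W1, z, two-sided p) using the normal approximation."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    w1 = sum(ranks[:n1])  # rank sum for sample 1
    mean_w1 = n1 * (n1 + n2 + 1) / 2  # expected rank sum under H_o
    sd_w1 = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (w1 - mean_w1) / sd_w1
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return w1, z, p_two_sided


# Hypothetical heart rates (bpm); NOT the ICU study data.
er = [110, 98, 125, 104, 118]
non_er = [88, 95, 102, 79, 91]
w1, z, p = rank_sum_test(er, non_er)
print(w1, round(z, 3), round(p, 4))
```

For small samples, statistical software typically reports an exact p-value instead of this approximation.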
Example: Heart Rate and Type of Admission
Type of admission (TYP): 1 = ER, 0 = non-ER
The Wilcoxon Rank Sum Test p-value = .0137, thus we conclude the two populations of patients differ in terms of their heart rate at admission to the adult ICU. In particular, we conclude those that were admitted to the adult ICU via the ER had higher heart rates in general than those not admitted through the ER.

Comparing a Continuous Response Between Three or More Populations
• As with two-population comparisons, there are independent and dependent sampling schemes when comparing several populations.
• Assuming normality and equality of population variances across populations, both situations use a form of Analysis of Variance (ANOVA) to compare the means of the populations.
• We will cover ANOVA in more detail later in the course and review both one-way ANOVA and randomized block designs as part of that discussion.
• For now we will look at a quick example of each.

Example: Age and Race (Descriptive Summaries)
Race of Patient: 1 = White, 2 = Black, 3 = Other
Although this may not be of interest in this study, here we compare the ages of patients in this study across race classified as white, black, or other. White patients in the sample were the oldest with a mean age of 59, while the other two race groups have a mean age of around 47. The age distributions do appear to be left-skewed or kurtotic (i.e. non-normal) and the standard deviations differ enough that equality of variances may be suspect.

Example: Age & Race (Comparing Variances)
Race of Patient: 1 = White, 2 = Black, 3 = Other
None of the four tests for equality of variance provides statistically significant evidence of unequal population variances (all p > .05). If these tests did suggest a problem with the equality of population variance assumption we could use Welch's ANOVA (like the non-pooled t-Test) to determine if the mean ages differed across race.
Example: Age and Race (One-way ANOVA)
Race of Patient: 1 = White, 2 = Black, 3 = Other
From the one-way ANOVA F-test we conclude that at least two population means differ (p = .0222). With only three populations, controlling for the experiment-wise error rate using Tukey's HSD is not vital, as there are only three possible pairwise comparisons (white vs. black, white vs. other, and black vs. other).

Example: Age & Race (Multiple Comparisons)
Race of Patient: 1 = White, 2 = Black, 3 = Other
Using Tukey's HSD we see that none of the pairwise comparisons suggest a difference between the population means (all p > .05).
Two-sample t-Tests (pooled) not controlling for experiment-wise error rate (EER)
Without controlling for EER we see that the mean ages of white and black patients differ significantly (p = .0283). However, the estimated difference in means covers a wide range, 1.26 years to 22.24 years. On the low end of the confidence interval this difference is certainly inconsequential.

Example: Age and Race (Nonparametric Test and Multiple Comparisons)
Race of Patient: 1 = White, 2 = Black, 3 = Other
The nonparametric alternative to the one-way ANOVA F-test is the Kruskal-Wallis test. We conclude the populations differ in terms of the age distributions (p = .0110). The nonparametric alternative to Tukey's HSD is the Steel-Dwass Method, which suggests that the age distributions of white and black patients differ significantly (p = .0268). Again the CI for the difference in typical ages is wide, from 1 year to 25 years, with the low end representing a very small difference.

Methods for a Numeric Response
We have just reviewed the following:
• One population inference
• Two population inference
• More than two population inference
Covered both parametric and nonparametric methods. We will cover block designs and their analysis when we cover ANOVA in more detail later in the course.
Methods for a Categorical Response
For a dichotomous categorical response we covered many of the methods in the flow chart in the prerequisite course. A dichotomous response has two levels which we can generically classify as "success" or "failure" or "yes" or "no". We will briefly review some of these methods from the prerequisite course using the ICU study data and data from other studies. We will cover more advanced methods for the analysis of categorical data later in the course.

ICU Study – variables & coding
The variable descriptions and coding are found in this table.
Comments:
• There are numerous dichotomous variables in this study; vital status (STA) is the primary outcome of interest.
• Some of the dichotomous variables have been created using continuous measurements (e.g. PO2, PH, PCO, etc.).
• The Level of Consciousness variable (LOC) could be treated as ordinal, as the levels indicate increasing states of unresponsiveness.

Summary of Inference for Single Proportion (p)
Assuming the sample size n is sufficiently "large".
Test Statistic
$z = \dfrac{\hat{p} - p_o}{\sqrt{\dfrac{p_o(1 - p_o)}{n}}} \sim \text{standard normal}$
Confidence Interval for p
$\hat{p} \pm z\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
Sample size required for margin of error (E) with 95% confidence, assuming a prior value for p
$n = \dfrac{1.96^2\, p(1 - p)}{E^2}$
Conservative approach
$n = \dfrac{1.96^2}{4E^2}$

Summary of Inference for Single Proportion (p)
Exact inferential methods using the binomial distribution
Binomial Exact Test (one-sided)
$H_o: p = p_o$ and $H_a: p < p_o$ or $H_a: p > p_o$
Find the probability of observing a number of successes as extreme or more extreme than the number observed ($x$), assuming the null is true. Use a binomial table to calculate the p-value:
For $H_a: p > p_o$ the p-value $= P(X \ge x \mid n, p_o)$
For $H_a: p < p_o$ the p-value $= P(X \le x \mid n, p_o)$
A two-sided alternative would have a p-value equal to the smaller of the probabilities above multiplied by 2.
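
The two sample-size formulas for a single proportion can be wrapped in one small helper. This is a sketch under stated assumptions: the function name is hypothetical, the margin of error of .05 and the prior value of .62 are illustrative inputs, and rounding up to the next whole subject is a common convention that the slides do not state.

```python
import math


def sample_size_proportion(margin, p_prior=None, z=1.96):
    """Sample size for estimating a proportion to within +/- margin.

    With no prior value for p, use the conservative formula z^2 / (4 E^2),
    which corresponds to p = .50, the value that maximizes p(1 - p).
    """
    if p_prior is None:
        n = z**2 / (4 * margin**2)
    else:
        n = z**2 * p_prior * (1 - p_prior) / margin**2
    return math.ceil(n)  # round up to a whole number of subjects


n_conservative = sample_size_proportion(0.05)
n_with_prior = sample_size_proportion(0.05, p_prior=0.62)
print(n_conservative, n_with_prior)
```

A prior value away from .50 always gives a sample size no larger than the conservative one, which is why the conservative formula is the safe default.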
Summary of Inference for Single Proportion (p)
Exact inferential methods using the binomial distribution
Binomial Exact 95% Confidence Interval for p
Use a binomial table to find the proportions that make the following probability statements true:
$P(X \ge x \mid n, p_L) = .025$
$P(X \le x \mid n, p_U) = .025$
The Exact 95% Confidence Interval for p is given by $(p_L, p_U)$.

Example: Gender of ICU Patients
Research Question: Is there evidence that a majority of adult ICU admissions are men?
Here the parameter of interest is p = proportion of adult ICU admissions that are men.
In our sample of n = 200 patients, 124 or 62% were men, which certainly represents a majority. However, this could be due to sampling variation, and in actuality there may be an equal balance of ICU admissions by gender.
$H_o: p \le .50$, there is not a male majority in ICU admissions
$H_a: p > .50$, there is a male majority in ICU admissions

Example: Gender of ICU Patients (large-sample methods)
$z = \dfrac{.62 - .50}{\sqrt{\dfrac{.50(1 - .50)}{200}}} = 3.39 \rightarrow P(Z > 3.39) = .00034$
Thus we have evidence that a majority of patients admitted to the adult ICU are males.
95% Confidence Interval for p
$.62 \pm 1.96\sqrt{\dfrac{.62(1 - .62)}{200}} = (.5527, .6873)$ or (55.27%, 68.73%)

Example: Gender of ICU Patients (exact methods)
$P(X \ge 124 \mid n = 200, p = .50) = .000423$
Thus we have evidence that a majority of patients admitted to the adult ICU are males.
Exact 95% Confidence Interval for p
$P(X \ge 124 \mid n = 200, p = .549) = .0252$
$P(X \le 124 \mid n = 200, p = .687) = .0259$
Thus an exact 95% CI for p is (54.9%, 68.7%).
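
The calculations in the gender example (124 men out of n = 200) can be reproduced with the Python standard library. The sketch below is not JMP output and the function names are hypothetical. It computes the large-sample z statistic with its upper-tail p-value, the large-sample 95% confidence interval, and the exact binomial upper-tail probability.

```python
import math
from statistics import NormalDist


def z_test_proportion(x, n, p0):
    """Large-sample z statistic and upper-tail p-value for H_a: p > p0."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, 1 - NormalDist().cdf(z)


def large_sample_ci(x, n, z=1.96):
    """Large-sample confidence interval for p (z = 1.96 gives 95%)."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def binomial_upper_tail(x, n, p):
    """Exact P(X >= x | n, p) for a binomial random variable."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(x, n + 1))


z, p_upper = z_test_proportion(124, 200, 0.50)
ci_low, ci_high = large_sample_ci(124, 200)
p_exact = binomial_upper_tail(124, 200, 0.50)

print(f"z = {z:.2f}, upper-tail p = {p_upper:.5f}")
print(f"95% CI: ({ci_low:.4f}, {ci_high:.4f})")
print(f"exact upper-tail p = {p_exact:.6f}")
```

The same `binomial_upper_tail` function could be evaluated over a grid of p values to search for the exact confidence limits, which is what the binomial-table lookup amounts to.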
Independent Samples Comparison of Two Population Proportions
For testing equality of two proportions
Ho: p1 = p2 or (p1 – p2) = 0
The possible alternatives are:
Ha: p1 > p2 or (p1 – p2) > 0 (upper-tailed)
Ha: p1 < p2 or (p1 – p2) < 0 (lower-tailed)
Ha: p1 ≠ p2 or (p1 – p2) ≠ 0 (two-tailed)
Note: If we wanted to establish that one proportion was, e.g., at least .10 (10 percentage points) larger than the other, we could replace 0 in these statements by .10. In general, to establish a difference of at least Δ, we replace 0 by Δ.

Independent Samples Comparison of Two Population Proportions (p1 vs. p2)
Test Statistic for Large Independent Samples
For testing to see if the difference is at least Δ
Ho: (p1 – p2) = Δ
Ha: (p1 – p2) > Δ (upper-tail) or (p1 – p2) < Δ (lower-tail)
Provided n1p1 > 10 & n1q1 > 10 and n2p2 > 10 & n2q2 > 10
(Most important case)

Independent Samples Comparison of Two Population Proportions (p1 vs. p2)
Provided n1p1 > 10 & n1q1 > 10 and n2p2 > 10 & n2q2 > 10
The confidence interval for (p1 – p2) has a general form (formula shown on the slide) that uses these z-values:
90%: z = 1.645
95%: z = 1.960
99%: z = 2.576

Example: Comparing Service at Admission Across Survival Status
• The ICU study is a case-control study: 40 patients who died and 160 who did not die were sampled and the admission-related variables were collected.
• Because of this we cannot calculate the probability of patient death using these data.
• To identify variables related to survival we use vital status (STA) as the population identifier, i.e. as the X variable in JMP.
Amongst the patients in the study who died in the ICU, 65% were admitted from the Medical unit and 35% from the Surgical unit. For patients that did not die, 58.1% were admitted from the Surgical unit and 41.9% were admitted from the Medical unit. These percentages are used to construct the mosaic plot and are displayed in the cells of the plot.
The 2 X 2 contingency table below the plot gives frequencies and row percentages (i.e. a percentage breakdown of the column variable within each row). You can see the row %'s are the same as those discussed above.

Example: Comparing Service at Admission Across Survival Status
The large sample test p-values and confidence interval for the difference in the proportions are given under the heading Two Sample Test for Proportions. The proportion of patients admitted to the ICU from the surgical unit is significantly higher for those that survived (p = .0038). This finding is certainly expected. We estimate that the percentage of patients coming from the surgical unit is between 5.9 and 38.7 percentage points higher for ICU survivors. The difference in proportions is also known as the attributable risk (AR).
Another large sample test for 2 X 2 tables is the chi-square test, either Pearson's or Likelihood Ratio, which suggests that the proportion of patients coming from the surgical unit differs for survivors and non-survivors (p = .0087 or .0085).
The Fisher's Exact Test p-values do not rely on the large sample assumption. This test is preferable to either of the large sample procedures. The alternative hypothesis is communicated along with the associated p-values. The Left p-value = .0071, which leads us to conclude that the proportion of patients coming from the surgical unit is higher for the survivor group.
The Odds Ratio (OR) is used to quantify risk when a case-control study was used. The easiest way to calculate the OR is the formula:
$OR = \dfrac{ad}{bc}$
The a cell in the table corresponds to those that have the adverse outcome (in this case death) and have the risk factor present, which in this case is coming from the medical unit (vs. the surgical unit). Thus a = 26 and subsequently b = 14, c = 67, and d = 93.
Thus the estimated OR is
$OR = \dfrac{26 \cdot 93}{67 \cdot 14} = 2.58$

Example: Comparing Service at Admission Across Survival Status
From the previous slide the estimated odds ratio is $OR = \dfrac{26 \cdot 93}{67 \cdot 14} = 2.58$.
However, JMP reports a different OR. This is because JMP does computations alphanumerically, essentially reversing the roles of 0 and 1. If JMP gives an OR that is inconsistent with your calculation, then you simply need to take the reciprocal of the OR from JMP: OR = 1/.388 = 2.58, giving the result we want.
Thus the 95% CI for the OR is given by (1/.79828, 1/.188511) = (1.25, 5.30).
Patients admitted to the ICU from the medical unit have at a minimum a 25% increase in odds for death. We estimate the odds ratio is between 1.25 and 5.30.

Quick Recap
o We have just compared the proportion of patients in both service at admission categories across survival status (p1 vs. p2) using the z-test, a CI for (p1 – p2) & Fisher's Exact Test.
o Computed the Odds Ratio (OR) and found a CI for the population OR.

Development of a Test Statistic to Measure Lack of Independence
One way to generalize the question of interest to the researchers is to think of it as follows:
Q: Is there an association between the service at admission and the survival status of patients admitted to the adult ICU?
If there is not an association, we say that these variables are independent. In probability, two events A and B are said to be independent if P(A|B) = P(A).
In the context of our study this would mean, for example,
P(Medical|Patient Survived) = P(Medical)
i.e. knowing that the patient survived tells you nothing about the chance that they came to the ICU from the medical unit vs. the surgical unit.
P(Medical) = 93/200 = .465
In this study 46.5% of the patients admitted to the adult ICU came from the medical unit.
When we consider this percentage conditioning on survival status, we see that the relationship for independence does not hold for these data.
P(Medical|Died) = 26/40 = .650
P(Medical|Survived) = 67/160 = .419
Under independence, both should be equal to .465.

Development of a Test Statistic to Measure Lack of Independence
o Of course the observed differences could be due to random variation, and in truth it may be the case that disease status (survival) and risk factor status (service at admission) are independent.
o Therefore we need a means of assessing how different the observed results are from what we would expect to see if these two factors were independent.

2 X 2 Example: Case-Control Study
Survival Status and Service at Admission

Survival Status       Serviced in Medical Unit   Serviced in Surgical Unit   Row Totals
Died (Case)           26 (a)                     14 (b)                      40 (R1)
Survived (Control)    67 (c)                     93 (d)                      160 (R2)
Column Totals         93 (C1)                    107 (C2)                    200 (n)

Development of a Test Statistic to Measure Lack of Independence
The unconditional probability of risk presence (admission from medical) for these data is given by:
$P(\text{Risk}) = \dfrac{C_1}{n}$
From this table we can calculate the conditional probability of admission from medical given that the patient died as follows:
$P(\text{Risk} \mid \text{Disease}) = \dfrac{a}{R_1}$
and setting these equal we have
$\dfrac{a}{R_1} = \dfrac{C_1}{n}$
Thus we expect the frequency in the a cell to be equal to:
$a = \dfrac{R_1 C_1}{n}$
Similarly we find the following expected frequencies for the cells making up the 2 X 2 table:
$a = \dfrac{R_1 C_1}{n}, \quad b = \dfrac{R_1 C_2}{n}, \quad c = \dfrac{R_2 C_1}{n}, \quad d = \dfrac{R_2 C_2}{n}$
In general we denote the observed frequency in the ith row and jth column as $O_{ij}$, or just O for short. We denote the expected frequency for the ith row and jth column as
$E_{ij} = \dfrac{R_i C_j}{n}$
or just E for short, where $R_i$ = row total for row i
and $C_j$ = column total for column j.

o To measure how different our observed results are from what we expected to see if the two variables in question were independent, we intuitively should look at the difference between the observed (O) and expected (E) frequencies, i.e. O - E, or more specifically $O_{ij} - E_{ij}$.
o However, this will give too much weight to differences where these frequencies are both large in size.
o One test statistic that addresses the “size” of the frequencies issue is Pearson’s Chi-Square ($\chi^2$):

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Notice this test statistic still uses (O - E) as the basic building block. This statistic will be large when the observed frequencies do NOT match the expected values for independence. When the two variables are independent, $\chi^2$ approximately follows a chi-squared distribution with df = (r - 1) x (c - 1).

Chi-square Distribution ($\chi^2$)
[Figure: chi-square density curve with the p-value shaded as the area to the right of the observed $\chi^2$.]
This is a graph of the chi-square distribution with 4 degrees of freedom. The area to the right of Pearson’s chi-square statistic gives the p-value. The p-value is always the area to the right!
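The statistic just defined can be sketched in a few lines of Python. This is a minimal illustration, not JMP's implementation: it uses the service-at-admission counts (26, 14, 67, 93) from the 2 X 2 table, the function names are hypothetical, and the right-tail formula shown is valid only for df = 1, which is the case for any 2 X 2 table.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: E_ij = R_i * C_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum of (O - E)^2 / E over all cells of the table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def right_tail_df1(x):
    """Area to the right of x under the chi-square curve with 1 df."""
    # A chi-square variable with 1 df is a squared standard normal,
    # so P(Z^2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))


# Rows: died, survived. Columns: medical, surgical.
observed = [[26, 14], [67, 93]]
expected = expected_counts(observed)
chi2 = pearson_chi_square(observed)
p_value = right_tail_df1(chi2)

print("expected:", [[round(e, 1) for e in row] for row in expected])
print(f"chi-square = {chi2:.3f}, p-value = {p_value:.4f}")
```

Dividing each squared difference by E is what keeps cells with large frequencies from dominating the total.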
2 X 2 Example: Case-Control Study
Survival Status and Service at Admission

Survival Status       Served by Medical Unit   Served by Surgical Unit   Row Totals
Died (Case)           26 (O11)                 14 (O12)                  40 (R1)
Survived (Control)    67 (O21)                 93 (O22)                  160 (R2)
Column Totals         93 (C1)                  107 (C2)                  200 (n)

Calculating Expected Frequencies
Survival Status and Service at Admission (expected frequencies in parentheses)

Survival Status       Served by Medical Unit   Served by Surgical Unit   Row Totals
Died (Case)           26 (18.6)                14 (21.4)                 40 (R1)
Survived (Control)    67 (74.4)                93 (85.6)                 160 (R2)
Column Totals         93 (C1)                  107 (C2)                  200 (n)

$E_{ij}$ = expected frequency for the ith row and jth column:

$$E_{11} = \frac{R_1 C_1}{n} = \frac{40 \cdot 93}{200} = 18.6 \qquad E_{12} = \frac{R_1 C_2}{n} = \frac{40 \cdot 107}{200} = 21.4$$

$$E_{21} = \frac{R_2 C_1}{n} = \frac{160 \cdot 93}{200} = 74.4 \qquad E_{22} = \frac{R_2 C_2}{n} = \frac{160 \cdot 107}{200} = 85.6$$

Calculating the Pearson Chi-square

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = \frac{(26 - 18.6)^2}{18.6} + \frac{(14 - 21.4)^2}{21.4} + \frac{(67 - 74.4)^2}{74.4} + \frac{(93 - 85.6)^2}{85.6}$$

$$= 2.944 + 2.559 + .736 + .640 = 6.879$$

$\chi^2$ = 6.879, df = (2 - 1) x (2 - 1) = 1, p-value = .0087
http://www.stat.tamu.edu/~west/applets/chisqdemo.html

Chi-square Probability Calculator in JMP
Enter the test statistic value and df and the p-value is automatically calculated.
p-value = P($\chi^2$ > 6.879) = .0087

2 X 2 Example: Case-Control Study
Service at Admission and Survival Status
Conclusion: We have strong evidence to suggest that service at admission and survival status are NOT independent, and thus conclude they are associated or related (p = .0087). In particular, we found that the proportion of patients admitted to the adult ICU from the medical unit was higher amongst patients who did not survive.

Dependent Samples Comparison of Two Population Proportions ($p_1$ vs. $p_2$)
• The test used to compare two proportions using dependent samples is called McNemar’s test.
• As with most tests, there are both a “large” sample and a “small” sample version of the test.
• The small sample version uses the binomial distribution and is an exact test, so technically there is no reason to use the large sample version, though many do.
Example: Low pH and Elevated PCO2 Levels
• For each patient in the ICU study, the pH and PCO2 levels found in their blood gases were measured. If pH levels were below 7.5 they were coded as being low (1) or not (0). If PCO2 levels were above 45 mmHg they were coded as being high (1) or not (0).
• pH < 7.5: low pH (bad)
• PCO2 > 45 mmHg: elevated PCO2 (bad)
• If we wish to compare the proportion of patients with low/“bad” pH levels ($p_1$) to the proportion of patients with elevated/“bad” PCO2 levels ($p_2$), we cannot compare them using the independent samples approach because these measurements are being made on the same patients. Thus we have dependent samples.

The mosaic plot shows the relationship between pH and PCO2 levels. Patients with low pH levels are more likely to also have high PCO2 levels. Amongst those with low pH, 61.5% have high PCO2 levels, and amongst those with normal pH levels only 6.4% have high PCO2 levels. Fisher’s Exact Test confirms that the difference between these percentages is statistically significant (p < .0001). We can conclude that patients who have low pH levels are more likely to have high PCO2 levels. This analysis, however, does not compare the incidence of these two conditions to one another; it only suggests that the two conditions are significantly related.

The 2 X 2 contingency table constructed by cross-tabulating these levels vs. one another is shown to the left. We can see that 20/200 = .10 or 10% of the patients have high PCO2 levels. Also, 13/200 = .065 or 6.5% of the same patients have low pH levels.

The McNemar’s test p-value is not significant (p = .0896). Therefore we cannot conclude that the difference between these two percentages is statistically significant. This is a two-sided p-value!
So in our sample of ICU patients we see that a higher percentage of them have elevated PCO2 levels, but is this difference statistically significant? McNemar’s test is used to determine this. The results of this test from JMP (using the large sample test) are shown to the left.

Exact McNemar’s Test: p-values (uses binomial distribution)
Reject Ho when the p-value for the stated alternative is small, where X counts discordant pairs out of n = b + c:
Ha: p1 > p2    p-value = P(X ≥ b | n = b + c, p = .50)
Ha: p1 < p2    p-value = P(X ≥ c | n = b + c, p = .50)
Ha: p1 ≠ p2    p-value = 2 × P(X ≥ max(b, c) | n = b + c, p = .50)
Use either binomial probability tables or computer software to find these probabilities.

Example: Low pH and Elevated PCO2 Levels
Here b = 12 and c = 5, therefore we have 12 + 5 = 17 discordant pairs.

Exact McNemar’s Test using the binomial distribution.

If our research hypothesis was that a greater proportion of patients had elevated PCO2 levels than had low pH levels, the p-value is found using the binomial distribution as:
P(X ≥ 12 | n = 17, p = .50) = .0717

If our research hypothesis was that a greater proportion of patients had low pH levels than had elevated PCO2 levels, the p-value is found using the binomial distribution as:
P(X ≥ 5 | n = 17, p = .50) = .9755

Finally, the two-tailed p-value = 2 × .0717 = .1434.
Notice the difference in the two-tailed p-values from the exact test (.1434) vs. the large sample approximation (.0896).

Methods for a Categorical Response
For a dichotomous categorical response we have just reviewed the following:
• One population inference
• Two population inference
• Both large sample and exact methods
We will cover the Cox-Stuart (or Cochran-Armitage) test for trend later in the course when we cover more advanced methods for analyzing categorical data.
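The exact McNemar p-values for the low pH vs. elevated PCO2 example can be reproduced with a short binomial tail sum. This is a sketch using only the Python standard library (Python 3.8 or later for `math.comb`). The helper name `binomial_upper_tail` is hypothetical, and the large-sample statistic is assumed to be the uncorrected form (b - c)^2 / (b + c) on 1 df. That form is an assumption about the JMP output rather than something stated on the slides.

```python
from math import comb, erfc, sqrt


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


b, c = 12, 5          # discordant pair counts from the example
n = b + c             # 17 discordant pairs

# Exact one-sided p-values, using the larger and smaller discordant counts.
p_upper_b = binomial_upper_tail(b, n)
p_upper_c = binomial_upper_tail(c, n)

# Exact two-sided p-value, capped at 1.
p_two_sided = min(1.0, 2 * binomial_upper_tail(max(b, c), n))

# Large-sample version (assumed uncorrected): chi-square with 1 df.
chi2_large = (b - c) ** 2 / (b + c)
p_large = erfc(sqrt(chi2_large / 2))

print(f"P(X >= {b}) = {p_upper_b:.4f}")
print(f"P(X >= {c}) = {p_upper_c:.4f}")
print(f"exact two-sided p = {p_two_sided:.4f}")
print(f"large-sample two-sided p = {p_large:.4f}")
```

Only the 17 discordant pairs enter either calculation. The concordant pairs carry no information about which condition is more common.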
Methods for a Categorical Response
• Data in 2 X 2 Tables (covered above)
  o Comparing two population proportions using independent samples (Fisher’s Exact Test)
  o Comparing two population proportions using dependent samples (McNemar’s Test)
  o Relative Risk (RR), Odds Ratios (OR), Risk Difference, Attributable Risk (AR), & NNT/NNH
• Data in r X c Tables
  o Tests of Independence & Association and Homogeneity

Example: Response to Treatment and Histological Type of Hodgkin’s Disease
In this study a random sample of 538 patients diagnosed with some form of Hodgkin’s Disease was taken. The histological type, nodular sclerosis (NS), mixed cellularity (MC), lymphocyte predominance (LP), or lymphocyte depletion (LD), was recorded along with the outcome from standard treatment, which was recorded as none, partial, or complete remission.
Q: Is there an association between type of Hodgkin’s and response to treatment? If so, what is the nature of the relationship?

Type            None   Partial   Positive   Row Totals
LD              44     10        18         72
LP              12     18        74         104
MC              58     54        154        266
NS              12     16        68         96
Column Totals   126    98        314        n = 538

Some Probabilities of Potential Interest
Probability of Positive Response to Treatment
P(positive) = 314/538 = .5836
Probability of Positive Response to Treatment Given Disease Type
P(positive|LD) = 18/72 = .2500
P(positive|LP) = 74/104 = .7115
P(positive|MC) = 154/266 = .5789
P(positive|NS) = 68/96 = .7083
Notice the conditional probabilities are not equal to the unconditional probability!

Mosaic plot of the results: Response to Treatment vs. Histological Type
Clearly we see that LP and NS respond most favorably to treatment, with over 70% of those sampled experiencing complete remission, whereas lymphocyte depletion has a majority (61.1%) of patients having no response to treatment.
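The conditional probabilities quoted above can be tabulated directly from the Hodgkin’s table. The following is a minimal Python sketch with illustrative variable names that are not part of the original slides.

```python
# Observed counts: histological type by response to treatment.
counts = {
    "LD": {"None": 44, "Partial": 10, "Positive": 18},
    "LP": {"None": 12, "Partial": 18, "Positive": 74},
    "MC": {"None": 58, "Partial": 54, "Positive": 154},
    "NS": {"None": 12, "Partial": 16, "Positive": 68},
}

n = sum(sum(row.values()) for row in counts.values())

# Unconditional probability of a positive response.
p_positive = sum(row["Positive"] for row in counts.values()) / n

# Probability of a positive response given each disease type.
p_positive_given_type = {
    disease_type: row["Positive"] / sum(row.values())
    for disease_type, row in counts.items()
}

# Share of lymphocyte depletion patients with no response.
p_none_given_ld = counts["LD"]["None"] / sum(counts["LD"].values())

print(f"P(positive) = {p_positive:.4f}")
for disease_type, prob in p_positive_given_type.items():
    print(f"P(positive|{disease_type}) = {prob:.4f}")
print(f"P(none|LD) = {p_none_given_ld:.3f}")
```

If type and response were independent, every conditional probability would sit near the unconditional value of .5836.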
A statistical test at this point seems unnecessary, as it seems clear that there is an association between the type of Hodgkin’s disease and the response to treatment. Nonetheless, we will proceed.

Example: Response to Treatment and Histological Type of Hodgkin’s Disease
Observed frequencies, with expected frequencies in parentheses:

Type            None          Partial       Positive       Row Totals
LD              44 (16.86)    10 (13.12)    18 (42.02)     72
LP              12 (24.36)    18 (18.94)    74 (60.70)     104
MC              58 (62.30)    54 (48.45)    154 (155.25)   266
NS              12 (22.48)    16 (17.49)    68 (56.03)     96
Column Totals   126           98            314            n = 538

$$E_{11} = \frac{R_1 C_1}{n} = \frac{72 \cdot 126}{538} = 16.86 \qquad E_{12} = \frac{R_1 C_2}{n} = \frac{72 \cdot 98}{538} = 13.12 \qquad E_{13} = \frac{R_1 C_3}{n} = \frac{72 \cdot 314}{538} = 42.02$$

$$E_{21} = \ldots \qquad E_{43} = \frac{R_4 C_3}{n} = \frac{96 \cdot 314}{538} = 56.03$$

Pearson’s Chi-Square Test of Independence
Pearson’s Chi-Square ($\chi^2$) is computed exactly as in the 2 X 2 case, still using (O - E) as the basic building block, and is compared to a chi-squared distribution with df = (r - 1) x (c - 1). The p-value is the area to the right of the statistic.

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(44 - 16.86)^2}{16.86} + \frac{(10 - 13.12)^2}{13.12} + \cdots + \frac{(68 - 56.03)^2}{56.03} = 75.89$$

$\chi^2$ = 75.89, df = (4 - 1) x (3 - 1) = 6, p-value < .0001

We have strong evidence of an association between the type of Hodgkin’s and response to treatment (p < .0001).

Summary of Review
• We have reviewed most of the methods covered in the prerequisite course that were organized in the flow charts for a numeric response and for a dichotomous categorical response.
• Additionally, we reviewed the chi-square test of independence for r x c contingency tables.
• The other major topics covered in the prerequisite course that were not reviewed are basic study design, correlation, and regression modeling. We will review and extend our coverage of these topics later in the course.
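As a closing illustration of the r x c chi-square test of independence, the Hodgkin’s example can be sketched in Python using only the standard library. The function names are hypothetical. The right-tail formula used here is a closed form that holds only when df is even, which covers this table (df = 6), so it is a simplifying assumption and not a general-purpose p-value routine.

```python
import math


def chi_square_independence(table):
    """Return (chi-square statistic, df) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i * C_j / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def right_tail_even_df(x, df):
    """Area to the right of x for a chi-square distribution with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(
        half**k / math.factorial(k) for k in range(df // 2)
    )


# Rows: LD, LP, MC, NS. Columns: None, Partial, Positive.
hodgkins = [
    [44, 10, 18],
    [12, 18, 74],
    [58, 54, 154],
    [12, 16, 68],
]

chi2, df = chi_square_independence(hodgkins)
p_value = right_tail_even_df(chi2, df)

print(f"chi-square = {chi2:.2f}, df = {df}")
print("p-value < .0001" if p_value < 0.0001 else f"p-value = {p_value:.4f}")
```

Most of the statistic comes from the LD row, which matches the mosaic plot: lymphocyte depletion is where the observed responses depart most from what independence would predict.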