What methods do I use? - Winona State University
Review of Methods from
Prerequisite Course
Assuming exposure to all of the content from
STAT 601 – Statistical Methods for Healthcare Research
Presentation Outline
• Review of variable types
• Review will cover both descriptive and
inferential methods
• Methods for numeric (or possibly ordinal)
response variables
• Methods for categorical (or possibly
ordinal) response variables
* Before viewing this presentation download
and print the supplements!
Brief Review of Data Types
There are three main data types with further
subclasses within some of them.
• Continuous – measurements or counts
Important subclasses – discrete, continuous, ratio scale,
& interval scale (Wiki these scales)
• Ordinal – ordered categories
May be coded numerically and could be treated as such.
• Nominal – unordered categories
May also be coded numerically, BUT cannot be treated
as such.
Brief Review of Data Types
In JMP (and SPSS) these are the three
classifications. In JMP (which we’ll use)…
• Continuous variables are denoted:
• Ordinal variables are denoted:
• Nominal variables are denoted:
ICU Study – used in most examples
• This study consists of 200 subjects who were
admitted to an adult intensive care unit (ICU). A
major goal of this study was to predict the
probability of survival to hospital discharge of
these patients. (Lemeshow, Teres, Avrunin & Pastides, 1988)
• Several measurements were taken at the time of
admission and the ultimate survival of the
patients was recorded.
ICU Study – used in most examples
The variable descriptions and coding
are found in this table.
Comments:
Notice that most of the information has
been coded numerically, although only
Age, Systolic BP, and Heart Rate are
continuous.
Some of the dichotomous variables
have been created using continuous
measurements (e.g. PO2, PH, PCO,
etc.)
The Level of Consciousness variable
(LOC) could be treated as ordinal as
the levels indicate increasing states of
unresponsiveness.
Methods for a Numeric Response
Print this flowchart for
reference (see website)
• One population inference
• Two population inference
• More than two population
inference
Covers both parametric and
nonparametric methods.
One or Two or More Populations?
• Is the study comparative in nature or are
we making an inference about a single
population?
• Most studies are certainly comparative
(i.e. multivariable) in nature!
• However, we will review methods for a
single numeric variable first.
Methods for a Single Numeric Variable
Descriptive Methods
Visual Descriptions
• Histogram
• Boxplots
• Stem Leaf Plots (archaic)
• Cumulative Distribution Plots
(CDF)
• Normal Quantile Plots
Numeric Descriptions
• Measures of central
tendency
• Measures of variation
• Measures of relative
standing
• Measures of
distributional shape
Plots for a Single Numeric Variable
CDF Plot - shows
P(X ≤ x) vs. x
e.g.
P(X ≤ 100) = .60 or
60% chance a patient’s
heart rate is less than or
equal to 100 bpm at
admission to ICU.
Visual Summaries of Heart Rate @
Admission (ICU Study)
• Histogram
• Boxplots (outlier and quantile)
• Normal quantile plot
• CDF plot
Summary Statistics for a Numeric Variable
Measure of Central Tendency
• Mean, Median, Mode (3 M’s)
- mode is not unique!
• Trimmed Mean (5%) – mean
with the 5% of the obs.
trimmed off the tails.
• Geometric Mean - mean in
the log-scale transformed
back to original scale.
Good measure for skewed right data!
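As a rough illustration of the geometric mean described above, the sketch below averages the natural logs and transforms back. The function name and the sample values are hypothetical and are not taken from the ICU study.

```python
import math

def geometric_mean(values):
    """Average the natural logs, then transform back with exp()."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)

# Hypothetical right-skewed sample (illustration only).
sample = [1, 2, 4, 8, 64]
gm = geometric_mean(sample)
am = sum(sample) / len(sample)   # arithmetic mean, pulled up by the value 64
print(f"geometric mean = {gm:.2f}, arithmetic mean = {am:.2f}")
```

For right-skewed data like this the geometric mean sits well below the arithmetic mean, which is why it is suggested as a measure of center in that setting.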
Summary Statistics for a Numeric Variable
Measure of Relative Standing
• Quantiles/Percentiles –
values such that k% of the
observations are less and
(100-k)% are greater.
• Quartiles – specific percentiles
Q1 – first quartile (25th percentile)
Q2 – second quartile (median)
Q3 – third quartile (75th percentile)
Measures of Shape
• Skewness – measures the degree of asymmetry of the distribution. If the distribution is
symmetric (e.g. normal) then Skewness is 0. If Skewness > 0 then the distribution is
skewed to the right; if Skewness < 0 then the distribution is skewed to the left.
• Kurtosis – measures how heavy the tails are relative to a normal distribution. If the distribution is approx. normal the
kurtosis is zero. If it is positive the distribution has heavier tails than a normal
distribution (outliers on each end) and if it is negative the distribution has thinner tails
than a normal distribution.
(Wiki kurtosis for pictures)
Parametric Inference for the
Population Mean (μ)
Assuming either the outcome comes from a normally distributed
population or if the sample size is sufficiently “large”.
Test Statistic
$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$ ~ t-distribution with df = n − 1
Confidence Interval for μ
$\bar{x} \pm t \cdot \dfrac{s}{\sqrt{n}}$
Sample size required for
margin of error (E) with
95% confidence
$n = \left( \dfrac{1.96\,\sigma}{E} \right)^2$
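The sample size formula for a mean can be sketched in a few lines. The standard deviation and margin of error used here are hypothetical planning values, not figures from the ICU study, and the result is rounded up because a sample size must be a whole number.

```python
import math

def sample_size_for_mean(sigma, margin, z=1.96):
    """n = (z * sigma / E)^2, rounded up to the next whole subject."""
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical planning values: sigma = 27 bpm, desired margin of error E = 3 bpm.
n_required = sample_size_for_mean(sigma=27, margin=3)
print(n_required)
```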
Example: Heart Rate of ICU patients
Output from JMP
The upper-tail test p-value = .00000238 or (p < .0001),
thus we have strong evidence to suggest that patients
admitted to the adult ICU have a mean heart rate that
would be considered high (i.e. μ > 90 bpm).
Furthermore we estimate that the mean resting
heart rate of adults admitted to the ICU is
between 95.18 bpm and 102.67 bpm with 95%
confidence.
Nonparametric Inference for a
Single Numeric Variable
If the outcome/response does NOT come from a normally distributed
population or if the sample size is NOT sufficiently “large”.
To test the general hypothesis that patients
admitted to the adult ICU have elevated/high resting heart
rates, we could use the Wilcoxon Signed-Rank Test as an
alternative to the t-Test.
1) Form differences 𝑑𝑖 = 𝑦𝑖 − 90 and drop any that are 0.
2) Compute the signed rank statistics 𝑇+ 𝑎𝑛𝑑 𝑇− .
3) Compare the smaller of these to the critical values from a
Wilcoxon Signed-Rank Test table.
4) Better yet, use statistical software!
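Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, assuming tied absolute differences receive the average of their ranks; the helper names and the five heart rates are hypothetical, and the sketch stops at the statistics T+ and T− rather than computing a p-value.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_statistics(y, hypothesized):
    """Return (T_plus, T_minus) for differences y_i - hypothesized."""
    diffs = [v - hypothesized for v in y if v != hypothesized]  # drop zeros
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus

# Hypothetical heart rates (bpm), compared against 90 bpm.
t_plus, t_minus = signed_rank_statistics([95, 88, 102, 90, 110], 90)
print(t_plus, t_minus)
```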
Nonparametric Inference for a
Single Numeric Variable
The upper-tail p-value from Wilcoxon
Signed-Rank Test is (p < .0001) thus we
conclude that the median heart rate of the
population of patients admitted to the adult
ICU is considered high (above 90 bpm).
The Wilcoxon Signed-Rank Test is used
to make inferences about the population
median rather than the mean.
Comparing a Continuous Response
Between Two Populations
• When comparing a numeric response between
two populations we must first consider the
sampling scheme or experiment that generated
the data, namely were the two samples drawn
independently or dependently?
• For dependent samples, there is a one-to-one
correspondence between an individual in one
population to an individual in the other.
e.g. Pre-test vs. Post-test situations
More on Dependent Samples
• Pre-test vs. Post-test, e.g. Before treatment vs.
After treatment (i.e. subjects = blocks)
• Comparing different treatments using the same
subjects, e.g. pain relievers used on the same
subjects (again subjects = blocks)
• Matched subjects in the two populations according
to some criteria, e.g. matched patients on basis of
age, race, gender, socioeconomic status, weight,
height, existing health conditions, etc.
(Note: Need to be careful here!)
Example 1: Captopril & Systolic Blood Pressure
• Research Question: Is there evidence that
patients will experience a mean decrease in systolic
blood pressure of more than 10 mmHg?
• Experiment: Measure the blood pressure of 15
patients before and after taking Captopril. Our
interest is on the measured changes in blood
pressure and whether or not we believe that those
changes have a mean greater than 10 mmHg.
Example 1: Captopril & Systolic Blood Pressure
Summary Statistics
$\bar{d} = 18.93$ mmHg
$s_d = 9.03$ mmHg
n = 15
Once the paired differences have been formed we simply treat
them as a single numeric response and make inferences
accordingly.
Parametric Inference for the Mean
Paired Difference (μd)
Assuming either the paired differences come from a normally
distributed population or if the sample size (i.e. # of pairs) is
sufficiently “large”.
Test Statistic
$t = \dfrac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$ ~ t-distribution, df = n − 1
Confidence Interval for μd
$\bar{d} \pm t_{\alpha/2} \cdot \dfrac{s_d}{\sqrt{n}}$
μd = the hypothesized difference
under the null hypothesis.
Typically this will be 0!
Note: These formulae are
the same as those for a
single population mean (μ)!
Example 1: Captopril & Systolic Blood Pressure
• Research Question: Is there evidence that
patients will experience a mean decrease in systolic
blood pressure of more than 10 mmHg?
• HYPOTHESES
𝐻𝑜 : 𝜇𝑑 ≤ 10 𝑚𝑚𝐻𝑔 , mean decrease in systolic blood
pressure 30 minutes following taking Captopril is
not greater than 10 mmHg.
𝐻𝑎 : 𝜇𝑑 > 10 𝑚𝑚𝐻𝑔 , mean decrease in systolic blood
pressure 30 minutes following taking Captopril is
greater than 10 mmHg.
Example 1: Captopril & Systolic Blood Pressure
We have evidence to suggest that the mean decrease in systolic blood
pressure 30 minutes after taking Captopril is more than 10 mmHg (p = .0009).
Furthermore, we estimate the mean decrease is between 13.93 mmHg and
23.93 mmHg with 95% confidence.
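The test statistic and interval for this example can be checked from the summary statistics alone. The sketch below uses only the standard library, so the critical value 2.145 (two-sided 95%, df = 14) is an assumption taken from a t table rather than computed; the p-value itself is left to statistical software.

```python
import math

# Summary statistics quoted in the Captopril example.
d_bar, s_d, n = 18.93, 9.03, 15
mu_0 = 10.0                      # hypothesized mean decrease under Ho (mmHg)

# Assumption: two-sided 95% critical value for df = n - 1 = 14, from a t table.
T_CRIT_DF14 = 2.145

se = s_d / math.sqrt(n)          # standard error of the mean paired difference
t_stat = (d_bar - mu_0) / se
ci = (d_bar - T_CRIT_DF14 * se, d_bar + T_CRIT_DF14 * se)
print(f"t = {t_stat:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval agrees with the one reported above, 13.93 mmHg to 23.93 mmHg.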
Nonparametric Inference for
Paired Differences
Use if the paired differences do NOT come from a normally
distributed population or if the sample size (# of pairs) is NOT
sufficiently “large”.
To test the general hypothesis that the change in systolic blood pressure is
more than 10 mmHg we could use the Wilcoxon Signed-Rank Test as
an alternative to the paired t-Test.
1) Form paired differences 𝑑𝑖 and subtract 10, dropping any that are 0. If
simply testing for a difference we would not subtract 10.
2) Compute the signed rank statistics 𝑇+ 𝑎𝑛𝑑 𝑇− .
3) Compare the smaller of these to the critical values from a Wilcoxon
Signed-Rank Test table.
4) Better yet, use statistical software!
Nonparametric Inference for
Paired Differences
We have evidence to suggest the median
change in systolic blood pressure 30
minutes following taking Captopril is more
than 10 mmHg (p = .0010).
Nonparametric Inference for
Paired Differences
• Another nonparametric option is to use the
Sign Test.
• For the Sign Test we simply look at the
number of positive and negative paired
differences and compute the p-value using
a binomial distribution with n = # of pairs and
p = .50.
• This should only be used if the response
is difficult to measure or is ordinal !
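A minimal sketch of the Sign Test calculation, assuming an upper-tailed alternative (more positive differences than expected by chance). The function name and the counts of 13 positive and 2 negative differences are hypothetical.

```python
from math import comb

def sign_test_upper_p(n_pos, n_neg):
    """P(X >= n_pos) for X ~ Binomial(n = n_pos + n_neg, p = .50)."""
    n = n_pos + n_neg
    return sum(comb(n, k) for k in range(n_pos, n + 1)) / 2 ** n

# Hypothetical result: 13 positive and 2 negative paired differences (zeros dropped).
p_value = sign_test_upper_p(13, 2)
print(f"upper-tail p-value = {p_value:.4f}")
```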
Independent Samples Comparison of
Two Population Means
• For independent samples we are either:
- drawing samples from two existing
populations (i.e. observational study), e.g.
males & females, smokers & non-smokers.
- randomly allocating subjects into two
populations (i.e. experiment), e.g. treatment
vs. placebo, therapy A vs. therapy B, etc.
Independent Samples Comparison
of Two Population Means
• Analysis of these two situations is the
same, although the conclusions reached
may differ (i.e. association vs. causation).
• This is an example of a bivariate analysis,
Y = response (continuous, possibly ordinal)
X = population identifier (nominal)
• If the response is normally distributed or if
both sample sizes are “large” we can use a
parametric approach.
Example: Heart Rate and Type of Admission
Type of
admission (TYP)
1 = ER
0 = non-ER
The heart rate at admission
appears higher for those
admitted through the ER,
about 10 bpm higher on
average.
This apparent difference
could be due to chance
variation however!
Heart rate is approximately
normally distributed for both
samples.
Variation in the heart rates
appears to be similar.
Example: Heart Rate and Type of Admission
The separation between
the CDF plots suggests a
potential difference in the
heart rate distributions for
patients admitted to the
adult ICU through the ER
and those that were not.
In particular, it looks like
patients admitted through
the ER tend to have
higher heart rates.
Independent Samples Comparison
of Two Population Means
For testing equality of means
Ho: μ1 = μ2 or (μ1 – μ2) = 0
The possible alternatives are:
Ha: μ1 > μ2 or (μ1 – μ2) > 0 (upper-tailed)
Ha: μ1 < μ2 or (μ1 – μ2) < 0 (lower-tailed)
Ha: μ1 ≠ μ2 or (μ1 – μ2) ≠ 0 (two-tailed)
Note: If we wanted to establish that one mean was, say, at least 10
units larger than the other we could replace 0 in these statements by
10. In general, to establish a difference of at least Δ units we
replace 0 by Δ.
Independent Samples Comparison of Two
Population Means
Test statistic
$t = \dfrac{(\bar{y}_1 - \bar{y}_2) - \Delta}{SE(\bar{y}_1 - \bar{y}_2)}$ ~ t-distribution (df)
The standard error of the difference in the sample means and the
degrees of freedom (df) are calculated two different ways
depending on whether or not we assume the population variances
are equal.
Rule O’ Thumb:
Assume variances are equal only if neither sample variance is more than twice that of the
other sample variance.
Independent Samples Comparison of Two
Population Means – Pooled t-Test
$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$
where
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
is the pooled estimate of the variance common
to both populations; it is essentially a
weighted average of the two sample
variances. It is called pooled because
both samples are combined (or pooled) to
estimate the variance common to both
populations.
Test statistic
$t = \dfrac{(\bar{y}_1 - \bar{y}_2) - \Delta}{SE(\bar{y}_1 - \bar{y}_2)}$ ~ t-distribution (df)
Confidence Interval for (μ1 − μ2)
$(\bar{y}_1 - \bar{y}_2) \pm t \cdot SE(\bar{y}_1 - \bar{y}_2)$
The degrees of freedom for the
associated test statistic is
df = n1 + n2 − 2
Assuming $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (common variance)
Independent Samples Comparison of Two
Population Means – Welch’s t-Test
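$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
Test statistic
$t = \dfrac{(\bar{y}_1 - \bar{y}_2) - \Delta}{SE(\bar{y}_1 - \bar{y}_2)}$ ~ t-distribution (df)
Confidence Interval for (μ1 − μ2)
$(\bar{y}_1 - \bar{y}_2) \pm t \cdot SE(\bar{y}_1 - \bar{y}_2)$
The degrees of freedom for the
associated test statistic is
df = ugly formula:
$df \approx \dfrac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
Always round down!
Assuming $\sigma_1^2 \neq \sigma_2^2$, i.e. unequal variances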
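To make the two standard-error and degrees-of-freedom calculations concrete, here is a small sketch comparing the pooled and Welch versions side by side. The function names and the sample standard deviations and sizes are hypothetical; they are chosen so that the variances clearly differ.

```python
import math

def pooled_se_df(s1, n1, s2, n2):
    """Pooled t-test: SE of (ybar1 - ybar2) and df, assuming equal variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return se, n1 + n2 - 2

def welch_se_df(s1, n1, s2, n2):
    """Welch's t-test: SE of (ybar1 - ybar2) and approximate df, rounded down."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, math.floor(df)

# Hypothetical summary statistics: s1 = 10 (n1 = 10), s2 = 20 (n2 = 20).
pooled = pooled_se_df(10, 10, 20, 20)
welch = welch_se_df(10, 10, 20, 20)
print(f"pooled: SE = {pooled[0]:.3f}, df = {pooled[1]}")
print(f"Welch:  SE = {welch[0]:.3f}, df = {welch[1]}")
```

Here one sample variance is four times the other, so by the Rule O’ Thumb the Welch version would be the one to use.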
Independent Samples Comparison of Two Population
Means – Formally Testing Equality of the
Population Variances Assumption
• We can formally test the equality of the population
variances rather than use the Rule O’ Thumb.
• In some situations it may also be of interest to compare the
population variances in addition to the population means.
• HYPOTHESES
Ho: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 \neq \sigma_2^2$ (or we could use a one-tailed alternative)
Test Statistic (for comparing two population variances)
$F = \max\left( \dfrac{s_2^2}{s_1^2}, \dfrac{s_1^2}{s_2^2} \right)$ ~ F-distribution with
num df = n2 − 1, den df = n1 − 1 or
num df = n1 − 1, den df = n2 − 1
respectively. Large F statistic value ⇒ small p-value
(Reject Ho)
• There are several other tests for equality of variance.
Example: Heart Rate and Type of Admission
The F-test for comparing population
variances does not provide evidence of a
significant difference in heart rate
variation between the two groups of
patients (p = .3992).
None of the other tests (O’Brien, Brown-Forsythe, Levene, Bartlett) have
significant p-values either.
Given these results we could conduct a
pooled t-Test to compare the mean heart
rates.
Example: Heart Rate and Type of Admission
The two-tailed p-value = .0131, thus we conclude there is a statistically significant
difference in the population mean heart rates between these two populations of patients
admitted to the adult ICU.
Furthermore, we estimate that the mean heart rate for patients admitted to the adult ICU
through the emergency room is anywhere from 2.26 bpm to 19 bpm larger than the mean
for those who were not admitted to the ICU through the emergency room.
Note: order of subtraction 1-0, i.e. 𝜇1 − 𝜇0 , i.e. ER mean – non-ER mean.
The results from the confidence interval lend themselves to a brief discussion of the
concept of practical significance and/or effect size (ES). While a difference in the
means of 19 bpm seems physiologically meaningful, the same could not be said for the
lower confidence limit which is roughly 2 bpm. We will examine the concepts of practical
significance and effect size in more detail later in the course.
The output from the non-pooled option (t-Test) is presented in exactly the same format.
Nonparametric Testing for Two
Independent Samples
• If the population distributions do not appear to be normally
distributed or if the sample sizes are “small”, we may choose to use
a nonparametric test to compare the size of the values from the two
populations.
• There are a few options available but by far the most frequently used
nonparametric test for comparing a numeric response across two
populations is the Wilcoxon Rank Sum Test (also known as the
Mann-Whitney Test).
• The test utilizes the sum of the ranks assigned to observations from
the two populations when the two samples are combined.
Essentially the larger the difference in the rank sums when taking
the sample sizes into account, the more evidence we have against
equality of the two distributions in terms of the size of the values.
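The rank-sum idea described above can be sketched as follows: combine the two samples, rank them together, and add up the ranks belonging to each sample. This is an illustration only, assuming average ranks for ties; the helper names and data values are hypothetical, and the p-value is left to statistical software.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(sample1, sample2):
    """Return the sum of combined-sample ranks for each sample."""
    ranks = average_ranks(list(sample1) + list(sample2))
    n1 = len(sample1)
    return sum(ranks[:n1]), sum(ranks[n1:])

# Hypothetical samples from two populations.
w1, w2 = rank_sums([1, 3, 5], [2, 4, 6, 8])
print(w1, w2)
```

Because the samples differ in size, the rank sums are compared after accounting for n1 and n2, for example through the mean rank in each group.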
Nonparametric Testing for Two
Independent Samples
HYPOTHESES
𝐻𝑜 : 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 1 = 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 2, i.e. the distribution of the two
populations is essentially the same, particularly in terms of the size of
the values.
𝐻𝑎 : 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 1 ≠ 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 2, i.e. the distributions of the two
populations are different; specifically we believe one distribution is
shifted to the right or left of the other.
Note: One-tailed alternatives are fine also, meaning we can specify
which population has larger values than the other in the alternative.
Here the alternative hypothesis states
population A is shifted to the right of
population B, i.e. population A has
larger values than population B.
Example: Heart Rate and Type of Admission
The Wilcoxon Rank Sum Test p-value = .0137,
thus we conclude the two populations of
patients differ in terms of their heart rate at
admission to the adult ICU. In particular, we
conclude those that were admitted to the adult
ICU via the ER had higher heart rates in
general than those not admitted through the
ER.
Comparing a Continuous Response
Between Three or More Populations
• As with two-population comparisons,
there are independent and dependent
sampling schemes when comparing
several populations.
• Assuming normality and equality of
population variances across populations
both situations use a form of Analysis of
Variance (ANOVA) to compare the means
of the populations.
Comparing a Continuous Response
Between Three or More Populations
• We will cover ANOVA in more detail later
in the course and review both one-way
ANOVA and randomized block designs as
part of that discussion.
• For now we will look at a quick example
of each.
Example: Age and Race
(Descriptive Summaries)
Race of Patient
1 = White
2 = Black
3 = Other
Although this may not be of
interest in this study, here we
compare the ages of patients in
this study across race classified
as white, black, or other.
White patients in the sample
were the oldest with a mean
age of 59, while the other two
race groups have a mean age
of around 47.
The age distributions do appear to be left-skewed or kurtotic (i.e. non-normal) and the standard deviations differ enough that equality of
variances may be suspect.
Example: Age & Race
(Comparing Variances)
None of the four tests for equality of
variance provides statistically
significant evidence of unequal
population variances (all p > .05).
If these tests did suggest a
problem with the equality of
population variance assumption
we could use Welch’s ANOVA
(like the non-pooled t-Test) to
determine if the mean ages
differed across race.
Example: Age and Race (One-way ANOVA)
From the one-way ANOVA F-test we conclude that at least two population
means differ (p = .0222).
With only three populations controlling for the experiment-wise error rate using
Tukey’s HSD is not vital, as there are only three possible pairwise comparisons
(white vs. black, white vs. other, and black vs. other).
Example: Age & Race (Multiple Comparisons)
Using Tukey’s HSD we see that
none of the pairwise comparisons
suggest a difference between the
population means (all p > .05).
Two-sample t-Tests (pooled) not controlling for experiment-wise error rate (EER)
Without controlling for EER we see
that the mean ages of white and
black patients differ significantly
(p = .0283). However, the
estimated difference in means
covers a wide range 1.26 years to
22.24 years. On the low end of the
confidence interval this difference is
certainly inconsequential.
Example: Age and Race
(Nonparametric Test and Multiple Comparisons)
The nonparametric alternative to the one-way ANOVA F-test is the Kruskal-Wallis test. We conclude the populations differ in terms of the age distributions
(p = .0110).
The nonparametric alternative to Tukey’s HSD is the Steel-Dwass Method which
suggests that the age distributions between white and black patients significantly
differ (p = .0268). Again the CI for the difference in typical ages is wide, from 1 year
to 25 years, with the low end representing a very small difference.
Methods for a Numeric Response
We have just reviewed the
following:
• One population inference
• Two population inference
• More than two population
inference
Covered both parametric and
nonparametric methods.
We will cover block designs
and their analysis when we
cover ANOVA in more detail
later in the course.
Methods for a Categorical Response
For a dichotomous
categorical response we
covered many of the
methods in the flow chart
to the left in the
prerequisite course.
A dichotomous response
has two levels which we
can generically classify as
“success” or “failure” or
“yes” or “no”.
We will briefly review some of these methods
from the prerequisite course using the ICU
study data and data from other studies.
We will cover more
advanced methods for the
analysis of categorical data
later in the course.
ICU Study – variables & coding
The variable descriptions and coding are
found in this table.
Comments:
There are numerous dichotomous
variables in this study, vital status (STA)
is the primary outcome of interest.
Some of the dichotomous variables have
been created using continuous
measurements (e.g. PO2, PH, PCO,
etc.)
The Level of Consciousness variable
(LOC) could be treated as ordinal as the
levels indicate increasing states of
unresponsiveness.
Summary of Inference for Single Proportion (p)
Assuming the sample size n is sufficiently “large”.
Test Statistic
$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$ ~ standard normal
Confidence Interval for p
$\hat{p} \pm z \cdot \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
Sample size required for
margin of error (E) with
95% confidence assuming
prior value for p
$n = \dfrac{1.96^2 \, p(1 - p)}{E^2}$
Conservative approach
$n = \dfrac{1.96^2}{4E^2}$
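The two sample size formulas for a proportion can be sketched together, since the conservative approach is simply the general formula evaluated at p = .50. The function name, the prior value p = .30, and the margin of error E = .05 are hypothetical.

```python
import math

def sample_size_for_proportion(margin, p=0.5, z=1.96):
    """n = z^2 * p(1 - p) / E^2, rounded up; p = 0.5 gives the conservative n."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Hypothetical planning values: margin of error E = .05.
n_prior = sample_size_for_proportion(0.05, p=0.30)   # using a prior guess for p
n_conservative = sample_size_for_proportion(0.05)    # no prior guess: p = .50
print(n_prior, n_conservative)
```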
Summary of Inference for Single Proportion (p)
Exact inferential methods using the binomial distribution
Binomial Exact Test (one-sided)
𝐻𝑜 : 𝑝 = 𝑝𝑜 and 𝐻𝑎 : 𝑝 < 𝑝𝑜 𝑜𝑟 𝐻𝑎 : 𝑝 > 𝑝𝑜
Find the probability of observing the number of successes as
extreme or more extreme than those observed (𝑥) assuming
the null is true.
Use a binomial table to calculate the p-value
For Ha: p > po the p-value = P(X ≥ x | n, po)
For Ha: p < po the p-value = P(X ≤ x | n, po)
A two-sided alternative would have p-value equal to the smaller
of the probabilities above multiplied by 2.
Summary of Inference for Single Proportion (p)
Exact inferential methods using the binomial distribution
Binomial Exact 95% Confidence Interval for p
Use a binomial table to find the proportions that make
the following probability statements true:
P(X ≥ x | n, pL) = .025
P(X ≤ x | n, pU) = .025
The Exact 95% Confidence Interval for p is given by
(pL, pU)
Example: Gender of ICU Patients
Research Question: Is there evidence that a
majority of adult ICU admissions are men?
Here the parameter of interest is :
p = proportion of adult ICU admissions
that are men.
In our sample of n = 200 patients 124 or 62%
were men, which certainly represents a
majority. However, this could be due to
sampling variation and in actuality there is an
equal balance of ICU admissions based on
gender.
Example: Gender of ICU Patients
Research Question: Is there evidence that a
majority of adult ICU admissions are men?
𝐻𝑜 : 𝑝 ≤ .50, 𝑡ℎ𝑒𝑟𝑒 𝑖𝑠 𝑛𝑜𝑡 𝑎 𝑚𝑎𝑙𝑒 𝑚𝑎𝑗𝑜𝑟𝑖𝑡𝑦 𝑖𝑛 𝐼𝐶𝑈 𝑎𝑑𝑚𝑖𝑠𝑠𝑖𝑜𝑛𝑠
𝐻𝑎 : 𝑝 > .50, 𝑡ℎ𝑒𝑟𝑒 𝑖𝑠 𝑎 𝑚𝑎𝑙𝑒 𝑚𝑎𝑗𝑜𝑟𝑖𝑡𝑦 𝑖𝑛 𝐼𝐶𝑈 𝑎𝑑𝑚𝑖𝑠𝑠𝑖𝑜𝑛𝑠
$z = \dfrac{.62 - .50}{\sqrt{.50(1 - .50)/200}} = 3.39 \;\rightarrow\; P(Z > 3.39) = .00034$
Thus we have evidence that a majority of patients admitted to the
adult ICU are males.
95% Confidence Interval for p
$.62 \pm 1.96 \sqrt{\dfrac{.62(1 - .62)}{200}}$ = (.5527, .6873) or (55.27%, 68.73%)
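The large-sample calculations for this example can be reproduced with the standard library. In the sketch below the upper-tail normal probability is obtained from math.erfc, which is one convenient way to do it and not necessarily how JMP computes it.

```python
import math

x, n, p0 = 124, 200, 0.50        # 124 men among 200 ICU patients; Ho value .50
p_hat = x / n

# Large-sample z statistic and upper-tail p-value, P(Z > z).
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_upper = 0.5 * math.erfc(z / math.sqrt(2))

# Large-sample 95% confidence interval for p.
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)
print(f"z = {z:.2f}, p-value = {p_upper:.5f}, CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```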
Example: Gender of ICU Patients
Research Question: Is there evidence that a
majority of adult ICU admissions are men?
𝐻𝑜 : 𝑝 ≤ .50, 𝑡ℎ𝑒𝑟𝑒 𝑖𝑠 𝑛𝑜𝑡 𝑎 𝑚𝑎𝑙𝑒 𝑚𝑎𝑗𝑜𝑟𝑖𝑡𝑦 𝑖𝑛 𝐼𝐶𝑈 𝑎𝑑𝑚𝑖𝑠𝑠𝑖𝑜𝑛𝑠
𝐻𝑎 : 𝑝 > .50, 𝑡ℎ𝑒𝑟𝑒 𝑖𝑠 𝑎 𝑚𝑎𝑙𝑒 𝑚𝑎𝑗𝑜𝑟𝑖𝑡𝑦 𝑖𝑛 𝐼𝐶𝑈 𝑎𝑑𝑚𝑖𝑠𝑠𝑖𝑜𝑛𝑠
P(X ≥ 124 | n = 200, p = .50) = .000423
Thus we have evidence that a majority of patients admitted to the
adult ICU are males.
Exact 95% Confidence Interval for p
P(X ≥ 124 | n = 200, p = .549) = .0252
P(X ≤ 124 | n = 200, p = .687) = .0259
Thus an exact 95% CI for p is (54.9%, 68.7%).
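In place of a binomial table, the exact tail probability and the exact interval can be found numerically. The sketch below is one way to do this, assuming a simple bisection search for the two proportions; the function names are hypothetical and it does not handle the edge cases x = 0 or x = n.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(x, n, p):
    """P(X >= x | n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

def lower_tail(x, n, p):
    """P(X <= x | n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

def solve_increasing(f, target, lo=0.0, hi=1.0, iters=60):
    """Bisection for f(p) = target when f increases with p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Find p_L with P(X >= x | p_L) = alpha/2 and p_U with P(X <= x | p_U) = alpha/2."""
    p_lower = solve_increasing(lambda p: upper_tail(x, n, p), alpha / 2)
    # lower_tail decreases as p grows, so negate it to reuse the same solver.
    p_upper = solve_increasing(lambda p: -lower_tail(x, n, p), -alpha / 2)
    return p_lower, p_upper

p_value = upper_tail(124, 200, 0.50)     # exact upper-tail p-value for Ha: p > .50
p_lower, p_upper = exact_ci(124, 200)
print(f"p-value = {p_value:.6f}, exact 95% CI = ({p_lower:.3f}, {p_upper:.3f})")
```

The search solves the two probability statements exactly at .025, so its endpoints should agree with the tabled values of .549 and .687 to about three decimals.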
Independent Samples Comparison
of Two Population Proportions
For testing equality of two proportions
Ho: p1 = p2 or (p1 – p2) = 0
The possible alternatives are:
Ha: p1 > p2 or (p1 – p2) > 0 (upper-tailed)
Ha: p1 < p2 or (p1 – p2) < 0 (lower-tailed)
Ha: p1 ≠ p2 or (p1 – p2) ≠ 0 (two-tailed)
Note: If we wanted to establish that one proportion was, say, at least
.10 or 10 percentage points larger than the other we could replace 0 in
these statements by .10. In general, to establish a difference of at least
Δ we replace 0 by Δ.
Independent Samples Comparison of Two
Population Proportions (p1 vs. p2)
Test Statistic for Large Independent Samples
For testing to see if difference is at least Δ
Ho: (p1 – p2) = Δ
Ha: (p1 – p2) > Δ (upper-tail)
Ha: (p1 – p2) < Δ (lower-tail)
[Test statistic formulas not reproduced in this transcript; one is marked “Most important case”.]
Provided n1p1 > 10 & n1q1 > 10 and
n2p2 > 10 & n2q2 > 10
Independent Samples Comparison of Two
Population Proportions (𝑝1 𝑣𝑠. 𝑝2 )
Provided n1p1 > 10 & n1q1 > 10
n2 p2 > 10 & n2q2 > 10
The confidence interval for (p1 – p2) has a general form:
z-values
90%: z = 1.645
95%: z = 1.960
99%: z = 2.576
Example: Comparing Service at
Admission Across Survival Status
• The ICU study is a case control study – that
is 40 patients who died and 160 who did not
die were sampled and the admission related
variables were collected.
• Because of this we cannot calculate the
probability of patient death using these data.
• To identify variables related to survival we
use vital status (STA) as the population
identifier, i.e. as the X variable in JMP.
Example: Comparing Service at
Admission Across Survival Status
Amongst the patients in the study who
died in the ICU, 65% were admitted
from the Medical unit and 35% from
the Surgical unit. For patients that did
not die 58.1% were admitted from the
Surgical unit and 41.9% were admitted
from the Medical unit.
These percentages are used to construct
the mosaic plot and are displayed in
the cells of the plot. The 2 X 2
contingency table below the plot gives
frequencies and row percentages (i.e. a
percentage breakdown of the column
variable within each row). You can see
the row %’s are the same as those
discussed above.
Example: Comparing Service at
Admission Across Survival Status
The large sample test p-values and
confidence interval for the difference in
the proportions are given under the
heading Two Sample Test for
Proportions. The proportion of patients
admitted to the ICU from the surgical
unit is significantly higher for those that
survived (p = .0038). This finding is
certainly expected.
We estimate that the percentage of
patients coming from the surgical unit is
between 5.9 and 38.7 percentage points
higher for ICU survivors. The difference
in proportions is also known as the
attributable risk (AR).
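For readers who want to see the arithmetic behind a two-sample comparison of proportions, here is a sketch using the textbook large-sample formulas: a pooled standard error for the z-test of Ho: p1 = p2 and an unpooled standard error for the confidence interval. These formulas are an assumption on my part, since the slide formulas are not reproduced in this transcript, and software may apply an adjusted version, so the results need not match the reported p-value and interval exactly. The counts are those implied by the percentages above (93 of 160 survivors and 14 of 40 non-survivors came from the surgical unit).

```python
import math

# Surgical-unit admissions: survivors (group 1) vs. patients who died (group 2).
x1, n1 = 93, 160
x2, n2 = 14, 40
p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# z-test of Ho: p1 = p2, using the pooled proportion for the standard error.
p_pool = (x1 + x2) / (n1 + n2)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = diff / se_pooled
p_upper = 0.5 * math.erfc(z / math.sqrt(2))     # upper-tail p-value

# Unadjusted 95% confidence interval for (p1 - p2), unpooled standard error.
se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (diff - 1.96 * se_unpooled, diff + 1.96 * se_unpooled)
print(f"diff = {diff:.3f}, z = {z:.2f}, p = {p_upper:.4f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```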
Example: Comparing Service at
Admission Across Survival Status
Another large sample test for 2 X 2 tables is
the chi-square test, either Pearson’s or
Likelihood Ratio, which suggests that the
proportion of patients coming from the surgical
unit differs for survivors and non-survivors
(p = .0087 or .0085).
The Fisher’s Exact Test p-values do not rely
on the large sample assumption. This test is
preferable to either of the large sample
procedures. The alternative hypothesis is
communicated along with the associated p-values. The Left p-value = .0071, which leads
us to conclude that the proportion of patients
coming from the surgical unit is higher for
the survivor group.
Example: Comparing Service at
Admission Across Survival Status
The Odds Ratio (OR) is used to quantify risk
when a case-control study was used. The
easiest way to calculate the OR is the formula:
$OR = \dfrac{ad}{bc}$
The a cell in the table corresponds to those that
have the adverse outcome (in this case death)
and have the risk factor present – which in
this case is coming from the medical unit (vs.
surgical unit). Thus a = 26 and subsequently
b = 14, c = 67, and d = 93.
Thus the estimated OR is
$OR = \dfrac{26 \cdot 93}{67 \cdot 14} = 2.58$
Example: Comparing Service at
Admission Across Survival Status
From the previous slide the estimated odds ratio is $OR = \dfrac{26 \cdot 93}{67 \cdot 14} = 2.58$
However, JMP reports a different OR. This is because JMP does computations alphanumerically, essentially reversing the roles of 0 and 1. If JMP gives an OR that is
inconsistent with your calculation, then you simply need to take the reciprocal of the OR from
JMP.
OR = 1/.388 = 2.58, giving the result we want.
Thus the 95% CI for the OR is given by (1/.79828, 1/.188511) = (1.25 , 5.30) .
Patients admitted to the ICU from the medical unit have at a minimum a 25%
increase in odds for death. We estimate the odds ratio is between 1.25 and 5.30.
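The odds ratio and a confidence interval for it can also be computed directly from the four cell counts. The sketch below assumes the usual large-sample (Wald) interval built on the log odds ratio; whether JMP uses exactly this method is not stated in the slides, although for these data it gives the same endpoints to two decimals.

```python
import math

# 2 X 2 cell counts: a, b = died (medical, surgical); c, d = survived (medical, surgical).
a, b, c, d = 26, 14, 67, 93

odds_ratio = (a * d) / (b * c)

# Assumption: Wald interval on the log scale, SE = sqrt(1/a + 1/b + 1/c + 1/d).
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
log_or = math.log(odds_ratio)
ci = (math.exp(log_or - 1.96 * se_log_or), math.exp(log_or + 1.96 * se_log_or))
print(f"OR = {odds_ratio:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```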
Quick Recap
o We have just compared the proportion of
patients in both service at admission
categories across survival status (p1 vs. p2)
using the z-test, a CI for (p1 – p2) &
Fisher’s Exact Test.
o Computed the Odds Ratio (OR) and
found a CI for the population OR.
Development of a Test Statistic to
Measure Lack of Independence
One way to generalize the question of interest to
the researchers is to think of it as follows:
Q: Is there an association between the
service at admission and the survival
status of patients admitted to the adult
ICU?
Development of a Test Statistic to
Measure Lack of Independence
If there is not an association, we say that
these variables are independent.
In probability, two events A
and B are said to be independent if
P(A|B) = P(A).
Development of a Test Statistic to
Measure Lack of Independence
In the context of our study this would mean
for example,
P(Medical|Patient Survived) = P(Medical)
i.e. knowing that the patient survived tells you
nothing about the chance that they came to the
ICU from the medical unit vs. the surgical unit.
Development of a Test Statistic to
Measure Lack of Independence
P(Medical) = 93/200 = .465
In this study 46.5% of
the patients admitted to
the adult ICU came from
the medical unit.
When we consider this percentage conditioning on survival
status we see that the relationship for independence does not
hold for these data.
P(Medical|Died) = 26/40 = .650
P(Medical|Survived) = 67/160 = .419
Under independence, both should be
equal to .465
Development of a Test Statistic to
Measure Lack of Independence
o Of course the observed differences could be
due to random variation and in truth it may be
the case that disease and risk factor status are
independent.
o Therefore we need a means of assessing how
different the observed results are from what
we would expect to see if these two factors
were independent.
2 X 2 Example: Case-Control Study
Survival Status and Service at Admission
Survival Status      Serviced in Medical Unit   Serviced in Surgical Unit   Row Totals
Died (Case)          a = 26                     b = 14                      R1 = 40
Survived (Control)   c = 67                     d = 93                      R2 = 160
Column Totals        C1 = 93                    C2 = 107                    n = 200
Development of a Test Statistic to
Measure Lack of Independence
The unconditional probability of
risk presence (admission from
medical) for these data is given
by:
$P(\text{Risk}) = \dfrac{C_1}{n}$
From this table we can calculate the conditional probability of
admission from medical given that the patient died as follows:
$P(\text{Risk} \mid \text{Disease}) = \dfrac{a}{R_1}$
and setting these equal we have
$\dfrac{a}{R_1} = \dfrac{C_1}{n}$, i.e. $a = \dfrac{R_1 C_1}{n}$
Development of a Test Statistic to
Measure Lack of Independence
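Thus, under independence, we expect the frequency in the a cell
to be equal to:
$$a = \frac{R_1 C_1}{n}$$
Similarly we find the following expected frequencies for
the cells making up the 2 X 2 table:
$$a = \frac{R_1 C_1}{n} \qquad b = \frac{R_1 C_2}{n} \qquad c = \frac{R_2 C_1}{n} \qquad d = \frac{R_2 C_2}{n}$$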
Development of a Test Statistic to
Measure Lack of Independence
In general we denote the observed frequency in
the ith row and jth column as $O_{ij}$, or just O for
short.
We denote the expected frequency for the ith row
and jth column as
$$E_{ij} = \frac{R_i C_j}{n}$$
or just E for short, where $R_i$ = row total for row i
and $C_j$ = column total for column j.
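As a rough illustration of the formula for the expected frequencies, the sketch below computes every expected count from the row and column totals. The function name `expected_frequencies` and the list-of-rows layout are assumptions made for this example; the observed counts are those of the ICU 2 X 2 table.

```python
def expected_frequencies(observed):
    """Expected counts under independence: E_ij = R_i * C_j / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Died, Survived. Columns: Medical, Surgical.
observed = [[26, 14],
            [67, 93]]

expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

The same function works unchanged for any r X c table, because it only relies on the row and column totals.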
Development of a Test Statistic to
Measure Lack of Independence
• To measure how different our observed results
are from what we expected to see if the two
variables in question were independent, we
intuitively should look at the difference between
the observed (O) and expected (E) frequencies,
i.e. O – E or more specifically $O_{ij} - E_{ij}$.
• However this will give too much weight to
differences where these frequencies are both
large in size.
Development of a Test Statistic to
Measure Lack of Independence
• One test statistic that addresses the “size” of the
frequencies issue is Pearson’s Chi-Square ($\chi^2$):
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
which follows a chi-squared distribution with $df = (r - 1) \times (c - 1)$.
Notice this test statistic still uses (O – E) as
the basic building block. This statistic will
be large when the observed frequencies do
NOT match the expected values for
independence.
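A minimal sketch of Pearson's statistic in Python follows. The function name `pearson_chi_square` is an illustrative choice, and the table passed in is the ICU 2 X 2 table from earlier in the review.

```python
def pearson_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count E_ij
            chi2 += (o - e) ** 2 / e               # scaled squared difference

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: Died, Survived. Columns: Medical, Surgical.
chi2, df = pearson_chi_square([[26, 14], [67, 93]])
print(f"chi-square = {chi2:.3f}, df = {df}")
```

Dividing each squared difference by E is what keeps cells with large counts from dominating the sum.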
Chi-square Distribution ($\chi^2$)
[Figure: chi-square distribution curve, with the p-value marked as the area to the right of the $\chi^2$ statistic]
This is a graph of the chi-square distribution with 4 degrees
of freedom. The area to the right of Pearson’s chi-square
statistic gives the p-value. The p-value is always the area to the
right!
2 X 2 Example: Case-Control Study
Survival Status and Service at Admission

Survival Status       Served by        Served by        Row
                      Medical Unit     Surgical Unit    Totals
Died (Case)           O11 = 26         O12 = 14         R1 = 40
Survived (Control)    O21 = 67         O22 = 93         R2 = 160
Column Totals         C1 = 93          C2 = 107         n = 200
Calculating Expected Frequencies
Survival Status and Service at Admission
(expected frequencies are shown in parentheses)

Survival Status       Served by        Served by        Row
                      Medical Unit     Surgical Unit    Totals
Died (Case)           26 (18.6)        14 (21.4)        R1 = 40
Survived (Control)    67 (74.4)        93 (85.6)        R2 = 160
Column Totals         C1 = 93          C2 = 107         n = 200

$$E_{11} = \frac{R_1 C_1}{n} = \frac{40 \times 93}{200} = 18.6$$
$$E_{12} = \frac{R_1 C_2}{n} = \frac{40 \times 107}{200} = 21.4$$
$$E_{21} = \frac{R_2 C_1}{n} = \frac{160 \times 93}{200} = 74.4$$
$$E_{22} = \frac{R_2 C_2}{n} = \frac{160 \times 107}{200} = 85.6$$
$E_{ij}$ = expected frequency for the ith row and jth column
Calculating the Pearson Chi-square
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = \frac{(26 - 18.6)^2}{18.6} + \frac{(14 - 21.4)^2}{21.4} + \frac{(67 - 74.4)^2}{74.4} + \frac{(93 - 85.6)^2}{85.6}$$
$$= 2.94 + 2.56 + .736 + .640 = 6.879$$
$\chi^2 = 6.879$, $df = (2 - 1) \times (2 - 1) = 1$
p-value = .0087
http://www.stat.tamu.edu/~west/applets/chisqdemo.html
Chi-square Probability Calculator in JMP
Enter the test statistic value and df and the
p-value is automatically calculated.
p-value = P($\chi^2$ > 6.879) = .0087
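For readers without JMP at hand, the right-tail area can be sketched with the Python standard library. This shortcut assumes df = 1 only: a chi-square variable with 1 degree of freedom is a squared standard normal, so its right tail can be written with the complementary error function. The function name `chi_square_p_value_df1` is made up for this example.

```python
import math


def chi_square_p_value_df1(statistic):
    """Right-tail area P(chi-square with 1 df > statistic)."""
    # With 1 df, chi-square = Z**2 for a standard normal Z, so
    # P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


p_value = chi_square_p_value_df1(6.879)
print(f"p-value = {p_value:.4f}")
```

For tables with more than one degree of freedom a general chi-square routine (or JMP's calculator) is needed instead of this shortcut.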
2 X 2 Example: Case-Control Study
Service at Admission and Survival Status
Conclusion:
We have strong evidence to suggest that service at
admission and survival status are NOT independent,
and thus conclude they are associated or related
(p =.0087). In particular, we found that the
proportion of patients admitted to the adult ICU
from the medical unit was higher amongst patients
who did not survive.
Dependent Samples Comparison of Two
Population Proportions (𝑝1 𝑣𝑠. 𝑝2 )
• The test used to compare two proportions
using dependent samples is called
McNemar’s test.
• As with most tests, there are both a “large”
sample and “small” sample version of the
test.
• The small sample version uses the binomial
distribution and is an exact test, so
technically there is no reason to use the large
sample version, though many do.
Example: Low pH and Elevated PCO2
Levels
• For each patient in the ICU study the pH
levels and PCO2 levels found in their blood
gases were measured. If pH levels were
below 7.5 they were coded as being low (1)
or not (0). If PCO2 levels were above 45
mmHg they were coded as being high (1) or
not (0).
• pH < 7.5 → Low pH (bad)
• PCO2 > 45 mmHg → Elevated PCO2 (bad)
Example: Low pH and Elevated PCO2
Levels
• If we wish to compare the proportion of
patients with low/“bad” pH levels (𝑝1) to
the proportion of patients with elevated/
“bad” PCO2 levels (𝑝2 ) we could not
compare them using the independent
samples approach because these
measurements are being made on the
same patients. Thus we have dependent
samples.
Example: Low pH and Elevated PCO2 Levels
The mosaic plot shows the
relationship between pH and PCO2
levels. Patients with low pH levels are
more likely to also have high PCO2
levels. Amongst those with low pH 61.5%
have high PCO2 levels and amongst
those with normal pH levels only 6.4%
have high PCO2 levels.
Fisher’s Exact test confirms that the
difference in the percentages discussed
above is statistically significant (p <
.0001). We can conclude that patients who have low
pH levels are more likely to have high
PCO2 levels.
This analysis, however, does not compare
the incidence of these two conditions to
one another; it only suggests that the two
conditions are significantly related.
Example: Low pH and Elevated PCO2 Levels
The 2 X 2 contingency table constructed
by cross-tabulating these levels vs. one
another is shown to the left.
We can see that 20/200 = .10 or 10% of the
patients have high PCO2 levels.
Also 13/200 = .065 or 6.5% of the same
patients have low pH levels.
So in our sample of ICU patients we see
that a higher percentage of them have
elevated PCO2 levels, but is this
difference statistically significant?
McNemar’s test is used to determine this.
The results of this test from JMP (using
the large sample test) are shown to the left.
The McNemar’s test p-value is
not significant (p = .0896).
Therefore we cannot conclude
that the difference in these two
percentages is statistically
significant. This is a two-sided
p-value!
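The large sample p-value can be approximated by hand. The sketch below assumes the uncorrected McNemar statistic (b - c)^2 / (b + c) referred to a chi-square distribution with 1 df, and uses the discordant counts 12 and 5 reported for this table. That this reproduces p = .0896 is consistent with, but does not confirm, how JMP computes it. The function name is hypothetical.

```python
import math


def mcnemar_large_sample(b, c):
    """Uncorrected large-sample McNemar test from the two discordant counts.

    Returns (chi-square statistic, two-sided p-value); assumes 1 df.
    """
    chi2 = (b - c) ** 2 / (b + c)
    # Right tail of a chi-square with 1 df, via the normal distribution.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = mcnemar_large_sample(12, 5)
print(f"chi-square = {chi2:.3f}, two-sided p-value = {p_value:.4f}")
```

Only the discordant pairs enter the statistic; patients with both conditions or with neither carry no information about which condition is more common.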
Exact McNemar’s Test: p-values
(uses binomial distribution)
Ha: p1 > p2 Reject Ho if $P(X \ge b \mid n = b + c,\ p = .50) < \alpha$
Ha: p1 < p2 Reject Ho if $P(X \ge c \mid n = b + c,\ p = .50) < \alpha$
Ha: p1 ≠ p2 Reject Ho if $2 \cdot P(X \ge \max(b, c) \mid n = b + c,\ p = .50) < \alpha$
Use either binomial probability
tables or computer software to
find these probabilities.
Example: Low pH and Elevated PCO2 Levels
Here 𝑏 = 12 and 𝑐 = 5, therefore we
have 12 + 5 = 17 discordant pairs.
Exact McNemar’s Test using the binomial distribution.
If our research hypothesis was that a
greater proportion of patients had
elevated PCO2 levels than had low pH
levels, the p-value is found using the
binomial distribution as:
$P(X \ge 12 \mid n = 17,\ p = .50) = .0717$
If our research hypothesis was that a
greater proportion of patients had low pH
levels than had elevated PCO2 levels, the
p-value is found using the binomial
distribution as:
$P(X \ge 5 \mid n = 17,\ p = .50) = .9755$
Finally the two-tailed p-value = 2 × .0717
= .1434
Notice the difference in the two-tailed p-values from the exact (.1434) vs.
large sample approximation (.0896).
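These binomial tail probabilities can be checked with a short Python sketch. The helper name `binom_upper_tail` is an illustrative choice; the counts b = 12 and c = 5 are the discordant pairs from the example, and capping the doubled p-value at 1 is an added safeguard, not something stated on the slides.

```python
from math import comb


def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


b, c = 12, 5
n = b + c  # number of discordant pairs

# One-sided: more patients with elevated PCO2 than with low pH.
p_pco2_greater = binom_upper_tail(12, n)
# One-sided: more patients with low pH than with elevated PCO2.
p_ph_greater = binom_upper_tail(5, n)
# Two-sided: double the smaller tail, capped at 1.
p_two_sided = min(1.0, 2 * binom_upper_tail(max(b, c), n))

print(f"{p_pco2_greater:.4f} {p_ph_greater:.4f} {p_two_sided:.4f}")
```

Doubling the unrounded tail probability can differ from 2 × .0717 in the last decimal place, which is only a rounding effect.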
Methods for a Categorical Response
For a dichotomous
categorical response we
have just reviewed the
following:
• One population inference
• Two population inference
• Covered both large sample
and exact methods.
We will cover the Cox-Stuart
(or Cochran-Armitage) test for
trend later in the course when
we cover more advanced
methods for analyzing
categorical data.
Methods for a Categorical Response
• Data in 2 X 2 Tables (covered above)
Comparing two population proportions using
independent samples (Fisher’s Exact Test)
Comparing two population proportions using
dependent samples (McNemar’s Test)
Relative Risk (RR), Odds Ratios (OR), Risk Difference,
Attributable Risk (AR), & NNT/NNH
• Data in r X c Tables
Tests of Independence & Association and
Homogeneity.
Example: Response to Treatment and
Histological Type of Hodgkin’s Disease
In this study a random sample of 538 patients
diagnosed with some form of Hodgkin’s Disease was
taken and the histological type: nodular sclerosis
(NS), mixed cellularity (MC), lymphocyte
predominance (LP), or lymphocyte depletion (LD)
was recorded along with the outcome from standard
treatment which was recorded as being none, partial,
or complete remission.
Q: Is there an association between type of
Hodgkin’s and response to treatment? If so,
what is the nature of the relationship?
Example: Response to Treatment and
Histological Type of Hodgkin’s Disease
Type            None    Partial    Positive    Row Totals
LD              44      10         18          72
LP              12      18         74          104
MC              58      54         154         266
NS              12      16         68          96
Column Totals   126     98         314         n = 538
Some Probabilities of Potential
Interest
Probability of Positive Response to
Treatment
P(positive) = 314/538 = .5836
Probability of Positive Response to
Treatment Given Disease Type
P(positive|LD) = 18/72 = .2500
P(positive|LP) = 74/104 = .7115
P(positive|MC) = 154/266 = .5789
P(positive|NS) = 68/96 = .7083
Notice the conditional probabilities
are not equal to the unconditional!!!
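The probabilities above can be recomputed directly from the table. This is a small sketch; the dictionary layout (type mapped to None, Partial, Positive counts) is just one convenient way to hold the data.

```python
# Counts by histological type: (None, Partial, Positive) response.
table = {
    "LD": (44, 10, 18),
    "LP": (12, 18, 74),
    "MC": (58, 54, 154),
    "NS": (12, 16, 68),
}

n = sum(sum(row) for row in table.values())

# Unconditional probability of a positive response.
p_positive = sum(row[2] for row in table.values()) / n

# Conditional probability of a positive response given each type.
p_positive_given = {t: row[2] / sum(row) for t, row in table.items()}

print(f"P(positive) = {p_positive:.4f}")
for disease_type, prob in p_positive_given.items():
    print(f"P(positive | {disease_type}) = {prob:.4f}")
```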
Mosaic plot of the results
Response to Treatment vs. Histological Type
Clearly we see that LP and NS
respond most favorably to
treatment with over 70% of
those sampled
experiencing complete
remission, whereas lymphocyte
depletion has a majority
(61.1%) of patients having no
response to treatment.
A statistical test at this point seems unnecessary as it seems clear
that there is an association between the type of Hodgkin’s disease
and the response to treatment, nonetheless we will proceed…
Example: Response to Treatment and
Histological Type of Hodgkin’s Disease
(expected frequencies are shown in parentheses)

Type            None          Partial       Positive        Row Totals
LD              44 (16.86)    10 (13.11)    18 (42.02)      72
LP              12 (24.36)    18 (18.94)    74 (60.69)      104
MC              58 (62.30)    54 (48.45)    154 (155.25)    266
NS              12 (22.48)    16 (17.49)    68 (56.03)      96
Column Totals   126           98            314             n = 538

$$E_{11} = \frac{R_1 C_1}{n} = \frac{72 \times 126}{538} = 16.86$$
$$E_{12} = \frac{R_1 C_2}{n} = \frac{72 \times 98}{538} = 13.11$$
$$E_{13} = \frac{R_1 C_3}{n} = \frac{72 \times 314}{538} = 42.02$$
$$\vdots$$
$$E_{43} = \frac{R_4 C_3}{n} = \frac{96 \times 314}{538} = 56.03$$
Pearson’s Chi-Square Test of
Independence
Pearson’s Chi-Square ($\chi^2$)
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
which follows a chi-squared distribution with $df = (r - 1) \times (c - 1)$.
Notice this test statistic still uses (O – E) as
the basic building block. This statistic will
be large when the observed frequencies do
NOT match the expected values for
independence.
Chi-square Distribution ($\chi^2$)
[Figure: chi-square distribution curve, with the p-value marked as the area to the right of the $\chi^2$ statistic]
This is a graph of the chi-square distribution with 4 degrees
of freedom. The area to the right of Pearson’s chi-square
statistic gives the p-value. The p-value is always the area to the
right!
Example: Response to Treatment and
Histological Type of Hodgkin’s Disease

Type            None          Partial       Positive        Row Totals
LD              44 (16.86)    10 (13.11)    18 (42.02)      72
LP              12 (24.36)    18 (18.94)    74 (60.69)      104
MC              58 (62.30)    54 (48.45)    154 (155.25)    266
NS              12 (22.48)    16 (17.49)    68 (56.03)      96
Column Totals   126           98            314             n = 538

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
$$\chi^2 = \frac{(44 - 16.86)^2}{16.86} + \frac{(10 - 13.11)^2}{13.11} + \dots + \frac{(68 - 56.03)^2}{56.03} = 75.89$$
$\chi^2 = 75.89$, df = 6
p-value < .0001
We have strong evidence of an
association between the type of
Hodgkin’s and response to
treatment (p < .0001).
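The whole r X c calculation can be sketched in standard-library Python. Two assumptions are worth flagging: the function names are invented for this example, and the p-value helper uses a closed form that is valid only when the degrees of freedom are even (true here, since df = 6). A general chi-square routine would be needed otherwise.

```python
import math


def pearson_chi_square(observed):
    """Return (chi-square statistic, df) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


def chi_square_p_value_even_df(statistic, df):
    """Right-tail area of a chi-square distribution; even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = statistic / 2
    # For df = 2m: P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    terms = sum(half**k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


# Rows: LD, LP, MC, NS. Columns: None, Partial, Positive.
hodgkins = [
    [44, 10, 18],
    [12, 18, 74],
    [58, 54, 154],
    [12, 16, 68],
]

chi2, df = pearson_chi_square(hodgkins)
p_value = chi_square_p_value_even_df(chi2, df)
print(f"chi-square = {chi2:.2f}, df = {df}, p < .0001: {p_value < 0.0001}")
```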
Summary of Review
• We have reviewed most of the methods covered in the
prerequisite course that were organized in the flow
charts for a numeric response and for a dichotomous
categorical response.
• Additionally we reviewed the chi-square test of
independence for r x c contingency tables.
• The other major topics covered in the prerequisite
course that were not reviewed are basic study design,
correlation, and regression modeling. We will review
and extend our coverage of these topics later in the
course.