Introduction to Biostatistics
Descriptive Statistics and
Sample Size Justification
Julie A. Stoner, PhD
October 18, 2004
1
Statistics Seminars
• Goal: Interpret and critically evaluate
biomedical literature
• Topics:
– Sample size justification
– Exploratory data analysis
– Hypothesis testing
2
Example #1
• Aim: Compare two antihypertensive
strategies for lowering blood pressure
– Double-blind, randomized study
– 5 mg Enalapril + 5 mg Felodipine ER to 10 mg
Enalapril
– 6-week treatment period
– 217 patients
– AJH, 1999;12:691-696
3
Example #2
• Aim: Demonstrate that D-penicillamine (DPA) is
effective in prolonging the overall survival of
patients with primary biliary cirrhosis of the liver
(PBC)
– Mayo Clinic
– Double-blind, placebo-controlled, randomized trial
– 312 patients
– Collect clinical and biochemical data on patients
– Reference: NEJM. 312:1011-1015. 1985.
4
Example #2
• Patients enrolled over 10 years, between January 1974 and
May 1984
• Data were analyzed in July 1986
• Event: death (x)
• Censoring: some patients are still alive at end of study (o)
[Timeline figure: axis marked 1/1974, 5/1984, and 6/1986; one patient's follow-up line ends in a death (x) and two end in censoring (o)]
5
Statistical Inference
• Goal: describe factors associated with
particular outcomes in the population at
large
• Not feasible to study entire population
• Samples of subjects drawn from population
• Make inferences about population based on
sample subset
6
Why are descriptive statistics
important?
• Identify signals/patterns from noise
• Understand relationships among variables
• Formal hypothesis testing should agree with
descriptive results
7
Outline
• Types of data
– Categorical data
– Numerical data
• Descriptive statistics
– Measures of location
– Measures of spread
• Descriptive plots
8
Types of Data
• Categorical data: provides qualitative description
– Dichotomous or binary data
• Observations fall into 1 of 2 categories
• Example: male/female, smoker/non-smoker
– More than 2 categories
• Nominal: no obvious ordering of the categories
– Example: blood types A/B/AB/O
• Ordinal: there is a natural ordering
– Example: never-smoker/ex-smoker/light
smoker/heavy smoker
9
Types of Data
• Numerical data (interval/ratio data)
– Provides quantitative description
– Discrete data
• Observations can only take certain numeric values
• Often counts of events
• Example: number of doctor visits in a year
– Continuous data
• Not restricted to take on certain values
• Often measurements
• Example: height, weight, age
10
Descriptive Statistics:
Numerical Data
• Measures of location
– Mean: average value
For n data points x1, x2, …, xn, the mean is the
sum of the observations divided by the
number of observations:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
11
Descriptive Statistics:
Numerical Data
• Measures of location
– Mean:
• Example: Find the mean triglyceride level
(in mg/100 ml) of the following patients
159, 121, 130, 164, 148, 148, 152
Sum = 1022, Count = 7,
Mean = 1022/7 = 146
12
Descriptive Statistics:
Numerical Data
• Measures of location
– Percentile: value that is greater than a
particular percentage of the data values
• Order data
• Pth percentile has rank r = (n+1)*(P/100)
– Median: the 50th percentile, 50% of the data
values lie below the median
13
Descriptive Statistics:
Numerical Data
• Measures of location
– Median
• Example: Find the median triglyceride level from
the sample
159, 121, 130, 164, 148, 148, 152
Order: 121, 130, 148, 148, 152, 159, 164
Median: rank = (7+1) * (50/100) = 4
The 4th ordered observation is 148
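The rank rule can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function name `percentile_by_rank` is hypothetical, and the linear interpolation used when the rank is not a whole number is an assumption, since the slides only show a case where the rank is an integer.

```python
def percentile_by_rank(values, p):
    """Return the p-th percentile using the rank rule r = (n + 1) * (p / 100)."""
    ordered = sorted(values)
    n = len(ordered)
    rank = (n + 1) * p / 100
    # Keep the rank inside the observed data (positions 1 .. n).
    rank = min(max(rank, 1), n)
    lower = int(rank)          # whole part of the rank
    fraction = rank - lower    # fractional part of the rank
    if fraction == 0:
        return ordered[lower - 1]
    # Assumption: interpolate linearly between neighbouring ordered values.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


triglycerides = [159, 121, 130, 164, 148, 148, 152]  # mg/100 ml, from the slide
median = percentile_by_rank(triglycerides, 50)
print(median)  # rank 4, so the 4th ordered value
```

With seven observations the 50th percentile has rank 4, so no interpolation is needed and the function returns the 4th ordered value.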
14
Descriptive Statistics:
Numerical Data
• Measures of location
– Mode: most common element of a set
– Example: Find the mode of the triglyceride
values
159, 121, 130, 164, 148, 148, 152
Mode = 148
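The three measures of location for the triglyceride sample can be checked with Python's standard `statistics` module. This is only a quick sketch for verifying the hand calculations; the variable names are illustrative.

```python
from statistics import mean, median, mode

triglycerides = [159, 121, 130, 164, 148, 148, 152]  # mg/100 ml, from the slides

location = {
    "mean": mean(triglycerides),      # sum / count = 1022 / 7
    "median": median(triglycerides),  # middle value of the ordered data
    "mode": mode(triglycerides),      # most frequent value
}
print(location)
```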
15
Descriptive Statistics:
Numerical Data
• Measures of location: comparison of mean
and median
– Example: Compare the mean and median from
the sample of triglyceride levels
159, 141, 130, 230, 148, 148, 152
Mean = 1108/7=158.29, Median = 148
– The mean may be influenced by extreme data
points.
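A short Python sketch makes the comparison concrete by putting the earlier sample next to the sample containing the extreme value 230. The variable names are illustrative.

```python
from statistics import mean, median

original = [159, 121, 130, 164, 148, 148, 152]      # earlier triglyceride sample
with_extreme = [159, 141, 130, 230, 148, 148, 152]  # sample containing 230

for label, sample in (("original", original), ("with extreme value", with_extreme)):
    print(f"{label}: mean = {mean(sample):.2f}, median = {median(sample)}")
```

The mean moves from 146 to about 158.29, while the median stays at 148 in both samples.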
16
Skewed Distributions
• Data whose distribution is not symmetric are skewed.
– Positive skew, or skewed to the right: mean > median
– Negative skew, or skewed to the left: mean < median
• Mean may not be a good measure of central
tendency. Why?
17
Motivation
• Example:
1) 2, 60, 100: \(\bar{x} = 54\)
2) 53, 54, 55: \(\bar{x} = 54\)
• Both data sets have a mean of 54 but
scores in set 1 have a larger range
and variation than the scores in set 2.
18
Descriptive Statistics:
Numerical Data
• Measures of spread
– Variance: average squared deviation from the
mean
For n data points x1, x2, …, xn, the variance is
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
– Standard deviation: square root of variance, in
same units as original data
19
Descriptive Statistics:
Numerical Data
• Measures of spread
– Standard Deviation:
• Example: find the standard deviation of the
triglyceride values
159, 121, 130, 164, 148, 148, 152
Distance from mean: 13, -25, -16, 18, 2, 2, 6
Sum of squared differences: 1418
Standard deviation: sqrt(1418/6)=15.37
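The same steps can be followed in Python as a check on the arithmetic. This is a sketch using the n - 1 divisor from the variance definition; `statistics.stdev` uses the same divisor and is used only to confirm the result.

```python
from math import sqrt
from statistics import mean, stdev

triglycerides = [159, 121, 130, 164, 148, 148, 152]  # mg/100 ml

x_bar = mean(triglycerides)                          # 146
deviations = [x - x_bar for x in triglycerides]      # distance from the mean
sum_of_squares = sum(d ** 2 for d in deviations)     # sum of squared differences
variance = sum_of_squares / (len(triglycerides) - 1)
sd = sqrt(variance)

print(deviations, sum_of_squares, round(sd, 2))
print(round(stdev(triglycerides), 2))                # library cross-check
```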
20
Descriptive Statistics:
Numerical Data
• Standard deviation: How much variability can we
expect among individual responses?
• Standard error of the mean: How much variability
can we expect in the mean response among
various samples?
21
Descriptive Statistics:
Numerical Data
• The standard error of the mean is estimated as
\[ \text{s.e.m.} = \frac{\text{s.d.}}{\sqrt{n}} \]
where s.d. is the estimated standard deviation
• Based on the formula, will the standard error of the mean
always be smaller or larger than the standard deviation
of the data?
– Answer: smaller, for any sample with n > 1
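As an illustration, the standard error of the mean for the triglyceride sample used earlier can be computed in a few lines of Python. The slides do not work this example, so the printed values are only a sketch of the formula in use.

```python
from math import sqrt
from statistics import stdev

triglycerides = [159, 121, 130, 164, 148, 148, 152]  # mg/100 ml

n = len(triglycerides)
sd = stdev(triglycerides)   # sample standard deviation (n - 1 divisor)
sem = sd / sqrt(n)          # standard error of the mean

print(f"s.d. = {sd:.2f}, s.e.m. = {sem:.2f}")
```

Because the standard deviation is divided by the square root of n, the standard error here (about 5.81) is smaller than the standard deviation (about 15.37).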
22
Descriptive Statistics:
Numerical Data
• Measures of spread
– Minimum, maximum
– Range: maximum-minimum
– Interquartile range: difference between 25th and 75th
percentile, values that encompass middle 50% of data
23
Descriptive Statistics:
Numerical Data
• Measures of spread
– Example: find the range and the interquartile range for
the triglyceride values
159, 121, 130, 164, 148, 148, 152
Range: 164 - 121 = 43
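Interquartile Range:
Order: 121, 130, 148, 148, 152, 159, 164
25th percentile: rank = (7+1) * (25/100) = 2, value 130
75th percentile: rank = (7+1) * (75/100) = 6, value 159
IQR: 159 - 130 = 29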
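The range and interquartile range can be checked in Python. This sketch relies on `statistics.quantiles` with `method="exclusive"`, which places quartiles at positions based on n + 1 and therefore agrees with the rank rule used in these slides; other software may use a different quartile convention and give slightly different values.

```python
from statistics import quantiles

triglycerides = [159, 121, 130, 164, 148, 148, 152]  # mg/100 ml

data_range = max(triglycerides) - min(triglycerides)

# Quartiles located at rank (n + 1) * P / 100, as in the slides.
q1, q2, q3 = quantiles(triglycerides, n=4, method="exclusive")
iqr = q3 - q1

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```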
24
Descriptive Statistics:
Numerical Data
• Helpful to describe both location and spread of
data
– Location: mean
Spread: standard deviation
– Location: median
Spread: min, max, range
interquartile range
quartiles
25
Descriptive Statistics:
Categorical Data
• Measures of distribution
– Proportion:
(Number of subjects with characteristic) / (Total number of subjects)
– Percentage:
Proportion * 100%
26
Descriptive Statistics:
Categorical Data
• Measures of distribution: example
                  No Flu   Flu   Total
Vaccinated         202     198    400
Not Vaccinated     179     221    400
Total              381     419    800
• What percentage of vaccinated individuals
developed the flu?
198/400 = 0.495, or 49.5%
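The calculation can be reproduced from the table counts with a small Python sketch. The dictionary layout and the function name `flu_proportion` are illustrative choices, not part of the slides.

```python
# Counts from the vaccination table.
counts = {
    "vaccinated": {"no_flu": 202, "flu": 198},
    "not_vaccinated": {"no_flu": 179, "flu": 221},
}


def flu_proportion(group):
    """Proportion of subjects in a group who developed the flu."""
    row = counts[group]
    return row["flu"] / (row["flu"] + row["no_flu"])


for group in counts:
    p = flu_proportion(group)
    print(f"{group}: proportion = {p:.4f}, percentage = {100 * p:.2f}%")
```

The same function gives the corresponding figure for the unvaccinated group, 221/400, which allows a side-by-side comparison.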
27
Example
• Consider the table of descriptive statistics for
characteristics at baseline
Parameter               Enalapril + Felodipine ER   Enalapril
Number                  109                         108
Gender (% Male)         61%                         54%
Age, years: Mean (SD)   52 (9)                      53 (11)
• What do we conclude about comparability of the
groups at baseline in terms of gender and age?
28
Descriptive Plots:
• Single variable
– Bar plot
– Histogram
– Box-plot
• Multiple variables
– Box-plot
– Scatter plot
– Kaplan-Meier survival plots
29
Barplot
• Goal: Describe the distribution of values
for a categorical variable
• Method:
– Determine categories of response
– For each category, draw a bar with height equal
to the number or proportion of responses
30
Barplot
31
Histogram
• Goal: Describe the distribution of values
for a continuous variable
• Method:
– Determine intervals of response (bins)
– For each interval, draw a bar with height equal
to the number or proportion of responses
32
Histogram
33
Box-plot
• Goal: Describe the distribution of values for a continuous
variable
• Method:
– Determine 25th, 50th, and 75th percentiles of
distribution
– Determine outlying and extreme values
– Draw a box with lower line at the 25th percentile,
middle line at the median, and upper line at the 75th
percentile
– Draw whiskers to represent outlying and extreme
values
34
Boxplot
75th percentile
Median
25th percentile
35
Box-plot
36
Scatter Plot
• Goal: Describe joint distribution of values
from 2 continuous variables
• Method:
– Create a 2-dimensional grid (horizontal and
vertical axis)
– For each subject in the dataset, plot the pair of
observations from the 2 variables on the grid
37
Scatter Plot
38
Scatter Plot
39
Kaplan-Meier Survival Curves
• Goal: Summarize the distribution of times to an
event
• Method:
– Estimate survival probabilities while
accounting for censoring
– Plot the survival probability corresponding to
each time an event occurred
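The method can be sketched as a product-limit calculation in Python. This is a minimal illustration under stated assumptions: the function name `kaplan_meier` is hypothetical, and the follow-up times and event indicators are made-up numbers, not data from the PBC trial.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time where an event occurred.

    times  : follow-up time for each subject
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical data: 7 subjects, event indicator 1 = death, 0 = censored.
times = [2, 3, 3, 5, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"time {t}: estimated survival = {s:.3f}")
```

Censored subjects contribute to the number at risk up to their censoring time but never lower the curve themselves, which is how the estimate accounts for censoring.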
40
Kaplan-Meier Survival Curves
41
Kaplan-Meier Survival Curves
42
Kaplan-Meier Survival Curves
43
Descriptive Plots Guidelines
• Clearly label axes
• Indicate unit of measurement
• Note the scale when interpreting graphs
44
Descriptive Statistics
Exercises
45
Example
• Below are some descriptive plots and
statistics from a study designed to
investigate the effect of smoking on the
pulmonary function of children
• Tager et al. (1979) American Journal of
Epidemiology. 110:15-26
46
Example
• The primary question, for this exercise, is whether
or not smoking is associated with decreased
pulmonary function in children, where pulmonary
function is measured by forced expiratory volume
(FEV) in liters per second.
• The data consist of observations on 654 children
aged 3 to 19.
47
                 SMOKING STATUS
GENDER     Non-smoker   Smoker   Total
Female        279         39      318
Male          310         26      336
Total         589         65      654
• Percentage Male:
– (336/654) * 100% = 51.4%
• Percentage Smokers:
– (65/654) * 100% = 9.9%
• Percentage of Smokers who are Male:
– (26/65) * 100% = 40%
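These three percentages can be recomputed directly from the cell counts of the gender by smoking status table. The sketch below uses an illustrative dictionary keyed by (gender, smoking status); the names are not from the slides.

```python
# Cell counts from the gender by smoking-status table.
counts = {
    ("female", "non_smoker"): 279,
    ("female", "smoker"): 39,
    ("male", "non_smoker"): 310,
    ("male", "smoker"): 26,
}

total = sum(counts.values())
males = sum(v for (gender, _), v in counts.items() if gender == "male")
smokers = sum(v for (_, status), v in counts.items() if status == "smoker")
male_smokers = counts[("male", "smoker")]

pct_male = 100 * males / total                  # denominator: all children
pct_smokers = 100 * smokers / total             # denominator: all children
pct_smokers_male = 100 * male_smokers / smokers  # denominator: smokers only

print(f"{pct_male:.1f}% male, {pct_smokers:.1f}% smokers, "
      f"{pct_smokers_male:.1f}% of smokers are male")
```

The third figure uses the number of smokers, not the total sample, as its denominator, which is the point of the exercise.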
48
Compare the FEV1 distribution between
smokers and non-smokers
• Answer
– The smokers appear
to have higher FEV values
and therefore better lung
function. Specifically, the
median FEV for smokers is
3.2 liters/sec. (IQR 3.75-3=0.75)
compared to a median FEV of
2.5 liters/sec. (IQR 3-2=1) for
non-smokers.
49
Compare the age distribution between
smokers and non-smokers.
• Answer:
– The smokers are
older than the nonsmokers in general.
Specifically, the median
age for the smokers is
13 years (IQR 15-12=3)
compared to 9 years
(IQR 11-8=3) for the
non-smokers.
50
Can you explain the apparent differences in
pulmonary function between smokers and
non-smokers displayed in Figure 1?
• The relationship between FEV and smoking
status is probably confounded by age
(smokers are older and older children have
better lung function). A comparison of FEV
between smokers and non-smokers should
account for age.
51
Sample Size Justification
52
Outline
• Statistical Concepts: hypotheses and errors
• Effect size and variation
• Influence on sample size and power
53
Sample Size Justification
• Example: Intensifying Antihypertensive Treatment
– “A sample size calculation indicated that 114 patients
per treatment group would be necessary for 90% power
to detect a true mean difference in change from baseline
of 3 mm Hg in sitting DBP between the two
randomized treatment groups. This calculation
assumed a two-sided test, α=0.05, and standard
deviation in sitting DBP of 7 mm Hg.”
Source: AJH. 1999;12:691-696
54
Importance of Careful Study Design
• Goal of sample size calculations:
– Adequate sample size to detect clinically meaningful
treatment differences
– Ethical use of resources
• Important to justify sample size early in planning stages
• Examples of inadequate power:
– NEJM 299:690-694, 1978
55
Type of Response
• Sample size calculations depend on type of response
variable and method of analysis
– Continuous response
• Example: cholesterol, weight, blood pressure
– Dichotomous response
• Example: yes/no, presence/absence, success/failure
– Time to event
• Example: survival time, time to adverse event
56
Statistical Concepts
Hypotheses
• Null hypothesis: H0
– Typically a statement of no treatment effect
– Assumed true until evidence suggests otherwise
– Example: H0: No difference in DBP between treatment
groups
• Alternative: HA
– Reject null hypothesis in favor of alternative hypothesis
– Often two-sided
– Example: HA: DBP differs between treatment groups
57
Statistical Concepts
Hypotheses
• Alternative hypothesis may be one-sided or two-sided
– Example:
• Null hypothesis: Mean DBP is same in patients
receiving different treatments
• Alternative hypothesis:
– One-sided: Mean DBP is lower in patients
receiving treatment A
– Two-sided: Mean DBP is different in patients
receiving treatment A relative to treatment B
• Choice of alternative does affect sample size calculations.
Typically a two-sided test is recommended.
58
Statistical Concepts
Errors
• Errors associated with hypothesis testing
                                        TRUTH
STUDY                  Association                      No Association
Reject Null            Correct                          Type I Error (false positive)
Fail to Reject Null    Type II Error (false negative)   Correct
59
Statistical Concepts
Significance Level
• Significance level: α
– Probability of a Type I error
– Probability of a false positive
– Example: If the effects of the treatments on DBP do not
differ, what is the probability of incorrectly concluding
that there is a difference between the treatments?
– When calculating sample size, we need to specify a
significance level, that is, the probability that we will
declare a treatment effect when none truly exists.
– Typically chosen to be 5%, or 0.05
60
Statistical Concepts
Power
• Power: (1-β)
– Probability of detecting a true treatment effect
– (1 - probability of a false negative)
= (1 - probability of Type II error)
= (1-β) = probability of a true positive
– Example: If the effects of the treatments do differ, what
is the probability of detecting such a difference?
– Typically chosen to be 80-99%
61
Treatment Effect
• What is the minimal, clinically significant difference in
treatments we would like to detect?
• Pilot studies may indicate magnitude
• Example: The authors felt that a 3 mm Hg difference in
DBP between the treatment groups was clinically
significant
62
Variability in Response
• To estimate sample size, we need an estimate of the
variability of the response in the population
• Estimate variability from pilot or previous, related study
• Example: The authors estimate that the standard deviation
of DBP is 7 mm Hg.
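With the significance level, power, treatment effect, and standard deviation specified, the sample size from the antihypertensive example can be approximated in Python. The article does not state which formula the authors used, so this sketch assumes the common normal-approximation formula for comparing two means, n per group = 2 (z_{1-α/2} + z_{1-β})² σ² / δ²; the function name `n_per_group` is hypothetical.

```python
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Approximate patients per group for a two-sided comparison of two means.

    Assumption: normal approximation with equal group sizes and a common
    standard deviation; not necessarily the method used in the article.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the power
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2


n = n_per_group(delta=3, sd=7)  # 3 mm Hg difference, SD 7 mm Hg
print(f"about {n:.1f} patients per group")
```

Under these assumptions the formula gives roughly 114.4 patients per group, in line with the 114 per group quoted from the article; the authors' exact method and rounding are not stated.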
63
Factors Influencing Sample Size
Assuming all other factors fixed,
• ↑ power ⇒ ↑ sample size
• ↓ significance level ⇒ ↑ sample size
• ↑ variability in response ⇒ ↑ sample size
• ↓ significant difference ⇒ ↑ sample size
64
Factors Influencing Power
Assuming all other factors fixed,
• ↑ significance level ⇒ ↑ power
• ↑ significant difference ⇒ ↑ power
• ↑ variability in response ⇒ ↓ power
• ↑ sample size ⇒ ↑ power
65
Summary
• Sample size calculations are an important component of
study design
• Want sufficient statistical power to detect clinically
significant differences between groups when such
differences exist
• Calculated sample sizes are estimates
• Can manipulate sample size formulas to determine:
– What is the power for detecting a particular difference given the
sample size employed?
– What difference can be detected with a certain amount of power
given the sample size employed?
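Both questions can be answered by rearranging a sample size formula. The sketch below assumes the normal-approximation formula for a two-sided comparison of two means with equal group sizes and ignores the negligible chance of rejecting in the wrong direction; the function names are hypothetical, and the inputs (114 per group, 3 mm Hg, SD 7 mm Hg) come from the antihypertensive example.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_means(n_per_group, delta, sd, alpha=0.05):
    """Approximate power to detect a true mean difference `delta`."""
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference in means
    return Z.cdf(abs(delta) / se - Z.inv_cdf(1 - alpha / 2))


def detectable_difference(n_per_group, sd, alpha=0.05, power=0.90):
    """Approximate difference detectable with the requested power."""
    se = sd * sqrt(2 / n_per_group)
    return (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) * se


print(f"power = {power_two_means(114, delta=3, sd=7):.3f}")
print(f"detectable difference = {detectable_difference(114, sd=7):.2f} mm Hg")
```

Changing one argument at a time shows the direction of each factor: power rises with sample size, with the size of the difference, and with the significance level, and falls as variability increases.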
66
Factors Influencing Sample Size
• A double-blind randomized trial was conducted to
determine how inhaled corticosteroids compare
with oral corticosteroids in the management of
severe acute asthma in children. In the study, 100
children were randomized to receive one dose of
either 2 mg of inhaled fluticasone or 2 mg of oral
prednisone per kilogram of body weight. The
primary outcome was forced expiratory volume
(as a percentage of the predicted value) 4 hours
after treatment administration.
• Schuh et al. (2000) NEJM. 343(10):689-694.
67
Factors Influencing Sample Size
• The null hypothesis is that the mean FEV, as
a percentage of predicted value, is the same
for both treatment groups.
• The alternative hypothesis is that the mean
FEV, as a percentage of predicted value, is
different for the two treatment groups.
68
• What is a Type I Error in this example?
– Incorrectly concluding that the treatments differ
• What is a Type II Error in this example?
– Failing to detect a true treatment difference
69
In the article the authors state: “In order to
allow detection of a 10 percentage point difference
between the groups in the degree of improvement
in FEV (as a percentage of the predicted value)
from base line to 240 minutes and to maintain an
α error of 0.05 and a β error of 0.10, the required
size of the sample was 94 children.”
What is the power of the study and what does it
mean?
What is the significance level of the study and
what does this level mean?
70
• Power:
– The power is 90%
– There is a 90% chance of detecting a treatment
difference of 10 percentage points, given such a
difference really exists
• Significance Level:
– The significance level is 0.05
– There is a 5% chance of concluding the
treatments differ when in fact there is no
difference
71
• Assuming a 5 percentage point difference
between the groups, what happens to
power?
– The power of the study, as proposed, would be
less than 90%
• Assuming a 0.01 significance level, what
happens to power?
– The power of the study, as proposed, would be
less than 90%
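A rough sense of how far the power falls can be had without knowing the standard deviation, because a design with 90% power at α = 0.05 fixes the ratio of the planned difference to its standard error. The sketch below is an approximation under a normal model with the sample size held at the planned value; the function name `power_after_change` is hypothetical and the resulting figures are illustrative, not values reported in the article.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_after_change(planned_power=0.90, planned_alpha=0.05,
                       difference_ratio=1.0, new_alpha=None):
    """Approximate power when the true difference or the alpha level changes.

    difference_ratio : true difference divided by the planned difference
    new_alpha        : significance level actually used (defaults to planned)
    """
    if new_alpha is None:
        new_alpha = planned_alpha
    # Planned difference divided by its standard error, implied by the design.
    planned_signal = Z.inv_cdf(1 - planned_alpha / 2) + Z.inv_cdf(planned_power)
    new_signal = planned_signal * difference_ratio
    return Z.cdf(new_signal - Z.inv_cdf(1 - new_alpha / 2))


half_difference = power_after_change(difference_ratio=0.5)  # 5 instead of 10 points
stricter_alpha = power_after_change(new_alpha=0.01)         # alpha 0.01 instead of 0.05
print(f"5-point difference: power is about {half_difference:.2f}")
print(f"alpha = 0.01: power is about {stricter_alpha:.2f}")
```

Under these assumptions, both changes leave the power below 90%, consistent with the answers above; halving the difference costs far more power than tightening the significance level.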
72
References
Descriptive Statistics
• Altman, D.G., Practical Statistics for Medical Research. Chapman &
Hall/CRC, 1991.
Sample Size Justification
• Freiman, J. A. et al. “The importance of beta, the type II error and sample
size in the design and interpretation of the randomized control trial: Survey of
72 ‘negative’ trials”. N Engl J Med. 299:690-694, 1978.
• Friedman, L. M., Furberg, C. D., DeMets, D. L., Fundamentals of Clinical
Trials, Springer-Verlag, 1998, Chapter 7.
• Lachin, J. M. “Introduction to sample size determination and power analysis
for clinical trials”. Controlled Clinical Trials. 2:93-113. 1981.
73