Describing Data and Descriptive Statistics
Peter Moffett MD
May 17, 2012

Introduction
Data can be categorized into nominal, ordinal, or continuous data. These data must then be summarized using a measure of central tendency and a measure of variability. Once the data are presented in this fashion, relationships can be inferred, or simply left for the reader to interpret. In the real world, a researcher never studies an entire population, but rather a specific sample, and then draws conclusions about the rest of the population. Describing data in the correct way allows the researcher to make the correct interpretations about the population.

Sample versus Population
Population
The universe about which the investigator wants to draw conclusions.1
Example: All males in the United States Army.
Sample
The subset of the population that is actually being observed or studied.1 For a sample to be accurate, it should be random, with every member of the population having an equal chance of being selected. This helps to reduce bias.
Example: A random sample of 100 males drawn from each US Army base.

Types of Data (Scales of Measurement)
Nominal Data
• Data that are divided into categories or groups with no implied order or scale.
• Examples: Male/female, aspirin vs. placebo, urban vs. suburban vs. rural.
• Hint: If you can ask a “yes or no” question, the answer is nominal data. Is the person male? Did the patient take aspirin? Is the patient from an urban environment?
• Proportions: This is another name for percentages. Be careful: percentages are actually nominal data. You are technically asking, what percentage of this sample is male? You get a number as an answer (a percentage), but it is really just a way of quantifying your “yes/no” answer. So, is the patient male? Answer: Yes… 50% of the time.
Ordinal Data
• Data that can be placed into some kind of meaningful order, but without any indication of the size of the interval between values.
• Example: Runners come in 1st, 2nd, and 3rd place in a race. You have no indication whether they finished seconds or minutes apart.
• Likert Scale: This is one of the most common ordinal scales in biomedical studies. These are the 5-point scales that ask someone how much they like or dislike something.
• Other medical examples: Glasgow Coma Scale (a 6 is different from a 3, but it is not 2 times a 3), Stages of Hypertension (you cannot tell whether someone with Stage I HTN has a blood pressure at the highest or the lowest end of the allowable range).
Continuous (Interval/Ratio) Data
• A statistician would be angry over this, but it is useful to lump these together. Essentially, these are all terms for data that have a meaningful scale, and they will often just be referred to as “continuous data.”
• If you want the technical definitions:
  o Interval: Data that have meaningful intervals but no absolute zero. While we can quantify a 30° difference between 30 °C and 60 °C, technically we cannot say that 60 °C is twice as hot as 30 °C, because 0 °C is not a true absence of heat.
  o Ratio: Data that have meaningful intervals and an absolute zero. The Kelvin scale has an absolute zero, so we can say that 60 K is twice as hot as 30 K. Most of our data in medicine are ratio data (weight, time, heart rate, etc.).
  o Continuous: Data that may take any value (including fractions and parts of a whole).
  From now on we will ignore the distinctions and refer to all of this as “continuous data.”
• Examples: Heart rate, blood pressure, time, weight.
Useful points about types of data
• Some data can only be described using a nominal scale. Think of male versus female.
• Many types of data can be described using all of the scales above.
  o Example: Hypertension in patients
    § Nominal: Hypertension (Y/N)
    § Ordinal: Pre-hypertension, Stage I Hypertension, Stage II Hypertension
    § Continuous: Actual blood pressure measurements
• Researchers pick a certain scale (when multiple choices exist) for a variety of reasons. Continuous data have higher information content and typically require smaller sample sizes. Nominal data may be easier to collect.
• Proportions/Percentages: It bears repeating. A percentage seems like continuous data, but it is actually nominal data.
Measures of Central Tendency
All data sets can be described by a measure of central tendency. Different types of data are best summarized by different measures of central tendency.
Mean
• The average of all of the data: the sum of the values divided by the number of values.
• Can only describe continuous data. Nominal data do not have a mean, and describing ordinal data with a mean is misleading.
• The mean is affected by extreme values. A few extreme values can change the mean enough to make it an unreliable measure of central tendency.
Median
• The midpoint of the data: 50% of the data points are above the median and 50% are below it.
• The median is not affected by extreme values, because it depends only on the order of the observations, not on their magnitude.
• Ordinal data are best described using the median.
• Continuous data with extreme values are best described using the median.
Mode
• The value that appears most often in the data set.
• Often used to describe nominal data.
• Not influenced by extreme values.
Examples
• Take the data set (1, 2, 3, 4, 5, 6, 7, 8, 9):
  o Mean: (1+2+3+4+5+6+7+8+9)/9 = 5
  o Median: 5
  o Mode: 5 (every number appears only once, so strictly there is no single mode; the median is listed in its place here for comparison)
• Take the same data set with an extreme outlier (1, 2, 3, 4, 5, 6, 7, 8, 500):
  o Mean: (1+2+3+4+5+6+7+8+500)/9 = 59.6
  o Median: 5
  o Mode: 5 (as above)
Data Distributions
For each type of data, you can take the frequency with which each value appears and plot it on a graph. This gives you a data distribution. You need a little background on data distributions to understand measures of variability, as well as the statistical tests we will discuss in the future.
Normal/Parametric/Gaussian Distribution
• All of these terms apply to the familiar bell-shaped curve.
• Many biologic phenomena follow a normal distribution.
• The mean, median, and mode of a normal distribution are all the same.
[Figure: normal distribution with the mean, median, and mode at the center]
Skewed Distributions
• These are also referred to as non-parametric distributions.
• Extreme outlier data tend to “pull” the mean in a certain direction, away from the true midpoint of the data.
• The mean is pulled toward the data “tail,” and this is how the distributions are named (not for the true midpoint or “hump” of the distribution): a positive skew has its long tail toward higher values, and a negative skew has its long tail toward lower values.
[Figure: panels labeled Positive Skew and Negative Skew]
• Notice in the above examples how the median is the true midpoint of the data in skewed populations.
• Non-parametric (skewed) data cannot be tested using parametric tests (more in future lectures).

Medical Example (reference 2)
If we measure the systolic blood pressure of a sample of 30 non-hypertensive men aged 30-40 years, we find the following normal distribution. Each X is one measurement, so you can see that the most frequently obtained value (the mode) is 120 mm Hg.

[Figure 1 (Annals of Emergency Medicine 19:3, March 1990): Systolic blood pressure of men aged 31 to 40 years, plotted from 100 to 140 mm Hg. Mean = 120, Median = 120, Mode = 120.]
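The outlier effect from the Examples in the Measures of Central Tendency section can be checked numerically. The sketch below is one way to do it, assuming Python's standard `statistics` module (an illustrative choice; any statistics package would do). It reuses the two example data sets; `multimode` returns every value tied for the highest count, which makes visible that no single value is the mode when each value appears once.

```python
import statistics

# The two data sets from the Examples in the Measures of Central Tendency section
baseline = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 500]  # the 9 replaced by an extreme value


def describe_center(data):
    """Return the mean (rounded to 1 decimal), the median, and all modes."""
    return {
        "mean": round(statistics.mean(data), 1),
        "median": statistics.median(data),
        # multimode lists every value tied for the highest count; when each
        # value occurs once, every value is returned (no single mode)
        "modes": statistics.multimode(data),
    }


for name, data in (("baseline", baseline), ("with outlier", with_outlier)):
    print(name, describe_center(data))
# The mean moves from 5 to 59.6; the median stays at 5.
```

Changing one value moves the mean by more than 50 units while the median does not move at all, which is why the median is preferred for continuous data with extreme values.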
If we instead took the blood pressure of 26 patients with renal hypertension, we would get the following curve. Notice that in this example we expect the systolic blood pressure to be very high in these people. Yet some of them are outliers and actually have a lower blood pressure than expected. This “pulls” the mean down and produces a negatively skewed distribution.

[Figure 2: Theoretical systolic blood pressure of patients with untreated renovascular hypertension, plotted from 180 to 250 mm Hg. Mean = 228.7, Median = 230, Mode = 240. The distribution is negatively skewed; in the absence of normality, the mean, median, and mode are not equal. An outlier, such as a systolic blood pressure of 150 mm Hg instead of 180 mm Hg, will alter the value of the mean, but not the median or mode.]

[Figure 3: Systolic blood pressure for a sample that includes two groups, pregnant women in their second trimester (F) and men (M), plotted from 70 to 140 mm Hg. Mean = 106.1, Median = 105, Modes = 95 and 120. The mean and median are unequal in these non-normally distributed data, and the two peaks make the distribution bimodal.]

Measures of Variability
In order to fully describe data, you must report not only the central tendency of the data, but also the variability of the data. Look at the following frequency graph to see why. Notice that the means, medians, and modes of the data are exactly the same, yet the data are obviously different.

[Figure: frequency graph]
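The same idea can be shown numerically. Here is a minimal sketch with two small sets of systolic blood pressures that are made up for illustration (hypothetical values, not data from the figures). Both share a mean, median, and mode of 120 mm Hg but differ in spread. It anticipates the range and SD described below and assumes Python's standard `statistics` module.

```python
import statistics

# Hypothetical systolic blood pressures (mm Hg), made up for illustration:
# both sets are centred on 120 mm Hg, but the second is far more spread out.
narrow = [110, 115, 120, 120, 120, 125, 130]
wide = [90, 105, 120, 120, 120, 135, 150]

for name, data in (("narrow", narrow), ("wide", wide)):
    center = (statistics.mean(data), statistics.median(data), statistics.mode(data))
    spread = max(data) - min(data)   # range
    sd = statistics.stdev(data)      # sample standard deviation
    print(f"{name}: mean/median/mode = {center}, range = {spread}, SD = {sd:.2f}")
```

Both lines report (120, 120, 120) for the center. The range (20 vs 60 mm Hg) and the SD (about 6.45 vs 19.36 mm Hg) differ threefold, and that difference is invisible if only a measure of central tendency is reported.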
Range
• Reports the lowest and highest numbers.
• Purely descriptive.
• Affected by outliers.
Interquartile Range
Reports the range of values from the 25th percentile through the 75th percentile.
• The median is always the 50th percentile (so 50% of values fall below the median).
• The interquartile range contains the middle 50% of the data points (between the 25th and 75th percentiles).
• Often reported with medians for ordinal data, or with a median to describe continuous data with outliers.
[Figure: percentile scale marking the 25th, 50th (median), and 75th percentiles]
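Here is a minimal sketch of the range and interquartile range, using Python's `statistics.quantiles` for the quartiles. Its default ("exclusive") method is only one of several quartile conventions, so other software may report slightly different Q1 and Q3 values. The outlier data set from the central tendency Examples shows the range jumping while the IQR does not move.

```python
import statistics


def range_and_iqr(data):
    """Return (lowest, highest, range, Q1, Q3, IQR) for a list of numbers."""
    lo, hi = min(data), max(data)
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    # Python's default is the "exclusive" method; other software may use a
    # slightly different rule and report slightly different quartiles.
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return lo, hi, hi - lo, q1, q3, q3 - q1


print(range_and_iqr([1, 2, 3, 4, 5, 6, 7, 8, 9]))    # range 8, IQR 5.0
print(range_and_iqr([1, 2, 3, 4, 5, 6, 7, 8, 500]))  # range 499, IQR 5.0
```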
Standard Deviation (SD)
• A measure of how widely continuous data vary around the mean (the square root of the variance).
• Should only be used to describe normally distributed data.
• Approximately 68% of all data fall within 1 SD of the mean.
• Approximately 95% of all data fall within 2 SD of the mean.

[Figure 4: SD and the normal distribution. 68.26% of all scores fall within ±1 SD of the mean, 95.44% within ±2 SD, and 99.72% within ±3 SD. The mean is the 50th percentile; -1 SD and +1 SD mark roughly the 15.9th and 84.1st percentiles, and -2 SD and +2 SD the 2.3rd and 97.7th percentiles.]

So if we find that the mean heart rate in a population is 80 with an SD of 10, then approximately 68% of all people will have a heart rate between 70 and 90, and approximately 95% of all people will have a heart rate between 60 and 100.
• The standard deviation is often used to determine “normal” lab values.
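The heart-rate example can be reproduced with Python's `statistics.NormalDist`, assuming heart rates are normally distributed with mean 80 and SD 10. The sketch prints the ±1 SD and ±2 SD intervals and the share of a normal population that falls inside each.

```python
from statistics import NormalDist

# Heart-rate example from the text: mean 80, SD 10 (assumed normally distributed)
mean, sd = 80, 10
heart_rate = NormalDist(mu=mean, sigma=sd)

for k in (1, 2):
    low, high = mean - k * sd, mean + k * sd
    # Fraction of a normal population between mean - k*SD and mean + k*SD
    share = heart_rate.cdf(high) - heart_rate.cdf(low)
    print(f"±{k} SD: {low}-{high} BPM contains {share:.1%} of people")
```

The exact multiplier for 95% is about 1.96 rather than 2 (`NormalDist().inv_cdf(0.975)`), which is why "2 SD" captures slightly more than 95%. One common convention for a "normal" reference range is the central 95% (roughly mean ± 2 SD). By construction, this labels about 5% of healthy people as outside the range.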
Summary and Conclusions about central tendency and variability

Type of data | Example | Measure of central tendency | Measure of variability
Nominal | Male/Female | Mode | Range?
Ordinal | 5-point Likert scale | Median | Interquartile range
Continuous | Heart rate | Mean | Standard deviation
Characteristic | Mean | Median | Mode
Useful with continuous data | Yes | Yes | Yes
Useful with ordinal data | No | Yes | Yes
Useful with nominal data | No | No | Yes
Affected by outliers | Yes | No | No

Confidence Intervals (CI)
Definition
The most basic way to think about confidence intervals is as mathematical predictions about where the real (population) value of a variable lies. (Strictly speaking, if we repeated the study many times, 95% of the 95% confidence intervals calculated this way would contain the true population value.)
• We typically use 95% confidence intervals in clinical medicine.
• A confidence interval takes the data we actually have in our sample and tells us how they apply to the population (the real world).
• Examples
  o I measure the heart rate of 200 active duty soldiers and find that the mean heart rate is 50 with a 95% CI of 42-61. The correct interpretation is that my sample mean is 50 beats per minute, and that I am 95% certain that the mean heart rate for the entire population of active duty soldiers (whom I did not study) is between 42 and 61.
  o I ask 40 parents whether they liked the experience their child had when receiving intranasal fentanyl as a sedative. We use the following Likert scale: 1 = hate, 2 = dislike, 3 = neutral, 4 = like, 5 = love. We find a median score of 4, and our 95% CI comes back at 3-5. The correct interpretation is that in our sample, at least half of the parents liked or loved the sedation. In the real world, we are 95% certain that the median score for the whole population of parents lies somewhere between 3 (neutral) and 5 (love).
  o I take 30 doctors I know and ask them whether they know what a confidence interval is. I report that 40% do, with a 95% CI of 5-75%. The correct interpretation is that in my sample 40% of physicians knew what a confidence interval is, and that I am 95% certain that between 5% and 75% of all physicians know what a confidence interval is.
• Note: You will sometimes see researchers describe their sample with a 95% CI. For example, you look at the first table and see that they report enrolling 65% males with a 95% CI of 55-75%. This is confusing if you do not understand confidence intervals. Most people will look at that and think, “Wait, can't they count? Why aren't they 100% certain that their sample is 65% male?” In reality, they are saying that their sample is 65% male, and that they think that in the population they are studying, somewhere between 55% and 75% are male. The confidence interval is derived from the sample data but refers to the overall population.

Methods
There are mathematical formulas to derive confidence intervals for almost any type of data. You can read a very thorough review and explanation, with formulas, in Chapter 7 of Glantz’s book.3 Here are some key points.
• Standard Error of the Mean (SEM): You may see this term used. The SEM is the standard deviation of the sample divided by the square root of the sample size, and it estimates how precisely the sample mean represents the population mean. The SEM is then used to calculate a confidence interval. It applies only to continuous data, but it is the usual example for how to calculate confidence intervals.
  o Note: The SEM should never be used to describe the variability of a sample. It describes the precision of the estimated population mean, not the spread of the sample data. Because the SEM is always smaller than the standard deviation (for any sample of more than one observation), some authors mistakenly report the SEM instead of the standard deviation.
• You may select various levels of confidence for your confidence interval. We typically use 95% in medicine, but you can choose 90%, 99%, or any other level. If you want to be 99% sure that your confidence interval includes the population value, the confidence interval will be wider.
• The sample size directly affects the width of the confidence interval. With a larger sample size, the confidence interval “tightens,” or narrows.
Example: Below is a graph showing our data from the blood pressure measurements in normal men aged 30-40. It shows the relationship between the SD, the SEM, and the confidence interval.2
[Figure 5: Systolic blood pressure of men aged 31 to 40 years, plotted from 100 to 140 mm Hg, with the mean, SD = 9.37, SEM = 1.71, and the 95% CI noted.]
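The SEM itself is simple arithmetic: the sample SD divided by the square root of the sample size. The short sketch below uses the SD of 9.377 mm Hg behind Figure 5 and the sample sizes from Table 4. It shows the SEM shrinking as n grows while the SD of the data stays the same. The helper name `sem` is illustrative, not from the source.

```python
import math


def sem(sd: float, n: int) -> float:
    """Standard error of the mean: the sample SD divided by the square root of n."""
    return sd / math.sqrt(n)


sd = 9.377  # SD of the systolic blood pressures (mm Hg) used in Figure 5 / Table 4
for n in (30, 100, 1000):
    # The SD describing the data is unchanged; only the SEM shrinks as n grows
    print(n, sd, round(sem(sd, n), 3))
```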
Example: The following table (reference 2) shows how changing your confidence level or your sample size affects the confidence interval.

TABLE 4. Effect of confidence level and sample size on confidence interval width

Calculation of CIs for the data presented in Table 2
CI (%) | SD | n | SEM | CI
90 | 9.377 | 30 | 1.712 | 120 ± 2.82
95 | 9.377 | 30 | 1.712 | 120 ± 3.36
99 | 9.377 | 30 | 1.712 | 120 ± 4.41

Effect of sample size on CI for data with a mean of 120 and an SD of 9.377
CI (%) | SD | n | SEM | CI
95 | 9.377 | 30 | 1.712 | 120 ± 3.36
95 | 9.377 | 100 | 0.938 | 120 ± 1.84
95 | 9.377 | 1000 | 0.297 | 120 ± 0.582
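The half-widths in Table 4 come from multiplying the SEM by a z value for the chosen confidence level: about 1.645 for 90%, 1.96 for 95%, and 2.576 for 99%. The sketch below assumes this normal (z) approximation, which reproduces the 90% and 95% rows. For a 99% level the same formula gives about 4.41 mm Hg. With n = 30, a t-based interval would be slightly wider, and small differences in the last digit come from rounding the SEM before multiplying. The helper `ci_half_width` is hypothetical.

```python
import math
from statistics import NormalDist


def ci_half_width(sd: float, n: int, confidence: float) -> float:
    """Half-width of a normal-approximation CI for a mean: z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 when confidence = 0.95
    return z * sd / math.sqrt(n)


mean, sd = 120, 9.377  # Table 4: mean systolic BP and SD (mm Hg)

# Effect of the confidence level (n fixed at 30)
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI, n=30: {mean} ± {ci_half_width(sd, 30, level):.2f}")

# Effect of sample size (confidence fixed at 95%)
for n in (30, 100, 1000):
    print(f"95% CI, n={n}: {mean} ± {ci_half_width(sd, n, 0.95):.2f}")
```

Raising the confidence level widens the interval. Multiplying the sample size by about 33 (30 to 1000) narrows it by a factor of about 5.8, because the width scales with 1/√n.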
Using Confidence Intervals
Confidence intervals can be used in two different ways.
• Confidence intervals can be purely descriptive. For example, according to an analysis of the Canadian CT Head Rule, the sensitivity of the rule for finding neurosurgical lesions was reported as 100% (95% CI 64.6-100%). The authors are saying that their sample is 100% sensitive, and that they are 95% certain that the true test sensitivity is somewhere between 64.6% and 100%.
• Confidence intervals can also be used to make inferences. We will discuss classical hypothesis testing in a future lecture, but a brief summary here will make the idea clearer. Classical hypothesis testing produces a P-value that tells the researcher/reader whether the observed difference is STATISTICALLY significant. It says nothing about the magnitude of that difference. Instead, if a researcher reports the actual difference between two numbers and gives a 95% confidence interval, then the magnitude of the effect is obvious. In addition, if the confidence interval crosses the “identity” point, then the results are not STATISTICALLY significant. The “identity point” (also called the “null” point, or “no effect” point) is the number that means there is no effect. If you are subtracting two results, then the null point is zero. This is why it is commonly taught that if a confidence interval crosses zero, the result is not significant. Remember, however, that for a ratio the null point is actually 1, because a ratio of 1 means there is no difference between the groups.
In summary, you find the difference between two numbers and report the confidence interval around it. If that interval contains a value that is statistically not significant (the null point) or clinically not significant (e.g., a 2 BPM reduction in heart rate), then you cannot conclude that there is a real (or clinically meaningful) effect.
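The identity-point rule amounts to a single comparison. Here is a minimal sketch with a hypothetical helper, `ci_contains`, and made-up intervals. The null value is 0 for a difference and 1 for a ratio, and the same check against a clinically trivial value (such as a 2 BPM reduction, written as -2) tells you whether a trivial effect is still plausible.

```python
def ci_contains(lower: float, upper: float, value: float) -> bool:
    """Return True if `value` lies inside the confidence interval [lower, upper]."""
    return lower <= value <= upper


# Hypothetical intervals for a difference between two means: null point = 0
print(ci_contains(-35, 5, 0))       # True  -> not statistically significant
print(ci_contains(-11, -9, 0))      # False -> statistically significant

# Hypothetical interval for a ratio: null point = 1
print(ci_contains(0.8, 1.3, 1.0))   # True  -> not statistically significant

# Same check against a clinically trivial effect (a 2 BPM reduction = -2)
print(ci_contains(-11, -9, -2))     # False -> a trivial effect is not plausible
```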
Confidence Intervals and Hypothesis Testing (Examples)
Let us pretend we have a new drug called RATE-A-BLATE that is supposed to lower the heart rate rapidly, without any blood pressure side effects, in people with atrial fibrillation and rapid ventricular response. Using this drug, we can look at the various ways confidence intervals can be used and how they are affected by certain factors.
1. In the first experiment, we take 10 patients and give them RATE-A-BLATE (RAB). We find that in the sample RAB drops the heart rate by approximately 30 beats per minute. After the math, we report the results as -30 (95% CI -50 to 2).
[Figure: -30 with its 95% CI (-50 to 2) plotted on a heart-rate-change axis marked -50, -30, and 0]
So based on this sample, we are 95% certain that the drug's true average effect lies somewhere between lowering the heart rate by 50 BPM and raising it by 2 BPM. Looking at these data you would think, “Well, I do not want to give a drug that may raise the heart rate a few beats, but the possible values for this drug (-50 through 2) are predominantly on the heart-rate-lowering side. If another study were performed with larger numbers, we might get a narrower range of possible values to examine.”
2. In the second experiment, we add 40 more patients in atrial fibrillation with rapid ventricular response, so that we are now studying a total of 50 patients. Our RAB study finds a mean change in heart rate of -30 (95% CI -40 to -20).
[Figure: -30 with its 95% CI (-40 to -20) plotted on an axis marked -50, -30, and 0]
This time we can say that we are 95% certain our drug lowers the heart rate by between 20 and 40 beats per minute. The increase in sample size has narrowed our confidence interval. Now we are likely to accept that the drug is effective at lowering heart rate. If we had just reported the sample means in both cases, everyone would simply think that RAB always lowers heart rate by 30 BPM. Do not trust a study that does that.
3. Now let us move on to study RAB versus diltiazem. Assume we run a well-designed, double-blind, randomized controlled trial and determine that once again RAB gives us a mean change in heart rate of -30 (95% CI -40 to -20). We also find that diltiazem causes a mean change in heart rate of -20 (95% CI -30 to -10).
[Figure: 95% CIs for diltiazem (-30 to -10) and RAB (-40 to -20) plotted on an axis marked -50, -30, and 0]
Notice that initially we are simply describing the heart-rate-lowering effect of each drug. Looking at the data, there appears to be a lot of overlap between the plausible mean heart-rate-lowering effects of the two drugs, so the difference between them may not be statistically significant. So we take the mean heart rate changes from the two drugs and compare them using Student’s t-test (more in another lecture), and find that RAB lowers the heart rate by 10 BPM more than diltiazem, but with a P-value >.05, so the difference is not statistically significant. Another way (some argue a better way) is to report the mean difference with a confidence interval. We get -10 BPM (95% CI -35 to 5).
[Figure: Mean difference in heart rate between diltiazem and RAB.]
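Student’s two-sample t-test and the confidence interval for the mean difference are built from the same standard error, which is why the two presentations carry the same information: the interval includes zero exactly when |t| is below the critical value. A minimal sketch of the pooled-variance (Student’s) version working from summary statistics; the group sizes (20 per arm), SD (35 BPM), and t-table value 2.024 for 38 degrees of freedom are assumptions for illustration, not figures from the trial described above.

```python
import math

# Two-sided 95% Student's t critical value for 38 degrees of freedom
# (n1 + n2 - 2 with 20 patients per arm); standard t-table value.
T_CRIT_95_DF38 = 2.024


def student_t_diff(mean1, sd1, n1, mean2, sd2, n2, t_crit=T_CRIT_95_DF38):
    """Pooled-variance t statistic and 95% CI for the difference mean1 - mean2."""
    diff = mean1 - mean2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the difference
    t_stat = diff / se
    return t_stat, diff - t_crit * se, diff + t_crit * se


# Hypothetical trial: RAB -30 BPM vs diltiazem -20 BPM, SD 35 BPM, 20 per arm.
t_stat, lo, hi = student_t_diff(-30, 35, 20, -20, 35, 20)
print(f"t = {t_stat:.2f}, 95% CI for the difference {lo:.1f} to {hi:.1f} BPM")
# |t| below the critical value and a CI that includes zero are the same verdict.
print("significant at 0.05:", abs(t_stat) > T_CRIT_95_DF38)
```

With these assumed numbers the interval is roughly -32 to 12 BPM and t is about -0.9, so, like the third experiment, the 10 BPM difference is not statistically significant.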
This is the same information, simply presented in a different fashion. Instead of just saying there was a -10 BPM difference between the two drugs and it was not statistically significant, we can see how close it came to being statistically significant. Because this graph plots the difference between two numbers, “zero” means no difference, and an interval that includes zero is not statistically significant. But we can also see that the interval only just crosses zero, so the result was “almost” statistically significant. We plan another study.
4. Once again we increase our sample size to see whether the mean difference between the two drugs is statistically significant. We enroll over 5000 patients into the same double-blind RCT comparing diltiazem to RAB. This time the results indicate that the mean difference between RAB and diltiazem is -10 BPM (95% CI -11 to -9).
[Figure: Mean difference in heart rate between diltiazem and RAB.]
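With thousands of patients the t distribution is essentially the normal distribution, so a large-sample (z) interval is a reasonable sketch here. It shows why a 95% CI that excludes zero goes together with a two-sided P-value below .05: both compare the same ratio, difference ÷ standard error, against about 1.96. The arm sizes (2,500 each) and SD (18 BPM) are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def z_diff(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample 95% CI and two-sided P-value for the difference mean1 - mean2."""
    diff = mean1 - mean2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # SE of the difference
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P-value, 2 * (1 - Phi(|z|))
    return diff - Z_975 * se, diff + Z_975 * se, p


# Hypothetical arms of 2,500 patients each, SD 18 BPM, means -30 vs -20 BPM.
lo, hi, p = z_diff(-30, 18, 2500, -20, 18, 2500)
print(f"95% CI {lo:.1f} to {hi:.1f} BPM, P = {p:.1e}")
print("CI excludes zero:", not (lo <= 0 <= hi), "| P < .05:", p < 0.05)
```

`math.erfc` is used instead of `1 - NormalDist().cdf(...)` so that very small P-values do not round to exactly zero.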
This time we did it. The confidence interval is very narrow (only 1 BPM on either side of our sample estimate of -10 BPM). Plus, it does not cross zero (and, sure enough, the P-value will be <.05). It looks like, with enough patients, we have finally shown that RAB produces a statistically significant reduction in heart rate over diltiazem of about 10 BPM (to be exact, the true difference could be anywhere from 9 to 11 BPM).
Example Wrap-Up: We can draw the following conclusions about RAB from our studies:
• RAB seems to lower heart rate in atrial fibrillation with rapid ventricular response by an average of about 30 BPM, but the true average reduction could be anywhere from 20 to 40 BPM compared to doing nothing.
• RAB seems to lower heart rate in atrial fibrillation with rapid ventricular response a little better than diltiazem: about 10 BPM more, and plausibly anywhere from 9 to 11 BPM more.
But what does this tell us clinically? In the end, these are only STATISTICALLY significant numbers. Remember that CLINICAL significance is not the same thing. Consider the following situations:
• RAB is cheaper and does not drop blood pressure like diltiazem. You think that the data support its use.
• RAB turns out to be more expensive and does not drop blood pressure like diltiazem. You decide to use RAB only in normotensive or hypotensive patients, and to save costs by using diltiazem when you have a hypertensive patient.
• RAB turns out to be less expensive and does not drop blood pressure like diltiazem; however, it is associated with myocardial infarction about 10% of the time it is given. You decide that those extra 10 BPM over diltiazem (as well as the cost savings and blood pressure stability) are not worth the consequences.
• RAB turns out to be less expensive and does not drop blood pressure like diltiazem. You decide, however, that a 10 BPM difference over diltiazem is not really clinically significant, and because you are more used to diltiazem, you continue to use it instead of RAB.
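The situations above all come down to asking whether the values inside the interval are large enough (and cheap and safe enough) to matter to you. Only the size part of that question can be written down mechanically. A small sketch, assuming a hypothetical, reader-chosen threshold for the smallest heart-rate reduction worth acting on; the function name and threshold are illustrative, not an established rule.

```python
def judge_reduction(lower, upper, worthwhile_bpm):
    """Compare a 95% CI for a change in heart rate with a reduction you care about.

    Reductions are negative numbers (-10 means 10 BPM lower). worthwhile_bpm is
    a positive, reader-chosen threshold: the smallest reduction worth acting on.
    """
    threshold = -worthwhile_bpm
    if upper <= threshold:
        return "every value in the interval is a worthwhile reduction"
    if lower > threshold:
        return "no value in the interval is a worthwhile reduction"
    return "the interval straddles the threshold; the data cannot settle it"


# The RAB vs diltiazem interval from the text (-11 to -9), judged two ways.
print(judge_reduction(-11, -9, worthwhile_bpm=5))
print(judge_reduction(-11, -9, worthwhile_bpm=15))
```

A clinician who only needs a 5 BPM advantage would find every plausible value worthwhile; one who needs 15 BPM, like the reader in the last situation who judged 10 BPM not clinically significant, would not.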
All of those conclusions are possible, and each is individualized based on the data. When you look at a confidence interval, treat every value in the range as a possible “real world value”; if the intervention is worth your time across that range, use it. If not, drop it.
Resources:
• I highly recommend the following two references.2,5
• For a very complete description of all things related to confidence intervals, refer to Glantz’s Primer of Biostatistics.3
• If you want to plan a study and derive the sample size based on a certain confidence interval you wish to obtain, go to: http://www.epibiostat.ucsf.edu/dcr/
• If you have data and want to calculate a confidence interval around it, go to: http://www.graphpad.com/quickcalcs/
References:
1. Glaser AN. High-Yield Biostatistics. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2005.
2. Gaddis GM, Gaddis ML. Introduction to biostatistics: Part 2, Descriptive statistics. Annals of Emergency Medicine. Mar 1990;19(3):309-315.
3. Glantz SA. Primer of Biostatistics. 6th ed. New York: McGraw-Hill; 2005.
4. Smits M, Dippel DW, de Haan GG, et al. External validation of the Canadian CT Head Rule and the New Orleans Criteria for CT scanning in patients with minor head injury. JAMA: the Journal of the American Medical Association. Sep 28 2005;294(12):1519-1525.
5. Gaddis ML, Gaddis GM. Introduction to biostatistics: Part 1, Basic concepts. Annals of Emergency Medicine. Jan 1990;19(1):86-89.