Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Describing Data and Descriptive Statistics Peter Moffett MD May 17, 2012 Introduction Data can be categorized into nominal, ordinal, or continuous data. This data then must summarized using a measure of central tendency and a measure of variability. Once the data is presented in this fashion, relationships can be inferred, or simply left for the reader to interpret. In the real world, a researcher never studies an entire population, but rather a specific sample and then draws conclusions about the rest of the population. Describing data in the correct way, can allow the researcher to make the correct interpretations about the population. Sample versus Population Population The universe about which the investigator wants to draw conclusions.1 Example: All males in the United States Army. Sample The subset of the population that is actually being observed or studied.1 In order for a sample to be accurate it should be random with an equal chance of every member of the population having a chance of being selected. This helps to reduce bias. Example: A sample of 100 males from each US Army base randomly selected from the population. Types of Data (Scales of Measurement) Nominal Data • • • • Data that are divided into categories or groups with no implied order or scale. Examples: Male/female, aspirin vs. placebo, urban vs. suburban vs. rural. Hint: If you can ask a “yes or no” question, the answer is nominal data. Is the person male? Did the patient take aspirin? Is the patient from an urban environment? Proportions: This is another name for percentages. Be careful, percentages are actually nominal data. You are technically asking, what percentage of this sample is male? You get a number as an answer (a percentage) but it is actually just a way of quantifying your “yes/no” answer. So is the patient a male? Answer: Yes….50% of the time. Ordinal Data • • • • Data that can be placed into some kind of meaningful order, but without any indication about the size of the interval. Example: Runners come in at 1st, 2nd, and 3rd place in the race. You have no indication if they were seconds, or minutes apart. Likert Scale: This is one of the most common ordinal scales in biomedical studies. These are the 5-‐point scales that ask someone if they like something, or dislike something. Other medical examples: Glascow coma scale (a 6 is different than a 3 but it is not 2 times a 3), Stages of Hypertension (you cannot tell if someone who is Stage I HTN is actually at the highest allowable or lowest allowable blood pressure). Continuous (Interval/Ratio) Data • • • A statistician would be angry over this, but it is useful to lump this all together. Essentially these are all terms for data that have a meaningful scale. These will often just be referred to as “continuous data.” If you want the technical definitions: o Interval-‐ Data that has meaningful intervals but no absolute zero. So while we can quantify a 30° difference between 30° and 60 °C, technically we cannot say that 60° is twice as hot as 30° because 0° C is not a true absence of heat. o Ratio data-‐ Data that has meaningful intervals and an absolute zero. So the Kelvin scale has an absolute zero so we can say that 60° K is twice as hot as 30° K. Most of our data in medicine is ratio (weight, time, heartbeat, etc.) o Continuous data-‐ Data that may include any value (including fractions and parts of a whole). From now on we will ignore the distinctions and refer to all of this as “continuous data” Examples: Heart rate, blood pressure, time, weight Useful points about types of data • Some data can only be described using a nominal scale. Think of male versus female. • Many types of data can be described using all of the techniques above. o Example: Hypertension in patients § Nominal: Hypertension (Y/N) § Ordinal: Pre-‐Hypertension, Stage I Hypertension, Stage II Hypertension § Continuous: Actual blood pressure measurements • • Researchers pick a certain scale (when multiple choices exist) for a variety of reasons. Continuous data has higher information content and typically require smaller sample sizes. Nominal data may be easier to collect. Proportions/Percentages: It bears repeating. A percentage seems like continuous data, but it is actually nominal data. Measures of Central Tendency All data sets can be described by a measure of central tendency. Different types of data are best summarized by different measures of central tendency. Mean • • • Average of all of the data. Can only describe continuous data. Nominal data does not have a mean, and describing ordinal data with a mean is misleading. The mean is affected by extreme values. If a data set has a few extreme values, it will change the mean enough to make it an unreliable measure of central tendency. Median • • • • The mid-‐point in the data. 50% of the data points are above the median and 50% are below the median. The median is not affected by extreme values because it only responds to the number of observations, not the magnitude of the observations. Ordinal data is best described using the median. Continuous data with extreme values is best described using the median. Mode • • • The value that appears most often in the data set. Often used to describe nominal data. Not influenced by extreme values Examples • Take the data set: (1, 2, 3, 4, 5, 6, 7, 8, 9) o Mean: (1+2+3+4+5+6+7+8+9)/9= 5 o Median: 5 o Mode: 5 (in this case since all numbers are represented only once, it is typical to pick the median as the mode) • Take the data set with an extreme outlier: (1, 2, 3, 4, 5, 6, 7, 8, 500) o Mean: (1+2+3+4+5+6+7+8+500)/9= 59.6 o Median: 5 o Mode: 5 Data Distributions For each type of data, you can take the frequency with which each value appears and plot it on a graph. This gives you a data distribution. You need to have a little background on data distributions to understand the concept of measures of variability as well as in the future when we discuss statistical tests. Normal/Parametric/Gaussian Distribution • • • All of these terms apply to the bell-‐shaped curve we are all familiar with. Many biologic phenomenon fall into a normal distribution. The mean, median, and mode for a normal distribution are all the same. Mean, Median, Mode Skewed Distributions • • • These are also referred to as non-‐parametric distributions. Extreme outlier data tends to “pull” the mean in a certain direction away from the true midpoint of the data. The mean is pulled toward the data “tail” and this is how the distributions are named (not for the true midpoint or “hump” on the data distribution) Positive Skew Negative Skew • • Notice in the above examples how the median is the true midpoint of the data in skewed populations. Non-‐parametric (skewed) data cannot be tested using parametric tests (more in future lectures) Medical Example 2 If we measure the average systolic blood pressure from a sample of 30 non-‐ hypertensive men aged 30-‐40 years we would find the following normal distribution. Each X is one data measurement so you can see that the most frequently obtained value (the mode) is 120mm Hg. blood pressure of o 40 years. The mode all equal blood pressure in ascular hyperten; m e d i a n , 230; blood pressure of oblood 40 years. pressureThe in dgnant mode women. all equal M cts, Fdenotes fen, 106.1; median, blood pressure in ascular hyperten; m e d i a n , 230; t blood h e " mpressure i d - m o s t "in ribution. It is the egnant women. M or b e l o w w h i cfeh cts, Fdenotes nts 106.1; lie.2,4,median, 5 Altern, n is the 50th peristribution. naffected by outm o r e useful t h a n e data w h e n outt hceo n"t imn iudo-ums o data st" ribution. It is d i s t r i b u t e d . 4 Tthe he b e l o w wordihich r or describing nts lie.2,4, he m a g n i t5u Alterd e of np is 50th pero i nthe t s of a data istribution. onsistent to deternaffected by2 The outentile value. m o r e useful t e f u l to d e s c r hi ba en be h e n arbiouta u sdata e of wthe c o n t i n u o u s data n u m b e r s used to distributed. 4 The r describing ordithe m a g n i t u d e of p ooisnt t sc oofm m a odata m nly onsistent v a l u e s onto adeterdata entile The t p o i nvalue. t of a 2peak es ft ruilb uto t i odne. s2c rTi hb ee al uws eh eof the clusarbin two nbimodal u m b e r s distribuused to oup m e a n is misess. 2 T h e m o d e is n o m i n a l data, dem o s t characteriscommonly alent v a l u e s on a data t p o i n t of a peak stribution. 2 The ples ld iws threi nb utwo t i o n sclusof nbimodal e d to d edistributermine aoup d i sm t r ei baunt iis o nmisobess. 2 The m o d e is reviously defined n otendency. m i n a l data, detalent s n o rcharacterism a l l y disstolic blood presd 31 to 40 years. ples i b u t e d data, t h e nd,i s tmr iebdui taino,n sa nof d n e d to d e t e r m i n e a d i s t r i b u t i o n obreviously defined tendency. t s n o r m a l l y disstolic blood presd 31 to 40 years. i b u t e d data, t h e n, median, and I I I I 100 I I 110 I I 120 I I 130 I 140 M e a n = 120 Median = 120 M o d e = 120 I I I I I I I I I I I If instead we took the 110 blood pressure of 26 patients with renal hypertension we 100 120 130 140 would get the following curve. Notice in this example that we expect the systolic M e a n = 120 blood pressure to be really n these people. Yet some of them are outliers and M e dh i aigh n = i120 M o d e = 120 actually have a lower blood pressure than expected. This “pulls” the mean down and is a negatively skewed distribution. x x X X X X X X I 180 I 190 I 200 I 210 I 220 I 230 I 240 I 250 Mean = 228.7 Median = 230 M o d e = 240 2 x x X X X X X X I 180 I 190 I 200 I 210 I 220 I 230 I 240 I 250 2 F Mean = 228.7 Median = 230 M o d e = 240 F F I 70 ! 80 I 90 F I 100 F M I 110 M F I 120 M M I 130 \ ! 140 M e a n = 106,1 Median = 105 M o d e = 95,120 Measures of Variability 3 F In order Fto fully describe data, you must report not only the central tendency F F F M F M M of Figure the d ata, b ut a lso t he oMI f talter hat dthe ata. L ook a\t tmean, he ! following frequency 2 p r e s! e n t s t h e oI r e t i cvaariability l syswill value ofI the but I I I tolic blood pressure data of patients n o t the m e d i a n or mode. 70 to see 80why. N 90otice that 100 the m 110 140 graph eans, m120edians, 130 and modes of the data are exactly w i t h u n t r e a t e d r e n o v a s c u l a r hyperFigure 3 p r e s e n t s s y s t o l i c b1 l o o d the s ame, b ut y et t he d ata a re o bviously s omehow d ifferent. t e n s i o n . T h e d i s t r i b u t i o n is n e g a p r e s s u r e for a s a m p l e t h a t i n c l u d e s e a n = 106,1 tively skewed. In the absence of Mnort w o groups, pregnant w o m e n in their Median = 105 mality, the mean, median, and m M ood ed e = are 3 not equal. Also, an outlier, such as a systolic blood pressure value of 150 m m Hg instead of 180 m m Hg, Figure 2 p r e s e n t s t h e o r e t i c a l systolic Annals blood of pressure dataMedicine of patients Emergency w i t h u n t r e a t e d r e n o v a s c u l a r hypert e n s i o n . T h e d i s t r i b u t i o n is n e g a tively skewed. In the absence of normality, the mean, median, and m o d e are not equal. Also, an outlier, such as a systolic blood pressure value of second t r i m e s t e r and men. Again, the m e a n and m e d i a n are u n e q u a l in this n o n - n o r m a l l y distributed data. Also, there exist t w o peaks of data cluster, will alter the value of the mean, b u t n o t the m e d i a n or mode. 19:3 March 1990 Figure 3 p r e s e n t s s y s t o l i c b l o o d p r e s s u r e for a s a m p l e t h a t i n c l u d e s t w o groups, pregnant w o m e n in their second t r i m e s t e r and men. Again, the m e a n and m e d i a n are u n e q u a l in this n o n - n o r m a l l y distributed data. Also, 95,120 Range • • • Reports the lowest and highest numbers. Purely descriptive Affected by outliers Interquartile Range Reports the range of values from the 25th percentile through the 75th percentile • The median is always the 50th percentile (so 50% of values fall below the median). • The interquartile range contains 50% of the data points (between 25-‐75th percentile) • Often reported with medians for ordinal data or with a median to describe continuous data with outliers. 25 50 75 Percentile Median • Standard Deviation (SD) • • • • A unit of measure that has to do with variance around a mean with continuous data Can only be used with normally distributed data Approximately 68% of all data falls within 1 SD of a mean Approximately 95% of all data falls within 2 SD of a mean 50 th percentile S 15.9 th percentile 2.3 rd percentile 84.1 percentile ! , / 7.7 TM percentile 2"14°/° , "13°/° , -4SO ~ -3SD 34.13% I 13"59°/° -2SD I -1 SD I 34.13% o 68.26% 95.44o/0 99.720/0 99"980/0 13"59% i +1 SD I +2SD +3 SD +4SD J I ,,, I 4 2 FIGURE 4. SD and the normal distri- TABLE 1. Applicability of measures of central tendency bution: 6 8 . 2 6 % of all scores fall w i t h i n ± 1 SD f r o m t h e m e a n ; So if we find that our mean heart rate is 80 in a population 95.44% and a ofSD f 10, fall then 68% alloscores within ± 2 Median70 and Mode f r o mof the a n ; 99.72% of Characteristic all people will have a heart Mean rate between 90, and SD 95% all mpe eople will of all scores fall within +_ 3 SD from the Useful with interval, ratio data Yes have a heart rate between 60 aYes nd 100. Yes m e a n ; 9 9 . 9 8 % of all s c o r e s fall Useful with ordinal data No Yes Yes wqthin ± 4 SD from the mean. £6 • with The standard deviation No is often used “normal” lab values. Useful nominal data No to determine Yes Affected by outliers Yes No No a series of m e a s u r e s , and t h u s t h e presence of one outlier can m a r k e d l y i n f l u e n c e t h e r a n g e . T h e r a n g e is two modes. To ignore the b i m o d a l asdency is best for all situations. 5 The purely a descriptive tool and should pect of this d i s t r i b u t i o n w o u l d be to a p p l i c a b i l i t y of m e a s u r e s of central n o t be used to infer w h e t h e r groups below Also, are the several vdiffer arious data types, overlook itsListed u n i q u e feature. tendt e nables c y is s uto m mhaelp r i z e dsummarize (Table 1). statistically. presence of anooutlier w o u l d altercthe methods f d escribing entral t endency a nd m easures o f v ariability. mean, b u t n o t the m e d i a n or modes. In Figures 2 and 3 the mean, meM e a s u r e s of c e n t r a l t e n d e n c y do T h e i n t e r q u a r t i l e range is a m e a dian, a n d mode(s) are u n e q u a l benot describe the variability, or sure of v a r i a b i l i t y directly related to Type Data of that Variability cause dataof are n o t n o r m a lExample l y distribspread, Measure of data. S t a nodf a rC d ientral z e d estitheMeasure median. Recall the median, a uted. 5 Thus, the m e a s u r e of central m a t e s d e f i n i n gTendency d a t a v a r i a b i l i t y are m e a s u r e of central t e n d e n c y applicat e n d e n c y m o s t useful to data analyn e e d e d to h e l p i n f e r w h e t h e r t w o ble to ordinal and n o n - n o r m a l l y dissis depends on the type of data, and groups studied differ significantly. In t r i b u t e d d a t a , is t h e m i d d l e m o s t w h a tNominal aspect of the data is toMale be con-v other words, measures of variability v a l u e of a setRange of data.? T h e m e d i a n Mode veyed. Fortunately, m o s t physiologic are used to help infer w h e t h e r two or r e p r e s e n t s the 50th p e r c e n t i l e . T h e d a t a are n o r m a l l y or n e a r nFemale ormally m o r e groups studied are drawn from i n t e r q u a r t i l e range is t h a t range ded i s t r i b u t e d so t h a t m e a n , m e d i a n , d i f f e r e n t p o p u l a t i o n s . S e v e r a l estiscribed by the interval b e t w e e n the Ordinal 5 point ordiLikert Median ange 6 and m o d e are equal. However, m a t e s of variability exist. 25th Interquartile and 75th percentileRvalues. nal scale data have no consistent It has been suggested that the inm a g n i t u d e of d i f f e r e n c e b e t w e e n terquartile range be used for describu n i t s of the data scale, and m o s t orThe range is the interval b e t w e e n ing the v a r i a b i l i t y of data that do n o t d i n a l d a t a are n o t n o r m a l l y distribthe lowest and highest values w i t h i n m e e t p a r a m e t r i c analysis standards, Continuous Heart r ate Mean Standard Deviation uted. 3 Therefore, t h e m e a n is misa data group. 2 It is the s i m p l e s t measuch as ordinal scale data. 6 T h e interleading as a m e a s u r e of central tensure of v a r i a b i l i t y to u n d e r s t a n d and q u a r t i l e range clearly defines w h e r e dency for ordinal scale data.Z, 3 dentify. While simple, the range the m i d d l e 50% of measures occurs N o single m e a s u r e of central ten- ionly considers the e x t r e m e values of and indicates the spread of the data Summary and Conclusions about central tendency and variability MEASURESOF VARIABILITY InterquartileRange Range 19:3 March 1990 Annals of Emergency Medicine 311/143 Characteristic Useful with continuous data Useful with ordinal data Useful with nominal data Affected by outliers Mean Yes No No Yes Median Yes Yes No No Mode Yes Yes Yes No Confidence Intervals (CI) Definition The most basic way to think about confidence intervals is to think of them as mathematical predictions about where the real value for the variable exists. • We typically use 95% confidence intervals in clinical medicine. • A confidence interval simply takes the data we actually have in our sample and tells us how this applies to the population (real world). • Examples o I measure the heart rate of 200 active duty soldiers and find that the mean heart rate is 50 with a 95% CI of 42-‐61. The correct interpretation of this is that my sample mean is 50 beats per minute but that I am 95% certain that the mean heart rate for the entire population of active duty soldiers (whom I did not study) is between 42 and 61. o I ask 40 parents if they like the experience their child had when receiving intranasal fentanyl as a sedative. We use the following Likert scale (1-‐ hate, 2-‐ dislike, 3-‐ neutral, 4-‐ like, 5-‐ love). We find a median score of 4 and our 95% CI comes back at 3-‐5. The correct interpretation is that in our sample, half of the parents liked or loved the sedation. In the real world we are 95% certain that 50% of parents will fall somewhere between 3 (neutral) and 5 (love). o I take 30 doctors I know and ask them if they know what a confidence interval is. I report that 40% do know what it is with a 95% CI of 5-‐ 75%. The correct interpretation is that in my sample 40% of physicians knew what a confidence interval is, and that I’m 95% certain that between 5% and 75% of physicians know what a confidence interval is. • Note: You will sometimes see a researcher describe their sample with a 95% CI. So you look at the first table and see they are reporting that they enrolled 65% males with a 95% CI of 55-‐75%. This is a little confusing if you do not understand confidence intervals. Most people will look at that and think…wait, they can’t count? Why aren’t they 100% certain that they have a sample with 65% males. In reality, they are saying that they have a sample with 65% males, and that they think that in the population they are studying, there are somewhere between 55-‐75% males. The confidence interval is derived from the sample data but refers to the overall population. Methods There are mathematical formulas to derive confidence intervals for almost any type of data. You can read a very thorough review and explanation with formulas in Chapter 7 of Glantz’s book.3 Here are some key points. • Standard Error of the Mean (SEM)-‐ You may see this term used. Essentially this is a mathematical way of taking the standard deviation from a sample, and determining how representative it is of the population. The SEM is then used to calculate a confidence interval. This is only useful for continuous data, but is always used as the example for how to calculate confidence intervals. o Note: The SEM should never be reported for a sample. It is a measure relating to the population, not the sample. Since the SEM is always smaller than the standard deviation of a population, some authors mistakenly use the SEM instead of the standard deviation. • You may select various cutoffs of confidence for your confidence interval. We typically use 95% in medicine, but you can choose 90%, 99% or whatever other number you would like. If you want to be 99% sure that your confidence intervals include the population values then the width of the confidence interval will be wider. • The sample size directly impacts the width of the confidence interval. If you have a large sample size, your confidence interval “tightens” or “narrow”. Example-‐ Below is a graph showing our data from the blood pressure measurements in normal men aged 30-‐40. It shows the relationship between SD, SEM, and the confidence interval.2 ~XX.K, x x21¢xN <~3¢.K; I I l 100 l I I I 110 I I 120 130 SD = 9.37 SEM I I I 140 SD = 9.37 = 1,71 SEM = 1,71 I I 1 9 5 % CI l MEAN FIGURE 5. Systolic blood pressure of m e n aged 31 to 40 years. SD, SE, and 95% CI are noted. TABLE 3. Applicability of measures of variability Range Interquartile Range SD SEM Useful to describe interval or ratio data Yes Yes Yes Yes Used to describe ordinal data Yes Yes No No Characteristic statistical inference techniques. T h e calculation of SD from the n o r m a l l y distributed systolic blood pressure data of Figure 1 is shown (Table 2). values. The SEM e s t i m a t e the preof a sample, as it a t i o n from w h i c h wn.lO, 11 The SEM n e s t i m a t e of the ta about the samuld n o t be used as f u l b e c a u s e i t is l a t i o n of " c o n f i w h i c h c o n t a i n an e m e a n for an enm w h i c h the samnfidence intervals riptive or inferen- SEM for the nordata p r e s e n t e d in Table 2). on Versus the Mean M are measures of r, the two statisnd are f r e q u e n t l y e d , la T h e SD deof s a m p l e d a t a ple mean. The SD an the SEM. The o n l y calculated to nce intervals. have c o m m e n t e d a l sleight of h a n d SEM when only o describe sample A3 Bunce et aU 3 es in six journals SD or SEM were SEM values were y t h e SD w o u l d ate. T h e a u t h o r s a n y workers m a y e SEM because it t h a n t h e SD. ''13 use of SEM to dev a r i a b i l i t y m a y be r s in an a t t e m p t nificant difference ups, w h e n in fact . Elenbaas et al la oncluding that audata as m e a n _+ a n ± SD m a y be m p a i r the reader's identify the varidata. or by design, it is epresent the varid a t a as m e a n _+ hat readers m u l t i t o obtain the SD eously used to exility. It is not an Example: The following Table2 shows how changing your confidence level, or your sample size affects the reporting of the confidence intervals. TABLE 4. Effect of confidence level and sample size on confidence interval width Calculation of CIs for Data Presented in Table 2 C1(%) SD n SEM CI 90 9.377 30 1.712 120 -+ 2.82 95 9.377 30 1.712 120 -+ 3.36 99 9.377 30 1.712 120 _+ 3.83 Effect of Sample Size on CI for Data With A Mean of 120 and a SD of 9.377 C1(%) SD n SEM CI 95 9.377 30 1.712 120 _+ 3.36 95 9.377 100 0.938 120 _+ 1.84 95 9.377 1000 0.297 120 ± 0.582 error toCuonfidence s e SEM in Isntervals peculating Using a lationA s Also, the closer a p o i n t lies range, or confidence interval, w i t h i n to t h e m i d d l e of t h e CI, t h e m o r e w h i c h aConfidence intervals an be l iuk sed two different ofwthe ays: t r u e population m e a n cis e l y itin is-representative population. 16 l i k e l y to fall. The SD and SEM of the data shown in Figure 1 are given (FigThough by convention the 95% CI urely For the example, according to an u r e 1. 5). Confidence T a b l e 3 s u m imntervals a r i z e s tchan e be ispm o s t c odmescriptive. m o n l y reported, 95% 4 analysis o f t he C anadian C T H ead R ule t he s ensitivity o f t he r ule for finding proper use of e s t i m a t e s of variability. level is n o t rigidly required. W i d e r CIs, such as a 99% or 99.9% CI, are neurosurgical lesions was reported as 100% 95CI (64.6-‐100). The authors Confidence Intervals even m o r e l i k e l y to include the true their from sample nd they are 95% p o p u l a tiis o na 1 p a00% r a m e tse ensitivity r v a l u e and aare W h e n are s t a t issaying t i c s derived the estimate samplingcertain of a p o ptuhe l a t itorue n aretest studied c o m mis o nslomewhere y used for critical appraisal sensitivity between 64.6% and 100%. of data. T h e y are also advocated for to infer values for p o p u l a t i o n paramConfidence can also ake during inferences. eters,2.it w o u l d be usefulintervals to have cone x ab me i nua sed t i o n sto ofm data ongoing We will discuss aicn c utm u l fauture t i o n of b subjects in a sclinical ypothesis he ut a brief ummary here will make f i d e n c e classical t h a t t h e sh am p l e s t a t i s t itcesting al value, such as a m e a n or SD, w o u l d trial. 15 N a r r o w e r CIs, s u c h as the the idea more clear. be representative of the true popula90% CI, can be used w h e n s t u d y aution p a r a m e ta. e r . Classical One c a n n o thypothesis be certthesting o r s f i np d roduces i t a c c e p taa bPl -‐value e t h a t at end n tells the tain that a s a m p l researcher/reader e statistical value is t i m e s out of 100, the true p o p u l a t i o that the observed difference is nSTATISTICALLY representative of the true population p a r a m e t e r m a y n o t lie w i t h i n the CI. significant. othing of the the of that difference. parameter, b u t one can c a l c u lIat t esays a nHowever, w im d tagnitude h of a CI depends n o t only on the variability of the data between two range of values k eilnstead y to be represenb. l iIf a researcher reports the actual difference and the level of confidence selected, t a t i v e of t h e p onumbers p u l a t i o n paand r a mgeives abut 95% confidence interval then the magnitude of ter.4,14 That range of values is called also the sample size. is obvious. IW n haeddition, if the ca onfidence a c o n f i d e n c e i n t ethe r v a leffect (CI). Calculan one broadens CI by mov- interval crosses the ing a 95% to aa99% CI,Saccuracy is tion of a CI is a m“identity” e t h o d of e s t p i moint, a t i n g then the results re not TATISTICALLY significant the range of values l i k e l y to include i n c r e a s e d b e c a u s e t h e c a l c u l a t e d CI “identity bpeoint” he i “nnull” c o m e s (also m o r e clalled i k e l y tto c l u d e paoint, or “no effect” the true value of a p o pi.u l a tThe i o n paramp o p u ltahat tionm parameter. However, e t e r . S i n c e o n e c a n n o t point) s t u d y ias l lthe ntrue umber eans there is no effect. If you are w h e n the level of CI is held constant m e m b e r s of a population, a represensubtracting two results then obviously the null point is zero. t a t i v e s a m p l e of t h e p o p u l a t i o n is and s a m p l e size is increased, SEM is This i s w hy i t i s c a confidence interval d e c r eommonly a s e d a n d t htaught u s t h e that CI isif narstudied, and from this one uses the rowed. This narrowing of the CI in- however that for a m e a n and SEM to w o r k backward to crosses zero it is not significant. Remember creases the precision of the CI. The e s t i m a t e a CI. ratio, the point actually “1”. A rselected atio of 1 is meaningless. ofis level of confidence T h e w i d t h of the CI depends on null effect the SEM and c. the The degreesummary of confidence and s a m p l e size on the w i d t h of a CI between two of this is that you find the difference we arbitrarily choose. For instance, a is shown (Table 4). numbers and report the confidence interval around it. If that 95% CI, w h i c h is the degree of confiC a l c u l a t i o n of the CI for e s t i m a interval ction ontains statistically not of truea n p oumber p u l a t i o nthat m e a ins values dence m o s t c o m confidence m o n l y expressed, ~4 is a range of v asignificant l u e s b r o a d e(nthe o u g hnull p applies n t i n u o u s data noroint) to or ccolinically not from significant (2BPM m a l or n e a r - n o r m a l d i s t r i b u t i o n s . 4 that, if the entire p o p u l a t i o n could be reduction n heart rate) ou becan eject he such results. Also,then a CI y can e s tri m a t e d tfor studied, 95% of the t i m e the ipopulao t h e r s t a t i s t i c s as m e d i a n s , regrestion m e a n w o u l d fall w i t h the CI ess i o n slopes, r e l a t i v e r i s k data, ret i m a t e d from the sample of the popuAnnals of Emergency Medicine 19:3 March 1990 Confidence Intervals and Hypothesis Testing (Examples): Let us pretend we have a new drug called RATE-‐A-‐BLATE that is supposed to lower the heart rate rapidly and without any blood pressure side effects in people with atrial fibrillation and rapid-‐ventricular response. Using this drug, we can look at the various ways confidence intervals can be used and how they are impacted by certain factors. 1. In the first experiment, we take 10 patients and give them RATE-‐A-‐BLATE (RAB). We find that in the sample RAB drops the heart rate approximately 30 beats per minute. After the math, we report the results as -‐30 95% CI (-‐ 50-‐2). -‐50 -‐30 0 So based on this sample we are 95% certain that the drug may lower the heart rate on average 50 BPM or raise it 2BPM. Looking at this data you would think, “well I do not want to give a drug that may raise the heart rate a few beats, but the possible values for this drug (negative 50 through 2) are predominantly on the heart rate lowering side. If another study was performed with larger numbers we might get a narrower range of possible values to examine.” 2. In the second experiment we add 40 more patients in atrial fibrillation with rapid ventricular response so that we are now studying a total of 50 patients. Our RAB study finds a mean change in heart rate of -‐30 95%CI (-‐40-‐ -‐20). -‐50 -‐30 0 This time we can say that we are 95% certain our drug lowers heart rate between 20 and 40 beats per minute. The increase in sample size has narrowed our confidence interval. Now we are likely to accept that the drug is most certainly effective at lowering heart rate. If we had just reported the sample means in both cases, then we would all just think that RAB always lowers heart rate by 30bpm. Do not trust a study that does that. 3. Now let us move on to study RAB versus diltiazem. Assume we run a well-‐ designed double-‐blind, randomized control trial and determine that once again RAB gives us a mean change in heart rate of -‐30 95%CI (-‐40-‐ -‐20). We also find that diltiazem causes a mean change in heart rate of -‐20 (-‐30-‐ -‐10). -‐50 -‐30 0 Diltiazem RAB Notice that initially we are simply describing the two heart rate lowering properties of the drugs. Looking at the data there appears to be a lot of overlap in the possible mean heart rate lowering effects of the drug. This may not be statistically significant. So we take our mean heart rate changes from the two drugs and compare them using the Student’s T test (more in another lecture) and find that RAB lowers the heart rate by 10 bpm more than diltiazem but with a P-‐ value of >.05 so it is not statistically significant. Another way (some argue a better way) is to just show the mean difference with a confidence interval. We get a -‐10 95%CI (-‐35-‐ 5). -‐35 -‐10 0 5 Mean difference in heart rate between diltiazem and RAB This is the same information simply presented in a different fashion. Instead of just saying there was a -‐10BPM difference between the two drugs and it was not statistically significant, we can see how close it came to being statistically significant. In this graph we are plotting the difference between two numbers so a “zero” is not statistically significant. But we and realize that it was “almost” statistically significant. We plan another study. 4. Once again we plan to increase our sample size to see if the mean difference between the two drugs is statistically significant. We enroll over 5000 patients into the same double-‐blind RCT comparing diltiazem to RAB. We end up this time with results indicating that the mean difference between RAB and diltiazem is -‐10 95% CI (-‐11-‐ -‐9). -‐35 -‐10 0 5 Mean difference in heart rate between diltiazem and RAB This time we did it. The confidence interval is very narrow (only 1 BPM on either side of our sample estimate of -‐10BPM). Plus it does not cross zero (and sure enough the P-‐value will be <.05). It looks like with enough patients; we have finally shown that RAB has a statistically significant reduction in heart rate over diltiazem of about 10BPM (to be exact it could be anywhere from 9 to 11). Example Wrap-‐Up: We can make the following conclusions about RAB based on our studies: • RAB seems to lower heart rate in atrial fibrillation with rapid ventricular response by an average of about 30 BPM but could lower it anywhere from 20 to 40 BPM compared to doing nothing. • RAB seems to lower heart rate in atrial fibrillation with rapid ventricular response a little better than diltiazem. It lowers the heart rate about 10 BPM but could lower it anywhere from 9 to 11 BPM. But what does this tell us clinically? In the end these are only STATISTICALLY significant numbers. Remember that CLINICAL significance is not the same thing. Consider the following situations: • RAB is cheaper, and does not drop blood pressure like diltiazem. You think that the data supports its use. • RAB turns out to be more expensive, and does not drop blood pressure like diltiazem. You decide to only use RAB in normotensive or hypotensive patients and save costs by using diltiazem when you have a hypertensive patient. • RAB turns out to be less expensive, and does not drop blood pressure like diltiazem; however it is associated with causing a myocardial infarction about 10% of the time it is given. You decide that those extra 10BPM over diltiazem (as well as the cost and BP stable effects) are not worth the consequences. • RAB turns out to be less expensive, and does not drop blood pressure like diltiazem. You decide however that a 10 BPM difference over diltiazem is not really that clinically significant and you are more used to diltiazem so you continue to use it instead of RAB. All of those conclusions are possible and individualized based on the data. If you look at a confidence interval, think about all of the values in the range as the “real world value” and think it is worth your time, then use that intervention. If not, drop it. Resources: • • • • I highly recommend the following two references:2,5 For a very complete description of all things related to confidence intervals refer to.3 If you want to plan a study and derive the sample size based on a certain confidence interval you wish to obtain, go to: http://www.epibiostat.ucsf.edu/dcr/ If you have data and want to calculate a confidence interval around it go to: http://www.graphpad.com/quickcalcs/ References: 1. 2. 3. 4. 5. Glaser AN. High-‐Yield Biostatistics. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2005. Gaddis GM, Gaddis ML. Introduction to biostatistics: Part 2, Descriptive statistics. Annals of emergency medicine. Mar 1990;19(3):309-‐315. Glantz SA. Primer of Biostatistics. 6th ed. New York: McGraw-‐Hill; 2005. Smits M, Dippel DW, de Haan GG, et al. External validation of the Canadian CT Head Rule and the New Orleans Criteria for CT scanning in patients with minor head injury. JAMA : the journal of the American Medical Association. Sep 28 2005;294(12):1519-‐1525. Gaddis ML, Gaddis GM. Introduction to biostatistics: Part 1, Basic concepts. Annals of emergency medicine. Jan 1990;19(1):86-‐89.