Summary Statistics & Confidence Intervals

Annie Herbert
Medical Statistician
Research & Development Support Unit
Salford Royal NHS Foundation Trust
0161 2064567

Timetable
• 60 mins: Presentation
• 20 mins: Coffee Break
• 90 mins: Practical Tasks in IT Room

Outline
• Sampling
• Summary statistics
• Confidence intervals
• Statistics packages

‘Population’ and ‘Sample’
• We study a population of interest. Usually we would like to know the typical value and the spread of an outcome measure in that population.
• Collecting data from the entire population is usually impossible, or inefficient and expensive, so we take a sample (even census data can have missing values).
• The sample must be representative of the population.
• Randomise! E.g. Randomised Controlled Trial (RCT).
[Diagram: Population → Sample → Randomisation → Group 1 → Outcome; Group 2 → Outcome]

Types of Data
Categorical
• Examples: Yes/No, blood group
• Graphs: bar chart, pie chart
• Summary: frequency (n), proportion (%)
Numerical/Continuous
• Examples: weight, pain score
• Graphs: histogram, box and whisker plot
• Summary: mean & standard deviation (SD); median & inter-quartile range (IQR)

Types of Average (‘Average’: a number which typifies a set of numbers)
• Mean = total divided by n
• Median = middle value
• Mode = most common value/group (rarely used)

Types of Average - Example
Pain score data: 10, 8, 7, 7, 1, 7, 6, 5, 3, 4
Ordered: 1, 3, 4, 5, 6, 7, 7, 7, 8, 10
Mean = (1 + 3 + 4 + … + 10) ÷ 10 = 58 ÷ 10 = 5.8
Median = (6 + 7) ÷ 2 = 6.5 (n is even, so the median is the average of the 5th and 6th ordered values)
Mode = 7

Mean or Median?
Roughly Normally distributed:
• Mean or median
• Mean by convention
Skewed:
• Median
• Less affected by extreme values
[Figure: example histograms of roughly Normally distributed data and of skewed data]

Variation and Spread
• Standard Deviation (‘SD’)
 - Roughly the average distance of the values from the mean
 - Use alongside the mean
• Inter-Quartile Range (‘IQR’)
 - Range in which the middle 50% of the data lie (when the data are ordered)
 - Use alongside the median
• Range
 - Lowest and highest values
 - Possibly quote in addition to SD/IQR

Types of Variation - Example
Pain score data: 10, 8, 7, 7, 1, 7, 6, 5, 3, 4
Ordered: 1, 3, 4, 5, 6, 7, 7, 7, 8, 10
SD = 2.6 (sample SD, dividing by n - 1)
IQR = (3.75, 7.25): the lower quartile lies 3/4 of the way from the 2nd to the 3rd ordered value, 3 + 0.75 × (4 - 3) = 3.75; the upper quartile lies 1/4 of the way from the 8th to the 9th ordered value, 7 + 0.25 × (8 - 7) = 7.25
Range = (1, 10)

Standard Error
• Not the same as the standard deviation.
• Calculated from a measure of variability and the sample size (e.g. the standard error of a mean is SD ÷ √n).
• Used to construct confidence intervals.
• Not very informative when given alongside statistics or as error bars on a plot.

Sample statistic is the best guess of the (true) population value
• E.g. the sample mean is the best estimate of the mean in the population.
• The mean is likely to be different if we take a new sample from the population.
• We know that the estimate is not likely to be exactly right.

Confidence Intervals (CIs)
• Confidence interval = “range of values that we can be confident will contain the true value of the population”.
• The “give or take a bit” for the best estimate.
• Convention is to use a 95% confidence interval (‘95% CI’).
• But if the study were repeated many times, about 5% of the 95% CIs calculated would not contain the true value.
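The pain score examples can be checked with a short sketch using Python's standard `statistics` module. It assumes the quartiles are found with the (n + 1) interpolation rule (`method="exclusive"`), which matches the IQR on the slide; other packages may use a different rule and give slightly different quartiles. The 95% CI for the mean is an illustration added here rather than part of the slides: it uses SE = SD ÷ √n and a hard-coded t critical value of 2.262 for n - 1 = 9 degrees of freedom, because the standard library has no t distribution.

```python
import math
import statistics

# Pain score data from the slides (n = 10)
pain = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]

mean = statistics.mean(pain)      # total divided by n
median = statistics.median(pain)  # average of the two middle values when n is even
mode = statistics.mode(pain)      # most common value
sd = statistics.stdev(pain)       # sample SD (divides by n - 1)

# Quartiles by the (n + 1)p interpolation rule; the middle value returned is the median
q1, _, q3 = statistics.quantiles(pain, n=4, method="exclusive")
data_range = (min(pain), max(pain))

# Standard error of the mean: SD divided by the square root of n
se = sd / math.sqrt(len(pain))

# Illustrative 95% CI for the mean: mean +/- t * SE, with t for 9 degrees of freedom
# hard-coded as an assumption (stdlib has no t quantile function)
T_CRIT_95_DF9 = 2.262
ci_mean = (mean - T_CRIT_95_DF9 * se, mean + T_CRIT_95_DF9 * se)

print(f"Mean = {mean:.1f}, median = {median}, mode = {mode}")
print(f"SD = {sd:.1f}, IQR = ({q1}, {q3}), range = {data_range}")
print(f"SE = {se:.2f}, 95% CI for mean = {ci_mean[0]:.2f} to {ci_mean[1]:.2f}")
```

Note how the SE (about 0.83) is much smaller than the SD (2.6): the SD describes the spread of individual pain scores, while the SE describes the uncertainty in the sample mean.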
Example: Legislation for smoke-free workplaces and health of bar workers in Ireland: before and after study (Allwright et al; BMJ Oct 2005)

Outcome                               Before (N=138)   After (N=138)   Difference (95% CI)
Salivary cotinine (nmol/l), median    29.0             5.1             -22.7 (-26.7 to -19.0)
Any respiratory symptoms, n (%)       90 (65%)         67 (49%)        -16.7 (-26.1 to -7.3)
Runny nose/sneezing, n (%)            61 (44%)         48 (35%)        -9.4 (-19.8 to 0.9)

Differences for the symptoms are in percentage points.

Example: Supplementary feeding with either ready-to-use fortified spread or corn-soy blend in wasted adults starting antiretroviral therapy in Malawi (MacDonald et al; BMJ May 2009)
“After 14 weeks, patients receiving fortified spread had a greater increase in BMI and fat-free body mass than those receiving corn-soy blend: 2.2 (SD 1.9) v 1.7 (SD 1.6) (difference 0.5, 95% confidence interval 0.2 to 0.8), and 2.9 (SD 3.2) v 2.2 (SD 3.0) kg (difference 0.7 kg, 0.2 to 1.2 kg), respectively.”

Example: Sample size matters
What proportion of patients attending clinic are satisfied?

Sample size   Number satisfied   Proportion satisfied   95% CI for proportion
10            7                  70%                    35% to 93%
25            18                 72%                    50% to 88%
50            35                 70%                    55% to 82%
100           70                 70%                    60% to 79%
1000          700                70%                    67% to 73%

Example: % confidence matters
What proportion of patients attending clinic are satisfied?
Sample size = 50; number satisfied = 35; proportion satisfied = 70%

Confidence level   CI for proportion
90%                58% to 81%
95%                55% to 82%
99%                51% to 85%

p-values vs. Confidence Intervals
• p-value:
 - Weight of evidence to reject the null hypothesis
 - No clinical interpretation
• Confidence interval:
 - Can be used to reject the null hypothesis
 - Clinical interpretation
 - Effect size
 - Direction of effect
 - Precision of the population estimate

So… it’s not all about p-values!
• For some hypotheses, the p-value and the CI will both indicate whether or not to reject the hypothesis.
• A CI also provides an estimate, as well as a range for that estimate.
• General medical journals prefer CIs.
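The satisfaction examples can be explored with a sketch of an "exact" (Clopper-Pearson) confidence interval for a proportion. The slides do not say which method produced their intervals; exact intervals are our assumption, based on the 7 out of 10 row matching them. The function names `binom_cdf`, `solve_increasing` and `exact_proportion_ci` are hypothetical, introduced only for this illustration. The lower limit is the proportion at which observing x or more satisfied patients has probability α/2; the upper limit is the proportion at which observing x or fewer has probability α/2. Both are found by bisection.

```python
import math
from typing import Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed on the log scale."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_n_fact = math.lgamma(n + 1)
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def solve_increasing(g, target: float, iterations: int = 60) -> float:
    """Bisection on [0, 1]: find p with g(p) = target, assuming g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_proportion_ci(x: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Clopper-Pearson ('exact') confidence interval for the proportion x / n."""
    alpha = 1 - confidence
    # Lower limit: the proportion p at which P(X >= x) = alpha / 2 (0 when x = 0)
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: the proportion p at which P(X <= x) = alpha / 2 (1 when x = n)
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# "Sample size matters": a similar proportion, increasing n
for n, x in [(10, 7), (25, 18), (50, 35), (100, 70), (1000, 700)]:
    lo, hi = exact_proportion_ci(x, n)
    print(f"n = {n:4d}: {x}/{n} = {x / n:.0%}, 95% CI {lo:.0%} to {hi:.0%}")

# "% confidence matters": 35 satisfied out of 50
for level in (0.90, 0.95, 0.99):
    lo, hi = exact_proportion_ci(35, 50, level)
    print(f"{level:.0%} CI: {lo:.0%} to {hi:.0%}")
```

Exact intervals are only one of several methods for a proportion (Wilson and Wald intervals are common alternatives), and they are deliberately conservative, so other software may give rounded limits that differ from the tables by a percentage point. The two patterns on the slides should hold whatever the method: intervals narrow as n grows and widen as the confidence level rises.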
Statistical Packages

Summary statistics
• SPSS: not user-friendly; gives a large choice of statistics to calculate.
• Stats Direct: one right-click will produce a set of 20 or so of the most commonly used statistics.

Confidence intervals
• SPSS: doesn’t provide a CI for some key comparative statistics, e.g. a simple percentage.
• Stats Direct: provides a CI for most statistics.

Thanks for listening!