ICRO Course
Statistics Refresher
DR. INDRANIL MALLICK
TATA MEDICAL CENTER, KOLKATA
NOV 2016, BANGALORE

What are we discussing today?
Not everything about medical statistics! A few important concepts:
A) Population and Sampling
B) Hypothesis testing, the p-value and ‘significance’
C) Choosing a statistical test
D) Survival analysis
A free question round: ask me what you want.

Populations and samples
Answer these questions
1. What is the mean age of breast cancer patients in India?
2. What is the mean dose of radiotherapy to the left parotid gland in head and neck cancer patients treated in your hospital in the last 20 years?
3. What proportion of breast cancer patients in India are Her-2-neu positive?
4. What is the mean ADC value in DW-MRI of cervical cancer primaries?
We may not have data on all patients

Populations and samples
Population – the entire group of individuals in whom we are interested
Sample – a smaller group of individuals who are representative of the population. We often calculate summary measures from a sample to draw conclusions about the population
Sampling error – an error introduced in the ‘point estimate’ or sample statistic by using a sample instead of the whole population
The error can be kept small by taking a random sample, or a representative sample

Estimating the population from the sample
We can estimate several measures of the population from a sample
◦ Mean or Proportion
◦ Standard deviation
The value for a sample is never identical to the population
How do we know how precisely we have estimated?
If we can estimate the sampling error, then we can estimate where the actual population summary measures could lie.
◦ Standard error of mean
◦ Standard error of proportion

Standard errors
Standard error of mean
◦ Assumption: the sample, whether large or small, follows a normal distribution. The means of several samples then also follow a normal distribution, and the standard deviation of those means gives an estimate of the standard error
◦ $\mathrm{SEM} = \dfrac{\sigma}{\sqrt{n}} \approx \dfrac{s}{\sqrt{n}}$ (σ is the population SD, estimated by the sample SD s; n is the sample size)
◦ Factors affecting the standard error: it increases when the SD increases or the sample size decreases
Standard error of proportion
◦ $\mathrm{SE}(p) = \sqrt{\dfrac{p(1-p)}{n}}$

Standard deviation vs. Standard Error
Standard deviation (SD)
◦ Variability of data values in a dataset
◦ We are not concerned with estimation of a larger population
Standard error (SE)
◦ Precision of the mean
◦ Concerned with estimation of the mean/proportion of a larger population

Confidence intervals
95% confidence interval of the mean
◦ Sample mean +/- 1.96*SEM
95% confidence interval of the proportion
◦ Sample proportion +/- 1.96*SE(p)
Interpretation:
◦ How wide is it?
◦ Is it clinically important?
◦ Does it contain the hypothesized value?

Confidence intervals
A tricky concept!
When we try to estimate a population parameter (e.g. mean, proportion, difference between values) from a sample estimate, the estimate is often described with a ‘confidence interval’.
The mean weight of a random sample of 100 students from a group of 10,000 is 34 kg (95% CI = 31 to 38 kg)
◦ “95% of the students' weights will be between 31 and 38 kg” (incorrect)
◦ “There is a 95% probability that the true mean weight of the whole class is between 31 and 38 kg” (incorrect)
◦ “If we take many such samples and calculate interval estimates, then 95% of the interval estimates will contain the true mean of the population” (correct)
Confidence intervals are a comment on the sampling method – not the ‘probability’ of the true mean

Probability and Hypothesis testing

Probability
Basically: how likely is an event?
Properties of probability:
◦ Lies between 0 and 1
◦ When an outcome can never happen, p=0
◦ When an outcome must happen, p=1

Hypothesis testing
What is a hypothesis?
A theory (based on observations/expectations)

Expressing the hypothesis
Null hypothesis (H0) = assumes no effect in the population (difference in means = 0)
◦ Exercising does not change blood cholesterol levels
Alternative hypothesis (H1) = holds when the null hypothesis is false (usually what we are trying to investigate)
◦ Exercising changes blood cholesterol levels
◦ ‘One tailed’ vs ‘two tailed’ testing

Steps of hypothesis testing
◦ Define the null and alternative hypotheses
◦ Obtain the values from the control and test populations/situations
◦ Calculate the test statistic
◦ Compare the test statistic to a table of known probability distributions
◦ Obtain and interpret the p value

The p value
The mean duration of hospital stay for open vs. robot assisted radical prostatectomy was 12 days vs 9 days (p=0.03)
The incidence of grade 3/4 haematological toxicity was
◦ 15% with RT alone vs. 20% with RT + cetuximab (p=0.002)
◦ 16% with RT alone vs. 22% with RT + cisplatin (p=0.02)
The median survival with RT alone was 23 months vs 30 months with chemoradiotherapy (p=0.09)

The p value
The probability of obtaining these results, or something more extreme, if the null hypothesis is true
Does not quantify the difference
Is not the same as the probability of the null hypothesis being true. The null hypothesis is either true (accepted) or false (rejected).

Using the p value to derive conclusions
Can we reject the null hypothesis?
An arbitrary cut-off of 5% (0.05) is often used.
◦ If p<0.05 – the probability of getting this result/difference if the null hypothesis is true is <5% – we reject H0 (at a significance level of 5%)
◦ If p>0.05 – ‘we cannot reject the null hypothesis’ (not the same as saying there is no difference)
The cut-off value can be changed (made stricter) by using 1% (p<0.01) or 0.1% (p<0.001)
◦ If the implications of rejecting the null hypothesis are very severe
◦ If multiple comparisons are being made
◦ Must be decided before the data are collected

What does a non-significant p value mean?
The median survival with RT alone was 23 months vs 30 months with chemoradiotherapy (p=0.09)
It does not mean that the two groups being studied are the same (or that there is no difference)
It simply means that from the results obtained we cannot conclude that there is a difference (we cannot reject the null hypothesis)
Some reasons – inadequate sample size (power)

Are we slaves to the p value?

Using confidence intervals when comparing groups
Hypothesis test – makes a decision and provides an exact p-value.
Confidence interval –
◦ Quantifies the effect of interest (e.g. the difference in means)
◦ Enables us to assess the clinical implications of the results.
Provides a range of plausible values for the true effect, and can also be used to make a decision about the p-value even though the exact p-value is not provided.
◦ For example, if the hypothesized value for the effect (e.g. zero) lies outside the 95% confidence interval then we believe the hypothesized value is implausible and would reject H0.
In this instance, we know that the p-value is less than 0.05 but do not know its exact value

Statistical significance vs clinical significance

Association is not causation

Errors in hypothesis testing

Errors in hypothesis testing
Null hypothesis is true (no difference in groups)
◦ Null hypothesis accepted (non-significant p value): correct interpretation
◦ Null hypothesis rejected (significant p value): Type I error (α) – false positive result
Null hypothesis is false (there is actually a difference)
◦ Null hypothesis accepted (non-significant p value): Type II error (β) – false negative result
◦ Null hypothesis rejected (significant p value): correct interpretation – Power (1-β)

Acceptable error (commonly used)
Type I = 5% or 0.05
Type II = 20% or 0.2
Power = ‘ability to detect a difference if there is one’

Factors affecting power
The sample size
◦ Larger sample = higher power
The variability
◦ Larger SD = less power
The effect size
◦ Larger effect size = higher power
The significance level
◦ Stricter significance level (e.g. p<0.01 instead of p<0.05) = less power

Principle of sample size calculations
Calculation of an appropriate sample size in studies is crucial
The methodology used depends on the type of estimation/comparison
Example: difference of means between two groups
◦ Expected means in the two groups (includes effect size)
◦ Standard deviation
◦ Type I error and Power
Online calculator – example

Equivalence and non-inferiority trials
When?
◦ New treatment is less toxic/simpler/less expensive
◦ Bio-equivalence studies of drugs
What's the difference:
◦ The traditional null and alternative hypotheses do not hold (their roles are essentially reversed)
◦ The significance test or p value is of limited use
◦ The confidence intervals are important
◦ The clinically important effect size is important

Multiple comparisons – what is wrong?
Examples (very common in research)
◦ Subgroup testing
◦ Multiple comparisons – between 2+ groups, different timepoints
◦ Multiple outcome variables
◦ Interim analyses
Greatly increases the chance of false positive results
If α = 0.05, then the rate of acceptable false positive results is 5%.
If we make multiple comparisons then the false positive rate is much higher: > 60% for 20 comparisons (1 - 0.95^20 ≈ 0.64, assuming the comparisons are independent)

Multiple comparisons - solutions
Define a stricter type I error (α) threshold, e.g. 0.01
Correct the p value obtained by multiplying it by the number of comparisons carried out (Bonferroni correction)
◦ E.g. p value obtained = 0.02, number of comparisons made = 6, corrected p value = 0.02 x 6 = 0.12
Plan a subgroup analysis a priori and make sure that it is adequately powered to detect a significant difference.

Choosing the right statistical test

Tests for comparison of two or more groups
What kind of data is it?
◦ Numerical or categorical
◦ Numerical: likely to be normally distributed or very skewed?
◦ Categorical: are the categories nominal or ordinal?
Who is being compared?
◦ Same group at different times/circumstances
◦ Different groups
◦ How many groups?

What type of data is this?
◦ Age
◦ Sex
◦ Tumor size (maximum dimension)
◦ N stage
◦ Type of treatment: Surgery vs. surgery + RT
◦ Moderate dose RT (60Gy) vs High dose RT (70Gy)
◦ Severity of reactions: Grade 1, Grade 2, Grade 3, Grade 4

Numerical data
1 group
◦ 1 sample: Sign test
2 groups
◦ Paired: Paired T test; non-parametric alternative: Wilcoxon signed rank test
◦ Unpaired: Unpaired T test; non-parametric alternative: Mann-Whitney U test
>2 groups
◦ Unpaired: ANOVA; non-parametric alternative: Kruskal-Wallis test

Categorical data
Categorical – 2 categories
1 group
◦ 1 sample: Z-test for proportion; Sign test
2 groups
◦ Paired: McNemar test
◦ Unpaired: Chi-square test; Fisher exact test
>2 groups
◦ Unpaired: Chi-square test; Chi-square test for trend

Chi-square test - proportions
◦ 2 x 2 or r x c contingency table
◦ Observed and expected values
◦ Chi-square value
◦ Degrees of freedom
◦ Chi-square table

Chi-square test – important variants
Chi-square test for trend (ordinal values)
McNemar test (paired values)
Fisher exact test = when the expected value in a cell of the 2 x 2 contingency table is less than 5

Correlation and Regression
Correlation: Measures the degree of association
between two variables (x and y).
Regression: Measures how one variable (x) is affected by one or more other variables (y). Tries to predict what x will be based on the value of y (creates an equation)

Correlation
Pearson's correlation – 2 numerical variables
◦ Assumptions: paired observations, linearity, absence of outliers, bivariate normality
Spearman's correlation – numerical or ordinal variables
◦ Assumptions: monotonic relationship, paired observations

Interpreting correlation
Coefficient: -1 to +1. Zero means no correlation.
Positive and negative correlation
r increases when the range of measurements of x and y increases
r2 = what percentage of the variability in y is explained by its linear relationship with x

Linear Regression
Simple linear regression: predicting x from y
Multiple linear regression: predicting x from y1, y2, y3, y4

Logistic regression
Logistic regression: when x is a dichotomous variable (yes/no)
Output is in the form of odds ratios with confidence intervals for each variable

Survival analysis

Summarizing and comparing survival
How do we summarize and compare survival (in months) between two groups of patients, randomized to radiotherapy vs. chemoradiotherapy for advanced cervical cancer?
a) Paired T test
b) Unpaired T test
c) Wilcoxon signed rank test
d) Mann-Whitney U test

Time-to-event data
Described by
• Event
• The time of the event
• Not all patients have had the event at the time of analysis

Understanding censoring
Censoring can occur when patients are lost to follow-up or when the study ends.
The event in question could have happened to the individual after the last follow-up or the end of the study.

Follow up and outcomes

The Kaplan Meier Curve
• Cumulative survival probability plotted over time
• Step ladder pattern
• A step occurs at each event time point
• The size of the step is determined by the number of events and the number of cases at risk at that time point

Can you read this curve?
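
The step rule described for the Kaplan Meier curve can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a statistics package: it assumes the standard product-limit rule, in which the cumulative survival is multiplied by (1 - d/n) at each event time, and that a patient censored at the same time as an event is still counted as at risk at that time. The function name `kaplan_meier` and the six-patient data set are hypothetical and made up for the example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Tabulate a Kaplan-Meier curve.

    times  -- follow-up time of each patient (any consistent unit)
    events -- 1 if the event occurred at that time, 0 if the patient was censored
    Returns one row (time, number at risk, number of events, cumulative survival)
    for each distinct time at which at least one event occurred.
    """
    events_at = Counter(t for t, e in zip(times, events) if e)
    leaving_at = Counter(times)  # every patient leaves the risk set at their own time

    at_risk = len(times)
    survival = 1.0  # survival starts at 1 (100%)
    table = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            # A step occurs only at event times; its size depends on d and at_risk.
            survival *= 1 - d / at_risk
            table.append((t, at_risk, d, survival))
        # Events and censored patients both leave the risk set after time t.
        at_risk -= leaving_at[t]
    return table


# Hypothetical follow-up data (months) for six patients; 0 marks a censored patient.
example_times = [2, 3, 3, 5, 6, 8]
example_events = [1, 1, 0, 1, 0, 1]

for time, n_at_risk, n_events, surv in kaplan_meier(example_times, example_events):
    print(f"t={time}  at risk={n_at_risk}  events={n_events}  S(t)={surv:.3f}")
```

With these made-up data the curve steps down at months 2, 3, 5 and 8. The censored patients at months 3 and 6 produce no step of their own, but they shrink the number at risk and so enlarge the later steps.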
Tabulating and Plotting a KM survival curve
Survival starts at 1 (or 100%)
Note each event time point (ti, where i = 1 to n)
At each time point ti, note the number of events (di) and the number of individuals at risk (ni)
The probability of surviving each event time point is 1 - di/ni
The cumulative survival probability is the running product of these values.

Let's plot a survival curve!

Comparing time-to-event data
Testing one factor: Log rank test (univariate analysis)
◦ Prospective comparative study – treatment
◦ Retrospective study – treatment, prognostic factor
Testing many factors simultaneously: Cox regression (multivariate analysis)
◦ Cox proportional hazards model
◦ Tests whether a factor affects the rate of the event occurring after the influence of other factors has been accounted for
◦ Shows how much more likely an event is based on the change of a factor

Hazard ratio
Represents the increased risk of an event happening if the independent factor changes
◦ For categorical factors
◦ For numerical factors

Your questions now…

Educational activities at TMC Kolkata
FRCR (Clinical Oncology)
◦ Examinations – Part 1 and 2a – Spring and Autumn
◦ Part 1 course (with Christie Hospital, Manchester) – Jan 2017
◦ Part 2a course (with Leeds Oncology Center) – Dec 2016
www.igrtonline.com – 1100 participants from 53 countries certification
2 year Clinical Oncology Fellowship – modelled on the FRCR requirements