Biostatistics – A Revisit (for DT204)
What are biostatistics? Why do we need them? What is their relevance and importance?

The Research Question (+ scientific hypothesis)
- Be specific with the question and familiar with the literature.

Select a study design to answer the question – observation versus intervention.

Study design issue – the sample
- Type of sample? Random or non-random?
- Sample size? (a question of power and feasibility)

Lab/field work + data entry
- Ensure quality control; this is most important.

How should you present your data? – Descriptive Statistics
- As they are, i.e. as tables of raw numbers
- Numerical summaries
  * frequencies (%)
  * mean and SD (or median and IQR)
- Visually (in figures or graphs)
- Tables of raw data generally go into an appendix.
- Tables are useful for detailed and specific information (data).
- Graphs/figures give visual impact but are less precise.
- Frequencies (%) or means? To answer this, you need to know what kind of data you have.

What kind of data do you have?
Categorical
- Nominal – no order, e.g. gender, hair colour, blood group, gene type
- Ordinal – has an order, e.g. age group, tumour grade
Metric
- Continuous – measured, has units, e.g. age, height, systolic blood pressure
- Discrete – counts, e.g. no. of asthma attacks/month, white blood cell count, no. of children

Categorical data and discrete data
- Numerically – express these as frequencies (%).
- Graphically – use bar charts or pie charts.

Continuous data (show units of measurement in tables/graphs)
- Numerically – mean + SD (for symmetrical distributions), or median + IQR (75th percentile - 25th percentile) (for skewed data)
- Graphically – use histograms or boxplots.

Most common research questions: Is there a difference, a change, an association? We may see a difference, a change or an apparent association, but how do we determine whether it is statistically significant?

INFERENTIAL STATISTICS (excluding chance as a possible explanation of your results)
Before choosing an inferential test, you need to know:
- what type of data?
  – categorical or continuous?
- what type of distribution? – normal or non-normal?

INFERENTIAL STATISTICS – Scenario I
You have 2 categorical variables, e.g. sex and gene type.
Research question: Is there a difference in gene profile by sex?
Data analysis: Run a cross-tab and apply the chi-square (χ²) test to examine whether the two variables are related. If they are, you will see a difference in the gene frequencies between males and females, the χ² statistic will be large, and the p value will be small (< 0.05), i.e. differences this large would be very unlikely if chance alone were at work.

INFERENTIAL STATISTICS – Scenario II
You have 1 categorical variable and 1 continuous variable (the most common scenario in the lab).
Research question: Is there a difference in creatinine levels between smokers and non-smokers?
Data analysis: Before choosing an appropriate test, answer the following questions:
- How many groups are being compared?
- Are they paired or independent (non-related)?
- Is creatinine (the continuous variable) normally distributed? (To test for normality, run a one-sample Kolmogorov-Smirnov (K-S) test.)
If the data are normally distributed, select a parametric test. Here the two groups are non-related, so select an independent-samples t-test (the most common test).
If the data are not normally distributed (e.g. very skewed), you have a choice:
i) transform the data and use a parametric test, or
ii) use a non-parametric test (e.g.
Mann-Whitney U test).
The choice here depends largely on precedent: how have others in the field analysed this type of data?

Choosing the test:
- Data normal, two related (paired) groups – use a paired t-test (t statistic).
- Data normal, two non-related (independent) groups – use an independent t-test (t statistic).
- Data normal, more than two independent groups – use a one-way ANOVA (F statistic).
- Data NOT normally distributed – use the non-parametric test that is equivalent to one of the above situations.

Non-parametric tests
- are distribution free, i.e. they make no assumption about the shape of the data's distribution
- are good for skewed data or small samples
- are NOT as powerful as parametric tests, and are therefore less likely to detect a statistically significant difference when one is really there.

When is a test statistically significant?
When p is small, i.e. less than 0.05 (p < 0.05), and the test statistic (e.g. χ², the F statistic from ANOVA, the t statistic from t-tests) is large.
When p is large, e.g. p = 0.69, the test is not significant and chance cannot be excluded as a possible explanation of the results.
Always report the exact p value; it provides more information than just writing NS (not significant).
N.B. A statistically significant difference is not necessarily a clinically or practically significant one. Be sure to distinguish these!

INFERENTIAL STATISTICS – Scenario III
You have 2 continuous variables: how are they related (associated)? E.g. weight and height.
Research question: Is there an association between weight and height?
Data analysis:
i) Examine this visually using a scatterplot.
ii) Examine this statistically using the Pearson correlation coefficient r (values from -1 to +1), which assesses the strength of the linear association between the two variables.

Association and Agreement
Association and agreement are not the same! When you have a new method and wish to assess it against a gold standard, it is appropriate to examine the level of agreement as well as the linear association.
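A minimal Python sketch of why association is not agreement, using made-up numbers: a hypothetical new method that always reads exactly twice the gold standard. The helper `pearson_r` is introduced only for this illustration (on Python 3.10+, `statistics.correlation` computes the same coefficient). r comes out as 1, a perfect linear association, yet the mean difference between the methods is large, so the two methods clearly do not agree.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r for two equal-length lists (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length (n >= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical readings: the new method always reads twice the gold standard
gold_standard = [4.0, 5.5, 6.1, 7.3, 8.8, 10.2]
new_method = [2 * g for g in gold_standard]

r = pearson_r(gold_standard, new_method)
differences = [new - gold for new, gold in zip(new_method, gold_standard)]
mean_difference = sum(differences) / len(differences)

print(f"r = {r:.3f}")  # prints r = 1.000: perfect linear association
print(f"mean difference = {mean_difference:.2f}")  # large bias: poor agreement
```

A high r only says the points lie close to some straight line; agreement requires that line to be the line of identity (new = gold), which is why the mean difference (bias) has to be checked separately.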
If the variable is continuous, use a Bland-Altman plot and assess the statistical difference between the methods using a paired t-test. Assess the degree of bias by calculating the mean of the differences between the methods.
For categorical data, use kappa (values > 0.7 indicate good agreement).
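A minimal sketch of both agreement checks in Python (standard library only), with hypothetical data. Assumptions not stated in the slides: the 95% limits of agreement are taken as bias ± 1.96 SD of the differences (the usual Bland-Altman convention), and kappa is computed as Cohen's kappa for two methods classifying the same samples. The helper names `bland_altman` and `cohens_kappa` are hypothetical. The sketch returns the (mean, difference) points instead of drawing the plot; any plotting tool can place the difference on the y-axis against the mean of the two methods on the x-axis. The paired t statistic still has to be compared with a t distribution on n - 1 degrees of freedom to get a p value, which is not shown here.

```python
import math
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Bland-Altman summary for paired continuous measurements of the same samples.

    Returns the (mean, difference) points to plot, the bias (mean difference),
    the 95% limits of agreement and the paired t statistic (df = n - 1).
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two paired lists with at least 2 samples")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)  # systematic bias between the methods
    sd = stdev(diffs)   # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    # Paired t-test statistic for H0: mean difference = 0
    t = bias / (sd / math.sqrt(len(diffs)))
    return list(zip(means, diffs)), bias, limits, t


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two methods classifying the same samples into categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty lists of the same length")
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Observed proportion of samples on which the two methods agree
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each method's category proportions
    p_expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical paired readings of the same 5 samples
new_readings = [5.1, 6.0, 7.2, 8.1, 9.3]
reference_readings = [5.0, 5.8, 7.0, 8.0, 9.0]
points, bias, limits, t = bland_altman(new_readings, reference_readings)
print(f"bias = {bias:.2f}, limits of agreement = ({limits[0]:.2f}, {limits[1]:.2f}), t = {t:.2f}")

# Hypothetical categorical results for 10 samples from two methods
method_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
method_2 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
kappa = cohens_kappa(method_1, method_2)
print(f"kappa = {kappa:.2f}")  # kappa = 0.80 here
```

Kappa corrects raw agreement for chance: in the example the methods agree on 9 of 10 samples (90%), but half of that agreement would be expected by chance alone, so kappa is 0.8 rather than 0.9.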