Biostatistics - A Revisit (for DT204

Biostatistics – A Revisit
What are they?
Why do we need them?
Their relevance and importance
The Research Question (+ scientific hypothesis)
- Be specific with the question and familiar with the literature
Select a Study design to answer the question
– observation versus intervention
Study design issue – the sample
- type of sample? Random or non-random
- sample size? (question of power and feasibility)
Lab/Field work + Data entry
- ensure quality control, most important
How should you present your data? – Descriptive
Statistics
- As they are, i.e. as tables of raw numbers
- Numerical summaries
  * frequencies (%)
  * mean and SD (or median and IQR)
- Visually (in figures or graphs)
- Generally tables of raw data go into an appendix
- Tables are useful for detailed and specific information (data)
- Graphs/figures are for visual impact (less precise)
- Frequencies (%) or means?
To answer this you need to know what kind of data you have
What kind of data do you have?
Categorical
Nominal – no order e.g. Gender, hair colour, gene type,
blood group
Ordinal – has an order e.g. Age group, tumour grade
Metric
Continuous – measured, has units e.g. Age, Height,
Systolic blood pressure
Discrete – counts e.g. no. of asthma attacks/month,
white blood cell count, no. of children
Categorical data and discrete data
Numerically - Express these as frequencies (%)
Graphically – use bar charts or pie charts
Continuous data (show units of measurement in tables/graphs)
Numerically - Mean + SD (for symmetrical distributions)
- Median + IQR (75th percentile - 25th percentile)
(for skewed data)
Graphically – use histograms or boxplots
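The numerical summaries above can be sketched in a few lines of Python using only the standard library. The variable names and data values here are made up for illustration; they are not from the slides. The quartile method ("inclusive") is one of several conventions, so other packages may report slightly different IQR values.

```python
import statistics
from collections import Counter

def frequencies(values):
    """Return {category: (count, percent)} for categorical or discrete data."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, 100 * count / n) for category, count in counts.items()}

def mean_sd(values):
    """Mean and sample standard deviation, for roughly symmetrical data."""
    return statistics.mean(values), statistics.stdev(values)

def median_iqr(values):
    """Median and interquartile range (75th - 25th percentile), for skewed data."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q3 - q1

# Hypothetical example data
blood_group = ["A", "O", "O", "B", "A", "O", "AB", "O"]   # categorical
height_cm = [150, 160, 170, 180, 190]                      # continuous, symmetrical
stay_days = [1, 2, 2, 3, 4, 5, 6, 7, 30]                   # skewed by one long stay

print(frequencies(blood_group))
print(mean_sd(height_cm))
print(median_iqr(stay_days))
```

In the skewed example the single value of 30 pulls the mean upwards but leaves the median and IQR unchanged, which is why the median and IQR are preferred for skewed data.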
Most common research questions:
Is there a difference, a change, an association?
While we may see a difference, a change or an apparent
association, how do we go about determining whether it is
statistically significant?
INFERENTIAL STATISTICS
(excluding chance as a possible explanation of your results)
Before choosing an inferential test need to know:
what type of data? – categorical or continuous
what type of distribution? – normal or non-normal
INFERENTIAL STATISTICS – Scenario I
Have 2 categorical variables e.g. sex and gene type
Research question: is there a difference in gene profile by sex?
Data analysis: Run a cross-tab and apply the chi-square
statistical test to examine if the two variables are related.
If they are - you will see a difference in the gene frequencies
between males and females.
Also, the p value will be small (< 0.05), i.e. it is very unlikely
that chance alone accounts for the differences observed.
The χ² statistic will be large.
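As a rough illustration of Scenario I, the sketch below computes the Pearson chi-square statistic for a 2x2 cross-tab by hand. The counts are invented, and the function name is hypothetical. It assumes no continuity correction and relies on the fact that, with 1 degree of freedom, the p value has a closed form; larger tables would need a statistics package.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = male, female; columns = gene type present, absent
crosstab = [[30, 20],
            [15, 35]]

chi2, p_value = chi_square_2x2(crosstab)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Here 60% of males and 30% of females carry the gene type, the statistic is large (about 9.1) and the p value is well below 0.05, matching the pattern described above.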
INFERENTIAL STATISTICS – Scenario II
Have 1 categorical variable and 1 continuous variable
(most common scenario in lab.)
Research question: is there a difference in creatinine levels
between smokers and non-smokers?
Data analysis:
Before choosing an appropriate test, answer the following questions
- how many groups are being compared?
- are they paired or independent (non-related)?
- is creatinine (the continuous variable) normally distributed?
(To test for normality run a one-sample Kolmogorov-Smirnov (K-S) test)
If data is normally distributed then select a parametric test – two
groups non-related so select an independent sample t-test (most
common test)
If the data are not normally distributed (e.g. very skewed) then you
have a choice:
i) transform the data and use a parametric test
or
ii) use a non-parametric test (e.g. Mann-Whitney U test)
The choice here depends largely on precedent – how have others
in the field analysed this type of data?
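To make Scenario II concrete, the sketch below computes the two test statistics named above using only the Python standard library. The creatinine values are invented, and the function names are hypothetical. The t statistic assumes equal variances in the two groups (the pooled, Student form). The p values are not computed here; in practice a statistics package would report them.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Student's independent-samples t statistic (pooled variance) and its df."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df

def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U for group_a: number of (a, b) pairs with a > b, ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical creatinine values (umol/L)
smokers = [90.0, 95.0, 100.0, 105.0, 110.0]
non_smokers = [80.0, 85.0, 90.0, 95.0, 100.0]

t, df = independent_t(smokers, non_smokers)
u = mann_whitney_u(smokers, non_smokers)
print(f"t = {t:.2f} on {df} df, U = {u}")
```

With 8 degrees of freedom the two-sided 5% critical value of t is 2.306, so the t of 2.0 in this made-up example would not reach significance even though the group means differ by 10 units.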
Data Normal and involves two groups – related
use a paired t-test (t statistic)
Data Normal and involves two groups – non-related
use an independent t-test (t statistic)
Data Normal and involves more than two groups –
use a 1-way ANOVA (F statistic)
Data is NOT Normally distributed
– use the non-parametric test that is equivalent
to one of the above situations
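The decision rules above can be written down as a small lookup function. This is only a sketch: the function name is hypothetical, and the slides name just one non-parametric test (Mann-Whitney U), so the other non-parametric equivalents returned here (Wilcoxon signed-rank, Kruskal-Wallis) are the usual textbook counterparts rather than something stated in the source.

```python
def choose_test(normal, n_groups, paired=False):
    """Suggest a test for one continuous outcome compared across groups.

    normal   -- True if the continuous variable is approximately normal
    n_groups -- number of groups being compared (2 or more)
    paired   -- True if the groups are related (e.g. before/after)
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups > 2 and paired:
        # Related designs with more than two groups are not covered by the slides
        raise ValueError("related designs with more than two groups are not covered")
    if normal:
        if n_groups == 2:
            return "paired t-test" if paired else "independent-samples t-test"
        return "one-way ANOVA"
    # Non-parametric equivalents (assumed standard counterparts)
    if n_groups == 2:
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"
    return "Kruskal-Wallis test"

print(choose_test(normal=True, n_groups=2))                # smokers vs non-smokers
print(choose_test(normal=False, n_groups=2))               # same comparison, skewed data
print(choose_test(normal=True, n_groups=2, paired=True))   # before vs after
```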
Non-parametric tests
are distribution free – they do not assume that the data
follow a particular distribution (e.g. the normal)
are good for skewed data or small samples
are NOT as powerful as parametric tests when the parametric
assumptions hold, and are therefore less likely to pick up
statistically significant differences that are really there
When is a test statistically significant?
When P is small – less than 0.05 (p<0.05) and the test statistic
is large e.g. χ², F statistic (ANOVA), t statistic (t-tests).
When P is large e.g. p = 0.69 then the test is not significant
and chance cannot be excluded as a possible explanation of the
results. Always show the exact p value – provides more
information than just giving NS (not significant)
N.B. Just because a difference is found to be statistically
significant does not mean that it is clinically or practically
significant. Be sure to distinguish these!
INFERENTIAL STATISTICS – Scenario III
Have 2 continuous variables – how are they related (associated)?
E.g. weight and height
Research question: is there an association between weight and
height?
Data analysis:
i) Examine this visually using a scatterplot
ii) Examine this statistically using Pearson correlation
coefficient r (values -1 to +1). This will assess the strength
of linear association between the two variables
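A minimal sketch of the Pearson correlation coefficient, written out from its definition with the standard library only. The height and weight values are invented, and the function name is hypothetical. The second example is included to show that r measures linear association only: a perfect curved relationship can still give r = 0.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: taller people tend to weigh more
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 74, 85]
print(f"height vs weight: r = {pearson_r(height_cm, weight_kg):.3f}")

# A perfect but non-linear (quadratic) relationship
x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]
print(f"x vs x squared:   r = {pearson_r(x, y):.3f}")
```

This is one reason the scatterplot comes first in the list above: it shows the shape of the relationship, which r alone does not.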
Association and Agreement
Association and Agreement are not the same!
When you have a new method and you wish to
assess it against a gold standard then it is
appropriate to examine the level of agreement as
well as linear association.
If the variable is continuous then use a Bland and
Altman plot and assess the statistical differences
between methods using a paired t-test. Assess the
degree of bias by getting the mean of the differences.
For categorical data use Kappa (values > 0.7
indicate good agreement).
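The agreement measures above can be sketched as follows, again with the standard library only. All data and function names are hypothetical. The 95% limits of agreement (bias ± 1.96 SD of the differences) are the quantities usually drawn on a Bland and Altman plot; they are an addition here and are not mentioned in the slides. Kappa is computed in its simplest (Cohen's, unweighted) form.

```python
import math
import statistics
from collections import Counter

def bland_altman(new, gold):
    """Bias (mean difference), 95% limits of agreement, and paired t statistic."""
    diffs = [a - b for a, b in zip(new, gold)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    paired_t = bias / (sd / math.sqrt(len(diffs)))   # df = n - 1
    return bias, limits, paired_t

def cohen_kappa(method_a, method_b):
    """Cohen's kappa: agreement between two categorical ratings beyond chance."""
    n = len(method_a)
    observed = sum(a == b for a, b in zip(method_a, method_b)) / n
    freq_a, freq_b = Counter(method_a), Counter(method_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical continuous readings: the new method reads about 5 units high
gold_standard = [100, 110, 120, 130, 140]
new_method = [104, 116, 125, 134, 146]
bias, limits, paired_t = bland_altman(new_method, gold_standard)
print(f"bias = {bias}, limits = ({limits[0]:.2f}, {limits[1]:.2f}), t = {paired_t:.2f}")

# Hypothetical categorical results from two methods on the same 10 samples
method_a = ["pos"] * 4 + ["neg"] * 6
method_b = ["pos"] * 3 + ["neg"] * 6 + ["pos"]
print(f"kappa = {cohen_kappa(method_a, method_b):.2f}")
```

In the continuous example the two methods are almost perfectly correlated yet the new method is consistently about 5 units higher, which is the distinction between association and agreement. In the categorical example the raw agreement is 80%, but kappa is about 0.58, below the 0.7 threshold quoted above.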