Normality or not? Different distributions and their importance
Stats Club 3
Marnie Brennan
References
• Petrie and Sabin – Medical Statistics at a Glance: Chapters 7, 8, 9, 35 (good)
• Petrie and Watson – Statistics for Veterinary and Animal Science: Chapter 3 (good)
• Thrusfield – Veterinary Epidemiology: Chapter 12
• Kirkwood and Sterne – Essential Medical Statistics
What is a distribution?
• Empirical frequency distribution versus
theoretical distribution
• Very easy!
– Empirical frequency distribution is something that
you actually measure and calculate
• E.g. Coat colour in cats – Tabby, Ginger, Tortoiseshell,
Seal-point
• In a population, each one of these has a frequency e.g.
5 x Tabby, 9 x Ginger, 15 x Tortoiseshell, 8 x Seal-point
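As a minimal sketch, the coat colour example above can be tallied into an empirical frequency distribution in Python. The counts are the ones quoted on the slide; the variable names are illustrative only.

```python
from collections import Counter

# Coat colours recorded for 37 cats, using the counts from the example above.
observations = (
    ["Tabby"] * 5 + ["Ginger"] * 9 + ["Tortoiseshell"] * 15 + ["Seal-point"] * 8
)

# Counter tallies how often each category occurs.
frequencies = Counter(observations)
total = sum(frequencies.values())

# Relative frequency = count for a category / total number of animals.
relative = {colour: count / total for colour, count in frequencies.items()}

# Print the categories from most to least common.
for colour, count in frequencies.most_common():
    print(f"{colour:<14} {count:>3}  {relative[colour]:.1%}")
```

The table this prints is the empirical frequency distribution: it comes entirely from what was observed, with no theoretical model involved.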
Theoretical distributions
• Theoretical distribution – is just that –
theoretical!
• It is something we measure our data
(empirical frequency distribution) against to
see which distribution describes it the best
– This helps to signpost us to what statistical
analyses we do next, according to the distribution
it ‘approximates’
Theoretical distributions and types of data
• Back to our flow charts in the back of Petrie
and Sabin, and Petrie and Watson
• Relates to what type of variable you have
– Continuous? E.g. Heights of Japanese men
– Categorical or discrete? E.g. Coat colour in cats
Continuous distributions
• Normal distribution – The granddaddy of them all! (Our focus today)
– Also known as the Gaussian distribution (after Gauss, German
mathematician)
– e.g. heights of adult men in the UK
• T-distribution
– Similar shape to Normal, but is more spread out with longer tails
– Useful for calculating confidence intervals
• Chi-squared distribution
– Right-skewed distribution
– Useful for analysing categorical data
• F-distribution
– Skewed to the right
– Useful for comparing variances and more than 2 means (i.e. > 2
groups)
• DO NOT BE SCARED – THIS IS ANOTHER EXERCISE IN
TERMINOLOGY!!
Discrete distributions
• Binomial distribution
– Could be skewed to the right or left (!)
– Good for analysing proportion data – i.e. it is either one thing,
or another, such as an animal either has a disease or does not
have a disease
• Poisson distribution
– Right skewed
– Good for analysing count data – i.e. the number of hospital
admissions per day, the number of parasitic eggs per gram of
faecal sample
• Many of these distributions approximate the Normal distribution as
your sample size increases
• A lot of this goes on behind the scenes in your computer when doing
statistics; it is here to help explain some of the terminology
and basic ideas only (don’t worry too much about it!)
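To illustrate the point about skew, here is a small sketch using the standard textbook formulas for the theoretical skewness of the Binomial and Poisson distributions. The function names are made up for this example. A positive value means right-skewed, a negative value means left-skewed, and values near 0 mean the shape is close to symmetrical, like the Normal. For the Poisson, what matters is the mean count getting larger.

```python
import math


def binomial_skewness(n, p):
    """Theoretical skewness of a Binomial distribution with n trials, probability p."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))


def poisson_skewness(mean):
    """Theoretical skewness of a Poisson distribution with the given mean count."""
    return 1 / math.sqrt(mean)


# A rare disease (p = 0.1) gives right skew; a common one (p = 0.9) gives left skew.
# In both cases the skew shrinks towards 0 as the number of animals sampled grows.
for n in (10, 100, 1000):
    print(n, round(binomial_skewness(n, 0.1), 3), round(binomial_skewness(n, 0.9), 3))

# Poisson counts are always right-skewed, but less so as the mean count grows.
for mean in (1, 4, 100):
    print(mean, round(poisson_skewness(mean), 3))
```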
The useful bit.......
• You have collected continuous data from your
research e.g. the diameter of rabbit skulls in
millimetres
• You would like to find out if this is normally
distributed or not (as you know that this will
affect what statistical tests you do)
• How do you assess whether this variable is
normally distributed or not?
4 steps to Normality!
• Plot your data
– Create a histogram with frequencies and
determine by eye
• Does it look bell-shaped and symmetrical?
• Does it look unimodal i.e. does it only have one peak?
– Subjective measurement, but you should be doing
this anyway!
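A rough sketch of this first step, assuming you do not have a plotting package to hand: the code below draws a text histogram. The skull diameters are simulated purely for illustration (they are not real measurements), and the bin width of 2 mm is an arbitrary choice.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch gives the same picture each run

# Hypothetical data: 200 simulated skull diameters in mm (not real measurements).
diameters = [random.gauss(50, 4) for _ in range(200)]

bin_width = 2  # mm; an arbitrary choice for illustration

# Label each value with the lower edge of its bin, then count values per bin.
counts = Counter(int(d // bin_width) * bin_width for d in diameters)

# One row per non-empty bin, one '#' per measurement in that bin.
for lower in sorted(counts):
    print(f"{lower:>3}-{lower + bin_width:<3} {'#' * counts[lower]}")
```

Turn your head sideways and ask the same questions as above: is there one peak, and is the shape roughly symmetrical around it?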
4 steps (continued)
• How different are the mean and median?
– Mean = Sum of all your measurements / total no. of
measurements
– Median = The midpoint of your values i.e. what is the
‘halfway’ value in your data?
• If they are very different, the data is probably not normally
distributed
• If they are very similar, your data could be normally
distributed
– Another rule of thumb, so not always correct
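The mean versus median comparison can be sketched with Python's built-in statistics module. Both data sets below are invented for illustration only.

```python
import statistics

# Hypothetical values chosen for illustration only.
symmetric = [46, 48, 49, 50, 50, 51, 52, 54]
right_skewed = [1, 1, 2, 2, 3, 4, 6, 29]

for name, data in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    # A large gap between mean and median hints that the data are not Normal.
    print(f"{name}: mean={mean:.2f}, median={median:.2f}, difference={mean - median:.2f}")
```

In the skewed set, the single large value (29) drags the mean well above the median, which is the pattern to look out for.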
4 steps (continued)
• Skewness and kurtosis
– Skewness (how symmetrical the data is)
• Normal – this value is 0
• Right-skewed distribution – positive value
• Left-skewed distribution – negative value
– Kurtosis (the ‘peakedness’ of the data) – does your
data have a pointy bit, or is it flat?
• Normal – this value is 0 when reported as ‘excess’ kurtosis
(kurtosis minus 3), as Minitab and SPSS do
• Sharply peaked data – positive value
• Flat peaked data – negative value
– Can measure these in Minitab or SPSS
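For the curious, here is a minimal sketch of how skewness and excess kurtosis can be calculated from the data using the simple moment-based formulas. Statistical packages usually apply small-sample corrections, so their figures may differ slightly from these. The function names and data are illustrative only.

```python
def central_moment(data, k):
    """Average of (value - mean) raised to the power k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Moment-based skewness: 0 for perfectly symmetrical data."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a Normal distribution scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


# Hypothetical values chosen for illustration only.
symmetric = [46, 48, 49, 50, 50, 51, 52, 54]
right_skewed = [1, 1, 2, 2, 3, 4, 6, 29]
flat = list(range(1, 11))  # evenly spread values, no peak at all

print("symmetric skewness:", skewness(symmetric))
print("right-skewed skewness:", round(skewness(right_skewed), 2))
print("flat excess kurtosis:", round(excess_kurtosis(flat), 2))
```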
4 steps (continued)
• Bespoke tests for normality
– Shapiro-Wilk test (the Ryan-Joiner test in Minitab is similar)
– Kolmogorov-Smirnov test
– Anderson-Darling test
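• Watch interpretation of p-values – if p < 0.05, reject the null
hypothesis of normality (the data are probably not normal); if p ≥ 0.05,
there is no evidence against normality, which is not proof of normality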
• The good news!
– Computers do this for us so we don’t have to!
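To show roughly what the computer is doing, here is a sketch of one of the tests listed above, the Kolmogorov-Smirnov test, written with only the Python standard library. It makes two assumptions worth flagging: the mean and standard deviation of the Normal curve are estimated from the data (often called the Lilliefors variant), and the p-value is obtained by simulation, not from published tables. The function names and the simulated samples are illustrative. In practice you would use Minitab, SPSS or a statistics library.

```python
import random
import statistics


def ks_statistic(data):
    """Largest gap between the data's cumulative curve and a fitted Normal curve."""
    n = len(data)
    fitted = statistics.NormalDist(statistics.mean(data), statistics.stdev(data))
    d = 0.0
    for i, x in enumerate(sorted(data)):
        cdf = fitted.cdf(x)
        # The empirical curve jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d


def normality_p_value(data, simulations=1000, seed=0):
    """Monte Carlo p-value: the share of truly Normal samples of the same size
    whose statistic is at least as large as the one observed."""
    rng = random.Random(seed)
    observed = ks_statistic(data)
    n = len(data)
    exceed = sum(
        ks_statistic([rng.gauss(0, 1) for _ in range(n)]) >= observed
        for _ in range(simulations)
    )
    return exceed / simulations


# Hypothetical samples, simulated for illustration only.
rng = random.Random(42)
normal_sample = [rng.gauss(50, 4) for _ in range(200)]
skewed_sample = [rng.expovariate(1 / 5) for _ in range(200)]

p_normal = normality_p_value(normal_sample)
p_skewed = normality_p_value(skewed_sample)
print("Normal-looking sample: p =", p_normal)
print("Right-skewed sample:   p =", p_skewed)
```

A small p-value for the skewed sample means it sits further from a Normal curve than almost any genuinely Normal sample of that size would, which is why the null hypothesis of normality is rejected.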
Next month
• Spread of your data – how do we measure
this?
– mean, standard deviation, variance
– median, interquartile range
– mode