Download Statistics - WordPress.com

Statistics
• How can we know that scientific information is
reliable and valid?
• Why does Biology need statistical methods?
• Ben Goldacre...
Can statistics help us?
• Chocolate gives you spots
• Late nights sap young people’s brain power
• Coffee can make you see dead people
• Mobile phones cause cancer!
What do we do with Biological data?
1. EYEBALL the data:
• Measure ‘central value’: mean, median, mode
• Measure ‘spread’ (variance): range, standard
deviation, interquartile range
• We’ll learn this!
2. Compare data sets (STATISTICAL TESTS)
3. Look for relationships (often called correlations)
between data sets
• We’ll learn this NEXT year!
What do we need to know about
statistics?
• ‘Average’: mean (median, mode)
• ‘Error bars’: Range, standard deviation,
[standard error of the mean, (interquartile
range)]
• Correlation (positive/negative/no correlation)
• The relationship of causation and correlation
• Classic graphs
How do we make sense of data?
Descriptive statistics
Look for patterns and outliers in different groups
Graphs, tables, means and variance
You can’t use the results to generalise about the
population beyond the data
Sample size matters
• Bigger samples make it easier to detect differences
• A good guideline is to aim for 20–30 data points in each test group
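One way to see why sample size matters is through the standard error of the mean, which is the standard deviation divided by the square root of the sample size. The sketch below is only an illustration: the function name and the standard deviation of 10 are assumptions, not values from these slides.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)


assumed_sd = 10.0  # hypothetical standard deviation, chosen for illustration

# The same spread gives a more precise estimate of the mean as n grows.
for n in (5, 25, 100):
    print(f"n = {n:3d}: standard error = {standard_error(assumed_sd, n):.2f}")
```

With the same spread, a larger sample pins down the mean more tightly, so a real difference between two groups is easier to detect.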
Looking at data
Biological data are often normally distributed
• Height
• Blood pressure
• Heart rate
• Marks on an exam
• Errors in machine-made products
If NOT normally distributed, data can
be skewed (or just jumbled!)
First, ‘eyeball’ the data: ‘Descriptive
statistics’
Measure the central tendency (mean,
median, mode)
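As a minimal sketch, the three measures of central tendency can be calculated with Python's standard statistics module. The heart-rate values are made up for illustration and are not data from these slides.

```python
import statistics

# Hypothetical resting heart rates (beats per minute), illustrative only.
heart_rates = [62, 64, 66, 66, 68, 70, 72, 75, 110]

mean_value = statistics.mean(heart_rates)      # sum of values / number of values
median_value = statistics.median(heart_rates)  # middle value when sorted
mode_value = statistics.mode(heart_rates)      # most frequent value

print(f"mean = {mean_value:.1f}, median = {median_value}, mode = {mode_value}")
```

In this example the single high reading (110) pulls the mean above the median, which is one reason to look at more than one measure.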
Why not just look at the means
(central tendency)?
The means (or medians or modes) may show you a
difference, but we can’t be sure that it’s a
reliable difference
Which of these data sets shows the greatest
variation?
Is this difference reliable? (i.e., does the drug really make a difference?)
[Figure: cholesterol concentration after 1 month]
In order to compare test samples, we
also need to look at the spread of
results
Measurement of ‘spread’ (variance):
• Range
• Variance
• Standard deviation
Range – and its limitations
Standard deviation σ
• A measure of spread
• It is, simply, the square root of the variance
• It gives us an idea of the spread of most of the
data and is much more reliable than range
(less affected by anomalous data)
• You just need to press a button
• You don’t need to know the formula
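To illustrate the point about anomalous data, the sketch below compares the range and the standard deviation before and after one anomalous reading is added. The measurements and the helper name value_range are assumptions made for this example.

```python
import statistics


def value_range(values):
    """Range: the gap between the largest and smallest value."""
    return max(values) - min(values)


# Hypothetical measurements, illustrative only.
clean = [48, 49, 49, 50, 50, 50, 50, 51, 51, 52] * 2
with_anomaly = clean + [70]  # one anomalous reading added

for label, data in (("clean", clean), ("with anomaly", with_anomaly)):
    sd = statistics.stdev(data)  # sample standard deviation
    print(f"{label}: range = {value_range(data)}, SD = {sd:.2f}")
```

Here the range jumps from 4 to 22 because it depends only on the two most extreme values, while the standard deviation rises from about 1.1 to about 4.5 because it uses every value.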
Variance
Officially:
• Variance: the average of the squared differences from
the mean in a sample
• You calculate it using a calculator or EXCEL
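For anyone curious about what the calculator button does, here is a minimal sketch of the calculation. It assumes the usual convention that dividing by n gives the population variance and dividing by n - 1 gives the sample variance; the data values and function names are made up for illustration.

```python
import math
import statistics


def population_variance(values):
    """Average of the squared differences from the mean (divide by n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Same sum of squared differences, divided by n - 1 instead of n."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, illustrative only

print("population variance:", population_variance(data))
print("sample variance:", round(sample_variance(data), 3))
print("standard deviation (sample):", round(math.sqrt(sample_variance(data)), 3))

# The standard library gives the same answers.
print(statistics.pvariance(data), round(statistics.variance(data), 3))
```

Calculators and spreadsheets usually offer both versions, so it is worth checking which one a given button or function uses.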
Standard deviation
• Can be calculated for any data set, but is most informative when the data are normally distributed
• For a normal distribution, about 68% of values are within 1 standard deviation of the mean
• About 95% of values are within 2 SDs of the mean
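The 68% and 95% figures can be checked against an ideal normal distribution. This sketch uses NormalDist from Python's standard statistics module (Python 3.8 or later); the helper name fraction_within is an assumption for this example.

```python
from statistics import NormalDist

# Ideal normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

Real data are never perfectly normal, so the percentages in an actual sample will only be close to these values.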
Error bars
Error bars on graphs
They are graphical
representations of the
spread (variability) of the
data
May represent:
• Range
• Standard deviation
• (Standard error)
• (Confidence intervals)
• (Interquartile range)
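The options listed above give error bars of very different lengths for the same data. The sketch below calculates each one for a made-up sample. The values are hypothetical, and the 95% confidence interval uses the 1.96 × standard error approximation, which is an assumption that suits larger samples better than small ones.

```python
import math
import statistics


def error_bar_options(values):
    """Return the quantities an error bar might represent for one data set."""
    n = len(values)
    sd = statistics.stdev(values)   # sample standard deviation
    se = sd / math.sqrt(n)          # standard error of the mean
    q1, _median, q3 = statistics.quantiles(values, n=4)  # quartiles
    return {
        "range": max(values) - min(values),
        "standard deviation": sd,
        "standard error": se,
        "95% CI half-width (approx.)": 1.96 * se,
        "interquartile range": q3 - q1,
    }


# Hypothetical cholesterol concentrations, illustrative only.
sample = [4.8, 5.1, 5.3, 5.6, 5.9, 6.2, 6.4, 6.8]

for name, size in error_bar_options(sample).items():
    print(f"{name}: {size:.2f}")
```

Because the results differ so much, a graph should always state which of these its error bars show.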
Question check:
• Which data set has the highest mean?
• Which data set has the highest variability?
• What do the error bars represent?