Statistics
• How can we know that scientific information is reliable and valid?
• Why does Biology need statistical methods?
• Ben Goldacre... Can statistics help us?

• Chocolate gives you spots
• Late nights sap young people’s brain power
• Coffee can make you see dead people
• Mobile phones cause cancer!

What do we do with Biological data?
1. EYEBALL the data:
• Measure ‘central value’: mean, median, mode
• Measure ‘spread’ (variance): range, standard deviation, interquartile range
• We’ll learn this!
2. Compare data sets (STATISTICAL TESTS)
3. Look for relationships (often called correlations) between data sets
• We’ll learn this NEXT year!

What do we need to know about statistics?
• ‘Average’: mean (median, mode)
• ‘Error bars’: range, standard deviation, [standard error of the mean, (interquartile range)]
• Correlation (positive/negative/no correlation)
• The relationship of causation and correlation
• Classic graphs

How do we make sense of data?
Descriptive statistics
Look for patterns and outliers in different groups
Graphs, tables, means and variance
You can’t use the results to generalise about the population beyond the data

Sample size matters
• Bigger samples make it easier to detect differences
• A good guideline is to aim for 20–30 data points in each test group

Looking at data
Biological data are often normally distributed:
• Height
• Blood pressure
• Heart rate
• Marks on an exam
• Errors in machine-made products
If NOT normally distributed, data can be skewed (or just jumbled!)

First, ‘eyeball’ the data: ‘Descriptive statistics’
Measure the central tendency (mean, median, mode)

Why not just look at the means (central tendency)?
The means (or medians or modes) may show you a difference, but we can’t be sure that it’s a reliable difference
Which of these data sets shows the greatest variation?
Is this difference reliable? (i.e., does the drug really make a difference?)
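The measures of central tendency named above (mean, median, mode) can be worked out with a few lines of Python. This is only a minimal sketch: the two sets of cholesterol readings are invented for illustration, and the helper name `central_tendency` is hypothetical. It is meant to show why the averages alone are not enough: two groups can share the same mean, median and mode while one is far more spread out than the other.

```python
import statistics

# Hypothetical cholesterol readings (mmol/L), invented for illustration only.
control = [5.0, 5.5, 6.0, 6.0, 6.5, 7.0]
treated = [3.0, 4.5, 6.0, 6.0, 7.5, 9.0]


def central_tendency(values):
    """Return the mean, median and mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


control_summary = central_tendency(control)
treated_summary = central_tendency(treated)

# Range (largest value minus smallest) as a first, crude look at spread.
control_range = max(control) - min(control)
treated_range = max(treated) - min(treated)

print("control:", control_summary, "range:", control_range)
print("treated:", treated_summary, "range:", treated_range)
```

With these made-up numbers both groups have the same three averages, yet the second group has a much larger range, which is why the spread has to be examined as well.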
(Graph: cholesterol concentration after 1 month)

In order to compare test samples, we also need to look at the spread of results

Measurement of ‘spread’ (variance):
• Range
• Variance
• Standard deviation

Range – and its limitations

Standard deviation σ
• A measure of spread
• It is, simply, the square root of the variance
• It gives us an idea of the spread of most of the data and is much more reliable than range (less affected by anomalous data)
• You just need to press a button
• You don’t need to know the formula

Variance
Officially:
• Variance: the average of the squared differences from the mean in a sample
• You calculate it using a calculator or Excel

Standard deviation
• The percentages below apply only to normal distributions
• About 68% of values are within 1 standard deviation of the mean
• About 95% of values are within 2 SDs of the mean

Error bars
Error bars on graphs
They are graphical representations of the spread (variability) of the data
May represent:
• Range
• Standard deviation
• (Standard error)
• (Confidence intervals)
• (Interquartile range)

Question check:
• Which data set has the highest mean?
• Which data set has the highest variability?
• What do the error bars represent?
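Although a calculator or spreadsheet does the work in practice, the definitions above can be followed step by step in a short Python sketch. Everything here is an assumption made for illustration: the heart-rate values are invented, the "heights" are simulated with a fixed random seed, and the function names are hypothetical. The variance divides by the number of values, matching the "average of the squared differences" wording above.

```python
import math
import random


def variance(values):
    """Average of the squared differences from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


def fraction_within(values, n_sd):
    """Fraction of values lying within n_sd standard deviations of the mean."""
    mean = sum(values) / len(values)
    sd = standard_deviation(values)
    inside = [x for x in values if abs(x - mean) <= n_sd * sd]
    return len(inside) / len(values)


# Hypothetical heart rates (beats per minute), invented for illustration only.
heart_rates = [62.0, 65.0, 68.0, 70.0, 70.0, 72.0, 75.0, 78.0]
hr_variance = variance(heart_rates)      # 23.25
hr_sd = standard_deviation(heart_rates)  # about 4.82

# Simulated, normally distributed "heights" (mean 170, SD 10).
# The fixed seed only makes the run repeatable.
rng = random.Random(0)
heights = [rng.gauss(170, 10) for _ in range(10_000)]
within_1_sd = fraction_within(heights, 1)  # expected to be close to 0.68
within_2_sd = fraction_within(heights, 2)  # expected to be close to 0.95

print(hr_variance, round(hr_sd, 2), within_1_sd, within_2_sd)
```

Calculators and spreadsheets often offer two versions of variance and standard deviation, one dividing by n and one dividing by n − 1; this sketch uses the divide-by-n version, so a button labelled for the other version will give a slightly larger answer.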