Unit 10: Statistics for experimental design
10.1 The role of statistics in experimental design
Today, most statistical analysis is carried out using a computer, typically using a
spreadsheet or statistical analysis program. Any such program has a vast array
of statistical functions built in. However, these functions must be used with
caution. The many misuses of statistical analysis over the years mean that the
popular phrase: ‘There are three kinds of lies: lies, damned lies and statistics’ is
still just as relevant today as when Mark Twain popularised it in 1906. This unit
is primarily a guide to how to use these functions; it explains where each can
be used and defines their mathematical basis.
This topic guide explores the role of statistics in experimental design. It will cover
the factors that need to be taken into account in experimental design, population
sampling, the concept and laws of probability and probability distributions.
On successful completion of this topic you will:
• understand the role of statistics in experimental design (LO1).
To achieve a Pass in this unit you need to show that you can:
• discuss the factors behind experimental design from a statistical viewpoint (1.1)
• explain the mechanics of population sampling with regard to controlling error (1.2)
• evaluate probabilities using approximation methods (1.3).
Case study: Moneyball – the misuse of statistics in
professional sport
Professional sport is a multinational and multi-billion pound industry, with enormous financial
rewards to the winners. Fans of the various sports, as well as those working in the industry, are
presented with a bewildering variety of statistics about players and teams. However, it is only in
the last few years that there has been a serious effort to understand which of these performance
indicators are actually correlated with improved performance.
For example, over one Premier League season, a particular footballer averaged a pass completion rate
of 92% and ran 8500 metres per game. That is above average, but does it actually matter? Would buying
him help a team win more games? Imagine that the team’s revenue will increase by £3 million per game
they win next season – then how much is that footballer worth?
In a number of sports, teams have managed to get a significant edge by doing the statistics right and
getting to the bottom of these questions. This process was made famous in the book Moneyball: The
Art of Winning an Unfair Game (Michael Lewis, 2011). The central premise of this book is that the
collected wisdom of baseball insiders (including players, managers, coaches and scouts) is often flawed.
Statistics such as stolen bases, runs batted in, and batting average, typically used to gauge players, are
not correlated with winning games. The book argues that the Oakland A’s management took advantage
of more rigorous statistical analysis of player performance (known as Sabermetrics) to field a team that
could compete successfully against richer competitors in Major League Baseball in the US.
Activity: Who’s the greatest?
Choose your favourite sport. Decide what you think are the most important measures of a good
player (e.g. points scored, points conceded, games played, championships won) and find out who is
the best. Some examples of how to do this are in the Case Study: Moneyball. Do you get a different
answer if you use a different statistic as your measure? Consider whether you would want to allow for
external factors in your analysis. For example, in international football, is it fair to compare George Best
(Northern Ireland) and Ryan Giggs (Wales) to Pele (Brazil) on goals scored or championships won? All
were exceptional players, but Pele was in a team of 11 exceptional players, whereas Best and Giggs
played for very small countries who could not field 11 players from clubs in the highest league. Would
you want to correct your statistics to the average performance by a player from that country?
Some statistics websites that can be used as sources:
• Cricket: http://www.espncricinfo.com/ci/content/stats
• Football: http://www.statto.com/football/stats
• Rugby Union: http://stats.espnscrum.com/statsguru/rugby/stats/index.html
• Tennis: http://www.tennis-x.com/stats/tennisrecords.php
1 Planning a scientific experiment
The scientific method is the application of logic and objectivity to our
observations. It includes formulating a hypothesis, planning an experiment to test
this hypothesis and collecting data.
Key term
Hypothesis: A tentative explanation
for an observation, phenomenon or
scientific problem that can be tested
by further investigation.
Specialised statistical designs are used to make important decisions in all
walks of life. For example, clinical trials are used to make evaluations about the
effectiveness and safety of new medicines. Microbial assays are used to analyse
the compounds or substances that have effects on microorganisms, such as
antibiotics. Ecological field studies are used to unravel the complex relationships
within an ecosystem, for example, food chains and co-dependencies. (See the
Professional profile on a biostatistician.)
Key terms
Treatment: Something that is
administered to the experimental
subjects.
Population: A population is any
entire collection of people, animals,
plants or things from which we may
collect data. It is the entire group we
wish to describe or draw conclusions
about.
Sample: A sample is a group of units
selected from a larger group (the
population). By studying the sample
it is hoped to draw valid conclusions
about the larger group. For example,
the population for a study of infant
health might be all children born in the
UK in the 1980s. The sample might be
all babies born on 7th May in any of
the years.
Generally, in biological sciences, to study the effect of a treatment within a
population, a sample that is meant to be representative of that whole population
is studied. Treatment is a general term for any procedure applied to a sample
set. For each population there are many possible samples. To assess the effect of
the treatment there must also be a control sample set that does not receive the
treatment but is otherwise statistically identical.
A sample statistic gives information about a corresponding population
parameter. For example, the sample mean for a set of data would give information
about the overall population mean. The steps that need to be taken when
designing a scientific experiment that is appropriate for statistical analysis are
as follows.
1 Review what has been done before (the literature).
2 Define objectives and hypothesis – characteristics of a good hypothesis
include being simple, clear enough to test and able to explain the observation.
3 Define the population.
4 Evaluate the feasibility of testing the hypothesis.
5 Select research procedure – the research procedure includes the sampling
method, the sample size and the number of samples, the measurement type
and the statistical analysis procedure.
6 Select suitable measuring instruments and control bias.
7 Set up the experiment.
8 Collect and analyse the data.
9 Prepare a scientifically written report.
Key terms
Statistic: A statistic is a quantity that is calculated from a sample of data.
It is used to give information about unknown values in the corresponding
population. For example, the average of the data in a sample is used to give
information about the overall average in the population from which that
sample was drawn.
Parameter: A parameter is a value used to represent a certain population
characteristic. For example, the population mean is a parameter that is often
used to indicate the average value of a quantity. Parameters are often assigned
Greek letters (e.g. σ), whereas statistics are assigned Roman letters (e.g. s).
Note that it is possible to draw more than one sample from the same population
and the value of a statistic will vary from sample to sample. For example, the
average value in a sample is a statistic. The average values in more than one
sample, drawn from the same population, will not necessarily be equal.
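The point that a statistic varies from sample to sample can be illustrated with a short simulation. This is only a sketch: the population of heights, its mean of 170 cm, its spread and the sample size of 25 are all hypothetical values chosen for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 10 000 individuals.
population = [random.gauss(170, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)  # the population parameter

# Draw three independent samples of 25 and calculate the statistic for each.
sample_means = [
    statistics.mean(random.sample(population, 25)) for _ in range(3)
]

print(f"population mean: {population_mean:.1f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:   {m:.1f}")
```

Each sample mean should land near the population mean, but the three values will generally differ from one another.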
Controls and replicates
Not all experiments require a ‘control’ experiment. Generally, to compare two
things in their natural environments, a control is not needed. For example, to
answer the question ‘Are people taller in Britain or the USA?’ would only require a
sample of British people and a sample of American people.
On the other hand, if you want to study the effect of a specific treatment on a
population, then both a treated sample and a control sample are required. If
the question is ‘Does drinking lots of milk as a child make people taller?’ then a
sample group given additional milk and a control sample group from equivalent
backgrounds who are not fed additional milk are required. To confirm that the
results of a sample study are representative of a population as a whole it is usually
necessary to replicate the experiment with a different sample and control group.
Multiple factors
To evaluate two or more factors simultaneously a factorial design is used. The
treatments are combinations of levels of the factors.
The advantages of a factorial design over separate experiments studying one
factor at a time are that it is more efficient and that it allows interactions between
factors to be detected. For example, sometimes a combination of two medicines
together is much more effective than either drug is on its own.
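As a rough illustration of how the treatments in a factorial design are built up, the sketch below lists every combination of levels for two factors. The factor names and dose levels are hypothetical and are not taken from any real study.

```python
from itertools import product

# Hypothetical factors and their levels for a 2 x 3 factorial design.
factors = {
    "drug_A_dose_mg": [0, 50],
    "drug_B_dose_mg": [0, 10, 20],
}

# Each treatment is one combination of levels, with one level per factor.
treatments = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for treatment in treatments:
    print(treatment)
```

Two levels of one factor and three of the other give 2 × 3 = 6 treatments, including the combination in which neither drug is given, which acts as the control.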
Professional profile: Biostatistician
A biostatistician working in the pharmaceutical industry provides statistical support to a clinical
study from its initial conception and design, through collecting and analysing the data, and
finally reporting the results. Biostatisticians may also become involved in the development of
regulatory guidance and the analysis of effectiveness and safety of new treatments. Biostatisticians
at all levels routinely work around the world, often as part of large international teams, where
communication is vital. Experienced biostatisticians may take on responsibility for all statistical
activities for a particular treatment, supervising the work of other biostatisticians on the project.
Link
This unit builds on Unit 3: Analysis of scientific data and information. Before starting this section
of work you should ensure that you are confident with the content of that unit.
Further information about the definitions of the standard statistical terms can be found at:
http://www.stats.gla.ac.uk/steps/glossary/basic_definitions.html.
2 Random sampling
Setting up an experiment requires taking a sample from a population. Simple
random sampling refers to a sampling method that has the following properties:
• the population contains N objects
• the sample size is n objects
• all possible samples of n objects are equally likely.
The benefit of simple random sampling is that it enables scientists to use statistical
methods to analyse sample results and then make statistical inferences about the
population as a whole. For example, using a simple random sample, scientists can
use statistics to define a confidence interval around a sample mean. Statistical
analysis is not appropriate when non-random (or biased) sampling methods are
used. There are a number of ways to obtain a simple random sample. An example
is the lottery method. Each of the N members of the population is given a unique
number. The numbers are placed in a hat and mixed up, and then n numbers are
drawn without looking. For larger sample sizes a random number table or computer
random number generator can be used to do the picking. Population members
that have the selected numbers are included in the sample.
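The lottery method can be sketched with a computer random number generator, as follows. The population size N = 500 and sample size n = 20 are hypothetical, and the seed is fixed only so that the sketch gives the same result each time it is run.

```python
import random

N = 500   # population size (hypothetical)
n = 20    # sample size (hypothetical)

# Give each member of the population a unique number from 1 to N.
population_ids = list(range(1, N + 1))

rng = random.Random(42)                     # seeded only for repeatability
sample_ids = rng.sample(population_ids, n)  # draw n numbers without replacement

print(sorted(sample_ids))
```

Because `sample` draws without replacement, no member can be picked twice, just as a number pulled from the hat cannot be pulled again.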
3 Evaluating probabilities using
approximation methods
Link
Before starting this section of work
you should ensure that you are
confident with the meaning of the
terms mean and median. If not,
consult a suitable level 2 textbook,
such as BTEC First: Principles of
Applied Science (Goodfellow,
Hocking and Musa, Pearson, 2012) or
BTEC First: Applications of Applied
Science (Goodfellow, Hocking and
Musa, Pearson, 2012), or look at
websites such as the BBC Bitesize
Science revision site.
The probability of an event describes the likelihood that the event will occur.
Mathematically, the probability that an event will occur is expressed as a number
between 0 and 1. The probability of event A is represented by P(A).
• P(A) = 0: event A will certainly not happen
• P(A) ≈ 0: event A is very unlikely to happen
• P(A) = 0.5: there is a 50:50 chance that event A will happen
• P(A) ≈ 1: event A is very likely to happen
• P(A) = 1: event A will certainly happen.
In a statistical experiment, the probability is normalised so that the sum of
probabilities for all possible outcomes is equal to one. Therefore, if an experiment
has three possible outcomes (A, B and C), it follows that:
P(A) + P(B) + P(C) = 1.
How to compute probability: equally likely outcomes
In some cases, each outcome of an experiment is equally likely. If there are n
possible outcomes and a subset of d of them are classed as desired outcomes, then
the probability of a desired outcome, P(D), is:
$$P(D) = \frac{\text{Number of desired outcomes}}{\text{Total number of outcomes}} = \frac{d}{n}$$
Consider the following experiment. A box contains 20 chocolates with different
centres. Four are toffee, four are fudge, six are strawberry cream and six are praline.
If a chocolate is randomly selected, what is the probability that it is strawberry
flavoured?
In this experiment, there are 20 equally likely outcomes, six of which are
strawberry. Therefore, the probability of choosing a strawberry flavoured
chocolate is 6/20, or 0.30.
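The same calculation can be checked with exact fractions. This is a minimal sketch using the contents of the box described above; the function name `probability` is simply a label chosen for this example.

```python
from fractions import Fraction

# Contents of the box from the example above.
box = {"toffee": 4, "fudge": 4, "strawberry cream": 6, "praline": 6}

def probability(flavour: str) -> Fraction:
    """P(D) = d / n when all n outcomes are equally likely."""
    return Fraction(box[flavour], sum(box.values()))

p_strawberry = probability("strawberry cream")
print(p_strawberry, float(p_strawberry))  # prints: 3/10 0.3
```

Adding up `probability(f)` over all four flavours gives exactly 1, as required for a complete set of outcomes.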
Probability can also be considered in terms of its relative frequency over the long
term. The relative frequency of an event is the number of times an event occurs,
divided by the total number of trials.
$$P(A) = \frac{\text{Frequency of event A}}{\text{Number of trials}}$$
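The long-run idea can be explored by simulation. The sketch below repeatedly picks a chocolate at random from the box in the earlier example (replacing it each time, which is an assumption made here) and reports the relative frequency of strawberry for different numbers of trials.

```python
import random

rng = random.Random(0)  # seeded only so the sketch is repeatable
box = ["toffee"] * 4 + ["fudge"] * 4 + ["strawberry"] * 6 + ["praline"] * 6

def relative_frequency(trials: int) -> float:
    """Fraction of trials in which the randomly chosen chocolate is strawberry."""
    hits = sum(rng.choice(box) == "strawberry" for _ in range(trials))
    return hits / trials

for trials in (10, 100, 10_000):
    print(trials, relative_frequency(trials))
```

With only 10 trials the relative frequency can be some way from 0.30; with many thousands of trials it should settle close to it.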
Laws of probability
Addition
When two events, A and B, are mutually exclusive, the probability that A or B will
occur is the sum of the probability of each event.
P(A or B) = P(A) + P(B)
When two events, A and B, are not mutually exclusive, the probability that A or B
will occur is:
P(A or B) = P(A) + P(B) − P(A and B)
Because there is some overlap between these events, the sum of the probability
of each event is corrected for ‘double-counting’ by subtracting the probability of
the overlap. For example, if the probability of a person owning a laptop is 52%, the
probability of a person owning a tablet is 35% and the probability of owning both
is 18%, then the probability of a person owning either a laptop or a tablet is:
P(L or T) = P(L) + P(T) – P(L and T) = 0.52 + 0.35 − 0.18 = 0.69
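Both forms of the addition rule can be written as one small function, since for mutually exclusive events the overlap is simply zero. This is a sketch; the function name `p_either` is an illustrative choice.

```python
def p_either(p_a: float, p_b: float, p_both: float = 0.0) -> float:
    """P(A or B) = P(A) + P(B) - P(A and B).

    For mutually exclusive events, leave p_both at its default of 0.
    """
    return p_a + p_b - p_both

# Laptop or tablet ownership, using the figures from the example above.
p_laptop_or_tablet = p_either(0.52, 0.35, 0.18)
print(round(p_laptop_or_tablet, 2))  # prints: 0.69
```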
Multiplication
The multiplication rule deals with two independent events, where the outcome of
one does not affect the outcome of the other. The probability that A occurs and
then B occurs is:
P(A then B) = P(A) × P(B)
For example, if we throw one six-sided die, followed by another, then the probability
of throwing a two on the first die, followed by a five on the second die is:
$$P(2 \text{ then } 5) = P(2) \times P(5) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$$
However, note that this only gives the likelihood of a specific sequence. To
determine the probability of an overall outcome, the number of ways that this
outcome can be reached also needs to be considered. For example, the probability
of throwing a two and a five, in either order, is:
$$P(2, 5) = P(2 \text{ then } 5) + P(5 \text{ then } 2) = \frac{1}{36} + \frac{1}{36} = \frac{2}{36} = \frac{1}{18}$$
Similarly the probability of scoring seven on two dice is:
$$P(7) = P(1 \text{ then } 6) + P(2 \text{ then } 5) + P(3 \text{ then } 4) + P(4 \text{ then } 3) + P(5 \text{ then } 2) + P(6 \text{ then } 1) = \frac{6}{36} = \frac{1}{6}$$
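These dice results can be checked by listing all 36 equally likely outcomes and counting the ones that match. This is a sketch; the helper name `probability` is chosen for this example only.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def probability(event) -> Fraction:
    """Fraction of outcomes for which event(first, second) is true."""
    favourable = sum(1 for first, second in outcomes if event(first, second))
    return Fraction(favourable, len(outcomes))

p_2_then_5 = probability(lambda a, b: (a, b) == (2, 5))  # specific sequence
p_2_and_5 = probability(lambda a, b: {a, b} == {2, 5})   # either order
p_total_7 = probability(lambda a, b: a + b == 7)         # any pair summing to 7

print(p_2_then_5, p_2_and_5, p_total_7)  # prints: 1/36 1/18 1/6
```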
Binomial probability
Binomial probability is a way of calculating probabilities in an experiment when
there are only two outcomes, typically success and failure. Note that there can be
many specific outcomes, but these must be grouped together as successes and
failures. For example, if the experiment is throwing a two on a six-sided die, then
P(success) = P(2) = 1/6, and P(failure) = P(1) + P(3) + P(4) + P(5) + P(6) = 5/6.
When computing a binomial probability, it is necessary to calculate and multiply
three separate factors:
1 the number of ways to select exactly r successes from n trials (the binomial coefficient)
2 the probability of success (p) raised to the power r
3 the probability of failure (q = 1 − p) raised to the power (n − r).
Then in n total trials, the probability of exactly r successes is given by the
probability mass function:
Probability mass function: $P(X = r) = \binom{n}{r} p^{r} q^{n-r}$
Binomial coefficient: $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$
Example
When rolling a die 100 times, what is the probability of rolling a two exactly
20 times?
Solution:
n = 100; r = 20; n – r = 80
p = 1/6 = probability of success (rolling a two)
q = 1 – p = 5/6 = probability of failure (not rolling a two)
$$P(X = 20) = \frac{100!}{20!\,80!} \left(\frac{1}{6}\right)^{20} \left(\frac{5}{6}\right)^{80} \approx 0.068$$
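Factorials as large as 100! are awkward to handle by hand, so a calculation like this is usually done by computer. The sketch below evaluates the binomial probability mass function directly; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) = C(n, r) * p**r * q**(n - r), where q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p**r * q ** (n - r)

# Probability of rolling a two exactly 20 times in 100 throws of a die.
p_twenty_twos = binomial_pmf(20, 100, 1 / 6)
print(f"{p_twenty_twos:.3f}")
```

Summing `binomial_pmf(r, 100, 1 / 6)` over every r from 0 to 100 gives 1 (to within rounding), because exactly one of those outcomes must occur.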
Activity
Make a ‘Pascal’s Triangle’ with at least 10 rows, and work out the probabilities of reaching each end
point. It can be an experiment with falling balls, a computer simulation or simply the numbers.
Some ideas can be found in the links below:
Falling balls: http://www.youtube.com/watch?v=nOenO-JLD5w&NR=1&feature=fvwp
Computer simulation: http://www.youtube.com/watch?v=yzJqYl9EHgA
Numbers: http://www.youtube.com/watch?v=YUqHdxxdbyM.
Poisson approximation
The Poisson distribution arises as a limiting case of the binomial distribution
when the number of trials is large and the probability of success in each trial is
small; used in this way it is known as the Poisson approximation. It describes the
probability of a given number of events occurring in a fixed interval if these events
occur with a known average rate and independently of the time since the last event.
If the expectation value (the mean) of the number of events is λ, then the
probability distribution is described by:
Probability mass function: $P(X = r) = \frac{\lambda^{r} e^{-\lambda}}{r!}$
Poisson distribution approximations are used in many real-world situations. For
example, in civil engineering it is used to describe cars arriving at a busy junction.
In biology it is used to describe the number of mutations on a strand of DNA per
unit length. In finance it is used to predict the number of losses/claims that will
occur in a given period of time.
Activity
Find three more examples of real-world applications of the Poisson
approximation.
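To see how closely the Poisson formula can match a binomial calculation, the sketch below compares the two for a hypothetical case of n = 1000 trials, each with a success probability of p = 0.003, so that λ = np = 3. These numbers are assumptions chosen for illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) = lam**r * exp(-lam) / r!"""
    return lam**r * exp(-lam) / factorial(r)

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) = C(n, r) * p**r * (1 - p)**(n - r)"""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Hypothetical case: many trials, each with a small probability of success.
n, p = 1000, 0.003
lam = n * p  # expected number of successes

for r in range(6):
    print(r, round(binomial_pmf(r, n, p), 4), round(poisson_pmf(r, lam), 4))
```

The two columns should agree to about three decimal places, and the Poisson version needs only λ rather than n and p separately.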
Probability and statistics
Once an experiment is carried out and the results are measured, the researcher has
to decide whether the results of the treatments are different. This would be easy if
the results were perfectly consistent. For example:
Cabbage sizes for Treatment 1 (cm): 30, 30, 30, 30, 30, 30, 30, 30
Cabbage sizes for Treatment 2 (cm): 35, 35, 35, 35, 35, 35, 35, 35
Obviously Treatment 2 results in larger cabbages.
Unfortunately, real-life results are not so simple. There are many different
possible outcomes, each with a defined probability within the distribution of
possible values.
Cabbage sizes for Treatment 1 (cm): 27, 33, 36, 37, 27, 30, 33, 33
Cabbage sizes for Treatment 2 (cm): 34, 31, 39, 32, 41, 37, 33, 35
The differences are not obvious, so we need statistics.
Statistics are used when individual characteristics are variable. You have to
measure several individuals to determine how variable they are. To do that you
need replication.
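A first step towards comparing the two variable sets of cabbage sizes is to summarise each one. The sketch below calculates the mean and the sample standard deviation (a measure of spread defined later in this guide) for each treatment. It does not by itself decide whether the treatments differ.

```python
import statistics

treatment_1 = [27, 33, 36, 37, 27, 30, 33, 33]  # cabbage sizes (cm)
treatment_2 = [34, 31, 39, 32, 41, 37, 33, 35]

for name, sizes in (("Treatment 1", treatment_1), ("Treatment 2", treatment_2)):
    mean = statistics.mean(sizes)
    spread = statistics.stdev(sizes)  # sample standard deviation (divides by n - 1)
    print(f"{name}: mean = {mean:.2f} cm, standard deviation = {spread:.2f} cm")
```

The means differ by about 3 cm, but the spread within each treatment is of a similar size, which is why a formal statistical test is needed before drawing a conclusion.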
Some physical properties are very consistent, that is, they have low variability. An
example might be the speed at which a heavy object falls – in this case the biggest
source of variation is probably the accuracy of the timing device. How many
cannon balls do you have to drop from a tower before you know how long the
next one will take?
Biological properties, on the other hand, usually have a high variability due to the
many variations in genetics and environment even within a single species. How
many student heights do you need to measure to know the average height of
students in a classroom? How many do you need to measure to know whether the
average heights of people at the front and back of the room are the same
or different?
The mean and the median
The difference between the mean and median can be illustrated with an example.
Suppose we take a sample of seven men and measure their heights. They are
170 cm, 170 cm, 180 cm, 185 cm, 190 cm, 195 cm and 200 cm.
To find the median, arrange the observations in order from smallest to largest. If
there is an odd number of observations, then the median is the middle value. If
there is an even number of observations, then the median is the average of the
two middle values. Therefore, in the sample of seven men, the median value would
be 185 cm because 185 cm is the middle height.
The mean of a sample or a population is calculated by summing all observations
and then dividing by the number of observations. Returning to the example of the
seven men, the mean height would equal:
(170 cm + 170 cm + 180 cm + 185 cm + 190 cm + 195 cm + 200 cm)/7
= 1290/7 = 184.3 cm.
In the general case, the mean can be calculated using one of the following
equations:
Mean of a population: $\mu = \frac{\Sigma X}{N}$
Mean of a sample: $\bar{x} = \frac{\Sigma x}{n}$
where ΣX is the sum of all the population observations, N is the number of
population observations, Σx is the sum of all the sample observations, and n is the
number of sample observations.
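The worked example of the seven heights can be reproduced in a few lines. This sketch uses the sample formula for the mean and a standard library function for the median.

```python
import statistics

heights_cm = [170, 170, 180, 185, 190, 195, 200]

mean_height = sum(heights_cm) / len(heights_cm)  # sum of observations / n
median_height = statistics.median(heights_cm)    # middle value of the sorted data

print(f"mean = {mean_height:.1f} cm, median = {median_height} cm")

# With an even number of observations (the first six heights), the median
# is the average of the two middle values, 180 and 185.
print(statistics.median(heights_cm[:6]))  # prints: 182.5
```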
In statistics the Greek letter μ refers to the mean value for a population, while
x̄ (‘x-bar’) refers to the mean value for a sample.
Key term
Outlier: A value that differs greatly
from all of the other values.
As measures of central tendency, the mean and the median each have
advantages and disadvantages. The median may be a better indicator of the most
typical value if a set of scores has an outlier. However, when the sample size is
large and does not include outliers, the mean score usually provides a better
measure of central tendency.
To illustrate the way in which the mean can be distorted by an outlier if the
sample size is small, consider household incomes. Suppose we have a sample
of 10 households and would like to estimate the typical family income. Nine of
the households have incomes between £15 000 and £100 000, but the tenth
household has an annual income of £50 000 000. That last household is an
outlier, so the mean will greatly overestimate the income of a typical family,
while the median will not. If we expanded our sample to 1000 households
containing the same single outlier, its effect on the mean would be much smaller
(about £50 000 rather than about £5 000 000), although in this example the
median would still be the more robust measure.
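The effect of the outlier can be seen by calculating both measures for a sample of ten households. The nine incomes below £100 000 are hypothetical values invented for this sketch; only the £50 000 000 outlier comes from the example above.

```python
import statistics

# Hypothetical annual incomes (£) for nine households in the stated range.
typical = [15_000, 22_000, 28_000, 31_000, 36_000, 42_000, 55_000, 70_000, 100_000]
# The tenth household is the outlier from the example.
incomes = typical + [50_000_000]

print("mean:  ", statistics.mean(incomes))    # pulled far upwards by the outlier
print("median:", statistics.median(incomes))  # stays among the typical incomes
```

For these illustrative figures the mean is over £5 000 000, while the median is £39 000, which is much closer to what a typical household in the sample earns.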
Normal distribution
In probability theory, the central limit theorem states that, given certain
conditions, the mean of a sufficiently large number of independent random
variables, each with a well-defined mean and well-defined variance, will be
approximately normally distributed. Many biological and social characteristics
(such as height, weight or exam score) approximately follow a normal or Gaussian
distribution that looks like a bell-shaped curve, as shown in Figure 10.1.1. A normal
distribution is fully described by its mean and its standard deviation (or,
equivalently, its variance, which is calculated from the sum of squared deviations
from the mean). The mean describes where the curve is centred, and a higher
variance or standard deviation describes a wider curve.
Variance of a population: $\sigma^{2} = \frac{\Sigma (X - \mu)^{2}}{N}$
Variance of a sample: $s^{2} = \frac{\Sigma (x - \bar{x})^{2}}{n - 1}$
Standard deviation of a population: $\sigma = \sqrt{\sigma^{2}}$
Standard deviation of a sample: $s = \sqrt{s^{2}}$
Figure 10.1.1: Frequency of observation of values of characteristic (X) in a
normally distributed population with mean x̄ and standard deviation s.
(Axes: Frequency against Characteristic (X).)
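The difference between dividing by N and dividing by n − 1 can be seen by applying both variance formulas to the same data. This sketch reuses the seven heights from the earlier example, treating them first as a whole population and then as a sample, and cross-checks the results against the Python standard library.

```python
import statistics
from math import sqrt

heights_cm = [170, 170, 180, 185, 190, 195, 200]
n = len(heights_cm)
mean = sum(heights_cm) / n
sum_of_squares = sum((x - mean) ** 2 for x in heights_cm)

population_variance = sum_of_squares / n    # divide by N
sample_variance = sum_of_squares / (n - 1)  # divide by n - 1
sample_sd = sqrt(sample_variance)

# Cross-check against the standard library functions.
assert abs(population_variance - statistics.pvariance(heights_cm)) < 1e-9
assert abs(sample_variance - statistics.variance(heights_cm)) < 1e-9

print(f"sample variance = {sample_variance:.1f} cm^2, sample sd = {sample_sd:.1f} cm")
```

The sample variance is always a little larger than the population variance calculated from the same numbers, because it divides by n − 1 rather than N.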
The statistics calculated so far describe samples and populations, but do not test
for differences between samples and populations. For such tests the distributions
of sample means are needed.
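A distribution of sample means can be built up by simulation. In the sketch below each sample is 30 throws of a fair six-sided die, and 2000 such samples are taken; both numbers are arbitrary choices for illustration. Although a single throw is equally likely to give any score from 1 to 6, the sample means cluster around 3.5, in line with the central limit theorem.

```python
import random
import statistics

rng = random.Random(2)  # seeded only so the sketch is repeatable

def mean_of_rolls(n_rolls: int) -> float:
    """Mean score of n_rolls throws of a fair six-sided die."""
    return statistics.mean([rng.randint(1, 6) for _ in range(n_rolls)])

# 2000 sample means, each calculated from a sample of 30 throws.
sample_means = [mean_of_rolls(30) for _ in range(2000)]

print(f"mean of sample means: {statistics.mean(sample_means):.2f}")
print(f"sd of sample means:   {statistics.stdev(sample_means):.2f}")
```

Plotting `sample_means` as a histogram should give a roughly bell-shaped curve that is much narrower than the spread of individual throws.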
Histograms
Histograms are graphs that visually represent the frequency distribution of a data
set, allowing its statistical properties to be understood. Whereas traditional bar
graphs usually represent mean values, a histogram represents the frequency of
a particular event. Histograms require a data set that can be divided into classes,
with each class having a known frequency of occurrence. Histograms can be made
manually from a data set, or using a spreadsheet program (see the Worked example
box to see how this can be done).
Worked example
Below is a worked example of how to create a histogram in Microsoft® Excel® 2010, using data on
UK external temperatures (October to March) from 1980–2010 from www.data.gov.uk. This data
can be found in Data sheet 10.1.1 in the spreadsheet Topic guide 10.1 data sheets.xlsx. If you are
using a different version of Excel®, the steps may vary slightly.
First, decide on an appropriate bin size for your data set. The bin size describes the range of values
that fall into each class. Here we are going to look at external UK temperatures and 0.5 °C is
appropriate. There is no data less than 0 °C or more than 10 °C, and generally 10–20 groups of data
is desirable. This means the bins are average temperatures 0–0.5, 0.5–1, 1–1.5, 1.5–2 and so on, to
a maximum bin of 9.5–10. Now you are ready to make your histogram.
• First load the Excel® 2010 Analysis ToolPak. Select File, and then Options. From Add-Ins,
select Excel Add-ins in the Manage box and then click Go. Check the Analysis ToolPak
checkbox in the list and then click OK.
• Type the bin boundaries in column A of a blank worksheet, beginning with the lowest number. For
the temperature range example, type 0, 0.5, 1, 1.5, 2, etc.
• Copy the data points from the data spreadsheet, or type them, into column B of the worksheet.
• Save your spreadsheet at this point because the raw data will be deleted in the next step. If you
are having trouble, an example of how your spreadsheet should look at this point is given in the
Excel spreadsheet ‘Topic guide 10.1 example book before histogram’.
• Click Data Analysis in the Analysis section of the Data tab. Highlight the Histogram
tool in the Analysis Tools box and click OK. Click in the Input Range box and then highlight
the raw data in column B. It should now say ‘$B$1:$B$35’ in this box. Next click in the Bin Range
box and then highlight the bin ranges in column A. It should now say ‘$A$1:$A$21’ in this box.
Select Chart Output in the output options section to generate a histogram and then click OK. If
you are having trouble, an example of how your spreadsheet should look at this point is given
in the Excel spreadsheet ‘Topic guide 10.1 example book with histogram’.
• You can use the Chart Tools section to modify the design, layout and format of your histogram.
Double-clicking on the x- and y-axis labels allows you to change them.
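The counting that a histogram tool carries out can also be sketched in a few lines of code. The temperatures below are hypothetical stand-ins for the values in the data sheet, and the rule used here (a value on a boundary goes into the higher bin) is an assumption of this sketch; a spreadsheet tool may treat boundary values differently.

```python
from collections import Counter

# Hypothetical average temperatures (°C); replace with values from the data sheet.
temperatures = [3.2, 4.7, 5.1, 4.9, 6.3, 5.5, 2.8, 4.4, 5.0, 7.1, 4.6, 5.8]
bin_width = 0.5

# Bin number = how many whole bin widths fit below the value,
# so 4.7 falls in bin 9, which covers 4.5 up to (but not including) 5.0.
counts = Counter(int(t // bin_width) for t in temperatures)

for index in range(int(10 / bin_width)):  # bins from 0-0.5 up to 9.5-10
    low = index * bin_width
    bar = "#" * counts.get(index, 0)
    print(f"{low:4.1f}-{low + bin_width:4.1f} | {bar}")
```

Each row of the output is one bin, and the number of `#` marks is the frequency for that bin, giving a simple text version of the histogram.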
Activity
Now find and download some other data sets from the www.data.gov.uk website, and plot as
histograms. A good place to start is the UK average earnings by industry data in the Topic
guide 10.2 data sheet.
Take it further
More information about random number tables can be found at
http://www.nist.gov/pml/wmd/pubs/upload/AppenB-HB133-05-Z.pdf
Introduction to histograms and normal distributions from the University of California, Berkeley:
http://www.stat.berkeley.edu/users/huang/STAT141/STATC141-lectureIII.pdf
Further reading
Boslaugh, S. (2012) Statistics in a Nutshell, O’Reilly Media
Ellison, S. et al. (2009) Practical Statistics for the Analytical Scientist, RSC
Larsen, R. and Fox Stroup, D. (1976) Statistics in the Real World, Macmillan
Miller, J. and Miller, J. (2010) Statistics and Chemometrics for Analytical Chemistry, Prentice Hall
Samuels, M. et al. (2010) Statistics for the Life Sciences, Pearson
Swartz, M. and Krull, I. (2012) Handbook of Analytical Validation, CRC Press
Statistical calculators online:
http://www.danielsoper.com/statcalc3/
http://www.measuringusability.com/calc.php
Checklist
At the end of this topic guide, you should be:
• familiar with the role of statistics in experimental design
• able to discuss the factors behind experimental design from a statistical viewpoint (1.1)
• able to explain the mechanics of population sampling with regard to controlling error (1.2)
• able to evaluate probabilities using approximation methods (1.3).
Acknowledgements
The publisher would like to thank the following for their kind permission to reproduce their
photographs:
Shutterstock.com: Sofiaworld
Every effort has been made to trace the copyright holders and we apologise in advance for any
unintentional omissions. We would be pleased to insert the appropriate acknowledgement in any
subsequent edition of this publication.