Unit 10: Statistics for experimental design
10.1 The role of statistics in experimental design

Today, most statistical analysis is carried out on a computer, typically using a spreadsheet or a statistical analysis program. Any such program has a vast array of statistical functions built in. However, these functions must be used with caution. The many misuses of statistical analysis over the years mean that the popular phrase ‘There are three kinds of lies: lies, damned lies and statistics’ is just as relevant today as when Mark Twain popularised it in 1906. This unit is primarily a guide to using these functions; it explains where each can be used and defines their mathematical basis.

This topic guide explores the role of statistics in experimental design. It covers the factors that need to be taken into account in experimental design, population sampling, the concept and laws of probability, and probability distributions.

On successful completion of this topic you will:
• understand the role of statistics in experimental design (LO1).

To achieve a Pass in this unit you need to show that you can:
• discuss the factors behind experimental design from a statistical viewpoint (1.1)
• explain the mechanics of population sampling with regard to controlling error (1.2)
• evaluate probabilities using approximation methods (1.3).

Case study: Moneyball – the misuse of statistics in professional sport

Professional sport is a multinational and multi-billion pound industry, with enormous financial rewards for the winners. Fans of the various sports, as well as those working in the industry, are presented with a bewildering variety of statistics about players and teams. However, it is only in the last few years that there has been a serious effort to understand which of these performance indicators are actually correlated with improved performance.
For example, over one Premier League season, a particular footballer averaged a pass completion rate of 92% and ran 8500 metres per game. That is above average, but does it actually matter? Would buying him help a team win more games? Imagine that the team’s revenue will increase by £3 million per game they win next season: how much is that footballer worth then? In a number of sports, teams have gained a significant edge by doing the statistics right and getting to the bottom of these questions.

This process was made famous in the book Moneyball: The Art of Winning an Unfair Game (Michael Lewis, 2011). The central premise of this book is that the collected wisdom of baseball insiders (including players, managers, coaches and scouts) is often flawed. Statistics such as stolen bases, runs batted in and batting average, typically used to gauge players, are not correlated with winning games. The book argues that the Oakland A’s management took advantage of more rigorous statistical analysis of player performance (known as Sabermetrics) to field a team that could compete successfully against richer competitors in Major League Baseball in the US.

Activity: Who’s the greatest?
Choose your favourite sport. Decide what you think are the most important measures of a good player (e.g. points scored, points conceded, games played, championships won) and find out who is the best. Some examples of how to do this are in the Case study: Moneyball. Do you get a different answer if you use a different statistic as your measure?

Consider whether you would want to allow for external factors in your analysis. For example, in international football, is it fair to compare George Best (Northern Ireland) and Ryan Giggs (Wales) to Pele (Brazil) on goals scored or championships won? All were exceptional players, but Pele was in a team of 11 exceptional players, whereas Best and Giggs played for very small countries who could not field 11 players from clubs in the highest league.
Would you want to correct your statistics to the average performance by a player from that country?

Some statistics websites that can be used as sources:
• Cricket: http://www.espncricinfo.com/ci/content/stats
• Football: http://www.statto.com/football/stats
• Rugby Union: http://stats.espnscrum.com/statsguru/rugby/stats/index.html
• Tennis: http://www.tennis-x.com/stats/tennisrecords.php.

1 Planning a scientific experiment

The scientific method is the application of logic and objectivity to our observations. It includes formulating a hypothesis, planning an experiment to test this hypothesis and collecting data.

Key term
Hypothesis: A tentative explanation for an observation, phenomenon or scientific problem that can be tested by further investigation.

Specialised statistical designs are used to make important decisions in all walks of life. For example, clinical trials are used to evaluate the effectiveness and safety of new medicines. Microbial assays are used to analyse the compounds or substances that have effects on microorganisms, such as antibiotics. Ecological field studies are used to unravel the complex relationships within an ecosystem, for example, food chains and co-dependencies. (See the Professional profile of a biostatistician later in this topic guide.)

Key terms
Treatment: Something that is administered to the experimental subjects.
Population: Any entire collection of people, animals, plants or things from which we may collect data. It is the entire group we wish to describe or draw conclusions about.
Sample: A group of units selected from a larger group (the population). By studying the sample it is hoped to draw valid conclusions about the larger group.

For example, the population for a study of infant health might be all children born in the UK in the 1980s.
The sample might be all babies born on 7th May in any of those years.

Generally, in the biological sciences, the effect of a treatment within a population is studied using a sample that is meant to be representative of the whole population. Treatment is a general term for any procedure applied to a sample set. For each population there are many possible samples. To assess the effect of the treatment there must also be a control sample set that does not receive the treatment but is otherwise statistically identical.

A sample statistic gives information about a corresponding population parameter. For example, the sample mean for a set of data gives information about the overall population mean.

Key terms
Statistic: A quantity that is calculated from a sample of data. It is used to give information about unknown values in the corresponding population. For example, the average of the data in a sample is used to give information about the overall average in the population from which that sample was drawn.
Parameter: A value used to represent a certain population characteristic. For example, the population mean is a parameter that is often used to indicate the average value of a quantity. Parameters are often assigned Greek letters (e.g. σ), whereas statistics are assigned Roman letters (e.g. s).

The steps that need to be taken when designing a scientific experiment that is appropriate for statistical analysis are as follows.
1 Review what has been done before (the literature).
2 Define objectives and hypothesis: characteristics of a good hypothesis include being simple, clear enough to test and able to explain the observation.
3 Define the population.
4 Evaluate the feasibility of testing the hypothesis.
5 Select research procedure: the research procedure includes the sampling method, the sample size and the number of samples, the measurement type and the statistical analysis procedure.
6 Select suitable measuring instruments and control bias.
7 Set up the experiment.
8 Collect and analyse the data.
9 Prepare a scientifically written report.

Note that it is possible to draw more than one sample from the same population, and the value of a statistic will vary from sample to sample. For example, the average value in a sample is a statistic. The average values in more than one sample, drawn from the same population, will not necessarily be equal.

Controls and replicates
Not all experiments require a ‘control’ experiment. Generally, to compare two things in their natural environments, a control is not needed. For example, answering the question ‘Are people taller in Britain or the USA?’ would only require a sample of British people and a sample of American people. On the other hand, if you want to study the effect of a specific treatment on a population, then both a treated sample and a control sample are required. If the question is ‘Does drinking lots of milk as a child make people taller?’ then a sample group given additional milk and a control sample group from equivalent backgrounds who are not given additional milk are required. To confirm that the results of a sample study are representative of a population as a whole, it is usually necessary to replicate the experiment with a different sample and control group.

Multiple factors
To evaluate two or more factors simultaneously a factorial design is used. The treatments are combinations of levels of the factors. The advantages of a factorial design over separate experiments studying one factor at a time are that it is more efficient and that it allows interactions between factors to be detected. For example, sometimes a combination of two medicines is much more effective than either drug is on its own.
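As a rough illustration of how the treatments in a factorial design are built from factor levels, the Python sketch below lists every combination for two factors. The factor names (`drug_a`, `drug_b`), their levels and the helper `factorial_treatments` are assumptions made up for this example; they do not come from a real study.

```python
from itertools import product

# Hypothetical factors and levels, chosen only for illustration.
factors = {
    "drug_a": ["absent", "present"],
    "drug_b": ["absent", "present"],
}


def factorial_treatments(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


treatments = factorial_treatments(factors)
for treatment in treatments:
    print(treatment)
```

With two factors at two levels each this gives 2 × 2 = 4 treatments. The combination in which both drugs are absent plays the role of the control, and comparing the combined treatment with each drug on its own is what allows an interaction to be detected.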
Professional profile: Biostatistician
A biostatistician working in the pharmaceutical industry provides statistical support to a clinical study from its initial conception and design, through collecting and analysing the data, to reporting the results. Biostatisticians may also become involved in the development of regulatory guidance and the analysis of the effectiveness and safety of new treatments. Biostatisticians at all levels routinely work around the world, often as part of large international teams, where communication is vital. Experienced biostatisticians may take on responsibility for all statistical activities for a particular treatment, supervising the work of other biostatisticians on the project.

Link
This unit builds on Unit 3: Analysis of scientific data and information. Before starting this section of work you should ensure that you are confident with the content of that unit. Further information about the definitions of the standard statistical terms can be found at: http://www.stats.gla.ac.uk/steps/glossary/basic_definitions.html.

2 Random sampling

Setting up an experiment requires taking a sample from a population. Simple random sampling refers to a sampling method that has the following properties:
• the population contains N objects
• the sample size is n objects
• all possible samples of n objects are equally likely.

The benefit of simple random sampling is that it enables scientists to use statistical methods to analyse sample results and then make statistical inferences about the population as a whole. For example, using a simple random sample, scientists can use statistics to define a confidence interval around a sample mean. Such statistical inference is not appropriate when non-random (or biased) sampling methods are used.

There are a number of ways to obtain a simple random sample. An example is the lottery method. Each of the N members of the population is given a unique number.
The numbers are mixed up in a hat and then n numbers are pulled from the hat without looking. For larger sample sizes a random number table or a computer random number generator can be used to do the picking. Population members that have the selected numbers are included in the sample.

3 Evaluating probabilities using approximation methods

Link
Before starting this section of work you should ensure that you are confident with the meaning of the terms mean and median. If not, consult a suitable level 2 textbook, such as BTEC First: Principles of Applied Science (Goodfellow, Hocking and Musa, Pearson, 2012) or BTEC First: Applications of Applied Science (Goodfellow, Hocking and Musa, Pearson, 2012), or look at websites such as the BBC Bitesize Science revision site.

The probability of an event describes the likelihood that the event will occur. Mathematically, the probability that an event will occur is expressed as a number between 0 and 1. The probability of event A is represented by P(A).
• P(A) = 0: event A will certainly not happen
• P(A) ≈ 0: event A is very unlikely to happen
• P(A) = 0.5: there is a 50:50 chance that event A will happen
• P(A) ≈ 1: event A is very likely to happen
• P(A) = 1: event A will certainly happen.

In a statistical experiment, the probability is normalised so that the sum of the probabilities for all possible outcomes is equal to one. Therefore, if an experiment has exactly three possible outcomes (A, B and C), it follows that:

$$P(A) + P(B) + P(C) = 1$$

How to compute probability: equally likely outcomes
In some cases, each outcome of an experiment is equally likely. If a subset of d outcomes out of a total of n are classed as desired outcomes, then the probability of a desired outcome, P(D), is:

$$P(D) = \frac{\text{number of desired outcomes}}{\text{total number of outcomes}} = \frac{d}{n}$$

Consider the following experiment. A box contains 20 chocolates with different centres.
Four are toffee, four are fudge, six are strawberry cream and six are praline. If a chocolate is randomly selected, what is the probability that it is strawberry flavoured? In this experiment, there are 20 equally likely outcomes, six of which are strawberry. Therefore, the probability of choosing a strawberry flavoured chocolate is 6/20, or 0.30.

Probability can also be considered in terms of relative frequency over the long term. The relative frequency of an event is the number of times the event occurs divided by the total number of trials.

$$P(A) = \frac{\text{frequency of event } A}{\text{number of trials}}$$

Laws of probability

Addition
When two events, A and B, are mutually exclusive, the probability that A or B will occur is the sum of the probability of each event.

$$P(A \text{ or } B) = P(A) + P(B)$$

When two events, A and B, are not mutually exclusive, the probability that A or B will occur is:

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$

Because there is some overlap between these events, the sum of the probability of each event is corrected for ‘double-counting’ by subtracting the probability of the overlap. For example, if the probability of a person owning a laptop is 52%, the probability of a person owning a tablet is 35% and the probability of owning both is 18%, then the probability of a person owning either a laptop or a tablet is:

$$P(L \text{ or } T) = P(L) + P(T) - P(L \text{ and } T) = 0.52 + 0.35 - 0.18 = 0.69$$

Multiplication
The multiplication rule deals with two independent events that occur one after the other, each as the result of a separate trial:

$$P(A \text{ then } B) = P(A) \times P(B)$$

For example, if we throw one six-sided die, followed by another, then the probability of throwing a two on the first die followed by a five on the second die is:

$$P(2 \text{ then } 5) = P(2) \times P(5) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$$

However, note that this only gives the likelihood of a specific sequence.
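The two rules above can be checked with exact fractions. The short Python sketch below is only an illustration: it reuses the laptop and tablet figures from the addition example and counts the 36 equally likely ordered results of throwing two dice. The variable names are invented for this example.

```python
from fractions import Fraction
from itertools import product

# Addition rule for events that are not mutually exclusive.
p_laptop = Fraction(52, 100)
p_tablet = Fraction(35, 100)
p_both = Fraction(18, 100)
p_either = p_laptop + p_tablet - p_both
print(p_either)  # 69/100

# Multiplication rule: list every ordered result of two dice throws.
rolls = list(product(range(1, 7), repeat=2))
matches = [roll for roll in rolls if roll == (2, 5)]
p_two_then_five = Fraction(len(matches), len(rolls))
print(p_two_then_five)  # 1/36
```

Counting ordered pairs treats (2, 5) and (5, 2) as different results, which is why only one of the 36 pairs matches the specific sequence.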
To determine the probability of an overall outcome, the number of ways in which this outcome can be reached also needs to be considered. For example, the probability of throwing a two and a five in either order is:

$$P(2, 5) = P(2 \text{ then } 5) + P(5 \text{ then } 2) = \frac{1}{36} + \frac{1}{36} = \frac{2}{36}$$

Similarly, the probability of scoring seven on two dice is:

$$P(7) = P(1 \text{ then } 6) + P(2 \text{ then } 5) + P(3 \text{ then } 4) + P(4 \text{ then } 3) + P(5 \text{ then } 2) + P(6 \text{ then } 1) = \frac{6}{36} = \frac{1}{6}$$

Binomial probability
Binomial probability is a way of calculating probabilities in an experiment when there are only two outcomes, typically success and failure. Note that there can be many specific outcomes, but these must be grouped together as successes and failures. For example, if success is defined as throwing a two on a six-sided die, then:

$$P(\text{success}) = P(2) = \frac{1}{6}, \qquad P(\text{failure}) = P(1) + P(3) + P(4) + P(5) + P(6) = \frac{5}{6}$$

When computing a binomial probability, it is necessary to calculate and multiply three separate factors:
1 the number of ways to select exactly r successes from n trials (the binomial coefficient)
2 the probability of success (p) raised to the power r
3 the probability of failure (q) raised to the power (n − r).

Then in n total trials, the probability of exactly r successes is given by the probability mass function:

$$P(X = r) = \binom{n}{r} p^{r} q^{n-r}$$

where the binomial coefficient is:

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

Example
When rolling a die 100 times, what is the probability of rolling a two exactly 20 times?
Solution:
n = 100; r = 20; n − r = 80
p = 1/6 = probability of success (rolling a two)
q = 1 − p = 5/6 = probability of failure (not rolling a two)

$$P(X = 20) = \frac{100!}{20!\,80!} \left(\frac{1}{6}\right)^{20} \left(\frac{5}{6}\right)^{80} \approx 0.068$$

Activity
Make a ‘Pascal’s Triangle’ with at least 10 rows, and work out the probabilities of reaching each end point. It can be an experiment with falling balls, a computer simulation or simply the numbers.
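As one possible starting point for the ‘simply the numbers’ version of this activity, the Python sketch below evaluates the binomial probability mass function with the standard-library function `math.comb` and prints the first rows of Pascal’s triangle. The helper names `binomial_pmf` and `pascal_row` are invented for this example, and the end-point probabilities assume that each step left or right is equally likely (p = 0.5).

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


def pascal_row(n):
    """Row n of Pascal's triangle: the binomial coefficients for n trials."""
    return [comb(n, r) for r in range(n + 1)]


# Worked example: exactly 20 twos in 100 rolls of a fair die.
p_twenty_twos = binomial_pmf(20, 100, 1 / 6)
print(round(p_twenty_twos, 3))  # 0.068

# First five rows of Pascal's triangle.
for n in range(5):
    print(pascal_row(n))

# Probability of each end point after 10 equally likely left/right steps.
end_points = [binomial_pmf(r, 10, 0.5) for r in range(11)]
print([round(prob, 4) for prob in end_points])
```

Each entry of a row is the number of distinct paths to that end point, so dividing by the row total (2 to the power n when p = 0.5) gives the probability of reaching it.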
Some ideas can be found in the links below:
Falling balls: http://www.youtube.com/watch?v=nOenO-JLD5w&NR=1&feature=fvwp
Computer simulation: http://www.youtube.com/watch?v=yzJqYl9EHgA
Numbers: http://www.youtube.com/watch?v=YUqHdxxdbyM.

Poisson approximation
A limiting case of the binomial distribution is known as the Poisson approximation (or Poisson distribution). It applies when the number of trials n is large and the probability of success p in each trial is small, with mean λ = np. It describes the probability of a given number of events occurring in a fixed interval if these events occur with a known average rate and independently of the time since the last event. If the expectation value (the mean) of the number of events is λ, then the probability distribution is described by the probability mass function:

$$P(X = r) = \frac{\lambda^{r} e^{-\lambda}}{r!}$$

Poisson distribution approximations are used in many real-world situations. For example, in civil engineering it is used to describe cars arriving at a busy junction. In biology it is used to describe the number of mutations on a strand of DNA per unit length. In finance it is used to predict the number of losses or claims that will occur in a given period of time.

Activity
Find three more examples of real-world applications of the Poisson approximation.

Probability and statistics
Once an experiment is carried out and the results are measured, the researcher has to decide whether the results of the treatments are different. This would be easy if the results were perfectly consistent. For example:

Cabbage sizes for Treatment 1 (cm): 30, 30, 30, 30, 30, 30, 30, 30
Cabbage sizes for Treatment 2 (cm): 35, 35, 35, 35, 35, 35, 35, 35

Obviously Treatment 2 results in larger cabbages. Unfortunately, real-life results are not so simple. There are many different possible outcomes, each with a defined probability within the distribution of possible values.
Cabbage sizes for Treatment 1 (cm): 27, 33, 36, 37, 27, 30, 33, 33
Cabbage sizes for Treatment 2 (cm): 34, 31, 39, 32, 41, 37, 33, 35

The differences are not obvious, so we need statistics. Statistics are used when individual characteristics are variable. You have to measure several individuals to determine how variable they are. To do that you need replication.

Some physical properties are very consistent, that is, they have low variability. An example might be the speed at which a heavy object falls: in this case the biggest source of variation is probably the accuracy of the timing device. How many cannon balls do you have to drop from a tower before you know how long the next one will take?

Biological properties, on the other hand, usually have a high variability due to the many variations in genetics and environment, even within a single species. How many student heights do you need to measure to know the average height of students in a classroom? How many do you need to measure to know whether the average heights of people at the front and back of the room are the same or different?

The mean and the median
The difference between the mean and the median can be illustrated with an example. Suppose we take a sample of seven men and measure their heights. They are 170 cm, 170 cm, 180 cm, 185 cm, 190 cm, 195 cm and 200 cm.

To find the median, arrange the observations in order from smallest to largest. If there is an odd number of observations, then the median is the middle value. If there is an even number of observations, then the median is the average of the two middle values. Therefore, in the sample of seven men, the median value would be 185 cm because 185 cm is the middle height.

The mean of a sample or a population is calculated by summing all observations and then dividing by the number of observations.
Returning to the example of the seven men, the mean height would equal:

(170 cm + 170 cm + 180 cm + 185 cm + 190 cm + 195 cm + 200 cm)/7 = 1290 cm/7 = 184.3 cm.

In the general case, the mean can be calculated using one of the following equations:

Mean of a population: $$\mu = \frac{\sum X}{N}$$

Mean of a sample: $$\bar{x} = \frac{\sum x}{n}$$

where ΣX is the sum of all the population observations, N is the number of population observations, Σx is the sum of all the sample observations, and n is the number of sample observations. In statistics the Greek letter μ refers to the mean value for a population, while x̄ refers to the mean value of a sample.

Key term
Outlier: A value that differs greatly from all of the other values.

As measures of central tendency, the mean and the median each have advantages and disadvantages. The median may be a better indicator of the most typical value if a set of scores has an outlier. However, when the sample size is large and does not include outliers, the mean score usually provides a better measure of central tendency.

To illustrate the way in which the mean can be distorted by an outlier if the sample size is small, consider household incomes. Suppose we have a sample of 10 households and would like to estimate the typical family income. Nine of the households have incomes between £15 000 and £100 000, but the tenth household has an annual income of £50 000 000. That last household is an outlier, and thus the mean will greatly overestimate the income of a typical family, while the median will not. However, if we expanded our sample size to 1000, then the effect of the outlier would be greatly diminished and the mean would become a more reliable measure.
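The effect described above can be seen by computing both measures directly. The following Python sketch uses the standard-library `statistics` module; the seven heights are the ones from the example, whereas the ten household incomes are invented values chosen only to match the description (nine between £15 000 and £100 000 plus one outlier of £50 000 000).

```python
from statistics import mean, median

# Heights of the seven men from the example (cm).
heights_cm = [170, 170, 180, 185, 190, 195, 200]
print(median(heights_cm))          # 185
print(round(mean(heights_cm), 1))  # 184.3

# Hypothetical household incomes (pounds): nine typical values and one outlier.
incomes = [15_000, 22_000, 28_000, 35_000, 41_000,
           48_000, 60_000, 75_000, 100_000, 50_000_000]
print(median(incomes))  # 44500.0
print(mean(incomes))    # 5042400
```

For these made-up incomes the median stays among the nine typical households, while the mean is pulled to more than £5 million by the single outlier.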
Normal distribution
In probability theory, the central limit theorem states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed. Many characteristics of populations and samples do indeed approximately follow a normal or Gaussian distribution that looks like a bell-shaped curve, as shown in Figure 10.1.1. The normal distribution of a characteristic (such as height, weight, earnings or exam score) is described by the mean, the standard deviation, the variance and the sum of squares. If the distribution of a characteristic is drawn as a curve, then the mean describes where the curve is centred, and a higher variance or standard deviation describes a wider curve.

Variance of a population: $$\sigma^{2} = \frac{\sum (X - \mu)^{2}}{N}$$

Variance of a sample: $$s^{2} = \frac{\sum (x - \bar{x})^{2}}{n - 1}$$

Standard deviation of a population: $$\sigma = \sqrt{\sigma^{2}}$$

Standard deviation of a sample: $$s = \sqrt{s^{2}}$$

Figure 10.1.1: Frequency of observation of values of characteristic (X) in a normally distributed population with mean x̄ and standard deviation s. (Vertical axis: Frequency; horizontal axis: Characteristic (X).)

The statistics calculated so far describe samples and populations, but do not test for differences between samples and populations. For such tests the distributions of sample means are needed.

Histograms
Histograms are graphs that visually represent the frequency distribution of a data set, allowing its statistical properties to be understood. Whereas traditional bar graphs usually represent mean values, a histogram represents the frequency of a particular event. Histograms require a data set that can be divided into classes, with each class having a known frequency of occurrence. Histograms can be made manually from a data set, or using a spreadsheet program (see the Worked example box to see how this can be done).
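The idea of dividing a data set into classes and counting the frequency in each can also be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet method of the Worked example: the temperature values are invented, the helper `histogram_counts` is hypothetical, and each class is assumed to include its lower boundary and exclude its upper one (other tools may use a different boundary convention).

```python
import math
from collections import Counter


def histogram_counts(values, bin_width, lower=0.0):
    """Count the values falling in each class [start, start + bin_width)."""
    counts = Counter()
    for value in values:
        index = math.floor((value - lower) / bin_width)
        counts[lower + index * bin_width] += 1
    return dict(sorted(counts.items()))


# Invented average temperatures (degrees C), for illustration only.
temperatures = [4.2, 4.8, 5.1, 5.3, 5.4, 6.0, 6.2, 3.9, 5.6, 4.9]

counts = histogram_counts(temperatures, bin_width=0.5)
for start, frequency in counts.items():
    # One star per observation gives a simple text histogram.
    print(f"{start:4.1f} to {start + 0.5:4.1f}: {'*' * frequency}")
```

Changing `bin_width` shows how the choice of class size alters the shape of the histogram, which is why an appropriate bin size has to be chosen before plotting.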
Worked example
Below is a worked example of how to create a histogram in Microsoft® Excel® 2010, using data on UK external temperatures (October to March) from 1980–2010 from www.data.gov.uk. This data can be found in Data sheet 10.1.1 in the spreadsheet Topic guide 10.1 data sheets.xlsx. If you are using a different version of Excel®, the steps may vary slightly.

First, decide on an appropriate bin size for your data set. The bin size describes the range of values that fall into each class. Here we are going to look at external UK temperatures, and 0.5 °C is appropriate. There is no data less than 0 °C or more than 10 °C, and generally 10–20 groups of data is desirable. This means the bins are average temperatures 0–0.5, 0.5–1, 1–1.5, 1.5–2 and so on, to a maximum bin of 9.5–10.

Now you are ready to make your histogram.
• First load the Excel® 2010 Analysis ToolPak. Select File, and then Options. From Add-Ins select Excel Add-Ins in the Manage box and then click Go. Check the Analysis ToolPak checkbox in the list and then click OK.
• Type the bin boundaries in column A of a blank worksheet, beginning with the lowest number. For the temperature range example, type 0, 0.5, 1, 1.5, 2, etc.
• Copy the data points from the data spreadsheet, or type them, into column B of the worksheet.
• Save your spreadsheet at this point because the raw data will be deleted in the next step. If you are having trouble, an example of how your spreadsheet should look at this point is given in the Excel spreadsheet ‘Topic guide 10.1 example book before histogram’.
• Click Data Analysis in the Analysis section of the Data tab in Excel®. Highlight the Histogram tool in the Analysis Tools box and click OK. Click in the Input Range box and then highlight the raw data in column B. It should now say ‘$B$1:$B$35’ in this box. Next click in the Bin Range box and then highlight the bin boundaries in column A. It should now say ‘$A$1:$A$21’ in this box.
Select Chart Output in the output options section to generate a histogram and then click OK. If you are having trouble, an example of how your spreadsheet should look at this point is given in the Excel spreadsheet ‘Topic guide 10.1 example book with histogram’.
• You can use the Chart Tools section to modify the design, layout and format of your histogram. Double-clicking on the x- and y-axis labels allows you to change them.

Activity
Now find and download some other data sets from the www.data.gov.uk website, and plot them as histograms. A good place to start is the UK average earnings by industry data in the Topic guide 10.2 data sheet.

Take it further
More information about random number tables can be found at http://www.nist.gov/pml/wmd/pubs/upload/AppenB-HB133-05-Z.pdf.
Introduction to histograms and normal distributions from the University of California, Berkeley: http://www.stat.berkeley.edu/users/huang/STAT141/STATC141-lectureIII.pdf.

Further reading
Boslaugh, S. (2012) Statistics in a Nutshell, O’Reilly Media
Ellison, S. et al. (2009) Practical Statistics for the Analytical Scientist, RSC
Larsen, R. and Fox Stroup, D. (1976) Statistics in the Real World, Macmillan
Miller, J. and Miller, J. (2010) Statistics and Chemometrics for Analytical Chemistry, Prentice Hall
Samuels, M. et al. (2010) Statistics for the Life Sciences, Pearson
Swartz, M. and Krull, I. (2012) Handbook of Analytical Validation, CRC Press

Statistical calculators online:
http://www.danielsoper.com/statcalc3/
http://www.measuringusability.com/calc.php

Checklist
At the end of this topic guide, you should be:
• familiar with the role of statistics in experimental design
• able to discuss the factors behind experimental design from a statistical viewpoint (1.1)
• able to explain the mechanics of population sampling with regard to controlling error (1.2)
• able to evaluate probabilities using approximation methods (1.3).

Acknowledgements
The publisher would like to thank the following for their kind permission to reproduce their photographs: Shutterstock.com: Sofiaworld

Every effort has been made to trace the copyright holders and we apologise in advance for any unintentional omissions. We would be pleased to insert the appropriate acknowledgement in any subsequent edition of this publication.