How many replicate tests do we need to obtain useful data from testing a stove?
Ashok Gadgil, Carl Wang, Kathleen Lask, and Mike Sohn
January 24-25, 2015, ETHOS Conference, Kirkland
Acknowledgements: US Dept of Energy; LBNL’s Laboratory Directed R&D program

2 Stove testing is needed – but stove performance varies from test to test

WHY TEST? To design, develop, and select improved stoves, a stove’s performance must be experimentally tested.
BUT stove test results commonly vary from test to test, so we report an average from a number of replicate tests.
HOW SURE ARE WE? When we report an average test result, we would like to communicate a measure of confidence: how sure are we of the reported result? (For example, what is the 95% uncertainty bound around that result?)

3 We need confidence in the results, but must also avoid wasteful, excessive replicate testing

Published literature on stove testing shows considerable misunderstanding on this point. This is not surprising. The landmark of classical mechanics is Sir Isaac Newton, ~1687. The landmark of modern statistics is Sir Francis Galton, a full two centuries later (e.g., his 1870 paper on Standard Deviation). Student’s t-test (William Gosset, Guinness Brewery), central to this discussion, was published even later, in 1908, three years after Einstein’s 1905 paper on the Special Theory of Relativity!

4 Almost always, the data are distributed in a bell curve, called a “normal” or “gaussian” distribution

I.e., after a very large number of tests, the histogram of the data will closely fit a gaussian distribution (like the one shown below).
Figure from www.mathisfun.com

5 The “normal” or “gaussian” distribution has well understood properties and is fully defined by only two numbers: its mean and its SD

The distribution shown below is for mean=10, and SD=2.
Mean = 10, SD = 2. Graph derived from www.vertex42.com

6 Standard Deviation or “SD” is a measure of the scatter of the data around the mean

As seen below, for SD=0.5, the distribution is gathered more tightly around the mean than in the graph with SD=2.
Mean = 10, SD = 0.5. Graph derived from www.vertex42.com

7 If we know the mean and SD of a normal distribution, we know very well how the data are distributed around the mean

See the famous graphic below. Rule of 68 – 95 – 99.7: about 68% of the data fall within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD.
Graph posted by Dan Kernler on Wikipedia

8 Example 1: Height of adult males in Kirkland is normally distributed with mean = 176 cm, SD = 14 cm

Heights of people are a common example of something that follows a normal distribution. Recall the graphic with the famous Rule of 68 – 95 – 99.7. So we can expect a 95% chance that a randomly chosen adult male in Kirkland will have a height between (mean-2*SD =) 148 cm and (mean+2*SD =) 204 cm.

9 Example 2: What is the average height of adult giraffes in Serengeti National Park?

10 Example 2: Find the average height of adult giraffes

Given: Giraffe heights are also normally distributed.
Given: Giraffes are not cooperative. They don’t stand straight. It is not safe to approach them. One needs to triangulate the height of each animal, case by case, with a lot of effort.
Given: Measuring giraffe heights is time-consuming, tedious, and sometimes dangerous.

11 Example 2: Giraffe Height

Now we are facing a DIFFERENT problem: estimating the true mean from a few sample data points. Obtaining each data point requires tedious effort. We don’t know the mean, the SD, or the number of samples needed. The best estimate of the true mean is still the mean of the sample data points. But how many data points should we collect?

12 Example 2: Giraffe Height

How many data points? The answer depends on how much confidence we want in our estimate of the true mean.
Few data points: low confidence.
Many (say, hundreds of) data points: very high confidence.
Confidence is reported with “confidence limits”. So, people may report a number with its 95% confidence limits. These confidence limits will be wide when we have low confidence, and narrow or tight when we have high confidence.

13 Example 2: Giraffe Height

This is where the so-called Student’s-t distribution comes in. For now, I am just going to say that “Student” was not his real name: his real name was William Gosset, and he worked for the famous Guinness brewery in Dublin, Ireland. His employer would let him publish papers only under a fake name, so he published his work under “Student”.

14 How to get desired confidence in average value: Giraffe Height

“Student” proved that the correct 95% confidence limits are given by:

$$(\text{sample mean}) - \left[ A \cdot \frac{\text{sample SD}}{\sqrt{N}} \right] \quad \text{and} \quad (\text{sample mean}) + \left[ A \cdot \frac{\text{sample SD}}{\sqrt{N}} \right]$$

N is the number of data points in your sample. “A” is a number for the central 95% interval in Student’s-t distribution for the given N. So, A also depends on N. (See Wikipedia, or our paper cited on the last slide, for a table of A values.)

15 So, what does this mean for us? How to use this formula?

First of all, notice from the previous slide that the confidence interval has in its denominator the square root of the sample size N, i.e. $\sqrt{N}$. Increasing the sample size by 9X will narrow the confidence interval by about 3X (slightly more, because A also decreases as N grows). And we can have more and more confidence, as much as we want, as we increase N more and more.

16 Applying to testing Stoves

As noted, we can have as much confidence as we want as we increase N. Suppose we want to report the fuel efficiency of a stove (in percent) with a 95% confidence bound of +/- 2 percent. Then how many tests do we need? Keep the equation on slide 14 with you, start doing replicate tests, and keep plugging the data into the equation. As your N increases, the confidence interval shrinks.
Keep going until the confidence interval is less than or equal to +/- 2%.

17 Wang et al (2014) “How many replicate tests are needed to test cookstove performance and emissions? — Three is not always adequate”. Energy for Sustainable Development 20, pp. 21–29.

18 Supplemental Slides Follow

19 Summarized quotes from Wikipedia:

In scientific and technical literature, experimental data is often [presented] either using the mean and standard deviation or the mean with the standard error. This often leads to confusion about their interchangeability. THEY ARE NOT INTERCHANGEABLE.
Put simply, the standard deviation of the sample is the degree to which individuals within the sample differ from the sample mean. The standard error of the sample mean is an estimate of how far the sample mean is likely to be from the population mean. The standard error of the sample mean will tend to zero with increasing sample size.

20 Formula for a “normal” or “gaussian” distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where µ (“mu”) stands for the mean or average, and σ (“sigma”) stands for the Standard Deviation.

21 Wang et al (2014) “How many replicate tests are needed to test cookstove performance and emissions? — Three is not always adequate”. Energy for Sustainable Development 20, pp. 21–29.

22 Figure from the paper, Wang et al (2014), cited on the previous slide.

Illustration of how the number of replicates needed depends on the confidence level, for a stove with a known SD value: the number of replicates needed to get a confidence interval of +/- 2 minutes around the average time-to-boil from water boiling tests. The three graphs present confidence levels of 90%, 95%, and 99%.