How many replicate tests do we
need to obtain useful data from
testing a stove?
Ashok Gadgil, Carl Wang, Kathleen Lask, and Mike Sohn
January 24-25, 2015
ETHOS Conference, Kirkland
Acknowledgements:
US Dept of Energy
LBNL’s Laboratory Directed R&D program
2
Stove testing is needed – but stove
performance varies from test to test
WHY TEST? To design, develop, and select improved stoves,
a stove’s performance must be experimentally tested
BUT the stove test results commonly vary from test to test, so
we report an average from a number of replicate tests
HOW SURE ARE WE? When we report an average test result,
we would like to communicate a measure of confidence –
how sure are we of the reported result? (For example, what
is the 95% uncertainty bound around that result?)
3
We need confidence in the results, but must also
avoid wasteful, excessive replicate testing
Published literature on stove testing shows considerable
misunderstanding on this point
This is not surprising. The landmark of classical mechanics is Sir
Isaac Newton, ~1687. The landmark of modern statistics is Sir
Francis Galton, a full two centuries later (e.g., his 1870 paper
on Standard Deviation)
The Student’s-t test (William Gosset, Guinness Brewery), central to
this discussion, was published even later, in 1908, three years
after Einstein’s 1905 paper on the Special Theory of Relativity!
4
Almost always, the data are distributed in a
bell curve, called a “normal” or “gaussian”
distribution
I.e., after a very large number of tests, the histogram of data will
closely fit a gaussian distribution (like the one shown below)
Figure from www.mathisfun.com
5
The “normal” or “gaussian” distribution has
well-understood properties and is fully defined
by only two numbers: its mean and its SD
The distribution shown below is for mean=10, and SD=2.
Mean = 10
SD = 2
Graph derived from www.vertex42.com
6
Standard Deviation or “SD” is a measure of
scatter of the data around the mean
As seen below, for SD=0.5, the distribution is gathered more
tightly around the mean than in the graph with SD=2
Mean = 10
SD = 0.5
Graph derived from www.vertex42.com
7
If we know the mean and SD of a normal
distribution, we know very well how the data
are distributed around the mean
See famous graphic below. Rule of 68 – 95 – 99.7
Graph posted by Dan Kernler on Wikipedia
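The Rule of 68 – 95 – 99.7 can be checked numerically. The short Python sketch below is an illustration added here, not part of the original slides, and the function name `fraction_within` is hypothetical. It relies on the standard result that the fraction of a normal distribution within k SDs of the mean is erf(k/√2).

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

# Print the fraction for 1, 2, and 3 SDs around the mean
for k in (1, 2, 3):
    print(f"within {k} SD: {100 * fraction_within(k):.1f}%")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

The result does not depend on the particular mean or SD, which is why the rule applies to any normal distribution.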
8
Example 1:
Height of adult males in Kirkland is normally
distributed with mean = 176 cm, SD = 14 cm
Heights of people are a common example of something that
follows a normal distribution
Recall the graphic with the famous Rule of 68 – 95 – 99.7
So we can expect a 95% chance that a randomly chosen adult
male in Kirkland will have a height between (mean-2*SD =)
148 cm and (mean+2*SD =) 204 cm.
9
Example 2:
What is the average height of adult giraffes
in Serengeti National Park?
10
Example 2:
Find the average height of adult giraffes
Given: Giraffe heights are also normally distributed.
Given: Giraffes are not cooperative. They don’t stand straight.
It is not safe to approach them. One needs to triangulate the
height of each animal, case by case, with a lot of effort.
Given: Measuring giraffe heights is time-consuming, tedious,
and sometimes dangerous.
11
Example 2: Giraffe Height
Now we are facing a DIFFERENT problem: estimating the true
mean from a few sample data points. To obtain each data
point requires tedious effort.
We don’t know the mean, the SD, or the number of samples
needed
Best estimate of the true mean is still the mean of the sample
data points.
But how many data points should we collect?
12
Example 2: Giraffe Height
How many data points? The answer depends on how much
confidence we want in our estimate of the true mean.
Few data points: low confidence.
Many (say, hundreds of) data points: very high confidence
Confidence is reported with “confidence limits”. So, people may
report a number with 95% confidence limits.
These confidence limits will be wide when we have low
confidence, and they will be narrow or tight when we have
high confidence
13
Example 2: Giraffe Height
This is where the so-called Student’s-t distribution
comes in.
For now, I am just going to say that “Student” was not
his real name. His real name was William Gosset,
and he worked for the famous Guinness brewery in
Dublin, Ireland. But his employer would let him
publish papers only under a fake name, so he
published his work under “Student”
14
How to get the desired confidence in
the average giraffe height
“Student” proved that the correct 95% confidence limits are
given by:
$$(\text{sample\_mean}) - \left[ A \cdot \frac{(\text{sample\_SD})}{\sqrt{N}} \right]$$
$$(\text{sample\_mean}) + \left[ A \cdot \frac{(\text{sample\_SD})}{\sqrt{N}} \right]$$
N is the number of data points in your sample
“A” is the number that bounds the central 95% interval of the
Student’s-t distribution with N-1 degrees of freedom. So, A also
depends on N. (See Wikipedia, or our paper cited on the last slide,
for a table of A values).
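As an illustration of how the confidence limits might be computed in practice, here is a minimal Python sketch. It is not from the original slides: the function name `confidence_limits_95` and the giraffe heights are hypothetical, and the small table of A values holds the standard two-sided 95% Student's-t values for N-1 = 1 to 10 degrees of freedom.

```python
import math
import statistics

# Two-sided 95% Student's-t values "A", keyed by degrees of freedom (N - 1).
# Standard table values rounded to three decimals; only N = 2..11 is covered.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def confidence_limits_95(data):
    """Return (lower, upper) 95% confidence limits for the true mean."""
    n = len(data)
    if n - 1 not in T_95:
        raise ValueError("this sketch only tabulates A for N = 2..11")
    sample_mean = statistics.mean(data)
    sample_sd = statistics.stdev(data)  # sample SD (divides by N - 1)
    half_width = T_95[n - 1] * sample_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical giraffe heights in metres (made-up numbers, not measurements)
heights = [4.6, 5.1, 4.9, 5.3, 4.8]
low, high = confidence_limits_95(heights)
print(f"95% confidence limits: {low:.2f} m to {high:.2f} m")
```

For larger samples, A would come from a fuller table or a statistics library rather than this short dictionary.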
15
So, what does this mean for us?
How to use this formula?
First of all, notice from the previous slide that the width of the
confidence interval has in its denominator the square root of the
sample size N, i.e., $\sqrt{N}$
Increasing the sample size by 9X will narrow the confidence
interval by about 3X. And we can have more and more
confidence – as much as we want -- as we increase N more
and more.
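The 9X and 3X figures can be worked out explicitly. Write the half-width of the confidence interval as h(N) = A_N · s / √N, where s is the sample SD and A_N is the Student's-t value for N data points (this subscript notation is introduced here only for illustration), and assume the sample SD stays about the same as more data are collected:

```latex
\frac{h(N)}{h(9N)}
  = \frac{A_N \, s / \sqrt{N}}{A_{9N} \, s / \sqrt{9N}}
  = 3 \cdot \frac{A_N}{A_{9N}}
```

Because A decreases slowly with N (toward about 1.96 for very large N), the ratio A_N / A_{9N} is slightly greater than 1. The narrowing is therefore a little more than 3X for small N and approaches exactly 3X as N grows.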
16
Applying to testing Stoves
As noted, we can have more and more confidence – as much
as we want -- as we increase N.
Suppose we want to report fuel efficiency of a stove (in percent)
with 95% confidence bound of +/- 2 percent. Then how
many tests do we need?
Keep with you the equation on slide 14, start doing replicate
tests, and keep plugging the data into the equation. As your
N increases, the confidence interval shrinks. Keep going
until the confidence interval is less than or equal to +/- 2%
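The procedure described above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `replicates_needed` and the efficiency values are hypothetical, and the table of A values covers only N = 2 to 11.

```python
import math
import statistics

# Two-sided 95% Student's-t values "A", keyed by degrees of freedom (N - 1)
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def replicates_needed(results, target_half_width):
    """Add one test result at a time and return (N, mean, half_width) for the
    first N whose 95% half-width is at or below the target, else None."""
    for n in range(2, len(results) + 1):
        if n - 1 not in T_95:
            break  # beyond the small table in this sketch
        sample = results[:n]
        half_width = T_95[n - 1] * statistics.stdev(sample) / math.sqrt(n)
        if half_width <= target_half_width:
            return n, statistics.mean(sample), half_width
    return None

# Hypothetical fuel efficiencies in percent, in the order the tests were run
efficiencies = [24.1, 27.3, 25.6, 23.8, 26.4, 25.0, 26.9, 24.7]
result = replicates_needed(efficiencies, target_half_width=2.0)
if result is None:
    print("target not reached; more replicate tests are needed")
else:
    n, mean, half_width = result
    print(f"N = {n}: {mean:.1f} +/- {half_width:.1f} percent")
```

With these made-up numbers the target of +/- 2 percent is first met at N = 5. A stove with more test-to-test scatter would need more replicates.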
17
Wang et al (2014) “How many replicate tests
are needed to test cookstove performance
and emissions? — Three is not always
adequate”. Energy for Sustainable
Development 20, pp.21–29.
18
Supplemental Slides Follow
19
Summarized quotes from Wikipedia:
In scientific and technical literature, experimental data is often
[presented] either using the mean and standard deviation or the
mean with the standard error. This often leads to confusion
about their interchangeability. THEY ARE NOT
INTERCHANGEABLE
Put simply, the standard deviation of the sample is the degree
to which individuals within the sample differ from the sample
mean.
The standard error of the sample mean is an estimate of how
far the sample mean is likely to be from the population mean.
The standard error of the sample mean will tend to zero with
increasing sample size.
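The difference between the two quantities can be seen in a small simulation. The Python sketch below is an illustration only: the function name `standard_error` is hypothetical, and the data are random draws from a normal distribution with mean = 10 and SD = 2 (the values used in the earlier graph), not stove measurements.

```python
import math
import random
import statistics

def standard_error(data):
    """Standard error of the sample mean: sample SD divided by sqrt(N)."""
    return statistics.stdev(data) / math.sqrt(len(data))

random.seed(0)  # fixed seed so the run is repeatable
for n in (5, 50, 500):
    # Simulated data: normal distribution with mean 10 and SD 2
    sample = [random.gauss(10, 2) for _ in range(n)]
    sd = statistics.stdev(sample)
    se = standard_error(sample)
    print(f"N = {n:3d}  SD = {sd:.2f}  SE = {se:.2f}")
```

As N grows, the sample SD settles near the true SD of 2, while the standard error keeps shrinking in proportion to 1/√N.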
20
Formula for a “normal” or “gaussian”
distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Where
µ (“mu”) stands for the Mean or average,
and σ (“sigma”) stands for the Standard Deviation
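As a quick check, the normal density can be evaluated directly. This minimal Python sketch uses the standard expression for the normal density; the function name `normal_pdf` is hypothetical, and the inputs are the mean = 10 with SD = 2 and SD = 0.5 values from the earlier graphs.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

# Height of the curve at the mean for the two SD values
print(f"SD = 2:   peak = {normal_pdf(10, 10, 2):.4f}")
print(f"SD = 0.5: peak = {normal_pdf(10, 10, 0.5):.4f}")
```

The peak for SD = 0.5 is four times taller than for SD = 2, which matches the tighter, taller curve in the SD = 0.5 graph.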
21
Wang et al (2014) “How many replicate tests
are needed to test cookstove performance
and emissions? — Three is not always
adequate”. Energy for Sustainable
Development 20, pp.21–29.
22
Figure from the paper, Wang et al (2014), cited on the previous slide.
Illustration to show the dependence of confidence (90%, 95%, or 99%)
on the number of replicates, for a stove with a known SD value.
Number of replicates needed to get a confidence interval of +/- 2 minutes
around the average time-to-boil from water boiling tests. The
three graphs present confidence levels of 90%, 95%, and 99%.
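The kind of calculation behind such a figure can be sketched in Python. This is an illustration under stated assumptions and may differ from the exact procedure in the paper: it applies the Student's-t confidence limit formula from slide 14 with a fixed SD, the SD of 2 minutes is a made-up value, the function name `min_replicates` is hypothetical, and the tables cover only N = 2 to 11.

```python
import math

# Two-sided Student's-t values for N - 1 = 1..10 degrees of freedom,
# keyed by confidence level in percent (standard table values).
T_TABLE = {
    90: [6.314, 2.920, 2.353, 2.132, 2.015, 1.943, 1.895, 1.860, 1.833, 1.812],
    95: [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228],
    99: [63.657, 9.925, 5.841, 4.604, 4.032, 3.707, 3.499, 3.355, 3.250, 3.169],
}

def min_replicates(sd, target_half_width, level):
    """Smallest N (2..11) whose half-width A * sd / sqrt(N) meets the target,
    or None if more replicates are needed than this small table covers."""
    for dof, a in enumerate(T_TABLE[level], start=1):
        n = dof + 1
        if a * sd / math.sqrt(n) <= target_half_width:
            return n
    return None

# Hypothetical stove: SD of time-to-boil = 2 minutes, target = +/- 2 minutes
for level in (90, 95, 99):
    n = min_replicates(sd=2.0, target_half_width=2.0, level=level)
    print(f"{level}% confidence: {n} replicates")
# 90% confidence: 5 replicates
# 95% confidence: 7 replicates
# 99% confidence: 11 replicates
```

For the same stove and the same target interval, asking for higher confidence requires more replicate tests.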