Download Lectur - Montana State University College of Engineering

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
Lecture 2
Basic Experimental Statistics
What does this mean?
Chemical Response
250
200
150
100
50
0
0
5
10
15
20
Run
From Snee, R.D. (1983) “Graphical Analysis of Process Variation
Studies,” Journal of Quality Technology, 15, 76-88
Are the specimen different?
250
Chemical Response
Sp #5 Sp #6
200
Sp #1
Sp #2
150
100
Sp #3
50
0
0
5
10
Run
15
20
Are the operators different?
Chemical Response
250
Operator #3
200
Operator #1
150
100
Operator #2
50
0
0
5
10
Run
15
20
Statistics in Experimentation
• Project Planning Phase
– What is to be measured?
– How large is the likely variation?
– What are the influential factors?
• Experimental Design Phase
– Control known sources of variation
– Estimate the size of uncontrolled variation
– Investigate suitable models
• Statistical Analysis Phase
– Next few weeks…
Population
• All possible items or units that determine an
outcome of a well-defined experiment are
collectively called a “population”.
• Examples:
– All 30-ohm resistors produced by a certain
manufacturer during a fixed time period.
– All measurements of the fracture strength of oneinch-thick underwater welds on a steel alloy base
plate that is 200ft deep in salt-water.
Process
• A repeatable series of actions that result in an
observable characteristic or measurement.
• Examples:
– The effect of aspirin on blood pressure.
Sample
• A group of observations taken from a
population or a process.
• We usually take a “convenience sample” (i.e.,
easy to obtain), but these can be of dubious
value because they may not be representative
of the variation in the population!!
• Goal: random sample
Other definitions
• Variable : A property or characteristic on
which information is obtained in an
experiment.
• Distribution: a tabular or graphical description
of the values of a variable using some
measure of how frequently they occur in a
population, process, or sample. (think:
histogram)
Example
9
Frequency (# students)
8
7
6
5
4
3
2
1
0
15
30
45
60
75
90
105
Total Time per Problem (min.)
Population or a Sample?
120
More
Example
9
Frequency (# students)
8
7
6
5
4
3
2
1
0
15
30
45
60
75
90
105
120
More
Total Time per Problem (min.)
Convenience Sample or Random Sample?
Example
9
Frequency (# students)
8
7
6
5
4
3
2
1
0
15
30
45
60
75
90
105
Total Time per Problem (min.)
Is this a Normal Distribution?
120
More
Beware the Normal Distribution
• aka, Gaussian Distribution
• Approximately normal distributions occur in
many situations, as explained by the central
limit theorem. When there is reason to suspect
the presence of a large number of small
effects acting additively and independently, it is
reasonable to assume that observations will be
normal.
• Good example: laser light intensity
Beware the Normal Distribution
• Bad example: Black–Scholes model
• Changes in the logarithm of exchange rates,
price indices, and stock market indices; these
variables behave like compound interest, not
like simple interest, and so are multiplicative;
• While the Black–Scholes model assumes
normality, in reality these variables
exhibit heavy tails, as seen in stock market
crashes (see The Black Swan)
Beware the Normal Distribution
• Concrete example: S&P 500, daily change:
m = +0.035% ± 0.9% (from W. Egan)
• Based on this, you conclude:
– 68% of the time, changes will be less than 0.9%
– 95% of the time, changes will be less than 1.8%
– 99.7% of the time, changes will be less than 2.7%
– 99.994% of the time, changes will be less than 3.6%
• (1 in every 17,000 trading days or once every 70 years)
– A 4.5% change should almost never happen, but it has
happened multiple times in your lives!
Sample Mean or Average
x1  x2  ...  xn
x  n  xi 
n
1
• Sample median:
– M=x(q) if n is odd, where q=(n+1)/2.
– M=[x(q) + x(q+1)]/2 if n is even where q=n/2
• Do not use text for equations (like I just did)
in reports! It looks like crap!
Sample Standard Deviation

  xi  x 
s
2


n

1


2





• In your reports, all values should be
reported as:
x  s or m  
• INCLUDE UNITS!!!!
Sample Standard Error
sd
se 
n
• Some people like to use standard errors because the
smaller values seem to imply better results. You need
to be careful with such people!
Std. Err. vs. Std. Dev.
Thanks to Wikipedia….
• standard error is an estimate of how close to the
population mean your sample mean is likely to be –
it behaves like a confidence interval.
• standard deviation is the degree to which
individuals within the sample differ from the sample
mean (“Spread”).
• Standard error should decrease with larger sample
sizes, as the estimate of the population mean
improves. Standard deviation will be unaffected by
sample size.
Percent Error
your value  true value
% error 
100%
true value
• Note: The order of terms in the numerator can vary.
• Percent Difference
your value  other value
% diff 
100%
better value
• Note: There is no universally accepted choice for the value
that should go in the denominator.
• If one value is believed to be closer to the true value, that
should be in the denominator.
Which to use?
• We almost always need to give an indication
of how accurate our experimental
measurements are. Which measure should
we use?
• When should we use ?
• When should we use % error or % diff?