Errors and the normal distribution

Statistics, Probability,
and Decision Making
Trial    Length
1        25.45
2        25.40
3        25.50
4        25.42
5        25.38
Mean     25.43
Which trial represents the length?
Most feel the mean is the best estimate.
How Precise is the Estimate?
You decide that the length is 25.43.
But look at the measurements.
Is 25.50 a misfit?
What about an unexpected value?
• Get rid of it…
• No, you need a statistical reason!
• Only if it was a mistake.
Is it a mistake?
An outlier: A single observation “far away” from the rest.
Q: How far away is “far away”?
A: It depends on whether the value differs from the rest within a “reasonable” range.
Decisions, decisions…
Rejecting Data in a Small Data Set
Trial    Length
1        25.45
2        25.40
3        25.50
4        25.42
5        25.38
Mean     25.43
Run the “Q-test.”
To test 25.50, calculate Q.
Q = |suspect value - value closest to it| / range
Q = (25.50 - 25.45) / (25.50 - 25.38) = 0.05 ÷ 0.12 ≈ 0.42
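The Q calculation can be sketched in a few lines of Python. This is only an illustration: the function name q_statistic is made up for this example, and it assumes the suspect value appears in the data and that the values are not all identical.

```python
def q_statistic(values, suspect):
    """Return Q = gap / range for one suspect value (illustrative helper)."""
    data_range = max(values) - min(values)
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")
    others = list(values)
    others.remove(suspect)  # drop one occurrence of the suspect value
    # Gap: distance from the suspect to the value closest to it.
    gap = min(abs(suspect - v) for v in others)
    return gap / data_range


lengths = [25.45, 25.40, 25.50, 25.42, 25.38]
q_calc = q_statistic(lengths, 25.50)
print(round(q_calc, 2))  # 0.42
```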
Compare Qcalculated with Qcritical
Number of trials    Qcritical (90% confidence)
3                   0.94
4                   0.76
5                   0.64
6                   0.56
7                   0.51
8                   0.47
9                   0.44
10                  0.41
• If Qcalc > Qcritical, reject.
• If Qcalc < Qcritical, keep.
From the previous example…
Qcalc = 0.42
N = 5, Qcritical = 0.64
• If Qcalc > Qcritical, reject.
• If Qcalc < Qcritical, keep.
Here 0.42 < 0.64, so 25.50 is kept.
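The comparison step might be coded as a table lookup, as in the sketch below. The critical values are the 90% confidence values from the table in these slides. The names Q_CRITICAL_90 and q_test_decision are invented for this example, and treating Qcalc equal to Qcritical as "keep" is an assumption, since the slides only cover the greater-than and less-than cases.

```python
# Qcritical at 90% confidence, keyed by number of trials (from the slide table).
Q_CRITICAL_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56,
                 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}


def q_test_decision(q_calc, n_trials):
    """Return 'reject' or 'keep' for the suspect value (illustrative helper)."""
    if n_trials not in Q_CRITICAL_90:
        raise ValueError("table only covers 3 to 10 trials")
    q_crit = Q_CRITICAL_90[n_trials]
    # Assumption: a tie (q_calc == q_crit) is treated as 'keep'.
    return "reject" if q_calc > q_crit else "keep"


decision = q_test_decision(0.42, 5)
print(decision)  # keep
```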
Rejecting data in a large set
Use a Normal Distribution
• Find the confidence interval: µ ± 3σ
About 95% of the data falls within two standard deviations of the mean, and about 99.7% within three.
• Does the measurement fall outside the confidence interval?
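For a normal distribution, the fraction of data within k standard deviations of the mean is erf(k / √2). The short sketch below evaluates that for k = 1, 2, 3 with Python's standard math module. The function name fraction_within is made up for this example.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

A value outside µ ± 3σ is therefore expected only about 3 times in 1000 measurements if the data really are normal.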
Outliers…
Q: Why worry about them?
A: Values may not be properly distributed.
Q: Where do they come from?
A: Possible sources:
1. Recording and measurement errors
2. Incorrect distribution
3. Unknown data structure
Note: Outliers are in red
Managing Outliers
If the data follow a normal distribution:
1. Calculate the mean and the standard deviation.
2. Find the ±3 standard deviation range for imposing limits on the data.
3. Identify outliers (values more than 3 standard deviations from the mean).
4. Get rid of them!!!
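The four steps above can be sketched with Python's standard statistics module. The readings below are made up for illustration and are not from the slides. The function name split_outliers is hypothetical, and using the sample standard deviation (statistics.stdev) rather than the population one is an assumption.

```python
import statistics


def split_outliers(data, k=3.0):
    """Split data into (kept, outliers, limits) using mean ± k standard deviations."""
    mean = statistics.mean(data)           # step 1: mean
    sd = statistics.stdev(data)            # step 1: sample standard deviation (assumed)
    low, high = mean - k * sd, mean + k * sd   # step 2: limits
    outliers = [x for x in data if x < low or x > high]   # step 3: identify
    kept = [x for x in data if low <= x <= high]          # step 4: remove
    return kept, outliers, (low, high)


# Hypothetical data: 20 ordinary readings plus one suspicious value.
readings = [25.40, 25.42, 25.38, 25.45, 25.41] * 4 + [27.00]
kept, outliers, limits = split_outliers(readings)
print(outliers)   # [27.0]
print(len(kept))  # 20
```

Note that the suspect value itself inflates the standard deviation, so in very small data sets nothing can fall outside ±3 standard deviations. That is one reason the Q-test is used there instead.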