Information Analysis
Gaussian or Normal Distribution
[Figure: normal distribution curve, Probability (0 to 0.012) versus X (0 to 350)]


$$ s = \left[ \frac{\sum (x_i - \bar{x})^2}{n - 1} \right]^{1/2} = \left[ \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1} \right]^{1/2} $$
μ = mean, estimated as x̄
x̄ = observed sample mean = Σx/n
σ = standard deviation, estimated as s
s = observed standard deviation
n = sample size
Area under curve = 1
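The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `sample_mean_and_std` and the five observations are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math
import statistics


def sample_mean_and_std(values):
    """Return (x_bar, s), where s uses the n - 1 denominator."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    s = math.sqrt(squared_deviations / (n - 1))
    return x_bar, s


# Hypothetical observations, used only for illustration.
observations = [110.0, 130.0, 150.0, 170.0, 190.0]
x_bar, s = sample_mean_and_std(observations)
print(x_bar, round(s, 2))

# The standard library's stdev() also divides by n - 1.
print(round(statistics.stdev(observations), 2))
```

The hand-written function and `statistics.stdev` should agree, since both estimate σ with the n - 1 denominator.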

Coefficient of Variation
Cv = x̄/s (mean divided by standard deviation, as the examples below show; many references define the coefficient of variation as the reciprocal, s/x̄)
[Figure: two normal curves of Probability versus X, both centered at X = 150]
Narrow curve: Cv = 150/20 = 7.5
Wide curve: Cv = 150/60 = 2.5
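As a quick check of the two values quoted for the curves, the ratio can be computed directly. The sketch below assumes the convention implied by the slide's numbers (mean divided by standard deviation); the function name is hypothetical.

```python
def coefficient_of_variation(mean, std_dev):
    """Cv as used on these slides: mean divided by standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return mean / std_dev


cv_narrow = coefficient_of_variation(150, 20)  # narrow curve
cv_wide = coefficient_of_variation(150, 60)    # wide curve
print(cv_narrow, cv_wide)
```

With this convention, a larger Cv means a narrower curve relative to its mean.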
Example
100 kg of glass is recovered from municipal refuse and processed.
The glass is crushed and sieved. Plot the cumulative
distribution of particle size from the data below.
Sieve Size (mm)    Fraction Retained
4                  10/100 = 0.10
3                  25/100 = 0.25
2                  35/100 = 0.35
1                  20/100 = 0.20
<1                 10/100 = 0.10
Sieve stack, from coarsest to finest:
4 mm holes: 10 kg glass remained on the sieve (90 kg went through)
3 mm holes: 25 kg remained on the sieve
2 mm holes: 35 kg remained on the sieve
1 mm holes: 20 kg remained on the sieve
No holes (pan): 10 kg went all the way through
[Figure: bar chart of Fraction Retained (0 to 0.4) versus Sieve Size (mm) for the 4, 3, 2, and 1 mm sieves and the pan]
Cumulative Distribution
Sieve Size (mm)    Fraction Smaller Than Sieve Size
4                  1 – 0.1 = 0.9
3                  1 – (0.1 + 0.25) = 0.65
2                  1 – (0.35 + 0.35) = 0.3
1                  1 – (0.7 + 0.2) = 0.1
[Figure: Fraction of particles smaller than size indicated (0 to 1) versus Particle Size (mm) (0 to 5)]
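The running subtraction in the cumulative table can be reproduced with a short loop. This is a sketch using the sieve masses from the example; the variable names are illustrative and not taken from the slides.

```python
# Mass retained on each sieve, from the glass example (100 kg total).
sieve_sizes_mm = [4, 3, 2, 1]
mass_retained_kg = [10, 25, 35, 20]  # the remaining 10 kg ends up in the pan
total_mass_kg = 100

cumulative_retained_kg = 0
fraction_smaller = {}
for size, mass in zip(sieve_sizes_mm, mass_retained_kg):
    # Everything retained on this sieve or a coarser one is larger than `size`.
    cumulative_retained_kg += mass
    fraction_smaller[size] = (total_mass_kg - cumulative_retained_kg) / total_mass_kg

for size, fraction in fraction_smaller.items():
    print(f"{size} mm: {fraction:.2f} of the glass is smaller")
```

Working in whole kilograms and dividing once at the end avoids accumulating rounding error from repeated decimal additions.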
Graphs
Independent variable
Abscissa (x-axis)
A variable is independent if its value is chosen, like
sieve size in the previous example.
Dependent variable
Ordinate (y-axis)
A variable is dependent if its value is determined by experiment.
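Probability Paper
X-axis is linear.
Y-axis is scaled so that if the distribution is normal
(Gaussian), the cumulative probability plots as
a straight line.
If this is the case, the mean is at 0.5 (50%), and one standard
deviation spans about 0.34 (34%) of the cumulative probability on
either side of the mean, that is, from 0.50 up to 0.84 or down to 0.16.
You can also calculate s by:
s = 2/5 (x90 – x10)
where x90 and x10 are the x values at cumulative probabilities of 0.90 and 0.10.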
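The 2/5 factor and the 34% figure can both be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; it is a verification aid, not something the slides themselves present.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Distance, in standard deviations, between the 10% and 90% points.
z90 = standard_normal.inv_cdf(0.90)
z10 = standard_normal.inv_cdf(0.10)
exact_factor = 1.0 / (z90 - z10)  # s = exact_factor * (x90 - x10)

# Fraction of the area between the mean and one standard deviation above it.
one_sigma_fraction = standard_normal.cdf(1.0) - 0.5

print(round(exact_factor, 3), round(one_sigma_fraction, 3))
```

The exact factor comes out slightly below 0.4, so 2/5 is a convenient rounding rather than an exact constant.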
Example
Consider the recycled glass data from the previous example.
What is the mean, the standard deviation, and the 95% interval?
The mean is the value on the x-axis
where the y-axis value is 0.5: 2.4 mm.
The standard deviation is the spread
around the mean such that 68% of the
data fall into the range (about
34% on either side of the mean).
0.5 + 0.34 = 0.84, which corresponds
to 3.5 mm, so s = 3.5 – 2.4 = 1.1 mm, or:
s = 2/5 (3.9 – 1.0) = 1.16 mm
The 95% interval is the range containing 95% of the
data, between cumulative fractions of
0.025 and 0.975, or 0.2 mm and
4.8 mm.
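As a cross-check on the graph readings, the 95% interval can also be computed from the fitted mean and standard deviation. This sketch assumes the data follow a normal distribution exactly; values read by eye from probability paper will differ somewhat, as the 0.2 mm and 4.8 mm readings do.

```python
from statistics import NormalDist

mean_mm = 2.4             # read from the plot at cumulative fraction 0.5
s_mm = 0.4 * (3.9 - 1.0)  # s = 2/5 (x90 - x10)

# Assumption: the particle sizes are exactly normal with these parameters.
fitted = NormalDist(mu=mean_mm, sigma=s_mm)
lower_mm = fitted.inv_cdf(0.025)
upper_mm = fitted.inv_cdf(0.975)

print(round(s_mm, 2), round(lower_mm, 2), round(upper_mm, 2))
```

The computed limits are close to the mean plus or minus 1.96 standard deviations, which is the usual shortcut for a 95% interval.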
Return Period
Return period is how often an event is expected to recur.
If the annual probability of an event occurring is 5%,
then the event can be expected to occur once every 20 years,
or have a return period of 20 years:
Return period = 1/fractional probability
To determine return periods, first rank time-variant data
(smallest to largest or largest to smallest) then calculate
the probabilities and plot the data.
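The return-period formula and the ranking step can be sketched as follows. The function names and the five readings are hypothetical, used only to show the mechanics; the m/n plotting position is the one these slides use.

```python
def return_period(fractional_probability):
    """Return period = 1 / fractional probability of occurrence."""
    return 1.0 / fractional_probability


def rank_probabilities(values):
    """Rank smallest to largest and pair each value with m/n."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, m / n) for m, value in enumerate(ordered, start=1)]


# A 5% annual probability corresponds to a 20-year return period.
print(return_period(0.05))

# Hypothetical readings, used only for illustration.
readings = [12, 30, 18, 25, 21]
for value, probability in rank_probabilities(readings):
    print(value, probability)
```

With m/n, the largest value receives a probability of exactly 1, which cannot be placed on a probability scale; m/(n + 1) is a common alternative that avoids this.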
Return Period Example
The data below are from a wastewater treatment plant. BOD (biochemical oxygen demand) is a
measure of organic pollution in water. The BOD is measured daily.
Do these data fit the normal distribution? Can they be used to calculate
the mean and standard deviation? What is the worst quality expected
in 30 days?
First, rank the data:
Now plot the data. We will plot m/n (which is the probability) versus the BOD.
The data fit the normal distribution fairly well. The mean is about 35 mg/L BOD.
To find the worst quality in a 30-day period, calculate 29/30 = 0.967. This is the fraction of days on which the quality is better than on the worst day out of 30 days.
Enter the graph at 0.967 and find the answer: 67 mg/L BOD.
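The 29/30 step can be written out numerically. The sketch below also back-calculates the standard deviation implied by the two graph readings (35 and 67 mg/L), assuming the straight-line normal fit holds. That implied value is an inference from the readings, not a figure given in the slides.

```python
from statistics import NormalDist

period_days = 30
# Fraction of days with better quality than the worst day in the period.
non_exceedance = (period_days - 1) / period_days

# Number of standard deviations above the mean at that cumulative fraction.
z = NormalDist().inv_cdf(non_exceedance)

mean_bod = 35.0   # mg/L, read from the plot at 0.5
worst_bod = 67.0  # mg/L, read from the plot at 0.967

# Assumption: the data are normal, so worst = mean + z * s.
implied_s = (worst_bod - mean_bod) / z

print(round(non_exceedance, 3), round(z, 2), round(implied_s, 1))
```

The same calculation works for any other return period by changing `period_days`.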
Sometimes data are analyzed after they are grouped. Often the
mean is used to analyze the data.
Example:
Using the data from the previous problem, estimate the highest
BOD expected to occur once every 30 days using grouped data analysis.
First define groups of BOD values.
Now plot these data.
Notice how the data points form a curve. This means the data
don’t really fit the normal distribution, but we’ll go ahead anyway.
Now P = 29/30 = 0.967, and we read 67 mg/L BOD from the graph.
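The grouping step can be sketched as counting readings per group and accumulating the fraction at or below each group's upper bound. The group width, the function name, and the ten readings below are all hypothetical; the groups used in the original example are not shown in this transcript.

```python
def grouped_cumulative(values, group_width):
    """Return (group upper bound, cumulative fraction) pairs."""
    n = len(values)
    counts = {}
    for value in values:
        # A value on a boundary is placed in the group that starts there.
        upper = (int(value // group_width) + 1) * group_width
        counts[upper] = counts.get(upper, 0) + 1

    running = 0
    result = []
    for upper in sorted(counts):
        running += counts[upper]
        result.append((upper, running / n))
    return result


# Hypothetical daily BOD readings in mg/L, used only for illustration.
bod = [12, 18, 25, 27, 33, 35, 38, 41, 47, 62]
for upper, fraction in grouped_cumulative(bod, 10):
    print(f"below {upper} mg/L: {fraction:.1f}")
```

Each pair gives one point to plot on probability paper, with the group bound on the x-axis and the cumulative fraction on the y-axis.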