Probability distribution functions
• Normal distribution
• Lognormal distribution
• Mean, median and mode
• Tails
• Extreme value distributions
Normal (Gaussian) distribution
• Probability density function (PDF)
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$
• What does the figure tell us about the cumulative distribution function (CDF)?
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
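The PDF and CDF above can be evaluated directly. This is a minimal sketch using only Python's standard library. It writes the CDF with the error function and checks it against a plain trapezoid-rule integral of f(t). The function names and the choice to cut the lower limit at 10 standard deviations below the mean (standing in for minus infinity) are illustrative assumptions.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density f(x) of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """F(x) = P(X <= x), expressed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def trapezoid_cdf(x, mu=0.0, sigma=1.0, lower_sigmas=10.0, steps=20000):
    """Integrate f(t) from mu - lower_sigmas*sigma (a stand-in for -inf) up to x."""
    a = mu - lower_sigmas * sigma
    h = (x - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(x, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

print(normal_pdf(0.0))                       # peak height 1/sqrt(2*pi)
print(normal_cdf(0.0))                       # half the probability lies below the mean
print(normal_cdf(1.0), trapezoid_cdf(1.0))   # erf form vs. direct integral of f
```

The CDF is the area under the PDF to the left of x, so it rises most steeply where the PDF peaks and flattens out in both tails.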
More on the normal distribution
• The normal distribution is denoted N(μ, σ²), with the square giving the variance.
• If X is normal, Y=aX+b is also normal. What would be
the mean and standard deviation of Y?
• Similarly, if X and Y are jointly normal (for example, independent normal variables), any linear combination aX + bY is also normal.
• A function of a normal random variable can often be approximated as normal by using a linear Taylor expansion.
• Example: X = N(10, 0.5²) and Y = X². Then X² ≈ 100 + 20(X − 10), so Y is approximately N(100, 10²).
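Both claims can be checked by simulation. In this sketch the coefficients a = −3 and b = 2, the seed, and the sample size are arbitrary choices for illustration. For Y = aX + b the expected mean is aμ + b and the standard deviation is |a|σ. For Y = X² the linearization X² ≈ μ² + 2μ(X − μ) predicts mean 100 and standard deviation 20 × 0.5 = 10, while the exact mean is μ² + σ² = 100.25.

```python
import random
import statistics

rng = random.Random(0)          # fixed seed so the run is repeatable
mu, sigma = 10.0, 0.5           # X ~ N(10, 0.5^2), as in the example
xs = [rng.gauss(mu, sigma) for _ in range(200_000)]

# Linear transformation Y = aX + b: mean a*mu + b, standard deviation |a|*sigma
a, b = -3.0, 2.0
ys = [a * x + b for x in xs]
print(statistics.fmean(ys), statistics.stdev(ys))   # close to -28 and 1.5

# Nonlinear Y = X^2, linearized about mu: X^2 ~ mu^2 + 2*mu*(X - mu)
approx_mean = mu ** 2                 # 100
approx_std = abs(2 * mu) * sigma      # 20 * 0.5 = 10
sq = [x * x for x in xs]
print(approx_mean, approx_std)
print(statistics.fmean(sq), statistics.stdev(sq))   # Monte Carlo: roughly 100.25 and 10.0
```

The linear approximation works here because σ is small relative to μ, so X² is nearly linear over the range where X has most of its probability.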
Estimating mean and standard deviation
• Given a sample from a normally distributed variable, the
sample mean is the best linear unbiased estimator
(BLUE) of the true mean.
• For the variance, the following equation gives the best unbiased estimator, but its square root is not an unbiased estimate of the standard deviation:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• For example, for a sample of size 5 from a standard normal distribution, the estimated standard deviation averages 0.94 (with a standard deviation of 0.34).
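The 0.94 figure can be reproduced both exactly and by simulation. The exact mean of s for n normal samples (in units of σ) is the bias constant c4(n) = √(2/(n−1)) Γ(n/2)/Γ((n−1)/2). This sketch compares it with the average of many simulated estimates. The helper name `c4`, the seed, and the number of trials are illustrative choices.

```python
import math
import random
import statistics

def c4(n: int) -> float:
    """Expected value of the sample standard deviation s for n normal samples, in units of sigma."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

n = 5
rng = random.Random(1)
estimates = []
for _ in range(100_000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(sample) / n                               # sample mean
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # unbiased variance estimate
    estimates.append(math.sqrt(s2))                      # its square root is biased low

print(c4(n))                                             # exact mean of s for n = 5
print(statistics.fmean(estimates), statistics.stdev(estimates))
print(statistics.fmean(e * e for e in estimates))        # s^2 itself averages to about 1
```

The last line shows the distinction drawn above: s² is unbiased for σ² = 1, but taking the square root pulls the average of s below 1, because the square root is concave.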
Lognormal distribution
• If ln(X) has a normal distribution, X has a lognormal distribution. That is, if X is normally distributed, exp(X) is lognormally distributed.
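• Notation: lnN(μ, σ²)
• PDF
$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right]$$
• Mean and variance
$$\mu_X = \exp\left(\mu + \sigma^2/2\right), \qquad \sigma_X^2 = \mathrm{Var}(X) = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}$$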
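The mean and variance formulas can be checked by exponentiating normal samples, which is exactly the construction in the definition above. In this sketch the parameters μ = 0, σ = 0.5, the seed, and the helper name `lognormal_mean_var` are illustrative assumptions. The standard library's `random.lognormvariate(mu, sigma)` would generate the same distribution.

```python
import math
import random
import statistics

def lognormal_mean_var(mu, sigma):
    """Closed-form mean and variance of lnN(mu, sigma^2)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    var = (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)
    return mean, var

mu, sigma = 0.0, 0.5
rng = random.Random(2)
# exp of a normal variate is lognormal, as in the definition
xs = [math.exp(rng.gauss(mu, sigma)) for _ in range(200_000)]

print(lognormal_mean_var(mu, sigma))
print(statistics.fmean(xs), statistics.variance(xs))
```

Note that the mean exceeds e^μ: exponentiation stretches the upper side of the normal distribution more than the lower side.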
Mean, mode and median
• Mode (highest point) = exp(μ − σ²)
• Median (50% of samples) = e^μ
• Mean = exp(μ + σ²/2), so mode < median < mean whenever σ > 0.
• Figure for μ = 0.
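The three location measures can be verified numerically. The median check uses the fact that P(X ≤ e^μ) = P(ln X ≤ μ) = 0.5. The mode check locates the PDF's highest point by a crude grid search. The values μ = 0, σ = 0.5 and the grid spacing of 0.001 are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def lognormal_pdf(x, mu, sigma):
    """Density of lnN(mu, sigma^2) for x > 0."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))

mu, sigma = 0.0, 0.5
mode = math.exp(mu - sigma ** 2)
median = math.exp(mu)
mean = math.exp(mu + sigma ** 2 / 2)
print(mode, median, mean)

# Median check: P(X <= e^mu) = P(ln X <= mu) = 0.5
print(NormalDist(mu, sigma).cdf(math.log(median)))

# Mode check: grid search for the highest point of the PDF on (0, 5)
grid = [0.001 * i for i in range(1, 5000)]
peak = max(grid, key=lambda x: lognormal_pdf(x, mu, sigma))
print(peak)   # should sit next to exp(mu - sigma^2)
```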
Light and heavy tails
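• The normal distribution has a light tail: 4.5 sigma (one-sided) is equivalent to a 3.4e-6 failure or defect probability.
• The lognormal distribution can have a heavy tail: the probability of exceeding the mean by more than 4.5 standard deviations is 7.5e-4 for μ = 0, σ = 0.25, and 0.0075 for μ = 0, σ = 1.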
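These tail probabilities follow from the normal CDF. This sketch reads the lognormal numbers as P(X > mean + 4.5 × standard deviation) and uses the lognormal mean and variance formulas given earlier. The helper name `lognormal_tail` is hypothetical.

```python
import math
from statistics import NormalDist

# Normal: probability of landing more than 4.5 standard deviations above the mean
p_normal = NormalDist().cdf(-4.5)   # by symmetry, equal to 1 - cdf(4.5) without cancellation

def lognormal_tail(mu, sigma, k=4.5):
    """P(X > mean + k*std) for X ~ lnN(mu, sigma^2)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    std = math.sqrt((math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2))
    threshold = mean + k * std
    # X > t  <=>  ln X > ln t, and ln X ~ N(mu, sigma^2)
    return 1 - NormalDist(mu, sigma).cdf(math.log(threshold))

print(p_normal, lognormal_tail(0.0, 0.25), lognormal_tail(0.0, 1.0))
```

The same 4.5-sigma margin that is very safe for a normal variable leaves a probability more than 2000 times larger for lnN(0, 1²).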
Fitting distribution to data
• Usually, the CDF is fitted by minimizing the maximum distance between the fitted and empirical CDFs (the Kolmogorov-Smirnov criterion).
• Generated 20 points from N(3, 1²).
• Normal fit: N(3.48, 0.93²)
• Lognormal fit: lnN(1.24, 0.26²), with almost the same mean and standard deviation.
[Figure: CDF vs. x for the experimental data, the lognormal fit and the normal fit]
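A minimal sketch of minimum-distance fitting with the Kolmogorov-Smirnov statistic D = max |F_empirical − F_model|. The crude grid search around the sample mean and standard deviation is an assumption made for illustration, not necessarily the optimizer used for the fits above. The function names, grid ranges, and seed are likewise hypothetical. Because the grid includes the starting point, the fitted D can never be worse than the moment-based one.

```python
import random
import statistics
from statistics import NormalDist

def ks_distance(data, cdf):
    """Largest vertical gap between the empirical CDF of `data` and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # the empirical CDF jumps from i/n to (i+1)/n at x, so check both sides of the step
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def fit_normal_ks(data, half_steps=20):
    """Grid search for the N(mu, sigma^2) minimizing the K-S distance (illustrative only)."""
    m0, s0 = statistics.fmean(data), statistics.stdev(data)
    best = (ks_distance(data, NormalDist(m0, s0).cdf), m0, s0)
    for i in range(-half_steps, half_steps + 1):
        for j in range(-half_steps, half_steps + 1):
            mu = m0 + 0.025 * i * s0          # mu within +/- 0.5 s0 of the sample mean
            sigma = s0 * (1 + 0.0125 * j)     # sigma within 0.75..1.25 of the sample std
            d = ks_distance(data, NormalDist(mu, sigma).cdf)
            if d < best[0]:
                best = (d, mu, sigma)
    return best

rng = random.Random(3)
data = [rng.gauss(3.0, 1.0) for _ in range(20)]   # 20 points from N(3, 1^2)
d_moments = ks_distance(data, NormalDist(statistics.fmean(data), statistics.stdev(data)).cdf)
d_fit, mu_fit, sigma_fit = fit_normal_ks(data)
print(d_moments, d_fit, mu_fit, sigma_fit)
```

Because D is unchanged when both the data and the model CDF are transformed by the same increasing function, fitting lnN(μ, σ²) this way is equivalent to running `fit_normal_ks` on the logarithms of the data.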
Extreme value distributions
• For any distribution with finite variance, the mean of a sample tends to be normally distributed as the sample size increases (with what mean and standard deviation?)
• Similarly, the minimum (or maximum) of a sample tends, as the sample size increases, toward a limiting distribution of its own.
• Even though there are infinitely many distributions, there are only three extreme value distributions.
– Type I (Gumbel), e.g. the limit for the maximum of normal samples.
– Type II (Fréchet), e.g. maximum daily rainfall.
– Type III (Weibull), e.g. weakest-link failure.
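The central-limit statement can be illustrated with a clearly non-normal parent distribution. This sketch uses the exponential distribution with rate 1, which has mean 1 and standard deviation 1. The mean of n draws should then center on 1 with standard deviation 1/√n. The sample size n = 30, the number of trials, and the seed are illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(4)
n = 30          # sample size
trials = 20_000
# Exponential with rate 1: mean 1, standard deviation 1, strongly skewed
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

print(statistics.fmean(means))            # close to the population mean, 1
print(statistics.stdev(means), 1 / math.sqrt(n))   # close to sigma / sqrt(n)
```

A histogram of `means` would look close to a bell curve even though a histogram of single exponential draws is sharply skewed.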
Maximum of normal samples
For the normal distribution, the maximum of a sample is more narrowly distributed than the original distribution.
[Figure: histograms of the maximum of 10 standard normal samples (mean 1.54, standard deviation 0.59) and of the maximum of 100 standard normal samples (mean 2.50, standard deviation 0.43)]
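The histogram statistics can be reproduced by simulation. This sketch takes the maximum of n standard normal draws many times for n = 10 and n = 100. The helper name `max_of_normals`, the 10,000 trials, and the seed are illustrative assumptions.

```python
import random
import statistics

def max_of_normals(n, trials, rng):
    """Maximum of n standard normal draws, repeated `trials` times."""
    return [max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)]

rng = random.Random(5)
results = {n: max_of_normals(n, 10_000, rng) for n in (10, 100)}
for n, maxima in results.items():
    print(n, round(statistics.fmean(maxima), 2), round(statistics.stdev(maxima), 2))
```

Both standard deviations are below 1, the standard deviation of a single draw, and the spread keeps shrinking as n grows while the center moves right.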
Gumbel distribution
• PDF and CDF, with z = (x − μ)/β:
$$\mathrm{PDF} = \frac{1}{\beta}\exp\left[-\left(z + e^{-z}\right)\right], \qquad \mathrm{CDF} = \exp\left(-e^{-z}\right)$$
• Mean, median, mode and variance
$$\text{Mean} = \mu + \gamma\beta, \quad \text{Median} = \mu - \beta\ln(\ln 2), \quad \text{Mode} = \mu, \quad \text{Variance} = \frac{\pi^2}{6}\beta^2$$
where γ ≈ 0.5772 is the Euler-Mascheroni constant.
[Figure: CDFs of fitted EV1 (Gumbel) distributions ("fitted ev1") against the negated maxima, "-max10 data" and "-max100 data"]
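The Gumbel formulas above can be coded directly. A simple fit then inverts the mean and variance expressions (method of moments: β = s√6/π, μ = x̄ − γβ). The method of moments is a substitute chosen for simplicity, not necessarily how the "fitted ev1" curves were obtained. The sketch checks the fit on synthetic Gumbel data generated as μ − β ln E with E exponential. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def gumbel_cdf(x, mu, beta):
    return math.exp(-math.exp(-(x - mu) / beta))

def gumbel_pdf(x, mu, beta):
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta

def gumbel_summary(mu, beta):
    return {
        "mean": mu + EULER_GAMMA * beta,
        "median": mu - beta * math.log(math.log(2)),
        "mode": mu,
        "variance": (math.pi ** 2 / 6) * beta ** 2,
    }

def fit_gumbel_moments(data):
    """Method of moments: beta from the standard deviation, then mu from the mean."""
    beta = statistics.stdev(data) * math.sqrt(6) / math.pi
    mu = statistics.fmean(data) - EULER_GAMMA * beta
    return mu, beta

# Synthetic check: if E ~ Exp(1), then mu - beta*ln(E) ~ Gumbel(mu, beta)
# (expovariate returning exactly 0.0 is possible in principle but has probability ~2**-53)
rng = random.Random(6)
true_mu, true_beta = 2.0, 0.5
sample = [true_mu - true_beta * math.log(rng.expovariate(1.0)) for _ in range(50_000)]
print(gumbel_summary(true_mu, true_beta))
print(fit_gumbel_moments(sample))
```

Applied to simulated maxima of normal samples, the same fit gives a Gumbel approximation to the histograms of the previous slide.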
Weibull distribution
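• Probability density function
$$f(x;\lambda,k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \qquad x \ge 0,\; k > 0,\; \lambda > 0$$
• Its logarithm has a Gumbel distribution (the form for minima, with location ln λ and scale 1/k).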
• Used to describe the distribution of strength or fatigue life in brittle materials.
• If it describes time to failure, then
– k < 1 indicates that the failure rate decreases with time,
– k = 1 indicates a constant rate,
– k > 1 indicates an increasing rate.
• A third (location) parameter can be added by replacing x with x − c.
[Figure: CDF of the logarithm of Weibull samples ("log weibull") with a fitted EV1 distribution ("ev1 fit")]
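The failure-rate cases follow from the hazard function h(x) = f(x)/(1 − F(x)) = (k/λ)(x/λ)^(k−1), which decreases, stays constant, or increases in x according to whether k < 1, k = 1, or k > 1. This sketch also checks the log-Gumbel link: because ln X is Gumbel for minima with location ln λ, P(ln X ≤ ln λ) = 1 − e^(−1) for every k. It uses the standard library's `random.weibullvariate(alpha, beta)`, where alpha is the scale λ and beta is the shape k. The parameter values, seed, and helper names are illustrative.

```python
import math
import random

def weibull_pdf(x, lam, k):
    """f(x; lambda, k) for x >= 0."""
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def weibull_hazard(x, lam, k):
    """Failure rate h(x) = f(x) / (1 - F(x)) = (k / lambda) * (x / lambda)^(k - 1)."""
    return (k / lam) * (x / lam) ** (k - 1)

for k in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, 1.0, k) for t in (0.5, 1.0, 2.0)]
    print(k, [round(r, 3) for r in rates])   # decreasing, constant, increasing

# Log of Weibull samples: P(ln X <= ln lambda) = 1 - exp(-1) regardless of k
rng = random.Random(7)
lam, k = 2.0, 1.5
logs = [math.log(rng.weibullvariate(lam, k)) for _ in range(100_000)]
frac = sum(y <= math.log(lam) for y in logs) / len(logs)
print(frac, 1 - math.exp(-1))
```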
Exercises
• Find how many samples of normally distributed numbers
you need in order to estimate the mean and standard
deviation with an error that will be less than 10% of the
true standard deviation most of the time.
• Both the lognormal and Weibull distributions are used to
model strength. Find how closely you can approximate data
generated from a standard lognormal distribution by fitting
it with Weibull.
• Take the introduction and preamble of the US Declaration
of Independence, and fit the distribution of word lengths
using the K-S criterion. What distribution fits best?
Compare the graphs of the CDFs. Compare to a more
contemporary text.