Probability distribution functions
• Normal distribution
• Lognormal distribution
• Mean, median and mode
• Tails
• Extreme value distributions
Normal (Gaussian) distribution
• Normal density function
$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$
• What does the figure tell us about the values of the
CDF?
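The density formula can be coded directly, and the CDF at a point is the area under the density to its left. A minimal sketch, assuming Python's standard library in place of MATLAB; `normal_pdf` and `cdf_by_trapezoid` are hypothetical helper names, and starting the integral 10 sigma below the mean is an assumption that makes the ignored left tail negligible:

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f_X(x) = exp(-((x - mu)/sigma)^2 / 2) / (sigma * sqrt(2*pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def cdf_by_trapezoid(x, mu=0.0, sigma=1.0, steps=20000):
    """Approximate the CDF as the area under the density, starting 10 sigma below the mean."""
    lower = mu - 10.0 * sigma
    h = (x - lower) / steps
    total = 0.5 * (normal_pdf(lower, mu, sigma) + normal_pdf(x, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lower + i * h, mu, sigma)
    return total * h

print(f"{normal_pdf(0.0):.4f}")  # 0.3989, the peak value 1/sqrt(2*pi)
print(f"{cdf_by_trapezoid(1.0):.4f} vs {NormalDist().cdf(1.0):.4f}")  # 0.8413 vs 0.8413
```

The trapezoid rule is used only because it is short; the library CDF is the reference value.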
More on the normal distribution
• P = normcdf(X,MU,SIGMA) returns the cdf of the normal
distribution with mean MU and standard deviation SIGMA,
evaluated at the values in X. The size of P is the common size
of X, MU and SIGMA.
• normcdf(1)=0.8413.
• 1-normcdf(6)= 9.8659e-010
• If X is normally distributed, Y=aX+b is also normally
distributed. What would be the mean and standard deviation
of Y?
• Notation: $N(\mu, \sigma^2)$
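A rough Python counterpart of the `normcdf` values above, written with the error function. This is a sketch under the assumption that Python's standard library stands in for MATLAB; `normcdf` and `normal_upper_tail` here are hypothetical helpers, not MATLAB functions. The tail helper uses `erfc` so that a probability of order 1e-9 is not lost to round-off in `1 - cdf`. The last lines use `statistics.NormalDist`, which supports multiplying and adding constants, to illustrate the standard result for the question above: Y = aX + b has mean aμ + b and standard deviation |a|σ.

```python
import math
from statistics import NormalDist

def normcdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_upper_tail(x, mu=0.0, sigma=1.0):
    """1 - CDF, computed with erfc to avoid cancellation far in the tail."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

print(f"{normcdf(1.0):.4f}")            # 0.8413
print(f"{normal_upper_tail(6.0):.4e}")  # 9.8659e-10

# Linear transformation: X ~ N(2, 0.5^2), Y = -3 X + 1
X = NormalDist(mu=2.0, sigma=0.5)
a, b = -3.0, 1.0
Y = X * a + b  # NormalDist scales and shifts by constants
print(Y.mean, Y.stdev)  # -5.0 1.5
```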
Estimating mean and standard deviation
• Given a sample from a normally distributed variable,
the sample mean is the best linear unbiased
estimator of the true mean.
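• For the variance, the formula below (with n − 1 in the denominator) gives the best unbiased estimator, but its square root is not an unbiased estimator of the standard deviation:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
• MATLAB check with 10000 samples of size 5:
x=randn(5,10000); s=std(x);
mean(s)             % 0.9463
s2=s.^2; mean(s2)   % 1.0106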
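The same experiment can be sketched in Python (an assumption; the slide uses MATLAB), next to the theoretical bias factor E[s]/σ, usually written c4(n) = sqrt(2/(n−1)) Γ(n/2) / Γ((n−1)/2), which is about 0.94 for n = 5. The seed and trial count are arbitrary choices, so the printed sample values will differ slightly from the slide's 0.9463 and 1.0106.

```python
import math
import random
import statistics

random.seed(1)
n, trials = 5, 10000
s_values, s2_values = [], []
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    s2 = statistics.variance(sample)  # divides by n - 1, like MATLAB's std/var
    s2_values.append(s2)
    s_values.append(math.sqrt(s2))

mean_s = statistics.fmean(s_values)
mean_s2 = statistics.fmean(s2_values)

# Theoretical E[s] / sigma for normal samples of size n
c4 = math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

print(f"mean(s)  = {mean_s:.4f}  (theory c4 = {c4:.4f})")
print(f"mean(s2) = {mean_s2:.4f}  (theory 1)")
```

The averaged s2 sits near 1 while the averaged s sits near 0.94: squaring is nonlinear, so unbiasedness of s² does not carry over to s.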
Lognormal distribution
• If ln(X) is normally distributed, then X is lognormally distributed. Equivalently, if X is normally distributed, exp(X) is lognormally distributed.
• Notation: $\ln N(\mu, \sigma^2)$
• Probability density function (PDF)
$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \quad x > 0$
• Mean and variance
$\mu_X = \exp\left(\mu + \sigma^2/2\right), \qquad \sigma_X^2 = \mathrm{Var}(X) = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}$
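A quick numerical check of the mean and variance formulas, assuming Python's standard library in place of MATLAB. It builds lognormal draws exactly as the definition says, by exponentiating normal draws; `lognormal_mean_var` is a hypothetical helper name, and the seed and sample size are arbitrary.

```python
import math
import random
import statistics

def lognormal_mean_var(mu, sigma):
    """Mean and variance of X when ln X ~ N(mu, sigma^2)."""
    mean = math.exp(mu + sigma**2 / 2)
    var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
    return mean, var

mu, sigma = 0.0, 1.0
m, v = lognormal_mean_var(mu, sigma)
print(f"formula: mean = {m:.4f}, var = {v:.4f}")  # mean = 1.6487, var = 4.6708

random.seed(2)
# exp() of normal draws gives lognormal draws
xs = [math.exp(random.gauss(mu, sigma)) for _ in range(200000)]
sample_mean = statistics.fmean(xs)
print(f"sample:  mean = {sample_mean:.4f}, var = {statistics.variance(xs):.4f}")
```

The sample mean settles quickly, but the sample variance wanders noticeably from run to run: rare very large values dominate it, a first hint of the heavy tail.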
Mean, mode and median
• Mode (highest point): $\exp(\mu - \sigma^2)$
• Median (50% of samples): $e^{\mu}$
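For ln N(0, 1) these give mode e^(-1) ≈ 0.368 and median 1, while the mean is e^(1/2) ≈ 1.649, so mode < median < mean, as expected for a right-skewed distribution. A brute-force sketch in Python (an assumption; the grid spacing is arbitrary) confirms that the density peaks at the mode and that the CDF, Φ((ln x − μ)/σ), equals 0.5 at the median:

```python
import math
from statistics import NormalDist

mu, sigma = 0.0, 1.0

def lognormal_pdf(x):
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma**2)) / (x * sigma * math.sqrt(2 * math.pi))

def lognormal_cdf(x):
    # P(X <= x) = P(ln X <= ln x) = Phi((ln x - mu) / sigma)
    return NormalDist(mu, sigma).cdf(math.log(x))

mode = math.exp(mu - sigma**2)
median = math.exp(mu)
mean = math.exp(mu + sigma**2 / 2)

# Locate the density peak on a fine grid over (0, 5)
grid = [0.001 * i for i in range(1, 5000)]
peak = max(grid, key=lognormal_pdf)

print(f"mode = {mode:.4f} (grid peak {peak:.3f}), median = {median:.4f}, mean = {mean:.4f}")
print(f"CDF at median = {lognormal_cdf(median):.4f}")  # 0.5000
```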
Light and heavy tails
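• The normal distribution has a light tail: six sigma corresponds to 0.999999999 (nine nines) safety, i.e. the probability that a value stays below the mean plus six standard deviations.
• The lognormal distribution is heavy-tailed: for $\ln N(0, 1)$, the same mean-plus-six-sigma level gives only 0.9963: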
m=exp(0.5)
m =1.6487
v=exp(1)*(exp(1)-1)
v =4.6708
sig=sqrt(v)
sig =2.1612
sig6=m+6*sig
sig6 =14.6159
logncdf(sig6,0,1) =0.9963
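The same comparison as a Python sketch (an assumption; it uses the identity logncdf(x, 0, 1) = Φ(ln x) instead of a lognormal CDF routine):

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

# Normal: probability of staying below mean + 6 standard deviations
normal_safety = std_normal.cdf(6.0)

# Lognormal ln N(0, 1): the same "mean + 6 sigma" level, using its own mean and std
m = math.exp(0.5)
v = math.exp(1) * (math.exp(1) - 1)
sig = math.sqrt(v)
sig6 = m + 6 * sig
lognormal_safety = std_normal.cdf(math.log(sig6))  # P(X <= sig6) = Phi(ln sig6) for mu=0, sigma=1

print(f"normal tail beyond 6 sigma: {1 - normal_safety:.2e}")
print(f"lognormal: sig6 = {sig6:.4f}, cdf = {lognormal_safety:.4f}")  # sig6 = 14.6159, cdf = 0.9963
```

The lognormal exceedance probability (about 3.7e-3) is more than six orders of magnitude larger than the normal one, which is why a sigma-based safety margin calibrated on the normal distribution can badly overstate safety for heavy-tailed data.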
Fitting distribution to data
• Distributions are typically fitted to the empirical CDF of the data.
Empirical CDF
[F,X] = ecdf(Y) calculates the Kaplan-Meier estimate of the
cumulative distribution function (cdf), also known as the empirical
cdf. Y is a vector of data values. F is a vector of values of the
empirical cdf evaluated at X.
[F,X,FLO,FUP] = ecdf(Y) also returns lower and upper confidence
bounds for the cdf. These bounds are calculated using Greenwood's
formula, and are not simultaneous confidence bounds.
ecdf(...) without output arguments produces a plot of the empirical
cdf. Use the data cursor to read precise values from the plot.
Example
x=lognrnd(0,1,1,20); ecdf(x)
hold on
x=lognrnd(0,1,1,10000); ecdf(x)
[Figure: empirical CDF F(x) versus x for the 20-sample and 10,000-sample lognormal data]
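For uncensored data the Kaplan-Meier estimate reduces to the plain empirical CDF: sort the data and assign the value i/n to the i-th smallest point. A minimal Python sketch (an assumption; it omits MATLAB's Greenwood confidence bounds, and `ecdf`/`logncdf` here are hypothetical helpers, not MATLAB's) measures how far the empirical CDF strays from the true ln N(0, 1) CDF for 20 and for 10,000 samples:

```python
import math
import random
from statistics import NormalDist

def ecdf(data):
    """Empirical CDF: sorted data and the fraction of observations <= each value."""
    xs = sorted(data)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def logncdf(x, mu=0.0, sigma=1.0):
    """Lognormal CDF via ln X ~ N(mu, sigma^2)."""
    return NormalDist(mu, sigma).cdf(math.log(x))

random.seed(3)
gaps = {}
for n in (20, 10000):
    sample = [random.lognormvariate(0.0, 1.0) for _ in range(n)]
    xs, F = ecdf(sample)
    # Largest vertical distance between the step function and the true CDF,
    # checked just after (F) and just before (F - 1/n) each jump
    gaps[n] = max(max(abs(Fi - logncdf(x)), abs(Fi - 1 / n - logncdf(x)))
                  for x, Fi in zip(xs, F))
    print(f"n = {n:5d}: max |ECDF - CDF| = {gaps[n]:.3f}")
```

The maximum gap (the Kolmogorov-Smirnov distance) shrinks roughly like 1/sqrt(n), which is why the 10,000-sample curve is a much better target for fitting than the 20-sample one.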
Extreme value distributions
• No matter what distribution (with finite variance) you sample from, the sample mean tends to be normally distributed as the sample size increases (with what mean and standard deviation?)
• Similarly, the minimum (or maximum) of a sample tends, as the sample size increases, to follow one of a small family of limiting distributions.
• Even though there are infinitely many distributions, there are only three types of extreme value distribution:
– Type I (Gumbel), e.g. extremes of normally distributed samples
– Type II (Fréchet), e.g. maximum daily rainfall
– Type III (Weibull), e.g. weakest-link failure
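For the first bullet, the standard result for independent samples of size n is that the sample mean has mean μ and standard deviation σ/√n. A Python sketch (an assumption; exponential samples, n = 30 and the seed are arbitrary choices) starts from a strongly skewed distribution with μ = σ = 1 and checks both numbers, plus the roughly 95% coverage expected of a normal distribution within ±1.96 standard deviations:

```python
import math
import random
import statistics

random.seed(4)
n, trials = 30, 20000
# Exponential(1) samples: strongly skewed, with mean 1 and standard deviation 1
means = [statistics.fmean([random.expovariate(1.0) for _ in range(n)]) for _ in range(trials)]

center = statistics.fmean(means)
spread = statistics.stdev(means)
# Fraction of sample means within 1.96 standard errors of 1 (about 0.95 if nearly normal)
inside = sum(abs(m - 1.0) <= 1.96 / math.sqrt(n) for m in means) / trials

print(f"mean of sample means: {center:.3f} (theory 1)")
print(f"std of sample means:  {spread:.3f} (theory 1/sqrt(n) = {1 / math.sqrt(n):.3f})")
print(f"fraction within 1.96/sqrt(n): {inside:.3f}")
```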
Example
x=5-0.3*randn(10,1000); minx=min(x); hist(minx); ecdf(minx)
[Figure: histogram of minx and its empirical CDF F(x), for x between about 3.6 and 5]
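The minimum of k independent values with CDF F has the exact CDF 1 − (1 − F(x))^k, so the MATLAB example can be checked directly. A Python sketch of the same setup (an assumption: 1000 minima of 10 values each from N(5, 0.3²); `min_cdf` is a hypothetical helper name) compares the empirical fraction of minima below a few thresholds with that formula:

```python
import random
from statistics import NormalDist

random.seed(5)
mu, sigma, k, trials = 5.0, 0.3, 10, 1000
# Same setup as the MATLAB example: minimum of each group of 10 values from N(5, 0.3^2)
minx = [min(random.gauss(mu, sigma) for _ in range(k)) for _ in range(trials)]

parent = NormalDist(mu, sigma)

def min_cdf(x):
    """Exact CDF of the minimum of k i.i.d. values: 1 - (1 - F(x))^k."""
    return 1.0 - (1.0 - parent.cdf(x)) ** k

comparison = {}
for x0 in (4.2, 4.4, 4.6, 4.8):
    empirical = sum(m <= x0 for m in minx) / trials
    comparison[x0] = (empirical, min_cdf(x0))
    print(f"x = {x0}: empirical {empirical:.3f}, exact {min_cdf(x0):.3f}")
```

The minima are skewed to the left, so they are not normal; as k grows, minima of normal samples approach the Type I (Gumbel) form for minima, although the convergence is slow.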
Gumbel distribution
• PDF and CDF
$\mathrm{PDF} = \frac{1}{\beta}\exp\left[-\left(z + e^{-z}\right)\right], \quad z = \frac{x - \mu}{\beta}$
$\mathrm{CDF} = \exp\left(-e^{-z}\right)$
• Mean, median, mode and variance
Mean $= \mu + \beta\gamma$
Variance $= \frac{\pi^2}{6}\beta^2$
Median $= \mu - \beta\ln(\ln 2)$
Mode $= \mu$
Euler-Mascheroni constant $\gamma \approx 0.5772$
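Inverting the CDF gives a simple sampler: setting u = exp(−e^(−z)) and solving gives x = μ − β ln(−ln u). A Python sketch (an assumption; μ = 2, β = 0.5 and the seed are arbitrary) uses it to check the median, mode, mean and variance formulas:

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_pdf(x, mu, beta):
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta

def gumbel_cdf(x, mu, beta):
    z = (x - mu) / beta
    return math.exp(-math.exp(-z))

def gumbel_sample(mu, beta):
    """Inverse-CDF sampling: solve u = exp(-exp(-z)) for z."""
    u = random.random()
    while u == 0.0:  # log(0) is undefined; random() can (rarely) return 0.0
        u = random.random()
    return mu - beta * math.log(-math.log(u))

mu, beta = 2.0, 0.5
median = mu - beta * math.log(math.log(2.0))
print(f"CDF at median formula: {gumbel_cdf(median, mu, beta):.6f}")  # 0.500000

random.seed(7)
xs = [gumbel_sample(mu, beta) for _ in range(100000)]
print(f"mean {statistics.fmean(xs):.3f} vs {mu + beta * EULER_GAMMA:.3f}")
print(f"variance {statistics.variance(xs):.3f} vs {math.pi**2 * beta**2 / 6:.3f}")
```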
Weibull distribution
• Probability density function
$f(x; \lambda, k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \quad x \ge 0,\ k > 0,\ \lambda > 0$
• Used to describe the distribution of strength or fatigue life in brittle materials (weakest-link behavior).
• If it describes time to failure, then
– k < 1 indicates that the failure rate decreases with time,
– k = 1 indicates a constant failure rate,
– k > 1 indicates an increasing failure rate.
• Useful for other phenomena, such as wind speed distributions.
• A third (location) parameter can be added by replacing x with x − c.
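The failure-rate statements follow from the hazard function h(x) = f(x) / (1 − F(x)), which for the Weibull distribution simplifies to (k/λ)(x/λ)^(k−1): decreasing for k < 1, constant (1/λ) for k = 1, increasing for k > 1. A Python sketch (an assumption; λ = 2 and the time points are arbitrary, and the pdf helper only handles x > 0 because the density is infinite at 0 when k < 1) tabulates it and checks the sample mean λΓ(1 + 1/k) using `random.weibullvariate`, whose first argument is the scale and second the shape:

```python
import math
import random
import statistics

def weibull_pdf(x, lam, k):
    """f(x; lam, k) = (k/lam) (x/lam)^(k-1) exp(-(x/lam)^k), here only for x > 0."""
    if x <= 0:
        raise ValueError("this sketch only evaluates x > 0")
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def weibull_cdf(x, lam, k):
    return 1.0 - math.exp(-((x / lam) ** k)) if x > 0 else 0.0

def weibull_hazard(x, lam, k):
    """Failure rate h(x) = f(x) / (1 - F(x)), which simplifies to (k/lam) (x/lam)^(k-1)."""
    return (k / lam) * (x / lam) ** (k - 1)

lam = 2.0
times = [0.5, 1.0, 2.0, 4.0]
for k in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, lam, k) for t in times]
    print(f"k = {k}: " + ", ".join(f"{r:.3f}" for r in rates))

random.seed(6)
# random.weibullvariate(alpha, beta): alpha is the scale (lam here), beta the shape (k)
sample = [random.weibullvariate(lam, 2.0) for _ in range(100000)]
print(f"sample mean {statistics.fmean(sample):.3f} vs lam*Gamma(1 + 1/k) = {lam * math.gamma(1.5):.3f}")
```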