Download Probability Density Functions - Problem Solving with Excel and Matlab

Document related concepts
no text concepts found
Transcript
Statistical Inferences
Jake Blanchard
Spring 2010
Uncertainty Analysis for Engineers
1
Introduction
Statistical inference=process of drawing
conclusions from random data
 Conclusions of this process are
“propositions,” for example

◦
◦
◦
◦
◦

Estimates
Confidence intervals
Credible intervals
Rejecting a hypothesis
Clustering data points
Part of this is the estimation of model
parameters
Uncertainty Analysis for Engineers
2
Parameter Estimation

Point Estimation
◦ Calculate single number from a set of
observational data

Interval Estimation
◦ Determine interval within which true
parameter lies (along with confidence level)
Uncertainty Analysis for Engineers
3
Properties
Bias=expected value of estimator does
not necessarily equal parameter
 Consistency=estimator approaches
parameter as n approaches infinity
 Efficiency=smaller variance of parameter
implies higher efficiency
 Sufficient=utilizes all pertinent
information in a sample

Uncertainty Analysis for Engineers
4
Point Estimation
Start with data sample of size N
 Example: estimate fraction of voters who
will vote for particular candidate
(estimate is based on random sample of
voters)
 Other examples: quality control, clinical
trials, software engineering, orbit
prediction
 Assume successive samples are
statistically independent

Uncertainty Analysis for Engineers
5
Estimators
Maximum likelihood
 Method of moments
 Minimum mean squared error
 Bayes estimators
 Cramer-Rao bound
 Maximum a posteriori
 Minimum variance unbiased estimator
 Best linear unbiased estimator
 etc

Uncertainty Analysis for Engineers
6
Maximum Likelihood
Suppose we have a random variable x
with pdf f(x;)
 Take n samples of x
 What is value of  that will maximize the
likelihood of obtaining these n
observations?
 Let L=likelihood of observing this set of
values for x
 Then maximize L with respect to 

Uncertainty Analysis for Engineers
7
Maximum Likelihood
Lx1 , x2 ,...xn ;   f ( x1 ; ) f ( x2 ; )... f ( xn ; )

Lx1 , x2 ,...xn ;   0


log L x1 , x2 ,...xn ;   0

Uncertainty Analysis for Engineers
8
Example
Time between successive arrivals of
vehicles at an intersection are 1.2, 3, 6.3,
10.1, 5.2, 2.4, and 7.2 seconds
 Assume exponential distribution
 Find MLE for 

Uncertainty Analysis for Engineers
9
Solution
f 
1

e t / 
 1 7 
L e
 7 exp   ti 

i 1 
  i 1 
1 7
35.3
log( L)  7 Log ( )   ti  7 Log ( ) 
7
1
tt / 
1

i 1

 log( L)  7 35.3

 2 0



35.3

 5.04
7
Uncertainty Analysis for Engineers
10
2-Parameter Example
Measure cycles to failure of saturated
sand (25, 20, 28, 33, 26 cycles)
 Assume lognormal distribution

Uncertainty Analysis for Engineers
11
Solution
  1  ln( x )    2 
i
 
f 
exp  
2

2  xi
 
 
n
  1  ln( x )    2 
 1
1
1  n 1
i
   exp  2
  
L
exp  

2  xi
2   i 1 xi 
i 1
 2 
 
 2
n
1 n
2
ln( L)  n ln( 2 )  n ln(  )   ln  xi   2  ln( xi )   
2 i 1
i 1
1
 ln( L) 1
 2


2


ln(
x
)


 i

n

i 1
n
 ln( x )     0
i 1
i
 ln( L)  n 1 n
2

 3  ln( xi )     0

  i 1
1 n
   ln( xi )  3.26
n i 1
1 n
   ln( xi )   2  0.027
n i 1
  0.164
2
Uncertainty Analysis for Engineers
12
Method of Moments

Use sample moments (mean, variance,
etc.) to set distribution parameters
Uncertainty Analysis for Engineers
13
Example
Time between successive arrivals of
vehicles at an intersection are 1.2, 3, 6.3,
10.1, 5.2, 2.4, and 7.2 seconds
 Assume exponential distribution
 Mean=5.05

Uncertainty Analysis for Engineers
14
2-Parameter Example
Measure cycles to failure of saturated
sand (25, 20, 28, 33, 26 cycles)
 Assume lognormal distribution
 Mean=26.4
 Standard Deviation=4.72
 Solve for  and 
 =3.26
 =0.177

Uncertainty Analysis for Engineers
15
Solution
  1  ln( x )    2 
i
 
f 
exp  
2

2  xi
 
 
n
  1  ln( x )    2 
 1
1
1  n 1
i
   exp  2
  
L
exp  

2  xi
2   i 1 xi 
i 1
 2 
 
 2
n
1 n
2
ln( L)  n ln( 2 )  n ln(  )   ln  xi   2  ln( xi )   
2 i 1
i 1
1
 ln( L) 1
 2


2


ln(
x
)


 i

n

i 1
n
 ln( x )     0
i 1
i
 ln( L)  n 1 n
2

 3  ln( xi )     0

  i 1
1 n
   ln( xi )  3.26
n i 1
1 n
   ln( xi )   2  0.027
n i 1
  0.164
2
Uncertainty Analysis for Engineers
16
Minimum Mean Square Error
Choose parameters to minimize mean
squared error between measured data
and continuous distribution
 Essentially a curve fit

Uncertainty Analysis for Engineers
17
Approach

Excel
◦ Guess parameters
◦ Calculate sum of squares of errors
◦ Vary guessed parameters to minimize error
(use the Solver)

Matlab
◦ Use fminsearch function
Uncertainty Analysis for Engineers
18
Example

Solar insolation data
◦ Gather data
◦ Form histogram
◦ Normalize histogram by number of samples
and width of bins
Uncertainty Analysis for Engineers
19
Scatter Plot and Histogram
4500
4000
3500
3000
2500
2000
1500
1000
500
0
0
5
10
15
20
2512
30
35
10
8
6
4
2
0
3500
3700
3900
4100
4300
Uncertainty Analysis for Engineers
4500
20
Normal and Weibull Fits
Mean=3980 (fit)
Mean=3915 (data)
0.0035
0.003
0.0025
0.002
0.0015
0.001
0.0005
0
3500
3700
3900
4100
0.00354300
4500
0.003
0.0025
0.002
0.0015
0.001
0.0005
0
3500
3700
3900
4100
4300
Uncertainty Analysis for Engineers
4500
21
Excel Screen Shot
Uncertainty Analysis for Engineers
22
Excel Screen Shot
Uncertainty Analysis for Engineers
23
Solver Set Up
Uncertainty Analysis for Engineers
24
Matlab Script
y=xlsread('matlabfit.xlsx','normal')
[s,t]=hist(y,8);
s=s/((max(t)-min(t))/8)/numel(y);
numpts=numel(t);
zin(1)=mean(t);
zin(2)=std(t);
sumoferrs(zin,t,s)
zout=fminsearch(@(z) sumoferrs(z,t,s), zin)
sumoferrs(zout,t,s)
xplot=t(1):(t(end)-t(1))/(10*numel(t)):t(end);
yplot=curve(xplot,zout);
plot(t,s,'+',xplot,yplot)
Uncertainty Analysis for Engineers
25
Matlab Script
function f=curve(x,z)
mu=z(1);
sig=z(2);
f=normpdf(x,mu,sig);
function f=sumoferrs(z, x, y)
f=sum((curve(x,z)-y).^2);
Uncertainty Analysis for Engineers
26
Sampling Distributions

How do we assess inaccuracy in using
sample mean to estimate population
mean?
1 n
x   xi
n i 1
1 n  1
 x  E   xi   n   
 n i 1  n
2
1 n  1
 n  1
2
Var x   Var   xi   2 Var   xi   2 n 
n
 n i 1  n
 i 1  n

x 


n
Uncertainty Analysis for Engineers
27
Conclusions
Expected value of mean is equal to
population mean
 Mean of sample is unbiased estimator of
mean of population
 Variance of sample mean is sampling error
 By CLT, sample mean is Gaussian for large
n
 Mean of x is N(,/n)
 Estimator for  improves as n increases

Uncertainty Analysis for Engineers
28
Sample Mean with Unknown 
In previous derivation,  is the population
mean
 This is generally not known
 All we have is the sample variance (s2)
 If sample size is small, distribution will
not be Gaussian
 We can use a “student’s t-distribution”

 f  1 / 2  t 2 
1  
fT (t ) 
f 
 f   f / 2 

1
 f 1
2
f=number of
degrees of freedom
Uncertainty Analysis for Engineers
29
Distribution of Sample Variance
1 n
2


s 
x

x
 i
n  1 i 1
2
1
1
n
n
2
2
Es 
E  xi  x   
E   xi     x    
n  1  i 1
 n  1  i 1

 
2
2
2















x



x



x



2
x


x



x


 i
 i
i
n
n
2
i 1
i 1
n
n
  xi     x    x     2xi   
2
i 1
i 1
n


  xi     x   n x     2n  2 xi 
i 1
i 1


n
2
n
n


  xi     x   nx     2n  2n x    xi     nx   
2
i 1
i 1
2
2
1  n
2
2
Es 
 E  xi      nE x    
n  1   i 1


 
2
 
E s2 


1
n 2   2   2
n 1
Uncertainty Analysis for Engineers
30
Conclusions

Sample variance is unbiased estimator of
population variance
 4   n 3
2
 
Var s 
 44 

n 
n 1 
 4  E x   4

For normal variates
n
n


(n  1) s   xi  x    xi     nx   
2
2
i 1
(n  1) s 2
2
Chi-Square
Distribution
with n-1 dof
i 1
 x    x 
  i

 
   / n 
i 1 
n
2
2
2
2
This
approaches
normal
distribution
for large n
Uncertainty Analysis for Engineers
31
Testing Hypotheses
Used to make decisions about population
based on sample
 Steps

◦
◦
◦
◦
Define null and alternative hypotheses
Identify test statistic
Estimate test statistic, based on sample
Specify level of significance
 Type I error: rejecting null hypothesis when it is
true
 Type II error: accepting null hypothesis when it is
false
◦ Define region of rejection (one tail or two?)
Uncertainty Analysis for Engineers
32
Level of Significance

Type I error
◦ Level of significance ()
◦ Typically 1-5%

Type II error () is seldom used
Uncertainty Analysis for Engineers
33
Example
We need yield strength of rebar to be at
least 38 psi
 We order sample of 25 rebars
 Sample mean from 25 tests is 37.5 psi
 Standard deviation of rebar strength =3 psi
 Use one-sided test
 Hypotheses: null-=38; alt.- <38

Uncertainty Analysis for Engineers
34
Solution
x
37.5  38
Z

 0.833

3
n
25
z   1 ( )   1 (0.05)  norminv (0.05,0,1)  1.64
So we cannot reject the null
hypothesis and the supplier
is considered acceptable
Uncertainty Analysis for Engineers
35
Variation of This Example
Suppose standard deviation is not known
 Use student’s t-distribution
 Sample stand. dev. = 3.5 psi

x  37.5 psi
s  3.5 psi
x   37.5  38
T

 0.714

3.5
n
25
f  25  1  24dof
So we cannot reject the null
t  tinv (0.05,24)  1.711
hypothesis and the supplier
is considered acceptable
Uncertainty Analysis for Engineers
36
Third Variation
Sample size increased to 41
 Sample mean=37.6 psi
 Sample standard deviation = 3.75 psi
 Null-variance=9
 Alternative-variance>9
 Use Chi-Square distribution

Uncertainty Analysis for Engineers
37
Solution

n  1s 2 41  13.752
C


2
9
2
 62.5
  0.025
f  40
c0.975  chi 2inv (0.975,40)  59.34
So we reject the null
hypothesis and the supplier
is not acceptable
Uncertainty Analysis for Engineers
38
Confidence Intervals
In addition to mean, standard deviation,
etc., confidence intervals can help us
characterize populations
 For example, the mean gives us a best
estimate of the expected value of the
population, but confidence intervals can
help indicate the accuracy of the mean
 Confidence interval is defined as the
range within which a parameter will lie –
within a prescribed probability

Uncertainty Analysis for Engineers
39
CI of the Mean
First, we’ll assume the variance is known
 The central limit theorem states that the
pdf of the mean of n individual
observations from any distribution with
finite mean and variance approaches a
normal distribution as n approaches
infinity

Uncertainty Analysis for Engineers
40
CI of the Mean
K
x

 N (0,1)
n




x
P K  
 K1   1  
2
 2 

n



 

P x  K 
   x  K1
  1
2
2
n
n

CI   
1

 

  x  K
; x  K1

2
2
n
n

 2
1   2 
K   1 1  
2
K1  
2
1
 Is CDF of standard normal
variate
Uncertainty Analysis for Engineers
41
Example
Measure strength of rebar
 25 samples
 Mean=37.5 psi
 Standard deviation=3 psi
 Find 95% confidence interval for mean

Uncertainty Analysis for Engineers
42
Solution
K  K 0.025   1 0.975  1.96
2
K1  K 0.975   1 0.975  1.96
2
3
3 

 0.95   37.5  1.96
; 37.5  1.96

25
25 

 0.95  36.3; 38.7  psi
So the mean of the strength falls
between 36.3 and 38.7 with a 95%
confidence level
Uncertainty Analysis for Engineers
43
The Script
mu=37.5
sig=3
n=25
alpha=0.05
ka=-norminv(1-alpha/2)
k1ma=-ka
cil=mu+ka*sig/sqrt(n)
ciu=mu-ka*sig/sqrt(n)
Uncertainty Analysis for Engineers
44
Variance Not Known
What if the variance of the population ()
is not known?
 That is, we only know variance of sample.
 Let s=standard deviation of sample
 We can show that x  

s
n

does not conform to a normal
distribution, especially for small n
Uncertainty Analysis for Engineers
45
Variance Not Known

We can show that this quantity follows a
Student’s t-distribution with n-1 degrees
of freedom (f)
1
 f 1
2
2

 f  1 / 2
t 
1  
f t (t ) 
f 
  f 2  




x
P t ,n 1 
 t1 ,n 1   1  
s
2
 2

n



Uncertainty Analysis for Engineers
46
Example
Measure strength of rebar
 25 samples
 Mean=37.5 psi
 s=3.5 psi
 Find 95% confidence interval for mean

Uncertainty Analysis for Engineers
47
Script

Result is 36.06, 38.94
xbar=37.5;
s=3.5;
n=25;
alpha=0.05;
ka=-tinv(1-alpha/2,n-1);
kb=-tinv(alpha/2,n-1);
cil=xbar+ka*s/sqrt(n)
ciu=xbar+kb*s/sqrt(n)
Uncertainty Analysis for Engineers
48
One-Sided Confidence Limit
Sometimes we only care about the upper
or lower bounds
 Lower  )1  x  K1 

 )1  x  t1 ,n 1

Upper
n
s
n


1
 x  K1

1
 x  t1 ,n 1
n
s
n
Uncertainty Analysis for Engineers
49
Example
100 steel specimens – measure strength
 Mean=2200 kgf; s=220 kgf
 Specify 95% confidence limit of mean

Assume =s=220 kgf
 1-=0.95; =0.05

k0.95   1 (0.95)  1.65
 0.95  2200  1.65
220
 2164
100
Manufacturer has
95% confidence
that yield strength
is at least 2164 kgf
Uncertainty Analysis for Engineers
50
Example
Now only 15 steel specimen
 Mean=2200 kgf; s=220 kgf
 Specify 95% confidence limit of mean

t0.95,14  1.761
220
 0.95  2200  1.761
 2100
15
Manufacturer has
95% confidence
that yield strength
is at least 2100 kgf
Uncertainty Analysis for Engineers
51
Confidence Interval of Variance



n  1 s 2
P c ,n 1 
 c1 ,n 1   1  
2
2

 2


2
2


n  1 s
n  1 s 
2

 1 
;
 c1 ,n 1
c ,n 1 
2
2


Uncertainty Analysis for Engineers
52
Example


25 storms, sample variance for measured
runoff is 0.36 in2
Find upper 95% confidence limit for variance


2
1

n  1 s 2

c ,n 1
 0.624 in
2
So, we can say, with 95% confidence, that the
upper bound of the variance of the runoff is
0.624 in2 and the upper bound of the
standard deviation is 0.79 in
Uncertainty Analysis for Engineers
53
Script
var=0.36
n=25
alpha=0.05
c=chi2inv(alpha,n-1)
ci=1/c*var*(n-1)
si=sqrt(ci)
Uncertainty Analysis for Engineers
54
Measurement Theory
Suppose we are measuring distances
 d1, d2, …, dn are measured distances
n
1
 Distance estimate is
d   di

n

Standard error is
i 1
s
d 
n
◦ s=standard deviation of sample
◦ d is the expected value of the mean
d
1
s
s 

  d  t ,n 1
; d  t1 ,n 1

2
2
n
n

Uncertainty Analysis for Engineers
55