Summary Statistics
Jake Blanchard
Spring 2008
Uncertainty Analysis for Engineers
Summarizing and Interpreting Data
It is useful to have some metrics for summarizing statistical data (both input and output).
Three key characteristics are:
◦ Central tendency (mean, median, mode)
◦ Dispersion (variance)
◦ Shape (skewness, kurtosis)
Central Tendency

Mean (discrete and continuous forms):
$E(x) = \sum_{i=1}^{n} x_i p_i$
$E(x) = \int_{-\infty}^{\infty} x f(x)\,dx$

Median = point z such that exactly half of the probability is associated with lower values and half with greater values:
$\int_{-\infty}^{z} f(x)\,dx = 0.5$

Mode = most likely value (maximum of the pdf)
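For 1 Die
Mean:
$E(x) = \sum_{i=1}^{6} x_i p(x_i) = 1\cdot\frac{1}{6} + 2\cdot\frac{1}{6} + 3\cdot\frac{1}{6} + 4\cdot\frac{1}{6} + 5\cdot\frac{1}{6} + 6\cdot\frac{1}{6}$
$E(x) = 3.5$
Median (by convention, the midpoint of the two middle outcomes, 3 and 4):
$x = 3.5$
Mode: not unique, since all six faces are equally likely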
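The die calculation can be checked numerically. This is a minimal sketch; the variable names (x, p, die_mean, die_median) are chosen here for illustration and are not from the slides.

```matlab
% Fair six-sided die: outcomes and their probabilities
x = 1:6;
p = ones(1,6)/6;

% Discrete mean: E(x) = sum over i of x_i * p(x_i)
die_mean = sum(x .* p);

% Median taken as the midpoint of the two middle outcomes
die_median = median(x);

disp(die_mean)
disp(die_median)
```

Because every outcome has the same probability, max(p) is attained by all six faces, which is why no single mode exists.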
Radioactive Decay

For our example, the mean, median, and
mode are given by
Mean:
$E(t) = \int_{0}^{\infty} t f(t)\,dt = \int_{0}^{\infty} \lambda t e^{-\lambda t}\,dt = \frac{1}{\lambda}$

Median:
$\int_{0}^{z} \lambda e^{-\lambda t}\,dt = 0.5$
$z = \frac{\ln(2)}{\lambda}$

The mode is t = 0
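As a rough numerical check of the decay results, the sketch below assumes an exponential pdf f(t) = lambda*exp(-lambda*t) with an arbitrary decay constant lambda = 0.5 (a value picked only for illustration) and uses MATLAB's integral function.

```matlab
lambda = 0.5;                                   % assumed decay constant (illustrative)
f = @(t) lambda .* exp(-lambda .* t);           % exponential pdf

% Mean by numerical integration; should be close to 1/lambda
mean_num = integral(@(t) t .* f(t), 0, Inf);

% Closed-form median, and the probability mass below it
z = log(2) / lambda;
prob_below = integral(f, 0, z);                 % should be close to 0.5

disp([mean_num, 1/lambda])
disp(prob_below)
```

The pdf is largest at t = 0 and decreases from there, consistent with a mode at zero.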
Other Characteristics

We can calculate the expected value of any function of our random variable as
$E[h(x)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx$ (continuous)
$E[h(x)] = \sum_{i} h(x_i)\, p(x_i)$ (discrete)
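To illustrate the expected value of a function of a random variable, the sketch below uses a fair die and the function h(x) = x^2. Both the choice of h and the variable names are assumptions made for this example.

```matlab
% Fair die and an example function h(x) = x^2 (illustrative choice)
x = 1:6;
p = ones(1,6)/6;
h = @(v) v.^2;

% Discrete form: E[h(x)] = sum over i of h(x_i) * p(x_i)
Eh = sum(h(x) .* p);

% For comparison: h evaluated at the mean
h_of_mean = h(sum(x .* p));

disp(Eh)
disp(h_of_mean)
```

The two values differ (91/6 versus 12.25), which shows that E[h(x)] is generally not the same as h(E[x]).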
Some Results
$E(c) = c$
$E(cx) = c\,E(x)$
$E\left(\sum_{j=1}^{n} x_j\right) = \sum_{j=1}^{n} E(x_j)$
$E\left(\sum_{j=1}^{n} b_j x_j\right) = \sum_{j=1}^{n} b_j\,E(x_j)$


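The linearity results can be spot-checked on a fair die. In this sketch the constants b = 2 and c = 3 are arbitrary values chosen for illustration.

```matlab
% Fair die
x = 1:6;
p = ones(1,6)/6;

b = 2;   % illustrative scale factor
c = 3;   % illustrative constant

% Left side: expectation of b*x + c computed directly from the pmf
lhs = sum((b .* x + c) .* p);

% Right side: b*E(x) + E(c), using E(c) = c
rhs = b * sum(x .* p) + c;

disp([lhs, rhs])
```

Both sides evaluate to 10, as the rules E(c) = c and E(cx) = cE(x) predict.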
Moments of Distributions

We can define many of these parameters in terms of moments of the distribution:
$\mu_1 = \int_{-\infty}^{\infty} x f(x)\,dx$
$\mu_k = E[(x - \mu_1)^k] = \int_{-\infty}^{\infty} (x - \mu_1)^k f(x)\,dx$ (continuous)
$\mu_k = E[(x - \mu_1)^k] = \sum_{i} (x_i - \mu_1)^k\, p(x_i)$ (discrete)
The mean is the first moment.
The variance is the second central moment (k = 2).
The third and fourth central moments are related to skewness and kurtosis.
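The discrete moment formula can be evaluated directly. The sketch below computes the mean and the second through fourth central moments of a fair die; the helper name central is hypothetical.

```matlab
% Fair die
x = 1:6;
p = ones(1,6)/6;

mu1 = sum(x .* p);                          % first moment (mean)

% k-th central moment: sum over i of (x_i - mu1)^k * p(x_i)
central = @(k) sum((x - mu1).^k .* p);

mu2 = central(2);                           % variance
mu3 = central(3);                           % zero for a symmetric distribution
mu4 = central(4);

disp([mu1, mu2, mu3, mu4])
```

The third central moment vanishes here because the die's distribution is symmetric about its mean.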

Spread (Variance)

Variance is a measure of spread or dispersion:
$\sigma^2 = \mu_2 = E[(x - \mu_1)^2] = \int_{-\infty}^{\infty} (x - \mu_1)^2 f(x)\,dx$

For discrete data sets, the biased variance is
$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

and the unbiased variance is
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

The standard deviation is the square root of the variance.
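The biased and unbiased formulas can be compared with MATLAB's built-in var, which normalizes by n-1 by default and by n when called as var(x,1). The sample below is made up for illustration.

```matlab
x = [2 4 4 4 5 5 7 9];                  % made-up sample
n = numel(x);
xbar = mean(x);

s2_biased   = sum((x - xbar).^2) / n;        % divides by n
s2_unbiased = sum((x - xbar).^2) / (n - 1);  % divides by n-1

% Built-in equivalents
v_biased   = var(x, 1);
v_unbiased = var(x);

s_biased = sqrt(s2_biased);             % standard deviation from the biased variance

disp([s2_biased, v_biased])
disp([s2_unbiased, v_unbiased])
```

For large n the two estimates are nearly identical; the difference matters mainly for small samples.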
Skewness

Skewness is a measure of asymmetry:
$\mu_3 = E[(x - \mu_1)^3] = \int_{-\infty}^{\infty} (x - \mu_1)^3 f(x)\,dx$

For discrete data sets, the biased skewness is related to
$m_3 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3$

The skewness is often defined as
$\gamma_1 = \frac{\mu_3}{\sigma^3}$

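A sample skewness can be computed from the third central moment without any toolbox. This sketch uses the biased (divide-by-n) moments and a made-up sample; the variable names are illustrative.

```matlab
x = [2 4 4 4 5 5 7 9];                  % made-up sample
xbar = mean(x);

m2 = mean((x - xbar).^2);               % biased second central moment
m3 = mean((x - xbar).^3);               % biased third central moment

g1 = m3 / m2^(3/2);                     % skewness: third moment over sigma^3

disp(g1)
```

The positive result reflects the longer tail toward large values in this sample.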
Kurtosis

Kurtosis is a measure of peakedness:
$\mu_4 = E[(x - \mu_1)^4] = \int_{-\infty}^{\infty} (x - \mu_1)^4 f(x)\,dx$

For discrete data sets, the biased kurtosis is related to
$m_4 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4$

The kurtosis is often defined as
$\gamma_2 = \frac{\mu_4}{\sigma^4} - 3$

Kurtosis

Figure: pdf of the Pearson type VII distribution with kurtosis of infinity (red), 2 (blue), and 0 (black)
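A sample kurtosis in the "minus 3" form can be computed from the fourth central moment. This sketch uses biased (divide-by-n) moments and a made-up sample; the variable names are illustrative.

```matlab
x = [2 4 4 4 5 5 7 9];                  % made-up sample
xbar = mean(x);

m2 = mean((x - xbar).^2);               % biased second central moment
m4 = mean((x - xbar).^4);               % biased fourth central moment

g2 = m4 / m2^2 - 3;                     % kurtosis with the normal distribution shifted to 0

disp(g2)
```

Subtracting 3 makes a normal distribution score zero, so a negative value such as this one indicates a flatter shape than the normal.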
Using Matlab
Sample data is the length of time a person was able to hold their breath (40 attempts).
Try a scatter plot:

load RobPracticeHolds;                 % data file; supplies breathholds used below
y = ones(size(breathholds));           % place every point at the same height
h1 = figure('Position',[100 100 400 100],'Color','w');
scatter(breathholds,y);
Adding Information
disp(['The mean is ',num2str(mean(breathholds)),' seconds (green line).']);
disp(['The median is ',num2str(median(breathholds)),' seconds (red line).']);
hold on;   % keep the scatter plot while adding lines ('hold all' is no longer recommended)
line([mean(breathholds) mean(breathholds)],[0.5 1.5],'color','g');     % mean
line([median(breathholds) median(breathholds)],[0.5 1.5],'color','r'); % median
Box Plot
title('Scatter with Min, 25%iqr, Median, Mean, 75%iqr, & Max lines');  % title for the scatter figure
xlabel('');
h3 = figure('Position',[100 100 400 100],'Color','w');         % new figure for the box plot
boxplot(breathholds,'orientation','horizontal','widths',.5);   % requires the Statistics Toolbox
set(gca,'XLim',[40 140]);
title('A Boxplot of the same data'); xlabel(''); set(gca,'Yticklabel',[]);
ylabel('');
Box Plot
Figure: annotated box plot. Labels mark the minimum, the median, the maximum, and an outlier; the box represents the inter-quartile range (half of the data).
Empirical cdf
h3 = figure('Position',[100 100 600 400],'Color','w');
cdfplot(breathholds);
Multivariate Data Sets

When there are multiple input variables, we need some additional ways to characterize the data:
$E[h(x,y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x,y)\, f(x,y)\,dx\,dy$ (continuous)
$E[h(x,y)] = \sum_{i}\sum_{j} h(x_i, y_j)\, p(x_i, y_j)$ (discrete)
$\mathrm{Cov}(x,y) = E(xy) - E(x)E(y)$

If x and y are independent, then Cov(x,y) = 0
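For a data sample, the covariance formula can be evaluated with sample means in place of expectations. The sketch below uses made-up paired data and checks the E(xy) - E(x)E(y) form against the mean of products of deviations; both use divide-by-n averages.

```matlab
x = [1 2 3 4 5];                        % made-up paired data
y = [2 1 4 3 5];

% Cov(x,y) = E(xy) - E(x)E(y), with sample means as expectations
cov_a = mean(x .* y) - mean(x) * mean(y);

% Equivalent form: mean of the products of deviations
cov_b = mean((x - mean(x)) .* (y - mean(y)));

disp([cov_a, cov_b])
```

A positive value indicates that x and y tend to increase together in this sample.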
Correlation Coefficients
Two random variables may be related.
Define the correlation coefficient of input (x) and output (y) as
$\rho_{x,y} = \frac{\sum_{k=1}^{m} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{m} (x_k - \bar{x})^2 \sum_{k=1}^{m} (y_k - \bar{y})^2}} = \frac{\mathrm{Cov}(x,y)}{\sigma(x)\,\sigma(y)}$
ρ = 1 implies linear dependence, positive slope
ρ = 0 implies no linear dependence
ρ = -1 implies linear dependence, negative slope
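The correlation coefficient can be computed term by term from the sum formula and compared with the off-diagonal entry of the matrix returned by corrcoef. The paired data below are made up for illustration.

```matlab
x = [1 2 3 4 5];                        % made-up paired data
y = [2 1 4 3 5];

dx = x - mean(x);                       % deviations from the means
dy = y - mean(y);

% Sum of products of deviations over the root of the sums of squares
rho = sum(dx .* dy) / sqrt(sum(dx.^2) * sum(dy.^2));

% Built-in: corrcoef returns a 2-by-2 matrix; entry (1,2) is the x-y coefficient
R = corrcoef(x, y);
rho_builtin = R(1,2);

disp([rho, rho_builtin])
```

The normalization cancels the 1/m factors, so the result is the same whether biased or unbiased variances are used.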

Example
Figure: scatter plots with correlation coefficients ρ = 0.98, ρ = 1, ρ = -0.98, and ρ = -0.38
Example
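x=rand(25,1)-0.5;                 % 25 uniform samples on [-0.5, 0.5]
y=x;                              % exact linear dependence, positive slope
corrcoef(x,y)
subplot(2,2,1), plot(x,y,'o')
y2=x+0.2*rand(25,1);              % positive slope plus noise
corrcoef(x,y2)
subplot(2,2,2), plot(x,y2,'o')
y3=-x+0.2*rand(25,1);             % negative slope plus noise
corrcoef(x,y3)
subplot(2,2,3), plot(x,y3,'o')
y4=rand(25,1)-0.5;                % generated independently of x
corrcoef(x,y4)
subplot(2,2,4), plot(x,y4,'o')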