Summary Statistics
Jake Blanchard
Spring 2008
Uncertainty Analysis for Engineers
Summarizing and Interpreting Data
It is useful to have some metrics for
summarizing statistical data (both input and
output)
Three key characteristics are
◦ Central tendency (mean, median, mode)
◦ Dispersion (variance)
◦ Shape (skewness, kurtosis)
Central Tendency
Mean
Discrete: $$E(x) = \sum_{i=1}^{n} x_i p_i$$
Continuous: $$E(x) = \int_{-\infty}^{\infty} x f(x)\,dx$$
Median = point z such that exactly half of the
probability is associated with lower values
and half with greater values
$$\int_{-\infty}^{z} f(x)\,dx = 0.5$$
Mode = most likely value (maximum of the pdf)
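For One Die
mean
$$E(x) = \sum_{i=1}^{6} x_i\,p(x_i) = 1\cdot\tfrac{1}{6} + 2\cdot\tfrac{1}{6} + 3\cdot\tfrac{1}{6} + 4\cdot\tfrac{1}{6} + 5\cdot\tfrac{1}{6} + 6\cdot\tfrac{1}{6} = \frac{1}{6}\sum_{i=1}^{6} x_i = 3.5$$
median
$$x = 3.5$$
(any value between 3 and 4 splits the probability in half; 3.5 is the conventional midpoint)
mode
All six faces are equally likely, so the distribution has no single most likely value and the mode is not unique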
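The die calculation can be checked numerically. The following is a minimal sketch that uses only base MATLAB functions; the variable names (x, p, meanX, cdfX) are illustrative and not from the original slides.

```matlab
% Fair six-sided die: outcomes and their probabilities
x = 1:6;
p = ones(1,6)/6;

% Mean: each outcome weighted by its probability
meanX = sum(x.*p);

% Cumulative probability, used to locate the median
cdfX = cumsum(p);

fprintf('mean      = %.4f\n', meanX);
fprintf('P(x <= 3) = %.4f\n', cdfX(3));
```

Exactly half of the probability lies at or below 3, so the half-probability point falls between 3 and 4.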
Radioactive Decay
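For our example, with pdf $f(t) = \lambda e^{-\lambda t}$, the mean, median, and
mode are given by
mean
$$E(t) = \int_{0}^{\infty} t f(t)\,dt = \int_{0}^{\infty} \lambda t e^{-\lambda t}\,dt = \frac{1}{\lambda}$$
median
$$\int_{0}^{z} \lambda e^{-\lambda t}\,dt = 0.5 \quad\Rightarrow\quad z = \frac{\ln(2)}{\lambda}$$
The mode is t = 0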
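These closed-form results can be sanity-checked by numerical integration. The sketch below assumes the exponential decay pdf f(t) = lambda*exp(-lambda*t); the value lambda = 0.5 is an arbitrary illustrative choice, not a value from the course.

```matlab
lambda = 0.5;                        % assumed decay constant, illustrative only
f = @(t) lambda*exp(-lambda*t);      % assumed exponential pdf

% Mean: integrate t*f(t) from 0 to infinity
meanT = integral(@(t) t.*f(t), 0, Inf);

% Median: find z such that the integral of f from 0 to z equals 0.5
medianT = fzero(@(z) integral(f, 0, z) - 0.5, 1/lambda);

fprintf('mean   = %.4f (1/lambda     = %.4f)\n', meanT, 1/lambda);
fprintf('median = %.4f (ln(2)/lambda = %.4f)\n', medianT, log(2)/lambda);
```

The median is smaller than the mean because the long right tail of the distribution pulls the mean upward.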
Other Characteristics
We can calculate the expected value of
any function of our random variable as
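$$E[h(x)] = \begin{cases} \int_{-\infty}^{\infty} h(x)\,f(x)\,dx & \text{(continuous)} \\ \sum_{i} h(x_i)\,p(x_i) & \text{(discrete)} \end{cases}$$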
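As a small illustration of the discrete form, the sketch below evaluates E[h(x)] for a fair die with h(x) = x^2. The choice of h and the variable names are assumptions made for the example.

```matlab
% Fair die: outcomes and probabilities
x = 1:6;
p = ones(1,6)/6;

% Function of the random variable (illustrative choice)
h = @(x) x.^2;

% Expected value: sum of h(x_i) weighted by p(x_i)
Eh = sum(h(x).*p);

fprintf('E[x^2] = %.4f\n', Eh);
```

Note that E[x^2] = 91/6 differs from (E[x])^2 = 12.25; the gap between the two is the variance.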
Some Results
$$E(c) = c$$
$$E(cx) = c\,E(x)$$
$$E\left(\sum_{j=1}^{n} x_j\right) = \sum_{j=1}^{n} E(x_j)$$
$$E\left(\sum_{j=1}^{n} b_j x_j\right) = \sum_{j=1}^{n} b_j\,E(x_j)$$
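A quick numerical check of these linearity results, again using the fair die. The constants b and c and the helper E are hypothetical names introduced only for this sketch.

```matlab
% Fair die: outcomes and probabilities
x = 1:6;
p = ones(1,6)/6;

% Helper: expectation of g(x) over the die pmf
E = @(g) sum(g(x).*p);

b = 2;  c = 3;                  % arbitrary constants

lhs = E(@(x) b*x + c);          % E(b*x + c) computed directly
rhs = b*E(@(x) x) + c;          % b*E(x) + c using linearity

fprintf('E(b*x + c) = %.4f, b*E(x) + c = %.4f\n', lhs, rhs);
```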
Moments of Distributions
We can define many of these parameters in
terms of moments of the distribution
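$$\mu_1 = \int_{-\infty}^{\infty} x f(x)\,dx$$
$$\mu_k = \int_{-\infty}^{\infty} (x - \mu_1)^k f(x)\,dx$$
$$\mu_k = E\left[(x - \mu_1)^k\right] = \sum_{i} (x_i - \mu_1)^k\,p(x_i)$$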
The mean is the first moment (about the origin).
The variance is the second moment about the mean.
The third and fourth moments about the mean are related to
skewness and kurtosis
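The discrete moment definition can be sketched in a few lines. The example below reuses the fair die; centralMoment is a hypothetical helper name, not a built-in function.

```matlab
% Fair die: outcomes and probabilities
x = 1:6;
p = ones(1,6)/6;

% First moment (mean)
mu1 = sum(x.*p);

% k-th moment about the mean
centralMoment = @(k) sum((x - mu1).^k .* p);

mu2 = centralMoment(2);   % variance
mu3 = centralMoment(3);   % zero for a symmetric distribution
mu4 = centralMoment(4);

fprintf('mu1 = %.4f, mu2 = %.4f, mu3 = %.4f, mu4 = %.4f\n', mu1, mu2, mu3, mu4);
```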
Spread (Variance)
Variance is a measure of spread or dispersion
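$$\sigma^2 = E\left[(x - \mu_1)^2\right] = \int_{-\infty}^{\infty} (x - \mu_1)^2 f(x)\,dx$$
For discrete data sets, the biased variance is:
$$s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
and the unbiased variance is
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$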
The standard deviation is the square root of
the variance
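The two sample-variance formulas differ only in the divisor. The sketch below computes both by hand and compares them with the built-in var, where var(x,1) normalizes by n and var(x) normalizes by n-1. The data vector is hypothetical.

```matlab
x = [2 4 4 4 5 5 7 9];        % hypothetical sample, not course data
n = numel(x);
xbar = mean(x);

% Sum of squared deviations from the sample mean
ss = sum((x - xbar).^2);

s2_biased   = ss/n;           % divides by n
s2_unbiased = ss/(n-1);       % divides by n-1

fprintf('biased   = %.4f (var(x,1) = %.4f)\n', s2_biased, var(x,1));
fprintf('unbiased = %.4f (var(x)   = %.4f)\n', s2_unbiased, var(x));
fprintf('std (biased) = %.4f\n', sqrt(s2_biased));
```

For large n the two estimates are nearly identical; the difference matters mainly for small samples.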
Skewness
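Skewness is a measure of asymmetry
$$\mu_3 = E\left[(x - \mu_1)^3\right] = \int_{-\infty}^{\infty} (x - \mu_1)^3 f(x)\,dx$$
For discrete data sets, the biased skewness
is related to:
$$m_3 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3$$
The skewness is often defined as
$$\gamma_1 = \frac{\mu_3}{\sigma^3}$$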
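A minimal sketch of the biased sample skewness, computed from the second and third sample moments using base MATLAB only. The data vector and variable names are hypothetical.

```matlab
x = [2 4 4 4 5 5 7 9];        % hypothetical sample, not course data
xbar = mean(x);

m2 = mean((x - xbar).^2);     % biased variance
m3 = mean((x - xbar).^3);     % biased third moment about the mean

% Skewness: third moment normalized by the cube of the standard deviation
g1 = m3 / m2^(3/2);

fprintf('m3 = %.4f, skewness = %.5f\n', m3, g1);
```

A positive value indicates a longer tail to the right of the mean, as in this sample.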
Kurtosis
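Kurtosis is a measure of peakedness
$$\mu_4 = E\left[(x - \mu_1)^4\right] = \int_{-\infty}^{\infty} (x - \mu_1)^4 f(x)\,dx$$
For discrete data sets, the biased kurtosis is
related to:
$$m_4 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4$$
The kurtosis is often defined as
$$\gamma_2 = \frac{\mu_4}{\sigma^4} - 3$$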
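A minimal sketch of the biased sample kurtosis with the "minus 3" convention, so that a normal distribution would score zero. The data vector and variable names are hypothetical.

```matlab
x = [2 4 4 4 5 5 7 9];        % hypothetical sample, not course data
xbar = mean(x);

m2 = mean((x - xbar).^2);     % biased variance
m4 = mean((x - xbar).^4);     % biased fourth moment about the mean

% Kurtosis: fourth moment normalized by the variance squared, minus 3
g2 = m4 / m2^2 - 3;

fprintf('m4 = %.4f, kurtosis = %.5f\n', m4, g2);
```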
Kurtosis
Figure: pdf of the Pearson type VII distribution with
excess kurtosis of infinity (red), 2 (blue), and 0 (black)
Using Matlab
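The sample data are the lengths of time (in seconds) a person was
able to hold their breath (40 attempts)
Start with a scatter plot
% Load the data set; the code below expects it to define breathholds
load RobPracticeHolds;
% Constant y values place every point on a single horizontal line
y = ones(size(breathholds));
h1 = figure('Position',[100 100 400 100],'Color','w');
scatter(breathholds,y);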
Adding Information
% Report the mean and median in the command window
disp(['The mean is ',num2str(mean(breathholds)),' seconds (green line).']);
disp(['The median is ',num2str(median(breathholds)),' seconds (red line).']);
% Keep the scatter plot and overlay vertical lines at the mean and median
hold all;
line([mean(breathholds) mean(breathholds)],[0.5 1.5],'color','g');
line([median(breathholds) median(breathholds)],[0.5 1.5],'color','r');
Box Plot
% Label the scatter plot
title('Scatter with Min, 25%iqr, Median, Mean, 75%iqr, & Max lines');
xlabel('');
% Draw a horizontal box plot of the same data in a new figure
% (boxplot requires the Statistics Toolbox)
h3 = figure('Position',[100 100 400 100],'Color','w');
boxplot(breathholds,'orientation','horizontal','widths',.5);
set(gca,'XLim',[40 140]);
title('A Boxplot of the same data'); xlabel(''); set(gca,'Yticklabel',[]);
ylabel('');
Box Plot
Figure: annotated box plot. Labels mark the Min, Median, Max, and an Outlier; the box represents the inter-quartile range (half of the data).
Empirical cdf
h3 = figure('Position',[100 100 600 400],'Color','w');
cdfplot(breathholds);
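For readers without access to cdfplot, an empirical cdf can be built by hand: sort the data and assign each sorted value the fraction of samples at or below it. This is a sketch under that assumption; the data vector is hypothetical and stands in for breathholds.

```matlab
data = [62 75 81 90 95 103 110 128];   % hypothetical hold times (s), not the course data

xs = sort(data);                       % sorted sample values
n  = numel(xs);
F  = (1:n)/n;                          % fraction of samples <= each sorted value

% Staircase plot of the empirical cdf
stairs(xs, F);
xlabel('x'); ylabel('F(x)');
title('Empirical CDF');
```

The curve steps up by 1/n at each observation and reaches 1 at the largest value.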
Multivariate Data Sets
When there are multiple input variables,
we need some additional ways to
characterize the data
$$E[h(x,y)] = \begin{cases} \iint h(x,y)\,f(x,y)\,dx\,dy & \text{(continuous)} \\ \sum_{i}\sum_{j} h(x_i,y_j)\,p(x_i,y_j) & \text{(discrete)} \end{cases}$$
$$\mathrm{Cov}(x,y) = E(xy) - E(x)\,E(y)$$
If x and y are independent, then
Cov(x,y)=0
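The covariance formula can be evaluated directly on paired samples. The sketch below computes it two equivalent ways, treating each pair as equally likely (divisor n). The data and variable names are hypothetical.

```matlab
x = [1 2 3 4 5];              % hypothetical paired samples
y = [2 4 5 4 5];

% Form 1: E(xy) - E(x)E(y)
covA = mean(x.*y) - mean(x)*mean(y);

% Form 2: mean product of deviations from the means
covB = mean((x - mean(x)).*(y - mean(y)));

fprintf('Cov(x,y) = %.4f (form 1), %.4f (form 2)\n', covA, covB);
```

Note that the built-in cov divides by n-1 by default, so its result will differ slightly from these values.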
Correlation Coefficients
Two random variables may be related
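Define the correlation coefficient of input (x) and
output (y) as
$$\rho_{x,y} = \frac{\sum_{k=1}^{m} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{m} (x_k - \bar{x})^2 \,\sum_{k=1}^{m} (y_k - \bar{y})^2}} = \frac{\mathrm{Cov}(x,y)}{\sigma(x)\,\sigma(y)}$$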
ρ = 1 implies linear dependence, positive slope
ρ = 0 implies no linear dependence
ρ = -1 implies linear dependence, negative
slope
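The sum form of the correlation coefficient can be computed by hand and compared with the built-in corrcoef, which returns a 2-by-2 matrix whose off-diagonal entry is the coefficient. The data and variable names are hypothetical.

```matlab
x = [1 2 3 4 5];              % hypothetical paired samples
y = [2 4 5 4 5];

dx = x - mean(x);             % deviations from the means
dy = y - mean(y);

% Correlation coefficient from the sum formula
rho = sum(dx.*dy) / sqrt(sum(dx.^2) * sum(dy.^2));

% Built-in result for comparison
R = corrcoef(x, y);

fprintf('rho (by hand) = %.4f, corrcoef = %.4f\n', rho, R(1,2));
```

The divisor (n or n-1) cancels between numerator and denominator, so both approaches agree.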
Example
Figure: four scatter plots of sample data, with correlation coefficients ρ = 0.98, ρ = 1, ρ = -0.98, and ρ = -0.38.
Example
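% 25 uniform random samples centered on zero
x=rand(25,1)-0.5;
% Case 1: y identical to x (perfect positive correlation)
y=x;
corrcoef(x,y)
subplot(2,2,1), plot(x,y,'o')
% Case 2: positive slope plus uniform noise
y2=x+0.2*rand(25,1);
corrcoef(x,y2)
subplot(2,2,2), plot(x,y2,'o')
% Case 3: negative slope plus uniform noise
y3=-x+0.2*rand(25,1);
corrcoef(x,y3)
subplot(2,2,3), plot(x,y3,'o')
% Case 4: y generated independently of x
y4=rand(25,1)-0.5;
corrcoef(x,y4)
subplot(2,2,4), plot(x,y4,'o')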