MEGN 537 – Probabilistic Biomechanics
Ch.3 – Quantifying Uncertainty
Anthony J Petrella, PhD
Bioengineering
Big Picture
• Why study traditional probability (Ch. 2)?
• Unions and intersections allow us to
conceptualize system level variability with
multiple sources
• Chapter 3 deals with uncertainty in variables
• Chapter 2 presupposes failure probabilities, but
these are often not absolute or fixed
• This chapter allows us to characterize variability
in parameters
Random Variables
Continuous (variable) Data
• Data measured on an
infinitely divisible scale or
continuum
• No gaps between possible
values
• Examples
• Height, weight
• Blood glycogen level
• Joint contact force
• Kinematic measures
• Response time (milliseconds)
Discrete (attribute) Data
• Discrete data measures
attributes, qualitative
conditions, counts
• Gaps between possible
values
• Examples
• Surgery or placebo
• Fusion or dynamic
• Number of surgeries per
week or per year
• Pain/satisfaction surveys
• Number of patients
Population vs. Sample
• Population - A total set of all process results
• Sample - A subset of a population
[Figure: a Sample drawn as a subset of the POPULATION]
Measure                  Population Measures   Sample Measures
Number of data points    N                     n
Mean                     μ                     x̄
Standard deviation       σ                     s
Frequency Histograms
• Histograms visually represent data centering, variability, and
shape
• Histograms are a graphical tool used to depict the
frequency of numerical data by categories (classes or bins)
Properties
• All data will fall into a class or bin
• No data will overlap
[Figure: Histogram of Frequency vs. Number of Heads after 12 coin flips]
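A minimal sketch of how a histogram like the coin-flip one could be tallied. The number of trials (60), the fixed seed, and the function name `heads_histogram` are assumptions for illustration; the slide does not state how its data were generated.

```python
import random
from collections import Counter

def heads_histogram(trials, flips=12, seed=0):
    """Tally how many trials produced 0, 1, ..., `flips` heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(trials):
        # One trial: flip a fair coin `flips` times and count the heads.
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        counts[heads] += 1
    # One bin per possible outcome, so every trial lands in exactly one bin.
    return [counts.get(k, 0) for k in range(flips + 1)]

freq = heads_histogram(trials=60)
for k, f in enumerate(freq):
    print(f"{k:2d} heads | " + "#" * f)
```

Because each outcome maps to exactly one bin, the bin counts always add up to the number of trials.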
Descriptors of Uncertainty
• Used to characterize measured data and
distributions
• Common descriptors
• Mean
• Standard deviation
• Coefficient of variation
• Skewness
Measures of Location
(Central Tendency)
• Mean – also known as average; sum of all values
divided by number of values
• Grand Average: overall average or average of the averages
$Mean = \mu_x = \frac{1}{n} \sum_{i=1}^{n} x_i$
• Median - midpoint of the data
• Arrange the data from lowest to highest; the median is the
middle data point
• 50% of the data points will fall below the median and the
other 50% will fall above the median
• Mode - the most frequent data point, or value
occurring the most often
Example - Measures of Location
• Given the following data on salaries:
$50k, $30k, $170k, $45k, $30k, $55k, $40k
• Mean = (50+30+170+45+30+55+40)/7 = 420/7 = $60k
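• Median (midpoint): 30, 30, 40, 45, 50, 55, 170, so the median = $45k
• Mode (most frequent): $30k (occurs twice)
Value ($k)    30   40   45   50   55   170
# data pts     2    1    1    1    1     1
[Figure: salaries on a number line with the Mode ($30k), Median ($45k), and Mean ($60k) marked]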
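The salary example can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Salaries in $k, taken from the example above.
salaries_k = [50, 30, 170, 45, 30, 55, 40]

mean_k = statistics.mean(salaries_k)      # sum of values / number of values
median_k = statistics.median(salaries_k)  # middle value of the sorted data
mode_k = statistics.mode(salaries_k)      # most frequent value

print("mean:", mean_k, "median:", median_k, "mode:", mode_k)
```

The single $170k salary pulls the mean ($60k) well above the median ($45k), which is why the median is often the better measure of location for skewed data.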
Measures of Dispersion
(Variation or Spread)
• Range - Total width of a distribution
• Range = Maximum Value - Minimum Value
• Variance (V) – Measure of the spread in data about
the mean
• Second central moment
$Variance = Var(x) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu_x)^2$
• Standard Deviation (S) - The most common measure
of dispersion
$Std.Dev = \sigma_x = \sqrt{Var(x)}$
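As a sketch, the range, variance, and standard deviation can be computed directly from their definitions. The data are the salaries from the earlier example (in $k); the function names are illustrative, and the variance uses the n - 1 divisor.

```python
import math

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(xs))

salaries_k = [50, 30, 170, 45, 30, 55, 40]
range_k = max(salaries_k) - min(salaries_k)
var_k = sample_variance(salaries_k)
std_k = sample_std(salaries_k)
print(f"range = {range_k}, variance = {var_k:.1f}, std dev = {std_k:.1f}")
```

Note that the variance carries squared units ($k squared here), while the standard deviation is back in the units of the data.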
Standard Deviation
• Standard deviation is a measure of variation telling us
about consistency around the mean
[Figure: two distributions centered on the mean X]
• High spread: high standard deviation, poor performance
• Low spread: low standard deviation, consistent performance
Other Descriptors
• Coefficient of variation (COV) - Relative indicator of
uncertainty in a variable
• Ratio of standard deviation and the mean
$COV = \delta_x = \frac{\sigma_x}{\mu_x}$
• Skewness – Measure of the spread of data about the
mean emphasizing the shape of the distribution
• Third central moment
$Skewness = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_x)^3$
Skewness
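• Skewness Coefficient ($\theta_x$) – Non-dimensional
measure
$\theta_x = \frac{skewness}{\sigma_x^3}$
• 0 = symmetric
• + skewness → Most values below the mean
• - skewness → Most values above the mean
[Figure: f(x) vs. x for a positively skewed ($\theta_x$ = +) and a negatively skewed ($\theta_x$ = -) distribution]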
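A short sketch of the coefficient of variation and the skewness coefficient, computed from the definitions in this chapter. The function name `descriptors` is hypothetical, and the choice of the n - 1 standard deviation in the denominator of the skewness coefficient is an assumption; other texts normalize differently.

```python
import math

def descriptors(xs):
    """Return (coefficient of variation, skewness coefficient) for the data."""
    n = len(xs)
    mean = sum(xs) / n
    # Standard deviation with the n - 1 divisor.
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    cov = std / mean                                   # ratio of std dev to mean
    skewness = sum((x - mean) ** 3 for x in xs) / n    # third central moment
    skew_coeff = skewness / std ** 3                   # non-dimensional
    return cov, skew_coeff

cov, skew_coeff = descriptors([50, 30, 170, 45, 30, 55, 40])
print(f"COV = {cov:.3f}, skewness coefficient = {skew_coeff:.3f}")
```

For the salary data the skewness coefficient is positive: most values lie below the mean, with one large value forming a long right tail.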
Probability Density Function
• PDF is the typical histogram or bell curve
fX(x) = Probability x is in a specific bin
[Figure: example PDF, f(x) vs. x]
Cumulative Distribution Function
• CDF ranges from 0 to 1
• Integral of the PDF f(x):
$F(x) = \int_{-\infty}^{x} f(t)\,dt$
[Figure: PDF f(x) and CDF F(x) for a distribution with μ = 0 and σ = 1, plotted over x from -6.0 to 6.0]
Cumulative Distribution Function
• The CDF gives the probability of a continuous random
variable having a value less than or equal to a specific
value
$F(x) = \int_{-\infty}^{x} f(t)\,dt$
• Relationship between pdf and cdf:
$f_X(x) = \frac{dF_X(x)}{dx}$
• The CDF has the following values
• F(x → -∞) = 0
• F(x = μx) = 0.5 (for a symmetric distribution, where the mean equals the median)
• F(x → +∞) = 1
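The PDF/CDF relationship can be checked numerically. The sketch below assumes a standard normal distribution (mean 0, standard deviation 1) purely as an example; the helper names, the integration limits, and the step sizes are illustrative choices.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density function."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cdf_by_integration(x, lower=-10.0, steps=20000):
    """Trapezoidal integral of the standard normal PDF from `lower` to x.

    `lower` stands in for -infinity; the PDF is negligible that far out.
    """
    h = (x - lower) / steps
    total = 0.5 * (normal_pdf(lower) + normal_pdf(x))
    total += sum(normal_pdf(lower + i * h) for i in range(1, steps))
    return total * h

# Central-difference derivative of the CDF at x = 1 should match the PDF there.
h = 1e-5
slope = (normal_cdf(1.0 + h) - normal_cdf(1.0 - h)) / (2 * h)

print("F(0)           =", normal_cdf(0.0))
print("integral to 1  =", cdf_by_integration(1.0))
print("closed form    =", normal_cdf(1.0))
print("dF/dx at 1     =", slope, " f(1) =", normal_pdf(1.0))
```

Integrating the PDF reproduces the CDF, differentiating the CDF reproduces the PDF, and the CDF approaches 0 and 1 in the two tails.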
Creating Histograms or PDFs
• Arrange data in increasing order
• Create evenly spaced bins and count how many data
points occur in each bin
• Note: the # of bins can affect the appearance of the histogram
• Rule of thumb: k = 1 + 3.3 log10 n
where k = # of bins and n = number of data points
• Plot the number of observations versus the variable
• Note: for PDFs, plot frequency = (# of observations) / n
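The steps above can be sketched as follows. Rounding the rule-of-thumb bin count up to the next integer is an assumption (the slide does not say how to round), as is placing the maximum value in the last bin. The salary data (in $k) from the earlier example are reused.

```python
import math

def sturges_bins(n):
    """Rule of thumb k = 1 + 3.3 log10(n), rounded up (rounding is an assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))

def histogram(data, k=None):
    """Return (bin edges, counts, relative frequencies) for evenly spaced bins."""
    xs = sorted(data)                      # arrange data in increasing order
    n = len(xs)
    k = k or sturges_bins(n)
    lo, hi = xs[0], xs[-1]
    if hi == lo:
        raise ValueError("all values are identical; cannot form bins")
    width = (hi - lo) / k                  # evenly spaced bins
    counts = [0] * k
    for x in xs:
        idx = min(int((x - lo) / width), k - 1)  # maximum value goes in the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(k + 1)]
    rel_freq = [c / n for c in counts]     # frequency = (# of observations) / n
    return edges, counts, rel_freq

edges, counts, rel_freq = histogram([50, 30, 170, 45, 30, 55, 40])
for left, right, c, f in zip(edges, edges[1:], counts, rel_freq):
    print(f"[{left:6.1f}, {right:6.1f})  count = {c}  frequency = {f:.2f}")
```

Passing a different `k` shows how strongly the number of bins changes the appearance of the histogram.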
Creating CDFs
• Arrange data in increasing order
• For each datapoint
• Create an index i = 1, 2, 3,…, n
• Compute F(x) = i / (n+1)
• Plot F(x) as a function of the variable x
• Demo in Excel (come back to this)
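The same procedure takes only a few lines of Python. This is a sketch with an illustrative function name, applied to the salary data (in $k) from the earlier example.

```python
def empirical_cdf(data):
    """Return (x, F(x)) pairs with F(x) = i / (n + 1) for sorted data."""
    xs = sorted(data)                       # arrange data in increasing order
    n = len(xs)
    # Index i runs from 1 to n over the sorted points.
    return [(x, i / (n + 1)) for i, x in enumerate(xs, start=1)]

points = empirical_cdf([50, 30, 170, 45, 30, 55, 40])
for x, F in points:
    print(f"x = {x:4d}   F(x) = {F:.3f}")
```

Dividing by n + 1 rather than n keeps F(x) strictly between 0 and 1, so the largest observation is not assigned a probability of exactly 1.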
Multiple Random Variables
• Joint distributions or joint PDFs can be defined for
multiple variables
$f_{X,Y}(x, y)$
$f_{X|Y}(x \mid y)$
Joint PDF in 3D represents how the variables are dependent
on each other
Covariance and Correlation
• Covariance indicates the degree of linear relationship
between two random variables, denoted as:
• Cov(X,Y) = E(XY) – E(X)*E(Y) where E( ) is the expected value
• Covariance is the second joint moment about the respective means
• Covariance = 0 for statistically independent variables
• Correlation coefficient (non-dimensional) represents the
degree of linear dependence between two random variables
• ρx,y = Cov(X,Y)/(σx * σy)
• Correlation coefficient can range from -1 to +1 (Haldar p. 53)
• ρ = 0 no correlation
• ρ = +1 perfectly correlated / proportionate
• ρ = -1 perfectly correlated / inversely proportionate
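A minimal sketch of covariance and the correlation coefficient for paired data. It assumes every data pair is equally likely, so each expected value is a plain average; the function names and the test data are illustrative.

```python
import math

def expected(values):
    """Expected value, assuming every observation is equally likely."""
    return sum(values) / len(values)

def covariance(xs, ys):
    """Cov(X, Y) = E(XY) - E(X) * E(Y)."""
    return expected([x * y for x, y in zip(xs, ys)]) - expected(xs) * expected(ys)

def correlation(xs, ys):
    """Correlation coefficient = Cov(X, Y) / (sigma_x * sigma_y)."""
    sigma_x = math.sqrt(covariance(xs, xs))   # Cov(X, X) is the variance of X
    sigma_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sigma_x * sigma_y)

xs = [1, 2, 3, 4, 5]
rho_up = correlation(xs, [2 * x + 1 for x in xs])    # Y rises linearly with X
rho_down = correlation(xs, [-3 * x for x in xs])     # Y falls linearly with X
print("rho (increasing line):", rho_up)
print("rho (decreasing line):", rho_down)
```

An exactly linear relationship gives a correlation coefficient of +1 or -1 regardless of the slope's magnitude, since the coefficient is non-dimensional.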
Project Demo
Assume the following parameters are normally distributed with a
coefficient of variation of 0.05: |rquad.knee|, rtubercle.x, rtubercle.y, rham.knee.y.
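One way the trials could be generated is sketched below. The nominal (mean) values and the dictionary keys are hypothetical placeholders, since the slide does not give them; only the coefficient of variation of 0.05 and the normal distribution come from the problem statement.

```python
import random
import statistics

COV = 0.05  # coefficient of variation stated in the problem

# Hypothetical nominal (mean) values for illustration only; not from the slides.
nominal = {
    "rquad_knee_mag": 40.0,
    "rtubercle_x": 35.0,
    "rtubercle_y": -60.0,
    "rham_knee_y": 25.0,
}

def sample_parameters(means, cov, rng):
    """Draw one trial: each parameter ~ Normal(mean, cov * |mean|)."""
    # abs() keeps the standard deviation positive when a mean is negative.
    return {name: rng.gauss(mu, cov * abs(mu)) for name, mu in means.items()}

rng = random.Random(537)  # fixed seed so the trials are repeatable
trials = [sample_parameters(nominal, COV, rng) for _ in range(20000)]

# Recover the COV from the trials as a sanity check.
estimated = {}
for name in nominal:
    values = [t[name] for t in trials]
    estimated[name] = statistics.stdev(values) / abs(statistics.mean(values))
    print(f"{name}: estimated COV = {estimated[name]:.3f}")
```

Since COV is the ratio of standard deviation to mean, each standard deviation follows from the mean as 0.05 times its magnitude.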
Demos…
• First: Excel for PDF, CDF, trials
• Second: Matlab demo with trials
• Third: NESSUS