MEGN 537 – Probabilistic Biomechanics
Ch.3 – Quantifying Uncertainty
Anthony J Petrella, PhD
Bioengineering

Big Picture
• Why study traditional probability (Ch. 2)?
  • Unions and intersections allow us to conceptualize system-level variability with multiple sources
• Chapter 3 deals with uncertainty in variables
  • Chapter 2 presupposes failure probabilities, but often these are not absolute or fixed
  • This chapter allows us to characterize variability in parameters

Random Variables

Continuous (variable) Data
• Data measured on an infinitely divisible scale or continuum
• No gaps between possible values
• Examples
  • Height, weight
  • Blood glycogen level
  • Joint contact force
  • Kinematic measures
  • Response time (milliseconds)

Discrete (attribute) Data
• Discrete data measure attributes, qualitative conditions, counts
• Gaps between possible values
• Examples
  • Surgery or placebo
  • Fusion or dynamic
  • Number of surgeries per week or per year
  • Pain/satisfaction surveys
  • Number of patients

Population vs. Sample
• Population - the total set of all process results
• Sample - a subset of a population

Measure                  Population   Sample
Number of data points    N            n
Mean                     μ            x̄
Standard deviation       σ            s

Frequency Histograms
• Histograms visually represent data centering, variability, and shape
• Histograms are a graphical tool used to depict the frequency of numerical data by categories (classes or bins)
• Properties
  • All data will fall into a class or bin
  • No data will overlap
[Figure: histogram of frequency vs. number of heads after 12 coin flips]

Descriptors of Uncertainty
• Used to characterize measured data and distributions
• Common descriptors
  • Mean
  • Standard deviation
  • Coefficient of variation
  • Skewness

Measures of Location (Central Tendency)
• Mean – also known as the average; the sum of all values divided by the number of values
  • $\mu_X = \frac{1}{n}\sum_{i=1}^{n} x_i$
  • Grand average: overall average, or average of the averages
• Median - midpoint of the data
  • Arrange the data from lowest to highest; the median is the middle data point
  • 50% of the data points fall below the median and the other 50% fall above it
• Mode - the most frequent data point, or the value occurring most often

Example - Measures of Location
• Given the following data on salaries: $50k, $30k, $170k, $45k, $30k, $55k, $40k
• Mean = (50+30+170+45+30+55+40)/7 = 420/7 = $60k
• Median (midpoint): 30, 30, 40, 45, 50, 55, 170, so the median is $45k
• Mode (most frequent): $30k

Value ($k)   # data pts
30           2
40           1
45           1
50           1
55           1
170          1

[Figure: salary data on a number line from 20 to 170, with the mode (30), median (45), and mean (60) marked]

Measures of Dispersion (Variation or Spread)
• Range - total width of a distribution
• Range = Maximum Value - Minimum Value
• Variance (V) – measure of the spread in data about the mean
  • Second central moment
  • $\mathrm{Var}(X) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu_X)^2$
• Standard Deviation (S) - the most common measure of dispersion
  • $\sigma_X = \sqrt{\mathrm{Var}(X)}$

Standard Deviation
• Standard deviation is a measure of variation that tells us about consistency around the mean
[Figure: two distributions compared. High spread: high standard deviation, poor performance. Low spread: low standard deviation, consistent performance.]

Other Descriptors
• Coefficient of variation (COV) - relative indicator of uncertainty in a variable
  • Ratio of the standard deviation to the mean
  • $\mathrm{COV}(X) = \sigma_X / \mu_X$
• Skewness – measure of the spread of data about the mean, emphasizing the shape of the distribution
  • Third central moment
  • $\mathrm{Skewness} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_X)^3$

Skewness
• Skewness coefficient ($\theta_X$) – non-dimensional measure
  • $\theta_X = \mathrm{Skewness} / \sigma_X^3$
• 0 = symmetric
• + skewness: most values below the mean (long tail to the right)
• - skewness: most values above the mean (long tail to the left)
[Figure: f(x) vs. x for a positively skewed and a negatively skewed distribution]

Probability Density Function
• The PDF is the typical histogram or bell curve
• For a histogram, $f_X(x)$ = probability that x is in a specific bin
[Figure: example PDF, f(x) vs. x]

Cumulative Distribution Function
• CDF ranges from 0 to 1
• Integral of the PDF: $F(x) = \int_{-\infty}^{x} f(t)\,dt$
[Figure: PDF f(x) and CDF F(x) of a normal distribution with μ = 0 and σ = 1, plotted for x from -6 to 6]

Cumulative Distribution Function
• The CDF gives the probability of a continuous random variable having a value less than or equal to a specific value
  • $F(x) = \int_{-\infty}^{x} f(t)\,dt$
• Relationship between PDF and CDF: $f_X(x) = \frac{dF_X(x)}{dx}$
• The CDF has the following values
  • $F(x \to -\infty) = 0$
  • $F(x = \mu_X) = 0.5$ (for a symmetric distribution; in general F = 0.5 at the median)
  • $F(x \to +\infty) = 1$

Creating Histograms or PDFs
• Arrange data in increasing order
• Create evenly spaced bins and count how many data points occur in each bin
  • Note: the # of bins can affect the appearance of the histogram
  • Rule of thumb: $k = 1 + 3.3 \log_{10} n$, where k = # of bins and n = number of data points
• Plot the number of observations versus the variable
  • Note: for PDFs, plot frequency = (# of observations) / n

Creating CDFs
• Arrange data in increasing order
• For each data point
  • Create an index i = 1, 2, 3, …, n
  • Compute F(x) = i / (n+1)
• Plot F(x) as a function of the variable x
• Demo in Excel (come back to this)

Multiple Random Variables
• Joint distributions or joint PDFs can be defined for multiple variables
  • Joint PDF: $f_{X,Y}(x, y)$
  • Conditional PDF: $f_{X|Y}(x \mid y)$
[Figure: joint PDF in 3D, representing how the variables depend on each other]

Covariance and Correlation
• Covariance indicates the degree of linear relationship between two random variables, denoted as:
  • Cov(X,Y) = E(XY) – E(X)*E(Y), where E( ) is the expected value
  • Covariance is the second moment about the respective means
  • Covariance = 0 for statistically independent variables
• Correlation coefficient (non-dimensional) represents the degree of linear dependence between two random variables
  • ρx,y = Cov(X,Y)/(σx * σy)
  • Correlation coefficient can range from -1 to +1 (Haldar p. 53)
  • ρ = 0: no correlation
  • ρ = +1: perfectly correlated / proportionate
  • ρ = -1: perfectly correlated / inversely proportionate

Project Demo
• Assume the following parameters are normally distributed with a coefficient of variation of 0.05: |rquad.knee|, rtubercle.x, rtubercle.y, rham.knee.y

Demos…
• First: Excel for PDF, CDF, trials
• Second: Matlab demo with trials
• Third: NESSUS
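
Worked check of the descriptors

The measures of location, dispersion, and shape defined in these slides can be checked against the salary example with a short script. This is only a sketch in Python (the course demos themselves use Excel, Matlab, and NESSUS); the function name `describe` and the dictionary keys are illustrative choices, not part of the course material. It uses the 1/(n-1) form for the variance and the 1/n form for the third central moment, as written on the slides.

```python
import math
from collections import Counter
from statistics import median

# Salary data from the "Measures of Location" example, in $k
salaries = [50, 30, 170, 45, 30, 55, 40]

def describe(x):
    """Return the descriptors from the slides for a list of numbers (hypothetical helper)."""
    n = len(x)
    mean = sum(x) / n
    # Variance: second central moment with the 1/(n-1) factor
    var = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    std = math.sqrt(var)
    # Skewness: third central moment with the 1/n factor
    skew = sum((xi - mean) ** 3 for xi in x) / n
    return {
        "mean": mean,
        "median": median(x),
        "mode": Counter(x).most_common(1)[0][0],  # most frequent value
        "range": max(x) - min(x),
        "variance": var,
        "std": std,
        "cov": std / mean,               # coefficient of variation
        "skew_coeff": skew / std ** 3,   # non-dimensional skewness coefficient
    }

stats = describe(salaries)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The single $170k salary pulls the mean above the median and gives a positive skewness coefficient, which matches the "+ skewness: most values below the mean" case.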
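
Sketch of the histogram and CDF recipes

The "Creating Histograms or PDFs" and "Creating CDFs" steps can be tried on simulated trials of one normally distributed parameter with a coefficient of variation of 0.05, as in the Project Demo. The sketch below is an assumption-laden illustration, not the course demo: the mean value of 40.0 is made up, the helper names are hypothetical, and rounding k to the nearest integer is a choice the slides do not specify.

```python
import math
import random

def bin_count(n):
    """Rule of thumb k = 1 + 3.3*log10(n); rounding to an integer is an assumption."""
    return max(1, round(1 + 3.3 * math.log10(n)))

def histogram(data, k):
    """Count observations in k evenly spaced bins spanning [min, max]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The maximum value would index bin k, so clamp it into the last bin
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    edges = [lo + j * width for j in range(k + 1)]
    return edges, counts

def empirical_cdf(data):
    """Sort the data, index i = 1..n, and pair each x with F(x) = i / (n + 1)."""
    xs = sorted(data)
    n = len(xs)
    return [(x, i / (n + 1)) for i, x in enumerate(xs, start=1)]

rng = random.Random(0)            # fixed seed so the trials are repeatable
mean, cov = 40.0, 0.05            # hypothetical mean; COV from the Project Demo
trials = [rng.gauss(mean, cov * mean) for _ in range(1000)]

k = bin_count(len(trials))
edges, counts = histogram(trials, k)
rel_freq = [c / len(trials) for c in counts]   # frequency = (# of observations) / n
cdf = empirical_cdf(trials)

print("bins:", k)
print("relative frequencies:", [round(f, 3) for f in rel_freq])
```

Dividing by n + 1 rather than n keeps the largest observation below F = 1, so the plotted CDF never claims that the sample maximum is the largest value possible.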
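
Sketch of covariance and correlation

The covariance and correlation definitions from the "Covariance and Correlation" slide can be illustrated on simulated data. In this sketch the expected value E( ) is approximated by a sample average (a 1/n form, which is an assumption and differs from the 1/(n-1) variance used earlier), and the variable names and data are invented for illustration.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def covariance(x, y):
    """Cov(X,Y) = E(XY) - E(X)*E(Y), with sample averages standing in for E( )."""
    exy = mean([a * b for a, b in zip(x, y)])
    return exy - mean(x) * mean(y)

def correlation(x, y):
    """rho = Cov(X,Y) / (sigma_x * sigma_y), using sigma^2 = Cov(X,X)."""
    sx = math.sqrt(covariance(x, x))
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

rng = random.Random(1)  # fixed seed for repeatability
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # drawn separately from x
proportionate = [2.0 * a + 3.0 for a in x]                # increases linearly with x
inverse = [-a for a in x]                                 # decreases linearly with x

rho_pos = correlation(x, proportionate)
rho_neg = correlation(x, inverse)
rho_ind = correlation(x, independent)
print(rho_pos, rho_neg, rho_ind)
```

With a finite sample, the correlation between independently drawn variables is close to zero but not exactly zero, while the two linear cases reach +1 and -1 up to floating-point error.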