Uncertainty Assessment
The quantified description of the uncertainty concerning situations and outcomes

Engineering Systems Analysis for Design, Massachusetts Institute of Technology, Richard de Neufville ©

Presentation Objectives
1. Concepts
   – The problem and issues
   – Means of assessment
2. Analytic Procedures
   – Regression analysis
   – Bayes Theorem and Likelihood Ratios

Part 1: Concepts
• The problem and issues
• Means of assessment

The Problem
The New Design Paradigm requires us to determine the Distribution of Uncertainties. How can we do this?
Issues:
1. Procedure depends on data available
   – No one way
2. Systematic Biases exist throughout
   – Need to be aware and careful

Methods of Assessment -- Types
• Logic
   – Example: Probability (Queen) in a deck of cards
   – Primary Content of Intro. Probability Subjects
   – Not really useful to assess System Uncertainty
• Frequency
• Statistical Methods
• Judgment
More on each coming up

Frequency Method
• Analysis Approach
   – Assemble available data
   – Find frequency of occurrence of event of interest
• Example 1: Probability of Rainfall in a location
   – Use historical data from rain gauges
• Example 2: P (failure of major dams) ~ 0.00001/dam/year
   – Source: Baecher et al, data through 1960

Frequency Method -- Issues
• Assumes process is “stationary”, that is
   – Future will be like past; no change in mechanism
   – Is this always true?
   – No!
   – Global warming may change rainfall pattern
• Assumes data representative
   – That it represents item of interest
   – When might this not be true?
   – Technology of construction changes… Different types of dams, etc.

Statistical Models
• Analysis Approach
   – Assemble Data, Choose a set to analyze
   – Posit a set of variables (X = price, etc.) of interest
   – Posit form of variables (Price, ΔPrice, Relative P…)
   – Posit form of equations linking them, f(X)
   – Do statistical analysis (example: least squares)
• Example: Future Demand = f(price, income, etc.) + error
   – Obtain: D = a0 + a1Pb + a2Pc + …

Statistical Models – General Issues
• All the issues associated with Frequency Method
   – Statistical Analysis is a more sophisticated version of Frequency Method
• Specifically, assumes that
   – Process is stationary
   – Data are representative

Statistical Models – Data Issues
• Looks precise and technical, BUT
• GIGO -- “Garbage in, Garbage out”
   – Results depend closely on data set chosen (see slides in regression analysis later on)
   – Consider history of oil prices: which period should we choose? Everything available? Since OPEC 1, 1974? Last 20 years? Last 10 years?
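
The point that results depend closely on the period chosen can be illustrated with a minimal sketch. It assumes a made-up price series (flat, then rising), not real oil prices, and the helper names `fit_line` and `trend_over_last` are hypothetical, introduced only for this illustration. It fits a straight line by least squares to different trailing windows of the same series.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, made-up series (NOT real oil prices):
# flat at 20 for 10 periods, then rising by 5 per period.
years = list(range(20))
prices = [20.0] * 10 + [20.0 + 5.0 * k for k in range(1, 11)]


def trend_over_last(n_periods):
    """Least-squares slope using only the most recent n_periods points."""
    slope, _ = fit_line(years[-n_periods:], prices[-n_periods:])
    return slope


# Same data, different windows, different estimated trend.
for window in (20, 10, 5):
    print(f"last {window:2d} periods: slope = {trend_over_last(window):.2f} per period")
```

The series is identical in every run; only the choice of window changes, and the fitted trend changes with it.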

History of Oil Prices
[Figure: Average Annual Crude Oil Price. Price (USD) vs. Years, 1946 to 2006; series: Nominal and Inflation Adjusted (2007)]
Source: http://www.inflationdata.com/inflation/Inflation_Rate/Historical_Oil_Prices_Table.asp

Statistical Models – Data Issues
• Conceptual “house of cards”
• Consider Assumptions
   – Posit a set of variables – Which ones? Note: adding any factor increases statistical fit
   – Posit form of variables (Price, ΔPrice, etc.) – Which form? When you choose travel, what influences you: total, relative or differential cost?
   – Posit form of equations linking them, f(X) – Which form? Linear, exponential, log…

Statistical Models – Time Issues
• Special issue when model seeks to determine “rate of change over time”
   – $Y = a e^{rt}$, with r = rate per period, t
• Data is a “time series”, as for oil price
• Issue: Any variable that grows or decreases reasonably also fits the data well
   – Example: Los Angeles Air Travel correlates well with “prisoners in Oregon”, “divorces in France”
   – Another possible form of GIGO
• Easy to get good correlation, but correlation ≠ causality

Expert Judgment Method
• Also known as “Subjective Probability”
• Analysis Method
   – Identify “experts”
   – Ask them questions such as “What will be the performance of a new technology?”
• Several variations, concerning how the group of experts interacts and whether it comes to consensus or not…
   – “Delphi” method, named after mythical oracle

Expert Judgment -- Issues
• Who are the experts?
   – May be easy to identify who is not expert
   – BUT, knowledgeable insiders may have “trained incapacity” – be blind to new factors
• Overconfidence
   – Distribution typically much broader than we imagine (remember class experience)
• Insensitivity to New Information
   – Information typically should cause us to change opinions more than it does

Summary of Part 1
• Available methods each have difficulties.
• These are conceptual – will not be eliminated by better math or technique!
• Such issues contribute to the fact that “forecasts are ‘always’ wrong”
• We have to be properly modest about how far analysis will take us.
• Do the analysis, but appreciate weaknesses

Part 2: Analytic Procedures
• Regression analysis
• Bayes’ Theorem … and Likelihood Ratios

Regression Analysis -- Concept
• Object: “best fit” of data to function Y = f(X)
   – Y often called the “dependent” variable
   – Xs then called the “independent” variables
   – Xs then “explain” the variation in Y
• Think of Y as deformation of spring under load, X, …
   – Changes in X do not account for all deformation observed – other factors and errors

Regression Analysis – Set up
• Object: “best fit” of data to function Y = f(X)
• Assumptions about data
   – Each measurement of the Independent variables (ex: Weights, $X_i$) is correct
   – Dependent variables (ex: Deflection, $Y_i$) have ‘errors’
• Model then is: Deflection = a (Weight)
   – And we recognize that measured values $Y_{mi}$ will differ from predicted values $Y_i$ by an ‘error’ $e_i$
   – $Y_{mi} = Y_i + e_i = a X_i + e_i$

Example data
[Figure: Measured Deflection under load. Deflection (mm) vs. load (Kilos)]

Regression Analysis – Math
• Object: find model that gives “best fit” to data
   – That is, model that minimizes total “error”
• Total Error ≡ Sum of squared errors = $\sum (e_i)^2$
• Why does this make sense?
   – Because minimizing squared errors is the best-fit criterion when the errors are random with a Gaussian, that is, bell-shaped, distribution
• Solution Concept:
   – Optimization criteria: $\partial(\text{error}) / \partial(\text{parameter}) = 0$
   – Solve for each parameter of model (ex: “a”)

Regression Analysis – Graph
[Figure: Model of Deflection under Load. Deflection (mm) vs. load (Kilos). From Excel using LINEST Formula]

Regression Analysis - Practice
A pause for Demonstration of Excel analysis you can use

Note: Lab work versus Real World
• Object: “best fit” of data to function Y = f(X)
• If, as in lab, you control all factors and vary a few, then you can be clear on independent and dependent variables and show causality
   – Example: weights on spring cause deflection
• Thus you can write equation: Deflection = a(Weight) + measurement errors
• Is this true in “real world”, outside lab?

Real World
• It is known that there is a close correlation, a good statistical fit, between number of firemen at a fire and amount of fire damage: Fire damage = f (number of firemen)
• What does it say about effect of firemen? Nothing much!
• In this case, correlation is “spurious”: size of fire drives both Damage and Number of Firemen, so the two are related to each other without one causing the other

Consider Effect of Different Periods
• The Data are all for the price of Google stock (GOOG) starting in August 2004
• First, the results for 3 years ending Sept 2007
• Next, 3 years ending Sept 2008: DIFFERENT!
• How about 4 years 2004-08: YET ANOTHER!
• Results depend on set of data !!!

Best Fit GOOG 04 - 07
[Figure: Best Fit GOOG 04 - 07. Price ($) vs. Time (months starting 08/2004); series: Original Data, Best Fit. Best Fit r = 3.54 % / month]

Best Fit GOOG 05 - 08
[Figure: Best Fit GOOG Prices 05-08. Price ($) vs. Time (months starting 09/2005); series: Original Data, Best Fit. Best Fit r = 1.43 % / month]

Best Fits GOOG
[Figure: Best Fits GOOG. Price ($) vs. Time (months starting 08/2004); series: Original Data, Months 0-36, Months 0-48. Best Fit 04 - 07 r = 3.54 % / month; Best Fit 04 - 08 r = 2.79 % / month]

Revision of Estimates – Bayes Theorem
• Definitions
   – P(E): Prior Probability of Event E
   – P(E/O): Posterior P(E), after observation O is made. This is the goal of the analysis.
   – P(O/E): Conditional probability that O is associated with E
   – P(O): Probability of Event (Observation) O
• Theorem: $P(E/O) = P(E)\,\{P(O/E) / P(O)\}$
• Note: Importance of revision depends on:
   – rarity of observation O
   – extremes of P(O/E)

Application of Bayes Theorem
• At a certain educational establishment:
   – P(students) = 2/3, P(fem/students) = 1/4
   – P(staff) = 1/3, P(fem/staff) = 1/2
• What is the probability that a woman on campus is a student? {i.e., what is P(student/fem)?}
   – $P(\text{student/fem}) = P(\text{student})\,\{P(\text{fem/student}) / P(\text{fem})\}$
   – $P(\text{fem}) = (2/3)(1/4) + (1/3)(1/2) = 1/3$
   – Thus: $P(\text{student/fem}) = (2/3)\,\{(1/4) / (1/3)\} = 1/2$

Likelihood Ratios, LR
• Definitions
   – $P(\bar{E})$ = P(E does not occur) => $P(E) + P(\bar{E}) = 1.0$
   – $LR = P(E) / P(\bar{E})$; therefore $P(E) = LR / (1 + LR)$
   – $LR_i$ = LR after i observations

Likelihood Ratios (2)
• Formulas
   – $LR_1 = \dfrac{P(E)\,\{P(O_j/E) / P(O_j)\}}{P(\bar{E})\,\{P(O_j/\bar{E}) / P(O_j)\}}$ after a single observation $O_j$
   – $CLR_j = P(O_j/E) / P(O_j/\bar{E})$, the conditional likelihood ratio for $O_j$
   – $LR_N = LR_0 \prod_j (CLR_j)^{N_j}$, where $N_j$ = number of observations of type $O_j$

Application of LR (1)
• Bottle-making machines can be either OK or defective: P(D) = 0.1
• The frequency of cracked bottles depends upon the state of the machine: P(C/D) = 0.2, P(C/OK) = 0.05

Application of LR (2)
• Picking up 5 bottles at random from a machine, we find {2 cracked, 3 uncracked}
• What is the Prob (machine defective)?
   – $LR_0 = P(D) / P(OK) = 0.1/0.9 = 1/9$
   – $CLR_C = 0.2/0.05 = 4$
   – $CLR_{UC} = 0.8/0.95 = 16/19$
   – $LR_5 = (1/9)\,(4)^2\,(16/19)^3 = 1.06$
   – $P(D/\{2C, 3UC\}) = 1.06/(1 + 1.06) \approx 0.51$

Take-aways from today
• Many ways to try to estimate uncertainties
   – None are clear winners, each has its own use
   – Statistical analysis may be best, a good way to make sense out of available data
• HOWEVER, each method has its difficulties
   – Statistical analysis concept “house of cards”
   – Above all: Correlation does not prove Cause
• Bayes Theorem – More on this later
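
The campus and bottle-machine examples from the Bayes Theorem and Likelihood Ratio slides can be checked numerically. The following is a minimal sketch using exact fractions. The function names (`posterior`, `lr_to_probability`, `updated_lr`) are hypothetical helpers introduced here for illustration; the input probabilities are the ones stated on the slides.

```python
from fractions import Fraction


def posterior(prior, p_obs_given_e, p_obs):
    """Bayes' theorem: P(E/O) = P(E) * P(O/E) / P(O)."""
    return prior * p_obs_given_e / p_obs


def lr_to_probability(lr):
    """Convert a likelihood ratio LR = P(E)/P(not E) back to P(E)."""
    return lr / (1 + lr)


def updated_lr(prior_lr, clr_and_counts):
    """LR_N = LR_0 * product over j of (CLR_j ** N_j)."""
    lr = prior_lr
    for clr, count in clr_and_counts:
        lr *= clr ** count
    return lr


# Campus example: probability that a woman on campus is a student.
p_student, p_staff = Fraction(2, 3), Fraction(1, 3)
p_fem_student, p_fem_staff = Fraction(1, 4), Fraction(1, 2)
# P(fem) by total probability over students and staff.
p_fem = p_student * p_fem_student + p_staff * p_fem_staff
p_student_given_fem = posterior(p_student, p_fem_student, p_fem)

# Bottle-machine example: 2 cracked and 3 uncracked bottles observed.
p_defective_prior = Fraction(1, 10)
p_crack_defective = Fraction(1, 5)    # P(C/D) = 0.2
p_crack_ok = Fraction(1, 20)          # P(C/OK) = 0.05
lr0 = p_defective_prior / (1 - p_defective_prior)
clr_cracked = p_crack_defective / p_crack_ok
clr_uncracked = (1 - p_crack_defective) / (1 - p_crack_ok)
lr5 = updated_lr(lr0, [(clr_cracked, 2), (clr_uncracked, 3)])
p_defective = lr_to_probability(lr5)

print("P(student/fem) =", p_student_given_fem)
print("LR5 =", lr5, "=", round(float(lr5), 4))
print("P(D/{2C, 3UC}) =", round(float(p_defective), 4))
```

With the slide inputs, the likelihood ratio after five bottles comes out at about 1.06 and the probability that the machine is defective at about 0.51, up from the prior of 0.1.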