Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Pattern recognition wikipedia , lookup
Corecursion wikipedia , lookup
Regression analysis wikipedia , lookup
Inverse problem wikipedia , lookup
Hardware random number generator wikipedia , lookup
Molecular dynamics wikipedia , lookup
Generalized linear model wikipedia , lookup
Military simulation wikipedia , lookup
Quick & Simple Simulation in Excel with Clinical Trials Applications Presented to the Delaware Chapter of the American Statistical Association 20 October 2011 Dennis Sweitzer, Ph.D. www.Dennis-Sweitzer.com Background • Occasional need for simulations • Excel is convenient, but – does not explicitly support simulations – Simulation usually requires VBA programming (so why not use R or SAS instead) – Or Add-in commercial programs (eg., @Risk) – Or some academic add-ins • Does have iterative calculations, Solver • Why not simulation? Simulate what? • Stochastic Models – Unknown parameters? Guestimate a distribution – Optimizing policy? Test each with simulations • Sensitivity Analysis – Variations in Inputs Variations in Outputs – 2 parameters: use a table – >2 parameters: simulate & compare variation Excel: Pros Common Language / Common Tools • Most people understand Excel MEGO • Many tools available in Excel Transparency: Modeling assumptions can be: Specified -- Graphed -- Debated What you see is what you get! More hands on deck, more eyes on the prize….: Statistician Team Member Initial Model Explores & breaks model Repair & enhance …Repeat until satisfied Excel Cons Slower than in SAS, S+, R, etc Lacks some statistical/probability functions • Latest versions are a little better • Still need to add some VBA code • Known bugs in statistical routines (often fixed) Tradeoffs: • Quicker modifications vs slower execution Simple Solution: Data Tables Excel Data Tables •Creates a table of values of a function – (ie, Random Variables) •Leftmost column is used as an argument – (which is ignored in a simulation) •Data Table repeats calculations for each row – (Each row is an iteration of the simulation) 1. Create Simulation Create Random Variables using Inverse Probability Method: For Random Variable X with distribution function F(x), F(x): → [0,1] If Random Uniform U X = F-1(U) (Excel: U=Rand() ) 2. Align Random Variables • Calculations can be anywhere in Spreadsheet • Reference the Variables in a row • Is best to label variables in same way 3. Select Data Table • Select table region – 1st row is Rand Vars – 1st column is not used (can label iterations) • From toolbar: – Data>Data Table 4. Create Simulation Table • Column input cell = Upper left hand corner of table • Row input cell = ignore • OK Populates the table • (may have to manually recalcule) 5. Execute Simulation Iterative development •Simulation can be changed •Add reporting variables •Recalculate to rerun – (no need to use Data Table again, unless expanding) •Hint: debug with short table, expand for final run The End (of the key concepts) But still more…. • Why use inverse probability distributions (instead of random variables)? • When not to use a spreadsheet for simulation? • Tools: – Macros to set up a simulation – VBA functions for common simulation distributions Inverse Probability Function • Most systems directly generate random variables with the desired distribution • Why use Inverse Probability Functions? – Which are (probably) slower? Personal opinion • Testing & Debugging • Verification Calculates correctly • Validation Calculations answer Problem • Sensitivity Input vs Output variability As Mapping function U Probability Distribution: Random Uniform: Inverse PDF: F-1 ⟼ F(x): → [0,1] U X = F-1(U) For Continuous (or monotone) F-1 Small changes in u∈U small changes in F-1 (u) Mapping 2 Random Uniform Var As input to Deterministic Function Mapping Random numbers in (should) Map to outputs in Example #1 Simple model, function of 2 RV A Max value looks high. Is it a bug? If not, how often? Saved random U[0,1] For each iteration Check u∈U[0,1] That generated high value u=0.983… random high Rarely happens Saving {Ui}: •Verify •Replicate •Quantify Example #1 (Sensitivity) Sort by U1, U2 Sensitive to U1 Insensitive to U2 Spreadsheet limitations • Only simple data structures are available – Rows & columns, no lists & trees – Discrete event simulations • Complex algorithms: difficult – Eg, While or for loops – Can improvise (cumbersome, slow, buggy) • Speed: slow • Data Storage: what-you-see-is-all-you-get Tools: Excel Simulation Template • Adds some missing random functions • Adds some set-up macros Macro SimulateSampler To start a new simulation when you don't remember the names & parameters of common random variables used in simulation: •Run the Macro SimulationSample •Copy, delete, and edit as needed. •Make sure all random values are referenced in the first row of the data table at the bottom. Macro SimulationSampler • Creates a simulation with each of common simulation functions Macro SimulationSampler ……… •Sets up header row for data table •Sets up a place for statistics Macro Simulate • Highlight the row of random variables – (1st row of simulation table) • Run macro "Simulate” – Prompts for which will ask for the number of simulation iterations, – The default number of iterations is 100 – Debug & develop (manually recalculate) – Final run with >1000 iterations – Visual Basic code is computationally intensive, Macro Simulate Note bene • Run Simulate right after SimulationSampler – Risk of “Ref!” error • SimTemplate,Plot,Sampler contains – The sampler – A distribution plot of all random variable • Crude, but handy for quick comparisons – Ready to edit SimTemplate,Plot,Sampler • Crude Distribution plot of ALL variables • Uses Percentile Ranks to save space • Good for Continuous Var. • Bad for Discrete Var. 1. Copy 2. Delete unwanted variables 3. Make it pretty Excel Random Variables Rand() --Random Uniform [0,1] NormSInv() – Inverse Standard Normal Distribution CriticalBinomial() – Inverse Binomial Distribution LogNormInv() - Inverse Log Normal Distribution Caveat: parameters are mean, SD after the Log transformation Erlang Distribution How long do you wait until you get a predetermined number of arrivals? •Interarrival times are distributed IID exponential •Erlang is Gamma with integer parameter Beta Distribution Can use as • Distribution of a Binomial probability • Range = [0,1] • Generic bounded hump (vs Normal as generic unbounded hump) Example#2, Problem Client: “Here’s our plan….” •Simple spreadsheet calculation – But only the expected value, – but not variability Example #2, Simulation • Time to 100th patient • Patients arrive IID Exponential Summary Statistics of Simulated values (below) Interpretation: under the assumptions, 90% of simulations required more than 4.4 months Added VBA Functions Inverse Functions Needed for Simulation •Poisson, Negative Binomial Interpolation from Table •Interpolate: 1 or 2 dimensional interpolation Convenience • Beta with Mean, SD as parameters • Beta with Hi, Low, and Mode used for parameters (often used for PERT/CPM charts) •Log Normal with mean, SD as parameters Missing Statistical Functions • InvPoisson :: Poisson Distribution • InvPascal :: Integer valued Negative Binomial – (how many failures before k successes) Negative Binomial is continuous valued distribution; discrete version is often denoted Pascal distribution Example#3, Patients to Screen Expected Enrollment rate = 75% ± 5% ~ Beta Distribution # Screen Failures ~ Negative Binomial (Pascal) – Depends on Enrollment Rate Beta Distribution (2) For Convenience •Beta distribution given Mean, SD •Beta distribution given Mean, SD, upper, lower bounds •Beta distribution given Mode, Upper, Lower bounds – Sometimes used for PERT/Critical Path Analysis • 3 estimates for tasks: Optimistic, Pessimistic, Most Likely • Beta distributed time for each task – Assumes SD = 1/6 of the interval [low, high] Simulation from a Table Simulate arbitrary distribution: •Top Row: values in [0,1] •Bottom Row: Quantiles •Result: interpolated value of U from table Or a function: y=f(x) •X is found in top row, y is interpolated from bottom row Table Simulation Uses •Polygonal distributions (like Triangular) •Survival curve (for time to event) –Est. K-M curve from data, simulate rest of trial •Arbitrary empirical distributions •Distribution from observations Simulation from a 2-dimensional table Here: • Rows are quartiles of a random function • Left column is value of a parameter • A family of distributions which vary with the parameter • Parameter y=75% (can be random) • Generate random numbers from the interpolated distribution. Example #4: Interim Review • After 2 months, review randomization rates • Continue to Randomize to 100 patients • How long? Example#4: Interim Review (Simulation) Y= # Patients at 2 mos ~ Poisson Time to Randomize (100-Y) additional pts ~ Erlang (Gamma) 80% CI:; (2.5, 3.7) months Clinical Trials Applications • Simulations for planning • Prototyping larger simulation • Checking assumptions/validation Why Simulate? Expected Trial Performance • Usually not of interest -- already done w/o simulation Variability of Trial Performance • Important for Risk Management: “What’s the earliest, the latest, the most, the least, etc” • 80% CIs Structural Problems • Interactions of parameters may doom the trial before it even starts! (eg, mean (max{ X, Y} ) vs max{ mean(X), mean(Y) } ) Prototyping Prototyping: • Toy simulation with hands-on teamwork • Development model • Get team buy-in on assumptions • Processing speed not important • Rapid modifications are important Ideal? • Develop a prototype in an 1 hour meeting • Check for errors later • Run large simulations later for precise estimates Checking planning assumptions • H0 = Simulation assumptions • Observed: a value X • {xi} = corresponding values in simulation • Rank of X in {xi} ≈ p-value Stored Values: Use Function Percent Rank Descriptive Statistics: Use Frequency Count Use to: • Test assumptions, validate model, +?? • If an observed value of X is rare in the simulation, question assumptions! Checking Assumptions (2) Example: • A trial is designed based on a non-trivial simulation. • The model predicts a completion rate of 65% with 95% C.I.= (55%, 75%) • 4 months into the trial, a 50% completion rate is observed. • How significant is this discrepancy? Resimulate: • {xi} = simulated completion rates (1/iteration) • Rank of observed 50% in {xi} ≈ p-value • “How likely is the observation, under the modeled assumptions?” Example #5: Simulating a 30 patient trial • Each patient is a random variable • Survival times are interpolated • Estimated survival curves have confidence intervals •All 30 patients in an iteration use the same random conf.level •Conf. Level is updated each iteration Example #5: Testing Assumptions Statistics on the patients are the simulation random variables • Assume the trial was carried out • 70% of patients complete Q: is this consistent with the simulations? A: Yes, but… Only 6.1% of simulations had >70% completion Macro Management VBA Editor: Alt-F11 (or find the menu) • Some versions of Excel • Copy Module between sheets • Copy code from .xls sheet & insert into VBA editor • Open & save as new sheet Macro Management (newer) Later versions: In Visual Basic From the Tool Bar •File > Export File – Export VBA code (module: “SweitzerSimulationCoreCode”) •File > Import File – Imports VBA code (into a module) Further resources Commercial and Free software packages Provide: •More rigorous algorithms •More functions – Resampling, multivariate, etc •More support Commercial Add-Ins @RISK www.palisade.com Crystal Ball www.decisioneering.com Free Add-Ins PopTools www.cse.csiro.au/poptools SimTools.xla http://home.uchicago.edu/~rmyerson/addins.htm Caveat: Licensing •Free for non-commercial (eg, education) •Not clear for other uses Semi-Commercial Low-cost Excel simulation add-in: •RiskSim by Michael Middleton www.treeplan.com/ –Currently 1 output var with lots of graphs – also: decision trees, sensitivity analysis – on-line text-book: http://www.treeplan.com/chapters.htm Additional Reading INTRODUCTION TO MODELING AND GENERATING PROBABILISTIC INPUT PROCESSES FOR SIMULATION www.informs-sim.org/wsc07papers/008.pdf Spreadsheet Simulation (Seila, 2006) www.informs-sim.org/wsc06papers/002.pdf Work Smarter, Not Harder: Guidelines for Designing Simulation Experiments www.informs-sim.org/wsc06papers/005.pdf Tips for the Successful Practice of Simulation www.informs-sim.org/wsc06papers/007.pdf The End (Actual – not simulated)