Management Science
(G12 2001/02)
Part I (Michaelmas): Uncertainty
Part II (Lent): Choice
Today’s plan
What is Management Science and
what’s the purpose of this course?
• Spreadsheets as a modelling platform
• Introduction to Monte Carlo Simulation
• How can a computer roll a die?
Management Science
• … uses computer technology to help
managers make good decisions and optimise
organisational processes
• Operations Research and Decision Science
are other names
• Professional bodies
– International: Institute for Operations
Research and the Management Sciences
(INFORMS) (www.informs.org)
– UK: Operational Research Society
(www.orsoc.org.uk)
Core issues of Management
Science
• Problem areas
– decision making
– Organisational process design and control
• Core issues
– complexity
– uncertainty
– choice (optimality)
• Methodology:
– Modelling (graphical, computer, or
mathematical models)
Some important side issues
• Models as communication devices
– Presentation of quantitative information
• Quantitative (“hard”) versus qualitative
(“soft”) modelling
– The “human factor”
• Management information systems
– Availability and reliability of data
Aim of the course
• Make you aware of the potential (and
limitations) of the Management Science
approach
• Provide you with some hands-on
experience in modelling (coursework)
• Discuss theoretical underpinning of some
important modelling templates (exam)
Michaelmas term: Uncertainty
• Computer models of uncertainty
– Monte Carlo simulation
• Mathematical models of uncertainty
– Probability theory and stochastic processes
• Modelling templates
– Queuing systems
– Markov chains
– Others as time permits
• Analysing dependencies between
uncertainties
– Linear regression
• Forecasting the future
Supervisions and coursework
• Supervision 1: Simulation in
spreadsheets
• Supervision 2: Stochastic processes
• Supervision 3: Regression and
forecasting
• Coursework: Analysis of a business
case using spreadsheet simulations
(more later…)
Today’s plan
• What is Management Science and
what’s the purpose of this course?
Spreadsheets as a modelling platform
• Introduction to Monte Carlo Simulation
• How can a computer roll a die?
Why spreadsheets?
• The pros: a spreadsheet is a tool that allows you to
– store and present quantitative information
– process quantitative information
– perform what-if analyses
– do much more…
• The cons: spreadsheets have many disadvantages
– Limited data structure (2-dimensional array)
– Difficult to validate and document
– Inflexible
– Unreliable numerical routines
Background Information
• I assume that you are familiar with basic
spreadsheet programming
• If not, go through a free tutorial on the web, e.g.
– http://www.compusmart.ab.ca/alummis/excel/exceltutorial.html
– http://www.usd.edu/trio/tut/excel/
– http://www.jcu.edu/infoservice/training/excel/start.htm
• More advanced material can be found in B.V.
Liengme, A guide to Microsoft Excel for scientists
and engineers (CUED Lib.)
A tip for spreadsheet
modelling
• Clearly separate
– Data (input to the model that is not under your control)
– Design parameters (input that is under your control)
– The actual model (logical description of the model)
– Model output (basis for decision, often includes graphical elements)
• For larger models use separate worksheets
• Ideally, no cell in the logical model section
contains a number
– These cells only contain formulas and references to other
cells, e.g. in the data or parameter section
The five stages of computer
modelling (Donald Knuth)
1. Decide what you want the model to do
2. Decide how to build the model
3. Build the model
4. Debug the model
5. Trash stages 1 to 4 and start again,
now that you know what you really
wanted in the first place
Don’t get frustrated: A modelling
process is a learning process
The main benefit of building a (computer)
model to analyse a problem is not the
quantitative information obtained as
output of the model but the enhanced
understanding of the problem gained
during the modelling process
Today’s plan
• What is Management Science and
what’s the purpose of this course?
• Spreadsheets as a modelling platform
Introduction to Monte Carlo Simulation
• How can a computer roll a die?
Example: A product launch
• Main criterion: profitability
Profit = sales * (unit price - unit costs) - fixed costs
• Suppose fixed costs known
• Price is a decision variable and influences
sales
– Pricing decision depends, among other things, on
the level of competition and the reaction of the
competitors to the launch (uncertain)
• Unit cost is uncertain and depends on prices
for raw materials, energy, etc.
• Let’s look at a spreadsheet model…
The Flaw of the Average
• Plugging average values into uncertain
cells can lead you astray
• The resulting bottom line (e.g. profit) is
often not the average profit
• Mathematical reason: E(f(X)) = f(E(X))
holds for every random variable X ONLY if f
is linear
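A minimal Python sketch of this effect, under assumptions that are not in the lecture: demand is taken to be uniform on [50, 150] and sales are capped by a hypothetical capacity of 100 units, so sales = min(demand, capacity) is a non-linear function of demand.

```python
import random

def units_sold(demand, capacity=100.0):
    """Non-linear model: we cannot sell more than the (hypothetical) capacity."""
    return min(demand, capacity)

def flaw_of_averages(n=100_000, seed=1):
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    demands = [rng.uniform(50.0, 150.0) for _ in range(n)]
    mean_demand = sum(demands) / n
    plug_in = units_sold(mean_demand)                    # f(E(X))
    simulated = sum(units_sold(d) for d in demands) / n  # estimate of E(f(X))
    return plug_in, simulated

plug_in, simulated = flaw_of_averages()
print(f"f(E(X)) = {plug_in:.1f}, E(f(X)) is about {simulated:.1f}")
```

For this toy model the exact value of E(f(X)) is 87.5, whereas plugging in the average demand suggests sales of about 100.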
What do we want to achieve
with a simulation model?
• A single number, e.g. an average, gives very
little information if model input is uncertain
• Manual what-if analysis is cumbersome and
biased
• We want to estimate the distribution of output
cells
– Give a graphical representation of this distribution
• cumulative distribution function
• histogram
Example: Value at risk
• Given a distribution function for profit,
we can read off the loss x such that
there is an α% chance that the loss is
at least £ x
• That number is called the α% value at
risk (α% VAR)
[Figure: Profit Distribution, plotted as cumulative probability (0% to 100%) against profit (-£1,500,000 to £2,500,000)]
10% VAR is roughly £500,000
5% VAR is roughly £800,000
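Reading the value at risk off a list of simulated profits could be sketched as follows. The profit figures are hypothetical (they are not the data behind the figure), and alpha is passed as a fraction rather than a percentage.

```python
import math

def value_at_risk(profits, alpha):
    """Return the loss x such that a fraction alpha of outcomes lose at least x."""
    ordered = sorted(profits)  # worst outcomes first
    # Number of outcomes in the alpha tail (small guard against float round-off).
    k = max(1, math.ceil(alpha * len(ordered) - 1e-9))
    return -ordered[k - 1]     # a loss is a negative profit

# Hypothetical simulated profits in pounds.
profits = [-300, -100, 50, 200, 400, 600, 800, 1000]
print(value_at_risk(profits, 0.25))  # prints 100
```

Two of the eight outcomes (25%) lose at least £100, so the 25% value at risk of this toy sample is £100.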
Main steps of a simulation
project
• Understand the problem
• Programme and validate a deterministic
model
• Determine the distribution of the uncertain
inputs
– Collect relevant data
• Run the simulation experiments
• Analyse the model output
• Communicate the model and its output
Building the model…
• Essential questions before you model:
– What are the questions that the model is to address?
– What are the interesting outputs / performance measures?
– What is the appropriate level of detail?
• Discuss these questions with all stakeholders
in the decision situation / process
• Validate the logic of your model before you
enter uncertainty
Incorporating uncertainty
• Estimate a probability distribution on the basis of
“hard” data whenever possible
– Re-sampling from historic data (see product launch example)
is a simple and valid way of generating numbers for uncertain
cells
• Sometimes you need subjective probabilities
– Get estimates from many independent experts (Delphi
method)
– Check sensitivity of outputs w.r.t. changing probabilities
• A triangular distribution is often a good starting
point
– Defined by lowest, highest and most likely value
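Outside a spreadsheet, a triangular distribution can be sampled directly with Python's standard library. The three estimates below are hypothetical, chosen only for illustration.

```python
import random

rng = random.Random(42)  # fixed seed for reproducibility
low, high, most_likely = 8.0, 15.0, 10.0  # hypothetical unit-cost estimates

# Note the argument order of the standard library: (low, high, mode).
samples = [rng.triangular(low, high, most_likely) for _ in range(10_000)]
mean = sum(samples) / len(samples)
theoretical = (low + high + most_likely) / 3
print(f"sample mean {mean:.2f}, theoretical mean {theoretical:.2f}")
```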
Running a simulation
• Important rule: do more replications
than you expect to be necessary
– at least several hundred
• Check that running averages of outputs
have settled in a steady state
– E.g. record after each replication the
average profit over all past replications
• Do several runs and compare the
results
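The running-average check can be sketched in a few lines of Python. The "profit" output here is a hypothetical stand-in drawn from a normal distribution, not the product launch model.

```python
import random

def running_means(values):
    """Average of the first 1, 2, ..., n values (one entry per replication)."""
    means, total = [], 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means

rng = random.Random(7)
# Hypothetical output cell: one simulated profit per replication.
profits = [rng.gauss(100.0, 50.0) for _ in range(1000)]
means = running_means(profits)
print(means[9], means[99], means[999])  # should settle as replications grow
```

Plotting the list of running means against the replication number shows whether the average has settled; repeating the run with a different seed gives the comparison between runs.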
Analysing the output
• Use visual aids (histograms, distribution
functions, scatter diagrams, etc.)
• Use statistics (confidence intervals,
hypothesis tests)
• What are the implications of the results
– for your model world?
– for the real world?
Communicating the model and
the results
• IMPORTANT: Communicate regularly
with all stakeholders in the decision
situation or process you are modelling
– Build credibility for your model
– Model becomes a “language”
– Forces you to be as simple as possible
– Forces you to be as relevant as possible
A word about simulation
platforms
• Spreadsheets are useful for the simulation of
many day-to-day decision problems
• They are NOT suitable for the simulation of
complex processes, not least because they
are difficult to validate
• Professional platforms are available that
facilitate the programming and validation of
complex models, e.g. through graphical
interfaces
• Back to spreadsheets…
Preparing a spreadsheet model for
simulation
• Write a model as if all inputs (data) were certain
• Mark clearly all uncertain input cells (colour
them)
• Feed the input cells with appropriate randomly
chosen numbers
– The F9 key (recalculation) now produces one scenario after
the other
• Set up a worksheet for replications of your
model, using the data table command
– More specifics can be found in Ragsdale, chapter 12.4-12.8
Feeding uncertain cells
• Suppose cell x is known to be uniformly
distributed on the interval [0,1]
– Put “=rand()” into cell x
– Pressing F9 is equivalent to sampling from a
uniform distribution and putting the number into
cell x
• randbetween(a,b) samples integers between
a and b, including the integers a and b, with
equal probability 1/(b-a+1)
• The Analysis ToolPak needs to be loaded for
randbetween to work (Tools –> Add-Ins)
More general distributions
• Will see later how to generate more general
distributions, using the rand() function
(inverse transform method)
• Example: norminv(rand(),a,b) samples from a
normal distribution with mean a and standard
deviation b
• Alternative random variable generators are
provided with the add-ins in the books by
Ragsdale and Savage
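For readers working outside Excel, a rough Python counterpart of norminv(rand(),a,b) is sketched below. The helper name norminv is chosen here only to mirror the spreadsheet function; it relies on NormalDist.inv_cdf from the standard library.

```python
import random
from statistics import NormalDist

def norminv(u, mean, sd):
    """Python counterpart of the spreadsheet call norminv(u, mean, sd)."""
    return NormalDist(mean, sd).inv_cdf(u)

rng = random.Random(3)
# rng.random() lies in [0, 1); inv_cdf needs 0 < u < 1, so skip an exact 0.
draws = []
while len(draws) < 5:
    u = rng.random()
    if u > 0.0:
        draws.append(norminv(u, 10.0, 2.0))
print(draws)
```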
Today’s plan
• What is Management Science and
what’s the purpose of this course?
• Spreadsheets as a modelling platform
• Introduction to Monte Carlo Simulation
How can a computer roll a die?
Issues to be addressed
• How can a computer roll a die?
• How can we use past data?
• What if random cells are statistically
dependent (e.g. annual demands over
the next five years, stock prices of BMW
and Daimler-Chrysler)?
Random Number Generation:
Can a Computer Roll a Die?
• Computers can only perform arithmetic
operations which by their very nature give
deterministic and not random results.
• There is no such thing as a true random
number generator on a digital computer.
Random numbers generated by a computer
are therefore sometimes called PSEUDORANDOM NUMBERS.
John von Neumann (1951)
Any one who considers arithmetical methods
of producing random digits is, of course, in a
state of sin. For...there is no such thing as a
random number - there are only methods to
produce random numbers, and a strict
arithmetic procedure of course is not such a
method.... We are dealing here with mere
‘‘cooking recipes’’ for making digits...
BUT…
John von Neumann goes on to say that
these recipes
...probably...can not be justified, but should merely
be judged by their results. Some statistical study of
the digits generated by a given recipe should be
made, but exhaustive tests are impractical. If the
digits work well on one problem, they seem usually to
be successful with others of the same type.
What can we hope for?
• An arithmetic method (‘recipe’) that generates
a sequence of numbers which appear as if
they were randomly chosen in the sense that
they pass certain statistical tests
• Best understood and widely used are linear
congruential RNGs possibly enhanced by a
“shuffling technique”
Linear Congruential Methods
- Choose an initial number $r_0$ (the seed)
- Recursion formula: $r_i \equiv a\,r_{i-1} + c \pmod{m}$
In words: divide $a\,r_{i-1} + c$ by $m$ and let $r_i$ be the
remainder of this division
- Need four integer parameters for the method: $m$, $a$, $c$, $r_0$
modulus $m > 0$, multiplier $0 < a < m$,
increment $0 \le c < m$ and seed $0 < r_0 < m$
Random number streams
• A linear congruential method produces a
sequence of numbers r0, r1, r2, ...
• All numbers ri lie between 0 and m-1.
• The conversion formula ui=(ri+0.5)/m gives a
sequence of numbers u0, u1, u2, ... which lie
strictly between 0 and 1.
• We call the finite sequence u0, u1, u2, ..., un a
random number stream generated by a linear
congruential RNG
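The recursion and the conversion formula can be sketched in Python as follows. The parameters m=16, a=5, c=3 and the seed 7 are toy values chosen only to keep the numbers readable; they are not a recommended generator.

```python
def lcg(m, a, c, seed, n):
    """Return r0, r1, ..., r(n-1) of the linear congruential recursion."""
    values, r = [], seed
    for _ in range(n):
        values.append(r)
        r = (a * r + c) % m  # remainder of (a*r + c) divided by m
    return values

def to_uniform(values, m):
    """Conversion formula from the slide: u = (r + 0.5) / m."""
    return [(r + 0.5) / m for r in values]

raw = lcg(m=16, a=5, c=3, seed=7, n=6)
print(raw)  # prints [7, 6, 1, 8, 11, 10]
print(to_uniform(raw, 16))
```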
Good Versus Bad Random
Number Streams
• A random number stream u1,u2,u3,...,un
should resemble a sequence of n
independent samples from a uniform
distribution on the interval [0,1].
• Whether a linear congruential RNG has this
property depends a lot on the choice of the
parameters a,c,m.
Example
• What is the cycle length of the linear congruential
generator with modulus 15, multiplier 4 and
increment 0?
• Try out various seeds until all numbers between
1 and 14 have shown up:
• 1,4,1…
• 2,8,2…
• 3,12,3…
• 5,5…
• etc.
• Why is this a bad RNG?
Cycle Length
• Want the sequence of random numbers to successively
fill the whole interval [0,1] without leaving large gaps.
• Since all numbers ri lie between 0 and m-1, the
sequence r0, r1, r2, ... will repeat after at most m
iterations (at most m-1 if c=0, because the value 0 then
reproduces itself and must be avoided)
– It may, however, start repeating much earlier
• The cycle length (or period) of a linear congruential
RNG is the minimal length n>0 of a sequence r0, r1, ..., rn
with rn = r0
• The RNG is said to have maximal cycle length if its
cycle length is m (or m-1 in the purely multiplicative case c=0)
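A small Python sketch for measuring cycle lengths, applied to the example with modulus 15, multiplier 4 and increment 0. It records when each value first appears, so it also copes with seeds that only enter a cycle after a few steps.

```python
def cycle_length(m, a, c, seed):
    """Length of the cycle that the sequence r0, r1, ... eventually enters."""
    first_seen = {}  # value -> index at which it first appeared
    r, i = seed, 0
    while r not in first_seen:
        first_seen[r] = i
        r = (a * r + c) % m
        i += 1
    return i - first_seen[r]

# The example with modulus 15, multiplier 4 and increment 0.
for seed in (1, 2, 3, 5):
    print(seed, cycle_length(15, 4, 0, seed))
```

The seeds 1, 2 and 3 give cycles of length 2 and the seed 5 gives a cycle of length 1, far short of what a modulus of 15 allows.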
Built-in RNGs
• Be suspicious of built-in RNGs on your computer
– You can assume that it has maximal cycle length but that
does not guarantee good statistical properties
• If possible, find out which generator is used and
whether it has been tested in the literature
• If this information is not available (quite likely), you
should at least perform some statistical tests before
you use it for simulations
Some Popular RNGs
• Most RNGs are purely multiplicative (c=0)
– m=2^31-1, a=7^5, c=0 (Learmonth and Lewis 1973)
– m=2^31-1, a=630,360,016, c=0 (Payne et al. 1969)
• A comparison of various multipliers for the
modulus m=2^31-1 has been done in a series of
papers by Fishman and Moore
(1981, 1982, 1986).
– They found that the statistical performance of the Payne
et al. RNG is better than that of the Learmonth and Lewis
RNG
Statistical Testing of RNGs
Generate a random number stream
u1,...,un and use statistical tests to see
how closely the stream resembles a
sample of size n drawn from a uniform
distribution on the interval [0,1].
Setting up the test
• We want to check how good a sequence
produced by an RNG ‘fits’ the uniform
distribution
• Divide the interval [0,1] into k subintervals
of equal size. (Typically 100<k<n/5)
• Determine the number fi of values in the
random number stream that fall in the i-th
subinterval
What do we expect?
• If the ui’s are drawn from a uniform distribution over
[0,1] then we expect that fi is approximately n/k
• Chi-square statistic
$$\chi^2 = \frac{k}{n} \sum_{i=1}^{k} \left( f_i - \frac{n}{k} \right)^2$$
• Mathematical result: If the frequencies fi are
obtained from a uniform distribution (and n>5k) then
the distribution of the random variable $\chi^2$ is close to
a chi-square distribution with k-1 degrees of freedom
The $\chi^2$ goodness of fit test
• Suppose RNG is uniform (hypothesis) and let
x be the observed value of the test statistic $\chi^2$
• The hypothesis is (statistically) inconsistent
with the observation if it is unlikely that the
test statistic $\chi^2$ assumes a value as large as x
– Reject hypothesis if $P(\chi^2 \ge x)$ is small, e.g. below
5%
• $P(\chi^2 \ge x)$ is called the p-value of the test
– Reject if p-value is below the significance level
(5%)
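The test can be sketched in Python as follows. Two assumptions to note: Python's built-in generator stands in for the RNG under test, and because the standard library has no chi-square distribution function, the p-value is only approximated with the Wilson-Hilferty normal approximation (a spreadsheet function such as chidist would give the tail probability directly).

```python
import random
from statistics import NormalDist

def chi_square_statistic(us, k):
    """(k/n) * sum over the k subintervals of (f_i - n/k)^2."""
    n = len(us)
    counts = [0] * k
    for u in us:
        counts[min(int(u * k), k - 1)] += 1  # u == 1.0 goes into the last bin
    expected = n / k
    return (k / n) * sum((f - expected) ** 2 for f in counts)

def approx_p_value(x, dof):
    """Approximate P(chi-square >= x) via the Wilson-Hilferty approximation."""
    shift = 2.0 / (9.0 * dof)
    z = ((x / dof) ** (1.0 / 3.0) - (1.0 - shift)) / shift ** 0.5
    return 1.0 - NormalDist().cdf(z)

rng = random.Random(2024)
us = [rng.random() for _ in range(10_000)]
k = 100
x = chi_square_statistic(us, k)
print(f"statistic {x:.1f}, approximate p-value {approx_p_value(x, k - 1):.3f}")
```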
Higher dimensional goodness
of fit tests
• d-dimensional vectors (u1,...,ud), (ud+1,...,u2d), ... should
be uniformly distributed in the d-dimensional cube $[0,1]^d$
• A division of [0,1] into k subintervals of equal size gives a
division of the d-cube $[0,1]^d$ into $k^d$ subcubes of equal
volume
• Generate n d-vectors U1,...,Un (each requiring the
generation of d random numbers) and let $f_{i_1 \ldots i_d}$ be the
number of vectors having their j-th component in the $i_j$-th
subinterval for every j=1,...,d
• Chi-squared test can be appropriately modified
Shuffling
• One can enhance the performance of an RNG by
using another RNG to shuffle a set of random
numbers generated by the first one.
• Start with an array U=(u1 ,...,un ) of n random
numbers generated by RNG 1. Then repeat the
following procedure:
– Use RNG 2 to generate a random integer M
between 1 and n.
– Select uM as your next random number and
replace uM in the array U by a new random
number generated by RNG 1
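The shuffling procedure can be sketched in Python as below. The two generators are passed in as functions; in the usage line two independently seeded instances of Python's built-in generator merely stand in for RNG 1 and RNG 2, and positions are counted from 0 rather than from 1.

```python
import random

def shuffled_stream(next_uniform, next_index, table_size, n):
    """Shuffle the output of RNG 1 using positions drawn from RNG 2.

    next_uniform() plays the role of RNG 1; next_index(size) plays the
    role of RNG 2 and returns an integer between 0 and size - 1.
    """
    table = [next_uniform() for _ in range(table_size)]  # initial array U
    output = []
    for _ in range(n):
        m = next_index(table_size)  # random position M
        output.append(table[m])     # u_M becomes the next random number
        table[m] = next_uniform()   # refill the slot from RNG 1
    return output

rng1, rng2 = random.Random(1), random.Random(2)  # stand-ins for two RNGs
print(shuffled_stream(rng1.random, rng2.randrange, table_size=8, n=5))
```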
Further Tests
• Want consecutive draws to be independent
• Test of independence can be done
– by estimating lag correlation
– by examining the random number stream for
unbroken sub-sequences of maximal length within
which the random numbers increase
monotonically (run-ups or run-downs).
Standard reference: D.E. Knuth “The Art of Computer
Programming”, Volume 2.
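Estimating a lag correlation might look like this in Python. The estimator used is the usual sample autocorrelation, which is one common choice rather than the only one, and Python's built-in generator stands in for the RNG under test.

```python
import random

def lag_correlation(us, lag=1):
    """Sample autocorrelation of the stream at the given lag."""
    n = len(us)
    mean = sum(us) / n
    variance_sum = sum((u - mean) ** 2 for u in us)
    cross_sum = sum((us[i] - mean) * (us[i + lag] - mean) for i in range(n - lag))
    return cross_sum / variance_sum

rng = random.Random(11)
us = [rng.random() for _ in range(5000)]
print(lag_correlation(us, 1))  # should be close to 0 for a good RNG
```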
Serial correlation
• Main disadvantage of congruential RNGs is that
they are not free of serial correlation
– Example: Steel manufacturer simulates sales
• Main uncertainty drivers are time between orders
and order quantity
• Uncertain quantities are generated in pairs (time to
next order, order quantity) during the simulation
• There should be no correlation between the two
variables
• More sophisticated RNGs avoid this but they are
computationally more expensive
– There is generally a trade-off between statistical
performance and computing time
Where are we now?
• You know that RNGs do not really generate
random numbers but rather pseudo-random
numbers
• A lot of software has a built-in uniform RNG
– You have seen a popular “recipe” for uniform
RNGs
– You have seen how uniform RNGs can be
tested statistically
• How can we generate other random variables
with a uniform RNG?
The Inverse Transform
Method
(discrete RVs)
• Given a stream of uniform random numbers ui
on [0,1], generate a stream of realizations xi
of a discrete RV X with probability mass
function $p_s = P(X=s)$ for s=0,1,2,...
• If U is a uniform RV then
$P(p_0+\dots+p_{s-1} \le U < p_0+\dots+p_{s-1}+p_s) = p_s$
(for s=0 the left-hand sum is empty, i.e. 0)
• Hence, given ui we set xi=s if
$p_0+\dots+p_{s-1} \le u_i < p_0+\dots+p_{s-1}+p_s$
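In Python, the discrete inverse transform amounts to walking through the cumulative probabilities until the uniform number is exceeded. The probability mass function below is a hypothetical example on the values 0, 1 and 2.

```python
import random

def sample_discrete(pmf, u):
    """Return the value s whose cumulative-probability interval contains u.

    pmf[s] is P(X = s) for s = 0, 1, 2, ...
    """
    cumulative = 0.0
    for s, p in enumerate(pmf):
        cumulative += p
        if u < cumulative:
            return s
    return len(pmf) - 1  # guard against rounding when u is very close to 1

pmf = [0.2, 0.5, 0.3]  # hypothetical distribution on the values 0, 1, 2
rng = random.Random(5)
sample = [sample_discrete(pmf, rng.random()) for _ in range(10_000)]
print([sample.count(s) / len(sample) for s in range(3)])  # roughly [0.2, 0.5, 0.3]
```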
The Inverse Transform Method
(continuous RVs)
• Task: Given a stream of uniform random
numbers u1,…,un on [0,1], generate a sample
x1,…,xn from a continuous distribution with
cumulative distribution function F
• Assumption for the method: F increases
strictly on the range of values x with 0<F(x)<1.
– Then F(x)=z has a unique solution $x=F^{-1}(z)$ for
every z with 0<z<1
– and $F^{-1}(z)$ is strictly increasing as well
The Inverse Transform Method
(continuous RVs, cont.)
• If U is a uniform RV on [0,1] then
$F(x) = P(U \le F(x))$
$= P(F^{-1}(U) \le F^{-1}(F(x)))$
$= P(F^{-1}(U) \le x)$
• Hence the RV $F^{-1}(U)$ has distribution function F, i.e. the
same distribution as the RV X we want to sample
• Calculate xi from ui by setting $x_i = F^{-1}(u_i)$, i.e. by
solving $F(x_i) = u_i$
– Example: norminv(rand(),a,b)
– Applies to an exponential RV with mean m ($F(x) = 1 - e^{-x/m}$),
where solving gives $x_i = -m \ln(1-u_i)$
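For the exponential case, solving F(x) = u by hand gives x = -m ln(1-u), which can be sketched in Python as follows. The mean of 2.0 is a hypothetical value chosen for illustration.

```python
import math
import random

def exponential_inverse(u, mean):
    """Solve F(x) = u for F(x) = 1 - exp(-x / mean)."""
    return -mean * math.log(1.0 - u)

rng = random.Random(9)
mean = 2.0  # hypothetical mean time between orders
# rng.random() lies in [0, 1), so 1 - u is never 0 and the log is defined.
sample = [exponential_inverse(rng.random(), mean) for _ in range(20_000)]
print(sum(sample) / len(sample))  # should be close to the mean 2.0
```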
What if you don’t know the
distribution?
• Many distributions (normal, exponential, etc.)
are defined by a small set of parameters
(mean, variance, etc.)
– If you know the parameters you know the
distribution
• Sometimes you don’t know the specific type
of distribution but are instead faced with a set
of historical company data for the uncertain
cell
– e.g. order quantities and times between orders for
a particular product
Resampling
• In this case, you can create a historical
distribution function and sample from this
• There is an easy way of doing this
– Suppose you have a list of n data values (in a
spreadsheet)
– Choose a number x between 1 and n at random
(using the randbetween(.,.) function)
– Select the x-th data value in your list and plug it into the
cell (using the vlookup or hlookup functions)
Statistical dependence
• If uncertain cells are statistically dependent (e.g.
temperature and ice cream sales) then sampling
independently from a list of temperatures for the
temp-cell and from a list of ice cream sales for
the sales cell gives a wrong result
– try this with the product launch model with price and sales
• Instead, set up a list of associated
(temperature,sales) pairs and then resample
from the list of pairs and assign the temperature
and sales from the same pair to the respective
cells
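The difference between resampling pairs and resampling each list independently can be sketched in Python. The historical (temperature, sales) records below are invented purely for illustration.

```python
import random

rng = random.Random(4)
# Hypothetical history of (temperature in degrees C, ice cream sales) pairs.
history = [(15, 120), (18, 150), (22, 210), (25, 260), (30, 340)]

def resample_pair(pairs):
    """Pick one historical record and keep its two values together."""
    return pairs[rng.randrange(len(pairs))]

def resample_independently(pairs):
    """Pick each column separately; this destroys the dependence."""
    temperature = pairs[rng.randrange(len(pairs))][0]
    sales = pairs[rng.randrange(len(pairs))][1]
    return temperature, sales

print([resample_pair(history) for _ in range(3)])
print([resample_independently(history) for _ in range(3)])
```

Independent resampling can produce combinations such as a cold day with peak sales that never occurred in the data, which is why it distorts the simulated output.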
What have you learned?
• Management Science is about using computers to help
managers make good decisions and optimise their operational
processes
• Spreadsheets
– are used and accepted by managers everywhere in the world
– are good fun to play with
– have considerable drawbacks and should not be used for complex
models
• Monte Carlo Simulation is a good way of analysing uncertainty
• You know how a computer samples from a distribution
– You can perform a simple test for a built-in RNG
– You can use uniform RNGs to generate samples from more general
distributions
Homework
• Test the Excel rand() function using a 1-dimensional
goodness of fit test with k=100 and n=500
– use chidist function (function wizard fx -> statistical) to
compute the tail probability of the chi-square distribution
• Consider the linear congruential RNG with
m=64,a=21, c=1.
– Use Excel to show that it has maximal cycle length.
– Plot the overlapping pairs (u1,u2),(u2,u3),(u3,u4),.... Is the
emerging pattern a pattern that you would expect to be
generated by a good RNG? Could this RNG cause problems
in a simulation?
Homework
• Generate a binomial distribution with parameters
10 (# trials) and 0.2 (winning probability) in a
spreadsheet and generate a histogram and a
cumulative distribution function by running a
simulation
• Set up the Excel spreadsheet for the product
launch and look at the differences in the
simulation results if you sample prices and sales
independently rather than dependently