Business Forecasting with Accompanying Excel-Based ForecastX™ Software
The Forecast Process
• The forecast process involves selecting one or
more forecasting techniques depending on the
type of data available.
• The type of data is determined by evaluating
data for trend, seasonal and cyclical
components.
• In this chapter, we will evaluate a number of
data series to see which time-series components
exist in each.
McGraw-Hill/Irwin
Copyright © 2002 by The McGraw-Hill Companies, Inc. All rights reserved.
The Forecast Process
• The success of the forecast process depends on
the effectiveness of the communication
between the managers who use forecasts and
the individuals who develop the forecasts.
• It is also important for managers to have some
familiarity with the methods used in developing
the forecast.
The Steps in the Forecasting Process
1. Specify objectives
2. Determine what to forecast
3. Identify time dimensions
4. Consider the quantity and the type of data
5. Select a forecasting method (see Table 2.1)
6. Evaluate its accuracy
7. Prepare and present the forecasts
8. Track the forecasts
Model Evaluation
• In evaluating forecasting models, it is very
important to distinguish between fit and
accuracy.
• Fit refers to in-sample model performance,
whereas accuracy refers to out-of-sample
performance.
• In many cases models that perform well in
sample perform very poorly out-of-sample.
Model Evaluation
• Since forecast accuracy is always the first priority,
emphasis should be placed on out-of-sample RMSE
(root-mean-squared error) rather than model fit.
• This is usually accomplished by setting aside a holdout
period within the sample.
• The holdout is a period at the end of the sample for which
forecasts made from earlier periods can be compared with
actual values to assess the accuracy of a given model.
• In summary, fit refers to how well the model works with
past data; accuracy relates to how well the model
works in the forecast horizon (see pg. 52).
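The difference between fit and accuracy can be sketched in Python. This is a minimal illustration only: the series values are made up, and the naive forecast (repeating the last known value) is an assumption chosen for brevity, not a method prescribed by the text.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical data series (illustrative values, not from the text).
series = [10, 12, 13, 12, 15, 16, 18, 17, 19, 21]

# Reserve the last three periods as the holdout period.
holdout_size = 3
in_sample = series[:-holdout_size]
holdout = series[-holdout_size:]

# Fit: one-step-ahead naive forecasts inside the sample
# (each period is forecast by the previous period's value).
fit_rmse = rmse(in_sample[1:], in_sample[:-1])

# Accuracy: forecasts for the holdout are made using only in-sample data
# (here, the last in-sample value held flat over the holdout).
holdout_forecasts = [in_sample[-1]] * holdout_size
accuracy_rmse = rmse(holdout, holdout_forecasts)

print(f"in-sample (fit) RMSE:          {fit_rmse:.3f}")
print(f"out-of-sample (accuracy) RMSE: {accuracy_rmse:.3f}")
```

The key design point is that the holdout values are never used when producing the holdout forecasts, so the second RMSE reflects how the model would behave in a real forecast horizon.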
Data Patterns
• The data that are used most often in
forecasting are time series which include a
variety of patterns. The best way to observe
these patterns is to plot them over time:
Trend: A long-term change (positive or
negative) in the level of data. When there is
neither positive nor negative trend, data are
considered stationary.
Data Patterns
Seasonal Pattern: Seasonality occurs when a regular
variation in the level of data repeats itself at the same
time each year, month, or week.
Cyclical Pattern: Cyclical fluctuations are related to
business cycles and in comparison to seasonal
fluctuations, they are of longer duration and are less
regular.
Irregular component: Irregular fluctuations occur
randomly.
Gross Domestic Product (GDP) Data Set
A time-series plot of real GDP on a quarterly
basis is given in Fig. 2.1.
What data patterns can be observed in this
time series?
1. Long-term positive trend
2. Cyclical fluctuation
So, GDP is nonstationary and has a cyclical
component.
Figure 2-1
Private Housing Starts (PHS) Data Set
• PHS data are plotted in Fig. 2.2 from 1980 to
2000, on a quarterly basis.
What data patterns can be observed in this
time series?
1. Upward trend
2. Cyclical movements
3. Seasonal pattern
• The cyclical nature of the data is more obvious
when PHSSA (deseasonalized PHS data) is
compared with the trend.
Figure 2-2
Leo Burnett Advertising Agency
(LBB) Data
LBB data are shown in Fig. 2.3 on an annual
basis from 1950 to 1995.
What data patterns can be observed in this
time series?
1. Upward non-linear trend
Note that there is no need to consider seasonality
since these are annual data.
Figure 2-3
Data Patterns and Model Selection
Model selection depends on data patterns. Now,
let us select a model by using information in
Table 2.1:
For GDP having a trend and a cycle, but no
seasonality:
• Holt’s exponential smoothing
• Linear regression trend
• Causal regression
• Time-series decomposition
Data Patterns and Model Selection
For PHS data having a trend, seasonality, and a
cycle:
• Winters’ exponential smoothing
• Linear regression with seasonal adjustment
• Causal regression
• Time-series decomposition
For LBB data with a nonlinear trend, nonlinear
and causal regression are appropriate.
Table 2-1
A Statistical Review
Descriptive Statistics:
1. Measures of central tendency: mean, median
and mode
2. Measures of dispersion: range, variance,
standard deviation, and coefficient of variation.
• The mean is the arithmetic average of all the
numbers in the data, the median is the middle value
that splits the ordered data into two equal halves,
and the mode is the value that occurs most
frequently (see pg. 57).
A Statistical Review
• Range is the difference between the largest
and the smallest value.
• Standard deviation is the square root of the
average squared difference between each
observation and the mean. Squaring is needed
because the sum of unsquared differences
around the mean is always equal to zero.
• Variance is the square of the standard
deviation.
A Statistical Review
• The coefficient of variation, defined as the
standard deviation of the observations divided
by the mean, provides a measure of relative
variation, whereas the standard deviation provides
a measure of absolute variation.
• Using all of these descriptive statistics together
gives a better idea about the data than using
just the mean.
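The descriptive statistics reviewed above can be computed with Python's standard `statistics` module. The data values below are hypothetical, and the sketch assumes the sample (n − 1) versions of variance and standard deviation; the text does not state which version it uses.

```python
import statistics

# Hypothetical observations (illustrative only).
data = [2, 4, 4, 5, 7, 8]

# Measures of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of dispersion.
data_range = max(data) - min(data)
variance = statistics.variance(data)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)       # square root of the sample variance
coeff_of_variation = std_dev / mean    # relative variation

# The unsquared deviations around the mean always sum to zero,
# which is why the deviations are squared before averaging.
deviation_sum = sum(x - mean for x in data)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance:.2f}, std dev={std_dev:.3f}")
print(f"coefficient of variation={coeff_of_variation:.3f}")
print(f"sum of deviations={deviation_sum}")
```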
Table 2-2
Table 2-3
A Statistical Review
These sales data are plotted over time in Fig. 2.4.
What data patterns can be observed in this
time series?
The sales data are stationary; there is no trend.
• The mean is above the other measures of
central tendency because of one large value
in the data. This large value pulls
up the mean but has little or no effect on the
median or mode.
Figure 2-4
Normal Distribution
• Normal distribution for a continuous random
variable is defined by the mean and the
variance of the variable.
• As seen in Fig. 2.5, all normal distributions are
symmetrical around the mean (i.e., the mean is
equal to the median).
μ ± 1σ includes about 68% of the area
μ ± 2σ includes about 95% of the area
μ ± 3σ includes about 99.7% of the area
Standard Normal Distribution
(Z-distribution)
• To ease the calculation of areas under the
normal curve, a normally distributed variable X is
transformed into a standard normal variable:
Z = (X − μ)/σ (Z measures the number of
standard deviations by which X differs from
the mean)
If Z > 0, then X lies to the right of the mean.
If Z < 0, then X lies to the left of the
mean.
Figure 2-5
Example
Suppose the sales of a product are represented by
a normal distribution with a mean of 50 and a
standard deviation of 10.
What percent of sales would be between 40 and
65? P(40 < X < 65)
Z1 = (40 − 50)/10 = −1, Z2 = (65 − 50)/10 = 1.5
P(−1 < Z < 1.5) = 0.3413 + 0.4332 = 0.7745
What percent of sales would be greater than
30? P(X > 30) = P(Z > −2) = 0.4772 + 0.5 = 0.9772
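The table lookups in this example can be cross-checked in Python. This sketch uses `statistics.NormalDist` from the standard library (Python 3.8 or later) in place of the printed Z table; the variable names are illustrative.

```python
from statistics import NormalDist

# Sales are assumed normal with mean 50 and standard deviation 10,
# as stated in the example.
sales = NormalDist(mu=50, sigma=10)

# Standardized values for the two bounds.
z1 = (40 - sales.mean) / sales.stdev
z2 = (65 - sales.mean) / sales.stdev

# P(40 < X < 65): area under the curve between the two bounds.
p_between = sales.cdf(65) - sales.cdf(40)

# P(X > 30): area to the right of 30.
p_above_30 = 1 - sales.cdf(30)

print(f"Z1 = {z1}, Z2 = {z2}")
print(f"P(40 < X < 65) = {p_between:.4f}")
print(f"P(X > 30)      = {p_above_30:.4f}")
```

The results agree with the table-based answers to four decimal places, up to rounding in the printed table.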
Table 2-4
Figure 2-6
The Sampling Distribution of the Mean
• If a random sample of n observations is taken
from a normal population with mean μ and
variance σ², then each observation of the random
sample will have the same normal distribution as
the population being sampled.
• If we are sampling from a population with an
unknown distribution, then the sampling distribution
of the sample mean X̄ will still be approximately normal
with mean μ and variance σ²/n, provided the sample size is
large (Central Limit Theorem).
The Sampling Distribution of the Mean
• The normal approximation for the sample mean X̄ will
generally be good if n ≥ 30 (a common rule of thumb
for applying the Central Limit Theorem).
• If n < 30, the approximation is good only if the
population is not too different from a normal
distribution.
• If the population is known to be normal, the
sampling distribution of X̄ will follow a
normal distribution exactly, no matter how
small the sample size.
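A small simulation can make the Central Limit Theorem concrete. The sketch below is an illustration under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a clearly non-normal, skewed choice made here for demonstration), and the seed, sample size, and number of trials are arbitrary.

```python
import math
import random
import statistics

# Fixed seed so the illustration is reproducible.
rng = random.Random(42)

POPULATION_MEAN = 1.0   # mean of an exponential distribution with rate 1
POPULATION_SD = 1.0     # its standard deviation is also 1


def simulate_sample_means(sample_size, trials, rng):
    """Draw `trials` samples of `sample_size` and return each sample's mean."""
    means = []
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


n = 30
sample_means = simulate_sample_means(sample_size=n, trials=5000, rng=rng)

# Theory: the sample means center on the population mean,
# with standard deviation sigma / sqrt(n).
observed_center = statistics.fmean(sample_means)
observed_spread = statistics.stdev(sample_means)
theoretical_spread = POPULATION_SD / math.sqrt(n)

print(f"mean of sample means: {observed_center:.3f} (theory {POPULATION_MEAN})")
print(f"sd of sample means:   {observed_spread:.3f} (theory {theoretical_spread:.3f})")
```

Even though individual observations are strongly skewed, the simulated sample means cluster around the population mean with a spread close to σ/√n.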
Example
What is the probability of selecting a sample of
100 observations with a mean greater than 300
when the true population mean is 288 and the
population standard deviation is 60?
Z = (300 − 288) / (60/√100) = 12/6 = 2
P(Z > 2) = 0.5 − 0.4772 = 0.0228
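The same calculation can be reproduced in Python as a check. The variable names are illustrative, and `statistics.NormalDist` stands in for the printed Z table.

```python
import math
from statistics import NormalDist

population_mean = 288
population_sd = 60
sample_size = 100
threshold = 300

# Standard error of the mean: sigma / sqrt(n).
standard_error = population_sd / math.sqrt(sample_size)

# Number of standard errors by which the threshold exceeds the population mean.
z = (threshold - population_mean) / standard_error

# Probability that the sample mean exceeds the threshold.
p_greater = 1 - NormalDist().cdf(z)

print(f"standard error = {standard_error}")
print(f"Z = {z}")
print(f"P(sample mean > {threshold}) = {p_greater:.4f}")
```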
The Student’s t-Distribution
• The Student’s t-distribution is used when the
population variance is not known or when the sample
size is small. Since the t-distribution depends on the
number of degrees of freedom (df), there are many
t-distributions (see Table 2.5). What is df?
• As the sample size gets very large, the Student’s t-distribution
approaches the standard normal distribution.
• Go through the examples given on pg. 67 to
understand how to read Table 2.5.
Table 2-5
Statistical Inference
• We usually draw a sample from a population
and then, by calculating
-- the sample mean and
-- the confidence interval around the sample mean,
we make inferences about the whole
population.
Interpret the confidence level using the figure on
pg. 69 and the example given on pg. 70.
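A confidence interval around a sample mean can be sketched as follows. This is not the textbook's example: the sample values are hypothetical, and the function assumes a large sample so that a normal critical value can be used. For small samples, a t critical value with the appropriate degrees of freedom (read from a table such as Table 2.5) would replace `z`.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample confidence interval for the population mean.

    Assumption: the sample is large enough for a normal critical value;
    for small samples a t critical value should be used instead.
    """
    n = len(sample)
    sample_mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-tailed critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample of 40 observations cycling through 48..52.
sample = [48 + (i % 5) for i in range(40)]

low, high = mean_confidence_interval(sample, confidence=0.95)
print(f"sample mean = {statistics.fmean(sample)}")
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```

A higher confidence level gives a larger critical value and therefore a wider interval for the same sample.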
Hypothesis Testing
Issues to deal with are:
1. Setting up the null and the alternative hypotheses
(one-tailed vs. two-tailed test; see pg. 70 and 71)
2. Choosing the confidence level
3. Determining the level of significance for one-tailed
and two-tailed tests
Probability of a Type I error = significance level = 1 − confidence level
Consider also the effect of sample size on decreasing Type I and Type
II errors (see the examples on pg. 73).
Table 2-6
Correlation
Correlation measures the degree of linear association
between X and Y (−1 ≤ r ≤ 1; see Fig. 2.7).
In this course, we will use the Pearson product-moment correlation (see pg. 75).
We can perform a hypothesis test to check for the
existence of a linear association between two
variables (see the examples on pg. 76).
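The Pearson product-moment correlation can be computed directly from its definition. The sketch below uses hypothetical data. The test statistic t = r√(n − 2)/√(1 − r²) is the commonly used form for testing whether the correlation is zero; it is assumed here, since the formula on the referenced page is not reproduced in this transcript.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_sq_x = sum((a - mean_x) ** 2 for a in x)
    sum_sq_y = sum((b - mean_y) ** 2 for b in y)
    return cross_products / math.sqrt(sum_sq_x * sum_sq_y)


def correlation_t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for H0: no linear association.

    Assumed standard form; undefined when r is exactly +1 or -1.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical paired observations (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
t = correlation_t_statistic(r, len(x))
print(f"r = {r:.2f}, t = {t:.3f} with {len(x) - 2} degrees of freedom")
```

The computed t value would then be compared with the critical value from a t table at the chosen significance level.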
Figure 2-7
Figure 2-7 (continued)
Correlograms
Correlograms help to measure the correlation between
successive observations over time.
The autocorrelation at lag k (rk) can be measured using the
formula on pg. 78.
If the time series is stationary, rk will approach zero
rapidly as k increases.
If there is a trend, rk will approach zero slowly.
If there is seasonality in data, the value of rk will be
significantly different from zero at k=4 for quarterly
data or k=12 for monthly data.
Correlograms
A k-period plot of autocorrelations is called an
autocorrelation function (ACF) or a
correlogram.
We can perform a hypothesis test to check
whether the autocorrelation at lag k is
significantly different from zero.
We reject the null hypothesis of zero autocorrelation if:
|rk| > 2/√n
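The lag-k autocorrelation and the 2/√n rule can be sketched in Python. The formula used here is the common definition of the sample autocorrelation (lagged cross-products of deviations from the mean, divided by the total sum of squared deviations); it is assumed to match the one on the referenced page, which is not reproduced in this transcript. The series is hypothetical.

```python
import math


def autocorrelation(series, k):
    """Sample autocorrelation of `series` at lag k (common textbook definition)."""
    n = len(series)
    if not 0 < k < n:
        raise ValueError("lag k must satisfy 0 < k < len(series)")
    mean = sum(series) / n
    total_sum_sq = sum((y - mean) ** 2 for y in series)
    lagged_products = sum(
        (series[t] - mean) * (series[t + k] - mean) for t in range(n - k)
    )
    return lagged_products / total_sum_sq


def is_significant(r_k, n):
    """Apply the rule of thumb: reject zero autocorrelation if |r_k| > 2/sqrt(n)."""
    return abs(r_k) > 2 / math.sqrt(n)


# Hypothetical series with a steady upward trend.
trending = list(range(1, 21))
n = len(trending)

# A correlogram is the set of r_k values for k = 1, 2, ...
correlogram = [autocorrelation(trending, k) for k in range(1, 7)]
for k, r_k in enumerate(correlogram, start=1):
    flag = "significant" if is_significant(r_k, n) else "not significant"
    print(f"lag {k}: r_k = {r_k:.3f} ({flag})")
```

For a trending series like this one, the autocorrelations start high and decline slowly, which is the pattern described for nonstationary data.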
Autocorrelation Structure of Real GDP
From Figure 2.8, it is clear that GDP has a fairly strong
positive trend (see hypothesis test results on pg. 79)
In order to use a forecasting method which requires
stationary data, we need to transform GDP data (see
Figure 2.9 on pg. 80) to a stationary series (see Figure
2.9 on pg. 81)
To transform the series, the first differences of the GDP data are
calculated as follows:
DGDPt = GDPt − GDPt-1
Fig. 2.10 shows the autocorrelation
structure of the DGDP data.
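First differencing is simple to express in code. The values below are hypothetical stand-ins rather than actual GDP figures, and the function name is illustrative.

```python
def first_difference(series):
    """Return the first differences: value at t minus value at t-1."""
    return [current - previous for previous, current in zip(series, series[1:])]


# Hypothetical trending series (not actual GDP data).
gdp = [100, 103, 107, 112, 118]

dgdp = first_difference(gdp)
print(dgdp)
```

The differenced series has one fewer observation than the original because the first period has no predecessor.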
Figure 2-8
Figure 2-9
Figure 2-10
Figure 2-11
Figure 2-11 (continued)
Figure 2-12
Figure 2-13
Figure 2-14
Table 2-4 (continued)
Table 2-5 (continued)
Table 2-7