ESTIMATING PASSENGER DEMANDS FROM TRUNCATED SAMPLES
Franklin S. Young, United Airlines
THE PROBLEM
The profitability of an airline is partly determined by its ability to put seats where they are needed the most, to match capacity to demand over its entire network.

The first hurdle in this challenge is to measure demand. The latter becomes, unfortunately, an elusive and unobservable quantity when there is a possibility of "stockouts." In the airline industry, stockouts occur simply because an airplane has a finite and known capacity. The relationship between sales (passenger loads) and demand can be viewed as:

Sales = Min (demand, capacity)

Our approach has been to infer demand from passenger loads, which are readily observable. The distribution of passenger loads is a truncated proxy of demand. In other words, we have to estimate the demand parameters from truncated samples.

BASIC ASSUMPTIONS
1. Normality
The frequency distribution of load data is believed by the industry to be, among others, truncated normal, log-normal, negative binomial, or Erlang distributed.

We have retained the normality hypothesis, for the normal distribution has been extensively studied, and this assumption is not contradicted by the data. For this confirmatory analysis, we tend to focus on flights which never reach capacity during a given month, to eliminate the distortion introduced by truncation. A common objection to this distributional assumption is that the range of a normal variate is between -∞ and +∞, whereas loads can vary only between 0 and the capacity of the airplane. In most instances, less than .001% of the distribution lies in the area truncated on the left (beyond 0). The right truncation cannot, however, be overlooked, since it typically represents a loss of 16%. Hence, the problem can be restated as the estimation of the mean and the standard deviation of a singly truncated normal distribution.

2. Independence
Daily loads are independently and identically distributed (i.i.d.). This assumption tends to break down during periods of heavy travel.

ESTIMATION METHODS
1. Criteria
Any estimation method must, of course, provide reasonably accurate results. In addition, it must meet two other criteria:
- It must be computationally cheap enough to estimate 1,600 demand parameters on a regular basis. With the acquisition of SAS, we have decided to perform this analysis for each flight leg instead of aircraft type.
- It must be usable with 20 to 30 observations. The inputs to this analysis are the daily loads by flight during a particular month.
We have implemented two methods and experimented with a third.

2. Graphical Approach
Prior to the acquisition of SAS, this estimation problem had been solved graphically. Its logic is quite straightforward. The frequency distribution of loads is plotted on normal probability paper. After "discounting" the outliers, a straight line is fitted freehand. Its slope is an estimate of the standard deviation of the underlying normal distribution. The mean is estimated by the abscissa of the 50% point.

To implement this method with SAS, we need first to generate the equivalent of the probability scale found on the graph paper. However, to plot with linear scales, instead of computing the probability associated with a given passenger load, we derive its normal score, Si, according to the Blom formula [1]:

$$S_i = \mathrm{PSI}\left[\frac{R_i - 3/8}{N + 1/4}\right]$$

where
PSI is the inverse normal CDF.
Ri is the rank of the i-th observation.
N is the number of observations for a given month and flight leg.
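As a rough illustration of the Blom formula, the Python sketch below computes normal scores for one flight leg's daily loads. Python is used here only for illustration; the paper's implementation is in SAS. The function name `blom_scores`, the sample loads, and the tie handling (ties ranked in order of appearance) are assumptions of this sketch, not details taken from the original program.

```python
from statistics import NormalDist


def blom_scores(loads):
    """Return the Blom normal score of each load, in the input order.

    score_i = PSI[(R_i - 3/8) / (N + 1/4)], with PSI the inverse normal CDF.
    Ties are ranked in order of appearance (an assumption of this sketch).
    """
    n = len(loads)
    psi = NormalDist().inv_cdf  # inverse CDF of the standard normal
    order = sorted(range(n), key=lambda i: loads[i])  # indices by ascending load
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = psi((rank - 0.375) / (n + 0.25))
    return scores


if __name__ == "__main__":
    daily_loads = [88, 95, 102, 79, 110, 97, 91]  # hypothetical loads
    for load, score in zip(daily_loads, blom_scores(daily_loads)):
        print(f"load={load:4d}  normal score={score:+.3f}")
```

Because the scores are on a linear scale, plotting loads against them plays the same role as plotting on normal probability paper.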
Then the passenger loads are plotted and regressed against their normal scores. The intercept of the regression is an estimate of the mean and its slope an estimate of the standard deviation.

The difficult part in this SAS implementation has been to identify and "discount" outliers automatically. From the probability plots, it appears that the data is well behaved when the normal scores are between +1 and -1. Hence, only points falling within that range are used in the regression step. The other cut-off points we have experimented with lead to estimates with greater downward bias. This subsetting step is also warranted from a regression point of view, since it limits the problem of heteroscedastic disturbances. A more rigorous treatment of the outlier problem is summarized in H. A. David [5].

In summary, the graphic method can be implemented with three procedures (RANK, PLOT, and SYSREG), and a few simple file manipulations. 1,600 flights can be processed in less than 300 CPU seconds on a 168.

3. Cohen's 3-Moment Method
A. C. Cohen [2] devised a clever method which relies on the first 3 sample moments. It leads to explicit formulae for the mean and the standard deviation. Computationally, it is 75% faster than the graphical method.

Because of the presence of the third moment, this technique is quite sensitive to outliers.

The estimators are given below:

$$m = x_0 + \frac{2 m_1 m_2 - m_3}{2 m_1^2 - m_2}$$

$$s^2 = \frac{m_1 m_3 - m_2^2}{2 m_1^2 - m_2}$$

where $x_0$ is the truncation point and $m_i$ is the i-th moment of the truncated sample about its truncation point.

4. Maximum Likelihood Method
Cohen [3,4] has also proposed a maximum likelihood (ML) solution to this problem. It involves solving a system of two non-linear equations and, hence, we have not yet implemented it with SAS. Alternatively, the estimation can be carried out through an auxiliary function tabulated by Cohen [3]. The following formulae are then used:

$$m^* = \bar{x} - T(G)\,(\bar{x} - x_0)$$

$$s^{2*} = s^2 + T(G)\,(\bar{x} - x_0)^2$$

where
$x_0$ is the truncation point.
$\bar{x}$ is the sample mean.
$s^2$ is the sample variance.
$G = s^2/(\bar{x} - x_0)^2$ and T is the auxiliary function.

A MONTE CARLO EVALUATION
The properties of these different estimators were investigated through a Monte Carlo experiment. We created 100 samples of 30 observations of a normal distribution, N(12,4). Demand was assumed to have a mean of 12 and a standard deviation of 4. Sales were generated from the demand data by keeping observations which fell between 0 and 14.

The mean and the standard deviation of the demand distribution were then estimated from the sales data using all 3 methods. These estimates, summarized by their mean and standard deviation, are reported in Table 1.

The ad hoc graphic method yields fairly accurate estimates when applied to the untruncated samples. On the truncated samples, the method, however, underestimates the true value by 20 to 25%, and the filtering of extreme values (outside ±1 "normal score") produces estimates which are only slightly better.

The 3-moment approach does quite well, but the standard deviations of its estimates are 2 to 4 times greater than those obtained through the graphic method. Interestingly, the graphic and the 3-moment estimates for the mean show a fair degree of correlation (R=.63), whereas the ones for the standard deviation are only loosely correlated (R=.19).

Based on those 100 samples, the ML method is not clearly superior. For example, if the 3-moment method gives estimates which are substantially higher than the true value, the ML approach would produce no different results. Moreover, the standard deviations of the ML estimates are also quite large and comparable to the 3-moment ones.

The sum of the squared deviations from the true values was also used to compare these three estimators. On this count, the graphical method out-performs all others. The deviations are, however, mostly negative. As noted before, this method underestimates both parameters.
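The experiment can be re-created in outline with the Python sketch below. It is an illustration under stated assumptions, not the original SAS program. It draws 30 demand values per replication and keeps those between 0 and 14, so each truncated sample has fewer than 30 points (one reading of the description). The graphical estimator ranks only the kept observations and regresses them on Blom scores within ±1. The three-moment estimator uses the explicit formulae for moments about the truncation point. The ML method is left out because it needs Cohen's tabulated auxiliary function. All function names and the seed are hypothetical, and the output should not be expected to match Table 1 exactly.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def graphical_estimates(sample, cutoff=1.0):
    """Regress loads on their Blom normal scores, keeping |score| <= cutoff.

    Returns (intercept, slope), read as estimates of (mean, std. deviation).
    """
    xs = sorted(sample)
    n = len(xs)
    psi = NormalDist().inv_cdf  # inverse standard normal CDF
    pairs = []
    for rank, x in enumerate(xs, start=1):
        score = psi((rank - 0.375) / (n + 0.25))
        if abs(score) <= cutoff:
            pairs.append((score, x))
    if len(pairs) < 2:
        return math.nan, math.nan
    s_bar = mean(s for s, _ in pairs)
    x_bar = mean(x for _, x in pairs)
    sxx = sum((s - s_bar) ** 2 for s, _ in pairs)
    sxy = sum((s - s_bar) * (x - x_bar) for s, x in pairs)
    slope = sxy / sxx
    return x_bar - slope * s_bar, slope


def three_moment_estimates(sample, x0):
    """Three-moment estimates of (mean, std. deviation), truncation point x0."""
    n = len(sample)
    m1 = sum(x - x0 for x in sample) / n
    m2 = sum((x - x0) ** 2 for x in sample) / n
    m3 = sum((x - x0) ** 3 for x in sample) / n
    denom = 2 * m1 ** 2 - m2
    if denom == 0:
        return math.nan, math.nan
    var = (m1 * m3 - m2 ** 2) / denom
    sd = math.sqrt(var) if var > 0 else math.nan  # small samples can give var <= 0
    return x0 + (2 * m1 * m2 - m3) / denom, sd


def run_experiment(reps=100, n=30, mu=12.0, sigma=4.0, low=0.0, cap=14.0, seed=1):
    """Simulate truncated sales samples and apply both estimators to each."""
    rng = random.Random(seed)
    results = {"graph": [], "three_moment": []}
    for _ in range(reps):
        demand = [rng.gauss(mu, sigma) for _ in range(n)]
        sales = [x for x in demand if low <= x <= cap]
        if len(sales) < 3:
            continue  # too few points to estimate anything
        results["graph"].append(graphical_estimates(sales))
        results["three_moment"].append(three_moment_estimates(sales, cap))
    return results


def summarize(estimates):
    """Mean and standard deviation of the finite (MU, SIGMA) estimates."""
    out = {}
    for j, name in enumerate(("MU", "SIGMA")):
        vals = [e[j] for e in estimates if math.isfinite(e[j])]
        out[name] = (mean(vals), stdev(vals))
    return out


if __name__ == "__main__":
    for method, estimates in run_experiment().items():
        print(method, summarize(estimates))
```

On a very large singly truncated sample the three-moment formulae should recover values close to the true parameters. The spread across replications of about 20 kept observations is what the experiment measures.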
In summary, none of these three methods is fully satisfactory. A candidate for further research is the iterative ML method proposed by Harter and Moore [6]. It requires starting estimates which could be supplied by the 3-moment method.

CONCLUSION
The problem of estimating demand from sales under situations of stockouts is of a general-interest nature. For example, it is encountered in retailing, utilities, etc. We hope to see further research in this area which may lead to a new SAS procedure.

REFERENCES
[1] Blom, G., "Statistical Estimates and Transformed Beta Variables," Wiley, 1958.
[2] Cohen, A.C., "On Estimating the Mean and Variance of Singly Truncated Normal Frequency Distributions from the First Three Sample Moments," Annals of the Institute of Statistical Mathematics, 1950, pp. 37-44.
[3] Cohen, A.C., "Simplified Estimators for the Normal Distribution When Samples are Singly Censored or Truncated," Technometrics, 1959, pp. 217-237.
[4] Cohen, A.C., "Tables for Maximum Likelihood Estimates: Singly Truncated and Singly Censored Samples," Technometrics, 1961, pp. 535-541.
[5] David, H.A., "Order Statistics," Wiley, 1970.
[6] Harter and Moore, "Iterative ML Estimation of the Parameters of Normal Populations from Singly and Doubly Censored Samples," Biometrika, 1966, Vol. 53, pp. 205-213.

Table 1
MONTE CARLO EXPERIMENT
COMPARISON OF 3 ESTIMATION METHODS
100 Replications

Method               Variable   Mean      Standard Deviation
Graph +-1            MU         10.189    0.718
                     SIGMA       2.917    0.640
3-Moment             MU         11.720    3.519
                     SIGMA       2.731    1.167
Maximum Likelihood   MU         13.002    4.474
                     SIGMA       4.073    1.533

The true parameters are 12 and 4 for the MU and SIGMA, respectively.
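The comparison by squared deviations can be roughly checked from Table 1 alone, since the mean squared deviation from the true value is approximately the squared bias plus the variance of the estimates. The Python sketch below does that arithmetic. It assumes that the first number reported for each variable is the mean of the 100 estimates and the second their standard deviation. The decomposition is only approximate if the standard deviations were computed with an n-1 divisor. The names `TABLE_1`, `mean_squared_deviation`, and `compare` are hypothetical.

```python
TRUE = {"MU": 12.0, "SIGMA": 4.0}

# (mean, standard deviation) of the 100 estimates, as read from Table 1
TABLE_1 = {
    "Graph +-1": {"MU": (10.189, 0.718), "SIGMA": (2.917, 0.640)},
    "3-Moment": {"MU": (11.720, 3.519), "SIGMA": (2.731, 1.167)},
    "Maximum Likelihood": {"MU": (13.002, 4.474), "SIGMA": (4.073, 1.533)},
}


def mean_squared_deviation(mean_est, sd_est, true_value):
    """Approximate mean squared deviation as bias^2 + variance."""
    return (mean_est - true_value) ** 2 + sd_est ** 2


def compare(table=TABLE_1, true=TRUE):
    """Return {method: {parameter: approximate mean squared deviation}}."""
    return {
        method: {p: mean_squared_deviation(*row[p], true[p]) for p in true}
        for method, row in table.items()
    }


if __name__ == "__main__":
    for method, row in compare().items():
        print(f"{method:20s} MU: {row['MU']:7.3f}  SIGMA: {row['SIGMA']:6.3f}")
```

Under these assumptions the graphical method has the smallest value for both parameters: its bias is larger, but its variance is much smaller.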
MONTE CARLO EXPERIMENT
COMPARISON OF 3 ESTIMATION METHODS
100 Replications
Frequency Bar Chart
[Figure: FREQUENCY of the MU estimates by MU MIDPOINT, one panel per METHOD (GRAPH+-1, 3 MOMENT, MAX LHD)]
MONTE CARLO EXPERIMENT
COMPARISON OF 3 ESTIMATION METHODS
100 Replications
Frequency Bar Chart
[Figure: FREQUENCY of the SIGMA estimates by SIGMA MIDPOINT, one panel per METHOD (GRAPH+-1, 3 MOMENT, MAX LHD)]