ESTIMATING PASSENGER DEMANDS FROM TRUNCATED SAMPLES

Franklin S. Young, United Airlines

THE PROBLEM

The profitability of an airline is partly determined by its ability to put seats where they are needed the most, that is, to match capacity to demand over its entire network.

The first hurdle in this challenge is to measure demand. Unfortunately, demand becomes an elusive and unobservable quantity when there is a possibility of "stockouts." In the airline industry, stockouts occur simply because an airplane has a finite and known capacity. The relationship between sales (passenger loads) and demand can be viewed as:

Sales = Min (demand, capacity)

Our approach has been to infer demand from passenger loads, which are readily observable. The distribution of passenger loads is a truncated proxy of demand. In other words, we have to estimate the demand parameters from truncated samples.

BASIC ASSUMPTIONS

1. Normality

The frequency distribution of load data is believed by the industry to be, among others, truncated normal, log-normal, negative binomial, or Erlang distributed.

We have retained the normality hypothesis, for the normal distribution has been extensively studied, and this assumption is not contradicted by the data. For this confirmatory analysis, we tend to focus on flights which never reach capacity during a given month, to eliminate the distortion introduced by truncation.

A common objection to this distributional assumption is that the range of a normal variate is between -∞ and +∞, whereas loads can vary only between 0 and the capacity of the airplane. In most instances, less than .001% of the distribution lies in the area truncated on the left (beyond 0). The right truncation cannot, however, be overlooked, since it typically represents a loss of 16%. Hence, the problem can be restated as the estimation of the mean and the standard deviation of a singly truncated normal distribution.

2. Independence

Daily loads are independently and identically distributed (i.i.d.). This assumption tends to break down during periods of heavy travel.

ESTIMATION METHODS

1. Criteria

Any estimation method must, of course, provide reasonably accurate results. In addition, it must meet two other criteria:

- It must be computationally cheap enough to estimate 1,600 demand parameters on a regular basis. With the acquisition of SAS, we have decided to perform this analysis for each flight leg instead of aircraft type.

- It must be usable with 20 to 30 observations. The inputs to this analysis are the daily loads by flight during a particular month.

We have implemented two methods and experimented with a third.

2. Graphical Approach

Prior to the acquisition of SAS, this estimation problem had been solved graphically. Its logic is quite straightforward. The frequency distribution of loads is plotted on normal probability paper. After "discounting" the outliers, a straight line is fitted freehand. Its slope is an estimate of the standard deviation of the underlying normal distribution. The mean is estimated by the abscissa of the 50% point.

To implement this method with SAS, we need first to generate the equivalent of the probability scale found on the graph paper. However, to plot with linear scales, we do not compute the probability associated with a given passenger load; instead we derive its normal score, S_i, according to the Blom formula [1]:

$$S_i = \mathrm{PSI}\left[\frac{R_i - 3/8}{N + 1/4}\right]$$

where

PSI is the inverse normal CDF.
R_i is the rank of the i-th observation.
N is the number of observations for a given month and flight leg.

Then the passenger loads are plotted and regressed against their normal scores. The intercept of the regression is an estimate of the mean and its slope an estimate of the standard deviation.

The difficult part in this SAS implementation has been to identify and "discount" outliers automatically. From the probability plots, it appears that the data is well behaved when the normal scores are between +1 and -1. Hence, only points falling within that range are used in the regression step. The other cut-off points we have experimented with lead to estimates with greater downward bias. This subsetting step is also warranted from a regression point of view, since it limits the problem of heteroscedastic disturbances. A more rigorous treatment of the outlier problem is summarized in H. A. David [5].

In summary, the graphic method can be implemented with three procedures (RANK, PLOT, and SYSREG) and a few simple file manipulations. 1,600 flights can be processed in less than 300 CPU seconds on a 168.

3. Cohen's 3-Moment Method

A. C. Cohen [2] devised a clever method which relies on the first 3 sample moments. It leads to explicit formulae for the mean and the standard deviation. Computationally, it is 75% faster than the graphical method.

Because of the presence of the third moment, this technique is quite sensitive to outliers.

The estimators are given below:

$$m = x_0 + \frac{2 m_1 m_2 - m_3}{2 m_1^2 - m_2}, \qquad \hat{\sigma}^2 = \frac{m_1 m_3 - m_2^2}{2 m_1^2 - m_2}$$

where

x_0 is the truncation point.
m_i is the i-th moment of the truncated sample about its truncation point.
m is the estimated mean and \hat{\sigma}^2 the estimated variance.

4. Maximum Likelihood Method

Cohen [3,4] has also proposed a maximum likelihood (ML) solution to this problem. It involves solving a system of two non-linear equations and, hence, we have not yet implemented it with SAS. Alternatively, the estimation can be carried out through an auxiliary function tabulated by Cohen [3]. The following formulae are then used:

$$m^* = \bar{x} - T(G)\,(\bar{x} - x_0), \qquad s^{2*} = s^2 + T(G)\,(\bar{x} - x_0)^2$$

where

x_0 is the truncation point.
\bar{x} is the sample mean.
s^2 is the sample variance.
G = s^2 / (\bar{x} - x_0)^2, and T is the auxiliary function.

A MONTE CARLO EVALUATION

The properties of these different estimators were investigated through a Monte Carlo experiment. We created 100 samples of 30 observations of a normal distribution, N(12,4). Demand was assumed to have a mean of 12 and a standard deviation of 4. Sales were generated from the demand data by keeping observations which fell between 0 and 14.

The mean and the standard deviation of the demand distribution were then estimated from the sales data using all 3 methods. These estimates, summarized by their mean and standard deviation, are reported in Table 1.

The ad hoc graphic method yields fairly accurate estimates when applied to the untruncated samples. On the truncated samples, the method, however, underestimates the true value by 20 to 25%, and the filtering of extreme values (outside ±1 "normal score") produces estimates which are only slightly better.

The 3-moment approach does quite well, but the standard deviations of its estimates are 2 to 4 times greater than those obtained through the graphic method. Interestingly, the graphic and the 3-moment estimates for the mean show a fair degree of correlation (R=.63), whereas the ones for the standard deviation are only loosely correlated (R=.19).

Based on those 100 samples, the ML method is not clearly superior. For example, if the 3-moment method gives estimates which are substantially higher than the true value, the ML approach would produce no different results. Moreover, the standard deviations of the ML estimates are also quite large and comparable to the 3-moment ones.

The sum of the squared deviations from the true values was also used to compare these three estimators. On this count, the graphical method outperforms all others. The deviations are, however, mostly negative. As noted before, this method underestimates both parameters.

In summary, none of these three methods is fully satisfactory. A candidate for further research is the iterative ML method proposed by Harter and Moore [6]. It requires starting estimates which could be supplied by the 3-moment method.

Table 1
MONTE CARLO EXPERIMENT: COMPARISON OF 3 ESTIMATION METHODS
100 Replications

Method               Variable    Mean      Standard Deviation
Graph ±1             MU          10.189    0.718
                     SIGMA        2.917    0.640
3-Moment             MU          11.720    3.519
                     SIGMA        2.731    1.167
Maximum Likelihood   MU          13.002    4.474
                     SIGMA        4.073    1.533

The true parameters are 12 and 4 for MU and SIGMA, respectively.

CONCLUSION

The problem of estimating demand from sales under situations of stockouts is of general interest. For example, it is encountered in retailing, utilities, etc. We hope to see further research in this area which may lead to a new SAS procedure.

REFERENCES

[1] Blom, G., "Statistical Estimates and Transformed Beta Variables," Wiley, 1958.

[2] Cohen, A. C., "On Estimating the Mean and Variance of Singly Truncated Normal Frequency Distributions from the First Three Sample Moments," Annals of the Institute of Statistical Mathematics, 1950, pp. 37-44.

[3] Cohen, A. C., "Simplified Estimators for the Normal Distribution When Samples are Singly Censored or Truncated," Technometrics, 1959, pp. 217-237.

[4] Cohen, A. C., "Tables for Maximum Likelihood Estimates: Singly Truncated and Singly Censored Samples," Technometrics, 1961, pp. 535-541.

[5] David, H. A., "Order Statistics," Wiley, 1970.

[6] Harter and Moore, "Iterative ML Estimation of the Parameters of Normal Populations from Singly and Doubly Censored Samples," Biometrika, 1966, Vol. 53, pp. 205-213.
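The graphical approach described under ESTIMATION METHODS (Blom normal scores, then a regression restricted to scores between -1 and +1) can be sketched in a few lines. This is only an illustration in Python, not the SAS implementation used in the paper: the function names `blom_scores` and `graphical_estimate` are hypothetical, tied loads are given consecutive ranks (an assumption, since the paper does not say how ties were ranked), and ordinary least squares is written out by hand.

```python
from statistics import NormalDist


def blom_scores(loads):
    """Normal score of each load: PSI[(R_i - 3/8) / (N + 1/4)] (Blom formula)."""
    n = len(loads)
    order = sorted(range(n), key=lambda i: loads[i])
    scores = [0.0] * n
    # Assumption: ties simply receive consecutive ranks.
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf((rank - 0.375) / (n + 0.25))
    return scores


def graphical_estimate(loads, cutoff=1.0):
    """Regress loads on their normal scores, keeping only |score| <= cutoff.

    Returns (intercept, slope), read as (mean, standard deviation) estimates.
    """
    pairs = [(s, x) for s, x in zip(blom_scores(loads), loads) if abs(s) <= cutoff]
    k = len(pairs)
    s_bar = sum(s for s, _ in pairs) / k
    x_bar = sum(x for _, x in pairs) / k
    sxx = sum((s - s_bar) ** 2 for s, _ in pairs)
    sxy = sum((s - s_bar) * (x - x_bar) for s, x in pairs)
    slope = sxy / sxx
    intercept = x_bar - slope * s_bar
    return intercept, slope


if __name__ == "__main__":
    # Made-up daily loads for one flight leg over a month (illustrative only).
    daily_loads = [88, 95, 102, 110, 97, 91, 105, 99, 93, 108,
                   100, 96, 89, 104, 98, 101, 94, 107, 92, 103]
    mean_est, sd_est = graphical_estimate(daily_loads)
    print("mean estimate:", mean_est, "sd estimate:", sd_est)
```

The `cutoff` argument corresponds to the ±1 normal-score window; other cut-off points can be tried by changing it.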
[Figure: Monte Carlo experiment, comparison of 3 estimation methods, 100 replications. Frequency bar chart of the MU estimates by midpoint, with one panel per method: Graph ±1, 3-Moment, Maximum Likelihood.]

[Figure: Monte Carlo experiment, comparison of 3 estimation methods, 100 replications. Frequency bar chart of the SIGMA estimates by midpoint, with one panel per method: Graph ±1, 3-Moment, Maximum Likelihood.]
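The Monte Carlo design described in the paper (100 samples of 30 draws from a normal distribution with mean 12 and standard deviation 4, keeping values between 0 and 14) can be sketched for the 3-moment estimator as follows. This is a rough Python illustration, not the original SAS code: `three_moment` and `run_experiment` are hypothetical names, the random seed is arbitrary, and dropping replications where the formula breaks down (zero denominator or non-positive variance) is an assumption, since the paper does not say how such cases were handled. The numbers it prints will not reproduce Table 1 exactly.

```python
import random
from statistics import mean, stdev


def three_moment(sample, x0):
    """Cohen-style 3-moment estimates (mean, sd) for a sample truncated at x0.

    m1, m2, m3 are the sample moments about the truncation point x0.
    Returns None when the formulae cannot be evaluated (assumed handling).
    """
    n = len(sample)
    if n < 3:
        return None
    m1 = sum(x - x0 for x in sample) / n
    m2 = sum((x - x0) ** 2 for x in sample) / n
    m3 = sum((x - x0) ** 3 for x in sample) / n
    denom = 2 * m1 ** 2 - m2
    if denom == 0:
        return None
    mu = x0 + (2 * m1 * m2 - m3) / denom
    var = (m1 * m3 - m2 ** 2) / denom
    if var <= 0:
        return None
    return mu, var ** 0.5


def run_experiment(replications=100, n=30, mu=12.0, sigma=4.0,
                   lo=0.0, hi=14.0, seed=1):
    """Generate demand, truncate it to sales in [lo, hi], estimate from sales."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        demand = [rng.gauss(mu, sigma) for _ in range(n)]
        sales = [d for d in demand if lo <= d <= hi]
        # Only the upper bound is treated as the truncation point; the lower
        # bound at 0 removes a negligible share of a N(12, 4) population.
        est = three_moment(sales, hi)
        if est is not None:
            estimates.append(est)
    return estimates


if __name__ == "__main__":
    results = run_experiment()
    mus = [m for m, _ in results]
    sds = [s for _, s in results]
    print(len(results), "usable replications")
    print("MU    mean / sd of estimates:", mean(mus), stdev(mus))
    print("SIGMA mean / sd of estimates:", mean(sds), stdev(sds))
```

With only about 20 retained observations per sample, the denominator 2*m1^2 - m2 is often close to zero, which is one way to see why the 3-moment estimates are so widely dispersed.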