Economics of the Firm
Consumer Demand Analysis

Today's Plan:
•Motivation
•Refresher on Probability and Statistics
•Refresher on Regression Analysis
•Example of Regression Analysis
•Cross Section Estimation
•Forecasting
•Questions

A demand curve tells us a lot about the customers that we face. For example, suppose the demand for movie tickets is

$$Q_M = 200 - 4P_M + 6I + 2P_{DVD}$$

where Q_M is movie tickets sold (in thousands), P_M is the price of a movie ticket, I is average income (in thousands), and P_DVD is the price of a DVD (a substitute for the movies).
•Every $1 increase in the price of a movie ticket lowers sales by 4,000 tickets.
•Every $1,000 increase in average income raises sales by 6,000 tickets.
•Every $1 increase in the price of a DVD raises sales by 2,000 tickets.

We can use a demand curve to forecast sales and revenues. With P_M = $8, I = $48,000, and P_DVD = $12:

$$Q_M = 200 - 4(8) + 6(48) + 2(12) = 480$$
$$P_M Q_M = \$8 \times 480 = \$3{,}840 \text{ (thousand)}$$

Demand curves slope downwards: this reflects the negative relationship between price and quantity. The elasticity of demand measures this effect quantitatively. At the same values (P_M = $8, Q_M = 480):

$$\varepsilon_P = \frac{\%\Delta Q}{\%\Delta P} = \frac{\Delta Q}{\Delta P}\cdot\frac{P}{Q} = -4\cdot\frac{8}{480} \approx -0.067$$

A 1% rise in price lowers sales by .067%.

For any fixed price, demand (typically) responds positively to increases in income.
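The forecast and elasticity arithmetic above can be checked with a short Python sketch. The function names `movie_demand` and `point_elasticity` are illustrative, not from the original; income is passed in thousands of dollars to match the units of the demand curve.

```python
def movie_demand(p_movie, income, p_dvd):
    """Tickets sold (thousands) from Q_M = 200 - 4 P_M + 6 I + 2 P_DVD.

    income is average income in thousands of dollars.
    """
    return 200 - 4 * p_movie + 6 * income + 2 * p_dvd

def point_elasticity(slope, x, q):
    """Point elasticity (dQ/dX) * (X / Q) for a linear demand curve."""
    return slope * x / q

p_movie, income, p_dvd = 8, 48, 12
q = movie_demand(p_movie, income, p_dvd)             # thousands of tickets
revenue = p_movie * q                                 # thousands of dollars
price_elasticity = point_elasticity(-4, p_movie, q)  # slope on P_M is -4

print(q, revenue, round(price_elasticity, 3))  # 480 3840 -0.067
```

The same `point_elasticity` helper gives the income elasticity (slope 6 at I = 48) and the cross-price elasticity (slope 2 at P_DVD = 12) discussed next.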
Income elasticity measures this effect quantitatively:

$$\varepsilon_I = \frac{\%\Delta Q}{\%\Delta I} = \frac{\Delta Q}{\Delta I}\cdot\frac{I}{Q} = 6\cdot\frac{48}{480} = 0.6$$

A 1% rise in average income raises sales by .6% (the demand curve shifts out from D to D').

Cross-price elasticity refers to the impact on demand of another price changing:

$$\varepsilon_{DVD} = \frac{\%\Delta Q}{\%\Delta P_{DVD}} = \frac{\Delta Q}{\Delta P_{DVD}}\cdot\frac{P_{DVD}}{Q} = 2\cdot\frac{12}{480} = 0.05$$

A 1% rise in DVD prices raises sales by .05%. A positive cross-price elasticity indicates a substitute, while a negative cross-price elasticity indicates a complement.

Application: at the revenue-maximizing price, the price elasticity should be -1. Holding I = $48,000 and P_DVD = $12 fixed, the demand curve becomes Q_M = 512 - 4P_M, so

$$\varepsilon_P = -4\cdot\frac{P_M}{512 - 4P_M} = -1 \;\Rightarrow\; P_M = \$64,\qquad Q_M = 256,\qquad P_M Q_M = \$16{,}384 \text{ (thousand)}$$

We can then re-calculate the elasticities at Q_M = 256:

$$\varepsilon_P = -4\cdot\frac{64}{256} = -1,\qquad \varepsilon_I = 6\cdot\frac{48}{256} = 1.125,\qquad \varepsilon_{DVD} = 2\cdot\frac{12}{256} \approx .09$$

A 1% rise in price lowers sales by 1%; a 1% rise in average income raises sales by 1.125%; a 1% rise in DVD prices raises sales by .09%.

Suppose that average income rose by 8%. By how much could you raise price without losing any sales?

$$\%\Delta Q_M = \varepsilon_P\,\%\Delta P_M + \varepsilon_I\,\%\Delta I + \varepsilon_{DVD}\,\%\Delta P_{DVD} = -\%\Delta P_M + 1.125(8) + .09(0) = 0$$
$$\%\Delta P_M = 9$$

Price can rise by 9% (from $64 to $69.76) with no loss in sales; plugging I = 51.84 back into the demand curve confirms that Q_M = 256 at P_M = $69.76. All we need is a demand curve!

What are the odds that a fair coin flip results in a head? What are the odds that the toss of a fair die results in a 5? What are the odds that tomorrow's temperature is 95 degrees? The answers to all these questions come from a probability distribution: for a fair coin, Pr(Head) = Pr(Tail) = 1/2; for a fair die, each face from 1 to 6 has probability 1/6.

A probability distribution is a collection of probabilities describing the odds of any particular event. We generally assume a normal distribution, which can be characterized by a mean (average) and a standard deviation (a measure of dispersion). Roughly 34% of the probability lies between the mean and one standard deviation on each side, 13.5% lies between one and two standard deviations on each side, and 2.5% lies beyond two standard deviations in each tail.

Annual temperature in South Bend has a mean of 59 degrees and a standard deviation of 18 degrees.
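Instead of the rule-of-thumb percentages, the tail probability can be computed directly from the normal CDF. This sketch assumes Python 3.8+ for `statistics.NormalDist`; the helper name `prob_at_least` is ours. It also covers the July distribution (mean 84, standard deviation 7) used just below.

```python
from statistics import NormalDist

def prob_at_least(threshold, mean, sd):
    """P(X >= threshold) when X ~ Normal(mean, sd**2)."""
    return 1 - NormalDist(mu=mean, sigma=sd).cdf(threshold)

p_annual = prob_at_least(95, mean=59, sd=18)  # unconditional distribution
p_july = prob_at_least(95, mean=84, sd=7)     # conditional on the month being July

print(f"P(T >= 95): all year {p_annual:.3f}, July {p_july:.3f}")
```

The exact annual figure is about 2.3% (the 2.5% rule of thumb rounds the two-standard-deviation tail), and conditioning on July raises it to roughly 6%.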
95 degrees is 2 standard deviations to the right of the mean (59 + 2 × 18 = 95), so there is about a 2.5% chance that the temperature is 95 or greater (a 97.5% chance it is cooler than 95). One and two standard deviations on either side of the mean fall at 23, 41, 59, 77, and 95 degrees.

Can't we do a little better than this? Conditional distributions give us probabilities conditional on some observable information. The temperature in South Bend conditional on the month of July has a mean of 84 with a standard deviation of 7, so one and two standard deviations fall at 70, 77, 84, 91, and 98 degrees. Now 95 degrees lies about 1.6 standard deviations above the mean ((95 - 84)/7 ≈ 1.57), so the chance that the temperature is 95 or greater is less than 16% (about 6% using the normal distribution). Conditioning on the month gives us more accurate probabilities!

We know that there should be a "true" probability distribution that governs the outcome of a coin toss (assuming a fair coin): Pr(Heads) = Pr(Tails) = .5. Suppose that we were to flip a coin over and over again and, after each flip, calculate the percentage of heads and tails:

$$\underbrace{\frac{\#\text{ of Heads}}{\text{Total Flips}}}_{\text{sample statistic}} \longrightarrow \underbrace{.5}_{\text{true probability}}$$

That is, if we collect "enough" data, we can eventually learn the truth!

We can follow the same process for the temperature in South Bend, Temperature ~ N(μ, σ²). We could find this distribution by collecting temperature data for South Bend and computing

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \quad \text{(sample mean)},\qquad s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2 \quad \text{(sample variance)}$$

Note: the standard deviation is the square root of the variance.

Suppose we know that the value of a car is determined by its age:

$$\text{Value} = \$20{,}000 - \$1{,}000\,(\text{Age})$$

If car age has a mean of 8 and a standard deviation of 2, then value has a mean of $12,000 and a standard deviation of $2,000. We could also use this to forecast. How much should a six-year-old car be worth?

$$\text{Value} = \$20{,}000 - \$1{,}000\,(6) = \$14{,}000$$

Note: there is NO uncertainty in this prediction.
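The claim that sample statistics converge to the true values can be illustrated with a small simulation. This is a sketch: the seed and flip counts are arbitrary, and the five temperature readings are hypothetical numbers chosen only to exercise the sample mean and variance formulas above (note the N - 1 divisor in the variance).

```python
import math
import random

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with the N - 1 divisor."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def share_of_heads(n_flips, seed=0):
    """Simulate n_flips fair coin tosses; return the fraction that came up heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (10, 100, 10_000, 100_000):
    print(n, share_of_heads(n))  # drifts toward the true probability .5

temps = [61, 48, 75, 59, 52]  # hypothetical readings, for illustration only
print(sample_mean(temps), math.sqrt(sample_variance(temps)))
```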
Searching for the truth: a linear regression.

[Figure: car value ($0 to $18,000) plotted against age (0 to 14 years), with a fitted line of intercept a and slope b; the vertical gaps between the points and the line are the errors.]

$$\text{Value} = a + b\,(\text{Age}) + \text{error},\qquad \text{error} \sim N(0, \sigma^2)$$

We want to choose a and b to minimize the (squared) errors!

Regression Results
Variable   Coefficient  Standard Error  t Stat
Intercept  12,354       653             18.9
Age        -854         80              -10.60

We have our estimate of "the truth":

$$\text{Value} = \$12{,}354 - \$854\,(\text{Age}) + \text{error}$$

The intercept estimate (a) has a mean of $12,354 and a standard deviation of $653; the age estimate (b) has a mean of -$854 and a standard deviation of $80. T-stats bigger than 2 in absolute value are considered statistically significant (roughly, at the 5% level)!

We also have some statistics about the error term:

Regression Statistics
R Squared       0.36   (36% of the variance in value is explained by age)
Standard Error  2,250  (the error term has mean 0 and standard deviation $2,250)

We can now forecast the value of a 6-year-old car:

$$\text{Value} = \$12{,}354 - \$854\,(6) = \$7{,}230$$

(Recall, the average car age is 8 years.) Because a, b, and the error term are all uncertain, the forecast has a standard deviation:

$$\text{StdDev} = \sqrt{\mathrm{Var}(a) + X^2\,\mathrm{Var}(b) - 2X\bar{X}\,\mathrm{Var}(b) + \mathrm{Var}(\text{error})}$$
$$= \sqrt{653^2 + 6^2(80^2) - 2(6)(8)(80^2) + 2250^2} \approx \$2{,}259$$

where X = 6 is the age we are forecasting for and X̄ = 8 is the sample mean age. (The middle term is 2X·Cov(a, b), since in a regression with an intercept Cov(a, b) = -X̄·Var(b).)

[Figure: forecast value against age with +95% and -95% forecast-interval bands, narrowest at the sample mean age X̄ = 8.]

Note that your forecast error will always be smallest at the sample mean! Also, your forecast gets worse at an increasing rate as you depart from the mean.

An applied example: what are the odds that Pat Buchanan received 3,407 votes from Palm Beach County in 2000?
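The forecast standard deviation can be reproduced from the reported regression output. This is a minimal sketch assuming the formula above, with Cov(a, b) = -X̄·Var(b); `forecast_with_sd` is an illustrative name, not a library function.

```python
import math

def forecast_with_sd(a, b, se_a, se_b, se_error, x, x_bar):
    """Point forecast a + b*x and its standard deviation for a simple regression.

    Var = Var(a) + x^2 Var(b) + 2x Cov(a, b) + Var(error),
    using the OLS result Cov(a, b) = -x_bar * Var(b).
    """
    mean = a + b * x
    cov_ab = -x_bar * se_b ** 2
    var = se_a ** 2 + x ** 2 * se_b ** 2 + 2 * x * cov_ab + se_error ** 2
    return mean, math.sqrt(var)

value, sd = forecast_with_sd(a=12_354, b=-854, se_a=653, se_b=80,
                             se_error=2_250, x=6, x_bar=8)
print(value, round(sd))  # 7230 2259
```

Calling it with x = 8 (the sample mean age) returns a smaller standard deviation than x = 6 or x = 10, which is the "smallest at the mean" property.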
The Strategy: estimate a relationship for Pat Buchanan's votes using every county EXCEPT Palm Beach,

$$B = F(D)$$

that is, Pat Buchanan's votes "are a function of" observable demographics. Then, using the Palm Beach data, forecast Pat Buchanan's vote total for Palm Beach:

$$B_{PB} = F(D_{PB})$$

The Data: Demographic Data by County
County   Black (%)  Age 65 (%)  Hispanic (%)  College (%)  Income (000s)  Buchanan Votes  Total Votes
Alachua  21.8       9.4         4.7           34.6         26.5           262             84,966
Baker    16.8       7.7         1.5           5.7          27.6           73              8,128

What variables do you think should affect Pat Buchanan's vote total? Start with college education:

$$V = a + bC$$

where V is the number of Buchanan votes, C is the percentage of the county that is college educated, and b is the number of votes gained or lost for each percentage-point increase in the college-educated population.

Results
Parameter  Value  Standard Error  T-Statistic
a          5.35   58.5            .09
b          14.95  3.84            3.89

R-Square = .19: 19% of the variation in Buchanan's votes across counties is explained by college education.

The distribution for b has a mean of 15 and a standard deviation of 4. Each percentage-point increase in college educated (e.g., from 10% to 11%) raises Buchanan's vote total by 15. There is a 95% chance that the value for b lies between 7 and 23.

Plug in values for college % to get vote predictions from V = 5.35 + 14.95C:
County   College (%)  Predicted Votes  Actual Votes  Error (Predicted - Actual)
Alachua  34.6         522              262           260
Baker    5.7          90               73            17

Let's try something a little different, using the log of Buchanan votes:
County   College (%)  Buchanan Votes  Log of Buchanan Votes
Alachua  34.6         262             5.57
Baker    5.7          73              4.29

$$\ln V = a + bC$$

Now b is the proportional increase or decrease in votes for each percentage-point increase in the college-educated population.

Results
Parameter  Value  Standard Error  T-Statistic
a          3.45   .27             12.6
b          .09    .02             5.4

R-Square = .31: 31% of the variation in Buchanan's votes across counties is explained by college education.

The distribution for b has a mean of .09 and a standard deviation of .02. Each percentage-point increase in college educated (e.g., from 10% to 11%) raises Buchanan's vote total by about 9% (.09 × 100). There is a 95% chance that the value for b lies between .05 and .13.

Plug in values for college % to get vote predictions from ln V = 3.45 + .09C, converting with V = e^{ln V}:
County   College (%)  Predicted Votes  Actual Votes  Error (Predicted - Actual)
Alachua  34.6         902              262           640
Baker    5.7          55               73            -18

How about this, using the log of college %:
County   College (%)  Buchanan Votes  Log of College (%)
Alachua  34.6         262             3.54
Baker    5.7          73              1.74

$$V = a + b\ln C$$

Now b is the gain or loss in votes per unit change in ln C, so each 1% increase in the college-educated population changes votes by about b/100.

Results
Parameter  Value  Standard Error  T-Statistic
a          -424   139             -3.05
b          252    54              4.6

R-Square = .25: 25% of the variation in Buchanan's votes across counties is explained by college education.

The distribution for b has a mean of 252 and a standard deviation of 54. Each 1% increase in college educated (e.g., from 30% to 30.3%) raises Buchanan's vote total by about 2.5 votes (252 × .01). There is a 95% chance that the value for b lies between 144 and 360.

Plug in values for college % to get vote predictions from V = -424 + 252 ln C:
County   College (%)  Predicted Votes  Actual Votes  Error (Predicted - Actual)
Alachua  34.6         469              262           207
Baker    5.7          15               73            -58

One more, using logs on both sides:
County   College (%)  Buchanan Votes  Log of College (%)  Log of Buchanan Votes
Alachua  34.6         262             3.54                5.57
Baker    5.7          73              1.74                4.29

$$\ln V = a + b\ln C$$

Now b is the percentage gain or loss in votes for each 1% increase in the college-educated population.

Results
Parameter  Value  Standard Error  T-Statistic
a          .71    .63             1.13
b          1.61   .24             6.53

R-Square = .40: 40% of the variation in Buchanan's votes across counties is explained by college education.

The distribution for b has a mean of 1.61 and a standard deviation of .24. Each 1% increase in college educated (e.g., from 30% to 30.3%) raises Buchanan's vote total by 1.61%. There is a 95% chance that the value for b lies between 1.13 and 2.09.

Plug in values for college % to get vote predictions from ln V = .71 + 1.61 ln C, converting with V = e^{ln V}:
County   College (%)  Predicted Votes  Actual Votes  Error (Predicted - Actual)
Alachua  34.6         624              262           362
Baker    5.7          34               73            -39

It turns out the regression with the best fit (using the same county data) looks like this:

$$\ln P = a_0 + a_1 B + a_2 A_{65} + a_3 H + a_4 C + a_5 I + \text{error},\qquad P = \frac{\text{Buchanan Votes}}{\text{Total Votes}}\times 100$$

where B, A_65, H, and C are the Black, age 65+, Hispanic, and college-educated shares of the county (%), I is income (thousands), and a_0 through a_5 are the parameters to be estimated.

The Results:
Variable       Coefficient  Standard Error  t-statistic
Intercept      2.146        .396            5.48
Black (%)      -.0132       .0057           -2.88
Age 65 (%)     -.0415       .0057           -5.93
Hispanic (%)   -.0349       .0050           -6.08
College (%)    -.0193       .0068           -1.99
Income (000s)  -.0658       .00113          -4.58
R Squared = .73

$$\ln P = 2.146 - .0132B - .0415A_{65} - .0349H - .0193C - .0658I$$

Now, we can make a forecast!
County   Predicted Votes  Actual Votes  Error (Predicted - Actual)
Alachua  520              262           258
Baker    55               73            -18

For Palm Beach:
County      Black (%)  Age 65 (%)  Hispanic (%)  College (%)  Income (000s)  Buchanan Votes  Total Votes
Palm Beach  21.8       23.6        9.8           22.1         33.5           3,407           431,621

Plugging the Palm Beach values into the estimated equation gives

$$\ln P = -2.004,\qquad P = e^{-2.004} \approx .134\%,\qquad .00134 \times 431{,}621 \approx 578$$

This would be our prediction for Pat Buchanan's vote total!
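The step from a log-percentage forecast to a vote range, and the check on how unusual 3,407 votes is, can be sketched as follows. It takes the reported forecast mean (-2.004) and standard deviation (.2556) for ln(vote %) as given and uses ±2 standard deviations for the 95% range; the helper names are illustrative. The point forecast comes out near 582 rather than 578 because the sketch does not round the percentage to .134% first.

```python
import math

def vote_interval(mean_log_pct, sd_log_pct, total_votes, k=2.0):
    """Point forecast and approx. 95% range of votes when ln(vote %) ~ Normal(mean, sd^2)."""
    def to_votes(log_pct):
        return math.exp(log_pct) / 100 * total_votes  # log % -> % -> share -> votes
    point = to_votes(mean_log_pct)
    lo = to_votes(mean_log_pct - k * sd_log_pct)
    hi = to_votes(mean_log_pct + k * sd_log_pct)
    return point, lo, hi

def z_score_of_votes(actual_votes, total_votes, mean_log_pct, sd_log_pct):
    """Number of standard deviations the observed ln(vote %) lies from the forecast."""
    actual_log_pct = math.log(actual_votes / total_votes * 100)
    return (actual_log_pct - mean_log_pct) / sd_log_pct

point, lo, hi = vote_interval(-2.004, 0.2556, 431_621)
z = z_score_of_votes(3_407, 431_621, -2.004, 0.2556)
print(round(point), round(lo), round(hi), round(z, 1))
```

Because the range is built on the log scale and then exponentiated, it is not symmetric around the point forecast.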
We know that the log of Buchanan's vote percentage is distributed normally with a mean of -2.004 and a standard deviation of .2556. There is a 95% chance that the log of Buchanan's vote percentage lies in the range

$$-2.004 \pm 2(.2556) = [-2.5152,\ -1.4928]$$

Next, let's convert the logs to vote percentages. The point forecast is e^{-2.004} ≈ .134%, and

$$\left[e^{-2.5152},\ e^{-1.4928}\right] \approx [.08\%,\ .22\%]$$

There is a 95% chance that Buchanan's vote percentage lies in this range. Finally, we can convert to actual votes. The point forecast is .00134 × 431,621 ≈ 578, and

$$[.000808 \times 431{,}621,\ .00225 \times 431{,}621] \approx [349,\ 970]$$

There is a 95% chance that Buchanan's total vote lies in this range. 3,407 votes turns out to be about 7 standard deviations away from our forecast!!! (ln(3,407/431,621 × 100) ≈ -.24, and (-.24 + 2.004)/.2556 ≈ 6.9.)

Back to the original problem. We know that the quantity of some good or service demanded should be related to some basic variables:

$$Q_D = D(P, I, \ldots)$$

Quantity demanded "is a function of" price, income, and other "demand shifters" (the demand factors).

Cross-sectional estimation holds the time period constant and estimates the variation in demand resulting from variation in the demand factors across observations at a single point in time. For example: can we estimate demand for Pepsi in South Bend by looking at selected statistics for South Bend?

Suppose that we have the following data for sales in 200 different Indiana cities:
City       Price  Average Income (Thousands)  Competitor's Price  Advertising Expenditures (Thousands)  Total Sales
Granger    1.02   21.934                      1.48                2.367                                 9,809
Mishawaka  2.56   35.796                      2.53                26.922                                130,835

Let's begin by estimating a basic demand curve: quantity demanded is a linear function of price.
$$Q = a_0 + a_1 P$$

where a_1, the change in quantity demanded per $1 change in price, is to be estimated. That is, we have estimated the following equation:

Regression Results
Variable   Coefficient  Standard Error  t Stat
Intercept  155,042      18,133          8.55
Price      -46,087      7,214           -6.39
Regression Statistics: R Squared .17; Standard Error 48,074

$$Q = 155{,}042 - 46{,}087P$$

Every dollar increase in price lowers sales by 46,087 units.

We can now use this estimated demand curve along with the price in South Bend ($1.37) to estimate demand in South Bend:

$$Q = 155{,}042 - 46{,}087(1.37) = 91{,}903$$

We can get a better sense of magnitude if we convert the estimated coefficient to an elasticity:

$$\varepsilon = \frac{\Delta Q}{\Delta P}\cdot\frac{P}{Q} = -46{,}087\cdot\frac{1.37}{91{,}903} \approx -.69$$

As we did earlier, we can experiment with different functional forms by using logs (with the same city data). Using logs changes the interpretation of the coefficients:

$$Q = a_0 + a_1\ln P$$

where a_1 is the change in quantity demanded per unit change in ln P (to be estimated).

Regression Results
Variable   Coefficient  Standard Error  t Stat
Intercept  133,133      14,892          8.93
Price      -103,973     16,407          -6.33
Regression Statistics: R Squared .17; Standard Error 48,140

$$Q = 133{,}133 - 103{,}973\ln P$$

Every 1% increase in price lowers sales by about 1,040 units (103,973 × .01).
For South Bend, the price of Pepsi is $1.37 (ln 1.37 ≈ .31). We can now use this estimated demand curve along with the price in South Bend to estimate demand in South Bend:

$$Q = 133{,}133 - 103{,}973\ln(1.37) \approx 100{,}402$$

We can get a better sense of magnitude if we convert the estimated coefficient to an elasticity. Since ΔQ/(ΔP/P) = -103,973,

$$\varepsilon = \frac{\%\Delta Q}{\%\Delta P} = \frac{-103{,}973}{Q} = \frac{-103{,}973}{100{,}402} \approx -1.04$$

As we did earlier, we can experiment with a different functional form (same city data), this time taking the log of quantity:

$$\ln Q = a_0 + a_1 P$$

where a_1 is the proportional change in quantity demanded per $1 change in price (to be estimated).

Regression Results
Variable   Coefficient  Standard Error  t Stat
Intercept  13           .34             38.1
Price      -1.22        .13             -8.98
Regression Statistics: R Squared .28; Standard Error .90

$$\ln Q = 13 - 1.22P$$

Every 1-cent increase in price lowers sales by about 1.22% (1.22 × .01).

For South Bend:

$$\ln Q = 13 - 1.22(1.37) \approx 11.33,\qquad Q = e^{11.33} \approx 83{,}283$$

Converting to an elasticity:

$$\varepsilon = \frac{\%\Delta Q}{\%\Delta P} = a_1 P = -1.22(1.37) \approx -1.67$$

Once more with the same city data, we can take logs of both quantity and price. Using logs changes the interpretation of the coefficients.
$$\ln Q = a_0 + a_1\ln P$$

where a_1 is the percentage change in quantity demanded per percentage change in price (to be estimated).

Regression Results
Variable   Coefficient  Standard Error  t Stat
Intercept  12.3         .28             42.9
Price      -2.60        .31             -8.21
Regression Statistics: R Squared .25; Standard Error .93

$$\ln Q = 12 - 2.6\ln P$$

Every 1% increase in price lowers sales by 2.6%.

For South Bend (price $1.37, ln 1.37 ≈ .31):

$$\ln Q = 12 - 2.6(.31) \approx 11.19,\qquad Q = e^{11.19} \approx 72{,}402$$

In this form the elasticity is simply the coefficient:

$$\varepsilon = \frac{\%\Delta Q}{\%\Delta P} = -2.6$$

We can add as many variables as we want in whatever combination. The goal is to look for the best fit:

$$\ln Q = a_0 + a_1 P + a_2\ln I + a_3\ln P_c$$

where a_1 is the proportional change in sales per $1 change in price, a_2 is the % change in sales per % change in income, and a_3 is the % change in sales per % change in competitor's price.

Regression Results
Variable                   Coefficient  Standard Error  t Stat
Intercept                  5.98         1.29            4.63
Price                      -1.29        .12             -10.79
Log of Income              1.46         .34             4.29
Log of Competitor's Price  2.00         .34             5.80
R Squared: .46

Values for South Bend: price of Pepsi $1.37; log of income 3.81; log of competitor's price .80. Now we can make a prediction and calculate elasticities:

$$\ln Q = 5.98 - 1.29(1.37) + 1.46(3.81) + 2.00(.80) \approx 11.375,\qquad Q = e^{11.375} \approx 87{,}142$$
$$\varepsilon_P = \frac{\%\Delta Q}{\%\Delta P} = -1.29(1.37) \approx -1.77,\qquad \varepsilon_I = \frac{\%\Delta Q}{\%\Delta I} = 1.46,\qquad \varepsilon_{P_c} = \frac{\%\Delta Q}{\%\Delta P_c} = 2.00$$

We could use a cross-sectional regression to forecast quantity demanded out into the future, but it would take a lot of information!
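For reference, the South Bend predictions and price elasticities from the cross-sectional specifications above can be reproduced together in one sketch. It uses the coefficients as reported (including the intercept of 12 in the log-log equation and the reported South Bend values ln I = 3.81 and ln P_c = .80), but the exact ln(1.37) instead of the rounded .31 and no intermediate rounding of ln Q, so some predictions differ slightly from the slide figures (for example, roughly 71,800 instead of 72,402 for the log-log model). The function names are illustrative.

```python
import math

P = 1.37      # South Bend price of Pepsi
LN_I = 3.81   # log of South Bend income (as reported)
LN_PC = 0.80  # log of competitor's price (as reported)

def linear():    # Q = 155,042 - 46,087 P
    q = 155_042 - 46_087 * P
    return q, -46_087 * P / q         # (dQ/dP)(P/Q)

def lin_log():   # Q = 133,133 - 103,973 ln P
    q = 133_133 - 103_973 * math.log(P)
    return q, -103_973 / q            # (dQ/d ln P)/Q

def log_lin():   # ln Q = 13 - 1.22 P
    q = math.exp(13 - 1.22 * P)
    return q, -1.22 * P               # (d ln Q/dP) P

def log_log():   # ln Q = 12 - 2.6 ln P
    q = math.exp(12 - 2.6 * math.log(P))
    return q, -2.6                    # the coefficient is the elasticity

def multiple():  # ln Q = 5.98 - 1.29 P + 1.46 ln I + 2.00 ln Pc
    q = math.exp(5.98 - 1.29 * P + 1.46 * LN_I + 2.00 * LN_PC)
    return q, -1.29 * P               # price elasticity only

specs = [("linear", linear), ("lin-log", lin_log), ("log-lin", log_lin),
         ("log-log", log_log), ("multiple", multiple)]
results = {name: spec() for name, spec in specs}
for name, (q, elasticity) in results.items():
    print(f"{name:9s} Q = {q:9,.0f}   price elasticity = {elasticity:5.2f}")
```

The comparison shows how much the implied elasticity at the same price depends on functional form, which is one reason to search for the best-fitting specification.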
Estimate a demand curve using data at some point in time (t), then use the estimated demand curve and forecasts of the demand factors to forecast quantity demanded (at t+1).

Time series estimation, by contrast, holds the demand factors constant (in effect, ignores them) and looks at variations in demand over time. For example: can we predict demand for Pepsi in South Bend next year by looking at how demand varies across time? Essentially, we want to separate demand changes into various frequencies:
•Trend: long-term movements in demand (e.g., demand for movie tickets grows by an average of 6% per year).
•Business Cycle: movements in demand related to the state of the economy (e.g., demand for movie tickets grows by more than 6% during economic expansions and less than 6% during recessions).
•Seasonal: movements in demand related to the time of year (e.g., demand for movie tickets is highest in the summer and around Christmas).

Suppose that you work for a local power company. You have been asked to forecast energy demand for the upcoming year. You have data over the previous 4 years:

Quantity (millions of kilowatt hours)
Year  Q1  Q2  Q3  Q4
2003  11  15  12  14
2004  12  17  13  16
2005  14  18  15  17
2006  15  20  16  19

First, let's plot the data… what do you see?
[Figure: quarterly electricity demand, 2003:1 to 2006:4.]

This data seems to have a linear trend. A linear trend takes the following form:

$$x_t = x_0 + bt$$

where x_t is the forecasted value at time t, x_0 is the estimated value for time zero, and b is the estimated quarterly growth (in millions of kilowatt hours). Time periods are quarters, numbered so that t = 1 is 2003:1 (time zero is 2002:4).

Regression Results
Variable    Coefficient  Standard Error  t Stat
Intercept   11.9         .953            12.5
Time Trend  .394         .099            4.00
Regression Statistics: R Squared .53; Standard Error 1.82; Observations 16

$$x_t = 11.9 + .394t$$

Let's forecast electricity usage near the mean time period (t = 8):

$$\hat{x}_8 = 11.9 + .394(8) = 15.05,\qquad \mathrm{Var}(\hat{x}_8) = 3.50$$

[Figure: the regression line with error bands, 2003:1 to 2006:4; the bands are narrowest near T = 8.]

Again, note that the forecast error will be lowest near the mean time period.

We can use this linear trend model to predict as far out as we want, but note that the error involved gets worse! For example, at t = 76:

$$\hat{x}_{76} = 11.9 + .394(76) = 41.85,\qquad \mathrm{Var}(\hat{x}_{76}) = 47.7$$

One method of evaluating a forecast is to calculate the root mean squared error:

$$\text{RMSE} = \sqrt{\frac{\sum_t (A_t - F_t)^2}{n}}$$

where the numerator is the sum of squared forecast errors (actual A_t minus forecast F_t) and n is the number of observations.

Time Period  Actual  Predicted  Error (Actual - Predicted)
2003:1       11      12.29      -1.29
2003:2       15      12.68       2.31
2003:3       12      13.08      -1.08
2003:4       14      13.47        .52
2004:1       12      13.87      -1.87
2004:2       17      14.26       2.73
2004:3       13      14.66      -1.65
2004:4       16      15.05        .94
2005:1       14      15.44      -1.44
2005:2       18      15.84       2.15
2005:3       15      16.23      -1.23
2005:4       17      16.63        .37
2006:1       15      17.02      -2.02
2006:2       20      17.41       2.58
2006:3       16      17.81      -1.81
2006:4       19      18.20        .79

RMSE = 1.70

Let's take another look at the data… it seems that there is a regular pattern: in every year the Q2 observation sits well above the trend line. We are systematically under-predicting usage in the second quarter.

We can adjust for this seasonal component. Compute the ratio of actual to predicted usage for each period, average the ratios by quarter, and multiply each prediction by its quarter's average ratio:

Time Period  Actual  Predicted  Ratio  Adjusted
2003:1       11      12.29      .89    10.90
2003:2       15      12.68      1.18   14.77
2003:3       12      13.08      .91    11.86
2003:4       14      13.47      1.03   14.04
2004:1       12      13.87      .87    12.30
2004:2       17      14.26      1.19   16.61
2004:3       13      14.66      .88    13.29
2004:4       16      15.05      1.06   15.68
2005:1       14      15.44      .91    13.70
2005:2       18      15.84      1.14   18.45
2005:3       15      16.23      .92    14.72
2005:4       17      16.63      1.02   17.33
2006:1       15      17.02      .88    15.10
2006:2       20      17.41      1.14   20.28
2006:3       16      17.81      .89    16.15
2006:4       19      18.20      1.04   18.96

Average ratios: Q1 = .89, Q2 = 1.16, Q3 = .91, Q4 = 1.04 (the Adjusted column uses the unrounded averages).

Now, we have a pretty good fit!! [Figure: actual usage against the seasonally adjusted forecasts, 2003:1 to 2006:4.] RMSE = .26.

Recall our prediction for period 76 (2021 Q4). Applying the Q4 factor:

$$\hat{x}_{76} = 41.85(1.04) = 43.52$$

We could also account for seasonal variation by using dummy variables:

$$x_t = x_0 + b_0 t + b_1 D_1 + b_2 D_2 + b_3 D_3,\qquad D_i = \begin{cases} 1 & \text{if quarter } i \\ 0 & \text{otherwise} \end{cases}$$

Note: we only need three quarter dummies. If the observation is from quarter 4, then D_1 = D_2 = D_3 = 0 and x_t = x_0 + b_0 t.

Regression Results
Variable    Coefficient  Standard Error  t Stat
Intercept   12.75        .226            56.38
Time Trend  .375         .0168           22.2
D1          -2.375       .219            -10.83
D2          1.75         .215            8.1
D3          -2.125       .213            -9.93
Regression Statistics: R Squared .99; Standard Error .30; Observations 16

Note the much better fit!!

$$x_t = 12.75 + .375t - 2.375D_1 + 1.75D_2 - 2.125D_3$$

Time Period  Actual  Ratio Method  Dummy Variables
2003:1       11      10.90         10.75
2003:2       15      14.77         15.25
2003:3       12      11.86         11.75
2003:4       14      14.04         14.25
2004:1       12      12.30         12.25
2004:2       17      16.61         16.75
2004:3       13      13.29         13.25
2004:4       16      15.68         15.75
2005:1       14      13.70         13.75
2005:2       18      18.45         18.25
2005:3       15      14.72         14.75
2005:4       17      17.33         17.25
2006:1       15      15.10         15.25
2006:2       20      20.28         19.75
2006:3       16      16.15         16.25
2006:4       19      18.96         18.75

Ratio method RMSE = .26; dummy variables RMSE = .25.

Recall our prediction for period 76 (2021 Q4):

$$x_{76} = 12.75 + .375(76) = 41.25$$

Recall, our trend line took the form x_t = x_0 + bt. The parameter b measures the quarterly change in electricity demand in millions of kilowatt hours.
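The linear-trend fit, its RMSE, the ratio-based seasonal factors, and the dummy-variable regression can all be reproduced from the 16 quarterly observations. This pure-Python sketch solves the least-squares normal equations with a small Gauss-Jordan routine (our own helper, written to avoid external dependencies; in practice a library such as NumPy or statsmodels would handle this step). It assumes t = 1 is 2003:1, which matches the predicted values in the tables.

```python
import math

# Quarterly demand (millions of kWh), 2003:1 through 2006:4; t = 1 is 2003:1
demand = [11, 15, 12, 14, 12, 17, 13, 16, 14, 18, 15, 17, 15, 20, 16, 19]
periods = list(range(1, len(demand) + 1))
quarters = [(t - 1) % 4 + 1 for t in periods]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def rmse(actual, fitted):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual))

# 1) Linear trend x_t = x0 + b t
x0, b = ols([[1.0, t] for t in periods], demand)
trend = [x0 + b * t for t in periods]

# 2) Ratio method: average actual/predicted within each quarter, then rescale the trend
ratios = [a / f for a, f in zip(demand, trend)]
factor = {}
for q in range(1, 5):
    same_quarter = [r for r, qq in zip(ratios, quarters) if qq == q]
    factor[q] = sum(same_quarter) / len(same_quarter)
ratio_fit = [f * factor[q] for f, q in zip(trend, quarters)]

# 3) Trend plus quarter dummies (quarter 4 is the baseline)
dummy_rows = [[1.0, t, float(q == 1), float(q == 2), float(q == 3)]
              for t, q in zip(periods, quarters)]
coef = ols(dummy_rows, demand)
dummy_fit = [sum(c * v for c, v in zip(coef, row)) for row in dummy_rows]

print(f"trend: x0={x0:.2f}, b={b:.3f}, RMSE={rmse(demand, trend):.2f}")
print("seasonal factors:", {q: round(f, 3) for q, f in factor.items()})
print(f"ratio-method RMSE={rmse(demand, ratio_fit):.2f}")
print("dummy coefficients:", [round(c, 3) for c in coef])
print(f"dummy-model RMSE={rmse(demand, dummy_fit):.2f}")
```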
Often, it's more realistic to assume that demand grows by a constant percentage rather than a constant quantity. For example, if we knew that electricity demand grew by g% per quarter, then our forecasting equation would take the form

$$x_t = x_0\left(1 + \frac{g\%}{100}\right)^t$$

If we wish to estimate this equation, we have a little work to do. Writing the growth rate in decimal form,

$$x_t = x_0(1 + g)^t$$

If we convert our data to natural logs, we get the following linear relationship that can be estimated:

$$\ln x_t = \ln x_0 + t\ln(1 + g)$$

Regression Results
Variable    Coefficient  Standard Error  t Stat
Intercept   2.49         .063            39.6
Time Trend  .026         .006            4.06
Regression Statistics: R Squared .54; Standard Error .1197; Observations 16

$$\ln x_t = 2.49 + .026t$$

(so ln(1 + g) = .026, i.e., demand grows by about 2.6% per quarter).

Let's forecast electricity usage near the mean time period (t = 8):

$$\ln\hat{x}_8 = 2.49 + .026(8) = 2.698,\qquad \mathrm{Var}(\ln\hat{x}_8) = .0152$$

BE CAREFUL… THESE NUMBERS ARE LOGS!!! The natural log of forecasted demand is 2.698. Therefore, to get the actual demand forecast, use the exponential function:

$$e^{2.698} = 14.85$$

Likewise with the error bands: a 95% confidence interval is ±2 SD,

$$2.698 \pm 2\sqrt{.0152} = [2.451,\ 2.945] \;\Rightarrow\; \left[e^{2.451},\ e^{2.945}\right] = [11.60,\ 19.00]$$

[Figure: exponential-trend forecasts with error bands, 2003:1 to 2006:4, with T = 8 marked; RMSE = 1.70.]

Errors in growth rates compound quickly!! [Figure: the exponential-trend forecast extended to period 100, with rapidly widening bands.] Far out of sample, the forecast reaches e^{4.49} ≈ 89.2, with a ±2 SD band of roughly 35.8 to 221.8.

Let's try one… suppose that we are interested in forecasting gasoline prices. We have the following historical data (monthly, April 1993 to June 2010). [Figure: monthly gasoline prices, April 1993 to June 2010.] Does a linear trend (constant cents-per-gallon growth per year) look reasonable? Let's suppose we assume a linear trend.
Then we are estimating the following linear regression:

$$p_t = p_0 + bt$$

where p_t is the price at time t, p_0 is the price at April 1993, t is the number of months from April 1993, and b is the monthly change in price in dollars per gallon (.010 is one cent per month).

Regression Results
Variable    Coefficient  Standard Error  t Stat
Intercept   .67          .05             12.19
Time Trend  .010         .0004           23.19
R Squared = .72

We can check for the presence of a seasonal cycle by adding seasonal dummy variables:

$$p_t = p_0 + b_0 t + b_1 D_1 + b_2 D_2 + b_3 D_3,\qquad D_i = \begin{cases} 1 & \text{if quarter } i \\ 0 & \text{otherwise} \end{cases}$$

where b_i is the price impact (dollars per gallon) of quarter i relative to quarter 4.

Regression Results
Variable    Coefficient  Standard Error  t Stat
Intercept   .58          .07             8.28
Time Trend  .01          .0004           23.7
D1          -.03         .075            -.43
D2          .15          .074            2.06
D3          .16          .075            2.20
R Squared = .74

If we wanted to remove the seasonal component, we could do so by subtracting the relevant seasonal dummy coefficient from each gas price:

Seasonal Adjustment
Date     Price  Quarter  Regression Coefficient  Seasonally Adjusted Price
1993-04  1.05   2nd      .15                     .90
1993-07  1.06   3rd      .16                     .90
1993-10  1.06   4th      0                       1.06
1994-01  .98    1st      -.03                    1.01
1994-04  1.00   2nd      .15                     .85

Note: once the seasonal component has been removed, all that should be left is trend, cycle, and noise.
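The seasonal adjustment in the table above can be sketched as follows, using the reported trend-plus-dummy coefficients. Mapping months to calendar quarters (January to March is quarter 1) is our reading of how the dummies were defined, and the helper names are illustrative. `fitted_price` also reproduces the linear model's April 2011 value (t = 217, quarter 2) used later in the section.

```python
# Estimated trend-plus-seasonal model for monthly gas prices ($/gallon):
# p_t = .58 + .01 t - .03 D1 + .15 D2 + .16 D3   (quarter 4 is the baseline)
INTERCEPT, TREND = 0.58, 0.01
SEASONAL = {1: -0.03, 2: 0.15, 3: 0.16, 4: 0.0}

def quarter_of(month):
    """Calendar quarter (1-4) for a month number 1-12."""
    return (month - 1) // 3 + 1

def seasonally_adjust(price, month):
    """Remove the estimated quarter effect from an observed price."""
    return price - SEASONAL[quarter_of(month)]

def fitted_price(t, month):
    """Trend plus seasonal component for month index t (months from April 1993)."""
    return INTERCEPT + TREND * t + SEASONAL[quarter_of(month)]

observations = [("1993-04", 1.05), ("1993-07", 1.06), ("1993-10", 1.06),
                ("1994-01", 0.98), ("1994-04", 1.00)]
for date, price in observations:
    month = int(date.split("-")[1])
    print(date, round(seasonally_adjust(price, month), 2))

print(round(fitted_price(217, 4), 2))  # April 2011, linear model
```

Subtracting `fitted_price` from the actual price (rather than only the seasonal term) leaves the business cycle plus noise, as described next.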
We could check this by re-running both regressions on the seasonally adjusted price series p̃_t.

Seasonally Adjusted Price Series: p̃_t = p_0 + bt
Variable    Coefficient  Standard Error  t Stat
Intercept   .587         .05             11.06
Time Trend  .010         .0004           23.92

Seasonally Adjusted Price Series: p̃_t = p_0 + b_0 t + b_1 D_1 + b_2 D_2 + b_3 D_3
Variable    Coefficient  Standard Error  t Stat
Intercept   .587         .07             8.28
Time Trend  .010         .0004           23.7
D1          0            .075            0
D2          0            .074            0
D3          0            .075            0

The regression we have in place gives us the trend plus the seasonal component of the data:

$$\hat{p}_t = \underbrace{.58 + .01t}_{\text{trend}}\ \underbrace{-\ .03D_1 + .15D_2 + .16D_3}_{\text{seasonal}}$$

If we subtract our predicted price (from the regression) from the actual price, p_t - p̂_t, we will have isolated the business cycle and noise:

Business Cycle Component
Date     Actual Price  Predicted Price (From Regression)  Business Cycle Component
1993-04  1.050         .752                               .297
1993-05  1.071         .763                               .308
1993-06  1.075         .773                               .301
1993-07  1.064         .797                               .267
1993-08  1.048         .807                               .240

We can plot this and compare it with business cycle dates. [Figure: actual minus predicted price over time, compared with business cycle dates.]

Data Breakdown
Date     Actual Price  Trend  Seasonal  Business Cycle
1993-04  1.050         .60    .15       .30
1993-05  1.071         .61    .15       .31
1993-06  1.075         .62    .15       .30
1993-07  1.064         .63    .16       .27
1993-08  1.048         .64    .16       .24

Perhaps an exponential trend would work better… An exponential trend would indicate constant percentage growth rather than constant cents-per-gallon growth.
We already know that there is a seasonal component, so we can start with dummy variables:

$$\ln p_t = \ln p_0 + b_0 t + b_1 D_1 + b_2 D_2 + b_3 D_3,\qquad D_i = \begin{cases} 1 & \text{if quarter } i \\ 0 & \text{otherwise} \end{cases}$$

where b_0 is the monthly growth rate and b_i is the percentage price impact of quarter i relative to quarter 4.

Regression Results
Variable    Coefficient  Standard Error  t Stat
Intercept   -.14         .03             -4.64
Time Trend  .005         .0001           29.9
D1          -.02         .032            -.59
D2          .06          .032            2.07
D3          .07          .032            2.19
R Squared = .81

(The trend coefficient implies that prices grow about .5% per month.)

If we wanted to remove the seasonal component, we could do so by subtracting the seasonal dummy coefficient from each price, but now the price is in logs:

Seasonal Adjustment
Date     Price  Log of Price  Quarter  Regression Coefficient  Log of Adjusted Price  Adjusted Price
1993-04  1.05   .049          2nd      .06                     -.019                  .98
1993-07  1.06   .062          3rd      .07                     -.010                  .99
1993-10  1.06   .062          4th      0                       .062                   1.06
1994-01  .98    -.013         1st      -.02                    .006                   1.00
1994-04  1.00   .005          2nd      .06                     -.062                  .94

Example: e^{-.019} = .98.

The regression we have in place gives us the trend plus the seasonal component of the data:

$$\ln\hat{p}_t = \underbrace{-.14 + .005t}_{\text{trend}}\ \underbrace{-\ .02D_1 + .06D_2 + .07D_3}_{\text{seasonal}}$$

If we subtract our predicted price (from the regression) from the actual price, we will have isolated the business cycle and noise. The predicted log price must first be converted back to a price (for example, e^{-.069} = .93):

Business Cycle Component
Date     Actual Price  Predicted Log Price  Predicted Price  Business Cycle Component
1993-04  1.050         -.069                .93              .12
1993-05  1.071         -.063                .94              .13
1993-06  1.075         -.057                .94              .13
1993-07  1.064         -.047                .95              .11
1993-08  1.048         -.041                .96              .09

As you can see, the results are very similar.

In either case, we could make a forecast for gasoline prices next year; let's say, April 2011 (time period t = 217, quarter 2):

$$p_t = .58 + .01(217) - .03(0) + .15(1) + .16(0) = 2.90$$

or

$$\ln p_t = -.14 + .005(217) - .02(0) + .06(1) + .07(0) = 1.005,\qquad e^{1.005} = 2.73$$

Consider a new forecasting problem.

Quarter  Market Share
1        20
2        22
3        23
4        24
5        18
6        23
7        19
8        17
9        22
10       23
11       18
12       23
You are asked to forecast a company's market share for the 13th quarter. [Figure: market share by quarter, quarters 1 to 12.] There doesn't seem to be any discernible trend here…

Smoothing techniques are often used when data exhibit no trend or seasonal/cyclical component. They are used to filter out short-term noise in the data. A moving average of length N is equal to the average value over the previous N periods:

$$MA_N(t) = \frac{1}{N}\sum_{i=t-N}^{t-1} A_i$$

Quarter  Market Share  MA(3)  MA(5)
1        20
2        22
3        23
4        24            21.67
5        18            23
6        23            21.67  21.4
7        19            21.67  22
8        17            20     21.4
9        22            19.67  20.2
10       23            19.33  19.8
11       18            20.67  20.8
12       23            21     19.8

The longer the moving average, the smoother the forecasts are. [Figure: actual market share with the MA(3) and MA(5) forecasts, quarters 1 to 12.]

Calculating forecasts for quarter 13 is straightforward:

$$MA(3) = \frac{23 + 18 + 23}{3} = 21.33,\qquad MA(5) = \frac{23 + 18 + 23 + 22 + 17}{5} = 20.6$$

So, how do we choose N?? Compare the squared forecast errors:

Quarter  Market Share  MA(3)  Squared Error  MA(5)  Squared Error
4        24            21.67  5.4289
5        18            23     25
6        23            21.67  1.7689         21.4   2.56
7        19            21.67  7.1289         22     9
8        17            20     9              21.4   19.36
9        22            19.67  5.4289         20.2   3.24
10       23            19.33  13.4689        19.8   10.24
11       18            20.67  7.1289         20.8   7.84
12       23            21     4              19.8   10.24
Total                         78.3534               62.48

$$\text{RMSE}_{MA(3)} = \sqrt{\frac{78.3534}{9}} = 2.95,\qquad \text{RMSE}_{MA(5)} = \sqrt{\frac{62.48}{7}} = 2.99$$

By this measure, MA(3) forecasts slightly better.

Exponential smoothing involves a forecast equation that takes the following form:

$$F_{t+1} = wA_t + (1 - w)F_t,\qquad w \in [0, 1]$$

where F_{t+1} is the forecast for time t+1, A_t is the actual value at time t, F_t is the forecast for time t, and w is the smoothing parameter. Note: when w = 1, your forecast is equal to the previous value. When w = 0, your forecast is a constant (equal to the initial forecast).
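The moving-average forecasts and their RMSEs above can be reproduced with a short sketch. The function names are illustrative; the RMSE is computed only over quarters that have both an actual value and a forecast, matching the 9 and 7 squared errors used above.

```python
import math

market_share = [20, 22, 23, 24, 18, 23, 19, 17, 22, 23, 18, 23]  # quarters 1-12

def moving_average_forecasts(series, n):
    """Forecast for quarter t+1 as the mean of quarters t-n+1..t (1-indexed).

    Returns {quarter_number: forecast}, including the out-of-sample next quarter.
    """
    return {t + 1: sum(series[t - n:t]) / n for t in range(n, len(series) + 1)}

def rmse(series, forecasts):
    """RMSE over quarters that have both an actual value and a forecast."""
    errors = [series[q - 1] - f for q, f in forecasts.items() if q <= len(series)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

for n in (3, 5):
    fc = moving_average_forecasts(market_share, n)
    print(f"MA({n}): quarter-13 forecast {fc[13]:.2f}, RMSE {rmse(market_share, fc):.2f}")
```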
For exponential smoothing, we need to choose a value for the weighting parameter w as well as an initial forecast. Usually, the initial forecast is chosen to equal the sample average (here, 21.0).

Quarter  Market Share  w = .3  w = .5
1        20            21.0    21.0
2        22            20.7    20.5
3        23            21.1    21.3
4        24            21.7    22.2
5        18            22.4    23.1
6        23            21.1    20.6
7        19            21.7    21.8
8        17            20.9    20.4
9        22            19.7    18.7
10       23            20.4    20.4
11       18            21.2    21.7
12       23            20.2    19.9

For example, with w = .5 the forecast for quarter 7 is .5(23) + .5(20.6) = 21.8.

A smaller w will produce a smoother forecast. [Figure: actual market share with the w = .3 and w = .5 forecasts, quarters 1 to 12.]

Calculating forecasts for quarter 13 is straightforward:

$$w = .3:\ F_{13} = .3(23) + .7(20.2) = 21.04,\qquad w = .5:\ F_{13} = .5(23) + .5(19.9) = 21.45$$

So, how do we choose w??

Quarter  Market Share  w = .3  Squared Error  w = .5  Squared Error
1        20            21.0    1              21.0    1
2        22            20.7    1.69           20.5    2.25
3        23            21.1    3.61           21.3    2.89
4        24            21.7    5.29           22.2    3.24
5        18            22.4    19.36          23.1    26.01
6        23            21.1    3.61           20.6    5.76
7        19            21.7    7.29           21.8    7.84
8        17            20.9    15.21          20.4    11.56
9        22            19.7    5.29           18.7    10.89
10       23            20.4    6.76           20.4    6.76
11       18            21.2    10.24          21.7    13.69
12       23            20.2    7.84           19.9    9.61
Total                          87.19                  101.5

$$\text{RMSE}_{w=.3} = \sqrt{\frac{87.19}{12}} = 2.70,\qquad \text{RMSE}_{w=.5} = \sqrt{\frac{101.5}{12}} = 2.91$$
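The exponential-smoothing recursion and the RMSE comparison can be sketched as follows, starting from the sample average as above. The sketch does not round each forecast to one decimal before the next step, so its figures differ slightly from the table (about 21.05 and 2.68 for w = .3, and 21.42 and 2.92 for w = .5); the ranking of the two weights is unchanged. Function names are illustrative.

```python
import math

market_share = [20, 22, 23, 24, 18, 23, 19, 17, 22, 23, 18, 23]  # quarters 1-12

def exponential_smoothing(series, w, initial):
    """Return forecasts F_1..F_{n+1} using F_{t+1} = w*A_t + (1-w)*F_t."""
    forecasts = [initial]
    for actual in series:
        forecasts.append(w * actual + (1 - w) * forecasts[-1])
    return forecasts

def rmse(series, forecasts):
    """RMSE of in-sample forecasts F_1..F_n against actuals A_1..A_n."""
    errors = [a - f for a, f in zip(series, forecasts)]  # zip drops F_{n+1}
    return math.sqrt(sum(e * e for e in errors) / len(errors))

start = sum(market_share) / len(market_share)  # sample average, 21.0
for w in (0.3, 0.5):
    fc = exponential_smoothing(market_share, w, start)
    print(f"w={w}: F13={fc[-1]:.2f}, RMSE={rmse(market_share, fc):.2f}")
```

A natural extension is to scan a grid of w values and keep the one with the lowest RMSE, which formalizes the "how do we choose w" question.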