Regression - Demand Estimation: Simple Regression Analysis
DR. HUSSIN ABDULLAH
SCHOOL OF ECONOMICS, FINANCE AND BANKING, UUM COB
DEMAND ESTIMATION
After studying this chapter, you should be able to:
1. Discuss how the firm's managers use information about demand for its product to determine its profit-maximizing rate of output and price, or whether to produce a particular product at all.
2. Discuss how demand responds to an increase or decrease in consumer income resulting from an economic expansion or contraction.
3. Specify the components of a regression model that can be used to estimate a demand equation.
4. Interpret the regression results (i.e., explain the quantitative impact that changes in the determinants have on the quantity demanded).
5. Explain the meaning of R².
6. Evaluate the statistical significance of the regression coefficients using the t-test and the statistical significance of R² using the F-test.
Introduction:
• An important source of risk for the firm is a sudden shift in demand for its product or service.
• Demand estimation serves two managerial objectives:
(1) it provides the insights necessary for effective management of demand, and
(2) it aids in forecasting sales and revenues.
The theory
SIMPLE LINEAR REGRESSION
• Relationships, among other things, may serve as a basis for estimation and prediction.
• Simple prediction: we take the observed values of X to estimate or predict the corresponding Y values.
• Regression analysis uses one predictor (simple regression) or several predictors (multiple regression) to predict Y from X values.
• Correlation and regression are closely related: beneath many correlation problems lies a regression analysis that can give further insight into the relationship of Y with X.
The Basic Model
• A straight line is often the most useful way to model the relationship between two continuous variables.
• The regression coefficients are the intercept and the slope.
• Slope (β1): the change in Y for a 1-unit change in X.
– It is the ratio of the rise of the line (ΔY) to the run, or travel along the X axis (ΔX).
• Intercept (β0): the other regression coefficient; the value of the linear function where it crosses the Y axis, i.e., the estimate of Y when X is zero.
Concept Application
• Unfortunately, one rarely comes across a data set composed of four paired values, a perfect correlation, and an easily drawn line.
• A model based on such data is deterministic: for any value of X, there is only one possible corresponding value of Y.
• A probabilistic model also uses a linear function, but adds an error term.
• The error term is the deviation of an actual value of Y from the regression line of Y for a particular value of X.
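Written out in the usual textbook notation (the subscript i for the i-th observation and ε_i for the error term are conventions assumed here, not symbols taken from the slides), the two kinds of model look like this:

```latex
% Deterministic model: exactly one Y for each X
Y = \beta_0 + \beta_1 X, \qquad \beta_1 = \frac{\Delta Y}{\Delta X}

% Probabilistic model: the same straight line plus an error term
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
```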
Method of Least Squares
• The method of least squares is a procedure for finding the regression line that keeps errors of estimate to a minimum.
• When we predict the value of Y for each Xi, the difference between the actual Yi and the predicted Ŷi is the error.
• Each error is squared and the squares are summed; the least-squares line is the one that makes this sum of squared errors as small as possible.
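A minimal sketch of the least-squares procedure for one predictor, using the standard closed-form solution b1 = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and b0 = Ȳ − b1·X̄. The function names and the toy data are hypothetical illustrations, not part of the chapter's example.

```python
def least_squares_line(xs, ys):
    """Fit Y = b0 + b1*X by ordinary least squares (one predictor).

    Returns (b0, b1), the intercept and slope that minimise the sum of
    squared errors between the actual Y values and the fitted line.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation, so the slope is undefined")
    b1 = sxy / sxx            # slope: change in Y for a 1-unit change in X
    b0 = y_bar - b1 * x_bar   # intercept: fitted Y when X = 0
    return b0, b1


def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared differences between actual Y and the line b0 + b1*X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
```

For the toy data X = [1, 2, 3, 4], Y = [3, 5, 4, 8] the fitted line is Ŷ = 1.5 + 1.4X with a sum of squared errors of 4.2; nudging either coefficient away from these values makes that sum larger, which is exactly what "least squares" means.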
Residuals
• A residual is the difference between the actual value of Y and the value of Y on the regression line (Yi − Ŷi).
• When standardized, residuals are comparable to Z scores, with a mean of 0 and a standard deviation of 1.
• It is important to apply other diagnostics to verify that the regression assumptions (normality, linearity, equality of variance, and independence of errors) are met.
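A sketch of standardizing residuals in the Z-score sense described above (subtract the mean, divide by the standard deviation). This is one convention among several: many statistical packages instead divide each residual by the standard error of the estimate, which gives a standard deviation close to, but not exactly, 1. Function names are hypothetical.

```python
import statistics


def residuals(actual, predicted):
    """Residual = actual Y minus the Y value on the regression line."""
    return [y - y_hat for y, y_hat in zip(actual, predicted)]


def standardized_residuals(actual, predicted):
    """Z-score the residuals: subtract their mean and divide by their
    (population) standard deviation, giving mean 0 and std. dev. 1."""
    res = residuals(actual, predicted)
    mean = statistics.fmean(res)
    sd = statistics.pstdev(res)
    if sd == 0:
        raise ValueError("all residuals are equal; cannot standardize")
    return [(r - mean) / sd for r in res]
```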
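Predictions
• Confidence and prediction bands are bow-tie-shaped intervals around the regression line: they are narrowest at the mean of X and widen as X moves away from it.
• The intervals can be widened or narrowed by changing the confidence level (e.g., 99% instead of 95% gives a wider band).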
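The bow-tie shape comes from the standard interval formulas for simple regression, reproduced here from general textbook usage rather than from the slides (s_e is the standard error of the estimate, t is the two-tailed critical value with n − 2 degrees of freedom, and X_0 is the value of X at which we predict):

```latex
% Standard error of the estimate (simple regression, n observations)
s_e = \sqrt{\frac{\sum_i (Y_i - \hat{Y}_i)^2}{n - 2}}

% Confidence interval for the mean of Y at X = X_0
\hat{Y}_0 \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_i (X_i - \bar{X})^2}}

% Prediction interval for an individual Y at X = X_0
\hat{Y}_0 \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_i (X_i - \bar{X})^2}}
```

The (X_0 − X̄)² term is what makes both bands widen away from the mean of X, and the extra 1 under the square root is why a prediction band for an individual Y is always wider than the confidence band for the mean of Y.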
Testing for Goodness of Fit
• Goodness of fit is a measure of how well the regression model is able to predict Y.
• The most important test in bivariate linear regression is whether the slope, β1, is equal to zero.
Zero slopes result from various conditions:
• Y is completely unrelated to X, and no systematic pattern is evident.
• Y takes the same (constant) value for every value of X.
• The data are related but represented by a nonlinear function.
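The t-Test
• To test whether β1 = 0, we use a two-tailed test: the statistic t = b1 / s_b1 (the estimated slope divided by its standard error) is compared with the critical t value for n − 2 degrees of freedom.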
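A sketch of the two-tailed slope test for simple regression. The critical value is assumed to be looked up in a t table and passed in, since the Python standard library has no t distribution; the function name and toy data are hypothetical.

```python
import math


def slope_t_test(xs, ys, t_critical):
    """Two-tailed t test of H0: beta1 = 0 in a simple (one-predictor) regression.

    t_critical is the tabulated two-tailed critical value for n - 2 degrees
    of freedom at the chosen significance level.
    Returns (b1, se_b1, t_stat, reject_h0).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation")
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate
    se_b1 = s_e / math.sqrt(sxx)     # standard error of the slope
    if se_b1 == 0:
        raise ValueError("perfect fit: the t statistic is undefined")
    t_stat = b1 / se_b1
    return b1, se_b1, t_stat, abs(t_stat) > t_critical
```

With X = [1, 2, 3, 4] and Y = [3, 5, 4, 8], b1 = 1.4 and t ≈ 2.16. With only n − 2 = 2 degrees of freedom the 5% two-tailed critical value is 4.303, so H0: β1 = 0 cannot be rejected even though the slope looks sizeable.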
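The F Test
• In multiple regression, the F test plays the overall role: it tests whether all the slope coefficients are jointly equal to zero. (In bivariate regression it is equivalent to the t test, since F = t².)
• See F test example for an illustration.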
Coefficient of Determination
• In predicting the values of Y without any knowledge of X, our best estimate would be the mean of Y (Ȳ).
• Each actual value of Y that does not fall on Ȳ contributes to the error of this estimate.
Multiple Regression
• Multiple regression: a statistical tool used to develop a self-weighting estimating equation that predicts values for a dependent variable from the values of independent variables.
• Multiple regression is used as a descriptive tool in three types of situations:
– It is often used to develop a self-weighting estimating equation by which to predict values for a criterion variable (DV) from the values for several predictor variables (IVs).
– A descriptive application of multiple regression calls for controlling for confounding variables to better evaluate the contribution of other variables.
– Multiple regression can also be used to test and explain causal theories. This approach is referred to as path analysis (i.e., describing, through regression, an entire structure of linkages advanced by a causal theory).
• Multiple regression is also used as an inference tool to test hypotheses and to estimate population values.
Method
• Multiple regression is an extension of the bivariate linear regression discussed in Chapter 19.
• Dummy variables: nominal variables converted for use in multivariate statistics.
• Regression coefficients are stated either in raw score units (the actual X values) or as standardized coefficients (regression coefficients in standardized form, with mean 0 and standard deviation 1, used to determine the comparative impact of variables that come from different scales).
• When regression coefficients are standardized, they are called beta weights (β): standardized regression coefficients where the size of the number reflects the level of influence X exerts on Y. Their values indicate the relative importance of the associated X values, particularly when the predictors are unrelated.
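Both ideas are short enough to sketch directly. The first function turns a nominal variable (such as urban versus rural campus location) into 0/1 dummy columns, one per non-reference category; the second converts a raw-score coefficient b into a beta weight using b·s_X / s_Y. The function names are illustrative, not from the chapter.

```python
import statistics


def dummy_code(labels, reference):
    """One 0/1 dummy column per category other than `reference`.

    A nominal variable with k categories yields k - 1 dummies; the
    reference category is the one coded 0 in every column.
    """
    categories = sorted(set(labels) - {reference})
    return {cat: [1 if label == cat else 0 for label in labels]
            for cat in categories}


def beta_weight(b, x_values, y_values):
    """Standardized (beta) coefficient from a raw-score coefficient b.

    beta = b * s_X / s_Y, i.e. the change in Y, measured in standard
    deviations of Y, for a one-standard-deviation change in X.
    """
    return b * statistics.stdev(x_values) / statistics.stdev(y_values)


# Example: campus location coded as in Table 1 (urban = 1, rural = 0).
loc = dummy_code(["urban", "rural", "urban", "rural"], reference="rural")
```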
Example
• Most statistical packages provide various methods for selecting variables for the equation.
– Forward selection: sequentially adds the variable that results in the largest R² increase.
– Backward elimination: sequentially removes the variable whose removal changes R² the least.
– Stepwise selection: sequentially adds or removes variables to optimize R².
• Collinearity: two independent variables are highly correlated.
• Multicollinearity: more than two independent variables are highly correlated.
• Both of the above can have damaging effects on multiple regression.
• Another difficulty with regression occurs when researchers fail to evaluate the equation with data beyond those used originally to calculate it.
• A solution to this problem is the holdout sample: the portion of the sample that is excluded when the estimating equation is first computed and reserved for later validity testing.
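A minimal sketch of forward selection using NumPy's least-squares solver. The stopping rule (`min_gain`, a minimum R² improvement of 0.01) is a hypothetical choice for illustration; statistical packages typically use an F-to-enter or p-value criterion instead, and backward and stepwise selection are not shown.

```python
import numpy as np


def r_squared(X, y):
    """R² of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(residuals ** 2) / sst


def forward_selection(X, y, names, min_gain=0.01):
    """Greedy forward selection: at each step add the column that raises
    R² the most; stop when the best improvement is below `min_gain`."""
    selected, remaining = [], list(range(X.shape[1]))
    current_r2 = 0.0
    while remaining:
        gains = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] - current_r2 < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current_r2 = gains[best]
    return [names[j] for j in selected], current_r2
```

On simulated data in which Y depends strongly on x0, less strongly on x2 and not at all on x1, the procedure should add x0 first, then x2, and stop before x1.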
Based on the formula (see chapter), the coefficient of determination compares the error of the line of best fit with the error incurred by using Ȳ: $r^2 = 1 - \sum_i (Y_i - \hat{Y}_i)^2 / \sum_i (Y_i - \bar{Y})^2$.
• One purpose of testing is to discover whether the regression equation is a more effective predictive device than the mean of the dependent variable.
• The coefficient of determination is symbolized by r². It has several purposes:
– As an index of fit, it is interpreted as the total proportion of variance in Y explained by X.
– As a measure of linear relationship, it tells us how well the regression line fits the data.
– It is also an important indicator of the predictive accuracy of the equation.
Typically, we would like to have an r² that explains 80 percent or more of the variation.
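The definition r² = 1 − SSE/SST can be computed directly: SST is the squared error from predicting every Y with Ȳ, and SSE is the squared error left after using the regression line. A sketch with a hypothetical function name:

```python
def coefficient_of_determination(actual, predicted):
    """r² = 1 - SSE / SST.

    SST: squared errors from predicting every Y with the mean of Y.
    SSE: squared errors that remain when the regression line is used instead.
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need equal-length, non-empty sequences")
    y_bar = sum(actual) / n
    sst = sum((y - y_bar) ** 2 for y in actual)
    if sst == 0:
        raise ValueError("Y has no variation, so r² is undefined")
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return 1.0 - sse / sst
```

With actual values [3, 5, 4, 8] and fitted values [2.9, 4.3, 5.7, 7.1] (the least-squares line Ŷ = 1.5 + 1.4X for X = 1 to 4), SST = 14 and SSE = 4.2, so r² = 0.7, short of the 80 percent guideline above.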
Important Concepts:
• Individual demand curve: the greatest quantity of a good that consumers are willing to buy at each price, holding other influences constant.
• The market demand curve is the horizontal sum of the individual demand curves.
• The demand function includes all variables that influence the quantity demanded:
$$Q = f(\underset{-}{P},\ \underset{+}{P_S},\ \underset{?}{P_C},\ \underset{?}{Y},\ \underset{+}{N},\ \underset{+}{W},\ \underset{+}{P_E})$$
(the symbol under each variable is the expected sign of its effect on Q; "?" marks a sign that theory does not fix in advance)
where:
P is the price of the good
PS is the price of substitute goods
PC is the price of related goods
Y is income, N is population, W is wealth, and
PE is the expected future price
Downward Slope to the Demand Curve
• Reasons that price and quantity demanded are negatively related include:
» income effect -- as the price of a good declines, the consumer can purchase more of all goods, since his or her real income has increased.
» substitution effect -- as the price declines, the good becomes relatively cheaper. A rational consumer maximizes satisfaction by reorganizing consumption until the marginal utility per dollar spent on each good is equal.
Sign of the Estimated Regression Coefficients
A good regression model should be based on sound economic theory. The theory should indicate what sign each estimated coefficient is expected to take. For example, the coefficient of the price variable in a demand equation should have a negative sign: when price increases, quantity demanded decreases. The income variable should have a positive sign (for a normal good). If the signs of the estimated coefficients do not agree with the theory, the validity of the model should be questioned.
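The sign check can be made mechanical. This sketch compares estimated coefficients with the signs theory predicts. The dictionary of expected signs is a hypothetical example, and the coefficient values are the unstandardized B values reported in the SPSS output for the pizza model later in this chapter (Tuition_X2 plays the role of the income variable Y there). Location is left out because theory gives no expected sign for it.

```python
# Hypothetical table of expected signs (+1 or -1) taken from demand theory.
EXPECTED_SIGNS = {
    "Price_X1": -1,      # own price: law of demand
    "Tuition_X2": +1,    # income variable: positive for a normal good
    "Pri_Cross_X3": -1,  # price of a complement (soft drinks)
}


def sign_violations(coefficients, expected=EXPECTED_SIGNS):
    """Names of coefficients whose estimated sign contradicts theory."""
    return [name for name, sign in expected.items()
            if coefficients[name] * sign < 0]


# Unstandardized B values from the SPSS output for the pizza model.
pizza_b = {"Price_X1": -0.088, "Tuition_X2": 0.138,
           "Pri_Cross_X3": -0.076, "Location_X4": -0.544}
print(sign_violations(pizza_b))   # [] : every checked sign agrees with theory
```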
How to do Demand Estimation?
In estimating the demand for a particular good or service, the process is as follows.
First step: determine all the factors that might influence this demand (i.e., formulate the demand model).
Example:
Suppose we wanted to estimate the demand for pizza by university students in Malaysia. What variables would most likely affect their demand for pizza? Remember demand theory? We could start to answer this question by using price and all the non-price determinants, such as income, prices of related goods, tastes and preferences, future expectations, and number of buyers. But it is not always possible or appropriate to include all these variables in a particular demand estimation. Why? Because of the availability of data and the cost of generating new data. The two types of data used in regression analysis are cross-sectional and time series.
For the purpose of illustration, let us assume we have obtained cross-sectional data on university students (public and private universities in Malaysia) by conducting a survey of thirty randomly selected universities during a particular month.
Second step: Data Collection
Suppose we have gathered the following information for each campus from this survey:
(1) average number of slices (quantity) consumed per month by students,
(2) average price of a slice of pizza in places selling pizza,
(3) annual income (PTPN and FAMA),
(4) average price of soft drinks sold in the pizza places, and
(5) location of the campus (urban versus rural).
The data obtained from our hypothetical survey are presented in Table 1.
Table 1. Sample data: The demand for pizza
Obs  Price_P  Income_Y  P_Com_Pc  Loc_X4  QuantityDD
  1   100.00     14.00    100.00    1.00          10
  2   100.00     16.00     95.00    1.00          12
  3    90.00      8.00    110.00    1.00          13
  4    95.00      7.00     90.00    1.00          14
  5   110.00     11.00    100.00     .00           9
  6   125.00      5.00    100.00     .00           8
  7   125.00     12.00    125.00    1.00           4
  8   150.00     10.00    150.00     .00           3
  9    80.00     18.00    100.00    1.00          15
 10    80.00     12.00     90.00    1.00          12
 11    90.00      6.00     80.00    1.00          13
 12   100.00      5.00     75.00    1.00          14
 13   100.00     12.00    100.00    1.00          12
 14   110.00     10.00    125.00     .00          10
 15   125.00     14.00    130.00     .00          10
 16   110.00     15.00     80.00    1.00          12
 17   150.00     16.00     90.00     .00          11
 18   100.00     12.00     95.00    1.00          12
 19   150.00     12.00    100.00     .00          10
 20   150.00     10.00     90.00     .00           8
 21   150.00     13.00     95.00     .00           9
 22   125.00     15.00    100.00    1.00          10
 23   125.00     16.00     95.00    1.00          11
 24   100.00     17.00    100.00     .00          12
 25    75.00     10.00    100.00    1.00          13
 26   100.00     12.00    110.00    1.00          10
 27   110.00      6.00    125.00     .00           9
 28   125.00     10.00     90.00     .00           8
 29   150.00      8.00     80.00     .00           8
 30   150.00     10.00     95.00     .00           8
Third step: Data Analysis
To estimate the demand for pizza, we used the regression procedure in SPSS.
The result
Regression

Variables Entered/Removed(b)
Model  Variables Entered                                    Variables Removed  Method
1      Location_X4, Tuition_X2, Pri_Cross_X3, Price_X1(a)   .                  Enter
a All requested variables entered.
b Dependent Variable: Quantity_Y

Model Summary
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .846(a)  .717      .671               1.64048
a Predictors: (Constant), Location_X4, Tuition_X2, Pri_Cross_X3, Price_X1

ANOVA(b)
Model 1       Sum of Squares  df  Mean Square  F       Sig.
Regression    170.087          4  42.522       15.801  .000(a)
Residual       67.279         25   2.691
Total         237.367         29
a Predictors: (Constant), Location_X4, Tuition_X2, Pri_Cross_X3, Price_X1
b Dependent Variable: Quantity_Y

Coefficients(a)
                Unstandardized Coefficients  Standardized Coefficients
Model 1         B        Std. Error          Beta                       t       Sig.
(Constant)      26.667   3.278                                          8.135   .000
Price_X1         -.088    .018               -.733                      -4.858  .000
Tuition_X2        .138    .087                .174                       1.595  .123
Pri_Cross_X3     -.076    .019               -.438                      -3.948  .001
Location_X4      -.544    .885               -.097                       -.615  .544
a Dependent Variable: Quantity_Y
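The figures in the Model Summary and ANOVA tables are linked by simple formulas, so they can be cross-checked. This sketch recomputes R², adjusted R², the mean squares, F, and the standard error of the estimate from the sums of squares reported above (n = 30 observations, k = 4 predictors); variable names are illustrative.

```python
import math

# Figures copied from the Model Summary and ANOVA tables above.
n, k = 30, 4                 # observations; predictors (Price, Tuition, Pri_Cross, Location)
ss_regression = 170.087
ss_residual = 67.279
ss_total = 237.367           # reported; 170.087 + 67.279 = 237.366 (rounding)

r2 = ss_regression / ss_total                    # share of variation explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # penalizes extra predictors
ms_regression = ss_regression / k                # df = k
ms_residual = ss_residual / (n - k - 1)          # df = n - k - 1 = 25
f_stat = ms_regression / ms_residual
see = math.sqrt(ms_residual)                     # std. error of the estimate

print(f"R = {math.sqrt(r2):.3f}, R2 = {r2:.3f}, adj. R2 = {adj_r2:.3f}")
print(f"MS reg = {ms_regression:.3f}, MS res = {ms_residual:.3f}, F = {f_stat:.3f}")
print(f"SEE = {see:.5f}")
```

Each recomputed value agrees with the SPSS output to the displayed precision (R² ≈ .717, adjusted R² ≈ .671, F ≈ 15.80, standard error ≈ 1.6405). The sum 170.087 + 67.279 = 237.366 is one unit in the last digit below the reported total, which is ordinary rounding.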
Fourth Step: Test the Validity of the Independent Variables
A good forecasting model may not satisfy all the statistical tests and uphold all the underlying assumptions. Researchers can visually examine the table of model statistics to determine whether the following criteria are met. The criteria vary with the number of independent variables as well as with the number of observations. The following rule of thumb is based on a model including three independent variables, thirty data points, and the 95% confidence level:
R²: The value of R² falls between 0 and 1. The higher the value, the larger the proportion of the variation in the dependent variable that is explained by the independent variables in the model. Users can be content when the value of R² is greater than 0.9.
F-test: The calculated F-value should be greater than 3. If not, the estimated model does not represent a good causal relationship between the dependent and independent variables. This is a test of the overall soundness of a model.
t-test: The calculated t-values for the regression coefficients should be greater than 2 in absolute terms. The t-value measures the significance of an individual regression coefficient.
Standard Error of Regression: The smaller the standard error of regression, the more accurate the forecasts will be.
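The rules of thumb can be applied to the pizza model's SPSS output as a quick screen. This is a sketch: the thresholds are the ones stated above for a three-variable model, while the pizza model has four predictors, so the results are indicative only. The function name is hypothetical.

```python
def rule_of_thumb_report(r2, f_stat, t_values):
    """Apply the text's rules of thumb: R² > 0.9, F > 3, |t| > 2."""
    report = {"R2 > 0.9": r2 > 0.9, "F > 3": f_stat > 3}
    for name, t in t_values.items():
        report[f"|t| > 2 for {name}"] = abs(t) > 2
    return report


# Statistics from the SPSS output for the pizza demand model.
pizza_report = rule_of_thumb_report(
    r2=0.717,
    f_stat=15.801,
    t_values={"Price_X1": -4.858, "Tuition_X2": 1.595,
              "Pri_Cross_X3": -3.948, "Location_X4": -0.615},
)
for check, passed in pizza_report.items():
    print(f"{check:<28} {'pass' if passed else 'fail'}")
```

For the pizza model, F and the t-values for price and for the soft-drink price pass; R² (.717) falls short of 0.9, and the t-values for Tuition_X2 (1.595) and Location_X4 (-0.615) are below 2 in absolute value.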
Fifth Step: Data Interpretation
Find the point price elasticity, the point income elasticity, and the point cross-price elasticity at P = 100, Y = 14, Pc = 110, and in an urban area (Loc = 1) if the demand function were estimated to be:
$$Q_D = 26.67 - 0.088P + 0.138Y - 0.076P_c - 0.544\,Loc$$
Is the demand for this product (pizza) elastic or inelastic? Is it a luxury or a necessity? Does this product have a close substitute or complement? Find the point elasticities of demand.
Let us assume the explanatory variables have the following values:
• Price of pizza (P) = 100 (i.e., RM100)
• Income (Y) = 14 (i.e., RM14,000)
• Price of soft drink (Pc) = 110 (i.e., RM1.10)
• Location (Loc) = urban area = 1
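Answer
• First find the quantity at these prices and income:
$$Q_D = 26.67 - 0.088(100) + 0.138(14) - 0.076(110) - 0.544(1) = 10.898$$
• Price elasticity:
$$E_P = \frac{\Delta Q}{\Delta P}\cdot\frac{P}{Q} = (-0.088)\left(\frac{100}{10.898}\right) \approx -0.81$$
which is inelastic, since |E_P| < 1.
• Income elasticity:
$$E_Y = \frac{\Delta Q}{\Delta Y}\cdot\frac{Y}{Q} = (0.138)\left(\frac{14}{10.898}\right) \approx +0.177$$
which indicates a normal good (E_Y > 0), but a necessity (E_Y < 1).
• Cross-price elasticity:
$$E_{AB} = \frac{\Delta Q_A}{\Delta P_B}\cdot\frac{P_B}{Q_A} = (-0.076)\left(\frac{110}{10.898}\right) \approx -0.767$$
which is negative, so pizza and soft drinks are complements.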
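The same calculation as a short script, so the point of evaluation can be changed easily. It uses the SPSS coefficients (B) with the constant rounded to 26.67, which reproduce Q = 10.898; for a linear demand function the slope ΔQ/ΔX is simply the coefficient, so each point elasticity is coefficient × X / Q. Names are illustrative.

```python
# Coefficients of the estimated demand function (SPSS B values, constant rounded).
COEF = {"const": 26.67, "P": -0.088, "Y": 0.138, "Pc": -0.076, "Loc": -0.544}


def quantity(P, Y, Pc, Loc):
    """Quantity demanded from the linear demand function."""
    return (COEF["const"] + COEF["P"] * P + COEF["Y"] * Y
            + COEF["Pc"] * Pc + COEF["Loc"] * Loc)


def point_elasticity(slope, x, q):
    """(dQ/dX) * (X / Q); for a linear function dQ/dX is the coefficient."""
    return slope * x / q


P, Y, Pc, Loc = 100, 14, 110, 1          # evaluation point from the text
Q = quantity(P, Y, Pc, Loc)
e_price = point_elasticity(COEF["P"], P, Q)     # |E_P| < 1 here: inelastic
e_income = point_elasticity(COEF["Y"], Y, Q)    # 0 < E_Y < 1 here: necessity
e_cross = point_elasticity(COEF["Pc"], Pc, Q)   # E_AB < 0 here: complements
print(f"Q = {Q:.3f}, E_P = {e_price:.2f}, E_Y = {e_income:.3f}, E_AB = {e_cross:.3f}")
```

At the evaluation point this gives Q ≈ 10.898, E_P ≈ -0.81, E_Y ≈ 0.177 and E_AB ≈ -0.767.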
Sixth Step: Conclusions
Combined Effect of Demand Elasticities
Example:
The firm can use these elasticities to forecast the demand for its product (coffee) next year.
• Firm XYZ has a price elasticity of -2 for coffee.
• Firm XYZ has an income elasticity of 1.5.
• The cross-price elasticity is +0.50.
• Most managers find that prices and income change every year. The combined effect of several changes is additive:
$$\%\Delta Q = E_P(\%\Delta P) + E_Y(\%\Delta Y) + E_X(\%\Delta P_R)$$
» where P is price, Y is income, and PR is the price of a related good.
• If you know the price, income, and cross-price elasticities, you can forecast the percentage change in quantity.
• Q: What will happen to the quantity sold if you raise price 3%, income rises 2%, and the price of a substitute good rises 1%?
A:
$$\%\Delta Q = (-2)(3\%) + (1.5)(2\%) + (0.50)(1\%) = -6\% + 3\% + 0.5\% = -2.5\%$$
» We expect sales to decline.
Will total revenue for your product rise or fall?
Total revenue will rise slightly (about +0.5%), as the price went up 3% and the quantity of coffee sold will fall 2.5%.
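A sketch of the combined-effect calculation. The additive formula is a linear approximation that works well for small percentage changes; the last line computes the exact change in total revenue, (1 + %ΔP)(1 + %ΔQ) − 1, for comparison. Names are illustrative.

```python
def combined_pct_change(elasticities, pct_changes):
    """Approximate %ΔQ as the sum of elasticity × % change for each driver."""
    return sum(e * pct_changes[name] for name, e in elasticities.items())


elasticities = {"price": -2.0, "income": 1.5, "cross": 0.5}
pct_changes = {"price": 3.0, "income": 2.0, "cross": 1.0}   # percent

pct_dq = combined_pct_change(elasticities, pct_changes)
approx_pct_dtr = pct_changes["price"] + pct_dq    # %ΔTR ≈ %ΔP + %ΔQ
exact_pct_dtr = ((1 + pct_changes["price"] / 100) * (1 + pct_dq / 100) - 1) * 100
print(f"%dQ = {pct_dq:.1f}%, %dTR approx = {approx_pct_dtr:.1f}%, exact = {exact_pct_dtr:.3f}%")
```

The approximation gives +0.5% for total revenue; the exact figure is 1.03 × 0.975 − 1 ≈ +0.425%, so the conclusion that revenue rises slightly holds either way.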
Assessment of Model Performance
In business forecasting, a response variable is often driven by many other variables. A good forecasting model does not have to include all of the relevant variables. Once a model attains its optimal performance, including additional variables simply complicates the task of forecasting without adding anything to its accuracy. If two models yield the same forecast accuracy, the one that contains fewer variables should be chosen.
PROBLEMS
QUESTION 1
1. Husin Sdn. Berhad is the maker of a high-quality Tongkat Ali. A linear regression model used to estimate the demand function for Husin Tongkat Ali yielded the following results:
$$Q_D = 10{,}425 - \underset{(2.88)}{2{,}910\,P_X} + \underset{(7)}{0.028\,A} + \underset{(3.13)}{11{,}100\,POP}$$
R² = 0.81    S.E.E. = 3.4
where QD = quantity of Mash Tongkat Ali demanded
PX = price of Mash Tongkat Ali
A = Husin Sdn. Bhd. advertising in dollars
POP = percentage of the Malaysian population over 21 years of age
(i) Determine the point price elasticity for prices of RM5 and RM10, when A = RM1,000,000 and POP = 0.05. (10 marks)
(ii) Determine the point advertising elasticity at an advertising level of RM2,000,000, if price remains at RM5 and POP = 0.05. (5 marks)
(iii) The t-statistic value for each coefficient is given in parentheses. If you know that the demand function was estimated using 25 observations, can you reject at the 95-percent confidence level the hypothesis that there is no relationship between each of the independent variables and QD? (5 marks)
(iv) What steps are usually involved in the estimation of a demand equation by regression analysis? (5 marks)
QUESTION 2
Vivian Maju Segar Sdn. Bhd. (VMS) has hired you as a consultant to analyze the demand for its line of telecommunications devices in 35 different market areas. The available data set includes observations on the number of thousands of units sold by VMS per month (QX), the price per unit charged by VMS (PX), the average unit price of competing brands (PZ), monthly advertising expenditures by VMS (A), and average gross sales (in $1,000) of businesses in the market area (I). The result of a regression analysis (with t-ratios in parentheses) is given below.
$$Q_X = \underset{(3.0)}{300} - \underset{(3.33)}{6\,P_X} + \underset{(2.5)}{2\,P_Z} + \underset{(1.33)}{0.04\,A} + \underset{(2.5)}{0.01\,I}$$
R² = 0.91    S.E.E. = 3.6
(a) Evaluate the statistical significance of the equation as a whole and of each of its coefficients. (5 marks)
(b) The average values of the independent variables in the data set used to estimate the equation are PX = $195, PZ = $225, A = $11,000, and I = $200,000. Calculate a point estimate of VMS's average sales and a 95% interval estimate of sales based on these values. (10 marks)
(c) What steps are usually involved in the estimation of a demand equation by regression analysis? (10 marks)