CHAPTER 14: MULTIPLE REGRESSION MODELS
(Skip section 14.8)
“There is no reason anyone would want a computer in their home.”
- Ken Olsen, president, chairman and founder of Digital Equipment Corp., 1977
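Standard error of estimation: s = sqrt(SSE / (n - k - 1))
Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1)
where n is the number of observations and k is the number of independent variables.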
Example 1: The IRS wanted to estimate the amount of unpaid taxes discovered by its auditing
division every month. The auditing division keeps track of the number of field-audit labor hours
as well as the number of hours its computers are used to detect unpaid taxes. The IRS believes
that information concerning rewards paid to informants is also relevant. Data for ten months were
analyzed and the following model resulted. (Unpaid taxes are measured in millions of dollars,
rewards in thousands of dollars, and hours in units of one hundred.)
Amount of unpaid taxes discovered = -46.95 + 0.66 field hours + 1.3 computer hours + 0.36 reward
R-squared = 99.63%
Adjusted R-squared = 99.45%
Standard error of estimation = 0.1415
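As a quick check, the adjusted R-squared formula can be applied to the figures reported in Example 1. The sketch below assumes n = 10 (ten months) and k = 3 (field hours, computer hours, reward); the function names are illustrative, and SSE is not given in the notes, so it is backed out from the reported standard error.

```python
import math

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def standard_error(sse, n, k):
    """Standard error of estimation, s = sqrt(SSE / (n - k - 1))."""
    return math.sqrt(sse / (n - k - 1))

# Example 1: ten months of data, three independent variables (assumed counts).
n, k = 10, 3
adj = adjusted_r_squared(0.9963, n, k)
print("Adjusted R-squared:", round(adj, 5))

# SSE is not reported in the notes; back it out from s = 0.1415.
sse = 0.1415 ** 2 * (n - k - 1)
print("Implied SSE:", round(sse, 4))
print("Recovered s:", round(standard_error(sse, n, k), 4))
```

The result is about 0.9945, in line with the reported 99.45% (up to rounding of the reported R-squared).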
Qualitative Independent Variables
Dummy variables are used to quantify categorical variables.
Example 2: Data on income and age generated the following regression equation:
income = -1225.99 + 1158.41 age
Adding the variable sex into the equation led to the model:
income = -10280.55 + 1158.41 age + 9054.56 sex
where sex is a dummy (indicator) variable coded as follows:
sex = 1 if the person is male
sex = 0 if the person is female
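To see what the dummy variable does, the fitted equation from Example 2 can be evaluated for both values of sex. This is a small sketch; the ages used are illustrative and are not taken from the data.

```python
def predicted_income(age, sex):
    """Income from Example 2's fitted model; sex is 1 for male, 0 for female."""
    return -10280.55 + 1158.41 * age + 9054.56 * sex

for age in (30, 50):  # illustrative ages, not from the notes
    female = predicted_income(age, 0)
    male = predicted_income(age, 1)
    # The gap equals the dummy coefficient at every age.
    print(age, round(female, 2), round(male, 2), round(male - female, 2))
```

The male and female lines have the same slope in age and differ only in intercept, by the coefficient of sex (9054.56). A model in which the slope also differs by sex would need an interaction term.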
Example 3: The Meddicorp Company is interested in analyzing the impact of a new bonus
program on its salesforce. Sales revenue is a function of advertising expenditures and varies
depending on the region of the country. The company has divided its salesforce into three regional
offices: South, West and Midwest. The variable region was coded as follows:
Region = 1 if South
Region = 2 if West
Region = 3 if Midwest
The following regression model resulted:
Sales = -84 +1.55 Advert +1.11 Bonus + 119 Region
Interpret and evaluate this model. How could you improve upon this?
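One way to think about the question is to compare the 1/2/3 coding with indicator (dummy) coding. The sketch below is only an illustration: the helper names are hypothetical, and the choice of Midwest as the baseline category is an assumption, not something stated in the example.

```python
REGIONS = ("South", "West", "Midwest")

def numeric_code(region):
    """The coding used in Example 3: South = 1, West = 2, Midwest = 3."""
    return REGIONS.index(region) + 1

def dummy_code(region, baseline="Midwest"):
    """Indicator coding: one 0/1 variable for each non-baseline region."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    return {r: int(r == region) for r in REGIONS if r != baseline}

# With one numeric Region variable, the coefficient 119 forces equal steps:
# West is 119 above South, and Midwest is another 119 above West.
for region in REGIONS:
    print(region, 119 * numeric_code(region), dummy_code(region))
```

With three categories, two indicator variables are enough; each gets its own coefficient, so the regional differences are no longer forced to be equally spaced or ordered.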
Example 4: Stocks can be listed on the New York Stock Exchange, on the American Stock Exchange
or on the NASDAQ (OTC). If this additional information were available to you in Example 1
(Chapter 13), how would you build a linear regression model that incorporated these categories?
Interaction
Example 5: The speed with which a particular insurance innovation is adopted is related to the
size of the insurance firm and the type of firm (stock or mutual). The speed is measured by the
number of months that elapse between the time that the first firm adopted the innovation and the
time that a given firm adopts it. Ten mutual and ten stock firms were studied and an interaction
effect between size and type of firm is included in the model.
Y = 33.83 - 0.101 X1 + 8.131 X2 - 0.04 X3
where Y = speed of adoption (months), X1 = size of the firm, X2 = 1 if the firm is a stock
company and 0 if it is a mutual company, and X3 = X1*X2 (the interaction term)
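The interaction term means the fitted equation describes two lines with different intercepts and different slopes in X1 (firm size), one for each type of firm. A minimal sketch, with illustrative function names, that recovers both lines from the fitted model:

```python
def adoption_speed(size, is_stock):
    """Predicted months to adoption from Example 5's fitted model."""
    x1, x2 = size, int(is_stock)
    x3 = x1 * x2  # interaction term
    return 33.83 - 0.101 * x1 + 8.131 * x2 - 0.04 * x3

def line_for(is_stock):
    """Intercept and slope in X1 implied for one type of firm."""
    intercept = adoption_speed(0, is_stock)
    slope = adoption_speed(1, is_stock) - intercept
    return intercept, slope

print("mutual:", [round(v, 3) for v in line_for(False)])
print("stock: ", [round(v, 3) for v in line_for(True)])
```

For mutual firms (X2 = 0) the line is Y = 33.83 - 0.101 X1; for stock firms (X2 = 1) it is Y = 41.961 - 0.141 X1, since the dummy coefficient shifts the intercept and the interaction coefficient shifts the slope.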
Quadratic models
Y = b0 + b1 X + b2 X^2
If b2 > 0, the graph is convex; if b2 < 0, the graph is concave; and if b2 = 0, the graph is linear.
E.g. The price of an item at an auction increases with the number of bidders at an increasing rate
rather than at a constant rate.
E.g. For low prices, demand for a gem decreases as the price increases. However, when the gem is
valued at a very high price, demand increases because of the status the owners believe they gain
by obtaining the gem.
E.g. Model the effect of fatigue on assembly time as measured by the relationship between
average assembly time and hours worked.
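The sign of b2 determines the shape, and for b2 not equal to 0 the curve changes direction where its slope b1 + 2 b2 X equals zero, that is, at X = -b1 / (2 b2). A small sketch of this; the coefficients are hypothetical and not taken from any of the examples:

```python
def describe_quadratic(b1, b2):
    """Shape and turning point of Y = b0 + b1*X + b2*X^2."""
    if b2 == 0:
        return "linear", None
    shape = "convex" if b2 > 0 else "concave"
    turning_point = -b1 / (2 * b2)  # where the slope b1 + 2*b2*X is zero
    return shape, turning_point

# Hypothetical coefficients, for illustration only.
print(describe_quadratic(b1=-4.0, b2=0.5))   # demand falls, then rises
print(describe_quadratic(b1=3.0, b2=-0.25))  # rises, then falls
```

If the turning point lies outside the range of the observed X values, the fitted curve is increasing (or decreasing) at a changing rate over the data, as in the auction example.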
Example 6: Build a more realistic model for Example 2 using quadratic and interaction terms.
Example 7: Build a model to estimate the price of a laptop computer using the following
features: RAM (kilobytes), weight (lb.), volume (cubic inches), hard disk capacity (megabytes).
Multicollinearity
Definition: A condition in the data where there are strong interrelationships between the
explanatory variables.
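Effect: Inflates the variances of the coefficients and hence can lead to non-significant coefficients.
Detection: Check the correlation matrix of the explanatory variables, or compute the variance
inflation factor VIF = 1 / (1 - R^2) for each explanatory variable, where R^2 comes from
regressing that variable on the other explanatory variables.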
Solution: Combine related variables (if possible). Otherwise, transform the independent
variables by centering them. Sometimes increasing the sample size (collecting more data) helps.
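As an illustration of the detection step, the sketch below computes a VIF for the simplest case of two explanatory variables, where the R^2 from regressing one on the other is just their squared correlation. The data are made up, and the helper names are illustrative; with more than two variables, each R^2 would come from a multiple regression instead.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_from_r2(r2):
    """Variance inflation factor, VIF = 1 / (1 - R^2)."""
    return 1 / (1 - r2)

# Made-up data for two explanatory variables.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
r = correlation(x1, x2)
print("correlation:", round(r, 3))
print("VIF:", round(vif_from_r2(r ** 2), 2))
```

A VIF of 1 means no linear relationship with the other explanatory variables; the larger the VIF, the more the variance of that coefficient is inflated.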
Note: Quadratic models and models with interaction terms tend to have multicollinearity problems,
because X^2 and X1*X2 are usually strongly correlated with the variables they are built from.
If the objective of the regression analysis is to predict or estimate, multicollinearity is not of great
concern. However, multicollinearity makes interpretation and testing of the individual
coefficients unreliable.
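The effect of centering on a quadratic term can be seen with a small made-up example: X and X^2 are strongly correlated, but after subtracting the mean from X the correlation between the centered variable and its square drops. The data and names below are illustrative only.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = list(range(1, 11))           # made-up values 1, 2, ..., 10
mean_x = sum(x) / len(x)
xc = [v - mean_x for v in x]     # centered version of x

raw = correlation(x, [v ** 2 for v in x])
centered = correlation(xc, [v ** 2 for v in xc])
print("corr(X, X^2) before centering:", round(raw, 3))
print("corr(X, X^2) after centering: ", round(centered, 3))
```

Here the correlation falls to essentially zero because the made-up values are symmetric about their mean; with real data, centering typically reduces the correlation without removing it entirely.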
Homework: # 14.7, 14.34, 14.40 and 14.53