Exam Feb 28: sets 1,2
• Set 1 due Thurs
• Memo C-1 due Feb 14
• Free tutoring will be available next week
Plan A: MW 4-6PM OR
Plan B: TT 2-4PM
VOTE for Plan A or Plan B
Announce results Thurs
Kinderman Supplement
• Ch 2: Multiple Regression
• Ch 3: Analysis of Variance
MULTIPLE REGRESSION
Kinderman, Ch 2
Example
• Reference: Statistics for Managers
• By Levine, David M; Berenson; Stephan
• Second edition (1999)
• Prentice Hall
Y = dependent variable = heating oil sales (gal)
• X1 = Temperature (degrees)
• X2 = Insulation (inches)
• X1 and X2 are independent variables
• Y = bo + b1X1 + b2X2
Enter data to Excel
NOTE: If you can’t find Data Analysis, try
Add-Ins
Y = 562 – 5X1 – 20X2
Bottom table:
Coefficient Column
Interpret coefficients
• Intercept = bo = 562: If temp = 0 and insulation = 0,
heating oil sales = 562 gallons
• b1 = -5: For all homes with same insulation, each
1 degree increase in temperature should decrease
heating oil sales by 5 gallons
• b2 = -20: For all months with same temp, each
additional 1 inch of insulation should decrease
sales by 20 gallons
Categorical Variables
• X = 0 or 1
• Example: 0 if male, 1 if female
• Example: 1 if graduate, 0 if drop out
• Example: 1 if citizen, 0 if alien
• NOTE: not in this fuel oil example
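The 0/1 coding can be sketched in a few lines of Python. The sample records and the helper name `to_dummy` are made up for illustration; only the coding rule (0 if male, 1 if female) comes from the slide.

```python
# Hypothetical sample records; only the 0/1 coding rule comes from the slide.
genders = ["male", "female", "female", "male"]


def to_dummy(value, one_label):
    """Return 1 if value equals one_label, otherwise 0."""
    return 1 if value == one_label else 0


# Coding from the first example: 0 if male, 1 if female
x_female = [to_dummy(g, "female") for g in genders]
print(x_female)  # [0, 1, 1, 0]
```

The resulting column of 0s and 1s can be entered into a regression like any other X variable.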
Estimate sales if temp = 30,
insulation = 6
• Y = 562 – 5(30) – 20(6) = 292 gal
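The estimate above can be checked with a short Python sketch. It uses the rounded coefficients from the slide (562, -5, -20); the function name `predict_sales` is an illustrative choice, not something from Excel or the textbook.

```python
# Rounded coefficients from the slide: Y = 562 - 5*X1 - 20*X2
B0, B1, B2 = 562, -5, -20


def predict_sales(temp_degrees, insulation_inches):
    """Estimated heating oil sales in gallons."""
    return B0 + B1 * temp_degrees + B2 * insulation_inches


print(predict_sales(30, 6))  # 292
```

Raising the temperature by 1 degree while holding insulation at 6 inches lowers the estimate by 5 gallons, which is the interpretation of b1 given earlier.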
Standard Error = 26
Top table
• Interpret: Typical fuel oil sales were about
26 gal away from average fuel oil sales of
other homes with same temp and insulation
COEFFICIENT OF
MULTIPLE
DETERMINATION
• Top table, R square
• Interpret: 96% of total variation in fuel oil
sales can be explained by variation in
temperature and insulation
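For reference, R square is usually defined from the sums of squares in the regression output. The symbols SSR (explained), SSE (unexplained) and SST (total) are standard textbook notation and are assumed here; the regression slides do not name them.

```latex
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad SST = SSR + SSE
```

An R square of 0.96 therefore means the explained sum of squares is 96% of the total.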
Is there a relationship between the
independent variables (taken together)
and the dependent variable?
• Ho: Null hypothesis: All coefficients = 0
Ho: NO Relationship
H1: Alternative hypothesis: At least one
coefficient is not zero
H1: There is a relationship
• Computer output: Sample data
• Hypotheses: Population parameters
• Ho: Parameters = 0, but sample data may make
it appear that there is a relationship
• Simple regression: Ho: zero slope vs
H1: slope positive or slope negative
Exponents
• $10^{-1} = 0.1$
• $10^{-2} = 0.01$
Decision Rule
• Reject Ho if “Significance F” < alpha
• Middle table
• Fuel oil example: Significance F = 1.6E-09
• Excel: E = Exponent
• 1.6E-09 = $1.6 \times 10^{-9}$ = 0.0000000016
• Practically zero
Significance F = p-value
• Excel labels this value "P-value" only for the t tests
in the bottom table
• Significance F = probability that F is greater
than the sample F when Ho is true
Assume alpha = .05
• Since 1.6E-09 (about 0) < .05, reject Ho
• We conclude there IS a relationship between
fuel oil sales and the independent variables
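The decision rule can be sketched in Python, which reads Excel's E notation directly. The helper name `reject_h0` is an illustrative assumption; the numbers are the ones quoted on the slides.

```python
ALPHA = 0.05

# Python parses E notation the same way Excel displays it:
# 1.6E-09 means 1.6 * 10**-9
significance_f = float("1.6E-09")


def reject_h0(p_value, alpha=ALPHA):
    """Decision rule from the slides: reject Ho if the p-value is below alpha."""
    return p_value < alpha


print(f"{significance_f:.10f}")   # 0.0000000016
print(reject_h0(significance_f))  # True
```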
Which independent variables
seem to be important factors?
• Ho: Temperature not important factor
• H1: Temperature is important
• Reject Ho if p-value < alpha
• Bottom table: p-value column, X1 row
• P-value = 1.6E-09, approximately zero
• Reject Ho
• Temp is important
Insulation
• Ho: insulation unimportant
• H1: insulation important
• P-value = 1.9E-06, approximately zero
• Reject Ho
• Insulation important
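The same rule applies to each coefficient in turn. A minimal sketch, using the two p-values quoted on the slides and assuming alpha = .05 as before (the dictionary names are illustrative):

```python
ALPHA = 0.05

# p-values from the bottom table of the Excel output, as quoted on the slides
p_values = {"temperature": 1.6e-09, "insulation": 1.9e-06}

# A variable is treated as important when its p-value is below alpha
important = {name: p < ALPHA for name, p in p_values.items()}

for name, flag in important.items():
    print(name, "important" if flag else "not shown to be important")
```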
Analysis of Variance (ANOVA)
Kinderman, Ch 3
X = number of auto accidents
Live in City   Live in Suburb   Live in rural
1              2                1
3              0                0
2              1                0
Hypothesis Testing
• Ho: µ1 = µ2 = µ3
• H1: Not all means are equal
• H1: There are differences among 3
populations
• H1: Average number of accidents different
depending on where you live
This course: manual calculations
• If you used computer software, you could
have as many populations as needed
• Homework, exam: 3 populations
• Computer: 4 or more populations
• Ex: Ethnic classifications at CSUN
Sample Sizes
• Column 1: n1 = number of drivers sampled
from policyholders living in city = 3
• Column 2: n2 = sampled from suburban
drivers = 3
• Col 3: n3 = sampled from rural = 3
• Number of rows of data
• Kinderman example: Different sample sizes
n = n1 + n2 + n3
n =3 + 3 + 3 = 9
X = number of auto accidents
Live in City   Live in Suburb   Live in rural
1 = X11        2                1
3 = X21        0                0
2 = X31        1                0
$$\bar{X}_1 = \frac{X_{11} + X_{21} + X_{31}}{n_1} = \frac{1 + 3 + 2}{3} = 2$$
Do not assume n1=3 on exam
X = number of auto accidents
              Live in City   Live in Suburb   Live in rural
              1 = X11        2                1
              3 = X21        0                0
              2 = X31        1                0
Σ             6              3                1
Sample mean   2              1                .3
$\bar{X}_2 = 1$
$\bar{X}_3 = .3$
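The three sample means can be reproduced with a short Python sketch. The data are the nine values from the table; the dictionary layout and variable names are illustrative choices.

```python
# Accident counts from the table, one list per column
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

# Divide each column total by its own sample size (do not hard-code 3)
group_means = {name: sum(xs) / len(xs) for name, xs in accidents.items()}

for name, m in group_means.items():
    print(name, round(m, 1))
```

Dividing by `len(xs)` rather than by 3 keeps the calculation correct when the sample sizes differ.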
Hypotheses
• Ho: Differences in sample means due to
chance, but no differences if ALL drivers
were included (Prop 103)
• H1: Population means are different because
city drivers have more accidents
$$\bar{X}_{..} = \frac{\sum_j \sum_i X_{ij}}{n} = \frac{6 + 3 + 1}{9} = 1.1$$
Grand mean = 1.1
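A quick check of the grand mean in Python, pooling all nine observations (variable names are illustrative):

```python
# Accident counts from the table, one list per column
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

# Pool every observation, then divide by the total sample size n
all_values = [x for xs in accidents.values() for x in xs]
n = len(all_values)
grand_mean = sum(all_values) / n

print(n, round(grand_mean, 1))  # 9 1.1
```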
SSB = Sum of Squares Between
• Between 3 groups
• Explained Variation
• Here: Variation in number of accidents
explained by where you live (city, suburb,
rural)
• If where you live did not affect accidents,
we would expect SSB = 0
• Next slide: SSB formula
$$SSB = n_1(\bar{X}_1 - \bar{X}_{..})^2 + n_2(\bar{X}_2 - \bar{X}_{..})^2 + n_3(\bar{X}_3 - \bar{X}_{..})^2$$
This example
• $SSB = 3(2-1.1)^2 + 3(1-1.1)^2 + 3(.3-1.1)^2 = 4.2$
(computed with the unrounded means 0.33 and 1.11)
MSB = Mean Square Between
• MSB = SSB/2
• Note: OK for this course, but bigger
problems would have bigger denominator
• MSB = 4.2/2 = 2.1
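The SSB and MSB arithmetic can be verified with a small Python sketch using unrounded means. The denominator is written as (number of groups - 1), which equals 2 here; that generalization is the standard one-way ANOVA rule and is an assumption beyond what the slide states.

```python
# Accident counts from the table, one list per column
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}


def mean(values):
    return sum(values) / len(values)


grand_mean = mean([x for xs in accidents.values() for x in xs])

# SSB: each group's size times its squared distance from the grand mean
ssb = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in accidents.values())

# MSB: SSB divided by (number of groups - 1), which is 2 for three groups
msb = ssb / (len(accidents) - 1)

print(round(ssb, 1), round(msb, 1))  # 4.2 2.1
```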
SSE= Sum of Squared Error
• Variation within group
• Ex: Variation within group of city drivers
• Unexplained variation
• If every city driver had same number of
accidents, we would expect SSE = 0
• Formula on next slide
$$SSE = \sum_{j=1}^{3} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$
$$= \sum_{i=1}^{n_1} (X_{i1} - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_{i2} - \bar{X}_2)^2 + \sum_{i=1}^{n_3} (X_{i3} - \bar{X}_3)^2$$
$$SSE = (1-2)^2 + (3-2)^2 + (2-2)^2 + (2-1)^2 + (0-1)^2 + (1-1)^2 + (1-.3)^2 + (0-.3)^2 + (0-.3)^2 = 4.67$$
MSE = Mean Square Error
Mean Square Within
Next slide is formula for this course.
Bigger problems have bigger denominator
$$MSE = \frac{SSE}{n - 3} = \frac{4.67}{9 - 3} = 0.78$$
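SSE and MSE can be checked the same way. In this sketch the denominator is written as (n - number of groups), which equals n - 3 for this course; the general form is the standard one-way ANOVA rule and is an assumption beyond the slide.

```python
# Accident counts from the table, one list per column
accidents = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}


def mean(values):
    return sum(values) / len(values)


# SSE: squared distance of every observation from its own group mean
sse = sum((x - mean(xs)) ** 2 for xs in accidents.values() for x in xs)

n = sum(len(xs) for xs in accidents.values())

# MSE: SSE divided by (n - number of groups), i.e. 9 - 3 = 6 here
mse = sse / (n - len(accidents))

print(round(sse, 2), round(mse, 2))  # 4.67 0.78
```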
F RATIO
• Sample F statistic
• Test statistic
• SAM F
$$\text{sam}F = \frac{MSB}{MSE} = \frac{2.1}{.78} = 2.7$$
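Putting the pieces together, the sample F can be computed from the raw data in one function. This is a sketch of the manual calculation, not a library routine; `sample_f` is a hypothetical name.

```python
def mean(values):
    return sum(values) / len(values)


def sample_f(groups):
    """Return MSB / MSE for a list of groups (one-way ANOVA by hand)."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = mean(all_values)
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    msb = ssb / (k - 1)  # k - 1 = 2 for three groups
    mse = sse / (n - k)  # n - k = 6 for this example
    return msb / mse


# City, suburb, rural columns from the table
sam_f = sample_f([[1, 3, 2], [2, 0, 1], [1, 0, 0]])
print(round(sam_f, 1))  # 2.7
```

If every observation in a group equals its group mean, `mse` is 0 and the division fails.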
• Extreme case#1: Where you live does not
affect number of accidents, so SSB =0, so
MSB = 0, so sam F = 0
• Extreme case #2: Every city driver has
same number of accidents, etc, so SSE = 0,
so MSE = 0, so sam F is very large
Critical F = cr F
• F table at end of Kinderman Supplement
• Appendix A, Table A.3, p 60 in Second
Edition (assumes alpha = .05)
• Column = 2 (denominator of MSB)
• Row = n – 3 (denominator of MSE)
• Correct for this course, different for bigger
problems
Example
• Col 2
• Row = 9-3 = 6
• Cr F = 5.14
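The table value can be cross-checked without a table, but only because the column (numerator degrees of freedom) is 2. In that special case the upper tail of the F distribution has the closed form P(F > x) = (1 + 2x/df2)^(-df2/2), which can be solved for x. This formula is not in the slides or the supplement and does not apply to other columns; the function name is illustrative.

```python
def critical_f_col2(df2, alpha=0.05):
    """Critical F for column 2 (numerator df = 2) only.

    Solves alpha = (1 + 2x/df2) ** (-df2/2) for x.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


n = 9
cr_f = critical_f_col2(n - 3)  # row = n - 3 = 6
print(round(cr_f, 2))  # 5.14
```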
Hypothesis Testing
• Ho: µ1 = µ2 = µ3
• H1: Not all means are equal
• H1: There are differences among 3
populations
• H1: Average number of accidents different
depending on where you live
Decision Rule
• Reject Ho if sam F > cr F
• Only right tail since SSB ≥ 0, SSE ≥ 0, so
sam F ≥ 0
• If you reject Ho, you conclude that where
you live affects number of accidents
• If you do not reject Ho, you conclude that
there is too much variation within city
drivers, etc to draw any conclusions
Example
• Since 2.7 is NOT > 5.14, we can NOT reject
Ho
• Differences between city and suburb, etc are
NOT significant
Computer Approach
• Similar to multiple regression
• Reject Ho if Significance F < alpha
• Needed if more than 3 groups
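For the three-group accident example, the Significance F that software would report can be approximated by hand, because the numerator degrees of freedom equal 2 and the F tail then has the closed form (1 + 2F/df2)^(-df2/2). This shortcut is an assumption that holds only for three groups; with more groups a statistics package is needed, as the slide says. The function name is illustrative.

```python
ALPHA = 0.05


def significance_f_three_groups(sam_f, df2):
    """Upper-tail probability of F(2, df2); valid only for three groups."""
    return (1 + 2 * sam_f / df2) ** (-df2 / 2)


# sam F = 2.7 and df2 = n - 3 = 6, both from the slides
sig_f = significance_f_three_groups(2.7, 6)

print(round(sig_f, 3))  # 0.146
print(sig_f < ALPHA)    # False, so Ho is not rejected
```

This agrees with the manual decision: sam F = 2.7 is not greater than cr F = 5.14.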