Exam Feb 28: sets 1, 2
• Set 1 due Thurs
• Memo C-1 due Feb 14
• Free tutoring will be available next week

Plan A: MW 4-6 PM OR Plan B: TT 2-4 PM
• VOTE for Plan A or Plan B
• Announce results Thurs

Kinderman Supplement
• Ch 2: Multiple Regression
• Ch 3: Analysis of Variance

MULTIPLE REGRESSION
Kinderman, Ch 2

Example
• Reference: Statistics for Managers
• By Levine, David M; Berenson; Stephan
• Second edition (1999)
• Prentice Hall

Y = dependent variable = heating oil sales (gal)
• X1 = Temperature (degrees)
• X2 = Insulation (inches)
• X1 and X2 are independent variables
• Y = bo + b1X1 + b2X2
• Enter data to Excel
• NOTE: If you can’t find Data Analysis, try Add-Ins

Y = 562 - 5X1 - 20X2
• Bottom table: Coefficient column

Interpret coefficients
• Intercept = bo = 562: If temp = 0 and insulation = 0, heating oil sales = 562 gallons
• b1 = -5: For all homes with same insulation, each 1 degree increase in temperature should decrease heating oil sales by 5 gallons
• b2 = -20: For all homes with same temp, each additional 1 inch of insulation should decrease sales by 20 gallons

Categorical Variables
• X = 0 or 1
• Example: 0 if male, 1 if female
• Example: 1 if graduate, 0 if drop out
• Example: 1 if citizen, 0 if alien
• NOTE: not in this fuel oil example

Estimate sales if temp = 30, insulation = 6
• Y = 562 - 5(30) - 20(6) = 292 gal

Standard Error = 26
• Top table
• Interpret: Typical fuel oil sales were about 26 gal away from average fuel oil sales of other homes with same temp and insulation

COEFFICIENT OF MULTIPLE DETERMINATION
• Top table, R square
• Interpret: 96% of total variation in fuel oil sales can be explained by variation in temperature and insulation

Is there a relationship between the independent variables and the dependent variable?
• Ho: Null hypothesis: All coefficients = 0
• Ho: NO relationship
• H1: Alternative hypothesis: At least one coefficient is not zero
• H1: There is a relationship

Computer output: Sample data
• Hypotheses: Population parameters
• Ho: Parameters = 0, but sample data makes it appear that there is a relationship
• Simple regression: Ho: zero slope vs H1: slope positive or slope negative

Exponents
• $10^{-1} = 0.1$
• $10^{-2} = 0.01$

Decision Rule
• Reject Ho if “Significance F” < alpha
• Middle table
• Fuel oil example: Significance F = 1.6E-09
• Excel: E = Exponent
• 1.6E-09 = $1.6 \times 10^{-9}$ = 0.0000000016
• Approaches zero as limit

Significance F = p-value
• Excel uses the label "P-value" only for the t distribution
• Significance F = probability F is greater than sample F

Assume alpha = .05
• Since Significance F = 1.6E-09 is effectively 0, and 0 < .05, reject Ho
• We conclude there IS a relationship between fuel oil sales and the independent variables

Which independent variables seem to be important factors?
• Ho: Temperature not important factor
• H1: Temperature is important
• Reject Ho if p-value < alpha
• Bottom table: p-value column, X1 row
• P-value = 1.6E-09, effectively zero
• Reject Ho
• Temp is important

Insulation
• Ho: insulation unimportant
• H1: insulation important
• P-value = 1.9E-06, effectively zero
• Reject Ho
• Insulation important

Analysis of Variance (ANOVA)
Kinderman, Ch 3

X = number of auto accidents

Live in City   Live in Suburb   Live in rural
1              2                1
3              0                0
2              1                0

Hypothesis Testing
• Ho: µ1 = µ2 = µ3
• H1: Not all means are equal
• H1: There are differences among 3 populations
• H1: Average number of accidents different depending on where you live

This course: manual calculations
• If you used computer software, you could have as many populations as needed
• Homework, exam: 3 populations
• Computer: 4 or more populations
• Ex: Ethnic classifications at CSUN

Sample Sizes
• Column 1: n1 = number of drivers sampled from policyholders living in city = 3
• Column 2: n2 = sampled from suburban drivers = 3
• Col 3: n3 = sampled from rural = 3
• Number of rows of data
• Kinderman example: Different sample sizes

n = n1 + n2 + n3
n = 3 + 3 + 3 = 9

X = number of auto accidents

Live in City   Live in Suburb   Live in rural
1 = X11        2                1
3 = X21        0                0
2 = X31        1                0

$\bar{X}_1 = \frac{X_{11} + X_{21} + X_{31}}{n_1}$
$\bar{X}_1 = \frac{1 + 3 + 2}{3}$
Do not assume n1 = 3 on exam
$\bar{X}_1 = 2$

X = number of auto accidents

Live in City      Live in Suburb    Live in rural
1 = X11           2                 1
3 = X21           0                 0
2 = X31           1                 0
Σ = 6             Σ = 3             Σ = 1
Sample mean = 2   Sample mean = 1   Sample mean = .3

$\bar{X}_2 = 1$
$\bar{X}_3 = .3$

Hypotheses
• Ho: Differences in sample means due to chance, but no differences if ALL drivers were included (Prop 103)
• H1: Population means are different because city drivers have more accidents

$\bar{X}_{..} = \frac{\sum X_{ij}}{n}$
$\bar{X}_{..} = \frac{6 + 3 + 1}{9}$
$\bar{X}_{..} = 1.1$
Grand mean = 1.1

SSB = Sum of Squares Between
• Between 3 groups
• Explained Variation
• Here: Variation in number of accidents explained by where you live (city, suburb, rural)
• If where you live did not affect accidents, we would expect SSB = 0
• Next slide: SSB formula

$SSB = n_1(\bar{X}_1 - \bar{X}_{..})^2 + n_2(\bar{X}_2 - \bar{X}_{..})^2 + n_3(\bar{X}_3 - \bar{X}_{..})^2$

This example
• $SSB = 3(2 - 1.1)^2 + 3(1 - 1.1)^2 + 3(.3 - 1.1)^2 = 4.2$ (the 4.2 comes from the unrounded means .33 and 1.11; the rounded values shown would give about 4.4)

MSB = Mean Square Between
• MSB = SSB/2
• Note: OK for this course, but bigger problems would have bigger denominator
• MSB = 4.2/2 = 2.1

SSE = Sum of Squared Error
• Variation within group
• Ex: Variation within group of city drivers
• Unexplained variation
• If every city driver had same number of accidents, we would expect SSE = 0
• Formula on next slide

$SSE = \sum_{j=1}^{3} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$
$SSE = \sum_{i=1}^{n_1} (X_{i1} - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_{i2} - \bar{X}_2)^2 + \sum_{i=1}^{n_3} (X_{i3} - \bar{X}_3)^2$

This example
• $SSE = (1-2)^2 + (3-2)^2 + (2-2)^2 + (2-1)^2 + (0-1)^2 + (1-1)^2 + (1-.3)^2 + (0-.3)^2 + (0-.3)^2 = 4.67$

MSE = Mean Square Error
• Mean Square Within
• Next slide is formula for this course.
• Bigger problems have bigger denominator

$MSE = \frac{SSE}{n - 3}$
$MSE = \frac{4.67}{9 - 3}$
MSE = 0.78

F RATIO
• Sample F statistic
• Test statistic
• SAM F

$\text{sam } F = \frac{MSB}{MSE}$
$\text{sam } F = \frac{2.1}{.78}$
Sam F = 2.7

• Extreme case #1: Where you live does not affect number of accidents, so SSB = 0, so MSB = 0, so sam F = 0
• Extreme case #2: Every city driver has same number of accidents, etc, so SSE = 0, so MSE = 0, so sam F is very large

Critical F = cr F
• F table at end of Kinderman Supplement
• Appendix A, Table A.3, p 60 in Second Edition (assumes alpha = .05)
• Column = 2 (denominator of MSB)
• Row = n - 3 (denominator of MSE)
• Correct for this course, different for bigger problems

Example
• Col 2
• Row = 9 - 3 = 6
• Cr F = 5.14

Hypothesis Testing
• Ho: µ1 = µ2 = µ3
• H1: Not all means are equal
• H1: There are differences among 3 populations
• H1: Average number of accidents different depending on where you live

Decision Rule
• Reject Ho if sam F > cr F
• Only right tail: SSB and SSE cannot be negative, so sam F cannot be negative
• If you reject Ho, you conclude that where you live affects number of accidents
• If you do not reject Ho, you conclude that there is too much variation within city drivers, etc to draw any conclusions

Example
• Since 2.7 is NOT > 5.14, we can NOT reject Ho
• Differences between city and suburb, etc are NOT significant

Computer Approach
• Similar to multiple regression
• Reject Ho if Significance F < alpha
• Needed if more than 3 groups
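
Computer Approach: a sketch in Python
The manual steps above (group means, grand mean, SSB, MSB, SSE, MSE, sam F) can be checked with a few lines of code. This is only a minimal sketch, not the course's software: the function names `one_way_anova` and `significance_f_three_groups` are made up for illustration. The denominators are written as (groups - 1) and (n - groups), which reduce to 2 and n - 3 for the 3-group problems in this course. The Significance F helper uses a closed form that holds only when there are exactly 3 groups; for more groups a statistics package would be needed.

```python
# Minimal sketch of the manual one-way ANOVA steps from the slides.
# All names here are illustrative assumptions, not part of the course material.

def one_way_anova(groups):
    """Return SSB, MSB, SSE, MSE and sample F for a list of samples."""
    k = len(groups)                       # number of groups (3 in this course)
    n = sum(len(g) for g in groups)       # total sample size n = n1 + n2 + n3
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n

    # Explained variation: between groups
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Unexplained variation: within groups
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msb = ssb / (k - 1)                   # SSB/2 when there are 3 groups
    mse = sse / (n - k)                   # SSE/(n - 3) when there are 3 groups
    # Extreme case #2 from the slides: MSE = 0 makes sam F unboundedly large
    f = msb / mse if mse > 0 else float("inf")
    return {"ssb": ssb, "msb": msb, "sse": sse, "mse": mse, "f": f, "n": n}


def significance_f_three_groups(f, n):
    """P(F > sam F) for exactly 3 groups (numerator df = 2, denominator df = n - 3).

    Assumption: this closed form is valid only when the numerator df is 2.
    """
    d2 = n - 3
    return (1 + 2 * f / d2) ** (-d2 / 2)


city = [1, 3, 2]
suburb = [2, 0, 1]
rural = [1, 0, 0]

result = one_way_anova([city, suburb, rural])
sig_f = significance_f_three_groups(result["f"], result["n"])

CR_F = 5.14   # critical F from the slides (column 2, row 6, alpha = .05)
ALPHA = 0.05

print("SSB =", round(result["ssb"], 2))
print("SSE =", round(result["sse"], 2))
print("sam F =", round(result["f"], 2))
print("Significance F =", round(sig_f, 3))
print("Reject Ho (table)?", result["f"] > CR_F)
print("Reject Ho (Significance F)?", sig_f < ALPHA)
```

With the accident data, the unrounded results are SSB ≈ 4.22, SSE ≈ 4.67 and sam F ≈ 2.71, which agree with the slide values 4.2, 4.67 and 2.7. Both decision rules give the same answer: sam F is not greater than 5.14 and Significance F is not less than .05, so Ho is not rejected.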