Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 9 Student Lecture Notes 9-1 Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be able to: Formulate null and alternative hypotheses for applications involving a single population mean or proportion Formulate a decision rule for testing a hypothesis Know how to use the test statistic, critical value, and p-value approaches to test the null hypothesis Know what Type I and Type II errors are 2 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-2 Review Goals (continued) explain model building using multiple regression analysis apply multiple regression analysis to business decision-making situations analyze and interpret the computer output for a multiple regression model test the significance of the independent variables in a multiple regression model 3 Review Goals (continued) recognize potential problems in multiple regression analysis and take steps to correct the problems incorporate qualitative variables into the regression model by using dummy variables use variable transformations to model nonlinear relationships 4 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-3 What is a Hypothesis? A hypothesis is a claim (assumption) about a population parameter: population mean Example: The mean monthly cell phone bill of this city is µ = $42 population proportion Example: The proportion of adults in this city with cell phones is P = .68 5 The Null Hypothesis, H0 States the assumption (numerical) to be tested Example: The average number of TV sets in U.S. Homes is at least three ( H0 : μ 3 ) Is always about a population parameter, not about a sample statistic H0 : μ 3 H0 : x 3 6 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-4 The Null Hypothesis, H0 (continued) Begin with the assumption that the null hypothesis is true Similar to the notion of innocent until proven guilty Refers to the status quo Always contains “=” , “≤” or “” sign May or may not be rejected 7 The Alternative Hypothesis, HA Is the opposite of the null hypothesis e.g.: The average number of TV sets in U.S. homes is less than 3 ( HA: µ < 3 ) Challenges the status quo Never contains the “=” , “≤” or “” sign May or may not be accepted Is generally the hypothesis that is believed (or needs to be supported) by the researcher – a research hypothesis 8 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-5 Formulating Hypotheses Example 1: Ford motor company has worked to reduce road noise inside the cab of the redesigned F150 pickup truck. It would like to report in its advertising that the truck is quieter. The average of the prior design was 68 decibels at 60 mph. What is the appropriate hypothesis test? 9 Formulating Hypotheses Example 1: Ford motor company has worked to reduce road noise inside the cab of the redesigned F150 pickup truck. It would like to report in its advertising that the truck is quieter. The average of the prior design was 68 decibels at 60 mph. What is the appropriate test? H0: µ ≥ 68 (the truck is not quieter) status quo HA: µ < 68 (the truck is quieter) wants to support If the null hypothesis is rejected, Ford has sufficient evidence to support that the truck is now quieter. 10 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-6 Formulating Hypotheses Example 2: The average annual income of buyers of Ford F150 pickup trucks is claimed to be $65,000 per year. An industry analyst would like to test this claim. What is the appropriate hypothesis test? 11 Formulating Hypotheses Example 1: The average annual income of buyers of Ford F150 pickup trucks is claimed to be $65,000 per year. An industry analyst would like to test this claim. What is the appropriate test? H0: µ = 65,000 (income is as claimed) status quo HA: µ ≠ 65,000 (income is different than claimed) The analyst will believe the claim unless sufficient evidence is found to discredit it. 12 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-7 Hypothesis Testing Process Claim: the population mean age is 50. Null Hypothesis: H0: µ = 50 Population Now select a random sample: Sample Is x = 20 likely if µ = 50? Suppose the sample mean age is 20: x = 20 If not likely, REJECT Null Hypothesis 13 Reason for Rejecting H0 Sampling Distribution of x x 20 If it is unlikely that we would get a sample mean of this value ... μ = 50 If H0 is true ... if in fact this were the population mean… ... then we reject the null hypothesis that μ = 50. 14 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-8 Errors in Making Decisions Type I Error Reject a true null hypothesis Considered a serious type of error The probability of Type I Error is Called level of significance of the test Set by researcher in advance 15 Errors in Making Decisions (continued) Type II Error Fail to reject a false null hypothesis The probability of Type II Error is β β is a calculated value, the formula is discussed later in the chapter 16 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-9 Outcomes and Probabilities Possible Hypothesis Test Outcomes State of Nature Key: Outcome (Probability) Decision H0 True H0 False Do Not Reject H0 No error (1 - ) Type II Error (β) Reject H0 Type I Error () No Error (1-β) 17 Type I & II Error Relationship Type I and Type II errors cannot happen at the same time Type I error can only occur if H0 is true Type II error can only occur if H0 is false If Type I error probability ( ) , then Type II error probability ( β ) 18 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-10 Factors Affecting Type II Error All else equal, β when the difference between hypothesized parameter and its true value β when β when σ β when n The formula used to compute the value of β is discussed later in the chapter 19 Level of Significance, Defines unlikely values of sample statistic if null hypothesis is true Defines rejection region of the sampling distribution Is designated by , (level of significance) Typical values are .01, .05, or .10 Is selected by the researcher at the beginning Provides the critical value(s) of the test 20 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-11 Hypothesis Tests for the Mean Hypothesis Tests for σ Known σ Unknown Assume first that the population standard deviation σ is known 21 Process of Hypothesis Testing 1. Specify population parameter of interest 2. Formulate the null and alternative hypotheses 3. Specify the desired significance level, α 4. Define the rejection region 5. Take a random sample and determine whether or not the sample result is in the rejection region 6. Reach a decision and draw a conclusion 22 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-12 Level of Significance and the Rejection Region Level of significance = Lower tail test Upper tail test Example: H0: μ ≥ 3 HA: μ < 3 Two tailed test Example: Example: H0: μ ≤ 3 HA: μ > 3 H0: μ = 3 HA: μ ≠ 3 -zα 0 Reject H0 zα 0 Do not reject H0 /2 Do not reject H0 /2 -zα/2 0 Reject H0 Reject H0 Do not reject H0 zα/2 Reject H0 23 Critical Value for Lower Tail Test The cutoff value, -zα or xα , H0: μ ≥ 3 HA: μ < 3 is called a critical value Reject H0 x μ z σ n -zα xα Do not reject H0 0 µ=3 24 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-13 Critical Value for Upper Tail Test The cutoff value, zα or xα , H0: μ ≤ 3 is called a critical value HA: μ > 3 Do not reject H0 0 zα µ=3 xα x μ z Reject H0 σ n 25 Critical Values for Two Tailed Tests H0: μ = 3 HA: μ 3 There are two cutoff values (critical values): ± zα/2 or xα/2 xα/2 /2 /2 Lower Reject H0 Upper Do not reject H0 -zα/2 xα/2 0 µ=3 Lower x /2 μ z /2 Business Statistics: A Decision-Making Approach, 7e Reject H0 zα/2 xα/2 σ n Upper 26 © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-14 The Rejection Region Lower tail test Example: H0: μ ≥ 3 HA: μ < 3 Upper tail test Two tailed test Example: Example: H0: μ ≤ 3 HA: μ > 3 H0: μ = 3 HA: μ ≠ 3 -zα 0 xα 0 Do not reject H0 Reject H0 Reject H0 if z < -zα i.e., if x < xα Do not reject H0 zα xα Reject H0 Reject H0 if z > zα i.e., if x > xα /2 /2 -zα/2 0 zα/2 x α/2(L) x α/2(U) Reject H0 Do not reject H0 Reject H0 Reject H0 if z < -zα/2 or z > zα/2 i.e., if x < xα/2(L) or x > xα/2(U) 27 Two Equivalent Approaches to Hypothesis Testing z-units: For given , find the critical z value(s): -zα , zα ,or ±zα/2 Convert the sample mean x to a z test statistic: Reject H0 if z is in the rejection region, otherwise do not reject H0 z x μ σ n x units: Given , calculate the critical value(s) xα , or xα/2(L) and xα/2(U) The sample mean is the test statistic. Reject H0 if x is in the rejection region, otherwise do not reject H0 28 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-15 Hypothesis Testing Example Test the claim that the true mean # of TV sets in US homes is at least 3. (Assume σ = 0.8) 1. Specify the population value of interest The mean number of TVs in US homes 2. Formulate the appropriate null and alternative hypotheses H0: μ 3 HA: μ < 3 (This is a lower tail test) 3. Specify the desired level of significance Suppose that = .05 is chosen for this test 29 Hypothesis Testing Example (continued) 4. Determine the rejection region = .05 Reject H0 -zα= -1.645 Do not reject H0 0 This is a one-tailed test with = .05. Since σ is known, the cutoff value is a z value: Reject H0 if z < z = -1.645 ; otherwise do not reject H0 30 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-16 Hypothesis Testing Example 5. Obtain sample evidence and compute the test statistic Suppose a sample is taken with the following results: n = 100, x = 2.84 ( = 0.8 is assumed known) z Then the test statistic is: xμ 2.84 3 .16 2.0 σ 0.8 .08 n 100 31 Hypothesis Testing Example (continued) 6. Reach a decision and interpret the result = .05 z Reject H0 -1.645 Do not reject H0 0 -2.0 Since z = -2.0 < -1.645, we reject the null hypothesis that the mean number of TVs in US homes is at least 3. There is sufficient evidence that the mean is less than 3. 32 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-17 Hypothesis Testing Example (continued) An alternate way of constructing rejection region: Now expressed in x, not z units = .05 x Reject H0 2.8684 2.84 Since x = 2.84 < 2.8684, we reject the null hypothesis Do not reject H0 3 x α μ zα σ 0.8 3 1.645 2.8684 n 100 33 p-Value Approach to Testing Convert Sample Statistic ( x ) to Test Statistic (a z value, if σ is known) Determine the p-value from a table or computer Compare the p-value with If p-value < , reject H0 If p-value , do not reject H0 34 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-18 p-Value Approach to Testing (continued) p-value: Probability of obtaining a test statistic more extreme ( ≤ or ) than the observed sample value given H0 is true Also called observed level of significance Smallest value of for which H0 can be rejected 35 p-value example Example: How likely is it to see a sample mean of 2.84 (or something further below the mean) if the true mean is = 3.0? P( x 2.84 | μ 3.0) 2.84 3.0 P z 0.8 100 P(z 2.0) .0228 = .05 p-value =.0228 2.8684 3 x 2.84 -1.645 -2.0 0 z 36 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-19 p-value example (continued) Compare the p-value with If p-value < , reject H0 If p-value , do not reject H0 = .05 Here: p-value = .0228 = .05 Since .0228 < .05, we reject the null hypothesis p-value =.0228 2.8684 3 2.84 37 Example: Upper Tail z Test for Mean ( Known) A phone industry manager thinks that customer monthly cell phone bill have increased, and now average over $52 per month. The company wishes to test this claim. (Assume = 10 is known) Form hypothesis test: H0: μ ≤ 52 the average is not over $52 per month HA: μ > 52 the average is greater than $52 per month (i.e., sufficient evidence exists to support the manager’s claim) 38 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-20 Example: Find Rejection Region (continued) Suppose that = .10 is chosen for this test Find the rejection region: Reject H0 = .10 Do not reject H0 0 zα=1.28 Reject H0 Reject H0 if z > 1.28 39 Review: Finding Critical Value - One Tail What is z given = 0.10? .90 .10 = .10 .50 .40 z Standard Normal Distribution Table (Portion) 0 1.28 Z .07 .08 .09 1.1 .3790 .3810 .3830 1.2 .3980 .3997 .4015 1.3 .4147 .4162 .4177 Critical Value = 1.28 40 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-21 Example: Test Statistic (continued) Obtain sample evidence and compute the test statistic Suppose a sample is taken with the following results: n = 64, x = 53.1 (=10 was assumed known) Then the test statistic is: z xμ 53.1 52 0.88 σ 10 n 64 41 Example: Decision (continued) Reach a decision and interpret the result: Reject H0 = .10 Do not reject H0 0 1.28 Reject H0 z = .88 Do not reject H0 since z = 0.88 ≤ 1.28 i.e.: there is not sufficient evidence that the mean bill is over $52 42 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-22 p -Value Solution (continued) Calculate the p-value and compare to p-value = .1894 Reject H0 = .10 0 Do not reject H0 1.28 Reject H0 z = .88 P(x 53.1 | μ 52.0) 53.1 52.0 P z 10 64 P(z 0.88) .5 .3106 .1894 Do not reject H0 since p-value = .1894 > = .10 43 Critical Value Approach to Testing When σ is known, convert sample statistic ( x ) to a z test statistic Hypothesis Tests for Known Unknown The test statistic is: z x μ σ n 44 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-23 Critical Value Approach to Testing When σ is unknown, convert sample statistic ( x ) to a t test statistic Hypothesis Tests for Known Unknown The test statistic is: t n1 (The population must be approximately normal) x μ s n 45 Hypothesis Tests for μ, σ Unknown 1. Specify the population value of interest 2. Formulate the appropriate null and alternative hypotheses 3. Specify the desired level of significance 4. Determine the rejection region (critical values are from the t-distribution with n-1 d.f.) 5. Obtain sample evidence and compute the t test statistic 6. Reach a decision and interpret the result 46 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-24 Example: Two-Tail Test ( Unknown) The average cost of a hotel room in New York is said to be $168 per night. A random sample of 25 hotels resulted in x = $172.50 and s = $15.40. Test at the = 0.05 level. H0: μ = 168 HA: μ 168 (Assume the population distribution is normal) 47 Example Solution: Two-Tail Test H0: μ = 168 HA: μ 168 = 0.05 n = 25 Critical Values: t24 = ± 2.0639 is unknown, so use a t statistic /2=.025 Reject H0 -tα/2 -2.0639 t n 1 /2=.025 Do not reject H0 0 Reject H0 tα/2 1.46 2.0639 x μ 172.50 168 1.46 s 15.40 n 25 Do not reject H0: not sufficient evidence that true mean cost is different than $168 48 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-25 Type II Error Type II error is the probability of failing to reject a false H0 Suppose we fail to reject H0: μ 52 when in fact the true mean is μ = 50 50 52 Reject H0: μ 52 Do not reject H0 : μ 52 49 Type II Error (continued) Suppose we do not reject H0: 52 when in fact the true mean is = 50 This is the range of x where H0 is not rejected This is the true distribution of x if = 50 50 52 Reject H0: 52 Do not reject H0 : 52 50 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-26 Type II Error (continued) Suppose we do not reject H0: μ 52 when in fact the true mean is μ = 50 Here, β = P( x cutoff ) if μ = 50 β 50 52 Reject H0: μ 52 Do not reject H0 : μ 52 51 Calculating β Suppose n = 64 , σ = 6 , and = .05 σ 6 52 1.645 50.766 n 64 cutoff x μ z (for H0 : μ 52) So β = P( x 50.766 ) if μ = 50 50 50.766 Reject H0: μ 52 52 Do not reject H0 : μ 52 52 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-27 Calculating β (continued) Suppose n = 64 , σ = 6 , and = .05 50.766 50 P( x 50.766 | μ 50) P z P(z 1.02) .5 .3461 .1539 6 64 Probability of type II error: β = .1539 50 52 Reject H0: μ 52 Do not reject H0 : μ 52 53 Hypothesis Tests in Minitab 54 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-28 Hypothesis Tests in Minitab 55 Sample Minitab Output 56 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-29 Hypothesis Tests Summary Addressed hypothesis testing methodology Performed z Test for the mean (σ known) Discussed p–value approach to hypothesis testing Performed one-tail and two-tail tests . . . 57 Hypothesis Tests Summary (continued) Performed t test for the mean (σ unknown) Performed z test for the proportion Discussed Type II error and computed its probability 58 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-30 Multiple Regression Assumptions Errors (residuals) from the regression model: < e = (y – y) The model errors are independent and random The errors are normally distributed The mean of the errors is zero Errors have a constant variance 59 Model Specification Decide what you want to do and select the dependent variable Determine the potential independent variables for your model Gather sample data (observations) for all variables 60 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-31 The Correlation Matrix Correlation between the dependent variable and selected independent variables can be found using Excel: Formula Tab: Data Analysis / Correlation Can check for statistical significance of correlation with a t test 61 Example A distributor of frozen desert pies wants to evaluate factors thought to influence demand Dependent variable: Pie sales (units per week) Independent variables: Price (in $) Advertising ($100’s) Data are collected for 15 weeks 62 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-32 Pie Sales Model Week Pie Sales Price ($) Advertising ($100s) 1 350 5.50 3.3 2 460 7.50 3.3 3 350 8.00 3.0 4 430 8.00 4.5 5 350 6.80 3.0 6 380 7.50 4.0 7 430 4.50 3.0 8 470 6.40 3.7 9 450 7.00 3.5 10 490 5.00 4.0 11 340 7.20 3.5 12 300 7.90 3.2 13 440 5.90 4.0 14 450 5.00 3.5 15 300 7.00 2.7 Multiple regression model: Sales = b0 + b1 (Price) + b2 (Advertising) Correlation matrix: Pie Sales Pie Sales Price Advertising Price Advertising 1 -0.44327 1 0.55632 0.03044 1 63 Interpretation of Estimated Coefficients Slope (bi) Estimates that the average value of y changes by bi units for each 1 unit increase in Xi holding all other variables constant Example: if b1 = -20, then sales (y) is expected to decrease by an estimated 20 pies per week for each $1 increase in selling price (x1), net of the effects of changes due to advertising (x2) y-intercept (b0) The estimated average value of y when all xi = 0 (assuming all xi = 0 is within the range of observed values) 64 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-33 Pie Sales Correlation Matrix Pie Sales Pie Sales Price Advertising Advertising -0.44327 1 0.55632 0.03044 1 Price vs. Sales : r = -0.44327 Price 1 There is a negative association between price and sales Advertising vs. Sales : r = 0.55632 There is a positive association between advertising and sales 65 Scatter Diagrams Sales vs. Price Sales 600 500 400 300 200 Sales vs. Advertising Sales 100 600 0 0 2 4 6 8 Price 10 500 400 300 200 100 0 0 1 2 3 Advertising 4 5 66 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-34 Estimating a Multiple Linear Regression Equation Computer software is generally used to generate the coefficients and measures of goodness of fit for multiple regression Excel: Data / Data Analysis / Regression Minitab: Stat / Regression / Regression… 67 Estimating a Multiple Linear Regression Equation Excel: 68 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-35 Estimating a Multiple Linear Regression Equation Minitab: 69 Multiple Regression Output Regression Statistics Multiple R 0.72213 R Square 0.52148 Adjusted R Square Standard Error 0.44172 47.46341 Observations ANOVA Regression Sales 306.526 - 24.975(Price) 74.131(Advertising) 15 df SS MS 2 29460.027 14730.013 Residual 12 27033.306 2252.776 Total 14 56493.333 P-value Significance F 0.01201 Coefficients Standard Error Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888 Advertising t Stat F 6.53861 Lower 95% Upper 95% 555.46404 70 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-36 The Multiple Regression Equation Sales 306.526 - 24.975(Price) 74.131(Advertising) where Sales is in number of pies per week Price is in $ Advertising is in $100’s. b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price 71 Using The Model to Make Predictions Predict sales for a week in which the selling price is $5.50 and advertising is $350: Sales 306.526 - 24.975(Price) 74.131(Advertising) 306.526 - 24.975 (5.50) 74.131(3.5) 428.62 Predicted sales is 428.62 pies Note that Advertising is in $100’s, so $350 means that x2 = 3.5 72 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-37 Predictions in Minitab 73 Predictions in Minitab (continued) < Predicted y value < Confidence interval for the mean y value, given these x’s < Prediction interval for an individual y value, given these x’s 74 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-38 Multiple Coefficient of Determination (R2) Reports the proportion of total variation in y explained by all x variables taken together R2 SSR Sum of squaresregression SST Total sum of squares 75 Multiple Coefficient of Determination (continued) Regression Statistics Multiple R 0.72213 R Square 0.52148 Adjusted R Square Standard Error Regression SSR 29460.0 .52148 SST 56493.3 0.44172 52.1% of the variation in pie sales is explained by the variation in price and advertising 47.46341 Observations ANOVA R2 15 df SS MS 2 29460.027 14730.013 Residual 12 27033.306 2252.776 Total 14 56493.333 P-value Significance F 0.01201 Coefficients Standard Error Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888 Advertising t Stat F 6.53861 Lower 95% Upper 95% 555.46404 76 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-39 Adjusted R2 R2 never decreases when a new x variable is added to the model This can be a disadvantage when comparing models What is the net effect of adding a new variable? We lose a degree of freedom when a new x variable is added Did the new x variable add enough explanatory power to offset the loss of one degree of freedom? 77 Adjusted R2 (continued) Shows the proportion of variation in y explained by all x variables adjusted for the number of x variables used n 1 R2A 1 (1 R2 ) n k 1 (where n = sample size, k = number of independent variables) Penalize excessive use of unimportant independent variables Smaller than R2 Useful in comparing among models 78 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-40 Multiple Coefficient of Determination (continued) Regression Statistics Multiple R 0.72213 R Square 0.52148 Adjusted R Square 0.44172 Standard Error 47.46341 Observations ANOVA 15 df Regression R2A .44172 44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables SS MS 2 29460.027 14730.013 Residual 12 27033.306 2252.776 Total 14 56493.333 P-value Significance F 0.01201 Coefficients Standard Error Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888 Advertising t Stat F 6.53861 Lower 95% Upper 95% 555.46404 79 Is the Model Significant? F-Test for Overall Significance of the Model Shows if there is a linear relationship between all of the x variables considered together and y Use F test statistic Hypotheses: H0: β1 = β2 = … = βk = 0 (no linear relationship) HA: at least one βi ≠ 0 (at least one independent variable affects y) 80 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-41 F-Test for Overall Significance (continued) Test statistic: SSR MSR k F SSE MSE n k 1 where F has (numerator) D1 = k and (denominator) D2 = (n – k – 1) degrees of freedom 81 F-Test for Overall Significance (continued) Regression Statistics Multiple R 0.72213 R Square 0.52148 Adjusted R Square 0.44172 Standard Error 47.46341 Observations ANOVA Regression 15 df F MSR 14730.0 6.5386 MSE 2252.8 With 2 and 12 degrees of freedom SS MS 2 29460.027 14730.013 Residual 12 27033.306 2252.776 Total 14 56493.333 F 6.53861 P-value Significance F 0.01201 Coefficients Standard Error Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888 Advertising t Stat P-value for the F-Test Lower 95% Upper 95% 555.46404 82 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-42 F-Test for Overall Significance (continued) H0: β1 = β2 = 0 HA: β1 and β2 not both zero = .05 df1= 2 df2 = 12 Critical Value: F = 3.885 Do not reject H0 Reject H0 F MSR 6.5386 MSE Decision: Reject H0 at = 0.05 Conclusion: The regression model does explain a significant portion of the variation in pie sales = .05 0 Test Statistic: F F.05 = 3.885 (There is evidence that at least one independent variable affects y ) 83 Are Individual Variables Significant? Use t-tests of individual variable slopes Shows if there is a linear relationship between the variable xi and y Hypotheses: H0: βi = 0 (no linear relationship) HA: βi ≠ 0 (linear relationship does exist between xi and y) 84 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-43 Are Individual Variables Significant? (continued) H0: βi = 0 (no linear relationship) HA: βi ≠ 0 (linear relationship does exist between xi and y ) Test Statistic: bi 0 t sbi (df = n – k – 1) 85 Are Individual Variables Significant? (continued) Regression Statistics Multiple R 0.72213 R Square 0.52148 Adjusted R Square Standard Error 0.44172 47.46341 Observations ANOVA Regression 15 df t-value for Price is t = -2.306, with p-value .0398 t-value for Advertising is t = 2.855, with p-value .0145 SS MS 2 29460.027 14730.013 Residual 12 27033.306 2252.776 Total 14 56493.333 P-value Significance F 0.01201 Coefficients Standard Error Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888 Advertising t Stat F 6.53861 Lower 95% Upper 95% 555.46404 86 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-44 Inferences about the Slope: t Test Example From Excel output: H0: βi = 0 HA: βi 0 Coefficients Price Advertising d.f. = 15-2-1 = 12 t Stat P-value -24.97509 Standard Error 10.83213 -2.30565 0.03979 74.13096 25.96732 2.85478 0.01449 The test statistic for each variable falls in the rejection region (p-values < .05) = .05 t/2 = 2.1788 /2=.025 /2=.025 Decision: Reject H0 for each variable Conclusion: Reject H0 Do not reject H0 -tα/2 -2.1788 0 tα/2 Reject H0 2.1788 There is evidence that both Price and Advertising affect pie sales at = .05 87 Confidence Interval Estimate for the Slope Confidence interval for the population slope β1 (the effect of changes in price on pie sales): bi t / 2sbi where t has (n – k – 1) d.f. Coefficients Standard Error … Intercept 306.52619 114.25389 … 57.58835 Price -24.97509 10.83213 … -48.57626 -1.37392 74.13096 25.96732 … 17.55303 130.70888 Advertising Lower 95% Upper 95% 555.46404 Example: Weekly sales are estimated to be reduced by between 1.37 to 48.58 pies for each increase of $1 in the selling price 88 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-45 Standard Deviation of the Regression Model The estimate of the standard deviation of the regression model is: SSE MSE n k 1 s Is this value large or small? Must compare to the mean size of y for comparison 89 Standard Deviation of the Regression Model (continued) Regression Statistics Multiple R 0.72213 R Square 0.52148 Adjusted R Square Standard Error 0.44172 47.46341 Observations ANOVA Regression The standard deviation of the regression model is 47.46 15 df SS MS 2 29460.027 14730.013 Residual 12 27033.306 2252.776 Total 14 56493.333 P-value Significance F 0.01201 Coefficients Standard Error Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888 Advertising t Stat F 6.53861 Lower 95% Upper 95% 555.46404 90 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-46 Standard Deviation of the Regression Model (continued) The standard deviation of the regression model is 47.46 A rough prediction range for pie sales in a given week is 2(47.46) 94.2 Pie sales in the sample were in the 300 to 500 per week range, so this range is probably too large to be acceptable. The analyst may want to look for additional variables that can explain more of the variation in weekly sales 91 Multicollinearity Multicollinearity: High correlation exists between two independent variables This means the two variables contribute redundant information to the multiple regression model 92 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-47 Multicollinearity (continued) Including two highly correlated independent variables can adversely affect the regression results No new information provided Can lead to unstable coefficients (large standard error and low t-values) Coefficient signs may not match prior expectations 93 Some Indications of Severe Multicollinearity Incorrect signs on the coefficients Large change in the value of a previous coefficient when a new variable is added to the model A previously significant variable becomes insignificant when a new independent variable is added The estimate of the standard deviation of the model increases when a variable is added to the model 94 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-48 Detect Collinearity (Variance Inflationary Factor) VIFj is used to measure collinearity: VIF j 1 1 R 2j R2j is the coefficient of determination when the jth independent variable is regressed against the remaining k – 1 independent variables If VIFj ≥ 5, xj is highly correlated with the other explanatory variables 95 Detect Collinearity Regression Analysis Price and all other X Regression Statistics Multiple R 0.030437581 R Square 0.000926446 Adjusted R Square Standard Error Observations VIF -0.075925366 1.21527235 15 1.000927305 Output for the pie sales example: Since there are only two explanatory variables, only one VIF is reported VIF is < 5 There is no evidence of collinearity between Price and Advertising 96 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-49 Qualitative (Dummy) Variables Categorical explanatory variable (dummy variable) with two or more levels: yes or no, on or off, male or female coded as 0 or 1 Regression intercepts are different if the variable is significant Assumes equal slopes for other variables The number of dummy variables needed is (number of levels – 1) 97 Dummy-Variable Model Example (with 2 Levels) Let: ŷ b1x1 b2 x 2 y = b pie 0 sales x1 = price x2 = holiday (X2 = 1 if a holiday occurred during the week) (X2 = 0 if there was no holiday that week) 98 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-50 Dummy-Variable Model Example (with 2 Levels) (continued) ŷ b0 b1x1 b2 (1) (b0 b2 ) b1x1 ŷ b0 b1x1 b2 (0) b0 Different intercept y (sales) b1x1 Holiday No Holiday Same slope If H0: β2 = 0 is rejected, then “Holiday” has a significant effect on pie sales b0 + b2 b0 x1 (Price) 99 Interpreting the Dummy Variable Coefficient (with 2 Levels) Example: Sales 300 - 30(Price) 15(Holiday) Sales: number of pies sold per week Price: pie price in $ 1 If a holiday occurred during the week 0 If no holiday occurred b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price 100 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-51 Dummy-Variable Models (more than 2 Levels) The number of dummy variables is one less than the number of levels Example: y = house price ; x1 = square feet The style of the house is also thought to matter: Style = ranch, split level, condo Three levels, so two dummy variables are needed 101 Dummy-Variable Models (more than 2 Levels) Let the default category be “condo” 1 if ranch x2 0 if not (continued) 1 if split level x3 0 if not ŷ b0 b1x1 b2 x 2 b3 x3 b2 shows the impact on price if the house is a ranch style, compared to a condo b3 shows the impact on price if the house is a split level style, compared to a condo 102 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-52 Interpreting the Dummy Variable Coefficients (with 3 Levels) Suppose the estimated equation is ŷ 20.43 0.045x1 23.53x2 18.84x3 For a condo: x2 = x3 = 0 ŷ 20.43 0.045x1 For a ranch: x3 = 0 ŷ 20.43 0.045x1 23.53 For a split level: x2 = 0 ŷ 20.43 0.045x1 18.84 With the same square feet, a ranch will have an estimated average price of 23.53 thousand dollars more than a condo With the same square feet, a ranch will have an estimated average price of 18.84 thousand dollars more than a condo. 103 Interaction Effects Hypothesizes interaction between pairs of x variables Response to one x variable varies at different levels of another x variable Contains two-way cross product terms y β0 β1x1 β 2 x 2 β3 x 3 β 4 x1x 2 β5 x12 x 2 Basic Terms Interactive Terms 104 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-53 Effect of Interaction Given: y β0 β1x1 β2 x 2 β3 x1x 2 ε Without interaction term, effect of x1 on y is measured by β1 With interaction term, effect of x1 on y is measured by β1 + β3 x2 Effect changes as x2 increases 105 Interaction Example y = 1 + 2x1 + 3x2 + 4x1x2 y where x2 = 0 or 1 (dummy variable) x2 = 1 y = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1 12 8 x2 = 0 y = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1 x1 4 0 0 0.5 1 1.5 Effect (slope) of x1 on y does depend on x2 value 106 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-54 Interaction Regression Model Worksheet Case, i yi x1i x2i x1i x2i 1 2 3 4 : 1 4 1 3 : 1 8 3 5 : 3 5 2 6 : 3 40 6 30 : multiply x1 by x2 to get x1x2, then run regression with y, x1, x2 , x1x2 107 Evaluating Presence of Interaction Hypothesize interaction between pairs of independent variables y β0 β1x1 β2 x 2 β3 x1x 2 ε Hypotheses: H0: β3 = 0 (no interaction between x1 and x2) HA: β3 ≠ 0 (x1 interacts with x2) 108 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-55 Model Building Goal is to develop a model with the best set of independent variables Stepwise regression procedure Easier to interpret if unimportant variables are removed Lower probability of collinearity Provide evaluation of alternative models as variables are added Best-subset approach Try all combinations and select the best using the highest adjusted R2 and lowest sε 109 Stepwise Regression Idea: develop the least squares regression equation in steps, either through forward selection, backward elimination, or through standard stepwise regression The coefficient of partial determination is the measure of the marginal contribution of each independent variable, given that other independent variables are in the model 110 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chapter 9 Student Lecture Notes 9-56 Best Subsets Regression Idea: estimate all possible regression equations using all possible combinations of independent variables Choose the best fit by looking for the highest adjusted R2 and lowest standard error sε Stepwise regression and best subsets regression can be performed using Minitab, or other statistical software packages 111 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc.