Chapter 5: Regression
BPS - 3rd Ed.

Objectives of Regression
– To describe the change in Y per unit X
– To predict the average level of Y at a given level of X

"Returning Birds" Example
Plot the data first to see whether the relationship can be described by a straight line (important!).
Illustrative data from Exercise 4.4:
– Y = number of new adult birds joining the colony
– X = percent of birds returning from the prior year

If the data can be described by a straight line, describe the relationship with an equation:
Y = (intercept) + (slope)(X)
This may also be written: Y = (slope)(X) + (intercept)
– Intercept: where the line crosses the Y axis
– Slope: the "angle" (steepness) of the line

Linear Regression
– Algebraic line: every point falls exactly on the line: y = intercept + (slope)(x)
– Statistical line: the scatter cloud suggests a linear trend: "predicted y" = intercept + (slope)(x)

Regression Equation
ŷ = a + bx, where
– ŷ ("y-hat") is the predicted value of Y
– a is the intercept
– b is the slope
– x is a value of X
Determine a and b for the "best fitting line". Note: the TI calculators reverse a and b!

What Line Fits Best?
If we try to draw the line by eye, different people will draw different lines. We need a method to draw the "best" line; this method is called "least squares".

The "Least Squares" Regression Line
Each point has a residual:
residual = observed y - predicted y = vertical distance of the point from the prediction line
The least squares line minimizes the sum of the squared residuals.

Calculating Least Squares Regression Coefficients
– Formulas (below)
– Technology: TI-30XIIS, Two Variable Applet, other

Formulas
b = slope coefficient, a = intercept coefficient
$b = r\,\frac{s_y}{s_x}$
$a = \bar{y} - b\,\bar{x}$
where $s_x$ and $s_y$ are the standard deviations of the two variables, $\bar{x}$ and $\bar{y}$ are their means, and $r$ is their correlation.

Technology: Calculator
BEWARE! TI calculators label the slope and intercept backwards!
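The least squares formulas above can be checked numerically. The following is a minimal Python sketch, assuming plain lists of numbers; the function names `least_squares` and `sum_squared_residuals` and the five data points are hypothetical, made up for illustration rather than taken from the textbook. It computes $b = r\,s_y/s_x$ and $a = \bar{y} - b\,\bar{x}$, then confirms that nudging either coefficient makes the sum of squared residuals larger.

```python
import math

def least_squares(xs, ys):
    """Intercept a and slope b of the least-squares line y-hat = a + b*x,
    using the textbook formulas b = r * sy / sx and a = ybar - b * xbar."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    # Correlation: sum of products of standardized values, divided by n - 1.
    r = sum(((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys)) / (n - 1)
    b = r * sy / sx
    a = ybar - b * xbar
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of (observed y - predicted y)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = 2.20 + 0.60x

best = sum_squared_residuals(xs, ys, a, b)
# Moving the intercept or the slope away from the least-squares values
# always makes the sum of squared residuals larger.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sum_squared_residuals(xs, ys, a + da, b + db) > best
```

The sketch divides by $s_x$, so it assumes the x values are not all identical (otherwise no unique line exists).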
Regression Line
For the "bird data": a = 31.9343, b = -0.3040
The linear regression equation is: ŷ = 31.9343 - 0.3040x
The slope (-0.3040) represents the average change in Y per unit X: each additional percentage point of returning birds is associated with about 0.304 fewer new birds, on average.

Use of Regression for Prediction
Suppose an individual colony has 60% returning (x = 60). What is the predicted number of new birds for this colony?
Answer: ŷ = a + bx = 31.9343 + (-0.3040)(60) = 13.69
Interpretation: the regression model predicts 13.69 new birds (ŷ) for a colony with x = 60.

Prediction via Regression Line
[Figure: number of new birds against percent returning, with the regression line]
When X = 60, the regression model predicts Y = 13.69.

Case Study
Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe

Regression Calculation: Case Study
Country          Per Capita GDP (x)   Life Expectancy (y)
Austria          21.4                 77.48
Belgium          23.2                 77.53
Finland          20.0                 77.32
France           22.7                 78.63
Germany          20.8                 77.17
Ireland          18.6                 76.39
Italy            21.5                 78.51
Netherlands      22.0                 78.15
Switzerland      23.8                 78.99
United Kingdom   21.2                 77.37

Life Expectancy and GDP (Europe)
[Figure: scatterplot of life expectancy (yrs) against per capita GDP]

Regression Calculation by Hand (Life Expectancy Study)
Calculations:
$\bar{x} = 21.52,\quad s_x = 1.532,\quad \bar{y} = 77.754,\quad s_y = 0.795,\quad r = 0.809$
$b = r\,\frac{s_y}{s_x} = (0.809)\,\frac{0.795}{1.532} = 0.420$
$a = \bar{y} - b\,\bar{x} = 77.754 - (0.420)(21.52) = 68.716$
ŷ = 68.716 + 0.420x

BPS/3e Two Variable Applet
[Figures: applet screens for data entry, calculations, scatterplot, and least squares line]
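As a cross-check of the hand calculation, this short Python sketch recomputes the summary statistics from the ten countries in the case-study table using the standard-library `statistics` module (sample standard deviations, divisor n - 1) and applies the same formulas. It also repeats the bird-colony prediction at x = 60 with the slide's coefficients. The variable names are illustrative only.

```python
from statistics import mean, stdev

# Case-study data from the table: x = per capita GDP, y = life expectancy (years).
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
xbar, ybar = mean(gdp), mean(life)
sx, sy = stdev(gdp), stdev(life)  # sample standard deviations (divisor n - 1)
# Correlation r from the sum of cross-products of deviations.
r = sum((x - xbar) * (y - ybar) for x, y in zip(gdp, life)) / ((n - 1) * sx * sy)

b = r * sy / sx       # slope
a = ybar - b * xbar   # intercept
print(f"xbar={xbar:.2f} sx={sx:.3f} ybar={ybar:.3f} sy={sy:.3f} r={r:.3f}")
print(f"y-hat = {a:.3f} + {b:.3f}x")
print(f"prediction at GDP 20.0: {a + b * 20.0:.2f}")

# Bird-colony prediction with the slide's coefficients (the slope is negative).
bird_a, bird_b = 31.9343, -0.3040
print(f"new birds at x = 60: {bird_a + bird_b * 60:.2f}")
```

Carrying b at full precision gives an intercept of about 68.715; the slide's 68.716 comes from rounding b to 0.420 before computing a. Both versions give a prediction of 77.12 at a GDP of 20.0.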
Interpretation: Life Expectancy Case Study
Model: ŷ = 68.716 + (0.420)x
Slope: for each one-unit increase in per capita GDP, predicted life expectancy increases by 0.420 years.
Prediction example: What is the life expectancy in a country with a GDP of 20.0?
ANSWER: ŷ = 68.716 + (0.420)(20.0) = 77.12

Coefficient of Determination (R²) (Fact 4 on p. 111)
The coefficient of determination, R², quantifies the fraction of the variation in Y that is "mathematically explained" by X. In simple linear regression, R² is the square of the correlation r.
Examples:
– r = 1: R² = 1: the regression line explains all (100%) of the variation in Y
– r = .7: R² = .49: the regression line explains almost half (49%) of the variation in Y

We are NOT going to cover the analysis of residual plots (pp. 113-116).

Outliers and Influential Points
An outlier is an observation that lies far from the regression line.
– Outliers in the y direction have large residuals.
– Outliers in the x direction are often influential: removing an influential point would markedly change the regression and correlation values.

Outliers: Case Study
Gesell Adaptive Score and Age at First Word
– From all the data: r² = 41%
– After removing child 18: r² = 11%

Cautions About Correlation and Regression
Correlation and regression:
– describe only linear relationships
– are influenced by outliers
– cannot be used to predict beyond the range of X (do not extrapolate)
Beware of lurking variables (variables other than X and Y). Association does not always equal causation!

Do not extrapolate (Sarah's height)
Sarah's height is plotted against her age.
[Figure: scatterplot of Sarah's height (cm) against age (months)]
– Can you predict her height at age 42 months?
– Can you predict her height at age 30 years (360 months)?
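A minimal Python sketch of R² computed as 1 - SSE/SST (one minus the fraction of the total variation in y that is left in the residuals), which for a least-squares line equals r². The function names `fit` and `r_squared` are hypothetical. The Gesell data are not reproduced in these slides, so the influence demonstration uses an invented data set with one point far out in the x direction.

```python
from statistics import mean

def fit(xs, ys):
    """Least-squares intercept a and slope b (b = Sxy / Sxx, equivalent to r * sy / sx)."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    return ybar - b * xbar, b

def r_squared(xs, ys):
    """Fraction of the variation in y explained by the least-squares line."""
    a, b = fit(xs, ys)
    ybar = mean(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained (residual) variation
    sst = sum((y - ybar) ** 2 for y in ys)                      # total variation in y
    return 1 - sse / sst

# Life expectancy case study: R^2 should match r^2 = 0.809^2, about 0.655.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]
print(f"R^2 = {r_squared(gdp, life):.3f}")

# Hypothetical data: the last point is an outlier in the x direction.
xs = [1, 2, 3, 4, 5, 20]
ys = [3, 1, 4, 2, 3, 15]
print(f"with the x-outlier:    R^2 = {r_squared(xs, ys):.3f}")
print(f"without the x-outlier: R^2 = {r_squared(xs[:-1], ys[:-1]):.3f}")
```

In this made-up example the single point at x = 20 carries R² above 0.9; without it R² falls to about 0.02. That is the same kind of shift seen when child 18 is removed in the Gesell case study.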
Do not extrapolate (Sarah's height), continued
Regression equation: ŷ = 71.95 + .383(X)
At age 42 months: ŷ = 71.95 + .383(42) = 88 cm (reasonable)
At age 360 months: ŷ = 71.95 + .383(360) = 209.8 cm (that's nearly 7 feet tall!)
[Figure: Sarah's height (cm) against age (months), with the regression line extended to 360 months]

Caution: Correlation does not always mean causation
Even very strong correlations may not correspond to a causal relationship between x and y. (Beware of the lurking variable!)
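One way to make the "do not extrapolate" rule concrete is to refuse predictions outside the range of X that was actually observed. This is a minimal Python sketch using the slide's equation ŷ = 71.95 + 0.383x; the function name `predict_height` and the range bounds `X_MIN` and `X_MAX` are hypothetical placeholders, because the slides do not list Sarah's exact ages.

```python
def predict_height(age_months, x_min, x_max):
    """Predicted height (cm) from the slide's line y-hat = 71.95 + 0.383x.

    x_min and x_max are the smallest and largest ages (months) in the data
    used to fit the line; ages outside that range are rejected rather than
    extrapolated.
    """
    if not (x_min <= age_months <= x_max):
        raise ValueError(
            f"age {age_months} months is outside the observed range "
            f"[{x_min}, {x_max}]; the fitted line should not be extrapolated"
        )
    return 71.95 + 0.383 * age_months

# Hypothetical bounds (placeholders, not the textbook's data range).
X_MIN, X_MAX = 36, 60

print(round(predict_height(42, X_MIN, X_MAX), 1))  # prints 88.0
try:
    predict_height(360, X_MIN, X_MAX)  # age 30 years: far outside the data
except ValueError as err:
    print("refused:", err)
```

The guard only protects against extrapolation in X; it cannot detect lurking variables or a relationship that stops being linear inside the observed range.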