Regression Line
I. Line of Least Squares
Consider the following data set and its scatterplot:

X: 1  1  3  3  4  4  5  5  7  7
Y: 1  2  2  3  3  4  3  4  4  4

[Scatterplot of the data with the SD-line drawn.]

The summary statistics are X̄ = 4, Ȳ = 3, σ_X = 2, σ_Y = 1.
(1) Find the equation of the SD-line.
Y − 3 = (1/2)(X − 4)  ⇒  Y = 3 + 0.5X − 2  ⇒  Y = 0.5X + 1
(2) Find the points on the SD-line for X = 1, 3, 4, 5, 7.
X = 1: Y = 0.5(1) + 1 = 1.5  →  (1, 1.5)
X = 3: Y = 0.5(3) + 1 = 2.5  →  (3, 2.5)
X = 4: Y = 0.5(4) + 1 = 3  →  (4, 3)
X = 5: Y = 0.5(5) + 1 = 3.5  →  (5, 3.5)
X = 7: Y = 0.5(7) + 1 = 4.5  →  (7, 4.5)
(3) For each value of X, find the deviations of the Y values of the points in the data
set from the corresponding Y value on the SD-line; square these deviations and
find the sum.
X  1 : (1  1.5) 2  (.5) 2  .25
(2  1.5) 2  (.5) 2  .25
X  3 : (2  2.5) 2  (.5) 2  .25
(3  2.5) 2  (.5) 2  .25
1
X  4 : (3  3) 2  0 2  0
(4  3) 2  12  1
X  5 : (3  3.5) 2  (.5) 2  .25
(4  3.5) 2  (.5) 2  .25
X  7 : (4  4.5) 2  (.5) 2  .25
(4  4.5) 2  (.5) 2  .25
Sum = 3
Go through the same calculations using the line Y = 0.425X + 1.3.
X = 1: Y = 0.425(1) + 1.3 = 1.725  →  (1 − 1.725)² = (−0.725)² = 0.525625;  (2 − 1.725)² = (0.275)² = 0.075625
X = 3: Y = 0.425(3) + 1.3 = 2.575  →  (2 − 2.575)² = (−0.575)² = 0.330625;  (3 − 2.575)² = (0.425)² = 0.180625
X = 4: Y = 0.425(4) + 1.3 = 3  →  (3 − 3)² = 0² = 0;  (4 − 3)² = 1² = 1
X = 5: Y = 0.425(5) + 1.3 = 3.425  →  (3 − 3.425)² = (−0.425)² = 0.180625;  (4 − 3.425)² = (0.575)² = 0.330625
X = 7: Y = 0.425(7) + 1.3 = 4.275  →  (4 − 4.275)² = (−0.275)² = 0.075625;  (4 − 4.275)² = (−0.275)² = 0.075625
Sum = 2.775
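The two sums can be recomputed with a short script; a minimal sketch in Python (names are illustrative):

```python
# Data set from the example.
xs = [1, 1, 3, 3, 4, 4, 5, 5, 7, 7]
ys = [1, 2, 2, 3, 3, 4, 3, 4, 4, 4]

def sum_squared_deviations(slope, intercept):
    """Sum of squared vertical deviations of the data from the line Y = slope*X + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

sd_line_sum = sum_squared_deviations(0.5, 1.0)      # SD-line: Y = 0.5X + 1
reg_line_sum = sum_squared_deviations(0.425, 1.3)   # regression line: Y = 0.425X + 1.3
```

The regression line's sum comes out smaller than the SD-line's, which is the whole point of "least squares."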
Notice that the sum for the second line is smaller than that for the SD-line. In fact, this second line gives a smaller sum than any other line, and in particular than any other line passing through the point of averages. This line of least squares is the regression line. Recall that the average of a list of numbers is the point of least squares for that list. Thus, the regression line is to a scatterplot as the average is to a list of numbers.
Note: The equation of the regression line above was calculated from the formula
Y − Ȳ = (r·σ_Y/σ_X)(X − X̄).
The regression line does pass through the point of averages. However, since the slope is r·σ_Y/σ_X, a change of 1 σ_X in X is associated with a change of only r·σ_Y in Y. The regression line is not as steep as the SD-line. This is called the regression effect.
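The slope and intercept of the regression line can be recovered from the data and this formula; a minimal sketch in Python, using population SDs as in the handout:

```python
from math import sqrt

xs = [1, 1, 3, 3, 4, 4, 5, 5, 7, 7]
ys = [1, 2, 2, 3, 3, 4, 3, 4, 4, 4]
n = len(xs)

x_bar = sum(xs) / n                                   # mean of X = 4
y_bar = sum(ys) / n                                   # mean of Y = 3
sd_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)    # population SD of X = 2
sd_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)    # population SD of Y = 1
r = sum((x - x_bar) * (y - y_bar)
        for x, y in zip(xs, ys)) / (n * sd_x * sd_y)  # correlation = 0.85

slope = r * sd_y / sd_x              # 0.85 * 1/2 = 0.425
intercept = y_bar - slope * x_bar    # line passes through the point of averages
```

This reproduces Y = 0.425X + 1.3 and shows where the 0.425 comes from: the SD-line slope 1/2 shrunk by the factor r = 0.85.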
II. “Smoothed Version” of the Graph of Averages
Consider the same data set used previously. To plot the graph of averages, first consider the data points with X-coordinate 1, i.e. (1, 1) and (1, 2). Average the Y-values: (1 + 2)/2 = 1.5. The point (1, 1.5) will be a point of the graph of averages. Similarly, for X = 3 you would get the point (3, 2.5); for X = 4, (4, 3.5); for X = 5, (5, 3.5); and for X = 7, (7, 4).
[Plot: the regression line Y = 0.425X + 1.3 drawn through the graph-of-averages points (1, 1.5), (3, 2.5), (4, 3.5), (5, 3.5), (7, 4).]
Notice how nicely the regression line “fits” the points on the graph of averages. The
regression line is a “smoothed version” of the graph of averages.
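The graph of averages can be computed directly; a minimal sketch in Python:

```python
from collections import defaultdict

xs = [1, 1, 3, 3, 4, 4, 5, 5, 7, 7]
ys = [1, 2, 2, 3, 3, 4, 3, 4, 4, 4]

# Group the Y-values by their X-value and average each group.
by_x = defaultdict(list)
for x, y in zip(xs, ys):
    by_x[x].append(y)

graph_of_averages = {x: sum(v) / len(v) for x, v in sorted(by_x.items())}

# The regression line Y = 0.425X + 1.3 evaluated at the same X-values,
# for comparison with the points of the graph of averages.
on_regression_line = {x: 0.425 * x + 1.3 for x in graph_of_averages}
```

Comparing the two dictionaries shows how closely the regression line tracks the averaged points.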
III. Line for Predicting One Variable from Another Variable
Because the regression line is the line of least squares, it is the line used for predicting
one variable from another variable. Suppose that we are given the following statistics
about the heights and weights of college males:
H  69"
W  170lb
 H  3"
 W  20lb
r  0.6
Suppose that we want to predict the height of a college male who weighs 190 lb. This male is 1 standard deviation above the mean in weight. We would not expect him to be as much as 1 standard deviation above the mean in height, as would be the case if there were perfect positive correlation and we used the SD-line. Instead, we should expect him to be 0.6 standard deviation, or (0.6)(3") = 1.8", above the mean. Thus, the predicted height would be 69" + 1.8" = 70.8". This calculation can be simplified by using the equation of the regression line, i.e.
H = 69 + (0.6)(3)/20 · (W − 170), or H = 0.09W + 53.7.
When W = 190, H = 0.09(190) + 53.7 = 70.8". Now, suppose that we wanted to predict the weight of a college male who is 72" tall. We might be tempted to use the previous formula, i.e. 72 = 0.09W + 53.7, which gives W = 203.3 lb. This cannot be correct, because we would expect the weight to be 0.6 standard deviation, or (0.6)(20 lb) = 12 lb, above the mean, i.e. W = 170 + 12 = 182 lb. The answer to this apparent dilemma is that there are two regression lines: one for predicting height from weight and a different one for predicting weight from height. This other regression line would be
W = 170 + (0.6)(20)/3 · (H − 69), or W = 4H − 106.
Thus, when H = 72, W = 4(72) − 106 = 182 lb. The first regression line, H = 0.09W + 53.7, is the line that minimizes the squared deviations in the heights; and the second regression line, W = 4H − 106, is the line that minimizes the squared deviations in the weights. These two cases can be summarized with the following formula:
Equation of the Regression Line for Predicting D from I:
D − D̄ = (r·σ_D/σ_I)(I − Ī)
IV. Predicting Percentiles
The regression method can be used to predict percentiles. Using the same population of college males with the heights and weights above, and assuming that the heights and weights are normally distributed, what is the predicted percentile for height for a college male who is at the 80%-tile in weight?
Solution 1:
(1) Find the z-score for this 80%-tile weight.
From the normal table, z = 0.85: about 60% of the area lies between z = −0.85 and z = 0.85, with 20% in each tail, so 80% of the area lies below z = 0.85. [Sketch: normal curve marked at z = 0 and z = 0.85.]
(2) Find the 80%-tile weight.
0.85 = (W − 170)/20  ⇒  W − 170 = 17  ⇒  W = 187 lb
(3) Find the predicted height for a college male who weighs 187 lb.
H  0.09W  53.7  0.09(187)  53.7  70.53"
(4) Find the z-score for a height of 70.53" .
z = (70.53 − 69)/3 = 0.51
(5) Find the percentile for a z-score of 0.51.
The area below z = −0.51 is about 30.5%, and the area between z = −0.51 and z = 0.51 is about 39%, so the predicted percentile is 30.5% + 39% = 69.5%-tile. [Sketch: normal curve marked at z = −0.51, z = 0, and z = 0.51.]
Solution 2:
(1) Find the z-score for this 80%-tile weight.
As in solution 1, the z for 80% is 0.85.
(2) Consider the following form of the regression line that predicts H from W:
H − H̄ = (r·σ_H/σ_W)(W − W̄)  ⇒  (H − H̄)/σ_H = r·(W − W̄)/σ_W  ⇒  z_H = r·z_W
Thus, the predicted z-score for the height is z_H = 0.6(0.85) = 0.51.
(3) Find the percentile for a z-score of 0.51.
As in solution 1, the predicted percentile is 69.5%-tile.
Therefore, in predicting percentiles, use the formula z_D = r·z_I.
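The percentile method can be sketched with Python's statistics.NormalDist; note that it uses exact normal quantiles (z ≈ 0.84 for the 80%-tile) rather than the table value 0.85, so the result differs slightly from 69.5%:

```python
from statistics import NormalDist

def predict_percentile(p_i, r):
    """Predict the percentile of D from the percentile of I via z_D = r * z_I."""
    z_i = NormalDist().inv_cdf(p_i)    # z-score of the given percentile
    z_d = r * z_i                      # regression method: z_D = r * z_I
    return NormalDist().cdf(z_d)       # convert back to a percentile

p = predict_percentile(0.80, 0.6)      # about 0.69, close to the 69.5%-tile above
```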
Practice Sheet – Regression Line
SAT scores: S̄ = 1120, σ_S = 160
GPAs: Ḡ = 2.5, σ_G = 0.5
r = 0.8
Assume that SAT scores and GPAs are normally distributed.
(1) Give the equation of the regression line for predicting GPA from SAT score.
(2) Use this equation to predict GPAs from the following SAT scores:
(i) 1120
(ii) 1340
(iii) 1000
(iv) 700
(3) Predict GPA percentiles from the following SAT percentiles:
(i) 50%-tile
(ii) 85%-tile
(iii) 99%-tile
(iv) 20%-tile
(4) Give the equation of the regression line for predicting SAT scores from GPAs.
(5) Use this equation to predict SAT scores from the following GPAs:
(i) 2.5
(ii) 3.2
(iii) 1.3
(iv) 4.0
(6) Predict SAT score percentiles from the following GPA percentiles:
(i) 50%-tile
(ii) 90%-tile
(iii) 30%-tile
(iv) 5%-tile
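The practice answers can be checked numerically; a minimal sketch in Python using the given statistics (the percentile helper uses exact normal quantiles, so those answers may differ slightly from the table-based key):

```python
from statistics import NormalDist

S_BAR, SD_S = 1120, 160   # SAT mean and SD
G_BAR, SD_G = 2.5, 0.5    # GPA mean and SD
R = 0.8                   # correlation

def gpa_from_sat(s):
    """Regression line for predicting GPA from SAT: G = 0.0025*S - 0.3."""
    return G_BAR + R * SD_G / SD_S * (s - S_BAR)

def sat_from_gpa(g):
    """Regression line for predicting SAT from GPA: S = 256*G + 480."""
    return S_BAR + R * SD_S / SD_G * (g - G_BAR)

def predicted_percentile(p, r=R):
    """Percentile prediction via z_D = r * z_I."""
    return NormalDist().cdf(r * NormalDist().inv_cdf(p))
```

For example, gpa_from_sat(1340) gives 3.05 and sat_from_gpa(3.2) gives about 1299, matching answers (2)(ii) and (5)(ii) after rounding.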
Solution Key for Regression Line
(1) G = 0.0025S − 0.3
(2) (i) 2.5  (ii) 3.05  (iii) 2.2  (iv) 1.45
(3) (i) 50%-tile  (ii) 80%-tile  (iii) 97%-tile  (iv) 25.5%-tile
(4) S = 256G + 480
(5) (i) 1120  (ii) 1300  (iii) 810  (iv) 1500
(6) (i) 50%-tile  (ii) 84.5%-tile  (iii) 34%-tile  (iv) 9.7%-tile