Ch 8 Linear Regression
AP Statistics
Mrs Johnson

3 Ways to Write the Least Squares Regression Line (LSRL)
• From data
  • Using the calculator, input the data into lists and run a linear regression on them.
• From statistics
  • The LSRL runs through the centroid (x̄, ȳ).
  • Using the statistics r, sx, sy, and the means of x and y, we can write the equation of the LSRL from formulas GIVEN on the AP exam.
• From computer output
  • You will often be given computer output; the slope and y-intercept always appear in it.

Interpreting SLOPE in a problem:
• When asked to interpret slope, remember that slope is the change in y over the change in x: Δy / Δx.
• State the following: As the ________ (explanatory variable) increases by 1 _______ (insert unit), the __________ (response variable) is predicted to increase/decrease (use the word that matches the sign of the slope) by _______ (insert slope and units).
• Example: As the caloric content of a burger increases by 1 calorie, the fat content of the burger is PREDICTED to increase by _____ grams.

Interpreting y-intercepts:
• The y-intercept occurs when the explanatory variable is 0.
• Interpretation depends on the example; often there is no real application for the y-intercept.
• When the explanatory variable is 0, the response variable is predicted to be _____. (Substitute 0 into the equation and solve.)

Coefficient of Determination (R²)
• R² is the square of the correlation coefficient r.
• It gives the proportion (often stated as a percentage) of the data's variation accounted for by the model.
• R² = 0 would mean NONE of the variation in the data is accounted for by the model; the model is useless.
• R² = 1 would mean ALL of the variation in the data is accounted for by the model.
• Example:
• A given data set has a correlation coefficient, r, of 0.8.
  • R² = 0.64. Interpretation: 64% of the variation in the data is accounted for by our model.
• A given data set has a correlation coefficient, r, of 0.4.
  • R² = 0.16. Interpretation: 16% of the variation in the data is accounted for by our model.
• NOTE: When interpreting R², use this fill-in-the-blank sentence:
  • According to the linear model, _______ (insert R² value as a percentage) of the variability in the response variable is accounted for by the variation in the explanatory variable.

Predicting with LSRL
• Using the LSRL, we can predict y values given x values.
• CAUTION: only use the LSRL to predict behavior within the bounds of your data.
  • Do NOT extrapolate beyond the data.
  • Only interpolate within the given data set.
• Using the LSRL from the previous example, determine the fat content for a burger with 550 calories.

Example: Fat / Calorie Content
Fat (g)     19   31   34   35   39   39   43
Calories   410  580  590  570  640  680  660

Finding the LSRL from given data using the calculator
1. Insert the data into L1 (fat) and L2 (cal).
2. Go to STAT, then CALC, and choose #8: LinReg(a+bx).
3. Select the appropriate lists and STORE the regression line.
4. Write the regression line using WORDS as variables.
• Interpret the slope:
  • As the fat content of a burger increases by 1 gram, the caloric content is PREDICTED to increase by _____ calories.
• What is the y-intercept in the burger example?
  • A burger with 0 grams of fat is predicted to have _____ calories.
• Interpret r.
• Interpret r-squared.
• Predict the calories for a burger with 35 grams of fat.

Residual
• The difference between the actual value from a data point, y, and the predicted value, ŷ.
• RESID = y - ŷ (actual minus predicted)
• Residual plots
  • An important tool for determining whether a line is an appropriate fit for the data.
  • A line is a good fit according to the residual plot IF:
    • There is no apparent pattern: no direction or shape.
    • The points are scattered horizontally, with no major gaps or outliers.

Residual Plots
• No pattern: indicates the line is a good fit.
• U-shaped pattern: indicates a non-linear model would be a better fit.
• Upside-down U-shaped pattern: indicates a non-linear model would be a better fit.

Residual Plot of Example:
• Once you run a regression in your calculator, the residuals are created automatically and are ready for you to display.
• From STAT PLOT, keep the x list as L1, then go to the y list and find RESID in the list menu.
• Zoom 9 will show you the residual plot.
• Back to the burger data: what is the residual for your burger with 35 grams of fat?
• Does our line OVER or UNDER predict?
  • Negative residuals mean our line OVER predicts.
  • Positive residuals mean our line UNDER predicts.

Set 2: Writing the Line of Best Fit from statistics given about data
• The line of best fit will be written in the form:
    ŷ = b0 + b1x
  • ŷ (y-hat) = predicted value
  • b0 = y-intercept
  • b1 = slope
• Finding the slope of the best fit line:
    b1 = r · sy / sx
  • sy = standard deviation of the response variable
  • sx = standard deviation of the explanatory variable
  • r = correlation coefficient

Finding the y intercept
• Finding the y-intercept of the best fit line:
  • Start from the equation for the predicted value of y: ŷ = b0 + b1x
  • Given the mean values for x and y: (x̄, ȳ)
  • Given the value of b1 (the slope), calculated from the statistics r, sx, sy
  • Use the point (x̄, ȳ), which lies on the line, and solve for b0:
      ȳ = b0 + b1x̄
      b0 = ȳ - b1x̄

Example: #36 pg 193
• Given that a line is the form of best fit for a set of data comparing fat and calories for 11 brands of fast food chicken sandwiches, and given the summary statistics:

                 Fat (g)   Calories
  Mean           20.6      472
  Standard Dev   9.8       144.2
  Correlation    0.974

• Write the equation for the line of best fit.
• Interpret the slope in the context of the problem.
• Explain the meaning of the y-intercept.
• What does it mean if a sandwich has a negative residual?
• If a sandwich had 23 grams of fat, what is the predicted value for calories?

Method #3: Writing the LSRL from Computer Output
• Given the following data set, comparing height (ft) and weight (lb) for 10 people in a weight loss program:
• Describe and interpret the correlation.

Reading Computer Output for Line of Best Fit
Dependent Variable is: Weight
R-Squared = 0.91

  Variable   Coefficient   SE (Coeff)
  Constant   -289.5        2.606
  Height     86.1          .0013

predicted weight = -289.5 + 86.1(height)

Linear Regression Line
predicted weight = -289.5 + 86.1(height)

Residual Plot: Is the Line a Good Fit?

Interpreting Slope / y intercepts
predicted weight = -289.5 + 86.1(height)
• Slope interpretation: As the height of a participant in the weight loss program increases by 1 foot, the predicted weight of the participant increases by approximately 86 lbs.
• OR: As the height of the participant increases by 1 inch, the predicted weight of the participant increases by approximately 7 lbs.

Interpreting y-intercepts:
• The y-intercept occurs when the explanatory variable is 0.
• What is the y-intercept in this example?
  • -289.5
• There is no real-life interpretation, but the literal interpretation is: for a participant in the weight loss program who is 0 feet tall, the predicted weight would be -289.5 lbs.
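
The two formula-based methods can also be checked with a few lines of code. Below is a minimal Python sketch, not a replacement for the calculator steps. It applies the slope and intercept formulas from the summary-statistics method to the Example #36 numbers, then reads the coefficients from the height/weight computer output. The function names (lsrl_from_stats, predict, residual) and variable names are made up for illustration and are not from the slides.

```python
def lsrl_from_stats(r, s_x, s_y, mean_x, mean_y):
    """Return (b0, b1) for y-hat = b0 + b1*x, given summary statistics."""
    b1 = r * s_y / s_x           # slope: b1 = r * sy / sx
    b0 = mean_y - b1 * mean_x    # the line passes through (x-bar, y-bar)
    return b0, b1


def predict(b0, b1, x):
    """Predicted response y-hat at explanatory value x."""
    return b0 + b1 * x


def residual(actual_y, predicted_y):
    """RESID = y - y-hat; a negative value means the line OVER predicts."""
    return actual_y - predicted_y


# Summary-statistics method: Example #36 (x = fat in g, y = calories).
b0, b1 = lsrl_from_stats(r=0.974, s_x=9.8, s_y=144.2, mean_x=20.6, mean_y=472)
print(f"predicted calories = {b0:.1f} + {b1:.2f}(fat)")
print(f"prediction at 23 g of fat: {predict(b0, b1, 23):.1f} calories")

# Computer-output method: coefficients are read straight from the table.
weight_b0, weight_b1 = -289.5, 86.1      # "Constant" and "Height" rows
slope_per_inch = weight_b1 / 12          # 12 inches per foot
print(f"slope: {weight_b1} lb per foot = {slope_per_inch:.1f} lb per inch")
```

With the Example #36 statistics this gives a slope of about 14.33 calories per gram of fat and an intercept of about 176.8 calories, so a 23-gram sandwich is predicted to have roughly 506 calories. The per-inch slope of about 7.2 lb agrees with the "approximately 7 lbs" on the slide.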