Ch 8
Linear Regression
AP Statistics
Mrs Johnson
3 Ways to Write the Least Squares Regression Line
• From Data
• Using the calculator, you enter the data into lists and run a linear regression on the data
• From statistics
• The LSRL always passes through the centroid (x̄, ȳ)
• Using the statistics r, sx, sy, and the means of x and y, we can write the equation of the LSRL from formulas GIVEN on the AP exam.
• From computer output
• Many times you will be given computer output; the slope and y-intercept can be read directly from this output
Interpreting SLOPE in a problem:
• When asked to interpret slope, remember that slope is the change in y over the change in x:
$\text{slope} = \frac{\Delta y}{\Delta x}$
• State the following: As the ________ (explanatory variable)
increases by 1 _______ (insert unit) the __________
(response variable) is predicted to increase/decrease (use
appropriate word given sign of slope) by _______ (insert slope
here and units).
• As the caloric content of a burger increases by 1 calorie, the
fat content of the burger is PREDICTED to increase by _____
grams.
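A quick numeric check of why the template works: for any line ŷ = b0 + b1x, raising x by exactly 1 unit changes ŷ by exactly b1, wherever you start. The coefficients below (b0 = 2.0, b1 = 0.05) are hypothetical values chosen only for illustration, not the answer to the burger blank.

```python
def predict(b0: float, b1: float, x: float) -> float:
    """Predicted response y-hat = b0 + b1 * x for a fitted line."""
    return b0 + b1 * x

# Hypothetical coefficients, for illustration only.
b0, b1 = 2.0, 0.05

for x in (400, 550, 700):
    change = predict(b0, b1, x + 1) - predict(b0, b1, x)
    # The change per 1-unit increase in x equals the slope (up to float rounding).
    print(f"x = {x}: y-hat changes by {change:.4f} when x increases by 1")
```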
Interpreting y-intercepts:
• The y-intercept occurs where the explanatory variable is 0.
• Interpretation depends on the example; often there is no meaningful real-world interpretation of the y-intercept.
• When the explanatory variable is 0, the response variable is predicted to be _____. (Substitute 0 into the equation and solve.)
Coefficient of Determination – R²
• R² is the square of the correlation coefficient, r
• Gives the proportion (percentage) of the variation in the response variable that is accounted for by the model
• R² = 0 would mean NONE of the variation in the data is accounted for by the model, so the model is useless.
• R² = 1 would mean ALL of the variation in the data is accounted for by the model
Coefficient of Determination – R²
• Example:
• A given data set has a correlation coefficient, r, of 0.8.
• R² = 0.8² = 0.64. Interpretation: 64% of the variation in the data is accounted for by our model
• A given data set has a correlation coefficient, r, of 0.4.
• R² = 0.4² = 0.16. Interpretation: 16% of the variation in the data is accounted for by our model
Coefficient of Determination – R²
• NOTE: When interpreting R², use this fill-in-the-blank template:
• According to the linear model, _______ (insert R² value as a percentage) of the variability in the response variable is accounted for by the variation in the explanatory variable.
Predicting with LSRL
• Using the LSRL, we can predict y values for given x values
• CAUTION: only use the LSRL to predict behavior within the bounds of your data
• Do NOT extrapolate beyond the data
• Only interpolate within the given data set
• Using the LSRL from the previous example, determine the fat content for a burger with 550 calories.
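A minimal sketch of the "interpolate only" rule: a helper that refuses to predict outside the range of observed x values. The function name `predict_within_range` and the choice to raise an error (rather than just warn) are illustrative assumptions. The coefficients are hypothetical; the calorie values are the burger data from the example that follows.

```python
def predict_within_range(b0, b1, x, x_observed):
    """Return y-hat = b0 + b1*x, but only when x lies inside the observed x range."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x <= hi:
        raise ValueError(
            f"x = {x} is outside the data range [{lo}, {hi}]; "
            "predicting here would be extrapolation"
        )
    return b0 + b1 * x

# Explanatory values (calories) from the burger data; the line itself is hypothetical.
calories = [410, 580, 590, 570, 640, 680, 660]
print(predict_within_range(2.0, 0.05, 550, calories))   # interpolation: allowed
try:
    predict_within_range(2.0, 0.05, 1200, calories)     # extrapolation: rejected
except ValueError as err:
    print(err)
```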
Example – Fat / Calorie Content
Fat (g)   Calories
19        410
31        580
34        590
35        570
39        640
39        680
43        660
Finding the LSRL from given data – using the calculator
1. Enter the data into L1 (fat) and L2 (calories)
2. Go to STAT → CALC → 8: LinReg(a+bx)
3. Select the appropriate lists and STORE the regression line
4. Write the regression line using WORDS as variables
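For readers without a TI calculator, this sketch computes the least squares line that LinReg(a+bx) reports, directly from the sums-of-squares formulas. Fat (g) is x and calories is y, matching the list setup in step 1. It is a plain-Python illustration of the least squares formulas, not the calculator's internal algorithm.

```python
from statistics import mean

fat = [19, 31, 34, 35, 39, 39, 43]               # L1: explanatory variable (g)
calories = [410, 580, 590, 570, 640, 680, 660]   # L2: response variable

x_bar, y_bar = mean(fat), mean(calories)

# Sxy = sum of (x - x_bar)(y - y_bar);  Sxx = sum of (x - x_bar)^2
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fat, calories))
s_xx = sum((x - x_bar) ** 2 for x in fat)

b = s_xy / s_xx          # slope (the calculator's "b")
a = y_bar - b * x_bar    # intercept (the calculator's "a")

print(f"predicted calories = {a:.2f} + {b:.2f}(fat)")
```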
• Interpret slope:
• As the fat content in a burger increases by 1 gram, the caloric content is PREDICTED to increase by _____ calories.
• What is the y-intercept in the burger example?
• A burger with 0 grams of fat is predicted to have _____ calories.
• Interpret r
• Interpret r-squared
• Predict the calories for a burger with 35 grams of fat
Residual
• The difference between the actual value from a data point, y, and the predicted value, ŷ.
RESID = y - ŷ
• Residual plots
• Important tool for determining whether a line is a good fit for the data
• A line is a good fit according to the residual plot IF:
• There is no apparent pattern (no direction or shape)
• The residuals are scattered horizontally, with no major gaps or outliers
Residual Plots
• No pattern: indicates the line is a good fit
• U-shaped pattern: indicates a nonlinear model would fit better
• Upside-down U-shaped pattern: indicates a nonlinear model would fit better
Residual Plot of Example:
• Once you run a regression on your calculator, the residuals are created automatically and are ready for you to display
• From STAT PLOT, keep the x list as L1, go to the y list, and select RESID from the list menu
• ZOOM 9 (ZoomStat) will show you the residual plot
• Back to the burger data: what is the residual of your 35-gram-fat burger?
• Does our line OVER or UNDER predict?
• Negative residuals mean our line OVER predicts
• Positive residuals mean our line UNDER predicts
Method #2: Writing the Line of Best Fit from Summary Statistics
• The line of best fit will be written in the form:
$\hat{y} = b_0 + b_1 x$
• ŷ (y-hat) = predicted value
• b0 = y-intercept
• b1 = slope
• Finding the slope of the best fit line:
$b_1 = r \frac{s_y}{s_x}$
• sy = standard deviation of the response variable
• sx = standard deviation of the explanatory variable
• r = correlation coefficient
Finding the y-intercept
• Finding the y-intercept of the best fit line:
• Start from the equation for the predicted value of y:
$\hat{y} = b_0 + b_1 x$
• Given the mean values for x and y, (x̄, ȳ), which always lie on the LSRL
• Given the value of b1 (the slope) calculated from the statistics r, sx, sy
• Substitute the point (x̄, ȳ) and solve for b0:
$\bar{y} = b_0 + b_1 \bar{x}$
$b_0 = \bar{y} - b_1 \bar{x}$
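The last step as a tiny function, with a check that the finished line passes back through (x̄, ȳ). The name `intercept_from_stats` and the summary values below are hypothetical, chosen only to illustrate the algebra.

```python
def intercept_from_stats(x_bar: float, y_bar: float, b1: float) -> float:
    """Solve y_bar = b0 + b1 * x_bar for b0 (the LSRL passes through (x_bar, y_bar))."""
    return y_bar - b1 * x_bar

# Hypothetical summary values, for illustration only.
x_bar, y_bar, b1 = 10.0, 50.0, 3.0
b0 = intercept_from_stats(x_bar, y_bar, b1)

# Check: plugging x_bar into the finished line gives back y_bar.
print(f"b0 = {b0}, y-hat at x_bar = {b0 + b1 * x_bar}")
```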
Example: #36 pg 193
• Suppose a linear model is appropriate for data comparing fat and calories in 11 brands of fast-food chicken sandwiches, with the following summary statistics:
                 Fat (g)   Calories
Mean             20.6      472
Standard Dev     9.8       144.2
Correlation: r = 0.974
Example #36 pg 193 continued
• Write the equation of the line of best fit.
• Interpret the slope in the context of the problem
• Explain the meaning of the y intercept
• What does it mean if a sandwich has a negative residual?
• If a sandwich had 23 grams of fat, what is the predicted value
for calories?
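A sketch of the Method #2 arithmetic for this example, with fat (g) as the explanatory variable and calories as the response. It only produces the numbers; the interpretations of the slope, the y-intercept and a negative residual still need to be written in context.

```python
r = 0.974
fat_mean, fat_sd = 20.6, 9.8      # explanatory variable: fat (g)
cal_mean, cal_sd = 472, 144.2     # response variable: calories

b1 = r * cal_sd / fat_sd          # slope: predicted calories per gram of fat
b0 = cal_mean - b1 * fat_mean     # intercept, using the point (x-bar, y-bar)

def predicted_calories(fat_g: float) -> float:
    """Predicted calories from fat (g) using the line built from summary statistics."""
    return b0 + b1 * fat_g

print(f"predicted calories = {b0:.1f} + {b1:.2f}(fat)")
print(f"23 g of fat -> {predicted_calories(23):.1f} calories")
```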
Method #3 – Writing the LSRL from Computer Output
• Given the following data set comparing height (ft) and weight (lb) for 10 people in a weight loss program
• Describe and interpret the correlation
Reading Computer Output for Line of Best Fit
Dependent Variable is: Weight
R-Squared = 0.91
Variable     Coefficient   SE (Coeff)
Constant     -289.5        2.606
Height       86.1          0.0013
predicted weight = -289.5 + 86.1(height)
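A sketch of what can be recovered from this output. The correlation comes from R-Squared: r = ±√0.91, with the same sign as the Height coefficient. The 5.5 ft input is hypothetical; the observed height range is not given here, so check it before trusting any prediction.

```python
import math

b0, b1 = -289.5, 86.1      # Constant and Height coefficients from the output
r_squared = 0.91

# r takes the sign of the slope; the slope is positive, so r = +sqrt(0.91).
r = math.copysign(math.sqrt(r_squared), b1)

def predicted_weight(height_ft: float) -> float:
    """Predicted weight (lb) from height (ft) using the fitted line."""
    return b0 + b1 * height_ft

print(f"r = {r:.3f}")
# Hypothetical height; only valid if 5.5 ft lies within the observed heights.
print(f"height 5.5 ft -> {predicted_weight(5.5):.1f} lb")
```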
Linear Regression Line
predicted weight = -289.5 + 86.1(height)
Residual Plot – Is the Line a Good Fit?
Interpreting Slope / y-Intercepts
predicted weight = -289.5 + 86.1(height)
Slope interpretation:
As the height of a participant in the weight loss program increases by 1 foot, the participant's predicted weight increases by approximately 86 lb.
OR
As the height of a participant increases by 1 inch, the participant's predicted weight increases by approximately 7 lb (86.1 ÷ 12 ≈ 7.2).
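Why the per-inch version is about 7 lb: since height is measured in feet, a 1-inch increase is 1/12 of a foot, so the predicted change is 86.1/12. The sketch compares two hypothetical heights exactly one inch apart.

```python
def predicted_weight(height_ft: float) -> float:
    """Fitted line from the computer output: height in feet, weight in lb."""
    return -289.5 + 86.1 * height_ft

# Two hypothetical heights one inch apart (66 in and 67 in), converted to feet.
per_inch = predicted_weight(67 / 12) - predicted_weight(66 / 12)

print(f"slope per foot: 86.1 lb; predicted change per inch: {per_inch:.1f} lb")
```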
Interpreting y-intercepts:
predicted weight = -289.5 + 86.1(height)
• The y-intercept occurs where the explanatory variable is 0.
• What is the y-intercept in this example?
• -289.5
• There is no meaningful real-life interpretation (no participant is 0 feet tall, and a weight cannot be negative), but the literal interpretation is: for a participant in the weight loss program who is 0 feet tall, the predicted weight would be -289.5 lb.