```Ch 8
Linear Regression
AP Statistics
Mrs Johnson
3 Ways to Write the Least
Squares Regression Line
• From Data
• Using the calculator, you input data into lists, run a
linear regression through the data
• From statistics
• The LSRL runs through the centroid (x̄, ȳ)
• Using the statistics r, sx, sy, and the mean of x and y, we
can write equation of the LSRL from formulas GIVEN
on the AP exam.
• From computer output
• Often you will be given computer output – the
slope and y intercept always appear in that output
Interpreting SLOPE in a
problem:
• When asked to interpret slope – remember that slope is the
change in y over the change in x: Δy / Δx
• State the following: As the ________ (explanatory variable)
increases by 1 _______ (insert unit) the __________
(response variable) is predicted to increase/decrease (use
appropriate word given sign of slope) by _______ (insert slope
here and units).
• As the caloric content of a burger increases by 1 calorie, the
fat content of the burger is PREDICTED to increase by _____
grams.
Interpreting y-intercepts:
• The y intercept occurs when the explanatory variable is
0.
• Interpretation depends on the example – often times
there is no real application for the y-intercept.
• When the explanatory variable is 0, the response
variable is predicted to be _____. (sub 0 into the
equation and solve)
Coefficient of Determination – R²
• R² is the square of the correlation coefficient r
• Gives the proportion (percentage) of the data’s
variation accounted for by the model
• R² = 0 would mean NONE of the variation in the
data is accounted for by the model; the model is useless.
• R² = 1 would mean ALL of the variation in the
data is accounted for by the model
Coefficient of Determination – R²
• Example:
• A given data set has a correlation coefficient, r, of 0.8.
• R² = 0.64 – Interpretation: 64% of the variance in the
data is accounted for by our model
• A given data set has a correlation coefficient, r, of 0.4.
• R² = 0.16 – Interpretation: 16% of the variance in the
data is accounted for by our model
Coefficient of Determination – R²
• NOTE: When interpreting R², use this fill-in-the-blank template:
• According to the linear model, _______ (insert
R² value as a percentage) of the variability in
the response variable is accounted for by the
variation in the explanatory variable.
Predicting with LSRL
• Using the LSRL – we can predict y values given x values
• CAUTION – only use the LSRL to predict behavior within the
range of the given data
• Do NOT extrapolate beyond data
• Only interpolate within given data set
• Using the LSRL from the previous example, predict the fat
content for a burger with 550 calories.
Example – Fat / Calorie Content
Fat (g)    Calories
19         410
31         580
34         590
35         570
39         640
39         680
43         660
Finding the LSRL from given data – using calculator
1. Insert data into L1 (fat) and L2 (cal)
2. Go to Stat – Calc #8 – LinReg(a+bx)
3. Select appropriate lists and STORE regression
line
4. Write regression line using WORDS as variables
• Interpret Slope:
• As the fat content in a burger increases by 1 gram, the caloric
content is PREDICTED to increase by _____ calories.
• What is the y intercept in the burger example?
• A burger with 0 grams of fat is predicted to have _____
calories.
• Interpret r
• Interpret r-squared
• Predict the calories for a burger with 35 grams of fat
Residual
• The difference between the actual value from a data point, y,
and the predicted value, ŷ (actual minus predicted).
RESID = y - ŷ
• Residual plots
• Important tool for determining if a line is the best fit
for data
• A line is a good fit according to the residual plot IF:
• No apparent pattern – no direction or shape
• Scattered horizontally, with no major gaps or outliers
Residual Plots
• No pattern – indicates
the line is a good fit
• U-shaped pattern –
indicates a non-linear model
would be a better fit
• Upside-down U-shaped
pattern – indicates a non-linear
model would be a better fit
Residual Plot of Example:
• Once you run a regression in your calculator, the
residuals are created automatically and ready for
you to display
• From STAT PLOT, keep the x list as L1 and go to y
list and find RESID in the list menu
• Zoom 9 will show you the residual plot
• Back to the burger data – what is the residual of
your 35 grams of fat burger?
• Does our line OVER or UNDER predict?
• Negative residuals mean our line OVER predicts
• Positive residuals mean our line UNDER PREDICTS
Set 2: Writing the Line of Best Fit –
• The line of best fit will be written in the form:
ŷ = b0 + b1 x
• ŷ (y-hat) = predicted value
• b0 = y intercept
• b1 = slope
• Finding the slope of the best fit line:
b1 = r · sy / sx
• sy = standard deviation of response variable
• sx = standard deviation of explanatory variable
• r = correlation coefficient
Finding the y intercept
• Finding the y intercept of the best fit line:
• From the equation for predicted value of y
ŷ = b0 + b1x
• Given the mean values for x and y: (x̄, ȳ)
• Given the value of b1 – slope – calculated from statistics r, sx, sy
• Use the given point and solve for b0
ȳ = b0 + b1 x̄
b0 = ȳ - b1 x̄
Example: #36 pg 193
• Given that a line is the form of best fit for a set of data which
compares fat and calories on 11 brands of fast food chicken
sandwiches, and given the summary statistics:
                Fat (g)    Calories
Mean            20.6       472
Standard Dev    9.8        144.2
Correlation     0.974
Example #36 pg 193 continued
• Write the equation for line of best fit.
• Interpret the slope in the context of the problem
• Explain the meaning of the y intercept
• What does it mean if a sandwich has a negative residual?
• If a sandwich had 23 grams of fat, what is the predicted value
for calories?
Method #3 – Writing the LSRL
from Computer Output
• Given the following data set – comparing height (ft) and
weight (lb) for 10 people in a weight loss program
• Describe and interpret the correlation
Line of Best Fit
Dependent Variable is: Weight
R-Squared = 0.91
Variable    Coefficient    SE (Coeff)
Constant    -289.5         2.606
Height      86.1           .0013
predicted weight = -289.5 + 86.1(height)
Linear Regression Line
predicted weight = -289.5 + 86.1(height)
Residual Plot – Is Line a Good
Fit?
Interpreting Slope / y
intercepts
predicted weight = -289.5 + 86.1(height)
Slope interpretation:
As the height of the participant in the weight loss program
increases by 1 foot, the predicted weight of the participant
increases by approximately 86 lbs.
OR
As the height of the participant increases by 1 inch, the
predicted weight of the participant increases by approximately 7 lbs.
Interpreting y-intercepts:
predicted weight = -289.5 + 86.1(height)
• The y intercept occurs when the explanatory variable is
0.
• What is the y intercept in this example:
• -289.5
• No real life interpretation – but the literal
interpretation is that a participant in the weight loss
program who is 0 feet tall would have a predicted weight
of -289.5 lbs.
```
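The slides leave the numbers as blanks to fill in with a calculator. As a rough check on that work, here is a minimal Python sketch of the "from statistics" formulas (b1 = r · sy / sx and b0 = ȳ - b1 x̄), applied to the burger data and to the chicken sandwich summary statistics from the transcript. The function names `correlation` and `lsrl_from_stats` are hypothetical helpers introduced for illustration, not part of the original slides, and the sketch assumes sample standard deviations (dividing by n - 1), which is what the calculator reports as Sx.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(z_products) / (len(xs) - 1)


def lsrl_from_stats(r, s_x, s_y, x_bar, y_bar):
    """Return (b0, b1) using b1 = r * sy / sx and b0 = y_bar - b1 * x_bar."""
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Burger data from the transcript: fat is explanatory (x), calories is response (y).
fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

r = correlation(fat, calories)
r_squared = r ** 2
b0, b1 = lsrl_from_stats(r, stdev(fat), stdev(calories), mean(fat), mean(calories))

# Predict calories for a 35 g burger; the observed 35 g burger has 570 calories.
predicted_35 = b0 + b1 * 35
residual_35 = 570 - predicted_35  # residual = actual - predicted

print(f"predicted calories = {b0:.2f} + {b1:.2f}(fat)")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"prediction at 35 g = {predicted_35:.1f}, residual = {residual_35:.1f}")

# Chicken sandwich summary statistics (Example #36): no raw data needed.
chicken_b0, chicken_b1 = lsrl_from_stats(r=0.974, s_x=9.8, s_y=144.2, x_bar=20.6, y_bar=472)
chicken_pred_23 = chicken_b0 + chicken_b1 * 23
print(f"predicted calories = {chicken_b0:.2f} + {chicken_b1:.2f}(fat)")
print(f"prediction at 23 g = {chicken_pred_23:.1f}")
```

For the burger data this should give roughly predicted calories = 210.95 + 11.06(fat), with r about 0.96. The 35 g burger has a negative residual, so the line overpredicts its calories. The same helper handles the chicken sandwich example because the formulas need only r, sx, sy, and the two means.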