Chpt. 3 Day 2

Warm-up

Get a sheet of computer
paper/construction paper from the
front of the room, and create your
very own paper airplane. Try to
create planes with different lengths
from tip to end because we are going
to test the height of a person
throwing a plane vs. the distance
travelled by the plane.
Cautions about Correlation
and Regression and Section
3.3
Part of this is actually a section in
Chapter 4!
Limitations of Correlation and
Regression



They only describe LINEAR relationships.
They are NOT resistant to outliers or
influential observations.
Always plot your data first!!!
The Question of Causation


This is a typical topic for AP Exam Questions!!!
Association does not imply causation.
Did you know that ice cream sales and crime are
positively correlated? i.e. As ice cream sales
increase, so does the crime rate. Does that mean high
ice cream sales CAUSE more crime? Well, let’s stop
selling ice cream then!

Be careful: another variable may be at play here. As temperatures
increase, ice cream sales increase. As temperatures increase, crime
rate increases (people are simply more likely to be outside, so crime
rates increase).
From Correlation to Regression

Regression finds the line that goes through the
scatterplot and summarizes the relationship between
the two variables.

Regression does require an explanatory and a response
variable.
3.3 Least Squares Regression


If a scatterplot shows a linear trend, then we
want to calculate a mathematical model for
that data. This model will enable us to predict
values based on the relationship between the
two variables.
This model is called the least-squares regression line (LSRL).
Equations of Lines


In most math classes, the equation of a line is in the
form y = mx + b.
Statisticians use the form

ŷ = a + bx


Notice that b is now the slope and a is the y-intercept!!!
Note: The LSRL ALWAYS goes through the point (x̄, ȳ).
On your Calculator


ALWAYS graph the data first! (use a scatterplot)
Ask yourself: is the data linear?


If it is not, linear regression is not appropriate!
If it is, press


STAT/CALC/8:LinReg(a+bx) L1, L2, Y1 (This automatically
puts the equation into Y=).
Press GRAPH to see the line and the scatterplot together.
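
Away from the calculator, the same least-squares fit can be sketched in a few lines of Python. This is only an illustration: the lists L1 and L2 below hold made-up numbers (not the textbook data), and the helper name least_squares is our own.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept chosen so the line passes through (x-bar, y-bar).
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data standing in for calculator lists L1 and L2.
L1 = [1, 2, 3, 4, 5]
L2 = [2, 4, 5, 4, 5]

a, b = least_squares(L1, L2)
print(f"y-hat = {a:.1f} + {b:.1f}x")

# Check: plugging x-bar into the line gives y-bar.
print(abs((a + b * mean(L1)) - mean(L2)) < 1e-9)
```

As on the calculator, plot the data first; the arithmetic will happily return a line even when the pattern is not linear.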
Calculating and using LSRL
Using the data from pg. 164,
calculate the LSRL.
Estimate the weight gain if the NEA
change is 300.
How did you find your estimate?

What does it all mean?


b, the slope, represents the predicted change in
y for every one-unit increase in x.
a, the y-intercept, represents the predicted
y-value when x = 0.
Interpret the slope and y-intercept

A biologist wants to study the relationship
between the number of trees x per acre and the
number of birds y per acre. She came up with
the equation of the regression line
y = 5 + 4.2x
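
One way to check an interpretation is to plug numbers into the line. The sketch below uses the biologist's equation as given; the function name predicted_birds and the sample tree counts are our own choices for illustration.

```python
def predicted_birds(trees_per_acre):
    """Predicted number of birds per acre from the line y = 5 + 4.2x."""
    return 5 + 4.2 * trees_per_acre

# Slope: one more tree per acre raises the prediction by about 4.2 birds per acre.
change_per_tree = predicted_birds(11) - predicted_birds(10)
print(round(change_per_tree, 1))

# y-intercept: the predicted number of birds per acre when there are 0 trees.
print(predicted_birds(0))
```

Whether the y-intercept is meaningful in context depends on whether x = 0 is anywhere near the observed data.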
Reading Generic Computer
Output
The cell that represents the
constant coefficient is the y-intercept. This will always say
constant.
The cell that represents the
boat’s coefficient is the slope.
It will give a variable name.
So, my least-squares regression equation is
Extrapolation

Extrapolation is using the regression line to predict for values
far outside the range of the explanatory variable in the data.

Example: Let’s say we measure a child’s height
every 6 months for the first 10 years of his/her life.
While we may be able to predict the child’s height
at age 11, predicting it at age 25 would not be
accurate.
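
The height example can be sketched in code. The coefficients below (a = 60, b = 7, height in cm) are hypothetical, chosen only to show what goes wrong; they are not fitted to real data.

```python
def predict_height(age_years, a=60.0, b=7.0, min_age=0.0, max_age=10.0):
    """Return (predicted height, years outside the observed age range).

    a and b are hypothetical LSRL coefficients (height in cm), not real data.
    """
    predicted = a + b * age_years
    # Distance from the observed range; 0 means the age is inside the data.
    years_outside = max(min_age - age_years, age_years - max_age, 0.0)
    return predicted, years_outside

for age in (8, 11, 25):
    height, years_outside = predict_height(age)
    print(f"age {age}: predicted {height:.0f} cm, "
          f"{years_outside:.0f} years outside the data")
```

At age 25 this line predicts a height of 235 cm, which is not believable: the linear growth pattern seen in childhood does not continue that far beyond the data.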
The Coefficient of Determination

r², on the other hand, is the coefficient
of determination. It tells us what
percent of the variation in the response
variable can be explained by the
explanatory variable.
Fill in the blank sentence: r² (%) of the
variation in what y measures can be
explained by what x measures.
Interpreting r² = .721 if x is hours spent
studying and y is GPA.


72.1% of the variation in GPA can be
explained by hours spent studying.
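
To see where a number like this comes from, here is a minimal sketch that computes r² as 1 minus (variation left over after the line) divided by (total variation in y). The hours and GPA values are made up for illustration and are not the data behind the .721 above.

```python
from statistics import mean

def r_squared(xs, ys):
    """Coefficient of determination for the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    # Variation left unexplained by the line (sum of squared residuals).
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y about its mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data: hours spent studying and GPA.
hours = [2, 4, 6, 8, 10]
gpa = [2.0, 2.5, 3.0, 3.0, 3.5]

r2 = r_squared(hours, gpa)
print(f"{r2:.1%} of the variation in GPA can be explained by hours spent studying.")
```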
3.3 Residuals

A residual is the difference between the actual y
value and the predicted y value.




Residual = observed y – predicted y
Residual = y – ŷ
When data points lie above the line, the residuals are
positive. This means that the observed was higher than
what the LSRL predicted.
The sum of the residuals = 0.
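
These facts can be checked numerically. The sketch below fits a least-squares line to a small made-up data set (not the textbook data), lists each residual, and confirms that the residuals add to 0 up to rounding.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
# Residual = observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, y, res in zip(xs, ys, residuals):
    if res > 0:
        position = "above the line"
    elif res < 0:
        position = "below the line"
    else:
        position = "on the line"
    print(f"x={x}, observed y={y}, residual={res:+.1f} ({position})")

# The residuals of a least-squares line sum to 0 (up to floating-point rounding).
print(abs(sum(residuals)) < 1e-9)
```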
Example: Calculate the Residual

p. 234 Example 3.16. Type in the data. Then we
will find the LSRL.

Child 1, who spoke at 15 months had an actual
score of 95. What is their residual value?
Residual = 2.0312
Residual Plots

A residual plot is a
scatterplot of the
observed explanatory
variable vs the
residuals.
To create a residual plot in your
calculator…

In L3, type in L2 – Y1(L1).


This basically says you are going to take your
actual y-values and subtract the predicted values…
You have to plug in your x-values, which is where
L1 comes from.
To see your plot, make a scatter plot of L1, L3.
Why Residuals are Important

Looking at a residual plot gives us an idea of
whether the model we chose is appropriate.
(Something may look like a line, but that may not be
an appropriate model).




For any residual plot, a uniform scatter of points about the
horizontal line at 0 (no obvious pattern) means that the model is a good fit
for the data.
A residual plot that shows a curved pattern indicates that
your choice of model is NOT a good fit.
If the residuals are cone shaped, your model is not a good fit.
The AP Exam notoriously shows residual plots for
several different models and asks which model is
best.
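
A quick sketch of the curved-pattern case: the made-up data below follow y = x² exactly, so a line is the wrong model. Fitting a line anyway and looking at the signs of the residuals shows the pattern a residual plot would reveal.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

# Hypothetical curved data: y is exactly x squared.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Positive at both ends and negative in the middle: a curved (U-shaped) pattern.
signs = ["+" if res > 0 else "-" for res in residuals]
print(signs)
```

Residuals that are positive at both ends and negative in the middle form a U shape, so the line is not a good model here even though the scatterplot might look roughly linear at a glance.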
Is a line a good model?
Bad Model!!!
Good Model!
Outliers and Influential
Observations


Outliers lie outside the
overall pattern of the
other observations.
These will be far above
or below the other
residuals on a residual
plot.
Outliers and Influential
Observations

Influential Observations
markedly change the
result of the calculation
of the LSR line.
Usually, points that are
outliers in the x-direction of the
scatterplot are
influential.
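
The effect of an influential observation can be sketched by refitting the line with one extra point. All numbers below are made up for illustration: one added point is extreme in the x-direction, the other is unusual only in the y-direction.

```python
from statistics import mean

def slope(xs, ys):
    """Slope b of the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    return (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))

# Hypothetical base data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_original = slope(xs, ys)
# Add a point far out in the x-direction: (15, 2).
b_x_outlier = slope(xs + [15], ys + [2])
# Add a point that is unusual only in the y-direction, at the mean of x: (3, 10).
b_y_outlier = slope(xs + [3], ys + [10])

print(f"original slope:          {b_original:.3f}")
print(f"with x-direction outlier: {b_x_outlier:.3f}")
print(f"with y-direction outlier: {b_y_outlier:.3f}")
```

In this example the point far out in the x-direction flips the slope from positive to negative, while the point that is unusual only in the y-direction (placed at the mean of x) leaves the slope unchanged and only shifts the line upward.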
Homework
Chapter 3
#38, 40, 42, 54
Night 2: #55, 56, 59, 61, 63a-c