AP Statistics
Chapter 3 Notes – Examining Relationships
3.1 Scatterplots
Variables
Objectives:
1) Recognize whether each variable is quantitative or categorical
2) Identify the explanatory and response variables
3) Make a scatterplot to display the relationship between two quantitative variables
4) Describe the form, direction, and strength of the overall pattern of a scatterplot
5) Recognize positive or negative association and linear patterns
In this chapter, we will concentrate on relationships among several
variables for the same group of individuals. When examining two or
more variables, we must first ask some preliminary questions.
What individuals do the data describe?
What are the variables?
Are the variables quantitative or categorical?
Do we want to explore the nature of the relationship, or do we
think that some of the variables explain or cause changes in the others?
We call these response and explanatory variables.
Response variables (dependent) measure an outcome of a study.
Explanatory variables (independent) attempt to explain the observed
outcomes.
The response variable depends on the explanatory variable.
Practice Problems (on back)
page 122 #3.1, 3.2, 3.3
Scatterplots
The most effective way to display the relationship between two
quantitative variables is a scatterplot.
A scatterplot shows the relationship between quantitative variables for
the same individuals. The values of one variable appear on the
horizontal axis, and the values of the other variable appear on the
vertical axis. Always plot the explanatory variable, if any, on the
horizontal axis. Each individual in the data appears as the point in the
plot fixed by the values of both variables for that individual.
Interpreting scatterplots
When examining a scatterplot:
- Look for the overall pattern and for striking deviations
from the pattern.
- Describe the overall pattern by the form, direction, and
strength of the relationship.
When describing form, look for clusters and outliers.
When describing direction, look for positive or negative associations.
- Two variables are positively associated when above-average
values of one tend to accompany above-average values of the
other, and below-average values also tend to occur together.
- Two variables are negatively associated when above-average
values of one tend to accompany below-average values of the
other, and vice versa.
When describing strength, determine how closely the points follow a
clear form.
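The definition of direction can be checked by hand with a short sketch like the one below. It counts how many individuals fall on the same side of both averages versus opposite sides. The function name and the data are made up for illustration and are not from the textbook.

```python
from statistics import mean

def association_direction(xs, ys):
    """Classify direction by comparing each point to the two averages."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Same side: both above average or both below average.
    same = sum(1 for x, y in zip(xs, ys) if (x - x_bar) * (y - y_bar) > 0)
    # Opposite sides: one above average, the other below.
    opposite = sum(1 for x, y in zip(xs, ys) if (x - x_bar) * (y - y_bar) < 0)
    if same > opposite:
        return "positive"
    if opposite > same:
        return "negative"
    return "no clear direction"

# Hypothetical data for six students.
hours_studied = [1, 2, 3, 4, 5, 6]
hours_of_tv = [6, 5, 4, 3, 2, 1]
test_score = [62, 65, 71, 70, 78, 85]

print(association_direction(hours_studied, test_score))
print(association_direction(hours_of_tv, test_score))
```

This simple count only suggests a direction. It says nothing about form or strength, so it does not replace looking at the scatterplot.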
Figure 3.1, pg 124 Describe the form, direction, and strength of the
relationship.
Practice Problem
3.6, 3.9
Making a calculator scatterplot
Redo 3.9 scatterplot using the calculator
Assignment 3.1 page 135 #3.15-3.20, 3.22 (calc)
3.2 Correlation
Objectives:
6) Recognize outliers in a scatterplot
7) Use a calculator to find the correlation between two quantitative variables
8) Know the basic properties of correlation
We consider a linear relationship strong if the points lie close to a
straight line, and weak if they are widely scattered about a line. The
problem is that our eyes can be fooled: changing the plotting scales can
make the same relationship look stronger or weaker. See Figure 3.8 on page 141.
Correlation
Therefore we supplement the graph with a numerical measure called
correlation.
Correlation measures the direction and strength of the linear
relationship between two quantitative variables. Correlation is
usually written as r.
$$r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
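The formula for r can be followed step by step in a short sketch: standardize each x and each y, multiply the pairs, add them up, and divide by n - 1. The data values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 4))
```

On the calculator, the same value appears as r in the linear regression output once diagnostics are turned on.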
Facts about correlation:
1. Correlation makes no distinction between explanatory and
response variables. So it doesn’t matter which one you call x and
which y.
2. Correlation requires that both variables be quantitative.
3. Since r uses the standardized values of the observations, r does
not change when we change the units of measure of x, y or both.
4. Positive r indicates positive association between variables,
and negative r indicates a negative association.
5. The correlation r is always a number between -1 and 1.
- Values of r near 0 indicate a very weak linear
relationship.
- The strength of the linear relationship increases as r
moves away from 0 toward either 1 or -1.
- Values of r close to -1 or 1 indicate that the points in a
scatterplot lie close to a straight line.
- The values r = -1 or r = 1 only occur if there is a perfect
linear relationship.
6. Correlation only measures the strength of a linear relationship
between two variables.
7. Like the mean and standard deviation, the correlation is not
resistant. r is strongly affected by a few outliers.
8. When stating the correlation you should always state the mean
and standard deviation for x and y.
9. Adding or subtracting the same number to all the values of either
x or y does not change the correlation.
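Several of these facts can be checked numerically. The sketch below uses made-up heights and weights (not textbook data) to show that r is unchanged by swapping x and y, by changing units, and by shifting all values, and that a single outlier can change r a great deal.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r from standardized values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: heights in inches, weights in pounds.
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [110, 120, 140, 155, 160, 180]

r = correlation(height_in, weight_lb)

# Fact 1: swapping x and y does not change r.
r_swapped = correlation(weight_lb, height_in)

# Fact 3: changing units (inches to centimeters) does not change r.
height_cm = [2.54 * h for h in height_in]
r_units = correlation(height_cm, weight_lb)

# Fact 9: subtracting the same number from every x does not change r.
r_shifted = correlation([h - 60 for h in height_in], weight_lb)

# Fact 7: r is not resistant. One unusual individual changes it a lot.
r_outlier = correlation(height_in + [61], weight_lb + [250])

print(r, r_swapped, r_units, r_shifted, r_outlier)
```

The first four printed values agree up to rounding, while the last one is much smaller.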
Practice Problems:
page 142 #3.24, 3.25, 3.28
Assignment 3.2 page 147 #3.29, 3.31-3.34, 3.36
(do all scatterplots on calculator)
3.3 Least-Squares Regression
Regression line
Objectives:
9) Explain what the slope b and the intercept a mean in the equation y = a + bx of a
straight line
10) Draw a graph of the straight line when you are given its equation
11) Use a calculator to find the least-squares regression line of a response variable y on
an explanatory variable x from data.
12) Find the slope and intercept of the LSRL from the means and standard deviations
of x and y and their correlation
13) Use the regression line to predict y for a given x.
14) Recognize extrapolation and beware of its dangers
15) Use r² to describe how much of the variation in one variable can be accounted for
by a straight line relationship with another variable
16) Calculate the residuals and plot them against the explanatory variable x or against
other variables
Least-squares regression is a method for finding a line that summarizes
the relationship between two variables.
Regression line – A straight line that describes how a response variable
y changes as an explanatory variable x changes. (Sound familiar?) We
often use a regression line to predict the value of y for a given value
of x.
If we believe that the data show a linear trend, then it would be
appropriate to try to fit a least-squares regression line, LSRL, to the
data.
Example 3.8, page 150. Can we predict the Sanchez household gas
consumption for a month averaging 20 degree-days per day?
The Least-Squares Regression Line
Different people might draw different lines by eye on a scatterplot. This
is especially true when the points are widely scattered. So we need a
regression line that doesn’t depend on our guess. No line will pass
through all the points, but we want one that is as close as possible. We
want a regression line that makes the vertical distances of the points in
a scatterplot from the line as small as possible. The most common way of
achieving this is the LSRL.
Figure 3.11a page 151
LSRL – the line that makes the sum of the squares of the vertical
distances of the data from the line as small as possible.
Equation of the LSRL
Equation of the LSRL:
$$\hat{y} = a + bx$$
With slope:
$$b = r \frac{s_y}{s_x}$$
And intercept:
$$a = \bar{y} - b\bar{x}$$
The variable y denotes the observed value of y, and the term ŷ (y hat)
means the predicted value of y.
Every LSRL passes through the point (x̄, ȳ), and the slope is equal to
the product of the correlation and the quotient of the standard deviations
(s_y divided by s_x).
The slope is the rate of change, or the amount of change in ŷ when x
increases by 1.
The intercept of the regression line is the value of ŷ when x = 0.
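The slope and intercept can be found from the means, standard deviations, and correlation alone. The sketch below does exactly that on made-up data and then checks that the line passes through the point of averages.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r from standardized values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    r = correlation(xs, ys)
    b = r * stdev(ys) / stdev(xs)   # slope: b = r * s_y / s_x
    a = mean(ys) - b * mean(xs)     # intercept: a = y-bar - b * x-bar
    return a, b

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = lsrl(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x")

# The line passes through (x-bar, y-bar): plugging in x-bar gives y-bar.
y_hat_at_x_bar = a + b * mean(x)
print(y_hat_at_x_bar, mean(y))
```

The calculator's linear regression command should give the same a and b for the same lists.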
Least-squares lines on the calculator
Technology Toolbox – Least-squares lines on the calculator. Get out
your calculator and go to page 132.
The equation of the regression line makes prediction easy. Just
substitute an x-value into the equation to predict the corresponding y
value.
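Prediction is just substitution, as this small sketch shows. The intercept, slope, and observed x-values are made-up numbers, and the extrapolation check (is x outside the range of the observed x-values?) is one simple way to flag risky predictions, not a rule from the textbook.

```python
def predict(a, b, x):
    """Predicted response y-hat = a + b*x."""
    return a + b * x

def is_extrapolation(x, observed_xs):
    """True when x lies outside the range of x-values used to fit the line."""
    return x < min(observed_xs) or x > max(observed_xs)

# Hypothetical line and hypothetical observed x-values.
a, b = 2.2, 0.6
observed_x = [1, 2, 3, 4, 5]

for new_x in (3, 20):
    y_hat = predict(a, b, new_x)
    note = " (extrapolation, do not trust)" if is_extrapolation(new_x, observed_x) else ""
    print(f"x = {new_x}: y-hat = {y_hat:.1f}{note}")
```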
Practice Problems
Practice Problem 3.41 page 157
a) “on calculator”
b) ŷ =
c) When x = 716, y = _____
Practice Problem 3.40 page 157
a) “on calculator”
b)
c)
The role of r² in regression
The coefficient of determination, r², measures how successful the
regression was in explaining the response. Squaring the correlation gives
us a better idea of the strength of the association. Perfect correlation
means the points lie exactly on a line (r = 1 or r = -1). This means
r² = 1 (100%) and all of the variation in one variable is accounted for by
the linear relationship with the other variable. If r = -0.7 or 0.7, then
r² = 0.49 (49%), or about half of the variation is accounted for by the
linear relationship. r² is an overall measure of how successful the
regression line is in relating y to x. When you report a regression, be
sure to give r².
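One standard way to see r² as "fraction of variation accounted for" is to compare the squared prediction errors from the LSRL with the squared deviations of y from its mean. The sketch below, on made-up data, computes r² both ways. The names sse and sst are labels chosen here for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares slope and intercept (algebraically the same as b = r*s_y/s_x).
b = s_xy / s_xx
a = y_bar - b * x_bar

# Correlation, then its square.
r = s_xy / sqrt(s_xx * s_yy)
r_squared = r ** 2

# Variation left over after using the line (sse) versus total variation in y (sst).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
sst = s_yy
fraction_explained = 1 - sse / sst

print(r_squared, fraction_explained)
```

The two printed values agree up to rounding, which is why r² is read as the share of the variation in y explained by the line.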
Practice Problems
page 165 3.42, 3.44 (on back)
Residuals
A residual is the difference between an observed value of the response
variable and the value predicted by the regression line.
Residual = observed y – predicted y = y - ŷ
Because the residuals show how far the data fall from our regression
line, examining the residuals helps assess how well the line describes the
data.
Example 3.14 page 167
Residual plot
The residuals from the least-squares line have a special property: the
mean of the least-squares residuals is always 0. You can check that the
sum of the residuals in Example 3.14 is -0.00002 ≈ 0. The small
difference from 0 is due to roundoff error.
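The residual calculation and the zero-sum property can be checked with a short sketch. The data are made up (they are not the data from Example 3.14), and the sum may print as a tiny nonzero number because of roundoff in the computer's arithmetic.

```python
from statistics import mean

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Fit the least-squares line y-hat = a + b*x.
x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# Residual = observed y - predicted y.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
residual_sum = sum(residuals)

for xi, res in zip(x, residuals):
    print(f"x = {xi}: residual = {res:+.2f}")
print("sum of residuals:", residual_sum)
```

Plotting these residuals against x gives the residual plot described next.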
A residual plot is a scatterplot of the regression residuals against the
explanatory variable. This helps us assess the fit of a regression line.
What to look for when you are examining the residuals:
- Uniform scatter of points indicates that the regression line fits the
data well, so the line is a good model.
- A curved pattern shows that the relationship is not linear.
- Increasing spread about the line as x increases indicates that
prediction of y will be less accurate for larger x. Decreasing spread
indicates that prediction will be less accurate for smaller x.
Influential observations
Individual points with large residuals are outliers in the vertical
direction because they lie far away from the line that describes
the overall pattern, like Child 19.
An outlier is an observation that lies outside the overall pattern of the
other observations.
An observation is influential for a statistical calculation if removing it
would markedly change the result of the calculation. Points that are
outliers in the x direction are often influential for the least-squares
regression line.
See figure 3.20 page 172. Child 19 is an outlier in the y direction, while
Child 18 is an outlier in the x direction.
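The difference between the two kinds of outlier can be seen by refitting the line with and without one extra point. The sketch below uses made-up data (not the Child 18 and Child 19 data from Figure 3.20). The y-direction outlier is placed at the mean of x on purpose so that its effect on the slope is easy to see.

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_base, b_base = lsrl(x, y)

# Add an outlier in the y direction (x = 3 is the mean of the x-values).
a_y_out, b_y_out = lsrl(x + [3], y + [9])

# Add an outlier in the x direction (x = 15 is far from the other x-values).
a_x_out, b_x_out = lsrl(x + [15], y + [3])

print("original slope:        ", b_base)
print("with y-direction point:", b_y_out)
print("with x-direction point:", b_x_out)
```

In this made-up example the y-direction outlier leaves the slope unchanged and only raises the line, while the x-direction outlier pulls the line toward itself and even changes the sign of the slope.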
Read example 3.15 page 172
Do the Technology Toolbox page 174 – Residual Plots by Calculator
Assignment 3.3 page 176 #3.50-3.61
Summary