AP Statistics Chapter 3 Notes – Examining Relationships

3.1 Scatterplots

Objectives:
1) Recognize whether each variable is quantitative or categorical
2) Identify the explanatory and response variables
3) Make a scatterplot to display the relationship between two quantitative variables
4) Describe the form, direction, and strength of the overall pattern of a scatterplot
5) Recognize positive or negative association and linear patterns

Variables

In this chapter, we will concentrate on relationships among several variables for the same group of individuals. When examining two or more variables, we must ask some preliminary questions:
- What individuals do the data describe?
- What are the variables?
- Are the variables quantitative or categorical?
- Do we want to explore the nature of the relationship, or do we think that some of the variables explain or cause changes in the others?

In the latter case, we distinguish response and explanatory variables. Response variables (dependent) measure an outcome of a study. Explanatory variables (independent) attempt to explain the observed outcomes. The response variable depends on the explanatory variable.

Practice Problems (on back): page 122 #3.1, 3.2, 3.3

Scatterplots

The most effective way to display the relationship between two quantitative variables is a scatterplot. A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Always plot the explanatory variable, if there is one, on the horizontal axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.

Interpreting scatterplots

When examining a scatterplot:
- Look for the overall pattern and for striking deviations from the pattern.
- Describe the overall pattern by the form, direction, and strength of the relationship.

When describing form, look for clusters and outliers.
When describing direction, look for positive or negative associations.
- Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together.
- Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other, and vice versa.

When describing strength, determine how closely the points follow a clear form.

Figure 3.1, page 124: Describe the form, direction, and strength of the relationship.

Practice Problems: 3.6, 3.9

Making a calculator scatterplot

Redo the 3.9 scatterplot using the calculator.

Assignment 3.1: page 135 #3.15-3.20, 3.22 (calc)

3.2 Correlation

Objectives:
6) Recognize outliers in a scatterplot
7) Use a calculator to find the correlation between two quantitative variables
8) Know the basic properties of correlation

We consider a linear relationship strong if the points lie close to a straight line, and weak if they are widely scattered about a line. The problem is that our eyes can be fooled: changing the plotting scales can make the same relationship look stronger or weaker. See figure 3.8 on page 141.

Correlation

We therefore supplement the graph with a numerical measure called correlation. Correlation measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as r.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Facts about correlation:
1. Correlation makes no distinction between explanatory and response variables, so it doesn't matter which one you call x and which you call y.
2. Correlation requires that both variables be quantitative.
3. Since r uses the standardized values of the observations, r does not change when we change the units of measurement of x, y, or both.
4. Positive r indicates a positive association between the variables, and negative r indicates a negative association.
5. The correlation r is always a number between -1 and 1.
   - Values of r near 0 indicate a very weak linear relationship.
   - The strength of the linear relationship increases as r moves away from 0 toward either 1 or -1.
   - Values of r close to -1 or 1 indicate that the points in a scatterplot lie close to a straight line.
   - The values r = -1 and r = 1 occur only if there is a perfect linear relationship.
6. Correlation measures the strength of only a linear relationship between two variables. It does not describe curved relationships, no matter how strong they are.
7. Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outliers.
8. When stating the correlation, you should always state the means and standard deviations of x and y as well.
9. Adding or subtracting the same number to all the values of either x or y does not change the correlation.

Practice Problems: page 142 #3.24, 3.25, 3.28

Assignment 3.2: page 147 #3.29, 3.31-3.34, 3.36 (do all scatterplots on calculator)

3.3 Least-Squares Regression

Objectives:
9) Explain what the slope b and the intercept a mean in the equation y = a + bx of a straight line
10) Draw a graph of the straight line when you are given its equation
11) Use a calculator to find the least-squares regression line of a response variable y on an explanatory variable x from data
12) Find the slope and intercept of the LSRL from the means and standard deviations of x and y and their correlation
13) Use the regression line to predict y for a given x
14) Recognize extrapolation and beware of its dangers
15) Use r² to describe how much of the variation in one variable can be accounted for by a straight-line relationship with another variable
16) Calculate the residuals and plot them against the explanatory variable x or against other variables

Regression line

Least-squares regression is a method for finding a line that summarizes the relationship between two variables.

Regression line: a straight line that describes how a response variable y changes as an explanatory variable x changes.
(Sound familiar?) We often use a regression line to predict the value of y for a given value of x. If we believe that the data show a linear trend, then it is appropriate to fit a least-squares regression line (LSRL) to the data.

Example 3.8, page 150: Can we predict the Sanchez household gas consumption for a month averaging 20 degree-days per day?

The Least-Squares Regression Line

Different people might draw different lines by eye on a scatterplot, especially when the points are widely scattered. So we need a regression line that doesn't depend on our guess. No line will pass through all the points, but we want one that is as close as possible: a line that makes the distances of the points in the scatterplot from the line as small as possible. The most common way of achieving this is the LSRL. See figure 3.11a, page 151.

LSRL: the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

Equation of the LSRL

$$\hat{y} = a + bx$$

with slope

$$b = r \frac{s_y}{s_x}$$

and intercept

$$a = \bar{y} - b\bar{x}$$

The variable y denotes the observed value of y, and ŷ (y hat) means the predicted value of y. Every LSRL passes through the point (x̄, ȳ), and the slope equals the correlation times the ratio of the standard deviations, s_y / s_x. The slope is the rate of change: the amount by which ŷ changes when x increases by 1. The intercept of the regression line is the value of ŷ when x = 0.

Least-squares lines on the calculator

Technology Toolbox: Least-squares lines on the calculator. Get out your calculator and go to page 132.

The equation of the regression line makes prediction easy. Just substitute an x-value into the equation to predict the corresponding y-value.
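For anyone who wants to check the calculator's answer by hand, the formulas above can be sketched in a few lines of Python. This is only an illustration: the function names (`sample_sd`, `correlation`, `lsrl`) and the five data points are made up for this sketch and do not come from the textbook.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n - 1) times the sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

def lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    b = correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

# Hypothetical data, chosen only so the arithmetic is easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = lsrl(xs, ys)
y_hat = a + b * 4  # prediction for x = 4, which lies inside the observed x-range
print(f"r = {correlation(xs, ys):.4f}, y-hat = {a:.2f} + {b:.2f}x, prediction at x = 4: {y_hat:.2f}")
```

For this made-up data the slope works out to 0.6 and the intercept to 2.2, so the predicted value at x = 4 is about 4.6. The prediction is made at an x-value inside the range of the data; substituting an x far outside that range would be extrapolation.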
Practice Problems

Practice Problem 3.41, page 157
a) "on calculator"
b) ŷ =
c) When x = 716, y = _____

Practice Problem 3.40, page 157
a) "on calculator"
b)
c)

The role of r² in regression

The coefficient of determination, r², measures how well the regression explains the response. Squaring the correlation gives us a better idea of the strength of the association. Perfect correlation means the points lie exactly on a line (r = 1 or r = -1). In that case r² = 1 (100%), and all of the variation in one variable is accounted for by the linear relationship with the other variable. If r = -0.7 or 0.7, then r² = 0.49 (49%), so about half of the variation is accounted for by the linear relationship. r² is an overall measure of how successful the regression line is in relating y to x. When you report a regression, be sure to give r² as a measure of how successful the regression was in explaining the response.

Practice Problems: page 165 #3.42, 3.44 (on back)

Residuals

A residual is the difference between an observed value of the response variable and the value predicted by the regression line.

residual = observed y - predicted y = y - ŷ

Because the residuals show how far the data fall from our regression line, examining the residuals helps assess how well the line describes the data.

Example 3.14, page 167

Residual plot

The residuals from the least-squares line have a special property: the mean of the least-squares residuals is always 0. You can check that the sum of the residuals in example 3.14 is -0.00002 ≈ 0. The small difference from 0 is due to roundoff error.

A residual plot is a scatterplot of the regression residuals against the explanatory variable. It helps us assess the fit of a regression line.

What to look for when you are examining the residuals:
- Uniform scatter of points indicates that the regression line fits the data well, so the line is a good model.
- A curved pattern shows that the relationship is not linear.
- Increasing spread about the line as x increases indicates that prediction of y will be less accurate for larger x. Decreasing spread indicates that prediction will be less accurate for smaller x.

Influential observations

Individual points with large residuals, like Child 19, are outliers in the vertical direction because they lie far from the line that describes the overall pattern.

An outlier is an observation that lies outside the overall pattern of the other observations.

An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction are often influential for the least-squares regression line.

See figure 3.20, page 172. Child 19 is an outlier in the y direction, while Child 18 is an outlier in the x direction.

Read example 3.15, page 172.

Do the Technology Toolbox, page 174: Residual Plots by Calculator.

Assignment 3.3: page 176 #3.50-3.61

Summary
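As a recap of the residual ideas from section 3.3, here is a minimal Python sketch that computes residuals, confirms that they sum to (approximately) 0, and finds r² as the fraction of variation in y accounted for by the line. The helper names and the data are hypothetical, not taken from the textbook. The slope is computed from sums of squares and products, which is algebraically the same as b = r · s_y / s_x, and r² is computed as 1 minus (sum of squared residuals) / (total variation in y), a standard identity for the LSRL.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # same value as r * s_y / s_x
    a = y_bar - b * x_bar    # forces the line through (x-bar, y-bar)
    return a, b

def residuals(xs, ys, a, b):
    """residual = observed y - predicted y, one per data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def r_squared(ys, resids):
    """Fraction of the variation in y accounted for by the line."""
    y_bar = sum(ys) / len(ys)
    total_variation = sum((y - y_bar) ** 2 for y in ys)
    unexplained = sum(e ** 2 for e in resids)
    return 1 - unexplained / total_variation

# Hypothetical data, used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
resids = residuals(xs, ys, a, b)
r2 = r_squared(ys, resids)

print("residuals:", [round(e, 2) for e in resids])
print("sum of residuals:", round(sum(resids), 10))
print("r squared:", round(r2, 2))
```

Plotting `resids` against `xs` would give the residual plot described in the notes. With only five made-up points there is no meaningful pattern to read, but the same lists could be passed to any plotting tool or entered into calculator lists.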