Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Chapter 5 Summarizing Bivariate Data Correlation Variables: Response variable (y) measures an outcome (dependent) Explanatory variable (x) helps explain or influence changes in a response variable (independent) Suppose we found the age and weight of a sample of 10 adults. Create a scatterplot of the data below. Is there any relationship between the age and weight of these adults? Age 24 30 41 28 50 46 49 35 20 39 Wt 256 124 320 185 158 129 103 196 110 130 Does there seem to be a relationship between age and weight of these adults? Suppose we found the height and weight of a sample of 10 adults. Create a scatterplot of the data below. Is there any relationship between the height and weight of these adults? Is it positive or negative? Weak or strong? Ht 74 65 77 72 68 60 62 73 61 64 Wt 256 124 320 185 158 129 103 196 110 130 Does there seem to be a relationship between height and weight of these adults? When describing relationships between two variables, you should address: • Direction (positive, negative, or neither) • Strength of the relationship (how much scattering?) • Form ( linear or some other pattern) • Unusual features (outliers or influential points) And ALWAYS in context of the problem! Identify as having a positive association, a negative association, or no association. 1. Heights of mothers & heights of their adult + daughters 2. Age of a car in years and its current value 3. Weight of a person and calories consumed + 4. Height of a person and the person’s birth NO month 5. Number of hours spent in safety training and the number of accidents that occur The closer the points in a The farther away from a scatterplot are to a straight straight line – the weaker the line - the stronger the relationship relationship. Correlation measures the direction and the strength of a linear relationship between 2 quantitative variables. Find the mean and standard deviation of the heights and weights of the 10 students: Height (in) Weight (lb) 74 256 65 124 77 320 x 67.6 sx 6.06 72 185 68 158 60 129 62 103 y 171.1 73 196 61 110 64 130 s y 70.25 r x i x y y xi x y i y i s s s x y x sy xi x y i y 1 n 1 s x s y = Find the mean and standard deviation of the heights and weights of the 10 students: Height (in) Weight (lb) 74 256 65 124 77 320 72 185 68 158 60 129 62 103 73 196 61 110 64 130 r x i x y y xi x y i y i s s s x y x sy 1.06 -.43 1.55 .73 .07 -1.25 -.92 .89 -1.09 -.59 1.21 -.67 2.12 .20 -.19 -.60 -.97 .35 -.87 -.59 xi x y i y 1 n 1 s x s y 1.28 .29 3.29 .14 -.01 .75 .90 .32 .95 .35 = 8.24/9= .9157 Correlation Coefficient (r)• A quantitative assessment of the strength & direction of the linear relationship between bivariate, quantitative data • Pearson’s sample correlation is used most • parameter - r (rho) • statistic - r xi x yi y 1 r n 1 s x s y Speed Limit (mph) 55 50 45 40 30 20 Avg. # of accidents (weekly) 28 25 21 17 11 6 Calculate r. Interpret r in context. r = .9964 There is a strong, positive, linear relationship between speed limit and average number of accidents per week. Properties of r (correlation coefficient) • legitimate values of r is [-1,1] No Correlation Strong correlation Moderate Correlation Weak correlation -1 -.8 -.5 0 .5 .8 1 Some Correlation Pictures Some Correlation Pictures Some Correlation Pictures Some Correlation Pictures Some Correlation Pictures Some Correlation Pictures x (in mm) 12 y 4 15 7 21 10 32 14 26 9 19 8 Find r. .9181 Interpret r in context. There is a strong, positive, linear relationship between speed limit and the number of weekly accidents. 24 12 • value of r is not changed by any transformations x (in mm) 12 y 4 15 7 21 10 32 14 26 9 19 8 24 12 Find r. following Do the transformations & .9181 Change to cmcalculate & find r.r .9181 1) 5(x + 14) 2) (y + 30) ÷ 4 The correlations are the same. STILL = .9181 • value of r does not depend on which of the two variables is labeled x Switch x & y & find r. Type: LinReg L2, L1 The correlations are the same. • value of r is non-resistant x y 12 4 15 7 21 10 32 14 26 9 19 8 24 22 Find r. Outliers affect the correlation coefficient • value of r is a measure of the extent to which x & y are linearly related Find the correlation for these points: x -3 -1 1 3 5 7 9 Y 40 20 8 4 8 20 40 What does this correlation mean? Sketch the scatterplot r = 0, but has a definite relationship! 1) Correlation makes no distinction between explanatory and response variable. It is unitless. 2) Correlation does not change when we change the units of measurement of x, y, or both. 3) Correlation requires both variables to be quantitative. 4) Correlation does not describe curved relationship between variables, no matter how strong. Only the linear relationship between variables. 5) Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations. Correlation does not imply causation Correlation does not imply causation Correlation does not imply causation