STATISTICS
Chapter 5: Correlation/Regression
MVS 250: V. Katch

Overview
Paired data:
  Is there a relationship?
  If so, what is the equation?
  Use the equation for prediction.

Correlation

Definition
Correlation exists between two variables when one of them is related to the other in some way.

Assumptions
1. The sample of paired data (x, y) is a random sample.
2. The pairs of (x, y) data have a bivariate normal distribution.

Definition
A scatterplot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.

Scatter Diagram of Paired Data
[Figure: scatter diagram of paired data]

Positive Linear Correlation
[Figure: scatter plots of y against x: (a) Positive, (b) Strong positive, (c) Perfect positive]

Negative Linear Correlation
[Figure: scatter plots of y against x: (d) Negative, (e) Strong negative, (f) Perfect negative]

No Linear Correlation
[Figure: scatter plots of y against x: (g) No correlation, (h) Nonlinear correlation]

Definition
The linear correlation coefficient r measures the strength of the linear relationship between paired x and y values in a sample.

$$r = \frac{\sum xy/n - (\sum x/n)(\sum y/n)}{SD_x \, SD_y}$$

where Σxy/n is the mean of the cross products, Σx/n is the mean of the x variable, Σy/n is the mean of the y variable, SDx is the standard deviation of the x variable, and SDy is the standard deviation of the y variable.

Notation for the Linear Correlation Coefficient
n       number of pairs of data presented
Σ       denotes the addition of the items indicated
Σx/n    denotes the mean of all x values
Σy/n    denotes the mean of all y values
Σxy/n   denotes the mean of the cross products [x times y, summed; divided by n]
r       linear correlation coefficient for a sample
ρ       linear correlation coefficient for a population

Rounding the Linear Correlation Coefficient r
Round to three decimal places.
Use a calculator or computer if possible.

Properties of the Linear Correlation Coefficient r
1. -1 ≤ r ≤ 1
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship.

Interpreting the Linear Correlation Coefficient
If the absolute value of r exceeds the value in Sig. Table, conclude that there is a significant linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a significant linear correlation.
Remember to use n - 2.

Common Errors Involving Correlation
1. Causation: It is wrong to conclude that correlation implies causality.
2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.
3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.
[Figure: plot of Distance (feet), 0 to 250, against Time (seconds), 0 to 8]

Correlation is Not Causation
[Figure: diagram relating A, B, and C]

Correlation Calculations
Rank Order Correlation - Rho
Pearson's - r

Rank Order Correlation

Hits   Rank   HR   Rank   D    D²
1      10     3    8      2    4
2      9      4    7      2    4
3      8      5    6      2    4
4      7      1    10     -3   9
5      6      7    4      2    4
6      5      6    5      0    0
7      4      2    9      -5   25
8      3      10   1      2    4
9      2      9    2      0    0
10     1      8    3      -2   4

N = 10, ΣD² = 58

$$\text{Rho} = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

Rho = 1 - [6(58) / 10(10² - 1)]
Rho = 1 - [348 / 10(100 - 1)]
Rho = 1 - [348 / 990]
Rho = 1 - 0.352
Rho = 0.648

Pearson's r

Hits   HR   xy
1      3    3
2      4    8
3      5    15
4      1    4
5      7    35
6      6    36
7      2    14
8      10   80
9      9    81
10     8    80

Σx/n = 5.5, Σy/n = 5.5, Σxy/n = 356/10 = 35.6
SDx = SDy = 2.872 (computed with n in the denominator, which this form of the formula requires)

r = [Σxy/n - (Σx/n)(Σy/n)] / [(SDx)(SDy)]
r = [35.6 - (5.5)(5.5)] / [(2.872)(2.872)]
r = (35.6 - 30.25) / 8.25
r = 5.35 / 8.25
r = 0.648

Pearson's r Excel Demonstration

Is there a significant linear correlation?
Data from the Garbage Project

x Plastic (lb)   y Household size
0.27             2
1.41             3
2.19             3
2.83             6
2.19             4
1.81             2
0.85             1
3.05             5

[Figure: scatterplot "Plastic Garbage v Household size"; x axis: Plastic (lbs), y axis: Household size; r = 0.842, R² = 0.7096 (about 0.71)]

Is there a significant linear correlation?
n = 8, α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Critical values are r = -0.707 and r = 0.707 (Table R with n = 8 and α = 0.05)

TABLE R Critical Values of the Pearson Correlation Coefficient r

n     α = .05   α = .01
4     .950      .999
5     .878      .959
6     .811      .917
7     .754      .875
8     .707      .834
9     .666      .798
10    .632      .765
11    .602      .735
12    .576      .708
13    .553      .684
14    .532      .661
15    .514      .641
16    .497      .623
17    .482      .606
18    .468      .590
19    .456      .575
20    .444      .561
25    .396      .505
30    .361      .463
35    .335      .430
40    .312      .402
45    .294      .378
50    .279      .361
60    .254      .330
70    .236      .305
80    .220      .286
90    .207      .269
100   .196      .256

Is there a significant linear correlation?
0.842 > 0.707; that is, the test statistic falls within the critical region. Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude that there is a significant linear correlation between the weights of discarded plastic and household size.
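The comparison above can be reproduced with a short Python sketch. It applies the mean-of-cross-products formula from the definition of r to the Garbage Project data and compares |r| with the Table R critical value for n = 8 and α = 0.05. The function name `pearson_r` and the variable names are illustrative assumptions, not part of the course material, and the standard deviations are computed with n in the denominator, which this form of the formula requires.

```python
import math

def pearson_r(x, y):
    """Pearson r via mean of cross products minus product of means (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xy = sum(a * b for a, b in zip(x, y)) / n
    # Population standard deviations (divide by n), as this formula requires.
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

# Garbage Project data from the slides.
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]   # x: plastic (lb)
household = [2, 3, 3, 6, 4, 2, 1, 5]                          # y: household size

r = pearson_r(plastic, household)
critical_r = 0.707  # Table R, n = 8, alpha = 0.05
significant = abs(r) > critical_r

print(f"r = {r:.3f}; reject H0 (rho = 0): {significant}")
```

Rounded to three decimal places, the computed r matches the test statistic of 0.842 used in the test.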
[Figure: number line for r from -1 to 1; reject ρ = 0 below r = -0.707 and above r = 0.707, fail to reject ρ = 0 in between; sample data: r = 0.842]

Formal Hypothesis Test
To determine whether there is a significant linear correlation between two variables
Two methods
Both methods let
H0: ρ = 0 (no significant linear correlation)
H1: ρ ≠ 0 (significant linear correlation)

Method 2: Test Statistic is r (uses fewer calculations)
Test statistic: r
Critical values: Refer to Table R (no degrees of freedom)
[Figure: number line for r from -1 to 1, with critical values from Table A-6; reject ρ = 0 below r = -0.811 and above r = 0.811, fail to reject ρ = 0 in between; sample data: r = 0.828]

Method 1: Test Statistic is t (follows format of earlier chapters)
Test statistic:

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Critical values: use Table T with degrees of freedom = n - 2

Testing for a Linear Correlation
1. Let H0: ρ = 0 and H1: ρ ≠ 0.
2. Select a significance level α.
3. Calculate r using Formula 9-1.
4. Choose a method:
   METHOD 1: The test statistic is t = r / √[(1 - r²) / (n - 2)]. Critical values of t are from Table A-3 with n - 2 degrees of freedom.
   METHOD 2: The test statistic is r. Critical values of r are from Table A-6.
5. If the absolute value of the test statistic exceeds the critical values, reject H0: ρ = 0. Otherwise, fail to reject H0.
6. If H0 is rejected, conclude that there is a significant linear correlation. If you fail to reject H0, then there is not sufficient evidence to conclude that there is a linear correlation.

Why does the critical value of r increase as sample size decreases?
With a small sample, a correlation by chance is more likely.

Coefficient of Determination (Effect Size) r²
The part of the variance of one variable that can be explained by the variance of a related variable.
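As a rough illustration of Method 1 and of the coefficient of determination, the sketch below computes t and r² for the Garbage Project result (r = 0.842, n = 8). The function name `t_statistic` is a hypothetical helper, and the critical value 2.447 is an assumption taken from a standard two-tailed t table (6 degrees of freedom, α = 0.05); it does not appear in the slides.

```python
import math

def t_statistic(r, n):
    """Method 1 test statistic: t = r / sqrt((1 - r^2) / (n - 2)). Hypothetical helper."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

r = 0.842   # sample correlation from the Garbage Project example
n = 8       # number of (x, y) pairs

t = t_statistic(r, n)
r_squared = r ** 2  # coefficient of determination

# Assumption: two-tailed critical t for df = n - 2 = 6 at alpha = 0.05,
# read from a standard t table (not given in the slides).
critical_t = 2.447
reject_h0 = abs(t) > critical_t

print(f"t = {t:.2f}, reject H0: {reject_h0}, r^2 = {r_squared:.3f}")
```

Under these assumptions t comes out at about 3.82, which exceeds the critical value, so Method 1 reaches the same decision as Method 2. The value of r² is about 0.71, meaning roughly 71% of the variance in one variable is explained by the other.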
Justification for r Formula

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$

(x̄, ȳ) is the centroid of the sample points.

[Figure: scatterplot divided into Quadrants 1 to 4 by the lines x = x̄ = 3 and y = ȳ = 11 through the centroid (x̄, ȳ); for the point (7, 23), x - x̄ = 7 - 3 = 4 and y - ȳ = 23 - 11 = 12]
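The deviation-score formula above can be sketched in Python to show how each point contributes to r. Points that lie above and to the right of the centroid, or below and to the left of it, give positive products (x - x̄)(y - ȳ); points in the other two quadrants give negative products. The sketch uses the Hits and HR values from the Pearson's r example earlier in the deck. The function name `pearson_r_deviation` is a hypothetical helper, and s_x and s_y are sample standard deviations (n - 1 in the denominator), as this form of the formula assumes.

```python
import math

def pearson_r_deviation(x, y):
    """r = sum((x - mean_x)(y - mean_y)) / ((n - 1) * s_x * s_y). Hypothetical helper."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # One product of deviations per point, measured from the centroid.
    products = [(a - mean_x) * (b - mean_y) for a, b in zip(x, y)]
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return sum(products) / ((n - 1) * s_x * s_y), products

hits = list(range(1, 11))                      # Hits column from the slides
home_runs = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]    # HR column from the slides

r, products = pearson_r_deviation(hits, home_runs)
positive = sum(1 for p in products if p > 0)   # points pulling r upward
negative = sum(1 for p in products if p < 0)   # points pulling r downward

print(f"r = {r:.3f}; positive products: {positive}, negative products: {negative}")
```

Most of the products are positive, which is why r is positive for these data. Because both columns are untied values from 1 to 10, the result can also be compared with the Rho value computed for the same data.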