Chapter 4 – Describing the Relation between Two Variables

Section 4.1 – Scatter Diagrams and Correlation

Correlation – There is a correlation between two variables when one of them is related to the other in some way. Correlation is a measure of the strength of that association.

The response variable (y value) is the variable whose value can be explained by the value of the explanatory or predictor variable (x value).

Scatter Diagram – A graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point. The explanatory variable is plotted on the horizontal axis and the response variable is plotted on the vertical axis.

#17 page 200
#19 page 200

Answers:
17. r = 0.896 and there is a linear relation
19. r = -0.496 and there is not a linear relation

The scatter diagram can help determine if the variables have a linear relationship. Note the graphs on page 192. Two variables can be linearly related (positively associated or negatively associated), nonlinearly related, or have no relation.

The scatterplot of the data can help determine if the two variables are positively associated, negatively associated, or have no association. Note the graphs on page 194.

The linear correlation coefficient (r), or Pearson product moment correlation coefficient, is a measure of the strength and direction of the linear correlation between two quantitative variables.

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}} $$

Properties of the linear correlation coefficient r:
1. -1 ≤ r ≤ 1
2. The closer r is to +1, the stronger the evidence of positive association.
3. The closer r is to -1, the stronger the evidence of negative association.
4. If r is close to 0, then little or no evidence exists of a linear relationship between the two variables.
5. The value of r is unitless. The units of x and y play no role in the interpretation of r.
6. r is not resistant, i.e., outliers or points that do not follow the pattern will affect the value of r.

Calculator:

To Draw Scatter Diagrams:
1. Enter the explanatory variable in L1 and the response variable in L2.
2. Press 2nd Y= to bring up the StatPlot menu.
3. Press Enter (or select 1:Plot1).
4. Turn Plot 1 on by highlighting the On button and pressing Enter.
5. Highlight the scatter diagram icon and press Enter. [Xlist is L1, Ylist is L2]
6. Press Zoom and select 9:ZoomStat.

Correlation Coefficient:
1. Turn the diagnostics on by opening the catalog (2nd 0). Scroll down and select DiagnosticOn. Hit Enter twice to activate diagnostics.
2. With the explanatory variable in L1 and the response variable in L2, press Stat, highlight Calc and select 4:LinReg (ax + b). With LinReg on the Home screen, press Enter.

Hypothesis Test

Claim: There is a linear relationship between the two variables.
Hypothesis: H0: ρ = 0 (the population parameter is the Greek letter rho)
            H1: ρ ≠ 0
Level of significance: use 0.05
Critical Value: found in Table II in Appendix A
Picture: Scatterplot
Test Statistic: r (from the calculator, LinRegTTest)
P-value: p (from the calculator, LinRegTTest)
Decision: Reject H0 or Do Not Reject H0
Conclusion: There is or is not sufficient evidence to conclude that there is a linear relationship between the variables.

To use the value of r to determine whether the correlation between the two variables is strong enough to conclude that there is a linear relationship between them (i.e., a significant linear correlation), find the critical value in Table II in Appendix A. If the absolute value of r exceeds the value in Table II, then a linear relationship exists between the two variables.

|r| > Critical Value ⇒ linear relationship

Assume that 8 pairs of data result in a value of r = 0.939. Is there a linear relationship between x and y?
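The formula for r and the critical-value comparison can be checked by hand with a short script. The following is a minimal sketch, not part of the original notes: the five data pairs are hypothetical, the function names `pearson_r` and `critical_r` are made up for illustration, and it assumes Table II lists the usual two-tailed 0.05 critical values, which can be recovered from the t critical value with n - 2 degrees of freedom (2.447 for 6 degrees of freedom).

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient using the computational formula from the notes."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def critical_r(t_crit, n):
    """Turn a two-tailed t critical value (df = n - 2) into a cutoff for |r|."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))

# Hypothetical data, for illustration only (not from the textbook).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4))   # 0.7746

# The example from the notes: 8 pairs of data with r = 0.939.
# Assumption: two-tailed test at the 0.05 level, t critical value 2.447 for df = 6.
T_CRIT_DF6 = 2.447
cutoff = critical_r(T_CRIT_DF6, n=8)
print(round(cutoff, 3))              # 0.707
print(abs(0.939) > cutoff)           # True
```

Under these assumptions |r| = 0.939 exceeds the cutoff of about 0.707, so the data would support a linear relationship between x and y. The value actually printed in Table II should still be the one used in class.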
Page 199: 9, 13, 15, 24, 25

Section 4.2 – Least-Squares Regression

Once the scatter diagram and linear correlation coefficient indicate that a linear relationship exists between two variables, the next step is to find a linear equation that describes the relationship between the two variables.

The difference between the observed value of y and the predicted value of y (ŷ) is the error or residual. The predicted value ŷ is found using the regression equation.

Residual = y – ŷ

Note: The residual represents how close our prediction comes to the actual observation. The smaller the residual is in absolute value, the better the prediction.

Regression – A statistical procedure for finding a model (equation or formula) that describes the relationship between the variables. Regression models might be used for prediction purposes.

The least-squares regression line (usually called the regression line) is the line that minimizes the sum of the squared residuals. It is also known as the line of best fit.

Equation of the Least-Squares Regression Line:

$$ \hat{y} = b_1 x + b_0 $$

where

$$ b_1 = r \cdot \frac{s_y}{s_x} \quad \text{(slope of the line)} $$

and

$$ b_0 = \bar{y} - b_1 \bar{x} \quad \text{(y-intercept of the line)} $$

Calculator: LinRegTTest reports the line in the form ŷ = a + bx; the values of a and b will be given (a is the y-intercept b0 and b is the slope b1).

Using Regression Lines to Make Predictions:

When predicting a y-value based on some value of x:
- If there is a significant linear correlation between the variables, then the regression equation can be used to predict ŷ for a value of x.
- If there is NOT a significant linear correlation between the variables, then the regression equation should not be used to predict ŷ. Since there are no other models, the predicted ŷ in this case is just ȳ (the mean of the y values).
- When using the regression equation to predict, stay within the scope of the sample data.

Page 215: 4, 7, 19
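The slope and intercept formulas for the regression line can be worked through in a few lines of code. This is a minimal sketch rather than the textbook's procedure: the data pairs are hypothetical, the helper names `least_squares_line` and `sum_squared_residuals` are invented for illustration, and s_x and s_y are taken to be sample standard deviations.

```python
import statistics

def least_squares_line(xs, ys):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations
    # r written in its definitional form: average product of z-scores (dividing by n - 1)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x          # slope
    b0 = y_bar - b1 * x_bar     # y-intercept
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of (y - y_hat)^2 for the line y_hat = b1 * x + b0."""
    return sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only (not from the textbook).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(xs, ys)
print(round(b1, 4), round(b0, 4))                # 0.6 2.2

residuals = [y - (b1 * x + b0) for x, y in zip(xs, ys)]
print([round(e, 2) for e in residuals])          # [-0.8, 0.6, 1.0, -0.6, -0.2]

# The fitted line has a smaller sum of squared residuals than a nearby line.
print(round(sum_squared_residuals(xs, ys, b0, b1), 4))        # 2.4
print(sum_squared_residuals(xs, ys, b0, b1 + 0.1)
      > sum_squared_residuals(xs, ys, b0, b1))                # True

# Predict only within the scope of the sample data (x between 1 and 5 here).
print(round(b1 * 3.5 + b0, 2))                   # 4.3
```

The residuals add up to zero for this line, which is why the sum of the squared residuals, not the plain sum, is the quantity being minimized.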