Chapter 4 – Describing the Relation between Two Variables
Section 4.1 – Scatter Diagrams and Correlation
Correlation – There is a correlation between two variables when one of them is related to
the other in some way. It is a measure of the strength of an association.
The response variable (y value) is the variable whose value can be explained by the value of
the explanatory or predictor variable (x value).
Scatter Diagram – A graph that shows the relationship between two quantitative variables
measured on the same individual. Each individual in the data set is represented by a point.
The explanatory variable is plotted on the horizontal axis and the response variable is plotted
on the vertical axis.
#17 page 200
#19 page 200
Answers:
17. r = .896 and there is a linear relation
19. r = -.496 and there is not a linear relation
The scatter diagram can help determine if the variables have a linear relationship.
Note the graphs on page 192. Two variables can be linearly related – positively associated or
negatively associated, nonlinearly related, or have no relation.
The scatterplot of the data can help determine if the two variables are positively associated,
negatively associated, or have no association. Note the graphs on page 194.
The linear correlation coefficient (r) or Pearson product moment correlation coefficient is
a measure of the strength and direction of the linear correlation between two quantitative
variables.
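$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\left(\sum x^2\right) - \left(\sum x\right)^2}\,\sqrt{n\left(\sum y^2\right) - \left(\sum y\right)^2}}$$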
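The computational formula for r can be checked by hand on a small data set. The following is a minimal sketch in Python; the function name `pearson_r` and the five data pairs are made up for illustration and do not come from the textbook exercises.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient via the computational formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if all x values (or all y values) are equal;
    # r is undefined in that case and this sketch does not handle it.
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up illustrative data (an assumption, not a textbook exercise).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # prints 0.775
```

Here n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so r = 30 / (√50 · √30) ≈ 0.775.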
Properties of the linear correlation coefficient r:
1. –1 ≤ r ≤ 1
2. The closer r is to +1, the stronger is the evidence of positive association.
3. The closer r is to –1, the stronger is the evidence of negative association.
4. If r is close to 0, then little or no evidence exists of a linear relationship
between the two variables.
5. The value of r is unitless. The units of x and y play no role in the interpretation of r.
6. r is not resistant, i.e., outliers or points that do not follow the pattern will affect the
value of r.
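Properties 5 and 6 can be seen numerically. The sketch below is only an illustration: the height and weight values, the added outlier, and the helper `pearson_r` are all made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient via the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    return (n * sum_xy - sum_x * sum_y) / (
        math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    )

# Made-up data (an assumption): heights in inches, weights in pounds.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [110, 120, 135, 150, 160, 175]
r_original = pearson_r(heights_in, weights_lb)

# Property 5: changing units (inches -> cm, pounds -> kg) leaves r unchanged.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]
r_converted = pearson_r(heights_cm, weights_kg)

# Property 6: one point that does not follow the pattern changes r a lot.
r_with_outlier = pearson_r(heights_in + [74], weights_lb + [100])

print(r_original, r_converted, r_with_outlier)
```

The first two printed values agree up to rounding error, while the third is much smaller because of the single added point.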
Calculator:
To Draw Scatter Diagrams:
1. Enter the explanatory variable in L1 and the response variable in L2.
2. Press 2nd Y= to bring up the StatPlot menu.
3. Press Enter (or Select 1:Plot1).
4. Turn Plot 1 on by highlighting the On button and pressing Enter.
5. Highlight the scatter diagram icon and press Enter. [Xlist is L1, Ylist is L2]
6. Press Zoom and select 9:ZoomStat.
Correlation Coefficient:
1. Turn the diagnostics on by selecting the catalog 2nd 0. Scroll down and
select DiagnosticOn. Hit Enter twice to activate diagnostics.
2. With the explanatory variable in L1 and the response variable in L2, press
Stat, highlight Calc and select 4:LinReg (ax + b). With LinReg on the Home
screen, press Enter.
Hypothesis Test
Claim: There is a linear relationship between the two variables.
Hypotheses: H0: ρ = 0 (The population parameter is the Greek letter rho)
H1: ρ ≠ 0
Level of significance: use α = 0.05
Critical Value: found in Table II in Appendix A
Picture: Scatterplot
Test Statistic: r (from the calculator, LinRegTTest)
P-value: p (from the calculator, LinRegTTest)
Decision: Reject H0 or Do Not Reject H0
Conclusion: There is or is not sufficient evidence to conclude that there is a linear relationship
between the variables.
To use the value of r to determine if the correlation between the two variables is strong
enough to conclude that there is a linear relationship between them (i.e., a significant linear
correlation), find the critical value in Table II in Appendix A for the given number of data
pairs n. If the absolute value of r exceeds the value in Table II, then a linear relationship
exists between the two variables.
|r| > Critical Value ⇒ linear relationship
Assume that 8 pairs of data result in a value of r = 0.939. Is there a linear relationship
between x and y ?
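Table II is not reproduced in these notes, so the sketch below works the example through the equivalent t-test instead. It assumes a two-tailed test at α = 0.05 and takes the t critical value for 6 degrees of freedom (2.447) from a standard t-table; the variable names are made up for illustration. The critical value for r derived here should agree with the Table II entry for n = 8 only if that table is built the same way.

```python
import math

n = 8        # number of data pairs (from the example)
r = 0.939    # sample correlation coefficient (from the example)
df = n - 2   # degrees of freedom for the test of rho = 0

# Assumption: two-tailed t critical value for alpha = 0.05 and df = 6,
# read from a standard t-table rather than computed here.
t_critical = 2.447

# Equivalent critical value for |r|, obtained by solving
# t = r * sqrt(df) / sqrt(1 - r^2) for r.
r_critical = t_critical / math.sqrt(t_critical ** 2 + df)

# Test statistic for the observed r.
t_statistic = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

reject_h0 = abs(r) > r_critical
print(round(r_critical, 3), reject_h0)  # prints 0.707 True
```

Since |r| = 0.939 exceeds the critical value of about 0.707, H0 is rejected and there is sufficient evidence of a linear relationship between x and y.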
Page 199 9, 13, 15, 24, 25
Section 4.2 – Least-Squares Regression
Once the scatter diagram and linear correlation coefficient indicate that a linear relationship
exists between two variables, the next step is to find a linear equation that describes the
relationship between the two variables.
The difference between the observed value of y and the predicted value of y ( ŷ ) is the error
or residual. It is found by using the regression equation to compute ŷ and then subtracting:
residual = y – ŷ. Note: The residual represents how close our prediction comes to the actual
observation. The smaller the residual is in absolute value, the better the prediction.
Regression – A statistical procedure for finding a model (equation or formula) that describes
the relationship between two variables. Regression models can be used for prediction purposes.
The least-squares regression line (usually called the regression line) is the line that
minimizes the sum of the squared residuals. It is also known as the line of best fit.
Equation of the Least-Squares Regression Line:
$$\hat{y} = b_1 x + b_0$$
where
$$b_1 = r \cdot \frac{s_y}{s_x}$$ (slope of the line) and
$$b_0 = \bar{y} - b_1 \bar{x}$$ (y-intercept of the line)
Calculator: LinRegTTest reports the line as ŷ = a + bx; the values of a (the y-intercept, b0)
and b (the slope, b1) will be given.
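The slope and intercept formulas can be tried on a small data set. This is a minimal sketch, not the calculator's method: the function names and the five data pairs are made up, and r is computed from its definition as a sum of products of deviations divided by (n – 1)·sx·sy.

```python
import statistics

def least_squares_line(xs, ys):
    """Return (b1, b0) using b1 = r * sy / sx and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample std. devs.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (
        (n - 1) * s_x * s_y
    )
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b1, b0

def sum_squared_residuals(xs, ys, b1, b0):
    """Sum of (y - y_hat)^2 for the line y_hat = b1 * x + b0."""
    return sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data (an assumption, not a textbook exercise).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b1, b0 = least_squares_line(x, y)
residuals = [yi - (b1 * xi + b0) for xi, yi in zip(x, y)]

# Any other line gives a larger sum of squared residuals.
sse_best = sum_squared_residuals(x, y, b1, b0)
sse_other = sum_squared_residuals(x, y, b1 + 0.1, b0)
print(round(b1, 3), round(b0, 3), sse_best < sse_other)  # prints 0.6 2.2 True
```

For this data the line is ŷ = 0.6x + 2.2. The residuals add up to 0 (up to rounding), which is why the line is chosen by the sum of the squared residuals and not by the plain sum.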
Using Regression Lines to Make Predictions:
When predicting a y-value based on some value of x:
If there is a significant linear correlation between the variables, then the regression
equation can be used to predict ŷ for a value of x.
If there is NOT a significant linear correlation between the variables, then the
regression equation should not be used to predict ŷ . Since there is no other
model, the predicted ŷ in this case is just ȳ (the mean of the y values).
When using the regression equation to predict, stay within the scope of the sample
data.
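The three prediction rules can be written as one small function. This is a sketch under stated assumptions: the function `predict` and the flag `significant` are hypothetical names, the caller is expected to set `significant` from the hypothesis test on r, and the data and line (ŷ = 0.6x + 2.2) are made up for illustration.

```python
def predict(x_new, b1, b0, xs, ys, significant):
    """Predict y for x_new following the rules in these notes.

    significant: True if the test on r found a significant linear
    correlation (decided by the caller; not computed here).
    """
    if not significant:
        # No significant linear correlation: use the mean of the y values.
        return sum(ys) / len(ys)
    if not (min(xs) <= x_new <= max(xs)):
        # Stay within the scope of the sample data.
        raise ValueError("x_new is outside the scope of the sample data")
    return b1 * x_new + b0

# Made-up sample data and its least-squares line (assumptions).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b1, b0 = 0.6, 2.2

inside = predict(3.5, b1, b0, xs, ys, significant=True)    # about 4.3
fallback = predict(3.5, b1, b0, xs, ys, significant=False)  # mean of ys, 4.0
print(round(inside, 2), fallback)
```

Calling `predict(10, b1, b0, xs, ys, significant=True)` raises an error because x = 10 lies outside the sample x-values of 1 through 5.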
Page 215 4, 7, 19