I. Descriptive Statistics (3) Correlational studies
• Relationship between variables
• Scatterplots (Scatter Diagrams)
• Measuring Correlation: The Correlation Coefficient: r

Correlation
Correlational studies: Studies in which two or more variables are measured to find the direction and degree to which they covary.
Covary: Two variables covary when a change in one variable is related to a consistent change in the other variable.

Relationship between variables
Positive relationship: A relationship between two variables in which, as the value of one variable increases, the value of the other variable tends to increase also.
Negative relationship: A relationship between two variables in which, as the value of one variable increases, the value of the other variable tends to decrease.
No relationship: Lack of relationship. No regularity among the pairs of values of the variables.

Scatterplots (Scatter Diagrams)
Bivariate distribution: A distribution in which two scores are obtained from each subject.
Scatterplot: A graph of a bivariate distribution in which the X variable is plotted on the horizontal axis and the Y variable is plotted on the vertical axis.
[Figures: scatterplots showing a positive correlation, a negative correlation, and no correlation]
Linear relationship: A relationship between two variables that can be described by a straight line.
Curvilinear relationship: A relationship between two variables that can be described best with a curved line.

Measuring Correlation
Correlation coefficient: A number between -1 and 1 that describes the relationship between pairs of variables.
Pearson correlation coefficient: A statistic, symbolized by r, that indicates the degree of linear relationship between two variables measured at the interval or ratio level.
$-1 \le r \le 1$

Pearson Correlation Coefficient
$r = \frac{\sum (x - \bar{X})(y - \bar{Y})}{n \, s_x s_y}$
$r = \frac{\sum z_x z_y}{n}$ (z score formula)
$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2} \, \sqrt{n \sum y^2 - (\sum y)^2}}$ (computational formula)
The three forms agree when $s_x$ and $s_y$ (and the z scores built from them) are computed with n in the denominator.

What does r mean?
• r is a type of mean: the mean of the products of paired z scores
• It is based on a measure of covariation: cross products
• The value of r is a measure of how well a straight line describes the cluster of dots in a scatterplot
But keep in mind that a high correlation does not mean that there is a cause-effect relation! Experimentation is needed!

I. Descriptive Statistics (4) Regression
• Regression Line and Predictive Errors
• Least Squares Regression Line
• Least Squares Regression Equation
• Standard Error of prediction
• What is $r^2$?

Regression: Building on Correlation
Prediction (regression) vs. relation (correlation)
(Simple) linear regression: Statistical tool used to predict scores on the dependent variable from scores on (one) independent variable.

Regression Line and Predictive errors
When a bivariate distribution shows a linear relationship, it is sometimes useful to try to predict Y from X using a regression line. This line is conceived as an approximation to the cloud of data observed in the scatterplot.
Equation for a line, for example: y = 2x + 5
Slope: The amount that Y is predicted to increase for an increase of 1 in X.
Y-intercept: The predicted value for Y when X is 0 (the point at which the line intercepts the y-axis).
Predictive error: For each X, the difference between the observed Y and the Y-coordinate of the line at that X (the predicted Y). The position of the regression line should minimize the total predictive error.

Least Squares Regression line
Least squares regression line: the prediction line that minimizes the total squared predictive error.
It has the form: y = mx + n

Least Squares equation
The least squares regression equation minimizes the total of all squared prediction errors for known Y scores in the original correlation analysis.
$\hat{Y} = bX + a$
The slope is: $b = r \frac{S_y}{S_x}$
The Y-intercept is: $a = \bar{Y} - b\bar{X}$
Assumptions: Linearity and Homoscedasticity
Where do the values for a and b come from? And what do they mean? (see pdf notes)

Standard error of prediction
How to determine the amount of error associated with these predictions?
Standard error of prediction (or Standard error of the estimate): A statistic that indicates the typical distance between a regression line and the actual data points.
$S_{y|x} = S_y \sqrt{1 - r^2}$
• It is a (rough) measure of the average amount of predictive error by which known Y values deviate from predicted $\hat{Y}$ values
• It reflects the degree to which the points diverge from the regression line
• It reflects the accuracy of the prediction

What is $r^2$?
The squared correlation coefficient ($r^2$):
• is the proportion of total variance in one variable that is predictable from its relationship with the other variable
• provides a measure of the worth of least squares as predictors
$r^2 = \frac{SS_{total} - SS_{error}}{SS_{total}}$
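Worked example
The formulas on these slides can be checked numerically. What follows is a minimal sketch in Python, not part of the original slides: the five data points are made up for illustration, and the helper names (`mean`, `sd`, `pearson_r_z`, `pearson_r_computational`, `least_squares`) are hypothetical. It assumes standard deviations are computed with n in the denominator, which is what the z score formula for r requires.

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sd(values):
    """Standard deviation with n (not n - 1) in the denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def pearson_r_z(xs, ys):
    """r as the mean of the products of paired z scores."""
    mx, my, sx, sy = mean(xs), mean(ys), sd(xs), sd(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return mean(products)

def pearson_r_computational(xs, ys):
    """r from raw sums (the computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def least_squares(xs, ys):
    """Slope b and intercept a of Y-hat = bX + a, via b = r * Sy / Sx."""
    b = pearson_r_z(xs, ys) * sd(ys) / sd(xs)
    a = mean(ys) - b * mean(xs)
    return b, a

# Made-up illustrative data (an assumption, not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r_z(xs, ys)
b, a = least_squares(xs, ys)

# Predicted values and the two sums of squares used in the r^2 formula.
predicted = [b * x + a for x in xs]
ss_total = sum((y - mean(ys)) ** 2 for y in ys)
ss_error = sum((y - p) ** 2 for y, p in zip(ys, predicted))
r_squared = (ss_total - ss_error) / ss_total

# Standard error of prediction: Sy|x = Sy * sqrt(1 - r^2).
std_error = sd(ys) * math.sqrt(1 - r ** 2)

print("r =", r, "b =", b, "a =", a)
print("r^2 =", r_squared, "standard error =", std_error)
```

Under these assumptions, the z score formula and the computational formula give the same r. The value of r squared obtained from the sums of squares matches the square of r, and the standard error of prediction equals the square root of SS_error / n.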