Happiness comes not from material wealth but from less desire.

Applied Statistics Using SAS and SPSS
Topic: Simple linear regression
By Prof Kelly Fan, Cal State Univ, East Bay

Example: Computer Repair
A company markets and repairs small computers. How fast (Time) an electronic component (Computer Unit) can be repaired is very important to the efficiency of the company. The variables in this example are Time and Units.

Hmm… How long will it take me to repair this unit?
Goal: to predict the length of repair Time for a given number of computer Units.

Computer Repair Data
Units  Minutes    Units  Minutes
    1       23        6       97
    2       29        7      109
    3       49        8      119
    4       64        9      149
    4       74        9      145
    5       87       10      154
    6       96       10      166

Graphical Summary of Two Quantitative Variables
Scatterplot of the response variable against the explanatory variable:
What is the overall (average) pattern?
What is the direction of the pattern?
How much do data points vary from the overall (average) pattern?
Are there any potential outliers?

Summary for Computer Repair Data
Scatterplot (Time vs Units)
Some Simple Conclusions:
Time is linearly related to computer Units.
(The length of) Time increases as (the number of) Units increases.
Data points are close to the line.
No potential outliers.

Numerical Summary of Two Quantitative Variables
Regression equation
Correlation

Review: Math Equation for a Line
Y: the response variable
X: the explanatory variable
$Y = b_0 + b_1 X$
[Figure: a line on X-Y axes, with $b_0$ marked as the intercept and $b_1$ as the change in Y per 1 unit of X]

Regression Equation
The regression line models the relationship between X and Y on average. The math equation of a regression line is called the regression equation.

The Usage of the Regression Equation
Predict the value of Y for a given X value.
E.g., how long will it take to repair 3 computer units?

General Notation
$\hat{Y} = b_0 + b_1 X$ is called "predicted Y," pronounced "y hat," and estimates the average Y value for a specified X value. E.g.:
$\hat{Y} = 4.16 + 15.51 X$: the predicted repair time for a given number of units.

The Limitation of the Regression Equation
The regression equation cannot be used to predict the Y value for X values that are (far) beyond the range in which the data were observed.
E.g., the predicted weight (WT) for a given height (HT): $\hat{Y} = -205 + 5X$. Given an HT of 40", the regression equation gives a WT of $-205 + 5 \times 40 = -5$ pounds!

The Unpredicted Part
The value $Y - \hat{Y}$ is the part the regression equation (model) cannot predict, and it is called the "residual."
[Figure: the residual marked on the scatterplot]

Correlation between X and Y
X and Y might be related to each other in many ways: linear or curved.

Examples of Different Levels of Correlation
[Figure: four scatterplots of y against x]
r = .98: strong linearity
r = .71: medium linearity
r = -.09: nearly uncorrelated
r = .00: nearly curved

(Pearson) Correlation Coefficient of X and Y
A measure of the strength of the LINEAR association between X and Y.
With $s_x$ the standard deviation of the data values in X and $s_y$ the standard deviation of the data values in Y, the correlation coefficient of X and Y is:
$$r = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{(n-1)\, s_y s_x}$$

Correlation Coefficient of X and Y
$-1 \le r \le 1$
The magnitude of r measures the strength of the linear association between X and Y.
The sign of r indicates the direction of the association: "-" negative association, "+" positive association.

Goodness of Fit
$R^2$ is the proportion of the variance in Y explained (accounted for) by the model we use to fit the data.
When there is only one X (simple linear regression), $R^2 = r^2$.

SPSS Output
Analyze >> Regression >> Linear

Model Summary (b)
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .994 (a)  .987      .986               5.39172
a. Predictors: (Constant), units
b.
Dependent Variable: time

ANOVA (b)
Model 1     Sum of Squares  df  Mean Square  F        Sig.
Regression  27419.509        1  27419.509    943.201  .000 (a)
Residual      348.848       12     29.071
Total       27768.357       13
a. Predictors: (Constant), units
b. Dependent Variable: time

Confidence Intervals

Coefficients (a)
Model 1     B       Std. Error  Beta  t       Sig.  95% CI Lower  95% CI Upper
(Constant)   4.162  3.355             1.240   .239  -3.148        11.472
units       15.509   .505       .994  30.712  .000  14.409        16.609
B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; the confidence interval is for B.
a. Dependent Variable: time

Check for Normality

Check for Equal Variances
Scatterplot of zresid & zpred

The Influence of Outliers
The slope becomes smaller (toward outliers).
The r value becomes smaller (less linear).
[Figure: scatterplot of Y3 vs X3]

The Influence of Outliers
The slope becomes clear (toward outliers).
The |r| value becomes larger (more linear: 0.159 → 0.935).
[Figure: scatterplot of y vs x]

Identify Outliers using Residual Plots
Use "standardized" residuals!
Cases with standardized residuals of size 3 or more are outliers.
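To make the numbers in the SPSS output easier to trace, here is a minimal Python sketch that refits the computer repair data by hand using the least-squares formulas for the slope, intercept, and Pearson correlation. It is not part of the original slides. The names `fit_line`, `data`, and `flagged` are made up for illustration, and the standardization used here (residual divided by the standard error of the estimate) is an assumption about what "standardized" means in the deck.

```python
import math

# Computer repair data from the slides: (units, minutes)
data = [
    (1, 23), (2, 29), (3, 49), (4, 64), (4, 74), (5, 87), (6, 96),
    (6, 97), (7, 109), (8, 119), (9, 149), (9, 145), (10, 154), (10, 166),
]


def fit_line(pairs):
    """Return (b0, b1, r) for the least-squares line Y-hat = b0 + b1 * X."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    b1 = sxy / sxx                   # slope
    b0 = y_bar - b1 * x_bar          # intercept
    r = sxy / math.sqrt(sxx * syy)   # same as sum / ((n - 1) * s_y * s_x)
    return b0, b1, r


b0, b1, r = fit_line(data)
r_squared = r ** 2  # equals R^2 when there is only one X

# Residuals: the part of Y the regression equation cannot predict.
residuals = [y - (b0 + b1 * x) for x, y in data]

# Standard error of the estimate: sqrt(SSE / (n - 2)).
n = len(data)
std_error = math.sqrt(sum(e * e for e in residuals) / (n - 2))

# Assumed standardization: residual divided by the standard error of the estimate.
standardized = [e / std_error for e in residuals]
flagged = [pair for pair, z in zip(data, standardized) if abs(z) >= 3]

# Predicted repair time for 3 units (inside the observed range of 1 to 10 units).
predicted_3 = b0 + b1 * 3

print(f"Y-hat = {b0:.3f} + {b1:.3f} X")
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}, std. error = {std_error:.5f}")
print(f"predicted minutes for 3 units: {predicted_3:.2f}")
print(f"cases with |standardized residual| >= 3: {flagged}")
```

Run on this data, the sketch reproduces the fitted line reported in the SPSS output and predicts a repair time of about 50.7 minutes for 3 units. No case reaches a standardized residual of 3, in line with the earlier conclusion that there are no potential outliers.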