SPSS Class Notes
Analyzing Data

1.0 Demonstration and explanation

For this section we will be using the hs1.sav data set that we worked with in previous sections.

    File
      Open
        Data
          select C:\spss\hs1.sav

t-tests

This is the one-sample t-test, testing whether the sample of writing scores was drawn from a population with a mean of 50.

    Analyze
      Compare Means
        One Sample t-test
          select write and compare it to 50

This is the two-sample independent t-test. The output includes the test with separate (unequal) variances alongside the equal-variance version.

    Analyze
      Compare Means
        Independent Samples t-test
          select write as the dependent variable and female as the independent variable

This is the paired t-test, testing whether or not the mean of write equals the mean of science.

    Analyze
      Compare Means
        Paired Samples t-test
          select write and science

Anova

In this example the GLM command is used to perform a one-way analysis of variance (ANOVA).

    Analyze
      General Linear Models
        Univariate
          select write as the dependent variable and prog as the fixed factor

In this example the GLM command is used to perform a two-way analysis of variance (ANOVA). The plot option creates plots of the means, which can be a great visual aid to understanding the data.

    Analyze
      General Linear Models
        Univariate
          select write as the dependent variable and prog and ses as fixed factors
          Plots
            select prog to be the X axis and ses to be the separate lines

The Tukey test is used to test all the pair-wise comparisons of the levels of prog.

    Repeat the above analysis (dialog recall)
      Post Hoc
        select prog and choose Tukey test

Here the GLM command performs an analysis of covariance (ANCOVA). Note that the results are exactly the same as in the regression where math is regressed on write and science.

    Analyze
      General Linear Models
        Univariate
          select math as the dependent variable and science and write as covariates
          Model
            select custom
            choose main effect in the build terms field
            select every variable in the Factors & Covariates field and move them to the Model field
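The equivalence between this ANCOVA and the regression can be made explicit. The following is a sketch in standard linear-model notation; the symbols (the betas, the error term, and the case index i) are introduced here for illustration and do not appear in the notes.

```latex
\[
\text{math}_i = \beta_0 + \beta_1\,\text{science}_i + \beta_2\,\text{write}_i + \varepsilon_i,
\qquad i = 1, \dots, n
\]
```

A GLM with no fixed factors and a design containing only the main effects of science and write fits this equation, which is also what the linear regression of math on write and science estimates by ordinary least squares. That is why the two sets of results agree.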

Regression

This is plain old OLS regression.

    Analyze
      Regression
        Linear
          select math as the dependent variable and write and science as independent variables

It is often very useful to look at the plot of standardized residuals versus standardized predicted values in order to look for outliers and to check for homogeneity of variance. The ideal situation is to see no observations beyond the reference lines, which means that there are no outliers. Also, we would like the points on the plot to be distributed randomly, which suggests that the model has captured the systematic variance.

    Analyze
      Regression
        Linear
          select math as the dependent variable and female, write and socst as independent variables
          Plots
            select Zresid for the Y axis and ZPred for the X axis
    Double click on the plot
      Chart
        Reference line
          click on Y and then OK
          add a line at Y = -2.5 and at Y = 2.5

As you can see, there is one outlier. Next, we will create an outlier by changing the writing score for student 1 (id=1) to 100 (write=100), and then repeat the above analysis.

    Repeat the above analysis (dialog recall)
    Double click on the plot
      Chart
        Reference line
          click on Y and then OK
          add a line at Y = -2.5 and at Y = 2.5

Let us change the writing score for student 1 back to 44, and then we will use the save option to create a variable in the data set called res_1, which is the unstandardized residual.

    Repeat the above analysis (dialog recall)
      Save
        check the "unstandardized residual" box

The P-P plots command produces a normal probability plot. It is a method of testing whether the residuals from the regression are normally distributed.

    Graph
      P-P plots
        select res_1 and the test distribution to be "normal"

The Q-Q plots command produces a normal quantile plot. It is another method for testing whether the residuals are normally distributed.
The normal quantile plot is more sensitive to deviations from normality in the tails of the distribution, whereas the normal probability plot is more sensitive to deviations near the mean of the distribution.

    Graph
      Q-Q plots
        select res_1 and the test distribution to be "normal"

Logistic regression

Logistic regression requires a dependent variable that is dichotomous (i.e., has only two values). As we do not have such a variable in our data set, we will create one called honcomp (honors composition). This is for illustrative purposes only!

    Transform
      Compute
        select honcomp for the "target variable" and for numeric expression enter "write >= 60"

    Analyze
      Regression
        Binary Logistic
          select honcomp as the dependent variable, and select read and socst as covariates

Non-parametric tests

The binomial test is the nonparametric analog of the single-sample two-sided t-test.

    Analyze
      Non-Parametric Tests
        Binomial
          select write and define the cut point to be 50

The signrank test is the nonparametric analog of the paired t-test.

    Analyze
      Non-Parametric Tests
        2 Related Samples
          select write and read as the test pair list and select Wilcoxon as the test type

The Mann Whitney U test is the nonparametric analog of the independent two-sample t-test.

    Analyze
      Non-Parametric Tests
        2 Independent Samples
          select write as the test variable list, female as the group variable and select Mann Whitney U as the test type

The Kruskal Wallis test is the nonparametric analog of the one-way ANOVA.

    Analyze
      Non-Parametric Tests
        K Independent Samples
          select write as the test variable list and select prog as the group variable

The density plot type displays a density graph of the residuals. This is useful in verifying that the residuals are normally distributed, which is a very important assumption for regression.

    SPLUS
      Create SPLUS graph...
        select res_1 and move to "selected variables"
        click "Finish"
        select Density(x) Plot as the plot type

2.0 Syntax version

* t-tests.
t-test
  /testval=50
  /variables=write.

t-test groups=female(0 1)
  /variables=write.

t-test pairs= write with science (paired).

* anova.
glm write by prog
  /design = prog.

glm write by prog ses
  /design = prog, ses, prog*ses
  /plot = profile(prog*ses).

glm write by prog ses
  /design = prog, ses, prog*ses
  /posthoc = prog(tukey).

* ancova.
glm math with science write
  /design= science write.

* regression.
regression
  /dependent math
  /method=enter write science.

regression
  /dependent math
  /method=enter socst write ses
  /scatterplot=(*zresid ,*zpred ).

* creating an outlier, running the regression and looking at the outlier.
if id=1 write=100.
exe.

regression
  /dependent math
  /method=enter socst write ses
  /scatterplot=(*zresid ,*zpred ).

* removing the outlier.
if id=1 write=44.
exe.

regression
  /dependent math
  /method=enter socst write ses
  /save resid.

* residual plots.
pplot
  /variables=res_1
  /type=p-p
  /dist=normal.

pplot
  /variables=res_1
  /type=q-q
  /dist=normal.

* creating a dichotomous variable.
compute honcomp = (write >= 60).
execute.

* logistic regression.
logistic regression var=honcomp
  /method=enter read socst.

* non-parametric tests.

* binomial test.
npar test
  /binomial (.50)= write (50).

* sign test.
npar test
  /sign= read with write (paired).

* mann-whitney test.
npar tests
  /m-w= write by female(1 0).

* kruskal-wallis test.
npar tests
  /k-w=write by prog(1 3).
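
Section 1.0 runs the signrank (Wilcoxon) test on write and read through the menus, but the syntax listing has no command with the Wilcoxon subcommand. A minimal sketch of what that command might look like, assuming the standard WILCOXON subcommand of NPAR TESTS and the same variable pair as in the menu steps; it is offered as an illustration, not as part of the original notes.

```spss
* wilcoxon signed-rank test (illustrative sketch, not in the original listing).
npar tests
  /wilcoxon= write with read (paired).
```

The (paired) keyword tells SPSS to pair the first variable before "with" with the first variable after it, which matters only when more than one variable appears on each side.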