SPSS Class Notes
Analyzing Data
1.0 Demonstration and explanation
For this section we will be using the hs1.sav data set that we worked with in previous
sections.

File
Open
Data
select C:\spss\hs1.sav
t-tests
This is the one-sample t-test, testing whether the sample of writing scores was drawn
from a population with a mean of 50.

Analyze
Compare Means
One Sample t-test
select write and compare it to 50
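The statistic behind this test can be worked out by hand. The following is a minimal Python sketch, not SPSS output. The scores are made-up illustrative values rather than the hs1.sav data, and the function name one_sample_t is our own.

```python
import math
import statistics


def one_sample_t(sample, test_value):
    """Return (t, df) for H0: the population mean equals test_value."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)          # standard error of the mean
    return (mean - test_value) / se, n - 1


# Hypothetical writing scores, NOT taken from hs1.sav.
scores = [48, 50, 52, 54, 56]
t, df = one_sample_t(scores, 50)
print(f"t = {t:.3f}, df = {df}")
```

The p-value that SPSS reports comes from comparing t to a t distribution with df degrees of freedom.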
This is the two-sample independent t-test. The output includes results with separate
(unequal) variances as well as with equal variances assumed.

Analyze
Compare Means
Independent Samples t-test
select write as the dependent variable and female as the independent variable
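For the separate (unequal) variances version, the t statistic and its approximate degrees of freedom can be sketched as below. This is an illustration in Python under our own naming (welch_t), using made-up scores rather than the hs1.sav data.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return (t, df) for the unequal-variances (Welch) two-sample t-test."""
    na, nb = len(group_a), len(group_b)
    # Each group's variance divided by its size: its share of the squared SE.
    qa = statistics.variance(group_a) / na
    qb = statistics.variance(group_b) / nb
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(qa + qb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return t, df


# Hypothetical writing scores for the two levels of female, NOT from hs1.sav.
write_female = [48, 50, 52, 54, 56]
write_male = [40, 50, 60]
t, df = welch_t(write_female, write_male)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The degrees of freedom are generally not a whole number, which is why the unequal-variances row of the output shows a fractional df.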
This is the paired t-test, testing whether or not the mean of write equals the mean of
science.

Analyze
Compare Means
Paired Samples t-test
select write and science
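The paired t-test is a one-sample t-test on the pairwise differences, tested against zero. A minimal Python sketch of that idea follows. The scores are hypothetical, not from hs1.sav, and paired_t is our own name.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df) for H0: the mean of the pairwise differences is zero."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # SE of the mean difference
    return statistics.mean(diffs) / se, n - 1


# Hypothetical scores for four students, NOT from hs1.sav.
write = [50, 55, 60, 65]
science = [48, 56, 57, 61]
t, df = paired_t(write, science)
print(f"t = {t:.3f}, df = {df}")
```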
Anova
In this example the GLM command is used to perform a one-way analysis of variance
(ANOVA).

Analyze
General Linear Model
Univariate
select write as the dependent variable and prog as the fixed factor
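The F statistic in a one-way ANOVA is the ratio of the between-groups mean square to the within-groups mean square. The sketch below shows that computation in Python. The three groups stand in for the levels of prog and contain made-up scores, not the hs1.sav data. The function name is our own.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical writing scores for three program types, NOT from hs1.sav.
write_by_prog = [[41, 42, 43], [42, 43, 44], [45, 46, 47]]
f, df1, df2 = one_way_anova_f(write_by_prog)
print(f"F({df1}, {df2}) = {f:.2f}")
```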
In this example the GLM command is used to perform a two-way analysis of variance
(ANOVA). The plot option creates plots of the means, which can be a great visual
aid to understanding the data.

Analyze
General Linear Model
Univariate
select write as the dependent variable and prog and ses as fixed factors
Plots
select prog to be the X axis and ses to be the separate lines
The Tukey test is used to test all the pair-wise comparisons of the levels of prog.

Repeat the above analysis (dialogue recall)
Post Hoc
select prog and choose Tukey test
Here the GLM command performs an analysis of covariance (ANCOVA). Note that
the results are exactly the same as in the regression where math is regressed on
science and write.

Analyze
General Linear Model
Univariate
select math as the dependent variable and science and write as covariates
Model
select custom
choose main effect in the build terms field and select every variable in the Factors & Covariates field and move them to the Model field.
Regression
This is plain old OLS regression.

Analyze
Regression
Linear
select math as the dependent variable and write and science as independent variables
It is often very useful to look at the standardized residual versus standardized
predicted plot in order to look for outliers and to check for homogeneity of
variance. The ideal situation is to see no observations beyond the reference lines,
which means that there are no outliers. Also, we would like the points on the plot to
be distributed randomly, which means that all the systematic variance has been
explained by the model.

Analyze
Regression
Linear
select math as the dependent variable and female, write and socst as independent variables
Plots
select Zresid for the Y axis and ZPred for the X axis
Double click on the plot
Chart
Reference line
click on Y and then OK
add a line at Y = -2.5 and at Y = 2.5
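The reference lines at -2.5 and 2.5 amount to a simple rule: flag any case whose standardized residual is larger than 2.5 in absolute value. The Python sketch below applies that rule. It assumes the standardized residual is the raw residual divided by the root mean squared error of the model. The residuals are made up, not from hs1.sav, and the function names are our own.

```python
import math


def standardized_residuals(residuals, n_predictors):
    """Divide each residual by the root mean squared error of the model."""
    df_error = len(residuals) - n_predictors - 1   # n - p - 1 (intercept included)
    rmse = math.sqrt(sum(e * e for e in residuals) / df_error)
    return [e / rmse for e in residuals]


def flag_outliers(residuals, n_predictors, cutoff=2.5):
    """Return the positions whose standardized residual lies beyond +/-cutoff."""
    z = standardized_residuals(residuals, n_predictors)
    return [i for i, value in enumerate(z) if abs(value) > cutoff]


# Hypothetical residuals from a three-predictor model, NOT from hs1.sav.
residuals = [1, -1] * 12 + [-2, -2, -2, -2, 8]
print(flag_outliers(residuals, n_predictors=3))
```

Positions are counted from zero, so the flagged case here is the last of the 29 residuals.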
As you can see, there is one outlier. Next, we will create an outlier by changing the
writing score for student 1 (id=1) to 100 (write=100), and then repeat the above
analysis.

Repeat the above analysis (dialogue recall)
Double click on the plot
Chart
Reference line
click on Y and then OK
add a line at Y = -2.5 and at Y = 2.5
Let us change the writing score for student 1 back to 44 and then we will use the
save option to create a variable in the data set called res_1, which is the
unstandardized residual.

Repeat the above analysis (dialogue recall)
Save
check the "unstandardized residual" box
The P-P plots command produces a normal probability plot. It is a method of checking
whether the residuals from the regression are normally distributed.

Graph
P-P plots
select res_1 and the test distribution to be "normal"
The Q-Q plots command produces a normal quantile plot. It is another method for checking
whether the residuals are normally distributed. The normal quantile plot is more sensitive to
deviations from normality in the tails of the distribution, whereas the normal
probability plot is more sensitive to deviations near the mean of the distribution.

Graph
Q-Q plots
select res_1 and the test distribution to be "normal"
Logistic regression
Logistic regression requires a dependent variable that is dichotomous (i.e., has only
two values). As we do not have such a variable in our data set, we will create one
called honcomp (honors composition). This is for illustrative purposes only!

Transform
Compute
select honcomp for the "target variable" and for numeric expression enter "write >= 60".
Analyze
Regression
Binary Logistic
select honcomp as the dependent variable, and select read and socst as covariates
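The Compute step above turns a logical expression into a 0/1 variable: the expression evaluates to 1 when it is true and 0 when it is false. A small Python sketch of the same dichotomization follows. The scores are made up, not from hs1.sav, and honors_composition is our own name.

```python
def honors_composition(write_scores, cutoff=60):
    """Return 1 where the writing score is at least cutoff, otherwise 0."""
    return [1 if score >= cutoff else 0 for score in write_scores]


# Hypothetical writing scores, NOT from hs1.sav.
write = [44, 59, 60, 61, 67]
honcomp = honors_composition(write)
print(honcomp)
```

Note that a score of exactly 60 falls in the honors group because the comparison is "greater than or equal to".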
Non-parametric tests
The binomial test is the nonparametric analog of the single-sample two-sided t-test.

Analyze
Non-Parametric Tests
Binomial
select write and define the cut point to be 50
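With a cut point, the scores are split into two groups and the test asks whether the proportion in each group is .50. The sketch below computes an exact two-sided p-value for that question in Python. It assumes that scores less than or equal to the cut point form the first group, and it uses the exact binomial distribution throughout, so the p-value SPSS prints may differ if it uses an approximation. The scores are made up, not from hs1.sav.

```python
import math


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n trials under p = 0.5."""
    def cdf(m):
        # P(X <= m) for X ~ Binomial(n, 0.5)
        return sum(math.comb(n, i) for i in range(m + 1)) / 2 ** n

    lower = cdf(k)            # P(X <= k)
    upper = 1 - cdf(k - 1)    # P(X >= k)
    return min(1.0, 2 * min(lower, upper))


# Hypothetical writing scores, NOT from hs1.sav.
scores = [45, 50, 52, 55, 58, 60, 61, 63]
cut_point = 50
at_or_below = sum(1 for s in scores if s <= cut_point)
p_value = binomial_two_sided_p(at_or_below, len(scores))
print(f"{at_or_below} of {len(scores)} at or below {cut_point}, p = {p_value:.4f}")
```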
The Wilcoxon signed-rank (signrank) test is the nonparametric analog of the paired t-test.

Analyze
Non-Parametric Tests
2 Related Samples
select write and read as the test pair list and select Wilcoxon as the test type
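The signed-rank test ranks the absolute pairwise differences and then sums the ranks separately for positive and negative differences. The Python sketch below shows those rank sums only, without the p-value. It assumes zero differences are dropped and tied differences get the average of their ranks. The scores are made up, not from hs1.sav, and the function names are our own.

```python
def average_ranks(values):
    """Map each distinct value to its rank, averaging the ranks of ties."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


def wilcoxon_signed_rank(first, second):
    """Return (W_plus, W_minus): rank sums of positive and negative differences."""
    diffs = [a - b for a, b in zip(first, second) if a != b]   # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    w_minus = sum(ranks[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus


# Hypothetical scores for five students, NOT from hs1.sav.
write = [50, 55, 60, 65, 70]
read = [48, 56, 57, 61, 70]
print(wilcoxon_signed_rank(write, read))
```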
The Mann Whitney U test is the nonparametric analog of the independent two-sample t-test.

Analyze
Non-Parametric Tests
2 Independent Samples
select write as the test variable list, female as the group variable and select Mann Whitney U as the test type
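The U statistic is built from the ranks of the pooled scores rather than from the scores themselves. The Python sketch below computes U for each group, giving tied scores the average of their ranks. It does not compute the p-value. The scores are made up, not from hs1.sav, and the function names are our own.

```python
def average_ranks(values):
    """Map each distinct value to its rank, averaging the ranks of ties."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b); the test statistic is usually the smaller of the two."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[x] for x in group_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


# Hypothetical writing scores for the two levels of female, NOT from hs1.sav.
write_group_0 = [44, 52, 52]
write_group_1 = [52, 59]
print(mann_whitney_u(write_group_0, write_group_1))
```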
The Kruskal Wallis test is the nonparametric analog of the one-way ANOVA.

Analyze
Non-Parametric Tests
K Independent Samples
select write as the test variable list and select prog as the group variable
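The Kruskal Wallis statistic H compares the rank sums of the groups after all scores are ranked together. The Python sketch below computes H without a correction for ties, so it can differ from a tie-corrected value when scores repeat. The groups stand in for the levels of prog and hold made-up scores, not the hs1.sav data. The function names are our own.

```python
def average_ranks(values):
    """Map each distinct value to its rank, averaging the ranks of ties."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


def kruskal_wallis_h(groups):
    """Return the Kruskal Wallis H statistic (no tie correction)."""
    ranks = average_ranks([x for g in groups for x in g])
    n_total = sum(len(g) for g in groups)
    # Sum over groups of (rank sum squared) / (group size).
    weighted = sum(sum(ranks[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical writing scores for three program types, NOT from hs1.sav.
write_by_prog = [[41, 42, 43], [44, 45, 46], [47, 48, 49]]
h = kruskal_wallis_h(write_by_prog)
print(f"H = {h:.2f} with {len(write_by_prog) - 1} degrees of freedom")
```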
The density plot type displays a density graph of the residuals. This is useful in
verifying that the residuals are normally distributed, which is a very important
assumption for regression.

SPLUS
Create SPLUS graph...
select res_1 and move to "selected variables"
click "Finish"
select Density(x) Plot as the plot type
2.0 Syntax version
* t-tests.
t-test
/testval=50
/variables=write.
t-test
groups=female(0 1)
/variables=write.
t-test
pairs= write with science (paired).
* anova.
glm
write by prog
/design = prog.
glm
write by prog ses
/design = prog, ses, prog*ses
/plot = profile(prog*ses).
glm
write by prog ses
/design = prog, ses, prog*ses
/posthoc = prog(tukey).
* ancova.
glm
math with science write
/design= science write.
* regression.
regression
/dependent math
/method=enter write science.
regression
/dependent math
/method=enter socst write ses
/scatterplot=(*zresid ,*zpred ).
* creating an outlier, running the regression and looking at the
outlier.
if id=1 write=100.
exe.
regression
/dependent math
/method=enter socst write ses
/scatterplot=(*zresid ,*zpred ).
* removing the outlier.
if id=1 write=44.
exe.
regression
/dependent math
/method=enter socst write ses
/save resid.
*residual plots.
pplot
/variables=res_1
/type=p-p
/dist=normal.
pplot
/variables=res_1
/type=q-q
/dist=normal.
* creating a dichotomous variable.
compute honcomp = (write >= 60).
execute.
* logistic regression.
logistic regression var=honcomp
/method=enter read socst.
* non-parametric tests.
* binomial test.
npar tests
/binomial (.50)= write (50).
* sign test.
npar tests
/sign= read with write (paired).
* signrank test.
npar tests
/wilcoxon= write with read (paired).
* mann-whitney test.
npar tests
/m-w= write by female(1 0).
* kruskal-wallis test.
npar tests
/k-w=write by prog(1 3).