Conducting and interpreting
multivariate analyses
using SPSS and Excel
David Patterson, College of Social Work
The University of Tennessee
September 18-19, 2006 – Denver, Colorado
Sponsored by the U.S. Department of Housing and Urban Development
Regression Analysis of HMIS Data
• What is regression analysis?
– AKA - Linear regression, Ordinary Least Squares (OLS)
– Bivariate regression
• Measures the association or relationship between a
dependent variable (DV) and an independent variable (IV).
• Estimates the change in the DV for each one-unit
change in an IV.
– Multiple regression
• Measures the relationship between a single DV and two or
more IVs.
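As a minimal sketch of the bivariate case, the least-squares slope is the estimated change in the DV for each one-unit change in the IV. The function name and the data values below are made up for illustration and are not HMIS data.

```python
def fit_bivariate(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical values: age (IV) and days of homelessness (DV).
age = [20, 30, 40, 50, 60]
days = [110, 130, 150, 170, 190]
intercept, slope = fit_bivariate(age, days)
print(intercept, slope)
```

In this made-up data each additional year of age goes with two additional days, so the slope is 2.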
• What can regression analysis tell us about HMIS
data?
– It can help us understand possible causal relationships
between certain outcomes (DV) and possible causal
factors (IV).
• E.g., length of stay prior to housing (DV) and age (IV1),
duration of homelessness (IV2), and current income (IV3).
• Stated another way, how is length of stay prior to housing
predicted independently by each IV and through their
combined influence?
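A rough sketch of multiple regression with three IVs follows. It solves the OLS normal equations directly, which is one of several ways to do it and not necessarily what SPSS or Excel do internally. Each coefficient is the estimated change in the DV per one-unit change in that IV with the other IVs held constant. All names and numbers are made up for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple(rows, y):
    """rows: one list of IV values per case. Returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical cases: (IV1, IV2, IV3), with a DV built as 10 + 2*IV1 - 1*IV2 + 0.5*IV3.
rows = [(1, 0, 0), (0, 1, 0), (0, 0, 2), (1, 1, 1), (2, 1, 0), (3, 2, 4)]
dv = [12, 9, 11, 11.5, 13, 16]
coefficients = fit_multiple(rows, dv)
print(coefficients)
```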
• Challenges of regression analysis with HMIS data
– Level of measurement
• Many HMIS variables are NOT continuous, as is required
for the DV in multiple regression.
• Most are nominal, e.g., race, zip code, gender, disability
status.
– High levels of missing data in many variables
• Common in social services data sets
• Requires careful evaluation of extent and pattern of
missing data.
• Requires selection and implementation of a missing data procedure
• Added complexity with nominal level data
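One way to evaluate the extent and pattern of missing data is sketched below. It assumes records arrive as dictionaries and that a missing value is stored as None or an empty string; both are assumptions, and the field names are hypothetical.

```python
from collections import Counter


def is_missing(value):
    # Assumption: missing values are None or empty strings. A value of 0 is valid data.
    return value is None or value == ""


def missing_extent(records, fields):
    """Fraction of records with a missing value, per field."""
    total = len(records)
    return {f: sum(is_missing(r.get(f)) for r in records) / total for f in fields}


def missing_patterns(records, fields):
    """Count how often each combination of missing fields occurs."""
    return Counter(tuple(f for f in fields if is_missing(r.get(f))) for r in records)


# Hypothetical records, not HMIS data.
records = [
    {"age": 34, "income": 500, "gender": "F"},
    {"age": None, "income": 0, "gender": "M"},
    {"age": 51, "income": None, "gender": ""},
    {"age": None, "income": None, "gender": "M"},
]
fields = ["age", "income", "gender"]
extent = missing_extent(records, fields)
patterns = missing_patterns(records, fields)
print(extent)
print(patterns)
```

The extent shows how much is missing in each variable. The pattern counts show whether the same cases tend to be missing several variables at once.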
• Assumptions of regression analysis
– Normality - scores or observations obtained would be
normally distributed in the population of interest. Assumed if
sampling is random or includes random assignment. Generally
not the case in HMIS data. (Explore with a frequency
distribution.)
• Note - Age is not quite normally distributed in this graph.
[Figure: histogram of AGE (x-axis: AGE, y-axis: Frequency); Std. Dev = 11.28, Mean = 38.9, N = 626]
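A frequency distribution can also be tabulated directly, as in the sketch below. The bin start and width are arbitrary choices for illustration, and the skewness function uses the simple population formula, which may differ slightly from the adjusted value a statistics package reports. The ages are made up.

```python
from collections import Counter


def frequency_distribution(values, start, width):
    """Count values per bin [start + k*width, start + (k+1)*width)."""
    counts = Counter(int((v - start) // width) for v in values)
    return {start + k * width: counts[k] for k in sorted(counts)}


def skewness(values):
    """Population skewness: mean cubed deviation divided by the SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical ages, not the AGE variable from the slides.
ages = [18, 22, 25, 31, 33, 36, 38, 41, 44, 47, 52, 68]
for bin_start, count in frequency_distribution(ages, start=15, width=5).items():
    print(bin_start, "#" * count)
print(skewness(ages))
```

A skewness near zero is consistent with a symmetric distribution. A positive value indicates a longer right tail.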
• Equality of Variances - Homoscedasticity
– Points in the scatterplot of the residuals (difference between
the observed and predicted values) are randomly distributed
about a horizontal line from the mean of the residuals.
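Alongside the scatterplot, a crude numeric check is to compare the variance of the residuals for low and high predicted values. This is only a rough sketch, not a formal test, and the function name is hypothetical. A ratio far from 1 hints at unequal variances.

```python
def residual_spread_ratio(predicted, residuals):
    """Variance of residuals in the upper half of predicted values
    divided by the variance in the lower half."""
    pairs = sorted(zip(predicted, residuals))  # order cases by predicted value
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]

    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    # Note: raises ZeroDivisionError if the lower-half residuals are all identical.
    return variance(upper) / variance(lower)


# Hypothetical predicted values and residuals.
even_spread = residual_spread_ratio([1, 2, 3, 4], [1, -1, 1, -1])
growing_spread = residual_spread_ratio([1, 2, 3, 4], [1, -1, 2, -2])
print(even_spread, growing_spread)
```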
• Independence of Observations
– Scores or observations are independent of each other.
Independence means that the observations or values are
independently derived and that one event or value will
not depend on another event or value.
– Check with the Durbin-Watson statistic; a value between 1.5 and 2.5
is treated as acceptable.
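The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, with values near 2 suggesting no first-order autocorrelation. A minimal sketch, using made-up residuals in case order:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in case order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals.
alternating = durbin_watson([1, -1, 1, -1])  # sign flips every case
clustered = durbin_watson([1, 1, -1, -1])    # neighbors tend to be alike
print(alternating, clustered)
```

Residuals that flip sign every case push the statistic toward 4. Residuals that cluster with their neighbors push it toward 0.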
• Linearity - a linear relationship exists between
variables. Evaluate with a scatterplot of DV and IV.
• Two Methods Compared
• Exploratory Research Question
– Is there a relationship between duration of homelessness
(days) and age, years of education, and weight?
• Method
– Regression analysis with SPSS and Excel using two data sets.
• Intention
– Demonstrate the utility of these two tools in regression
analysis with HMIS data.
– Demonstrate the challenges of regression analysis with HMIS
data.
• Steps of SPSS Regression - N = 1550
1. Report downloaded from HMIS data system.
2. Data cleaned and file prepared in Excel.
3. Excel file opened in SPSS.
• Opening linear regression dialog window
• Linear regression dialog window for stats
• Linear regression dialog window for plots
Mean = average value for each variable.
Standard deviation = measures the dispersion of values from the mean.
Together they describe the center and spread of the distribution for each variable.
Correlations measure the strength of the relationship between two variables.
Correlation values range between 1.0 and -1.0.
The closer to zero, the weaker the correlation.
Note the weak correlations between the DV and each of the IVs.
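The descriptive statistics and correlations above can be computed by hand, as in this minimal sketch. The standard deviation uses the n - 1 (sample) denominator, which is assumed here to match what statistical packages typically report. The numbers are made up.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Standard deviation with the n - 1 denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(x, y):
    """Pearson correlation coefficient, between -1.0 and 1.0."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values.
print(mean([2, 4, 4, 4, 5, 5, 7, 9]))
print(sample_sd([2, 4, 4, 4, 5, 5, 7, 9]))
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
```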
• Results - While the results may be significant, are there problems
with the model?
Note the R Square value.
R Square indicates the proportion of variation in the DV explained by the IVs.
In this model, an R Square of .031 means that the 3 IVs account for very
little of the variance in the DV.
The fact that the model is statistically significant
may be due to the large N (1550).
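The effect of sample size can be seen from the standard relationship between R Square and the model F statistic, F = (R²/k) / ((1 - R²)/(n - k - 1)), where k is the number of IVs. The sketch below assumes all 1550 cases entered the model (no cases dropped for missing data) and compares the result with a hypothetical sample of 100 that has the same R Square.

```python
def f_from_r_squared(r_squared, n, k):
    """Model F statistic from R Square, sample size n, and number of IVs k."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


# R Square of .031 with 3 IVs, as reported on the slide. N = 1550 is assumed
# to be the number of cases actually used in the model.
f_large = f_from_r_squared(0.031, n=1550, k=3)

# Same R Square with a hypothetical sample of 100 cases.
f_small = f_from_r_squared(0.031, n=100, k=3)

print(round(f_large, 2), round(f_small, 2))
```

With 3 numerator degrees of freedom, the critical F at the .05 level is roughly 2.6 to 2.7 for samples of these sizes. The same small R Square therefore clears the threshold easily at N = 1550 and falls well short of it at N = 100.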
• What do the plots tell us?
Distribution of the DV is highly skewed.
Departure from the straight line indicates the data are not normally
distributed.
• Second SPSS regression analysis with smaller sample.
– N = 626
– (Constrained data set limiting homelessness to > 1 month and < 1
year.)
Is the distribution normal or skewed?
The F-stat, used in regression to test the significance of the model,
is quite robust to violations of the assumption of normality.
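Constraining the data set amounts to a simple filter on the DV. In the sketch below, the cutoffs of 30 and 365 days are assumptions standing in for "1 month" and "1 year"; the slides do not give the exact day counts. The field name is hypothetical.

```python
def constrain_duration(records, min_days=30, max_days=365):
    """Keep records with duration strictly between min_days and max_days.

    The default cutoffs are assumed values for "> 1 month" and "< 1 year".
    Records with a missing duration are dropped.
    """
    return [
        r for r in records
        if r["days_homeless"] is not None
        and min_days < r["days_homeless"] < max_days
    ]


# Hypothetical records.
records = [{"days_homeless": d} for d in (10, 30, 31, 200, 365, 400, None)]
kept = constrain_duration(records)
print([r["days_homeless"] for r in kept])
```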
• Results
Note the weak correlations between the DV and each of the IVs.
Note there is no statistically significant bivariate correlation
between the DV and any of the IVs.
Note the regression model is not significant.
The results suggest that, for this sample, age, years of
education, and weight cannot be used to predict
duration (days) of homelessness.
• What do the plots tell us?
In this data set (N = 626), the distribution is much less skewed
than in the N = 1550 data set.
The data are more normally distributed than in the N = 1550 data set.
• Steps of Excel regression analysis
1. Report downloaded from HMIS data system.
2. Data cleaned and file prepared in Excel.
• Excel produced histograms to examine the shape of the
distributions.
• Steps of Excel regression analysis
– Can use the Chart Wizard to produce scatterplots to examine the
bivariate correlation between the DV and the IV of the model.
There is a weak correlation between
the variables.
The variables are not correlated.
• Steps of Excel regression analysis
1. Select Data Analysis under Tools in the menu bar.
2. If Data Analysis does not appear, then select Add-ins and check
the Analysis ToolPak.
• Steps of Excel regression analysis
Specify the input range for the Y (DV) and the X (IV) variables.
Check boxes for all plots.
• Excel regression statistics are the same as the results from the
second SPSS analysis.
• Excel Regression Plots
Residual plots are used to check regression assumptions. Visible
patterns in the scatterplot suggest a violation of
regression assumptions.
Use this plot to check the Normality assumption.
Results Comparison
Stat            SPSS     Excel
R Square        .002     .00
F-stat          .506     .51
Sig.            .678     .68
Durbin-Watson   1.972    Not reported
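As a consistency check on the table, R Square can be recovered from the F statistic with R² = kF / (kF + n - k - 1). The sketch assumes k = 3 IVs and that all 626 cases entered the model; neither is stated on this slide.

```python
def r_squared_from_f(f_stat, n, k):
    """R Square implied by a model F statistic, sample size n, and k IVs."""
    return (k * f_stat) / (k * f_stat + (n - k - 1))


# F-stat from the SPSS column. n = 626 and k = 3 are assumptions.
r2 = r_squared_from_f(0.506, n=626, k=3)
print(round(r2, 3), round(r2, 2))
```

Under these assumptions the implied R Square rounds to .002 at three decimals and .00 at two, which is in line with the SPSS and Excel columns differing only in rounding.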
Spreadsheet Data Analysis Resources
Spreadsheet Tutorial
http://www.usd.edu/trio/tut/excel/index.html
Using Formulas and Functions
http://www.meadinkent.co.uk/excel.htm
Setting Up Data Analysis Tools in Excel
http://www-micro.msb.le.ac.uk/1010/toolpak.html
Excel Spreadsheet Tips
http://www.mrexcel.com/articles.shtml
Data Analysis with Spreadsheets (with CD-ROM)
http://www.ablongman.com/catalog/academic/product/0,1144,020540751X,00.html
Using Spreadsheets for Data Collection, Statistical Analysis
and Graphical Representation
http://web.utk.edu/%7Edap/Random/Order/Start.htm