PRR389: Survey data analysis with SPSS 10.0
We will practice data analysis with the dataset from the 1996 Huron-Clinton Metroparks visitor survey.
To prepare for lab: (November 11)
1. Read the brief description of HCMA survey methods (page 2).
2. Study questionnaire (handed out in class) and the codebook (end of this handout) to become familiar with
questions asked and how variables are coded in the computer data file.
3. Review basic statistical procedures (pages 3-8, particularly 4 and 5)
4. Go to the lab and walk through portions of the SPSS Tutorial, starting at the beginning
5. Skim SPSS Procedure summary (pages 9-15)
6. In lab we will walk you through the practice exercises (Page 13)
TIP: This exercise requires that you have some familiarity with the HCMA survey, some
knowledge of basic statistics, and some familiarity with SPSS procedures. Thinking about
management, planning, or policy questions that suggest particular analyses of this dataset is
also helpful. DON’T COME TO THE MICRO-LAB “COLD”: look at the questionnaire, attend
class on Thursday, and review these materials first.
We will return to micro-lab on NOV 26 to provide individual help. You should first try to
complete the exercise yourself.
EXERCISE:
DUE – December 3
Formulate a couple of simple research questions/mini-analyses for the HCMA survey, run appropriate
procedures in SPSS and report the results. Refer to the HCMA questionnaire and codebook to identify
measurement scales of variables and how to interpret each variable.
a. First describe two or more variables - run suitable descriptive statistics.
b. Then test at least one hypothesis about a relationship between two variables and/or estimate a
confidence interval around a population parameter estimate.
Either the CROSSTABS procedure with a Chi square statistic,
or the MEANS procedure to compare means for two or more groups.
Write up the analysis in at most one or two pages, organizing the results for presentation. DO NOT simply
DUMP out the raw SPSS output. Create your own tables, format them nicely, and explain the results,
reporting only what is important. Attach the SPSS printouts of the procedures you ran as an appendix.
1996 Huron Clinton Metroparks User Survey
BACKGROUND. The Huron Clinton Metropolitan Parks Authority (HCMA) manages a system of 13 parks in
southeast Michigan. As part of HCMA’s continuing effort to meet the needs of people of Southeast Michigan, a user
survey was conducted during 1995-96. The results of the survey will be used to update HCMA’s 5 year plan.
OBJECTIVES
1. Describe characteristics and patterns of use of HCMA park users
2. Identify trends in user characteristics and patterns via comparisons with previous surveys.
3. Identify and profile managerially relevant market segments
4. Evaluate visitor satisfaction with HCMA parks and measure visitor preferences for new facilities and programs.
METHODS: A self-administered survey of HCMA visitors was conducted between Dec 1, 1995 and November 30,
1996. Four page questionnaires were distributed to a sample of visitors in vehicles entering one of the 13 HCMA units
during this period.
The sample was stratified by park, season, weekend-weekday and time of arrival at the park. Sampling was
disproportionate across these strata to assure an adequate size sample for each park and season. Weights adjust the
sample to the actual distribution of visits in 1995-96. Each park distributed questionnaires on 10-12 dates during each
season. Dates were uniformly distributed throughout each season and divided evenly between weekends and weekdays.
Gate attendants distributed surveys to each vehicle entering the park during the first 5 minutes of each hour on sampling
dates. During busy periods surveys were given to every other vehicle and during slow periods sampling was conducted
for the first 10 minutes of each hour. Visitors could return surveys at drop boxes located at each park exit or by return
mail.
The four page questionnaire was developed from the 1990 HCMA survey instrument. Questions cover party
characteristics, use of daily vs annual permits, activities in the park, importance & satisfaction with park attributes (for
an I-P analysis), knowledge and use of HCMA parks, preferences for new programs and facilities, and a set of
household characteristics.
You will be analyzing data covering winter, spring, summer, and fall seasons. A total of 4,031 surveys were
completed over this period (overall response rate of 42%). Surveys by park range from 815 at Kensington to just over
80 at some of the more lightly used parks.
SUGGESTED ANALYSIS
1. Use descriptive statistics to profile parks users - some sample results at website (hcma study).
2. Compare two or more subgroups (maybe market segments defined by age, income, use of annual or daily permit, etc.).
Develop segments by classifying visitors into useful subgroups and then describing important differences between
the subgroups. Example - see the activity segment table at the hcma results website (link is in (1)).
3. Test for a relationship between two variables using CROSSTABS (Chi square) or COMPARE MEANS (T/F-Test)
We will be using SPSS-PC to analyze this survey. The data file HCMA96.SAV is a specially coded SPSS data file that
can be retrieved from within SPSS. It is available in course AFS space.
STATISTICS - SUMMARY
1. Functions of statistics
a. description: summarize a set of data
b. inference: make generalizations from sample to population. parameter estimates, hypothesis tests.
2. Types of statistics
i. Descriptive statistics: describe a set of data
a. frequency distribution - SPSS Frequency
b. central tendency: mean, median (order statistics), mode. SPSS - Descriptives
c. dispersion: range, variance & standard deviation in Descriptives
d. Others: shape - skewness, kurtosis.
e. EDA procedures (exploratory data analysis) . SPSS Explore
Stem & leaf display: ordered array, freq distrib. & histogram all in one.
Box and Whisker plot: Five number summary-min.,Q1, median, Q3, and max.
Resistant statistics: trimmed and winsorized means, midhinge, interquartile deviation.
ii. Inferential statistics: make inferences from samples to populations.
a. Parameter estimation – compute confidence intervals around population parameters
b. Hypothesis testing - test relationships between variables
iii. Parametric vs non-parametric statistics
a. parametric : assume interval scale measurements and normally distributed variables.
b. nonparametric (distribution free statistics) : generally weaker assumptions: ordinal or nominal
measurements, don't specify the exact form of distribution.
3. General rules for interpreting hypothesis tests.
i. You test a NULL hypothesis - The NULL hypothesis is a statement of NO relationship between the two variables
(e.g., means are the same for different subgroups, correlation is zero, no relationship between row and column
variable in a crosstab table).
a. Pearson Correlation: r(x,y) = 0
b. T-Test: mean of group 1 = mean of group 2
c. One Way ANOVA: M1 = M2 = M3 = ... = Mn
d. Chi square : No relationship between X and Y. Formally, this is captured by the "expected table", which
assumes cells in the X-Y table can be generated completely from row and column totals.
ii. TESTS are conducted at a given "confidence level" - most common is a 95% level. At this level there is a 5%
chance of incorrectly rejecting the null hypothesis when it is true. For stricter test, use 99% confidence level and
look for SIG's <.01. Weaker, use 90% , SIG's < .10.
iii. On computer output look for the SIGnificance or PROBability associated with the test. The F, T, Chi-square, etc.
are the actual "test statistics", but the SIG's are what you need to complete the test. SIG gives the probability you
could get results like those you see from a random sample of this size IF there were no relationship between the two
variables in the population from which it is drawn. If small probability (<.05) you REJECT the assumption of no
relationship (the null hypothesis).
For 95% level, you REJECT null hypothesis if SIG <.05
If SIG > .05 you FAIL TO REJECT
REJECTING THE NULL HYPOTHESIS means the data suggest that there is a relationship.
iv. Hypothesis tests are assessing if one can generalize from information in the sample to draw conclusions about
relationships in the population. With very small samples most null hypotheses cannot be rejected while with very
large samples almost any hypothesized relationship will be "statistically significant" - even when not practically
significant. Be cognizant of sample size (N) when making tests.
Type I error: rejecting null hypothesis when it is true. Prob of Type I error is 1-confidence level.
Type II error: failing to reject null hypothesis when it is false. Power of a test = 1-prob of a type II error.
DESCRIPTIVE STATISTICS
As the name implies, these are used to describe characteristics of the sample or the population it is intended to represent.
Begin by describing variables one at a time (univariate statistics). There are two basic procedures for this:
FREQUENCIES
If variable is nominal, or ordinal with a small number of categories/levels, use SPSS FREQUENCIES procedure.
This will produce a table giving the number and percentage of cases that gave each of the possible responses.
Here is a sample SPSS output table from FREQUENCY of INCOME variable. Check questionnaire to see that income
was measured in 4 categories with a “choose not to answer” response. The codebook or Variable View page on SPSS
file will indicate variable was coded 1-4 for four income groups, and 5 for the “Choose not to answer” response.
TOTAL HOUSEHOLD INCOME BEFORE TAXES

                                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid     UNDER $25,000                     112      10.4          14.0             14.0
          $25,000 TO $49,999                267      25.0          33.5             47.5
          $50,000 TO $74,999                227      21.2          28.4             75.9
          $75,000 OR MORE                   193      18.0          24.1            100.0
          Total                             798      74.5         100.0
Missing   CHOOSE NOT TO ANSWER              182      17.0
          System                             90       8.4
          Total                             273      25.5
Total                                      1071     100.0
The five possible responses are the rows. Notice response categories (values) are labeled (“Under $25,000” etc.).
- Frequency = number of cases selecting this response
- Percent = percentage this is of all cases
- Valid Percent = percentage of “non-missing” cases. Here the “Choose not to answer” response is treated as missing, as is “system missing”, cases that left this question blank.
- Cumulative Percent = running total (not always useful or relevant)
Generally, you want to report the Valid Percent as your best estimate of the percentages of all visitors (in the
population) in each income group. Raw counts are largely a function of sample size and not that useful.
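If you prefer the syntax window to the menus, a table like the one above could be produced with a command along these lines (a sketch only; INCOME is the variable name from the codebook):

   * Frequency table for household income (codes 1-4 valid, 5 = choose not to answer).
   FREQUENCIES VARIABLES=income.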
DESCRIPTIVES
For interval or ratio scale variables, you usually want to compute means and standard deviations rather than frequencies.
Here’s table from running DESCRIPTIVES procedure with the age variable. Age was measured as interval scale.
                       N   Minimum   Maximum    Mean   Std. Error   Std. Deviation
AGE OF SUBJECT        925        16        86   43.29         .48            14.52
Valid N (listwise)    925
In this case the average age was 43, lowest age in the sample was 16 and highest was 86. The average is based on 925
cases that answered this question. The standard deviation indicates the “spread” of ages in the sample. You may
compute a 95% confidence interval for the estimate of average age by computing the standard error = standard
deviation/ sqrt(n). In this example SE = 14.52/sqrt(925) = .48. A 95 % confidence interval is two standard errors either
side of the mean = (43 + or – 2*.48) or roughly (42,44). SPSS computes the SE for you.
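The same output can be requested from the syntax window; a sketch (the SEMEAN keyword asks for the standard error used in the confidence interval):

   * Mean, standard error, standard deviation, and range for age.
   DESCRIPTIVES VARIABLES=age
     /STATISTICS=MEAN SEMEAN STDDEV MIN MAX.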
Guidance to Statistical Tests - Hypothesis Tests
Testing hypotheses is a little more complicated, but give it a try. Here we want to test for a relationship
between two or more variables. Again, which procedure to use depends on the measurement scales of the
variables.
CROSSTABULATIONS – The CROSSTABS procedure – This is simply the bivariate version of
FREQUENCIES. Run this when you have two variables that are nominal scale or have small
number of categories/levels. (Nominal x Nominal)
SPSS produces the bivariate distribution in the sample and a Chi Square Statistic, which tests the null
hypothesis of no relationship between two variables. This is analogous to using a Pivot Table in
Excel. You need a minimum of about 5 expected cases per cell in your table, so don’t run this with variables that
have too many categories (recode if necessary to collapse categories). The Pearson Chi Square
statistic in SPSS provides a test of whether or not the two variables are related.
Example of Crosstabs with HCMA data
Examine relationship between age and income – CROSSTABS AGE2 BY INCOME (note
AGE2 puts age into a small set of categories).
Compare activity participation or attitudes of men and women – CROSSTAB of GENDER
with one of the activity or attitude variables.
To get the Chi Square test along with the table, select the Statistics button and check Chi square.
Look for Significance levels smaller than .05 to reject the null hypothesis of no relationship at
the 95% confidence level. If SIG > .05 the sample doesn’t provide enough evidence to
conclude there is a relationship within the full population.
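As a sketch, the first example above could be run from the syntax window roughly as follows (CHISQ requests the Chi square test; ROW and COLUMN add the percentages discussed later in this handout):

   * Crosstab of income by age group with a Chi square test.
   CROSSTABS
     /TABLES=income BY age2
     /STATISTICS=CHISQ
     /CELLS=COUNT ROW COLUMN.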
COMPARING SUBGROUP MEANS
Another common bivariate analysis is to compare means on an interval scale variable across two or more
population subgroups. In this case you want an interval scale dependent variable (the one you compute
means for) and a nominal scale independent variable (the one that forms the groups). (Nominal x
Interval)
SPSS has several different procedures for comparing means. It will suffice to use the MEANS
procedure. Put the interval scale variable in the dependent variable box and the variable for
forming subgroups in independent variable box. To get a hypothesis test, select the Options button
and check the “Anova table and eta” box at the bottom, then CONTINUE.
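In syntax form the MEANS procedure looks roughly like this (a sketch using AGE and MVP from the practice exercise later in this handout; the ANOVA keyword corresponds to checking the "Anova table and eta" box):

   * Mean age by type of entry permit, with a one-way ANOVA test.
   MEANS TABLES=age BY mvp
     /CELLS=MEAN COUNT STDDEV SEMEAN
     /STATISTICS=ANOVA.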
CORRELATIONS - Interval by Interval
Pearson Correlation: Run CORRELATION procedure to get the correlation coefficient between the two
variables AND a test of null hypothesis that the correlation in population is zero. Be sure you
understand distinction here between the measure of association between the two variables in the sample
(correlation coefficient) and the test of hypothesis that correlation is zero (making inference to the
population).
Regression is the multivariate extension of correlation. A linear relationship between a dependent
variable and several independent variables is estimated. t-statistics for each regression coefficient
test for a relationship between X and Y while controlling for the other independent variables.
Standardized regression coefficients (betas) indicate relative importance of each independent
variable. The R square statistic (use adjusted R square) measures amount of variation in Y
explained by the X’s.
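Sketches of the corresponding syntax (AGE and HCMATOT are from the codebook; the regression command is illustrative only, with a single predictor):

   * Pearson correlation and a two-tailed test that the population correlation is zero.
   CORRELATIONS /VARIABLES=age hcmatot /PRINT=TWOTAIL.
   * Simple linear regression of days of use on age.
   REGRESSION
     /DEPENDENT=hcmatot
     /METHOD=ENTER age.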
EXAMPLES OF T-TEST/ANOVA AND CHI SQUARE
The Independent Samples T-TEST tests for differences in means (or percentages) across two subgroups. ANOVA is
simply the extension to more than two groups and uses the F statistic. Null hypothesis with two groups is that the mean
of Group 1 = mean of group 2. This test assumes interval scale measure of dependent variable (the one you compute
means for) and that the distribution in the population is normal. The generalization to more than two groups is called a
one way analysis of variance (ANOVA) and the null hypothesis is that all the subgroup means are identical. These are
parametric statistics since they assume interval scale and normality.
In SPSS use Compare Means; there are several options as follows:
   Means                     Compare subgroup means; Options, ANOVA for stat test
   One Sample T-Test         Test H0: Mean of variable = some constant
   Indep. Samples T-Test     Two groups; test H0: Mean for group 1 = Mean for group 2
   Paired Samples T-Test     Paired variables - applies in pre-test, post-test situation
   One Way ANOVA             Compare means for more than two groups
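As a sketch, the two-group and multi-group tests could be run from syntax like this (MVP is coded 0=daily, 1=annual per the codebook; PARK is used only to illustrate a factor with more than two groups):

   * Independent samples t-test: mean age for daily vs annual permit users.
   T-TEST GROUPS=mvp(0 1) /VARIABLES=age.
   * One-way ANOVA: mean days of use across the 13 parks.
   ONEWAY hcmatot BY park.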
Chi square is a nonparametric statistic to test if there is a relationship in a contingency table, i.e. Is the row variable
related to the column variable? Is there any discernible pattern in the table? Can we predict the column variable Y if we
know the row variable X?
The Chi square statistic is calculated by comparing the observed table from the sample, with an "expected" table derived
under the null hypothesis of no relationship. If Fo denotes a cell in the observed table and Fe a corresponding cell in
expected table, then
Chi square (χ²) = Σ (Fo - Fe)² / Fe, summed over all cells

The cells in the expected table are computed from the row (nr) and column (nc) totals for the sample as follows:

Fe = nr × nc / n
CHI SQUARE TEST EXAMPLE: Suppose a sample (n=100) from a student population yields the following observed
table of frequencies:
                    GENDER
IM-USE          Male    Female    Total
Yes               20        40       60
No                30        10       40
Total             50        50      100
EXPECTED TABLE UNDER NULL HYPOTHESIS (NO RELATIONSHIP)
                    GENDER
IM-USE          Male    Female    Total
Yes               30        30       60
No                20        20       40
Total             50        50      100
χ² = (20-30)²/30 + (40-30)²/30 + (30-20)²/20 + (10-20)²/20
   = 100/30 + 100/30 + 100/20 + 100/20 = 16.67
Chi square tables report the probability of getting a Chi square value
this high for a particular random sample, given that there is no
relationship in the population. If doing the test by hand, you would
look up the probability in a table. There are different Chi square tables
depending on the number of cells in the table. Determine the number
of degrees of freedom for the table as (rows-1) X (columns -1). In this
case it is (2-1)*(2-1) = 1. The probability of obtaining a Chi square of
16.67 given no relationship is less than .001. (The last entry in my table gives 10.83 as the chi square value
corresponding to a probability of .001, so 16.67 would have an even smaller probability).
If using a computer package, it will normally report both the Chi square and the probability or significance
level corresponding to this value. In testing your null hypothesis, REJECT if the reported probability is less than .05 (or
whatever confidence level you have chosen). FAIL TO REJECT if the probability is greater than .05.
REVIEW OF STEPS IN HYPOTHESIS TESTING: For the above example :
(1) Nominal level variables, so we used Chi square.
(2) State null hypothesis. No relationship between gender and IM-USE
(3) Choose confidence level: 95%, so alpha = .05; the critical region is χ² > 3.84
(4) Draw the sample and calculate the statistic: χ² = 16.67
(5) 16.67 > 3.84, so it falls inside the critical region; REJECT the null hypothesis. Alternatively, the SIG on the computer printout
is less than .001; since .001 < .05, REJECT the null hypothesis. Note we could have rejected the null hypothesis at the .001 level here.
WHAT HAVE WE DONE? We have used probability theory to determine the likelihood of obtaining a contingency
table with a Chi square of 16.67 or greater given that there is no relationship between gender and IMUSE. If there is no
relationship (null hypothesis is true), obtaining a table that deviates as much as the observed table does from the
expected table would be very rare - a chance of less than one in 1000. We therefore assume we didn't happen to get this
rare sample, but instead our null hypothesis must be false. Thus we conclude there is a relationship between gender and
IMUSE.
The test doesn't tell us what the relationship is, but we can inspect the observed table to find out. Calculate row or
column percents and inspect these. For row percents divide each entry on a row by the row total.
Row percents:
                    GENDER
IM-USE          Male    Female    Total
Yes              .33       .67     1.00
No               .75       .25     1.00
Total            .50       .50     1.00
To find the "pattern" in table, compare row percents for each row with the "Totals" at bottom. Thus, half of sample are
men, whereas only a third of IMusers are male and three quarters of nonusers are male. Conclusion - men are less likely
to use IM.
Column percents: Divide entries in each column by the column total.
                    GENDER
IM-USE          Male    Female    Total
Yes              .40       .80      .60
No               .60       .20      .40
Total           1.00      1.00     1.00
PATTERN: 40% of males use IM, compared to 80% of women. Conclude women more likely to use IM. Note in this
case the column percents provide a clearer description of the pattern than row percents.
COMPUTING Confidence intervals around parameter estimates. When you use a sample statistic to
estimate a population parameter, you base your estimate on a single sample. Estimates will vary somewhat from one
sample to another. Reporting results as confidence intervals acknowledges this variation due to sampling error. When
probability samples are used we can estimate the size of this error. The standard error of the mean (SE Mean)is the
standard deviation of the sampling distribution - i.e. how much do means for different samples of a given size from the
same population vary? The SE Mean provides the basic measure of likely sampling error in a sample estimate.
A 95% confidence interval is two (1.96) standard errors (SE) either side of the sample mean.
SEMean = standard deviation in population/ square root of n (sample size)
SPSS computes standard deviations and/or standard errors for you. You should be able to compute a 95 % confidence
interval if you have the sample mean (say X) and
a) standard error of mean (SEMean) - (X- 2*SEMean, X + 2* SEMean)
b) standard deviation of the variable in the population (σ) and sample size (n):
SEMean = σ/sqrt(n), 95% CI = (X - 2*σ/sqrt(n), X + 2*σ/sqrt(n))
Examples:
a) In sample of size 100, pct reporting previous visit to park is 40%. If SEMean is 5%, then 95% CI is (40% +
or - 2 * 5%) = (30%, 50%).
b) In sample of size 100, pct reporting previous visit to park is 40%. If standard deviation in population is 30%,
then
SEMean is /sqrt(n) = 30/sqrt(100) = 30/10 = 3. and
95% CI = (40 + or - 2 SEMean) = 40 + or - 2*3%) = 40 + or - 6% = (34%,46%)
c) If same mean and standard deviation as b) but using bigger sample of 900, note the
95%CI = (40 + or - 2 * 30%/sqrt(900)) = 40 + or - 2*(30/30) = 40 + or - 2%
= (38%, 42%)
OTHER STATISTICAL NOTES
a. Measures of strength of a relationship vs a statistical test of a hypothesis. There are a number of statistics that
measure how strong a relationship is, say between variable X and variable Y. These include parametric statistics like the
Pearson Correlation coefficient, rank order correlation measures for ordinal data (Spearman's rho and Kendall's tau),
and a host of non-parametric measures including Cramer's V, phi, Yule's Q, lambda, gamma, and others. DO NOT
confuse a measure of association with a test of a hypothesis. The Chi square statistic tests a particular hypothesis. It tells
you little about how strong the relationship is, only whether you can reject a hypothesis of no relationship based upon
the evidence in your sample. The problem is that the size of Chi square depends on strength of relationships as well as
sample size and number of cells. There are measures of association based on chi square that control for the number of
cells in table and sample size. Correlation coefficients from a sample tell how strong the relationship is in the sample,
not whether you can generalize this to the population. There is a test of whether a correlation coefficient is significantly
different from zero that evaluates generalizability from the sample correlation to the population correlation. This tests
the null hypothesis that the correlation in the population is zero.
b. Statistical significance versus practical significance. Hypothesis tests merely test how confidently we can
generalize from what was found in the sample to the population we have sampled from. It assumes random
sampling-thus, you cannot do statistical hypothesis tests from a non-probability sample or a census. The larger the
sample, the easier it is to generalize to the population. For very large sample sizes, virtually ALL hypothesized
relationships are statistically significant. For very small samples, only very strong relationships will be statistically
significant. What is practically significant is a quite different matter from what is statistically significant. Check to see
how large the differences really are to judge practical significance, i.e. does the difference make a difference?
SPSS FOR WINDOWS version 10.0 - LAB Nov 12-26.
Contents: SPSS procedures (pages 1-4); Practice exercise (pages 5-6); Assigned exercise (page 6, bottom); Sample analysis (page 7); HCMA study description (page 8); Codebook (pages 9-10).
SPSS stands for Statistical Package for the Social Sciences. Other popular statistical software includes SAS, SYSTAT
and MINITAB. SPSS is well suited to analysis of social science/survey data. Like all statistical packages, SPSS works
with a table of data with cases as rows and variables as columns (just like an Excel Table - in fact you can import Excel
tables directly to SPSS and vice versa). For survey data, each case is a respondent or questionnaire and each variable is
usually a numeric coding of the response to a single question on the survey instrument. Statistical packages prefer to
analyze data in numeric form so one codes variables like GENDER as something like 1=male, 2=female ( 1=male,
0=female is better). We will be analyzing data from the 1996 Huron Clinton Metropark visitor survey. The HCMA
survey dataset includes 4,031 cases and 136 variables (original). A few of the messier variables have been dropped for
this exercise and other variables have been computed.
You will need copies of the HCMA96.SAV file to complete this exercise. You may retrieve it directly in SPSS in the
micro-labs from the course AFS space. You also should have reviewed the HCMA questionnaire and codebook
to become familiar with the data set – variables, coding, etc.
1. Loading SPSS-PC. Run SPSS by selecting the SPSS program from the START
menu (in Math/Stat Applications, SPSS, SPSS 10.1.4, SPSS 10.1 for Windows).
When SPSS opens you will see options to run tutorial, enter data, or open an existing
file (the default). Run through tutorials for a preview. To retrieve HCMA data file,
close the opening dialogue box and retrieve file directly from SPSS menus. File,
Open, Data then browse to the HCMA96.SAV file in the PRR389 course AFS space.
(Illustration: the File menu, showing New, and Open with the Data, Syntax, and Output file types.)
When the file is loaded, you will see the data in the data window in spreadsheet format. Variable names are at top
of columns. Cases run down rows. Each case/row represents one respondent/completed questionnaire. See HCMA
codebook and questionnaire to match variables with items on the questionnaire. To see codes as Values rather than
numbers, choose View on menus and check Value Labels (uncheck to toggle back to numbers). To see
information about any variable, choose "Variable view" tab at bottom. On menus, Utilities, Variables shows you
information for all variables. You are now ready to run statistical analysis.
2. To run Statistical Procedures choose the ANALYZE option
on menu and then the statistical procedure you wish to run. We
will work mostly with Descriptive Statistics and the Compare
Means procedure.
(Illustration: the Analyze menu, with Reports, Descriptive Statistics, Custom Tables, Compare Means, General Linear Model, Correlate, Regression, Loglinear, Classify, Data Reduction, Scale, Nonparametric Tests, Survival, and Multiple Response. The Descriptive Statistics submenu contains Frequencies, Descriptives, Explore, and Crosstabs.)
FREQUENCIES     frequencies for nominal & ordinal variables
DESCRIPTIVES    means etc. for interval/ratio scale variables
EXPLORE         exploratory data analysis procedures to see distributions
CROSSTABS       tables for nominal or ordinal (few categories) variables, Chi square test
COMPARE MEANS   interval dependent variable, nominal or limited category independent variable
   Means                     Compare subgroup means; Options, ANOVA for stat test
   One Sample T-Test         Test H0: Mean of variable = some constant
   Indep. Samples T-Test     Two groups; test H0: Mean for group 1 = Mean for group 2
   Paired Samples T-Test     Paired variables - applies in pre-test, post-test situation
   One Way ANOVA             Compare means for more than two groups
3. General Steps for Running Procedures.
a. First choose a procedure from Analyze menu. Note the appropriate procedure depends on measurement levels
of your variables and nature of the intended analysis. See 5 below for details.
b. Choose variables : Select from list of variables at left, click arrow to move into Variable Box at right. Note that
you can choose several variables at a time - move one at a time by selecting and clicking arrow or by double
clicking on variable name. Hold CTRL key down while clicking to select several variables and move to
Variable Box as a group. To Unselect a variable, click on it in the Variable Box on right, arrow switches
direction, click it to move back.
c. Select Buttons at bottom for special Options, Statistics, etc. - complete dialog boxes, CONTINUE
d. Click OK to run the procedure
e. Results appear in the OUTPUT Window. SPSS automatically switches to output window when you run a
procedure. Scroll around in this window to view results. To return to Data window click HCMA96 button on
application bar at bottom or choose HCMA96 from Window menu item.
4. SPSS Windows and files. SPSS throws up lots of WINDOWS, often not maximized. Use the MAXIMIZE buttons
at top right of windows to expand display to full screen. Use WINDOW command on menu bar to choose between
the Output or Data Windows or choose them from Application bar at bottom. Three primary windows are
The Data Window - a spreadsheet showing raw data, variables across columns, cases down rows. Run most
procedures from here. SPSS data files have an *.SAV extension. SPSS 10.0 has added a "variable view"
page to the data window accessed via Excel-type tabs at bottom. The Variable view page has definitions
of variables and coding information.
Output window - when you run a procedure, results are shown in the Output window. This is like a
wordprocessor with outline at left to select particular results. You may print results from here or copy and
paste them to WORD or EXCEL. SPSS Output files have an *.SPO extension
Syntax window - optional. If you use Paste option, you can paste procedures to syntax window, where you
can easily rerun them or edit them. SPSS syntax files have an *.SPS extension.
SPSS data and output files are specially coded files you can only read in SPSS. There are utilities to save data
files as Excel or Access files, or to import data from those formats to SPSS. The syntax files are simple
text files that can be read by a wordprocessor.
5. Guidance on individual procedures - basic statistics
a. FREQUENCIES - run this on variables at nominal or ordinal scale with a small number of categories. Gives
frequency distribution for the variable and optional statistics.
b. DESCRIPTIVES - run for interval scale variables to get mean, standard deviation, etc.; choose S.E. Mean in the
Statistics dialog box to compute confidence intervals.
c. CROSSTABS - for nominal/ordinal variables, choose a row and column variable (variable with fewer categories
for columns). In Statistics, select Chi square for a hypothesis test, in Cells choose Row Pct and Column Pct.
d. COMPARE MEANS - dependent variable must be interval scale (or dichotomous), independent variable forms
subgroups (should take on limited set of values - usually nominal or ordinal).
e. CORRELATE - for two or more interval scale variables use Pearson; use Spearman/Kendall for ordinal measures.
6. Variable Transformations - Sometimes you want to change coding of a variable or compute a new variable.
RECODING AND COMPUTING procedures are in the TRANSFORM menu. Use RECODE to change
coding of a variable (maybe to collapse into fewer groups or reassign missing codes) and COMPUTE to
compute new variables (e.g. simple sum of other variables).
a. RECODE changes coding of a variable. First choose whether you want to put new codes in same variable or a
different (new) one . The latter preserves old codes and sets up new variable with new codes. To preserve the
original coding on the file, choose recode "into new variable". Then you must add name for new variable and
press the CHANGE button. In either case, specify coding changes as follows. Select variable you want to change
codes for and choose the "old and new values" option. Then complete the Dialog Box to indicate how codes
should be changed. Press ADD button to add each coding change to the recode box. Repeat procedure for as
many codes as you wish to change. Then press OK to execute the changes.
For example, to change code 4 on the FIRST variable to group "within the past 5 years" (3) and "more than 5
years ago" (4) together, select recode into same variable, choose the FIRST variable, choose the OLD AND NEW
VALUES button, enter a 4 in the box for old value and a 3 for new value at right. Then click the ADD button and a
line "4 --> 3" will appear in the box. Click CONTINUE, then click OK to perform the recoding. If you look in the DATA
window under the FIRST column all the 4’s should now be 3’s. When you run a FREQ on FIRST, 3’s and 4’s will
be grouped and show up as 3’s. Be careful: any value labeling won't be automatically corrected.
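The same recode can be typed or pasted as syntax; a sketch (the second command shows the "into different variable" form, where FIRST2 is a made-up name for the new variable):

   * Recode 4s to 3s in place on the FIRST variable.
   RECODE first (4=3).
   EXECUTE.
   * Or keep FIRST intact and put the collapsed codes into a new variable.
   RECODE first (4=3) (ELSE=COPY) INTO first2.
   EXECUTE.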
b. COMPUTE: To compute new variables from old ones. Choose Transform, Compute. Enter a name for the new variable in
the Target Variable Box (8 characters or less). Then enter a mathematical expression in the larger box after the =
sign indicating how new variable is computed. Press OK to execute the procedure. Your new variable is added as
a column at the end of file in DATA window. You may now use this variable in any procedure (refer to it by the
name you assigned).
For example, to compute a variable equal to the length of time each party stayed in the park: enter HOURS as a name in the
Target Variable Box. In the Numeric Expression box enter LEAVE - ARRIVE. Press OK. Be careful to spell
variable names correctly. You can paste variables into box by double clicking on them in the list of variables at
left and then adding (or pasting from calculator pad) math expressions in between. You can edit inside box to
correct mistakes. SPSS will add the new variable to the file - check it at far right in data window. You can now
use the new HOURS variable like any other in a statistical procedure. It won't be kept when you exit SPSS unless
you save file (probably no need to save file, but if you do, you'll have to put it in your own AFS space). Beware of
missing values when computing new variables. Result will be missing if any variables in formula are missing.
Good practice when recoding or transforming is to always check the result before proceeding with further
analysis. Check via frequencies on new and old variables or by manually checking a few cases in data window.
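A syntax sketch of the HOURS example, ending with the kind of check suggested above (LEAVE and ARRIVE are from the codebook):

   * Length of stay in the park; both variables are coded in military time.
   COMPUTE hours = leave - arrive.
   EXECUTE.
   * Check the new variable before using it in further analysis.
   DESCRIPTIVES VARIABLES=hours /STATISTICS=MEAN MIN MAX.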
7. Other Procedures and Tips
a. OPTIONS. SPSS may be set up to show variables in either alphabetic or file order in pick lists. To get “File”
order of variables, choose EDIT, OPTIONS in main SPSS menu and change Variable order from Alpha to File
order (push radio buttons on General Tab at right). You must do this BEFORE retrieving the file. Choose File,
New Data and then re-retrieve the file if you already loaded it for this change to take effect. This doesn’t change
the order of variables on data window, only in variable pick lists.
b. CUSTOM TABLES: The Custom Tables procedures let you run descriptive statistics on groups of variables
and assemble the results in tables, giving you some control over formatting and labeling. It produces what are
sometimes called "banner tables" summarizing a number of variables in a single table. Use "Basic Tables" for
descriptive statistics, "General Tables" for crosstabulations, and "Tables of Frequencies" for frequency
distributions. You may check out this procedure after you have mastered those in SUMMARIZE section, if you
wish.
c. PRINTING and SAVING. You may print results as you generate them from the Output window, or copy the ones
you want into a word processor. To save output, when you exit SPSS (via the File, Exit command), answer YES to the
question about saving your output. Enter a path and filename, e.g. A:SPSS.SPO to put it on your floppy or enter
path to your AFS space. You don’t need to save the data (respond NO to this question when exiting). The
SPSS.SPO file can only be read by SPSS. You can also copy and paste SPSS output to WORD or EXCEL by
opening both SPSS and these applications. The Output window is a simple text editor - you can add your own
notations and delete items you don't want. Outline at left is handy for finding a procedure you ran or deleting it.
d. Selecting and Sorting Cases: The Data menu has procedures to SORT the data file on a particular variable or to
SELECT subsets of cases to use in an analysis. For example, to Select only cases from Kensington Metropark,
choose Data, Select Cases and then push the IF tab and enter filter PARK=1 (Kensington is park 1 in coding
scheme). Any subsequent analysis will only use the Kensington cases and you will see a "filter on" message in
status bar and cases not from Kensington are "slashed out" in data window. REMEMBER To turn filter off
when you want to return to all cases -, come back to DATA, Select CASES and choose the "all cases" radio
button.
e. WEIGHTS: Weights can be used to adjust the sample to better represent the population or to expand cases from
sample to the population. The HCMA file has two sets of weights: VSTWT adjusts and expands the sample to
the population of 2.788 million visits (park entries) to HCMA, while VSITORWT adjusts the sample to
population of about 300,000 household visitors (anyone visiting an HCMA park at least once in 1996). Use the
VSITORWT when describing people, use VSTWT when describing park vehicle entries. The weights adjust the
sample to the actual distribution of use in 1996 by park, season, and weekday/weekend; correcting for
disproportionate sampling and different response rates across parks and periods. DO NOT use these expansion
weights when conducting statistical tests, as all hypotheses will be significant (tests think they are based on a
sample of 2.8 million). Instead use VSTWT2 or VSTORWT2, which adjust for disproportionate sampling, but
then normalize weights back to the actual sample size, so statistical tests are based on the true sample size. You
can also run tests unweighted. When a weight is on, a message appears on status bar. To set weighting variable or
turn weighting off, go to Data, Weight Cases on menu and choose the desired weighting variable.
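In syntax the weights are turned on and off like this (a sketch; VSTORWT2 is the normalized visitor weight described above):

   * Weight to describe people, with tests based on the true sample size.
   WEIGHT BY vstorwt2.
   * ... run procedures here ...
   * Turn weighting off.
   WEIGHT OFF.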
f. OTHER PROCEDURES. Feel free to explore other parts of the SPSS program. You can generate GRAPHS in
the graphs menu (be careful to save before trying graphs - labs are crash prone on graphing), and view various
information about the file in UTILITIES menu. See the HELP menus for tutorial and further information about
using SPSS for WINDOWS. If you’d like more instruction in SPSS, the Computer Lab runs shortcourses. Also
check out the SPSS site on the web at www.spss.com and tutorials.
g. SYNTAX WINDOW. SPSS also lets you paste commands into a syntax window (look for the PASTE buttons on
most procedures). If you prefer you can type, edit and run procedures from the syntax window if you know the
syntax. This is sometimes faster than navigating thru the menus, but requires some familiarity with SPSS syntax.
If you paste commands to Syntax window, you can save the syntax file and easily rerun procedures later. This
simplifies rerunning a complicated set of procedures. The HCMA.SPS syntax file (in labs99 subdirectory) will
run the procedures in the practice exercise that follows. To retrieve it, in the SPSS menu use File, Open, Syntax
and point to the hcma.sps file on the U drive in the labs99 folder.
h. Using Excel for data Entry. If you want to enter survey data in Excel and then import result to SPSS, use these
guidelines. On first row of spreadsheet enter a short (8 characters or less) variable name - avoid spaces and
special characters as SPSS may not like them. You can name variables VAR1, VAR2, … but this makes them
harder to identify. Enter each questionnaire as a separate case below the names, one case to a row - no blank
rows. Save Excel file when complete and close it. To retrieve this file into SPSS, Use File, Open, Data and
change default extension to xls files and pick your Excel file. Enter range on spreadsheet where data is located
and click the button for variable names in the first row if you have done that. SPSS should then read the data. Be careful
about blanks in Excel as these will come in as missing.
i. MISSING VALUES and N's. SPSS allows certain values to be designated as "missing" for each variable and has
a general "system missing value" designated by a "." . It is a good idea to pay attention to the number of cases for
any procedure you run and understand when lots of cases are missing or when you have filtered out cases with a
SELECT CASES procedure. Watch the N's. If you just look at percentages and means, these may be based on
only a few cases. Remember that confidence levels of results will depend on the sample size. Also beware of
statistical tests from WEIGHTed analyses, which may distort the actual sample size.
SPSS PRACTICE EXERCISE
0. So we all have the same options, let's list variables in alphabetic order rather than file order. Choose Edit, Options
from SPSS menu, then General tab. In variable lists at right select "Alphabetical". Then OK. Also choose the
"Viewer" tab and make sure "Display commands in log" at bottom is checked. Sometimes it is easier to see
variables in file order – Set this option in General Tab – choose "Display names" instead of Variable labels.
You will need to re-retrieve the file for these changes to take effect. The following is the easiest way.
File, New, Data - will open a blank file
File, choose HCMA96.SAV from recently used files at bottom OR File Open and point to it again.
1. FREQUENCIES for nominal, ordinal variable. Describe the characteristics of park visitors - income and age. From
codebook note these variables are measured in categories- i.e. ordinal scale with small number of categories. Run
FREQUENCIES. In menu choose Analyze, Descriptive Statistics, Frequencies. Find INCOME and AGE2 variables on
list at left (near end in file order, or in alphabetic order). Select them with mouse and click arrow to move to the variable
box at right (or double click on the variable). Click OK to run frequencies.
2. DESCRIPTIVES for an interval scale variable. How many female visitors were there on average in each party? Find the
variable in the codebook = TOTFEMAL; note it is interval scale. In the menu choose Analyze, Descriptive Statistics,
Descriptives. Complete Dialog box as above by selecting TOTFEMAL (near end in file order) and moving it to the
input box.
3. CROSSTABS with two nominal or ordinal variables. Crosstabs generates a table using two variables, one for rows
and one for columns. Question- What is distribution of the sample by age and income? From menu, choose
Analyze, Descriptive Statistics, Crosstabs. Complete Dialog Box by choosing INCOME for the row variable and
AGE2 for the column variable. Also click the
STATISTICS button at the bottom and ask for all of them, then CONTINUE; and the
CELLS button at the bottom and ask for observed count, row percents, and column percents, then CONTINUE and OK to run the procedure.
4. COMPARE MEANS. Use this procedure to compare averages of two or more subgroups. Dependent variable is
interval scale variable (means are computed for this variable). Independent variable should be nominal or have small
number of values/groups - it forms groups. Let's see if visitors using an annual motor vehicle permit visit parks more
often than those entering on a daily permit. Find variables - independent = MVP95 identifies those with a MVP
(grouping variable). Dependent= HCMATOT measures days of use of Metroparks last year.
5. Simple Hypothesis testing - give these a try
a. Confidence interval for an average. In the DESCRIPTIVES procedure (Analyze, Descriptive Statistics,
Descriptives) let's compute the average days that people used Metroparks last year - HCMATOT just as in 2 above, but
also click the OPTIONS button at the bottom of the dialog box and ask for SE mean (standard error of the mean). Click
CONTINUE, then OK to run. To get 95% confidence interval you add and subtract two standard errors from the sample
mean.
b. Differences in means. Are those who use annual permits older than those who use dailies? Two variables
are MVP which indicates whether people used a daily or annual permit to enter the park and AGE which gives age of
the respondents (interval scale). We want to compute means for AGE for each type of entry permit. In menu, choose
Analyze, Compare Means, Means as in 4 above. Complete Dialog Box by choosing dependent variable - the interval
scale one = AGE, then independent or subgroup variable - nominal or ordinal scale with small number of categories =
MVP.
For statistical test of hypotheses that all the subgroup means are equal, also choose OPTIONS and ask for the "ANOVA
Table and eta" at bottom. Also ask for SE Mean by moving this from list of statistics to "Cell Statistics" at right. Then
click OK to run the procedure.
Alternatively you could perform the independent samples T-test on two groups defined by MVP (say group
one has MVP =0, group 2 =1).
c. Chi square - tests for relationships in a crosstab table between two nominal/ordinal variables. Are higher
income visitors more likely to use annual permits? Note MVP95 and INCOME are measured in a small number of
categories (nominal or ordinal). Run the Crosstab procedure (Analyze, Descriptive Statistics, Crosstabs - see 3 above)
with MVP95 and INCOME. In Statistics, choose Chi square, In Cells choose row and column percents.
d. Correlations - For two interval scale variables. Is age (AGE) correlated with total days of use
(HCMATOT)? Procedure is Analyze, Correlate, Bivariate. Choose AGE and HCMATOT.
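For reference, the following syntax sketch approximates steps 1-5 of this practice exercise (it is only a sketch, not the actual HCMA.SPS file; variable names follow the codebook):

   * 1. Frequencies for income and age group.
   FREQUENCIES VARIABLES=income age2.
   * 2. Descriptives for number of female visitors per party.
   DESCRIPTIVES VARIABLES=totfemal.
   * 3. Crosstab of income by age group with Chi square and percents.
   CROSSTABS /TABLES=income BY age2
     /STATISTICS=CHISQ /CELLS=COUNT ROW COLUMN.
   * 4. Compare mean days of use for annual vs daily permit holders.
   MEANS TABLES=hcmatot BY mvp95 /CELLS=MEAN COUNT SEMEAN /STATISTICS=ANOVA.
   * 5a. Standard error of the mean for days of use (for a confidence interval).
   DESCRIPTIVES VARIABLES=hcmatot /STATISTICS=MEAN SEMEAN STDDEV.
   * 5b. Mean age by permit type, with ANOVA test.
   MEANS TABLES=age BY mvp /CELLS=MEAN COUNT SEMEAN /STATISTICS=ANOVA.
   * 5c. Crosstab of permit type by income with Chi square.
   CROSSTABS /TABLES=mvp95 BY income /STATISTICS=CHISQ /CELLS=COUNT ROW COLUMN.
   * 5d. Correlation of age with total days of use.
   CORRELATIONS /VARIABLES=age hcmatot.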
6. General rules for interpreting hypothesis tests.
1. You test a NULL hypothesis - The NULL hypothesis is a statement of NO relationship between the two variables
(e.g., means are the same for different subgroups, correlation is zero, no relationship between row and column
variable in a crosstab table).
2. TESTS are conducted at a given "confidence level" - most common is a 95% level. At this level there is a 5% chance
of incorrectly rejecting the null hypothesis when it is true. For stricter test, use 99% confidence level and look for
SIG's <.01. Weaker, use 90% , SIG's < .10.
3. On computer output look for the SIGnificance or PROBability associated with the test. The F, T, Chi-square, etc are
the actual "test statistics", but the SIG's are what you need to complete the test. SIG gives the probability you could
get results like those you see from a random sample of this size IF there were no relationship between the two
variables in the population from which it is drawn. If small probability (<.05) you REJECT the assumption of no
relationship (the null hypothesis).
For 95% level, you REJECT null hypothesis if SIG <.05
If SIG > .05 you FAIL TO REJECT
REJECTING THE NULL HYPOTHESIS means the data suggest that there is a relationship.
4. Hypothesis tests are evaluating whether you can generalize from information in the sample to draw conclusions
about relationships in the population. With very small samples most null hypotheses cannot be rejected while with
very large samples almost any hypothesized relationship will be "statistically significant" - even when not
practically significant. Be cognizant of sample size (N) when making tests.
7. RECODING AND COMPUTING NEW VARIABLES - TRANSFORM . Sometimes you want to create new
variables or change the coding of an existing variable (e.g. to collapse categories)
COMPUTING NEW VARIABLES. What is the average number of visitors in each party? You will need to
COMPUTE a new variable equal to the sum of total female and male visitors and then run DESCRIPTIVES on this
new variable. Transform, Compute. Enter the name of the new variable - PARTY - then the formula = TOTMALE +
TOTFEMAL in the box (paste in names to be sure of spelling). OK. This variable has already been created and saved
on file.
RECODING a variable. Suppose we want to collapse income into two categories, say above or below 50,000. Choose
TRANSFORM on menu, then RECODE. Complete dialog box to recode income into a NEW variable (See Recode
command on previous page) - call it INCOM2. Then run FREQUENCIES on INCOM2.
8. WEIGHTS: Weights can be used to adjust the sample to better represent the population or to expand cases from
sample to the population. The HCMA file has two sets of weights: VSTWT adjusts and expands the sample to the
population of 2.788 million visits (park entries) to HCMA, while VSITORWT adjusts the sample to population of
about 300,000 household visitors (anyone visiting an HCMA park at least once in 1996). Use the VSITORWT when
describing people, use VSTWT when describing park vehicle entries. The weights adjust the sample to the actual
distribution of use in 1996 by park, season, and weekday/weekend; correcting for disproportionate sampling and
different response rates across parks and periods. DO NOT use these expansion weights when conducting statistical
tests, as all hypotheses will be significant (tests think they are based on a sample of 2.8 million). Instead use
VSTWT2 or VSTORWT2, which adjust for disproportionate sampling, but then normalize weights back to the actual
sample size, so statistical tests are based on the true sample size. You can also run tests unweighted. When a
weight is on, a message appears on status bar. To set weighting variable or turn weighting off, go to Data,
Weight Cases on menu and choose the desired weighting variable. To turn weights off, return here and check
“Do not weight cases”.
SAMPLE ANALYSIS: All of this will easily fit on one page. (attach the relevant SPSS output).
I hypothesized that the respondents who list price of admission as an important factor in choosing a park will have
larger family sizes. The appropriate survey variables are Q7PRICE, measured on a 5-point Likert scale (ordinal), and
TOTFAM. TOTFAM is computed as a sum of the number of children 18 and under (HOUSEKID) and adults
(HOUSEADT) that live in household.
First I computed TOTFAM=HOUSEKID + HOUSEADT. Then I ran Descriptives on these three variables to get means.
Put these into a Table and show results. I've omitted the numbers. You should briefly describe and interpret results
in a paragraph and display details in short table or figure (format tables & figures properly).
Table 1. Average family size for visitors
Category     Number of People
Children
Adults
Total
Report a 95% confidence interval for TOTFAM by getting DESCRIPTIVES and asking for SE Mean.
Run frequencies on Q7PRICE variable - report as percentages in a simple table. (Use the Valid Pct Column, do not
report everything SPSS prints out e.g - omit cumulative pcts unless they are meaningful to you)
Table 2. Rating of importance of admission price
Importance               Pct
extremely important
very important
important
somewhat important
not important
Based on Table 2, I split the sample into two groups: Q7PRICE = 1, 2 or 3 (extremely, very, and important) formed
group one, and 4 or 5 group two. An independent samples T-Test was run to test for a difference in the average
family sizes (TOTFAM) across the two subgroups (when asked to define groups, I used 4 as the Cut Point. All
codes less than the cut point form one group, and all codes greater than or equal to the cut point form the other
group). Show results of this in short table. Note those rating price as more important (Group 1) have somewhat
larger family sizes (2.6 people compared to 2.3). The difference is statistically significant at the 95% confidence
level.
Table 3. Test of Difference in Family Sizes by Importance of Admission Price
Importance Subgroup                              Average Family Size
Group 1: Extremely, very, or important                   2.6
Group 2: Somewhat or not important                       2.3
Test of difference in Means:  T-statistic = 3.77, SIG = .000
This example illustrates how to explain the analysis including variables you selected and any changes you made in them
(recodes) and also the results. If you choose nominal or limited category variables you will use crosstabs and Chi
Square test. Be sure to first describe variables and then perform the statistical test.
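A syntax sketch of the steps in this sample analysis (a reconstruction for illustration, not the exact commands used; note that T-TEST can split Q7PRICE at a cut point directly, so no recode is needed):

   * Family size = children plus adults in the household.
   COMPUTE totfam = housekid + houseadt.
   EXECUTE.
   DESCRIPTIVES VARIABLES=housekid houseadt totfam /STATISTICS=MEAN SEMEAN STDDEV.
   * Importance of admission price (Table 2).
   FREQUENCIES VARIABLES=q7price.
   * T-test with 4 as the cut point: codes 1-3 form group 1, codes 4-5 form group 2.
   T-TEST GROUPS=q7price(4) /VARIABLES=totfam.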
CODEBOOK: HURON-CLINTON METROPARKS USER SURVEY
Part 1. Variables that APPEAR on the questionnaire.
QUESTION                           VARIABLE NAMES                    CODING / COMMENT
Q1 - Time                          ARRIVE, LEAVE                     coded in military time
Q2                                 N/A                               dropped from file
Q3 - Permit                        MVP                               0=daily, 1=annual permit
Q4 - Permit 95                     MVP95                             0=No, 1=Yes
Q5 - Activity                      NATURE to PLAYGRND                0=not participate, 1=participate; see footnote for variable names & code numbers
Q6 - Primary activity              PRIMACT                           activity number from Q5; see footnote below for activity codes
Q7 - Importance (characteristics)  Q7BEAUTY to Q7CROWD               1=extremely important to 5=not important (see questionnaire for details)
Q8 - Importance (reasons)          Q8FAMILY to Q8NATURE              1=extremely important to 5=not important (see questionnaire for details)
Q9 - Facilities                    Q9WATER to Q9NONEWF               0=not chosen, 1=like developed (see questionnaire for details)
Q10 - Programs                     Q10NATURE to Q10OTHER             0=not chosen, 1=like developed (see questionnaire for details)
Q11 - Aware free admission         FREEDAY1, FREEDAY2                0=no, 1=yes; if yes for FREEDAY1, continue for FREEDAY2
Q12 - Familiarity (column a)       META to LAKA                      0=blank, 1=familiar (see questionnaire for details)
Q12 - Familiarity (column b)       METB to LAKB                      number of times visited in the past 12 months (see questionnaire for details)
Q13 - First visit                  FIRST                             1=today, 2=within the past year, 3=within the past 5 years, 4=more than 5 years ago
Q14 - Get info                     INFOTV to INFOOTHR                0=blank, 1=get info from it
Q15 - Performance                  Q5BEAUTY to Q5OVERAL              1=excellent, 2=very good, 3=good, 4=fair, 5=poor, -8=don't know (see questionnaire for details)
Q16 - Comments                     N/A                               N/A
Q17 - Zipcode                      ZIPCODE                           5 digit zipcode
Q18 - Age                          AGE                               code age (interval)
Q19 - Gender                       GENDER                            1=female, 2=male
Q20 - Employment                   EMPLOY                            (see questionnaire for detail)
Q21 - Employed                     EMPLFULL, EMPLPART                number of people employed (interval)
Q22 - Marital status               MARITAL                           (see questionnaire for detail)
Q23 - Family members               HOUSEKID, HOUSEADT                numbers of children (adults) at home (interval)
Q24 - Education                    EDUCATE                           (see questionnaire for detail)
Q25 - Income                       INCOME                            (see questionnaire for detail)
Q26 - Race                         ETHNIC                            (see questionnaire for detail)
Q27 - Final                        N/A                               N/A

Refer to the questionnaire for more details.
Part 2. Created Variables. These variables were created by using recodes and computes.
VARIABLE NAME   CODING                                                     COMMENT
PARK            1=Metro Beach, 2=Wolcott Mill, 3=Stony Creek,              park where survey was distributed;
                4=Indian Springs, 5=Kensington, 6=Huron Meadows,           determined from survey ID number
                7=Hudson Mills, 8=Dexter-Huron, 9=Delhi,
                10=Lower Huron, 11=Willow, 12=Oakwoods, 13=Lake Erie
COUNTY          1=OAKLAND, 2=LIVINGSTON, 3=WAYNE, 4=WASHTENAW,             county of residence determined from zipcode
                5=MACOMB, 6=MONROE, 7=other
AGE2            1= <17, 2= 18-35, 3= 36-59, 4= >59                         grouping of age into categories
TOTFEMAL        total number of females in party                           sum from question #2
TOTMALE         total number of males in party                             sum from question #2
DAY             1=Monday, 2=Tuesday, etc.                                  day of the week of visit
WEEKEND         0=weekday, 1=weekend                                       from date distributed
HCMATOT         sum of total days visited parks for 1995                   sum from question #12
VSTWT2          weight to use to adjust sample to population of visits
VSTORWT2        weight to adjust sample to population of visitors
FLC             0=S/NC/18-35, 1=S/C/18-35, 2=M/NC/18-35, 3=M/C/18-35,      family life cycles computed from age,
                4=S/NC/36-55, 5=S/C/>36, 6=M/NC/36-55, 7=M/C/36-55,        marital status & children in household
                8=S/NC/>55, 9=M/NC/>55, 10=M/C/>55
SEASON          1=winter, 2=spring, 3=summer, 4=fall                       from date distributed
Code number, variable names, and activities for question 5 and 6.
Code  Variable    Activity
1     NATURE      NATURE OBSERVATION OR PHOTOGRAPHY
2     SCENIC      SCENIC DRIVE
3     PICNIC      PICNIC
4     BIKE        BICYCLE
5     WALK        WALK OR HIKE
6     WALKPET     WALK PET(S)
7     RUN         RUN OR JOG
8     ROLLER      ROLLERSKATE OR IN-LINE SKATE OR SKI
9     VISITNC     VISIT NATURE CENTER
10    VISITF      VISIT FARM
11    VISITGM     VISIT GRIST MILL
12    SUNBATHE    SUNBATHE
13    BOATNM      BOAT - NON-MOTOR
14    BOATM       BOAT - MOTOR
15    FISHB       FISH FROM BOAT
16    FISHS       FISH FROM SHORE
17    WATERSL     WATERSLIDE
18    SWIMLAKE    SWIM OR WADE IN LAKE
19    SWIMPOOL    SWIM OR WADE IN POOL (INCLUDING WAVEPOOL)
20    EVENT       ATTEND A SPECIAL EVENT IN THE PARK
21    OTHERACT    PARTICIPATE IN AN OTHER ACTIVITY
22    GOLF        GOLF
23    PLAYGAME    PLAY OTHER GAMES OR SPORTS (NOT GOLF)
24    WATCH       WATCH GAMES OR SPORTS
25    PLAYGRND    USE PLAYGROUND EQUIPMENT OR TOT LOT
(the following codes are for question 6 only)
26    ICE FISH
27    CROSS COUNTRY SKI
28    SLED OR TOBOGGAN
29    ICE SKATE
30    FISHING (UNDETERMINED)
31    NONE