PRR389 SPSS & Statistics Page 1

Survey data analysis with SPSS 10.0

We will practice data analysis with the dataset from the 1996 Huron-Clinton Metroparks visitor survey.

To prepare for lab (November 11):
1. Read the brief description of HCMA survey methods (page 2).
2. Study the questionnaire (handed out in class) and the codebook (end of this handout) to become familiar with the questions asked and how variables are coded in the computer data file.
3. Review basic statistical procedures (pages 3-8, particularly 4 and 5).
4. Go to the lab and walk through portions of the SPSS Tutorial, starting at the beginning.
5. Skim the SPSS procedure summary (pages 9-15).
6. In lab we will walk you through the practice exercises (page 13).

TIP: This exercise requires some familiarity with the HCMA survey, some knowledge of basic statistics, and some familiarity with SPSS procedures. Thinking about management, planning, or policy questions that suggest particular analyses of this dataset is also helpful. DON'T COME TO THE MICRO-LAB "COLD" without having looked at the questionnaire, attended class on Thursday, and reviewed these materials. We will return to the micro-lab on Nov 26 to provide individual help. You should first try to complete the exercise yourself.

EXERCISE (DUE December 3): Formulate a couple of simple research questions - a mini-analysis - for the HCMA survey, run appropriate procedures in SPSS, and report the results. Refer to the HCMA questionnaire and codebook to identify the measurement scales of the variables and how to interpret each variable.
a. First describe two or more variables - run suitable descriptive statistics.
b. Then test at least one hypothesis about a relationship between two variables and/or estimate a confidence interval around a population parameter estimate. Use either the CROSSTABS procedure with a Chi square statistic or the MEANS procedure to compare means for two or more groups.

Write up the analysis in at most one or two pages, organizing the results for presentation.
DO NOT simply DUMP out the raw SPSS output. Create your own tables, format them nicely, and explain the results, reporting only what is important. Attach the SPSS printouts of the procedures you ran as an appendix.

PRR389 SPSS & Statistics Page 2

1996 Huron Clinton Metroparks User Survey

BACKGROUND: The Huron Clinton Metropolitan Parks Authority (HCMA) manages a system of 13 parks in southeast Michigan. As part of HCMA's continuing effort to meet the needs of the people of Southeast Michigan, a user survey was conducted during 1995-96. The results of the survey will be used to update HCMA's 5-year plan.

OBJECTIVES:
1. Describe the characteristics and patterns of use of HCMA park users.
2. Identify trends in user characteristics and use patterns via comparisons with previous surveys.
3. Identify and profile managerially relevant market segments.
4. Evaluate visitor satisfaction with HCMA parks and measure visitor preferences for new facilities and programs.

METHODS: A self-administered survey of HCMA visitors was conducted between December 1, 1995 and November 30, 1996. Four-page questionnaires were distributed to a sample of visitors in vehicles entering one of the 13 HCMA units during this period. The sample was stratified by park, season, weekend-weekday, and time of arrival at the park. Sampling was disproportionate across these strata to assure an adequately sized sample for each park and season. Weights adjust the sample to the actual distribution of visits in 1995-96. Each park distributed questionnaires on 10-12 dates during each season. Dates were uniformly distributed throughout each season and divided evenly between weekends and weekdays. Gate attendants distributed surveys to each vehicle entering the park during the first 5 minutes of each hour on sampling dates. During busy periods surveys were given to every other vehicle, and during slow periods sampling was conducted for the first 10 minutes of each hour.
Visitors could return surveys at drop boxes located at each park exit or by return mail. The four-page questionnaire was developed from the 1990 HCMA survey instrument. Questions cover party characteristics, use of daily vs. annual permits, activities in the park, importance of and satisfaction with park attributes (for an I-P analysis), knowledge and use of HCMA parks, preferences for new programs and facilities, and a set of household characteristics. You will be analyzing data covering the winter, spring, summer, and fall seasons. A total of 4,031 surveys were completed over this period (an overall response rate of 42%). Surveys by park range from 815 at Kensington to just over 80 at some of the more lightly used parks.

SUGGESTED ANALYSIS:
1. Use descriptive statistics to profile park users - some sample results are at the website (hcma study).
2. Compare two or more subgroups (maybe market segments defined by age, income, use of annual or daily permit, etc.). Develop segments by classifying visitors into useful subgroups and then describing important differences between the subgroups. Example - see the activity segment table at the hcma results website (the link is in (1)).
3. Test for a relationship between two variables using CROSSTABS (Chi square) or COMPARE MEANS (T/F-test).

We will be using SPSS-PC to analyze this survey. The data file HCMA96.SAV is a specially coded SPSS data file that can be retrieved from within SPSS. It is available in the course AFS space.

PRR389 SPSS & Statistics Page 3

STATISTICS - SUMMARY

1. Functions of statistics
a. Description: summarize a set of data.
b. Inference: make generalizations from a sample to a population (parameter estimates, hypothesis tests).

2. Types of statistics
i. Descriptive statistics: describe a set of data.
a. Frequency distribution - SPSS Frequencies.
b. Central tendency: mean, median (order statistics), mode - SPSS Descriptives.
c. Dispersion: range, variance, and standard deviation - in Descriptives.
d. Others: shape - skewness, kurtosis.
e. EDA procedures (exploratory data analysis) - SPSS Explore.
Stem & leaf display: ordered array, frequency distribution, and histogram all in one.
Box and whisker plot: five-number summary - min, Q1, median, Q3, and max.
Resistant statistics: trimmed and winsorized means, midhinge, interquartile deviation.
ii. Inferential statistics: make inferences from samples to populations.
a. Parameter estimation - compute confidence intervals around population parameters.
b. Hypothesis testing - test relationships between variables.
iii. Parametric vs. non-parametric statistics
a. Parametric: assume interval scale measurements and normally distributed variables.
b. Nonparametric (distribution-free) statistics: generally weaker assumptions - ordinal or nominal measurements; they don't specify the exact form of the distribution.

3. General rules for interpreting hypothesis tests
i. You test a NULL hypothesis. The NULL hypothesis is a statement of NO relationship between the two variables (e.g., means are the same for different subgroups, the correlation is zero, there is no relationship between the row and column variables in a crosstab table).
a. Pearson correlation: rxy = 0.
b. T-test: mean of X = mean of Y.
c. One-way ANOVA: M1 = M2 = M3 = ... = Mn.
d. Chi square: no relationship between X and Y. Formally, this is captured by the "expected table", which assumes the cells in the X-Y table can be generated completely from the row and column totals.
ii. Tests are conducted at a given "confidence level" - the most common is the 95% level. At this level there is a 5% chance of incorrectly rejecting the null hypothesis when it is true. For a stricter test, use the 99% confidence level and look for SIGs < .01; for a weaker test, use 90% and SIGs < .10.
iii. On computer output, look for the SIGnificance or PROBability associated with the test. The F, T, Chi square, etc. are the actual "test statistics", but the SIGs are what you need to complete the test.
SIG gives the probability that you could get results like those you see from a random sample of this size IF there were no relationship between the two variables in the population from which it is drawn. If this probability is small (< .05), you REJECT the assumption of no relationship (the null hypothesis). For the 95% level, you REJECT the null hypothesis if SIG < .05; if SIG > .05, you FAIL TO REJECT. REJECTING THE NULL HYPOTHESIS means the data suggest that there is a relationship.
iv. Hypothesis tests assess whether one can generalize from information in the sample to draw conclusions about relationships in the population. With very small samples most null hypotheses cannot be rejected, while with very large samples almost any hypothesized relationship will be "statistically significant" - even when it is not practically significant. Be cognizant of the sample size (N) when making tests.
Type I error: rejecting the null hypothesis when it is true. The probability of a Type I error is 1 minus the confidence level.
Type II error: failing to reject the null hypothesis when it is false. The power of a test = 1 minus the probability of a Type II error.

PRR389 SPSS & Statistics Page 4

DESCRIPTIVE STATISTICS

As the name implies, these are used to describe characteristics of the sample or the population it is intended to represent. Begin by describing variables one at a time (univariate statistics). There are two basic procedures for this:

FREQUENCIES: If a variable is nominal, or ordinal with a small number of categories/levels, use the SPSS FREQUENCIES procedure. This will produce a table giving the number and percentage of cases that gave each of the possible responses. Here is a sample SPSS output table from running FREQUENCIES on the INCOME variable. Check the questionnaire to see that income was measured in 4 categories with a "choose not to answer" response. The codebook or the Variable View page of the SPSS file will indicate the variable was coded 1-4 for the four income groups, and 5 for the "choose not to answer" response.
TOTAL HOUSEHOLD INCOME BEFORE TAXES

                                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid    UNDER $25,000                 112      10.4            14.0                 14.0
         $25,000 TO $49,999            267      25.0            33.5                 47.5
         $50,000 TO $74,999            227      21.2            28.4                 75.9
         $75,000 OR MORE               193      18.0            24.1                100.0
         Total                         798      74.5           100.0
Missing  CHOOSE NOT TO ANSWER          182      17.0
         System                         90       8.4
         Total                         273      25.5
Total                                 1071     100.0

The five possible responses are the rows. Notice the response categories (values) are labeled ("Under $25,000", etc.).
Frequency = number of cases selecting this response.
Percent = percentage this is of all cases.
Valid Percent = percentage of "non-missing" cases. Here the "choose not to answer" response is treated as missing, as is "system missing" - cases that left this question blank.
Cumulative Percent = running total (not always useful or relevant).
Generally, you want to report the Valid Percent as your best estimate of the percentages of all visitors (in the population) in each income group. Raw counts are largely a function of sample size and not that useful.

DESCRIPTIVES: For interval or ratio scale variables, you usually want to compute means and standard deviations rather than frequencies. Here is a table from running the DESCRIPTIVES procedure on the age variable. Age was measured on an interval scale.

                       N   Minimum   Maximum    Mean   Std. Error   Std. Deviation
AGE OF SUBJECT       925        16        86   43.29          .48            14.52
Valid N (listwise)   925

In this case the average age was 43; the lowest age in the sample was 16 and the highest was 86. The average is based on the 925 cases that answered this question. The standard deviation indicates the "spread" of ages in the sample. You may compute a 95% confidence interval for the estimate of average age by computing the standard error = standard deviation / sqrt(n). In this example SE = 14.52 / sqrt(925) = .48. A 95% confidence interval is two standard errors either side of the mean = 43.29 ± 2 * .48, or roughly (42, 44). SPSS computes the SE for you.
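The calculation behind this DESCRIPTIVES output - mean, standard deviation, standard error, and a 95% confidence interval as the mean plus or minus two standard errors - can be sketched in a few lines of Python using only the standard library. The values below are invented for illustration; run on the actual age data, the same arithmetic would reproduce the 43.29 and .48 above.

```python
from math import sqrt
from statistics import mean, stdev

def describe(values):
    n = len(values)
    m = mean(values)
    sd = stdev(values)               # sample standard deviation
    se = sd / sqrt(n)                # standard error of the mean
    ci = (m - 2 * se, m + 2 * se)    # approximate 95% confidence interval
    return m, sd, se, ci

# Hypothetical ages, for illustration only.
m, sd, se, ci = describe([25, 31, 38, 43, 47, 52, 60, 68])
print(round(m, 2), round(se, 2), tuple(round(x, 1) for x in ci))
```

Note that the interval shrinks with the square root of the sample size, which is why the large HCMA sample yields such a narrow interval around the mean age.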
PRR389 SPSS & Statistics Page 5

Guidance on Statistical Tests - Hypothesis Tests

Testing hypotheses is a little more complicated, but give it a try. Here we want to test for a relationship between two or more variables. Again, which procedure to use depends on the measurement scales of the variables.

CROSSTABULATIONS - the CROSSTABS procedure (Nominal x Nominal). This is simply the bivariate version of FREQUENCIES. Run this when you have two variables that are nominal scale or have a small number of categories/levels. SPSS produces the bivariate distribution in the sample and a Chi square statistic, which tests the null hypothesis of no relationship between the two variables. This is analogous to using a Pivot Table in Excel. You need a minimum of 5 cases per cell in your table, so don't run this with variables that have too many categories (recode if necessary to collapse categories). The Pearson Chi square statistic in SPSS provides a test of whether or not the two variables are related.

Examples of CROSSTABS with the HCMA data:
- Examine the relationship between age and income - CROSSTABS AGE2 BY INCOME (note AGE2 puts age into a small set of categories).
- Compare activity participation or attitudes of men and women - CROSSTABS of GENDER with one of the activity or attitude variables.

To get the Chi square test along with the table, select the Statistics button and check Chi square. Look for significance levels smaller than .05 to reject the null hypothesis of no relationship at the 95% confidence level. If SIG > .05, the sample doesn't provide enough evidence to conclude there is a relationship within the full population.

COMPARING SUBGROUP MEANS: Another common bivariate analysis is to compare means on an interval scale variable across two or more population subgroups. In this case you want an interval scale dependent variable (the one you compute means for) and a nominal scale independent variable (the one that forms the groups).
(Nominal x Interval) SPSS has several different procedures for comparing means. It will suffice to use the MEANS procedure. Put the interval scale variable in the dependent variable box and the variable for forming subgroups in the independent variable box. To get a hypothesis test, select the Options button and check the "Anova table and eta" box at the bottom, then CONTINUE.

CORRELATIONS (Interval x Interval): Run the CORRELATION procedure to get the Pearson correlation coefficient between two variables AND a test of the null hypothesis that the correlation in the population is zero. Be sure you understand the distinction here between the measure of association between the two variables in the sample (the correlation coefficient) and the test of the hypothesis that the correlation is zero (making an inference to the population).

Regression is the multivariate extension of correlation. A linear relationship between a dependent variable and several independent variables is estimated. The t-statistics for each regression coefficient test for a relationship between X and Y while controlling for the other independent variables. Standardized regression coefficients (betas) indicate the relative importance of each independent variable. The R square statistic (use adjusted R square) measures the amount of variation in Y explained by the X's.

PRR389 SPSS & Statistics Page 6

EXAMPLES OF T-TEST/ANOVA AND CHI SQUARE

The Independent Samples T-TEST tests for differences in means (or percentages) across two subgroups. ANOVA is simply the extension to more than two groups and uses the F statistic. The null hypothesis with two groups is that the mean of group 1 = the mean of group 2. This test assumes an interval scale measure of the dependent variable (the one you compute means for) and that its distribution in the population is normal. The generalization to more than two groups is called a one-way analysis of variance (ANOVA), and the null hypothesis is that all the subgroup means are identical.
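The two-group comparison can be made concrete with a short sketch. This is a minimal pooled-variance independent-samples t statistic in Python (standard library only, with invented data); SPSS reports this statistic along with its significance level.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(group1, group2):
    # Pooled-variance independent-samples t statistic:
    # t = (mean1 - mean2) / sqrt(sp^2 * (1/n1 + 1/n2))
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    s1, s2 = stdev(group1), stdev(group2)
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    return (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Two invented subgroups; a large |t| is evidence against equal means.
print(round(t_statistic([1, 2, 3], [4, 5, 6]), 3))   # -3.674
```

The sign only reflects which group is listed first; the test compares |t| against a critical value (or, on SPSS output, checks SIG against .05).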
These are parametric statistics since they assume interval scale and normality. In SPSS use Compare Means; the options are as follows:

Means - compare subgroup means; under Options, check the ANOVA table for a statistical test.
One-Sample T-Test - tests H0: mean of variable = some constant.
Independent-Samples T-Test - two groups; tests H0: mean for group 1 = mean for group 2.
Paired-Samples T-Test - paired variables; applies in pre-test/post-test situations.
One-Way ANOVA - compares means for more than two groups.

Chi square is a nonparametric statistic to test if there is a relationship in a contingency table, i.e., is the row variable related to the column variable? Is there any discernible pattern in the table? Can we predict the column variable Y if we know the row variable X? The Chi square statistic is calculated by comparing the observed table from the sample with an "expected" table derived under the null hypothesis of no relationship. If Fo denotes a cell in the observed table and Fe the corresponding cell in the expected table, then

Chi square = sum over all cells of (Fo - Fe)^2 / Fe

The cells in the expected table are computed from the row (nr) and column (nc) totals for the sample as follows: Fe = nr * nc / n.

CHI SQUARE TEST EXAMPLE: Suppose a sample (n = 100) from a student population yields the following observed table of frequencies:

OBSERVED TABLE
                 GENDER
IM-USE        Male   Female   Total
Yes             20       40      60
No              30       10      40
Total           50       50     100

EXPECTED TABLE UNDER THE NULL HYPOTHESIS (NO RELATIONSHIP)
                 GENDER
IM-USE        Male   Female   Total
Yes             30       30      60
No              20       20      40
Total           50       50     100

Chi square = (20-30)^2/30 + (40-30)^2/30 + (30-20)^2/20 + (10-20)^2/20
           = 100/30 + 100/30 + 100/20 + 100/20
           = 16.67

PRR389 SPSS & Statistics Page 7

Chi square tables report the probability of getting a Chi square value this high for a particular random sample, given that there is no relationship in the population. If doing the test by hand, you would look up the probability in a table. There are different Chi square tables depending on the number of cells in the table.
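The hand computation can be checked with a short Python sketch. It builds the expected table from the row and column totals (Fe = nr * nc / n) and sums (Fo - Fe)^2 / Fe over the cells; the table values are the gender-by-IM-use example, with rows as IM-USE (Yes/No) and columns as gender (Male/Female).

```python
def chi_square(observed):
    # Row totals, column totals, and grand total from the observed table.
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected cell count
            chi2 += (fo - fe) ** 2 / fe
    return chi2

print(round(chi_square([[20, 40], [30, 10]]), 2))   # 16.67
```

With 1 degree of freedom the 95% critical value is 3.84, so this value falls well inside the critical region.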
Determine the number of degrees of freedom for the table as (rows - 1) x (columns - 1). In this case it is (2-1)*(2-1) = 1. The probability of obtaining a Chi square of 16.67 given no relationship is less than .001. (The last entry in my table gives 10.83 as the Chi square value corresponding to a probability of .001, so 16.67 would have a smaller probability.) If you are using a computer package, it will normally report both the Chi square and the probability or significance level corresponding to this value. In testing your null hypothesis, REJECT if the reported probability is less than .05 (or whatever confidence level you have chosen). FAIL TO REJECT if the probability is greater than .05.

REVIEW OF THE STEPS IN HYPOTHESIS TESTING, for the above example:
(1) Nominal level variables, so we use Chi square.
(2) State the null hypothesis: no relationship between GENDER and IM-USE.
(3) Choose a confidence level: 95%, so alpha = .05; the critical region is Chi square > 3.84.
(4) Draw the sample and calculate the statistic: Chi square = 16.67.
(5) 16.67 > 3.84, so we are inside the critical region: REJECT the null hypothesis. Alternatively, SIG < .001 on the computer printout; .001 < .05, so REJECT the null hypothesis. Note we could have rejected the null hypothesis even at the .001 level here.

WHAT HAVE WE DONE? We have used probability theory to determine the likelihood of obtaining a contingency table with a Chi square of 16.67 or greater given that there is no relationship between GENDER and IM-USE. If there is no relationship (the null hypothesis is true), obtaining a table that deviates as much as the observed table does from the expected table would be very rare - a chance of less than one in 1,000. We therefore assume we didn't happen to get this rare sample, but instead that our null hypothesis must be false. Thus we conclude there is a relationship between GENDER and IM-USE. The test doesn't tell us what the relationship is, but we can inspect the observed table to find out. Calculate row or column percents and inspect these.
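Those percents are just each cell divided by its row or column total; a quick Python sketch for the same observed table (rounded to two decimals to match the tables below):

```python
def row_percents(table):
    # Divide each cell by its row total.
    return [[round(cell / sum(row), 2) for cell in row] for row in table]

def col_percents(table):
    # Divide each cell by its column total.
    totals = [sum(col) for col in zip(*table)]
    return [[round(cell / totals[j], 2) for j, cell in enumerate(row)]
            for row in table]

observed = [[20, 40],   # IM-USE = Yes: male, female
            [30, 10]]   # IM-USE = No
print(row_percents(observed))   # [[0.33, 0.67], [0.75, 0.25]]
print(col_percents(observed))   # [[0.4, 0.8], [0.6, 0.2]]
```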
For row percents, divide each entry on a row by the row total.

ROW PERCENTS
                 GENDER
IM-USE        Male   Female   Total
Yes            .33      .67    1.00
No             .75      .25    1.00
Total          .50      .50    1.00

To find the "pattern" in the table, compare the row percents for each row with the "Total" row at the bottom. Thus, half of the sample are men, whereas only a third of IM users are male and three quarters of nonusers are male. Conclusion - men are less likely to use IM.

COLUMN PERCENTS: divide the entries in each column by the column total.

                 GENDER
IM-USE        Male   Female   Total
Yes            .40      .80     .60
No             .60      .20     .40
Total         1.00     1.00    1.00

PATTERN: 40% of males use IM, compared to 80% of women. Conclude that women are more likely to use IM. Note that in this case the column percents provide a clearer description of the pattern than the row percents.

PRR389 SPSS & Statistics Page 8

COMPUTING CONFIDENCE INTERVALS AROUND PARAMETER ESTIMATES

When you use a sample statistic to estimate a population parameter, you base your estimate on a single sample. Estimates will vary somewhat from one sample to another. Reporting results as confidence intervals acknowledges this variation due to sampling error. When probability samples are used, we can estimate the size of this error. The standard error of the mean (SE Mean) is the standard deviation of the sampling distribution - i.e., how much do the means for different samples of a given size from the same population vary? The SE Mean provides the basic measure of likely sampling error in a sample estimate. A 95% confidence interval is two (1.96) standard errors (SE) either side of the sample mean. SE Mean = standard deviation in the population / square root of n (the sample size). SPSS computes standard deviations and/or standard errors for you.
You should be able to compute a 95% confidence interval if you have the sample mean (say X) and either:
a) the standard error of the mean (SEMean): CI = (X - 2*SEMean, X + 2*SEMean); or
b) the standard deviation of the variable in the population (σ) and the sample size (n): SEMean = σ/sqrt(n), so the 95% CI = (X - 2σ/sqrt(n), X + 2σ/sqrt(n)).

Examples:
a) In a sample of size 100, the percent reporting a previous visit to the park is 40%. If the SEMean is 5%, then the 95% CI is 40% ± 2*5% = (30%, 50%).
b) In a sample of size 100, the percent reporting a previous visit to the park is 40%. If the standard deviation in the population is 30%, then SEMean = σ/sqrt(n) = 30/sqrt(100) = 30/10 = 3, and the 95% CI = 40 ± 2*SEMean = 40 ± 2*3% = 40 ± 6% = (34%, 46%).
c) With the same mean and standard deviation as in b) but a bigger sample of 900, note that the 95% CI = 40 ± 2*30%/sqrt(900) = 40 ± 2*(30/30) = 40 ± 2% = (38%, 42%).

OTHER STATISTICAL NOTES

a. Measures of the strength of a relationship vs. a statistical test of a hypothesis. There are a number of statistics that measure how strong a relationship is, say between variable X and variable Y. These include parametric statistics like the Pearson correlation coefficient, rank order correlation measures for ordinal data (Spearman's rho and Kendall's tau), and a host of non-parametric measures including Cramer's V, phi, Yule's Q, lambda, gamma, and others. DO NOT confuse a measure of association with a test of a hypothesis. The Chi square statistic tests a particular hypothesis. It tells you little about how strong the relationship is, only whether you can reject a hypothesis of no relationship based upon the evidence in your sample. The problem is that the size of Chi square depends on the strength of the relationship as well as the sample size and the number of cells. There are measures of association based on Chi square that control for the number of cells in the table and the sample size.
Correlation coefficients from a sample tell how strong the relationship is in the sample, not whether you can generalize it to the population. There is a test of whether a correlation coefficient is significantly different from zero that evaluates generalizability from the sample correlation to the population correlation. This tests the null hypothesis that the correlation in the population is zero.

b. Statistical significance versus practical significance. Hypothesis tests merely test how confidently we can generalize from what was found in the sample to the population we have sampled from. They assume random sampling; thus, you cannot do statistical hypothesis tests on a non-probability sample or a census. The larger the sample, the easier it is to generalize to the population. For very large sample sizes, virtually ALL hypothesized relationships are statistically significant. For very small samples, only very strong relationships will be statistically significant. What is practically significant is a quite different matter from what is statistically significant. Check to see how large the differences really are to judge practical significance, i.e., does the difference make a difference?

PRR389 SPSS & Statistics Page 9

SPSS FOR WINDOWS version 10.0 - LAB, Nov 12-26

Contents: SPSS procedures - 1-4; Practice exercise - 5-6 (the assigned exercise is on page 6 at the bottom); Sample analysis - 7; HCMA study description - 8; Codebook - 9-10.

SPSS stands for Statistical Package for the Social Sciences. Other popular statistical software includes SAS, SYSTAT, and MINITAB. SPSS is well suited to the analysis of social science/survey data. Like all statistical packages, SPSS works with a table of data with cases as rows and variables as columns (just like an Excel table - in fact you can import Excel tables directly to SPSS and vice versa).
For survey data, each case is a respondent or questionnaire, and each variable is usually a numeric coding of the response to a single question on the survey instrument. Statistical packages prefer to analyze data in numeric form, so one codes variables like GENDER as something like 1 = male, 2 = female (1 = male, 0 = female is better). We will be analyzing data from the 1996 Huron Clinton Metropark visitor survey. The HCMA survey dataset includes 4,031 cases and 136 variables (original). A few of the messier variables have been dropped for this exercise and other variables have been computed. You will need a copy of the HCMA96.SAV file to complete this exercise. You may retrieve it directly in SPSS in the micro-labs from the course AFS space. You also should have reviewed the HCMA questionnaire and codebook to become familiar with the data set - variables, coding, etc.

1. Loading SPSS-PC. Run SPSS by selecting the SPSS program from the START menu (in Math/Stat Applications: SPSS, SPSS 10.1.4, SPSS 10.1 for Windows). When SPSS opens you will see options to run the tutorial, enter data, or open an existing file (the default). Run through the tutorials for a preview. To retrieve the HCMA data file, close the opening dialogue box and retrieve the file directly from the SPSS menus: File, Open, Data, then browse to the HCMA96.SAV file in the PRR389 course AFS space.

When the file is loaded, you will see the data in the data window in spreadsheet format. Variable names are at the top of the columns. Cases run down the rows; each case/row represents one respondent/completed questionnaire. See the HCMA codebook and questionnaire to match variables with items on the questionnaire. To see codes as value labels rather than numbers, choose View on the menus and check Value Labels (uncheck to toggle back to numbers). To see information about any variable, choose the "Variable View" tab at the bottom. On the menus, Utilities, Variables shows you information for all variables.
You are now ready to run statistical analyses.

2. Running Statistical Procedures. Choose the ANALYZE option on the menu and then the statistical procedure you wish to run. We will work mostly with the Descriptive Statistics and Compare Means procedures. (The Analyze menu also offers Reports, Custom Tables, General Linear Model, Correlate, Regression, Loglinear, Classify, Data Reduction, Scale, Nonparametric Tests, Survival, and Multiple Response; the Descriptive Statistics submenu contains Frequencies, Descriptives, Explore, and Crosstabs.)

FREQUENCIES - frequencies for nominal and ordinal variables.
DESCRIPTIVES - means, etc. for interval/ratio scale variables.
EXPLORE - exploratory data analysis procedures to see distributions.
CROSSTABS - tables for nominal or ordinal (few categories) variables; Chi square test.

PRR389 SPSS & Statistics Page 10

COMPARE MEANS - interval dependent variable, nominal or limited-category independent variable:
Means - compare subgroup means; under Options, check the ANOVA table for a statistical test.
One-Sample T-Test - tests H0: mean of variable = some constant.
Independent-Samples T-Test - two groups; tests H0: mean for group 1 = mean for group 2.
Paired-Samples T-Test - paired variables; applies in pre-test/post-test situations.
One-Way ANOVA - compares means for more than two groups.

3. General Steps for Running Procedures.
a. First choose a procedure from the Analyze menu. Note the appropriate procedure depends on the measurement levels of your variables and the nature of the intended analysis. See 5 below for details.
b. Choose variables: select from the list of variables at left and click the arrow to move them into the variable box at right. Note that you can choose several variables at a time - move them one at a time by selecting and clicking the arrow or by double-clicking on a variable name, or hold the CTRL key down while clicking to select several variables and move them to the variable box as a group.
To unselect a variable, click on it in the variable box on the right; the arrow switches direction - click it to move the variable back.
c. Select the buttons at the bottom for special Options, Statistics, etc.; complete the dialog boxes, then CONTINUE.
d. Click OK to run the procedure.
e. Results appear in the OUTPUT window. SPSS automatically switches to the output window when you run a procedure. Scroll around in this window to view the results. To return to the data window, click the HCMA96 button on the application bar at the bottom or choose HCMA96 from the Window menu.

4. SPSS Windows and Files. SPSS throws up lots of windows, often not maximized. Use the MAXIMIZE buttons at the top right of windows to expand the display to full screen. Use the WINDOW command on the menu bar to choose between the output and data windows, or choose them from the application bar at the bottom. The three primary windows are:

The data window - a spreadsheet showing the raw data, variables across columns, cases down rows. Run most procedures from here. SPSS data files have an *.SAV extension. SPSS 10.0 has added a "Variable View" page to the data window, accessed via Excel-type tabs at the bottom. The Variable View page has definitions of variables and coding information.

The output window - when you run a procedure, results are shown in the output window. This is like a word processor with an outline at left to select particular results. You may print results from here or copy and paste them to WORD or EXCEL. SPSS output files have an *.SPO extension.

The syntax window - optional. If you use the Paste option, you can paste procedures to the syntax window, where you can easily rerun or edit them. SPSS syntax files have an *.SPS extension.

SPSS data and output files are specially coded files you can only read in SPSS. There are utilities to save data files as Excel or Access files, or to import data from those formats to SPSS. The syntax files are simple text files that can be read by a word processor.

5. Guidance on individual procedures - basic statistics:
a. FREQUENCIES - run this on variables at nominal or ordinal scale with a small number of categories. Gives the frequency distribution for the variable and optional statistics.
b. DESCRIPTIVES - run for interval scale variables to get the mean, standard deviation, etc.; choose S.E. Mean in the Statistics dialog box to compute confidence intervals.
c. CROSSTABS - for nominal/ordinal variables, choose a row and a column variable (put the variable with fewer categories in the columns). Under Statistics, select Chi square for a hypothesis test; under Cells, choose Row Pct and Column Pct.
d. COMPARE MEANS - the dependent variable must be interval scale (or dichotomous); the independent variable forms the subgroups (it should take on a limited set of values - usually nominal or ordinal).
e. CORRELATE - for two or more interval scale variables use Pearson; use Spearman/Kendall for ordinal measures.

PRR389 SPSS & Statistics Page 11

6. Variable Transformations. Sometimes you want to change the coding of a variable or compute a new variable. The RECODING and COMPUTING procedures are in the TRANSFORM menu. Use RECODE to change the coding of a variable (maybe to collapse it into fewer groups or reassign missing codes) and COMPUTE to compute new variables (e.g., a simple sum of other variables).
a. RECODE changes the coding of a variable. First choose whether you want to put the new codes in the same variable or a different (new) one. The latter preserves the old codes and sets up a new variable with the new codes. To preserve the original coding on the file, choose recode "into new variable"; then you must add a name for the new variable and press the CHANGE button. In either case, specify the coding changes as follows. Select the variable you want to change codes for and choose the "old and new values" option. Then complete the dialog box to indicate how the codes should be changed. Press the ADD button to add each coding change to the recode box. Repeat the procedure for as many codes as you wish to change. Then press OK to execute the changes.
For example, to group "within the past 5 years" (3) and "more than 5 years ago" (4) together on the FIRST variable - i.e., to change code 4 to code 3 - select recode into same variable, choose the FIRST variable, click the OLD AND NEW VALUES button, enter a 4 in the box for the old value and a 3 for the new value at right. Then click the ADD button and a line "4 --> 3" will appear in the box. Click CONTINUE, then click OK to perform the recoding. If you look in the Data window under the FIRST column, all the 4's should now be 3's. When you run a FREQUENCIES on FIRST, the 3's and 4's will be grouped and show up as 3's. Be careful: any value labeling won't be automatically corrected.
b. COMPUTE: to compute new variables from old ones. Choose Transform, Compute. Enter a name for the new variable in the Target Variable box (8 characters or less). Then enter a mathematical expression in the larger box after the = sign indicating how the new variable is computed. Press OK to execute the procedure. Your new variable is added as a column at the end of the file in the Data window. You may now use this variable in any procedure (refer to it by the name you assigned). For example, to compute a variable equal to the length of time each party stayed in the park: enter HOURS as the name in the Target Variable box, enter LEAVE - ARRIVE in the numeric expression box, and press OK. Be careful to spell variable names correctly. You can paste variables into the box by double clicking on them in the list of variables at left, then adding (or pasting from the calculator pad) math expressions in between. You can edit inside the box to correct mistakes. SPSS will add the new variable to the file - check it at the far right in the Data window. You can now use the new HOURS variable like any other in a statistical procedure. It won't be kept when you exit SPSS unless you save the file (there is probably no need to save the file, but if you do, you'll have to put it in your own AFS space). Beware of missing values when computing new variables: the result will be missing if any variable in the formula is missing.
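The recode and compute examples above translate to a few lines of SPSS syntax - a sketch you could paste into the syntax window (variable names are from the HCMA codebook):

```
* Collapse FIRST: recode "more than 5 years ago" (4) into
* "within the past 5 years" (3), keeping the same variable.
RECODE FIRST (4=3).
EXECUTE.

* Compute length of stay from departure and arrival times.
COMPUTE HOURS = LEAVE - ARRIVE.
EXECUTE.

FREQUENCIES VARIABLES=FIRST.
DESCRIPTIVES VARIABLES=HOURS.
```

Note that ARRIVE and LEAVE are coded in military time, so a simple subtraction only approximates hours; treat this as an illustration of the commands rather than a finished analysis.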
Good practice when recoding or transforming is to always check the result before proceeding with further analysis. Check via frequencies on the new and old variables, or by manually checking a few cases in the Data window.
7. Other Procedures and Tips
a. OPTIONS. SPSS may be set up to show variables in either alphabetic or file order in pick lists. To get "file" order of variables, choose EDIT, OPTIONS on the main SPSS menu and change Variable order from Alphabetical to File (push the radio buttons on the General tab at right). You must do this BEFORE retrieving the file. If you have already loaded the file, choose File, New, Data and then re-retrieve the file for this change to take effect. This doesn't change the order of variables in the Data window, only in variable pick lists.
b. CUSTOM TABLES. The Custom Tables procedures let you run descriptive statistics on groups of variables and assemble the results in tables, giving you some control over formatting and labeling. They produce what are sometimes called "banner tables," summarizing a number of variables in a single table. Use "Basic Tables" for descriptive statistics, "General Tables" for crosstabulations, and "Tables of Frequencies" for frequency distributions. You may check out these procedures after you have mastered those in the SUMMARIZE section, if you wish.
c. PRINTING and SAVING. You may print results as you generate them from the Output window, or copy the ones you want into a word processor. To save output when you exit SPSS (by the File, Exit command), answer YES to the question about saving your output. Enter a path and filename, e.g., A:SPSS.SPO to put it on your floppy, or enter a path to your AFS space. You don't need to save the data (respond NO to this question when exiting). The SPSS.SPO file can only be read by SPSS. You can also copy and paste SPSS output to WORD or EXCEL by opening both SPSS and these applications.
The Output window is a simple text editor - you can add your own notations and delete items you don't want. The outline at left is handy for finding a procedure you ran or deleting it.
d. Selecting and Sorting Cases. The Data menu has procedures to SORT the data file on a particular variable or to SELECT subsets of cases to use in an analysis. For example, to select only cases from Kensington Metropark, choose Data, Select Cases, push the IF button, and enter the filter PARK=1 (Kensington is park 1 in the coding scheme). Any subsequent analysis will use only the Kensington cases; you will see a "filter on" message in the status bar, and cases not from Kensington are "slashed out" in the Data window. REMEMBER to turn the filter off when you want to return to all cases - come back to Data, Select Cases and choose the "all cases" radio button.
e. WEIGHTS. Weights can be used to adjust the sample to better represent the population or to expand cases from the sample to the population. The HCMA file has two sets of weights: VSTWT adjusts and expands the sample to the population of 2.788 million visits (park entries) to HCMA, while VSITORWT adjusts the sample to the population of about 300,000 household visitors (anyone visiting an HCMA park at least once in 1996). Use VSITORWT when describing people; use VSTWT when describing park vehicle entries. The weights adjust the sample to the actual distribution of use in 1996 by park, season, and weekday/weekend, correcting for disproportionate sampling and different response rates across parks and periods. DO NOT use these expansion weights when conducting statistical tests, as all hypotheses will be significant (the tests think they are based on a sample of 2.8 million). Instead use VSTWT2 or VSTORWT2, which adjust for disproportionate sampling but then normalize the weights back to the actual sample size, so statistical tests are based on the true sample size. You can also run tests unweighted. When a weight is on, a message appears on the status bar.
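In syntax form, the filtering and weighting steps above look roughly like this (a sketch; PARK=1 and the weight variable names are taken from the codebook):

```
* Analyze only Kensington cases (PARK = 1).
COMPUTE filter_$ = (PARK = 1).
FILTER BY filter_$.
FREQUENCIES VARIABLES=AGE2.

* Return to all cases.
FILTER OFF.
USE ALL.

* Turn on the normalized visit weight for a statistical test,
* then turn weighting off again.
WEIGHT BY VSTWT2.
CROSSTABS /TABLES=INCOME BY MVP95 /STATISTICS=CHISQ.
WEIGHT OFF.
```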
To set the weighting variable or turn weighting off, go to Data, Weight Cases on the menu and choose the desired weighting variable.
f. OTHER PROCEDURES. Feel free to explore other parts of the SPSS program. You can generate GRAPHS in the Graphs menu (be careful to save before trying graphs - the labs are crash-prone on graphing) and view various information about the file in the UTILITIES menu. See the HELP menus for the tutorial and further information about using SPSS for WINDOWS. If you'd like more instruction in SPSS, the Computer Lab runs shortcourses. Also check out the SPSS site on the web at www.spss.com and its tutorials.
g. SYNTAX WINDOW. SPSS also lets you paste commands into a syntax window (look for the PASTE buttons on most procedures). If you prefer, you can type, edit, and run procedures from the syntax window if you know the syntax. This is sometimes faster than navigating through the menus, but requires some familiarity with SPSS syntax. If you paste commands to the syntax window, you can save the syntax file and easily rerun the procedures later. This simplifies rerunning a complicated set of procedures. The HCMA.SPS syntax file (in the labs99 subdirectory) will run the procedures in the practice exercise that follows. To retrieve it, in the SPSS menu use File, Open, Syntax and point to the hcma.sps file on the U drive in the labs99 folder.
h. USING EXCEL FOR DATA ENTRY. If you want to enter survey data in Excel and then import the result to SPSS, use these guidelines. On the first row of the spreadsheet, enter a short (8 characters or less) variable name for each column - avoid spaces and special characters, as SPSS may not like them. You can name variables VAR1, VAR2, ..., but this makes them harder to identify. Enter each questionnaire as a separate case below the names, one case to a row - no blank rows. Save the Excel file when complete and close it. To retrieve this file into SPSS, use File, Open, Data, change the default extension to xls files, and pick your Excel file.
Enter the range on the spreadsheet where the data are located, and click the button for variable names in the first row if you have done that. SPSS should read the data in. Be careful about blanks in Excel, as these will come in as missing.
i. MISSING VALUES and N's. SPSS allows certain values to be designated as "missing" for each variable and has a general "system missing value" designated by a ".". It is a good idea to pay attention to the number of cases for any procedure you run and to understand when lots of cases are missing or when you have filtered out cases with a SELECT CASES procedure. Watch the N's: if you just look at percentages and means, these may be based on only a few cases. Remember that confidence levels of results will depend on the sample size. Also beware of statistical tests from WEIGHTed analyses, which may distort the actual sample size.
SPSS PRACTICE EXERCISE
0. So we all have the same options, let's list variables in alphabetic order rather than file order. Choose Edit, Options from the SPSS menu, then the General tab. In variable lists at right, select "Alphabetical." Then OK. Also choose the "Viewer" tab and make sure "Display commands in log" at the bottom is checked. Sometimes it is easier to see variables in file order - set this option on the General tab - and to choose "Display names" instead of variable labels. You will need to re-retrieve the file for these changes to take effect. The easiest way: File, New, Data will open a blank file; then choose HCMA96.SAV from the recently used files at the bottom of the File menu, OR use File, Open and point to it again.
1. FREQUENCIES for nominal or ordinal variables. Describe the characteristics of park visitors - income and age. From the codebook, note these variables are measured in categories - i.e., ordinal scale with a small number of categories. Run FREQUENCIES. In the menu, choose Analyze, Descriptive Statistics, Frequencies. Find the INCOME and AGE2 variables on the list at left (near the end in file order, or alphabetically).
Select them with the mouse and click the arrow to move them to the variable box at right (or double click on each variable). Click OK to run the frequencies.
2. DESCRIPTIVES for an interval scale variable. How many female visitors were there, on average, in each party? Find the variable in the codebook = TOTFEMAL; note it is interval scale. In the menu, choose Analyze, Descriptive Statistics, Descriptives. Complete the dialog box as above by selecting TOTFEMAL (near the end in file order) and moving it to the input box.
3. CROSSTABS with two nominal or ordinal variables. Crosstabs generates a table using two variables, one for rows and one for columns. Question: what is the distribution of the sample by age and income? From the menu, choose Analyze, Descriptive Statistics, Crosstabs. Complete the dialog box by choosing INCOME for the row variable and AGE2 for the column variable. Also click the STATISTICS button at the bottom and ask for all of them, then CONTINUE. Click the CELLS button at the bottom and ask for observed count, row percents, and column percents, then CONTINUE and OK to run the procedure.
4. COMPARE MEANS. Use this procedure to compare averages of two or more subgroups. The dependent variable is an interval scale variable (means are computed for this variable). The independent variable should be nominal or have a small number of values/groups; it forms the groups. Let's see if visitors using an annual motor vehicle permit visit the parks more often than those entering on a daily permit. Find the variables: independent = MVP95, which identifies those with an MVP (grouping variable); dependent = HCMATOT, which measures days of use of the Metroparks last year.
5. Simple hypothesis testing - give these a try.
a. Confidence interval for an average. In the DESCRIPTIVES procedure (Analyze, Descriptive Statistics, Descriptives), let's compute the average days that people used the Metroparks last year - HCMATOT, just as in 2 above - but also click the OPTIONS button at the bottom of the dialog box and ask for S.E. mean (standard error of the mean). Click CONTINUE, then OK to run.
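If you would rather work from the syntax window, steps 1 through 5a above can be sketched as follows (variable names from the codebook; compare this against what the menus paste for you):

```
FREQUENCIES VARIABLES=INCOME AGE2.

DESCRIPTIVES VARIABLES=TOTFEMAL HCMATOT
  /STATISTICS=MEAN STDDEV SEMEAN MIN MAX.

CROSSTABS
  /TABLES=INCOME BY AGE2
  /STATISTICS=CHISQ
  /CELLS=COUNT ROW COLUMN.

MEANS TABLES=HCMATOT BY MVP95
  /CELLS=MEAN COUNT STDDEV SEMEAN
  /STATISTICS=ANOVA.
```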
To get a 95% confidence interval, you add and subtract two standard errors from the sample mean.
b. Differences in means. Are those who use annual permits older than those who use dailies? The two variables are MVP, which indicates whether people used a daily or annual permit to enter the park, and AGE, which gives the age of the respondents (interval scale). We want to compute means of AGE for each type of entry permit. In the menu, choose Analyze, Compare Means, Means, as in 4 above. Complete the dialog box by choosing the dependent variable - the interval scale one = AGE - then the independent or subgroup variable - nominal or ordinal scale with a small number of categories = MVP. For a statistical test of the hypothesis that all the subgroup means are equal, also choose OPTIONS and ask for the "ANOVA table and eta" at the bottom. Also ask for S.E. mean by moving it from the list of statistics to "Cell Statistics" at right. Then click OK to run the procedure. Alternatively, you could perform an independent samples T-test on the two groups defined by MVP (say group one has MVP = 0, group two MVP = 1).
c. Chi square - tests for relationships between two nominal/ordinal variables in a crosstab table. Are higher income visitors more likely to use annual permits? Note MVP95 and INCOME are measured in a small number of categories (nominal or ordinal). Run the Crosstab procedure (Analyze, Descriptive Statistics, Crosstabs - see 3 above) with MVP95 and INCOME. In Statistics, choose Chi square; in Cells, choose row and column percents.
d. Correlations - for two interval scale variables. Is age (AGE) correlated with total days of use (HCMATOT)? The procedure is Analyze, Correlate, Bivariate. Choose AGE and HCMATOT.
6. General rules for interpreting hypothesis tests.
1.
You test a NULL hypothesis. The NULL hypothesis is a statement of NO relationship between the two variables (e.g., means are the same for different subgroups, the correlation is zero, there is no relationship between the row and column variables in a crosstab table).
2. Tests are conducted at a given "confidence level" - the most common is a 95% level. At this level there is a 5% chance of incorrectly rejecting the null hypothesis when it is true. For a stricter test, use a 99% confidence level and look for SIGs < .01; for a weaker one, use 90% and SIGs < .10.
3. On the computer output, look for the SIGnificance or PROBability associated with the test. The F, T, Chi-square, etc. are the actual "test statistics," but the SIGs are what you need to complete the test. SIG gives the probability that you could get results like those you see from a random sample of this size IF there were no relationship between the two variables in the population from which it is drawn. If the probability is small (< .05), you REJECT the assumption of no relationship (the null hypothesis). For the 95% level: if SIG < .05, you REJECT the null hypothesis; if SIG > .05, you FAIL TO REJECT it. REJECTING the NULL HYPOTHESIS means the data suggest that there is a relationship.
4. Hypothesis tests evaluate whether you can generalize from information in the sample to draw conclusions about relationships in the population. With very small samples, most null hypotheses cannot be rejected, while with very large samples almost any hypothesized relationship will be "statistically significant" - even when it is not practically significant. Be cognizant of the sample size (N) when making tests.
7. RECODING AND COMPUTING NEW VARIABLES - TRANSFORM. Sometimes you want to create new variables or change the coding of an existing variable (e.g., to collapse categories).
COMPUTING NEW VARIABLES. What is the average number of visitors in each party? You will need to COMPUTE a new variable equal to the sum of total female and male visitors and then run DESCRIPTIVES on this new variable.
Transform, Compute. Enter the name of the new variable - PARTY - then the formula TOTMALE + TOTFEMAL in the box (paste in the names to be sure of the spelling). OK. (This variable has already been created and saved on the file.)
RECODING a variable. Suppose we want to collapse income into two categories, say above or below $50,000. Choose TRANSFORM on the menu, then RECODE. Complete the dialog box to recode income into a NEW variable (see the RECODE command on the previous page) - call it INCOM2. Then run FREQUENCIES on INCOM2.
8. WEIGHTS. The weighting variables are described in section 7e above: VSTWT and VSITORWT expand the sample to the populations of visits and visitors respectively, while VSTWT2 and VSTORWT2 adjust for disproportionate sampling but keep the actual sample size - so use the latter pair (or no weights) for statistical tests. To set the weighting variable, go to Data, Weight Cases on the menu and choose the desired weighting variable. To turn weights off, return here and check "Do not weight cases."
SAMPLE ANALYSIS: All of this will easily fit on one page (attach the relevant SPSS output).
I hypothesized that respondents who list price of admission as an important factor in choosing a park will have larger family sizes. The appropriate survey variables are Q7PRICE, measured on a 5-point Likert scale (ordinal), and TOTFAM. TOTFAM is computed as the sum of the number of children 18 and under (HOUSEKID) and adults (HOUSEADT) that live in the household. First I computed TOTFAM = HOUSEKID + HOUSEADT. Then I ran Descriptives on these three variables to get means and put them into a table to show the results. (I've omitted the numbers.) You should briefly describe and interpret results in a paragraph and display details in a short table or figure (format tables and figures properly).

Table 1. Average family size for visitors
Category     Number of People
Children
Adults
Total

Report a 95% confidence interval for TOTFAM by getting DESCRIPTIVES and asking for S.E. Mean. Run frequencies on the Q7PRICE variable and report as percentages in a simple table. (Use the Valid Pct column; do not report everything SPSS prints out - e.g., omit cumulative percents unless they are meaningful to you.)

Table 2. Rating of importance of admission price
Importance             Pct
extremely important
very important
important
somewhat important
not important

Based on Table 2, I split the sample into two groups: Q7PRICE = 1, 2, or 3 (extremely important, very important, and important) formed group one, and 4 or 5 formed group two. An independent samples T-test was run to test for a difference in the average family sizes (TOTFAM) across the two subgroups. (When asked to define groups, I used 4 as the cut point: all codes less than the cut point form one group, and all codes greater than or equal to the cut point form the other.) Show the results of this in a short table. Note those rating price as more important (group one) have somewhat larger family sizes (2.6 people compared to 2.3). The difference is statistically significant at the 95% confidence level.

Table 3.
Test of Difference in Family Sizes by Importance of Admission Price

Importance Subgroup                           Average Family Size
Group 1: Extremely, very, or important        2.6
Group 2: Somewhat or not important            2.3
Test of difference in means:                  T-statistic 3.77, SIG .000

This example illustrates how to explain the analysis, including the variables you selected and any changes you made to them (recodes), as well as the results. If you choose nominal or limited-category variables, you will use crosstabs and the Chi square test. Be sure to first describe the variables and then perform the statistical test.

CODEBOOK: HURON-CLINTON METROPARKS USER SURVEY

Part 1. Variables that APPEAR on the questionnaire.

QUESTION                           VARIABLE NAMES          CODING / COMMENT
Q1 - Time                          ARRIVE, LEAVE           coded in military time
Q2                                 N/A                     dropped from file
Q3 - Permit                        MVP                     0=daily, 1=annual permit
Q4 - Permit 95                     MVP95                   0=No, 1=Yes
Q5 - Activity                      NATURE to PLAYGRND      0=did not participate, 1=participated; see footnote for variable names and code numbers
Q6 - Primary activity              PRIMACT                 activity number from Q5; see footnote below for activity codes
Q7 - Importance (characteristics)  Q7BEAUTY to Q7CROWD     1=extremely important to 5=not important (see questionnaire for details)
Q8 - Importance (reasons)          Q8FAMILY to Q8NATURE    1=extremely important to 5=not important (see questionnaire for details)
Q9 - Facilities                    Q9WATER to Q9NONEWF     0=not chosen, 1=would like developed (see questionnaire for details)
Q10 - Programs                     Q10NATURE to Q10OTHER   0=not chosen, 1=would like developed (see questionnaire for details)
Q11 - Aware free admission         FREEDAY1, FREEDAY2      0=no, 1=yes; if yes for FREEDAY1, continue for FREEDAY2
Q12 - Familiarity (column a)       META to LAKA            0=blank, 1=familiar (see questionnaire for details)
Q12 - Familiarity (column b)       METB to LAKB            number of times visited in the past 12 months (see questionnaire for details)
Q13 - First visit                  FIRST                   1=today, 2=within the past year, 3=within the past 5 years, 4=more than 5 years ago
Q14 - Get info                     INFOTV to INFOOTHR      0=blank, 1=get info from it
Q15 - Performance                  Q5BEAUTY to Q5OVERAL    1=excellent, 2=very good, 3=good, 4=fair, 5=poor, -8=don't know (see questionnaire for details)
Q16 - Comments                     N/A
Q17 - Zipcode                      ZIPCODE                 5-digit zipcode
Q18 - Age                          AGE                     age (interval)
Q19 - Gender                       GENDER                  1=female, 2=male
Q20 - Employment                   EMPLOY                  (see questionnaire for detail)
Q21 - Employed                     EMPLFULL, EMPLPART      number of people employed (see questionnaire for detail)
Q22 - Marital status               MARITAL                 (see questionnaire for detail)
Q23 - Family members               HOUSEKID, HOUSEADT      numbers of children (adults) at home (interval)
Q24 - Education                    EDUCATE                 (see questionnaire for detail)
Q25 - Income                       INCOME                  (see questionnaire for detail)
Q26 - Race                         ETHNIC                  (see questionnaire for detail)
Q27 - Final                        N/A

Refer to the questionnaire for more details.

Part 2. Created Variables. These variables were created by using recodes and computes.

VARIABLE NAMES   CODING                                             COMMENT
PARK             1=Metro Beach, 2=Wolcott Mill, 3=Stony Creek,      park where survey was distributed - determined from survey ID number
                 4=Indian Springs, 5=Kensington, 6=Huron Meadows,
                 7=Hudson Mills, 8=Dexter-Huron, 9=Delhi,
                 10=Lower Huron, 11=Willow, 12=Oakwoods,
                 13=Lake Erie
COUNTY           1=OAKLAND, 2=LIVINGSTON, 3=WAYNE,                  county of residence determined from zipcode
                 4=WASHTENAW, 5=MACOMB, 6=MONROE, 7=other
AGE2             1= <17, 2= 18-35, 3= 36-59, 4= >59                 grouping of age into categories
TOTFEMAL         total number of females in party                   sum from question #2
TOTMALE          total number of males in party                     sum from question #2
DAY              1=Monday, 2=Tuesday, etc.                          day of the week of visit
WEEKEND          0=weekday, 1=weekend                               from date distributed
HCMATOT          sum of total days visited parks in 1995            sum from question #12
VSTWT2           weight to adjust sample to population of visits
VSTORWT2         weight to adjust sample to population of visitors
FLC              0=S/NC/18-35, 1=S/C/18-35, 2=M/NC/18-35,           family life cycle computed from age, marital status, and children in household
                 3=M/C/18-35, 4=S/NC/36-55, 5=S/C/>36,
                 6=M/NC/36-55, 7=M/C/36-55, 8=S/NC/>55,
                 9=M/NC/>55, 10=M/C/>55
SEASON           1=winter, 2=spring, 3=summer, 4=fall               from date distributed

Code numbers, variable names, and activities for questions 5 and 6.

1 NATURE      NATURE OBSERVATION OR PHOTOGRAPHY
2 SCENIC      SCENIC DRIVE
3 PICNIC      PICNIC
4 BIKE        BICYCLE
5 WALK        WALK OR HIKE
6 WALKPET     WALK PET(S)
7 RUN         RUN OR JOG
8 ROLLER      ROLLERSKATE OR IN-LINE SKATE OR SKI
9 VISITNC     VISIT NATURE CENTER
10 VISITF     VISIT FARM
11 VISITGM    VISIT GRIST MILL
12 SUNBATHE   SUNBATHE
13 BOATNM     BOAT - NON-MOTOR
14 BOATM      BOAT - MOTOR
15 FISHB      FISH FROM BOAT
16 FISHS      FISH FROM SHORE
17 WATERSL    WATERSLIDE
18 SWIMLAKE   SWIM OR WADE IN LAKE
19 SWIMPOOL   SWIM OR WADE IN POOL (INCLUDING WAVEPOOL)
20 EVENT      ATTEND A SPECIAL EVENT IN THE PARK
21 OTHERACT   PARTICIPATE IN ANOTHER ACTIVITY
22 GOLF       GOLF
23 PLAYGAME   PLAY OTHER GAMES OR SPORTS (NOT GOLF)
24 WATCH      WATCH GAMES OR SPORTS
25 PLAYGRND   USE PLAYGROUND EQUIPMENT OR TOT LOT

(The following codes are for question 6 only.)
26 ICE FISH
27 CROSS COUNTRY SKI
28 SLED OR TOBOGGAN
29 ICE SKATE
30 FISHING (UNDETERMINED)
31 NONE
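For reference, the sample analysis described above could be run as a short syntax file along these lines (a sketch using the codebook variable names; the cut point 4 splits Q7PRICE into the two importance groups):

```
* Compute family size, describe the variables, then test whether
* family size differs by importance of admission price.
COMPUTE TOTFAM = HOUSEKID + HOUSEADT.
EXECUTE.

DESCRIPTIVES VARIABLES=HOUSEKID HOUSEADT TOTFAM
  /STATISTICS=MEAN STDDEV SEMEAN.

FREQUENCIES VARIABLES=Q7PRICE.

T-TEST GROUPS=Q7PRICE(4)
  /VARIABLES=TOTFAM.
```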