INTRO TO SPSS
Part 1: Getting started
Nathan Walter
Annenberg School for Communication & Journalism

SPSS
• One of the most popular software packages for statistical analysis
• Originally tailored to the needs of social science scholars and practitioners
• Recent versions attempt to appeal to the health sciences
• Very common in the industry!

What's the type of scaling for the variable?

Scale properties (Y = yes, N = no):
  (1) Scale has an absolute zero
  (2) Ratios equal (2/1 = 4/2); proportions appropriate
  (3) Interval between adjacent scale points is the same
  (4) Ordered
  (5) Arbitrary labeled categories

Scale     Examples                           (1) (2) (3) (4) (5)  Mathematical operations (non-controversial use)
Ratio     Age, number of sex acts             Y   Y   Y   Y   N   Division, multiplication, subtraction, addition
Interval  Temperature (Fahrenheit)            N   N   Y   Y   N   Subtraction, addition
Ordinal   Likert scales; ranked preferences   N   N   N   Y   N   Greater-than or less-than operations (subtraction/addition typical for Likert and similar scales)
Nominal   Gender, race                        N   N   N   N   Y   Counting

Measures of central tendency
What is the best way to describe the center of the distribution?
• Mean: the average
• Median: the middle value
• Mode: the most common value

With that said…
Measures of central tendency describe the most "typical" value in the distribution, but they don't tell us the whole story. For instance, the following two distributions have the same mean, mode, and median. Does that mean they are the same?
[Figure: two frequency histograms (x-axis: score, 1 to 4; y-axis: Freq), one for the scores 3, 3, 3, 3 and one for the scores 4, 3, 3, 2]

Measures of dispersion
• Range: the difference between the largest score in the distribution and the smallest score. R = Max - Min
• Interquartile range: the upper quartile minus the lower quartile. IQR = Q3 - Q1
• Variance: the mean of the squared deviations from the mean.
• Standard deviation: the square root of the variance.

With that said…
Group 1: 95, 96, 97, 98, 99
Group 2: 5, 6, 7, 8, 9

SD for group 1:
$$ SD_1 = \sqrt{\frac{(99-97)^2 + (98-97)^2 + (97-97)^2 + (96-97)^2 + (95-97)^2}{5}} = \sqrt{\frac{4 + 1 + 0 + 1 + 4}{5}} = 1.41 $$

SD for group 2:
$$ SD_2 = \sqrt{\frac{(9-7)^2 + (8-7)^2 + (7-7)^2 + (6-7)^2 + (5-7)^2}{5}} = \sqrt{\frac{4 + 1 + 0 + 1 + 4}{5}} = 1.41 $$

The two groups have the same standard deviation even though their means (97 and 7) are far apart: dispersion alone does not tell us where a distribution is located.

Output

Statistics: "How old are you?"
N (valid)        243
N (missing)      0
Mean             40.62
Median           37.00
Mode             26
Std. Deviation   14.931
Range            66
Percentile 25    28.00
Percentile 50    37.00
Percentile 75    52.00

Normal distribution
[Figures: distribution of IQ among students (N = 89); distribution of years of schooling (N = 27750); distribution of "resistance to change" (N = 154548)]

Normal Distribution: Plots & Tests
• Kolmogorov-Smirnov test: compares the scores in the sample with a normally distributed set of scores.
• Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed (more power than Kolmogorov-Smirnov).

Is there significant skew? Why does this matter?
• Determining whether there is a significant skew:
$$ z = \frac{\text{Skew}}{\text{Standard Error of Skew}} $$
• Is |z| greater than 3.29 (p < .001)? If yes, conclude that the sample is apt to be from a population that is not normal.
[Figures: positive skew; negative skew]

Kurtosis: Vertical Abnormalities in Shape
• Leptokurtic: too peaked (positive kurtosis)
• Platykurtic: too flat (negative kurtosis)
• Calculate?
$$ z = \frac{\text{Kurtosis}}{\text{Standard Error of Kurtosis}} $$

Oh no, outliers!!!
Assess the situation. Is it really that bad?!
What to do:
1. Remove the case
2. Transform the data
3. Change the score:
   a. Highest/lowest score +/- 1
   b. +/- 2 SD
Still a problem?
1. Robust methods
2. Bootstrapping

Z (normal) distribution vs. T distribution
[Figure: relative frequency curves of the Z and t distributions]

Analysis of group means: T tests
A collection of statistical models used to analyze the differences among group means and their associated procedures.

a) One sample T-test
$$ t = \frac{\bar{x} - \mu}{S_{\bar{x}}} $$

b) Independent samples T-test
$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{S_{\bar{x}_1 - \bar{x}_2}} $$

c) Paired samples T-test
$$ t = \frac{\bar{d} - \mu_d}{S_D / \sqrt{n}} $$

Scales Construction
Factor analysis:
  A. Development of objective tests (CFA)
  B. Test theory (CFA)
  C. Data reduction (EFA)
Reliability check / internal consistency (Cronbach's alpha)

Common rotation techniques

Technique: Varimax
  Type: Orthogonal
  Goals: Minimize complexity of factors (simplify columns of the loading matrix) by maximizing the variance of loadings on each factor
  Comments: Most commonly used rotation. Default recommended.

Technique: Direct oblimin
  Type: Oblique
  Goals: Simplify factors by minimizing cross-products of loadings
  Comments: Allows a wide range of factor intercorrelations

Data manipulation

Questions?
Nathan Walter
[email protected]
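
Worked example: central tendency vs. dispersion. The slides ask whether two distributions with the same mean, median, and mode are "the same". A minimal Python sketch, reading 3, 3, 3, 3 and 4, 3, 3, 2 as raw scores (an assumption about the figure), shows that the centers agree while the spread does not. The helper name `describe` is made up for illustration; this is not SPSS output.

```python
import statistics

def describe(scores):
    """Return center and spread measures for a list of numeric scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores),
        # Population variance: mean of the squared deviations from the mean.
        "variance": statistics.pvariance(scores),
    }

flat = describe([3, 3, 3, 3])
spread = describe([4, 3, 3, 2])

for name, summary in (("3,3,3,3", flat), ("4,3,3,2", spread)):
    print(name, summary)
```

Both summaries share the same mean, median, and mode, but only the second distribution has a non-zero range and variance.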
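
Worked example: standard deviation of Group 1 and Group 2. The SD slide divides the sum of squared deviations by N = 5, which is the population form. The sketch below reproduces that figure with Python's standard library and, for comparison, prints the sample form (dividing by N - 1), which is typically what statistics packages report in descriptive output; treat that last point as an assumption to check against your own SPSS output.

```python
import statistics

group_1 = [95, 96, 97, 98, 99]
group_2 = [5, 6, 7, 8, 9]

for name, scores in (("Group 1", group_1), ("Group 2", group_2)):
    mean = statistics.mean(scores)
    sd_population = statistics.pstdev(scores)  # divides by N, as on the slide
    sd_sample = statistics.stdev(scores)       # divides by N - 1
    print(f"{name}: mean={mean}, SD (N)={sd_population:.2f}, SD (N-1)={sd_sample:.2f}")
```

The two groups differ by 90 points in their means yet have identical standard deviations under either formula.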
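
Worked example: testing for significant skew. The slides define z as skew divided by the standard error of skew and compare |z| with 3.29. The sketch below is one way to compute that ratio by hand. It assumes the adjusted sample skewness and the sample-size-based standard error that statistics packages commonly report; the exact formulas SPSS uses are not given in the slides, so compare the result against real output. The function name `skew_z` and the data are hypothetical.

```python
import math
import statistics

def skew_z(scores):
    """Return (skewness, standard error of skewness, z) for a sample."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD, divides by n - 1
    # Adjusted sample skewness (assumed form, see lead-in).
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in scores)
    # Standard error of skewness; depends only on the sample size.
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se, skew / se

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 2, 3, 10]  # hypothetical scores with a long right tail

for label, data in (("symmetric", symmetric), ("right-tailed", right_tailed)):
    skew, se, z = skew_z(data)
    verdict = "significant skew" if abs(z) > 3.29 else "no significant skew"
    print(f"{label}: skew={skew:.3f}, SE={se:.3f}, z={z:.3f} -> {verdict}")
```

The same pattern applies to kurtosis: divide the kurtosis statistic by its standard error and compare the absolute value with the chosen cutoff.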
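
Worked example: the three t statistics. The t-test slide lists one-sample, independent-samples, and paired-samples tests. The sketch below computes each statistic by hand under stated assumptions: the null hypothesis sets the population difference to zero, and the independent-samples version uses the pooled (equal-variance) standard error. Function names and the before/after scores are hypothetical; p-values are omitted because they require the t distribution, which the Python standard library does not provide.

```python
import math
import statistics

def one_sample_t(scores, mu0):
    """t = (sample mean - hypothesized mean) / standard error of the mean."""
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return (statistics.mean(scores) - mu0) / se

def independent_t(a, b):
    """Pooled-variance t for two independent groups (assumes equal variances)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(before, after):
    """One-sample t on the pairwise differences, tested against zero."""
    diffs = [y - x for x, y in zip(before, after)]
    return one_sample_t(diffs, 0)

group_1 = [95, 96, 97, 98, 99]
group_2 = [5, 6, 7, 8, 9]
before = [5, 6, 7, 8, 9]      # hypothetical pre-test scores
after = [6, 8, 8, 10, 11]     # hypothetical post-test scores

print("one-sample t (Group 1 vs. 97):", one_sample_t(group_1, 97))
print("independent t (Group 1 vs. Group 2):", independent_t(group_1, group_2))
print("paired t (before vs. after):", paired_t(before, after))
```

Because Group 1 and Group 2 have equal spread but means 90 points apart, the independent-samples statistic is very large, while the one-sample test of Group 1 against its own mean gives zero.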
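
Worked example: Cronbach's alpha. The scale-construction slide names Cronbach's alpha as the internal-consistency check. A minimal sketch of the standard formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), follows. The function name `cronbach_alpha` and the response matrix are hypothetical; this illustrates the formula only and is not the SPSS reliability procedure.

```python
import statistics

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent, all of equal length."""
    k = len(rows[0])                      # number of items in the scale
    items = list(zip(*rows))              # transpose: one tuple per item
    item_variance_sum = sum(statistics.variance(col) for col in items)
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Hypothetical responses: 5 respondents, 3 Likert-type items.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

When every item gives identical scores the formula returns 1; less consistent items pull the value down.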