Standardization and regression
Maarten Buis
7-12-2005

Outline
• Recap
• Table A
• D-standardization
• Regression: thinking about effects

Recap: standardization
• Standardization makes hard-to-interpret variables more interpretable
• It allows you to add variables, e.g. socioeconomic class is income + education
• P-standardization: % of observations that score lower
• Z-standardization: deviation from the mean, in standard deviations

Appendix 2, Table A

D-standardization
• A third way to make the unit comparable between variables
• One assigns the number 0 or 1 to each case. The range is thus fixed, the mean is the proportion (P) of ones, and the standard deviation is the square root of P*(1-P)
• Can be done to every variable
• Results are easy to communicate (high versus low incomes)

D-standardization
• However: you lose a lot of information
• Often chosen when constructing a scale based on many variables
• Loss of information is less severe in this case
• The new variable ranges from 0 to 1 and can be interpreted as the proportion of ‘correct’ answers

Effect of gender on income
• Good old loneliness dataset
• Mean income males: f 2.819,86
• Mean income females: f 2.223,27
• Effect of gender on income: f 2.819,86 - f 2.223,27 = f 596,59

Effect of age on income

age   mean income (fl)     age   mean income (fl)
55    3.373,68             73    2.466,79
56    3.282,48             74    2.422,93
57    3.311,79             75    2.151,48
58    2.853,17             76    2.400,00
59    2.973,18             77    2.343,75
60    2.782,87             78    2.294,87
61    2.630,97             79    2.281,47
62    2.928,21             80    2.109,48
63    2.832,16             81    2.141,30
64    2.739,62             82    2.304,85
65    2.550,00             83    2.239,26
66    2.509,28             84    2.158,70
67    2.487,13             85    2.058,42
68    2.704,26             86    2.206,12
69    2.359,08             87    2.050,37
70    2.474,34             88    2.076,39
71    2.230,67             89    2.380,00
72    2.299,85

Effect of age on income

comparison   difference in mean income (fl)
55 - 56       -91,21
56 - 57        29,32
57 - 58      -458,62
58 - 59       120,01
59 - 60      -190,31
60 - 61      -151,90
. . .
82 - 83       -65,59
83 - 84       -80,56
84 - 85      -100,27
85 - 86       147,69
86 - 87      -155,74
87 - 88        26,02
88 - 89       303,61

Too much of a good thing
• Too many effects to be meaningful.
• Problem: how to summarize a large number of effects?
• Effects are just numbers.
• So the problem becomes: how to summarize a large number of numbers?
• We have seen this problem before!
• Solution: compute the average effect.
• This is the effect in regression: on average one loses f 32,70 for each year one gets older.

Effect and straight line
• An average person has average age and income, i.e. is 71 years old and earns f 2.525,-
• Someone a year older will earn f 32,70 less
• Someone two years older will earn 2 * f 32,70 = f 65,40 less
• Someone three years older will earn 3 * f 32,70 = f 98,10 less
• etc.

[Figure: income (vertical axis, 2000 to 3000) plotted against age (horizontal axis, 58 to 78)]

Effect and straight line
• The regression line is a straight line
• The equation of a straight line is:
• y = ...
• x = ...
• The slope is ...
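
The step from "average effect" to "straight line" can be sketched in a few lines of Python. This is only an illustration built from the figures quoted on the slides (average age 71, average income f 2.525, average effect of f 32,70 less per year). It assumes the line passes through the "average person", which holds for a least-squares line with an intercept. The names `predicted_income` and `intercept` are hypothetical helpers, not part of the lecture, and because the slide figures are rounded, the derived intercept is approximate.

```python
# Point-slope form of the regression line, using the rounded figures from the slides.
# Assumption: the line passes through the point (mean age, mean income).

MEAN_AGE = 71.0        # average age quoted on the slides (rounded)
MEAN_INCOME = 2525.0   # average income in guilders quoted on the slides (rounded)
SLOPE = -32.70         # average change in income per extra year of age, in guilders


def predicted_income(age: float) -> float:
    """Income on the straight line through the mean point, for a given age."""
    return MEAN_INCOME + SLOPE * (age - MEAN_AGE)


def intercept() -> float:
    """Value of the line at age 0, derived from the rounded slide figures."""
    return predicted_income(0.0)


if __name__ == "__main__":
    # Walk from the average person to someone one, two and three years older.
    for years_older in (0, 1, 2, 3):
        age = MEAN_AGE + years_older
        loss = MEAN_INCOME - predicted_income(age)
        print(f"age {age:.0f}: predicted income f {predicted_income(age):.2f} "
              f"(f {loss:.2f} less than average)")
```

Calling `predicted_income(72)` gives the average income minus f 32,70, matching the statement that someone a year older earns f 32,70 less. The intercept is an extrapolation far outside the observed ages (55 to 89), so it positions the line and is not a meaningful prediction.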