[Histogram: density of AGE, x-axis 20 to 100]

A relatively symmetrical distribution with some slight right (positive) skewness.

Note that the mean (49.67) and median (48) are relatively close. The mean is greater than the median, suggesting a longer right-hand tail and positive skewness (note that the skewness statistic is greater than zero).

                             AGE
-------------------------------------------------------------
      Percentiles      Smallest
 1%           22             18
 5%           27             18
10%           30             19       Obs                1881
25%           37             19       Sum of Wgt.        1881

50%           48                      Mean           49.67305
                        Largest       Std. Dev.       16.0474
75%           60             93
90%           73             93       Variance       257.5191
95%           79             93       Skewness       .4700756
99%           88             93       Kurtosis       2.529016

[Histogram: density of EDUC, x-axis 0 to 15]

A left (negative) skewed variable, educational attainment.

Note that the mean (10.99) falls below the median (11), suggesting a longer left-hand tail and negative skewness (note that the skewness statistic is less than zero).

                            EDUC
-------------------------------------------------------------
      Percentiles      Smallest
 1%            2              1
 5%            5              1
10%            8              1       Obs                3322
25%            9              1       Sum of Wgt.        3322

50%           11                      Mean           10.98977
                        Largest       Std. Dev.      2.872879
75%           13             16
90%           14             16       Variance       8.253433
95%           15             16       Skewness      -1.111893
99%           16             16       Kurtosis       4.378316

[Histogram: density of FINC, x-axis 0 to 1500000]

Example of a highly right-skewed (positively skewed) variable, family income (FINC).

The statistics for FINC:

                            FINC
-------------------------------------------------------------
      Percentiles      Smallest
 1%         7000             20
 5%        16000            400
10%        24200            530       Obs                1232
25%        43335            850       Sum of Wgt.        1232

50%        73700                      Mean           99711.64
                        Largest       Std. Dev.         94979
75%       120370         518000
90%       197000         634000       Variance       9.02e+09
95%       330000         928000       Skewness       3.352085
99%       409600        1194000       Kurtosis       24.58478

An example of transformations. The first graph (EURO2DOLLAR) has a mean of .9742. The second graph (EURO2DOLLARPLUS3) adds the value 3 to each observation in the dataset. Note that the graph has the same shape but is SHIFTED to the right by 3.
The new mean is .9742 + 3 = 3.9742, but the standard deviation is unchanged.

[Histograms: density of EURO2DOLLAR and euro2dollarplus3]

The third graph (EURO2DOLLARMULT2) takes the original values and multiplies every one of them by 2. Note that the graph moves to the right and is stretched, because every value doubles. The new mean is 1.9484 (.9742*2) and the standard deviation changes to .208 (.104*2).

The final graph (EURO2DOLLARMULT2PLUS3) multiplies each value by 2 and then adds 3. The new mean is (old mean*2)+3, or 4.9484, while the standard deviation is only multiplied by 2, since adding a constant does not change the spread.

[Histograms: density of euro2dollarmult2 and euro2dollarmult2plus3]

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
 euro2dollar |      1043    .9742087    .1040628       .827     1.2597
euro2~rplus3 |      1043    3.974209    .1040628      3.827     4.2597
euro2dolla~2 |      1043    1.948417    .2081257      1.654     2.5194
euro2~2plus3 |      1043    4.948417    .2081257      4.654     5.5194

Example of what a natural log (an advanced transformation) can do for a variable. Recall that FINC was highly right skewed. Taking the natural log of each FINC value corrects some of the skewness and gives the distribution a more peaked shape.

[Histogram: density of log_finc, x-axis 0 to 15]

It is now slightly left (negatively) skewed but far more symmetrical than before.

Some well-known transformed scores:

Scale Name                          Mean    SD
SAT sections; GRE sections; GMAT    500     100
ACT                                 20      5
Wechsler IQ Test                    100     15
Stanford Binet IQ Test              100     16

Suppose you have a list of values: 0, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4

A frequency table might look like the first four columns below. The fifth column multiplies each score by its proportion, and its total is the mean.

Score    Freq.    Proportion     Percentage     Score * Proportion
0        1        0.083333333    8.333333333    0
2        4        0.333333333    33.33333333    0.666666667
3        4        0.333333333    33.33333333    1
4        3        0.25           25             1
Total    12       1              100            2.666666667

Sum of 0, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4 is 32
Average of 0, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4 is 32/12 = 2.66667

Even if I didn't know the frequencies, I could still compute the average as long as I knew the proportions of 0s, 2s, 3s, and 4s.

You can use a similar method to calculate the standard deviation for the same list (page 53). The columns are the proportion $f/n$, the deviation $x - \bar{X}$, the squared deviation $(x - \bar{X})^2$, and the weighted squared deviation $(x - \bar{X})^2 \cdot (f/n)$.

Score    Freq.    Proportion     Deviation    Sq. Deviation    Weighted Sq. Deviation
0        1        0.083333333    -2.6667      7.11113          (7.11113*.083333) = .592593
2        4        0.333333333    -0.6667      .444449          (.444449*.333333) = .148148
3        4        0.333333333    0.3333       .111111          (.111111*.333333) = .037037
4        3        0.25           1.3333       1.77777          (1.77777*.25)     = .444444
Total    12       1                                            MSD = 1.22222

The total of the last column is the mean squared deviation, MSD = 1.22222. To get the variance, multiply the MSD by n/(n-1):

$s^2 = \left(\frac{n}{n-1}\right) \mathrm{MSD} = \frac{12}{11} \times 1.22222 = 1.33333$

The standard deviation is then the square root of the variance:

$s = \sqrt{1.33333} \approx 1.1547$
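
The frequency-table method above can be checked numerically. The following is a minimal Python sketch, not part of the original notes. The variable names (freq, prop, msd, and so on) are made up for illustration, and the standard-library statistics module is used only as a cross-check on the raw list.

```python
import math
import statistics

# Score -> frequency for the list 0, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4
# (names here are illustrative, not from the notes)
freq = {0: 1, 2: 4, 3: 4, 4: 3}

n = sum(freq.values())                      # number of observations
prop = {x: f / n for x, f in freq.items()}  # proportion f/n for each score

# Mean as the sum of score * proportion
mean = sum(x * p for x, p in prop.items())

# Mean squared deviation: sum of (x - mean)^2 * (f/n)
msd = sum((x - mean) ** 2 * p for x, p in prop.items())

# Sample variance rescales the MSD by n/(n-1); the SD is its square root
variance = n / (n - 1) * msd
sd = math.sqrt(variance)

# Cross-check against the raw list of values
raw = [x for x, f in freq.items() for _ in range(f)]
sd_check = statistics.stdev(raw)

print(f"mean = {mean:.4f}")
print(f"MSD = {msd:.4f}")
print(f"variance = {variance:.4f}")
print(f"sd = {sd:.4f}")
```

Only the proportions enter the mean and the MSD. The count n is needed just once, for the n/(n-1) correction that turns the MSD into the sample variance.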