A relatively symmetrical distribution, some slight right (positive) skewness
[Figure: distribution of AGE (x-axis: AGE, y-axis: Density)]
Note that the mean and median are relatively close. The mean is greater than the median,
suggesting a longer right-hand tail and positive skewness (note that the skewness statistic is
greater than zero).
                             AGE
-------------------------------------------------------------
      Percentiles      Smallest
 1%           22             18
 5%           27             18
10%           30             19       Obs                1881
25%           37             19       Sum of Wgt.        1881

50%           48                      Mean           49.67305
                        Largest       Std. Dev.       16.0474
75%           60             93
90%           73             93       Variance       257.5191
95%           79             93       Skewness       .4700756
99%           88             93       Kurtosis       2.529016
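The skewness and kurtosis statistics in the table can be computed from central moments. The sketch below assumes the moment-based definitions (third and fourth central moments divided by powers of the second, each using divisor n), which appear to be the ones behind the reported values; on this scale a normal distribution has kurtosis 3. The `ages` list is hypothetical and is not the survey data.

```python
from statistics import mean, median


def central_moment(values, k):
    """k-th central moment with divisor n (not n - 1)."""
    m = mean(values)
    return sum((x - m) ** k for x in values) / len(values)


def skewness(values):
    """Third central moment scaled by the 3/2 power of the second."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth central moment scaled by the square of the second."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical ages with a longer right-hand tail (illustration only).
ages = [22, 25, 30, 37, 41, 48, 52, 60, 73, 88]

print("mean    :", mean(ages))
print("median  :", median(ages))
print("skewness:", skewness(ages))
print("kurtosis:", kurtosis(ages))
```

For this list the mean exceeds the median and the skewness is positive, the same pattern the AGE table shows.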
A left (negative) skewed variable, educational attainment
[Figure: distribution of EDUC (x-axis: EDUC, y-axis: Density)]
Note that the mean (10.99) falls below the median (11), suggesting a longer left-hand tail
and negative skewness (note that the skewness statistic is less than zero).
                            EDUC
-------------------------------------------------------------
      Percentiles      Smallest
 1%            2              1
 5%            5              1
10%            8              1       Obs                3322
25%            9              1       Sum of Wgt.        3322

50%           11                      Mean           10.98977
                        Largest       Std. Dev.      2.872879
75%           13             16
90%           14             16       Variance       8.253433
95%           15             16       Skewness      -1.111893
99%           16             16       Kurtosis       4.378316
Example of a highly right-skewed (positively skewed) variable, family income (FINC)
[Figure: distribution of FINC (x-axis: FINC, y-axis: Density)]
The statistics for FINC:
                            FINC
-------------------------------------------------------------
      Percentiles      Smallest
 1%         7000             20
 5%        16000            400
10%        24200            530       Obs                1232
25%        43335            850       Sum of Wgt.        1232

50%        73700                      Mean           99711.64
                        Largest       Std. Dev.         94979
75%       120370         518000
90%       197000         634000       Variance       9.02e+09
95%       330000         928000       Skewness       3.352085
99%       409600        1194000       Kurtosis       24.58478
An example of transformations. The first graph (EURO2DOLLAR) has a mean of .9742.
The second graph (EURO2DOLLARPLUS3) adds the value 3 to each observation
in the dataset. Note that the graph has the same shape but is SHIFTED to the right by
3. The new mean is .9742+3, or 3.9742, but the standard deviation is unchanged.
[Figure: distributions of EURO2DOLLAR and euro2dollarplus3 (y-axis: Density)]
The third graph (EURO2DOLLARMULT2) takes the original values and multiplies every
one of them by two. Note that the graph moves to the right (its center doubles) and is
stretched to twice its original spread. The new mean is 1.9484 (or .9742*2) and the
standard deviation changes to .208 (which is .104*2).
The final graph (EURO2DOLLARMULT2PLUS3) multiplies each value by 2 and then adds 3.
We see that the new mean is (old mean*2)+3, or 4.9484, while the new standard deviation
is simply the old one multiplied by 2; adding 3 does not change it.
[Figure: distributions of euro2dollarmult2 and euro2dollarmult2plus3 (y-axis: Density)]
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
 euro2dollar |      1043    .9742087    .1040628       .827     1.2597
euro2~rplus3 |      1043    3.974209    .1040628      3.827     4.2597
euro2dolla~2 |      1043    1.948417    .2081257      1.654     2.5194
euro2~2plus3 |      1043    4.948417    .2081257      4.654     5.5194
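The pattern in the summary table (adding a constant moves the mean only; multiplying scales both the mean and the standard deviation) can be checked on any list of numbers. The sketch below is a minimal illustration; the `rates` values are hypothetical figures chosen to lie within the reported minimum and maximum, not the original 1043 observations, and `linear_transform` is a helper name introduced here.

```python
from statistics import mean, stdev


def linear_transform(values, multiplier=1.0, shift=0.0):
    """Return multiplier * x + shift for every x in values."""
    return [multiplier * x + shift for x in values]


# Hypothetical exchange rates (illustration only, not the original data).
rates = [0.827, 0.90, 0.95, 1.00, 1.10, 1.2597]

plus3 = linear_transform(rates, shift=3)
mult2 = linear_transform(rates, multiplier=2)
mult2plus3 = linear_transform(rates, multiplier=2, shift=3)

for name, data in [("rates", rates), ("plus3", plus3),
                   ("mult2", mult2), ("mult2plus3", mult2plus3)]:
    print(f"{name:>10}  mean={mean(data):.4f}  sd={stdev(data):.4f}")
```

In general, if y = a*x + b, then the mean of y is a*(mean of x) + b and the standard deviation of y is |a|*(standard deviation of x).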
Example of what a natural log (an advanced transformation) can do for a variable. Recall that
FINC was highly right skewed. Taking the natural log of each FINC value corrects some of
the skewness and gives the distribution a more peaked shape.
[Figure: distribution of log_finc (x-axis: log_finc, y-axis: Density)]
It is now slightly left (negatively) skewed but far more symmetrical than before.
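The effect of the log transformation on skewness can be illustrated with a small list. In the sketch below, the `incomes` list is only a stand-in built from the percentile and largest values in the FINC table, not the full 1232 observations, and `skewness` assumes the moment-based definition with divisor n.

```python
from math import log
from statistics import mean


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, each moment with divisor n."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Stand-in incomes taken from the FINC percentile table (illustration only).
incomes = [7000, 16000, 24200, 43335, 73700,
           120370, 197000, 330000, 409600, 1194000]

# Natural log of each value, as in the log_finc variable.
log_incomes = [log(x) for x in incomes]

print("skewness of incomes    :", skewness(incomes))
print("skewness of log incomes:", skewness(log_incomes))
```

The raw values are strongly right skewed, while the logged values have a skewness much closer to zero.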
Some well-known transformed scores:

Mean    SD     Scale Name
500     100    SAT sections; GRE sections; GMAT
20      5      ACT
100     15     Wechsler IQ Test
100     16     Stanford Binet IQ Test
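Each of these scales can be read as a linear transformation of a standard score: multiply by the scale's SD, then add the scale's mean. The sketch below assumes that reading; `to_scale` and `SCALES` are hypothetical names introduced for illustration, and real test scores are also rounded and capped in ways not shown here.

```python
# (mean, SD) for each scale, taken from the table above.
SCALES = {
    "SAT section": (500, 100),
    "ACT": (20, 5),
    "Wechsler IQ": (100, 15),
    "Stanford Binet IQ": (100, 16),
}


def to_scale(z, scale_mean, scale_sd):
    """Convert a standard score z to a scale with the given mean and SD."""
    return scale_mean + scale_sd * z


# A score one standard deviation above the mean, expressed on each scale.
for name, (scale_mean, scale_sd) in SCALES.items():
    print(f"{name:>18}: {to_scale(1.0, scale_mean, scale_sd)}")
```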
Suppose you have a list of values 0, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4.
A frequency table might look like the first four columns below.

Score   Freq.   Proportion    Percentage    Score * Proportion
0       1       0.083333333   8.333333333   0
2       4       0.333333333   33.33333333   0.666666667
3       4       0.333333333   33.33333333   1
4       3       0.25          25            1
Total   12      1             100           2.666666667
Sum of 0, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4 is 32.
Average of 0, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4 is 32/12 = 2.66667, which matches the total of
the Score * Proportion column.
Even if I didn't know the frequencies, I could still find the average as long as I knew the
proportions of 0s, 2s, 3s, and 4s.
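The calculation above can be sketched in a few lines. The example uses exact fractions so that the proportions sum to exactly 1; `proportions` and `mean_from_proportions` are helper names introduced here for illustration.

```python
from collections import Counter
from fractions import Fraction

values = [0, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4]


def proportions(values):
    """Map each distinct score to its proportion f/n."""
    n = len(values)
    return {score: Fraction(freq, n) for score, freq in Counter(values).items()}


def mean_from_proportions(props):
    """Average computed as the sum of score * proportion."""
    return sum(score * p for score, p in props.items())


props = proportions(values)
average = mean_from_proportions(props)

print("proportions:", {score: float(p) for score, p in props.items()})
print("average    :", float(average))
```

The result equals the ordinary average, sum divided by count, because each score's frequency enters only through the ratio f/n.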
You can use a similar method to calculate the standard deviation for the same
list (page 53).

Score   Freq.   Proportion    Deviation   Sq. Deviation   Weighted Sq. Deviation
0       1       0.083333333   -2.6667     7.11113         (7.11113*.083333) = .592593
2       4       0.333333333   -0.6667     .444449         (.444449*.333333) = .148148
3       4       0.333333333    0.3333     .11111          (.111111*.333333) = .037037
4       3       0.25           1.3333     1.77777         (1.77777*.25)     = .444444
Total   12      1                                         MSD = 1.22222

Here Proportion is $f/n$, Deviation is $(x - \bar{X})$, Sq. Deviation is $(x - \bar{X})^2$, and
Weighted Sq. Deviation is $(x - \bar{X})^2 \cdot (f/n)$. MSD, the mean squared deviation, is
the total of the last column.

To get the variance,
$s^2 = \left(\frac{n}{n-1}\right)\mathrm{MSD} = (12/11) \times 1.2222 = 1.33333$
The standard deviation then is
$s = \sqrt{1.3333} = 1.1547$
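The same steps can be sketched in code: weight each squared deviation by its proportion to get MSD, rescale by n/(n - 1) to get the sample variance, then take the square root. The variable names are chosen here for illustration; the last line compares the result with the standard library's sample standard deviation as a cross-check.

```python
from collections import Counter
from math import sqrt
from statistics import stdev

values = [0, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4]
n = len(values)

# Proportion f/n for each distinct score.
props = {score: freq / n for score, freq in Counter(values).items()}

# Mean as the sum of score * proportion.
x_bar = sum(score * p for score, p in props.items())

# Mean squared deviation: squared deviations weighted by proportion.
msd = sum((score - x_bar) ** 2 * p for score, p in props.items())

# Sample variance and standard deviation.
variance = n / (n - 1) * msd
sd = sqrt(variance)

print("MSD     :", msd)
print("variance:", variance)
print("sd      :", sd)
print("stdev() :", stdev(values))
```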