STATISTICS FOR SOCIAL & BEHAVIORAL SCIENCES
Recitation, week 5
Bell Shaped Distributions, anyone?
Throughout the course we have been assuming that the distribution of a
number of variables is bell-shaped. The assumption of a bell-shaped
distribution is useful, because it allows us to use the empirical rule:
approximately 95% of the observations will be between the mean minus two
standard deviations and the mean plus two standard deviations.
But what if a distribution is not bell-shaped? Is there an easy fix? We have
seen one distribution earlier in the course that is not bell-shaped: the number
of times a song is played on Spotify. We called that distribution a superstar
distribution because it is strongly right-skewed: a few songs get played many
times, while the vast majority of songs get played very few times.
Here we will consider the distribution of income in the 2010 Census, collected
by the U.S. Census Bureau. Each individual in the census is asked to report
their income.
1. Go to the course website and download the data set for this recitation.
Open it in Stata.
2. How many observations are there? Can you check that this is
approximately a 1% sample of the overall census?
3. Spot the income variable in the data set using describe. Obtain the
mean and standard deviation of that variable. In your opinion, is this
weekly, monthly or annual income? Are there potential sources of income
other than wage and salary income?
4. Inspect the minimum and maximum of income, draw a histogram and
spot extreme observations. Remove these observations. Why do some
individuals report no income? Explain.
5. Draw a histogram of income, after having removed the zero income
values and the extreme values. Is the distribution of income
bell-shaped? Is this a superstar distribution? Explain.
6. At what level of income does an individual belong to the top 1% of
earners (in terms of wage and salary income)?
7. The distribution of income doesn’t seem to be bell-shaped, so the
empirical rule does not apply. To remedy this issue, we will take the
logarithm of income (in Stata, log() is the natural logarithm). Generate
a new variable called log_income by typing gen log_income = log(incwage).
8. Draw a histogram of log income. Is it approximately bell-shaped? To
visually check this, draw the histogram with a bell-shaped distribution
(aka normal distribution) by typing hist log_income, normal.
9. Although the log of income is not exactly bell shaped, it may satisfy
certain features of a bell-shaped distribution, for instance, the empirical
rule. We want to check that approximately 95% of the observations are
within two standard deviations of the mean of log_income.
a. Summarize the log of income to find the mean and standard
deviation.
b. Create a variable within_95_pct that is equal to 1 if the
observation is within two standard deviations of the mean, in the
following way:
gen within_95_pct = log_income <= 10.04581 + 2*1.284475 &
log_income >= 10.04581 - 2*1.284475
(all on one line)
Replace the values 10.04581 and 1.284475 with the mean and standard
deviation that you obtained in a.
c. Do a tabulate within_95_pct to check that approximately 95% of
the observations lie in that interval.
10. Congrats! Now we know how to go from a superstar distribution to an
approximately bell-shaped distribution. The rule is clear:
The log of income is approximately bell-shaped, while income has a superstar
distribution.
11. Exercise: fill in the following sentence:
John Applebee’s income is $22,026, hence the log of his income is
approximately 10. The log of the median income is 10.34. Therefore
the median income is approximately …. % higher than John
Applebee’s income.
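Steps 6 and 9 can also be done without copying numbers by hand. The lines below are only a sketch: they assume that the recitation data set is open, that the income variable is called incwage, that the zero and extreme values were already removed, and that log_income was created as in step 7. The variable name within_2sd is an illustrative choice, not part of the handout. The sketch relies on the results that summarize stores in r(p99), r(mean) and r(sd).

```stata
* Step 6: the 99th percentile of incwage is the top 1% threshold
quietly summarize incwage, detail
display "Top 1% threshold: " r(p99)

* Step 9: reuse the stored mean and standard deviation of log_income
quietly summarize log_income

* within_2sd (hypothetical name) is 1 inside mean +/- 2 sd, 0 outside,
* and missing when log_income itself is missing
generate within_2sd = inrange(log_income, r(mean) - 2*r(sd), r(mean) + 2*r(sd)) if !missing(log_income)

* The share of 1s should be close to 95% if the empirical rule holds
tabulate within_2sd
```

Using the stored results avoids typing errors and keeps the check valid if the sample changes, for instance after removing more extreme observations.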
Wrap up: the log allows us to compare values in terms of ratios. A small
difference in logs is approximately a percentage difference. If the log of
income of Tom is 10.41 and the log of income of Barbara is 10.46, Barbara’s
income is approximately 5% higher than Tom’s.
Details for question 11.
Write that:
log(median income) - log(John’s income) = 10.34 - 10.00 = 0.34
Hence, using the properties of the log:
log(median income / John’s income) = 0.34
Take the exponential of both sides:
Median income / John’s income = exp(0.34)
Notice that exp(x) is approximately 1 + x when x is small. For instance,
exp(0.05) is approximately 1 + 0.05. For x = 0.34 the shortcut is only
rough: 1 + 0.34 = 1.34, while exp(0.34) is closer to 1.40.
Finally:
Median income / John’s income = 1.34 (approximately)
So the median income is approximately 34% higher than John’s income
(about 40% if the exponential is computed exactly).
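A quick way to see how good the 1 + x shortcut is: ask Stata for the exact values. The three lines below are a small sketch that uses only the numbers from question 11 and from the wrap up, with Stata's built-in log() and exp() functions (natural logarithm and exponential).

```stata
* John's income: the log of 22,026 is very close to 10
display log(22026)

* Barbara vs Tom: a log difference of 0.05 gives a ratio of about 1.05
display exp(0.05)

* Median vs John: a log difference of 0.34 gives a ratio of about 1.40
display exp(0.34)
```

The smaller the difference in logs, the closer the ratio is to 1 plus that difference. At 0.05 the shortcut is almost exact, while at 0.34 it understates the ratio by several percentage points.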