STATISTICS FOR SOCIAL & BEHAVIORAL SCIENCES
Recitation, week 5

Bell-Shaped Distributions, anyone?

Throughout the course we have often assumed that a variable has a bell-shaped distribution. This assumption is useful because it lets us apply the empirical rule: approximately 95% of the observations lie between the mean minus two standard deviations and the mean plus two standard deviations. But what if a distribution is not bell-shaped? Is there an easy fix?

We have already seen one distribution in this course that is not bell-shaped: the number of times a song is played on Spotify. We called it a superstar distribution because it is strongly right-skewed: a few songs are played many times, while the vast majority of songs are played very few times. Here we consider the distribution of income in the 2010 Census, collected by the U.S. Census Bureau. Each individual in the census is asked to report his or her income.

1. Go to the course website and download the data set for this recitation. Open it in Stata.
2. How many observations are there? Can you check that this is approximately a 1% sample of the overall census?
3. Use describe to find the income variable in the data set. Obtain the mean and standard deviation of that variable. In your opinion, is this weekly, monthly, or annual income? Are there potential sources of income other than wage and salary income?
4. Inspect the minimum and maximum of income, draw a histogram, and spot extreme observations. Remove these observations. Why do some individuals report no income? Explain.
5. After removing the zero-income values and the extreme values, draw a histogram of income. Is the distribution of income bell-shaped? Is it a superstar distribution? Explain.
6. At what level of income does an individual belong to the top 1% of earners (in terms of wage and salary income)?
7. The distribution of income does not seem to be bell-shaped.
   Because the empirical rule does not apply to such a distribution, we will take the logarithm of income. Generate a new variable called log_income by typing:
   gen log_income = log(incwage)
8. Draw a histogram of log income. Is it approximately bell-shaped? To check this visually, overlay a bell-shaped (normal) distribution on the histogram by typing:
   hist log_income, normal
9. Although the log of income is not exactly bell-shaped, it may share certain features of a bell-shaped distribution, for instance the empirical rule. We want to check that approximately 95% of the observations lie within two standard deviations of the mean of log_income.
   a. Summarize the log of income to find its mean and standard deviation.
   b. Create a variable within_95pct that equals 1 if the observation lies within two standard deviations of the mean (and 0 otherwise) by typing the following, all on one line:
      gen within_95pct = log_income <= 10.04581 + 2*1.284475 & log_income >= 10.04581 - 2*1.284475
      Replace the values 10.04581 and 1.284475 with the mean and standard deviation you found in step a.
   c. Type tabulate within_95pct to check that approximately 95% of the observations lie in that interval.
10. Congratulations! We now know how to go from a superstar distribution to a bell-shaped one: income has a superstar distribution, while the log of income is approximately bell-shaped.
11. Exercise: fill in the following sentence. John Applebee's income is $22,026, hence the log of his income is approximately 10. The log of the median income is 10.34. Therefore the median income is approximately …. % higher than John Applebee's income.

Wrap-up: differences in logs approximate percentage differences, so the log lets us compare ratios of values. If the log of Tom's income is 10.41 and the log of Barbara's income is 10.46, the difference is 10.46 - 10.41 = 0.05, so Barbara's income is approximately 5% higher than Tom's.

Details for question 11:
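Write:
   log(median income) - log(John's income) = 10.34 - 10.0 = 0.34
Using the property log(a) - log(b) = log(a/b):
   log(median income / John's income) = 0.34
Take the exponential of both sides:
   median income / John's income = exp(0.34)
For small values of x, exp(x) is approximately 1 + x; for instance, exp(0.05) ≈ 1.051, which is close to 1 + 0.05. Applying the same approximation here:
   median income / John's income ≈ 1 + 0.34 = 1.34
So the median income is approximately 34% higher than John's income. Keep in mind that 0.34 is not a very small value, so this approximation is rough: the exact ratio is exp(0.34) ≈ 1.405, meaning the median income is about 40% higher. The smaller the difference in logs, the better the approximation.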
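The approximation used for question 11 can also be checked numerically. The minimal Python sketch below compares the log-difference approximation, 100 × (log b - log a), with the exact percentage gap, 100 × (exp(log b - log a) - 1), for the two examples in the text. The helper name pct_gap is hypothetical and not part of the course material.

```python
import math


def pct_gap(log_a, log_b):
    """Percentage by which exp(log_b) exceeds exp(log_a).

    Returns (approximate, exact): the approximation uses exp(x) ~ 1 + x,
    while the exact value applies the exponential to the log difference.
    """
    diff = log_b - log_a
    return 100 * diff, 100 * (math.exp(diff) - 1)


# John Applebee: an income of $22,026 has a log of about 10.
print(f"log of 22,026: {math.log(22_026):.4f}")

# Median income (log 10.34) versus John (log 10.0)
approx_median, exact_median = pct_gap(10.0, 10.34)
# Barbara (log 10.46) versus Tom (log 10.41)
approx_barbara, exact_barbara = pct_gap(10.41, 10.46)

print(f"median vs John: approx {approx_median:.1f}%, exact {exact_median:.1f}%")
print(f"Barbara vs Tom: approx {approx_barbara:.1f}%, exact {exact_barbara:.1f}%")
```

With these numbers the approximation is close for Barbara and Tom (5.0% versus about 5.1%) but noticeably off for the median versus John (34.0% versus about 40.5%), because 0.34 is not a small difference in logs.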
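For readers who want to experiment outside Stata, the following minimal Python sketch mimics questions 8 and 9 on simulated data rather than on the Census file. It assumes that incomes are log-normally distributed, with a log-mean of 10.05 and a log-standard deviation of 1.28, loosely chosen to resemble the example numbers in question 9b; these parameters, the sample size, the random seed, and the helper name share_within are all hypothetical. It computes the share of observations within one and two standard deviations of the mean, for income and for log income.

```python
import math
import random
import statistics


def share_within(values, k):
    """Share of values lying within k standard deviations of their mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    low, high = mean - k * sd, mean + k * sd
    return sum(low <= v <= high for v in values) / len(values)


# Hypothetical stand-in for the Census data: log-normal incomes whose log has
# mean 10.05 and standard deviation 1.28 (assumed values, not estimates).
random.seed(42)
incomes = [random.lognormvariate(10.05, 1.28) for _ in range(50_000)]
log_incomes = [math.log(y) for y in incomes]

# Empirical rule for a bell-shaped variable: about 68% within 1 SD, 95% within 2 SD.
income_1sd = share_within(incomes, 1)
log_1sd = share_within(log_incomes, 1)
log_2sd = share_within(log_incomes, 2)

# For raw income, mean - 2*SD is negative: the interval includes impossible incomes.
income_low_2sd = statistics.fmean(incomes) - 2 * statistics.stdev(incomes)

print(f"income within 1 SD:     {income_1sd:.3f}")
print(f"log income within 1 SD: {log_1sd:.3f}")
print(f"log income within 2 SD: {log_2sd:.3f}")
print(f"income: mean - 2 SD =   {income_low_2sd:,.0f}")
```

Under these assumptions, log income should give shares close to 0.68 and 0.95, as the empirical rule predicts. Raw income instead places far more than 68% of the observations within one standard deviation, and its two-standard-deviation interval starts at a negative income, both symptoms of a strongly right-skewed distribution.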