STATISTICS FOR SOCIAL & BEHAVIORAL SCIENCES
Recitation Week 5 ANSWERS
Bell Shaped Distributions, anyone?

2. There are 3,061,692 observations. According to the website of the 2010 US Census, the census counted 308.7 million people. Our dataset therefore contains 3061692/308700000 x 100 = 0.992% of the census population.

3. Mean of income = 2005759
Standard deviation = 3979740
This is yearly income. Yes, there are other sources of compensation, such as stipends, scholarships and fellowships, returns to capital (company shares, interest, dividends, rents), and royalties.

4. Minimum of income = 0
Maximum = 9999999
Both the histogram and the summarize output show that the values 0 and 9999999 seem to be extreme. If we apply our outliers formula:
    LQ - 0.5(IQR) = 0 - 0.5(82000) = -41000
    UQ + 0.5(IQR) = 82000 + 0.5(82000) = 123000
we find that 9999999 is an outlier, but 0 is not. We may still want to treat 0 as an anomalous value, because most households receive some sort of income even if it is undeclared (unemployed individuals may receive money from their parents or relatives, for example). Some of the zero-income entries may in fact be nonresponses. Other individuals may have incentives to report zero income because they operate in the informal economy, outside the tax system. Others may simply reflect a household member (such as a non-working spouse) who is a dependent. Thus, we drop these two extreme values:
    drop if incwage==0
    drop if incwage==9999999

5. It is not a bell-shaped distribution. It is a superstar distribution because it is highly right-skewed. This makes sense: most of the population is not rich, but a few individuals have salaries much higher than those of the lower and middle classes, which pulls the mean to the right.

6. That would be the 99th percentile, which is $295,000 per year.

7. gen log_income = log(incwage)

8.
We can say that this distribution is approximately bell shaped (it is not perfect, but it may satisfy the empirical rule).

9. sum log_income
    mean = 10.04563
    standard dev = 1.284935
    gen within_95pct = log_income <= 10.04563 + 2*1.284935 & log_income >= 10.04563 - 2*1.284935
From tab we get that 94.21% of the observations fall within two standard deviations of the mean of log_income, which is close to the 95% we expect from the empirical rule.

11. The median income is approximately 34% higher than John Applebee's income.
Explanation: write
    log(median income) - log(John's income) = 10.34 - 10.0
Hence, using the properties of the log:
    log(median income / John's income) = 0.34
Take the exponential of both sides:
    median income / John's income = exp(0.34)
Notice that exp(0.34) is roughly 1 + 0.34. The approximation exp(x) ≈ 1 + x holds for small values of x and improves as x shrinks: exp(0.05) ≈ 1.051, while exp(0.34) ≈ 1.405, so for a difference this large the exact gap is closer to 40%. Using the approximation:
    median income / John's income ≈ 1.34
So the median income is approximately 34% higher than John's income.

LOGARITHMS REVIEW
A logarithm is an exponent: exponentiation and logarithms are inverse operations.
    y = log_b(x) if and only if b^y = x, where x > 0, b > 0, and b ≠ 1.
Log properties
1. log_b(xy) = log_b(x) + log_b(y)
2. log_b(x/y) = log_b(x) - log_b(y)
3. log_b(x^n) = n log_b(x)
4. log_b(x) = log_a(x) / log_a(b)
If the base b = e, we have the natural logarithm, written ln. e is a mathematical constant approximately equal to 2.71828. In Stata, log() computes the natural logarithm (it is a synonym for ln()); the base-10 logarithm is log10(). The log_income variable above is therefore a natural logarithm, which is why the explanation for answer 11 uses exp().
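
The logarithm facts used in answers 9 and 11 can be checked directly in Stata. The lines below are a minimal sketch, not part of the original answer key: they use only display, so no dataset needs to be loaded, and the numbers are copied from the answers above (mean 10.04563, standard deviation 1.284935, log difference 0.34).

```stata
* Sketch only: display evaluates an expression, so no data are needed.

* Which base does log() use? The first line prints 1 if log() is the
* natural logarithm. log10() is the base-10 logarithm, so the second prints 3.
display log(exp(1))
display log10(1000)

* Answer 11: exact income ratio implied by a log difference of 0.34,
* next to the 1 + x shortcut.
display exp(10.34 - 10.0)
display 1 + 0.34

* The shortcut is much closer for a small difference such as 0.05.
display exp(0.05)
display 1 + 0.05

* Answer 9: mean -/+ 2 standard deviations of log_income,
* converted back to yearly income in dollars.
display exp(10.04563 - 2*1.284935)
display exp(10.04563 + 2*1.284935)
```

The last two lines translate the two-standard-deviation band from the log scale back to dollars, roughly $1,800 to $301,000 per year, which gives a sense of how wide the income range covered by the empirical rule is.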