```
STATISTICS FOR SOCIAL & BEHAVIORAL SCIENCES
Recitation Week 5
Bell Shaped Distributions, anyone?
2. There are 3061692 observations.
The website of the 2010 US census reports a population of 308.7 million
people. Our dataset therefore contains 3061692/308700000 x 100 = 0.992%
of the census population.
3. Mean of income = 2005759
Standard deviation = 3979740
This is yearly income.
Yes, there are other sources of compensation: stipends, scholarships and
fellowships; returns to capital (company shares, interest, dividends, rents);
and royalties.
4. Minimum of income = 0
Maximum = 9999999
Both the histogram and the summarize output suggest that the values 0 and
9999999 are extreme. Applying our outlier formula, with LQ = 0, UQ = 82000
and IQR = UQ - LQ = 82000:
LQ - 0.5(IQR) = 0 - 0.5(82000) = -41000
UQ + 0.5(IQR) = 82000 + 0.5(82000) = 123000
(With the more common 1.5 x IQR multiplier the bounds are -123000 and 205000,
and the conclusion is the same.)
So 9999999 is an outlier, but 0 is not. We may still want to treat 0 as an
anomalous value, because most households receive some income even if it is
undeclared (unemployed individuals may receive money from their parents or
relatives, for example). Some of the zero-income entries may in fact be
non-responses. Some individuals may have an incentive to report zero income
because they work in the informal economy, outside the tax system. Others may
simply be dependent household members, such as a non-working spouse.
Thus, we drop these two extreme values:
drop if incwage == 0
drop if incwage == 9999999
5. It is not a bell-shaped distribution. It is a superstar distribution: it is
highly right-skewed. This makes sense because most of the population is not
rich, but a few individuals earn salaries far above those of the lower and
middle classes, which pulls the mean to the right.
6. That would be the 99th percentile, which is $295000 per year.
7. gen log_income = log(incwage)
8. This distribution is approximately bell-shaped: it is not perfect, but it
may satisfy the empirical rule.
9. sum log_income
mean = 10.04563
standard dev = 1.284935
gen within_95pct = log_income <= 10.04563 + 2*1.284935 & log_income >= 10.04563 - 2*1.284935
tab within_95pct
From tab we get that 94.21% of the observations fall within two standard
deviations of the mean of log_income, which is close to the 95% we expect from
the empirical rule.
11. The median income is approximately 34% higher than John Applebee’s
income (about 40% higher if the ratio is computed exactly).
Explanation
Write that:
log(median income) - log(John’s income) = 10.34 - 10.0 = 0.34
Hence, using the properties of the log:
log(median income / John’s income) = 0.34
Take the exponential of both sides:
Median income / John’s income = exp(0.34)
For small x, exp(x) is approximately 1 + x. For instance, exp(0.05) = 1.051,
which is very close to 1 + 0.05. At x = 0.34 the approximation is rougher:
exp(0.34) = 1.405, compared with 1 + 0.34 = 1.34.
Finally:
Median income / John’s income = 1.34 (approximately)
So the median income is roughly 34% higher than John’s income by this
approximation, and about 40% higher exactly.
LOGARITHMS REVIEW
A logarithm is an exponent: exponentiation and taking a logarithm are inverse
operations.
y = log_b(x) if and only if b^y = x,
where x > 0, b > 0, and b ≠ 1.
Log properties
1. log_b(xy) = log_b(x) + log_b(y)
2. log_b(x/y) = log_b(x) - log_b(y)
3. log_b(x^n) = n log_b(x)
4. log_b(x) = log_a(x) / log_a(b)
If the base b = e, we have the natural logarithm, written ln. e is a
mathematical constant approximately equal to 2.71828. In Stata, log() and ln()
both return the natural logarithm, so log_income above is a natural log;
base-10 logarithms are computed with log10().
```
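
As a quick numerical check of item 11 and of the change-of-base property, here is a minimal Python sketch (Python is used only for illustration; the recitation itself uses Stata). The function names `ratio_from_log_gap`, `approx_ratio_from_log_gap` and `change_of_base` are hypothetical helpers, not part of the course material. It assumes the logs are natural logarithms, as the use of exp() in the explanation implies.

```python
import math


def ratio_from_log_gap(log_gap: float) -> float:
    """Exact ratio a/b implied by ln(a) - ln(b) = log_gap."""
    return math.exp(log_gap)


def approx_ratio_from_log_gap(log_gap: float) -> float:
    """First-order approximation exp(x) ~ 1 + x, accurate only for small x."""
    return 1.0 + log_gap


def change_of_base(x: float, new_base: float) -> float:
    """log_b(x) computed from natural logs: ln(x) / ln(b)."""
    return math.log(x) / math.log(new_base)


if __name__ == "__main__":
    # Compare the exact ratio with the 1 + x shortcut for a small and a large gap.
    for gap in (0.05, 0.34):
        exact = ratio_from_log_gap(gap)
        approx = approx_ratio_from_log_gap(gap)
        print(f"gap={gap:.2f}  exact={exact:.3f}  approx={approx:.3f}")

    # Base-10 log of the 99th-percentile income via the change-of-base rule.
    print(f"log10(295000) = {change_of_base(295000.0, 10.0):.4f}")
```

For a gap of 0.05 the two values nearly agree (about 1.051 against 1.05), while for a gap of 0.34 the exact ratio is about 1.405 against the shortcut's 1.34.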