Exercise 1: Understanding Regression Output

Instructions: read this document and complete the exercises. You are NOT to work
together. You can e-mail me any questions you may have. It is due Tuesday, March 1st,
in class.
Exercise 1: The T-distribution.
You should have studied and used the t-distribution in your previous statistics course. The actual
theory behind the t-distribution is fairly advanced. Your econometrics textbook in Chapter 5
presents some of this material; it discusses how a t random variable is the ratio of a standard
normal random variable to the square root of a Chi-squared random variable divided by its
degrees of freedom. For our purposes, we don’t need to go into these details. Instead, I want you
to understand which random variables have a t-distribution and how to use the t-table (it works
quite differently from the Z-table).
First, remember what a standard normal random variable is. Take a normally distributed random
variable, subtract its mean, and divide by its standard deviation. This gives you a standard
normal random variable that has a mean of 0 and a standard deviation of 1.0. In notation, this is:
$$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
However, if σ is unknown, then it must be estimated. The best estimator to use is the sample
standard deviation, s, where
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{t=1}^{T} (x_t - \bar{x})^2}{T - 1}}.$$
If we then standardize X using s
instead of σ, we are adding an additional source of uncertainty to the new standardized random
variable. This is because the sample standard deviation s is also a random variable: its square,
suitably scaled, has a Chi-squared distribution. Therefore, dividing the centered normal random
variable (X minus its mean) by s, rather than by σ, results in a new random variable that has a
t-distribution, *not* a normal distribution. This distribution is pictured on page 94 of your
textbook. It is bell-shaped like the standard normal, but it has “wider” tails to allow for the
additional uncertainty that is added when dividing through by the sample standard deviation.
Similar to the normal distribution, we use a table to get probabilities for the t values. However,
the t-table works very differently. It appears on the inside of the front cover of your textbook (it
might be on the back cover, depending on the date of your book’s printing).
Each t-distribution has an associated number of degrees of freedom. For our simple regression
models it will be T-2, where T is the sample size. Go to the table and look up DF (“degrees of
freedom”) of 10 (it could be any value; I chose 10 just for demonstration). You will find this in
the first column of the table. Go down to the value 10. In this row, you are given t-values (not
probabilities). The probabilities appear at the top of each column, and there are only 5 of them:
0.10, 0.05, 0.025, 0.01 and 0.005. In the row for DF = 10 you find:
DF    α = 0.10    α = 0.05    α = 0.025    α = 0.01    α = 0.005
…     …           …           …            …           …
10    1.372       1.812       2.228        2.764       3.169
How to interpret these values: The probability that the random variable t is greater than or equal
to 1.372 is 0.10 (10%). We write this out as: P(t ≥ 1.372) = 0.10.
The top of the table tells us that the table contains “Right-tail critical values for the
t-distribution”. Because the t-distribution is symmetric around 0, we also know that
P(t ≤ -1.372) = 0.10.
[Figure: t-distribution density with t values on the horizontal axis, centered at 0. The area under the curve to the right of 1.372 is equal to 0.10.]
What happens if we want a probability for a value that does not appear in the table, such as
P(t ≥ 2.00) for a t random variable with 10 degrees of freedom? Well, this table will only allow
us to approximate it:
The table tells us that P(t ≥ 1.812) = 0.05 and P(t ≥ 2.228) = 0.025 (verify this by
examining the table).
Because 2.00 lies between 1.812 and 2.228, we can say that the probability lies between 2.5% and 5%.
That is: 0.025 < P(t ≥ 2.00) < 0.05.
Alternatively, we could use Excel to get exact probabilities, instead of approximating. The
formula in Excel is =TDIST( ), and in the brackets you need to enter 3 values. The first is the
value for t; in our example, this would be 2.00. The second value is the degrees of freedom, and
the third value is either a 1 or a 2, for one or two tails. To find P(t ≥ 2.00) with
10 degrees of freedom, we would enter the formula =TDIST(2.00,10,1). Hit return and Excel
returns the value 0.036694, which lies between 0.025 and 0.05 as we indicated above. If you
wanted to find the probability that t is greater than 2.00 OR less than –2.00, this would be the
sum of the two tails. In Excel, the formula would be =TDIST(2.00,10,2) and Excel would return
the value 0.073388. (Verify these results using Excel before proceeding.)
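If Excel is not at hand, the same numbers can be checked with a short script. The following is only a sketch in Python and is not part of the course materials. It integrates the t density numerically with Simpson's rule and finds critical values by bisection. The function names t_pdf, t_right_tail and t_critical are invented for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(t, df, steps=2000):
    """P(T >= t): 0.5 minus the area from 0 to t (Simpson's rule), using symmetry."""
    if t < 0:
        return 1.0 - t_right_tail(-t, df, steps)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df, lo=0.0, hi=100.0):
    """Value tc with P(T >= tc) = alpha, found by bisection (needs alpha < 0.5)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > alpha:
            lo = mid  # tail area still too large, so tc lies further right
        else:
            hi = mid
    return (lo + hi) / 2

# One-tail and two-tail probabilities, like =TDIST(2.00,10,1) and =TDIST(2.00,10,2)
one_tail = t_right_tail(2.00, 10)
print(round(one_tail, 6), round(2 * one_tail, 6))

# Right-tail critical values for DF = 10, to compare with the table row
for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
    print(alpha, round(t_critical(alpha, 10), 3))
```

The first printed line should agree with the two Excel results up to rounding, and the loop should come close to the DF = 10 row of the t-table. The two-tail probability is simply twice the one-tail probability because the distribution is symmetric.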
Answer these questions:
a) Approximate P(t ≥ 2.00) for 38 degrees of freedom using the table.
b) Find P(t ≥ 2.00) for 38 degrees of freedom using Excel.
c) Find tc where P(t ≥ tc) = 0.05 for 38 degrees of freedom using the t-table.
d) Find tc where the probability that t ≥ tc or t ≤ -tc is 0.05 for 38 degrees of freedom. For this
one, the following diagram will help:
[Figure: t-distribution density centered at 0, with -tc and tc marked and an area of 0.025 in each tail.]
You want the sum of the two tail areas to be 0.05, so you need to know what tc value leaves
0.025 in each tail.
Exercise 2: Confidence Intervals
Read slides 5.1 – 5.9 and pages 90 – 98, only skimming pages 92, 93 and 94 (which go into great
detail concerning the t-distribution). These slides and pages explain how to construct a
confidence interval and how to interpret it. Slide 5.7 demonstrates how we move from a
statement concerning the t-distribution to a confidence interval for β2. Think of the confidence
interval as an interval estimator for the true, unknown parameter. We know from Chapter 4 that
b2 is our estimator for β2, but now we recognize it as a point estimator. By itself, it tells us
nothing about its precision. For that, we need to take b2’s standard error into consideration.
That is where confidence intervals come into play. We take our point estimator, b2, and
construct an interval around it using the estimated standard deviation of the estimator, se(b2).
The interval is expressed in terms of a degree of confidence. We get to choose the degree of
confidence, with 95% being the most popular choice, and 90% and 99% also being useful levels.
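As a sketch of the step that Slide 5.7 refers to (the notation here is the usual textbook one and is assumed to match the slides): in the simple regression model the ratio (b2 − β2)/se(b2) has a t-distribution with T − 2 degrees of freedom. Starting from a probability statement about this ratio and rearranging the inequalities to isolate β2 gives the 95% interval, where tc is the value that leaves 0.025 in each tail:

```latex
\begin{align*}
P\left(-t_c \le \frac{b_2 - \beta_2}{\mathrm{se}(b_2)} \le t_c\right) &= 0.95 \\
P\left(b_2 - t_c\,\mathrm{se}(b_2) \le \beta_2 \le b_2 + t_c\,\mathrm{se}(b_2)\right) &= 0.95
\end{align*}
```

The second line is only a rearrangement of the first, so the randomness still comes from b2 and se(b2), not from β2.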
In the example from the slides (slides 5.8 and 5.9) and book (bottom of page 97 – 98), we have a
sample of 40 observations, implying T – 2 = 40 – 2 = 38 degrees of freedom. From here, we
used the t-table to determine that 95% of the possible t values lie between -2.024 and +2.024.
We then infer that 95% of the possible b2 values lie within 2.024 standard errors (se(b2)) of β2,
which gives the interval:
b2 ± 2.024 se(b2)
= 0.1283 ± 0.0617
= [0.0665, 0.1901]
We say that the margin of error is 0.0617. We are 95% confident that this interval contains the
true β2 value. Notice that we do not say that we believe there to be a 95% probability that the
true β2 is in this interval. The difference between the two statements concerns what the
probability refers to. The β2 parameter is not random, so we do not make probabilistic
statements about it. The interval is constructed using b2 and se(b2) which are both random
variables, so we make the probabilistic statement about the interval.
The interval can be for any level of confidence, where the critical t-value tc will differ depending
on the level of confidence and the degrees of freedom. The general form of a confidence interval
is: b2 ± tc se(b2)
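The arithmetic behind this general form can be sketched in a few lines of Python. The function name confidence_interval is invented for this illustration. The values b2 = 0.1283 and tc = 2.024 come from the example above. The standard error se(b2) is not quoted in this handout, so the sketch backs it out from the stated margin of error (0.0617 / 2.024), which is an assumption.

```python
def confidence_interval(b, se, tc):
    """Return (lower, upper) for the interval b ± tc * se."""
    margin = tc * se
    return b - margin, b + margin

b2 = 0.1283          # point estimate from the slides' example
tc = 2.024           # critical t-value for 95% confidence, 38 degrees of freedom
se_b2 = 0.0617 / tc  # assumption: standard error implied by the stated margin of error

lower, upper = confidence_interval(b2, se_b2, tc)
print(round(lower, 4), round(upper, 4))
```

The endpoints should agree with the interval reported above up to rounding in the last digit. Changing the confidence level only changes tc; b2 and se(b2) stay the same.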
Answer these questions:
1) For the example on slides 5.8 and 5.9, construct the 99% and 90% confidence intervals for β2.
Show all of your work. This is very straightforward as long as you know how to use the t-table.
2) For the elections regression that you estimated in computer assignment #2, demonstrate how
the 95% confidence interval for β2 was constructed. Excel prints this out for you [0.499459,
1.27633], but you need to show exactly how it was constructed.
3) Suppose that a political pundit argues that the growth rate of the economy has NO effect on
how people vote. Explain how your results in 2) above can refute this statement. (Hint: if this
statement were true, what would the value of β2 be?)
4) Repeat 2) for the mortgage rate/housing starts regression that you estimated for problem set
#3.
5) Suppose that a housing market analyst predicts that a 1 percentage point increase in the
30-year fixed mortgage rate will cause a reduction in housing starts of 40,000 units. Does your
confidence interval from 4) support or refute this prediction? Explain.