Reading Assignment 13
Statistical Inference from the Ordinary Least Squares Model
The previous reading assignments derived the OLS estimator. Using Assumptions A – E
given in the Assumptions Reading Assignment, the mean and the variance of the OLS estimator
were derived. The distribution of the estimator, however, has not been defined. The mean and
variance do not generally define a statistical distribution. To conduct statistical tests, the
distribution (normal, t, F, or Chi-squared) must be known.
The subject of this reading assignment is inference, statistical and economic, from the
OLS regression. Statistical inference involves several tests on the estimated parameters. These
tests concern either a single parameter or a group of parameters. Economic inference includes
these statistical tests, but also considers the estimated parameters' magnitudes and signs.
Inference concerning the previously discussed goodness-of-fit measure, R2, is also expanded in
this reading assignment.
Additional Assumptions Necessary for Statistical Inference
Because the distribution of the OLS estimator is not given by Assumptions A – E, we
need to rely on one of two assumptions to perform statistical tests. Under either of these two
assumptions, the distributions for the statistical tests can be derived. In both cases, the
distribution is the same. Deriving the distributions is beyond the scope of this class.
Assume Normality
The most restrictive assumption is that the error terms in the model are normally distributed.
We have already assumed they have a mean of zero and a constant variance. This assumption is
written as $u_i \sim N(0, \sigma^2)$.
Central Limit Theorem
Instead of assuming the error terms are normally distributed, we can rely on the Central
Limit Theorem. In either case, the same distributions are obtained. The Central Limit Theorem
(CLT) states:
given any distribution that has a mean, $\mu$, and a variance, $\sigma^2$, the distribution of sample
means drawn at random from the original distribution approaches the normal distribution
with mean $\mu$ and variance $\sigma^2/n$ as the sample size increases.
This theorem can also be stated in the following form: let $q_1, q_2, q_3, \ldots, q_n$ be independent
random draws from any distribution that has a mean of $\mu$ and a variance $\sigma^2$. The mean of the $q_i$ is

$$\bar{q} = \frac{1}{n}(q_1 + q_2 + q_3 + \cdots + q_n) = \frac{1}{n}\sum_{i=1}^{n} q_i ,$$

then

$$z = \frac{\bar{q} - E[\bar{q}]}{[\operatorname{var}(\bar{q})]^{1/2}} = \frac{\bar{q} - \mu}{\sigma/\sqrt{n}} .$$
In this form, the CLT states that the average value of n independent random variables
from any probability distribution (as long as it has a mean and a variance) will have
approximately a standard normal distribution after subtracting its mean and dividing by its
standard deviation, if the sample size, n, is large enough. The standard normal distribution is
N(0, 1), that is, a mean of zero and a variance of one. Why $n$ and not $n - 1$? First, we are
interested in the standard error of a series of means and not the standard error of a sample of
random numbers. Second, in practice, n is large; therefore, n and n - 1 are not very different.
The sample means of the random variables approximate a normal distribution.
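To make the theorem concrete, the following short simulation is a minimal sketch (the exponential distribution, sample size, and number of replications are illustrative choices, not part of this assignment). It draws repeated samples from a skewed distribution, standardizes each sample mean as in the z formula above, and checks that the result behaves like a standard normal random variable.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0        # mean and standard deviation of an exponential(1) distribution
n, reps = 50, 10_000        # sample size and number of repeated samples

# Draw 'reps' samples of size n from a decidedly non-normal distribution,
# compute each sample mean, and standardize: z = (q_bar - mu) / (sigma / sqrt(n)).
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))

# If the CLT applies, z should be approximately N(0, 1).
print("mean of z:    ", z.mean())             # close to 0
print("variance of z:", z.var())              # close to 1
print("P(z <= 1.96): ", np.mean(z <= 1.96))   # close to 0.975
```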
Why is the CLT important and how is it used? To answer these questions, we need to
examine the error term, u. What is u? Given the model set-up, the error term includes all factors
affecting the dependent variable that are not included as independent variables. That is, the
error term includes everything affecting y except the x's. Therefore, u is a sum of many different
factors affecting y. Because u is a sum of many different factors, we can invoke the CLT to
conclude u approximately follows the normal distribution. Why is the CLT important? If we
consider the error term as a sum of many different factors, the CLT states the error term will
approximate the normal distribution. Invoking the CLT therefore plays the same role as assuming
the error terms are normally distributed. By invoking the CLT, statistical distributions can be
derived, and these distributions allow statistical tests on the estimated OLS parameters.
Although the assumption of normality or the invocation of the CLT is the most
restrictive assumption made in using OLS, it is what allows statistical tests to be
performed. These statistical tests are perhaps the most important component of running an OLS
regression. Statistical inference is a powerful aspect of OLS. You must understand the
following statistical tests.
Inference Concerning Parameters
As noted earlier, inference involves statistical tests and examining the estimated
coefficients' magnitudes and signs. In this section, aspects concerning individual parameters are
discussed.
t-tests
Individual t-tests can be done for each estimated coefficient. Under Assumptions A - E
and either assuming normality of the error terms or invoking the CLT, the following can be
shown:
(1)   $\dfrac{\hat{\beta}_j - \beta_j}{\sqrt{\operatorname{var}(\hat{\beta}_j)}} \sim t_{n-k} .$
This equation states that the quantity obtained by subtracting any value, $\beta_j$, from the estimated
coefficient, $\hat{\beta}_j$, and dividing by the standard error of $\hat{\beta}_j$ is distributed as a Student
t-distribution (t-distribution) with n - k degrees of freedom. Note, before a particular sample is
taken, the estimate $\hat{\beta}_j$ is a random variable. That is, the estimator has a distribution
associated with it. This is why the above equation describes a distribution. After the OLS
estimates are obtained for a particular sample, $\hat{\beta}_j$ becomes a fixed number. The statistics and
mathematics necessary to derive this result are beyond this class. We will take this result as a
given.
t-distribution. Before applying the above result to your OLS estimates, it is informative to
review hypothesis formulation and testing, the Student t-distribution, and the use of the
t-distribution in statistical testing. As noted in the statistics reading assignment, the t-distribution
is one of the most important statistical distributions. See the statistics reading assignment for the
general t-test. You are responsible for knowing this test upside down and backwards.
The t-distribution is a symmetric bell-shaped distribution, but the shape (probabilities)
depends on the degrees of freedom of the distribution. For different degrees of freedom, the
t-distribution has different critical values. As the degrees of freedom increase, the t-distribution
approaches the normal distribution. On the web page is a file containing a table associated with
the t-distribution. At this point, you should download this file and confirm the distribution
varies by degrees of freedom.
Given a distribution, statistical tests associated with various hypotheses can be
conducted. To conduct hypothesis testing, a null hypothesis and an alternative hypothesis are
necessary. It is important that the null and alternative hypotheses cover all possible outcomes.
The null hypothesis is commonly denoted as H0, whereas the alternative is commonly denoted
as HA. Several null and alternative hypotheses are:

Null:          H0: $\beta_j = 0$       H0: $\beta_j > 0$       H0: $\beta_j < 0$
Alternative:   HA: $\beta_j \neq 0$    HA: $\beta_j \leq 0$    HA: $\beta_j \geq 0$.
In this list, three different null hypotheses are given in the top row and the associated alternatives
in the second row. For each null, the alternative hypothesis covers all possible outcomes not
given by the null hypothesis. For example, consider the first null hypothesis, H0: $\beta_j = 0$. Given
this null, two different alternatives are possible: $\beta_j$ could be less than zero or $\beta_j$ could be greater
than zero. Both alternatives are covered by the alternative hypothesis $\beta_j \neq 0$. An alternative
hypothesis of $\beta_j > 0$ would be inappropriate for the null hypothesis H0: $\beta_j = 0$. It is
inappropriate because it does not cover the possibility that $\beta_j < 0$. If your test statistic implied
$\beta_j < 0$, you would not be able to make any inference from your two hypotheses. It is
important to set up your null and alternative hypotheses such that they cover all possible
outcomes.
Given the different null and alternative hypotheses, the t-test can be either a two-tailed
test or a one-tailed test. Knowing whether the test is one- or two-tailed is important both in
conducting the test, because the critical value depends on the tails, and in interpreting the test
inference. A two-tailed test is given by

H0: $\beta_j = d$
HA: $\beta_j \neq d$.

An example of a one-tailed test is

H0: $\beta_j \geq d$
HA: $\beta_j < d$.

In general, these examples show that when using a t-test, any value can be tested, as indicated by
the general notation d. It is not necessary that d = 0, as in the previous examples. The
number d can be any value.
The “fail to reject” and “rejection” regions are different between one- and two-tailed
tests. To conduct a test, a level of significance must be chosen. The level of significance is
denoted by α. The probability of a Type I error is given by the level of significance. A Type I
error occurs when the null hypothesis is rejected but the hypothesis is true. Associated with any
level of significance is a critical value. The critical value is the point of demarcation between the
fail-to-reject and rejection regions.
Before proceeding, a few words about Type I and II errors are appropriate. The two types
of errors are defined in table 1.
Table 1. Type I and Type II Errors Defined

Decision Regarding              States of the World
Statistical Test                Null hypothesis true      Null hypothesis false
Reject null                     Type I error              Correct decision
Do not reject null              Correct decision          Type II error
We can fix the probability of a Type I error by picking a value for α. Unfortunately, the same
control over a Type II error is not possible. The probability of a Type II error can only be
calculated if we know the true values of the estimated parameters. If we knew the true values of
the parameters, there would be no reason to perform statistical tests. We can, however, state three
important aspects concerning Type I and II errors.
1) The probabilities of Type I and Type II errors are inversely related. This means as you
decrease the probability of a Type I error, you are increasing the probability of a Type II
error and vice versa.
2) The closer the true value is to the hypothesized value, the greater the chance for a Type
II error.
3) The t-test may be the best test because, for a given probability of a Type I error, the
test minimizes the probability of a Type II error.
For a two-tailed test, each of the two rejection regions has probability α/2, one in each tail,
with the fail-to-reject region in between. This is shown in figure 1.

Figure 1. Two-tailed test (rejection regions of area α/2 in each tail; fail-to-reject region in the center)
For a one-tailed test, the fail-to-reject and rejection regions are defined by probability α in one of
the tails, as shown in figure 2.

Figure 2. Two Cases Associated with a One-tailed Test (rejection region of area α in either the left tail or the right tail, with the fail-to-reject region covering the rest of the distribution)
For either a two-tailed or a one-tailed test, you calculate a value based on equation (1).
Then the null hypothesis is either rejected or not rejected based on where the calculated
t-statistic value falls.
Key Point: the test is testing hypotheses concerning the population parameters. The test
is not testing hypotheses about the estimated parameters from a particular sample. Once a sample
is taken, the estimated values are fixed numbers. It makes no sense to test whether a given number
is equal to some other number. We know the value of a given number.
Application to OLS. At the beginning of this inference section, we stated we could use the
estimated parameters and their estimated variances to obtain a statistic that is distributed as a
t-distribution. Combining our knowledge of hypothesis testing with this result, it is clear we can
conduct tests using the estimated parameters. These tests concern hypotheses about the true
parameters; they are not tests of hypotheses about the sample. Recall, for a
given sample, your OLS estimates are a unique set of fixed numbers.
As an application to OLS, let's assume you have estimated the following equation, with
estimated standard errors beneath each estimated parameter:

(2)   $y_t = 1.5 + 5.2 x_t$
         (0.25)  (5.2)

It is not uncommon to see estimated equations written in this form. Here, the estimated slope is
5.2 and the estimated intercept is 1.5. These estimated values come from the OLS estimator
given by the equation $\hat{\beta} = (X'X)^{-1}X'Y$. The standard errors (the square roots of the variances) of
the estimated parameters are 0.25 for the intercept and 5.2 for the slope parameter. These
standard errors are the square roots of the diagonal elements of the estimator of the variance of
the estimated parameters, given by $\operatorname{var}(\hat{\beta}) = \hat{\sigma}^2(X'X)^{-1}$. Let's test the following hypothesis:

H0: $\beta_1 \geq 2$
HA: $\beta_1 < 2$.
Inserting the values from equation (2) into equation (1), the t-statistic becomes:
(3)   $t_{n-k} \sim \dfrac{\hat{\beta}_1 - \beta_1}{\sqrt{\operatorname{var}(\hat{\beta}_1)}} = \dfrac{1.5 - 2}{0.25} = -2.0 .$
The next step is to use a t-table to obtain the critical value for your assumed level of significance.
Assuming 28 degrees of freedom (n - k, with n = 30 and k = 2) and an α = 0.05, the critical value is
-1.701. At this point, you should look at the t-table given on the class web site and convince
yourself you know how the critical value was determined. This is a one-tailed test, and we are
interested in the left-hand tail. Notice the hypotheses are concerned with the true parameter
value, $\beta$, and not the estimated value, $\hat{\beta}$. Graphically, the problem is stated in figure 3.
Figure 3. One-tailed t-test Example (rejection region of area α = 0.05 to the left of the critical value -1.701; the calculated value of -2.0 falls inside the rejection region)
In this example, the calculated value falls into the rejection region. Therefore, we would reject
the null hypothesis that $\beta_1 \geq 2$. If we chose a level of significance equal to 0.025, the critical value
would be -2.048. At this level of significance, we would fail to reject the null hypothesis. This
example illustrates that different statistical conclusions (inferences) can be reached depending
on the level of significance chosen. It is important for you to think about the statistical
and economic implications of choosing different α levels.
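A short numerical check of this one-tailed example, written as a minimal sketch using scipy (the sample size n = 30 and k = 2 come from the text; nothing else is assumed):

```python
from scipy import stats

beta_hat, se, hypothesized = 1.5, 0.25, 2.0   # estimated intercept, its standard error, value under H0
n, k = 30, 2
df = n - k                                    # 28 degrees of freedom

t_stat = (beta_hat - hypothesized) / se       # (1.5 - 2) / 0.25 = -2.0

# Left-tail critical values for H0: beta_1 >= 2 versus HA: beta_1 < 2.
crit_05 = stats.t.ppf(0.05, df)               # about -1.701
crit_025 = stats.t.ppf(0.025, df)             # about -2.048

print("reject at alpha = 0.05: ", t_stat < crit_05)    # True
print("reject at alpha = 0.025:", t_stat < crit_025)   # False
```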
As a second example, let's test the following null hypothesis:

H0: $\beta_2 = 0$
HA: $\beta_2 \neq 0$.

Inserting the values from equation (2) into equation (1), the t-statistic becomes:

(3)   $t_{n-k} \sim \dfrac{\hat{\beta}_2 - \beta_2}{\sqrt{\operatorname{var}(\hat{\beta}_2)}} = \dfrac{5.2 - 0}{5.2} = 1.0 .$
As before, the next step is to use a t-table to obtain the critical value for your assumed level of
significance. Assuming 28 degrees of freedom and an α = 0.05, the critical values are -2.048
and 2.048. This is a two-tailed test. At this point, you should look at the t-table given on the
class web site and convince yourself you know how the critical values were determined. Another
point is that even though the significance level is the same in the two examples, the critical
values differ. This is caused by the one- versus two-tailed aspect. Convince yourself why this
occurs. Graphically, the problem is stated in figure 4.
Figure 4. Two-Tailed Example (rejection regions of area α/2 = 0.025 beyond the critical values -2.048 and 2.048; the calculated value of 1 falls in the fail-to-reject region)
In this test, the calculated value falls into the fail-to-reject region. We would state that we fail to
reject the null hypothesis $\beta_2 = 0$.
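The two-tailed version can be checked the same way (again only a sketch, using the 28 degrees of freedom from the example):

```python
from scipy import stats

t_stat = (5.2 - 0) / 5.2               # calculated t-statistic = 1.0
df = 28

# At alpha = 0.05, a two-tailed test puts 0.025 of probability in each tail.
crit = stats.t.ppf(1 - 0.05 / 2, df)   # about 2.048
print("reject H0: beta_2 = 0?", abs(t_stat) > crit)   # False: fail to reject
```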
Significance of a Variable. Most regression packages, including Excel, print out a specific t-test
for every estimated parameter. This test is
H0: $\beta_j = 0$
HA: $\beta_j \neq 0$.

This test is often referred to as testing whether the variable is significant. If the true parameter value is
equal to zero, the independent variable, $x_j$, has no effect on the dependent variable. This is what is
meant by the significance of the variable. You will need to know and understand this test.
p-values. In addition to printing out the specific t-test associated with the significance of a
variable, most regression packages also print out the probability value. The probability value is
known as the p-value. It is increasingly common to state the p-value associated with the test
statistic rather than choosing a level of significance. It is, therefore, important that you understand
the meaning of a p-value.
The probability value is the probability that the test statistic, here the t-statistic, takes a value
more extreme than the calculated value. In other words, the p-value for a given t-statistic is the smallest
significance level at which the null hypothesis would be rejected. Because the p-value represents
an area under a probability density function, p-values range from 0 to 1. P-values are reported
as decimals.
An illustration will help in understanding p-values. From the hypothesis-testing
example associated with $\beta_2$, we obtained a calculated t-value equal to 1. The test was a
two-tailed test. The p-value is given graphically in figure 5.

Figure 5. p-value Areas Defined for a Two-Tailed Test (areas of p/2 = 0.16 in each tail beyond the calculated t-values of -1 and 1; fail-to-reject region in between)
As illustrated in figure 5, the calculated t-statistics are placed on the Student t-distribution graph.
The p-value is the area in the two tails for a two-tailed test, using the calculated t-statistic as the
demarcation point between the reject and fail-to-reject regions. Computer programs
automatically compute these areas by integration, a concept that is beyond this class. P-values
can also be associated with one-tailed tests. For a one-tailed test, we are obviously interested in the
area given by only one of the tails. In the example illustrated, the p-value would equal 0.32.
P-values can be used in several ways. First, the p-value gives the level of significance
associated with the calculated t-statistic. In the above example, if you were to choose a level of
significance equal to 0.32, your two-tailed critical values would equal -1 and 1. At this level
of significance, your critical value and test statistic are equal. In other words, you can report the
exact level of significance that provides the cut-off value between rejecting and failing to reject
the null hypothesis. Second, the p-value can be used as follows: the null hypothesis is rejected
if the p-value is less than or equal to your chosen level of significance, α. In the above example,
an α level of 0.05 was chosen and a p-value of 0.32 was given. At this level of significance, we
would fail to reject the null hypothesis; the p-value is larger than the level of significance.
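As a sketch of how the two-tailed p-value is obtained (most packages compute it automatically), it is twice the tail area beyond the calculated t-statistic:

```python
from scipy import stats

t_stat, df = 1.0, 28

# Two-tailed p-value: the area in both tails beyond |t_stat|.
p_two_tailed = 2 * (1 - stats.t.cdf(abs(t_stat), df))
print(p_two_tailed)   # about 0.326, which the text rounds to 0.32
```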
Graphically, we can show the use of p-values for the two-tailed test as follows.
[Figure: two-tailed t-distribution showing the calculated t-statistics at -1 and 1 (tail areas p/2 = 0.16 each), the α = 0.05 critical values at -2.048 and 2.048, and the α = 0.50 critical values at -0.67 and 0.67]
We are concerned with two levels of significance, α = 0.05 and α = 0.50. At the 5% significance
level, the critical values are -2.048 and 2.048, whereas at the 50% level, the critical values are
-0.67 and 0.67. By design for this example, the calculated t-statistic falls between the critical
values associated with these two levels of significance. At the 5% level, we fail to reject the null
hypothesis. At the 50% level, we would reject the null hypothesis. The decision rule is given by
the null hypothesis is rejected if the p-value is less than or equal to the level of
significance, α, which is the level of a Type I error you are willing to accept. You fail to
reject the null hypothesis if the p-value is greater than the chosen level of significance.
For the one-tailed test given above, the p-value is 0.0276. This is the area or probability
in the left-hand tail. The p-value also shows why we rejected the null hypothesis at the 0.05
level of significance, but failed to reject it at the 0.025 level. The calculated t-statistic falls
between the critical values associated with these two levels of significance. This is illustrated
as follows.
[Figure: one-tailed t-distribution showing the rejection area p = 0.0276 to the left of the calculated t-statistic of -2.0, the α = 0.025 critical value at -2.048, and the α = 0.05 critical value at -1.701]
Confidence Intervals
The estimated parameters, the $\hat{\beta}$'s, are single numbers. Such estimates are known as point
estimates. A point estimate by definition provides no indication of the reliability of the number;
that is, it gives no indication of a reasonable range within which the parameter might fall. Using the
t-distribution, a confidence interval can be obtained for each estimated parameter. Confidence
intervals are interpreted as follows:
if random samples were obtained over and over, with the upper and lower confidence
limits computed each time, then the unknown value, $\beta_j$, would lie in the calculated
intervals in 100(1 - α) percent of the samples.
For a single sample, we do not know if the true parameter lies inside or outside of the confidence
interval.
To calculate a confidence interval, the starting point is the following equation:
$$\Pr\left(-t_c \leq \frac{\hat{\beta}_j - \beta_j}{\sqrt{\operatorname{var}(\hat{\beta}_j)}} \leq t_c\right) = 1 - \alpha$$

where Pr denotes probability, $t_c$ is the critical value from the t-distribution, and all other variables
are as previously defined. The critical value, $t_c$, is the value from the t-table using the
appropriate α, a two-tailed test, and the degrees of freedom. $\hat{\beta}_j$ and $\operatorname{var}(\hat{\beta}_j)$ are estimated values from
your OLS regression. $\beta_j$ is the true value. This equation gives the probability of the statistic being between
the two critical values, $-t_c$ and $t_c$. Rearranging the equation by multiplying through by the standard error of
$\hat{\beta}_j$ (noting this value is positive, because it is the square root of a positive number) gives:

$$\Pr\left(-t_c\sqrt{\operatorname{var}(\hat{\beta}_j)} \leq \hat{\beta}_j - \beta_j \leq t_c\sqrt{\operatorname{var}(\hat{\beta}_j)}\right) = 1 - \alpha .$$

Subtracting $\hat{\beta}_j$ from both sides gives:

$$\Pr\left(-\hat{\beta}_j - t_c\sqrt{\operatorname{var}(\hat{\beta}_j)} \leq -\beta_j \leq -\hat{\beta}_j + t_c\sqrt{\operatorname{var}(\hat{\beta}_j)}\right) = 1 - \alpha .$$

Multiplying by -1 gives:

$$\Pr\left(\hat{\beta}_j + t_c\sqrt{\operatorname{var}(\hat{\beta}_j)} \geq \beta_j \geq \hat{\beta}_j - t_c\sqrt{\operatorname{var}(\hat{\beta}_j)}\right) = 1 - \alpha .$$

Rearranging terms gives:

(4)   $\Pr\left(\hat{\beta}_j - t_c\sqrt{\operatorname{var}(\hat{\beta}_j)} \leq \beta_j \leq \hat{\beta}_j + t_c\sqrt{\operatorname{var}(\hat{\beta}_j)}\right) = 1 - \alpha .$
This gives the (1 - α) confidence interval. The confidence interval is known as an interval estimate,
in contrast to the point estimate.
Continuing with the above example, the (1 - α) interval estimates for the $\beta_j$'s are obtained
by applying equation (4) to the estimates $\hat{\beta}_1$ and $\hat{\beta}_2$. The interval estimates are:

$$\Pr(1.5 - 2.048\sqrt{0.25^2} \leq \beta_1 \leq 1.5 + 2.048\sqrt{0.25^2}) = 1 - 0.05$$
$$\Pr(0.988 \leq \beta_1 \leq 2.012) = 95\%$$
$$\Pr(5.2 - 2.048\sqrt{5.2^2} \leq \beta_2 \leq 5.2 + 2.048\sqrt{5.2^2}) = 1 - 0.05$$
$$\Pr(-5.45 \leq \beta_2 \leq 15.85) = 95\%$$
In this example, the confidence interval for the intercept is much smaller than the confidence
interval for the slope parameter. The difference is in the estimated standard errors for the
parameters. The estimated standard error for the slope parameter is over 20 times larger than the
estimated standard error for the intercept.
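A sketch of the same interval calculation in code (the point estimates, standard errors, and 28 degrees of freedom come from the example above):

```python
from scipy import stats

df, alpha = 28, 0.05
t_c = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical value, about 2.048

for name, b_hat, se in [("intercept", 1.5, 0.25), ("slope", 5.2, 5.2)]:
    lower = b_hat - t_c * se
    upper = b_hat + t_c * se
    print(f"{name}: ({lower:.3f}, {upper:.3f})")
# prints roughly (0.988, 2.012) for the intercept and (-5.45, 15.85) for the slope
```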
Economic Interpretation of the Parameters
Up to this point we have been concerned with obtaining either a point or interval estimate
for the parameters and then conducting a t-test for significance. Economic interpretation of the
estimated coefficients involves more than these mechanical issues. Economic interpretation of
an estimated equation and the individual parameters involves combining economic theory with
the estimated parameters and statistics. Interpretation is a science as well as an art, and it comes
with experience. We will spend a large part of the class on interpretation. For now, a short
discussion is appropriate.
Basically, economic interpretation involves the question, “Do the estimated parameters
make sense from a theoretical standpoint?” To answer this question, you must look at several
issues:
1) significance of the variable - the specific t-test discussed earlier,
2) sign of the estimated parameter (positive or negative), and
3) magnitude or size of the estimated parameter.
The first issue was previously discussed. Here, we are considering whether the
parameter is different from zero. Given the t-test value, if the null hypothesis of equality to zero is not
rejected, the associated independent variable does not affect the dependent variable. You should
ask yourself what theory states about the relationship. If you are estimating a demand
equation, own price should be significantly different from zero. That is, own price is expected to
affect the quantity demanded of a product. If own price is not significantly different from zero, you
need to ask yourself, why? Did you do something wrong? The price of a substitute or complement
should also be significantly different from zero. You may have included the price of a product to
test whether it is a substitute or complement. Insignificance may indicate the product
is neither a substitute nor a complement.
Sign of the estimated parameter is very important. Continuing with the demand example,
the parameter associated with own price should be negative. As own price increases, the
quantity demanded should decrease. Estimated parameters associated with the price of
substitutes (complements) should be positive (negative). As the price of the substitute
(complement) increases, the quantity demanded of the good in question should increase (decrease). What
would you expect for the sign of the estimated parameter for income in a demand equation?
The magnitude of the estimated parameters must also be examined. For example, if you
estimate a demand equation for Pepsi, what would you expect the magnitude of the parameter on
own price to be? This is a difficult question to answer, but nonsensical parameter estimates can
be weeded out. For example, if the estimated coefficient indicated that a $1 increase in the price
of a 20-ounce bottle of Pepsi would decrease the number of 20-ounce bottles sold per day in the
U.S. by one bottle, would you believe your results? Do you think Pepsi could more than double its
price and have little impact on the demand for its product? Is the following reasonable? Your
estimated coefficient indicates a $1 increase would cause the number of 20-ounce Pepsi bottles
sold in the U.S. to decrease by one-half. This is where experience, economic theory, and prior
studies become important.
Inference from Multiple Parameters
To this point, the discussion of inference has been concerned with inference associated
with each estimated parameter separately. Two measures that are concerned with making
inferences from more than a single parameter are discussed in this section: the adjusted R2 and
F-tests.
Adjusted Coefficient of Determination
In a previous reading assignment, we defined the coefficient of determination, R2, as the
amount of sample variation in the y's that is explained by the x's. R2 ranges between zero and
one. The equation for calculating R2 is:

$$R^2 = \frac{SSR}{SST} = \frac{\sum(\hat{y}_i - \bar{y})^2}{\sum(y_i - \bar{y})^2} = 1 - \frac{SSE}{SST} = 1 - \frac{\sum \hat{u}_i^2}{\sum(y_i - \bar{y})^2} .$$
This statistic is a measure of the goodness-of-fit of the equation. Thus, the measure is looking at
how all the independent variables together explain the dependent variable. It is no longer
looking at an individual estimated parameter.
There is a problem with using R2. The problem is that R2 is sensitive to the number of
independent variables. Adding another independent variable increases R2 (or at least never
decreases it). Therefore, to maximize R2, all you have to do is add additional variables. One can
obtain an R2 of one by having the number of independent variables equal the number of
observations, n = k.
problem can be shown by examining the equation to calculate R2:
$$R^2 = \frac{SSR}{SST} = \frac{\sum(\hat{y}_i - \bar{y})^2}{\sum(y_i - \bar{y})^2} .$$
An additional independent variable will not affect SST. SST is strictly the variation in the
observed y's around their mean. The independent variables have no impact on this variation.
Adding independent variables will, however, affect the estimated y's. Adding additional x's will
increase the amount of variation explained, increasing $\sum(\hat{y}_i - \bar{y})^2$. Increasing the numerator
without changing the denominator will cause R2 to increase.
R2 is concerned with variation. The solution to the R2 problem is to concern ourselves
with the variance instead of the variation. The adjusted coefficient of determination, $\bar{R}^2$, is
defined as:

$$\bar{R}^2 = 1 - \frac{\hat{\sigma}^2}{\operatorname{var}(y)} = 1 - \frac{SSE/(n-k)}{SST/(n-1)} = 1 - \frac{\sum(y_i - \hat{y}_i)^2/(n-k)}{\sum(y_i - \bar{y})^2/(n-1)}$$
where $\hat{\sigma}^2$ is the variance of the error terms, or the variance of the y's net of the influence of the x's,
and var(y) is the variance of the y's. As in the equation for R2, increasing the number of
independent variables has no impact on the denominator in the equation for $\bar{R}^2$. Increasing the
number of independent variables affects both components in the numerator. Increasing the
number of independent variables will decrease the SSE; this is the same effect as for R2, where
increasing the SSR causes SSE to decrease. At the same time, the degrees of freedom (n - k)
also decrease: with n constant and k increasing, n - k will decrease.
Adding additional independent variables increases R2. For the adjusted coefficient of
determination, adding additional independent variables will not necessarily increase
$\bar{R}^2$. If the additional variable(s) help explain the dependent variable, $\bar{R}^2$ will increase. If the
additional independent variable(s) do not help explain the dependent variable, $\bar{R}^2$ will decrease.
This occurs because $\bar{R}^2$ takes into account the degrees of freedom.
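The adjustment is easy to compute directly from the sums of squares. The short sketch below uses made-up SSE, SST, n, and k values purely for illustration:

```python
def r_squared(sse: float, sst: float) -> float:
    # R^2 = 1 - SSE / SST
    return 1 - sse / sst

def adjusted_r_squared(sse: float, sst: float, n: int, k: int) -> float:
    # Adjusted R^2 = 1 - [SSE / (n - k)] / [SST / (n - 1)]
    return 1 - (sse / (n - k)) / (sst / (n - 1))

# Hypothetical regression with n = 30 observations and k = 3 estimated parameters.
sse, sst, n, k = 40.0, 100.0, 30, 3
print(r_squared(sse, sst))                  # 0.60
print(adjusted_r_squared(sse, sst, n, k))   # about 0.57, smaller than R^2 because k > 1
```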
Relationship Between R2 and $\bar{R}^2$. A relationship between R2 and $\bar{R}^2$ can be shown. The
difference between the two measures is that R2 measures variation and $\bar{R}^2$ measures variance.
Taking into account the degrees of freedom changes variation into variance. Rearranging the equation
for R2, one obtains the following equation:

$$R^2 = 1 - \frac{SSE}{SST} \quad\Rightarrow\quad (1 - R^2) = \frac{SSE}{SST} .$$
Multiplying both sides of the above equation by $-\frac{n-1}{n-k}$, the following equation is obtained:

$$-\frac{n-1}{n-k}(1 - R^2) = -\frac{n-1}{n-k}\cdot\frac{SSE}{SST} = -\frac{SSE/(n-k)}{SST/(n-1)} .$$

Adding one to both sides, one obtains the definition of $\bar{R}^2$:

$$1 - \frac{n-1}{n-k}(1 - R^2) = 1 - \frac{SSE/(n-k)}{SST/(n-1)} = \bar{R}^2 .$$

Rearranging, one obtains:

(5)   $\bar{R}^2 = 1 - (1 - R^2)\dfrac{n-1}{n-k} .$
From equation (5), several aspects of the relationship between R2 and $\bar{R}^2$ can be
illustrated. These aspects are summarized as follows.
1) If the number of independent variables equals one, k = 1, then R2 = $\bar{R}^2$. This is
true because the last term in equation (5) reduces to (n - 1) / (n - 1), which equals
one. Equation (5) then reduces to one minus one plus R2, which equals R2.
2) If the number of independent variables is greater than one, k > 1, then R2 > $\bar{R}^2$.
The definition of R2 is one minus the percent of variation unexplained. With this
definition, (1 - R2) equals the percent unexplained. In words, equation (5)
becomes 1 - (% unexplained)(a number > 1). Note, (n - 1) / (n - k) will be a
number greater than one, because the numerator is greater than the denominator.
$\bar{R}^2$ therefore equals one minus a number bigger than the percent unexplained, because
the percent unexplained is multiplied by a number greater than one. R2, recall, is one
minus the percent unexplained. Therefore, R2 must be greater than $\bar{R}^2$.
3) $\bar{R}^2$ can be a negative number. A negative $\bar{R}^2$ indicates a very poor fit of your
equation to the data. Recall the lower bound for R2 is zero. Both R2 and $\bar{R}^2$ have
an upper bound of one, indicating a perfect fit.
4) R2 increases as k increases for a given n, whereas $\bar{R}^2$ may increase or decrease as
k increases for a given n.
5) $\bar{R}^2$ eliminates some of the problems associated with R2 by taking into account the
degrees of freedom. However, $\bar{R}^2$ does not eliminate all the problems.
Use of R2 and $\bar{R}^2$. R2 and $\bar{R}^2$ are used to compare different regression equations. For example,
suppose you have estimated the following two equations, but are not sure which equation is “better”:

$y_t = \hat{\beta}_1 + \hat{\beta}_2 x_{t,2}$, and
$y_t = \hat{\beta}_1 + \hat{\beta}_2 x_{t,2} + \hat{\beta}_3 x_{t,3}$.

Theory does not provide enough guidance to determine whether x3 should be in the equation or not.
We know the R2 for the second equation will be larger than the R2 for the first equation,
regardless of whether x3 helps explain y. Informally (there is no real statistical test), we use $\bar{R}^2$ to compare the
two estimated equations. The rule is to choose the equation with the highest $\bar{R}^2$ as the “best”
equation. To compare $\bar{R}^2$'s, the dependent variable must be the same variable and must be in
the same units.
F-Test
The t-test is used to test individual parameters, whereas we use the F-test to test several
parameters at the same time. This is called multiple restrictions. Consider the following
equation:
(6)   $y_t = \beta_1 + \beta_2 x_{t,2} + \beta_3 x_{t,3} + \beta_4 x_{t,4} + u_t .$
Examples of several different null hypotheses that impose multiple restrictions are:

(7)   H0: $\beta_3 = \beta_4 = 0$
      H0: $\beta_2 = \beta_3 = \beta_4 = 0$
      H0: $\beta_2 = \beta_4 = 0$
      HA: H0 is not true.

In these examples, the alternative hypothesis holds if even one of the $\beta_i$'s in the given null
hypothesis is not equal to zero. Key Point: the null hypothesis is concerned with jointly testing
whether several true parameters are equal to zero. This is in contrast to the t-test, in which only one
parameter is tested at a time. The alternative hypothesis is usually stated in this generic form
to cover all possible alternatives.
F-test. To conduct an F-test, an unrestricted and a restricted model must be defined. The
general unrestricted model, which contains all k parameters, is:

(8)   $y_t = \beta_1 + \beta_2 x_{t,2} + \beta_3 x_{t,3} + \beta_4 x_{t,4} + \cdots + \beta_k x_{t,k} + u_t .$

Restrictions given by the null hypothesis constitute the restricted model. In general, a null
hypothesis will have q restrictions; therefore, the restricted model will have q fewer estimated
parameters. Note the null hypothesis is that each restricted parameter is equal to zero. A zero
parameter value is the same as leaving the variable out of the estimated equation. The general
restricted model is given by:

$$y_t = \beta_1 + \beta_2 x_{t,2} + \beta_3 x_{t,3} + \beta_4 x_{t,4} + \cdots + \beta_{k-q} x_{t,k-q} + u_t .$$

The restricted model has q fewer parameters to be estimated. This is accomplished by leaving the
q variables out of the estimated equation, which effectively forces the q parameters to equal zero.
The general null hypothesis to be tested is:

H0: $\beta_{k-q+1} = \beta_{k-q+2} = \cdots = \beta_k = 0$
HA: H0 is not true.

The key point is that the null hypothesis places q restrictions on the restricted model, forcing q
coefficients to equal zero. Under the assumptions of the OLS model and either assuming
normality of the error terms or invoking the Central Limit Theorem, it can be shown the
following ratio has an F distribution with q and n - k degrees of freedom:
(9)   $F\text{-statistic} = \dfrac{(SSE_r - SSE_{ur})/q}{SSE_{ur}/(n-k)} \sim F_{q,\,n-k}$
where SSEr is the sum of squared residuals from the restricted model and SSEur is the sum of
squared residuals from the unrestricted model. The above statistic will be positive, because all
components of the equation are positive. Recall the sums of squares are positive, and n, k, and q by
definition are positive. $SSE_r \geq SSE_{ur}$ because the restricted model has fewer independent variables,
so it cannot explain more (have a smaller SSE) than the unrestricted model.
F-Distribution. Before conducting an F-test associated with an OLS regression, it is useful to
briefly review the F-distribution. The F-distribution is always positive and has two degrees of
freedom, one for the numerator and one for the denominator. The F-test is a one-tailed test,
concerned with the area in the right-hand tail. Graphically, the F-test is:
[Figure: F-distribution; the area 1 - α lies to the left of the F critical value with q and n - k degrees of freedom, and the rejection region of area α lies in the right-hand tail of the values of F]
If the calculated F-statistic falls in the rejection region of the right-hand tail, the null hypothesis
is rejected. If the calculated F-statistic falls to the left of the critical value, we fail to reject the
null hypothesis.
Conducting an F-test. To calculate the F-statistic, two regression equations must be estimated.
Let’s consider the model given by equation (6). In this equation there are three independent
variables plus the intercept. Equation (6) is the unrestricted model. Consider testing the first
null hypothesis given in equation (7), H0: $\beta_3 = \beta_4 = 0$. First, we would estimate the unrestricted
model given by equation (6). From this estimation we would obtain the SSEur. Next, we would
estimate the restricted model and obtain the SSEr. The restricted model in this case is:

(10)   $y_t = \beta_1 + \beta_2 x_{t,2} + u_t .$

In the restricted model, the variables associated with $\beta_3$ and $\beta_4$ are not included in the estimation.
To calculate the F-statistic, we would substitute the calculated SSEs into equation (9) with q = 2.
We are placing two restrictions on the model, $\beta_3 = \beta_4 = 0$. The calculated value is then compared
to the critical value for a given level of significance.
As an example, let's assume you have 124 observations and estimate equation (6). You
obtain an SSEur = 84.02. Next, you estimate equation (10) and obtain an SSEr = 149.64. The
F-statistic from equation (9) is:

$$F\text{-statistic} = \frac{(SSE_r - SSE_{ur})/q}{SSE_{ur}/(n-k)} = \frac{(149.64 - 84.02)/2}{84.02/(124-4)} = \frac{32.81}{0.700} = 46.86 .$$
Assuming a level of significance of 0.01, the critical value for the F-distribution with 2 and 120
degrees of freedom is 4.79. The calculated F-statistic lies to the right of the critical value;
therefore, the null hypothesis that the parameters $\beta_3$ and $\beta_4$ are jointly equal to zero is rejected.
The test does not say anything about the individual parameters; only the null hypothesis that they
are jointly equal to zero is rejected.
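A minimal sketch of the same calculation (the SSE values, n = 124, k = 4, and q = 2 come from the example above):

```python
from scipy import stats

sse_r, sse_ur = 149.64, 84.02
n, k, q = 124, 4, 2

f_stat = ((sse_r - sse_ur) / q) / (sse_ur / (n - k))   # about 46.86
crit = stats.f.ppf(1 - 0.01, q, n - k)                 # about 4.79
print(f_stat, crit, f_stat > crit)                     # True: reject the joint null hypothesis
```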
Similar to the t-test, the F-test tests hypotheses concerning the true parameters, not
the estimated parameters. The estimated parameters are fixed numbers for a given sample.
This procedure can be used to jointly test any set of multiple restrictions. Both equations must be
estimated separately and the SSE's obtained.
Linear Relationship. Similar to the t-test, most regression packages, including Excel,
automatically calculate and print out a specific F-test. This F-test is jointly testing if there is a
linear relationship between the dependent variable and all of the independent variables together
except the intercept. This is a special case of the general F-test previously discussed. For the
general model given in equation (8), the null and alternative hypotheses are:
H0: $\beta_2 = \beta_3 = \cdots = \beta_k = 0$
HA: H0 is not true.

This test jointly tests whether all parameters except the intercept are equal to zero. The test gives
an indication of whether the independent variables are jointly linearly related to the dependent
variable. “Linearly related” arises because of the assumption that the equation is linear in the
parameters. The interpretation of this test is the same as previously discussed. This is just a
special case of the F-test.
When interpreting estimated equations, the F-test must be discussed. That is, do you
reject or fail to reject the null hypothesis that there is no linear relationship between the dependent
variable and the independent variables jointly? What level of significance is assumed? The section
on p-values will help in the interpretation.
Continuing with the above example, the SSEur remains the same, but the restricted SSEr
is obtained by estimating the model with only an intercept. That is, no independent variables are
included in the model. The SSEr = 257.81. Using these SSE values, the F-statistic associated
with the above hypothesis is:
$$F\text{-statistic} = \frac{(SSE_r - SSE_{ur})/q}{SSE_{ur}/(n-k)} = \frac{(257.81 - 84.02)/3}{84.02/(124-4)} = \frac{57.93}{0.700} = 82.74 .$$
With 3 and 120 degrees of freedom, the critical value associated with an α = 0.05 (0.01) is 2.68
(3.95). At either of these levels of significance, we would reject the null hypothesis that there is
no linear relationship between the dependent and independent variables.
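Regression packages report this overall F-test automatically. The sketch below uses simulated data and the statsmodels package purely for illustration (statsmodels is an assumption of this sketch; the text itself only mentions Excel):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 124
X = rng.normal(size=(n, 3))                              # three made-up independent variables
y = 1.0 + X @ np.array([0.5, -0.3, 0.0]) + rng.normal(size=n)

results = sm.OLS(y, sm.add_constant(X)).fit()
# The overall F-test of the null that all slope coefficients are jointly zero,
# and its p-value, are reported along with R-squared and adjusted R-squared.
print(results.fvalue, results.f_pvalue)
print(results.rsquared, results.rsquared_adj)
```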
Relationship Between the F-Statistic and R2
Because the F-test and R2 are both based on the SSE and SST, it is intuitive that a
relationship exists between the two measures. To show this relationship, recall the equation for
calculating R2 is $R^2 = 1 - (SSE/SST)$. This equation can be rearranged to obtain
$SSE = SST(1 - R^2)$. We can then substitute this result into the F-statistic equation given in
equation (9) to obtain:
(10)   $F\text{-statistic} = \dfrac{(SSE_r - SSE_{ur})/q}{SSE_{ur}/(n-k)} = \dfrac{[SST(1 - R_r^2) - SST(1 - R_{ur}^2)]/q}{SST(1 - R_{ur}^2)/(n-k)}$
where, as before, r denotes the restricted model and ur the unrestricted model. Note, SST is the
same in both models because it is only a function of the observed dependent variable and does not
depend on the independent variables. Because the term SST is present in multiplicative
form in each term of equation (10), we can eliminate SST from the equation; the ones in the
numerator cancel, and by rearranging we obtain the relationship between the F-statistic and
R2. Mathematically, this is shown as follows:
$$F\text{-statistic} = \frac{[SST(1 - R_r^2) - SST(1 - R_{ur}^2)]/q}{SST(1 - R_{ur}^2)/(n-k)}
= \frac{SST[(1 - R_r^2) - (1 - R_{ur}^2)]/q}{SST(1 - R_{ur}^2)/(n-k)}
= \frac{(1 - R_r^2 - 1 + R_{ur}^2)/q}{(1 - R_{ur}^2)/(n-k)}$$

$$F\text{-statistic} = \frac{(R_{ur}^2 - R_r^2)/q}{(1 - R_{ur}^2)/(n-k)} .$$
This relationship shows you can calculate the F-statistic and conduct the F-test if you
know the R2 for both equations. More importantly, the relationship shows that the F-statistic and
R2 are related.
For the special null hypothesis that all parameters except the intercept are jointly equal to zero,
the relationship reduces to:
$$F\text{-statistic} = \frac{R_{ur}^2/q}{(1 - R_{ur}^2)/(n-k)} .$$
The relationship reduces to this form because $R_r^2 = 0$. The coefficient of determination is zero
in the restricted model because there are no independent variables (x's) in the model. Recall, R2
is the amount of variation in y explained by the x's. If there are no x's in the model, they
obviously explain none of the variation.
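As a quick numerical check of this relationship, the sketch below recovers the overall F-statistic of the earlier example from R2 alone. It relies on the fact that for the intercept-only restricted model SSEr equals SST, so SST = 257.81 in that example:

```python
sst = 257.81                 # total sum of squares (restricted, intercept-only SSE)
sse_ur = 84.02               # unrestricted SSE from the example
n, k = 124, 4
q = k - 1                    # three slope restrictions

r2_ur = 1 - sse_ur / sst     # about 0.674; R^2 of the restricted model is zero
f_stat = (r2_ur / q) / ((1 - r2_ur) / (n - k))
print(f_stat)                # about 82.7, matching the SSE-based calculation above
```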
p-value for the F-Test
The p-value for the F-test associated with the null hypothesis that all parameters are jointly
equal to zero is printed out by most regression software. The interpretation of the p-value is the
same as it was for the t-test. The p-value is the probability of observing a value of the F
distribution at least as large as the calculated F-statistic. Graphically, the p-value is given by
[Figure: F-distribution with the calculated F-statistic, with q and n - k degrees of freedom, marked on the horizontal axis; the area to its right is the p-value]
As with the p-values associated with the t-statistic, p-values for the F-statistic can be used
in several ways. First, the p-value gives the level of significance associated with the
calculated F-statistic. In the above example, if you were to choose a level of significance equal to
the p-value of the calculated F-statistic of 82.74 (a value very close to zero), your F critical value
would equal 82.74. At that level of significance, your critical value and test statistic are equal. In
other words, you can report the exact level of significance that provides the cut-off value between
rejecting and failing to reject the null hypothesis. Second, the p-value can be used as follows: the
null hypothesis is rejected if the p-value is less than or equal to your chosen level of significance,
α. In the above example, an α level of 0.05 was chosen, and the p-value is far smaller than 0.05.
At this level of significance, we would reject the null hypothesis. Only if we chose an α smaller
than the p-value would we fail to reject the null hypothesis.
The decision rule is given by
the null hypothesis is rejected if the p-value is less than or equal to the level of
significance, α, which is the level of a Type I error you are willing to accept. You fail to
reject the null hypothesis if the p-value is greater than the chosen level of significance.
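The tail area itself is easy to compute directly. The sketch below (using scipy) evaluates the p-values for the two F-statistics calculated above with their degrees of freedom:

```python
from scipy import stats

# p-value = area of the F distribution to the right of the calculated statistic.
for f_stat, q, df2 in [(46.86, 2, 120), (82.74, 3, 120)]:
    p_value = stats.f.sf(f_stat, q, df2)   # sf is the survival function, 1 - cdf
    print(f"F = {f_stat} with ({q}, {df2}) df: p-value = {p_value:.2e}")
```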