Chapter 5: Dummy Variables
DUMMY VARIABLE CLASSIFICATION WITH TWO CATEGORIES
[Figure: COST against N for occupational schools and regular schools]
We’ll now examine how you can include qualitative explanatory variables in
your regression model.
Suppose that you have data on the annual recurrent expenditure, COST, and the
number of students enrolled, N, for a sample of secondary schools, of which
there are two types: regular and occupational.
The occupational schools aim to provide skills for specific occupations and
they tend to be relatively expensive to run because they need to maintain
specialized workshops.
Suppose we want to estimate the cost of running an occupational school and a
regular school. One way of dealing with the difference in costs would
be to run separate regressions for the two types of school.
However, this would have the drawback that you would potentially be
running regressions with two small samples instead of one large one, with
an adverse effect on the precision of the estimates of the coefficients.
© Christopher Dougherty 1999–2006
[Figure: parallel cost functions plotted against N, with intercept b1 for regular schools and b1' for occupational schools]
Regular school (OCC = 0): COST = b1 + b2N + u
Occupational school (OCC = 1): COST = b1' + b2N + u
Another way of handling the difference would be to hypothesize that the cost function for
occupational schools has an intercept b1' that is greater than that for regular schools.
Effectively, we are hypothesizing that the annual overhead cost is different for the two types
of school, but the marginal cost is the same. The marginal cost assumption is not very
plausible and we will relax it in due course.
[Figure: the same two cost functions, with the intercept for occupational schools written as b1 + d, where d is the vertical distance between the lines]
Occupational school (OCC = 1): COST = b1 + d + b2N + u
Let us define d to be the difference in the intercepts: d = b1' – b1.
Then b1' = b1 + d and we can rewrite the cost function for occupational
schools as shown.
Combined equation: COST = b1 + d OCC + b2N + u
We can now combine the two cost functions by defining a dummy variable
OCC that has value 0 for regular schools and 1 for occupational schools.
(Dummy variables always have two values, 0 or 1.)
[Figure: scatter diagram of COST (0 to 700,000 yuan) against N (0 to 1,400 students), distinguishing occupational schools and regular schools]
We will now fit a function of this type using actual data for a sample of 74
secondary schools in Shanghai.
School  Type          COST (yuan)    N  OCC
     1  Occupational      345,000  623    1
     2  Occupational      537,000  653    1
     3  Regular           170,000  400    0
     4  Occupational      526,000  663    1
     5  Regular           100,000  563    0
     6  Regular            28,000  236    0
     7  Regular           160,000  307    0
     8  Occupational       45,000  173    1
     9  Occupational      120,000  146    1
    10  Occupational       61,000   99    1
The table shows the data for the first 10 schools in the sample. The annual cost
is measured in yuan, one yuan being worth about 20 cents U.S. at the time. N is
the number of students in the school.
OCC is the dummy variable for the type of school.
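As a rough illustration of how OCC is coded, the following Python sketch builds the dummy from the school type for the ten schools in the table. The list layout and variable names are assumptions made for the example, and the cost of school 4 is read as 526,000 yuan.

```python
# First 10 schools from the table: (type, COST in yuan, N).
# The tuple layout and the names used here are illustrative assumptions.
schools = [
    ("Occupational", 345_000, 623),
    ("Occupational", 537_000, 653),
    ("Regular", 170_000, 400),
    ("Occupational", 526_000, 663),
    ("Regular", 100_000, 563),
    ("Regular", 28_000, 236),
    ("Regular", 160_000, 307),
    ("Occupational", 45_000, 173),
    ("Occupational", 120_000, 146),
    ("Occupational", 61_000, 99),
]

# OCC = 1 for an occupational school, 0 for a regular school.
occ = [1 if school_type == "Occupational" else 0
       for school_type, _cost, _n in schools]

print(occ)
```

The resulting list reproduces the OCC column of the table, which is then used in the regression like any other explanatory variable.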
. reg COST N OCC

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  2,    71) =   56.86
   Model |  9.0582e+11     2  4.5291e+11               Prob > F      =  0.0000
Residual |  5.6553e+11    71  7.9652e+09               R-squared     =  0.6156
---------+------------------------------               Adj R-squared =  0.6048
   Total |  1.4713e+12    73  2.0155e+10               Root MSE      =   89248

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   331.4493   39.75844      8.337   0.000       252.1732    410.7254
     OCC |   133259.1   20827.59      6.398   0.000       91730.06    174788.1
   _cons |  -33612.55   23573.47     -1.426   0.158      -80616.71    13391.61
------------------------------------------------------------------------------
We now run the regression of COST on N and OCC, treating OCC just like any
other explanatory variable, despite its artificial nature. The Stata output is
shown above.
We will begin by interpreting the regression coefficients.
Fitted COST = –34,000 + 133,000 OCC + 331 N
The regression results have been rewritten in equation form.
From this equation we can derive cost functions for the two types of
school by setting OCC equal to 0 or 1.
Regular school (OCC = 0): Fitted COST = –34,000 + 331 N
If OCC is equal to 0, we get the equation for regular schools, as shown. It implies
that the marginal cost per student per year is 331 yuan and that the annual
overhead cost is -34,000 yuan.
Obviously a negative overhead cost does not make any sense, and it
suggests that the model is misspecified in some way. We will come back to this
later. It is worth noting that the t statistic of the intercept indicates that
it is not significantly different from zero.
Occupational school (OCC = 1): Fitted COST = –34,000 + 133,000 + 331 N = 99,000 + 331 N
The coefficient of the dummy variable is an estimate of d, the extra annual
overhead cost of an occupational school.
Putting OCC equal to 1, we estimate the annual overhead cost of an
occupational school to be 99,000 yuan. The marginal cost is the same as for
regular schools. It must be, given the model specification.
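The derivation of the two cost functions can be sketched in Python as follows. The coefficients are copied from the Stata output for the regression of COST on N and OCC; the function and variable names are assumptions made for this example. Because the unrounded coefficients are used, the occupational intercept comes out at roughly 99,600 rather than the 99,000 obtained from the rounded figures.

```python
# Coefficients copied from the Stata output for "reg COST N OCC".
B_CONS = -33612.55   # intercept: estimated overhead of a regular school
B_OCC = 133259.1     # estimate of d: extra overhead of an occupational school
B_N = 331.4493       # marginal cost per student per year


def fitted_cost(n, occ):
    """Fitted annual cost in yuan for a school with n students.

    occ is the dummy variable: 1 for occupational, 0 for regular.
    """
    return B_CONS + B_OCC * occ + B_N * n


# Setting N = 0 isolates the intercept of each cost function.
regular_intercept = fitted_cost(0, 0)
occupational_intercept = fitted_cost(0, 1)

# The gap between the two functions is the same at every enrolment level.
gap_at_500 = fitted_cost(500, 1) - fitted_cost(500, 0)

print(round(regular_intercept), round(occupational_intercept), round(gap_at_500))
```

The constant gap between the two fitted lines reflects the assumption built into the specification: only the intercept is allowed to differ by type of school.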
[Figure: scatter diagram of COST against N with the fitted cost functions for occupational schools and regular schools]
The scatter diagram shows the data and the two cost functions
derived from the regression results.
In addition to the estimates of the coefficients, the regression results will
include standard errors and the usual diagnostic statistics.
We will perform a t test on the coefficient of the dummy variable. Our null
hypothesis is H0: d = 0 and our alternative hypothesis is H1: d ≠ 0.
In words, our null hypothesis is that there is no difference in the overhead
costs of the two types of school. The t statistic is 6.40, so the null
hypothesis is rejected at the 0.1% significance level.
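The t statistic can be checked by hand as the coefficient divided by its standard error. The short Python sketch below does this with the figures from the Stata output; the variable names are assumptions for the example. The critical value used is the standard two-sided 0.1% table value for 60 degrees of freedom, which is slightly larger than the value for the 71 degrees of freedom here, so the comparison is conservative.

```python
# Estimate and standard error of the OCC coefficient, from the Stata output.
coef_occ = 133259.1
se_occ = 20827.59

# Under H0: d = 0, the t statistic is the estimate divided by its standard error.
t_occ = coef_occ / se_occ

# Two-sided 0.1% critical value of t with 60 degrees of freedom (table value).
# The critical value with 71 degrees of freedom is a little lower.
T_CRIT_60DF = 3.460

reject_h0 = abs(t_occ) > T_CRIT_60DF
print(round(t_occ, 3), reject_h0)
```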
We can perform t tests on the other coefficients in the usual way.
The t statistic for the coefficient of N is 8.34, so we conclude that the
marginal cost is (very) significantly different from 0.
In the case of the intercept, the t statistic is –1.43, so we do not reject
the null hypothesis H0: b1 = 0.
Thus one explanation of the nonsensical negative overhead cost of
regular schools might be that they do not actually have any overheads
and our estimate is a random number.
A more realistic version of this hypothesis is that b1 is positive but small
(as you can see, the 95 percent confidence interval includes positive
values) and the error term is responsible for the negative estimate.
As already noted, a further possibility is that the model is misspecified
in some way. We will continue to develop the model in the next
sequence.
DUMMY CLASSIFICATION WITH MORE THAN TWO CATEGORIES
Now we’ll study how to extend the dummy variable technique to
handle a qualitative explanatory variable which has more than two
categories.
Previously, we used a dummy variable to differentiate between
regular and occupational schools when fitting a cost function.
In actual fact there are two types of regular secondary school in
Shanghai. There are general schools, which provide the usual
academic education, and vocational schools.
As their name implies, the vocational schools are meant to impart
occupational skills as well as give an academic education.
However, the vocational component of the curriculum is typically
quite small and the schools are similar to the general schools. Often
they are just general schools with a couple of workshops added.
Likewise there are two types of occupational school. There are
technical schools training technicians and skilled workers’ schools
training craftsmen.
So now the qualitative variable has four categories. The standard
procedure is to choose one category as the reference category and to
define dummy variables for each of the others.
In general it is good practice to select the most normal or basic
category as the reference category, if one category is in some sense
more normal or basic than the others.
In the Shanghai sample it is sensible to choose the general schools
as the reference category. They are the most numerous and the other
schools are variations of them.
Accordingly we will define dummy variables for the other three types.
TECH will be the dummy for the technical schools: TECH is equal to 1
if the observation relates to a technical school, 0 otherwise.
Similarly we will define dummy variables WORKER and VOC for the
skilled workers’ schools and the vocational schools.
Each of the dummy variables will have a coefficient which represents
the extra overhead costs of the schools, relative to the reference
category.
Note that you do not include a dummy variable for the reference
category, and that is the reason that the reference category is usually
described as the omitted category.
COST = b1 + dT TECH + dW WORKER + dV VOC + b2N + u
General school (TECH = WORKER = VOC = 0): COST = b1 + b2N + u
If an observation relates to a general school, the dummy variables
are all 0 and the regression model is reduced to its basic
components.
Technical school (TECH = 1; WORKER = VOC = 0): COST = (b1 + dT) + b2N + u
If an observation relates to a technical school, TECH will be equal to
1 and the other dummy variables will be 0. The regression model
simplifies as shown.
Skilled workers’ school (WORKER = 1; TECH = VOC = 0): COST = (b1 + dW) + b2N + u
Vocational school (VOC = 1; TECH = WORKER = 0): COST = (b1 + dV) + b2N + u
The regression model simplifies in a similar manner in the case of
observations relating to skilled workers’ schools and vocational
schools.
[Figure: four parallel cost functions plotted against N, with intercepts b1 (general), b1 + dV (vocational), b1 + dW (workers’), and b1 + dT (technical)]
The diagram illustrates the model graphically. The d coefficients are the extra
overhead costs of running technical, skilled workers’, and vocational schools,
relative to the overhead cost of general schools.
Note that we do not make any prior assumption about the size, or even the
sign, of the d coefficients. They will be estimated from the sample data.
School  Type        COST (yuan)    N  TECH  WORKER  VOC
     1  Technical       345,000  623     1       0    0
     2  Technical       537,000  653     1       0    0
     3  General         170,000  400     0       0    0
     4  Workers’        526,000  663     0       1    0
     5  General         100,000  563     0       0    0
     6  Vocational       28,000  236     0       0    1
     7  Vocational      160,000  307     0       0    1
     8  Technical        45,000  173     1       0    0
     9  Technical       120,000  146     1       0    0
    10  Workers’         61,000   99     0       1    0
Here are the data for the first 10 of the 74 schools. Note how the
values of the dummy variables TECH, WORKER, and VOC are
determined by the type of school in each observation.
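The coding rule can be sketched in Python as follows: one dummy per category other than the reference category, with general schools receiving 0 on all three. The dictionary and function names are assumptions made for this example, and "Workers" stands for the skilled workers’ schools.

```python
# School types of the first 10 observations, as listed in the table.
types = [
    "Technical", "Technical", "General", "Workers", "General",
    "Vocational", "Vocational", "Technical", "Technical", "Workers",
]

# General schools are the reference (omitted) category, so they get no dummy.
DUMMY_FOR = {"Technical": "TECH", "Workers": "WORKER", "Vocational": "VOC"}


def dummies(school_type):
    """Return the TECH, WORKER and VOC values for one observation."""
    return {name: int(school_type == category)
            for category, name in DUMMY_FOR.items()}


rows = [dummies(t) for t in types]

for school, (t, row) in enumerate(zip(types, rows), start=1):
    print(school, t, row["TECH"], row["WORKER"], row["VOC"])
```

At most one dummy is equal to 1 in any observation, and an observation with all three equal to 0 is a general school.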
[Figure: scatter diagram of COST (0 to 700,000 yuan) against N (0 to 1,400 students), distinguishing technical, workers', vocational, and general schools]
The scatter diagram shows the data for the entire sample,
differentiating by type of school.
. reg COST N TECH WORKER VOC

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  4,    69) =   29.63
   Model |  9.2996e+11     4  2.3249e+11               Prob > F      =  0.0000
Residual |  5.4138e+11    69  7.8461e+09               R-squared     =  0.6320
---------+------------------------------               Adj R-squared =  0.6107
   Total |  1.4713e+12    73  2.0155e+10               Root MSE      =   88578

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   154110.9   26760.41      5.759   0.000       100725.3    207496.4
  WORKER |   143362.4    27852.8      5.147   0.000       87797.57    198927.2
     VOC |   53228.64   31061.65      1.714   0.091      -8737.646    115194.9
   _cons |  -54893.09   26673.08     -2.058   0.043      -108104.4   -1681.748
------------------------------------------------------------------------------
Here is the Stata output for this regression. The coefficient of N
indicates that the marginal cost per student per year is 343 yuan.
The coefficients of TECH, WORKER, and VOC are 154,000, 143,000,
and 53,000, respectively, and should be interpreted as the additional
annual overhead costs, relative to those of general schools.
The constant term is –55,000, indicating that the annual overhead
cost of a general academic school is –55,000 yuan.
Obviously this is nonsense and indicates that something is wrong
with the model specification.
Fitted COST = –55,000 + 154,000 TECH + 143,000 WORKER + 53,000 VOC + 343 N
General school (TECH = WORKER = VOC = 0): Fitted COST = –55,000 + 343 N
The top line shows the regression result in equation form. We will
derive the implicit cost functions for each type of school.
In the case of a general school, the dummy variables are all 0 and the
equation reduces to the intercept and the term involving N.
The annual marginal cost per student is estimated at 343 yuan. The
annual overhead cost per school is estimated at –55,000 yuan.
Obviously a negative amount is inconceivable.
Technical school (TECH = 1; WORKER = VOC = 0): Fitted COST = –55,000 + 154,000 + 343 N = 99,000 + 343 N
The extra annual overhead cost for a technical school, relative to a
general school, is 154,000 yuan. Hence we derive the implicit cost
function for technical schools.
Skilled workers’ school (WORKER = 1; TECH = VOC = 0): Fitted COST = –55,000 + 143,000 + 343 N = 88,000 + 343 N
Vocational school (VOC = 1; TECH = WORKER = 0): Fitted COST = –55,000 + 53,000 + 343 N = –2,000 + 343 N
And similarly the extra overhead costs of skilled workers’ and
vocational schools, relative to those of general schools, are 143,000
and 53,000 yuan, respectively.
Note that in each case the annual marginal cost per student is
estimated at 343 yuan. The model specification assumes that this
figure does not differ according to type of school.
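The four implicit cost functions can be reproduced with a short Python sketch. The coefficients are copied from the Stata output for the regression on N, TECH, WORKER, and VOC; the dictionary layout and names are assumptions made for this example, and "Workers" stands for the skilled workers’ schools. With unrounded coefficients the intercepts differ slightly from the rounded figures quoted on the slides.

```python
# Coefficients copied from the Stata output for "reg COST N TECH WORKER VOC".
B_CONS = -54893.09   # intercept: estimated overhead of a general school
B_N = 342.6335       # marginal cost per student, common to all four types

# Extra overhead relative to general schools (the reference category).
EXTRA_OVERHEAD = {
    "General": 0.0,
    "Technical": 154110.9,
    "Workers": 143362.4,
    "Vocational": 53228.64,
}


def fitted_cost(n, school_type):
    """Fitted annual cost in yuan for a school of the given type with n students."""
    return B_CONS + EXTRA_OVERHEAD[school_type] + B_N * n


# Setting N = 0 gives the intercept of each implicit cost function.
intercepts = {t: fitted_cost(0, t) for t in EXTRA_OVERHEAD}

for school_type, intercept in intercepts.items():
    print(f"{school_type}: {intercept:,.0f} + {B_N:.0f} N")
```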
[Figure: scatter diagram of COST against N with the four fitted cost functions for technical, workers', vocational, and general schools]
The four cost functions are illustrated graphically.
We can perform t tests on the coefficients in the usual way.
The t statistic for N is 8.52, so the marginal cost is (very) significantly
different from 0, as we would expect.
The t statistic for the technical school dummy is 5.76, indicating that
the annual overhead cost of a technical school is (very) significantly
greater than that of a general school, again as expected.
Similarly for skilled workers’ schools, the t statistic is 5.15, indicating
that the annual overhead cost of a skilled workers’ school is (very)
significantly greater than that of a general school, again as expected.
In the case of vocational schools, however, the t statistic is only 1.71,
indicating that the overhead cost of such a school is not significantly
greater than that of a general school.
This is not surprising, given that the vocational schools are not much
different from the general schools.
Note that the null hypotheses for the tests on the coefficients of the dummy
variables are that the overhead costs of the other schools are not different from
those of the general schools.
Finally we will perform an F test of the joint explanatory power of the dummy
variables as a group. The null hypothesis is H0: dT = dW = dV = 0. The alternative
hypothesis is that at least one d is different from 0.
The residual sum of squares in the specification including the
dummy variables is 5.41×10^11.
. reg COST N

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  1,    72) =   46.82
   Model |  5.7974e+11     1  5.7974e+11               Prob > F      =  0.0000
Residual |  8.9160e+11    72  1.2383e+10               R-squared     =  0.3940
---------+------------------------------               Adj R-squared =  0.3856
   Total |  1.4713e+12    73  2.0155e+10               Root MSE      = 1.1e+05

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   339.0432   49.55144      6.842   0.000       240.2642    437.8222
   _cons |    23953.3   27167.96      0.882   0.381      -30205.04    78111.65
------------------------------------------------------------------------------
The residual sum of squares in the specification excluding the
dummy variables is 8.92×10^11.
The reduction in RSS when we include the dummies is therefore
(8.92 – 5.41)×10^11. We will check whether this reduction is
significant with the usual F test.
F(3, 69) = [(8.92×10^11 – 5.41×10^11) / 3] / [5.41×10^11 / 69] = 14.92
The numerator in the F ratio is the reduction in RSS divided by the
cost, which is the 3 degrees of freedom given up when we estimate
three additional coefficients (the coefficients of the dummies).
The denominator is RSS for the specification including the dummy variables,
divided by the number of degrees of freedom remaining after they have been added.
The F ratio is therefore 14.92.
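The arithmetic behind the F ratio can be checked with a short sketch. The function name `f_statistic` and its argument names are illustrative, not part of the lecture. The RSS values are those reported in the two Stata outputs. Note that the slide rounds RSS to three significant figures, which gives 14.92, whereas the unrounded values from the output give a slightly smaller ratio (roughly 14.88).

```python
def f_statistic(rss_restricted, rss_unrestricted, extra_params, df_remaining):
    """F ratio for the improvement in fit when extra_params variables are added.

    Numerator: reduction in RSS per degree of freedom given up.
    Denominator: RSS of the larger model per remaining degree of freedom.
    """
    numerator = (rss_restricted - rss_unrestricted) / extra_params
    denominator = rss_unrestricted / df_remaining
    return numerator / denominator

# Rounded RSS values, as used on the slide.
f_rounded = f_statistic(8.92e11, 5.41e11, extra_params=3, df_remaining=69)

# Unrounded RSS values, as printed in the Stata output.
f_unrounded = f_statistic(8.9160e11, 5.4138e11, extra_params=3, df_remaining=69)

print(f"F with rounded RSS:   {f_rounded:.2f}")
print(f"F with unrounded RSS: {f_unrounded:.2f}")
```

Either way, the ratio is far above any conventional critical value, so the rounding does not affect the conclusion.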
$F(3, 60)_{\text{crit},\,0.1\%} = 6.17$
F tables do not give the critical value for 3 and 69 degrees of freedom, but it must be lower
than the critical value with 3 and 60 degrees of freedom, which is 6.17 at the 0.1%
significance level. The F ratio of 14.92 exceeds this, so we reject H0, that the coefficients
of the dummy variables are jointly equal to 0, at the 0.1% level. This is not
surprising, since the t tests show that TECH and WORKER have highly significant coefficients.
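When tables do not list the required degrees of freedom, the tail probability can be approximated numerically. The following is a minimal sketch using only the standard library: it integrates the F density with Simpson's rule. The function names, the integration width, and the number of steps are illustrative choices, not part of the lecture. In practice one would normally use a statistical package instead.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom, for x > 0."""
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (
        0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
               - (d1 + d2) * math.log(d1 * x + d2))
        - math.log(x)
        - log_beta
    )
    return math.exp(log_pdf)

def f_tail_prob(x, d1, d2, width=200.0, steps=20000):
    """Approximate P(F > x) by Simpson's rule on [x, x + width]; steps must be even."""
    h = width / steps
    total = f_pdf(x, d1, d2) + f_pdf(x + width, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(x + i * h, d1, d2)
    return total * h / 3

# Tail probability at the tabulated critical value for 3 and 60 degrees of freedom.
p_table = f_tail_prob(6.17, 3, 60)

# Tail probability of the observed F ratio with 3 and 69 degrees of freedom.
p_observed = f_tail_prob(14.92, 3, 69)

print(f"P(F(3,60) > 6.17)  = {p_table:.5f}")
print(f"P(F(3,69) > 14.92) = {p_observed:.2e}")
```

The first probability should come out close to 0.001, in line with the tabulated value, and the second should be far smaller, consistent with rejecting H0 at the 0.1% level.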
THE EFFECTS OF CHANGING THE REFERENCE CATEGORY
[Figure: scatter diagram of COST (vertical axis) against N (horizontal axis) for technical, vocational, general, and workers' schools]
So far, we chose general academic schools as the reference (omitted)
category and defined dummy variables for the other categories.
This enabled us to compare the overhead costs of the other schools
with those of general schools and to test whether the differences
were significant.
However, suppose that we were interested in testing whether the
overhead costs of skilled workers’ schools were different from
those of the other types of school. How could we do this?
The simplest solution is to re-run the regression making skilled
workers’ schools the reference category. Now we need to define a
dummy variable GEN for the general schools instead.
COST = b1 + dT TECH + dV VOC + dG GEN + b2N + u
The model is shown in equation form. Note that there is no longer a
dummy variable for skilled workers’ schools since they form the
reference category.
Skilled Workers' School
COST = b1 + b2N + u
(TECH = VOC = GEN = 0)
In the case of observations relating to skilled workers’ schools, all
the dummy variables are 0 and the model simplifies to the intercept
and the term involving N.
Technical School
COST = (b1 + dT) + b2N + u
(TECH = 1; VOC = GEN = 0)
In the case of observations relating to technical schools, TECH is
equal to 1 and the intercept increases by an amount dT.
Note that dT should now be interpreted as the extra overhead cost of
a technical school relative to that of a skilled workers’ school.
Vocational School
COST = (b1 + dV) + b2N + u
(VOC = 1; TECH = GEN = 0)
General School
COST = (b1 + dG) + b2N + u
(GEN = 1; TECH = VOC = 0)
Similarly one can derive the implicit cost functions for vocational and
general schools, their d coefficients also being interpreted as their
extra overhead costs relative to those of skilled workers’ schools.
[Figure: COST against N, showing four parallel lines with intercepts b1 + dT (technical), b1 (workers’), b1 + dV (vocational), and b1 + dG (general); the shifts dT, dV, and dG are marked on the vertical axis]
This diagram illustrates the model graphically. Note that the d shifts
are measured from the line for skilled workers’ schools.
School   Type           COST      N    TECH   VOC   GEN
   1     Technical    345,000    623     1     0     0
   2     Technical    537,000    653     1     0     0
   3     General      170,000    400     0     0     1
   4     Workers’     526,000    663     0     0     0
   5     General      100,000    563     0     0     1
   6     Vocational    28,000    236     0     1     0
   7     Vocational   160,000    307     0     1     0
   8     Technical     45,000    173     1     0     0
   9     Technical    120,000    146     1     0     0
  10     Workers’      61,000     99     0     0     0
Here are the data for the first 10 of the 74 schools with skilled
workers’ schools as the reference category.
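The dummy variables in the table can be generated mechanically from the school type. The sketch below does this for the ten schools shown, with skilled workers' schools as the reference category. The names `schools`, `DUMMIES`, and `dummy_row` are illustrative and not part of the lecture.

```python
# First 10 schools from the table: (type, COST, N).
schools = [
    ("Technical", 345_000, 623),
    ("Technical", 537_000, 653),
    ("General", 170_000, 400),
    ("Workers", 526_000, 663),
    ("General", 100_000, 563),
    ("Vocational", 28_000, 236),
    ("Vocational", 160_000, 307),
    ("Technical", 45_000, 173),
    ("Technical", 120_000, 146),
    ("Workers", 61_000, 99),
]

# One dummy per non-reference category. "Workers" has no dummy,
# so it becomes the reference category (all dummies equal to 0).
DUMMIES = {"TECH": "Technical", "VOC": "Vocational", "GEN": "General"}

def dummy_row(school_type):
    """Return the dummy values for one school as a dict of 0/1 integers."""
    return {name: int(school_type == category) for name, category in DUMMIES.items()}

rows = [dummy_row(school_type) for school_type, _, _ in schools]

for (school_type, cost, n), row in zip(schools, rows):
    print(f"{school_type:<11}{cost:>8}{n:>5}  {row['TECH']} {row['VOC']} {row['GEN']}")
```

Changing the reference category only requires changing which type is left out of `DUMMIES`.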
. reg COST N TECH VOC GEN

  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  4,    69) =   29.63
   Model |  9.2996e+11     4  2.3249e+11               Prob > F      =  0.0000
Residual |  5.4138e+11    69  7.8461e+09               R-squared     =  0.6320
---------+------------------------------               Adj R-squared =  0.6107
   Total |  1.4713e+12    73  2.0155e+10               Root MSE      =   88578

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   10748.51   30524.87      0.352   0.726      -50146.93    71643.95
     VOC |  -90133.74   33984.22     -2.652   0.010      -157930.4   -22337.07
     GEN |  -143362.4    27852.8     -5.147   0.000      -198927.2   -87797.57
   _cons |   88469.29   28849.56      3.067   0.003       30916.01    146022.6
------------------------------------------------------------------------------
Here is the Stata output for the regression.
We will focus first on the regression coefficients.
$\widehat{COST} = 88{,}000 + 11{,}000\,TECH - 90{,}000\,VOC - 143{,}000\,GEN + 343N$
The regression result is shown written as an equation.
Skilled Workers' School
$\widehat{COST} = 88{,}000 + 343N$
(TECH = VOC = GEN = 0)
Putting all the dummy variables equal to 0, we obtain the equation
for the reference category, the skilled workers’ schools.
Technical School
$\widehat{COST} = 88{,}000 + 11{,}000 + 343N = 99{,}000 + 343N$
(TECH = 1; VOC = GEN = 0)
Putting TECH equal to 1 and VOC and GEN equal to 0, we obtain the
equation for the technical schools.
Vocational School
$\widehat{COST} = 88{,}000 - 90{,}000 + 343N = -2{,}000 + 343N$
(VOC = 1; TECH = GEN = 0)
General School
$\widehat{COST} = 88{,}000 - 143{,}000 + 343N = -55{,}000 + 343N$
(GEN = 1; TECH = VOC = 0)
And similarly we obtain the equations for the vocational and general
schools, putting VOC and GEN equal to 1 in turn.
Note that the cost functions turn out to be exactly the same as when
we used general schools as the reference category.
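The equivalence of the two sets of cost functions can be checked directly from the reported coefficients. In this sketch the coefficient values are copied from the two Stata outputs in this lecture (general schools as reference, then skilled workers' schools as reference). The helper `intercepts` and the dictionary names are illustrative.

```python
# Coefficients with general schools as the reference category.
general_ref = {"_cons": -54893.09, "TECH": 154110.9, "WORKER": 143362.4, "VOC": 53228.64}

# Coefficients with skilled workers' schools as the reference category.
workers_ref = {"_cons": 88469.29, "TECH": 10748.51, "VOC": -90133.74, "GEN": -143362.4}

def intercepts(coefs, reference, dummy_to_type):
    """Implied intercept for each school type: constant plus that type's dummy coefficient."""
    result = {reference: coefs["_cons"]}
    for dummy, school_type in dummy_to_type.items():
        result[school_type] = coefs["_cons"] + coefs[dummy]
    return result

from_general = intercepts(
    general_ref, "General",
    {"TECH": "Technical", "WORKER": "Workers", "VOC": "Vocational"},
)
from_workers = intercepts(
    workers_ref, "Workers",
    {"TECH": "Technical", "VOC": "Vocational", "GEN": "General"},
)

for school_type in sorted(from_general):
    print(f"{school_type:<11}{from_general[school_type]:>12.2f}{from_workers[school_type]:>12.2f}")
```

The two columns agree up to the rounding in the printed output, and the slope coefficient of N (342.6335) is identical in both regressions.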
[Figure: scatter diagram of COST (vertical axis) against N (horizontal axis), with regression lines, for technical, vocational, general, and workers' schools]
Consequently the scatter diagram with regression lines is exactly the
same as before.
The goodness of fit, whether measured by R², RSS, or the standard
error of the regression (the estimate of the standard deviation of u,
here denoted Root MSE), is likewise not affected by the change.
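These measures depend only on the residual and total sums of squares and the degrees of freedom, none of which change with the reference category. As a rough check, the sketch below recomputes them from the values in the Stata output. The variable names are illustrative.

```python
import math

rss = 5.4138e11   # residual sum of squares from the output
tss = 1.4713e12   # total sum of squares from the output
n_obs = 74        # number of observations
n_params = 5      # intercept, N, and three dummy variables

df_resid = n_obs - n_params   # 69
df_total = n_obs - 1          # 73

r_squared = 1 - rss / tss
adj_r_squared = 1 - (rss / df_resid) / (tss / df_total)
root_mse = math.sqrt(rss / df_resid)

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"Root MSE      = {root_mse:.0f}")
```

The results should match the reported 0.6320, 0.6107, and 88578 up to rounding.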
But the t tests are affected. In particular, the null hypothesis that a
dummy variable coefficient is equal to 0 now has a different meaning.
For example, the t statistic for the technical school coefficient is for the
null hypothesis that the overhead costs of technical schools are the same
as those of skilled workers’ schools.
The t ratio in question is only 0.35, so the null hypothesis is not rejected.
The t ratio for the coefficient of VOC is –2.65, so one concludes that
the overheads of vocational schools are significantly lower than
those of skilled workers’ schools, at the 1% significance level.
General schools clearly have lower overhead costs than the skilled
workers’ schools, according to the regression: the t ratio for the
coefficient of GEN is –5.15, which is significant at the 0.1% level.
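Each t ratio discussed above is the coefficient divided by its standard error. The sketch below recomputes them from the figures in the Stata output with skilled workers' schools as the reference category. The dictionary `estimates` is an illustrative name.

```python
# (coefficient, standard error) pairs copied from the Stata output.
estimates = {
    "N": (342.6335, 40.2195),
    "TECH": (10748.51, 30524.87),
    "VOC": (-90133.74, 33984.22),
    "GEN": (-143362.4, 27852.8),
    "_cons": (88469.29, 28849.56),
}

# t ratio for the null hypothesis that the coefficient is 0.
t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_ratios.items():
    print(f"{name:<6}{t:>8.3f}")
```

The values should reproduce the t column of the output up to rounding. For the dummy variables, each null hypothesis is that the overhead cost of that school type equals that of skilled workers' schools.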
. reg COST N TECH WORKER VOC

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   154110.9   26760.41      5.759   0.000       100725.3    207496.4
  WORKER |   143362.4    27852.8      5.147   0.000       87797.57    198927.2
     VOC |   53228.64   31061.65      1.714   0.091      -8737.646    115194.9
   _cons |  -54893.09   26673.08     -2.058   0.043      -108104.4   -1681.748
------------------------------------------------------------------------------

. reg COST N TECH VOC GEN

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   10748.51   30524.87      0.352   0.726      -50146.93    71643.95
     VOC |  -90133.74   33984.22     -2.652   0.010      -157930.4   -22337.07
     GEN |  -143362.4    27852.8     -5.147   0.000      -198927.2   -87797.57
   _cons |   88469.29   28849.56      3.067   0.003       30916.01    146022.6
------------------------------------------------------------------------------
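Note that there are some differences in the standard errors as well:
those of TECH and VOC change because their coefficients now measure
differences from a different reference category. However, the coefficient
of N, its standard error, and its t statistic are unaffected.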
The one test involving the dummy variables that can be performed with either specification
is the test of whether the overhead costs of general schools and skilled workers’ schools
are different.
The choice of specification can make no difference to the outcome of this test. The only
difference is caused by the fact that the regression coefficient has become negative in the
second specification. The standard error is the same, so the t statistic has the same
absolute magnitude and the outcome of the test must be the same.
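The link between the two specifications can be made explicit with the reported figures. In this sketch, `old` holds the dummy coefficients and standard errors with general schools as the reference category and `new` holds those with skilled workers' schools as the reference. The names are illustrative. Each new coefficient should equal the old one minus the old WORKER coefficient, up to rounding in the printed output.

```python
# (coefficient, standard error) with general schools as reference.
old = {
    "TECH": (154110.9, 26760.41),
    "WORKER": (143362.4, 27852.8),
    "VOC": (53228.64, 31061.65),
}

# (coefficient, standard error) with skilled workers' schools as reference.
new = {
    "TECH": (10748.51, 30524.87),
    "VOC": (-90133.74, 33984.22),
    "GEN": (-143362.4, 27852.8),
}

# The general-versus-workers comparison appears in both specifications
# with opposite sign and the same standard error.
t_worker = old["WORKER"][0] / old["WORKER"][1]
t_gen = new["GEN"][0] / new["GEN"][1]

# Shifting the reference category subtracts the old WORKER coefficient.
implied_tech = old["TECH"][0] - old["WORKER"][0]
implied_voc = old["VOC"][0] - old["WORKER"][0]

print(f"t for WORKER: {t_worker:.3f}   t for GEN: {t_gen:.3f}")
print(f"implied TECH: {implied_tech:.2f}   reported: {new['TECH'][0]:.2f}")
print(f"implied VOC:  {implied_voc:.2f}   reported: {new['VOC'][0]:.2f}")
```

The standard errors of TECH and VOC cannot be derived this way, since they also depend on the covariances between the coefficient estimates, which the output does not report.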
However, the standard errors of the coefficients of the other dummy variables are slightly larger
in the second specification (30,525 against 26,760 for TECH, and 33,984 against 31,062 for VOC).
This is because the skilled workers’ schools are less ‘normal’ or ‘basic’ than the general schools
and there are fewer of them in the sample (only 17, as opposed to 28).
As a consequence there is less precision in measuring the difference between their costs and
those of the other schools than there was when general schools were the reference category.