Third Supervision G12, Michaelmas 2001
Topic: Regression Modelling
1) You are given the following historical price-demand data:
Price     Demand
£7.00     168
£8.00     158
£8.00     174
£7.90     155
£9.10     158
£8.90     142
£7.20     162
£8.10     159
£7.60     157
£7.20     160
£9.70     154
£7.20     168
£9.00     162
£7.70     154
£9.70     154
£7.50     158
£10.70    148
What price would you set? Use linear regression to explore the relationship
between price and demand.
a) Why are the parameters random variables?
b) Argue that the distribution of the intercept is normal and derive its mean
and variance.
c) Explain the concepts of confidence intervals and confidence level.
d) Derive a recipe for a one-sided confidence interval for the intercept.
Generate a graph of the confidence function in a spreadsheet.
e) Suggest a price and give a confidence interval for the expected profit
(assume unit costs of £5.00). Provide confidence intervals for your chosen
price and for some nearby prices.
f) Test statistically whether there is a relationship between demand and price.
g) After you have done a)-e), apply the Excel regression add-in to the data
and interpret the output.
Use Excel for calculations and graphics.
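As a cross-check on the spreadsheet calculations, here is a minimal pure-Python sketch of the least-squares fit of demand on price for the table above. It computes the intercept $a$, the slope $b$, $s = \sqrt{\mathrm{SSE}(a,b)/(n-2)}$, the standard errors of both parameters, the t-statistic for the slope (question f)) and the stationary point of the estimated profit $(x-c)(a+bx)$, taking the unit cost $c = 5$ from e). All names are illustrative assumptions, not taken from any spreadsheet.

```python
import math

# Data from the table in question 1: price (in £) and observed demand
PRICES = [7.00, 8.00, 8.00, 7.90, 9.10, 8.90, 7.20, 8.10, 7.60,
          7.20, 9.70, 7.20, 9.00, 7.70, 9.70, 7.50, 10.70]
DEMAND = [168, 158, 174, 155, 158, 142, 162, 159, 157,
          160, 154, 168, 162, 154, 154, 158, 148]
UNIT_COST = 5.00  # assumed unit cost c, as in question e)


def simple_regression(xs, ys):
    """Least-squares fit y = a + b x; returns a, b, s, se(a), se(b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))                     # estimate of sigma
    se_b = s / math.sqrt(sxx)                        # standard error of the slope
    se_a = s * math.sqrt(1 / n + x_bar ** 2 / sxx)   # standard error of the intercept
    return a, b, s, se_a, se_b


a, b, s, se_a, se_b = simple_regression(PRICES, DEMAND)
t_b = b / se_b  # test statistic for H0: beta = 0, with n - 2 degrees of freedom

# Stationary point of the estimated profit (x - c)(a + b x); a maximum when b < 0
x_star = (b * UNIT_COST - a) / (2 * b)

print(f"a = {a:.3f}, b = {b:.3f}, s = {s:.3f}")
print(f"se(a) = {se_a:.3f}, se(b) = {se_b:.3f}, t for slope = {t_b:.3f}")
print(f"price maximizing the estimated profit: {x_star:.2f}")
```

Compare $|t|$ with the t quantile with $n - 2 = 15$ degrees of freedom to answer f). If the maximizing price lies well outside the observed range of £7.00 to £10.70, the straight line is being extrapolated, and that suggestion should be weighed against the confidence intervals asked for in e).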
Solutions to the questions are in the spreadsheet Supervision.xls. The only part
that needs additional explanation is e). To begin with, let us repeat the analysis of
expected revenues from the handout and then replace the unknown error variance by
its estimate, which amounts to replacing the normal distribution by a t-distribution.
With $c$ the unit cost, the expected profit function is of the form
$$\pi(x) = (x - c)\,\mathrm{E}(y(x)) = (x - c)(\alpha + \beta x).$$
Our estimate is of the form
$$\pi_{\mathrm{Est}}(x) = (x - c)(a + bx) = (x - c)\bigl(\bar{y} + b(x - \bar{x})\bigr).$$
The variables $b$ and $\bar{y}$ are both normal and statistically independent, and
therefore the estimated profit is a normal variable with mean
$$\mathrm{E}(\pi_{\mathrm{Est}}(x)) = (x - c)\bigl(\mathrm{E}(\bar{y}) + \mathrm{E}(b)(x - \bar{x})\bigr) = (x - c)\bigl(\alpha + \beta\bar{x} + \beta(x - \bar{x})\bigr) = \pi(x)$$
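and variance
$$\mathrm{VAR}(\pi_{\mathrm{Est}}(x)) = (x - c)^2\left(\frac{\sigma^2}{n} + (x - \bar{x})^2\,\frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}\right) = \bigl((x - c)\,\sigma\bigr)^2\left(\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right).$$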
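The mean and variance formulas can be checked by simulation. The sketch below assumes hypothetical true parameters $\alpha = 190$, $\beta = -4$, $\sigma = 6$ (chosen for illustration, not estimated from the data), keeps the prices from question 1 as fixed design points, repeatedly simulates demand, refits the line, and records $\pi_{\mathrm{Est}}(x)$ at $x = 9$ with $c = 5$. The sample mean and variance of these draws should be close to $\pi(x)$ and to the variance formula.

```python
import random
import statistics

# Fixed design points x_i: the prices from question 1
PRICES = [7.00, 8.00, 8.00, 7.90, 9.10, 8.90, 7.20, 8.10, 7.60,
          7.20, 9.70, 7.20, 9.00, 7.70, 9.70, 7.50, 10.70]

# Hypothetical "true" model parameters, chosen only for this illustration
ALPHA, BETA, SIGMA = 190.0, -4.0, 6.0
C, X = 5.0, 9.0  # unit cost and the price at which profit is estimated


def fit_line(xs, ys):
    """Least-squares intercept a and slope b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def simulate_profit_estimates(reps=20000, seed=1):
    """Draw demand from the true model, refit, and record pi_Est(X)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        ys = [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in PRICES]
        a, b = fit_line(PRICES, ys)
        draws.append((X - C) * (a + b * X))
    return draws


def formula_variance(xs=PRICES):
    """((x - c) sigma)^2 (1/n + (x - x_bar)^2 / sum_i (x_i - x_bar)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return ((X - C) * SIGMA) ** 2 * (1 / n + (X - x_bar) ** 2 / sxx)


draws = simulate_profit_estimates()
print("simulated mean:    ", statistics.fmean(draws))
print("pi(x):             ", (X - C) * (ALPHA + BETA * X))
print("simulated variance:", statistics.variance(draws))
print("formula variance:  ", formula_variance())
```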
If you simply replace the unknown $\sigma^2$ by its estimate $\mathrm{SSE}(a,b)/(n-2)$, then you
make the variance “random”, since $\mathrm{SSE}(a,b)/(n-2)$ is a random variable.
This is conceptually wrong, and in our example it also gives inaccurate intervals,
since the number of observations is rather small. To get around this, let's look at
standardized variables. Since $\pi_{\mathrm{Est}}(x)$ is normal, the variable
$$Z = \frac{\pi_{\mathrm{Est}}(x) - \pi(x)}{(x - c)\sqrt{\sigma^2\left(\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right)}}$$
has a standard normal distribution. This variable contains the unknown $\sigma^2$ under the
square root in its denominator. We can replace $\sigma^2$ in the definition of the
random variable $Z$ by $\mathrm{SSE}(a,b)/(n-2)$, since we are now changing a random
variable by dividing it by another random variable, which is conceptually sound.¹
If we do this, the new variable
$$T_{n-2} = \frac{\pi_{\mathrm{Est}}(x) - \pi(x)}{(x - c)\sqrt{\dfrac{\mathrm{SSE}(a,b)}{n-2}\left(\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right)}}$$
has a t-distribution with $(n-2)$ degrees of freedom. Confidence intervals can now
be constructed in the usual way (see also the spreadsheet Confidence.xls).
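A minimal sketch of the resulting recipe: the two-sided interval is $\pi_{\mathrm{Est}}(x) \pm t\,|x - c|\,\sqrt{\mathrm{SSE}(a,b)/(n-2)}\,\sqrt{1/n + (x-\bar{x})^2/\sum_i (x_i - \bar{x})^2}$, where $t$ is the upper $(1+\text{level})/2$ quantile of the t-distribution with $n-2$ degrees of freedom. To stay within the Python standard library, the quantile is computed here by integrating the t density with Simpson's rule and bisecting; in practice a library or spreadsheet function (e.g. Excel's TINV) would be used. The 95% level, the unit cost $c = 5$ and the candidate prices are assumptions chosen for illustration.

```python
import math

# Data from the table in question 1
PRICES = [7.00, 8.00, 8.00, 7.90, 9.10, 8.90, 7.20, 8.10, 7.60,
          7.20, 9.70, 7.20, 9.00, 7.70, 9.70, 7.50, 10.70]
DEMAND = [168, 158, 174, 155, 158, 142, 162, 159, 157,
          160, 154, 168, 162, 154, 154, 158, 148]


def t_cdf(t, df, steps=2000):
    """CDF of Student's t via Simpson's rule on [0, |t|] and symmetry about 0."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    u = abs(t)
    h = u / steps
    total = density(0.0) + density(u)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_quantile(p, df):
    """Upper quantile for 0.5 < p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def profit_interval(xs, ys, price, cost=5.0, level=0.95):
    """Two-sided confidence interval for the expected profit at a given price."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    estimate = (price - cost) * (a + b * price)
    # abs() keeps the half-width non-negative even for a price below cost
    half_width = (t_quantile((1 + level) / 2, n - 2) * abs(price - cost)
                  * math.sqrt(sse / (n - 2))
                  * math.sqrt(1 / n + (price - x_bar) ** 2 / sxx))
    return estimate - half_width, estimate, estimate + half_width


for price in (9.00, 10.00, 11.00):
    lo, est, hi = profit_interval(PRICES, DEMAND, price)
    print(f"price £{price:.2f}: estimate {est:.1f}, 95% interval [{lo:.1f}, {hi:.1f}]")
```

For the one-sided interval in d) the same pattern applies, with the quantile taken at the confidence level itself rather than at $(1+\text{level})/2$ and only one bound kept.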
¹ We are dividing $Z$ by $\sqrt{\mathrm{SSE}(a,b)/(\sigma^2(n-2))}$. The variable $\mathrm{SSE}(a,b)/\sigma^2$ can be shown to have a
chi-square distribution with $(n-2)$ degrees of freedom (i.e. it is distributed as the sum of squares of $(n-2)$
independent standard normal variables) and to be independent of the variable $Z$. Therefore the new
variable $T_{n-2}$ has Student's t-distribution with $(n-2)$ degrees of freedom.

2) Derive a system of linear equations that specifies the least squares estimates
for the parameters of a basis function model $\mathrm{E}(y(x)) = \sum_k p_k f_k(x)$. Can you
give a concise matrix form of the equations?
The sum of squared errors is
$$\mathrm{SSE}(p) = \sum_i \Bigl(\sum_k p_k f_k(x_i) - y_i\Bigr)^2.$$
The first order optimality conditions are
$$0 = \frac{\partial\,\mathrm{SSE}}{\partial p_j}(p) = \sum_i 2\Bigl(\sum_k p_k f_k(x_i) - y_i\Bigr) f_j(x_i), \qquad j = 1, \dots, m.$$
This is equivalent to
$$\sum_k \Bigl(\sum_i f_j(x_i)\,f_k(x_i)\Bigr) p_k = \sum_i y_i\,f_j(x_i), \qquad j = 1, \dots, m.$$
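This is a system of $m$ equations in the $m$ unknowns $p_1, \dots, p_m$. It can be written in matrix
form as
$$F^{\mathsf T} F\,p = F^{\mathsf T} y,$$
where $F = (F_{ij}) = (f_j(x_i))$. It has a unique solution if and only if $F$ has rank $m$, i.e. its $m$ columns are linearly independent.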
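A minimal sketch of solving these normal equations directly, in plain Python with Gaussian elimination (partial pivoting). The quadratic basis $f_1(x) = 1$, $f_2(x) = x$, $f_3(x) = x^2$ and the noise-free test data are assumptions chosen so that the exact coefficients are known. For ill-conditioned problems a QR-based least-squares routine is usually preferred, because forming $F^{\mathsf T}F$ explicitly squares the condition number.

```python
def normal_equations(xs, ys, basis):
    """Build F^T F and F^T y, where F[i][j] = f_j(x_i)."""
    F = [[f(x) for f in basis] for x in xs]
    m = len(basis)
    ftf = [[sum(row[j] * row[k] for row in F) for k in range(m)] for j in range(m)]
    fty = [sum(row[j] * y for row, y in zip(F, ys)) for j in range(m)]
    return ftf, fty


def solve(matrix, rhs):
    """Solve matrix p = rhs by Gaussian elimination with partial pivoting."""
    m = len(matrix)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]  # augmented matrix
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("F^T F is singular: F does not have rank m")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system
    p = [0.0] * m
    for r in range(m - 1, -1, -1):
        p[r] = (aug[r][m] - sum(aug[r][k] * p[k] for k in range(r + 1, m))) / aug[r][r]
    return p


# Illustrative basis functions and noise-free data from y = 2 + 3x - 0.5x^2
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 + 3 * x - 0.5 * x * x for x in xs]

p = solve(*normal_equations(xs, ys, basis))
print(p)  # close to [2.0, 3.0, -0.5]
```

With the basis $f_1(x) = 1$, $f_2(x) = x$, the same two functions reproduce the simple linear regression of question 1.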