Third Supervision G12, Michaelmas 2001
Topic: Regression Modelling

1) You are given the following historic price-demand data:

Price (£):  7.00  8.00  8.00  7.90  9.10  8.90  7.20  8.10  7.60  7.20  9.70  7.20  9.00  7.70  9.70  7.50  10.70
Demand:      168   158   174   155   158   142   162   159   157   160   154   168   162   154   154   158    148

What price would you set? Use linear regression to explore the relationship between price and demand.

a) Why are the parameters random variables?
b) Argue that the distribution of the intercept is normal and derive its mean and variance.
c) Explain the concepts of confidence intervals and confidence level.
d) Derive a recipe for a one-sided confidence interval for the intercept. Generate a graph of the confidence function in a spreadsheet.
e) Suggest a price and give a confidence interval for the expected profit (assume unit costs of £5.00). Provide confidence intervals for your chosen price and some prices close by.
f) Test statistically whether there is a relationship between demand and price.
g) After you have done a)-e), apply the Excel regression add-in to the data and interpret the output.

Use Excel for calculations and graphics.

Solutions to the questions are in the spreadsheet Supervision.xls. The only part that needs additional explanation is e). To begin with, let us repeat the analysis for expected revenues in the handout and then replace the unknown error variance by its estimate, which amounts to replacing the normal by a t-distribution.

The expected profit function is of the form

$$\pi(x) = (x - c)\,y = (x - c)(\alpha + \beta x),$$

where $c$ is the unit cost and $y$ stands for the expected demand at price $x$. Our estimate is of the form

$$\mathrm{Est}(x) = (x - c)(a + b x) = (x - c)\bigl(\bar{y} + b(x - \bar{x})\bigr).$$
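As a rough illustration of the estimate above, the following Python sketch fits the line by least squares and evaluates Est(x) at a few prices. It assumes that the i-th price in the table pairs with the i-th demand, which the flattened table suggests but does not state. The helper names `fit_line` and `est_profit` are made up for this example; the handout itself works in Excel.

```python
# Data from question 1; pairing price i with demand i is an assumption.
prices = [7.00, 8.00, 8.00, 7.90, 9.10, 8.90, 7.20, 8.10, 7.60,
          7.20, 9.70, 7.20, 9.00, 7.70, 9.70, 7.50, 10.70]
demands = [168, 158, 174, 155, 158, 142, 162, 159, 157,
           160, 154, 168, 162, 154, 154, 158, 148]
UNIT_COST = 5.00  # c, the unit cost given in part e)


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the model y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar  # the fitted line passes through (x_bar, y_bar)
    return a, b


def est_profit(x, a, b, c=UNIT_COST):
    """Estimated expected profit Est(x) = (x - c)(a + b*x)."""
    return (x - c) * (a + b * x)


a, b = fit_line(prices, demands)
print(f"a = {a:.2f}, b = {b:.3f}")
for x in (7.0, 8.0, 9.0, 10.0):
    print(f"price {x:5.2f}: estimated profit {est_profit(x, a, b):8.2f}")
```

The prices in the loop are arbitrary points inside the observed range. Evaluating Est(x) far outside that range is an extrapolation and should be read with corresponding caution.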
The variables $b$ and $\bar{y}$ are both normal and statistically independent, and therefore the estimated profit is a normal variable with mean

$$E(\mathrm{Est}(x)) = (x - c)\bigl(E(\bar{y}) + E(b)(x - \bar{x})\bigr) = (x - c)\bigl(\alpha + \beta\bar{x} + \beta(x - \bar{x})\bigr) = \pi(x)$$

and variance

$$\mathrm{VAR}(\mathrm{Est}(x)) = (x - c)^2\left(\frac{\sigma^2}{n} + \frac{(x - \bar{x})^2\,\sigma^2}{\sum_i (x_i - \bar{x})^2}\right) = (x - c)^2\,\sigma^2\left(\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right).$$

If you simply replace the unknown $\sigma^2$ by its estimate $SSE(a,b)/(n-2)$, then you make the variance "random", since $SSE(a,b)/(n-2)$ is a random variable. This is conceptually wrong and also computationally wrong in our example, since the number of observations is rather small. To get around this, let's look at standardized variables. Since Est is normal, the variable

$$Z = \frac{\mathrm{Est}(x) - \pi(x)}{(x - c)\sqrt{\sigma^2\left(\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right)}}$$

has a standard normal distribution. This variable has the unknown $\sigma^2$ under the root in the denominator. We can replace this quantity in the definition of the random variable $Z$ by $SSE(a,b)/(n-2)$, since we are now changing a random variable by dividing it by another random variable, which is conceptually sound (see footnote 1). If we do this, the new variable

$$T_{n-2} = \frac{\mathrm{Est}(x) - \pi(x)}{(x - c)\sqrt{\left(\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right)\dfrac{SSE(a,b)}{n-2}}}$$

has a t-distribution with $(n-2)$ degrees of freedom. Confidence intervals can now be constructed in the usual way (see also the spreadsheet Confidence.xls).

Footnote 1: We are dividing $Z$ by $\sqrt{SSE(a,b)/(\sigma^2 (n-2))}$. The variable $SSE(a,b)/\sigma^2$ can be shown to have a chi-square distribution with $(n-2)$ degrees of freedom (i.e. it is distributed as the sum of squares of $(n-2)$ independent standard normal variables) and to be independent of the variable $Z$. Therefore the new variable $T_{n-2}$ has Student's t-distribution with $(n-2)$ degrees of freedom.

2) Derive a system of linear equations that specifies the least squares estimates for the parameters of a basis function model

$$E(y(x)) = \sum_k p_k f_k(x).$$

Can you give a concise matrix form of the equations?

The sum of squared errors is

$$SSE(p) = \sum_i \Bigl(\sum_k p_k f_k(x_i) - y_i\Bigr)^2.$$

The first order optimality conditions are

$$0 = \frac{\partial SSE}{\partial p_j}(p) = 2\sum_i \Bigl(\sum_k p_k f_k(x_i) - y_i\Bigr) f_j(x_i).$$

This is equivalent to

$$\sum_k \Bigl(\sum_i f_j(x_i) f_k(x_i)\Bigr) p_k = \sum_i y_i f_j(x_i), \qquad j = 1, \dots, m.$$

This is a system of $m$ equations in $m$ unknowns $p_1, \dots, p_m$. It can be written in matrix form as

$$F^T F p = F^T y, \qquad \text{where } F = (F_{ij}) = (f_j(x_i)).$$

It has a unique solution if $F$ has rank $m$.
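The system for question 2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `normal_equations` and `solve`, the quadratic basis, the synthetic data and the singularity tolerance are all chosen for this example and do not come from the handout. The code builds the matrix with entries sum_i f_j(x_i) f_k(x_i) and the right-hand side with entries sum_i y_i f_j(x_i), then solves the resulting m-by-m system by Gaussian elimination.

```python
def normal_equations(basis, xs, ys):
    """Return (A, r) where A[j][k] = sum_i f_j(x_i) f_k(x_i) and
    r[j] = sum_i y_i f_j(x_i), with F[i][j] = basis[j](xs[i])."""
    F = [[f(x) for f in basis] for x in xs]
    m = len(basis)
    A = [[sum(row[j] * row[k] for row in F) for k in range(m)]
         for j in range(m)]
    r = [sum(row[j] * y for row, y in zip(F, ys)) for j in range(m)]
    return A, r


def solve(A, r, tol=1e-12):
    """Solve A p = r by Gaussian elimination with partial pivoting.
    The absolute tolerance tol is a crude, illustrative rank check."""
    m = len(r)
    M = [row[:] + [rhs] for row, rhs in zip(A, r)]  # augmented matrix
    for col in range(m):
        pivot = max(range(col, m), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) < tol:
            raise ValueError("system is singular: F does not have rank m")
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, m):
            factor = M[i][col] / M[col][col]
            for j in range(col, m + 1):
                M[i][j] -= factor * M[col][j]
    # Back substitution on the upper-triangular system.
    p = [0.0] * m
    for i in reversed(range(m)):
        s = sum(M[i][j] * p[j] for j in range(i + 1, m))
        p[i] = (M[i][m] - s) / M[i][i]
    return p


# Hypothetical example: basis 1, x, x^2 and noise-free synthetic data.
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]
A, r = normal_equations(basis, xs, ys)
p = solve(A, r)
print(p)
```

With the basis 1, x the same two functions give the intercept and slope of simple linear regression. Solving the normal equations directly is adequate for small, well-conditioned problems like this one; for larger or ill-conditioned bases a QR-based least-squares routine is usually the more robust choice.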