QT1 exam answers

Quantitative Techniques 1
Exam questions for January 2004
Time allowed: two hours
Attempt ALL questions

1. What are the main sources of data errors? What can be done to reduce errors in data you did not collect yourself?

Sources of data error
• At collection source: clerical error, misunderstood question, conceptual error
• The incentive to look good/bad
• Wrong units ('000s, millions, etc.), $, £
• Sampling error
• Transcription error
• Calculation error
• Rounding

Lesson
• Assume data are error-ridden
• Use checking techniques:
  – descriptive statistics, graphs
  – eyeballing: do the data follow the expected pattern?
• Make corrections where feasible
• Omit data which are probably wrong

2. For each of the following statements about the probabilities of outcomes A and B:
(a) Say whether they are true, false, or uncertain.
(b) If uncertain, spell out conditions under which they are true.

(i) P(A or B) = P(A) + P(B)
a) Uncertain. b) True if A and B are mutually exclusive.
(ii) P(A and B) = P(A) + P(B)
Never true.
(iii) P(A or B) = P(A).P(B)
Never true.
(iv) P(A and B) = P(A).P(B)
True if A and B are independent.
(v) P(A | B) = P(A and B)/P(B)
Always true.

3. The question was meant to be: Suppose x is a continuous random variable with the probability density function (pdf):

f(x) = x        for 0 ≤ x ≤ 1
f(x) = 2 − x    for 1 ≤ x ≤ 2
f(x) = 0        elsewhere

a) Draw a graph of this function.
[Figure: Question 3. Graph of pdf(x) against x.]

b) Explain how you know this is a valid pdf.
i) f(x) ≥ 0 (not true for 2 − 2x)!
ii) Area under curve = 1 (ditto)

c) Comment on the relative position of the mean, median and mode.
Symmetric distribution. Therefore they will all be at x = 1 (does not apply to 2 − 2x).

d) Calculate the probability that 0.5 ≤ x ≤ 1.5.
Area to left of 0.5 = half base × height = 0.5 × 0.5 × 0.5 = 0.125
Area to right of 1.5 = 0.125
Total area in tails = 0.250
Therefore P(0.5 ≤ x ≤ 1.5) = 1 − 0.25 = 0.75

4. You are organising a concert and believe that attendance will depend on the weather.
You believe the following possibilities are appropriate:

Weather             Probability   Attendance
Terrible weather    0.2           500
Mediocre weather    0.6           1200
Good weather        0.2           2000

a) What is the expected attendance?
b) Suppose each ticket costs £5 and the fixed costs are £2,000. What are the expected profits?
c) Graph the probability distribution for profits.
d) What is the most you could pay for the fixed costs and still have an 80% chance of making a profit on the event? (To nearest £.)

Costs: £2,000
Ticket price: £5.00

Weather             Probability   Attendance   Prob. × attendance   Revenue
Terrible weather    0.2           500          100                  £2,500.00
Mediocre weather    0.6           1200         720                  £6,000.00
Good weather        0.2           2000         400                  £10,000.00

a) Expected attendance = 100 + 720 + 400 = 1220
b) Expected revenue = 1220 × £5.00 = £6,100.00
   less costs = £2,000.00
   Expected profits = £4,100.00
c) [Figure: bar chart of the probability distribution of profits: probability 0.2 at £500, 0.6 at £4,000 and 0.2 at £8,000.]
d) £5,999 (strictly, anything up to £5,999.99). If we have mediocre weather we make £1; with good weather we make £4,001. Probability of loss = 0.2.

5. Suppose that heights in a population are normally distributed with a mean of 78 inches and a standard deviation of 5 inches.
a) What is the probability that an individual selected at random will have a height between 68.2 and 79.8 inches?
b) Construct a 95% confidence interval for the average height in a random sample of four individuals.

Generally z = (x − μ)/σ.

a) Upper tail area: z1 = (79.8 − 78)/5 = 1.8/5 = 0.36. Implies area in tail = 1 − 0.6406 = 0.3594
Lower tail area: z2 = (68.2 − 78)/5 = −9.8/5 = −1.96. Implies area in tail = 0.025
Total area in tails = 0.3844
Total central area = 1 − 0.3844 = 0.6156
Answer = 0.6156 or 61.56%

b) Width of interval = z × σ_xbar each side of the mean.
2.5% in each tail: z_0.025 = 1.96
σ_xbar = σ_x/√n = 5/√4 = 5/2 = 2.5
Therefore z × σ_xbar = 1.96 × 2.5 = 4.9
Confidence interval ranges from 78 − 4.9 to 78 + 4.9
Answer: Lower bound of CI = 73.1 inches; upper bound = 82.9 inches

6. Suppose we wanted to conduct a survey.
It is desired that we produce an interval estimate of the population mean that is within 5 of the true population mean with 99% confidence. Based on a historical planning value of 15 for the population standard deviation, how big should the sample be?

σ_x = 15.
Width of interval = z × σ_xbar each side of the mean.
0.5% in each tail: z = 2.575
σ_xbar = σ_x/√n = 15/√n.
Set this interval width to the desired value of 5.
Therefore 2.575 × 15/√n = 5.
√n = 2.575 × 15/5 = 2.575 × 3 = 7.725
Therefore n = 7.725² = 59.675625, i.e. 60 is the minimum number.
Answer: 60 is the minimum size of a sample to produce this result.

7. Explain in simple terms the differences (and similarities if any) between the following approaches to estimation:
a) method of moments
b) maximum likelihood
c) least squares

These were covered in chapter 7 of Ashenfelter et al., to which you were specifically referred in Lecture 8.

a) “Moments” refer to mean, variance, skewness, kurtosis, etc. The method of moments seeks to equate the moments implied by the statistical model of the population distribution with the actual moments of the sample. MOM estimators proceed, as it were, by analogy. For example, if we are interested in the population mean we use the sample mean. If our model of the population (or data generation process) says that the disturbance term has expected value zero, then we set the mean of the residuals equal to zero, which implies Σu = 0. If our model says that the disturbance term is uncorrelated with the regressors (X’s) (i.e. E(Xu) = 0), then we can base our estimator on the condition ΣXu = 0. Since MOM estimators reflect the underlying properties of the population, they approach the population values as n → ∞. This means they are consistent.

b) Before sampling, the probability of a sample (x1, x2, x3, … etc.) depends on the population parameter, say θ, as defined by the probability density function f(x1, x2, x3, … | θ). But in an estimation situation the x’s are known and θ is unknown.
If we treat the x’s as fixed and θ as the unknown to be varied, the function f becomes a likelihood function, denoted by L(x1, x2, x3, … | θ). The ML estimate is the value of θ that maximises this function. ML estimators are also generally consistent.

c) Least squares estimators are based on the criterion of minimising the sum of squared “errors”, these being defined in some appropriate way to reflect the deviations of the sample from the implied population characteristics. Squaring does two things:
i) it treats positive and negative values equally;
ii) it penalises large departures from the hypothetical population parameter more than small departures.

Similarities: Under some conditions, all three estimators produce the same result, e.g. the sample mean as an estimator of the population mean, or OLS regression coefficients being ML (if disturbances are normal) and MOM (if E(u) = 0 and E(Xu) = 0).

Differences: The ML estimator of variance is not the same as the LS estimator. If other conditions are violated, estimates will not generally be the same.

8. Under what conditions will the ordinary least squares estimator be
a) unbiased
b) efficient?
c) What does it mean to say an estimate is consistent?

a) E(u) = 0; E(Xu) = 0
b) E(u²) = σ² (a constant); E(u_i u_j) = 0 for i ≠ j (no autocorrelation)
c) As the sample size increases without limit, the estimate converges on the population parameter.

9. A regression of the cost of water delivered (Y) on the number of customers and the volume of water delivered yields the following regression:

Y = 20,000,000 + 75 X1 + 0.25 X2        R² = 0.5123

Y = operating cost (£)
X1 = number of customers
X2 = total volume of water delivered (cubic metres)

a) What is the predicted average cost per cubic metre when X1 = 600,000 and the consumption per customer is 300 cubic metres? [4]
b) If we change the units of measurement so that Y is now in millions of £ and customers are in thousands, what will happen
i) to each of the coefficients in the equation [4]
ii) to the significance levels reported by the econometric software?
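[4]

a) Y = 20,000,000 + 75 × 600,000 + 0.25 × 300 × 600,000
= £20 million + £45 million + £45 million = £110 million
Total water consumption = 600,000 × 300 = 180,000,000 cubic metres
Cost per cubic metre = £110 million/180 million = £0.6111 per cubic metre = 61.11 pence

b) i) The intercept and the coefficient on X2 will be smaller by a factor of 10⁶ (becoming 20 and 0.00000025). The coefficient on X1 will be smaller by a factor of only 10³ (becoming 0.075), because measuring customers in thousands multiplies it by 10³, partly offsetting the rescaling of Y.
ii) The significance levels will be unchanged.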
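As a rough check on the Question 9 arithmetic, the Python sketch below evaluates the fitted equation at the stated values and then re-expresses the coefficients in the new units. The variable and function names are illustrative assumptions, not part of the exam. The rescaling rule used is the standard one: a coefficient is multiplied by the scale factor applied to its own X and divided by the scale factor applied to Y.

```python
# Coefficients as reported in Question 9
# (Y in pounds, X1 in customers, X2 in cubic metres).
INTERCEPT = 20_000_000.0
B_CUSTOMERS = 75.0
B_VOLUME = 0.25


def predicted_cost(customers: float, volume_m3: float) -> float:
    """Predicted operating cost in pounds from the fitted equation."""
    return INTERCEPT + B_CUSTOMERS * customers + B_VOLUME * volume_m3


# Part a): 600,000 customers, each consuming 300 cubic metres.
customers = 600_000
volume = customers * 300
cost = predicted_cost(customers, volume)
cost_per_m3 = cost / volume

# Part b) i): Y in millions of pounds (scale 1e6), customers in thousands
# (scale 1e3), volume unchanged (scale 1).
intercept_m = INTERCEPT / 1e6
b_customers_k = B_CUSTOMERS * 1e3 / 1e6
b_volume_m = B_VOLUME * 1 / 1e6

# The rescaled equation must give the same prediction, now in millions of pounds.
cost_m = intercept_m + b_customers_k * (customers / 1e3) + b_volume_m * volume

print(f"Predicted cost: {cost:,.0f} pounds")
print(f"Cost per cubic metre: {cost_per_m3:.4f} pounds")
print(f"Rescaled coefficients: {intercept_m}, {b_customers_k}, {b_volume_m}")
print(f"Predicted cost in millions of pounds: {cost_m:.1f}")
```

Because the rescaled equation reproduces the same prediction, only the units of the coefficients change; this is also why the t-ratios, and hence the reported significance levels, are unaffected.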