Economics 375: Introduction to Econometrics
Homework #1
This homework is due on January 14th.

1. Consider a random variable X that comes from a normal distribution with mean $\mu$ and variance $\sigma^2$. Think of four different ways to estimate the unobservable mean $\mu$:

(a) $\hat{X}_a = \frac{5X_1 + 5X_2}{12}$
(b) $\hat{X}_b = \frac{X_1}{10} + \frac{X_2}{5} + \frac{7X_3}{10}$
(c) $\hat{X}_c = \text{Median}(X_1, X_2, X_3)$
(d) $\hat{X}_d = \frac{4X_1}{10} + \frac{3X_2}{10} + \frac{3X_3}{10}$

a. Which, if any, estimators are unbiased?

$\hat{X}_b$ and $\hat{X}_d$ are unbiased. $\hat{X}_c$ is unbiased in this situation (a normal distribution) because a normal distribution is symmetric around its mean, so the median will, on average, equal the mean. In general, the median is not an unbiased estimator of the mean when the distribution is asymmetric. $\hat{X}_a$ is a biased estimator because, on average, it will produce estimates of $\mu$ that are too small.

A simple way to answer most of these types of questions is to insert the population average for each random draw (i.e., let $X_1$ equal $\mu$) and then see whether the estimator equals the population average. For instance, using $\hat{X}_a$, we see that $\frac{5\mu + 5\mu}{12} = \frac{5}{6}\mu$, so, on average, $\hat{X}_a$ would produce an estimate that is only five-sixths the size of the actual population mean. Notice that for $\hat{X}_b$ and $\hat{X}_d$ this technique produces $1\mu$, and hence each is unbiased.

b. Between $\hat{X}_b$ and $\hat{X}_d$, which estimator is more efficient?

Let me state at the outset that this is a somewhat difficult problem to prove. Please don't get discouraged if you tried to prove it and were not successful. I think this is one of the more difficult algebra problems you will encounter in my class. Later, I'll show you how I solved it, but for now, let's think through the intuition to arrive at the correct answer. Imagine you draw three random observations from the distribution with mean $\mu$ and variance $\sigma^2$. The first two happen to be quite close to $\mu$, and the third happens to be a long way away (here, we are not defining “close” and “long way away”).
Think of what happens when you use $\hat{X}_b$ to estimate the mean. The third observation is a long way from the mean, and $\hat{X}_b$ gives most of its weight (seven-tenths) to that observation. As a result, $\hat{X}_b$ will be “pulled” quite a bit towards the third observation, and its value will tend to lie close to it. Because the third observation may fall either above or below the mean, repeatedly computing $\hat{X}_b$ with many different draws of the Xs will often produce values far from the mean. In other words, $\hat{X}_b$ will tend to have a large variance. This is less likely to happen with $\hat{X}_d$ because no single observation gets an overwhelming share of the weight, so one outlier cannot pull it far from the mean. In our terms, $\hat{X}_d$ will be more efficient than $\hat{X}_b$.

Again, before proving this, consider the following Monte Carlo experiment (we'll talk more about Monte Carlo experiments later; for now, just think of them as big computer simulations). Imagine I ask the computer to draw three random, normally distributed numbers, compute both $\hat{X}_b$ and $\hat{X}_d$, and save their values. Then I repeat this process, using different random draws, many times (say, 10,000). Suppose I present histograms of all 10,000 values of each of $\hat{X}_b$ and $\hat{X}_d$. The less efficient estimator should then have more values further from the mean (or, better said, should have a larger variance).

[Figure: histograms of the 10,000 simulated values of $\hat{X}_b$ and $\hat{X}_d$]

Notice that this is exactly what we get. Ten thousand replications (using a standard normal distribution) give a variance for $\hat{X}_d$ that is about 60% of that of $\hat{X}_b$.

Now, for the proof. When one generates three observations, the sample variance is

$s^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + (X_3 - \bar{X})^2}{3 - 1}$
However, in this case, we weren't asked to compute $\bar{X}$ (the traditional mean of X). We were asked about $\hat{X}_b$ and $\hat{X}_d$, so our two variance equations will be:

(1) $s_b^2 = \frac{(X_1 - \hat{X}_b)^2 + (X_2 - \hat{X}_b)^2 + (X_3 - \hat{X}_b)^2}{2}$
(2) $s_d^2 = \frac{(X_1 - \hat{X}_d)^2 + (X_2 - \hat{X}_d)^2 + (X_3 - \hat{X}_d)^2}{2}$

Expanding (1) by inserting the definition of $\hat{X}_b$:

$s_b^2 = \frac{1}{2}\left[\left(X_1 - \frac{X_1}{10} - \frac{X_2}{5} - \frac{7X_3}{10}\right)^2 + \left(X_2 - \frac{X_1}{10} - \frac{X_2}{5} - \frac{7X_3}{10}\right)^2 + \left(X_3 - \frac{X_1}{10} - \frac{X_2}{5} - \frac{7X_3}{10}\right)^2\right]$

Now comes some algebra, where we simplify this expression so that it contains as few $X_1$s, $X_2$s, and $X_3$s as possible:

(3) $s_b^2 = 0.415X_1^2 + 0.36X_2^2 + 0.535X_3^2 - 0.24X_1X_2 - 0.59X_1X_3 - 0.48X_2X_3$

Doing the same for (2) gives:

(4) $s_d^2 = 0.34X_1^2 + 0.335X_2^2 + 0.335X_3^2 - 0.34X_1X_2 - 0.34X_1X_3 - 0.33X_2X_3$

Concentrate on the first three terms of this expression (the ones associated with the squared X's). In all three cases, the coefficients in expression (4) (associated with $\hat{X}_d$) are smaller than those in (3). This leads us to believe that (4) will usually be smaller than (3). However, the last three terms show that this need not always be the case: one could probably pick $X_1$, $X_2$, and $X_3$ so as to make (3) smaller than (4). Remember, though, that the concept of efficiency, like that of unbiasedness, is an “average” one. In other words, an efficient estimator gives a smaller variance on average. To see the conclusion of this proof, we need one mathematical concept that some prerequisites for this class do not introduce. I introduce it and show the remainder of this proof in this video.

c. Can you think of a time when you would rather use an efficient, biased estimator than an unbiased, inefficient estimator?

In some cases, it is better to be a little wrong and very precise. Consider a field goal kicker. Would you rather have an accurate one who is imprecise (sometimes missing to the left and sometimes to the right, but on average making it through the goal)? Or would you rather have a precise but inaccurate one who always squeaks the ball inside the left goal post?

2. Consider a random variable X that comes from a fair, four-sided die (1 with probability .25, 2 with probability .25, 3 with probability .25, and 4 with probability .25).

a. What is the population mean of Y, where Y = (X1 + X2)/2?
The mean of Y is the mean of X1 plus the mean of X2, divided by 2. The mean of both X1 and X2 is 2.5, so the mean of Y is also 2.5.

b. Consider two methods of estimating the population mean of X: $\frac{X_1}{3} + \frac{2X_2}{3}$ and $\frac{3X_1}{5} + \frac{2X_2}{5}$. Which method of estimating the population mean is unbiased? Which estimator provides estimates with the smallest variance?

Both estimators are unbiased, so each produces the correct mean, on average. To determine the variances, consider the following spreadsheet of all 16 equally likely outcomes:

Die 1   Die 2   X1/3 + 2X2/3   3X1/5 + 2X2/5
1       1       1.000          1.0
2       1       1.333          1.6
3       1       1.667          2.2
4       1       2.000          2.8
1       2       1.667          1.4
2       2       2.000          2.0
3       2       2.333          2.6
4       2       2.667          3.2
1       3       2.333          1.8
2       3       2.667          2.4
3       3       3.000          3.0
4       3       3.333          3.6
1       4       3.000          2.2
2       4       3.333          2.8
3       4       3.667          3.4
4       4       4.000          4.0

All of the possible values of the first estimator vary between 1 and 4, as do those of the second. However, the values of the second estimator seem to vary less (notice that more of its values are close to the true mean of 2.5). Indeed, the variance of the first estimator is .694, while the variance of the second is .65. The first estimator is therefore less efficient than the second.

3. A company services copiers. A review of its records shows that the time taken for a service call is normally distributed with a mean of 72.5 minutes and a standard deviation of 20 minutes.

a. What proportion of service calls take less than one hour?

Z = (60 - 72.5)/20 = -.625. P(Z < -.625) = .266, or 26.6% of the time. (from http://www.statpages.org/pdfs.html)

b. What proportion of service calls take more than 90 minutes?

Z = (90 - 72.5)/20 = .875. P(Z > .875) = .1908.

c. If a service call has taken an hour, what is the probability that it will take more than 90 minutes?

Z = (90 - 72.5)/20 = .875. Every call that runs past 90 minutes has also run past an hour. Therefore P(Z > .875 | Z > -.625) = P(Z > .875)/P(Z > -.625) = .1908/(1 - .266) = .260.

d. In a random sample of four calls, what is the probability that the average length of calls is less than one hour?

Z = (60 - 72.5)/(20/√4) = -1.25. P(Z < -1.25) = .10565.

4. Following are the miles-per-gallon figures for a sample of cars of the same model tested under identical conditions:

25 30 28 29 24 25 27 28 26 24 27 31

First, the relevant sample statistics are: $\bar{X}$ = 27, $s^2$ = 5.2727, s = 2.296.

The hypothesis test being conducted is:

H0: μ = 28
HA: μ ≠ 28

For this problem, I am testing the first statement: “The consumers' group that conducted this test is skeptical of the manufacturer's claim that its cars average 28 mpg.” I read this as doubt that the average equals 28 at all, which calls for a two-tailed test. A one-tailed test would fit doubt that the average is specifically greater than, or specifically less than, 28. My test statistic is:

t = (27 - 28)/(2.296/√12) = -1.509

The critical value of the t-statistic at the 99% level in a two-tailed test with 11 degrees of freedom is 3.106. In this case, the absolute value of my test statistic (1.509) is less than the critical value, so I fail to reject my null hypothesis. Observing a sample average of 27 is likely enough when the true mean is 28 that I will not conclude the auto company is being dishonest.

b. Find the 95% confidence interval for the mean gas mileage of all cars of this model.

The critical t-value with 11 degrees of freedom at the 95% level is 2.201. My confidence interval for the mean gas mileage of all cars of this model is:

27 ± 2.201(2.296/√12) = {25.54, 28.46}

c. Interpret the confidence interval you found in (b). Can you conclude that 95% of all cars of this model will have mpg values within this interval? Explain.

This is a confidence interval for the unobservable true average gas mileage of all cars of this model (what we typically call μ). It is not an interval for the individual vehicles themselves. In other words, I am 95% confident that the average gas mileage of this model is between 25.54 and 28.46 mpg. This tells me nothing about any single car, which may have a gas mileage outside of this range. To find a range for any single car, I simply use the estimated mean and standard deviation.
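I would be roughly 95% confident that any given car has gas mileage between 27 ± 2.201(2.296) = {21.95, 32.05}. (This is approximate because it ignores the extra uncertainty from estimating the mean itself.)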
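A quick way to double-check Problem 1 uses two standard facts. Each linear estimator there is a weighted sum $\sum w_i X_i$ of independent draws, so its expected value is $(\sum w_i)\mu$ and its variance is $(\sum w_i^2)\sigma^2$. The short Python sketch below applies those formulas with exact fractions. The labels a, b, d follow Problem 1's lettering, and the function names are my own. The median (c) is left out because it is not a weighted sum. This is a different route from the squared-deviation algebra used in 1(b), offered only as a cross-check.

```python
from fractions import Fraction as F

# Weights each linear estimator puts on (X1, X2, X3); labels follow Problem 1.
# The median, estimator (c), is not a weighted sum, so it is not included.
WEIGHTS = {
    "a": (F(5, 12), F(5, 12), F(0)),
    "b": (F(1, 10), F(1, 5), F(7, 10)),
    "d": (F(4, 10), F(3, 10), F(3, 10)),
}


def mean_factor(weights):
    """E[sum w_i X_i] = (sum w_i) * mu; the estimator is unbiased exactly when this is 1."""
    return sum(weights)


def variance_factor(weights):
    """Var[sum w_i X_i] = (sum w_i^2) * sigma^2 for independent draws."""
    return sum(w * w for w in weights)


for name, w in WEIGHTS.items():
    print(f"({name}) mean = {mean_factor(w)} mu, variance = {variance_factor(w)} sigma^2")

ratio = variance_factor(WEIGHTS["d"]) / variance_factor(WEIGHTS["b"])
print("Var(d) / Var(b) =", ratio, "~", round(float(ratio), 2))
```

The exact ratio 17/27 is about 0.63. This is in line with the "about 60%" figure from the Monte Carlo experiment.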
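The Monte Carlo experiment described in Problem 1(b) can be sketched in a few lines of standard-library Python. This is an illustration, not the program behind the original histograms. The function name `simulate`, the seed, and the default standard normal ($\mu = 0$, $\sigma = 1$) are assumptions chosen to mirror the description. The sketch also tracks estimators (a) and (c).

```python
import random
import statistics


def simulate(reps=10_000, mu=0.0, sigma=1.0, seed=375):
    """Repeat 'draw three N(mu, sigma^2) values, compute each estimator' reps times."""
    rng = random.Random(seed)  # fixed seed so reruns give identical results
    results = {"a": [], "b": [], "c": [], "d": []}
    for _ in range(reps):
        x1, x2, x3 = (rng.gauss(mu, sigma) for _ in range(3))
        results["a"].append((5 * x1 + 5 * x2) / 12)
        results["b"].append(x1 / 10 + x2 / 5 + 7 * x3 / 10)
        results["c"].append(statistics.median([x1, x2, x3]))
        results["d"].append(4 * x1 / 10 + 3 * x2 / 10 + 3 * x3 / 10)
    return results


draws = simulate()
for name, values in draws.items():
    print(f"({name}) average = {statistics.fmean(values):.3f}, "
          f"variance = {statistics.variance(values):.3f}")

ratio = statistics.variance(draws["d"]) / statistics.variance(draws["b"])
print(f"Var(d) / Var(b) = {ratio:.2f}")
```

With $\mu = 0$, the bias of (a) is invisible, since $\frac{5}{6} \cdot 0 = 0$. Calling `simulate(mu=6.0)` instead should put the average of (a) near 5, while (b), (c), and (d) stay near 6. Shifting $\mu$ does not change any of the variances.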
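The spreadsheet in Problem 2 lists all 16 equally likely rolls of two four-sided dice. A small Python sketch can do the same enumeration with exact fractions. The function names here are illustrative. The variances are population variances over the 16 outcomes, which is what the .694 and .65 figures correspond to.

```python
from fractions import Fraction as F
from itertools import product

FACES = range(1, 5)  # fair four-sided die: 1, 2, 3, 4 each with probability 1/4


def population_moments(estimator):
    """Exact mean and variance of estimator(X1, X2) over all 16 equally likely rolls."""
    values = [estimator(x1, x2) for x1, x2 in product(FACES, repeat=2)]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


def y_average(x1, x2):  # part (a): Y = (X1 + X2)/2
    return F(x1 + x2, 2)


def first(x1, x2):  # X1/3 + 2X2/3
    return F(x1 + 2 * x2, 3)


def second(x1, x2):  # 3X1/5 + 2X2/5
    return F(3 * x1 + 2 * x2, 5)


for name, est in [("Y", y_average), ("first", first), ("second", second)]:
    mean, var = population_moments(est)
    print(f"{name}: mean = {mean}, variance = {var} ~ {float(var):.3f}")
```

Both estimators come out with mean 5/2. Their variances are 25/36 (about .694) and 13/20 (.65), which confirms that the second estimator is the more efficient one.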
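The normal-table lookups in Problem 3 can be reproduced with Python's standard-library `statistics.NormalDist` (Python 3.8+) instead of an online table. Here is a minimal sketch; the variable names are my own.

```python
from statistics import NormalDist

MEAN, SD = 72.5, 20.0
call_time = NormalDist(MEAN, SD)  # service-call length in minutes

p_under_60 = call_time.cdf(60)       # (a) P(X < 60)
p_over_90 = 1 - call_time.cdf(90)    # (b) P(X > 90)

# (c) Any call longer than 90 minutes has also passed 60 minutes,
# so P(X > 90 | X > 60) = P(X > 90) / P(X > 60).
p_over_90_given_over_60 = p_over_90 / (1 - p_under_60)

# (d) The mean of n = 4 independent calls has standard deviation SD / sqrt(4).
mean_of_four = NormalDist(MEAN, SD / 4 ** 0.5)
p_mean_under_60 = mean_of_four.cdf(60)

for label, p in [("a", p_under_60), ("b", p_over_90),
                 ("c", p_over_90_given_over_60), ("d", p_mean_under_60)]:
    print(f"({label}) {p:.4f}")
```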
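The arithmetic in Problem 4 can also be checked with the standard library. Python's `statistics` module has no t-distribution, so this sketch hard-codes the two critical values quoted in the text (3.106 and 2.201, for 11 degrees of freedom) rather than looking them up. Everything else is computed from the 12 observations.

```python
import math
import statistics

mpg = [25, 30, 28, 29, 24, 25, 27, 28, 26, 24, 27, 31]
n = len(mpg)
xbar = statistics.fmean(mpg)
s = statistics.stdev(mpg)  # sample SD, with n - 1 in the denominator
se = s / math.sqrt(n)      # standard error of the mean

# Critical values from a t table with n - 1 = 11 degrees of freedom (taken from the text).
T_CRIT_99_TWO_TAILED = 3.106
T_CRIT_95_TWO_TAILED = 2.201

# (a) two-tailed test of H0: mu = 28
t_stat = (xbar - 28) / se
reject_h0 = abs(t_stat) > T_CRIT_99_TWO_TAILED

# (b) 95% confidence interval for the mean
ci = (xbar - T_CRIT_95_TWO_TAILED * se, xbar + T_CRIT_95_TWO_TAILED * se)

# (c) the rough single-car range used in the text: xbar +/- t * s
single_car = (xbar - T_CRIT_95_TWO_TAILED * s, xbar + T_CRIT_95_TWO_TAILED * s)

print(f"xbar = {xbar}, s^2 = {s ** 2:.4f}, s = {s:.3f}")
print(f"t = {t_stat:.3f}, reject H0 at 1%: {reject_h0}")
print(f"95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"single-car range: ({single_car[0]:.2f}, {single_car[1]:.2f})")
```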
