Maxwell School of Citizenship and Public Affairs
Department of Public Administration and International Affairs
Syracuse University
Fall, 2015

FUNDAMENTALS OF MATHEMATICAL STATISTICS

I. Introduction

In this Lecture Note, we go over some fundamental concepts of mathematical statistics that are especially useful for the Econometrics course you will take next semester. We have already studied most of the material covered in this Note; here we revisit it with more rigor. The purpose of this Note is to understand the "exact" definitions of some of the terminology often used in Econometrics. Lecture Note 8 does not correspond to any of the chapters in our textbook; it draws heavily from Introductory Econometrics: A Modern Approach, by Jeffrey Wooldridge.

II. Finite Sample Properties of Estimators

The first thing we learn in this Note is the finite sample properties of estimators. The term "finite" sample comes from the fact that these properties hold for a sample of any size; to put it differently, they hold no matter how small or large the sample is. These are sometimes called small sample properties, even though they also hold for large samples. The bottom line to keep in mind is that finite sample properties hold for any sample size n.

A. Estimators and Estimates

To study the properties of estimators, we first define what we mean by an estimator. Suppose we have an SRS of size n, {X1, X2, ..., Xn}, drawn from a population distribution that depends on an unknown parameter θ. Note that each Xi is a random variable. An estimator of θ is a "rule" that assigns each possible outcome of the sample a value of θ. The rule is the same regardless of the data actually obtained. In this course, we frequently used one estimator, namely the sample mean X̄. Let {X1, X2, ..., Xn} be an SRS
from a population with mean µ. One estimator of µ we learned is

  X̄ = (1/n) ∑_{i=1}^{n} X_i.

X̄ is called the sample mean, and it is an estimator of the population mean µ. We learned that X̄ can be viewed as a random variable, as it is a combination of the random variables X1, X2, ..., Xn. Of course, given any outcome of the random variables X1, X2, ..., Xn, we use the same rule mentioned above to estimate µ. An estimate refers to the actual value we obtain from applying the estimator to the "actual data," x1, x2, ..., xn. We distinguish the random variables, X1, X2, ..., Xn, from the actual data, x1, x2, ..., xn. More generally, an estimator W of a parameter θ can be expressed as an abstract mathematical formula:

  W = h(X1, X2, ..., Xn),

for some known function h of the random variables X1, X2, ..., Xn. W, of course, is a random variable, and its value changes depending on the sample you obtain. When a particular set of data, x1, x2, ..., xn, is plugged into the function h, we obtain an estimate of θ, denoted w = h(x1, x2, ..., xn). We use an upper-case letter to denote an estimator (e.g., X̄) and a lower-case letter for an estimate (e.g., x̄). Note further that an estimator W is sometimes called a point estimator and w a point estimate, to distinguish these from interval estimators and estimates. To evaluate estimation procedures, we study various properties of the probability distribution of the random variable W. The distribution of an estimator is called a sampling distribution, because this distribution describes the likelihood of various outcomes of W across different random samples. There is an unlimited number of rules for combining data to estimate parameters.
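The distinction between an estimator (a rule applied to random variables) and an estimate (the number obtained by applying that rule to actual data) can be sketched in code. This is a minimal illustration; the names `sample_mean` and `data` are my own, not from the Note:

```python
# An estimator is a rule h; an estimate is h applied to actual data.

def sample_mean(xs):
    """The estimator X-bar as a rule: h(x1, ..., xn) = (1/n) * sum(xi)."""
    return sum(xs) / len(xs)

# One particular realized sample x1, ..., xn:
data = [2.0, 4.0, 6.0, 8.0]

# Applying the rule to the actual data yields an estimate of mu:
estimate = sample_mean(data)  # here, x-bar = 5.0
```

The function `sample_mean` plays the role of the upper-case X̄ (a rule, hence a random variable across samples), while `estimate` plays the role of the lower-case x̄ (one realized number).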
As you can imagine, we need some desirable criteria for choosing among many estimators, or at least for eliminating some estimators from consideration. The criteria we use are based on the characteristics of the sampling distribution of an estimator, and this is the focus of mathematical statistics. In this section, we study the following characteristics of an estimator: unbiasedness, efficiency, and the mean squared error.

B. Unbiasedness

In statistics, we focus on a few features of the distribution of W in evaluating it as an estimator of θ. The first important property of an estimator makes use of its expected value: unbiasedness.

Definition 1. An estimator W of θ is an unbiased estimator if E(W) = θ.

If an estimator is unbiased, then its sampling distribution has an expected value equal to the parameter it is supposed to be estimating. Unbiasedness does not imply that the "estimate" we get with any particular sample is equal to θ, or even very close to θ. Rather, if we could indefinitely draw many random samples on X from the population, compute an estimate each time, and then average these estimates over all random samples, we would obtain θ. This thought experiment is abstract because, in most applications, we have just "one" random sample to work with. Given that we have defined unbiasedness, we can define the bias of an estimator:

Definition 2. If W is a biased estimator of θ, its bias is defined as Bias(W) ≡ E(W) − θ.

In Definition 2, "≡" denotes equivalency. If the bias of an estimator is zero, i.e.,

  E(W) − θ = 0 ⟹ E(W) = θ,

then the estimator W is unbiased. Figure 1 shows three estimators.
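The thought experiment behind Definition 1 (draw many samples, compute an estimate each time, and average) can be mimicked by simulation. This is a minimal Monte Carlo sketch; the population parameters, sample size, and seed are illustrative choices, not from the Note:

```python
# Unbiasedness thought experiment: average many estimates of mu across
# many random samples; the average should be close to mu itself.
import random

random.seed(0)
mu, sigma, n, reps = 10.0, 2.0, 30, 20000

estimates = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    estimates.append(sum(sample) / n)   # one estimate x-bar per sample

avg_estimate = sum(estimates) / reps    # should be close to mu = 10
```

Any single entry of `estimates` may be far from µ; unbiasedness is a statement about their average across repeated samples.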
Figure 1: An Illustration of Unbiased and Biased Estimators

As you can see from the figure, W1 is an unbiased estimator of θ, and W2 is a biased estimator of θ, but one with a smaller bias than the biased estimator W3. As the two definitions above show, the unbiasedness of an estimator W and the size of its bias depend on the distribution of X and on the function h, because W = h(X1, X2, ..., Xn). The distribution of X is usually beyond our control; it may be determined by nature or social forces. But the choice of the rule h is ours, and if we want an unbiased estimator, then we must choose h accordingly. Some estimators can be shown to be unbiased quite generally. We learned that X̄, the estimator of the population mean µ, is unbiased because E(X̄) = µ (we proved this in Lecture Note 5). We also learned that the sample variance

  S² = (1/(n − 1)) ∑_{i=1}^{n} (X_i − X̄)²

is an unbiased estimator of the population variance σ² (we did not prove this). That is why we replaced σ² with S² when conducting a hypothesis test. Note that for the unbiasedness of the estimator X̄, we did not require any conditions other than random sampling. That is, X̄ is an unbiased estimator of µ as long as we have a random sample. For other estimators, such as the β̂ you will learn about in the Econometrics course, conditions beyond random sampling are required for unbiasedness.

Unbiasedness as a criterion for choosing an estimator has two weaknesses. First, some reasonable, and even some very good, estimators are not unbiased; we will see an example of this weakness later. Second, some estimators are poor even though they are unbiased. Consider estimating the mean µ of a population.
Rather than using the sample mean X̄ to estimate µ, suppose that, after collecting a sample of size n, we discard all of the observations except the first. That is, our estimator of µ is simply

  W = X_1.

This estimator is unbiased because E(X1) = µ. As you can imagine, ignoring all but the first observation is not a prudent approach to estimation; it throws out most of the information in the sample. For example, with n = 100, we obtain 100 outcomes of the random variables X1, X2, ..., X100, but then we use only the first of these, X1, to estimate µ. This leads us to another criterion for choosing an estimator: efficiency.

C. Efficiency

Unbiasedness only ensures that the sampling distribution of an estimator has a mean value equal to the population parameter it is supposed to be estimating. This is quite useful, but we also need to know how spread out the distribution of an estimator is. An estimator can be very close to the population parameter θ on average, but it can also be very far away with large probability. In Figure 2, two unbiased estimators are shown.

Figure 2: Two Unbiased Estimators with Different Spread

As you can see from the figure, W1 and W2 are both unbiased estimators of θ, but the sampling distribution of W1 is more tightly centered about θ. This implies that the probability that W1 is greater than any given distance from θ is less than the probability that W2 is greater than that same distance from θ. To put it differently, using W1 as our estimator makes it more likely that we will obtain a random sample yielding an estimate very close to θ.
Therefore, if we were to choose between these two estimators, we should definitely choose W1 to estimate θ. How do we know whether W1 is less spread out than W2? We rely on the variance of an estimator. The variance of an estimator is also called the sampling variance, because it is the variance associated with a sampling distribution. We have already learned one sampling variance, that of the estimator X̄:

  Var(X̄) = σ²/n.

As suggested by Figure 2, "among unbiased estimators," we prefer the estimator with the smallest variance. Knowing the sampling variance of estimators allows us to eliminate certain unbiased estimators from consideration. For a random sample from a population with mean µ and variance σ², we know that X̄ is an unbiased estimator of µ. We also learned that X1 is an unbiased estimator of µ. Which one should we choose as our estimator? Let's consider the variance of these two unbiased estimators. We know that Var(X̄) = σ²/n, and we also learned that Var(X1) = σ². The difference between Var(X̄) and Var(X1) can be large even for small sample sizes. If n = 5, then Var(X1) is five times as large as Var(X̄). This gives us a formal way of excluding X1 as an estimator of µ. Comparing the variances of estimators is a general approach to comparing different unbiased estimators.

Definition 3. If W1 and W2 are two "unbiased" estimators of θ, W1 is efficient relative to W2 when Var(W1) ≤ Var(W2).

In the example above, we say that the estimator X̄ is efficient relative to X1 when n > 1. Note that the criterion of efficiency can only be used for "unbiased" estimators. If we do not restrict our attention to unbiased estimators, then comparing variances is meaningless. One way to compare estimators that are not necessarily unbiased is to compute the mean squared error (MSE) of the estimators.
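The comparison of X̄ and X1 can be checked by simulation: both should be centered at µ, but with n = 5 the variance of X1 should be about five times that of X̄. This is a sketch under illustrative parameter choices (µ = 0, σ = 1, seed fixed for reproducibility):

```python
# Compare the two unbiased estimators X1 and X-bar across many samples.
import random

random.seed(1)
mu, sigma, n, reps = 0.0, 1.0, 5, 50000

xbar_draws, x1_draws = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar_draws.append(sum(sample) / n)   # the estimator X-bar
    x1_draws.append(sample[0])           # the estimator W = X1

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Both means are near mu, but the variance ratio is near n = 5:
ratio = var(x1_draws) / var(xbar_draws)
```

The simulation gives a concrete sense of Definition 3: both estimators are unbiased, yet X̄ is efficient relative to X1.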
Definition 4. If W is an estimator of θ, then the MSE of W is defined as

  MSE(W) = E[(W − θ)²].

The MSE measures how far, on average, the estimator is from θ. Observe that

  E[(W − θ)²] = E(W² − 2θW + θ²)
             = E(W²) − 2θE(W) + θ²
             = E(W²) − [E(W)]² + [E(W)]² − 2θE(W) + θ²
             = E(W²) − [E(W)]² + [E(W) − θ]²
             = Var(W) + Bias(W)².

Hence, MSE(W) depends on the variance and the bias. This criterion allows us to compare two estimators when one or both are biased.

III. Large Sample Properties of Estimators

In the previous section, we learned that the estimator X1 is unbiased but is a poor estimator because its variance can be much larger than that of the sample mean X̄. Note that X1 has the same variance for any sample size. It seems reasonable to require any estimation procedure to improve as the sample size increases. For estimating a population mean µ, X̄ improves in the sense that its variance gets smaller as n gets larger; X1 does not improve in this sense. We can rule out certain silly estimators by studying the large sample, or asymptotic, properties of estimators. One of the large sample properties of estimators we learn in this Note is consistency. This property concerns how far the estimator is likely to be from the parameter it is supposed to be estimating as we let the sample size increase indefinitely.

Definition 5. Let Wn be an estimator of θ based on a sample X1, X2, ..., Xn of size n. Then Wn is a consistent estimator of θ if Wn → θ as n → ∞. If Wn is not consistent for θ, then we say it is inconsistent.
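Definition 5 can be illustrated by simulation: for a consistent estimator such as X̄, the fraction of samples whose estimate lands farther than some fixed distance ε from θ should shrink as n grows. This is a minimal sketch; the choice of ε, the sample sizes, and the seed are illustrative:

```python
# Consistency of X-bar: the probability of missing mu by more than eps
# shrinks as the sample size n increases.
import random

random.seed(3)
mu, sigma, eps, reps = 0.0, 1.0, 0.2, 2000

def miss_rate(n):
    """Fraction of samples whose X-bar lands farther than eps from mu."""
    misses = 0
    for _ in range(reps):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            misses += 1
    return misses / reps

rates = [miss_rate(n) for n in (10, 100, 1000)]  # should be decreasing
```

The decreasing miss rates are the numerical counterpart of the distributions in Figure 3 piling up around θ as n grows.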
Unlike unbiasedness, which is a feature of an estimator for a given sample size, consistency involves the behavior of the sampling distribution of the estimator as the sample size n gets large. That is why this property is called a large sample property. To emphasize this, we have indexed the estimator Wn by the sample size n in stating the definition. The intuition behind consistency is clear: the distribution of Wn becomes more and more concentrated about θ, which roughly means that for larger sample sizes, Wn is less and less likely to be very far from θ. This is illustrated in Figure 3.

Figure 3: The Sampling Distributions of a Consistent Estimator for Four Sample Sizes

Note that if an estimator is not consistent, then it does not help us learn about θ, even with an unlimited amount of data. For this reason, consistency is a minimal requirement of an estimator used in statistics or econometrics. Also, unbiased estimators are not necessarily consistent, but those whose variances shrink to zero as the sample size grows are consistent. This can be stated formally: if Wn is an unbiased estimator of θ and Var(Wn) → 0 as n → ∞, then Wn is consistent. A good example of a consistent estimator is the sample mean X̄. We have already shown that X̄ is unbiased because E(X̄) = µ. Also, Var(X̄) = σ²/n, which in turn implies that Var(X̄) → 0 as n → ∞. Therefore, X̄ is a consistent estimator of the population mean µ, and that is why we use the estimator X̄ to estimate µ.
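The sufficient condition just stated (unbiased plus vanishing variance) can also be checked numerically for X̄: the empirical sampling variance should track σ²/n as n grows. This sketch uses illustrative parameters (σ = 3, so σ² = 9, and sample sizes 1, 9, 81):

```python
# Check Var(X-bar) = sigma^2 / n empirically for several sample sizes n,
# the condition that (together with unbiasedness) makes X-bar consistent.
import random

random.seed(5)
mu, sigma, reps = 0.0, 3.0, 20000

def empirical_var_xbar(n):
    """Sampling variance of X-bar estimated from many simulated samples."""
    draws = []
    for _ in range(reps):
        draws.append(sum(random.gauss(mu, sigma) for _ in range(n)) / n)
    m = sum(draws) / reps
    return sum((d - m) ** 2 for d in draws) / reps

# Each entry should be close to sigma^2 / n = 9 / n:
vars_by_n = {n: empirical_var_xbar(n) for n in (1, 9, 81)}
```

Note that n = 1 reproduces Var(X1) = σ², tying the large sample discussion back to the efficiency comparison of Section II.C.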
Earlier, we learned that unbiasedness as a criterion for choosing an estimator has two weaknesses, one of which is that some reasonable, even some very good, estimators are not unbiased. One example of such a reasonable and good estimator is the sample standard deviation Sn. We learned that the sample variance Sn² is an "unbiased" estimator of the population variance σ². Note, however, that the sample standard deviation Sn is not unbiased. Nevertheless, Sn is a consistent estimator of the population standard deviation, and that is why we use the sample standard deviation Sn to estimate the population standard deviation σ. As the sample standard deviation case shows, if we used the "unbiasedness" criterion alone to choose an estimator, we would end up not using Sn to estimate σ, which is not desirable, because Sn has a very good property: consistency.
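The claim that Sn is biased for σ but consistent can be seen by simulation: the average of Sn across many samples falls short of σ for small n, and the shortfall shrinks as n grows. This is a sketch under illustrative choices (normal population with σ = 2, fixed seed):

```python
# S_n is biased for sigma at small n, but the bias shrinks as n grows.
import math
import random

random.seed(4)
sigma, reps = 2.0, 20000

def mean_sn(n):
    """Average of the sample standard deviation S_n over many samples."""
    total = 0.0
    for _ in range(reps):
        xs = [random.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # unbiased S_n^2
        total += math.sqrt(s2)                            # S_n itself
    return total / reps

bias_small = abs(mean_sn(5) - sigma)    # noticeably below sigma
bias_large = abs(mean_sn(50) - sigma)   # much closer to sigma
```

Note that Sn² averages out to σ² (unbiasedness of the sample variance), but taking the square root introduces the downward bias; consistency is what rescues Sn as an estimator of σ.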