Economics 345: Review of Useful Math and Statistics, Part II

Populations and Samples
! Up to this point, we’ve been reviewing probability theory.
! We step into the realm of statistics when we try to learn something about population distributions using information garnered from samples.
  - A population is a well-defined group of subjects (firms, consumers, workers, etc.).
  - A sample is some subset of that population, from which we can try to infer things about the overall population.
  - Example: suppose we want to learn something about voter preferences in BC. We can administer a survey to a sample of, say, 900 potential voters, and draw inferences about how the entire population of potential voters is likely to vote based on this sample.

Statistical Inference
! When we practice statistical inference, we are trying to infer (learn or guess, in an educated fashion) something about the population distribution using a sample from that distribution.
  - Estimation: using data to come up with an estimated value (or a likely range of values) of something like a population mean, or a model parameter (say β1 in our model of education and earnings from the first lecture). When we estimate a single value of a parameter, it’s called a point estimate. When we estimate a range of possible values of a parameter, it’s called an interval estimate.
  - Hypothesis testing: using data to assess whether some hypothesis about a population parameter value should be rejected or not. Suppose we hypothesize that education has no effect on earnings: H0: β1 = 0. If our point estimate of β1 is big enough relative to the standard error of that estimate, we can reject that hypothesis. This provides evidence in support of the claim that education boosts earnings.

Sampling
! Let Y be a RV representing a population with pdf f(y; θ); suppose we know everything about this pdf except the value of θ.
  - To know the true population distribution, we would need to know the true value of θ. We’ll never know the true value of θ.
  - But hopefully, using a sample of data, we can get close.
! Random sampling: if Y1, Y2, …, Yn are independent random variables, each with the same pdf f(y; θ), then {Y1, Y2, …, Yn} is said to be a random sample from f(y; θ). We say these are iid random variables from f(y; θ).
  - Note that Y1, Y2, …, Yn are considered RVs because, before the sample is taken, they can take on a variety of values.

Sampling
! Once the sampling has occurred, we have a set of data {y1, y2, …, yn}. Note that these are not RVs.
! Sampling example: suppose Y1, Y2, …, Yn are independent RVs and each is distributed Bernoulli(θ). Then {Y1, Y2, …, Yn} is said to be a random sample from a population that is distributed Bernoulli(θ).

Finite Sample Properties of Estimators
! Suppose we’re trying to infer something about the (unknown) population value of a parameter, θ.
  - An estimator is a mathematical expression that serves as a rule for converting samples of data into an educated guess (or estimate) of the value of θ.
  - An estimator is the same for each possible sample that is drawn. But when the estimator is applied to each possible sample of data, it will generally generate different estimates.

Estimators
! An example of an estimator: suppose {Y1, Y2, …, Yn} is a random sample from a population with an unknown mean, µ. A sensible estimator of µ is given by
    Ybar = (1/n) Σ(i=1,…,n) Yi.
  We call this the sample mean.
  - Note also that it is a random variable (the estimate it generates will be different for each random sample we draw). Since a random sample is a set of random variables, any function of the random sample, such as an estimator, is itself a random variable.
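To make the point that an estimator is itself a random variable concrete, here is a small simulation sketch. The population N(5, 4) and the sample size n = 10 are invented for illustration: each sample drawn yields a different realization of the sample mean, and the realizations scatter around µ.

```python
import random
import statistics

random.seed(1)
mu, sigma, n = 5.0, 2.0, 10   # invented population parameters

# Draw many random samples; each one produces a different estimate ybar.
sample_means = [
    statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(20000)
]

# The estimates vary from sample to sample (Ybar is a RV)...
print(round(statistics.stdev(sample_means), 2))
# ...but their average across samples sits near mu.
print(round(statistics.mean(sample_means), 2))
```

The spread of the simulated sample means is itself informative: it approximates σ/√n, which is exactly the sampling variance idea developed below.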
  - We can express the actual estimate that’s obtained for a particular sample {y1, y2, …, yn} as w = h(y1, y2, …, yn).

Estimators
! To understand the properties of an estimator, W, we will want to study its sampling distribution.
  - This describes the distribution of the RV W over different samples. Remember, RVs have distributions; W will take on a different value, w, for each sample we draw.
  - The properties of this distribution will help us evaluate the appropriateness of W as an estimator and let us compare it with other possible estimators.
  - Note that, in principle, an infinite number of possible estimators exist for each parameter we might want to estimate. We need to be able to judge which possible estimator is best.

Things to Look for in an Estimator: Unbiasedness
! Unbiasedness: an estimator, W, of θ is said to be unbiased if E(W) = θ.
  - Of course, any given realization (w) of W will probably not equal θ, but it’s reassuring to have an estimator whose mean value equals θ. This means that, on average, the estimator will correctly predict θ.
! Unfortunately, some estimators will be biased. And sometimes we will prefer a biased estimator to an unbiased estimator, for reasons discussed below.

The Bias of an Estimator
! For a biased estimator, W, of θ, we define W’s bias to be Bias(W) = E(W) − θ.
  - Example: suppose θ = 2. If E(W) = 4, we would say the bias of W is 2. If E(W) = 0, we would say the bias of W is −2. If E(W) = 2, we would say that W is an unbiased estimator of θ.
! If we want an unbiased estimator, we need to pick h to ensure there is no bias.
! Example of an unbiased estimator: the sample mean as an estimator of the population mean, µ:
    E(Ybar) = E[(1/n) Σ Yi] = (1/n) E[Σ Yi] = (1/n) Σ E(Yi) = (1/n) Σ µ = (1/n)(nµ) = µ.

Estimators
! Consider a population with mean µ and variance σ².
  - We can define an estimator called the sample variance:
    S² = [1/(n−1)] Σ(i=1,…,n) (Yi − Ybar)².
  - This is also an unbiased estimator; that is, E(S²) = σ².
! Holding other things constant, unbiasedness is a desirable property in an estimator.
  - However, some good estimators are biased, and some unbiased estimators are bad. So don’t be too quick to judge an estimator entirely on the size of its bias.

Sampling Variance of an Estimator
! Why might an unbiased estimator be undesirable? It could have very high sampling variance. Another way of saying this is that the estimator may be very imprecise.
  - Among unbiased estimators, those with lower sampling variance are preferred.
! Example: see board.
! The variance of an estimator is called its sampling variance. Consider the variance of the sample mean:
    Var(Ybar) = Var[(1/n) Σ Yi] = (1/n)² Var(Σ Yi) = (1/n)² Σ Var(Yi) = (1/n)² Σ σ² = (1/n)²(nσ²) = σ²/n.
  - Notice that the sampling variance is a function of the sample size. For larger samples (higher n), the sampling variance is smaller. As n goes to infinity, the sampling variance goes to zero.

Sampling Variance of an Estimator
! Here’s an example of an unbiased estimator that is undesirable on the basis of its sampling distribution.
  - Suppose we want an estimator of the population mean µ, given a sample {Y1, Y2, …, Yn}. Recall that there are an infinite number of possible estimators. One of these is Y2, the second observation used by itself.
  - Y2 is an unbiased estimator of µ, because E(Y2) = µ. But the sampling variance of Y2 (which is σ²) is large relative to the sampling variance of the sample mean (σ²/n).
  - We say that the sample mean is a more efficient estimator than Y2. Another way to express this is to say that the sample mean is a more precise estimator.

Things to Look for in an Estimator: Efficiency
! If W1 and W2 are two unbiased estimators of θ, we say W1 is more efficient than W2 if Var(W1) < Var(W2).
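A hedged simulation sketch of this efficiency comparison (the N(0, 1) population and n = 25 are invented for illustration): both Ybar and the single observation Y2 average out near µ, but Y2 is far noisier.

```python
import random
import statistics

random.seed(2)
mu, sigma, n = 0.0, 1.0, 25   # invented population parameters

ybar_draws, y2_draws = [], []
for _ in range(20000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    ybar_draws.append(statistics.mean(sample))   # the sample mean
    y2_draws.append(sample[1])                   # the second observation alone

# Both are unbiased, but Var(Ybar) is near sigma^2/n = 0.04,
# while Var(Y2) is near sigma^2 = 1.
print(round(statistics.pvariance(ybar_draws), 2),
      round(statistics.pvariance(y2_draws), 1))
```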
  - Note that this use of “efficient” has nothing to do with Pareto efficiency.
  - The term efficiency is only used to compare unbiased estimators, because efficiency gives us a way to rank estimators within the class of unbiased estimators.
  - If we want to include biased estimators in our comparison, efficiency doesn’t make sense as a way to rank them. One estimator might have very low variance but be very biased, while another might be unbiased but have very high variance. We can’t strictly say that the low-variance estimator is better.
! Mean squared error (MSE) is an alternative way to rank estimators that allows biased estimators to be compared:
    MSE(W) = E[(W − θ)²].
  - This is a measure of how far W tends to be from θ. Lower-MSE estimators are preferred.

Note that a potential tradeoff exists between unbiasedness and small sampling variance
! Suppose we have two estimators of θ. One estimator is unbiased but has large sampling variance. The other estimator is slightly biased but has low sampling variance.
  - It may be that the biased estimator is preferred. See board.
  - MSE captures this tradeoff between unbiasedness and low sampling variance.

Asymptotic Properties of Estimators
! Thus far, we’ve been discussing finite sample properties of estimators.
! In theory, we can consider cases where the sample size gets very large (and eventually goes to infinity).
  - While we never have an infinite sample size in the real world, we sometimes get very big samples, and we may believe that very big samples are going to have properties similar to infinite samples.
  - Some estimators that don’t perform well for small samples actually do very well for infinite samples, and therefore may be expected to perform well for large samples.
  - When we talk about the large sample properties of an estimator, we often refer to these as the asymptotic properties of the estimator.
! Where unbiasedness is a desirable finite sample property, consistency is a desirable asymptotic property for an estimator to have.
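The MSE tradeoff described above can be sketched by simulation. The “shrunk” estimator 0.9·Ybar below is a made-up example for illustration: it is biased toward zero but has lower variance, and for these invented parameters its MSE beats that of the unbiased sample mean.

```python
import random
import statistics

random.seed(3)
mu, sigma, n = 1.0, 3.0, 10   # invented population parameters

def mse(draws, theta):
    # MSE(W) = E[(W - theta)^2], approximated across simulated samples
    return statistics.mean((w - theta) ** 2 for w in draws)

unbiased, shrunk = [], []
for _ in range(50000):
    ybar = statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    unbiased.append(ybar)        # unbiased; variance sigma^2/n = 0.9
    shrunk.append(0.9 * ybar)    # biased (mean 0.9*mu); variance 0.81*sigma^2/n

# Bias^2 + variance: 0 + 0.9 = 0.9 versus 0.01 + 0.729 = 0.739
print(round(mse(unbiased, mu), 2), round(mse(shrunk, mu), 2))
```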
Consistency
! An estimator is consistent if it converges to θ as n approaches infinity.
! Stated formally: if Wn is an estimator of θ and we have a sample {Y1, Y2, …, Yn} of size n, then Wn is a consistent estimator of θ if, for every (arbitrarily small) number ε > 0,
    P(|Wn − θ| > ε) → 0 as n → ∞.
  - In other words, think of the smallest positive number you can. As the sample size approaches infinity, the probability that the gap between Wn and θ exceeds that number goes to zero.
! Another way to define an estimator, Wn, as consistent is to say that plim(Wn) = θ, or “the probability limit of Wn is θ.”
! Note that consistency is the large sample analogue of unbiasedness in small samples.

Consistency
! If an estimator fails to meet this definition, we say it is inconsistent.
  - We want our estimators to be consistent. We may be willing to put up with an estimator that is biased in small samples, but we want one that will at least accurately predict θ asymptotically.
! Given two consistent estimators, we will prefer the one with lower asymptotic variance.
! Example of a consistent estimator: the sample mean. Let {Y1, Y2, …, Yn} be iid RVs with mean µ. Then plim(Ybar_n) = µ.
  - This is commonly known as the Law of Large Numbers. Put in very loose terms, it says that if you pick a big enough sample, your sample mean is going to get incredibly close to the true population mean.

Asymptotic Normality
! If an estimator is consistent, this tells us that the estimator converges to the true parameter value as the sample size goes to infinity. It doesn’t tell us anything about the shape of the sampling distribution for a given sample size.
  - As it turns out, most estimators that we will deal with have roughly a normal distribution in large samples (in small samples they may deviate substantially from the normal distribution).
! Let {Zn: n = 1, 2, …} be a sequence of RVs such that, for all z, P(Zn ≤ z) → Φ(z) as n → ∞. We say that this sequence of RVs has an asymptotic standard normal distribution.
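The Law of Large Numbers mentioned above can be watched in action with a quick sketch (a Bernoulli population with invented θ = 0.3): as n grows, the sample mean settles down near θ.

```python
import random
import statistics

random.seed(4)
theta = 0.3   # invented population parameter

results = {}
for n in (10, 1000, 100000):
    # A sample of n Bernoulli(theta) draws
    sample = [1 if random.random() < theta else 0 for _ in range(n)]
    results[n] = statistics.mean(sample)   # the sample mean for this n
    print(n, round(results[n], 3))
```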
  - This means that the cdf of Zn gets close to the standard normal cdf as n gets very large. This can be handy because it means we can use the well-known standard normal distribution to approximate the distribution of such RVs.

The Central Limit Theorem
! Sit down for this one.
! Let {Y1, Y2, …, Yn} be a random sample with mean µ and variance σ². Then
    Zn = (Ybar_n − µ) / (σ/√n)
  is distributed asymptotically standard normal.
! This should leave you breathless. It means that the standardized sample mean, regardless of its small sample distribution, has a cdf that approaches that of N(0, 1) as the sample size gets large.
  - Think up some wacky distribution for Y. As n gets large, the distribution of the standardized sample mean of Y gets arbitrarily close to the standard normal distribution.
  - We already knew from before that the standardized sample mean would have a mean of zero and variance of 1. That’s not the big deal. The big deal is that now (for large samples) we know that the standardized sample mean of Y is approximately normally distributed, even if the RV Y is non-normally distributed.
  - Imagine your life without the internet. That’s what life in statistics would be without the CLT.

Interval Estimation
! Recall that estimators are random variables. Suppose you take an estimator and a sample of data and use them to produce an estimate of some model parameter.
  - Chances are the estimate will be wrong, and it will be different for each sample. If the sampling variance of the estimator is high, a given estimate (from a given sample) is likely to be far off from the true population value of the parameter.
  - A point estimate, by itself, contains no information about the sampling variability of the estimator, and therefore no information on the likely precision of the estimate.
  - One way to deal with this problem is to produce point estimates along with confidence intervals. The confidence interval gives us a sense of the range of values in which we expect the true parameter value to lie.
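Circling back to the CLT for a moment, here is a simulation sketch using a decidedly non-normal population, Exponential(1), which has mean 1 and variance 1 (the choice is invented for illustration). For a N(0, 1) variable, the mass below 1.96 is about 97.5%, and the standardized sample mean gets close to that even though Y itself is heavily skewed.

```python
import math
import random
import statistics

random.seed(5)
mu, sigma, n, reps = 1.0, 1.0, 200, 20000   # Exponential(1) population, invented n

below = 0
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    # Standardize the sample mean: Zn = (Ybar - mu) / (sigma / sqrt(n))
    z = (statistics.mean(sample) - mu) / (sigma / math.sqrt(n))
    if z < 1.96:
        below += 1

# Share of standardized means below 1.96; near Phi(1.96), about 0.975
print(round(below / reps, 3))
```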
Example of Confidence Intervals
! Suppose we are trying to estimate the mean of a population that is distributed N(µ, σ²).
  - The sample mean is an estimator of µ that is distributed N(µ, σ²/n). With a given sample, the sample mean will produce a point estimate, ybar.
  - We know that a given point estimate is going to equal the actual value of µ with probability 0. But suppose we decide that, once we have our estimate, we’re going to pad that point estimate by drawing an interval around it. Even for a very small interval, we now have some probability greater than 0 that our interval (once we estimate it) will contain the true value of µ.
  - How could we be sure that our interval will contain µ? Simply choose an interval that is the entire real number line (so the interval runs from negative infinity to positive infinity)! But this is completely useless, because then we’re not narrowing down the likely range of values in which µ lies.
  - Instead, we could pick an interval that contains µ with a high level of likelihood, but not with certainty.

Example of Confidence Intervals
! To simplify our example slightly, let’s now suppose the population is distributed N(µ, 1) and let {Y1, Y2, …, Yn} be a random sample from the population. We know that the sample mean is distributed N(µ, 1/n).
  - Let’s suppose we want to standardize the sample mean estimator and construct an interval around that estimator that will be very likely (say in 95% of the samples we draw) to contain µ. We need to pick c so that
    P(−c < (Ybar − µ)/(1/√n) < c) = 0.95.

Example of Confidence Intervals
! It turns out that 95% of the probability mass of a standard normal distribution lies between −1.96 and +1.96. So a value of c = 1.96 will give us the probability of 0.95 that we’re looking for.
    P(−1.96 < (Ybar − µ)/(1/√n) < 1.96) = 0.95
  With some rearrangement of this expression…
    P(−1.96 < √n(Ybar − µ) < 1.96) = 0.95
    P(−1.96/√n < Ybar − µ < 1.96/√n) = 0.95
    P(−Ybar − 1.96/√n < −µ < −Ybar + 1.96/√n) = 0.95
    P(Ybar + 1.96/√n > µ > Ybar − 1.96/√n) = 0.95

Example of Confidence Intervals
! We finally get
    P(Ybar − 1.96/√n < µ < Ybar + 1.96/√n) = 0.95.
  This says that µ will be expected to fall in the random interval (random, because Ybar is a RV)
    [Ybar − 1.96/√n, Ybar + 1.96/√n]
  in 95% of random samples.
! For a given random sample {y1, y2, …, yn}, we can calculate an interval estimate of µ,
    [ybar − 1.96/√n, ybar + 1.96/√n].
  This is sometimes called the 95% confidence interval around the point estimate.

The Meaning of the Confidence Interval
! If we construct a 95% confidence interval for µ, [ybar − 1.96/√n, ybar + 1.96/√n], what we really mean is that the random interval [Ybar − 1.96/√n, Ybar + 1.96/√n] contains µ with probability 0.95.
! In other words, in repeated sampling, where each sample produces a new confidence interval, we expect that µ will lie in (on average) 95% of the confidence intervals produced.

An Incorrect (but Common) Interpretation of the Confidence Interval
! People commonly claim that a given 95% confidence interval has a 95% chance of containing the true value of µ. This is incorrect.
  - Once the actual interval estimate is made, µ either lies within it or lies outside of it. The interval is no longer a random interval (note the little y’s instead of the big Y’s), and therefore we can’t speak probabilistically about it.
  - Example: the probability that the 10th person on the Econ 345 class roster shows up for class on a given day may be 0.80. However, looking around the room today, that person is either here or they aren’t. Their presence is no longer a random variable, so technically I can’t speak about it in probabilistic terms.
! See Table C.2 in the text for an illustration of this point.
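The repeated-sampling interpretation above can be checked with a simulation sketch (µ = 2 and n = 50 are invented; the population is N(µ, 1) as in the example): across many samples, the realized intervals [ybar ± 1.96/√n] contain µ about 95% of the time, even though any one realized interval either contains µ or doesn’t.

```python
import math
import random
import statistics

random.seed(6)
mu, n, reps = 2.0, 50, 10000    # invented mu and sample size; population N(mu, 1)
half_width = 1.96 / math.sqrt(n)

covered = 0
for _ in range(reps):
    ybar = statistics.mean(random.gauss(mu, 1.0) for _ in range(n))
    if ybar - half_width < mu < ybar + half_width:
        covered += 1          # this sample's interval contains mu

print(round(covered / reps, 3))   # close to 0.95
```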
Note that there’s nothing magical about the 95% confidence level
! We could pick 99%, 90%, 80%, 50%, 27.2%…
  - But 90%, 95%, and 99% tend to be favourites among empirical social science researchers. Think of them as a kind of “industry standard.”

Another Example of Confidence Interval
! The last example was unrealistic in that we assumed we knew the population variance, σ². This is rarely the case.
! Stated generally, a 95% confidence interval for the mean of a normally distributed population is given by
    [ybar − 1.96σ/√n, ybar + 1.96σ/√n].
  - If we don’t know σ, then we must estimate this parameter as well. We can use the sample standard deviation discussed above:
    s = { [1/(n−1)] Σ(i=1,…,n) (yi − ybar)² }^(1/2).

Another Example of Confidence Interval
! Unfortunately, we can’t just plug s in for σ and proceed as before.
  - If we knew σ, we could argue that µ is contained in the random interval [Ybar − 1.96σ/√n, Ybar + 1.96σ/√n] with 95% probability. If we plug in a random variable S in place of σ, then we change the probability that µ is contained in the random interval.
! This is where the t distribution comes in handy.

Another Example of Confidence Interval
! It turns out that
    (Ybar − µ)/(S/√n) ~ t(n−1),
  where S is the sample standard deviation.
! We can pick c such that the random interval around the sample mean contains 95% of the probability mass of the distribution. We get the confidence interval
    [ybar − c·s/√n, ybar + c·s/√n].

Another Example of Confidence Interval
! When we constructed a confidence interval of the sample mean with known variance, c was equal to 1.96. With the t distribution, c will depend on the degrees of freedom of the particular distribution.
  - df = n − 1, so the degrees of freedom are a function of the size of the random sample. The larger the sample size, the larger the df, and so the smaller the c. This means that larger sample sizes will lead to smaller (more precise) confidence intervals.
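Here is a hedged worked sketch of such a t-based interval for n = 20 (df = 19), using 2.093 as the 95% two-sided t critical value; the data values are invented for illustration.

```python
import math
import statistics

# Invented data, n = 20
data = [4.1, 5.3, 3.8, 4.9, 5.6, 4.4, 5.1, 3.9, 4.7, 5.2,
        4.6, 5.0, 4.2, 5.4, 4.8, 4.3, 5.5, 4.0, 5.8, 4.5]
n = len(data)

ybar = statistics.mean(data)
s = statistics.stdev(data)       # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)            # standard error of the sample mean
c = 2.093                        # t critical value, df = 19, 95% two-sided

lower, upper = ybar - c * se, ybar + c * se
print(round(lower, 2), round(upper, 2))
```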
! Example (use Table G.2): if n = 20 (so df = 19) and we want to construct a 95% confidence interval around the sample mean, the interval will be
    [ybar − 2.093·s/√20, ybar + 2.093·s/√20].

Another Example of Confidence Interval
! More generally, a 100(1 − α)% confidence interval is given by
    [ybar − c(α/2)·s/√n, ybar + c(α/2)·s/√n].
! Recall the standard deviation of the sample mean of a normally distributed population:
    sd(Ybar) = σ/√n.
  - Substituting s for σ, we get a point estimate of this standard deviation. The point estimate s/√n is commonly referred to as the standard error of the sample mean:
    se(ybar) = s/√n.

Another Example of Confidence Interval
! We can use this shorthand to express the confidence interval as
    [ybar ± c(α/2)·se(ybar)].
  - Again, note that since a larger sample size increases n and lowers c, larger sample sizes will lead to smaller confidence intervals.
! Asymptotic confidence intervals: recall the CLT. For a big enough sample size, even if the population is not normally distributed, the sample mean will be approximately normally distributed.
  - So for a non-normal population and a large enough sample, we can define a 95% confidence interval around the sample mean:
    [ybar ± 1.96·se(ybar)].
  - We can do this because, as n gets large, the t distribution approaches the standard normal distribution.

iClickers
! Pull out your iClickers now.
  - Make sure you register your iClicker on MyPage on the UVic website. Do this ASAP. Unregistered clicker input can’t be matched to names at the end of the course.
  - You can use a borrowed iClicker, so long as it’s not borrowed from a friend in the course. And you must use the same borrowed iClicker each time. You may not use someone’s iClicker to earn them credit in the course (that’s a violation of the rules on Academic Honesty).
  - Unless otherwise noted, I will give 7.5 points for each question attempted, and 2.5 additional points for each question answered correctly.
! Here’s a test question (wait until I say “go”): Are you currently in attendance at the 345 lecture? A) Yes B–E) No
! I will post results (by clicker ID) on the web so you can check that your iClicker worked.

Hypothesis Testing
! If we want to estimate population parameters, then point and interval estimates are useful.
! But sometimes we just want to answer a question like “Does A affect B?”
  - While careful consideration of interval estimates can help us answer this question, hypothesis testing provides a more explicit way to deal with such questions.
! Example: suppose we want to test whether an election was rigged.
  - Results show that Candidate A received 42% of the popular vote and Candidate B received 58%. Candidate A doesn’t believe these results and wants to test for evidence of voting fraud.

Hypothesis Testing Example
! Candidate A could conduct a poll of voters to estimate how many people actually voted for him. Suppose he finds that 53% of those he surveyed actually voted for him.
! Was there fraud? Not necessarily.
  - The estimator of the share of the population that voted for him (according to his survey) has sampling variability. So getting an estimate of 53% does not necessarily preclude the possibility that the true share of votes he got was 42%.
! We can propose the hypothesis that the true percentage of the vote he got was 42%. This is called proposing a null hypothesis. We denote this as
    H0: θ = 0.42.

Hypothesis Testing Example
! A favourite analogue for the null hypothesis is the presumption of innocence for a criminal defendant.
  - If I’m accused of a crime, I go on trial with the presumption of innocence. The jury is told to listen to all the evidence and consider whether that evidence is consistent with the presumption of innocence.
  - If the jury finds the evidence to be far out of line with the presumption of innocence, they will reject the notion that I am innocent (and convict me). If they can’t find (beyond a reasonable doubt) that I am guilty, they fail to reject the presumption of innocence by finding me not guilty. This doesn’t mean they think I’m innocent.
  - It just means they don’t find the evidence compelling enough to be sufficiently sure of guilt to convict me.

Hypothesis Testing Example
! We need an analogue to “guilty” in our voting example.
  - Since our null hypothesis is that θ = 0.42, a sensible alternative hypothesis (consistent with vote rigging by Candidate A’s opponents) is that θ > 0.42. We write the alternative hypothesis as
    H1: θ > 0.42.
  - To reject the null hypothesis that θ = 0.42, we must have sufficient evidence against it. The larger our estimate of θ is, the more likely we are to reject the null. The smaller the standard error of the estimate, the more likely we are to reject the null (assuming our estimate of θ is greater than 0.42).

Hypothesis Testing: Type I and Type II Errors
! We can make two types of error in hypothesis testing:
  - We can reject the null when it’s true (Type I error).
  - We can fail to reject the null when it’s false (Type II error).
  - Just as the criminal justice system is cautious about falsely convicting people, social scientists set up hypothesis tests in a way that demands a fairly high threshold of evidence to reject the null.
! In choosing the significance level, α, of a hypothesis test, we are picking (in advance of taking a random sample) the probability of committing a Type I error (falsely rejecting the null):
    α = P(Reject H0 | H0 true).

Hypothesis Testing: Type I and Type II Errors
! A 5% significance level is typical of what social scientists will choose, though 1% and 10% significance levels are often seen.
  - If you choose to conduct a hypothesis test at the 5% significance level, you are implicitly choosing to tolerate Type I error 5% of the time (in repeated sampling).
! Once the significance level is chosen, we try to minimize the probability of Type II error. This is called maximizing the power of the test, where power is denoted
    π(θ) = P(Reject H0 | θ) = 1 − P(Type II error | θ).
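What “tolerating Type I error 5% of the time” means can be sketched by simulation. The N(10, 4) population, n = 200, and the use of the large-sample critical value 1.96 are all assumptions made for this sketch: when the null is true, a two-sided 5% test falsely rejects in roughly 5% of repeated samples.

```python
import math
import random
import statistics

random.seed(8)
mu0, n, reps = 10.0, 200, 10000   # invented; data are drawn with H0 true

rejections = 0
for _ in range(reps):
    sample = [random.gauss(mu0, 2.0) for _ in range(n)]
    ybar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (ybar - mu0) / se
    if abs(t_stat) > 1.96:        # two-sided test, large-sample critical value
        rejections += 1

print(round(rejections / reps, 3))   # near alpha = 0.05
```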
Hypothesis Testing
! To test a null hypothesis against its alternative, we need 1) a test statistic and 2) a critical value against which to compare the test statistic.
  - A test statistic, T, is a function of the random sample (and hence is a RV). Once we take a sample and plug it into the test statistic, that realization will be denoted t.
  - We can define a rejection rule, by which the null hypothesis is rejected if T takes on certain values relative to the critical value. The range of values of t that lead to rejection of the null hypothesis is referred to as the rejection region.

Testing hypotheses about the mean in a normal population
! Assume we’re trying to test whether the mean of a Normal(µ, σ²) population takes on a certain value, µ0. Our null hypothesis will be
    H0: µ = µ0.
  - We have three choices for the alternative hypothesis:
    H1: µ > µ0
    H1: µ < µ0
    H1: µ ≠ µ0
  - The first two of these alternatives are one-sided alternatives. The third is a two-sided alternative.

Testing hypotheses about the mean in a normal population
! Consider the following setup of the hypothesis test: H0: µ = µ0 against H1: µ < µ0.
  - Given this setup, if we obtain an estimate of the sample mean that is sufficiently smaller than µ0, we will reject the null in favour of the alternative.
! If the setup is, instead, H0: µ = µ0 against H1: µ ≠ µ0, then if the sample mean is sufficiently smaller than or larger than µ0, we will reject the null in favour of the alternative.

Testing hypotheses about the mean in a normal population
! If the setup is H0: µ = µ0 against H1: µ > µ0, then we reject the null if the sample mean is sufficiently larger than µ0. But how do we determine what is sufficiently large or small?
! Consider the RV
    T = (Ybar − µ0)/(S/√n).
  - This is a standardization of the sample mean, with S substituted in for σ. Under the null hypothesis, it has a t distribution with n − 1 df.

Testing hypotheses about the mean in a normal population
! Our test statistic, called a t-statistic, will be
    t = (ybar − µ0)/(s/√n) = (ybar − µ0)/se(ybar).
  - If our significance level is 5%, then we need to choose c so that P(T > c | H0) = 0.05. The rejection rule, once we pick c, is “reject the null if t > c.”
  - Note that the value of c chosen and the rejection rule both depend on the specific setup of the hypothesis test: here, H0: µ = µ0 against H1: µ > µ0.

Testing hypotheses about the mean in a normal population
! For the setup H0: µ = µ0 against H1: µ < µ0, we would pick the (positive) critical value c so that P(T < −c | H0) = 0.05, and we would reject the null if t < −c.
  - Both of these cases are one-tailed tests, since the rejection region lies in just one tail of the t distribution (the upper tail in the first case, the lower tail in the second case).
! Notice that the t-statistic is more likely to lead to rejection of the null when the sample mean lies strongly in the direction of the alternative hypothesis, and when the standard error of the sample mean is small.

Testing hypotheses about the mean in a normal population
! A two-tailed test: when the hypothesis test is set up as H0: µ = µ0 against H1: µ ≠ µ0, we must be careful to pick c so that the significance level of the test remains α.
  - If we want a test that is significant at the 5% level, we want the areas in the rejection regions (far tails) of the distribution to sum to 0.05. This means we want 0.025 in the right tail and 0.025 in the left tail. In general, we want to pick c so that there is an area of α/2 in each tail’s rejection region.
! The rejection rule for a two-tailed alternative is |t| > c.

Testing hypotheses about the mean in a normal population
! Note the language used to report the results of a hypothesis test. Based on the results she finds, the researcher either:
  1) rejects the null hypothesis in favour of the alternative hypothesis at the 100·α% significance level; or
  2) fails to reject the null hypothesis in favour of the alternative hypothesis at the 100·α% significance level.
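The one- and two-tailed rejection rules above can be collected in a small helper. This is a sketch: the critical values 1.711 and 2.064 are the df = 24 t values for 5% one-sided and two-sided tests, and the sample numbers plugged in at the bottom are invented.

```python
import math

def t_statistic(ybar, mu0, s, n):
    # t = (ybar - mu0) / (s / sqrt(n)) = (ybar - mu0) / se(ybar)
    return (ybar - mu0) / (s / math.sqrt(n))

def reject(t, c, alternative):
    if alternative == ">":     # H1: mu > mu0, reject if t > c
        return t > c
    if alternative == "<":     # H1: mu < mu0, reject if t < -c
        return t < -c
    if alternative == "!=":    # H1: mu != mu0, reject if |t| > c (alpha/2 per tail)
        return abs(t) > c
    raise ValueError("alternative must be '>', '<', or '!='")

# Invented example: n = 25, ybar = 10.8, s = 2, testing H0: mu = 10
t = t_statistic(ybar=10.8, mu0=10.0, s=2.0, n=25)   # 0.8 / 0.4 = 2.0
print(reject(t, 1.711, ">"))    # one-sided 5% test rejects
print(reject(t, 2.064, "!="))   # two-sided 5% test does not
```

Note how the same realized t leads to different conclusions under the one- and two-sided setups, which is exactly why the rejection rule depends on the alternative.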
! One never “accepts the null.”
  - We could pick many different values of θ as our null, and fail to reject the null for each one, given a particular sample of data. If we claimed to accept the null in each case, we would be claiming that θ was equal to several different values. This would be logically inconsistent.

iClickers
! Suppose we conduct the following hypothesis test about the mean, µ, of a population that is normally distributed: H0: µ = 10 against H1: µ ≠ 10.
  - We set a significance level for the test, obtain a sample, calculate the sample mean, construct the test statistic (as described above), and find the critical values 2.03 and −2.03.
  - Question: what do we conclude if we obtain a test statistic of 1.5?
    A) We reject the null. B) We accept the null. C) We conclude that µ = 10. D) We fail to reject the null. E) None of the above.

iClickers
! Suppose we conduct the following hypothesis test about the mean, µ, of a population that is normally distributed: H0: µ = 10 against H1: µ > 10.
  - We set a significance level for the test, obtain a sample, calculate the sample mean, construct the test statistic (as described above), and find the critical value 1.75.
  - Question: what do we conclude if we obtain a test statistic of −300?
    A) We reject the null. B) We accept the null. C) We conclude that µ = 10. D) We fail to reject the null. E) None of the above.

iClickers
! Question: suppose we conduct a hypothesis test at the 5% significance level about the mean of a population that is normally distributed, and we reject the null in favour of the alternative. What can we conclude from this?
    A) The null is wrong. B) The null is right. C) This finding is evidence against the null, but it’s not conclusive proof that the null is wrong. D) None of the above.

Asymptotic Tests for Non-Normal Populations
! With a big enough sample, we don’t need the population distribution to be normal.
  - We can invoke the central limit theorem. Under the null hypothesis H0: µ = µ0,
    T = √n(Ybar − µ0)/S is distributed asymptotically N(0, 1).
  - So, for large n, the t-statistic can be compared with critical values from the standard normal distribution.
  - Typically for n < 120, people will refer to the t distribution. For values of n > 120, the t and standard normal distributions become so close to each other that one can simply refer to the standard normal distribution for critical values.

How Confidence Intervals and Hypothesis Testing are Related
! Consider a (two-sided) 95% confidence interval around a sample mean.
  - If the null value µ0 is not contained in the confidence interval constructed around the sample mean, then we can reject the null in favour of the alternative H1: µ ≠ µ0 at the 5% significance level.
  - If the null value is contained in the confidence interval, then we fail to reject the null in favour of the alternative H1: µ ≠ µ0.

Economic versus Statistical Significance
! Statistical significance is not all that matters in the analysis that we do.
  - We may estimate the effect of education on earnings and determine that the null hypothesis is rejected at the 5% significance level.
  - But suppose the confidence interval that we construct suggests that the most that a year of extra education raises annual earnings by is $10. In practical terms, this is as good as education having no effect on earnings.
  - A common mistake of econometrics students is to fixate on statistical significance. If you spend all your time staring at the results of t-tests, you may forget to consider the magnitude of the effects that you’re measuring.
  - It’s important to know not only whether a measured effect is statistically significant, but whether it is big or small; that is, is it economically (or practically) significant?
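The duality between the two-sided test and the confidence interval described above can be checked numerically. This is a sketch with invented numbers, using 1.96 as the large-sample critical value: rejecting H0: µ = µ0 at the 5% level coincides exactly with µ0 falling outside the 95% interval.

```python
ybar, se, c = 10.5, 0.2, 1.96          # invented point estimate and standard error
ci = (ybar - c * se, ybar + c * se)    # 95% confidence interval

for mu0 in (10.0, 10.7):
    t_stat = (ybar - mu0) / se
    reject_by_test = abs(t_stat) > c           # two-sided 5% test
    outside_ci = not (ci[0] < mu0 < ci[1])     # is mu0 outside the 95% CI?
    print(mu0, reject_by_test, outside_ci)     # the two criteria always agree
```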