Economics 345
Review of Useful Math and Statistics
Part II
Populations and Samples
!   Up to this point, we’ve been reviewing probability
theory.
!   We step into the realm of statistics when we try to
learn something about population distributions
using information garnered from samples.
 
 
A population is a well-defined group of subjects (firms,
consumers, workers, etc.)
A sample is some subset of that population, from which
we can try to infer things about the overall population
  Example: Suppose we want to learn something about voter
preferences in BC. We can administer a survey to a sample of,
say, 900 potential voters, and draw inferences about how the
entire population of potential voters is likely to vote based on
this sample
1
Statistical Inference
!   When we practice statistical inference, we are trying to
infer (learn or guess, in an educated fashion) something
about the population distribution using a sample from that
distribution.
 
Estimation: Using data to come up with an estimated value (or a
likely range of values) of something like a population mean, or a
model parameter (say β1 in our model of education and earnings
from the first lecture).
  When we estimate a single value of a parameter, it’s called a point
estimate.
  When we estimate a range of possible values of a parameter, it’s
called an interval estimate.
 
Hypothesis testing: Using data to assess whether some hypothesis
about a population parameter value should be rejected or not.
  Suppose we hypothesize that education has no effect on earnings:
H0: β1=0.
  If our point estimate of β1 is big enough relative to the standard error
of that estimate, we can reject that hypothesis. This provides evidence
in support of the claim that education boosts earnings.
2
Sampling
!   Let Y be a RV representing a population with pdf
f(y;θ); suppose we know everything about this pdf
except the value of θ.
 
 
In order to know the true population distribution, we
need to know the true value of θ.
We’ll never know the true value of θ. But hopefully,
using a sample of data, we can get close.
!   Random sampling: If Y1,Y2,…,Yn are independent
random variables, each with the same pdf f(y; θ),
then {Y1,Y2,…,Yn} is said to be a random sample
from f(y; θ).
 
 
We say these are iid random variables from f(y; θ).
Note that Y1,Y2,…,Yn are considered RVs, because
before the sample is taken, they can take on a variety of
values
3
Sampling
Once the sampling has occurred, we have a set
of data {y1,y2,…,yn}.
  Note that these are not RVs.
 
!  Sampling example: Suppose Y1,Y2,…,Yn are
independent RVs and each is distributed
Bernoulli(θ), then {Y1,Y2,…,Yn} is said to be
a random sample from a population that is
distributed Bernoulli(θ).
4
Finite sample properties of
estimators
!   Suppose we’re trying to infer something about the
(unknown) population value of a parameter, θ.
 
 
An estimator is a mathematical expression that serves
as a rule for converting samples of data into an
educated guess (or estimate) of the value of θ.
An estimator is the same for each possible sample that
is drawn. But when the estimator is applied to each
possible sample of data, it will generally generate
different estimates.
5
Estimators
!   An example of an estimator
 
 
Suppose {Y1,Y2,…,Yn} is a random sample from a
population with an unknown mean, µ.
A sensible estimator of µ is given by
Ȳ = (1/n) ∑_{i=1}^{n} Yi .
  We call this the sample mean.
 
Note also that it is a random variable (the estimate it
generates will be different for each random sample we
draw)
  Since a random sample is a set of random variables, any
function of the random sample, such as an estimator, is itself a
random variable.
6
Estimators
 
Using this estimator and a sample of data, {y1,y2,…,yn},
we obtain the actual estimate of µ,
ȳ = (1/n) ∑_{i=1}^{n} yi
!   More generally, an estimator W of a parameter, θ,
can be expressed as W=h(Y1,Y2,…,Yn), for some
function h of the random variables Y1,Y2,…,Yn.
 
 
This just says that an estimator can be expressed as a
function of a random sample.
We can express the actual estimate that’s obtained for a
particular sample {y1,y2,…,yn} as w=h(y1,y2,…,yn).
7
Estimators
!   In order to understand the properties of an
estimator, W, we will want to study its sampling
distribution.
 
 
 
Describes the distribution of the RV W over different
samples
Remember, RVs have distributions. W will take on a
different value, w, for each sample we draw
The properties of this distribution will help us evaluate
the appropriateness of W as an estimator and let us
compare it with other possible estimators.
  Note that in principle, an infinite number of possible estimators
exist for each possible parameter we might want to estimate.
  We need to be able to judge which possible estimator is best.
8
Things to Look for in an Estimator:
Unbiasedness
!   Unbiasedness: An estimator, W, of θ, is said to be
unbiased if E(W)=θ.
 
 
Of course, any given realization (w) of W will probably
not equal θ, but it’s reassuring to have an estimator
whose mean value equals θ.
This means that on average the estimator will correctly
predict θ.
!   Unfortunately, some estimators will be biased
 
And sometimes we will prefer a biased estimator to an
unbiased estimator for reasons discussed below.
9
The Bias of an Estimator
!   For a biased estimator, W, of θ, we define W’s bias to be
Bias(W)=E(W)-θ
 
Example: Suppose θ=2. If E(W)=4, we would say the bias of W is 2.
If E(W)=0, we would say the bias of W is -2. If E(W)=2, we would
say that W is an unbiased estimator of θ.
!   If we want an unbiased estimator, we need to pick h to
ensure there is no bias.
!   Example of an unbiased estimator: sample mean as
estimator of population mean, µ.
E(Ȳ) = E[(1/n) ∑_{i=1}^{n} Yi] = (1/n) E[∑_{i=1}^{n} Yi] = (1/n) ∑_{i=1}^{n} E(Yi) = (1/n) ∑_{i=1}^{n} µ = (1/n)(nµ) = µ
10
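The unbiasedness of the sample mean can be checked by simulation. Here is a minimal sketch; the Normal population parameters and the sample size are arbitrary choices for illustration, not values from the slides:

```python
import random

random.seed(0)
mu, sigma, n = 5.0, 2.0, 10        # hypothetical population parameters and sample size
trials = 100_000

# Draw many random samples and compute the sample mean of each.
total = 0.0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    total += sum(sample) / n

# Averaged over many samples, the sample means center on mu (unbiasedness).
avg_of_means = total / trials
print(avg_of_means)   # close to mu = 5
```

Each individual sample mean misses µ, but averaged over many repeated samples the misses cancel out.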
Estimators
!   Consider a population with mean µ and variance σ2. We
can define an estimator called the sample variance
S² = (1/(n−1)) ∑_{i=1}^{n} (Yi − Ȳ)²
 
This is also an unbiased estimator. That is, E(S²) = σ².
!   Holding other things constant, unbiasedness is a desirable
property in an estimator
 
 
 
However, some good estimators are biased
And some unbiased estimators are bad
So don’t be too quick to judge an estimator entirely on the size of
its bias.
11
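A quick simulation (with hypothetical parameters) illustrates why the n−1 divisor matters: with it, E(S²) ≈ σ², while an n divisor produces an estimator that is biased downward:

```python
import random

random.seed(1)
mu, sigma, n = 0.0, 3.0, 5          # hypothetical choices; sigma**2 = 9
trials = 200_000

sum_unbiased = 0.0                  # divisor n - 1 (the sample variance S^2)
sum_biased = 0.0                    # divisor n, for comparison
for _ in range(trials):
    y = [random.gauss(mu, sigma) for _ in range(n)]
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)
    sum_unbiased += ss / (n - 1)
    sum_biased += ss / n

print(sum_unbiased / trials)   # close to sigma**2 = 9
print(sum_biased / trials)     # close to (n-1)/n * sigma**2 = 7.2
```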
Sampling Variance of an Estimator
!   Why might an unbiased estimator be undesirable?
 
 
It could have very high sampling variance
Another way of saying this is that the estimator may be very imprecise.
Among unbiased estimators, those with lower sampling variance are
preferred.
!   Example: See board.
!   The variance of an estimator is called its sampling variance. Consider
the variance of the sample mean:
Var(Ȳ) = Var[(1/n) ∑_{i=1}^{n} Yi] = (1/n)² Var(∑_{i=1}^{n} Yi) = (1/n)² ∑_{i=1}^{n} Var(Yi)   (using independence)
= (1/n)² ∑_{i=1}^{n} σ² = (1/n)²(nσ²) = σ²/n
 
Notice that the sampling variance is a function of the sample size. For
larger samples (higher n) the sampling variance is smaller. As n goes to
infinity, the sampling variance goes to zero.
12
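The σ²/n result can also be seen empirically. A sketch with made-up parameters:

```python
import random
import statistics

random.seed(2)
mu, sigma = 0.0, 2.0                 # hypothetical population; sigma**2 = 4
trials = 100_000

# Empirical variance of the sample mean for two different sample sizes.
empirical_var = {}
for n in (4, 16):
    means = []
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
    empirical_var[n] = statistics.variance(means)

print(empirical_var)   # close to sigma**2 / n: about 1.0 for n=4, 0.25 for n=16
```

Quadrupling the sample size cuts the sampling variance of the mean to a quarter of its previous value.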
Sampling Variance of an Estimator
!   Here’s an example of an unbiased estimator that is
undesirable on the basis of its sampling distribution
 
 
 
 
 
Suppose we want an estimator of the population mean µ, given a
sample {Y1,Y2,…,Yn}. Recall I said there are an infinite number of
possible estimators. One of these is Y2 (i.e., just use the second
observation in the sample as the estimate).
Y2 is an unbiased estimator of µ, because E(Y2)=µ.
But the sampling variance of Y2 (σ2 ) is large relative to the
sampling variance of the sample mean (σ2/n).
We say that the sample mean is a more efficient estimator than Y2 .
Another way to express this is to say that the sample mean is a
more precise estimator.
13
Things to Look for in an Estimator:
Efficiency
!   If W1 and W2 are two unbiased estimators of θ, we say W1 is more
efficient than W2 if Var(W1)<Var(W2).
 
 
 
Note that this use of “efficient” has nothing to do with Pareto efficiency.
The term efficiency is only used to compare unbiased estimators, because
efficiency allows us a way to rank estimators among the class of unbiased
estimators
If we want to include biased estimators in our comparison, efficiency
doesn’t make sense as a way to rank them
  One estimator might be very low variance but very biased, while another might
be unbiased but very high variance
  We can’t strictly say that the low variance estimator is better
!   Mean squared error (MSE) is an alternative way to rank estimators that
allows biased estimators to be compared
MSE(W) = E[(W − θ)²]
 
 
This is a measure of how far W tends to be from θ.
Lower MSE estimators are preferred.
14
Note that a potential tradeoff exists
between unbiasedness and small
sampling variance
!   Suppose we have two estimators of θ. One estimator is
unbiased but has large sampling variance. The other
estimator is slightly biased but has low sampling variance.
 
 
 
It may be that the biased estimator is preferred
See board.
MSE captures this tradeoff between unbiasedness and low
sampling variance
15
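The tradeoff can be illustrated by simulation. In this made-up setup (µ = 1, σ = 3, n = 10), the shrinkage estimator Ȳ/2 is biased toward zero yet has lower MSE than the unbiased sample mean; with a different µ the ranking could reverse:

```python
import random

random.seed(3)
mu, sigma, n = 1.0, 3.0, 10     # hypothetical values; the ranking below depends on them
trials = 200_000

mse_unbiased = 0.0   # W1 = sample mean (unbiased, variance sigma**2/n = 0.9)
mse_shrunk = 0.0     # W2 = sample mean / 2 (biased, but lower variance)
for _ in range(trials):
    ybar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    mse_unbiased += (ybar - mu) ** 2
    mse_shrunk += (ybar / 2 - mu) ** 2

print(mse_unbiased / trials)   # close to 0.9 (= variance, since bias is 0)
print(mse_shrunk / trials)     # close to 0.225 + 0.25 = 0.475 (variance + bias^2)
```

The empirical MSEs match the decomposition MSE = Var + Bias², and the biased estimator wins here.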
Asymptotic Properties of Estimators
!   Thus far, we’ve been discussing finite sample properties of estimators
!   In theory, we can consider cases where sample size gets very large
(and eventually goes to infinity)
 
 
 
While we never have an infinite sample size in the real world, we
sometimes get very big samples and we may believe that very big samples
are going to have properties similar to infinite samples
Some estimators that don’t perform well for small samples actually do
very well for infinite samples, and therefore may be expected to perform
well for large samples.
When we talk about the large sample properties of an estimator, we often
refer to these as the asymptotic properties of the estimator.
!   Where unbiasedness is a desirable finite sample property, consistency
is a desirable asymptotic property for an estimator to have.
16
Consistency
!   An estimator is consistent if it converges in probability to θ as n approaches infinity.
!   Stated formally, if Wn is an estimator of θ and we have a sample
{Y1,Y2,…,Yn} of size n, then Wn is a consistent estimator of θ, if for
every (arbitrarily small number) ε>0,
P(|Wn−θ|>ε) → 0 as n → ∞
 
In other words, think of the smallest positive number you can. As sample
size approaches infinity, the probability that Wn differs from θ by more
than that number goes to zero.
!   Another way to define an estimator, Wn, as consistent is to say that
plim(Wn)=θ.
 
Or “the probability limit of Wn is θ.”
!   Note that consistency is the large-sample analog of unbiasedness.
17
Consistency
!   If an estimator fails to meet this definition, we say it is inconsistent.
 
 
We want our estimators to be consistent
We may be willing to put up with an estimator that is biased in small
samples, but we want one that will at least accurately predict θ
asymptotically.
!   Given two consistent estimators, we will prefer the one with lower
asymptotic variance.
!   Example of a consistent estimator: sample mean
 
Let {Y1,Y2,…,Yn} be iid RVs with mean µ. Then
plim(Ȳn) = µ
 
 
This is commonly known as the Law of Large Numbers
Put in very loose terms, this says that if you pick a big enough sample,
your sample mean is going to get incredibly close to the true population
mean.
18
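A sketch of the Law of Large Numbers at work, using an exponential (non-normal) population with a hypothetical mean of 2:

```python
import random

random.seed(4)
mu = 2.0   # hypothetical population mean; Exponential(rate 1/mu) has mean mu

# |sample mean - mu| shrinks as the sample size n grows.
errors = {}
for n in (10, 1_000, 100_000):
    draws = [random.expovariate(1 / mu) for _ in range(n)]
    errors[n] = abs(sum(draws) / n - mu)
print(errors)
```

The large samples are essentially guaranteed to land very close to µ; any single small sample might not.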
Asymptotic Normality
!   If an estimator is consistent, this tells us that the estimator converges to
the true parameter value as sample size goes to infinity.
 
 
Doesn’t tell us anything about the shape of the sampling distribution for a
given sample size.
As it turns out, most estimators that we will deal with have roughly a
normal distribution in large samples (in small samples they may deviate
substantially from the normal distribution)
!   Let {Zn: n=1,2,…} be a sequence of RVs such that for all
z, P(Zn ≤ z) → Φ(z) as n → ∞.
 
 
 
We say that this sequence of RVs has an asymptotic standard normal
distribution.
This means that the cdf of Zn gets close to the standard normal cdf as n
gets very large.
This can be handy because it means we can use the well-known
standard normal distribution to approximate the distribution of such RVs.
19
The Central Limit Theorem
!   Sit down for this one.
!   Let {Y1,Y2,…,Yn} be a random sample with mean µ and variance σ2.
Then
Zn = (Ȳn − µ)/(σ/√n) is distributed asymptotically standard normal.
!   This should leave you breathless. This means that the standardized
sample mean, regardless of its small sample distribution, has a cdf that
approaches that of the N(0,1) distribution as sample size gets large.
 
 
 
Think up some wacky distribution for Y. As n gets large, the standardized
version of the sample mean of Y gets arbitrarily close to the standard
normal distribution.
We already knew from before that the standardized sample mean would
have a mean of zero and variance of 1. That’s not the big deal. The big
deal is that now (for large samples) we know that the standardized sample
mean of Y is normally distributed, even if the RV Y is non-normally
distributed.
Imagine your life without the internet. That's what life in statistics would
be without the CLT.
20
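The CLT is easy to see in simulation. Below, a heavily skewed Exponential(1) population (a hypothetical choice) is sampled repeatedly; if the standardized sample mean is close to N(0,1), it should land in (−1.96, 1.96) about 95% of the time:

```python
import math
import random

random.seed(5)
mu, sigma, n = 1.0, 1.0, 100    # Exponential(1): mean 1, sd 1 -- far from normal
trials = 20_000

# Standardize the sample mean and count how often it lands in (-1.96, 1.96).
inside = 0
for _ in range(trials):
    ybar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z = (ybar - mu) / (sigma / math.sqrt(n))
    inside += -1.96 < z < 1.96

print(inside / trials)   # close to 0.95, just as the standard normal predicts
```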
Interval Estimation
!   Recall that estimators are random variables. Suppose you
take an estimator and a sample of data and use them to
produce an estimate of some model parameter
 
 
 
Chances are the estimate will be wrong. It will be different for
each sample. If the sampling variance of the estimator is high, a
given estimate (from a given sample) is likely to be way off from
the true population value of the parameter.
A point estimate, by itself, contains no information about sampling
variability of the estimator, and therefore no information on the
likely precision of the estimate.
One way to deal with this problem is to produce point estimates
along with confidence intervals. The confidence interval gives us
a sense of the range of values in which we expect the true
parameter value to lie.
21
Example of Confidence Intervals
!   Suppose we are trying to estimate the mean of a
population that is distributed N(µ,σ²).
 
 
 
The sample mean is an estimator of µ that is distributed N(µ, σ2/n).
With a given sample the sample mean will produce a point
estimate, ȳ
We know that a given point estimate is going to be equal to the
actual value of µ with probability 0. But suppose we decide that
once we have our estimate, we’re going to pad that point estimate
by drawing an interval around it. Even for a very small interval,
we now have some probability greater than 0 that our interval
(once we estimate it) will contain the true value of µ.
How could we be sure that our interval will contain µ?
  Simply choose an interval that is the entire real number line (so the
interval runs from negative infinity to positive infinity)!
  But this is completely useless, because then we're not narrowing
down the likely range of values in which µ lies.
  Instead we could pick an interval that contains µ with high
probability, but not with certainty.
22
Example of Confidence Intervals
!   To simplify our example slightly, let’s now suppose the
population is distributed N(µ,1) and let {Y1,Y2,…,Yn} be a
random sample from the population.
 
 
We know that the sample mean is distributed N(µ,1/n).
Let’s suppose we want to standardize the sample mean estimator
and construct an interval around that estimator that will be very
likely (say in 95% of the samples we draw) to contain µ. We need
to pick c, so that
P(−c < (Ȳ − µ)/(1/√n) < c) = 0.95
23
Example of Confidence Intervals
 
It turns out that 95% of the probability mass of a standard normal
distribution lies between -1.96 and +1.96. So a value of c=1.96
will give us the probability of 0.95 that we’re looking for.
P(−1.96 < (Ȳ − µ)/(1/√n) < 1.96) = 0.95
 
With some rearrangement of this expression…
P(−1.96 < √n(Ȳ − µ) < 1.96) = 0.95
P(−1.96/√n < Ȳ − µ < 1.96/√n) = 0.95
P(−Ȳ − 1.96/√n < −µ < −Ȳ + 1.96/√n) = 0.95
P(Ȳ + 1.96/√n > µ > Ȳ − 1.96/√n) = 0.95
24
Example of Confidence Intervals
!   We finally get
P(Ȳ − 1.96/√n < µ < Ȳ + 1.96/√n) = 0.95
 
This says that µ will be expected to fall in the random interval
(random, because Ȳ is a RV),
[Ȳ − 1.96/√n, Ȳ + 1.96/√n]
in 95% of random samples.
!   For a given random sample {y1,y2,…,yn}, we can calculate
an interval estimate of µ,
[ȳ − 1.96/√n, ȳ + 1.96/√n]
 
This is sometimes called the 95% confidence interval around the
point estimate.
25
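The interval estimate for this N(µ,1) example can be computed directly. A sketch with a hypothetical true mean of 3 (unknown to the analyst in practice):

```python
import math
import random

random.seed(6)
mu, n = 3.0, 100                      # sigma = 1, as in the example

y = [random.gauss(mu, 1.0) for _ in range(n)]   # one random sample
ybar = sum(y) / n
half = 1.96 / math.sqrt(n)            # 1.96 * sigma / sqrt(n), with sigma = 1
ci = (ybar - half, ybar + half)
print(ci)   # a 95% interval estimate of mu, centered at ybar
```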
The Meaning of the Confidence Interval
!   If we construct a 95% confidence interval for µ,
[ȳ − 1.96/√n, ȳ + 1.96/√n]
what we really mean is that the random interval
[Ȳ − 1.96/√n, Ȳ + 1.96/√n]
contains µ with probability 0.95.
!   In other words, in repeated sampling, where each sample
produces a new confidence interval, we expect that µ will
lie in (on average) 95% of the confidence intervals
produced.
26
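The repeated-sampling interpretation can be verified directly: construct the interval over and over from fresh samples and count how often it covers µ (parameters below are made up):

```python
import math
import random

random.seed(7)
mu, n = 0.0, 25           # sigma = 1 population, as in the running example
half = 1.96 / math.sqrt(n)
trials = 20_000

covered = 0
for _ in range(trials):
    ybar = sum(random.gauss(mu, 1.0) for _ in range(n)) / n
    covered += (ybar - half) < mu < (ybar + half)

print(covered / trials)   # close to 0.95: about 95% of the intervals contain mu
```

It is the collection of intervals that has the 95% property, not any single computed interval.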
An Incorrect (but common)
Interpretation of the Confidence Interval
!   People commonly claim that a given 95% confidence
interval has a 95% chance of containing the true value of µ.
This is incorrect. Once the actual interval estimate is
made, µ either lies within it or lies outside of it. The
interval is no longer a random interval (note the little y’s
instead of the big Y’s), and therefore we can’t speak
probabilistically about it.
 
Example: The probability that the 10th person on the Econ 345
class roster shows up for class on a given day may be 0.80.
However, looking around the room today, that person is either here
or they aren’t. Their presence is no longer a random variable, so
technically I can’t speak about it in probabilistic terms.
!   See Table C.2 in text for illustration of this point.
27
Note that there’s nothing magical about
the 95% confidence level
!   We could pick 99%, 90%, 80%, 50%, 27.2%
 
 
But 90%, 95%, and 99% tend to be favourites among empirical
social science researchers.
Think of them as kind of an “industry standard.”
28
Another Example of Confidence Interval
!   The last example was unrealistic in that we assumed we
knew the population variance, σ2. This is rarely the case.
!   Stated generally, a 95% confidence interval for the mean of a
normally distributed population (with known σ) is given by
[ȳ − 1.96σ/√n, ȳ + 1.96σ/√n].
 
 
If we don’t know σ, then we must estimate this parameter as well.
We can use the sample standard deviation discussed above
s = [ (1/(n−1)) ∑_{i=1}^{n} (yi − ȳ)² ]^{1/2} .
29
Another Example of Confidence Interval
!   Unfortunately we can’t just plug s in for σ and proceed as
before. If we knew σ, we could argue that µ was contained
in the random interval
[Ȳ − 1.96σ/√n, Ȳ + 1.96σ/√n]
with 95% probability. If we plug in a random variable S in
place of σ, then we change the probability that µ is
contained in the random interval.
!   This is where the t distribution comes in handy.
30
Another Example of Confidence Interval
!   It turns out that
(Ȳ − µ)/(S/√n) ~ t_{n−1}
where S is the sample standard deviation.
!   We can pick c such that the random interval around the
sample mean contains 95% of the probability mass of the
distribution. We get the confidence interval
[ȳ − c·s/√n, ȳ + c·s/√n]
31
Another Example of Confidence Interval
!   When we constructed a confidence interval for the mean with
known variance, c was equal to 1.96.
 
 
 
With the t distribution, c will depend on the degrees of freedom of
the particular distribution.
df=n-1, so the degrees of freedom are a function of the size of the
random sample.
The larger the sample size, the larger the df, and so the smaller the
c. This means that larger sample sizes will lead to smaller (more
precise) confidence intervals.
!   Example (use Table G.2): If n=20 (so df=19) and we want
to construct a 95% confidence interval around the sample
mean, the interval will be
[ȳ − 2.093·s/√20, ȳ + 2.093·s/√20]
32
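Here is how the n = 20 interval might be computed, using the critical value 2.093 from the slide; the data are made up for illustration:

```python
import math
import statistics

# A hypothetical sample of n = 20 observations.
y = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.4, 5.9, 4.0, 5.5,
     4.8, 5.2, 3.9, 6.1, 4.6, 5.0, 4.3, 5.7, 4.2, 5.4]
n = len(y)
ybar = statistics.mean(y)
s = statistics.stdev(y)       # sample standard deviation (n - 1 divisor)
c = 2.093                     # t critical value for df = 19, 95% confidence (Table G.2)

half = c * s / math.sqrt(n)
print((ybar - half, ybar + half))   # the 95% confidence interval around ybar
```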
Another Example of Confidence Interval
!   More generally, a 100(1-α)% confidence interval is given
by
[ȳ − c_{α/2}·s/√n, ȳ + c_{α/2}·s/√n]
!   Recall the standard deviation of the sample mean of a
normally distributed population
sd(Ȳ) = σ/√n
 
 
Substituting s for σ, we get a point estimate of the std dev.
This point estimate, s/√n, is commonly referred to as the standard
error of the sample mean:
se(ȳ) = s/√n
33
Another Example of Confidence Interval
!   We can use this short hand to express the confidence
interval as
[ȳ ± c_{α/2}·se(ȳ)]
 
Again, note that a larger sample size both lowers c and shrinks
se(ȳ), so larger sample sizes will lead to smaller confidence intervals.
!   Asymptotic confidence intervals
 
Recall the CLT. For a big enough sample size, even if the
population is not normally distributed, the sample mean will be
normally distributed. So for a non-normal population and a large
enough sample, we can define a 95% confidence interval around
the sample mean
[ȳ ± 1.96·se(ȳ)]
 
We can do this because as n gets large, the t distribution
approaches the standard normal distribution.
34
iClickers
!   Pull out your iClickers now.
 
 
 
 
Make sure you register your iClicker on MyPage on the UVic
website. Do this ASAP. Unregistered clicker input can’t be
matched to names at the end of the course.
You can use a borrowed iClicker, so long as it’s not borrowed
from a friend in the course. And you must use the same borrowed
iClicker each time.
You may not use someone’s iClicker to earn them credit in the
course (that’s a violation of rules on Academic Honesty)
Unless otherwise noted, I will give 7.5 points for each question
attempted, and 2.5 additional points for each question correct.
!   Here’s a test question (wait until I say “go”):
Are you currently in attendance at 345 lecture?
A) Yes
B-E) No
!   I will post results (by clicker ID) on the web so you can
check that your iClicker worked.
35
Hypothesis Testing
!   If we want to estimate population parameters, then point
and interval estimates are useful.
!   But sometimes we just want to answer a question like
“Does A affect B?”. While careful consideration of
interval estimates can help us answer this question,
hypothesis testing provides a more explicit way to deal
with such questions.
!   Example: Suppose we want to test to see whether an
election was rigged
 
 
Results show that Candidate A received 42% of popular vote;
Candidate B received 58%.
Candidate A doesn’t believe these results, and wants to test for
evidence of voting fraud.
36
Hypothesis Testing Example
!   Candidate A could conduct a poll of voters, to estimate
how many people actually voted for him. Suppose he
finds that 53% of those he surveyed actually voted for him.
!   Was there fraud? Not necessarily. The estimator for the
share of the population that voted for him (according to his
survey) has sampling variability. So getting an estimate of
53% does not necessarily preclude the possibility that the
true share of votes he got was 42%.
!   We can propose the hypothesis that the true percentage of
the vote he got was 42%. This is called proposing a null
hypothesis. We denote this as
H 0 : θ = 0.42
37
Hypothesis Testing Example
!   A favorite analog to the null hypothesis is the presumption
of innocence for a criminal defendant.
 
 
 
If I’m accused of a crime, I go on trial with the presumption of
innocence. The jury is told to listen to all evidence, and consider
whether that evidence is consistent with the presumption of
innocence.
If the jury finds the evidence to be far out of line with the
presumption of innocence, they will reject the notion that I am
innocent (and convict me).
If they can’t find (beyond a reasonable doubt) that I am guilty, they
fail to reject the presumption of innocence by finding me not
guilty. This doesn’t mean they think I’m innocent. It just means
they don’t find the evidence compelling enough to be sufficiently
sure of guilt to convict me.
38
Hypothesis Testing Example
!   We need an analog to “guilty” in our voting
example. Since our null hypothesis is that θ=0.42,
a sensible alternative hypothesis (consistent with
vote rigging by Candidate A’s opponents) is that
θ>0.42. We write the alternative hypothesis as
H 1 : θ > 0.42.
 
 
 
To reject the null hypothesis that θ=0.42 we must have
sufficient evidence against it.
The larger our estimate of θ is, the more likely we are
to reject the null.
The smaller the standard error of the estimate, the more
likely we are to reject the null (assuming our estimate
of θ is greater than 0.42)
39
Hypothesis Testing: Type I and Type II
Errors
!   We can make two types of error in hypothesis testing
 
 
 
We can reject the null when it's true (Type I error).
We can fail to reject the null when it's false (Type II error).
Just like the criminal justice system is cautious about falsely
convicting people, social scientists set up hypothesis tests in a way
that demands a pretty high threshold of evidence to reject the null.
!   In choosing the significance level, α, of a hypothesis test,
we are picking (in advance of taking a random sample) the
probability of committing a Type I error (falsely rejecting
the null).
α = P(Reject H0 | H0 true)
40
Hypothesis Testing: Type I and Type II
Errors
!   A 5% significance level is typical of what social scientists
will choose; though 1% and 10% significance levels are
often seen.
 
If you choose to conduct a hypothesis test at the 5% significance
level, you are implicitly choosing to tolerate Type I error 5% of the
time (in repeated sampling).
!   Once the significance level is chosen, we try to minimize
the probability of Type II error
 
This is called maximizing the power of the test where power is
denoted as
π(θ) = P(Reject H0 | θ) = 1 − P(Type II error | θ)
41
Hypothesis Testing
!   To test a null hypothesis against its alternative we need 1)
a test statistic; and 2) a critical value against which to
compare the test statistic.
 
 
 
A test statistic, T, is a function of the random sample (and hence is
a RV). Once we take a sample and plug it into the test statistic,
that realization will be denoted t.
We can define a rejection rule, by which if T takes on certain
values relative to the critical value, the null hypothesis is rejected.
The range of values of t that lead to rejection of the null
hypothesis is referred to as the rejection region of t.
42
Testing hypotheses about the mean in a
normal population
!   Assume we’re trying to test whether the mean of a
Normal(µ,σ2) population takes on a certain value, µ0.
 
Our null hypothesis will be
H 0 : µ = µ0
 
We have three choices for the alternative hypothesis
H1 : µ > µ0
H1 : µ < µ0
H1 : µ ≠ µ0
 
 
The first two of these alternatives are a form of 1-sided
alternatives.
The third alternative is a 2-sided alternative.
43
Testing hypotheses about the mean in a
normal population
!   Consider the following setup of the hypothesis test:
H 0 : µ = µ0
H1 : µ < µ0
 
Given this setup, if we obtain an estimate of the sample mean that
is sufficiently smaller than µ0, we will reject the null in favor of the
alternative.
!   If the setup is, instead,
H 0 : µ = µ0
H1 : µ ≠ µ0
then if the sample mean is sufficiently smaller than or
larger than µ0 , we will reject the null in favor of the
alternative.
44
Testing hypotheses about the mean in a
normal population
!   If the setup is as follows:
H 0 : µ = µ0
H1 : µ > µ0
then we reject the null if the sample mean is sufficiently
larger than µ0.
 
But how do we determine what is sufficiently large or small?
!   Consider the RV
T = (Ȳ − µ0)/(S/√n)
 
This is a standardization of the sample mean, with S substituted in
for σ. Under the null hypothesis, it has a t distribution with n-1 df.
45
Testing hypotheses about the mean in a
normal population
!   Our test statistic, called a t-statistic, will be
t = (ȳ − µ0)/(s/√n) = (ȳ − µ0)/se(ȳ)
 
 
 
If our significance level is 5%, then we need to choose c so that
P(T>c|H0)=0.05
The rejection rule, once we pick c, is “reject the null if t>c”.
Note that the value of c chosen and the rejection rule both depend
on the specific setup of the hypothesis test.
H 0 : µ = µ0
H1 : µ > µ0
46
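A sketch of the full one-sided test, with made-up data and a hypothesized µ0 = 5; the critical value 1.833 (df = 9, 5% level, one tail) comes from a standard t table:

```python
import math
import statistics

# Hypothetical setup: H0: mu = 5 vs H1: mu > 5, with made-up data.
y = [5.6, 6.1, 4.9, 5.8, 6.3, 5.2, 6.0, 5.7, 5.5, 6.2]
mu0 = 5.0
n = len(y)

se = statistics.stdev(y) / math.sqrt(n)   # standard error of the sample mean
t = (statistics.mean(y) - mu0) / se

c = 1.833   # t critical value for df = 9 at the 5% significance level (one tail)
print(t, t > c)   # rejection rule: reject H0 if t > c (True for this sample)
```

The sample mean here lies well above µ0 relative to its standard error, so the null is rejected.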
Testing hypotheses about the mean in a
normal population
!   For the setup
H 0 : µ = µ0
H1 : µ < µ0
We would pick the critical value c so that P(T < −c |
H0) = 0.05, and we would reject the null if t < −c.
 
Both of these cases are cases of one-tailed tests, since the rejection
region lies in just one tail of the t distribution (the upper tail in the
first case, the lower tail in the second case)
!   Notice that the t-statistic is more likely to lead to rejection
of the null when the sample mean lies strongly in the
direction of the alternative hypothesis, and when the
standard error of the sample mean is small.
47
Testing hypotheses about the mean in a
normal population
!   A two tailed test: When the hypothesis test is set up as
H 0 : µ = µ0
H1 : µ ≠ µ0
we must be careful to pick c so that the significance of the
test remains α. If we want a test that is significant at the
5% level, we want the area in the rejection regions (far
tails) of the distribution to sum to 0.05. This means we
want 0.025 in the right tail and 0.025 in the left tail. In
general, we want to pick c so that there is an area of α/2 in
each tail’s rejection region.
!   The rejection rule for a two-tailed alternative is |t|>c.
48
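The two-tailed rule |t| > c can be sketched the same way; here with made-up data, H0: µ = 10, and the critical value 2.262 (df = 9, α/2 = 0.025 in each tail):

```python
import math
import statistics

# Hypothetical two-tailed setup: H0: mu = 10 vs H1: mu != 10.
y = [9.8, 10.4, 9.5, 10.1, 10.6, 9.7, 10.2, 9.9, 10.3, 9.6]
mu0 = 10.0
n = len(y)

t = (statistics.mean(y) - mu0) / (statistics.stdev(y) / math.sqrt(n))

c = 2.262   # t critical value for df = 9 with alpha/2 = 0.025 in each tail
print(abs(t) > c)   # reject H0 only if |t| > c; here |t| < c, so we fail to reject
```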
Testing hypotheses about the mean in a
normal population
!   Note the language used to report results of a hypothesis
test: Based on the results she finds, the researcher either
1) Rejects the null hypothesis in favor of the alternative
hypothesis at the 100*α% significance level; or
2) Fails to reject the null hypothesis in favour of the
alternative hypothesis at the 100*α% significance level.
!   One never “accepts the null.” We could pick many
different values of θ as our null, and fail to reject the null
for each one, given a particular sample of data. If we
claimed to accept the null in each case, we would be
claiming that θ was equal to several different values. This
would be logically inconsistent.
49
iClickers
!   Suppose we conduct the following hypothesis test about
the mean, µ, of a population that is normally distributed.
H0: µ=10
H1: µ≠10
We set a significance level for the test, obtain a sample,
calculate the sample mean, construct the test statistic (as
described above), and find the critical values 2.03 and -2.03.
Question: What do we conclude if we obtain a test statistic of
1.5?
A) We reject the null.
B) We accept the null.
C) We conclude that µ=10
D) We fail to reject the null.
E) None of the above.
50
iClickers
!   Suppose we conduct the following hypothesis test about
the mean, µ, of a population that is normally distributed.
H0: µ=10
H1: µ>10
We set a significance level for the test, obtain a sample,
calculate the sample mean, construct the test statistic (as
described above), and find the critical value 1.75.
Question: What do we conclude if we obtain a test statistic of
-300?
A) We reject the null.
B) We accept the null.
C) We conclude that µ=10
D) We fail to reject the null.
E) None of the above.
51
iClickers
!   Question: Suppose we conduct a hypothesis test at the 5%
significance level about the mean of a population that is
normally distributed. Suppose that we reject the null in
favour of the alternative. What can we conclude from
this?
A) The null is wrong.
B) The null is right.
C) This finding is evidence against the null, but it’s not
conclusive proof that the null is wrong.
D) None of the above.
52
Asymptotic Tests for Non-Normal
Populations
!   With a big enough sample, we don’t need the population
distribution to be normal. We can invoke the central limit
theorem
  Under the null hypothesis, H0: µ = µ0,
T = √n(Ȳ − µ0)/S is asymptotically distributed N(0,1).
So, for large n the t statistic can be compared with critical values
from the standard normal distribution. Typically for n<120, people
will refer to the t distribution. For values of n>120, the t and
standard normal distributions become so close to each other, that
one can simply refer to the standard normal distribution for critical
values.
53
How Confidence Intervals and
Hypothesis Testing are Related
!   Consider a (two-sided) 95% confidence interval about a
sample mean
 
If the null is not contained in the confidence interval constructed
around the sample mean, then we can reject the null in favour of
the alternative
H1 : µ ≠ µ0
at the 5% significance level.
  If the null is contained in the confidence interval, then we fail to
reject the null in favour of the alternative
H1 : µ ≠ µ0
54
Economic versus Statistical Significance
!   Statistical significance is not all that matters in the analysis
that we do.
 
 
 
We may estimate the effect of education on earnings and
determine that the null hypothesis is rejected at the 5%
significance level
But suppose the confidence interval that we construct suggests that
an extra year of education raises annual earnings by at most $10. In
practical terms, this is as good as education having no effect on
earnings.
A common mistake of econometrics students is to fixate on
statistical significance. If you spend all your time staring at the
results of t-tests, you may forget to consider the magnitude of
effects that you’re measuring. It’s important to know not only
whether a measured effect is statistically significant, but whether it
is big or small--that is, is it economically (or practically)
significant?
55