Lecture 3
Ordinary Least Squares: Assumptions, Confidence Intervals, and Statistical Significance
Sampling Terminology

Parameter
- a fixed, unknown number that describes the population

Statistic
- a known value calculated from a sample
- a statistic is often used to estimate a parameter

Variability
- different samples from the same population may yield different values of the sample statistic

Sampling Distribution
- tells what values a statistic takes and how often it takes those values in repeated sampling
Parameter vs. Statistic
A properly chosen sample of 1600 people
across the United States was asked if they
regularly watch a certain television program,
and 24% said yes. The parameter of
interest here is the true proportion of all
people in the U.S. who watch the program,
while the statistic is the value 24% obtained
from the sample of 1600 people.
Parameter vs. Statistic

The mean of a population is denoted by µ – this is a parameter.
The mean of a sample is denoted by x̄ – this is a statistic. x̄ is used to estimate µ.

The true proportion of a population with a certain trait is denoted by p – this is a parameter.
The proportion of a sample with a certain trait is denoted by p̂ (“p-hat”) – this is a statistic. p̂ is used to estimate p.
The Law of Large Numbers

Consider sampling at random from a population with true mean µ. As the number of (independent) observations sampled increases, the mean of the sample gets closer and closer to the true mean of the population. (x̄ gets closer to µ.)
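As an aside not on the slides, here is a minimal Python sketch of the law in action; the Normal(µ = 25, σ = 7) population is borrowed from the wine case study below purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 25, 7                      # illustrative population values
draws = rng.normal(mu, sigma, size=100_000)

# Running mean after 1, 2, 3, ... observations
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)
print(running_mean[[9, 99, 999, 99_999]])   # drifts toward µ = 25 as n grows
```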
The Law of Large Numbers
Gambling

The “house” in a gambling operation is not gambling at all:
- the games are defined so that the gambler has a negative expected gain per play (the true mean gain after all possible plays is negative)
- each play is independent of previous plays, so the law of large numbers guarantees that the average winnings of a large number of customers will be close to the (negative) true average
Figure 10.1: Odor Threshold
Sampling Distribution

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size (n) from the same population.
- to describe a distribution we need to specify the shape, center, and spread
- we will discuss the distribution of the sample mean (x-bar)
Case Study
Does This Wine Smell Bad?
Dimethyl sulfide (DMS) is sometimes present
in wine, causing “off-odors”. Winemakers
want to know the odor threshold – the lowest
concentration of DMS that the human nose
can detect. Different people have different
thresholds, and of interest is the mean
threshold in the population of all adults.
Case Study
Does This Wine Smell Bad?

Suppose the mean threshold of all adults is µ = 25 micrograms of DMS per liter of wine, with a standard deviation of σ = 7 micrograms per liter, and the threshold values follow a bell-shaped (normal) curve. Assume we KNOW THE VARIANCE!!!
Where should 95% of all individual threshold values fall?

- mean plus or minus about two standard deviations:
  25 − 2(7) = 11
  25 + 2(7) = 39
- 95% should fall between 11 and 39

What about the mean (average) of a sample of n adults? What values would be expected?
Sampling Distribution

What about the mean (average) of a sample of n adults? What values would be expected?

Answer this by thinking: “What would happen if we took many samples of n subjects from this population?” (let’s say that n = 10 subjects make up a sample)
- take a large number of samples of n = 10 subjects from the population
- calculate the sample mean (x-bar) for each sample
- make a histogram of the values of x-bar
- examine the graphical display for shape, center, spread
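A minimal Python sketch of this recipe, using the case-study population N(µ = 25, σ = 7):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 25, 7, 10, 1000       # case-study population; 1000 samples of n = 10

# Each row is one sample of 10 subjects; take the mean of every row
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(xbars.mean())                        # center: close to µ = 25
print(xbars.std())                         # spread: close to σ/√n = 7/√10 ≈ 2.21
```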
Case Study
Does This Wine Smell Bad?

Mean threshold of all adults is µ = 25 micrograms per liter, with a standard deviation of σ = 7 micrograms per liter, and the threshold values follow a bell-shaped (normal) curve. Many (1000) repetitions of sampling n = 10 adults from the population were simulated, and the resulting histogram of the 1000 x-bar values is on the next slide.

(Figure: histogram of the 1000 simulated x-bar values)
Mean and Standard Deviation of Sample Means

If numerous samples of size n are taken from a population with mean µ and standard deviation σ, then the mean of the sampling distribution of x̄ is µ (the population mean) and its standard deviation is σ/√n (σ is the population s.d.).
Mean and Standard Deviation of Sample Means

Since the mean of x̄ is µ, we say that x̄ is an unbiased estimator of µ.

Individual observations have standard deviation σ, but sample means x̄ from samples of size n have standard deviation σ/√n. Averages are less variable than individual observations.
Sampling Distribution of Sample Means

If individual observations have the N(µ, σ) distribution, then the sample mean x̄ of n independent observations has the N(µ, σ/√n) distribution. (Note: σ is KNOWN.)

“If measurements in the population follow a Normal distribution, then so does the sample mean.”
Case Study
Does This Wine Smell Bad?

Mean threshold of all adults is µ = 25 with a standard deviation of σ = 7, and the threshold values follow a bell-shaped (normal) curve. (Population distribution)
Central Limit Theorem

If a random sample of size n is selected from ANY population with mean µ and standard deviation σ, then when n is “large” the sampling distribution of the sample mean x̄ is approximately Normal:

x̄ is approximately N(µ, σ/√n)

“No matter what distribution the population values follow, the sample mean will follow a Normal distribution if the sample size is large.”
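As an illustration not taken from the slides, a quick Python check using a decidedly non-Normal (exponential) population:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 10_000
# Exponential(1) population: skewed, far from Normal, with µ = σ = 1
xbars = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

print(xbars.mean(), xbars.std(), 1 / np.sqrt(n))  # mean ≈ µ; s.d. ≈ σ/√n ≈ 0.183
# A histogram of xbars already looks close to bell-shaped at n = 30.
```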
Central Limit Theorem: Sample Size

How large must n be for the CLT to hold?
- depends on how far the population distribution is from Normal
  - the further from Normal, the larger the sample size needed
- a sample size of 25 or 30 is typically large enough for any population distribution encountered in practice
- recall: if the population is Normal, any sample size will work (n ≥ 1)
Central Limit Theorem: Sample Size and Distribution of x-bar

(Figure: sampling distribution of x̄ for n = 1, 2, 10, and 25)
Statistical Inference

Provides methods for drawing conclusions about a population from sample data
- Confidence Intervals
  - What is the population mean?
- Tests of Significance
  - Is the population mean larger than 66.5?
    - This would be ONE-SIDED
Inference about a Mean
Simple Conditions (will be relaxed)

1. SRS from the population of interest
2. Variable has a Normal distribution N(µ, σ) in the population
3. Although the value of µ is unknown, the value of the population standard deviation σ is known
Confidence Interval

A level C confidence interval has two parts:
1. An interval calculated from the data, usually of the form: estimate ± margin of error
2. The confidence level C, which is the probability that the interval will capture the true parameter value in repeated samples; that is, C is the success rate for the method.
Case Study
NAEP Quantitative Scores
Case Study
NAEP Quantitative Scores

4. The 68-95-99.7 rule indicates that x̄ and µ are within two standard deviations (4.2) of each other in about 95% of all samples.

x̄ − 4.2 = 272 − 4.2 = 267.8
x̄ + 4.2 = 272 + 4.2 = 276.2
Case Study
NAEP Quantitative Scores

So, if we estimate that µ lies within 4.2 of x̄, we’ll be right about 95% of the time.
Confidence Interval
Mean of a Normal Population

Take an SRS of size n from a Normal population with unknown mean µ and known std dev. σ. A level C confidence interval for µ is:

x̄ ± z* σ/√n
Confidence Interval
Mean of a Normal Population
LOOKING FOR z*
Case Study
NAEP Quantitative Scores

Using the 68-95-99.7 rule gave an approximate 95% confidence interval. A more precise 95% confidence interval can be found using the appropriate value of z* (1.960) with the previous formula. (We will show how to find this in Table B.2 in the next lecture.)

x̄ − (1.960)(2.1) = 272 − 4.116 = 267.884
x̄ + (1.960)(2.1) = 272 + 4.116 = 276.116

We are 95% confident that the average NAEP quantitative score for all adult males is between 267.884 and 276.116.
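A quick check of this interval in Python; SciPy's norm.ppf plays the role of the z* table, and 272 and 2.1 are the sample mean and σ/√n from the example:

```python
from scipy import stats

xbar, se, C = 272, 2.1, 0.95                 # sample mean, σ/√n, confidence level
zstar = stats.norm.ppf(1 - (1 - C) / 2)      # 1.959964... ≈ 1.960
print(xbar - zstar * se, xbar + zstar * se)  # ≈ (267.88, 276.12)
```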
But the sampling distribution is narrower than the population distribution, by a factor of √n.

(Figure: population of individual subjects x, versus sample means x̄ from samples of n subjects, with spread σ/√n)

Thus, the estimates x̄ gained from our samples are always relatively close to the population parameter µ.

If the population is normally distributed N(µ, σ), so will be the sampling distribution, N(µ, σ/√n).
Ninety-five percent of all sample means will be within roughly 2 standard deviations (2·σ/√n) of the population parameter µ.

Because distances are symmetrical, this implies that the population parameter µ must be within roughly 2 standard deviations of the sample average x̄, in 95% of all samples.

This reasoning is the essence of statistical inference.

(Figure; red dot: mean value of individual sample)
Summary: Confidence Interval for
the Population Mean
Hypothesis Testing

- Start by explaining when σ is known
- Moving to unknown σ should then be straightforward
Stating Hypotheses
Null Hypothesis, H0

- The statement being tested in a statistical test is called the null hypothesis.
- The test is designed to assess the strength of evidence against the null hypothesis.
- Usually the null hypothesis is a statement of “no effect” or “no difference”, or it is a statement of equality.
- When performing a hypothesis test, we assume that the null hypothesis is true until we have sufficient evidence against it.
Stating Hypotheses
Alternative Hypothesis, Ha

- The statement we are trying to find evidence for is called the alternative hypothesis.
- Usually the alternative hypothesis is a statement of “there is an effect” or “there is a difference”, or it is a statement of inequality.
- The alternative hypothesis should express the hopes or suspicions we bring to the data. It is cheating to first look at the data and then frame Ha to fit what the data show.
One-sided and two-sided tests

A two-tail or two-sided test of the population mean has these null and alternative hypotheses:
H0: µ = [a specific number]  Ha: µ ≠ [a specific number]

A one-tail or one-sided test of a population mean has these null and alternative hypotheses:
H0: µ = [a specific number]  Ha: µ < [a specific number]
OR
H0: µ = [a specific number]  Ha: µ > [a specific number]

The FDA tests whether a generic drug has an absorption extent similar to the known absorption extent of the brand-name drug it is copying. Higher or lower absorption would both be problematic, thus we test:
H0: µgeneric = µbrand  Ha: µgeneric ≠ µbrand  (two-sided)
The P-value

The packaging process has a known standard deviation σ = 5 g.
H0: µ = 227 g versus Ha: µ ≠ 227 g
The average weight from your four random boxes is 222 g.
What is the probability of drawing a random sample such as yours if H0 is true?

Tests of statistical significance quantify the chance of obtaining a particular random sample result if the null hypothesis were true. This quantity is the P-value. This is a way of assessing the “believability” of the null hypothesis given the evidence provided by a random sample.
Interpreting a P-value

Could random variation alone account for the difference between the null hypothesis and observations from a random sample?

- A small P-value implies that random variation because of the sampling process alone is not likely to account for the observed difference.
- With a small P-value, we reject H0. The true property of the population is significantly different from what was stated in H0.

Thus small P-values are strong evidence AGAINST H0. But how small is small...?
(Figure: shaded tail areas for P = 0.2758, 0.1711, 0.0892, 0.0735, 0.05, and 0.01 – at what point does the P-value become significant?)

When the shaded area becomes very small, the probability of drawing such a sample at random gets very slim. Oftentimes, a P-value of 0.05 or less is considered significant: the phenomenon observed is unlikely to be entirely due to chance from the random sampling.
The significance level α

The significance level, α, is the largest P-value tolerated for rejecting a true null hypothesis (how much evidence against H0 we require). This value is decided arbitrarily before conducting the test.

- If the P-value is equal to or less than α (P ≤ α), then we reject H0.
- If the P-value is greater than α (P > α), then we fail to reject H0.

Does the packaging machine need revision? Two-sided test. The P-value is 4.56%.
- If α had been set to 5%, then the P-value would be significant.
- If α had been set to 1%, then the P-value would not be significant.
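The packaging-example P-value can be reproduced in Python from the numbers on the earlier slide (σ = 5, n = 4, x̄ = 222, µ0 = 227):

```python
import numpy as np
from scipy import stats

mu0, sigma, n, xbar = 227, 5, 4, 222
z = (xbar - mu0) / (sigma / np.sqrt(n))   # z = -2.0
p = 2 * stats.norm.sf(abs(z))             # two-sided P-value
print(z, p)                               # -2.0, 0.0455 – the ~4.56% quoted above
```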
Implications

THE WHOLE POINT OF THIS: We don’t need to take lots of random samples to “rebuild” the sampling distribution and find µ at its center. All we need is one SRS of size n; we rely on the properties of the distribution of sample means to infer the population mean µ.
If σ is Estimated

- Usually we do not know σ. So when it is estimated, we have to use the t-distribution, which is based on sample size.
- When estimating σ using σ̂, as the sample size increases the t-distribution approaches the normal curve.
Conditions for Inference about a Mean

- Data are from a SRS of size n.
- Population has a Normal distribution with mean µ and standard deviation σ.
- Both µ and σ are usually unknown.
  - We use inference to estimate µ.
  - Problem: σ unknown means we cannot use the z procedures previously learned.
Standard Error

- When we do not know the population standard deviation σ (which is usually the case), we must estimate it with the sample standard deviation s.
- When the standard deviation of a statistic is estimated from data, the result is called the standard error of the statistic.
- The standard error of the sample mean x̄ is s/√n.
One-Sample t Statistic

When we estimate σ with s, our one-sample z statistic becomes a one-sample t statistic:

z = (x̄ − µ0) / (σ/√n)   →   t = (x̄ − µ0) / (s/√n)

By changing the denominator to be the standard error, our statistic no longer follows a Normal distribution. The t test statistic follows a t distribution with k = n − 1 degrees of freedom.
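A sketch of the computation in Python on a made-up sample (the eight values are hypothetical, not lecture data); SciPy's built-in ttest_1samp confirms the hand calculation:

```python
import numpy as np
from scipy import stats

data = np.array([24.5, 31.0, 28.2, 26.4, 22.8, 27.5, 30.1, 25.9])  # hypothetical data
mu0 = 25.0

t = (data.mean() - mu0) / (data.std(ddof=1) / np.sqrt(len(data)))  # one-sample t statistic
p = 2 * stats.t.sf(abs(t), df=len(data) - 1)                       # two-sided P, k = n-1 df
print(t, p)

print(stats.ttest_1samp(data, mu0))                                # same t and P-value
```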
The t Distributions

- The t density curve is similar in shape to the standard Normal curve. They are both symmetric about 0 and bell-shaped.
- The spread of the t distributions is a bit greater than that of the standard Normal curve (i.e., the t curve is slightly “fatter”).
- As the degrees of freedom k increase, the t(k) density curve approaches the N(0, 1) curve more closely. This is because s estimates σ more accurately as the sample size increases.

The t Distributions
(Figure: t density curves compared with the N(0, 1) curve)
Critical Values from the T-Distribution
(Figure: table of critical values)
How do we find specific t or z* values?

We can use a table of z/t values (Table B.2). For a particular confidence level C, the appropriate t or z* value is found just below it in the table, given the sample size. Look up α = 1 − C; if you want 98%, look up .02, two-tailed.
Ex. For a 98% confidence level, z* = t = 2.326.

We can use software. In Excel, when n is large or σ is known:
=NORMINV(probability, mean, standard_dev)
gives z for a given cumulative probability. Since we want the middle C probability, the probability we require is (1 − C)/2.
Example: For a 98% confidence level (NOTE: this is now 1% on each side):
=NORMINV(0.01, 0, 1) = −2.32635 (= negative z*)
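Outside Excel, the same lookup can be sketched with SciPy, whose norm.ppf is the inverse-CDF analogue of NORMINV:

```python
from scipy import stats

C = 0.98
zstar = stats.norm.ppf((1 - C) / 2)   # like =NORMINV(0.01, 0, 1)
print(zstar)                          # -2.32635 (the negative z*)
```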
Excel

TDIST(x, degrees_freedom, tails)
TDIST = P(X > x), where X is a random variable that follows the t distribution (x positive). Use this function in place of a table of critical values for the t distribution, or to obtain the P-value for a calculated, positive t-value.
- x is the standardized numeric value at which to evaluate the distribution (“t”).
- Degrees_freedom is an integer indicating the number of degrees of freedom.
- Tails specifies the number of distribution tails to return. If tails = 1, TDIST returns the one-tailed P-value. If tails = 2, TDIST returns the two-tailed P-value.

TINV(probability, degrees_freedom)
Returns the t-value of the Student’s t-distribution as a function of the probability and the degrees of freedom (for example, t*).
- Probability is the probability associated with the two-tailed Student’s t distribution.
- Degrees_freedom is the number of degrees of freedom characterizing the distribution.
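Sketches of SciPy analogues for these two Excel functions:

```python
from scipy import stats

# TDIST(2.0, 10, 1) and TDIST(2.0, 10, 2):
print(stats.t.sf(2.0, df=10))             # one-tailed P(X > 2.0)
print(2 * stats.t.sf(2.0, df=10))         # two-tailed P-value

# TINV(0.05, 10): critical t* for a two-tailed probability of 0.05
print(stats.t.ppf(1 - 0.05 / 2, df=10))   # ≈ 2.228
```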
Sampling Distribution of β̂0 and β̂1 Based on Simulation

Assume the relationship between grades and hours studied for an entire population of students in an econometrics class looks like this:

(Figure) The upward-sloping line suggests that more studying results in higher grades. The equation for the line is E(Grades) = 50 + 2 × Hours, suggesting that if a person spent 10 hours studying their grade would be 70 points.
Sampling Distribution of β̂0 and β̂1 Based on Simulation

The more typical situation involves a sample, not a population. Our goal is to learn about a population’s slope and intercept via sample data. A plot of the trend line from a random sample of 20 observations from the econometrics student population looks like this:

(Figure) The random sample has an intercept of 55 and a slope of 1.5, while the population’s intercept was 50 with a slope of 2.
Sampling Distribution of β̂0 and β̂1 Based on Simulation

- Additional random samples with 20 observations each result in different slopes and intercepts.
- If a computer calculated all the possible slope estimates (with the same size random sample n), we could graph the distribution of possible values
  - then use it to conduct confidence intervals and hypothesis tests.
Sampling Distribution of β̂0 and β̂1 Based on Simulation

The following graph represents 10,000 samples of 20 observations from the population.

(Figure) Observations:
- Roughly centered around 2 (the population’s slope)
- Has a standard deviation of 0.55
- Appears to be normally distributed
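A sketch of this simulation in Python. The slides do not state the error distribution or the distribution of study hours, so both are assumed here for illustration; the simulated slopes center on the population slope of 2, while their spread depends on those assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 20, 10_000
beta0, beta1 = 50.0, 2.0                              # population line: E(Grades) = 50 + 2*Hours

slopes = np.empty(reps)
for r in range(reps):
    hours = rng.uniform(0, 20, n)                     # assumed distribution of study hours
    grades = beta0 + beta1 * hours + rng.normal(0, 15, n)  # assumed error s.d. (illustrative)
    slopes[r] = np.polyfit(hours, grades, 1)[0]       # OLS slope estimate for this sample

print(slopes.mean(), slopes.std())                    # centered near 2; roughly Normal spread
```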
Sampling Distribution of β̂0 and β̂1 Based on Simulation

Based on this simulation we can make the following statements:
- Estimated slopes and intercepts are random variables
  - their value depends on the random sample gathered
- The mean of the different estimates is equal to the population value
- The distribution of the estimators is approximately normal... this is a huge implication.
The Linearity of the OLS Estimators

A linear estimator satisfies the condition that it is a linear combination of the dependent variable. The estimator for the population slope is

β̂1 = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²

known as the ordinary least squares (OLS) estimator.
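A minimal Python sketch of this formula on made-up numbers; rewriting the estimator as β̂1 = Σ wᵢyᵢ with weights wᵢ = (xᵢ − x̄)/Σⱼ(xⱼ − x̄)² makes the linearity in the dependent variable explicit:

```python
import numpy as np

def ols_slope(x, y):
    """OLS slope for a simple regression: sum((x - x̄)(y - ȳ)) / sum((x - x̄)²)."""
    xd = x - x.mean()
    return np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])     # hypothetical hours
y = np.array([55.0, 58.0, 61.0, 67.0, 70.0])  # hypothetical grades

w = (x - x.mean()) / np.sum((x - x.mean()) ** 2)   # weights that do not involve y
print(ols_slope(x, y), np.sum(w * y))              # identical: the estimator is linear in y
```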
The Variance of the OLS Estimator

The variance of the OLS slope estimator describes the dispersion in the distribution of OLS estimates around its mean.
- Var(β̂1) is smaller if the variance of Y is smaller
  - the smaller the variance of Y, the less likely we are to observe extreme samples
Hypothesis Testing

- Hypothesis tests are conducted analogously to those concerning population means or proportions.
- Suppose someone alleges the population slope equals a, and the alternative hypothesis is that β1 does not equal a.
- Formally, the null and alternative hypotheses are

H0: β1 = a versus Ha: β1 ≠ a
Hypothesis Testing

- The farther β̂1 is from a, the more plausible the alternative hypothesis.
- This is formalized via the T-statistic:

T = (β̂1 − a) / se(β̂1)

- T represents the number of standard deviations the sample slope is from the slope hypothesized under the null.
  - The larger this number, the more plausible the alternative hypothesis.
  - We would expect to observe a sample slope that deviates from the true slope by more than 1.96 standard deviations at most 5% of the time.
Hypothesis Testing

- We would decide in favor of the alternative when |T| > t*, where t* is the critical value for the desired significance level α.
- If the sample is used to estimate the standard deviation of the slope, sβ̂, use the t distribution rather than the normal distribution to determine critical values.
- To test one-sided alternatives, proceed in a similar fashion to that used when testing hypotheses for means and proportions.
Hypothesis Testing

Alternative approach:
- Calculate p-values rather than comparing a z- or t-statistic to the relevant critical values.
- Start with the z- or t-statistic and calculate the probability of observing a value of β̂1 that extreme or larger.
- With the z-statistic, use the relationship between |T| and the critical values to estimate the corresponding α (the P-value).
The Multiple Regression Model

The multiple regression model has the following assumptions:
- The dependent variable is a linear function of the explanatory variables
- The errors have a mean of zero
- The errors have a constant variance
- The errors are uncorrelated across observations
- The error term is not correlated with any of the explanatory variables
- The errors are drawn from a normal distribution
- No explanatory variable is an exact linear function of other explanatory variables (important with dummy variables)
Interpretation of the Regression Coefficients

The value of the dependent variable will change by βj units with a one-unit change in explanatory variable j, holding everything else constant (ceteris paribus).
MLR Assumption 1: Linear in Parameters

- MLR.1 defines the POPULATION model.
- The dependent variable y is related to the independent variables x and the error (or disturbance) u.

Assumption MLR.1:
y = β0 + β1x1 + β2x2 + ... + βkxk + u

- β0, β1, β2, ..., βk are the unknown population parameters.
- u is an unobservable random error.
MLR Assumption 2: Random Sampling

- Use a random sample of size n, {(xi, yi): i = 1, 2, ..., n}, from the population model.
- Allows redefinition of MLR.1: we want to use DATA to estimate our parameters.
- All of β0, β1, ..., βk are population parameters to be estimated.

Assumption MLR.2:
{(xi, yi): i = 1, 2, ..., n}
yi = β0 + β1xi1 + β2xi2 + ... + βkxik + ui,  i = 1, 2, ..., n
MLR Assumption 3: No Perfect Collinearity

- In the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables.
- With perfect collinearity, there is no way to get a ceteris paribus relationship.

Example of a linear relationship, where spendA + spendB = totspend:
voteA = β0 + β1·spendA + β2·spendB + β3·totspend + u
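A small Python sketch of why this fails, using the slide's variable names on simulated spending data: with spendA + spendB = totspend, the design matrix loses full rank, so OLS has no unique solution:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
spendA = rng.uniform(0, 100, n)
spendB = rng.uniform(0, 100, n)
totspend = spendA + spendB                       # exact linear relationship

X = np.column_stack([np.ones(n), spendA, spendB, totspend])
print(np.linalg.matrix_rank(X))                  # 3, not 4: X'X is singular, no unique OLS solution
```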
MLR Assumption 4: Zero Conditional Mean

For a random sample, the implication is that NO independent variable is correlated with ANY unobservable (remember, the error includes unobservable data).

Assumption MLR.4:
E[u | x1, x2, ..., xk] = 0

For a RANDOM sample:
E[ui | xi] = 0 for all i = 1, 2, ..., n
Regression Estimators

- As I have repeatedly said, in the multiple regression case we cannot use the same methods for calculating our estimates as before. We MUST control for the correlation (or relationship) between the different X variables.
- To get the values of our estimators of beta, we are actually regressing each X variable against ALL OTHER X variables first... Y is not involved in that calculation.
- Each beta estimated with this method CONTROLS for the other x’s when being calculated.
Regression Estimators

Estimator for β1 in the MLR case:

β̂1 = ( Σᵢ₌₁ⁿ r̂ᵢ₁ yᵢ ) / ( Σᵢ₌₁ⁿ r̂ᵢ₁² )

where r̂ᵢ₁ are the residuals from the regression of x1 on x2, x3, x4, and so on...
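A sketch of this partialling-out recipe in Python on simulated data (the coefficients and variable names are illustrative). The β̂1 from the full multiple regression matches the residual formula above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)       # x1 correlated with the other x's
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

# Full multiple regression
X = np.column_stack([np.ones(n), x1, x2, x3])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Partialling out: regress x1 on the other regressors, keep the residuals r̂1
Z = np.column_stack([np.ones(n), x2, x3])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

beta1_partial = (r1 @ y) / (r1 @ r1)                # slide's formula: Σ r̂ᵢ₁yᵢ / Σ r̂ᵢ₁²
print(beta_full[1], beta1_partial)                  # the two β̂1 values agree
```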