Lecture 3
Ordinary Least Squares
Assumptions, Confidence
Intervals, and Statistical
Significance
1
Sampling Terminology
Parameter: a fixed, unknown number that describes the population.
Statistic: a known value calculated from a sample; a statistic is often used to estimate a parameter.
Variability: different samples from the same population may yield different values of the sample statistic.
Sampling Distribution: tells what values a statistic takes and how often it takes those values in repeated sampling.
2
Parameter vs. Statistic
A properly chosen sample of 1600 people
across the United States was asked if they
regularly watch a certain television program,
and 24% said yes. The parameter of
interest here is the true proportion of all
people in the U.S. who watch the program,
while the statistic is the value 24% obtained
from the sample of 1600 people.
3
Parameter vs. Statistic
The mean of a population is denoted by µ – this is a parameter.
The mean of a sample is denoted by x̄ – this is a statistic. x̄ is used to estimate µ.
The true proportion of a population with a certain trait is denoted by p – this is a parameter.
The proportion of a sample with a certain trait is denoted by p̂ (“p-hat”) – this is a statistic. p̂ is used to estimate p.
4
The Law of Large Numbers
Consider sampling at random from a
population with true mean µ. As the
number of (independent)
observations sampled increases, the
mean of the sample gets closer and
closer to the true mean of the
population. (x̄ gets closer to µ)
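A quick simulation sketch makes the law concrete (the population values 25 and 7 are borrowed from the wine case study later in this lecture; the Normal population is an assumption for illustration):

```python
import random

random.seed(1)
mu, sigma = 25, 7  # hypothetical Normal(25, 7) population

draws = [random.gauss(mu, sigma) for _ in range(100_000)]

# The running sample mean drifts toward mu as n increases
for n in (10, 100, 10_000, 100_000):
    xbar = sum(draws[:n]) / n
    print(f"n={n:>6}  x-bar={xbar:.3f}")
```

With only 10 observations the sample mean can be far from 25; with 100,000 it is essentially on top of it.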
5
The Law of Large Numbers
Gambling
The “house” in a gambling operation is not gambling at all:
the games are defined so that the gambler has a negative expected gain per play (the true mean gain after all possible plays is negative)
each play is independent of previous plays, so the law of large numbers guarantees that the average winnings of a large number of customers will be close to the (negative) true average
6
Figure 10.1: Odor Threshold
7
Sampling Distribution
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size (n) from the same population.
To describe a distribution we need to specify the shape, center, and spread.
We will discuss the distribution of the sample mean (x-bar).
8
Case Study
Does This Wine Smell Bad?
Dimethyl sulfide (DMS) is sometimes present
in wine, causing “off-odors”. Winemakers
want to know the odor threshold – the lowest
concentration of DMS that the human nose
can detect. Different people have different
thresholds, and of interest is the mean
threshold in the population of all adults.
9
Case Study
Does This Wine Smell Bad?
Suppose the mean threshold of all adults is µ = 25 micrograms of DMS per liter of wine, with a standard deviation of σ = 7 micrograms per liter, and the threshold values follow a bell-shaped (normal) curve. Assume we KNOW THE VARIANCE!!!
10
Where should 95% of all individual
threshold values fall?
mean plus or minus about two standard deviations:
25 − 2(7) = 11
25 + 2(7) = 39
95% should fall between 11 & 39
What about the mean (average) of a sample of
n adults? What values would be expected?
11
Sampling Distribution
What about the mean (average) of a sample of
n adults? What values would be expected?
Answer this by thinking: “What would happen if we took many samples of n subjects from this population?” (let’s say that n = 10 subjects make up a sample)
take a large number of samples of n = 10 subjects from the population
calculate the sample mean (x-bar) for each sample
make a histogram of the values of x-bar
examine the graphical display for shape, center, spread
12
Case Study
Does This Wine Smell Bad?
Mean threshold of all adults is µ = 25 micrograms per liter, with a standard deviation of σ = 7 micrograms per liter, and the threshold values follow a bell-shaped (normal) curve.
Many (1000) repetitions of sampling n=10
adults from the population were simulated
and the resulting histogram of the 1000
x-bar values is on the next slide.
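That simulation is easy to reproduce; here is a sketch in Python (using NumPy, an assumption — the lecture used its own software):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 25, 7, 10, 1000

# 1000 samples of n = 10 thresholds; one x-bar per sample
samples = rng.normal(mu, sigma, size=(reps, n))
xbars = samples.mean(axis=1)

print("center of x-bar values:", xbars.mean())        # near 25
print("spread of x-bar values:", xbars.std(ddof=1))   # near 7/sqrt(10) ≈ 2.21
```

The histogram of `xbars` is the simulated sampling distribution shown on the next slide: centered at µ but much narrower than the population.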
13
Case Study
Does This Wine Smell Bad?
14
Mean and Standard Deviation of
Sample Means
If numerous samples of size n are taken from a population with mean µ and standard deviation σ, then the mean of the sampling distribution of X̄ is µ (the population mean) and the standard deviation is σ/√n (σ is the population s.d.).
15
Mean and Standard Deviation of
Sample Means
Since the mean of X̄ is µ, we say that X̄ is an unbiased estimator of µ.
Individual observations have standard deviation σ, but sample means X̄ from samples of size n have standard deviation σ/√n. Averages are less variable than individual observations.
16
Sampling Distribution of
Sample Means
If individual observations have the N(µ, σ) distribution, then the sample mean X̄ of n independent observations has the N(µ, σ/√n) distribution. (Note: σ is KNOWN.)
“If measurements in the population follow a
Normal distribution, then so does the sample
mean.”
17
Case Study
Does This Wine Smell Bad?
Mean threshold of all adults is µ = 25 with a standard deviation of σ = 7, and the threshold values follow a bell-shaped (normal) curve.
(Population distribution)
18
19
Central Limit Theorem
If a random sample of size n is selected from ANY population with mean µ and standard deviation σ, then when n is “large” the sampling distribution of the sample mean X̄ is approximately Normal:
X̄ is approximately N(µ, σ/√n)
“No matter what distribution the population
values follow, the sample mean will follow a
Normal distribution if the sample size is large.”
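The claim is easy to check by simulation with a decidedly non-Normal population; a sketch (the exponential population below is an assumption chosen for illustration):

```python
import random
import statistics

random.seed(2)
mu = 1.0  # exponential population with mean 1 (strongly right-skewed)
n, reps = 30, 5000

# 5000 sample means, each from n = 30 exponential draws
xbars = [statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
         for _ in range(reps)]

# CLT: x-bar is roughly Normal with mean mu and s.d. sigma/sqrt(n).
# For an exponential, sigma = mu, so sigma/sqrt(n) = 1/sqrt(30) ≈ 0.183.
print(statistics.fmean(xbars), statistics.stdev(xbars))
```

A histogram of `xbars` comes out symmetric and bell-shaped even though every individual draw is skewed.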
20
Central Limit Theorem:
Sample Size
How large must n be for the CLT to hold?
It depends on how far the population distribution is from Normal:
the further from Normal, the larger the sample size needed
a sample size of 25 or 30 is typically large enough for any population distribution encountered in practice
recall: if the population is Normal, any sample size will work (n ≥ 1)
21
Central Limit Theorem:
Sample Size and Distribution of x-bar
(Panels show the sampling distribution of x-bar for n = 1, 2, 10, and 25.)
22
Statistical Inference
Provides methods for drawing
conclusions about a population from
sample data
Confidence Intervals
What is the population mean?
Tests of Significance
Is the population mean larger than 66.5? This would be ONE-SIDED.
23
Inference about a Mean
Simple Conditions (will be relaxed):
1. SRS from the population of interest
2. Variable has a Normal distribution N(µ, σ) in the population
3. Although the value of µ is unknown, the value of the population standard deviation σ is known
24
Confidence Interval
A level C confidence interval has two parts
1. An interval calculated from the data,
usually of the form:
estimate ± margin of error
2.
The confidence level C, which is the
probability that the interval will capture the
true parameter value in repeated samples;
that is, C is the success rate for the
method.
25
Case Study
NAEP Quantitative Scores
26
Case Study
NAEP Quantitative Scores
The 68-95-99.7 rule indicates that x̄ and µ are within two standard deviations (4.2) of each other in about 95% of all samples.
x̄ − 4.2 = 272 − 4.2 = 267.8
x̄ + 4.2 = 272 + 4.2 = 276.2
27
Case Study
NAEP Quantitative Scores
So, if we estimate that µ lies within 4.2 of x̄, we’ll be right about 95% of the time.
28
Confidence Interval
Mean of a Normal Population
Take an SRS of size n from a Normal population with unknown mean µ and known std dev. σ. A level C confidence interval for µ is:
x̄ ± z* σ/√n
29
Confidence Interval
Mean of a Normal Population
LOOKING FOR z*
30
Case Study
NAEP Quantitative Scores
Using the 68-95-99.7 rule gave an approximate 95% confidence interval. A more precise 95% confidence interval can be found using the appropriate value of z* (1.960) with the previous formula. (How to find this in Table B.2 is shown in the next lecture.)
x̄ − (1.960)(2.1) = 272 − 4.116 = 267.884
x̄ + (1.960)(2.1) = 272 + 4.116 = 276.116
We are 95% confident that the average NAEP
quantitative score for all adult males is between
267.884 and 276.116.
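The arithmetic above can be reproduced directly; a sketch using the slide’s numbers (x̄ = 272, σ/√n = 2.1, z* = 1.960):

```python
xbar = 272.0
se = 2.1        # sigma / sqrt(n), as given on the slide
zstar = 1.960   # 95% critical value

lower = xbar - zstar * se
upper = xbar + zstar * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # (267.884, 276.116)
```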
31
But the sampling distribution is narrower than the population distribution, by a factor of √n. Thus, the estimates x̄ gained from our samples are always relatively close to the population parameter µ.
(Figure: sample means from n subjects plotted above the population of individual subjects)
If the population is normally distributed N(µ, σ), so will be the sampling distribution N(µ, σ/√n).
32
Ninety-five percent of all sample means will be within roughly 2 standard deviations (2σ/√n) of the population parameter µ.
Because distances are symmetrical, this implies that the population parameter µ must be within roughly 2 standard deviations of the sample average x̄, in 95% of all samples.
This reasoning is the essence of statistical inference.
(Figure: red dot marks the mean value of each individual sample)
33
Summary: Confidence Interval for
the Population Mean
34
Hypothesis Testing
Start by explaining when σ is known.
Moving to unknown σ should then be straightforward.
35
Stating Hypotheses
Null Hypothesis, H0
The statement being tested in a statistical test
is called the null hypothesis.
The test is designed to assess the strength of
evidence against the null hypothesis.
Usually the null hypothesis is a statement of
“no effect” or “no difference”, or it is a statement
of equality.
When performing a hypothesis test, we
assume that the null hypothesis is true until
we have sufficient evidence against it.
36
Stating Hypotheses
Alternative Hypothesis, Ha
The statement we are trying to find evidence for
is called the alternative hypothesis.
Usually the alternative hypothesis is a
statement of “there is an effect” or “there is a
difference”, or it is a statement of inequality.
The alternative hypothesis should express
the hopes or suspicions we bring to the
data. It is cheating to first look at the data
and then frame Ha to fit what the data show.
37
One-sided and two-sided tests
A two-tail or two-sided test of the population mean has these null and alternative hypotheses:
H0: µ = [a specific number]  Ha: µ ≠ [a specific number]
A one-tail or one-sided test of a population mean has these null and alternative hypotheses:
H0: µ = [a specific number]  Ha: µ < [a specific number]
OR
H0: µ = [a specific number]  Ha: µ > [a specific number]
The FDA tests whether a generic drug has an absorption extent similar to the known absorption extent of the brand-name drug it is copying. Higher or lower absorption would both be problematic, thus we test:
H0: µgeneric = µbrand  Ha: µgeneric ≠ µbrand (two-sided)
38
The P-value
The packaging process has a known standard deviation σ = 5 g.
H0: µ = 227 g versus Ha: µ ≠ 227 g
The average weight from your four random boxes is 222 g.
What is the probability of drawing a random sample such as yours if H0 is true?
Tests of statistical significance quantify the chance of obtaining a
particular random sample result if the null hypothesis were true.
This quantity is the P-value.
This is a way of assessing the “believability” of the null hypothesis given
the evidence provided by a random sample.
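For the packaging example the P-value follows from the standard-Normal cdf; a sketch using Python’s stdlib `NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 227, 5, 4, 222

z = (xbar - mu0) / (sigma / sqrt(n))      # = -2.0
# Two-sided P-value: chance of a sample mean at least this far from 227
p = 2 * NormalDist().cdf(-abs(z))
print(f"z = {z:.2f}, P-value = {p:.4f}")  # P ≈ 0.0455
```

This matches the 4.56% P-value quoted later when the significance level is discussed.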
39
Interpreting a P-value
Could random variation alone account for the difference between the null
hypothesis and observations from a random sample?
A small P-value implies that random variation because of the sampling
process alone is not likely to account for the observed difference.
With a small P-value, we reject H0. The true property of the population is
significantly different from what was stated in H0.
Thus small P-values are strong evidence AGAINST H0.
But how small is small…?
40
(Figure: shaded tail areas for P = 0.2758, 0.1711, 0.0892, and 0.0735 — where does the P-value become significant? — compared against P = 0.05 and P = 0.01.)
When the shaded area becomes very small, the probability of drawing such a sample at random gets very slim. Oftentimes, a P-value of 0.05 or less is considered significant: the phenomenon observed is unlikely to be entirely due to chance arising from the random sampling.
41
The significance level α
The significance level, α, is the largest P-value tolerated for rejecting a true null
hypothesis (how much evidence against H0 we require). This value is decided arbitrarily
before conducting the test.
If the P-value is equal to or less than α (p ≤ α), then we reject H0.
If the P-value is greater than α (p > α), then we fail to reject H0.
Does the packaging machine need revision?
Two-sided test. The P-value is 4.56%.
* If α had been set to 5%, then the P-value would be significant.
* If α had been set to 1%, then the P-value would not be significant.
42
Implications
THE WHOLE POINT OF THIS!!!!
We don’t need to take lots of random samples to “rebuild” the sampling distribution and find µ at its center.
All we need is one SRS of size n, relying on the properties of the sample means distribution to infer the population mean µ.
(Figure: one sample of size n drawn from the population)
43
If σ is Estimated
Usually we do not know σ. So when it is estimated, we have to use the t-distribution, which is based on sample size.
When estimating σ using σ̂, as the sample size increases the t-distribution approaches the normal curve.
44
Conditions for Inference
about a Mean
Data are from a SRS of size n.
Population has a Normal distribution with mean µ and standard deviation σ.
Both µ and σ are usually unknown.
We use inference to estimate µ.
Problem: σ unknown means we cannot use the z procedures previously learned.
45
Standard Error
When we do not know the population standard
deviation (which is usually the case), we
must estimate it with the sample standard
deviation s.
When the standard deviation of a statistic is
estimated from data, the result is called the
standard error of the statistic.
The standard error of the sample mean x̄ is s/√n.
46
One-Sample t Statistic
When we estimate σ with s, our one-sample z statistic becomes a one-sample t statistic:
z = (x̄ − µ0) / (σ/√n)
t = (x̄ − µ0) / (s/√n)
By changing the denominator to be the standard error, our statistic no longer follows a Normal distribution. The t test statistic follows a t distribution with k = n − 1 degrees of freedom.
47
The t Distributions
The t density curve is similar in shape to the
standard Normal curve. They are both
symmetric about 0 and bell-shaped.
The spread of the t distributions is a bit greater
than that of the standard Normal curve (i.e.,
the t curve is slightly “fatter”).
As the degrees of freedom k increase, the t(k) density curve approaches the N(0, 1) curve more closely. This is because s estimates σ more accurately as the sample size increases.
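The convergence shows up directly in the critical values; a sketch (assuming SciPy is available, which the lecture does not require):

```python
from scipy import stats

# 97.5th percentile: the two-sided 5% critical value
for k in (2, 10, 30, 1000):
    print(f"df={k:>4}  t* = {stats.t.ppf(0.975, k):.3f}")
print(f"Normal   z* = {stats.norm.ppf(0.975):.3f}")  # 1.960
```

The t critical values shrink toward the Normal 1.960 as the degrees of freedom grow.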
48
The t Distributions
49
Critical Values from
T-Distribution
50
51
How do we find specific t or z* values?
We can use a table of z/t values (Table B.2). For a particular confidence level C, the appropriate t or z* value is found by knowing the sample size. Look up α = 1 − C; if you want 98%, look up .02, two-tailed.
Ex. For a 98% confidence level, z* = t = 2.326.
We can use software. In Excel, when n is large or σ is known:
=NORMINV(probability, mean, standard_dev)
gives z for a given cumulative probability. Since we want the middle C probability, the probability we require is (1 − C)/2.
Example: For a 98% confidence level (NOTE: this is now 1% on each side):
=NORMINV(0.01, 0, 1) = −2.32635 (= negative z*)
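The same lookup can be done outside Excel; a sketch using Python’s stdlib, which includes an inverse Normal cdf:

```python
from statistics import NormalDist

C = 0.98
tail = (1 - C) / 2                     # 0.01 in each tail
z_star = -NormalDist().inv_cdf(tail)   # flip the sign, as with NORMINV
print(round(z_star, 5))                # 2.32635
```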
52
Excel
TDIST(x, degrees_freedom, tails)
TDIST = P(X > x), where X is a random variable that follows the t distribution (x positive). Use this function in place of a table of critical values for the t distribution, or to obtain the P-value for a calculated, positive t-value.
x is the standardized numeric value at which to evaluate the distribution (“t”).
Degrees_freedom is an integer indicating the number of degrees of freedom.
Tails specifies the number of distribution tails to return. If tails = 1, TDIST returns the one-tailed P-value. If tails = 2, TDIST returns the two-tailed P-value.
TINV(probability, degrees_freedom)
Returns the t-value of the Student's t-distribution as a function of the probability and the degrees of freedom (for example, t*).
Probability is the probability associated with the two-tailed Student’s t distribution.
Degrees_freedom is the number of degrees of freedom characterizing the distribution.
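Rough SciPy equivalents of these Excel functions (a sketch; SciPy is an assumption, not part of the lecture):

```python
from scipy import stats

x, df = 2.5, 9

one_tailed = stats.t.sf(x, df)        # like TDIST(x, df, 1)
two_tailed = 2 * stats.t.sf(x, df)    # like TDIST(x, df, 2)

# TINV(p, df) returns the two-tailed critical value:
t_star = stats.t.ppf(1 - 0.05 / 2, df)
print(one_tailed, two_tailed, t_star)
```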
53
Sampling Distribution of β̂0 and β̂1
Based on Simulation
Assume the relationship between grades and
hours studied for an entire population of
students in an econometrics class looks like this
The upward sloping
line suggests that more
studying results in
higher grades. The
equation for the line is
E(Grades) = 50 + 2 ×
Hours, suggesting that
if a person spent 10
hours studying their
grade would be 70
points.
54
Sampling Distribution of β̂0
and β̂1 Based on Simulation
The more typical situation involves a sample—not a population
Our goal is to learn about a population’s slope and intercept via sample
data
A plot of the trend line from a random sample of 20 observations
from the econometric student population looks like this
The random sample has an intercept of 55 and a slope of 1.5, while the population’s intercept was 50 with a slope of 2.
55
Sampling Distribution of β̂0
and β̂1 Based on Simulation
Additional random samples with 20
observations each result in different slopes
and intercepts
If a computer calculated all the possible
slope estimates (with the same size
random sample n) we could graph the
distribution of possible values
Then use it to conduct confidence intervals and hypothesis tests
56
Sampling Distribution of β̂0
and β̂1 Based on Simulation
The following graph represents 10,000
samples of 20 observations from the
population
Observations
• Roughly centered
around 2 (the
population’s slope)
• Has a standard
deviation of 0.55
• Appears to be normally
distributed
57
Sampling Distribution of β̂0
and β̂1 Based on Simulation
Based on this simulation we can make the following statements:
Estimated slopes and intercepts are random variables
Their values depend on the random sample gathered
The mean of the different estimates is equal to the population value
The distribution of the estimators is approximately normal… this is a huge implication.
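The simulation can be sketched in Python (NumPy assumed; the hours distribution and error standard deviation below are made-up values chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 20, 10_000

slopes = np.empty(reps)
for r in range(reps):
    hours = rng.uniform(0, 20, n)                   # hypothetical study hours
    grades = 50 + 2 * hours + rng.normal(0, 10, n)  # population: slope 2
    slopes[r] = np.polyfit(hours, grades, 1)[0]     # fitted slope

print("mean of slope estimates:", slopes.mean())    # near 2
print("s.d. of slope estimates:", slopes.std(ddof=1))
```

The histogram of `slopes` is centered on the population slope and looks Normal, exactly as the slide describes.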
58
The Linearity of the OLS
Estimators
A linear estimator satisfies the condition that it is a linear combination of the dependent variable.
The estimator for the population slope is
β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
known as the ordinary least squares (OLS) estimator.
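The usual least-squares slope formula can be computed directly from the sample deviations; a small deterministic sketch:

```python
# Tiny example: y = 1 + 2x exactly, so the fitted slope should be 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)

# OLS slope: sum of cross-deviations over sum of squared x-deviations
beta1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
        / sum((xi - xbar) ** 2 for xi in x)
beta0 = ybar - beta1 * xbar
print(beta0, beta1)  # 1.0 2.0
```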
59
The Variance of the OLS Estimator
The variance of the OLS slope estimator
describes the dispersion in the distribution
of OLS estimates around its mean
Var(β̂1) is smaller if the variance of Y is smaller.
The smaller the variance of Y, the less likely we are to observe extreme samples.
60
Hypothesis Testing
Hypothesis tests are conducted analogously to
those concerning population means or
proportions
Suppose someone alleges the population slope β1 equals a, and the alternative hypothesis is that β1 does not equal a.
Formally, the null and alternative hypotheses are
H0: β1 = a  Ha: β1 ≠ a
61
Hypothesis Testing
The farther β̂1 is from a, the more plausible the alternative hypothesis.
This is formalized via the T-statistic:
T = (β̂1 − a) / se(β̂1)
T represents the number of standard deviations the sample slope is from the slope hypothesized under the null.
The larger this number, the more plausible the alternative hypothesis.
We would expect to observe a sample slope that deviates from the true slope by more than 1.96 standard deviations at most 5% of the time.
62
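With the numbers from the earlier simulation (sample slope 1.5, hypothesized slope 2, standard deviation 0.55 for the slope estimates), the T-statistic works out as a sketch:

```python
beta1_hat = 1.5   # slope from the one random sample
a = 2.0           # hypothesized population slope
se = 0.55         # s.d. of the slope's sampling distribution

T = (beta1_hat - a) / se
print(round(T, 3))    # -0.909
print(abs(T) > 1.96)  # False: cannot reject H0 at the 5% level
```

A slope of 1.5 is well within the sampling variation expected when the true slope is 2.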
Hypothesis Testing
We would decide in favor of the alternative when |T| > t_{α/2}, where α is the desired significance level.
If the sample is used to estimate the standard deviation of the slope, use the t distribution rather than the normal distribution to determine the critical values.
To test one-sided alternatives, proceed in a similar fashion to that used when testing hypotheses for means and proportions.
63
Hypothesis Testing
Alternative approach: calculate p-values rather than comparing a z- or t-statistic to the relevant critical values.
Start with the z- or t-statistic and calculate the probability of observing a value of β̂1 that extreme or more so.
With the z- or t-statistic, this is the probability that |T| exceeds the observed value.
64
The Multiple Regression Model
The multiple regression model has the following
assumptions
The dependent variable is a linear function of the explanatory variables
The errors have a mean of zero
The errors have a constant variance
The errors are uncorrelated across observations
The error term is not correlated with any of the explanatory variables
The errors are drawn from a normal distribution
No explanatory variable is an exact linear function of other explanatory variables (important with dummy variables)
65
Interpretation of the Regression
Coefficients
The value of the dependent variable will change by βj units with a one-unit change in the jth explanatory variable, holding everything else constant (ceteris paribus).
66
MLR Assumption 1
Linear in Parameters
MLR.1 defines the POPULATION model: the dependent variable y is related to the independent variables x and the error (or disturbance) u.
Assumption MLR.1:
y = β0 + β1x1 + β2x2 + … + βkxk + u
β0, β1, β2, …, βk are unknown population parameters
u is an unobservable random error
67
MLR Assumption 2
Random Sampling
Use a random sample of size n, {(xi, yi): i = 1, 2, …, n}, from the population model.
This allows a redefinition of MLR.1: we want to use DATA to estimate our parameters.
All of β0, β1, …, βk are population parameters to be estimated.
Assumption MLR.2:
{(xi, yi): i = 1, 2, …, n}
yi = β0 + β1xi1 + β2xi2 + … + βkxik + ui,  i = 1, 2, …, n
68
MLR Assumption 3
No Perfect Collinearity
In the sample (and therefore in the population),
none of the independent variables is constant, and
there are no exact linear relationships among the
independent variables
With perfect collinearity, there is no way to get a ceteris paribus relationship.
Example of a linear relationship, where spendA + spendB = totspend:
voteA = β0 + β1spendA + β2spendB + β3totspend + u
69
MLR Assumption 4
Zero Conditional Mean
For a random sample, the implication is that NO independent variable is correlated with ANY unobservable (remember, the error includes unobservable data).
Assumption MLR.4:
E[u | x1, x2, …, xk] = 0
For a RANDOM sample:
E[ui | xi] = 0 for all i = 1, 2, …, n
70
Regression Estimators
As I have repeatedly said, in the multiple
regression case, we cannot use the same
methods for calculating our estimates as before.
We MUST control for the correlation (or
relationship) between different values for X
To get the values for our estimators of Beta we
are actually regressing each X variable against
ALL OTHER X variables first… Y is not involved
in the calculation.
Each Beta estimated with this method
CONTROLS for other x’s when being calculated.
71
Regression Estimators
As I have repeatedly said, in the multiple regression case,
we cannot use the same methods for calculating our
estimates for Beta as before.
We MUST control for the correlation (or relationship)
between different values for X
Estimator for β1 in the MLR case:
β̂1 = Σ r̂i1 yi / Σ r̂i1²  (sums over i = 1, …, n)
where r̂i1 is the residual from the regression of x1 on x2, x3, x4, and so on.
72
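The partialling-out recipe can be checked numerically: regress x1 on the other x’s, keep the residuals, and the slope built from those residuals matches the coefficient from the full multiple regression. A sketch with NumPy (assumed; the data below are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)            # x1 correlated with x2
y = 1 + 2 * x1 - 3 * x2 + rng.normal(size=n)

# Full multiple regression via least squares
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Partialling out: residuals of x1 after regressing on (1, x2)
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
beta1_fwl = (r1 @ y) / (r1 @ r1)

print(beta_full[1], beta1_fwl)  # identical up to floating-point error
```

This equivalence is why each estimated beta “controls for” the other x’s: the y-calculation only sees the part of x1 not explained by the other regressors.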