Statistical Inference: Estimation
Jamie Monogan
University of Georgia
Introduction to Data Analysis
Jamie Monogan (UGA)
Statistical Inference: Estimation
POLS 7012
1 / 15
Objectives
By the end of this meeting, participants should be able to:
- Distinguish between point and interval estimates of population parameters.
- Define Type I and Type II errors and explain why they should be avoided.
- Explain the use of a t-distribution for performing inference on a mean.
- Calculate and interpret a confidence interval.
- Deduce appropriate sample sizes for proportions and means.
Goal: Drawing Inferences About Population Parameters
Two techniques:
Point estimation
What are some estimators we know?
Desirable properties:
Unbiased
Efficient
Consistent (MLE relies heavily on this.)
Interval estimation
Confidence interval = point estimate ± margin of error
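The unbiasedness property listed above can be checked by simulation. The sketch below compares two estimators of a population variance, one dividing by n - 1 and one dividing by n. The seed, the population variance of 4, the sample size, and all variable names are assumptions chosen for illustration, not values from the lecture.

```r
# Compare two variance estimators over many simulated samples.
set.seed(7012)                       # arbitrary seed, for reproducibility
true.var <- 4                        # hypothetical population variance
n <- 10                              # small sample, where bias is visible
reps <- 20000                        # number of simulated samples

# Each row of 'draws' is one sample of size n from a normal population.
draws <- matrix(rnorm(n * reps, mean = 0, sd = sqrt(true.var)), nrow = reps)

s2.unbiased <- apply(draws, 1, var)        # var() divides by n - 1
s2.biased <- s2.unbiased * (n - 1) / n     # same sum of squares, divided by n

# Averages of each estimator across samples, next to the true value.
c(unbiased = mean(s2.unbiased), biased = mean(s2.biased), truth = true.var)
```

The average of the n - 1 version should sit close to the true variance, while the divide-by-n version should fall below it by roughly the factor (n - 1)/n.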
Types of Error in Inference
Analogy: a jury's verdict versus the truth.
Typical form of a statistical hypothesis:
H0 : β = 0
HA : β > 0
Type I Error: Incorrectly reject a true null hypothesis.
The null hypothesis (that no relationship exists) is true. Our analysis,
however, incorrectly leads us to conclude that a relationship exists.
Type II Error: Incorrectly fail to reject a false null hypothesis.
The null hypothesis (that no relationship exists) is false. Our analysis,
however, incorrectly leads us to conclude that no relationship exists.
Researchers need to report the Type I error rate.
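A short simulation can make the Type I error rate concrete. The sketch below repeatedly draws data for which the null hypothesis is true and records how often a one-sided t test rejects it. The sample size, number of replications, seed, and the use of a test on a mean (rather than a regression coefficient β) are assumptions made for illustration.

```r
# Estimate the Type I error rate of a one-sided t test by simulation.
set.seed(7012)                       # arbitrary seed, for reproducibility
alpha <- 0.05                        # nominal Type I error rate
reps <- 5000                         # number of simulated studies
n <- 30                              # hypothetical sample size per study

# The true mean is 0, so H0 is true and every rejection is a Type I error.
p.values <- replicate(reps, {
  x <- rnorm(n, mean = 0, sd = 1)
  t.test(x, mu = 0, alternative = "greater")$p.value
})

type1.rate <- mean(p.values < alpha) # share of studies that wrongly reject
type1.rate
```

The estimated rate should land near the chosen alpha, which is why reporting alpha tells readers how often a true null would be rejected.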
Defining the Confidence Interval
The Bernoulli PMF has mean $\pi$ and standard deviation $\sqrt{\pi(1 - \pi)}$.
Assume the sample size is sufficiently large that we can use the normal approximation for the sampling distribution of $\hat{p}$.
We are interested in building a $1 - \alpha$ confidence interval for the unknown $\pi$.
Start with $\hat{p} = \bar{Y}$.
Define $SE(\hat{p}) = \sqrt{\hat{p}(1 - \hat{p})/n}$.
We can standardize using the z-score for $\hat{p}$:
$$z = \frac{\hat{p} - \pi}{SE(\hat{p})}$$
Defining the Confidence Interval
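- Now the confidence interval is defined by:
$$\begin{aligned}
1 - \alpha &= P(-z^* \le z \le z^*) \\
&= P\left(-z^* \le \frac{\hat{p} - \pi}{SE(\hat{p})} \le z^*\right) \\
&= P(-z^* SE(\hat{p}) \le \hat{p} - \pi \le z^* SE(\hat{p})) \\
&= P(\hat{p} - z^* SE(\hat{p}) \le \pi \le \hat{p} + z^* SE(\hat{p}))
\end{aligned}$$
- And denoted: $[\hat{p} - z^* SE(\hat{p}),\ \hat{p} + z^* SE(\hat{p})]$.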
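Using R to Get the z* Values
m <- 50; n <- 10                    # m binomial draws, each made of n Bernoulli trials
Y <- rbinom(m, n, 0.8)              # successes per draw, with true pi = 0.8
p.hat <- mean(Y)/n                  # pooled sample proportion
SE <- sqrt(p.hat*(1-p.hat)/(m*n))   # standard error uses all m*n trials
alpha <- 0.05
z.star <- -qnorm(alpha/2)           # critical value, about 1.96
c(p.hat - SE*z.star, p.hat + SE*z.star)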
Interpreting Confidence Intervals
- Which of these is the correct interpretation of a $(1 - \alpha)$ confidence interval?
  - An interval that has a $100(1 - \alpha)\%$ chance of containing the true value of the parameter.
  - An interval that over $100(1 - \alpha)\%$ of replications contains the true value of the parameter, on average.
[Figure: Coverage]
- Note: If you use Bayesian methods, you can make different kinds of statements.
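The replication-based reading of a confidence interval can be examined by simulating many samples and counting how often the interval contains the true parameter. This is a minimal sketch for the proportion interval defined earlier; the true proportion of 0.8, the sample size of 500, the seed, and the variable names are assumptions for illustration.

```r
# Estimate the coverage of the large-sample interval for a proportion.
set.seed(7012)                       # arbitrary seed, for reproducibility
pi.true <- 0.8                       # hypothetical true proportion
n <- 500                             # hypothetical sample size
alpha <- 0.05
reps <- 10000                        # number of replications
z.star <- qnorm(1 - alpha/2)         # upper critical value, same as -qnorm(alpha/2)

covers <- replicate(reps, {
  p.hat <- mean(rbinom(n, 1, pi.true))        # sample proportion
  SE <- sqrt(p.hat * (1 - p.hat) / n)         # estimated standard error
  # TRUE when this replication's interval contains the true proportion
  (p.hat - z.star * SE <= pi.true) && (pi.true <= p.hat + z.star * SE)
})

coverage <- mean(covers)             # share of intervals that contain pi.true
coverage
```

Each individual interval either contains the true value or it does not; the 1 - α figure describes the long-run share of intervals that do.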
Confidence Intervals for the Population Mean, µ
Start with a sample, $X_1, X_2, \ldots, X_n$, where $n$ is sufficiently large that we can rely on the CLT.
So the z statistic has a standard normal distribution:
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - \mu}{SD(\bar{X})}$$
Confidence Intervals for the Population Mean, µ, Cont.
But we don’t know $\sigma^2$ for sure, so we use the sample variance as a substitute:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
Now we have a “t” statistic instead of a “z” statistic:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{\bar{X} - \mu}{SE(\bar{X})}$$
which has the Student’s t distribution with $n - 1$ degrees of freedom (a robust statistic).
Note that these are called pivotal quantities since we know their distributions.
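The t statistic can be computed step by step in R and compared with the value reported by the built-in t.test() function. The data here are simulated, and the sample size, hypothesized mean, seed, and variable names are assumptions for illustration.

```r
# Compute the t statistic by hand and compare it with t.test().
set.seed(7012)                       # arbitrary seed, for reproducibility
x <- rnorm(25, mean = 12, sd = 3)    # hypothetical sample
mu0 <- 12                            # hypothesized population mean
n <- length(x)

s2 <- sum((x - mean(x))^2) / (n - 1) # sample variance with n - 1 in the denominator
s <- sqrt(s2)                        # sample standard deviation, same as sd(x)
se <- s / sqrt(n)                    # standard error of the sample mean
t.stat <- (mean(x) - mu0) / se       # t statistic with n - 1 degrees of freedom

c(by.hand = t.stat, built.in = unname(t.test(x, mu = mu0)$statistic))
```

The two values should agree, since t.test() uses the same sample variance and the same n - 1 degrees of freedom.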
Comparing Normal and Student’s-t Distributions
[Figure: density curves for the normal distribution (in green), along with t distributions with 1, 3, and 10 degrees of freedom. x-axis: ruler, from -4 to 4; y-axis: norm.dens, from 0.0 to 0.4.]
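A plot along these lines can be drawn with base R graphics. This is a sketch, not the original plotting code: the names ruler and norm.dens are taken from the axis labels of the figure, while the grid spacing, line types, and legend are assumptions.

```r
# Overlay t densities on a standard normal density.
ruler <- seq(-4, 4, by = 0.01)       # grid of x values
norm.dens <- dnorm(ruler)            # standard normal density on the grid

plot(ruler, norm.dens, type = "l", col = "green", lwd = 2)

dfs <- c(1, 3, 10)                   # degrees of freedom to compare
for (i in seq_along(dfs)) {
  # Each t density gets its own line type so the curves can be told apart.
  lines(ruler, dt(ruler, df = dfs[i]), lty = i + 1)
}

legend("topright",
       legend = c("Normal", paste("t, df =", dfs)),
       col = c("green", rep("black", length(dfs))),
       lty = 1:4, lwd = c(2, 1, 1, 1))
```

The t curves have lower peaks and heavier tails than the normal, and they approach the normal curve as the degrees of freedom increase.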
Calculating the Confidence Interval for µ
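- The CI is just:
$$CI_\mu = \bar{X} \pm t^* \, SE(\bar{X})$$
where $t^*$ is the critical value (quantile) of the Student’s t distribution with $n - 1$ degrees of freedom corresponding to the $\alpha$ of interest.
- Find critical values in R with the quantile function qt(). The lower-tail quantile is negative, so $t^*$ is its absolute value:
qt(0.025,df=3)
[1] -3.182446
qt(0.025,df=25)
[1] -2.059539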
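Putting the pieces together, a confidence interval for a mean can be computed by hand and checked against t.test(). The data are simulated; the sample of 26 observations (giving 25 degrees of freedom, as in the second qt() call), the seed, and the variable names are assumptions for illustration.

```r
# Build a 95% confidence interval for a mean from its components.
set.seed(7012)                           # arbitrary seed, for reproducibility
x <- rnorm(26, mean = 13, sd = 2.5)      # hypothetical sample, df = 25
n <- length(x)
alpha <- 0.05

t.star <- qt(1 - alpha/2, df = n - 1)    # upper-tail critical value (positive)
se <- sd(x) / sqrt(n)                    # standard error of the sample mean
ci <- c(lower = mean(x) - t.star * se,
        upper = mean(x) + t.star * se)
ci

# The built-in function should report the same interval.
t.test(x, conf.level = 1 - alpha)$conf.int
```

Using qt(1 - alpha/2, ...) returns the positive critical value directly, which avoids having to flip the sign of the lower-tail quantile.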
Sample Size Considerations
- If you are designing your own study, you will likely want to consider how large a sample you need before you start collecting data.
- If you have an exact margin of error in mind and a good guess of the true population variance, you can deduce the necessary sample size.
- All you have to do is solve for n in the margin of error formula.
- Related: power analysis.
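Solving the margin of error formula for n can be written out as follows. This is a sketch under the large-sample normal approximation; the symbol $m$ for the target margin of error is introduced here for illustration and does not appear in the slides.

```latex
\begin{align*}
\text{Mean:}\quad m &= z^* \frac{\sigma}{\sqrt{n}}
  &&\Longrightarrow\quad n = \left( \frac{z^* \sigma}{m} \right)^2 \\
\text{Proportion:}\quad m &= z^* \sqrt{\frac{\pi(1 - \pi)}{n}}
  &&\Longrightarrow\quad n = \frac{(z^*)^2 \, \pi(1 - \pi)}{m^2}
\end{align*}
```

For example, with a 95% interval ($z^* \approx 1.96$), a conservative guess of $\pi = 0.5$, and a target margin of $m = 0.03$, the proportion formula gives $n \approx 1067.1$, which rounds up to 1,068 respondents.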
Power Analysis
Focus: Significance Testing
The seminal book: Cohen, Jacob. 1988. Statistical Power Analysis for
the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Erlbaum Associates.
Notion: Before designing a study, choose a sample size that will
shrink your Type II error rate below an acceptable level.
We define:
Probability of a Type I error: α = P(reject H0 | H0 is true)
Probability of a Type II error: β = P(fail to reject H0 | H0 is false)
power = 1 − β = 1 − P(fail to reject H0 | H0 is false)
Given three of the following, we can algebraically solve for the fourth:
Sample size
Effect size
Type I error rate
Power (1−Type II error rate)
R: “pwr” library
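The idea of fixing three quantities and solving for the fourth can be sketched with power.t.test(), which ships with base R in the stats package. This is a swapped-in alternative to the pwr package named above, chosen so the example runs without extra installation. The effect size, significance level, target power, and the two-sample design are assumptions for illustration.

```r
# Solve for the sample size of a two-sample t test, given the other three quantities.
res <- power.t.test(delta = 0.5,         # hypothetical difference in means
                    sd = 1,              # hypothetical standard deviation
                    sig.level = 0.05,    # Type I error rate
                    power = 0.80,        # 1 - Type II error rate
                    type = "two.sample",
                    alternative = "two.sided")

n.per.group <- ceiling(res$n)            # round up to whole observations
n.per.group

# Supplying n and leaving power unspecified solves for power instead.
check <- power.t.test(n = n.per.group, delta = 0.5, sd = 1, sig.level = 0.05,
                      type = "two.sample", alternative = "two.sided")
check$power
```

Because the sample size is rounded up, the power recovered in the second call should be at least the 0.80 target.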
For Next Time
Review the objectives of prior classes and prepare for the midterm
exam. (To be held March 7.)
Come with questions about exam-related objectives (first priority),
software usage, research projects, and anything else you’re concerned
about in the class.
If your questions do not fill up class time, we will have a bonus
lecture on graphing.
Answer questions 5.14, 5.20, & 5.42.
In the software of your choice:
Open the 2004 National Election Study,
http://monogan.myweb.uga.edu/teaching/us/nes2004.dta
You want to draw inferences about the population mean of years of
education (educ).
What is the sample mean for years of education? What is the standard
error of your estimate of the mean? What is the 90% confidence
interval for the mean?
Interpret your point and interval estimates of the population mean.