Statistical Estimation and
Sampling Distributions
Chapter 7
Introduction
Poll: Most think bin Laden planning another U.S. attack
Wednesday, August 23, 2006; CNN
As the five-year anniversary of the September 11, 2001,
terrorist attacks approaches, nearly three-fourths of those
responding to a CNN poll said they believe Osama bin
Laden is planning another significant attack against the
United States. Seventy-four percent of the 1,033 adult
Americans polled said they believe an attack is being
planned, according to the poll conducted by Opinion
Research Corporation on behalf of CNN…The poll was
conducted by telephone August 18-20 with 1,033 adult
Americans. The margin of error for the question on whether
bin Laden is planning another attack is plus or minus 3
percentage points.
How do we use probability to infer something about
unobserved population characteristics, using statistics
from our sample, which we do observe?
The purpose of statistics is to characterize the underlying
population from which a sample was taken – i.e., to infer
something about the population using information from
the sample.
Parameters are characteristics of the underlying
population.
Statistics are quantities we compute from our sample
in order to estimate the values of the population
parameters.
Example: Consider the mean height of all USU
students. Let μ represent this average. In order to
estimate μ, we may sample, say, 100 students at
random, measure their individual heights, and then
compute X̄, the average of our sample. Thus, μ is the
population parameter representing true mean height.
X̄ is an estimate of μ.
7.2 Point Estimates
A point estimate θ̂ of an unknown parameter θ is a
statistic that represents the “best guess” at the value
of θ. There may be more than one good point
estimate of a parameter.
An estimate is unbiased if E(θ̂) = θ.
Otherwise, bias = E(θ̂) − θ.
All else being equal (e.g., equal variances), the smaller
the magnitude of the bias, the better.
Example: Suppose that E(X1) = μ and E(X2) = μ.
Is μ̂ = X1/2 + X2/2 an unbiased estimate of μ?
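One way to check this example is to apply linearity of expectation directly. The short derivation below is a sketch that uses only the stated assumptions E(X1) = μ and E(X2) = μ. It does not require X1 and X2 to be independent.

```latex
\begin{align*}
E(\hat{\mu}) &= E\!\left(\tfrac{X_1}{2} + \tfrac{X_2}{2}\right) \\
             &= \tfrac{1}{2}E(X_1) + \tfrac{1}{2}E(X_2) \\
             &= \tfrac{1}{2}\mu + \tfrac{1}{2}\mu = \mu
\end{align*}
```

Since E(μ̂) = μ, the bias E(μ̂) − μ is zero, so this estimate is unbiased.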
If X ~ B(n, p), construct a point estimate of the success
probability p. Is the point estimate unbiased?
If X1, … , Xn is a sample of observations from a probability
distribution with a mean μ, construct a point estimate
of μ. Is the point estimate unbiased?
If X1, … , Xn is a sample of observations from a probability
distribution with variance σ², construct a point
estimate of σ². Is the point estimate unbiased?
If X ~ B(n, p), then
$$\hat{p} = \frac{X}{n}$$
is an unbiased point estimate of the success probability p.
If X1, … , Xn is a sample of observations from a probability
distribution with a mean μ, then the sample mean μ̂ = X̄
is an unbiased point estimate of the population mean μ.
If X1, … , Xn is a sample of observations from a probability
distribution with variance σ², then the sample variance
$$\hat{\sigma}^2 = S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
is an unbiased point estimate of the population variance σ².
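The role of the n − 1 denominator can be checked exactly on a small case. The sketch below assumes a hypothetical population in which the values 0, 1 and 2 are equally likely (these values and the function names are illustrative and not from the slides). It enumerates every possible sample of size 3 and averages the variance estimate over all of them.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by (n - ddof)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

def expected_sample_variance(values, n, ddof):
    """Average the variance estimate over all equally likely samples of size n."""
    samples = list(product(values, repeat=n))
    return sum(sample_variance(s, ddof) for s in samples) / len(samples)

values = [0, 1, 2]  # hypothetical population: each value equally likely
pop_mean = Fraction(sum(values), len(values))
pop_var = sum((v - pop_mean) ** 2 for v in values) / len(values)

unbiased = expected_sample_variance(values, n=3, ddof=1)  # divide by n - 1
biased = expected_sample_variance(values, n=3, ddof=0)    # divide by n

print(pop_var, unbiased, biased)  # 2/3 2/3 4/9
```

Dividing by n − 1 reproduces the population variance on average, while dividing by n underestimates it by the factor (n − 1)/n.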
The sample proportion is an unbiased point estimate of the
population proportion.
p̂ estimates p
The sample mean is an unbiased point estimate of the population
mean.
X̄ estimates μ
The sample variance is an unbiased point estimate of the population
variance.
S² estimates σ²
7.3 Sampling Distributions
Distribution of a Sample Proportion
A population proportion p is just an average of 1’s and
0’s. The estimate – computed from a sample of size n –
is likewise a sample average of 1’s and 0’s.
What does the Central Limit Theorem say about the
distribution of a sample proportion? What are the
mean and variance of that distribution?
The standard deviation of this sampling distribution is
referred to as the standard error, or s.e.(p̂).
If X ~ B(n, p), then the sample proportion p̂ = X/n
has the approximate distribution
$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$$
How do we use the sampling distribution of an estimated
proportion to infer something about the underlying
population based upon what we observe in our sample?
Consider a gender question in science and engineering:
Does the fact that there are 17 women out of 39 total students
in a statistics class for scientists and engineers say anything
about the gender breakdown in the underlying population of
science and engineering students at USU?
We can address this by asking: What is the probability of our
observing 17 or fewer women in a sample of size 39, if we
assume for the sake of argument that the underlying
population is 55% female (same as the USU breakdown)?
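One way to carry out this calculation is sketched below, using only the Python standard library. It computes the exact binomial probability of 17 or fewer women and, for comparison, the normal approximation based on the sampling distribution of p̂. No continuity correction is applied (a simplifying assumption), so the two numbers are not expected to agree exactly.

```python
from math import comb, sqrt
from statistics import NormalDist

n, observed, p0 = 39, 17, 0.55  # class size, women observed, assumed population proportion

# Exact probability P(X <= 17) when X ~ B(39, 0.55)
exact = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(observed + 1))

# Normal approximation: p_hat ~ N(p0, p0 * (1 - p0) / n)
p_hat = observed / n
std_error = sqrt(p0 * (1 - p0) / n)
approx = NormalDist(mu=p0, sigma=std_error).cdf(p_hat)

print(f"p_hat = {p_hat:.3f}, s.e. = {std_error:.3f}")
print(f"exact P(X <= 17) = {exact:.3f}, normal approximation = {approx:.3f}")
```

A small probability would suggest that the class is not a typical sample from a population that is 55% female. A moderate probability would mean the observed count is consistent with that assumption.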
What does “margin of error” mean when sample
proportions are reported in the news? Consider again
the article shown previously. What is the margin of
error?
What is the probability that a normally distributed
random variable is within two standard deviations of
its mean?
The margin of error represents approximately 2 standard
errors (1.96 standard errors, to be exact). Hence, the
interval that is ±1.96 standard errors around p̂ is
called a 95% confidence interval. That is, the
researchers are 95% confident that the interval contains
the true population proportion p.
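The reported margin of error can be roughly reproduced from the poll figures (74% of 1,033 respondents). The sketch below assumes the unknown p in the standard error formula is replaced by the estimate p̂, which is a common plug-in choice but is not stated in the article. It also answers the two-standard-deviation question numerically.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 0.74, 1033  # figures reported in the CNN poll

std_normal = NormalDist()  # standard normal, mean 0 and standard deviation 1
z = std_normal.inv_cdf(0.975)                         # about 1.96
coverage_2sd = std_normal.cdf(2) - std_normal.cdf(-2) # P(-2 <= Z <= 2)

# Plug-in standard error: p_hat stands in for the unknown p (an assumption)
std_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z * std_error
interval = (p_hat - margin, p_hat + margin)

print(f"P(within 2 sd) = {coverage_2sd:.4f}")
print(f"s.e. = {std_error:.4f}, margin of error = {margin:.4f}")
print(f"95% confidence interval = ({interval[0]:.3f}, {interval[1]:.3f})")
```

The computed margin is a little under 0.03, which is consistent with the reported plus or minus 3 percentage points after rounding.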
Distribution of the Sample Variance
If X1, … , Xn is a sample of observations from a
normal distribution with variance σ², then
the sample variance has the distribution
$$S^2 \sim \sigma^2 \, \frac{\chi^2_{n-1}}{n-1}$$
Distribution of the Sample Mean
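If X1, … , Xn is a sample of observations from a
probability distribution with a mean μ and variance σ²,
then the sample mean has the distribution (exactly if the
observations are normal, approximately for large n otherwise)
$$\hat{\mu} = \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
And therefore
$$Z = \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \sim N(0,1)$$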
The t Statistic
The question is, how do we make inferences about μ when we
don’t know σ²?
If X1, … , Xn are normally distributed with mean μ and variance
σ², then
$$Z = \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \sim N(0,1)$$
If σ is unknown, it can be replaced with the known quantity S
(the sample standard deviation). Then the statistic T follows
a t distribution with n – 1 degrees of freedom:
$$T = \frac{\sqrt{n}\,(\bar{X} - \mu)}{S} \sim t_{n-1}$$
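Computing T from data needs only the sample mean, the sample standard deviation and a candidate value of μ. The sketch below is illustrative: the function name `t_statistic`, the height data and the value μ = 67 are all hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """Return T = sqrt(n) * (sample mean - mu) / S for a list of observations."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return sqrt(n) * (mean(sample) - mu) / s

# Hypothetical heights in inches; both the data and mu = 67 are made up.
heights = [66.0, 70.5, 64.0, 68.5, 71.0, 65.5, 69.0, 67.5]
t = t_statistic(heights, mu=67.0)

print(f"T = {t:.3f} with {len(heights) - 1} degrees of freedom")
```

If the heights are normally distributed with true mean 67, this value behaves like a draw from a t distribution with n − 1 = 7 degrees of freedom.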
Recall: A t-distribution with
ν degrees of freedom is
defined as follows:
$$t_\nu = \frac{N(0,1)}{\sqrt{\chi^2_\nu / \nu}}$$
where the N(0,1) and χ²ν random variables are
independently distributed.
As ν→∞, the t-distribution
tends toward a standard
normal distribution.
1. Suppose that we have a sample X1, … , X16 that is normally
distributed with mean μ and variance σ². What is the value
c for which P((X̄ − μ)/S ≤ c) = 0.95?
We can use data from a random sample of individuals in our
class to draw conclusions about the class as a whole.
2. Construct a point estimate of the proportion of juniors in
our class. What is the standard error of the estimate?
3. Construct a point estimate of the average shoe size of
students in our class. What is the standard error of the
estimate?