Download Point and Interval Estimation

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
Point and Interval Estimation
Mathematics Education Section
CDI, EMB
Email: [email protected]
Normal Distribution
Continuous random variables and
normal distribution
Let X be a continuous random variable defined on
an interval Ω.
The probability distribution of X is described by a
function f(x), where x ∈ Ω, such that
(a) f(x) ≥ 0, x ∈ Ω.
(b)
∫
Ω
f ( x)dx = 1
(c) P(a ≤ X ≤ b) =
∫
b
a
f ( x)dx
This function f(x) is then called the probability
density function (pdf) of the random variable X.
Point and Interval Estimation
Parameter and Statistic
¾
A number that describes a population is called a parameter
¾
A number that describes a sample is a statistic
¾
If we take a sample and calculate a statistic, we often use that
statistic to infer something about the population from which the
sample was drawn
Population
Parameter
choose
estimate
Sample
Statistic
calculate
The sample mean
= the population mean ?
Rolling a fair dice
{
{
Cast a fair dice and take X to be the
uppermost number, the population
mean is μ = 3.5, and that the
population median is also m = 3.5
Take a sample of four throws, the
mean may be far from 3.5
The results of 5 such samples of 4 throws
X1
X2
X3
X4
X
Sample 1
6
2
5
6
4.75
Sample 2
2
3
1
6
3
Sample 3
1
1
4
6
3
Sample 4
6
2
2
1
2.75
Sample 5
1
5
1
3
2.75
The sample size n = 4
Random samples
Sampling distribution: Basic theorem
Central Limit Theorem (CLT)
Central Limit Theorem (CLT)
‹ For ANY population (regardless of its shape) the
distribution of sample means will approach a normal
distribution as n approaches infinity
‹The variability of a sample mean decreases as
the sample size increases
Central Limit Theorem (CLT)
How large is a “large sample”?
It depends upon the form of the distribution
from which the samples were taken. If the
population distribution deviates greatly from
normality larger samples will be needed to
approximate normality.
n ≥ 30
Central Limit Theorem (CLT)
The Central Limit Theorem
X
: N (μ ,
σ
n
)
68% probability that our
will be in this region X
95% probability that our
will be in this region X
POINT AND INTERVAL ESTIMATION
{
Point Estimate
‹
is a single number, calculated from available sample
data, that is used to estimate the value of an unknown
population parameter.
{
Interval Estimate (Confidence Interval)
‹
is an interval that provides an upper and lower bound
for a specific unknown population parameter.
-> arguably the most useful type of inference.
Estimators
1.
An estimator θˆ is said to be
unbiased for the parameter θ if
E θˆ = θ
()
2.
If this equality does not hold, θˆ is
said to be a biased estimator of θ,
with Bias = E θˆ − θ
()
Questions:
{
An unfair coin has a 75% chance
of landing heads-up.
Let X = 1 if it lands heads-up,
and X = 0 if it lands tails-up.
{
Find the sampling distribution of the
mean X for samples of size 3.
Solution:
{
The experiment consists of tossing a coin 3
times and measuring the sample mean X
Probability
X
HHH HHT HTH HTT THH THT
TTH
TTT
27/64
3/64
1/64
1
9/64
9/64
3/64
9/64
3/64
2/3 2/3 1/3 2/3 1/3 1/3
The possible values of X are 0, 1/3, 2/3 and 1.
0
The distribution of the sample mean is
a binomial distribution
X
P( X = x )
0
1/3
2/3
1
1/64
9/64
27/64
27/64
Is the Sample Mean an Unbiased
estimator of the Population Mean?
Example 1
An engineer wish to estimate the mean yield of a
chemical process based on the yield
measurements X1, X2, X3 from three independent
runs of an experiment.
Consider the following two estimators of the
mean yield θ :
X1 + X 2 + X 3
ˆ
{ θ1 =
3
{
X1 + 2 X 2 + X 3
ˆ
.
and θ 2 =
4
Which one should be preferred?
Solution:
{
Since E (θˆ1 ) = E (X ) = θ and
E ( X 1 ) + 2 E ( X 2 ) + E ( X 3 ) θ + 2θ + θ
ˆ
=
=θ,
E (θ 2 ) =
4
4
bothθˆ1 and θˆ2 are unbiased for θ.
Solution:
Denoting the population variance by
σ2, we have
( )
V θˆ1 = V ( X ) =
σ2
3
and
2
3
σ
V ( X 1 ) + 4V ( X 2 ) + V ( X 3 )
V (θˆ2 ) =
=
.
16
8
σ2
3σ 2 ˆ
<
, θ1 is better than θˆ2 .
3
8
Confidence Interval
Confidence Interval
‘… perhaps the most obvious
difficulty with confidence intervals
lies in how we interpret what the
confidence statement means’
(Smithson, 2003, p.16)
Explanation:
Confidence Interval
X ~ N (μ ,
σ2
n
P(− zα < z < zα ) = 1 − α
)
2
(
X − μ)
Z=
~ N (0,1)
σ
2
P(− z α <
2
n
P( x − zα
2
( x − zα
2
σ
n
σ
n
x−μ
σ
< zα ) = 1 − α
2
n
< μ < x + zα
σ
n
2
, x + zα
2
σ
n
)
) = 1−α
a 100(1-α)% confidence interval of μ
a 100(1-α)% confidence interval of μ
{
{
{
Large samples have narrower widths
than small samples
Higher confidence levels have wider
intervals than lower confidence levels
Narrow widths and high confidence
levels are desirable, but these two
things affect each other
a 100(1-α)% confidence interval of μ
a 100(1-α)% confidence interval of μ
a 100(1-α)% confidence interval of μ
Confidence interval for a population proportion
Confidence interval for a population proportion p
Assumptions:
1. The sample is a random sample
2. The conditions for the binomial
distribution are satisfied
3. The normal distribution can be used to
approximate the binomial distribution
Confidence interval for a population proportion p
μ pˆ = E( pˆ )
X
= E( )
n
1
= E( X )
n
np
=
n
=p
σ pˆ 2 = σ X 2
n
1
2
= 2 σX
n
1
= 2 [np (1 − p )]
n
p (1 − p )
=
n
Confidence interval for a population proportion p