Chapter 8
Estimation
Estimator and Estimate
An estimator of a population parameter is a
random variable that depends on the sample
information and whose value provides
approximations to this unknown parameter.
A specific value of that random variable is
called an estimate.
Point Estimator and Point Estimate
Let $\theta$ represent a population parameter (such as
the population mean $\mu$ or the population
proportion $\pi$). A point estimator, $\hat{\theta}$, of a
population parameter, $\theta$, is a function of the
sample information that yields a single number
called a point estimate. For example, the sample
mean, $\bar{X}$, is a point estimator of the population
mean $\mu$, and the value that $\bar{X}$ assumes for a given
set of data is called the point estimate.
Unbiasedness
The point estimator $\hat{\theta}$ is said to be an unbiased
estimator of the parameter $\theta$ if the expected
value, or mean, of the sampling distribution of $\hat{\theta}$
is $\theta$; that is,
$$E(\hat{\theta}) = \theta$$
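Probability Density Functions for Unbiased and Biased Estimators
(Figure 8.1)
[Figure 8.1: probability density functions of two estimators, $\hat{\theta}_1$ and $\hat{\theta}_2$, plotted against $\hat{\theta}$, with the parameter $\theta$ marked on the axis.]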
Bias
Let $\hat{\theta}$ be an estimator of $\theta$. The bias in $\hat{\theta}$ is defined as
the difference between its mean and $\theta$; that is,
$$\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$
It follows that the bias of an unbiased estimator is 0.
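The definition of bias can be checked exactly on a small case. The sketch below is only an illustration: the toy population and the sample size are hypothetical choices, not taken from the slides. It lists every equally likely sample of size 2 drawn with replacement, averages two variance estimators over all of them, and subtracts the true population variance to obtain each bias.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical toy population (an assumption made for illustration only).
population = [1, 2, 3, 4, 5, 6]
n = 2  # sample size

# Every equally likely ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

sigma2 = pvariance(population)  # true population variance

# Expected value of each estimator = its average over all possible samples.
e_divisor_n_minus_1 = mean(variance(s) for s in samples)   # sample variance s^2
e_divisor_n = mean(pvariance(s) for s in samples)          # divides by n instead

# Bias = E(estimator) - parameter.
bias_unbiased = e_divisor_n_minus_1 - sigma2
bias_biased = e_divisor_n - sigma2

print(f"true variance          : {sigma2:.4f}")
print(f"bias with divisor n - 1: {bias_unbiased:.4f}")
print(f"bias with divisor n    : {bias_biased:.4f}")
```

Under these assumptions the estimator that divides by n - 1 has zero bias, while the one that divides by n underestimates the variance on average.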
Most Efficient Estimator and Relative Efficiency
Suppose there are several unbiased estimators of $\theta$. Then the
unbiased estimator with the smallest variance is said to
be the most efficient estimator or to be the minimum
variance unbiased estimator of $\theta$. Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two
unbiased estimators of $\theta$, based on the same number of
sample observations. Then,
a) $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$ if $\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$
b) The relative efficiency of $\hat{\theta}_1$ with respect to $\hat{\theta}_2$ is the ratio of
their variances; that is,
$$\text{Relative Efficiency} = \frac{\mathrm{Var}(\hat{\theta}_2)}{\mathrm{Var}(\hat{\theta}_1)}$$
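As a worked illustration of the ratio, consider the sample mean and the sample median as two estimators of the mean of a normal population. The large-sample variance of the median used below is a standard result that is not stated in these slides, so treat it as an assumption of the example. Here $\pi$ is the constant 3.14159..., not the population proportion.

```latex
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n},
\qquad
\operatorname{Var}(\text{median}) \approx \frac{\pi}{2}\cdot\frac{\sigma^2}{n}
\quad\Longrightarrow\quad
\text{Relative Efficiency}
  = \frac{\operatorname{Var}(\text{median})}{\operatorname{Var}(\bar{X})}
  \approx \frac{\pi}{2} \approx 1.57
```

Under this approximation the median has about 57% more variance than the mean for the same number of observations, so the mean is the more efficient of the two.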
Point Estimators of Selected Population Parameters
(Table 8.1)
Population Parameter    Point Estimator    Properties
Mean, $\mu$             $\bar{X}$          Unbiased, Most Efficient (assuming normality)
Mean, $\mu$             $X_m$              Unbiased (assuming normality), but not most efficient
Proportion, $\pi$       $p$                Unbiased, Most Efficient
Variance, $\sigma^2$    $s^2$              Unbiased, Most Efficient (assuming normality)
Confidence Interval Estimator
A confidence interval estimator for a population
parameter $\theta$ is a rule for determining (based on
sample information) a range, or interval, that is
likely to include the parameter. The corresponding
estimate is called a confidence interval estimate.
Confidence Interval and Confidence Level
Let $\theta$ be an unknown parameter. Suppose that on the basis
of sample information, random variables A and B are found
such that $P(A < \theta < B) = 1 - \alpha$, where $\alpha$ is any number
between 0 and 1. If specific sample values of A and B are a
and b, then the interval from a to b is called a $100(1 - \alpha)\%$
confidence interval of $\theta$. The quantity $100(1 - \alpha)\%$ is called the
confidence level of the interval.
If the population were repeatedly sampled a very
large number of times, the true value of the parameter $\theta$
would be contained in $100(1 - \alpha)\%$ of intervals calculated
this way. The confidence interval calculated in this manner
is written as $a < \theta < b$ with $100(1 - \alpha)\%$ confidence.
P(-1.96 < Z < 1.96) = 0.95, where Z is a Standard Normal Variable
(Figure 8.3)
[Figure 8.3: standard normal density with area 0.95 between -1.96 and 1.96 and area 0.025 in each tail.]
Notation
Let $Z_{\alpha/2}$ be the number for which
$$P(Z > Z_{\alpha/2}) = \frac{\alpha}{2}$$
where the random variable Z follows a standard
normal distribution.
Selected Values $Z_{\alpha/2}$ from the Standard Normal Distribution Table
(Table 8.2)
$\alpha$    $Z_{\alpha/2}$    Confidence Level
0.01        2.58              99%
0.02        2.33              98%
0.05        1.96              95%
0.10        1.645             90%
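The tabulated values can be reproduced from the definition $P(Z > Z_{\alpha/2}) = \alpha/2$. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `z_critical` is a hypothetical choice made for this example.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Return Z_{alpha/2}: the standard normal value with upper-tail area alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Recompute the four rows of the table.
for alpha in (0.01, 0.02, 0.05, 0.10):
    level = 100 * (1 - alpha)
    print(f"alpha = {alpha:.2f}   Z = {z_critical(alpha):.3f}   level = {level:.0f}%")
```

The computed values agree with the table once rounded to the precision shown there.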
Confidence Intervals for the Mean of a Population that is Normally Distributed:
Population Variance Known
Consider a random sample of n observations from a normal
distribution with mean $\mu$ and variance $\sigma^2$. If the sample
mean is $\bar{X}$, then a $100(1 - \alpha)\%$ confidence interval for the
population mean with known variance is given by
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
or equivalently,
$$\bar{X} \pm B$$
where the margin of error (also called the sampling error,
the bound, or the interval half width) is given by
$$B = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
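The interval above translates directly into a few lines of code. This is a minimal sketch: the function name and the input values (sample mean 50, known standard deviation 10, n = 25) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_variance(x_bar, sigma, n, alpha=0.05):
    """Return (LCL, UCL, B) for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # reliability factor Z_{alpha/2}
    bound = z * sigma / sqrt(n)               # margin of error B
    return x_bar - bound, x_bar + bound, bound

# Hypothetical inputs, 95% confidence.
lcl, ucl, bound = mean_ci_known_variance(x_bar=50.0, sigma=10.0, n=25)
print(f"{lcl:.2f} < mu < {ucl:.2f}   (B = {bound:.2f})")
```

With these inputs the standard error is 10/5 = 2, so B is roughly 1.96 × 2 = 3.92 and the interval runs from about 46.08 to 53.92.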
Basic Terminology for Confidence Interval for a Population Mean with
Known Population Variance
(Table 8.3)
Terms                                               Symbol              To Obtain:
Standard Error of the Mean                          $\sigma_{\bar{X}}$  $\sigma/\sqrt{n}$
Z Value (also called the Reliability Factor)        $Z_{\alpha/2}$      Use Standard Normal Distribution Table
Margin of Error                                     B                   $B = Z_{\alpha/2}\,\sigma/\sqrt{n}$
Lower Confidence Limit                              LCL                 $LCL = \bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n}$
Upper Confidence Limit                              UCL                 $UCL = \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}$
Width (width is twice the bound)                    w                   $w = 2B = 2Z_{\alpha/2}\,\sigma/\sqrt{n}$
Student’s t Distribution
Given a random sample of n observations, with
mean $\bar{X}$ and standard deviation s, from a normally
distributed population with mean $\mu$, the variable t
follows the Student’s t distribution with (n - 1)
degrees of freedom and is given by
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
Notation
A random variable having the Student’s t
distribution with v degrees of freedom will be
denoted $t_v$. Then $t_{v,\alpha/2}$ is defined as the number
for which
$$P(t_v > t_{v,\alpha/2}) = \alpha/2$$
Confidence Intervals for the Mean of a Normal Population:
Population Variance Unknown
Suppose there is a random sample of n observations from a normal
distribution with mean $\mu$ and unknown variance. If the sample
mean and standard deviation are, respectively, $\bar{X}$ and s, then a
$100(1 - \alpha)\%$ confidence interval for the population mean, variance
unknown, is given by
$$\bar{X} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$
or equivalently,
$$\bar{X} \pm B$$
where the margin of error, the sampling error, or bound, B, is given by
$$B = t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$
and $t_{n-1,\alpha/2}$ is the number for which
$$P(t_{n-1} > t_{n-1,\alpha/2}) = \alpha/2$$
The random variable $t_{n-1}$ has a Student’s t distribution with v = (n - 1) degrees of freedom.
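A short sketch of the t-based interval follows. Python's standard library has no Student's t quantile function, so the critical value is passed in as read from a t table. The function name and the ten observations are hypothetical; the value 2.262 is the tabulated $t_{9,0.025}$.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci_unknown_variance(data, t_crit):
    """Return (LCL, UCL, B); t_crit is t_{n-1, alpha/2} taken from a t table."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                 # sample standard deviation (divisor n - 1)
    bound = t_crit * s / sqrt(n)    # margin of error B
    return x_bar - bound, x_bar + bound, bound

# Hypothetical sample of n = 10 observations.
sample = [18, 22, 19, 25, 21, 20, 23, 24, 17, 21]

# 95% confidence with 9 degrees of freedom: t_{9, 0.025} = 2.262 (table value).
lcl, ucl, bound = mean_ci_unknown_variance(sample, t_crit=2.262)
print(f"{lcl:.3f} < mu < {ucl:.3f}   (B = {bound:.3f})")
```

For this sample the mean is 21 and s² = 60/9, so B is about 1.847 and the interval is roughly 19.15 to 22.85.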
Confidence Intervals for Population Proportion (Large Samples)
Let p denote the observed proportion of “successes” in a random
sample of n observations from a population with a proportion $\pi$ of
successes. Then, if n is large enough that $n\pi(1 - \pi) > 9$, a
$100(1 - \alpha)\%$ confidence interval for the population proportion is given
by
$$p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} < \pi < p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
or equivalently,
$$p \pm B$$
where the margin of error, the sampling error, or bound, B, is given
by
$$B = Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
and $Z_{\alpha/2}$ is the number for which a standard normal variable Z
satisfies
$$P(Z > Z_{\alpha/2}) = \alpha/2$$
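The large-sample proportion interval can be sketched as follows. The function name and the inputs (40 successes in 100 trials) are hypothetical. Because $\pi$ is unknown in practice, the example checks the sample-size condition with p standing in for $\pi$, which is an assumption of this sketch rather than something the slides prescribe.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p, n, alpha=0.05):
    """Return (LCL, UCL, B) for a population proportion (large-sample formula)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    bound = z * sqrt(p * (1 - p) / n)         # margin of error B
    return p - bound, p + bound, bound

# Hypothetical data: 40 successes out of n = 100 observations.
p, n = 0.40, 100

# Sample-size check, using p in place of the unknown pi (assumption).
large_enough = n * p * (1 - p) > 9

lcl, ucl, bound = proportion_ci(p, n)
print(f"condition met: {large_enough}")
print(f"{lcl:.3f} < pi < {ucl:.3f}   (B = {bound:.3f})")
```

Here n·p·(1 - p) = 24, so the condition holds, and B is roughly 1.96 × 0.049 ≈ 0.096.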
Notation
A random variable having the chi-square
distribution with v = n - 1 degrees of freedom
will be denoted by $\chi^2_v$ or simply $\chi^2_{n-1}$. Define
as $\chi^2_{n-1,\alpha}$ the number for which
$$P(\chi^2_{n-1} > \chi^2_{n-1,\alpha}) = \alpha$$
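The Chi-Square Distribution
(Figure 8.17)
[Figure 8.17: chi-square density starting at 0, with area $1 - \alpha$ to the left of $\chi^2_{n-1,\alpha}$ and area $\alpha$ to its right.]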
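The Chi-Square Distribution for n – 1 Degrees of Freedom
and $100(1 - \alpha)\%$ Confidence Level
(Figure 8.18)
[Figure 8.18: chi-square density with area $\alpha/2$ in each tail and area $1 - \alpha$ between $\chi^2_{n-1,1-\alpha/2}$ and $\chi^2_{n-1,\alpha/2}$.]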
Confidence Intervals for the Variance of a Normal Population
Suppose there is a random sample of n observations from a
normally distributed population with variance $\sigma^2$. If the observed
variance is $s^2$, then a $100(1 - \alpha)\%$ confidence interval for the
population variance is given by
$$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$$
where $\chi^2_{n-1,\alpha/2}$ is the number for which
$$P(\chi^2_{n-1} > \chi^2_{n-1,\alpha/2}) = \frac{\alpha}{2}$$
and $\chi^2_{n-1,1-\alpha/2}$ is the number for which
$$P(\chi^2_{n-1} < \chi^2_{n-1,1-\alpha/2}) = \frac{\alpha}{2}$$
and the random variable $\chi^2_{n-1}$ follows a chi-square distribution
with (n – 1) degrees of freedom.
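The variance interval is not symmetric about $s^2$, which a small numerical sketch makes visible. The chi-square cutoffs are passed in as table values because the standard library has no chi-square quantile function. The function name and the inputs (n = 10, s² = 4) are hypothetical; 19.023 and 2.700 are the tabulated $\chi^2_{9,0.025}$ and $\chi^2_{9,0.975}$.

```python
def variance_ci(s2, n, chi2_upper, chi2_lower):
    """Return (LCL, UCL) for the population variance.

    chi2_upper is chi^2_{n-1, alpha/2}   (cuts off alpha/2 in the upper tail);
    chi2_lower is chi^2_{n-1, 1-alpha/2} (cuts off alpha/2 in the lower tail).
    """
    lcl = (n - 1) * s2 / chi2_upper   # dividing by the larger cutoff gives the lower limit
    ucl = (n - 1) * s2 / chi2_lower   # dividing by the smaller cutoff gives the upper limit
    return lcl, ucl

# Hypothetical inputs: n = 10 observations with sample variance 4.0, 95% confidence.
lcl, ucl = variance_ci(s2=4.0, n=10, chi2_upper=19.023, chi2_lower=2.700)
print(f"{lcl:.3f} < sigma^2 < {ucl:.3f}")
```

With these inputs the interval is roughly 1.89 to 13.33, much wider above $s^2 = 4$ than below it.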
Confidence Intervals for Two Means: Matched Pairs
Suppose that there is a random sample of n matched pairs of
observations from normal distributions with means $\mu_X$ and $\mu_Y$.
That is, $x_1, x_2, \ldots, x_n$ denote the values of the observations from the
population with mean $\mu_X$, and $y_1, y_2, \ldots, y_n$ the matched sampled
values from the population with mean $\mu_Y$. Let $\bar{d}$ and $s_d$ denote the
observed sample mean and standard deviation for the n differences
$d_i = x_i - y_i$. If the population distribution of the differences is
assumed to be normal, then a $100(1 - \alpha)\%$ confidence interval for
the difference between means ($\mu_d = \mu_X - \mu_Y$) is given by
$$\bar{d} - t_{n-1,\alpha/2}\frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\alpha/2}\frac{s_d}{\sqrt{n}}$$
or equivalently,
$$\bar{d} \pm B$$
where the margin of error, the sampling error, or the bound,
B, is given by
$$B = t_{n-1,\alpha/2}\frac{s_d}{\sqrt{n}}$$
and $t_{n-1,\alpha/2}$ is the number for which
$$P(t_{n-1} > t_{n-1,\alpha/2}) = \frac{\alpha}{2}$$
The random variable $t_{n-1}$ has a Student’s t distribution
with (n – 1) degrees of freedom.
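For matched pairs the calculation reduces to a one-sample t interval on the differences. The sketch below assumes five hypothetical pairs and the tabulated value $t_{4,0.025} = 2.776$; the function name is likewise illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def matched_pairs_ci(x, y, t_crit):
    """Return (LCL, UCL, B) for mu_d = mu_X - mu_Y; t_crit is t_{n-1, alpha/2}."""
    diffs = [xi - yi for xi, yi in zip(x, y)]   # d_i = x_i - y_i
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)                          # standard deviation of the differences
    bound = t_crit * s_d / sqrt(n)              # margin of error B
    return d_bar - bound, d_bar + bound, bound

# Hypothetical matched observations (for example, the same five units measured twice).
x = [12, 15, 11, 14, 13]
y = [10, 14, 10, 11, 12]

# 95% confidence with n - 1 = 4 degrees of freedom: t_{4, 0.025} = 2.776 (table value).
lcl, ucl, bound = matched_pairs_ci(x, y, t_crit=2.776)
print(f"{lcl:.4f} < mu_d < {ucl:.4f}   (B = {bound:.4f})")
```

The differences are 2, 1, 1, 3, 1, giving a mean difference of 1.6 and $s_d/\sqrt{n} = 0.4$, so B = 2.776 × 0.4 ≈ 1.11.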
Confidence Intervals for Difference Between Means: Independent Samples
(Normal Distributions and Known Population Variances)
Suppose that there are two independent random samples of $n_x$ and
$n_y$ observations from normally distributed populations with means
$\mu_X$ and $\mu_Y$ and variances $\sigma^2_X$ and $\sigma^2_Y$. If the observed sample means
are $\bar{X}$ and $\bar{Y}$, then a $100(1 - \alpha)\%$ confidence interval for ($\mu_X - \mu_Y$) is
given by
$$(\bar{X} - \bar{Y}) - Z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_x} + \frac{\sigma^2_Y}{n_y}} < \mu_X - \mu_Y < (\bar{X} - \bar{Y}) + Z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_x} + \frac{\sigma^2_Y}{n_y}}$$
or equivalently,
$$(\bar{X} - \bar{Y}) \pm B$$
where the margin of error is given by
$$B = Z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_x} + \frac{\sigma^2_Y}{n_y}}$$
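A brief sketch of the two-sample interval with known variances is given below. All inputs are hypothetical and were picked so that each variance-to-sample-size ratio equals 1.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci_known(x_bar, y_bar, var_x, var_y, n_x, n_y, alpha=0.05):
    """Return (LCL, UCL, B) for mu_X - mu_Y when both population variances are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)         # Z_{alpha/2}
    bound = z * sqrt(var_x / n_x + var_y / n_y)     # margin of error B
    diff = x_bar - y_bar
    return diff - bound, diff + bound, bound

# Hypothetical inputs: known variances 36 and 64, sample sizes 36 and 64.
lcl, ucl, bound = diff_means_ci_known(
    x_bar=85.0, y_bar=80.0, var_x=36.0, var_y=64.0, n_x=36, n_y=64
)
print(f"{lcl:.3f} < mu_X - mu_Y < {ucl:.3f}   (B = {bound:.3f})")
```

The standard error here is $\sqrt{1 + 1} \approx 1.414$, so B is roughly 2.77 around the observed difference of 5.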
Confidence Intervals for Two Means: Unknown Population Variances that are
Assumed to be Equal
Suppose that there are two independent random samples with $n_x$ and
$n_y$ observations from normally distributed populations with means $\mu_X$
and $\mu_Y$ and a common, but unknown, population variance. If the
observed sample means are $\bar{X}$ and $\bar{Y}$, and the observed sample
variances are $s^2_X$ and $s^2_Y$, then a $100(1 - \alpha)\%$ confidence interval for
($\mu_X - \mu_Y$) is given by
$$(\bar{X} - \bar{Y}) - t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s^2_p}{n_x} + \frac{s^2_p}{n_y}} < \mu_X - \mu_Y < (\bar{X} - \bar{Y}) + t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s^2_p}{n_x} + \frac{s^2_p}{n_y}}$$
or equivalently,
$$(\bar{X} - \bar{Y}) \pm B$$
where the margin of error is given by
$$B = t_{n_x+n_y-2,\alpha/2}\sqrt{\frac{s^2_p}{n_x} + \frac{s^2_p}{n_y}}$$
The pooled sample variance, $s^2_p$, is given by
$$s^2_p = \frac{(n_x - 1)s^2_X + (n_y - 1)s^2_Y}{n_x + n_y - 2}$$
and $t_{n_x+n_y-2,\alpha/2}$ is the number for which
$$P(t_{n_x+n_y-2} > t_{n_x+n_y-2,\alpha/2}) = \frac{\alpha}{2}$$
The random variable T approximately follows a Student’s t distribution
with $n_X + n_Y - 2$ degrees of freedom, and T is given by
$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{s_p\sqrt{\dfrac{1}{n_X} + \dfrac{1}{n_Y}}}$$
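The pooled variance is a weighted average of the two sample variances, with weights equal to their degrees of freedom. The sketch below computes it and the resulting margin of error. All inputs are hypothetical, and 2.086 is the tabulated $t_{20,0.025}$ for the 10 + 12 - 2 = 20 degrees of freedom in this example.

```python
from math import sqrt

def pooled_variance(s2_x, s2_y, n_x, n_y):
    """Pooled sample variance s_p^2, weighting each variance by its degrees of freedom."""
    return ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)

def diff_means_ci_pooled(x_bar, y_bar, s2_x, s2_y, n_x, n_y, t_crit):
    """Return (LCL, UCL, B); t_crit is t_{n_x+n_y-2, alpha/2} taken from a t table."""
    s2_p = pooled_variance(s2_x, s2_y, n_x, n_y)
    bound = t_crit * sqrt(s2_p / n_x + s2_p / n_y)   # margin of error B
    diff = x_bar - y_bar
    return diff - bound, diff + bound, bound

# Hypothetical summary statistics for two independent samples.
s2_p = pooled_variance(s2_x=4.0, s2_y=6.0, n_x=10, n_y=12)
lcl, ucl, bound = diff_means_ci_pooled(
    x_bar=20.0, y_bar=17.0, s2_x=4.0, s2_y=6.0, n_x=10, n_y=12, t_crit=2.086
)
print(f"pooled variance = {s2_p:.2f}")
print(f"{lcl:.3f} < mu_X - mu_Y < {ucl:.3f}   (B = {bound:.3f})")
```

Here $s^2_p$ = (9 × 4 + 11 × 6)/20 = 5.1, which lies between the two sample variances and closer to the one from the larger sample.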
Confidence Intervals for Two Means: Unknown Population Variances, Assumed
Not Equal
Suppose that there are two independent random samples of $n_x$ and $n_y$
observations from normally distributed populations with means $\mu_X$
and $\mu_Y$, and it is assumed that the population variances are not equal.
If the observed sample means and variances are $\bar{X}$, $\bar{Y}$, and $s^2_X$, $s^2_Y$, then
a $100(1 - \alpha)\%$ confidence interval for ($\mu_X - \mu_Y$) is given by
$$(\bar{X} - \bar{Y}) - t_{v,\alpha/2}\sqrt{\frac{s^2_X}{n_x} + \frac{s^2_Y}{n_y}} < \mu_X - \mu_Y < (\bar{X} - \bar{Y}) + t_{v,\alpha/2}\sqrt{\frac{s^2_X}{n_x} + \frac{s^2_Y}{n_y}}$$
where the margin of error is given by
$$B = t_{v,\alpha/2}\sqrt{\frac{s^2_X}{n_x} + \frac{s^2_Y}{n_y}}$$
The degrees of freedom, v, is given by
$$v = \frac{\left[\left(\dfrac{s^2_X}{n_X}\right) + \left(\dfrac{s^2_Y}{n_Y}\right)\right]^2}{\left(\dfrac{s^2_X}{n_X}\right)^2 \Big/ (n_X - 1) + \left(\dfrac{s^2_Y}{n_Y}\right)^2 \Big/ (n_Y - 1)}$$
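The degrees-of-freedom formula above is easier to read as code. In this sketch the function name and the inputs are hypothetical. The result is generally not a whole number, so it must be matched to a t table entry or passed to software that accepts fractional degrees of freedom.

```python
def unequal_variance_df(s2_x, s2_y, n_x, n_y):
    """Degrees of freedom v for the two-mean interval with unequal variances."""
    a = s2_x / n_x   # estimated variance of the first sample mean
    b = s2_y / n_y   # estimated variance of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))

# Hypothetical summary statistics.
v = unequal_variance_df(s2_x=4.0, s2_y=9.0, n_x=10, n_y=15)
print(f"v = {v:.2f}")

# Sanity check: equal sample variances and equal sample sizes give v = 2(n - 1).
v_equal = unequal_variance_df(s2_x=5.0, s2_y=5.0, n_x=8, n_y=8)
print(f"v (equal case) = {v_equal:.2f}")
```

With these inputs v is about 22.99, just under the $n_x + n_y - 2 = 23$ degrees of freedom that the equal-variance interval would use.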
If the sample sizes are equal, then the degrees of freedom reduces to
$$v = \left(1 + \frac{2}{\dfrac{s^2_X}{s^2_Y} + \dfrac{s^2_Y}{s^2_X}}\right)(n - 1)$$
Confidence Intervals for the Difference Between Two Population Proportions
(Large Samples)
Let $p_X$ denote the observed proportion of successes in a random
sample of $n_X$ observations from a population with proportion $\pi_X$
successes, and let $p_Y$ denote the proportion of successes observed in
an independent random sample of $n_Y$ observations from a population with
proportion $\pi_Y$ successes. Then, if the sample sizes are large (generally at
least forty observations in each sample), a $100(1 - \alpha)\%$ confidence interval
for the difference between population proportions ($\pi_X - \pi_Y$) is given
by
$$(p_X - p_Y) \pm B$$
where the margin of error is
$$B = Z_{\alpha/2}\sqrt{\frac{p_X(1 - p_X)}{n_X} + \frac{p_Y(1 - p_Y)}{n_Y}}$$
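The same pattern applies to the difference between two proportions. The sketch below uses hypothetical sample proportions of 0.5 and 0.4 from two samples of 100 observations each, which satisfies the forty-observation guideline.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_ci(p_x, p_y, n_x, n_y, alpha=0.05):
    """Return (LCL, UCL, B) for pi_X - pi_Y (large-sample formula)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)                        # Z_{alpha/2}
    bound = z * sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
    diff = p_x - p_y
    return diff - bound, diff + bound, bound

# Hypothetical sample proportions from two independent samples.
lcl, ucl, bound = diff_proportions_ci(p_x=0.50, p_y=0.40, n_x=100, n_y=100)
print(f"{lcl:.3f} < pi_X - pi_Y < {ucl:.3f}   (B = {bound:.3f})")
```

The standard error is $\sqrt{0.0025 + 0.0024} = 0.07$, so B is about 0.137. The resulting interval contains zero, so these hypothetical data would not rule out equal population proportions at the 95% level.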
Sample Size for the Mean of a Normally Distributed Population
with Known Population Variance
Suppose that a random sample from a normally
distributed population with known variance $\sigma^2$ is
selected. Then a $100(1 - \alpha)\%$ confidence interval for
the population mean extends a distance B
(sometimes called the bound, sampling error, or the
margin of error) on each side of the sample mean, if
the sample size, n, is
$$n = \frac{Z^2_{\alpha/2}\,\sigma^2}{B^2}$$
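The sample-size formula follows from solving $B = Z_{\alpha/2}\,\sigma/\sqrt{n}$ for n. A minimal sketch is shown below; the inputs are hypothetical, and rounding up to the next whole observation is a practical step added here rather than something stated on the slide.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, bound, alpha=0.05):
    """Smallest whole n for which the margin of error does not exceed `bound`."""
    z = NormalDist().inv_cdf(1 - alpha / 2)          # Z_{alpha/2}
    n_exact = z ** 2 * sigma ** 2 / bound ** 2       # n = Z^2 * sigma^2 / B^2
    return ceil(n_exact)                             # round up (assumption of this sketch)

# Hypothetical requirement: sigma = 10, margin of error at most 2, 95% confidence.
n_required = sample_size_mean(sigma=10.0, bound=2.0)
print(n_required)
```

The unrounded value is about 96.04, so 97 observations are needed under these assumptions.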
Sample Size for Population Proportion
Suppose that a random sample is selected from a
population. Then a $100(1 - \alpha)\%$ confidence interval
for the population proportion, extending a distance
of at most B on each side of the sample proportion,
can be guaranteed if the sample size, n, is
$$n = \frac{0.25\,(Z_{\alpha/2})^2}{B^2}$$
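The factor 0.25 is the largest possible value of p(1 - p), reached at p = 0.5, which is why this sample size guarantees the bound whatever proportion is observed. A short sketch with a hypothetical target of B = 0.03 follows; rounding up is again a practical step added for this example.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(bound, alpha=0.05):
    """Whole n that guarantees a margin of error of at most `bound` for any proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)    # Z_{alpha/2}
    n_exact = 0.25 * z ** 2 / bound ** 2       # worst case p(1 - p) = 0.25
    return ceil(n_exact)                       # round up (assumption of this sketch)

# Hypothetical requirement: margin of error at most 3 percentage points, 95% confidence.
n_required = sample_size_proportion(bound=0.03)
print(n_required)
```

The unrounded value is about 1067.07, so 1068 observations are needed under these assumptions.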
Key Words
• Bias
• Bound
• Confidence interval:
    • For mean, known variance
    • For mean, unknown variance
    • For proportion
    • For two means, matched
    • For two means, variances equal
    • For two means, variances not equal
    • For variance
• Confidence Level
• Estimate
• Estimator
• Interval Half Width
• Lower Confidence Limit (LCL)
• Margin of Error
• Minimum Variance Unbiased Estimator
• Most Efficient Estimator
• Point Estimate
• Point Estimator
• Relative Efficiency
• Reliability Factor
• Sample Size for Mean, Known Variance
• Sample Size for Proportion
• Sampling Error
• Student’s t
• Unbiased Estimator
• Upper Confidence Limit (UCL)
• Width