7. Statistical Intervals Based on
a Single Sample
Li Jie
• A point estimate, because it is a single number, by itself
provides no information about the precision and reliability of
estimation.
• An alternative to reporting a single sensible value of the
parameter being estimated is to calculate and report an entire
interval of plausible values, called an interval estimate or
confidence interval (CI).
• A confidence interval is always calculated by first selecting a
confidence level, which is a measure of the degree of
reliability of the interval.
7.1 Basic Properties of Confidence Intervals
The basic concepts and properties of confidence intervals (CIs) are
most easily introduced by first focusing on a simple, albeit
somewhat unrealistic, problem situation.
Suppose that the parameter of interest is a
population mean μ and that
1. The population distribution is normal.
2. The value of the population standard
deviation σ is known.
Normality of the population distribution is often a
reasonable assumption. However, if the value of μ
is unknown, it is implausible that the value of σ
would be available. (Knowledge of a population’s
center typically precedes information concerning
spread.)
Example 7.1:
Given $n = 31$, $\bar{x} = 80$, and $\sigma = 2.0$,
find a confidence interval (CI) for μ, that is, limits $u_1$ and $u_2$ such that
$P(u_1 < \mu < u_2) = 1 - \alpha$
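Let $X_1, X_2, \ldots, X_n$ be a random sample, with observed values
$x_1, x_2, \ldots, x_n$, from $N(\mu, \sigma^2)$. Then
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
and
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$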
$\left(\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right)$   (7.4)
• The interval (7.4) is random because the two
endpoints of the interval involve the random variable (rv) $\bar{X}$.
The interval’s width is not random; only the location
of the interval is random.
$P\left(\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95$   (7.3)
• (7.3) can be paraphrased as “the probability is 0.95 that
the random interval (7.4) includes or covers the
true value of μ.”
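The probability statement (7.3) can be checked by simulation. The sketch below is only an illustration, not part of the original slides: it assumes a hypothetical true mean of 80 (the slides give only the observed sample mean), reuses σ = 2.0 and n = 31 from Example 7.1, and counts how often the random interval (7.4) covers the true mean. The function name `coverage_rate` and the number of trials are arbitrary choices.

```python
import random
from math import sqrt

def coverage_rate(mu, sigma, n, trials=10_000, z=1.96, seed=0):
    """Fraction of simulated samples whose interval (7.4) contains mu."""
    rng = random.Random(seed)
    half_width = z * sigma / sqrt(n)  # fixed width: does not depend on the sample
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from N(mu, sigma^2) and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Only the location (xbar) of the interval is random.
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

# mu = 80.0 is an assumed "true" mean used only for this simulation.
rate = coverage_rate(mu=80.0, sigma=2.0, n=31)
print(f"observed coverage: {rate:.3f}")  # should be close to 0.95
```

In the long run about 95% of the simulated intervals contain μ, which is what the 95% confidence level refers to.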
DEFINITION:
If, after observing $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, we compute
the observed sample mean $\bar{x}$ and then substitute $\bar{x}$ into
(7.4) in place of $\bar{X}$, the resulting fixed interval is called a
95% confidence interval for μ. This CI can be expressed
either as
$\left(\bar{x} - 1.96 \cdot \frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right)$ is a 95% CI for μ
or as
$\bar{x} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}$ with 95% confidence.
A concise expression for the interval
is $\bar{x} \pm 1.96 \cdot \sigma/\sqrt{n}$, where − gives the left endpoint (lower
limit) and + gives the right endpoint (upper limit).
Example:
The quantities needed for computation of the 95% CI for
average preferred height are σ = 2.0, n = 31, and $\bar{x} = 80.0$.
The resulting interval is
$\bar{x} \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}} = 80.0 \pm 1.96 \cdot \frac{2.0}{\sqrt{31}} = 80.0 \pm .7 = (79.3, 80.7)$
That is, we can be highly confident that 79.3 < μ < 80.7.
This interval is relatively narrow, indicating that μ has
been rather precisely estimated.
Interpreting a confidence interval (P281)
Other Levels of Confidence
DEFINITION:
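A 100(1 − α)% confidence interval for the mean μ
of a normal population when the value of σ is known is given by
$\left(\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\right)$
or, equivalently, by $\bar{x} \pm z_{\alpha/2} \cdot \sigma/\sqrt{n}$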
Example
The production process for engine control housing units of a
particular type has recently been modified. Prior to this
modification, historical data had suggested that the distribution of
hole diameters for bushings on the housing was normal with a
standard deviation of .100 mm. It is believed that the modification
has not affected the shape of the distribution or the standard deviation, but
that the value of the mean diameter may have changed. A sample
of 40 housing units is selected and hole diameter is determined for
each one, resulting in a sample mean diameter of 5.426 mm. Let’s
calculate a confidence interval for true average hole diameter
using a confidence level of 90%. This requires that 100(1 − α) = 90,
from which α = .10 and $z_{\alpha/2} = z_{.05} = 1.645$. The desired interval is then
$5.426 \pm 1.645 \cdot \frac{.100}{\sqrt{40}} = 5.426 \pm .026 = (5.400, 5.452)$
With a reasonably high degree of confidence, we can say
that 5.400 < μ < 5.452.
This interval is narrow because of the small amount
of variability in hole diameter (σ = .100).
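The general 100(1 − α)% interval can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the slides: the function name `z_interval` is made up here, and the critical value is taken from the standard library's `statistics.NormalDist` instead of a printed z table, so it carries more decimals than 1.645 or 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """CI for the mean of a normal population with known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

# Hole-diameter example: n = 40, sigma = .100, xbar = 5.426, 90% level.
lo90, hi90 = z_interval(5.426, 0.100, 40, confidence=0.90)
print(f"({lo90:.3f}, {hi90:.3f})")  # (5.400, 5.452)

# Example 7.1: n = 31, sigma = 2.0, xbar = 80.0, 95% level.
lo95, hi95 = z_interval(80.0, 2.0, 31)
print(f"({lo95:.1f}, {hi95:.1f})")  # (79.3, 80.7)
```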
Confidence Level, Precision, and Choice
of Sample Size
Confidence level: $1 - \alpha$
Interval width: $u_2 - u_1$
Precision: 1/(interval width)
An appealing strategy is to specify both the desired
confidence level and interval width and then determine
the necessary sample size.
Example 7.4
Extensive monitoring of a computer time-sharing system
has suggested that response time to a particular editing
command is normally distributed with standard
deviation 25 millisec. A new operating system has been
installed, and we wish to estimate the true average
response time μ for the new environment. Assuming
that response times are still normally distributed
with σ = 25, what sample size is necessary to ensure
that the resulting 95% CI has a width of 10? The sample
size n must satisfy
$10 = 2 \cdot (1.96)(25/\sqrt{n})$
Rearranging this equation gives
$\sqrt{n} = 2 \cdot (1.96)(25)/10 = 9.80$
so
$n = (9.80)^2 = 96.04$
Since n must be an integer, a sample size of 97 is
required.
The general formula for the sample size n necessary to
ensure an interval width w is obtained from $w = 2 \cdot z_{\alpha/2} \cdot \sigma/\sqrt{n}$
as
$n = \left(2 z_{\alpha/2} \cdot \frac{\sigma}{w}\right)^2$
The half-width $1.96\sigma/\sqrt{n}$ of the 95% CI is sometimes
called the bound on the error of estimation
associated with a 95% confidence level.
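The sample-size formula is easy to wrap in a helper. The following is a sketch under the same assumptions as the formula (normal population, known σ); the name `sample_size_for_width` is hypothetical, and rounding up with `ceil` reflects the requirement that n be an integer at least as large as the computed value.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_width(sigma, width, confidence=0.95):
    """Smallest n for which the z interval has width at most `width`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    return ceil((2 * z * sigma / width) ** 2)

# Example 7.4: sigma = 25, desired width 10, 95% confidence.
n_required = sample_size_for_width(sigma=25, width=10)
print(n_required)  # 97
```

Halving the desired width roughly quadruples the required sample size, since n is proportional to 1/w².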
Deriving a Confidence interval
Let $X_1, X_2, \ldots, X_n$ denote the sample on which the CI
for a parameter θ is to be based. Suppose a random
variable satisfying the following two properties can be
found:
1. The variable depends functionally on both $X_1, X_2, \ldots, X_n$
and θ.
2. The probability distribution of the variable does not
depend on θ or on any other unknown parameters.
Page 284 in the textbook (in detail)
Example 7.5
A theoretical model suggests that the time to breakdown
of an insulating fluid between electrodes at a particular
voltage has an exponential distribution with parameter λ.
A random sample of n = 10 breakdown times yields the
following sample data:
$x_1 = 41.53$, $x_2 = 18.73$, $x_3 = 2.99$, $x_4 = 30.34$, $x_5 = 12.33$,
$x_6 = 117.52$, $x_7 = 73.02$, $x_8 = 223.63$, $x_9 = 4.00$, $x_{10} = 26.78$
A 95% CI for λ and for the true average breakdown time
are desired.
Let $h(X_1, X_2, \ldots, X_n; \lambda) = 2\lambda \sum X_i$.
It can be shown that this random variable has a probability
distribution called a chi-squared distribution with 2n
degrees of freedom (df).
For n = 10 (so 2n = 20 df),
$P(9.591 < 2\lambda \sum X_i < 34.170) = 0.95$
Division by $\sum X_i$ gives
$P(9.591/\sum X_i < 2\lambda < 34.170/\sum X_i) = 0.95$
$P(2\sum X_i/34.170 < 1/\lambda < 2\sum X_i/9.591) = 0.95$
With $\sum x_i = 550.87$, the 95% CI for the true average breakdown time 1/λ is
$(2\sum x_i/34.170,\ 2\sum x_i/9.591) = (32.24, 114.87)$
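The arithmetic of Example 7.5 can be reproduced as follows. This is an illustrative sketch: the two chi-squared critical values for 20 df are copied from the example rather than computed, and the variable names are chosen here. It also reports the interval for λ itself, which follows from the middle probability statement by dividing through by 2.

```python
# Breakdown times from Example 7.5 (n = 10).
times = [41.53, 18.73, 2.99, 30.34, 12.33, 117.52, 73.02, 223.63, 4.00, 26.78]

# Chi-squared critical values for 2n = 20 df, as quoted in the example.
CHI2_LOWER = 9.591   # cuts off area .025 in the lower tail
CHI2_UPPER = 34.170  # cuts off area .025 in the upper tail

total = sum(times)

# 95% CI for lambda: CHI2_LOWER / (2*sum) < lambda < CHI2_UPPER / (2*sum)
lambda_ci = (CHI2_LOWER / (2 * total), CHI2_UPPER / (2 * total))

# 95% CI for the mean 1/lambda: reciprocals, with the endpoints swapped.
mean_ci = (2 * total / CHI2_UPPER, 2 * total / CHI2_LOWER)

print(f"sum of x_i = {total:.2f}")
print(f"CI for lambda:   ({lambda_ci[0]:.5f}, {lambda_ci[1]:.5f})")
print(f"CI for 1/lambda: ({mean_ci[0]:.2f}, {mean_ci[1]:.2f})")
```

The interval for the mean is wide because only ten observations are available and exponential data are highly variable.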
Page 285, Exercise 1,2,3,4,8
7.2 Large-sample Confidence Intervals for a
Population Mean and Proportion
The CI for μ given in the previous section
assumed that the population distribution is normal
and that the value of σ is known. We now present
a large-sample CI whose validity does not require
these assumptions. After showing how the
argument leading to this interval generalizes to
yield other large-sample intervals, we focus on an
interval for a population proportion p.
A Large-Sample Interval for μ
Provided that n is large, the CLT implies that, approximately,
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
PROPOSITION:
If n is sufficiently large, the standardized variable
$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$
has approximately a standard normal distribution.
This implies that
$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
is a large-sample confidence interval for μ with
confidence level approximately 100(1 − α)%. This formula
is valid regardless of the shape of the population
distribution.
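$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ approximately
$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$ approximately
$P(u_1 < \mu < u_2) = P(-u_2 < -\mu < -u_1)$
$= P(\bar{X} - u_2 < \bar{X} - \mu < \bar{X} - u_1)$
$= P\left(\frac{\bar{X} - u_2}{S/\sqrt{n}} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < \frac{\bar{X} - u_1}{S/\sqrt{n}}\right)$
$\approx \Phi\left(\frac{\bar{X} - u_1}{S/\sqrt{n}}\right) - \Phi\left(\frac{\bar{X} - u_2}{S/\sqrt{n}}\right)$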
Example 7.6
The alternating-current (AC) breakdown voltage of an
insulating liquid indicates its dielectric strength. The
article “test practices for the AC breakdown voltage testing
of insulation liquids,” gave the accompanying sample
observations on breakdown voltage of a particular circuit
under certain conditions.
62 50 53 57 41 53 55 61 59 64 50 53 64 62 50 68
54 55 57 50 55 50 56 55 46 55 53 54 52 47 47 55
57 48 63 57 57 55 53 59 53 52 50 55 60 50 56 58
A boxplot of the data shows a high concentration in
the middle half of the data. There is a single outlier at
the upper end, but this value is actually a bit closer to
the median (55) than is the smallest sample observation:
68 − 55 = 13, 55 − 41 = 14.
[Figure 7.5: boxplot of the breakdown voltage data; axis "Voltage", ticks 40 to 70]
Summary quantities include n = 48, $\sum x_i = 2626$, and $\sum x_i^2 = 144{,}950$,
from which $\bar{x} = 54.7$ and $s = 5.23$.
The 95% confidence interval is then
$54.7 \pm 1.96 \cdot \frac{5.23}{\sqrt{48}} = 54.7 \pm 1.5 = (53.2, 56.2)$
That is, 53.2 < μ < 56.2
with a confidence level of approximately 95%.
The interval is reasonably narrow, indicating that
we have precisely estimated μ.
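The large-sample interval of Example 7.6 can be computed directly from the summary quantities n, Σx_i, and Σx_i². The sketch below is one possible way to do it; the function name `large_sample_ci` is hypothetical, and the sample standard deviation is obtained from the usual computing formula s² = (Σx_i² − (Σx_i)²/n)/(n − 1).

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(n, sum_x, sum_x2, confidence=0.95):
    """Large-sample CI for mu from n, sum of x_i, and sum of x_i squared."""
    xbar = sum_x / n
    s = sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))  # sample standard deviation
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    half = z * s / sqrt(n)
    return xbar, s, (xbar - half, xbar + half)

# Summary quantities from Example 7.6.
xbar, s, (low, high) = large_sample_ci(n=48, sum_x=2626, sum_x2=144950)
print(f"xbar = {xbar:.1f}, s = {s:.2f}")
print(f"approximate 95% CI: ({low:.1f}, {high:.1f})")
```

No normality assumption is used here; the approximate 95% level relies on n = 48 being large enough for the CLT to apply.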
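A General Large-Sample Confidence
Interval (omit)
$P\left(-z_{\alpha/2} < \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} < z_{\alpha/2}\right) \approx 1 - \alpha$
$X \sim \mathrm{Bin}(n, p)$, p unknown
$P\left(-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} < z_{\alpha/2}\right) \approx 1 - \alpha$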
A Large-Sample Confidence Interval for a
Population Proportion (omit)
PROPOSITION:
A confidence interval for a population proportion p
with confidence level approximately 100(1 − α)% has
lower confidence limit = $\frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + z_{\alpha/2}^2/n}$
and
upper confidence limit = $\frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + z_{\alpha/2}^2/n}$
Example 7.8
The article “Repeatability and Reproducibility for
Pass/Fail Data” reported that in n = 48 trials in a particular
laboratory, 16 resulted in ignition of a particular type of
substrate by a lighted cigarette. Let p denote the long-run
proportion of all such trials that would result in ignition.
A point estimate for p is $\hat{p} = 16/48 = .333$. A confidence
interval for p with a confidence level of approximately
95% is
$\frac{.333 + (1.96)^2/96 \pm 1.96\sqrt{(.333)(.667)/48 + (1.96)^2/9216}}{1 + (1.96)^2/48} = \frac{.373 \pm .139}{1.08} = (.217, .474)$
The traditional interval is
$.333 \pm 1.96\sqrt{(.333)(.667)/48} = .333 \pm .133 = (.200, .466)$
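Both intervals in Example 7.8 can be sketched in code. The function names below are illustrative, z = 1.96 is hard-coded as in the example, and $\hat{q}$ is taken to be $1 - \hat{p}$. Because the code keeps full precision in $\hat{p}$ = 16/48 instead of rounding to .333, the last digit of an endpoint may differ slightly from the hand calculation.

```python
from math import sqrt

def score_interval(successes, n, z=1.96):
    """Large-sample CI for p using the formula in the proposition."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    center = p_hat + z ** 2 / (2 * n)
    spread = z * sqrt(p_hat * q_hat / n + z ** 2 / (4 * n ** 2))
    denom = 1 + z ** 2 / n
    return (center - spread) / denom, (center + spread) / denom

def traditional_interval(successes, n, z=1.96):
    """Traditional interval: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# Example 7.8: 16 ignitions in n = 48 trials.
score_lo, score_hi = score_interval(16, 48)
trad_lo, trad_hi = traditional_interval(16, 48)
print(f"score interval:       ({score_lo:.3f}, {score_hi:.3f})")
print(f"traditional interval: ({trad_lo:.3f}, {trad_hi:.3f})")
```

The score interval is not centered at $\hat{p}$; it is pulled toward 0.5, which is why its endpoints sit slightly above those of the traditional interval here.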
Equating the width of the CI for p to a prespecified width
w gives a quadratic equation for the sample size n
necessary to give an interval with a desired degree of
precision. Suppressing the subscript in $z_{\alpha/2}$, the solution
is
$n = \frac{2z^2\hat{p}\hat{q} - z^2w^2 \pm \sqrt{4z^4\hat{p}\hat{q}(\hat{p}\hat{q} - w^2) + w^2z^4}}{w^2}$
Neglecting the terms in the numerator involving $w^2$ gives
$n = \frac{4z^2\hat{p}\hat{q}}{w^2}$
This latter expression is what results from equating the
width of the traditional interval to w.
One-Sided Confidence Intervals
PROPOSITION
A large-sample upper confidence bound for μ is
$\mu < \bar{x} + z_{\alpha} \cdot \frac{s}{\sqrt{n}}$
and a large-sample lower confidence bound for
μ is
$\mu > \bar{x} - z_{\alpha} \cdot \frac{s}{\sqrt{n}}$
A one-sided confidence bound for p results from replacing $z_{\alpha/2}$ by $z_{\alpha}$ and ± by either + or − in the CI
formula for p.
Example 7.10:
The slant shear test is the most widely accepted procedure
for assessing the quality of a bond between a repair
material and its concrete substrate. The article “Testing
the Bond Between Repair Materials and Concrete
Substrate” reported that in one particular investigation, a
sample of 48 shear strength observations gave a sample
mean strength of 17.17 N/mm² and a sample standard
deviation of 3.28 N/mm². A lower confidence bound
for true average shear strength μ with confidence
level 95% is
$17.17 - 1.645 \cdot \frac{3.28}{\sqrt{48}} = 17.17 - 0.78 = 16.39$
That is, with a confidence level of 95%, the value of μ lies
in the interval (16.39, ∞).
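A one-sided bound differs from the two-sided interval only in the critical value: all of α goes into one tail, so $z_{\alpha}$ replaces $z_{\alpha/2}$. The sketch below illustrates this for the lower bound of Example 7.10; the function name is hypothetical, and the critical value comes from `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist

def lower_confidence_bound(xbar, s, n, confidence=0.95):
    """Large-sample lower confidence bound for mu."""
    z = NormalDist().inv_cdf(confidence)  # z_alpha: the whole alpha in one tail
    return xbar - z * s / sqrt(n)

# Example 7.10: n = 48, xbar = 17.17, s = 3.28, 95% confidence.
bound = lower_confidence_bound(17.17, 3.28, 48)
print(f"mu > {bound:.2f}")  # mu > 16.39
```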
7.3 Intervals Based on a Normal Population
Distribution
ASSUMPTION
The population of interest is normal, so that $X_1, X_2, \ldots, X_n$
constitutes a random sample from a normal distribution
with both μ and σ unknown.
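THEOREM:
When $\bar{X}$ is the mean of a random sample of
size n from a normal distribution with mean μ, the rv
$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$
has a probability distribution called a t distribution with n − 1
degrees of freedom (df).
The t distribution was first proposed by W. S. Gosset in a paper published in 1908. At the time, Gosset was
an employee of an Irish brewery that did not allow its staff to publish research. To
get around this restriction, he secretly published his work under the name “Student”. The t
distribution is therefore often called Student's t distribution, or the t distribution for short.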
Properties of t Distributions
A t distribution is governed by only one parameter,
called the number of degrees of freedom of the
distribution.
$f(t) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}}$
Let $t_v$ denote the density function curve for v df.
1. Each $t_v$ curve is bell-shaped and centered at 0.
2. Each $t_v$ curve is more spread out than the standard normal curve.
3. As v increases, the spread of the corresponding $t_v$ curve decreases.
4. As $v \to \infty$, the sequence of $t_v$ curves approaches the standard normal curve.
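These properties can be checked numerically from the standard density of the t distribution with v df, $f(t) = \frac{\Gamma((v+1)/2)}{\sqrt{v\pi}\,\Gamma(v/2)}(1 + t^2/v)^{-(v+1)/2}$. The sketch below is only an illustration: the function name `t_pdf` is chosen here, and the density is evaluated on the log scale with `math.lgamma` so that large v does not overflow.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist

def t_pdf(t, v):
    """Density of the t distribution with v degrees of freedom at t."""
    log_const = lgamma((v + 1) / 2) - lgamma(v / 2) - 0.5 * log(v * pi)
    return exp(log_const - (v + 1) / 2 * log(1 + t * t / v))

z_pdf = NormalDist().pdf  # standard normal density for comparison

# Height at the center (t = 0) and in the tail (t = 3) for several v.
for v in (5, 25, 1000):
    print(f"v = {v:4d}: f(0) = {t_pdf(0.0, v):.4f}, f(3) = {t_pdf(3.0, v):.4f}")
print(f"z curve : f(0) = {z_pdf(0.0):.4f}, f(3) = {z_pdf(3.0):.4f}")
```

A more spread-out curve is lower at the center and higher in the tails, so for small v the value at 0 is below the z curve and the value at 3 is above it; both gaps shrink as v grows.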
[Figure 7.6: $t_v$ and z curves (z curve, $t_{25}$ curve, $t_5$ curve), centered at 0]
Notation
Let $t_{\alpha,v}$ = the number on the measurement axis
for which the area under the t curve with v df to the
right of $t_{\alpha,v}$ is α; $t_{\alpha,v}$ is called a t critical value.
[Figure 7.7: A pictorial definition of $t_{\alpha,v}$ ($t_v$ curve with shaded area α to the right of $t_{\alpha,v}$)]
The One-Sample t Confidence Interval
PROPOSITION:
Let $\bar{x}$ and s be the sample mean and sample standard
deviation computed from the results of a random sample
from a normal population with mean μ. Then a 100(1 − α)% confidence interval for μ is
$\left(\bar{x} - t_{\alpha/2,n-1} \cdot \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1} \cdot \frac{s}{\sqrt{n}}\right)$
or, more compactly, $\bar{x} \pm t_{\alpha/2,n-1} \cdot s/\sqrt{n}$
An upper confidence bound for μ is
$\bar{x} + t_{\alpha,n-1} \cdot \frac{s}{\sqrt{n}}$
and replacing + by − in this latter expression gives a
lower confidence bound for μ, both with confidence
level 100(1 − α)%.
Example 7.11
As part of a larger project to study the behavior of
stressed-skin panels, a structural component being used
extensively in North America, the article “Time-Dependent
Bending Properties of Lumber” reported on
various mechanical properties of Scotch pine lumber
specimens. Consider the following observations on
modulus of elasticity obtained 1 minute after loading in a
certain configuration:
10490 16620 17300 15480 12970 17260 13400 13900
13630 13260 14370 11700 15470 17840 14070 14760
[Figure 7.8: plot of the modulus of elasticity data; vertical axis 10000 to 18000, horizontal axis −2 to 2]
Hand calculation of the sample mean and standard
deviation is simplified by subtracting 10,000 from each
observation: $y_i = x_i - 10{,}000$
$\sum y_i = 72{,}520$, $\sum y_i^2 = 392{,}083{,}800$
$\bar{y} = 4532.5$, $s_y = 2055.67$
$\bar{x} = 14{,}532.5$, $s_x = 2055.67$
$\bar{x} \pm t_{.025,15} \cdot \frac{s}{\sqrt{n}} = 14{,}532.5 \pm 2.131 \cdot \frac{2055.67}{\sqrt{16}} = (13{,}437.3,\ 15{,}627.7)$
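The one-sample t interval of Example 7.11 can be reproduced from the raw data. In this sketch the critical value $t_{.025,15} = 2.131$ is copied from the example rather than computed (the Python standard library has no t quantile function), and the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Modulus of elasticity observations from Example 7.11 (n = 16).
modulus = [10490, 16620, 17300, 15480, 12970, 17260, 13400, 13900,
           13630, 13260, 14370, 11700, 15470, 17840, 14070, 14760]

T_CRIT = 2.131  # t_{.025,15}, as quoted in the example

n = len(modulus)
xbar = mean(modulus)
s = stdev(modulus)  # sample standard deviation (divisor n - 1)
half = T_CRIT * s / sqrt(n)
t_ci = (xbar - half, xbar + half)

print(f"n = {n}, xbar = {xbar:.1f}, s = {s:.2f}")
print(f"95% CI for mu: ({t_ci[0]:.1f}, {t_ci[1]:.1f})")
```

Subtracting 10,000 from each observation, as in the hand calculation, shifts the mean but leaves s unchanged, which is why $s_y = s_x$.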
7.4 Confidence intervals for the variance and
standard deviation of a normal population
Although inferences concerning a population variance
or standard deviation are usually of less interest than
those about a mean or proportion, there are occasions
when such procedures are needed. In the case of a
normal population distribution, inferences are based
on the following result concerning the sample
variance $S^2$.
THEOREM
Let $X_1, X_2, \ldots, X_n$ be a random sample
from a normal distribution with parameters
μ and σ². Then the rv
$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}$
has a chi-squared ($\chi^2$) probability distribution
with n − 1 df.
[Figure 7.9: Graphs of chi-squared density functions f(x; v) for v = 8, v = 12, and v = 20]
Notation:
Let $\chi^2_{\alpha,v}$, called a chi-squared critical value, denote the
number on the measurement axis such that α of the area
under the chi-squared curve with v df lies to the right
of $\chi^2_{\alpha,v}$.
[Figure 7.10: $\chi^2_{\alpha,v}$ notation illustrated. (a) $\chi^2_v$ pdf with shaded area α to the right of $\chi^2_{\alpha,v}$; (b) each shaded area = .01, with $\chi^2_{.99,v}$ and $\chi^2_{.01,v}$ marked]
A 100(1 − α)% confidence interval for the variance σ² of a
normal population has lower limit
$(n-1)s^2 / \chi^2_{\alpha/2,n-1}$
and upper limit
$(n-1)s^2 / \chi^2_{1-\alpha/2,n-1}$
A confidence interval for σ has lower and upper limits
that are the square roots of the corresponding limits in
the interval for σ².
Example 7.15
The accompanying data on breakdown voltage of
electrically stressed circuits was read from a normal
probability plot that appeared in the article “Damage of
Flexible Printed Wiring Boards Associated with
Lightning-Induced Voltage Surges”. The straightness of
the plot gave strong support to the assumption that
breakdown voltage is approximately normally distributed .
1170 1510 1690 1740 1900 2000 2030 2100 2190
2200 2290 2380 2390 2480 2500 2580 2700
Let σ² denote the variance of the breakdown voltage
distribution. The computed value of the sample variance
is s² = 137,324.3, the point estimate of σ². With df = n − 1 = 16, a 95% CI requires $\chi^2_{.975,16} = 6.908$ and $\chi^2_{.025,16} = 28.845$.
The interval is
$\left(\frac{16(137{,}324.3)}{28.845},\ \frac{16(137{,}324.3)}{6.908}\right) = (76{,}172.3,\ 318{,}064.4)$
Taking the square root of each endpoint yields (276.0, 564.0) as
the 95% CI for σ.
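The computation in Example 7.15 can be sketched as below. The sample variance and both chi-squared critical values are copied from the example rather than recomputed, and the variable names are illustrative. Note that the larger critical value produces the lower limit.

```python
from math import sqrt

n = 17               # number of breakdown voltage observations
s2 = 137_324.3       # sample variance reported in the example
CHI2_975 = 6.908     # chi-squared critical value with area .975 to its right, 16 df
CHI2_025 = 28.845    # chi-squared critical value with area .025 to its right, 16 df

df = n - 1

# 95% CI for the variance: divide (n - 1) s^2 by the two critical values.
var_ci = (df * s2 / CHI2_025, df * s2 / CHI2_975)

# 95% CI for the standard deviation: square roots of the variance limits.
sd_ci = (sqrt(var_ci[0]), sqrt(var_ci[1]))

print(f"CI for variance: ({var_ci[0]:.1f}, {var_ci[1]:.1f})")
print(f"CI for sd:       ({sd_ci[0]:.1f}, {sd_ci[1]:.1f})")
```

Unlike the intervals for μ, this interval is not symmetric about the point estimate s², because the chi-squared distribution is skewed.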