Inference on the Mean of a Population, Variance Known

Chapter 4: Modeling and Analysis of Uncertainty
Statistical Inference
• Statistical inference is using the information contained in a sample to make decisions about the population. Statistical inference has two subareas:
1. Parameter estimation – estimating a parameter of a population probability distribution; point and interval estimates.
2. Hypothesis testing – testing a hypothesis made regarding a parameter of a probability distribution.
Point Estimation
• If X is a random variable with a probability distribution f(x) that depends on an unknown parameter θ, and X1, X2, ..., Xn is a random sample of size n from f(x), then

Θ̂ = h(X1, X2, ..., Xn)

is called a point estimator of θ.
1. h is a function of the observations (all random variables) in the sample. Therefore Θ̂ is a random variable.
2. After the random sample has been taken, the random variables have numerical values and Θ̂ has a numerical value that is referred to as θ̂.
3. A point estimate of some population parameter θ is a single numerical value θ̂ of a statistic Θ̂.
Parameters θ and Point Estimates θ̂
Population Mean μ ------------ Sample mean X̄
Population Variance σ² ---------- Sample variance S²
Difference in two means μ₁ − μ₂ ---- Difference in two sample means X̄₁ − X̄₂
All of these point estimates are statistics.
Estimator for Percentiles
A. Let x_p denote the p-quantile of X and z_p denote the p-quantile of the standard normal. Assume X ~ N(μ, σ²). Then

F_X(x_p) = p = Φ((x_p − μ)/σ)  ⟹  (x_p − μ)/σ = z_p  ⟹  x_p = μ + σ z_p.   (3)

Thus, we would estimate x_p by

x̂_p = μ̂ + σ̂ z_p = x̄ + s z_p.   (4)
Estimators for Percentiles
B. When sampling from X ~ exp(λ),

μ_X = 1/λ,  σ_X² = 1/λ² = μ_X²,  x_p = −(1/λ) ln(1 − p) = −μ_X ln(1 − p).

Thus,

μ̂ = x̄,  λ̂ = 1/x̄,  and  x̂_p = −x̄ ln(1 − p).

What if we wanted to estimate σ_X²? We wouldn't use s²; we would use σ̂_X² = x̄².
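The exponential-case estimators above are easy to sketch in code. The following is our own illustration (the function name is not from the slides):

```python
import math

def exp_estimates(sample, p):
    """Point estimates for an exponential population, as in part B:
    mu_hat = x_bar, lambda_hat = 1/x_bar, x_hat_p = -x_bar*ln(1-p),
    and var_hat = x_bar**2 (not the sample variance s**2)."""
    x_bar = sum(sample) / len(sample)
    mu_hat = x_bar                        # estimate of mu = 1/lambda
    lam_hat = 1.0 / x_bar                 # estimate of lambda
    xp_hat = -x_bar * math.log(1.0 - p)   # estimate of the p-quantile
    var_hat = x_bar ** 2                  # estimate of sigma^2 = 1/lambda^2
    return mu_hat, lam_hat, xp_hat, var_hat
```

For example, a sample with mean 2 gives λ̂ = 0.5 and median estimate x̂₀.₅ = 2 ln 2.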
Estimators for Percentiles
Example 1: The data (ordered) are thicknesses of
copper foil from a rolling operation (units in mils).
THICKNESS: 4.72, 4.84, 4.87, 4.94, 4.98, 5.00, 5.01, 5.05, 5.06, 5.07, 5.11, 5.12, 5.21, 5.24, 5.38

[Normal probability plot of THICKNESS]
Average: 5.04   StDev: 0.165745   N: 15
Anderson-Darling Normality Test: A-Squared: 0.159, P-Value: 0.936
Estimators for Percentiles
The normal distribution provides a good fit to the data.
Thus, assume the data come from a normal distribution,
with
ˆ  x  5.04, ˆ  s  0.1657.
The 90th percentile is estimated by
xˆ0.90  x  sz0.90  5.04  0.1657(1.2816)  5.25
(To obtain standard normal quantiles use normal tables or
MINITAB or find normal quantiles directly from
MINITAB –we’ll demonstrate).
The sample 90th percentile (see Chapter 2) is 5.30.
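The estimate of Example 1 can be reproduced with the Python standard library. This is our own sketch of equation (4), not part of the original slides:

```python
from statistics import NormalDist, mean, stdev

def normal_quantile_estimate(sample, p):
    """Equation (4): x_hat_p = x_bar + s * z_p."""
    z_p = NormalDist().inv_cdf(p)          # standard normal p-quantile
    return mean(sample) + stdev(sample) * z_p

# Copper-foil thickness data from Example 1 (mils)
thickness = [4.72, 4.84, 4.87, 4.94, 4.98, 5.00, 5.01, 5.05,
             5.06, 5.07, 5.11, 5.12, 5.21, 5.24, 5.38]
x90 = normal_quantile_estimate(thickness, 0.90)   # about 5.25
```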
Unbiased Estimators
• Bias measures the closeness of the estimator to the parameter. The point estimator θ̂ is an unbiased estimator for the parameter θ if:
E(θ̂) = θ
• If the estimator is not unbiased, then the difference E(θ̂) − θ is called the bias of the estimator θ̂.
In words: “An estimator is unbiased when its expected value
is equal to the parameter it estimates.”
Point Estimation
[Figure: pdf of an unbiased estimator θ̂₁, centered at θ, and pdf of a biased estimator θ̂₂, centered at θ + B.]
"On the average," when θ̂₂ is used, an error of size B is made.
Example 2: Recall that for a random sample from a
population with mean  and variance  2 ,
E  X    and E  S 2    2 .
Thus, the sample mean (variance) is always unbiased for the
population mean (variance).
But note – the sample standard deviation is never unbiased for σ:
E(S) = E(√S²) < √E(S²) = σ
(the inequality holds because the square root is strictly concave).
However, when sampling from the normal distribution,
for sample sizes ≥ 10 the bias is very small.
Remark: In the expression for sample variance,

S² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1),

why not use n in the denominator? Because by using n − 1, we get an unbiased estimator for the population variance.
Whether or not an estimator is unbiased, the smaller the
variance the better.
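A small simulation makes the n − 1 point concrete: dividing by n systematically underestimates σ², while dividing by n − 1 does not. This sketch is our own (the sample size, σ, and replication count are arbitrary choices):

```python
import random

def variance_bias_demo(n=5, reps=40000, seed=2):
    """Compare the average of S^2 (divide by n-1) with the average of the
    divide-by-n estimator, for samples from N(0, sigma^2) with sigma^2 = 4."""
    rng = random.Random(seed)
    sum_s2 = sum_vn = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, 2.0) for _ in range(n)]
        xbar = sum(x) / n
        ss = sum((xi - xbar) ** 2 for xi in x)
        sum_s2 += ss / (n - 1)   # unbiased: averages near sigma^2 = 4
        sum_vn += ss / n         # biased low by the factor (n-1)/n
    return sum_s2 / reps, sum_vn / reps
```

With n = 5 the divide-by-n average settles near 4(4/5) = 3.2 rather than 4.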
Point Estimation
Estimator Variance
• When there are two unbiased estimators for the
same parameter, a decision as to which is better
can be made by examining the variance of each.
• The estimator with the least variance is more
likely to result in an estimate that is closer to the
population parameter.
• The unbiased estimator with the least variance is
referred to as the minimum variance unbiased
estimator (MVUE).
Mean Square Error
• A measure that incorporates both the bias
and the variance of an estimator is the mean
square error.
• The mean square error of an estimator θ̂ of the parameter θ is defined as:
MSE(θ̂) = E(θ̂ − θ)²
• The mean square error can be rewritten as:
MSE(θ̂) = V(θ̂) + (bias)²
Relative Efficiency
• The mean square error can be used to
compare the ability of two estimators to
estimate a parameter.
• For two estimators θ̂₁, θ̂₂, the relative efficiency of θ̂₂ to θ̂₁ is:

MSE(θ̂₁) / MSE(θ̂₂)

A ratio less than 1 indicates that θ̂₂ is not as efficient as θ̂₁ in estimating the parameter value.
1. The MSE of an unbiased estimator is simply Var(θ̂).
2. An estimator with small variance and small
bias may be more efficient than one with a
larger variance and no bias, as depicted below.
[Figure: pdf of a biased estimator θ̂₂ with small variance, and pdf of an unbiased estimator θ̂₁ with larger variance, both centered near θ.]
Which estimator is better? Why?
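Relative efficiency is easy to explore by simulation. The sketch below is our own illustration (settings arbitrary): it compares θ̂₁ = sample mean with θ̂₂ = sample median for normal data. Both are unbiased here, and the ratio MSE(θ̂₁)/MSE(θ̂₂) comes out below 1, so the median is the less efficient estimator in this setting:

```python
import random
from statistics import mean, median

def relative_efficiency(n=25, reps=20000, seed=1):
    """Monte Carlo estimate of MSE(theta1)/MSE(theta2), where
    theta1 = sample mean and theta2 = sample median, sampling
    from N(0, 1) so the true parameter is theta = 0."""
    rng = random.Random(seed)
    se_mean = se_med = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        se_mean += mean(x) ** 2    # squared error of the mean
        se_med += median(x) ** 2   # squared error of the median
    return (se_mean / reps) / (se_med / reps)
```

For large n the ratio approaches 2/π ≈ 0.64, the classical efficiency of the median at the normal distribution.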
Standard Error
• The standard error of a statistic is the standard deviation of its sampling distribution: σ_θ̂ = √V(θ̂).
• If the expression for the standard error
contains its own parameters with unknown
values, but which have estimators, use the
estimators and the result is the estimated
standard error of a statistic.
Point Estimation
Hypothesis Testing
Statistical Hypotheses
We like to think of statistical hypothesis testing as the
data analysis stage of a comparative experiment,
in which the engineer is interested, for example, in
comparing the mean of a population to a specified
value (e.g. mean pull strength).
Hypothesis Testing
• A statistical hypothesis is a statement about the
parameters of one or more populations. They
are expressed as, for example,
H0: μ = 50 cm/s (the null hypothesis)
H1: μ ≠ 50 cm/s (the alternative hypothesis)
• Generally the interest is in whether or not a
parameter has changed, is in accordance with
theory, or is in conformance with a requirement.
Hypothesis Testing cont.
• Testing the hypothesis involves taking a random sample, computing a test statistic, and using the test statistic to make a decision about the null hypothesis.
• The value of the test statistic is compared
to the critical region, and if it is in the
critical region the null hypothesis is
rejected.
Type I Error
• The limits of the critical region are referred to as
the critical values.
• These critical values depend on the desired
probability for rejecting the null hypothesis when
in fact it is true.
• Rejecting the null hypothesis H0 when it is true is
defined as a type I error.
• The probability of a type I error is denoted as α.
α = P(reject H0 when H0 is true)
• The type I error probability is also referred to as
the significance level or size of the test.
• Generally the α value is decided prior to
conducting the test.
Type II Error
• Failing to reject the null hypothesis when it is false is
referred to as a Type II error.
• The probability of making a type II error is denoted as β.
P( fail to reject H0 when H0 is false).
• This probability is a function of two things: first, how close the hypothesized value of the parameter is to the actual value of the parameter, and second, the sample size.
• The closer the two values of the parameter and the
smaller the sample size, the larger the value of β.
• To calculate β, a true population parameter value is
required.
Power of a test
• The power of a statistical test is the
probability of rejecting the null hypothesis
when it is false.
• This probability is 1-β.
• If the power is deemed too low, increase α or the sample size.
Hypothesis Testing
Statistical Hypotheses
For example, suppose that we are interested in the
burning rate of a solid propellant used to power aircrew
escape systems.
• Now burning rate is a random variable that can be
described by a probability distribution.
• Suppose that our interest focuses on the mean burning
rate (a parameter of this distribution).
• Specifically, we are interested in deciding whether or
not the mean burning rate is 50 centimeters per second.
Hypothesis Testing
4-3.1 Statistical Hypotheses
Two-sided Alternative Hypothesis
One-sided Alternative Hypotheses
Hypothesis Testing
4-3.1 Statistical Hypotheses
Test of a Hypothesis
• A procedure leading to a decision about a particular
hypothesis
• Hypothesis-testing procedures rely on using the information
in a random sample from the population of interest.
• If this information is consistent with the hypothesis, then we
will conclude that the hypothesis is true; if this information is
inconsistent with the hypothesis, we will conclude that the
hypothesis is false.
Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
Sometimes the type I error probability is called the significance level, or the α-error, or the size of the test.
Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
• The power is computed as 1 − β, and power can be interpreted as the probability of correctly rejecting a false null hypothesis. We often compare statistical tests by comparing their power properties.
• For example, consider the propellant burning rate problem when we are testing H0: μ = 50 centimeters per second against H1: μ ≠ 50 centimeters per second. Suppose that the true value of the mean is μ = 52. When n = 10, we found that β = 0.2643, so the power of this test is 1 − β = 1 − 0.2643 = 0.7357 when μ = 52.
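The β computation in this example can be sketched in Python. Only n = 10 and μ = 52 appear above; the acceptance region 48.5 ≤ x̄ ≤ 51.5 and σ = 2.5 are assumed values from the underlying textbook example:

```python
from math import sqrt
from statistics import NormalDist

def beta_two_sided(mu_true, sigma, n, accept_lo, accept_hi):
    """beta = P(accept_lo <= X_bar <= accept_hi | mu = mu_true),
    where X_bar ~ N(mu_true, sigma^2 / n)."""
    z = NormalDist()
    se = sigma / sqrt(n)
    return z.cdf((accept_hi - mu_true) / se) - z.cdf((accept_lo - mu_true) / se)

# Propellant example: true mean 52, n = 10 (sigma and region assumed)
beta = beta_two_sided(52, 2.5, 10, 48.5, 51.5)   # about 0.264
power = 1 - beta                                  # about 0.736
```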
Hypothesis Testing
4-3.3 One-Sided and Two-Sided Hypotheses
Two-Sided Test:
One-Sided Tests:
Hypothesis Testing
4-3.4 General Procedure for Hypothesis Testing
4 and 5 determine 6
Case 1: Inference on the Mean of a
Population, Variance Known
Assumptions
4-4 Inference on the Mean of a Population,
Variance Known
4-4.1 Hypothesis Testing on the Mean
We wish to test:
The test statistic is:
This statistic follows the
standard normal distribution
when H0 is true
4-4 Inference on the Mean of a Population,
Variance Known
4-4.1 Hypothesis Testing on the Mean
Reject H0 if the observed value of the test statistic z0 is
either:
z0 > z_{α/2} or z0 < −z_{α/2}
Fail to reject H0 if
−z_{α/2} ≤ z0 ≤ z_{α/2}
The values ±z_{α/2} are the critical values that define the borders of the critical (rejection) region of the test statistic.
4-4 Inference on the Mean of a Population,
Variance Known
4-4.1 Hypothesis Testing on the Mean
Inference on the Mean of a Population,
Variance Known
4-4.2 P-Values in Hypothesis Testing
Alternatively, it is the probability that the test statistic will assume a value at least as extreme as the observed value z0 when the null hypothesis is true.
Inference on the Mean of a Population,
Variance Known
4-4.3 Type II Error and Choice of Sample Size
Finding the Probability of Type II Error β
Inference on the Mean of a Population,
Variance Known
4-4.3 Type II Error and Choice of Sample Size
Sample Size Formulas
4-4 Inference on the Mean of a Population,
Variance Known
4-4.3 Type II Error and Choice of Sample Size
Example 3: A. Let X1, X2, ..., Xn be a random sample from N(μ, σ²). Since X̄ ~ N(μ, σ²/n), the standard error of X̄ as an estimator of μ is σ/√n. In most cases σ will be unknown. The estimated standard error is

σ̂_X̄ = σ̂/√n = s/√n.
B. For the data of Example 1 (thickness of copper foil),
μ̂ = x̄ = 5.04,  σ̂ = s = 0.1657.
Thus,
σ̂_X̄ = s/√n = 0.1657/√15 ≈ 0.0428.
Recall that MINITAB displays this (σ̂_X̄ is the SE Mean entry):

Descriptive Statistics
Variable    N    Mean    Median  TrMean  StDev   SE Mean
THICKNESS   15   5.0400  5.0500  5.0385  0.1657  0.0428

Variable    Minimum  Maximum  Q1      Q3
THICKNESS   4.7200   5.3800   4.9400  5.1200
Example 4: The mean force exerted at a particular point
in a newly designed mechanical assembly should be less
than 4000 psi. Based on previous designs, the engineer
“knows” that the force is normally distributed with
standard deviation 65 psi. He will obtain data on 20
prototype assemblies to test
H 0 :   4000
H1:   4000
at the 0.01 level of significance.
A. He obtained x  3982.8.
Thus,
x  4000
z0 
 1.18
65 20
 z0.01  2.326
z0   z0.01
Therefore z 0 is not in the critical region.
 Do not reject H 0 .
The engineer cannot (as he “hoped” to) conclude that the
mean force is less than 4000 psi.
B. Do the problem on MINITAB, but note that you must
have raw, not summarized, data.
The engineer collected the data on the next slide.
Stat > Basic Statistics > 1-Sample Z
P-value = 0.12 > 0.01.
Do not reject H0.
MINITAB Output:

Test of mu = 4000.0 vs mu < 4000.0
The assumed sigma = 65.0

Variable  N   Mean    StDev  SE Mean  Z      P
FORCE     20  3982.8  60.5   14.5     -1.18  0.12
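The same z test can be computed directly. This is our own sketch (the function name is not from the slides):

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(x_bar, mu0, sigma, n):
    """Lower-tailed z test of H0: mu = mu0 vs H1: mu < mu0.
    Returns the test statistic z0 and its P-value P(Z <= z0)."""
    z0 = (x_bar - mu0) / (sigma / sqrt(n))
    return z0, NormalDist().cdf(z0)

# Example 4: x_bar = 3982.8, mu0 = 4000, sigma = 65, n = 20
z0, p = one_sided_z_test(3982.8, 4000, 65, 20)
# z0 is about -1.18 and p is about 0.12 > alpha = 0.01: do not reject H0
```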
4-4 Inference on the Mean of a Population,
Variance Known
4-4.4 Large-Sample Test
In general, if n ≥ 30, the sample variance s² will be close to σ² for most samples, and so s can be substituted for σ in the test procedures with little harmful effect. That is, for large sample sizes:

(X̄ − μ)/(S/√n)  approx. ~  N(0, 1)

The results we have developed for the case of known variance are used for large-sample tests with unknown variance, with a few important differences.
1. Wherever you see a σ, substitute the sample standard deviation s.
2. The confidence intervals and hypothesis tests become approximate rather than exact.
3. If the underlying distribution is not too non-normal, the approximation should be adequate if n ≥ 30.
We’ll do an example.
Example 5: A civil engineer working for NYSDOT tests
50 standard-sized specimens of concrete for moisture
content after 5 hours of drying time. She collects the
following data (appropriate units). She also constructs
the histogram shown below.
26.4 31.0 20.6 12.4 19.6 16.9  5.2 15.8 17.8 21.1
19.6 41.7 39.3 22.7 33.1 23.4 23.0 19.3 19.3 17.2
 5.1 59.4  5.7 12.1 27.7  6.5 16.9 21.9 32.0 20.9
18.6 10.4  5.7 14.2 11.7 15.3 15.2 19.5 11.4 13.6
11.7 15.3 29.1 13.1 39.1  9.4 14.9 11.2 11.2 25.0

x̄ = 19.38,  s = 10.51

[Histogram of MOISTURE: frequency vs. moisture content, bins from 0 to 60 in steps of 12.]
The histogram indicates that the distribution is quite skewed.
She also does a normality test: in MINITAB do
Stat > Basic Statistics > Normality Test
[Normal probability plot of MOISTURE]
Average: 19.384   StDev: 10.5096   N: 50
Anderson-Darling Normality Test: A-Squared: 1.264, P-Value: 0.002 (very low P-value)
She just wants a 95% confidence interval on the mean
moisture content. Since the sample size is fairly large, an
approximate 95% confidence interval is:
x  z0.025
s
10.51
 19.38  1.96
n
50
 19.38  2.91  16.47, 22.29
You can do this on MINITAB by “making believe” that
 is known and   s.
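The same approximate interval is a one-liner in Python; this sketch is our own (the helper name is not from the slides):

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(x_bar, s, n, conf=0.95):
    """Approximate interval x_bar +/- z_{alpha/2} * s/sqrt(n),
    valid for large n even when sigma is unknown."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # z_{alpha/2}
    half = z * s / sqrt(n)
    return x_bar - half, x_bar + half

# Example 5: x_bar = 19.38, s = 10.51, n = 50
lo, hi = large_sample_ci(19.38, 10.51, 50)   # about (16.47, 22.29)
```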
4-4 Inference on the Mean of a Population,
Variance Known
4-4.5 Some Practical Comments on Hypothesis
Testing
Statistical versus Practical Significance
Interval Estimation
The standard error provides a measure of precision of an
estimate. In fact, very often in the engineering literature a
point on a graph will have so-called “error bars” around it. The
plotted point is actually the mean of a sample and the error bar
is x  ˆ X . This is actually a special case of a confidence
interval, formally defined on the next slide.
4-4 Inference on the Mean of a Population,
Variance Known
4-4.6 Confidence Interval on the Mean
Two-sided confidence interval:
One-sided confidence intervals:
Confidence coefficient:
4-4 Inference on the Mean of a Population,
Variance Known
4-4.6 Confidence Interval on the Mean
α       z_α
0.005   2.576
0.01    2.326
0.025   1.96
0.05    1.645
Example 6: A. Let X1, X2, ..., Xn be a random sample from N(μ, σ²) and assume that σ is known. Since (X̄ − μ)/(σ/√n) ~ Z,

P(−z_{α/2} ≤ (X̄ − μ)/(σ/√n) ≤ z_{α/2}) = 1 − α   [by (10)]

= P(−z_{α/2} σ/√n ≤ X̄ − μ ≤ z_{α/2} σ/√n) = 1 − α

= P(X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n) = 1 − α.   (11)

(The left endpoint is L and the right endpoint is U.)
Comparison of (9) and (11) implies that a 100(1 − α)% confidence interval for μ is

(x̄ − z_{α/2} σ/√n, x̄ + z_{α/2} σ/√n) = x̄ ± z_{α/2} σ/√n.   (12)
B. The output temperature of a gas turbine is N(625°C, 22²). An engineer has "tweaked" the design in order to increase it by at least 5%. The standard deviation will not be affected. A test of 16 turbines gives a sample mean of 678°. By (12), a 95% confidence interval for the (population) mean is

678 ± 1.96(22/√16) = 678 ± 10.78 = (667.2, 688.8).
4-4 Inference on the Mean of a Population,
Variance Known
4-4.6 Confidence Interval on the Mean
Relationship between Tests of Hypotheses and
Confidence Intervals
If [l, u] is a 100(1 − α) percent confidence interval for the parameter, then the test of significance level α of the hypothesis will lead to rejection of H0 if and only if the hypothesized value is not in the 100(1 − α) percent confidence interval [l, u].
4-4 Inference on the Mean of a Population,
Variance Known
4-4.6 Confidence Interval on the Mean
Confidence Level and Precision of Estimation
The length of the two-sided 95% confidence interval is
whereas the length of the two-sided 99% confidence
interval is
4-4 Inference on the Mean of a Population,
Variance Known
4-4.6 Confidence Interval on the Mean
Choice of Sample Size
Example 7: The tensile strength of a fiber used in the
manufacture of insulation material is normally distributed
with a standard deviation of 0.04gm. Having made some
process adjustments, however, the mean has not been
established. In a test of 40 fibers, the data shown on the
next slide was collected. Find a 99% confidence interval
for the population mean.
Tensile Strength (gms)
 9.99 10.05 10.02  9.99 10.05  9.96 10.00  9.99
10.03 10.05 10.02  9.98  9.96 10.01 10.00  9.90
10.04  9.99  9.93  9.99  9.97  9.96 10.00 10.07
 9.95  9.96  9.97  9.97 10.01  9.97  9.99 10.01
10.01 10.01  9.95  9.99  9.97  9.95  9.96 10.01

x̄ = 9.9908
The 99% confidence interval is:
x  z0.005

0.04
 9.9908  (2.5758)
n
40
 9.9908  (2.5758)0.00632
 9.9908  0.0163  9.974,10.007 
Let’s see how to do it using MINITAB.
Do Basic Statistics > 1-Sample Z
The result of Example 7 is: “With 99% confidence, the
mean tensile strength is between 9.974 and 10.007
grams.” However, for a variable like tensile strength, an
engineer would perhaps like to say: “With 99%
confidence, the mean tensile strength is at least
(whatever).”
This brings us to the notion of one-sided confidence
intervals, defined next.
Definition: Suppose θ is an unknown parameter. (i) Suppose L is a statistic based on a sample X1, X2, ..., Xn. If

P(L ≤ θ) = 1 − α,   (23a)

then the observed value of L, say l, is called a 100(1 − α)% one-sided lower confidence limit or bound for θ.
(ii) Suppose U is a statistic based on a sample X1, X2, ..., Xn. If

P(θ ≤ U) = 1 − α,   (23b)

then the observed value of U, say u, is called a 100(1 − α)% one-sided upper confidence limit for θ.
We said we would define one-sided confidence intervals, but in the definition, we refer to one-sided limits or bounds. Well – we really have defined intervals. Corresponding to the limits l and u in (23a,b) are the intervals (−∞, u] and [l, ∞).
For the present case of sampling from N(μ, σ²) with σ known, one-sided lower and upper 100(1 − α)% confidence limits are respectively

x̄ − z_α σ/√n   and   x̄ + z_α σ/√n.   (24)
Example 8: In Example 7, it would probably have been more appropriate to report a one-sided confidence limit. Suppose there is a customer-mandated spec that requires the mean strength to be at least 10. The lower 99% one-sided confidence limit for the mean tensile strength of this fiber is

9.9908 − z0.01(0.00632) = 9.9908 − 2.3263(0.00632) ≈ 9.976.
Remark: MINITAB Student doesn’t do one-sided limits
directly. Can you figure out how to do it indirectly?
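The limit of Example 8 can also be checked directly in code. This is our own sketch of equation (24) (the function name is not from the slides):

```python
from math import sqrt
from statistics import NormalDist

def lower_confidence_limit(x_bar, sigma, n, conf=0.99):
    """One-sided lower limit x_bar - z_alpha * sigma/sqrt(n),
    as in equation (24)."""
    z = NormalDist().inv_cdf(conf)   # z_alpha
    return x_bar - z * sigma / sqrt(n)

# Example 8: x_bar = 9.9908, sigma = 0.04, n = 40
l = lower_confidence_limit(9.9908, 0.04, 40)   # about 9.976
```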
4-4 Inference on the Mean of a Population,
Variance Known
4-4.6 Confidence Interval on the Mean
One-Sided Confidence Bounds
Problem 4-27
•
In the production of airbag inflators for
automotive safety systems, a company is
interested in ensuring that the mean
distance of the foil to the edge of the
inflator is at least 2 cm. Measurements
on 20 inflators yielded an average value
of 2.02 cm. Assume a standard deviation
of .05 on the distance measurements
and a significance level of .01.
a) Test for conformance to the company’s
requirement.
Airbag Inflator and Assembly
Problem 4-27
b) What is the P-value of this test?
c) What is the β of this test if the true mean is
1.97 cm.?
d) What sample size would be necessary to
detect a true mean of 1.97 with probability of at
least .90?
e) Find a 99% lower confidence bound on the
true mean.
f) Use the CI found in part e to test the
hypothesis in part a.
g) What sample size is required to be 95% confident that the error in estimating the mean is less than .01?
4.4 Case II: Inference on the Mean of a Population, Variance Unknown
Let X1, X2, ..., Xn be a random sample from N(μ, σ²), with σ² unknown.
This case differs from Case I in that now we shall
consider the more realistic situation in which the
variance is unknown. The primary reason for
considering Case I at all is that it is a convenient way
of illustrating the basic concepts of confidence
interval estimation and hypothesis testing.
When sampling from the normal distribution the exact confidence interval for μ is based solely on the fact that

(X̄ − μ)/(σ/√n) ~ N(0, 1)   (27a)

and for the exact hypothesis test, under H0,

(X̄ − μ0)/(σ/√n) ~ N(0, 1).   (27b)
Although (27a,b) are true whether or not the variance is known, the problem is that if σ is not known this knowledge does us no good with respect to confidence intervals and hypothesis tests for the mean.
Intuitively, what you don't know, you estimate. In particular, suppose in (27a) we estimate the population standard deviation by the sample standard deviation. This leads to the question: what is the distribution of

T = (X̄ − μ)/(S/√n)?   (28)
The pdf of the random variable T defined in (28) has been derived [see eqn. (4-40), pg. 163, text]. The distribution is called (Student's) t distribution. The parameter of a t distribution is its degrees of freedom, say k.
In particular, the random variable defined by (28) has n − 1 degrees of freedom (write T ~ t_{n−1}).
Thus, for each sample size, T has a different pdf.
Intuitively, it would seem that the pdfs of the t distributions
should look like standard normal pdfs. They do, but there is
a difference. Some representative t pdfs are shown on the
next slide.
[Figure: pdfs of several t distributions, where k denotes the degrees of freedom, together with the standard normal pdf.]
They look much like standard normal pdfs, but the tails are heavier (more variance); note that the t pdf approaches that of the standard normal as k → ∞.
Percentage points of t distributions are defined similarly
to those of the standard normal as shown in the figure
below.
Percentage points of the t distribution, where k denotes the
degrees of freedom. To obtain percentage points, use
MINITAB or Table II in Appendix A. We’ll demonstrate.
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.1 Hypothesis Testing on the Mean
Example 4-7
• A sample of size 15 was taken on the
coefficient of restitution of golf club drivers.
The variance of the coefficient is unknown
• Does the mean coefficient of restitution exceed .82?
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.2 P-value for a t-Test
β = P(Type II Error)
We need to know: what is the distribution of the test statistic

T0 = (X̄ − μ0)/(S/√n)

under H1? The answer is: T0 has a so-called non-central t distribution.
The probability of making a type II error, and the corresponding power of the test, can be derived for hypothesis tests with unknown variance in the same fashion as for tests with known variance. However, this is numerically cumbersome, and we will find β values and the power of a test using MINITAB only.
β - P( Type II Error)
Example 4-8
If the mean differs from .82 by as much as .02, is the sample size of 15 adequate to ensure the null hypothesis will be rejected with probability of at least .80?
• Stat>Power and Sample>1 sample t
Calculate power for each sample size
Specify a sample size, difference, sigma, click on
options and insert the type of alternative
hypothesis and level of significance.
Sample size
• The sample size required to detect a difference
from the hypothesized mean when the variance
of the characteristic of interest is unknown, with
a specified power will be determined using
Minitab.
Example 4-8
• Stat>Power and Sample>1 sample t
Calculate sample size for each power value
Specify a power, difference, sigma, click on
options and insert the type of alternative
hypothesis and level of significance.
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.3 Type II Error and Choice of Sample Size
Fortunately, this unpleasant task has already been done, and the results are summarized in a series of graphs in Appendix A (Charts Va, Vb, Vc, and Vd) that plot β for the t-test against a parameter d for various sample sizes n.
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.3 Type II Error and Choice of Sample Size
These graphics are called operating characteristic (or OC) curves. Curves are provided for two-sided alternatives on Charts Va and Vb. The abscissa scale factor d on these charts is defined as d = |μ − μ0|/σ.
Confidence Intervals for the Mean
We can derive a confidence interval for the t-distributed test statistic in a manner similar to when the test statistic follows the standard normal distribution. We can write:

P(−t_{α/2,n−1} ≤ (X̄ − μ)/(S/√n) ≤ t_{α/2,n−1}) = 1 − α

= P(−t_{α/2,n−1} S/√n ≤ X̄ − μ ≤ t_{α/2,n−1} S/√n)

= P(X̄ − t_{α/2,n−1} S/√n ≤ μ ≤ X̄ + t_{α/2,n−1} S/√n) = 1 − α,   (29)

where T = (X̄ − μ)/(S/√n) ~ t_{n−1}.
By the definition of confidence intervals (see (9)), we see
from (29) that
A 100(1 – )% confidence interval for the mean of a
normal distribution, variance unknown, is
s
s 
s

 x  t 2,n1 n , x  t 2,n1 n   x  t 2,n1 n .


(30)
To obtain one-sided lower (upper) 100(1 − α)% limits, in (30), change α/2 to α and take the lower (upper) limit.
Example 9: Suppose in Example 7 neither the mean nor the
standard deviation is known.
A. Use the data given in Example 7 to find a 99%
confidence interval for the mean.
In Example 7, x̄ = 9.9908 and s = 0.03540. The 99% confidence interval when the variance is unknown is thus

x̄ ± t0.005,39 s/√n = 9.9908 ± (2.7079)(0.03540/√40)
= 9.9908 ± (2.7079)(0.0055972)
= 9.9908 ± 0.015156 = (9.976, 10.006)
Recall that when the variance was known in Example 7, the CI was 9.9908 ± 0.0163 = (9.974, 10.007).
B. Now use MINITAB.
Go to: Stat>Basic Statistics > 1-Sample t
The results appear on the next slide.
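Equation (30) applied to Example 9 can also be sketched in Python, assuming SciPy is available (the t critical value is not in the standard library; the function name is ours):

```python
from math import sqrt
from scipy import stats

def t_ci(x_bar, s, n, conf=0.99):
    """Equation (30): x_bar +/- t_{alpha/2, n-1} * s/sqrt(n)."""
    t_crit = stats.t.ppf(0.5 + conf / 2, df=n - 1)   # t_{alpha/2, n-1}
    half = t_crit * s / sqrt(n)
    return x_bar - half, x_bar + half

# Example 9: x_bar = 9.9908, s = 0.03540, n = 40
lo, hi = t_ci(9.9908, 0.03540, 40)   # about (9.976, 10.006)
```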
Sample Size Determination for Confidence
Intervals (Variance Unknown)
Recall that in the case where the variance is known, the sample size for a 100(1 − α)% confidence interval is given by (26), repeated below:

n = (z_{α/2} σ / E)²   (26)
In the present case with the variance unknown, the best that can be done is to use (26) with your best guess of σ. You can also take a pilot sample and estimate σ.
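Formula (26) is easy to wrap in code. For instance, with the fiber data's σ = 0.04, holding the estimation error to E = 0.01 with 95% confidence requires n = 62. This sketch is our own (the function name is not from the slides):

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_ci(sigma, E, conf=0.95):
    """Equation (26): n = (z_{alpha/2} * sigma / E)^2, rounded up,
    so that the CI half-width is at most E."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # z_{alpha/2}
    return ceil((z * sigma / E) ** 2)
```

The same function answers Problem 4-27 part g (σ = 0.05, E = 0.01, 95% confidence), giving n = 97.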
4-5 Inference on the Mean of a Population,
Variance Unknown
4-5.4 Confidence Interval on the Mean