* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Download Population characteristics: Population mean
Psychometrics wikipedia , lookup
History of statistics wikipedia , lookup
Foundations of statistics wikipedia , lookup
Bootstrapping (statistics) wikipedia , lookup
Taylor's law wikipedia , lookup
German tank problem wikipedia , lookup
Resampling (statistics) wikipedia , lookup
STAT 211
1
Handout 7
Chapter 7: Statistical Intervals and
Chapter 8:Tests of Hypotheses based on a Single Sample
A point estimate of a population characteristic is a single number that is based on sample data and
represents a plausible value of the characteristic.
The best statistic (MVUE) is the unbiased statistic with the smallest standard deviation.
Since the point estimate is a single number, it does not provide information about the precision and
reliability of estimation.
A confidence interval for a population characteristic (parameter) is an interval of plausible values
for the characteristic. It is constructed so that, with a chosen degree of confidence, the value of the
characteristic will be captured inside the interval. The confidence level, 1-, associated with a
confidence interval estimate the success rate of the method used to construct the interval.
If we repeatedly sample from a population and calculate a confidence interval each time with the
data available, then over the long run the proportion of the confidence intervals that actually contain
the true value of the population characteristic will be 100(1-)% (95%, 90%, or 99% for =0.05,
0.10, or 0.01, respectively).
The general form of a confidence interval:
(point estimate for a specified statistic) (critical value).(standard error for the point estimate).
What is the point estimator for parameters, , 2, p? _____________
_
Empirical Rule tells you about 95% of all our values for x will be within 1.96 standard deviation
from the mean.
1- when you compute 95% confidence interval is 0.95
when you compute 95% confidence interval is 0.05
z / 2 when you compute 95% confidence interval is 1.96
A hypothesis is a claim or statement either about the value of a single population characteristic or
about the values of several population characteristics.
Statistical testing involves two complementary hypotheses:
H0: null hypothesis
Ha : alternative hypothesis
Both of them are based on population characteristics.
STAT 211
2
One-sided (One-tailed) test:
Lower tailed: H0: population characteristics claimed constant value
(Left-sided) Ha: population characteristics < claimed constant value
Upper tailed: H0: population characteristics claimed constant value
(Right-sided) Ha: population characteristics > claimed constant value
Two-sided (Two-tailed) test: H0: population characteristics = claimed constant value
Ha: population characteristics claimed constant value
Example 1: Identify the population characteristics (parameter) and then H0 and Ha for the following.
The burning rate of propellant is an important product characteristic. Specifications require that
the mean burning rate must be 50 cm/s.
The sugar content of the syrup in canned peaches is normally distributed and the variance is
thought to be exceeding 18 mg2.
Consider the defective circuit data. Test the claim that the fraction of defective units produced is
less than 0.05.
Pizza hut, after test-marketing a new product called Bigfoot Pizza, concluded that the
introduction of The Bigfoot nationwide would increase their average sales by more than their
usual 14.
A television manufacturer claims that at least 90% of its sets will need no service during the
first three years of operation.
Water samples are taken from water used for cooling as its being discharged from a power plant
into river. It has been determined that as long as the mean temperature of the discharged is at
most 150F, there will be no negative effects on the river's ecosystem.
Hypothesis testing involves two complementary actions or choices, reject H0 and fail to reject H0. A
major concern in hypothesis testing is controlling the incidence of the two kinds of errors:
= P(Type I error) = P(reject H0 when it is true)
= P(Type II error) = P(fail to reject H0 when it is false)
1- = 1-P(Type II error) = P(reject H0 when it is false)
is also called as the significance level and 1- as the power of the test.
The following table summarizes the decisions:
Acts
Events
Fail to reject H0 Reject H0
H0 is true Correct Decision Type I error
H0 is false Type II error
Correct Decision
Test statistic: A function of the sample data on which the decision will be based.
Rejection region: The set of all test statistic values for which H0 will be rejected.
STAT 211
3
Example 2: The following hypotheses are to be tested: H 0 : 100 versus H a : 100 . Assume
that the population standard deviation is =28 and the sample size is n=100. The following
decision rule applies:
_
Fail to reject H 0 if x 104
_
Reject H 0
if x 104
(a) Compute the type I error probability, when =100 and type II error probability, when
=110.
=0.0764 and =0.0162
(b) Change the sample size from 100 to 200. Recompute the type I error probability, when =100
and type II error probability, when =110.
=0.0217 and =0.0012
(c) Is the and in part (b) are larger than the and in part (a)
How do we improve and ?
Example 3 (Exercise 8.9): Two different companies have applied to provide cable TV in a certain
region
p=proportion of all potential subscribers who favor the first company over the second one.
Test H 0 : p 0.5 versus H a : p 0.5 based on random sample of 25 individuals
X: the number in the sample who favor the first company over the second one
(a) Which of the following is the possible rejection region?
R1={x: x 7 or x 18}
R2={x: x 8}
R3={x: x 17}
(b) What is the probability distribution of X?
(c) What is the probability of type I error?
(d) What is the probability of type II error when p=0.3?
(e) What is the power when p=0.3?
(f) What would you conclude if 6 out of 25 individuals favored company 1?
(g) What would you conclude if 6 out of 25 individuals favored company 2?
P-value : The probability that the test statistic will take on a value that is at least as extreme as the
observed value of the statistics when the null hypothesis is true. It is the smallest level of
significance at which the null hypothesis would be rejected. It is customary to call the test statistic
(and the data) significant when the null hypothesis is rejected.
STAT 211
4
Population characteristics: Population mean,
0 is the claimed constant.
_
x is the sample mean
__
and s __
n
Assumption
X
Test
statistics
_
s
are the population and sample standard deviation of x , respectively.
n
normal
population and
is known
X
_
z
x 0
__
X
Unknown population and is
unknown with a large sample
size (n >40) by CLT
_
normal population and is
unknown with small sample
size (n 40)
_
x 0
t
s __
x 0
z
s __
X
X
Decision can be made in one of the three ways:
a. Let z* or t* be the computed test statistic values.
if test statistics is z
if test statistics is t
Lower tailed test P-value = P(z<z*)
P-value = P(t<t*)
Upper tailed test P-value = P(z>z*)
P-value = P(t>t*)
Two tailed test
P-value = 2P(z>|z*|)=2P(z<-|z*|) P-value = 2P(t > |t*| )=2P(t <- |t*| )
In each case, you reject H0 if P-value and fail to reject H0 (accept H0) if P-value >
b. Rejection region for level test:
if test statistics is z
Lower tailed test
z -z
Upper tailed test
z z
Two tailed test
|z| z/2
if test statistics is t
t -t;n-1
t t;n-1
|t| t/2;n-1
c. Confidence interval for the population mean,
Normal population,
is known
_
_
x z / 2
, x z / 2
n
n
Unknown population,
is unknown,
large sample (n >40) by
CLT
_
s _
s
x z / 2
, x z / 2
n
n
_
s
A large sample upper confidence bound for is x z
n
_
s
A large sample lower confidence bound for is x z
n
Normal population,
is unknown,
small sample (n 40)
_
s _
s
x t / 2;n 1
, x t / 2;n 1
n
n
STAT 211
5
Choosing the sample size: With the known desired confidence level and interval width, we can
determine the necessary sample size. Let X1, X2, ....,Xn be a random sample from a normal
population with the unknown population mean and the known population standard deviation ,
The width of the interval is w= 2 z / 2
n
The bound on the error estimation is B= z / 2
_
I mean x will be within z / 2
n
n
.
of .
The sample size required to estimate a population mean to within z / 2
n
with 100(1)%
z 2z
confidence is n= / 2 = / 2 .
B w
2
2
Example 4:Each of the following is a confidence interval for true average amount of time spent by
the patients using physical therapy device using the sample data: (10.90, 25.44), (13.58, 22.76).
Both intervals are computed using the sample data with the only difference being the confidence
level.
(a) What is the value of the sample mean time spent by the patients using physical therapy device?
(b) The confidence level for one of these intervals is 95% and for the other is 99%. Which of the
intervals has the 95% confidence level and why?
Example 5: Suppose we want to estimate the average # of violent acts on TV per hour for a specific
network. Data was collected from viewing random selection of 50 prime time hours and average of
11.7 violent acts were recorded. Suppose it is known that =5 and population distribution is
normal.
The 95% CI for is (10.3141 , 13.0859)
The 95% confidence interval for if 100 prime time hours had been viewed where the same mean
and the variance obtained is (10.72 , 12.68)
The 90% CI for is (10.5368 , 12.8632)
The width of the 90% confidence interval for is 2.3264
The bound on the error estimation of the 90% confidence interval for is 1.1632
Example 6: Investigators would like to estimate the average taxable income of apartment dwellers
to within $500, using a 95% CI for the normally distributed data. Suppose that the previous studies
show that standard deviation is $8000. How many people should they study? (Answer: 984)
STAT 211
6
Example 7: The brightness of a television picture tube can be evaluated by measuring the amount of
current required to achieve a particular brightness. An engineer has designed a tube that he believes
will require 300 microamps of current to produce the desired brightness level. A sample of 10 tubes
results in the average of 317.2 and the standard deviation 15.7. Data follows the normal
distribution.
(a) Using 95% confidence interval, did he achieve the desired brightness?
(b) Hypothesize the belief and test to see if he achieved the desired brightness?
Specify the hypothesis
Show test statistics=3.46
Either compute the P-value (=between 0.002 and 0.01) or by looking at the rejection region,
make a decision (Fail to reject H0/Reject H0)
Write the conclusion
Example 8: I want to see how long on average, it takes Drano to unclog a sink. In a recent
commercial, the stated claim was that it takes on average, 15 minutes. I wanted to see if that claim
was true, so I tested Drano on 64 randomly selected sinks. I found that it took an average of 18
minutes with standard deviation of 2.5 minutes. Was their claim false?
99% CI for is (17.1953 , 18.8047)
90% CI for is (17.4859 , 18.5141)
How would you answer this using the hypothesis testing?
Specify the hypothesis
Show test statistics=9.6
Either compute the P-value (=0) or by looking at the rejection region, make a decision (Fail to reject
H0/Reject H0)
Write the conclusion
Would my answer be different if I tested Drano on 25 randomly selected sinks and I found that it
took an average of 18 minutes with standard deviation of 2.5 minutes?
99% CI for is (16.6015 , 19.3985)
90% CI for is (17.1445 , 18.5555)
How would you answer this using the hypothesis testing?
Specify the hypothesis
Show test statistics= 6
Either compute the P-value ( <0.001) or by looking at the rejection region, make a decision (Fail to
reject H0/Reject H0)
Write the conclusion
Example 9: Students weighed in kilograms at the beginning and end of a semester long fitness class.
Assume the population of weight changes follows a normal distribution. A random sample of 12
female students yielded a mean of 0.45 and standard deviation of 1.5.
99% CI to estimate the true mean weight change is (-0.8949 , 1.7949).
Would you believe me if I claimed the average weight change was 0?
STAT 211
7
How would you answer this using the hypothesis testing?
Specify the hypothesis
Show test statistics= 1.04
Either compute the P-value ( >0.20) or by looking at the rejection region, make a decision (Fail to
reject H0/Reject H0)
Write the conclusion
Example 10: Determine the confidence level for each of the following large sample one-sided
confidence bounds.
_
s
(a) Upper bound: x 0.93
(Answer: 0.8238)
n
_
s
(b) Lower bound: x 1.75
(Answer: 0.9599)
n
Would your answer be different in small samples?
Example 11: An aptitude test has been used to test the ability of fourth graders to reason
quantitatively. The test is supposed to be calibrated so that the scores are normally distributed with
a mean of 50 and standard deviation of 10. It is suspected that the mean score is now higher than
50, although remains the same. The suspicion may be tested based on a sample of students who
have been exposed to a certain amount of computer-assisted learning. If the test is administered to a
random sample of 500 fourth graders and the sample mean is found to be 51.07, is the suspicion
confirmed? Does your decision and conclusion change if the significance level is 0.01 or 0.05?
Specify the hypothesis
Show test statistics=2.39
Either compute the P-value or by looking at the rejection region, make a decision (Fail to reject
H0/Reject H0)
Write the conclusion
Example 12: The average breaking strength of yarn used in manufacturing drapery material is
required to be at least 100 psi. Past experience indicated that the standard deviation of breaking
strength is 2 psi. A random sample of nine specimens is tested, and the average breaking strength is
found to be 98 psi.
a. Should the fiber be judged acceptable?
Specify the hypothesis
Show test statistics=-3
Either compute the P-value (=0.0013) or by looking at the rejection region, make a
decision (Fail to reject H0/Reject H0)
Write the conclusion
Does your decision and conclusion change if the significance level is 0.01 or 0.05?
b. Find a 95% two sided C.I on the true mean breaking strength.
(96.6933 , 99.3067)
STAT 211
8
A General Large Sample Confidence Interval
^
When the estimator satisfies the following properties,
a. The estimator has approximately a normal population distribution
b. It is at least unbiased
c. standard deviation of the estimator is known
The
confidence
interval
for
^
can
be
constructed
as
z / 2
^
where
^
P z / 2
z / 2 1
^
Example: large sample confidence interval for the parameter in Poisson distribution is
_
_
_
_
x _
x
x
, x z / 2
z / 2 1
x z / 2
where P z / 2
n
n
/n
Population characteristics: Population proportion, p
p0 is the claimed constant and p
p0 (1 p0 )
is the standard deviation of p.
n
^
Test Statistics: z
p p0
.
p
Need to check np0 10 and n(1-p0) 10 to be able use this test statistic.
Decision can be made in one of the three ways:
a. Let z be the computed test statistic value. Reject H0 when b. Rejection region for
P-value and fail to reject H0 when P-value > .
level test: reject H0
if
Lower
P-value = P( Z z )
z z
tailed test
Upper
P-value = P ( Z z )
z z
tailed test
TwoP-value = 2 P( Z | z |) 2 P( Z | z |)
| z | z / 2
tailed test
c. If n is sufficiently large, 100(1-)% large sample confidence interval for p is
^
^
^
^
^
^
p(1 p) ^
p(1 p)
p p
, p z / 2
z / 2 1
p z / 2
where P z / 2 ^
^
n
n
p
(
1
p
)/n
STAT 211
9
^
^
Check if n p 10 and n1 p 10 to see if you have a large sample to use this confidence
interval. Otherwise, there is a formula (7.10) in your textbook, which can be used without checking
if it is a large sample. I mean formula (7.10) can be used for large and small samples.
There is a table on page 341 of your textbook (336 for 5th edition) which gives you the formulas for
computing type II error probabilities with fixed type I error probability and computing sample size
when you know the probability of both errors.
Small sample tests will be discussed in class.
Choosing the sample size: With the known desired confidence level and interval width, we can
^
determine the necessary sample size. Bound on the error estimation is z / 2
^
^
^
p(1 p)
. I mean p
n
^
p(1 p)
will be within z / 2
of p. The sample size required to estimate a population proportion p
n
^
^
^
^
z2 / 2 p1 p
p(1 p)
. The same
to within an amount B= z / 2
with 100(1)% confidence is n=
n
B2
^
^
^
^
4 z2 / 2 p1 p
p(1 p)
.
formula can be written using the interval width, w= 2 z / 2
then n=
2
n
w
^
^
The conservative sample size can be found when p = 1 p =0.5
What is different in one-sided confidence intervals? Discussion
Example 13: We are interested in proportion of all students enrolled in Stat211 who listen to
country music. Using our class as random sample from Stat211 students, we see that ___________
out of ___________ listen to country music. Estimate the true proportion of all Stat211 students
that listen to country music using 90% confidence interval.
What parameter are we estimating?_______________
Example 14: Scripps News service reported that 4% of the members of the American Bar
Association (ABA) are African American. Suppose that this figure is based on a random sample of
400 ABA members.
(a) Is the sample size large enough to justify the use of the large-sample confidence interval for a
population proportion?
(b) Construct and interpret a 90% confidence interval for the true proportion of all ABA members
who are African American. (Answer: (0.0239 , 0.0561))
STAT 211
10
Example 15: I want to estimate the proportion of freshmen Aggies who will drop out before
graduation. How many Aggies should I include in my study in order to estimate p within 0.05 with
95% confidence? (Answer: 385)
Example 16: Drug testing of job applicants is becoming increasingly common. The associated press
reported that 12.1% of those tested in California tested positive. Suppose that this figure had been
based on a sample size 600, with 73 testing positive. Does this support a claim that more than 10%
of job applicants in California test positive for drug use?
Specify the hypothesis
Show test statistics=1.77
Either compute the P-value (=0.0384) or by looking at the rejection region, make a decision (Fail to
reject H0/Reject H0)
Write the conclusion
Does your decision and conclusion change if the significance level is 0.01 or 0.05?
Example 17: Let p denote the probability that a coin will land heads side up. The coin is tossed
50,000 times and 25,250 heads result. Using a significance level of 0.05, would you reject the
assertion that the coin is fair?
Specify the hypothesis
Show test statistics=2.24
Either compute the P-value (=0.025) or by looking at the rejection region, make a decision (Fail to
reject H0/Reject H0)
Write the conclusion
95% confidence interval for p is (0.5006 , 0.5094)
99% confidence interval for p is (0.4992 , 0.5108)
Does your decision and conclusion change if the significance level is 0.01?
Example 18: A random sample of 50 suspension helmets used by motorcycle riders and automobile
race-car drivers was subjected to an impact test, and on 18 of these helmets some damage was
observed. Do the data support the claim that the true proportion of helmets of this type that would
show damage from this test is less than 0.50, using =0.05? Does your decision and conclusion
change if the significance level is 0.01?
Specify the hypothesis
Show test statistics=-1.98
Either compute the P-value (=0.0239) or by looking at the rejection region, make a decision
(Fail to reject H0/Reject H0)
Write the conclusion
STAT 211
11
A Prediction Interval for a Single Future Value:
Let X1, X2, ...,Xn be a random sample from a normal population distribution and we wish to predict
the value of Xn+1, a single future observation. 100(1-)% prediction interval for Xn+1 is
_
_
_
x x n 1
1
1
x t / 2;n 1 s 1 , x t / 2;n 1 s 1 where P t / 2;n 1
t / 2;n 1 1
n
n
1
s 1
n
Example 19: What is the 99% prediction interval for the weight change of an individual student
from the population distribution in example 9? (Answer: (-4.3992 , 5.2992))
Tolerance Intervals: Let k be a number between 0 and 100. A tolerance interval for capturing at
least k% of the values in a normal distribution with a confidence level 100(1-)% has the form
_
_
x critical value s , x critical value s
Table A.6 is designed for the tolerance critical values where k=90, 95, 99 and =0.05 ,0.01 in one
and two-sided intervals.
Example 20: Use example 9 and calculate an interval that includes at least 95% of the student
weight changes in the population distribution using a confidence level of 99%. (Answer: (-5.355 ,
6.255))
Population characteristics: Population Variance, 2 or Standard Deviation, :
The population of interest is normal, so that X1, X2,.....,Xn constitutes a random sample from a
normal distribution with parameters and 2.
02 and 0 are the claimed constants for the population variance and the standard deviation,
respectively
s 2 is the sample variance.
_
x
x
i
2
(n 1) s
2
i 1
Test Statistics: =
2
2
n
0
2
0
Decision can be made in one of the two ways:
2
(a) Let calc
be the computed test statistic value. Reject H0 for level test if
Lower tailed test Upper tailed test Two-tailed test
2
2
2
2
calc
calc
calc
< 12 ;n1
> 2;n1
< 12 / 2;n1 or calc
> 2 / 2;n1
There is also P-value method which will be discussed in class.
STAT 211
12
(b) The population of interest is normal, so that X1, X2, ...,Xn constitutes a random sample from a
normal distribution with parameters and 2. Then the random variable
_
x
x
i
2
(n 1) s
i 1
2
2
n
freedom.
100(1-)%
2
has a chi-squared ( 2 ) probability distribution with n-1 degrees of
confidence
interval
for
2
is
(n 1) s 2 (n 1) s 2
, 2
2
1 / 2;n 1
/
2
;
n
1
where
(n 1) s 2
P 12 / 2;n1
2 / 2;n1 1 .
2
The details of the chi-squared ( 2 ) probability distribution will be discussed in class and the table
of critical values (Table A.7) will be demonstrated.
100(1-)% upper confidence bound for 2 is
100(1-)% lower confidence bound for 2 is
( n 1) s 2
2 ;n 1
( n 1) s 2
12 ;n 1
.
.
Example 21: Determine the following:
(a) The 95th percentile for the chi-squared distribution with n=20.
(b) The 5th percentile for the chi-squared distribution with n=20.
(c) P(10.117 2 30.143) where 2 is a chi-squared r.v. with n=20.
(d) P( 2 <10.283 or 2 >35.478) where 2 is a chi-squared r.v. with n=22.
Example 22: An automatic filing machine is used to fill bottles with liquid detergent. A random
sample of 20 bottles results in a sample variance of fill volume 0.0153 (fluid ounces)2. If the
variance of fill volume exceeds 0.01 (fluid ounces)2, an unacceptable proportion of bottles will be
underfilled and overfilled. Is there evidence in the sample data to suggest that the manufacturer has
a problem with underfilled and overfilled bottles. Use a significance level 0.05 and assume that fill
volume has a normal distribution to answer the question. Does your decision and conclusion change
if the significance level is 0.01?
Specify the hypothesis
Show test statistics=29.07
By looking at the rejection region, make a decision (Fail to reject H0/Reject H0)
Write the conclusion
95% confidence interval for the variance of the fill volume is (0.0089,0.0326)
99% confidence interval for the variance of the fill volume?
STAT 211
13
Example 23 (Exercise 7.46, the 6th edition and Exercise 7.44, the 5th edition):
(a) Is it plausible to assume that the data come from a normal population distribution?
Normal Probability Plot for turbidity
99
ML Estimates
95
Mean:
25.3133
StDev:
1.52528
90
Percent
80
70
60
50
40
30
20
10
5
1
22
24
26
28
30
Data
Variable
turbidity
n
15
Mean
25.313
Median
25.800
TrMean
25.438
Variable
turbidity
Minimum
21.700
Maximum
27.300
Q1
24.100
Q3
26.700
StDev
1.579
SE Mean
0.408
(b) Calculate a 95% CI for the population standard deviation of turbidity.
(n 1) s 2 (n 1) s 2 14(1.579) 2 14(1.579) 2
95% CI for is
,
,
2
2
2 / 2;n 1
02.975;14
1
/
2
;
n
1
0
.
025
;
14
where 02.025;14 =26.119 and 02.975;14 =5.629
=(1.16 , 2.49)
Discussion on finding the confidence interval for the linear combination of the population means
Example 24 (Exercise 7.53, the 6th edition Exercise 7.51, the 5th edition): Four different groves of
fruit trees are selected for experimentation. The first three groves are sprayed with pesticides and
the fourth is treated with the ladybugs. We like to measure the difference in true average yields
between treatment with pesticides and treatment with ladybugs. Compute the 95% CI for
13 ( 1 2 3 ) 4 where i is the ith true average yield.
Treatment
ni
_
xi
1 (pesticide) 100 10.5
2 (pesticide) 90 10.0
3 (pesticide) 100 10.1
4 (ladybugs) 120 10.7
si
1.5
1.3
1.8
1.6
STAT 211
^
14
1 _
3
_
_
_
1
3
x1 x 2 x 3 x 4 (10.5 10 10.1) 10.7 =-0.5
2
2 2 2
_
_
_
_ 1
^ 1
Var Var x1 Var x 2 Var x 3 Var x 4 1 2 3 4
n3 n4
9
9 n1 n2
2
2
2
2
^ 1 1.5 1.3 1.8 1.6
Estimated Var
=0.0295
90 100 120
9 100
95% CI for is 0.5 1.96 0.0295 =(-0.8366 , -0.1634)
ADDITIONAL INFORMATION
Confidence Interval for a Population Mean,
(1) Let X1, X2, ....,Xn be a random sample from a normal population with the unknown population
mean and the known population standard deviation , then 100(1-)% confidence interval for is
_
_
_
x
x z / 2
, x z / 2
where P z / 2
z / 2 1
n
n
/ n
Thus, in 95% of all possible samples, will be captured in the following calculated confidence
_
interval: x 1.96
n
(2) Large Sample Confidence Interval for : Let X1, X2, ...,Xn is a random sample from a population
distribution with mean, and standard deviation, . For the large sample size n, the CLT implies
_
that X has approximately a normal distribution for any population distribution. The value of the
population standard deviation may not be known. Instead, the value of the sample standard
deviation s may be known. If n is sufficiently large (n>40), 100(1-)% large sample confidence
_
_
_
s _
s
s
s
, x z / 2
where P x z / 2
x z / 2
1
interval for is x z / 2
n
n
n
n
Thus, in 95% of all possible samples, will be captured in the following calculated confidence
_
s
interval: x 1.96
n
(3) Small Sample Confidence Interval for : When the sample size is small (n≤40), we have to
make specific assumptions to find the confidence intervals.
Assumption: The population of interest is normal, so that X1, X2, ...,Xn constitutes a random sample
from a normal distribution with both and unknown.
STAT 211
15
When the sample mean of a random sample of size n from a normal distribution with mean , the
_
random variable T
x
has a probability distribution called a t-distribution with n-1 degrees of
s/ n
freedom (Properties of t-distribution: discussion and t-distribution, Table A.5).
_
s _
s
100(1-)%
confidence
interval
for
is
where
x t / 2;n 1
, x t / 2;n 1
n
n
_
x
P t / 2;n1
t / 2;n1 1
s/ n
Thus, in 95% of all possible samples, will be captured in the following calculated confidence
_
s
interval: x t 0.025;n 1
n