Chapter 12
Inference About One Population
12.1 Introduction
• In this chapter we utilize the approach developed
before to describe a population.
– Identify the parameter to be estimated or tested.
– Specify the parameter’s estimator and its sampling
distribution.
– Construct a confidence interval estimator or perform
a hypothesis test.
• We shall develop techniques to estimate and test
three population parameters.
– Population mean $\mu$
– Population variance $\sigma^2$
– Population proportion $p$
12.2 Inference About a Population Mean When the Population Standard Deviation Is Unknown
Recall that when $\sigma$ is known we use the following
statistic to estimate and test a population mean:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
When $\sigma$ is unknown, we use its point estimator $s$,
and the z-statistic is replaced by the t-statistic.
The t-Statistic
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \quad\longrightarrow\quad t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
When the sampled population is normally distributed,
the t statistic is Student t distributed.
Using the t-table
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
The t distribution is mound-shaped
and symmetrical around zero.
[Figure: two t density curves centered at 0, one with d.f. = $\nu_1$ and one with d.f. = $\nu_2$, where $\nu_1 < \nu_2$.]
The "degrees of freedom"
(a function of the sample size)
determine how spread out the
distribution is (compared to the
normal distribution).
Testing $\mu$ when $\sigma$ is unknown
• Example 12.1: Productivity of newly hired trainees
– In order to determine the number of workers required
to meet demand, the productivity of newly hired
trainees is studied.
– It is believed that trainees can process and distribute
more than 450 packages per hour within one week of
hiring.
– Can we conclude that this belief is correct, based on
productivity observations of 50 trainees
(see file Xm12-01)?
• Example 12.1 – Solution
– The problem objective is to describe the population
of the number of packages processed in one hour.
– The data are interval.
$H_0: \mu = 450$
$H_1: \mu > 450$
– The t statistic is
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad \text{d.f.} = n - 1 = 49$$
• Solution continued (solving by hand)
– The rejection region is $t > t_{\alpha, n-1}$, where
$t_{\alpha, n-1} = t_{.05,49} \approx t_{.05,50} = 1.676$.
– From the data we have $\sum x_i = 23{,}019$ and
$\sum x_i^2 = 10{,}671{,}357$, thus
$$\bar{x} = \frac{23{,}019}{50} = 460.38$$
$$s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = 1507.55$$
$$s = \sqrt{1507.55} = 38.83$$
• The test statistic is
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{460.38 - 450}{38.83 / \sqrt{50}} = 1.89$$
[Figure: t distribution with the rejection region to the right of 1.676; the test statistic 1.89 falls inside the rejection region.]
• Since 1.89 > 1.676 we reject the null hypothesis in favor
of the alternative.
• There is sufficient evidence to infer that the mean
productivity of trainees one week after being hired is
greater than 450 packages at the .05 significance level.
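The hand calculation for Example 12.1 can be retraced in a few lines. This is a minimal sketch that starts from the two sums quoted in the slides rather than from file Xm12-01 itself; the variable names are illustrative, and the critical value is the table value 1.676 copied from the slides, not computed.

```python
import math

# Summary figures quoted in the slides for Example 12.1
n = 50
sum_x = 23_019          # sum of the observations
sum_x2 = 10_671_357     # sum of the squared observations
mu_0 = 450              # hypothesized mean under H0
t_critical = 1.676      # table value t_{.05,50}, approximating t_{.05,49}

x_bar = sum_x / n
# Shortcut formula for the sample variance
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)
s = math.sqrt(s2)
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

print(f"x_bar = {x_bar:.2f}, s^2 = {s2:.2f}, s = {s:.2f}")
print(f"t = {t_stat:.2f}, reject H0: {t_stat > t_critical}")
```

Working from the sums keeps the sketch independent of the data file; with the raw observations available, the same quantities could be computed directly from the list of values.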
t-Test: Mean (computer output)
                         Packages
Mean                     460.38
Standard Deviation       38.83
Hypothesized Mean        450
df                       49
t Stat                   1.89
P(T<=t) one-tail         0.0323
t Critical one-tail      1.6766
P(T<=t) two-tail         0.0646
t Critical two-tail      2.0096
• Since the p-value .0323 < .05, we reject the null hypothesis in favor of the
alternative.
• There is sufficient evidence to infer that the mean productivity of
trainees one week after being hired is greater than 450
packages at the .05 significance level.
Estimating $\mu$ when $\sigma$ is unknown
• Confidence interval estimator of $\mu$ when $\sigma$ is
unknown
$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n - 1$$
• Example 12.2
– An investor is trying to estimate the return on
investment in companies that won quality awards
last year.
– A random sample of 83 such companies is selected,
and the return he would have earned by investing
in them is calculated.
– Construct a 95% confidence interval for the mean
return.
• Solution (solving by hand)
– The problem objective is to describe the population
of annual returns from buying shares of quality
award-winners.
– The data are interval.
– From file Xm12-02 we determine
$\bar{x} = 15.02$, $s^2 = 68.98$, $s = \sqrt{68.98} = 8.31$
– With $t_{.025,82} \approx t_{.025,80} = 1.990$,
$$\bar{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} \approx 15.02 \pm 1.990 \frac{8.31}{\sqrt{83}} = (13.20, 16.84)$$
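The interval for Example 12.2 can be checked the same way. The sketch below assumes the rounded summary values from the slides ($\bar{x} = 15.02$, $s = 8.31$) and the table value $t_{.025,80} = 1.990$; the variable names are illustrative. Because of this rounding, the last digit can differ slightly from software output computed on the raw data.

```python
import math

# Rounded summary values quoted in the slides for Example 12.2
n = 83
x_bar = 15.02
s = 8.31
t_crit = 1.990   # table value t_{.025,80}, approximating t_{.025,82}

half_width = t_crit * s / math.sqrt(n)
lcl = x_bar - half_width   # lower confidence limit
ucl = x_bar + half_width   # upper confidence limit

print(f"95% CI for the mean return: ({lcl:.2f}, {ucl:.2f})")
```

The result is an interval from roughly 13.2 to 16.8, in line with the limits reported in the slides.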
t-Estimate: Mean (computer output)
                      Returns
Mean                  15.02
Standard Deviation    8.31
LCL                   13.20
UCL                   16.83
Checking the required conditions
• We need to check that the population is normally
distributed, or at least not extremely nonnormal.
• There are statistical methods to test for normality
(one to be introduced later in the book).
• From the sample histograms we see…
[Figure: A Histogram for Xm12-01; horizontal axis: Packages, with bins from 400 to 575 and "More".]
[Figure: A Histogram for Xm12-02; horizontal axis: Returns, with bins from -4 to 30 and "More".]
12.3 Inference About a Population Variance
• Sometimes we are interested in making inference
about the variability of processes.
• Examples:
– The consistency of a production process for quality
control purposes.
– Investors use variance as a measure of risk.
• To draw inference about variability, the parameter
of interest is $\sigma^2$.
• The sample variance $s^2$ is an unbiased, consistent and
efficient point estimator for $\sigma^2$.
• The statistic $\dfrac{(n-1)s^2}{\sigma^2}$ has a distribution called chi-squared,
if the population is normally distributed:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}, \qquad \text{d.f.} = n - 1$$
[Figure: chi-squared density curves for d.f. = 5 and d.f. = 10.]
Testing and Estimating a Population Variance
• From the following probability statement
$$P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha$$
we have (by substituting $\chi^2 = (n-1)s^2 / \sigma^2$)
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
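The algebra between the probability statement and the interval for $\sigma^2$ can be filled in as follows. This is a sketch of the intermediate steps, which the slides do not spell out; it relies only on every quantity being positive, so taking reciprocals reverses the inequalities.

$$\chi^2_{1-\alpha/2} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{\alpha/2}$$
$$\frac{1}{\chi^2_{\alpha/2}} < \frac{\sigma^2}{(n-1)s^2} < \frac{1}{\chi^2_{1-\alpha/2}}$$
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

Note that the larger critical value $\chi^2_{\alpha/2}$ ends up in the lower limit and the smaller one, $\chi^2_{1-\alpha/2}$, in the upper limit.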
Testing the Population Variance
• Example 12.3 (operations management application)
– A container-filling machine is believed to fill 1-liter
containers so consistently that the variance of the
filling will be less than 1 cc (.001 liter).
– To test this belief a random sample of 25 1-liter fills
was taken, and the results recorded (Xm12-03).
– Do these data support the belief that the variance is
less than 1 cc at the 5% significance level?
• Solution
– The problem objective is to describe the population of 1-liter fills
from a filling machine.
– The data are interval, and we are interested in the variability of
the fills.
– The complete test is:
$H_0: \sigma^2 = 1$
$H_1: \sigma^2 < 1$
The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$.
The rejection region is $\chi^2 < \chi^2_{1-\alpha, n-1}$.
• Solving by hand
– Note that $(n-1)s^2 = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \left(\sum x_i\right)^2 / n$.
– From the sample (Xm12-03) we can calculate $\sum x_i = 24{,}996.4$
and $\sum x_i^2 = 24{,}992{,}821.3$.
– Then $(n-1)s^2 = 24{,}992{,}821.3 - (24{,}996.4)^2 / 25 = 20.78$, so
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{20.78}{1^2} = 20.78, \qquad \chi^2_{1-\alpha, n-1} = \chi^2_{.95, 25-1} = 13.8484$$
Since 20.78 > 13.8484, the test statistic is not in the rejection region; do not reject
the null hypothesis.
There is insufficient evidence
to infer that
the variance is less than 1.
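The chi-squared test for Example 12.3 can be reproduced from the two sums given in the slides. This is a minimal sketch with illustrative variable names; the critical value 13.8484 is the table value quoted in the slides, not something the code derives.

```python
# Summary figures quoted in the slides for Example 12.3
n = 25
sum_x = 24_996.4
sum_x2 = 24_992_821.3
sigma2_0 = 1.0            # variance under H0
chi2_critical = 13.8484   # table value chi^2_{.95,24}

# (n-1)s^2 via the shortcut formula: sum of squares minus (sum)^2 / n
ss = sum_x2 - sum_x**2 / n
chi2_stat = ss / sigma2_0

# Left-tail test: reject H0 only if the statistic falls below the critical value
reject_h0 = chi2_stat < chi2_critical

print(f"(n-1)s^2 = {ss:.2f}, chi2 = {chi2_stat:.2f}, reject H0: {reject_h0}")
```

Because the alternative is "less than", the rejection region sits in the left tail, which is why the comparison uses `<` rather than `>`.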
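[Figure: chi-squared distribution with the rejection region $\chi^2 < \chi^2_{.95, 25-1} = 13.8484$ in the left tail ($\alpha = .05$, $1 - \alpha = .95$); the test statistic 20.8 lies outside the rejection region, so do not reject the null hypothesis.]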
Estimating the Population Variance
• Example 12.4
– Estimate the variance of fills in Example 12.3 with
99% confidence.
• Solution
– We have $(n-1)s^2 = 20.78$.
From the chi-squared table we have
$\chi^2_{\alpha/2, n-1} = \chi^2_{.005, 24} = 45.5585$
$\chi^2_{1-\alpha/2, n-1} = \chi^2_{.995, 24} = 9.88623$
• The confidence interval estimate is
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
$$\frac{20.78}{45.5585} < \sigma^2 < \frac{20.78}{9.88623}$$
$$.46 < \sigma^2 < 2.10$$
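The arithmetic for the 99% interval in Example 12.4 is short enough to verify directly. The sketch below takes $(n-1)s^2$ and both chi-squared table values from the slides as given; the variable names are illustrative.

```python
# Values quoted in the slides for Example 12.4
ss = 20.78              # (n-1)s^2 from Example 12.3
chi2_upper = 45.5585    # table value chi^2_{.005,24}
chi2_lower = 9.88623    # table value chi^2_{.995,24}

# Dividing by the larger critical value gives the lower limit, and vice versa
lcl = ss / chi2_upper
ucl = ss / chi2_lower

print(f"99% CI for the variance: ({lcl:.2f}, {ucl:.2f})")
```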
12.4 Inference About a Population Proportion
• When the population consists of nominal data, the
only inference we can make is about the
proportion of occurrence of a certain value.
• The parameter p was used before to calculate
these probabilities under the binomial distribution.
• Statistic and sampling distribution
– The statistic used when making inference about $p$ is
$$\hat{p} = \frac{x}{n}$$
where $x$ = the number of successes and $n$ = sample size.
– Under certain conditions [$np > 5$ and $n(1-p) > 5$],
$\hat{p}$ is approximately normally distributed, with
$\mu = p$ and $\sigma^2 = p(1-p)/n$.
Testing and Estimating the Proportion
• Test statistic for $p$
$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$
where $np > 5$ and $n(1-p) > 5$
• Interval estimator for $p$ ($1-\alpha$ confidence level)
$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$$
provided $n\hat{p} > 5$ and $n(1-\hat{p}) > 5$
Testing the Proportion (additional example)
• Example 12.5 (Predicting the winner on election day)
– Voters are asked by a certain network to participate in an
exit poll in order to predict the winner on election day.
– Based on the data presented in Xm12-05 (where
1 = Democrat and 2 = Republican), can the network
conclude that the Republican candidate will win the state
college vote?
• Solution
– The problem objective is to describe the population
of votes in the state.
– The data are nominal.
– The parameter to be tested is $p$.
– Success is defined as "Vote Republican".
– The hypotheses are:
$H_0: p = .5$
$H_1: p > .5$ (more than 50% vote Republican)
– Solving by hand
• The rejection region is $z > z_{\alpha} = z_{.05} = 1.645$.
• From the file we count 407 successes. The number of voters
participating is 765.
• The sample proportion is $\hat{p} = 407/765 = .532$.
• The value of the test statistic is
$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.532 - .5}{\sqrt{.5(1-.5)/765}} = 1.77$$
• The p-value is $P(Z > 1.77) = .0382$.
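The z-test for Example 12.5 can be sketched with Python's standard library, which provides the normal distribution through `statistics.NormalDist`. The counts (407 of 765) come from the slides; the variable names are illustrative. The code keeps the unrounded sample proportion, so its intermediate values can differ in the last digit from a hand calculation that rounds to .532 first.

```python
import math
from statistics import NormalDist

# Counts quoted in the slides for Example 12.5
successes, n = 407, 765
p_0 = 0.5        # proportion under H0
alpha = 0.05

# Conditions for the normal approximation, checked under H0
assert n * p_0 > 5 and n * (1 - p_0) > 5

p_hat = successes / n
z_stat = (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

std_normal = NormalDist()                    # mean 0, standard deviation 1
z_critical = std_normal.inv_cdf(1 - alpha)   # right-tail critical value
p_value = 1 - std_normal.cdf(z_stat)         # right-tail p-value

print(f"p_hat = {p_hat:.3f}, z = {z_stat:.2f}, z critical = {z_critical:.3f}")
print(f"p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The standard error uses the hypothesized value `p_0` rather than `p_hat`, matching the test statistic in the slides.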
z-Test: Proportion (computer output)
Sample Proportion          0.532
Observations               765
Hypothesized Proportion    0.5
z Stat                     1.77
P(Z<=z) one-tail           0.0382
z Critical one-tail        1.6449
P(Z<=z) two-tail           0.0764
z Critical two-tail        1.96
There is sufficient evidence to reject the null hypothesis
in favor of the alternative hypothesis. At the 5% significance
level we can conclude that more than 50% voted Republican.
Estimating the Proportion
• Nielsen Ratings
– In a survey of 2000 TV viewers at 11:40 p.m. on a
certain night, 226 indicated they watched "The Tonight
Show".
– Estimate the number of TVs tuned to The Tonight Show
on a typical night, if there are 100 million potential
television sets. Use a 95% confidence level.
– Solution
$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n} = .113 \pm 1.96 \sqrt{.113(1 - .113)/2000} = .113 \pm .014$$
• Solution
z-Estimate: Proportion (computer output)
                     Viewers
Sample Proportion    0.113
Observations         2000
LCL                  0.099
UCL                  0.127
A confidence interval estimate of the
number of viewers who watched The
Tonight Show:
LCL = .099(100 million) = 9.9 million
UCL = .127(100 million) = 12.7 million
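The Nielsen estimate can be retraced as below. This is a sketch under the figures stated in the example (226 of 2000 viewers, 100 million sets); the variable names are illustrative, and the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
import math
from statistics import NormalDist

# Figures quoted in the slides for the Nielsen example
n = 2000
viewers = 226
population = 100_000_000   # potential television sets
confidence = 0.95

p_hat = viewers / n
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)

lcl = p_hat - half_width
ucl = p_hat + half_width

# Scale the proportion limits up to a number of television sets
print(f"Proportion: ({lcl:.3f}, {ucl:.3f})")
print(f"Sets (millions): ({lcl * population / 1e6:.1f}, {ucl * population / 1e6:.1f})")
```

Multiplying both limits by the population size is valid because scaling by a positive constant preserves the order of the limits and the confidence level.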
Selecting the Sample Size to Estimate the Proportion
• Recall: The confidence interval for the proportion is
$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$$
• Thus, to estimate the proportion to within $W$, we can
write
$$W = z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$$
• The required sample size is
$$n = \left( \frac{z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})}}{W} \right)^2$$
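The step from the bound $W$ to the sample size is a short rearrangement. The following sketch squares both sides and solves for $n$, using the same symbols as the formulas above:

$$W = z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\Longrightarrow\; W^2 = \frac{z_{\alpha/2}^2 \, \hat{p}(1-\hat{p})}{n} \;\Longrightarrow\; n = \frac{z_{\alpha/2}^2 \, \hat{p}(1-\hat{p})}{W^2} = \left( \frac{z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})}}{W} \right)^2$$

Since $n$ must be a whole number, the result is rounded up in practice.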
Sample Size to Estimate the Proportion
• Example
– Suppose we want to estimate the proportion of customers
who prefer our company's brand to within .03 with 95%
confidence.
– Find the sample size.
– Solution
$W = .03$; $1 - \alpha = .95$,
therefore $\alpha/2 = .025$,
so $z_{.025} = 1.96$
$$n = \left( \frac{1.96 \sqrt{\hat{p}(1-\hat{p})}}{.03} \right)^2$$
Since the sample has not yet
been taken, the sample proportion
is still unknown.
We proceed using either one of the
following two methods:
• Method 1:
– There is no knowledge about the value of $\hat{p}$
• Let $\hat{p} = .5$. This results in the largest possible $n$ needed for a
$1-\alpha$ confidence interval of the form $\hat{p} \pm .03$.
• If the sample proportion does not equal .5, the actual W will be
narrower than .03 with the n obtained by the formula below.
$$n = \left( \frac{1.96 \sqrt{.5(1 - .5)}}{.03} \right)^2 = 1{,}068$$
• Method 2:
– There is some idea about the value of $\hat{p}$
• Use the value of $\hat{p}$ to calculate the sample size; for example, with $\hat{p} = .2$
$$n = \left( \frac{1.96 \sqrt{.2(1 - .2)}}{.03} \right)^2 = 683$$
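Both methods amount to plugging a guess for $\hat{p}$ into the same formula and rounding up. A minimal sketch follows; the helper name `sample_size` and its parameter names are hypothetical and not part of the slides.

```python
import math

def sample_size(p_guess: float, w: float, z: float) -> int:
    """Smallest n for which the interval half-width is at most w.

    p_guess: assumed value of the sample proportion
    w:       desired bound on the error of estimation
    z:       critical value z_{alpha/2}
    """
    n_exact = (z * math.sqrt(p_guess * (1 - p_guess)) / w) ** 2
    return math.ceil(n_exact)   # round up: n must be a whole number

# Method 1: no knowledge of p-hat, so use the worst case .5
n_worst_case = sample_size(0.5, w=0.03, z=1.96)

# Method 2: some idea of p-hat (here .2, as in the slides)
n_informed = sample_size(0.2, w=0.03, z=1.96)

print(n_worst_case, n_informed)
```

Rounding up rather than to the nearest integer keeps the half-width at or below the target $W$; the product $\hat{p}(1-\hat{p})$ peaks at $\hat{p} = .5$, which is why Method 1 gives the larger sample.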