Empirical Rule for X̄ - Department of Mathematics and Statistics

The Central Limit Theorem (CLT) is stated as follows:
Given a large random sample of size n from a population with mean μ and
standard deviation σ, the sample mean X̄ is approximately normally
distributed with mean and standard deviation given by
μ_X̄ = μ
σ_X̄ = σ/√n
Note: 1. n > 30 is usually large enough for the CLT to apply.
2. If the population from which we sample is normal, then X̄ is exactly
normally distributed with mean and standard deviation as above for any
sample size.
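The CLT statement above can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes an exponential population with mean 1 and standard deviation 1 (a skewed, non-normal choice made only for illustration), and the helper name sample_means, the seed, and the sample size n = 36 are all hypothetical.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from n draws of `draw(rng)`."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
means = sample_means(lambda rng: rng.expovariate(1.0), n=36, reps=5000)

# The CLT predicts a mean near mu = 1 and a standard deviation
# near sigma / sqrt(n) = 1/6 for the sample means.
print("mean of sample means:", statistics.fmean(means))
print("sd of sample means:  ", statistics.stdev(means))
```

A histogram of the values in means would look roughly bell shaped even though the population itself is strongly skewed.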
Empirical Rule for X
Consider a sample of size n from a population with mean  and standard deviation
. Suppose X is normal ( or approximately normal), with  =  and  = /n
X
X
(This would be the case if the population is normal or if the sample size is large).
Find the probability that X will be within (a) 2 of  (b) 3
of  .
X
X
(a) P( X will be within 2
X
of  )
=
(2)
(b) P( X will be within 3 of  )
X
=
In general the statement “X will be within k of  “ means that X lies between
X
-k
X
and
 +k 
X
If X is normal ( or approximately normal), then
P( X will be within k of  ) = P(-k < Z <k)
X
(3)
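The probability P(-k < Z < k) can be evaluated directly from the standard normal distribution function. This is a small sketch using Python's standard library; the function name prob_within is an illustrative choice, not notation from the notes.

```python
from statistics import NormalDist

def prob_within(k):
    """P(-k < Z < k) for a standard normal random variable Z."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# prints approximately .6827, .9545 and .9973 for k = 1, 2, 3
```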
Z Confidence Interval
Suppose we are given the following:
Normal Population: Scores on a standardized test.
Population Mean: μ (unknown)
Population S.D.: σ = 1.5
To estimate μ we will take an srs of size n = 25 and use X̄ as our estimator. Recall
that since the population is normal,
X̄ is normally distributed with μ_X̄ = μ and σ_X̄ = σ/√n = 1.5/5 = .3
We would like to be able to express this estimate in the form X̄ ± E or
(X̄ - E, X̄ + E). Here E is some error which determines the accuracy of our
estimate. Let’s take E = 2σ_X̄ for now.
Thus we have the interval (X̄ - 2σ_X̄, X̄ + 2σ_X̄).
For any given sample this interval may or may not contain the true mean μ. It
would be useful to know what the probability is that this interval covers μ.
If the interval covers the true mean μ, then μ is somewhere in the interval above, so
that X̄ is in fact within 2σ_X̄ (= 0.6) of μ.
Thus P[(X̄ - 2σ_X̄, X̄ + 2σ_X̄) covers μ]
= P(X̄ is within 2σ_X̄ of μ)
=
=
To make the probability above a nice number, .95, we should replace 2 by 1.96.
Thus we can say
“For 95% of all samples of size n = 25, the interval (X̄ - 1.96σ_X̄, X̄ + 1.96σ_X̄)
will cover the true value of μ.”
Or,
“For 95% of all samples of size n = 25, X̄ will be within 1.96σ_X̄ of the true
population mean μ.”
The 95% value is called the LEVEL OF CONFIDENCE. This tells us the
probability the interval will cover μ.
The value 1.96σ_X̄ = .588 is called the margin of error. This tells us how accurate X̄ is
(i.e. how close X̄ will be to μ for 95% of all samples).
The interval (X̄ - 1.96σ_X̄, X̄ + 1.96σ_X̄) is called a 95%
Z-CONFIDENCE INTERVAL.
The simulation below will illustrate how confidence intervals work.
MTB > random 25 c1-c40;
SUBC> norm 10 1.5.
MTB > zint 95 1.5 c1-c40.
[The first two command lines select 40 random samples, each of size n = 25, from a
normal distribution with μ = 10 and σ = 1.5. The third command line forms the 95%
Z-CONFIDENCE INTERVAL for each sample.]
Confidence Intervals (The assumed sigma = 1.5)
Variable    N     Mean   StDev  SE Mean          95.0% CI
C1         25   10.459   1.661    0.300  ( 9.871, 11.047)
C2         25    9.826   1.486    0.300  ( 9.238, 10.414)
C3         25   10.388   1.600    0.300  ( 9.800, 10.976)
C4         25    9.741   1.297    0.300  ( 9.153, 10.329)
C5         25   10.441   1.766    0.300  ( 9.853, 11.029)
C6         25   10.331   1.637    0.300  ( 9.743, 10.919)
C7         25    8.941   1.264    0.300  ( 8.353,  9.529)
C8         25   10.205   1.627    0.300  ( 9.617, 10.793)
C9         25   10.163   1.560    0.300  ( 9.575, 10.751)
C10        25   10.009   1.619    0.300  ( 9.421, 10.597)
C11        25   10.455   1.787    0.300  ( 9.867, 11.043)
C12        25   10.365   1.220    0.300  ( 9.777, 10.953)
C13        25   10.626   1.475    0.300  (10.038, 11.214)
C14        25   10.090   1.677    0.300  ( 9.502, 10.678)
C15        25   10.339   1.103    0.300  ( 9.751, 10.927)
C16        25   10.208   1.480    0.300  ( 9.620, 10.796)
C17        25   10.356   1.508    0.300  ( 9.768, 10.944)
C18        25    9.943   1.388    0.300  ( 9.355, 10.531)
C19        25   10.015   1.318    0.300  ( 9.427, 10.603)
C20        25    9.924   1.473    0.300  ( 9.336, 10.512)
C21        25   10.037   1.271    0.300  ( 9.449, 10.625)
C22        25    9.490   1.345    0.300  ( 8.902, 10.078)
C23        25    9.972   1.484    0.300  ( 9.384, 10.560)
C24        25   10.330   1.644    0.300  ( 9.742, 10.918)
C25        25    9.635   1.609    0.300  ( 9.047, 10.223)
C26        25    9.292   1.558    0.300  ( 8.704,  9.880)
C27        25   10.053   1.072    0.300  ( 9.465, 10.641)
C28        25    9.484   1.726    0.300  ( 8.896, 10.072)
C29        25   10.666   1.402    0.300  (10.078, 11.254)
C30        25    9.896   1.640    0.300  ( 9.308, 10.484)
C31        25    9.942   1.583    0.300  ( 9.354, 10.530)
C32        25   10.100   1.657    0.300  ( 9.512, 10.688)
C33        25    9.483   1.496    0.300  ( 8.895, 10.071)
C34        25    9.691   1.623    0.300  ( 9.103, 10.279)
C35        25   10.390   1.369    0.300  ( 9.802, 10.978)
C36        25   10.569   1.178    0.300  ( 9.981, 11.157)
C37        25    9.813   1.326    0.300  ( 9.225, 10.401)
C38        25    9.905   1.489    0.300  ( 9.317, 10.493)
C39        25   10.442   1.405    0.300  ( 9.854, 11.030)
C40        25    9.945   1.919    0.300  ( 9.357, 10.533)
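For readers without Minitab, the same kind of simulation can be sketched in Python. This is an illustrative analogue of the three Minitab command lines, not a reproduction of the output above: the seed and the helper name z_interval are assumptions, and a different seed gives different samples and a different coverage count.

```python
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, level=0.95):
    """Z-confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # 1.96 for a 95% level
    margin = z * sigma / len(sample) ** 0.5         # z * sigma / sqrt(n)
    center = fmean(sample)
    return center - margin, center + margin

rng = random.Random(1)        # hypothetical seed, chosen only for repeatability
mu, sigma, n = 10, 1.5, 25    # same settings as the Minitab run

intervals = [
    z_interval([rng.gauss(mu, sigma) for _ in range(n)], sigma)
    for _ in range(40)
]
covered = sum(lo < mu < hi for lo, hi in intervals)
print(covered, "of 40 intervals cover mu =", mu)
```

Every interval has the same width, 2(1.96)(0.3), because σ is treated as known; only the centers move from sample to sample.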
QUESTIONS
1.
(a) In theory, how many of the above intervals would you expect to cover the
true population mean μ (= 10)?
(b) In fact how many actually do?
2. Suppose you selected 40 samples of size n = 25 from a real population (where
typically the population mean and standard deviation are unknown).
(a) Could you form a 95% Z-confidence interval for each sample?
Explain.
(b) If you knew σ and formed forty 95% Z-confidence intervals, how
many of the intervals would you expect to cover the population mean μ? Could you tell
which? Explain.
Note: (i) A 100(1-α)% Z-confidence interval for μ is given by
X̄ ± z_{α/2} σ_X̄, where σ_X̄ = σ/√n
(ii) For a 95% Z-confidence interval, α = .05. Hence the 95% Z-confidence interval for μ
is
X̄ ± 1.96 σ_X̄, where σ_X̄ = σ/√n
(iii) The 99% Z-confidence interval for μ is
X̄ ± 2.576 σ_X̄, where σ_X̄ = σ/√n
(iv) The 90% Z-confidence interval for μ is
X̄ ± 1.645 σ_X̄, where σ_X̄ = σ/√n
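The multipliers 1.645, 1.96 and 2.576 are the standard normal critical values z_{α/2}. As a quick check, they can be computed from the inverse normal distribution function; the function name z_critical below is illustrative only.

```python
from statistics import NormalDist

def z_critical(level):
    """Return z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))

# Margin of error for the earlier example (sigma = 1.5, n = 25, 95% level).
margin = z_critical(0.95) * 1.5 / 25 ** 0.5
print(round(margin, 3))
```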
The t-distribution
The t-distribution depends on a single parameter, called its
degrees of freedom (df). If sampling is done from a normal distribution whose mean
is μ and standard deviation is σ, then
Z = (X̄ - μ) / (σ/√n)
follows the standard normal distribution. Since σ is mostly unknown in practice,
we replace it by its estimate s. The random variable
T = (X̄ - μ) / (S/√n)
follows the t-distribution with n-1 degrees of freedom.
Sketch of t-distribution: In comparison with the standard normal distribution, the
t-distribution has more area in the tails, while the standard normal distribution has
more area in the middle.
The t-curve approaches the Z-curve if df is large.
T-Interval: Confidence Interval for the Mean μ of a Normal Population
(σ unknown)
If a random sample X1, X2, ..., Xn is chosen from a normal distribution, then a
100(1-α)% confidence interval for μ is
X̄ ± t_{α/2} SE
where:
df for t is n-1,
SE = s/√n = standard error of X̄ (the estimated sd of X̄),
X̄ = (X1 + X2 + ... + Xn)/n
s² = Σ(Xi - X̄)² / (n-1)
s = √(s²)
Margin of Error: E = t_{α/2} SE = t_{α/2} s/√n
Level of Confidence (Reliability): 100(1-α)%
Notes: 1. For all n, t_{α/2} > z_{α/2}.
2. For df = ∞, t_{α/2} = z_{α/2}, which are the entries at the bottom of the t-table.
3. For large n (n > 30), the normality assumption may be ignored because of the
Central Limit Theorem.
4. The estimate of μ, X̄, is the mid-point of the CI, and the margin of error is
one half the width of the CI. If the interval is (L, U), with X̄ midway between L and U,
then
X̄ = (L + U)/2
and
E = (U - L)/2
Example: In a health study, the birth weights of a random sample of 100 newborns
from mothers with a low socioeconomic status in a large US city were recorded. The
sample yielded a mean of 3.21 kg with a standard deviation of 0.71 kg.
(a) Find a 90% confidence interval for the true mean birth weight of newborns
from mothers with a low socioeconomic status.
(b) Interpret the confidence interval.
Solution: Here we wish to estimate
μ = mean birth weight of all newborns from mothers with a low socioeconomic
status in this US city.
Given:
n =
x̄ =           [estimate of μ]
s =           [estimate of σ]
Since n > 30, it is not necessary that the population be normal (due to the CLT).
For a 90% CI, t_{α/2} =          =          , df = n - 1 = 99
x̄ ± t_{α/2} s/√n
=
=
or,
(c) x̄ = _________ estimates the true population mean μ with margin of error
E = ____________ and level of confidence (Reliability) ____________.
The level of confidence gives the proportion of intervals found this way that
would cover μ.
Note: The interpretation of a confidence interval as given in the example above
is the popular interpretation often heard on television or reported in
newspapers. A mathematically precise interpretation of the confidence interval
for this example would be “Prior to sampling there was a .90 probability that
the confidence interval to be formed would contain the true population mean μ.”
Example: For the data in the example above, find a 95% confidence interval for
the true mean birth weight of newborns from mothers with a low socioeconomic
status.
Solution:
Recall,
n = 100, x̄ = 3.21, s = 0.71.
For a 95% CI, t_{α/2} =          =          , df = n - 1 = 99
x̄ ± t_{α/2} s/√n
=
=
or,
Interpretation:
x̄ = _________ estimates the true population mean μ with margin of error
E = ____________ and level of confidence (Reliability) ____________.
Example: For the data in the example above, find a 99% confidence interval for the
true mean birth weight of newborns from mothers with a low socioeconomic status.
Solution:
Recall,
n = 100, x̄ = 3.21, s = 0.71.
For a 99% CI, t_{α/2} =          =          , df = n - 1 = 99
x̄ ± t_{α/2} s/√n
=
=
or,
Interpretation:
x̄ = _________ estimates the true population mean μ with margin of error
E = ____________ and level of confidence (Reliability) ____________.
Question: Considering these three examples, if the level of confidence is
increased and all other things remain the same, the width of the confidence
interval will _______________.
Example: A study was conducted to determine the effect of acid rain on the lake
water in an industrial region of the country. The data below gives the pH levels
from a random sample of 10 lakes from this region. ( It was assumed that the
sample came from a normal distribution). Minitab was used to find a 95%
confidence interval for the mean pH level for all lakes in this region.
C1: 6.6 7.1 7.3 6.7 6.8 6.2 6.5 5.9 6.9 6.3
MTB > tint 95 c1
One-Sample T: C1
Variable    N    Mean   StDev  SE Mean        95.0% CI
C1         10   6.630   0.424    0.134  (6.326, 6.934)
From the Minitab output answer the following:
(a) What is the 95% confidence interval of μ?
(b) What is the estimate of μ and the estimated standard deviation of this
estimate?
(c) What is the margin of error E and level of confidence (reliability) for the
estimate of μ?
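The Minitab output for the pH data can be reproduced by hand from the T-interval formula x̄ ± t_{α/2} s/√n. The sketch below uses only the Python standard library; since that library has no t-distribution, the critical value 2.262 (upper .025 point for df = 9) is taken from a standard t-table and hard-coded as an assumption.

```python
from statistics import mean, stdev

ph = [6.6, 7.1, 7.3, 6.7, 6.8, 6.2, 6.5, 5.9, 6.9, 6.3]

T_CRIT = 2.262          # t-table value for df = 9, 95% confidence (assumed input)

n = len(ph)
xbar = mean(ph)         # point estimate of mu
s = stdev(ph)           # sample standard deviation (divisor n - 1)
se = s / n ** 0.5       # standard error of the sample mean
margin = T_CRIT * se    # margin of error E
lo, hi = xbar - margin, xbar + margin

print(f"mean={xbar:.3f}  s={s:.3f}  SE={se:.3f}  E={margin:.3f}")
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```

The interval agrees with the Minitab line to about three decimal places, and (lo + hi)/2 and (hi - lo)/2 recover x̄ and E as in Note 4 of the T-interval summary.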
Sample Size Determination for Estimating μ
Problem: Suppose you wish to estimate a population mean μ with a specified
margin of error E and level of confidence. What sample size should be used?
Solution:
We know that
E = t_{α/2} s/√n.
Now we solve this equation for n.
E² = (t_{α/2})² s² / n
⇒ nE² = (t_{α/2})² s²
⇒ n = (t_{α/2})² s² / E²
= [t_{α/2} s/E]²
Of course, since we have not sampled yet, we do not have values for s or t_{α/2}. In
practice t_{α/2} is replaced by z_{α/2} and s is replaced by a prior estimate of σ.
Thus n ≈ [z_{α/2} σ / E]², rounded up to the next whole number.
Example: How large a sample would be required to estimate the mean pH level for
all lakes in the industrial region to within .1 with level of confidence 95%? Assume
that the prior estimate for σ is 0.424.
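The sample size formula n ≈ [z_{α/2} σ / E]² is easy to evaluate directly. A minimal sketch follows, with the function name sample_size chosen here for illustration; it is applied to the pH example with σ = 0.424 and E = 0.1.

```python
import math
from statistics import NormalDist

def sample_size(sigma, margin, level=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((z * sigma / margin) ** 2)   # always round up

print(sample_size(0.424, 0.1))   # (1.96 * 0.424 / 0.1)^2 is about 69.06, so n = 70
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed the target E.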
STATISTICAL INFERENCE
Let us begin with a review of some basic definitions.
POPULATION: The set of all measurements or objects of interest in a particular
study.
If the entire population were available for analysis we would know everything about
it. However, in practice one cannot know the entire population because it is either
too expensive, or simply impossible or impractical to examine each member. Thus a
sample from the population is used to obtain information about the population.
SAMPLE: A subset of the population.
The sample picked should be “representative” of the population from which it
comes and should avoid any bias which might skew our view of the population. One
way to achieve this is to use a SIMPLE RANDOM SAMPLE (srs) i.e. a sample
chosen in such a way that each member of the population has an equal chance of
being chosen.
INFERENTIAL STATISTICS: deals with procedures which use the sample to draw
conclusions about the population (from which it was drawn). The procedures of
interest to us are CONFIDENCE INTERVALS and HYPOTHESIS TESTS.
In particular we will be interested in drawing conclusions about certain
characteristics of the population. Such characteristics are known as POPULATION
PARAMETERS. Examples of such characteristics are a POPULATION MEAN
(denoted by the Greek letter μ) and a POPULATION PROPORTION (denoted by
the letter p).
EXAMPLE: Consider the population of weights (in kg) of all newborn babies in
Canada for a particular year. In this case, the POPULATION MEAN μ is the
average weight of all newborns in the population. An investigator may want to use a
simple random sample of these weights to determine if there is sufficient evidence to
answer questions like:
Is μ > 3.2 kg? or Is μ < 3.2 kg? or Is μ ≠ 3.2 kg?
EXAMPLE: Consider the population of all lakes in Nova Scotia. A biologist may be
interested in the following POPULATION PROPORTION: p = the proportion of all
lakes in Nova Scotia that are seriously affected by acid rain. She may want to use a
simple random sample of lakes from this population to determine if there is
sufficient evidence to answer questions like:
Is p > .7? or, Is p < .7? or, Is p ≠ .7?
When drawing conclusions about a population using information from a sample it is
important to realize that one can NEVER be absolutely certain the conclusion is
correct. This is because a sample, though it may be “representative” of the
population, only contains part of all the information contained in the population.
HYPOTHESIS TESTING
Example: A graduate student claims that over 70% of the lakes in Nova Scotia have
been seriously affected by acid rain. To justify this claim she proposes the following
`test`.
“ Choose a simple random sample of 15 lakes in Nova Scotia. If 11 or more of the
sampled lakes are seriously affected by acid rain, the claim is justified.”
Formally, we set up this test as follows.
First notice that the population of interest to this graduate student is the set of all
lakes in Nova Scotia. The parameter of interest in her investigation is
p=the true proportion of all lakes in Nova Scotia affected by the acid rain [p=the
unknown population proportion].
NULL HYPOTHESIS                   ALTERNATIVE HYPOTHESIS
What we want to reject.           Research Hypothesis.
The viewpoint opposite to Ha.     What we want to prove.
H0 :                              Ha :
TEST STATISTIC (evidence from the sample used to make a decision)
X =
Distribution of X:
Now in conducting this test we should make use of the fact that large values of X
would be consistent with the _______________________ hypothesis that p > .7.
How large an X? Let’s pick some number c and decide that if X ≥ c we conclude
that _____________. Thus if X < c we must conclude that ___________. The value c is
called a CRITICAL VALUE. In this example, the graduate student has decided to
use c = 11. Her method for making a decision can be described as follows.
REJECTION OR CRITICAL REGION (rule for making a decision)
Now suppose she conducts her study and observes that X ≥ 11. Then she
would claim to have shown that Ha : p > .7 is true. If you had to use her study to
make a policy decision, the first question you should ask is
“What is the probability that her claim is wrong? That is, what is the probability of
getting X ≥ 11 when in fact H0 : p ≤ .7 is true?”
Let’s find out by doing the calculations below.
Suppose that H0 : p ≤ .7 is true     Probability of a wrong decision: P(Reject H0 | H0 is true)
p = .5                               P(X ≥ 11 | p = .5) = 1 - P(X ≤ 10) =          =
p = .6                               P(X ≥ 11 | p = .6) = 1 - P(X ≤ 10) =          =
p = .7                               P(X ≥ 11 | p = .7) = 1 - P(X ≤ 10) =          =
The error of rejecting H0 when in fact H0 is true is called a TYPE I ERROR. Notice
that in this example the largest probability of making a type I error is
_____________ and that it occurs when the value of p is _____________ (that is, on
the boundary between H0 and Ha). The largest probability of making a type I error
is called the LEVEL OF SIGNIFICANCE or TYPE I ERROR RATE of the test and
is denoted by the Greek letter α.
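The rejection probabilities in the table above can be computed without a binomial table. The sketch below assumes X is binomial with n = 15 (one trial per sampled lake) and uses the student's critical value c = 11; the function names are illustrative.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def reject_prob(c, n, p):
    """P(X >= c) = 1 - P(X <= c - 1): chance of rejecting H0 in a right-tailed test."""
    return 1 - binom_cdf(c - 1, n, p)

# Probability of rejecting H0 for several values of p inside H0: p <= .7
for p in (0.5, 0.6, 0.7):
    print(p, round(reject_prob(11, 15, p), 4))
```

The printed probabilities increase with p, which is why the type I error rate is evaluated at the boundary value p = .7.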
Conversely, suppose that the graduate student observed X < 11 (i.e. X ≤ 10), thus
leading to the claim that H0 : p ≤ .7 is true. In this case you should ask
“What is the probability that her claim is wrong? That is, what is the probability of
getting X < 11 when in fact Ha : p > .7 is true?”
Let’s find out by doing the calculations below.
Suppose that           Probability of a wrong decision     Probability of a correct decision
Ha : p > .7 is true    P(Accept H0 | Ha true)              P(Reject H0 | Ha true)
p = .8                 P(X < 11 | p = .8)                  P(X ≥ 11 | p = .8)
                       = P(X ≤ 10)                         = 1 - P(X ≤ 10)
                       =                                   =
                                                           =
p = .9                 P(X < 11 | p = .9)                  P(X ≥ 11 | p = .9)
                       = P(X ≤ 10)                         = 1 - P(X ≤ 10)
                       =                                   =
                                                           =
The error of accepting H0 when in fact Ha is true is called a TYPE II ERROR. For a
particular value of p, say p1, in the alternative (i.e. p1 > .7), the probability of making
a type II error is called the TYPE II ERROR RATE evaluated at p = p1. This
probability is denoted by β(p1). Thus,
β(p1) = P(Accept H0 | p = p1 in Ha)
Also, for a particular value of p, say p1, in the alternative (i.e. p1 > .7), we can calculate
the probability of a correct decision (see the last column of the table above). The
probability of making a correct decision, that is, rejecting H0 when in fact Ha is true,
is called the POWER OF THE TEST AGAINST THE ALTERNATIVE p1 in Ha
and is denoted K(p1). Thus
K(p1) = P(Reject H0 | p = p1 in Ha)
Notice that K(p1) and β(p1) are related by K(p1) = 1 - β(p1). If in fact Ha is true,
power is a measure of a test’s ability to detect this. For example, if in fact p were
actually .8 (.9), this test will detect this with probability ___________ (__________).
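The type II error rate β(p1) and the power K(p1) = 1 - β(p1) follow from the same binomial calculation, now evaluated at a value p1 in the alternative. This is a sketch for the student's test (n = 15, reject H0 if X ≥ 11); the function names are illustrative.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def beta_and_power(c, n, p1):
    """Right-tailed test 'reject H0 if X >= c', evaluated at p = p1 in Ha."""
    beta = binom_cdf(c - 1, n, p1)   # P(accept H0) = P(X <= c - 1)
    return beta, 1 - beta            # power K(p1) = 1 - beta(p1)

for p1 in (0.8, 0.9):
    beta, power = beta_and_power(11, 15, p1)
    print(p1, round(beta, 4), round(power, 4))
```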
A good test, that is, one in whose results we can be confident, will be one in which
the probabilities of the type I and type II errors are small.
The ideas discussed above refer to the ERROR STRUCTURE of a test. A
summary is provided below.
                                  ACTUAL SITUATION
DECISION                          H0 is True          Ha is True (H0 is false)
Accept H0 (Do not reject H0)      Correct Decision    Type II Error
Reject H0 (Accept Ha)             Type I Error        Correct Decision
QUESTION: For the student’s test above, state in words the consequence of
making a
(a) Type I Error:
(b) Type II Error:
ERROR RATES AND POWER OF A TEST
TYPE I ERROR: Reject H0 when H0 is true.
P(Type I Error) = P(Reject H0 | H0 true).
The largest possible probability of a type I error is denoted by α and is called the
LEVEL OF SIGNIFICANCE or TYPE I ERROR RATE of the test. In calculating
α = P(reject H0 | H0 true), use the value of p right on the boundary
between H0 and Ha.
TYPE II ERROR: Accept H0 when Ha is true.
β(p1) = P(Type II Error)
= P(Accept H0 | p = p1 in Ha)
POWER AGAINST the ALTERNATIVE p1:
K(p1) = P(Reject H0 | p = p1 in Ha)
= 1 - β(p1)
In the case that Ha is true, power is a measure of the sensitivity of the test, i.e.
the ability of the test to detect that Ha is true.
Changing the Rejection Region
Question: If we use the same sample size, how can we modify this test in order to
reduce the type I error rate α?
Suppose we take c = 14, so we reject H0 if X ≥ 14. What is α?
In this case what will happen to the type II error rate β(p1) and the power K(p1)?
NOTE: Ideally, we would like α and β(p) to be zero and K(p) to be 1; but for fixed n,
decreasing α causes β(p) to increase and K(p) to decrease.
NOTE: The only way to decrease both α and β(p) is to increase the sample size.
The P-value
Consider the test: H0: p ≤ .70, Ha: p > .7, n = 30; Reject H0 if X ≥ 26.
Suppose we conduct the test and observe X to be x0 = 28. According to the rejection
region we would reject H0. We would in fact have rejected H0 even if our critical
value had been 28. But with a critical value of 28, the type I error rate would be
smaller.
The P-value is the smallest type I error rate at which one can reject H0 on the basis
of the observed outcome x0. It is obtained by replacing the critical value ‘c’ by x0 in
the calculation of the type I error rate.
P-value = P(X ≥ x0 | H0 is true)
For example, consider the cases where x0 is 28 and x0 is 24.
Type I error rate α           P-value when x0 = 28          P-value when x0 = 24
P(X ≥ 26 | p = .7)            P(X ≥ 28 | p = .7)            P(X ≥ 24 | p = .7)
= 1 - P(X ≤ 25 | p = .7)      = 1 - P(X ≤ 27 | p = .7)      = 1 - P(X ≤ 23 | p = .7)
= 1 - .9698                   = 1 - .9979                   = 1 - .8405
= .0302                       = .0021                       = .1595
Notice:
If x0 is in the rejection region, then the P-value ≤ α.
If x0 is not in the rejection region, then the P-value is > α.
Thus it is clear that we can conduct our test at α = .03 without using a rejection
region. We just have to calculate the P-value and use the following rule.
If the P-value ≤ α then reject H0.
If the P-value > α then do not reject H0.
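The three tail probabilities in this comparison can be recomputed from the binomial distribution. The sketch below assumes X is binomial with n = 30 and boundary value p0 = .7, as in the test stated above; the function names are illustrative.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def upper_tail(x0, n, p0):
    """P(X >= x0 | p = p0): the P-value of a right-tailed test at observed x0."""
    return 1 - binom_cdf(x0 - 1, n, p0)

alpha = upper_tail(26, 30, 0.7)          # type I error rate for critical value 26
for x0 in (28, 24):
    p_value = upper_tail(x0, 30, 0.7)
    decision = "reject H0" if p_value <= alpha else "do not reject H0"
    print(x0, round(p_value, 4), decision)
```

Using x0 = 28 leads to rejection and x0 = 24 does not, matching the decisions given by the rejection region X ≥ 26.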
Summary: Hypothesis Testing
Concept                           Left-Tailed Test              Right-Tailed Test
Hypotheses                        H0: p ≥ p0, Ha: p < p0        H0: p ≤ p0, Ha: p > p0
Critical Region                   Reject H0 if X ≤ c            Reject H0 if X ≥ c
Type I Error Rate α               P(X ≤ c | p = p0)             P(X ≥ c | p = p0)
  = P(Reject H0 | H0 true)
Type II Error Rate β(p1)          P(X > c | p = p1)             P(X < c | p = p1)
  = P(Accept H0 | p = p1 in Ha)
Power K(p1)                       P(X ≤ c | p = p1)             P(X ≥ c | p = p1)
  = P(Reject H0 | p = p1 in Ha)   or, 1 - β(p1)                 or, 1 - β(p1)
P-Value                           P(X ≤ x0 | p = p0)            P(X ≥ x0 | p = p0)
P-value Decision Rule             Reject H0 if the P-value ≤ α (both tests)
Note: A similar theory also applies to a two-tailed test, i.e., a test of
H0: p = p0, Ha: p ≠ p0
While we will conduct such tests in our applications, we will not discuss the theory
here.
An analogy of statistical hypotheses
In practice we use α = .01 or α = .05. Thus to reject H0 we need strong evidence.
In our judicial system, we use the phrase innocent until proven guilty beyond a
reasonable doubt. We may define null and alternative hypotheses as follows:
H0: defendant is innocent
Ha: defendant is guilty.
To prove defendant is guilty we need strong evidence.
PROBLEM: Given: Ha : p < .6, n = 30; Reject H0 if X ≤ 13.
(a) Find the level of significance α.
(b) Find β(p1) if in fact p1 = .4.
(c) Find the power against the alternative p1 = .4.
(d) Suppose that X is observed to be x0 = 12.
(i) What is your decision?
(ii) What type of error are you subject to?
(iii) Find the p-value.
PROBLEM: Given: Ha : p > .4, n = 20; Reject H0 if X ≥ 16.
(a) Find the level of significance α.
(b) Find β(p1) if in fact p1 = .6.
(c) Find the power against the alternative p1 = .6.
(d) Suppose that X is observed to be x0 = 14.
(i) What is your decision?
(ii) What type of error are you subject to?
(iii) Find the p-value.
CUMULATIVE BINOMIAL PROBABILITIES: P(X ≤ x)
                                      p
 n   x    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
15   0  .2059  .0352  .0047  .0005  .0000  .0000  .0000  .0000  .0000
     1  .5490  .1671  .0353  .0052  .0005  .0000  .0000  .0000  .0000
     2  .8159  .3980  .1268  .0271  .0037  .0003  .0000  .0000  .0000
     3  .9444  .6482  .2969  .0905  .0176  .0019  .0001  .0000  .0000
     4  .9873  .8358  .5155  .2173  .0592  .0093  .0007  .0000  .0000
     5  .9978  .9389  .7216  .4032  .1509  .0338  .0037  .0001  .0000
     6  .9997  .9819  .8689  .6098  .3036  .0950  .0152  .0008  .0000
     7  1.000  .9958  .9500  .7869  .5000  .2131  .0500  .0042  .0000
     8  1.000  .9992  .9848  .9050  .6964  .3902  .1311  .0181  .0003
     9  1.000  .9999  .9963  .9662  .8491  .5968  .2784  .0611  .0022
    10  1.000  1.000  .9993  .9907  .9408  .7827  .4845  .1642  .0127
    11  1.000  1.000  .9999  .9981  .9824  .9095  .7031  .3518  .0556
    12  1.000  1.000  1.000  .9997  .9963  .9729  .8732  .6020  .1841
    13  1.000  1.000  1.000  1.000  .9995  .9948  .9647  .8329  .4510
    14  1.000  1.000  1.000  1.000  1.000  .9995  .9953  .9648  .7941
    15  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
20   0  .1216  .0115  .0008  .0000  .0000  .0000  .0000  .0000  .0000
     1  .3917  .0692  .0076  .0005  .0000  .0000  .0000  .0000  .0000
     2  .6769  .2061  .0355  .0036  .0002  .0000  .0000  .0000  .0000
     3  .8670  .4114  .1071  .0160  .0013  .0000  .0000  .0000  .0000
     4  .9568  .6296  .2375  .0510  .0059  .0003  .0000  .0000  .0000
     5  .9887  .8042  .4164  .1256  .0207  .0016  .0000  .0000  .0000
     6  .9976  .9133  .6080  .2500  .0577  .0065  .0003  .0000  .0000
     7  .9996  .9679  .7723  .4159  .1316  .0210  .0013  .0000  .0000
     8  .9999  .9900  .8867  .5956  .2517  .0565  .0051  .0001  .0000
     9  1.000  .9974  .9520  .7553  .4119  .1275  .0171  .0006  .0000
    10  1.000  .9994  .9829  .8725  .5881  .2447  .0480  .0026  .0000
    11  1.000  .9999  .9949  .9435  .7483  .4044  .1133  .0100  .0001
    12  1.000  1.000  .9987  .9790  .8684  .5841  .2277  .0321  .0004
    13  1.000  1.000  .9997  .9935  .9423  .7500  .3920  .0867  .0024
    14  1.000  1.000  1.000  .9984  .9793  .8744  .5836  .1958  .0113
    15  1.000  1.000  1.000  .9997  .9941  .9490  .7625  .3704  .0432
    16  1.000  1.000  1.000  1.000  .9987  .9840  .8929  .5886  .1330
    17  1.000  1.000  1.000  1.000  .9998  .9964  .9645  .7939  .3231
    18  1.000  1.000  1.000  1.000  1.000  .9995  .9924  .9308  .6083
    19  1.000  1.000  1.000  1.000  1.000  1.000  .9992  .9885  .8784
    20  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
30   0  .0424  .0012  .0000  .0000  .0000  .0000  .0000  .0000  .0000
     1  .1837  .0105  .0003  .0000  .0000  .0000  .0000  .0000  .0000
     2  .4114  .0442  .0021  .0000  .0000  .0000  .0000  .0000  .0000
     3  .6474  .1227  .0093  .0003  .0000  .0000  .0000  .0000  .0000
     4  .8245  .2552  .0302  .0015  .0000  .0000  .0000  .0000  .0000
     5  .9268  .4275  .0766  .0057  .0002  .0000  .0000  .0000  .0000
     6  .9742  .6070  .1595  .0172  .0007  .0000  .0000  .0000  .0000
     7  .9922  .7608  .2814  .0435  .0026  .0000  .0000  .0000  .0000
     8  .9980  .8713  .4315  .0940  .0081  .0002  .0000  .0000  .0000
     9  .9995  .9389  .5888  .1763  .0214  .0009  .0000  .0000  .0000
    10  .9999  .9744  .7304  .2915  .0494  .0029  .0000  .0000  .0000
    11  1.000  .9905  .8407  .4311  .1002  .0083  .0002  .0000  .0000
    12  1.000  .9969  .9155  .5785  .1808  .0212  .0006  .0000  .0000
    13  1.000  .9991  .9599  .7145  .2923  .0481  .0021  .0000  .0000
    14  1.000  .9998  .9831  .8246  .4278  .0971  .0064  .0001  .0000
    15  1.000  .9999  .9936  .9029  .5722  .1754  .0169  .0002  .0000
    16  1.000  1.000  .9979  .9519  .7077  .2855  .0401  .0009  .0000
    17  1.000  1.000  .9994  .9788  .8192  .4215  .0845  .0031  .0000
    18  1.000  1.000  .9998  .9917  .8998  .5689  .1593  .0095  .0000
    19  1.000  1.000  1.000  .9971  .9506  .7085  .2696  .0256  .0001
    20  1.000  1.000  1.000  .9991  .9786  .8237  .4112  .0611  .0005
    21  1.000  1.000  1.000  .9998  .9919  .9060  .5685  .1287  .0020
    22  1.000  1.000  1.000  1.000  .9974  .9565  .7186  .2392  .0078
    23  1.000  1.000  1.000  1.000  .9993  .9828  .8405  .3930  .0258
    24  1.000  1.000  1.000  1.000  .9998  .9943  .9234  .5725  .0732
    25  1.000  1.000  1.000  1.000  1.000  .9985  .9698  .7448  .1755
    26  1.000  1.000  1.000  1.000  1.000  .9997  .9907  .8773  .3526
    27  1.000  1.000  1.000  1.000  1.000  1.000  .9979  .9558  .5886
    28  1.000  1.000  1.000  1.000  1.000  1.000  .9997  .9895  .8163
z Test for a Population Mean
To test the hypothesis H0 : μ = μ0 based on an SRS of size n from a population with
unknown mean μ and known standard deviation σ, compute the test statistic
z = (x̄ - μ0) / (σ/√n)
In terms of a standard normal random variable Z, the P-value for a test of H0
against
Ha : μ > μ0 is P(Z ≥ z)
Ha : μ < μ0 is P(Z ≤ z)
Ha : μ ≠ μ0 is 2P(Z ≥ |z|)
These P-values are exact if the population distribution is normal and are
approximately correct for large n in other cases (Page 445-Text Book).
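The three P-value rules can be collected in one small function. This is a sketch only: the function name z_test, the keyword values for the alternative, and the numbers in the usage line (x̄ = 52, μ0 = 50, σ = 6, n = 36) are hypothetical and chosen so that z works out to exactly 2.

```python
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """Return (z, P-value) for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    std_normal = NormalDist()
    if alternative == "greater":        # Ha: mu > mu0
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":         # Ha: mu < mu0
        p = std_normal.cdf(z)
    elif alternative == "two-sided":    # Ha: mu != mu0
        p = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p

# Hypothetical data: z = (52 - 50) / (6 / 6) = 2
for alt in ("greater", "less", "two-sided"):
    z, p = z_test(52, 50, 6, 36, alt)
    print(alt, round(z, 2), round(p, 4))
```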
To illustrate the test we consider the following problem.
Problem 6.45 (Page 455): The Survey of Study Habits and Attitudes (SSHA) is a
psychological test that measures the motivation, attitude toward school, and study
habits of students. Scores range from 0 to 200. The mean score for U.S. college
students is about 115, and the standard deviation is about 30. A teacher who
suspects that older students have better attitudes toward school gives the SSHA to
20 students who are at least 30 years of age. Their mean score is x̄ = 135.2.
(a) Assuming that σ = 30 for the population of older students, carry out a test of
H0 : μ = 115, Ha : μ > 115.
Report the P-value of your test, and state your conclusion clearly.
(b) Your test in (a) required two important assumptions in addition to the
assumption that the value of σ is known. What are they? Which of these
assumptions is most important to the validity of your conclusion in (a)?
Solution: Given: n =        , x̄ =        , σ =
Assume α = .05
Ha : μ > 115; therefore this is a right-sided test.
(a) (i)
(ii) The test statistic in this case is
z =
(iii) p-value = P(Z ≥ 3.01)
=
(iv) Decision:
(v) Concluding Sentence:
(b) Assumptions: (i)
(ii)
Note: For the above problem, let the alternative hypothesis be as follows:
(i) Ha : μ < 145; α = .05. Therefore, this is a left-sided test.
(ii) The test statistic in this case is
z =
(iii) p-value = P(Z ≤ -1.46) =
(iv) Decision:
(v) Concluding Sentence:
Note: Let the alternative hypothesis now be
(i) Ha : μ ≠ 145; α = .05. Therefore, this is a two-sided test.
(ii) The test statistic is
z =
(iii) p-value =
=
=
=
(iv) Decision:
(v) Concluding Sentence:
The One-Sample t Test
Suppose that an SRS of size n is drawn from a population having unknown mean μ.
To test the hypothesis H0 : μ = μ0 based on an SRS of size n, compute the one-sample
t statistic
t = (x̄ - μ0) / (s/√n)
In terms of a random variable T having the t(n-1) distribution, the P-value for a test
of H0 against
Ha : μ > μ0 is P(T ≥ t)
Ha : μ < μ0 is P(T ≤ t)
Ha : μ ≠ μ0 is 2P(T ≥ |t|)
These P-values are exact if the population distribution is normal and are
approximately correct for large n in other cases (Page 496-Text Book).
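The t statistic itself needs only summary statistics. The sketch below computes t and its degrees of freedom; the function name and the numbers in the usage line are hypothetical. Python's standard library has no t-distribution, so the P-value is left to a t-table or statistical software.

```python
def t_statistic(xbar, s, n, mu0):
    """One-sample t statistic and its degrees of freedom."""
    standard_error = s / n ** 0.5
    t = (xbar - mu0) / standard_error
    return t, n - 1

# Hypothetical summary statistics: xbar = 52, s = 6, n = 36, H0: mu = 50
t, df = t_statistic(52, 6, 36, 50)
print(round(t, 3), df)   # t = 2.0 with 35 degrees of freedom
# The P-value P(T >= t) would then be read from the t(35) row of a t-table.
```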
To illustrate the test we consider the following example.
Example: A random sample of 120 high school graduates were given an IQ test. The
sample mean IQ was 103.21 with a standard deviation of 16.18. Test at α = .10 if
there is sufficient evidence to conclude that the mean of the population from which the
sample comes exceeds 100.
Solution: Given: n = 120, x̄ = 103.21, s = 16.18; α = .10
(i) Ha : μ > 100
(ii) t =
(iii) p-value =
(iv) Decision:
(v) Concluding Sentence:
Example: A psychological test, used to assess an individual’s ability to appraise
other people, was given to a random sample of 12 supervisors in a large corporation.
Their scores are given below.
64 97 73 71 68 74 60 78 60 74 73 75
Is there sufficient evidence at α = .05 to conclude that the mean score for the
population of supervisors is below 75?
Solution: Given: n =        , x̄ =        , s =        , α = .05
(i) Ha : μ < 75; therefore, this is a left-sided test.
(ii) t =
(iii) p-value =
(iv) Decision:
(v) Concluding Sentence:
Example: A manufacturing process is supposed to produce ball bearings for use in
industry with a diameter of 2cm. A random sample of 40 ball bearings was chosen
and their diameters were measured. Mean and standard deviation of this random
sample is given below;
n = 40, x̄ = 1.9991, s = .0089.
Test the hypothesis Ha : μ ≠ 2 at α = .05.
(i) Ha : μ ≠ 2; therefore, this is a two-sided test.
(ii) t =
(iii) p-value =
(iv) Decision:
(v) Concluding Sentence:
NORMAL APPROXIMATION FOR COUNTS AND PROPORTIONS
An srs of size n is drawn from a population having population proportion p of
successes. Let X be the number of successes in the sample and let
p̂ = X/n
be the sample proportion of successes. If n is large, then
X is approximately N(np, √(np(1-p)))
p̂ is approximately N(p, √(p(1-p)/n))
Note: As a rule of thumb we will use the above approximation if np ≥ 10 and
n(1-p) ≥ 10.
Note: The above result is on Page 376 in the text book.
Large-Sample Significance test for a Population Proportion
Draw an SRS of size n from a large population with unknown proportion p of
successes. To test the hypothesis H0: p = p0, compute the z statistic
z = (p̂ - p0) / √(p0(1-p0)/n)
In terms of a standard normal random variable Z, the approximate P-value for a
test of H0 against
Ha : p > p0 is P(Z ≥ z)
Ha : p < p0 is P(Z ≤ z)
Ha : p ≠ p0 is 2P(Z ≥ |z|)
In practice we will use this test if np0 > 10 and n(1-p0) > 10. This test is given on Page
575 in the Text Book.
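As an illustration of the large-sample test, the sketch below computes the z statistic and the right-tailed P-value. The function name and the data (60 successes in 100 trials, H0: p = .5) are hypothetical and not taken from the text; the function also checks the rule of thumb before computing anything.

```python
from statistics import NormalDist

def proportion_z_test(x, n, p0):
    """z statistic and P-value for H0: p = p0 against Ha: p > p0."""
    if n * p0 <= 10 or n * (1 - p0) <= 10:
        raise ValueError("normal approximation not recommended for these n and p0")
    p_hat = x / n
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5   # standard deviation uses p0
    p_value = 1 - NormalDist().cdf(z)               # P(Z >= z)
    return z, p_value

z, p_value = proportion_z_test(60, 100, 0.5)        # hypothetical data
print(round(z, 3), round(p_value, 4))               # z = 2, P-value about .0228
```

Note that the standard deviation in the denominator is computed from the hypothesized p0, not from p̂.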
Problem 8.20 (Page 585): A matched pairs experiment compares the taste of instant versus
fresh-brewed coffee. Each subject tastes two unmarked cups of coffee, one of each
type, in random order and states which he or she prefers. Of the 50 subjects who
participate in the study, 19 prefer the instant coffee. Let p be the probability that a
randomly chosen subject prefers fresh-brewed coffee to instant coffee. (In
practical terms, p is the proportion of the population who prefer fresh-brewed
coffee.)
(a) Test the claim that a majority of people prefer the taste of fresh-brewed
coffee. Report the z statistic and its P-value. Is your result significant at the
5% level? What is your practical conclusion?
(b) Find a 90% confidence interval for p.
Continued:
100(1-α)% CONFIDENCE INTERVAL FOR p
For large n, p̂ is approximately N(p, √(p(1-p)/n)). Therefore,
P( -z_{α/2} < (p̂ - p) / √(p(1-p)/n) < z_{α/2} ) = 1 - α
A simple mathematical calculation would show that the above equation is equivalent
to the following:
P( p̂ - z_{α/2} √(p(1-p)/n) < p < p̂ + z_{α/2} √(p(1-p)/n) ) = 1 - α
The standard deviation of p̂ is given by
σ_p̂ = √(p(1-p)/n)
Since p is unknown in practice, we replace it by its estimate p̂ and
define the standard error of the sample proportion as follows:
SE_p̂ = √(p̂(1-p̂)/n)
An approximate 100(1-α)% confidence interval for p is given by
p̂ ± z_{α/2} SE_p̂
Note: In practice we will use this formula if both n p̂ ≥ 10 and
n(1-p̂) ≥ 10.
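The interval p̂ ± z_{α/2} SE_p̂ can be computed in a few lines. The sketch below applies it to the coffee data of Problem 8.20, where 50 - 19 = 31 of the 50 subjects prefer fresh-brewed coffee; the function name is an illustrative choice.

```python
from statistics import NormalDist

def proportion_ci(x, n, level):
    """Approximate large-sample confidence interval for a proportion p."""
    p_hat = x / n
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("rule of thumb n*p_hat >= 10 and n*(1-p_hat) >= 10 not met")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z_{alpha/2}
    se = (p_hat * (1 - p_hat) / n) ** 0.5           # standard error uses p_hat
    return p_hat - z * se, p_hat + z * se

lo, hi = proportion_ci(31, 50, 0.90)
print(f"90% CI for p: ({lo:.3f}, {hi:.3f})")
```

Unlike the significance test, the standard error here is based on p̂, because no hypothesized value of p is available.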