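Comparing Two Population Means

Overview
Two-sample problem: comparing means, comparing proportions, comparing variances
Comparing means, paired samples
Comparing means, independent samples, σ1 and σ2 known
Comparing means, independent samples, σ1 and σ2 unknown
Population 1 (mean μ1, variance σ1²) gives Sample 1: x11, x12, …, x1n1
Population 2 (mean μ2, variance σ2²) gives Sample 2: x21, x22, …, x2n2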
Analysis of Paired Samples

Paired Samples
The observations on the two populations are paired (e.g., repeated
measures, or before and after treatment).
Each pair of observations (X1j, X2j) is taken under
homogeneous conditions, but these conditions may change from
one pair to another.
Use the difference between paired values, D = X1 - X2.
Advantage: pairing eliminates variation due to factors other than the
difference between the two populations.
Example (Montgomery p. 219) We are interested in comparing two
different types of tips for a hardness testing machine. This
machine presses the tip into a metal specimen with a known force.
By measuring the depth of the depression caused by the tip, the
hardness of the specimen can be determined. Several specimens
were selected at random; half were tested with tip 1 and half with
tip 2, and the independent t-test was applied.
Problem with this procedure
The metal specimens might not be homogeneous in some way that
affects hardness (e.g., produced in different heats).
=> The observed difference between mean hardness readings for the
two tip types also includes hardness differences between specimens.
Solution
Make two hardness readings on each specimen, one with each tip.
=> Paired sample
Analysis
Let (X11, X21), (X12, X22), …, (X1n, X2n) be a set of n paired observations
of (X1, X2), where
E[X1] = μ1, Var(X1) = σ1² and E[X2] = μ2, Var(X2) = σ2².
Define Dj = X1j - X2j (j = 1, 2, …, n)
=> This reduces the problem to a one-sample problem on the differences.
Assumption: Both X1 and X2 are normally distributed. Then
Dj ~ N(μD, σD²), where μD = μ1 - μ2.
Point estimator of μD = μ1-μ2: $\bar{D} = \frac{1}{n}\sum_{j=1}^{n} D_j$
Point estimator of σD²: $S_D^2 = \frac{1}{n-1}\sum_{j=1}^{n} (D_j - \bar{D})^2$
Distribution: $T = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}}$ has a t-distribution with d.f. = n-1.
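As a minimal sketch of this reduction to a one-sample problem, the Python below forms the differences, estimates their mean and variance, and computes the statistic T for a hypothesized μD. The function name `paired_summary` and the readings are made up for illustration; they do not come from the slides.

```python
import math
import statistics

def paired_summary(x1, x2, mu_d0=0.0):
    """Return (d_bar, s_d2, t) for paired samples x1 and x2.

    t = (d_bar - mu_d0) / (s_d / sqrt(n)); under normality it follows
    a t-distribution with n-1 degrees of freedom when mu_D = mu_d0.
    """
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x1, x2)]   # D_j = X_1j - X_2j
    n = len(d)
    d_bar = statistics.mean(d)            # point estimate of mu_D
    s_d2 = statistics.variance(d)         # sample variance, divides by n-1
    t = (d_bar - mu_d0) / math.sqrt(s_d2 / n)
    return d_bar, s_d2, t

# Hypothetical hardness readings: two tips on the same five specimens.
tip1 = [7.0, 3.0, 3.0, 4.0, 8.0]
tip2 = [6.0, 3.0, 5.0, 3.0, 7.0]
d_bar, s_d2, t = paired_summary(tip1, tip2)
print(d_bar, s_d2, t)
```

Because only the differences enter the calculation, specimen-to-specimen variation that affects both readings equally drops out.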
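Confidence Intervals
100(1-α)% confidence interval for μ1-μ2:
$\bar{d} - t_{\alpha/2,n-1}\, s_D/\sqrt{n} \le \mu_1-\mu_2 \le \bar{d} + t_{\alpha/2,n-1}\, s_D/\sqrt{n}$
100(1-α)% upper confidence bound for μ1-μ2:
$\mu_1-\mu_2 \le \bar{d} + t_{\alpha,n-1}\, s_D/\sqrt{n}$
100(1-α)% lower confidence bound for μ1-μ2:
$\bar{d} - t_{\alpha,n-1}\, s_D/\sqrt{n} \le \mu_1-\mu_2$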
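The paired interval and bounds can be sketched from summary statistics alone. In the Python below the critical values are passed in as arguments, as they would be read from a t table; the function names and the summary values (d̄ = 2.0, s_D = 4.0, n = 16) are hypothetical.

```python
import math

def paired_ci(d_bar, s_d, n, t_crit):
    """Two-sided CI for mu_D: d_bar -/+ t_crit * s_d / sqrt(n).

    t_crit is the upper alpha/2 point of t with n-1 degrees of freedom.
    """
    half_width = t_crit * s_d / math.sqrt(n)
    return d_bar - half_width, d_bar + half_width

def paired_upper_bound(d_bar, s_d, n, t_crit_one_sided):
    """One-sided upper bound; uses the upper alpha point instead of alpha/2."""
    return d_bar + t_crit_one_sided * s_d / math.sqrt(n)

# Hypothetical summary: n = 16 pairs, 95% confidence.
# 2.131 and 1.753 are the tabulated upper 2.5% and 5% points of t with 15 d.f.
lower, upper = paired_ci(d_bar=2.0, s_d=4.0, n=16, t_crit=2.131)
bound = paired_upper_bound(d_bar=2.0, s_d=4.0, n=16, t_crit_one_sided=1.753)
print(lower, upper, bound)
```

The one-sided bound is tighter on its side than the two-sided limit because all of α is placed in one tail.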
Example (Montgomery p. 222) The journal Human Factors (1962, pp. 375-380) reports a study in which n = 14 subjects were asked to parallel park
two cars having very different wheel bases and turning radii. The time in
seconds for each subject was recorded and is given below. Find the 90%
CI for μ1-μ2, assuming normality.
The 90% CI for μ1-μ2 is found as follows.
Note that this CI includes zero. Thus, at the 90% level of
confidence, the data do not support the claim that the two
cars have different mean parking times.
Hypothesis Testing for Paired Data
Test statistic for testing H0: μ1-μ2 = Δ0:
$T_0 = \frac{\bar{D} - \Delta_0}{S_D/\sqrt{n}}$
Alternative hypothesis H1: μ1-μ2 ≠ Δ0
Rejection region: $T_0 > t_{\alpha/2,n-1}$ or $T_0 < -t_{\alpha/2,n-1}$
P-value: 2 P( T0 > | t0 | )
Alternative hypothesis H1: μ1-μ2 > Δ0
Rejection region: $T_0 > t_{\alpha,n-1}$
P-value: P( T0 > t0 )
Alternative hypothesis H1: μ1-μ2 < Δ0
Rejection region: $T_0 < -t_{\alpha,n-1}$
P-value: P( T0 < t0 )
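The P-values above are tail areas of the t-distribution with n-1 degrees of freedom. One way to sketch them without a statistics package is to integrate the t density numerically. The helper names (`t_pdf`, `t_upper_tail`, `paired_t_p_values`) and the use of Simpson's rule are assumptions made for this illustration, not part of the slides.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t0, df, steps=2000):
    """P(T > t0), from Simpson's rule on [0, |t0|] plus symmetry.

    steps must be even for Simpson's rule.
    """
    a = abs(t0)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3      # integral of the density from 0 to |t0|
    tail = 0.5 - area         # P(T > |t0|)
    return tail if t0 >= 0 else 1.0 - tail

def paired_t_p_values(t0, n):
    """P-values for the three alternatives, given observed t0 and n pairs."""
    df = n - 1
    upper = t_upper_tail(t0, df)
    return {
        "two_sided": 2 * t_upper_tail(abs(t0), df),   # H1: mu1 - mu2 != delta0
        "greater": upper,                             # H1: mu1 - mu2 > delta0
        "less": 1.0 - upper,                          # H1: mu1 - mu2 < delta0
    }

# Hypothetical observed statistic from n = 14 pairs.
print(paired_t_p_values(2.0, 14))
```

Rejecting H0 when the P-value is below α is equivalent to checking whether t0 falls in the corresponding rejection region.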
Example (Hayter p. 386) A new drug for inducing a temporary
reduction in a patient's heart rate is to be compared with a
standard drug. A paired experiment is run whereby each of 40
patients is administered one drug on one day and the other drug
on the following day. The spacing of the two experiments over
two days ensures that there is no "carryover" effect, since the
drugs are only temporarily effective. Nevertheless, the order in
which the two drugs are administered is decided in a random
manner, so that one patient may have the standard drug followed
by the new drug and another patient may have the new drug
followed by the standard drug. To compare the effects of the two
drugs, the percentage heart rate reductions for the standard drug,
xi, and the new drug, yi, were recorded for the 40 subjects.
From the result of the experiment (Figure 9.13), we have
To compare the effects, we perform a hypothesis test at α = 0.01.
We can reject H0 at α = 0.01. That is, there is evidence that the
new drug has a different effect from the standard drug.
Example (Montgomery p. 224) The Federal Aviation Administration
requires that material used to make evacuation systems retain its strength
over the life of the aircraft. In an accelerated life test, the principal
material, polymer-coated nylon weave, is aged by exposing it to 158°F for
168 hours. The tensile strength of the specimens of this material is
measured before and after the aging process. The following data (in psi)
are recorded.
A. Is there evidence to support the claim that the nylon weave
tensile strength is the same before and after the aging process?
Use a significance level of 0.01.
B. Calculate the P-value for this test.
C. Find a 99% CI on the mean difference in tensile strength and
use it to answer the question from part A.
Inference on the Means of Two Independent Populations, Variance Known
Assumptions
X11, X12, …, X1n1 is a random sample of size n1 from population 1.
X21, X22, …, X2n2 is a random sample of size n2 from population 2.
The two populations are independent.
The variances of the two populations are known.
Both populations are normal, or if they are not normal, the
conditions of the central limit theorem apply.
Notation: μ1 and σ1² are the mean and variance of population 1; μ2 and σ2² are those of population 2.
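Point Estimator
Point estimator of μ1-μ2: $\bar{X}_1 - \bar{X}_2$
Standard error of $\bar{X}_1 - \bar{X}_2$: $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
Distribution: $Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1)$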
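Confidence Interval
100(1-α)% confidence interval for μ1-μ2:
$\bar{x}_1 - \bar{x}_2 - z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} \le \mu_1-\mu_2 \le \bar{x}_1 - \bar{x}_2 + z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
100(1-α)% upper confidence bound for μ1-μ2:
$\mu_1-\mu_2 \le \bar{x}_1 - \bar{x}_2 + z_{\alpha}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
100(1-α)% lower confidence bound for μ1-μ2:
$\bar{x}_1 - \bar{x}_2 - z_{\alpha}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} \le \mu_1-\mu_2$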
Example (Montgomery p. 204) Tensile strength tests were performed on two
different grades of aluminum spars used in manufacturing the wing of a
commercial transport aircraft. From past experience with the spar
manufacturing process and the testing procedure, the standard
deviations of tensile strengths are assumed to be known. The data
obtained are given below. If μ1 and μ2 denote the true mean tensile
strengths for the two grades of spars, find a 90% CI on the difference in
mean strength μ1-μ2.
90% CI: (12.22, 13.98)
Example (Montgomery p. 205) Two machines are used to fill plastic
bottles with dishwashing detergent. The standard deviations of fill
volume are known to be σ1 = 0.10 and σ2 = 0.15 fluid ounces for
the two machines, respectively. Two random samples of n1 = 12
bottles from machine 1 and n2 = 10 bottles from machine 2 are
selected, and the sample mean fill volumes are
x̄1 = 30.61 and x̄2 = 30.34 fluid ounces. Assume normality.
A. Construct a 90% two-sided CI on the mean difference in fill
volume. Interpret this interval.
B. Construct a 95% two-sided CI on the mean difference in fill
volume. Compare and comment on the width of this interval to
the width of the interval in part A.
C. Construct a 95% upper confidence bound on the mean
difference in fill volume. Interpret this interval.
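A possible way to work parts A to C numerically is sketched below, using the summary statistics stated in the example and the standard library's `statistics.NormalDist` for the normal quantiles. The function names `z_interval` and `z_upper_bound` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar1, xbar2, sigma1, sigma2, n1, n2, conf):
    """Two-sided CI for mu1 - mu2 when sigma1 and sigma2 are known."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se

def z_upper_bound(xbar1, xbar2, sigma1, sigma2, n1, n2, conf):
    """One-sided upper confidence bound for mu1 - mu2."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = NormalDist().inv_cdf(conf)                 # z_{alpha}
    return (xbar1 - xbar2) + z * se

# Summary statistics from the fill-volume example.
stats = dict(xbar1=30.61, xbar2=30.34, sigma1=0.10, sigma2=0.15, n1=12, n2=10)
ci90 = z_interval(**stats, conf=0.90)      # part A
ci95 = z_interval(**stats, conf=0.95)      # part B
ub95 = z_upper_bound(**stats, conf=0.95)   # part C
print(ci90, ci95, ub95)
```

The 95% interval comes out wider than the 90% interval, since a higher confidence level requires a larger normal quantile with the same standard error.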
Hypothesis Testing
Test statistic for testing H0: μ1-μ2 = Δ0:
$Z_0 = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
Alternative hypothesis H1: μ1-μ2 ≠ Δ0
Rejection region: $Z_0 > z_{\alpha/2}$ or $Z_0 < -z_{\alpha/2}$
P-value: 2 P( Z0 > | z0 | )
Alternative hypothesis H1: μ1-μ2 > Δ0
Rejection region: $Z_0 > z_{\alpha}$
P-value: P( Z0 > z0 )
Alternative hypothesis H1: μ1-μ2 < Δ0
Rejection region: $Z_0 < -z_{\alpha}$
P-value: P( Z0 < z0 )
Example (Montgomery p. 201) A product developer is interested in
reducing the drying time of a primer paint. Two formulations of
the paint are tested: formulation 1 is the standard chemistry and
formulation 2 has a new drying ingredient that should reduce the
drying time. From experience, it is known that the standard
deviation of drying time is 8 minutes and this inherent variability
should be unaffected by the addition of the new ingredient. Ten
specimens are painted with formulation 1 and another 10
specimens are painted with formulation 2; the 20 specimens are
painted in random order.
The two sample average drying times are
x̄1 = 121 minutes and x̄2 = 112 minutes, respectively. What conclusions can the product
developer draw about the effectiveness of the new ingredient,
using α = 0.05?
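A sketch of the calculation for this example follows. It assumes the natural one-sided setup H0: μ1-μ2 = 0 against H1: μ1 > μ2 (the new ingredient reduces the mean drying time); the function name `two_sample_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """Z0 = (xbar1 - xbar2 - delta0) / sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (xbar1 - xbar2 - delta0) / se

# Values stated in the example: sigma = 8 minutes for both formulations.
z0 = two_sample_z(xbar1=121, xbar2=112, sigma1=8, sigma2=8, n1=10, n2=10)
p_value = 1 - NormalDist().cdf(z0)   # upper-tail P-value for H1: mu1 - mu2 > 0
reject_h0 = p_value < 0.05
print(round(z0, 2), round(p_value, 4), reject_h0)
```

Under these assumptions H0 is rejected at α = 0.05, which supports the claim that the new ingredient shortens the mean drying time.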
Example (Montgomery p. 205) Two types of plastic are suitable for
use by an electronics component manufacturer. The breaking
strength of this plastic is important. It is known that σ1 = σ2 = 1.0
psi. From random samples of sizes n1 = 10 and n2 = 12, we obtain
x̄1 = 162.7 and x̄2 = 155.4. The company will not adopt plastic 1
unless its mean breaking strength exceeds that of plastic 2 by at
least 10 psi. Based on the sample information, should they use
plastic 1? Use α = 0.05 in reaching a decision.
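This example has a nonzero hypothesized difference. One reading of the problem, assumed here, is H0: μ1-μ2 = 10 against H1: μ1-μ2 > 10, decided with the rejection region rather than a P-value. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the example; sigma1 = sigma2 = 1.0 psi is known.
xbar1, xbar2 = 162.7, 155.4
sigma1, sigma2 = 1.0, 1.0
n1, n2 = 10, 12
delta0 = 10.0    # hypothesized difference under H0 (assumed setup)
alpha = 0.05

se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z0 = (xbar1 - xbar2 - delta0) / se
z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-alpha point of N(0, 1)
adopt_plastic_1 = z0 > z_crit              # reject H0 only if Z0 > z_alpha
print(round(z0, 2), round(z_crit, 3), adopt_plastic_1)
```

The observed difference of 7.3 psi is below the required 10 psi, so z0 is negative and H0 is not rejected; under this setup the data give no support for adopting plastic 1.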
Inference on the Means of Two Populations, Variance Unknown

Assumptions
X11, X12, …, X1n1 is a random sample of size n1 from population 1.
X21, X22, …, X2n2 is a random sample of size n2 from population 2.
The two populations are independent.
The variances of the two populations are unknown.
Both populations are normal, or if they are not normal, the
conditions of the central limit theorem apply.
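Case 1: The variances are assumed equal
(σ1² = σ2² = σ²)
Combine the two sample variances S1² and S2² to form an estimator
of σ²:
$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$ : pooled variance estimator
The test statistic is
$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1-\mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}}$, which has a t-distribution with n1+n2-2 degrees of freedom.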
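Confidence Interval
100(1-α)% confidence interval for μ1-μ2:
$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2,n_1+n_2-2}\, s_p\sqrt{1/n_1 + 1/n_2} \le \mu_1-\mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha/2,n_1+n_2-2}\, s_p\sqrt{1/n_1 + 1/n_2}$
100(1-α)% upper confidence bound for μ1-μ2:
$\mu_1-\mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha,n_1+n_2-2}\, s_p\sqrt{1/n_1 + 1/n_2}$
100(1-α)% lower confidence bound for μ1-μ2:
$\bar{x}_1 - \bar{x}_2 - t_{\alpha,n_1+n_2-2}\, s_p\sqrt{1/n_1 + 1/n_2} \le \mu_1-\mu_2$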
Example (Montgomery p. 215) An article in the journal Hazardous
Waste and Hazardous Materials (Vol. 6, 1989) reported the results
of an analysis of the weight of calcium in standard cement and
cement doped with lead. Reduced levels of calcium would indicate
that the hydration mechanism in the cement is blocked and would
allow water to attack various locations in the cement structure. Ten
samples of standard cement had an average weight percent
calcium of x̄1 = 90.0 with a sample standard deviation of s1 = 5.0,
and 15 samples of the lead-doped cement had an average weight
percent calcium of x̄2 = 87.0 with a sample standard deviation of
s2 = 4.0. We will assume that weight percent calcium is normally
distributed and find a 95% CI on the difference in means, μ1-μ2, for
the two types of cement. Assume equal variance for the two
populations.
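The cement example can be worked as a short sketch using the summary statistics given in the text. The critical value 2.069 (upper 2.5% point of t with 23 degrees of freedom) is taken from a t table and passed in as an argument; the function names are illustrative.

```python
from math import sqrt

def pooled_variance(s1, s2, n1, n2):
    """S_p^2: weighted average of the two sample variances."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t_interval(xbar1, xbar2, s1, s2, n1, n2, t_crit):
    """Two-sided CI for mu1 - mu2 under the equal-variance assumption.

    t_crit is the upper alpha/2 point of t with n1 + n2 - 2 d.f.
    """
    sp = sqrt(pooled_variance(s1, s2, n1, n2))
    half_width = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width

# Summary statistics from the cement example.
sp2 = pooled_variance(5.0, 4.0, 10, 15)
lower, upper = pooled_t_interval(90.0, 87.0, 5.0, 4.0, 10, 15, t_crit=2.069)
print(round(sp2, 2), round(lower, 2), round(upper, 2))
```

The resulting interval contains zero, so at the 95% level these data do not show a difference in mean weight percent calcium between the two cements.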
Hypothesis Testing
Test statistic for testing H0: μ1-μ2 = Δ0:
$T_0 = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{S_p\sqrt{1/n_1 + 1/n_2}}$
Alternative hypothesis H1: μ1-μ2 ≠ Δ0
Rejection region: $T_0 > t_{\alpha/2,n_1+n_2-2}$ or $T_0 < -t_{\alpha/2,n_1+n_2-2}$
P-value: 2 P( T0 > | t0 | )
Alternative hypothesis H1: μ1-μ2 > Δ0
Rejection region: $T_0 > t_{\alpha,n_1+n_2-2}$
P-value: P( T0 > t0 )
Alternative hypothesis H1: μ1-μ2 < Δ0
Rejection region: $T_0 < -t_{\alpha,n_1+n_2-2}$
P-value: P( T0 < t0 )
Example (Montgomery p. 208) Two catalysts are being analyzed to
determine how they affect the mean yield of a chemical process.
Specifically, catalyst 1 is currently in use but catalyst 2 is acceptable.
Because catalyst 2 is cheaper, it should be adopted, provided it does
not change the process yield. A test is run in the pilot plant and results in
the data shown in the following table. Is there any difference between the
mean yields? Use α = 0.05 and assume equal variances.
Example (Montgomery p. 217) A consumer organization collected data on
two types of automobile batteries, A and B. The summary statistics for 12
observations of each type are x̄A = 36.51, x̄B = 34.21, sA = 1.43 and sB = 0.93.
Assume that the data are normally distributed with σA = σB.
A. Is there evidence to support the claim that type A battery mean life
exceeds that of type B? Use a significance level of 0.01 in answering
this question.
B. Calculate the P-value for this test.
C. Construct a one-sided 99% confidence bound for the difference in
mean battery life. Explain how this interval confirms your finding in part A.
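Parts A and C of the battery example can be sketched with the pooled statistic. The setup assumed here is H0: μA-μB = 0 against H1: μA-μB > 0; the critical value 2.508 (upper 1% point of t with 22 degrees of freedom) is read from a t table, and the variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the battery example (12 observations of each type).
xbar_a, xbar_b = 36.51, 34.21
s_a, s_b = 1.43, 0.93
n_a = n_b = 12
t_crit = 2.508   # tabulated t_{0.01, 22}

# Pooled standard deviation under the assumption sigma_A = sigma_B.
sp = sqrt(((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2))
se = sp * sqrt(1 / n_a + 1 / n_b)

t0 = (xbar_a - xbar_b) / se                  # part A: test statistic
reject_h0 = t0 > t_crit                      # one-sided rejection region
lower_bound = (xbar_a - xbar_b) - t_crit * se   # part C: 99% lower bound
print(round(t0, 2), reject_h0, round(lower_bound, 3))
```

The 99% lower bound is positive, which agrees with rejecting H0 in part A: both say the mean life of type A exceeds that of type B at this level.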
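Case 2: σ1² ≠ σ2²
The test statistic
$T_0^* = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$
is distributed approximately as t with degrees of freedom given by
$\nu = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\frac{(S_1^2/n_1)^2}{n_1-1} + \frac{(S_2^2/n_2)^2}{n_2-1}}$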
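Confidence Interval
100(1-α)% confidence interval for μ1-μ2:
$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2,\nu}\sqrt{s_1^2/n_1 + s_2^2/n_2} \le \mu_1-\mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha/2,\nu}\sqrt{s_1^2/n_1 + s_2^2/n_2}$
100(1-α)% upper confidence bound for μ1-μ2:
$\mu_1-\mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha,\nu}\sqrt{s_1^2/n_1 + s_2^2/n_2}$
100(1-α)% lower confidence bound for μ1-μ2:
$\bar{x}_1 - \bar{x}_2 - t_{\alpha,\nu}\sqrt{s_1^2/n_1 + s_2^2/n_2} \le \mu_1-\mu_2$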
Hypothesis Testing
Test statistic for testing H0: μ1-μ2 = Δ0:
$T_0^* = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$
Alternative hypothesis H1: μ1-μ2 ≠ Δ0
Rejection region: $T_0^* > t_{\alpha/2,\nu}$ or $T_0^* < -t_{\alpha/2,\nu}$
P-value: 2 P( T0 > | t0 | )
Alternative hypothesis H1: μ1-μ2 > Δ0
Rejection region: $T_0^* > t_{\alpha,\nu}$
P-value: P( T0 > t0 )
Alternative hypothesis H1: μ1-μ2 < Δ0
Rejection region: $T_0^* < -t_{\alpha,\nu}$
P-value: P( T0 < t0 )
Example (Montgomery p. 218) Two suppliers manufacture a plastic gear
used in a laser printer. The impact strength of these gears, measured in
foot-pounds, is an important characteristic. A random sample of 10 gears
from supplier 1 results in x̄1 = 289.30 and s1 = 22.5, and another random
sample of 16 gears from the second supplier results in x̄2 = 321.50 and
s2 = 21.
A. Is there evidence to support the claim that supplier 2 provides
gears with higher mean impact strength? Use α = 0.05 and
assume that both populations are normally distributed but the
variances are not equal.
B. Calculate the P-value for this test.
C. Do the data support the claim that the mean impact strength of gears
from supplier 2 is at least 25 foot-pounds higher than that of supplier 1?
Make the same assumptions as in part A.
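Parts A and C can be sketched with the unequal-variance statistic. This assumes the common Welch-Satterthwaite form for the approximate degrees of freedom, rounded down, and the tabulated critical value t(0.05, 18) = 1.734. The hypotheses are written in terms of μ1-μ2, so "supplier 2 is higher" becomes a lower-tailed test. The function name `welch` is illustrative.

```python
from math import sqrt, floor

def welch(xbar1, xbar2, s1, s2, n1, n2, delta0=0.0):
    """Return (t0, df) for the unequal-variance two-sample t statistic.

    df uses the Welch-Satterthwaite approximation (an assumption here).
    """
    v1, v2 = s1**2 / n1, s2**2 / n2
    t0 = (xbar1 - xbar2 - delta0) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t0, df

# Summary statistics from the gear example.
gear = dict(xbar1=289.30, xbar2=321.50, s1=22.5, s2=21.0, n1=10, n2=16)
t_crit = 1.734   # tabulated t_{0.05, 18}

# Part A: H0: mu1 - mu2 = 0 vs H1: mu1 - mu2 < 0.
t0_a, df = welch(**gear)
df_used = floor(df)
reject_a = t0_a < -t_crit

# Part C: H0: mu1 - mu2 = -25 vs H1: mu1 - mu2 < -25.
t0_c, _ = welch(**gear, delta0=-25.0)
reject_c = t0_c < -t_crit
print(round(t0_a, 2), df_used, reject_a, round(t0_c, 2), reject_c)
```

Under these assumptions part A rejects H0, while part C does not: the data support a higher mean for supplier 2 but not a margin of at least 25 foot-pounds.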