Basic Ideas for Confidence Intervals (CI) and Hypothesis Testing:
Suppose that x1, x2, · · · , xn is an SRS from a normal
distribution N (µ1, σ1). Let x̄ be the sample mean and
s1 be the sample standard deviation.
(1) According to the central limit theorem, we have
\[ \frac{\bar{x} - \mu_1}{\sigma_1/\sqrt{n}} \sim N(0, 1). \]
(2) In an advanced probability course we can prove that
\[ \frac{\bar{x} - \mu_1}{s_1/\sqrt{n}} \sim t(n - 1). \]
Suppose that y1, y2, · · · , ym is an SRS from a normal
distribution N (µ2, σ2). Let ȳ be the sample mean and
s2 be the sample standard deviation.
(3) According to the central limit theorem, we have
\[ \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n + \sigma_2^2/m}} \sim N(0, 1). \]
(4) In an advanced probability course we can prove that
\[ \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n + s_2^2/m}} \sim t((n - 1) \wedge (m - 1)), \]
where a ∧ b denotes the smaller of a and b.
(5) A random experiment that has two possible outcomes, success (S) and failure (F), with P(S) = p and P(F) = 1 − p, is called a Bernoulli trial.
Let ξ be the number of successes observed in n independent Bernoulli trials. Then ξ has the binomial distribution b(p, n). Thus, ξ has mean np and standard deviation √(np(1 − p)). According to the CLT, if np ≥ 10 and n(1 − p) ≥ 10, we have approximately
\[ \frac{\xi - np}{\sqrt{np(1 - p)}} \sim N(0, 1). \]
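The quality of the normal approximation in (5) can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names and the values n = 100, p = 0.3, k = 35 are hypothetical, chosen so that np ≥ 10 and n(1 − p) ≥ 10 both hold.

```python
import math
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(xi <= k) for xi ~ b(p, n), summing the binomial probabilities."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Approximate P(xi <= k) by standardizing with mean np and sd sqrt(np(1-p))."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return NormalDist().cdf((k - mean) / sd)


# Hypothetical values: np = 30 and n(1 - p) = 70, so the rule of thumb is met.
n, p, k = 100, 0.3, 35
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The two probabilities should agree to within a few hundredths; no continuity correction is applied here, which accounts for part of the gap.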
(6) Suppose that p̂1 is the proportion of successes observed in a sequence of n1 identical Bernoulli trials with unknown p1. Suppose that p̂2 is the proportion of successes observed in a sequence of another n2 identical Bernoulli trials with unknown p2. Define
\[ \hat{p} = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}}. \]
Then, when p1 = p2 and both samples are large, we have approximately
\[ \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0, 1). \]
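As an illustration of the pooled statistic in (6), the sketch below computes it from raw success counts. The function name and the counts (45 of 100 and 30 of 100) are hypothetical and serve only to show the arithmetic.

```python
import math


def two_proportion_z(successes1, n1, successes2, n2):
    """Pooled two-proportion z statistic for comparing p1 and p2."""
    p1_hat = successes1 / n1
    p2_hat = successes2 / n2
    # Pooled estimate: all successes over all individuals in both samples.
    pooled = (successes1 + successes2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Hypothetical counts, not taken from the notes.
z = two_proportion_z(45, 100, 30, 100)
print(f"z = {z:.4f}")
```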
Chapter Eighteen: Population mean inference with unknown σ
CI for the mean of a normal population:
Draw an SRS of size n from a normal population having unknown mean µ and unknown standard deviation
σ. A level C confidence interval for µ is
\[ \left[\bar{x} - t^* \frac{s}{\sqrt{n}},\ \bar{x} + t^* \frac{s}{\sqrt{n}}\right], \]
where t∗ is the level C critical point from the t distribution t(n − 1), i.e.
P(|t| ≤ t∗) = P({−t∗ ≤ t ≤ t∗}) = C.
For example, if C = 95% and n = 3, then t∗ = 4.303 from Table C (2 degrees of freedom). In the above CI, t∗ s/√n is called the margin of error at confidence level C.
Example: A random sample of 10 high school students gains an average of x̄ = 22 points in their second attempt at the SAT mathematical exam. The
change in score has a normal distribution with unknown standard deviation. The sample standard deviation is s = 20.
(1) Find the 95% CI for µ.
(2) Find the margin of error for 99% confidence.
Solution:
(1) The 95% CI for µ is
\[ \left[\bar{x} - t^* \frac{s}{\sqrt{n}},\ \bar{x} + t^* \frac{s}{\sqrt{n}}\right] = \left[22 - 2.262 \frac{20}{\sqrt{10}},\ 22 + 2.262 \frac{20}{\sqrt{10}}\right] = [7.6939,\ 36.3061] \]
(2) The margin of error for 99% confidence is
\[ t^* \frac{s}{\sqrt{n}} = 3.25 \cdot \frac{20}{\sqrt{10}} = 20.5548. \]
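The arithmetic in this example can be reproduced with a few lines of code. The sketch below is illustrative only: the function name is hypothetical, and the critical values 2.262 and 3.250 are passed in as the table values for t(9) quoted in the solution rather than computed.

```python
import math


def t_confidence_interval(xbar, s, n, t_star):
    """Return (lower, upper, margin) for the interval xbar +/- t_star * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin, margin


# SAT example: n = 10, xbar = 22, s = 20; critical values taken from the t(9) table.
low95, high95, _ = t_confidence_interval(22, 20, 10, 2.262)
_, _, margin99 = t_confidence_interval(22, 20, 10, 3.250)
print(f"95% CI: [{low95:.4f}, {high95:.4f}]")
print(f"99% margin of error: {margin99:.4f}")
```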
Test procedure:
Let x1, x2, · · · , xn be an SRS of size n from a normal distribution with unknown mean µ and unknown standard deviation σ. Define
\[ t := \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t(n - 1) \quad \text{when } \mu = \mu_0. \]
(a) To test H0 : µ = µ0 versus Ha : µ > µ0 at the α level
of significance, reject H0 if t ≥ tα , where P(t ≥ tα ) = α.
(b) To test H0 : µ = µ0 versus Ha : µ < µ0 at the α level of
significance, reject H0 if t ≤ −tα , where P(t ≤ −tα ) = α.
(c) To test H0 : µ = µ0 versus Ha : µ ≠ µ0 at the α level
of significance, reject H0 if t ≥ tα/2 or t ≤ −tα/2, where
P(|t| ≥ tα/2) = α.
The p-value is the smallest α at which we can reject
H0. More precisely, we have
p-value = P(|t| ≥ |t0|) = 2 P(t ≥ |t0|) for the two-sided test Ha : µ ≠ µ0
p-value = P(t ≥ t0) for the one-sided test Ha : µ > µ0
p-value = P(t ≤ t0) for the one-sided test Ha : µ < µ0
where t ∼ t(n − 1) and t0 is the observed value of
the test statistic.
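The three rejection rules (a), (b) and (c) can be written as one small decision function. This is a sketch under stated assumptions: the function name and the alternative labels are hypothetical, the critical value is supplied by the caller (tα for the one-sided tests, tα/2 for the two-sided test), and the numbers in the usage lines are standard t(19) table values used purely for illustration.

```python
def reject_h0(t0, alternative, t_crit):
    """Apply rejection rule (a), (b) or (c) to the observed statistic t0.

    t_crit is the positive critical value: t_alpha for a one-sided test,
    t_{alpha/2} for the two-sided test.
    """
    if alternative == "greater":      # (a) Ha: mu > mu0
        return t0 >= t_crit
    if alternative == "less":         # (b) Ha: mu < mu0
        return t0 <= -t_crit
    if alternative == "two-sided":    # (c) Ha: mu != mu0
        return abs(t0) >= t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical observed statistic with 19 degrees of freedom and alpha = 0.05.
t0 = -1.9
print(reject_h0(t0, "two-sided", 2.093))  # uses t_{0.025} for t(19)
print(reject_h0(t0, "less", 1.729))       # uses t_{0.05} for t(19)
```

The same observed value can lead to different decisions under a one-sided and a two-sided alternative, because the one-sided rule places all of α in a single tail.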
Example: By past experience, we know that the daily
yield of a chemical manufactured in a chemical plant
follows N (µ, σ) with unknown mean and standard deviation. The 20 day observed sample mean of the daily
yields is x̄ = 871 tons and the sample standard deviation is s = 21. Test the hypothesis that the average daily
yield of the chemical is µ = 880 tons per day against
the alternative µ ≠ 880 using α = 0.05.
(1) H0 : µ = 880
Ha : µ ≠ 880
(2) Test statistic:
\[ t := \frac{\bar{x} - 880}{21/\sqrt{20}} \sim t(19) \]
(3) significance level α = 0.05.
(4) If H0 is true, then
\[ P\left(-2.093 \le \frac{\bar{X} - 880}{21/\sqrt{20}} \le 2.093\right) = 0.95 \]
Thus, the rejection region is R = (−∞, −2.093)∪(2.093, ∞).
Since
\[ \frac{871 - 880}{21/\sqrt{20}} = -1.9166 \notin R, \]
we do not reject the null hypothesis µ = 880 tons.
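The conclusion can be cross-checked with a p-value. The sketch below recomputes the observed statistic and approximates the two-sided p-value by integrating the t(19) density with Simpson's rule. The function names are hypothetical, and the numerical integration is an assumption of this sketch that stands in for a t table or a statistics library.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t0, df, steps=2000):
    """Approximate P(|t| >= |t0|) using Simpson's rule on [0, |t0|] (steps must be even)."""
    b = abs(t0)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # P(-|t0| <= t <= |t0|), by symmetry of the density
    return 1 - central


# Chemical-yield example: n = 20, xbar = 871, s = 21, mu0 = 880.
t0 = (871 - 880) / (21 / math.sqrt(20))
p_value = two_sided_p_value(t0, df=19)
print(f"t0 = {t0:.4f}, two-sided p-value = {p_value:.4f}")
```

Because the observed |t0| is smaller than 2.093, the p-value comes out above α = 0.05, which agrees with the decision not to reject H0.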
Chapter Nineteen: Two-sample problems: Comparing two population means:
CI for Difference of Two Population Means:
When two independent SRSs of sizes n1 and n2 are selected from two different normal populations
with unknown means µ1 and µ2 and unknown variances σ1² and σ2², respectively, a level C CI for (µ1 − µ2)
is given by
\[ \left[\bar{x}_1 - \bar{x}_2 - t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\ \ \bar{x}_1 - \bar{x}_2 + t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right], \]
where x̄i and si², i = 1, 2, are the sample mean and the
sample variance respectively from the ith population,
and tα/2 is the t critical point with degrees of freedom equal to the
smaller of n1 − 1 and n2 − 1 such that
P(|t| ≥ tα/2) = α, with α = 1 − C.
The quantity
\[ SE := \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
is called the standard error.
Example: A small amount of the trace element selenium, 50 − 200 micrograms (µg) per day, is considered
essential to good health. Suppose that independent
random samples of n1 = n2 = 30 adults were selected
from two regions of the United States and that a day’s
intake of selenium, from both liquids and solids, was
recorded for each person. The mean and standard deviation of the selenium daily intakes for the 30 adults
from region 1 were x̄1 = 167.1 and s1 = 24.3µg, respectively. The corresponding statistics for the 30 adults
from region 2 were x̄2 = 140.9 and s2 = 17.6. Suppose
that the population for the selenium daily intakes in
each region has a normal distribution. Find a 95% CI
for the difference in the mean selenium intakes for the
two regions. Interpret this interval.
Solution: From the information, n1 = n2 = 30, x̄1 =
167.1, x̄2 = 140.9, s1 = 24.3, s2 = 17.6. Thus, the 95% CI
for µ1 − µ2 is approximately
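\[ \left[\bar{x}_1 - \bar{x}_2 - t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\ \ \bar{x}_1 - \bar{x}_2 + t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right] = [14.99752,\ 37.40248], \]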
where tα/2 = 2.045 with 29 degrees of freedom. In repeated
sampling, 95% of all intervals constructed in this
manner will enclose µ1 − µ2. We are fairly certain that
this particular interval encloses µ1 − µ2.
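The interval for the selenium example can be recomputed as follows. This is a minimal sketch: the function name is hypothetical, and the critical value 2.045 is supplied as the table value for 29 degrees of freedom quoted above rather than computed.

```python
import math


def two_sample_ci(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """CI for mu1 - mu2: (xbar1 - xbar2) +/- t_star * sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of xbar1 - xbar2
    diff = xbar1 - xbar2
    return diff - t_star * se, diff + t_star * se


# Selenium example: both regions have n = 30; t_star is the t(29) table value.
low, high = two_sample_ci(167.1, 24.3, 30, 140.9, 17.6, 30, 2.045)
print(f"95% CI for mu1 - mu2: [{low:.5f}, {high:.5f}]")
```

Since the whole interval lies above zero, the data suggest that the mean intake in region 1 exceeds that in region 2.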
Test Hypothesis for Difference of Two Population Means:
When two independent SRSs of sizes n1 and n2 are selected from two different normal
populations with unknown means µ1 and µ2 and unknown variances σ1² and σ2², respectively, we have the following test procedure: Let
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t((n_1 - 1) \wedge (n_2 - 1)) \]
where x̄1 and x̄2 are the observed sample means and
s1 and s2 are the observed sample standard deviations
from two populations, respectively.
(a) To test H0 : µ1 − µ2 = D0 versus H1 : µ1 − µ2 > D0
at the α level of significance, reject H0 if t ≥ tα , where
P(t ≥ tα ) = α.
(b) To test H0 : µ1 − µ2 = D0 versus H1 : µ1 − µ2 < D0 at
the α level of significance, reject H0 if t ≤ −tα , where
P(t ≤ −tα ) = α.
(c) To test H0 : µ1 − µ2 = D0 versus H1 : µ1 − µ2 ≠ D0 at
the α level of significance, reject H0 if either t ≥ tα/2 or
t ≤ −tα/2, where P(|t| ≥ tα/2) = α.
Example: Allstate and Roadkill specialize in writing
insurance policies for high-risk drivers. Last year, Allstate processed 100 claims. Settlements averaged $2000
and had a sample standard deviation of $600. A smaller
firm, Roadkill resolved only 50 claims, but the payouts averaged $2500 with a sample standard deviation
of $700. Suppose that claims for each company have a
normal distribution. Can we conclude from last year’s
experience that the average awards paid by two companies tend not to be the same? Set up and carry out
an appropriate analysis.
Solution: Suppose that µ1 is the true mean of payouts
in Allstate and µ2 is the true mean of payouts in Roadkill. Let x̄1 represent the sample mean of payouts in
Allstate and x̄2 represent the sample mean of payouts
in Roadkill.
(1). Let α = 0.1.
(2). Set H0 : µ1 − µ2 = D0 := 0 versus H1 : µ1 ≠ µ2, where
n1 = 100 and n2 = 50.
(3).
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{2000 - 2500}{\sqrt{\frac{(600)^2}{100} + \frac{(700)^2}{50}}} = -4.3193 \]
Since the p-value P(|t| > 4.319) ≤ 0.1% is smaller than α = 0.1, we reject H0.
This means that we may conclude from last year’s experience that the average awards paid by the two companies tend not to be the same.
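The test statistic for the insurance example can be verified with the sketch below. The function name is hypothetical. The critical value 1.684 is an assumption of this sketch: it is the upper 0.05 point of t(40), the nearest smaller table row to the 49 degrees of freedom used here, which makes the comparison slightly conservative.

```python
import math


def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Two-sample t statistic with conservative df = min(n1 - 1, n2 - 1)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t = ((xbar1 - xbar2) - d0) / se
    df = min(n1 - 1, n2 - 1)
    return t, df


# Allstate: n1 = 100, mean 2000, sd 600. Roadkill: n2 = 50, mean 2500, sd 700.
t_obs, df = two_sample_t(2000, 600, 100, 2500, 700, 50)

# Assumed table value: t_{0.05} for df = 40, used in place of df = 49 (alpha = 0.1, two-sided).
t_crit = 1.684
reject = abs(t_obs) >= t_crit
print(f"t = {t_obs:.4f}, df = {df}, reject H0: {reject}")
```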