Sample Location Problems
Ba Chu
E-mail: ba [email protected]
Web: http://www.carleton.ca/~bchu
(Note that this is a lecture note. Please refer to the textbooks suggested in the course outline for
details. Examples will be given and explained in the class.)
1 Objectives
In the last lecture, I introduced methods for drawing statistical inferences about a single
population mean. These methods were explicitly based on the assumption that the sample was
large enough to justify the use of the CLT for normal approximations.
However, when the sample size is not large enough or the data are not symmetrically distributed,
statisticians are interested not only in the population mean but also in other measures of centrality,
such as the population median. For this reason, the purpose of this lecture is to introduce methods
for drawing inferences about population medians, which we call sample location problems.
Specifically, I will explain (1) the 1-sample location problem and (2) the 2-sample location problem.
2 1-Sample Location Problem
2.1 Motivating example
It has been decided that the grades of ECON 4002 have the population median of 85. To determine
if this is true, a sample of 40 students is drawn, and the grade of each student is recorded. Let
$X_i$ denote the grade of student $i$. Then $X_1, \ldots, X_{40}$ are a sample drawn from an asymmetric
distribution, and we are interested in testing whether or not the population median is equal to 85,
i.e., $H_0: q_2(X) = 85$ vs. $H_1: q_2(X) \neq 85$.
Before studying general cases, we start with a simple case when the data are assumed to be
normally distributed. In this case, the population mean and the population median are the same.
2.2 Hypothesis testing with normally distributed data
Suppose $X_i \sim N(\mu, \sigma^2)$, where $\sigma^2$ may be unknown. We are interested in the hypotheses $H_0: \mu = \mu_0$ and $H_1: \mu \neq \mu_0$. The intuition underlying testing $H_0$ vs. $H_1$ was discussed in the last lecture:
• If $H_0$ is true, then we would expect the sample mean to be close to the population mean $\mu_0$.
• Hence, if the sample average $\bar{x}_n$ is far from $\mu_0$, then we are inclined to reject $H_0$.
More precisely, we reject $H_0$ if and only if the significance prob. $p = P_{H_0}(|\bar{X}_n - \mu_0| \geq |\bar{x}_n - \mu_0|) \leq \alpha$,
where $\alpha$ is any real number in $(0, 1)$ chosen by the statistician. Note that $\alpha$ is usually chosen to be
0.05. We have learnt to approximate $p$ by using the CLT when the distribution of $X_i$ is unknown
but the sample size is large. However, if we know that $X_1, \ldots, X_n$ are normally distributed, then
it turns out that we can compute $p$ exactly, even when $n$ is small.
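2.2.1 The population variance $\sigma^2$ is known
Under the null hypothesis that $\mu = \mu_0$, we have $X_i \sim N(\mu_0, \sigma^2)$ and $\bar{X}_n \sim N(\mu_0, \sigma^2/n)$. Hence,
$$Z = \frac{\bar{X}_n - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1).$$
The observed value of $Z$ is $z = \frac{\bar{x}_n - \mu_0}{\sigma/\sqrt{n}}$. The significance prob. is
$p = P_{H_0}(|Z| \geq |z|) = 2(1 - \Phi(|z|))$, where $\Phi$ is the cdf of the standard normal distribution. The test that rejects $H_0$ if $p \leq \alpha$ is called the 1-sample z-test.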
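As a rough illustration of the computation above, the following Python sketch evaluates $z$ and the two-sided significance prob. using only the standard library. The function name `one_sample_z_test` and the numerical inputs are assumptions made for illustration; they do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n):
    """Return (z, p) for the two-sided test of H0: mu = mu0 with sigma known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    # p = P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) under H0
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical data (not from the lecture): n = 16, sample mean 52, sigma = 4, mu0 = 50
z, p = one_sample_z_test(xbar=52.0, mu0=50.0, sigma=4.0, n=16)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With these made-up inputs, $H_0$ would be rejected at $\alpha = 0.05$ whenever the printed $p$ does not exceed 0.05.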
2.2.2 The population variance $\sigma^2$ is unknown
We can replace $\sigma^2$ with an estimator, say $S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$. In this case, we obtain the test statistic
$$T_n = \frac{\bar{X}_n - \mu_0}{S_n/\sqrt{n}}.$$
It has been proved that, under the null hypothesis $\mu = \mu_0$, $T_n \sim t(n-1)$. Now let
$$t_n = \frac{\bar{x}_n - \mu_0}{s_n/\sqrt{n}}$$
denote the observed value of $T_n$, where $s_n$ is the observed value of $S_n$. The significance prob. is $p = P_{H_0}(|T_n| \geq |t_n|) = 2P_{H_0}(T_n \geq |t_n|) = 2(1 - F_t(|t_n|))$,
where $F_t$ is the cdf of the $t(n-1)$ random variable.
Example 1. Suppose that, to test $H_0: \mu = 0$ vs. $H_1: \mu \neq 0$ (a 2-sided alternative), we
draw a sample of size $n = 25$ and observe $\bar{x}_n = 1$ and $s = 3$. Then $t = 5/3$ and the 2-tailed
significance prob. is computed using both tails of the $t(24)$ distribution (following this link:
http://www.danielsoper.com/statcalc/calc08.aspx), i.e., $p = 0.1086$.
Example 2. Suppose that, to test $H_0: \mu \leq 0$ vs. $H_1: \mu > 0$ (a 1-sided alternative), we draw a
sample of size $n = 25$ and observe $\bar{x}_n = 2$ and $s = 5$. Then $t = 2$ and the 1-tailed significance prob.
is computed using one tail of the $t(24)$ distribution, i.e., $p = 0.0285$.
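For readers who prefer to check these numbers without an online calculator, here is a minimal Python sketch. It assumes the cdf of the $t$ distribution can be approximated well enough by integrating its density numerically with Simpson's rule; the helper names (`t_pdf`, `t_cdf`, `t_statistic`) are hypothetical and are not part of the lecture. A statistics library would normally be used instead.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF via composite Simpson integration on [0, |x|] (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    half = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + half if x >= 0 else 0.5 - half


def t_statistic(xbar, mu0, s, n):
    """Observed value of T_n = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))


# Example 1: two-sided test, n = 25, xbar = 1, s = 3
t1 = t_statistic(1.0, 0.0, 3.0, 25)
p1 = 2 * (1 - t_cdf(abs(t1), df=24))

# Example 2: one-sided test (H1: mu > 0), n = 25, xbar = 2, s = 5
t2 = t_statistic(2.0, 0.0, 5.0, 25)
p2 = 1 - t_cdf(t2, df=24)

print(f"Example 1: t = {t1:.4f}, p = {p1:.4f}")
print(f"Example 2: t = {t2:.4f}, p = {p2:.4f}")
```

The printed significance probs. should agree with the values quoted in Examples 1 and 2 up to rounding.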
2.3 Hypothesis testing with asymmetrically distributed data
Now we assume that $X_i$ has an asymmetric probability distribution $P$. In this case, we shall focus on
studying the population median $q_2(X)$. The population median, denoted by $\theta$, is a robust measure
of centrality vis-a-vis the population mean because the former always exists and is not sensitive
to the influence of outliers.
Consider a 2-sided alternative, $H_0: \theta = \theta_0$ vs. $H_1: \theta \neq \theta_0$. We will explain a testing procedure,
namely the sign test.
The intuition underlying the sign test is straightforward. Under $H_0$, the population
median is $\theta = \theta_0$, so when we sample from $P$, we should observe roughly half the $x_i$ above $\theta_0$ and
half the $x_i$ below $\theta_0$. Hence, if we observe proportions of $x_i$ above/below $\theta_0$ that are very different
from one half, then we are inclined to reject $H_0$. More formally, we let $p_+ = P_{H_0}(X_i > \theta_0)$ and
$p_- = P_{H_0}(X_i < \theta_0)$. Therefore, under $H_0$, $p_+ = p_- = 0.5$. The sign test is implemented as follows:
1. Let $Y$ denote the number of $X_i$ greater than $\theta_0$ (in math. notation, $Y = \#(X_i - \theta_0 > 0)$).
Under $H_0$, $Y \sim \text{Binomial}(n, 0.5)$. The observed value of $Y$ is $y = \#(x_i - \theta_0 > 0)$.
2. The significance prob. of the test statistic $Y - E[Y]$, where $E[Y] = n/2$, is $p = P_{H_0}(|Y - n/2| \geq |y - n/2|)$.
We reject $H_0$ if $p \leq \alpha$.
3. The significance prob. is computed as $p = 2F_{Bin}(c)$, where $F_{Bin}$ is the cdf of the Binomial$(n, 0.5)$
random variable and $c = \min(y, n - y)$.
Example 3. Now suppose that we want to test $H_0: \theta = 100$ vs. $H_1: \theta \neq 100$ at the significance
level $\alpha = 0.05$, having observed the sample {98.73, 97.17, 100.17, 101.26, 94.47, 96.39, 99.67, 97.77, 97.46, 97.41}.
Here, $n = 10$, $y = \#(x_i > 100) = 2$, and $c = \min(2, 10 - 2) = 2$, so $p = 0.109375 > 0.05$ (for
a Binomial prob. calculator, follow the link: http://joemath.com/binomial). We do not reject $H_0$.
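The three steps of the sign test can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sign_test` is an assumption, and the code assumes that no observation is exactly equal to $\theta_0$, as is the case for continuous data.

```python
from math import comb


def sign_test(sample, theta0):
    """Two-sided sign test of H0: median = theta0. Returns (y, c, p).

    Assumes no observation equals theta0 exactly.
    """
    n = len(sample)
    y = sum(1 for x in sample if x > theta0)  # number of observations above theta0
    c = min(y, n - y)
    # Binomial(n, 0.5) cdf at c: P(Y <= c) = sum_{k <= c} C(n, k) / 2^n
    cdf_c = sum(comb(n, k) for k in range(c + 1)) / 2 ** n
    p = min(1.0, 2 * cdf_c)  # the cap at 1 matters only when y equals n/2
    return y, c, p


# Data of Example 3, testing H0: theta = 100
data = [98.73, 97.17, 100.17, 101.26, 94.47, 96.39, 99.67, 97.77, 97.46, 97.41]
y, c, p = sign_test(data, theta0=100)
print(y, c, p)  # prints: 2 2 0.109375
```

Since the printed $p$ exceeds 0.05, the sketch reaches the same decision as Example 3.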
3 2-Sample Location Problem
This section is concerned with comparing two populations with respect to some measure of centrality,
typically the population mean or the population median. We make the following assumptions:
• There are two mutually independent samples, say $X_1, \ldots, X_{n_1} \sim P_1$ and $Y_1, \ldots, Y_{n_2} \sim P_2$,
where $P_1$ and $P_2$ are continuous probability distributions.
• $P_1$ has location parameter $\theta_1$, and $P_2$ has location parameter $\theta_2$. We shall compare population
means, $\theta_1 = E[X_i]$ and $\theta_2 = E[Y_i]$, or population medians, $\theta_1 = q_2(X_i)$ and $\theta_2 = q_2(Y_i)$. The
shift parameter $\Delta = \theta_1 - \theta_2$ measures the difference in population location.
• We observe random samples $\{x_1, \ldots, x_{n_1}\}$ and $\{y_1, \ldots, y_{n_2}\}$ from which we attempt to draw
inferences about $\Delta$. Notice that we do not assume that $n_1 = n_2$.
We begin this section by considering an example.
3.1 Motivating example
Roundabouts (or traffic circles) are supposed to increase traffic flow. To determine if they do,
$n_1 + n_2$ drivers are recruited to participate in a hypothetical double-blind study. The drivers are
randomly assigned to a group of $n_1$ drivers who drive from a location, A, to another location, B, via
a roundabout and a group of $n_2$ drivers who drive from A to B via a cross junction with STOP signs.
Assume that the two groups of drivers face similar traffic conditions and drive the same number
of miles. All the drivers move off at the same time, and each driver's arrival time at B is recorded.
Neither the driver nor the traffic controller knows to which group the driver was assigned.
For this experiment, let $X_i$ denote the driving time (in minutes) of driver $i$ in the first group,
and let $Y_j$ denote the driving time of driver $j$ in the second group. Then $(X_1, \ldots, X_{n_1}) \sim P_1$,
$(Y_1, \ldots, Y_{n_2}) \sim P_2$, and we are interested in drawing inferences about $\Delta = \theta_1 - \theta_2$. Note that $\Delta > 0$
if the driving time is greater for the first group than for the second group. Thus, to produce
compelling evidence of the roundabouts' efficiency (i.e., shorter driving times via the roundabout, $\Delta < 0$), we may test $H_0: \Delta \geq 0$ vs. $H_1: \Delta < 0$.
3.2 The Normal 2-sample location problem
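We assume that $P_1 = N(\mu_1, \sigma_1^2)$ and $P_2 = N(\mu_2, \sigma_2^2)$. Let $\Delta = \mu_1 - \mu_2$ and $\hat{\Delta} = \bar{X}_{n_1} - \bar{Y}_{n_2}$.
Recall that hypothesis testing for a single population mean was based on knowing the distribution of the standardized natural estimator. In summary, for $\sigma^2$ known, we learnt that
$$Z = \frac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}} \quad \begin{cases} \sim N(0, 1) & \text{if } X_1, \ldots, X_n \sim N(\mu_0, \sigma^2), \\ \overset{\cdot}{\sim} N(0, 1) & \text{if } n \text{ large.} \end{cases}$$
For $\sigma^2$ unknown, we learnt that
$$T = \frac{\bar{X} - \mu_0}{\sqrt{S^2/n}} \quad \begin{cases} \sim t(n - 1) & \text{if } X_1, \ldots, X_n \sim N(\mu_0, \sigma^2), \\ \overset{\cdot}{\sim} N(0, 1) & \text{if } n \text{ large.} \end{cases}$$
Here $\overset{\cdot}{\sim}$ means "is approximately distributed as".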
The logic for drawing inferences about two population means is identical to the logic for drawing
inferences about one population mean: we can base our inference about $\Delta$ on the distribution of
$$\frac{\hat{\Delta} - \Delta}{\text{standard error}}.$$
Because $X_i \sim N(\mu_1, \sigma_1^2)$ and $Y_j \sim N(\mu_2, \sigma_2^2)$, we have $\bar{X} \sim N(\mu_1, \sigma_1^2/n_1)$ and $\bar{Y} \sim N(\mu_2, \sigma_2^2/n_2)$. It
follows that
$$\hat{\Delta} = \bar{X} - \bar{Y} \sim N\Big(\underbrace{\mu_1 - \mu_2}_{\Delta},\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\Big).$$
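3.2.1 The population variances are known
If $\Delta = \Delta_0$, then
$$Z = \frac{\hat{\Delta} - \Delta_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1).$$
We can immediately construct the 95% confidence interval for $\Delta$,
$$\Big(\hat{\Delta} - q_{0.975}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \Delta < \hat{\Delta} + q_{0.975}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\Big),$$
where $q_{0.975} = 1.96$ is the 0.975 quantile of the standard normal distribution.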
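Example 4. For the first population, we know that $\sigma_1 = 5$, and we observe a sample of size
$n_1 = 60$ with sample average $\bar{x} = 7.6$. For the second population, we know that $\sigma_2 = 2.5$, and
we observe a sample of size $n_2 = 15$ with sample average $\bar{y} = 5.2$. Then $\hat{\Delta} = 2.4$, the standard error
is $\sqrt{5^2/60 + 2.5^2/15} \approx 0.913$, and the 0.95 confidence
interval for $\Delta$ is $2.4 \pm 1.96 \times 0.913$, i.e., approximately (0.61, 4.19).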
To test $H_0: \Delta = \Delta_0$ vs. $H_1: \Delta \neq \Delta_0$, we use the fact that $Z \sim N(0, 1)$ under $H_0$. Let $z$ denote
the observed value of $Z$. We reject $H_0$ if the significance prob. $p = P_{H_0}(|Z| > |z|) = 2(1 - \Phi(|z|))$
is less than or equal to $\alpha$ (e.g., $\alpha = 0.05$).
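The confidence interval and the test described above can be sketched in Python as follows, using the inputs of Example 4. The function name `two_sample_z` and its parameters are hypothetical, chosen only for this illustration, and the quantile is taken from the standard library rather than rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, ybar, sigma1, sigma2, n1, n2, delta0=0.0, level=0.95):
    """CI for Delta and two-sided z-test of H0: Delta = delta0 (variances known)."""
    delta_hat = xbar - ybar
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # standard error of delta_hat
    q = NormalDist().inv_cdf(1 - (1 - level) / 2)   # 0.975 quantile when level = 0.95
    ci = (delta_hat - q * se, delta_hat + q * se)
    z = (delta_hat - delta0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return ci, z, p


# Inputs of Example 4: sigma1 = 5, n1 = 60, xbar = 7.6; sigma2 = 2.5, n2 = 15, ybar = 5.2
ci, z, p = two_sample_z(7.6, 5.2, 5.0, 2.5, 60, 15)
print(f"CI = ({ci[0]:.2f}, {ci[1]:.2f}), z = {z:.3f}, p = {p:.5f}")
```

Returning the interval, the statistic and the significance prob. together keeps the shared quantities (the estimate and its standard error) computed only once.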
Example 5. Continued from Example 4, to test $H_0: \Delta = 0$ vs. $H_1: \Delta \neq 0$, we compute
$$z = \frac{(7.6 - 5.2) - 0}{\sqrt{5^2/60 + 2.5^2/15}} = 2.629.$$
Since $p = P(|Z| \geq 2.629) = 0.00856 < 0.05$, we reject $H_0$.
3.2.2 The population variances are unknown
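When $\sigma_1^2$ and $\sigma_2^2$ are unknown, we replace them with their estimators, $S_1^2$ and $S_2^2$ respectively. We
now rely on
$$T = \frac{\hat{\Delta} - \Delta_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim t(\hat{\nu}), \quad \text{where } \hat{\nu} = \frac{\big(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\big)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}.$$
The 0.95 confidence interval for $\Delta$ is given by
$$\Big(\hat{\Delta} - q_{t,\hat{\nu}}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} < \Delta < \hat{\Delta} + q_{t,\hat{\nu}}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}\Big),$$
where $q_{t,\hat{\nu}}$ is the 0.975 quantile of the $t(\hat{\nu})$ distribution.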
Example 6. Continued from Example 4, we estimate the unknown population variances separately, $\sigma_1^2$ by $s_1^2 = 5^2$ and $\sigma_2^2$ by $s_2^2 = 2.5^2$. We can compute the estimated degrees of freedom $\hat{\nu} = 45.26$.
The 0.95 confidence interval for $\Delta$ with $q_{t,45.26} = 2.015$ is given by (0.56, 4.24).
To test $H_0: \Delta = \Delta_0$ vs. $H_1: \Delta \neq \Delta_0$, we use $T \sim t(\hat{\nu})$. Let $t$ denote the observed value of
$T$. We reject $H_0$ if and only if $p = P_{H_0}(|T| \geq |t|) = 2(1 - F_{t,\hat{\nu}}(|t|))$ is less than or equal to $\alpha$, where $F_{t,\hat{\nu}}$ is the cdf of the $t$
distribution with $\hat{\nu}$ degrees of freedom.
Example 7. Continued from Example 4, to test $H_0: \Delta = 0$ vs. $H_1: \Delta \neq 0$, we compute
$$t = \frac{(7.6 - 5.2) - 0}{\sqrt{5^2/60 + 2.5^2/15}} = 2.629.$$
The significance prob. is $p = P_{H_0}(|T| > 2.629) = 0.011655 < 0.05$, thus
we reject $H_0$ at the significance level $\alpha = 0.05$.
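The unequal-variance procedure of this subsection (often called Welch's t-test) can be sketched in Python as below. As before, this is an illustration under stated assumptions rather than a reference implementation: the cdf of the $t$ distribution is approximated by Simpson integration of its density, and the names `t_cdf` and `welch_test` are hypothetical.

```python
from math import gamma, pi, sqrt


def t_cdf(x, df, steps=2000):
    """Student-t cdf via composite Simpson integration on [0, |x|] (steps even)."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    half = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + half if x >= 0 else 0.5 - half


def welch_test(xbar, ybar, s1, s2, n1, n2, delta0=0.0):
    """Return (t, nu_hat, p) for the two-sided test of H0: Delta = delta0."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (xbar - ybar - delta0) / sqrt(v1 + v2)
    # Estimated degrees of freedom nu_hat; it need not be an integer
    nu_hat = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * (1 - t_cdf(abs(t), nu_hat))
    return t, nu_hat, p


# Inputs of Examples 6 and 7: s1 = 5, n1 = 60, xbar = 7.6; s2 = 2.5, n2 = 15, ybar = 5.2
t, nu_hat, p = welch_test(7.6, 5.2, 5.0, 2.5, 60, 15)
print(f"t = {t:.3f}, nu_hat = {nu_hat:.2f}, p = {p:.6f}")
```

The printed values should agree with those reported in Examples 6 and 7 up to rounding.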
4 Exercises
1. The intelligence quotients (IQs) of 16 students from one area of a city showed a mean of 107
and a standard deviation of 10, while the IQs of 14 students from another area of the city
showed a mean of 112 and a standard deviation of 8. Is there a significant difference between
the IQs of the two groups at significance levels (α) of (a) 0.01 and (b) 0.05?
2. If a variable X has a t distribution with ν = 10, find (a) P (X > 1.25), (b) P (−1.25 <
X < 1.25), (c) P (X < 1.30), and (d) P (X ≥ 2.10). (Hint: use the prob. calculator:
http://www.danielsoper.com/statcalc/calc08.aspx.)
3. On an examination in Statistics, 12 students in one class had a mean grade of 78 with a
standard deviation of 6, while 15 students in another class had a mean grade of 74 with a
standard deviation of 8. Using a significance level of 0.05, determine whether the first group
is superior to the second group.
4. Mecocci et al. (2004) – link: http://www.ncbi.nlm.nih.gov/pubmed/15462460 – theorize
that living with a dog diminishes depression in the elderly, here defined as more than 70
years old. To investigate the theory, they recruit 15 single elderly men who own dogs and
15 single elderly men who do not own any pets. The Hamilton instrument for measuring
depressive tendency is administered to each subject. High scores indicate depression. How
might Mecocci et al. (2004) use the resulting data to test their theory? Explicate a testing
procedure.
5. The breaking strengths of cables produced by a manufacturer have a mean of 1800 pounds
and a standard deviation of 100 pounds. By a new technique in the manufacturing process,
it is claimed that the breaking strength can be increased. To test this claim, a sample of
50 cables is tested and it is found that the mean breaking strength is 1850 pounds. Can we
support this claim at the 0.05 significance level?