Stats II, Lecture Notes, page 54
HYPOTHESIS TESTING CONTINUED
TESTS FOR DIFFERENCES
The topic in a nutshell. In statistical work, we’re often interested in the proposition “A is
like B,” where A and B are groups (populations), which differ in some observable and
indisputable way, and we wish to know whether they are alike or different in some way which
is subject to variation. Examples abound. Marketers (and politicians) often wish to know
whether young people and old people buy the same or different products. Medical
researchers are often concerned with whether different possible treatments provide different
cure rates. Production engineers often wish to know whether one production line layout is
more effective than another. Purchasing agents (and consumer advocates) may want to know
whether a product from one manufacturer is more durable than a competitor’s product. And
so on. The basic features in all these cases are the same: 1) the characteristic of interest is
subject to variation, and so to sampling variation; 2) we can only measure sample values from
the relevant populations; 3) because of variation, having different sample means from the two
populations is not conclusive proof that the population means are different. Thus, deciding
whether A is like B depends on the probability of getting the sample means we got from each
population on the assumption that the population means are equal.
To be concrete, let’s imagine a simple case: suppose we wish to determine whether
Western Carolina students and ASU students spend different amounts on entertainment each
month. Sensibly, the only evidence we have must come from samples, and here again
variation creates problems for us in interpreting the results of our samples. Suppose, for the
sake of the argument, that the population standard deviations for the two schools are known
and are known to be equal: let’s suppose σ = $20 and that we will choose from each school a
sample of 25 students and ask them to record carefully their entertainment expenditures for
one month. By the principles we’ve seen several times, the standard error of the mean for
each sample will be σ/√n = $20/5 = $4. Suppose now that we take our samples and
calculate a sample mean from each school and that x̄A = $100 while x̄W = $120. Can we
conclude that Western Carolina students spend more money on entertainment? Not
necessarily. Remember that sample means are not in general equal to the population mean
we’re trying to estimate but rather vary around the population mean. Therefore, it is possible
that the population mean is the same at both schools and that we just happened to get a
smallish sample mean at ASU and a largish sample mean at WCU. We can never absolutely
prove whether that is the case or not; all we can do is to ask: suppose the two population
means are equal – how probable then is the pair of sample means we got? That is a question
we can answer. Again for the sake of the argument, suppose that µA = µW = $110.
(Suppose, that is, that A is like B.) Then the probability of a sample mean as small as $100 is
less than 1%, because $100 is 2.5 standard errors below the population mean; similarly, the
probability of a sample mean as large as $120 is less than 1%, and the joint probability of
the two samples is at most about .01 × .01 = .0001, or about 1 in 10,000.
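As a check on the arithmetic, the tail probabilities can be computed exactly from the standard normal CDF. A minimal sketch in standard-library Python, using the values from the example (σ = $20, n = 25, assumed common mean $110); the exact one-tail probability comes out near 0.6%, within the rough 1% figure used here:

```python
from math import erf, sqrt

def norm_cdf(z):
    # standard normal CDF via the error function (no SciPy needed)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

sigma, n, mu = 20.0, 25, 110.0   # values from the example above
se = sigma / sqrt(n)             # standard error of the mean: 20/5 = 4

p_low = norm_cdf((100.0 - mu) / se)          # P(xbar <= 100): z = -2.5
p_high = 1.0 - norm_cdf((120.0 - mu) / se)   # P(xbar >= 120): z = +2.5
joint = p_low * p_high                       # joint probability of both samples

print(f"P(xbar <= 100) = {p_low:.4f}")   # about 0.0062
print(f"joint = {joint:.6f}")            # well under 1 in 10,000
```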
That’s pretty unlikely, but we can think of an alternative situation which is less of a
stretch to believe: suppose, in fact, that the two population means are not equal. If µA =
$102, for example, then a sample mean of $100 isn’t particularly improbable; if µW = $122, a
sample mean from WCU of $120 isn’t especially hard to believe. Thus, faced with these
sample means, we may find it easier to believe that the population means are different than that
we got two such unlikely sample means.
In a way, that’s all there is to testing for differences between populations, but the devil
is in the details. To conduct our tests, we must find the sampling distributions of differences
between sample means. That is: if x̄A is a statistic and x̄W is a statistic, then the difference x̄A
- x̄W is a statistic, and our first concern is to find the sampling distribution of this statistic.
Finding some of the standard errors is a bit tricky, but we’ll find that we follow the same
principles of sampling distributions we’ve already seen. Before we get into that, it should be
easy to see one thing: the expected value of the difference x̄A − x̄W is equal to zero, so we’ll
be concerned with probabilities on distributions centered on zero, and our hypothesis tests will
implicitly have the form H0: µA - µW = 0 vs. H1: µA - µW ≠ 0, although we may more often use
the equivalent form H0: µA = µW vs. H1: µA ≠ µW .
HYPOTHESIS TESTS FOR DIFFERENCES
t TESTS FOR DIFFERENCES OF MEANS
Ø t Tests for the Difference between Two Population Means:
„ Independent Samples
„ equal population standard deviations
Independent samples: two distinct samples, comprising different objects drawn from
different populations, and the samples are in no way dependent on one another.
„ population standard deviations NOT known
Hypotheses to be tested:
H0: µ1 = µ2
H1: µ1 ≠ µ2
or
H0: µ1 ≥ µ2
H1: µ1 < µ2
The hypothesis test should be conducted with the t distribution whenever:
„ population standard deviations are not known
„ population standard deviations can be assumed to be equal
„ We require also
w either both populations are normally distributed or
w both sample sizes are at least 30 (n1 ≥ 30 and n2 ≥ 30)
Ø Under these circumstances the sampling distribution of x̄1 - x̄2 is given by the following:
„ the distribution is a t distribution
„ E(x̄1 - x̄2) = 0
„ sx̄1−x̄2 = sp × √(1/n1 + 1/n2), where sp is the pooled standard deviation and is given by

(1) sp = √[((n1 − 1) × s1² + (n2 − 1) × s2²) / (n1 + n2 − 2)]

n1 is the sample size for the first sample, n2 the sample size for the sample from
the second population, and the s²’s are the respective sample variances
Ø the calculated t statistic will be

t = [(x̄1 − x̄2) − (µ1 − µ2)] / sx̄1−x̄2

since, as a rule, the hypothesis is µ1 = µ2, the last term in the numerator = 0, and
we often see:

(2) t = (x̄1 − x̄2) / sx̄1−x̄2

this is compared to a critical t value with n1 + n2 − 2 degrees of freedom or used to find
the p-value of the test
„ definition of sp: look carefully: sp² is really just a weighted average of the two sample
variances, with the degrees of freedom as weights
it represents the best estimate we can give of the unknown, but assumed equal,
population standard deviation
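Equations (1) and (2) translate directly into code. Below is a minimal sketch in standard-library Python; the function name and the illustrative numbers at the bottom are hypothetical, not taken from the notes:

```python
from math import sqrt

def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """Pooled-variance two-sample t statistic from summary statistics."""
    # equation (1): pooled std dev, a df-weighted average of the sample variances
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    # standard error of the difference of sample means
    se = sp * sqrt(1.0 / n1 + 1.0 / n2)
    # equation (2): t statistic under H0: mu1 = mu2
    t = (mean1 - mean2) / se
    return t, n1 + n2 - 2   # statistic and its degrees of freedom

# hypothetical numbers, for illustration only
t, df = pooled_t(10, 5.0, 2.0, 10, 7.0, 2.0)
print(round(t, 3), df)   # about -2.236 with 18 d.f.
```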
Example: We wish to determine whether imposing a co-payment will reduce medical
insurance claims. At ART, Inc. a sample of 37 workers whose medical insurance carries
a 20% co-payment had mean claims last year of $250 with standard deviation of $100; a
sample of 28 workers who had no co-payment requirement had mean claims of $400 with
standard deviation of $120. From long experience, we know that medical claims are
approximately normally distributed, and we believe that the standard deviations will be
equal for these two populations. Use these data to test whether co-payments reduce
claims.
1. State hypotheses
H0: µ1 ≥ µ2
H1: µ1 < µ2
2. Select the test statistic/identify the sampling distribution: here it’s a t, calculated by
the formula above, with
d.f. = n1 + n2 − 2 = 37 + 28 − 2 = 63
3. Select α and find the critical value of t. Let’s take α = 0.05; then tC = - 1.669
4. Draw sample, calculate test statistic, etc.
sp = √[((37 − 1) × 100² + (28 − 1) × 120²) / (37 + 28 − 2)] = 109.02

sx̄1−x̄2 = 109.02 × √(1/37 + 1/28) = 109.02 × 0.2505 = 27.31

and the statistic to be calculated is

t = (x̄1 − x̄2) / sx̄1−x̄2 = ($250 − $400)/27.31 = −5.49
Since -5.49 < -1.669 we can reject the null hypothesis
Alternatively TDIST(5.49, 63, 1) = 3.8 × 10−7 < 0.05
⇒ reject H0
5. Conclude that imposing co-payments reduces medical care usage and insurance
claims
In such problems, it is often helpful to begin by extracting the data from the problem
and writing it down. Here, we have
n1 = 37; x̄1 = $250; s1 = $100
n2 = 28; x̄2 = $400; s2 = $120
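As a check, the step 4 calculations can be reproduced in a few lines of standard-library Python from the summary data just listed:

```python
from math import sqrt

# summary data extracted from the problem
n1, x1, s1 = 37, 250.0, 100.0   # co-payment group
n2, x2, s2 = 28, 400.0, 120.0   # no-co-payment group

# equation (1): pooled standard deviation
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
# standard error of the difference of means
se = sp * sqrt(1.0 / n1 + 1.0 / n2)
# equation (2): t statistic under H0
t = (x1 - x2) / se

print(round(sp, 2), round(se, 2), round(t, 2))   # 109.02 27.31 -5.49
```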
Excel Spreadsheet Tests for Differences of Two Population Means
Three ways to skin a cat:
Ø Calculate t as above and use TDIST to find the p-value OR find the critical t and
compare to calculated value
Ø Spreadsheet formula: TTEST(Data Range 1, Data Range 2, Tails, Type): result is
the p-value of the test
„ The Data Ranges are the ranges that contain the data for your two samples
„ Tails = 1 or 2. Note that if Tails = 1, the spreadsheet always performs an upper
one tail test, taking x̄1 as the larger of the two sample means
„ Type = 1 or 2 or 3
w if Type = 1, the spreadsheet does a paired difference test, which will be
described in the next section
w if Type = 2, the spreadsheet does a pooled variance test of the type described
above
w if Type = 3, the spreadsheet does a t test assuming that the population
standard deviations are not equal
Ø Tools-Data Analysis
t-Test: Two Sample Assuming Equal Variances
Output:

t-Test: Two-Sample Assuming Equal Variances

                                                   Variable 1   Variable 2
Sample means             Mean                        4.666667     5.125
Sample Variances         Variance                    4.666667     7.553571
Sample Sizes             Observations                6            8
Equation (1) squared     Pooled Variance             6.350694
                         Hypothesized Mean
                         Difference                  0
                         df                         12
equation (2)             t Stat                     -0.33677
p-value for one tail test P(T<=t) one-tail           0.371055
                         t Critical one-tail         1.782287
p-value for two tail test P(T<=t) two-tail           0.74211
                         t Critical two-tail         2.178813
Notice that Excel reports the calculated t statistic; it also reports the p-values for
one and two-tailed tests, as well as the critical values for α as entered in the
dialog box; the default is 0.05
The first column in the table does not appear in the Excel output; I added it for
reference purposes.
Comment: A Normal Distribution test for the Difference between Two Means? An odd
case, rarely encountered. In older practice, z was used with large samples and unknown σ,
appealing to the Central Limit Theorem, especially if it could not be assumed that σ1 = σ2.
Today, most would choose to use a t test with different population variances. See above for
procedures with TTEST or Analysis Tools.
MORE ON HYPOTHESIS TESTING
Paired Differences
Ø Hypothesis tests when samples are not independent
If samples are not independent, in the statistical sense, comparison of sample means is
inappropriate. Non-independence usually involves one of two cases:
„ The two “samples” are in fact the same set of objects, but have been subjected to
different treatments on different occasions
Example: Two routes are possible from Boone to Elizabethton. A trucking
company sends six of its drivers over the route through Newland on Tuesday and on
Wednesday sends them over the route around Watauga Lake. The time for each
driver over each route is recorded.
„ the first sample is drawn randomly, but the second is drawn to match the
characteristics of the first sample
Example: We wish to know whether a business degree, in and of itself, increases
income. From our student records we draw a sample of BSBA holders; then we
draw a second sample in which each BSBA holder is matched to a BA graduate
who has the same GPA, same number of extracurricular activities, same sex, and so
on, and who has worked in the same industry for the same length of time.
PURPOSE: to control the variation in everything but the one characteristic of interest –
whatever variation is left must be due to variation in that characteristic.
Ø In this case we calculate the difference for each pair of observations and test whether
the mean difference is equal to zero. We’re interested in the sampling distribution of the
average difference d̄. The sampling distribution of this quantity is given by the following:
it is a t distribution with n − 1 degrees of freedom, where n is the number of pairs of
observations
it has E(d̄) = 0
the standard error is given by sd̄ = sd/√n, where sd is the sample standard deviation of
the differences, that is, sd = √[Σ(d − d̄)² / (n − 1)]
So, in effect, we take each pair of observations and calculate the difference; this set of
differences becomes the data used to test the null hypothesis H0: µd = 0 or the
equivalent one-tailed test as appropriate
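This recipe (difference each pair, then treat the differences as a one-sample t problem) is straightforward to code. A minimal standard-library Python sketch; the function name and the toy data are hypothetical:

```python
from math import sqrt

def paired_t(sample1, sample2):
    """One-sample t on the paired differences; returns (t, degrees of freedom)."""
    # subtract in a consistent direction, preserving any minus signs
    d = [b - a for a, b in zip(sample1, sample2)]
    n = len(d)
    dbar = sum(d) / n                                      # mean difference
    sd = sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))   # std dev of the differences
    se = sd / sqrt(n)                                      # standard error of dbar
    return dbar / se, n - 1                                # t under H0: mu_d = 0

# hypothetical toy data, for illustration only
t, df = paired_t([10, 12, 9], [12, 15, 9])
print(round(t, 2), df)   # about 1.89 with 2 d.f.
```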
Example: Whitney has invented a “smart pill.” He asks 9 of his classmates to prepare for
the second stats exam just as they prepared for the first, but he administers a smart pill to
each a half hour before the exam. The results are:
Student #   Score 1st Exam   Score 2nd Exam   Difference
    1             82               87               5
    2             43               55              12
    3             71               69              -2
    4             68               75               7
    5             66               69               3
    6             91               95               4
    7             77               74              -3
    8             58               68              10
    9             75               78               3
The mean score on the second exam for all students who did not take smart pills
was the same as the mean score on the first exam. Did Whitney’s pill work?
Solution: First calculate for each student the difference between his score on
the first exam and on the second exam. This yields the set of differences given in
the fourth column. Notice that we must always subtract in the same
direction and preserve any minus signs in our calculations. Doing our
five steps we have
1. H0: µd ≤ 0 vs. H1: µd > 0.
2. the appropriate test is a t with n − 1 d.f. Here n = 9, that is, there are nine
differences. The test statistic is t = (d̄ − µ0)/sd̄. Since the hypothesis is that
µd = 0, this is more commonly seen as t = d̄/sd̄.
3. Let’s take α = 0.01; then tC = +2.896. Note carefully: there are 8 d.f.
4. Calculations: d̄ = (5 + 12 − 2 + 7 + 3 + 4 − 3 + 10 + 3)/9 = 4.33; sd = 4.95 and the
standard error sd̄ = 4.95/3 = 1.65.
Accordingly t = 4.33/1.65 = 2.626; TDIST(2.626, 8, 1) = 0.015183 > 0.01, so we
fail to reject H0. (Alternatively: 2.626 < 2.896.)
5. Conclude that there is no strong evidence that Whitney’s pill works.
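The step 4 arithmetic can be checked directly from the difference column of the table (standard-library Python):

```python
from math import sqrt

d = [5, 12, -2, 7, 3, 4, -3, 10, 3]   # differences from the table above
n = len(d)

dbar = sum(d) / n                                      # mean difference: 4.33
sd = sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))   # std dev of differences: 4.95
se = sd / sqrt(n)                                      # standard error: 1.65
t = dbar / se                                          # t statistic: 2.626

print(round(dbar, 2), round(sd, 2), round(t, 3))   # 4.33 4.95 2.626
```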
Example: We are interested in whether, other things being equal, women’s salaries are
equal to men’s. This is not a simple question, since, on average, women differ from men in
many important employment characteristics. So, we choose a random sample of 81 men;
for each man chosen we then choose a woman who has the same education level, age and
number of years experience and who works for a similar-sized company. The men’s
salaries are paired with the corresponding women’s and the paired differences are
recorded. A portion of the data looks like this:
Pair   Man’s Salary   Woman’s Salary   Difference
  1        2500            2230            270
  2        3000            3200           -200
  3        3500            3300            200
For all pairs the average difference is $110 with a standard deviation of
differences of $90. The distributions both appear to be somewhat skewed.
Can we conclude that men’s and women’s salaries are different?
Solution:
1. H0: µd = 0 vs. H1: µd ≠ 0
2. t test with 81 - 1 = 80 d.f.
3. for α = 0.05, tC = ± 1.990
4. sd̄ = 90/√81 = 10, so t = $110/10 = 11; we can decisively reject H0.
TDIST(11, 80, 2) ≈ 10−17
5. Are there implications for legislation or policy?
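With 81 pairs, the t procedure is robust to the mild skewness noted in the problem, and the step 4 arithmetic is easy to verify (standard-library Python, using the summary values as given):

```python
from math import sqrt

# summary statistics from the salary example
n, dbar, sd = 81, 110.0, 90.0   # pairs, mean difference, std dev of differences

se = sd / sqrt(n)   # standard error of dbar: 90/9 = 10
t = dbar / se       # t statistic: 110/10 = 11

print(se, t)   # 10.0 11.0
```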
Excel Spreadsheet Procedures
Ø Use TDIST to calculate p-value or TINV to find critical values
Ø Use TTEST(DATA RANGE 1, DATA RANGE 2, 1, 1)
The 1 in the Type position specifies that it is a paired difference test
Ø Use Data Analysis—t Test Paired Two Sample for Means
The output is very similar to that discussed above, except that it includes a
measure of the correlation between the two sets of data; we can ignore this for now.