Data Analysis and Statistical Methods
Statistics 651
http://www.stat.tamu.edu/~suhasini/teaching.html
Lecture 28 (MWF) The chi-squared test for independence
Suhasini Subba Rao
Example
Let us return to an example in Lecture 26. The FDA approved the drug
Minoxidil as a remedy for male pattern baldness. They did a study and this
is what they found:

                          Minoxidil   Placebo
Sample size               310         302
% with new hair growth    32          20
• The way we approached the problem was to examine and compare the
conditional probabilities. We let pM be the probability that a person has
new hair growth given that he uses Minoxidil, and pP be the probability
that a person has new hair growth given that he does not use Minoxidil
(placebo). These are unknown, so we estimate them from the data using
p̂M = 99/310 = 0.32 and p̂P = 60/302 = 0.2.
• If there is no difference between Minoxidil and the placebo, pM and pP
would be the same.
• Therefore we test H0 : pM − pP = 0 against the alternative HA :
pM − pP ≠ 0 (in reality we were testing HA : pM − pP > 0, however the
following discussion only applies to the two-sided test).
• We return to the above problem, but look at it in terms of conditional
probabilities and independence.
• Recalling the definitions in Lecture 5, two variables (in this example
the type of hair treatment and whether hair growth is seen or not) are
independent if P(Hair growth | Minoxidil) = P(Hair growth); in other
words, Minoxidil has no influence on hair growth.
• Again recalling the definitions from Lecture 5, pM and pP are conditional
probabilities, that is, the probability of seeing an increase in hair when
given Minoxidil (or the placebo).
• As there are only two groups, if pM = pP then both conditional
probabilities are equal to the marginal probability of hair growth.
• In other words, saying that pM = pP is the same as saying that the
cream used and hair growth are statistically independent.
• Now we look at a different method of approaching this problem, based
on explicitly checking for statistical independence. The advantage of
this approach is that it is not restricted to two-by-two tables. The
disadvantage is that one-sided tests are not possible.
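The link between "pM = pP" and independence can be checked numerically. The following is a minimal sketch (variable names are illustrative) that uses the counts 99 out of 310 and 60 out of 302 from the study to compare the two conditional proportions with the marginal proportion of hair growth.

```python
# Counts from the study: (number with new hair growth, sample size).
groups = {"Minoxidil": (99, 310), "Placebo": (60, 302)}

# Conditional proportions: estimate of P(hair growth | treatment).
conditional = {name: grew / n for name, (grew, n) in groups.items()}

# Marginal proportion: estimate of P(hair growth), ignoring the treatment.
total_grew = sum(grew for grew, _ in groups.values())
total_n = sum(n for _, n in groups.values())
marginal = total_grew / total_n

for name, p in conditional.items():
    print(f"P(hair growth | {name}) = {p:.3f}")
print(f"P(hair growth) = {marginal:.3f}")
```

Under independence all three printed numbers would estimate the same quantity; the test developed in this lecture asks whether their differences are larger than random variation would explain.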
Does gender play a role in wearing a dress at the Oscars?
• H0: Gender does not play a role in dress wearing.
HA: Gender does play a role in dress wearing.
• If the null were true, we would expect the overall proportion who wore a
dress, the proportion of females who wore a dress, and the proportion of
males who wore a dress all to be the same (marginals the same as the
conditionals).
• Data was collected.
Example 1: Oscar data
Observed counts:

          Dress   No Dress   Total
Male      0       200        200
Female    215     5          220
Total     215     205        420

Observed counts with the proportion within each gender:

          Dress                   No Dress              Total
Male      0 (0/200 = 0.00)        200 (200/200 = 1)     200
Female    215 (215/220 = 0.977)   5 (5/220 = 0.023)     220
Total     215                     205                   420
• Looking at the numbers, it appears that gender has an influence on dress.
Is this statistically significant? What would the numbers look like if
there was no association?
The test: What we expect under independence
Marginal proportions:

          Dress                   No Dress                Total
Male                                                      200 (200/420 = 0.476)
Female                                                    220 (220/420 = 0.524)
Total     215 (215/420 = 0.512)   205 (205/420 = 0.488)   420

Expected counts under independence:

          Dress                           No Dress                         Total
Male      51.2% of 200 males = 102.38     48.8% of 200 males = 97.62       200 (200/420 = 0.476)
Female    51.2% of 220 females = 112.6    48.8% of 220 females = 107.38    220 (220/420 = 0.524)
Total     215 (215/420 = 0.512)           205 (205/420 = 0.488)            420

Observed counts:

          Dress   No Dress   Total
Male      0       200        200
Female    215     5          220
Total     215     205        420

Measure the difference between what we expect and what
we do observe.
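The expected counts are simply (row total × column total) / N for each cell. A minimal sketch of that calculation for the Oscar data follows; the function name `expected_counts` is illustrative and not part of the lecture.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Male, Female. Columns: Dress, No Dress.
oscar = [[0, 200], [215, 5]]
expected = expected_counts(oscar)

for label, row in zip(["Male", "Female"], expected):
    print(label, [round(x, 2) for x in row])
```

Note that the expected counts keep the same row and column totals as the observed table; only the split inside the table changes.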
There is such a clear mismatch that we really do believe there is a
dependence. This means that we should be able to reject the null and
the ‘p-value’ in the test will be small.
How do we actually measure this difference?
$$T = \frac{(102.38 - 0)^2}{102.38} + \frac{(112.6 - 215)^2}{112.6} + \frac{(97.62 - 200)^2}{97.62} + \frac{(107.38 - 5)^2}{107.38} = 400.$$
This should correspond to a tiny p-value (almost zero).
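As a check on the arithmetic, here is a short sketch that sums (observed − expected)² / expected over the four cells of the Oscar table. The function name is illustrative; the expected counts are recomputed from the row and column totals rather than typed in.

```python
def chi_square_statistic(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (obs - expected) ** 2 / expected
    return total

# Rows: Male, Female. Columns: Dress, No Dress.
oscar = [[0, 200], [215, 5]]
T = chi_square_statistic(oscar)
print(f"T = {T:.1f}")
```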
Does gender play a role in grades?
• H0: Gender does not play a role in grades.
HA: Gender does play a role in grades.
• If the null were true, we would expect the overall proportion who passed,
the proportion of females who passed, and the proportion of males who
passed all to be the same (marginals the same as the conditionals).
• Data was collected.
Example 2: Grades data
Observed counts:

          Pass   Fail   Total
Male      108    12     120
Female    180    20     200
Total     288    32     320

Observed counts with the proportion within each gender:

         Male                   Female                 Total
Pass     108 (108/120 = 0.9)    180 (180/200 = 0.9)    288
Fail     12 (12/120 = 0.1)      20 (20/200 = 0.1)      32
Total    120                    200                    320
• Looking at the numbers, at least for this data set, there does not appear
to be an association.
The test: What we expect under independence
Marginal totals:

          Pass   Fail   Total
Male                    120
Female                  200
Total     288    32     320

Expected counts under independence:

          Pass                            Fail                           Total
Male      90% of the 120 males = 108      10% of the 120 males = 12      120 (120/320 = 0.375)
Female    90% of the 200 females = 180    10% of the 200 females = 20    200 (200/320 = 0.625)
Total     288 (288/320 = 0.9)             32 (32/320 = 0.1)              320

Observed counts:

          Pass   Fail   Total
Male      108    12     120
Female    180    20     200
Total     288    32     320

Measure the difference between what we expect and what
we do observe. This time it is T = 0. The data is exactly as we would
expect it to be under the null of no association. The p-value should be 100%.
Returning to the Minoxidil Example
We look at the data again:

                    Minoxidil   Placebo   Pooled
New hair growth     99          60        159
No hair growth      211         242       453
Sample size         310         302       612
We want to test H0 : There is no association between hair growth and
hair cream used against HA : There is an association between hair growth
and hair cream used. This is the same as H0 : pM − pP = 0 against
HA : pM − pP ≠ 0.
Suppose the null is true, that is, there is no difference between Minoxidil
and the placebo. In this case, the best estimate for the probability of seeing
hair growth if any hair cream is used is the ‘pooled’ estimate, which is
$$p = \frac{99 + 60}{310 + 302} = \frac{159}{612} = 0.26$$
(see Lecture 26), and the best estimate for the probability of not seeing
hair growth if cream is placed in the hair is
$$1 - p = \frac{211 + 242}{310 + 302} = \frac{453}{612} = 0.74.$$
We then look back at the data and calculate what we expect to see if
there is no association between the two:
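                   Minoxidil             Placebo               Pooled
New hair growth    0.26 × 310 = 80.54    0.26 × 302 = 78.46    159
No hair growth     0.74 × 310 = 229.5    0.74 × 302 = 223.5    453
Sample size        310                   302                   612

(The expected counts are calculated with the unrounded pooled proportions
159/612 and 453/612.)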
Just like in the goodness of fit test, we calculate the difference between
what we expect and what we actually observe in the data:
$$T = \frac{(99 - 80.54)^2}{80.54} + \frac{(211 - 229.5)^2}{229.5} + \frac{(78.46 - 60)^2}{78.46} + \frac{(223.5 - 242)^2}{223.5} = 11.58.$$
Recall if T is zero there is a perfect match between what we expect to see
and what we observe and the data is consistent with the null being true.
On the other hand, if T is ‘large’ there is a complete mismatch between
what we expect and what we observe and there is evidence against the null
(the data is unlikely to be independent).
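The pooled-proportion calculation can be reproduced in a few lines. This is a sketch under the assumption that the expected counts are built from the unrounded pooled proportion 159/612; variable names are illustrative.

```python
# Observed counts: (new hair growth, no hair growth) for each group.
observed = {"Minoxidil": (99, 211), "Placebo": (60, 242)}

n_total = sum(grew + not_grew for grew, not_grew in observed.values())
p_pooled = sum(grew for grew, _ in observed.values()) / n_total  # 159/612

T = 0.0
for name, (grew, not_grew) in observed.items():
    n_group = grew + not_grew
    # Expected counts if both groups share the pooled growth probability.
    exp_grew = p_pooled * n_group
    exp_not = (1 - p_pooled) * n_group
    T += (grew - exp_grew) ** 2 / exp_grew
    T += (not_grew - exp_not) ** 2 / exp_not
    print(f"{name}: expect {exp_grew:.2f} with growth, {exp_not:.2f} without")

print(f"T = {T:.2f}")
```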
How do we determine whether T is large or not?
• For the Gender/Dress example T = 400; we had to reject the null, and the
p-value was very small.
• For the Gender/Grade example T = 0; we could not reject the null, and the
p-value must be one.
• Under the null that there is no dependence, T follows
a χ-square distribution with 1 degree of freedom. The p-value is the
area to the right of T. The 5% critical value is 3.841. Since T = 11.58
is larger than 3.841, the p-value is less than 5%. In fact T = 11.58
is greater than 10.83 (the 0.1% critical value), so the p-value is less
than 0.1%. The output from Statcrunch is given below; compare what we have
calculated to what you observe in the output.
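For 1 degree of freedom the area to the right of T can be computed without tables. The sketch below relies on the fact that the square of a standard normal variable has a χ-square distribution with 1 degree of freedom, so P(χ² > t) = P(|Z| > √t) = erfc(√(t/2)). The function name is illustrative, and the formula applies only to the 1 degree of freedom case.

```python
import math

def chi2_pvalue_df1(t):
    """Right-tail area P(X > t) for X ~ chi-square with 1 degree of freedom.

    If Z is standard normal then Z^2 is chi-square(1), so
    P(Z^2 > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    """
    return math.erfc(math.sqrt(t / 2))

# The two critical values quoted above and the observed statistic.
for t in (3.841, 10.83, 11.584855):
    print(f"T = {t}: p-value = {chi2_pvalue_df1(t):.4f}")
```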
[Figure: Chi-square plot]
The Minoxidil Example (Statcrunch output)
Contingency table results:
Rows: Difference
Columns: None

Cell format: Count, (Row percent), (Column percent), (Total percent), Expected count

          Minidoxil    Placebo      Total
Yes       99           60           159
          (62.26%)     (37.74%)     (100.00%)
          (31.94%)     (19.87%)     (25.98%)
          (16.18%)     (9.804%)     (25.98%)
          80.54        78.46

No        211          242          453
          (46.58%)     (53.42%)     (100.00%)
          (68.06%)     (80.13%)     (74.02%)
          (34.48%)     (39.54%)     (74.02%)
          229.5        223.5

Total     310          302          612
          (50.65%)     (49.35%)     (100.00%)
          (100.00%)    (100.00%)    (100.00%)
          (50.65%)     (49.35%)     (100.00%)

Chi-Square test:

Statistic     DF    Value        P-value
Chi-square    1     11.584855    0.0007
We see that the p-value is 0.07%. Since 0.07% < 5%, there is evidence
to suggest that the cream applied has an influence on hair growth. Make
sure you understand what all the percentages in the output mean.
Comparing the results of the chi-squared test and
proportions test
• In Lecture 26 we used the same data set to test two proportions,
H0 : pM − pP ≤ 0 against HA : pM − pP > 0 (one-sided test).
• The p-value for the one-sided proportions test is 0.03%.
• The p-value for the chi-squared test is 0.07%.
• If the proportions test were a two-sided test, the p-value would be 0.06%.
• We see that the p-value for the two-sided two-sample proportion test is
(almost) the same as that of the chi-squared test.
• This is because the chi-squared test for independence for 2-by-2 tables is
the same as a two-sided test for two sample proportions.
However, an advantage of the two-sample proportion test is that it can
be used for one-sided tests and for other hypotheses, such as
H0 : pM − pP ≤ 0.3 against HA : pM − pP > 0.3, which isn’t possible with the
chi-squared test.
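The equivalence can be seen numerically: the square of the two-sample z statistic equals the chi-squared statistic. The sketch below assumes the z statistic uses the pooled standard error (the usual choice when testing pM − pP = 0; whether Lecture 26 used exactly this form is an assumption) and uses the normal tail formula P(Z > z) = erfc(z/√2)/2.

```python
import math

grew_m, n_m = 99, 310   # Minoxidil: new hair growth, sample size
grew_p, n_p = 60, 302   # Placebo: new hair growth, sample size

p_m, p_p = grew_m / n_m, grew_p / n_p
p_pooled = (grew_m + grew_p) / (n_m + n_p)

# Two-sample z statistic with the pooled standard error (assumed form).
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_m + 1 / n_p))
z = (p_m - p_p) / se

p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z > z)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))    # P(|Z| > |z|)

print(f"z = {z:.3f}, z^2 = {z * z:.3f}")
print(f"one-sided p = {p_one_sided:.5f}, two-sided p = {p_two_sided:.5f}")
```

Because z² matches T, the two-sided p-value of the proportions test and the chi-squared p-value agree exactly; small differences between quoted percentages come from rounding.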
Testing for independence (or lack of association)
• Often in statistics we want to test whether there is an association between
two events. For example, is there a link between signs of diabetes and
whether someone is overweight, or between smoking and cancer?
• These types of questions can be placed within a categorical data framework
(see testing for independence lecture30.pdf for an alternative
explanation).
Tests for independence: General setup
• We have m × k cells (the word ‘cell’ is usually used in categorical data
when referring to one event). The observed table is:

            A1     A2     ...    Ak     Subtotals
B1          n11    n12    ...    n1k    n1·
B2          n21    n22    ...    n2k    n2·
...         ...    ...    ...    ...    ...
Bm          nm1    nm2    ...    nmk    nm·
Subtotals   n·1    n·2    ...    n·k    N

• $n_{\cdot j} = \sum_{i=1}^{m} n_{ij}$ and $n_{i\cdot} = \sum_{j=1}^{k} n_{ij}$.
• n1· + . . . + nm· = n·1 + . . . + n·k = N .
The general setup
Under the null hypothesis of independence between events the expected
table is

            A1           A2           ...    Ak           Subtotals
B1          n1· n·1/N    n1· n·2/N    ...    n1· n·k/N    n1·
B2          n2· n·1/N    n2· n·2/N    ...    n2· n·k/N    n2·
...         ...          ...          ...    ...          ...
Bm          nm· n·1/N    nm· n·2/N    ...    nm· n·k/N    nm·
Subtotals   n·1          n·2          ...    n·k          N

We need to compare each nij with ni· n·j / N; if they are close, we don’t have
enough evidence to say that the events are dependent.
Testing for independence: the test statistic
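• The test statistic is
$$T = \sum_{i=1}^{m} \sum_{j=1}^{k} \frac{\left(n_{ij} - \frac{n_{i\cdot} n_{\cdot j}}{N}\right)^2}{\frac{n_{i\cdot} n_{\cdot j}}{N}} \sim \chi^2_{(m-1)\times(k-1)} \quad \text{under the null}.$$
• As usual, we reject the null at the α level if T is larger than the
critical value $\chi^2_{(m-1)\times(k-1)}(\alpha)$.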
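The general recipe (expected counts, statistic, degrees of freedom, comparison with a critical value) can be sketched for a table of any size. The function name and the small lookup of 5% critical values are illustrative; the two critical values, 3.841 and 5.991, are the ones quoted in this lecture, and the two tables are the grades and Minoxidil data from the earlier examples.

```python
def chi_square_independence(observed):
    """Return (T, df) for an m-by-k table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Sum over every cell of (observed - expected)^2 / expected,
    # where expected = row total * column total / N.
    t = sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for obs, c in zip(row, col_totals)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return t, df

# 5% critical values quoted in the lecture, indexed by degrees of freedom.
CRITICAL_5PCT = {1: 3.841, 2: 5.991}

tables = {
    "grades": [[108, 12], [180, 20]],
    "minoxidil": [[99, 60], [211, 242]],
}
results = {}
for name, table in tables.items():
    t, df = chi_square_independence(table)
    results[name] = (t, df, t > CRITICAL_5PCT[df])
    print(f"{name}: T = {t:.2f}, df = {df}, reject at 5%: {t > CRITICAL_5PCT[df]}")
```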
Length of hair and feet size
Example: Test whether there is an association between length of hair
and big feet using the data:

              Big feet   Not big feet   Subtotals
Long hair     20         150            170
Medium hair   80         170            250
Short hair    250        80             330
Subtotals     350        400            Total = 750

We see that the p-value is small (it is less than 0.01%), which suggests
a dependence between feet size and hair length.
How do we check where this difference lies? Do tests on proportions between
each pairing.
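The statistic behind that p-value is not shown above, so here is a sketch that computes it from the table. With a 3-by-2 table there are 2 degrees of freedom, and for 2 degrees of freedom the right-tail area has the closed form P(χ² > t) = exp(−t/2); this shortcut is specific to 2 degrees of freedom. Variable names are illustrative.

```python
import math

# Rows: long, medium, short hair. Columns: big feet, not big feet.
hair_feet = [[20, 150], [80, 170], [250, 80]]

row_totals = [sum(row) for row in hair_feet]
col_totals = [sum(col) for col in zip(*hair_feet)]
n = sum(row_totals)

T = 0.0
for row, r in zip(hair_feet, row_totals):
    for obs, c in zip(row, col_totals):
        expected = r * c / n
        T += (obs - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)
p_value = math.exp(-T / 2)  # valid because df == 2

print(f"T = {T:.1f}, df = {df}, p-value = {p_value:.3g}")
```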
This is a two-sample test on proportions, which looks for a significant
difference between each pair of groups:
• Long Hair vs. Medium Hair: p-value < 0.01%
• Long Hair vs. Short Hair: p-value < 0.01%
• Short Hair vs. Medium Hair: p-value < 0.01%
It seems that hair has an influence on feet size.
The people with shorter hair tend to have bigger feet.
Example: Test for independence between height and
bossiness
• Psychologists wanted to investigate whether there was dependence
between height and how bossy someone was (aka do short men have a
Napoleon complex?).
• They gathered the following data.

          bossy   not bossy   totals
short     90      110         200
medium    155     145         300
large     55      145         200
totals    300     400         700

• Test the hypothesis that there is no dependence between height and
bossiness against the alternative that there is.
Solution
• Recall that independence means that if you randomly select someone,
the probability they will be bossy is the same as if you were to restrict
the population to tall people (or short people or middle size people) and
randomly select someone in this subpopulation (of only tall, or short or
middle size people).
If this is the case, then there is no dependence between size and bossiness.
• In reality we cannot calculate these probabilities, because we do not
observe the entire population of people, but we do have samples from
the population.
In this case we have a sample of 700 people.
• First look at the data. We see that the proportion who are bossy differs
across the groups: 90/200 = 45% of short men, 155/300 = 51.7% of medium
men and 55/200 = 27.5% of large men. So from looking at the data, there
appears to be a dependence. But this difference could be due to random
variation. So we want to test whether the difference is significant or not.
Our objective is to test:
H0 : There is no dependence between height and bossiness.
HA : There is a dependence between height and bossiness.
• We first have to make a table of expected values under the null that
there is no dependence between height and bossiness.
The test statistic is T = 29.
• Now because there are 3 × 2 cells (it is a 3 by 2
table), under the null T has a χ² distribution with
(3 − 1) × (2 − 1) = 2 degrees of freedom.
• Look up Table 7: $\chi^2_2(0.05) = 5.991$.
• The p-value is P(T > 29) < 0.0001.
• Since T = 29 > 5.991, there is enough evidence
to reject the null. Equivalently, the p-value is very
small. That is, based on the data there appears to
be a dependence between size and bossiness.
• What happens when we do a pair-wise test?
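The overall statistic and the pair-wise comparisons can both be computed from the table. The sketch below runs the chi-squared test on the full 3-by-2 table and then on each 2-by-2 sub-table (for 2-by-2 tables this is equivalent to the two-sided two-sample proportions test). The tail formulas exp(−t/2) for 2 degrees of freedom and erfc(√(t/2)) for 1 degree of freedom are standard; the function and variable names are illustrative.

```python
import math
from itertools import combinations

def chi_square_statistic(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for obs, c in zip(row, col_totals)
    )

# Counts per height group: [bossy, not bossy].
bossiness = {"short": [90, 110], "medium": [155, 145], "large": [55, 145]}

# Overall 3-by-2 test: 2 degrees of freedom, so P(X > t) = exp(-t / 2).
T_all = chi_square_statistic(list(bossiness.values()))
p_all = math.exp(-T_all / 2)
print(f"overall: T = {T_all:.2f}, p-value = {p_all:.2g}")

# Pair-wise 2-by-2 tests: 1 degree of freedom, so P(X > t) = erfc(sqrt(t / 2)).
pairwise = {}
for a, b in combinations(bossiness, 2):
    t = chi_square_statistic([bossiness[a], bossiness[b]])
    pairwise[(a, b)] = (t, math.erfc(math.sqrt(t / 2)))
    print(f"{a} vs {b}: T = {t:.2f}, p-value = {pairwise[(a, b)][1]:.4f}")
```

On these counts the short versus medium comparison gives a statistic below the 5% critical value 3.841, while both comparisons involving the large group exceed it, which suggests the overall dependence is driven mainly by the large group.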
Beware of interpretation: Simpson’s paradox
Fiona, Jane and Steph are being rated at
a call center. Their ratings data is collected.
• We see that the rating of Fiona (85%) is greater
than those of Jane (59%) and Steph (75%).
• In fact, if we compared Fiona with either Jane
or Steph we would see that the difference is
statistically significant and there is evidence that
Fiona gets better ratings (the difference is over and
above random variation).
• To check the above, do a two-sample test for
proportions, H0 : pF − pJ ≤ 0 vs. HA : pF −
pJ > 0, etc.
• Does this mean Fiona is better at dealing with
customers?
• We need to be careful when drawing causal
conclusions.
A breakdown of the data
We now consider a break down of the data over two weeks; the first
week and the second week.
Comparing the scores we see that Jane performs better on both weeks.
What is happening?
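The reversal can happen when the two people handle very different numbers of calls in an easy week and a hard week. The counts below are HYPOTHETICAL, invented purely to illustrate the mechanism; they are not the call-center data from the lecture, and the agents are labelled A and B for that reason.

```python
# HYPOTHETICAL counts (not the lecture's data): (good ratings, calls handled).
ratings = {
    "A": {"week 1": (90, 100), "week 2": (6, 20)},
    "B": {"week 1": (19, 20), "week 2": (40, 100)},
}

# Proportion of good ratings within each week.
weekly = {
    agent: {week: good / calls for week, (good, calls) in weeks.items()}
    for agent, weeks in ratings.items()
}

# Proportion of good ratings after pooling both weeks.
overall = {
    agent: sum(good for good, _ in weeks.values())
    / sum(calls for _, calls in weeks.values())
    for agent, weeks in ratings.items()
}

for agent in ratings:
    per_week = ", ".join(f"{wk}: {p:.0%}" for wk, p in weekly[agent].items())
    print(f"{agent}: {per_week}, overall: {overall[agent]:.0%}")
```

In this made-up example B has the higher rating in each week, yet A has the higher overall rating, because most of A's calls fall in the week where ratings are high for everyone.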