Analysis of Two-Way Tables
Introduction to Statistics
Carl von Ossietzky Universität Oldenburg
Fakultät III - Sprach- und Kulturwissenschaften
1
Dutch dialects are traditionally divided into a Low Franconian group (yellow), a Low Saxon
group (green) and a Frisian group (blue).
2
The two-way table
• In Dutch dialects the vowel in house is pronounced differently.
• Most frequent vowels used in house: [u], [y], [œy], and [O] (or [Oi] or [Ou]).
• Is there a significant relationship between dialect group and the distribution of the
vowel pronunciation of house?
3
The two-way table
• A two-way table of counts organizes data about two categorical variables. Also known
as cross table or contingency table.
• The row variable labels the rows that run across the table. This is usually the
explanatory variable, in our example the variable group.
• The column variable labels the columns that run down the table. This is usually the
response variable, in our example the variable vowel.
4
The two-way table
Entering data in SPSS. Coding of group: 1=Low Franconian, 2=Low Saxon, 3=Frisian.
Coding of house: [u]=1, [y]=2, [œy]=3, [O]=4. The cases need to be weighted by the
frequencies given in the column freq!
5
The two-way table
• Row totals and column totals give the marginal distributions of the two variables
separately. It is clearer to present these distributions as percentages of the table total.
• The joint distribution of the row and column variables is found by dividing the count
in each cell by the total number of observations.
6
The two-way table
• To find the conditional distribution of the column variable for one specific value of
the row variable, look only at that row in the table.
• Comparing these conditional distributions is one way to describe the association between
the row and the column variables. It is particularly useful when the row variable is the
explanatory variable.
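As a sketch, the joint, marginal, and conditional distributions can all be computed from a table of counts. The numbers below are made up for illustration; the real dialect frequencies are in the SPSS data:

```python
# Hypothetical 2 x 3 table of counts (rows = groups, columns = vowels);
# these numbers are invented for illustration only.
counts = [[30, 10, 20],
          [15, 25, 10]]
n = sum(sum(row) for row in counts)                    # table total

# Joint distribution: each cell count divided by the table total.
joint = [[c / n for c in row] for row in counts]

# Marginal distributions: row and column totals as shares of the table total.
row_marginal = [sum(row) / n for row in counts]
col_marginal = [sum(col) / n for col in zip(*counts)]

# Conditional distribution of the column variable within each row:
conditional = [[c / sum(row) for c in row] for row in counts]

print([round(p, 3) for p in row_marginal])  # → [0.545, 0.455]
```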
7
Bar graph
Clustered bar graph. For each dialect group the graph shows the counts of the vowels,
with one bar per vowel. The [œy] is most common in Low Franconian, the [u] in Low Saxon
and the [y] in Frisian.
8
Pie graph
Side-by-side pie graphs. In Low Franconian the [œy] is dominant; in Low Saxon
the [u] and in Frisian the [y] is dominant.
9
Simpson’s paradox
• We present data on three categorical variables in a three-way table, printed as separate
two-way tables for each level of the third variable.
• A comparison between two variables that holds for each level of a third variable can be
changed or even reversed when data are aggregated by summing over all levels of the
third variable.
• Simpson’s paradox refers to the reversal of a comparison by aggregation.
• It is an example of the potential effect of lurking or hidden variables on an observed
association.
10
The chi-square test
• Hypotheses:
H0: there is no association between the row variable and the column variable
Ha: there is an association between the row variable and the column variable
• In our example:
H0: there is no association between dialect area and the pronunciation of the
vowel in house
Ha: there is an association between dialect area and the pronunciation of the
vowel in house
• In our example the null hypothesis means that the distribution of the vowel
pronunciations should be the same for each dialect group.
• To test the null hypothesis in r × c tables, we compare the observed cell counts with
the expected cell counts calculated under the assumption that the null hypothesis is
true.
11
The chi-square test
• Expected cell counts under the null hypothesis are computed using the formula:
expected count = (row total × column total) / n
• The chi-square test is a measure of how much the observed cell counts in a two-way
table diverge from the expected cell counts. The formula for the statistic is:
X² = Σ (observed count − expected count)² / expected count
where “observed” represents an observed sample count, “expected” represents the
expected count for the same cell, and the sum is over all r × c cells in the table.
• The chi-square statistic was invented by Karl Pearson in 1900 for purposes slightly
different from ours.
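The two formulas above can be sketched in a few lines of Python. The 2 × 2 counts below are invented; the dialect table itself is larger:

```python
# Hypothetical 2 x 2 table of observed counts (not the dialect data).
observed = [[40, 10],
            [20, 30]]
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
n = sum(row_tot)

# expected count = row total * column total / n
expected = [[row_tot[i] * col_tot[j] / n for j in range(len(col_tot))]
            for i in range(len(row_tot))]

# X^2 = sum over all cells of (observed - expected)^2 / expected
x2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
         for i in range(len(row_tot)) for j in range(len(col_tot)))
print(round(x2, 3))  # → 16.667 for this hypothetical table
```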
12
The chi-square test
• Table showing observed and expected cell values:
13
The chi-square test
• For each cell the squared difference of observed and expected count divided by the
expected count is calculated. The sum of these cell values is X²:
• Large values of X² provide evidence against the null hypothesis.
14
The chi-square test
• If H0 is true, the chi-square statistic X² has approximately a χ² distribution with
(r − 1)(c − 1) degrees of freedom.
• The p-value for the chi-square test is P(χ² ≥ X²), where χ² is a random variable having
the χ²(df) distribution with df = (r − 1)(c − 1).
15
The chi-square test
• In our example, we found X² = 150.840.
• Go to http://www.vassarstats.net/ and choose Distributions, Chi-Square
Distributions. Enter df = 6.
• The highest X² value listed is 24.0, with p-value = 0.000522. Since our X² is higher
(150.840), our p-value will be smaller: p < 0.001.
• The p-value gives the probability, computed assuming that H0 is true, that the X²
statistic would take a value as extreme as or more extreme than that actually observed
(150.840).
• We found that the p-value is smaller than 0.001. If we choose α = 0.05, then the
p-value is smaller than α = 0.05.
• We reject H0 and accept Ha. We conclude that there is an association between dialect
area and the pronunciation of the vowel in house.
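For even degrees of freedom the chi-square upper-tail probability has a simple closed form, so the p-value lookup can be sketched without any statistics library. The function below is an illustration only, not the VassarStats implementation:

```python
import math

def chi2_sf(x, df):
    """P(chi^2_df >= x) via the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, valid only for even df."""
    assert df % 2 == 0, "this closed form needs an even df"
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

# Our 3 x 4 table gives df = (3 - 1)(4 - 1) = 6.
p = chi2_sf(24.0, 6)
print(p)  # about 0.000522, matching the VassarStats value quoted above
```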
16
Assumptions
• Independence of data:
each person, item or entity contributes to one cell only in the cross table.
• For 2 × 2 tables, we require that all four expected cell counts be 5 or more.
• For tables larger than 2 × 2 up to 20% of expected frequencies may be lower than 5,
but no expected frequencies should be lower than 1.
• Our 3 × 4 table satisfies the latter condition:
17
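The expected-count rules of thumb can be checked mechanically. The 3 × 4 table below is invented for illustration; the real expected counts are in the SPSS output:

```python
# Rule of thumb from the slides: for tables larger than 2 x 2, at most 20%
# of expected counts may be below 5, and none may be below 1.
# Hypothetical 3 x 4 expected-count table (not the real SPSS output):
expected = [[12.0, 7.0, 4.5, 9.0],
            [6.0, 15.0, 8.0, 3.8],
            [20.0, 11.0, 6.0, 14.0]]

cells = [e for row in expected for e in row]
frac_below_5 = sum(e < 5 for e in cells) / len(cells)
ok = frac_below_5 <= 0.20 and min(cells) >= 1
print(frac_below_5, ok)
```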
The chi-square test
• Go to http://www.vassarstats.net/. Choose Frequency Data and subsequently
Chi-Square, Cramer’s V, and Lambda. Select three rows and four columns and enter
the frequencies in the table.
18
19
SPSS results
• Results of chi-square test in SPSS:
20
Yates’ correction for continuity
• In case of 2 × 2 tables the chi-square test tends to produce p-values which are too small.
• The correction is easy to compute and adjusts the formula for Pearson’s chi-square test:
X² = Σ (|observed count − expected count| − 0.5)² / expected count
• This reduces the chi-square value obtained and thus increases its p-value. It prevents
overestimation of statistical significance for small data, but gives p-values which are
too large!
• It is probably best ignored.
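For completeness, a sketch of the corrected statistic on an invented 2 × 2 table (note the absolute value in the Yates formula):

```python
# Yates-corrected chi-square for a 2 x 2 table; the counts are hypothetical.
observed = [[10.0, 20.0],
            [25.0, 15.0]]
row = [sum(r) for r in observed]
col = [sum(c) for c in zip(*observed)]
n = sum(row)

x2 = 0.0
for i in range(2):
    for j in range(2):
        e = row[i] * col[j] / n                      # expected count
        x2 += (abs(observed[i][j] - e) - 0.5) ** 2 / e
print(round(x2, 3))  # → 4.725 for this table
```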
21
Fisher’s exact probability test
• With small sample sizes (expected cell counts smaller than 5), the chi-square test is
not accurate.
• Use Fisher’s exact probability test instead. It requires more computation, but with
SPSS this is easy. Fisher’s test can also be applied to tables larger than 2 × 2.
• It tests the probability of getting a table as strong as the observed or stronger simply
due to the chance of sampling.
• The hypergeometric distribution is used to calculate the probability of getting the
observed data, and all data sets with more extreme deviations, under the null hypothesis
that the proportions are the same.
• For the two-tailed test, the probability of getting deviations as extreme as the observed,
but in the opposite direction, is also calculated.
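A minimal sketch of the two-sided test for a 2 × 2 table, built directly on the hypergeometric probabilities described above (the counts are invented; SPSS computes this for you):

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact test for a 2x2 table of counts: sum the
    hypergeometric probabilities of all tables at least as extreme as the
    observed one (probability <= that of the observed table)."""
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d          # row totals
    c1 = a + c                     # first column total
    n = r1 + r2

    def p_table(x):                # probability that cell (0, 0) equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = p_table(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(p_table(x) for x in range(lo, hi + 1)
               if p_table(x) <= p_obs + 1e-12)

# Hypothetical small 2 x 2 table (not the dialect data):
print(round(fisher_exact_two_sided([[3, 1], [1, 3]]), 4))  # → 0.4857
```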
22
Fisher’s exact probability test
23
Results SPSS
• Results in SPSS for our dialect example:
24
Results SPSS
• Cramer’s V is a suitable effect size measure, which falls between 0 and 1.
• Results in SPSS for our dialect example:
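Cramer’s V is derived from X² as V = sqrt(X² / (n × (min(r, c) − 1))). A sketch using the slides’ X² = 150.840 and a hypothetical sample size n (the real n appears in the SPSS output, not shown here):

```python
import math

def cramers_v(x2, n, r, c):
    # Standard Cramer's V: chi-square scaled by n and the smaller table dimension.
    return math.sqrt(x2 / (n * (min(r, c) - 1)))

# X^2 = 150.840 and the 3 x 4 shape come from the slides;
# n = 1000 is a made-up sample size for illustration.
print(round(cramers_v(150.840, 1000, 3, 4), 3))
```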
25
The chi-square goodness of fit test
• Chi-square test: a tool that we use to compare conditional distributions.
• Chi-square goodness of fit test: the observed distribution of a single categorical
variable is compared with a hypothesized distribution.
• Data for n observations on a categorical variable with k possible outcomes are
summarized as observed counts, n1, n2, ..., nk in k cells.
• Example: We have a total of 699 accidents involving drivers who were using a mobile
phone at the time of their accident (Example from Moore & McCabe, 2006). Numbers
of each day of the week:
Sunday  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Total
  20     133      126       159       136       113      12      699
• A null hypothesis specifies probabilities p1, p2, ..., pk for the possible outcomes.
26
The chi-square goodness of fit test
• We hypothesize that each day has pi = 1/7:
H0: motor vehicle accidents involving mobile phone use are equally likely to occur
on each of the seven days of the week.
Ha: the probabilities vary from day to day.
• For each cell we find the expected count by multiplying the total number of observations
n by the specified probability:
expected count = npi
• The expected count for each day is: 699 × (1/7) = 99.86.
• Use the chi-square statistic to measure how much the observed cell counts differ from
the expected counts:
X² = Σ (observed count − expected count)² / expected count
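The whole goodness-of-fit computation fits in a few lines of Python, using the accident counts given above:

```python
# Accident counts per weekday, from the example above (Moore & McCabe, 2006).
observed = [20, 133, 126, 159, 136, 113, 12]
n = sum(observed)            # 699 accidents
k = len(observed)            # 7 days
expected = n / k             # 699 * (1/7), about 99.86, under H0

# X^2 = sum over cells of (observed - expected)^2 / expected
x2 = sum((o - expected) ** 2 / expected for o in observed)
df = k - 1                   # 6 degrees of freedom
# Close to the 208.84 quoted on a later slide; the small difference
# comes from rounding the expected counts there.
print(round(x2, 2), df)
```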
27
The chi-square goodness of fit test
• The degrees of freedom are k − 1, and p-values are computed from the chi-square
distribution.
• In our case k = 7, so the degrees of freedom are 7 − 1 = 6.
• We calculate X² and the p-value with VassarStats: Website for Statistical Computation.
• Go to http://www.vassarstats.net/. Select Frequency Data, select Chi-Square
”Goodness of Fit” Test, enter the observed and expected frequencies.
28
The chi-square goodness of fit test
29
The chi-square goodness of fit test
• We get X² = 208.84 with p < 0.0001.
• If α = 0.05, then the p-value is lower than α. We reject H0 and accept Ha. We
conclude that the probabilities vary from day to day.
30
Avoid overuse of χ²
• The χ² test is not very sensitive, so use numerical variables where appropriate.
• Example: Instead of testing:
This course was (check one)
very useful [] useful [] not useful []
completely useless []
try so-called Likert scales:
On a scale of 1-7, this course was
very useful (1)
completely useless (7)
• Use at least 5 points. Allow a “center” by using an odd number of points. Be consistent,
keeping “positive” answers on one side.
• Compare using t-test or ANOVA!
31