S-07: CHI SQUARE
The "t" test and the F test described in previous modules are called parametric tests. They assume certain
conditions about the parameters of the population from which the samples are drawn.
Parametric and nonparametric statistical procedures test hypotheses involving different assumptions.
Parametric statistics test hypotheses based on the assumption that the samples come from populations that
are normally distributed. Also, parametric statistical tests assume that there is homogeneity of variance
(variances within groups are the same). The level of measurement for parametric tests is assumed to be
interval or at least ordinal. Nonparametric statistical procedures test hypotheses that do not require
normal distribution or variance assumptions about the populations from which the samples were drawn and
are designed for ordinal or nominal data.
The main weakness of nonparametric tests is that they are less powerful than parametric tests. They are less
likely to reject the null hypothesis when it is false. When the assumptions of parametric tests can be met,
parametric tests should be used because they are the most powerful tests available.
There are, however, certain advantages of nonparametric techniques such as Chi Square (χ²). For one
thing, nonparametric tests are usually much easier to compute. Another unique value of nonparametric
procedures is that they can be used to treat data which have been measured on nominal (classificatory)
scales. Such data cannot, on any logical basis, be ordered numerically, hence there is no possibility of using
parametric statistical tests which require numerical data.
The general pattern of nonparametric procedures is much like that seen with parametric tests, namely,
certain sample data are treated by a statistical model which yields a value or statistic. This value is then
interpreted for the likelihood of its chance occurrence according to some type of statistical probability
distribution. With Chi Square, a value is calculated from the data using Chi Square procedures and then
compared to a critical value from a Chi Square table with degrees of freedom corresponding to that of the
data. If the calculated value is equal to or greater than the critical value (table value), the null hypothesis is
rejected. If the calculated value is less than the critical value, the null hypothesis (Ho) is accepted. This
procedure is similar to that used with the "t" test and F test.
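The decision rule described above can be sketched in a few lines of Python. This is only an illustration: the function name reject_null and the dictionary CRITICAL_VALUES_05 are hypothetical names chosen for this sketch, and the dictionary holds just the first few rows of a standard Chi Square table at the .05 level.

```python
# Critical values of chi square at the .05 level, keyed by degrees of freedom.
# Standard table values; only a few rows are stored here for illustration.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def reject_null(calculated, df, table=CRITICAL_VALUES_05):
    """Return True when the calculated value equals or exceeds the critical value."""
    if df not in table:
        raise KeyError(f"no critical value stored for df={df}")
    return calculated >= table[df]


print(reject_null(6.2, 2))   # True: 6.2 >= 5.991
print(reject_null(1.3, 1))   # False: 1.3 < 3.841
```

A fuller table, or a statistics library, would be needed for other significance levels or larger degrees of freedom.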
Purpose of Chi Square
The Chi Square (χ²) test is undoubtedly the most important and most used member of the nonparametric
family of statistical tests. Chi Square is employed to test the difference between an actual sample and
another hypothetical or previously established distribution such as that which may be expected due to
chance or probability. Chi Square can also be used to test differences between two or more actual samples.
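Basic Computational Equation
$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$
where Fo is the observed frequency and Fe is the expected frequency in each category.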
Example:
                              A        U        D
Observed responses (Fo)       8        8       14
Expected responses (Fe)     (10)     (10)     (10)
Fo - Fe                      -2       -2        4
(Fo - Fe)²                    4        4       16
(Fo - Fe)² / Fe             0.4      0.4      1.6
χ² = 0.4 + 0.4 + 1.6 = 2.4
Degrees of freedom = (number of levels - 1) = 3 - 1 = 2
Critical value of χ² at the .05 level with 2 degrees of freedom = 5.991; 2.4 < 5.991
Therefore, accept null hypothesis.
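As a check on the arithmetic in this example, the following minimal Python sketch computes the same statistic from the observed and expected frequencies. The function name chi_square is an assumption made for illustration and is not part of the module.

```python
def chi_square(observed, expected):
    """Sum (Fo - Fe)^2 / Fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


observed = [8, 8, 14]     # Fo for responses A, U, D
expected = [10, 10, 10]   # Fe for responses A, U, D

statistic = chi_square(observed, expected)
df = len(observed) - 1    # number of levels - 1

print(round(statistic, 3), df)   # 2.4 2
```

The printed value of 2.4 is below the table value of 5.991 for 2 degrees of freedom, which agrees with the conclusion in the example.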
When there is only one degree of freedom, an adjustment known as Yates correction for continuity must
be employed. To use this correction, a value of 0.5 is subtracted from the absolute value (irrespective of
algebraic sign) of the numerator contribution of each cell to the above basic computational formula. The
basic chi square computational formula then becomes:
$$\chi^2 = \sum \frac{(|F_o - F_e| - 0.5)^2}{F_e}$$
One-Way Classification
The One-Way Classification (sometimes referred to as the Single Sample Chi Square Test) is one of the
most frequently reported nonparametric tests in journal articles. The test is used when a researcher is
interested in the number of responses, objects, or people that fall in two or more categories. This procedure
is sometimes called a goodness-of-fit statistic. Goodness-of-fit refers to whether a significant difference
exists between an observed number and an expected number of responses, people or objects falling in each
category designated by the researcher. The expected number is what the researcher expects by chance or
according to some null hypothesis.
Example of a One-Way Classification (with Yates Correction):
Suppose that we flip a coin 20 times and record the frequency of occurrence of heads and tails. We know
from the laws of probability that we should expect 10 heads and 10 tails. We also know that because of
sampling error we could easily come up with 9 heads and 11 tails or 12 heads and 8 tails.
Let us suppose our coin-flipping experiment yielded 12 heads and 8 tails. We would enter our expected
frequencies (10 - 10) and our observed frequencies (12 - 8) in a table.
          Observed   Expected   (|Fo-Fe| - 0.5)   (|Fo-Fe| - 0.5)²   (|Fo-Fe| - 0.5)² / Fe
Heads        12         10            1.5               2.25                0.225
Tails         8         10            1.5               2.25                0.225
Total        20         20                                             χ² = 0.450
The calculation of χ² in a one-way classification (Yates Correction) is very straightforward. The expected
frequency in a category ("heads") is subtracted from the observed frequency and, since Yates Correction is
being used, 0.5 is subtracted from the absolute value of Fo - Fe. The result is squared, and the square is
divided by its expected frequency. This is repeated for the remaining categories and, as the formula for χ²
indicates, these results are summed for all categories.
How does a calculated χ² of 0.450 tell us if our observed results of 12 heads and 8 tails represent a
significant deviation from an expected 10-10 split? The shape of the Chi Square sampling distribution
depends upon the number of degrees of freedom. The degrees of freedom for a one-way classification χ² is
r - 1, where r is the number of levels. In our problem above r = 2, so there is 1 degree of
freedom. From our statistical reference tables, a χ² of 3.84 or greater is needed for χ² to be significant at
the .05 level, so we conclude that our χ² of 0.450 in the coin-flipping experiment could have happened by
sampling error and the deviations between the observed and expected frequencies are not significant. With
one degree of freedom, a calculated χ² value less than 3.84 reflects deviations that would be expected more
than 5% of the time due to chance alone. Therefore, the observed difference is not statistically significant
at the .05 level.
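The coin-flipping calculation can be reproduced with a short Python sketch. It assumes the Yates-corrected form in which 0.5 is subtracted from the absolute value of Fo - Fe before squaring; the function name chi_square_yates is a hypothetical name used only for this illustration.

```python
def chi_square_yates(observed, expected):
    """Sum (|Fo - Fe| - 0.5)^2 / Fe over all categories (Yates correction)."""
    return sum(
        (abs(fo - fe) - 0.5) ** 2 / fe
        for fo, fe in zip(observed, expected)
    )


observed = [12, 8]    # heads, tails
expected = [10, 10]   # expected 10-10 split for 20 flips

statistic = chi_square_yates(observed, expected)
critical_value = 3.84           # table value at the .05 level, 1 degree of freedom
significant = statistic >= critical_value

print(round(statistic, 3), significant)   # 0.45 False
```

Because the absolute value is taken before the correction, both categories contribute the same amount (0.225) to the total.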
Two-Way Classification
The two-way Chi Square is a convenient technique for determining the significance of the difference
between the frequencies of occurrence in two or more categories with two or more groups. For example, we
may see if there is any difference in the number of freshmen, sophomores, juniors, or seniors with regard to
their preference for spectator sports (football, basketball, or baseball). This is called a two-way
classification since we would need two bits of information from the students in the sample, their class and
their sports preference.
Example of a Two-Way Classification
Suppose an investigator wishes to see if 200 boys and girls (100 of each) respond differently to an
attitudinal question regarding the educational value of extracurricular activities and observes the
following (A = very valuable, U = uncertain, and D = little value).
Boys A = 60 U = 20 D = 20
Girls A = 40 U = 0 D = 60
Expected frequencies (Fe) for each cell are determined by the following formula.
$$F_e = \frac{(\text{row subtotal}) \times (\text{column subtotal})}{\text{total number of observations}}$$
Example - For the cell "Boys - A", the corresponding row subtotal = 100, the corresponding column
subtotal = 100, and the total number of observations = 200, so Fe = (100 x 100) / 200 = 50. NOTE: Row
subtotals and column subtotals must have equal sums, and total expected frequencies must equal total
observed frequencies.
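Observed frequencies, with expected frequencies (Fe) in parentheses:
                      A           U           D        Row Subtotals
Boys               60 (50)     20 (10)     20 (40)         100
Girls              40 (50)      0 (10)     60 (40)         100
Column Subtotals     100          20          80           200
Calculated χ² = 2 + 10 + 10 + 2 + 10 + 10 = 44.0 (the sum of (Fo - Fe)² / Fe over the six cells)
Degrees of Freedom = (Rows - 1)(Columns - 1) = (2 - 1)(3 - 1) = 2
Table value of χ² at the .05 level with 2 degrees of freedom = 5.991; 44.0 > 5.991
Therefore, reject null hypothesis.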
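The two-way example can also be worked through in Python. The sketch below derives each expected frequency from the row and column subtotals and then sums (Fo - Fe)² / Fe over every cell. The function names expected_frequencies and chi_square_two_way are assumptions introduced for this illustration.

```python
def expected_frequencies(table):
    """Fe for each cell = (row subtotal * column subtotal) / total observations."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_two_way(table):
    """Return (chi square, degrees of freedom, expected frequencies)."""
    expected = expected_frequencies(table)
    statistic = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Rows: boys, girls. Columns: A, U, D.
table = [[60, 20, 20],
         [40, 0, 60]]

statistic, df, expected = chi_square_two_way(table)
print(expected)        # [[50.0, 10.0, 40.0], [50.0, 10.0, 40.0]]
print(statistic, df)   # 44.0 2
```

The calculated value of 44.0 is far above the table value of 5.991 for 2 degrees of freedom, so the null hypothesis is rejected.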
Degrees of Freedom
A value of χ² cannot be evaluated unless the number of degrees of freedom associated with it is known.
The number of degrees of freedom associated with any χ² may be easily computed.
If there is one independent variable, df = r - 1 where r is the number of levels of the independent variable.
If there are two independent variables, df = (r - 1)(s - 1) where r and s are the number of levels of the first
and second independent variables, respectively.
If there are three independent variables, df = (r - 1)(s - 1)(t - 1) where r, s, and t are the number of levels of
the first, second, and third independent variables, respectively.
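The three rules above follow one pattern: subtract 1 from the number of levels of each independent variable and multiply the results. A minimal Python sketch of that pattern, with degrees_of_freedom as a hypothetical helper name, might look like this:

```python
from math import prod


def degrees_of_freedom(*levels):
    """Multiply (levels - 1) across all independent variables."""
    if not levels or any(n < 2 for n in levels):
        raise ValueError("each variable needs at least 2 levels")
    return prod(n - 1 for n in levels)


print(degrees_of_freedom(3))         # 2  -> one variable, r = 3
print(degrees_of_freedom(2, 3))      # 2  -> two variables, r = 2, s = 3
print(degrees_of_freedom(2, 3, 4))   # 6  -> three variables, r = 2, s = 3, t = 4
```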
Assumptions
Even though a nonparametric statistic does not require a normally distributed population, there still are
some restrictions regarding its use.
1. Representative sample (Random)
2. The data must be in frequency form (nominal data) or greater.
3. The individual observations must be independent of each other.
4. Sample size must be adequate. In a 2 x 2 table, Chi Square should not be used if n is less than 20. In a
larger table, no expected value should be less than 1, and not more than 20% of the cells can have
expected values of less than 5.
5. Distribution basis must be decided on before the data is collected.
6. The sum of the observed frequencies must equal the sum of the expected frequencies.
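The sample-size restriction in item 4 lends itself to a simple check. The Python sketch below is one possible reading of that rule, not a definitive test: the function name sample_size_adequate is hypothetical, and it assumes the expected frequencies are supplied as a list of rows.

```python
def sample_size_adequate(expected, n):
    """Apply the sample-size rule of thumb to a table of expected frequencies."""
    rows, cols = len(expected), len(expected[0])
    if rows == 2 and cols == 2:
        # 2 x 2 table: Chi Square should not be used if n is less than 20.
        return n >= 20
    cells = [fe for row in expected for fe in row]
    if min(cells) < 1:
        # Larger table: no expected value may be less than 1.
        return False
    small = sum(1 for fe in cells if fe < 5)
    # Larger table: no more than 20% of cells may have expected values below 5.
    return small / len(cells) <= 0.20


print(sample_size_adequate([[50, 10, 40], [50, 10, 40]], 200))   # True
print(sample_size_adequate([[4, 4, 6], [6, 6, 6]], 32))          # False: 2 of 6 cells are below 5
```

The check covers only item 4; the other assumptions (random sampling, independence, and so on) depend on how the data were collected and cannot be verified from the table alone.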
SELF ASSESSMENT
1. Explain the purpose and importance of Chi Square as a nonparametric statistic.
2. Write the basic computational equation for Chi Square.
3. Explain the difference between a one-way classification and a two-way classification.
4. How do you compute the degrees of freedom for the following:
χ² with one independent variable
χ² with two independent variables
χ² with three independent variables
6. What purpose does Yates correction for continuity serve?
7. What are some of the major differences between parametric and nonparametric statistics?
8. What level of measurement is required for Chi Square?
9. A public opinion polling team in a small town was interested in the type of sporting events that adults in
the age bracket of 20-50 years prefer to watch on TV. A random sample of 120 was selected and asked,
"Given your preference, would you prefer to watch baseball (P1), basketball (P2), or football (P3) on TV?"
Of the respondents, 39 indicated a preference for baseball, 25 selected basketball, and 56 selected football.
Null Hypothesis
General - In the population being sampled, the proportions of people in each category are equal. Ho: P1 =
P2 = P3
Specific - In the population being sampled, equal proportions of people prefer baseball, basketball, and
football.
                              P1       P2       P3
Observed responses (Fo)
Expected responses (Fe)
Fo - Fe
(Fo - Fe)²
(Fo - Fe)² / Fe
Degrees of freedom = (number of levels - 1) =
Critical value of χ² at the .05 level =
Ho: Accept or Reject?
10. In a school with a merit system for pay raises, a random sample of the faculty was asked if they wished
that system to be continued. Of the 10 faculty members responding, 7 wanted to continue and 3 did not
want to continue. Use the one-sample case technique with Yates correction and determine if the difference
in proportions is statistically significant at the 0.05 level.
              Observed   Expected   (|Fo-Fe| - 0.5)   (|Fo-Fe| - 0.5)²   (|Fo-Fe| - 0.5)² / Fe
Continue
Discontinue
Degrees of freedom = (number of levels - 1) =
Critical value of χ² at the .05 level =
Ho: Accept or Reject?
11. A representative of a major university was interested in whether undergraduate males and females
responded differently to a question regarding a proposed athletic fee. Of the 100 males and 100 females
who responded, 20 males and 60 females agreed, 70 males and 20 females disagreed, and 10 males and 20
females were undecided.
                      A           U           D        Row Subtotals
Males
Females
Column Subtotals
Degrees of Freedom = (Rows - 1)(Columns - 1) =
Critical value of χ² at the .05 level =
Ho: Accept or Reject?