S519: Evaluation of
Information Systems
Social Statistics
Inferential Statistics
Chapter 15: Chi-square
Last week

Linear regression


Slope
Intercept
This week



What is chi-square
CHIDIST
Nonparametric statistics
Parametric statistics

A main branch of statistics




Assume the data follow a given type of probability distribution
(e.g. the normal distribution)
Make inferences about the parameters of that
distribution (e.g. mean, variance)
Assumption: the sample is large enough to represent
the population (e.g. a sample size of about 30 or more).
They are not distribution-free (they require a
probability distribution)
Nonparametric statistics

Nonparametric statistics (distribution-free statistics)




Do not rely on the assumption that the data are drawn from a given
probability distribution (the data model is not specified).
Widely used for studying populations that take on a ranked
order (e.g. movie reviews from one to four stars, opinions about
hotel rankings). Suited to ordinal data.
Make fewer assumptions, so they can be applied in
situations where less is known about the application.
May require a larger sample size than parametric statistics
to draw conclusions with the same degree of confidence.
Nonparametric statistics

Nonparametric statistics (distribution-free
statistics)

Data with frequencies or percentage


Number of kids in different grades
The percentage of people receiving social security
One-sample chi-square

One-sample chi-square includes only one
dimension



Whether the number of respondents is equally
distributed across all levels of education.
Whether the voting for the school voucher has a
pattern of preference.
Two-sample chi-square includes two
dimensions

Whether preference for the school voucher is
independent of political party affiliation and gender
Compute chi-square
$\chi^2 = \sum \frac{(O - E)^2}{E}$
One-sample chi-square test
O: the observed frequency
E: the expected frequency
Example
Question: Whether the number of respondents is
equally distributed across all opinions
One-sample chi-square
Preference for School Voucher
for    maybe    against    total
23     17       50         90
Chi-square steps

Step1: a statement of null and research
hypothesis
There is no difference in the frequency or proportion in each category
H0: P1 = P2 = P3
There is a difference in the frequency or proportion in each category
H1: P1 ≠ P2 ≠ P3
Chi-square steps

Step2: setting the level of risk (or the level of
significance or Type I error) associated with
the null hypothesis

0.05
Chi-square steps

Step3: selection of proper test statistic

Frequency → nonparametric procedures → chi-square
Chi-square steps

Step4. Computation of the test statistic value
(called the obtained value)
category   observed frequency (O)   expected frequency (E)   D (difference)   (O-E)^2   (O-E)^2/E
for        23                       30                       7                49        1.63
maybe      17                       30                       13               169       5.63
against    50                       30                       20               400       13.33
Total      90                       90                                                  20.60
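The arithmetic in this table can be reproduced in a few lines. What follows is a minimal Python sketch, not part of the original slides. The function name `chi_square_obtained` is hypothetical, and it assumes equal expected frequencies (total divided by the number of categories), as stated by the null hypothesis.

```python
def chi_square_obtained(observed):
    """One-sample chi-square, assuming equal expected frequencies."""
    total = sum(observed)
    expected = total / len(observed)  # E is the same for every category
    # Sum of (O - E)^2 / E over all categories
    return sum((o - expected) ** 2 / expected for o in observed)

# Observed frequencies from the school voucher example: for, maybe, against
voucher = [23, 17, 50]
obtained = chi_square_obtained(voucher)
print(round(obtained, 2))  # prints 20.6
```

If the expected frequencies were not equal, each category would need its own E, which this sketch does not cover.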
Chi-square steps

Step5: Determination of the value needed for
rejection of the null hypothesis using the
appropriate table of critical values for the
particular statistic




Table B5
df=r-1 (r= number of categories)
If the obtained value > the critical value → reject the
null hypothesis
If the obtained value < the critical value → do not reject the
null hypothesis
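The decision rule can be sketched in Python as below. This is an illustration rather than the slides' method: the helper names are hypothetical, and `critical_value_df2` only works for df = 2, where the chi-square critical value has the closed form -2 ln(α). For any other df, the table of critical values (or a statistics library) is still needed.

```python
import math

def critical_value_df2(alpha):
    """Critical chi-square value for df = 2 only (closed form: -2 * ln(alpha))."""
    return -2.0 * math.log(alpha)

def decide(obtained, critical):
    """Apply the rule: obtained > critical means reject the null hypothesis."""
    return "reject H0" if obtained > critical else "do not reject H0"

r = 3                                 # number of categories
df = r - 1                            # degrees of freedom
critical = critical_value_df2(0.05)   # about 5.99 for df = 2
print(df, round(critical, 2), decide(20.6, critical))
```

With three categories and a significance level of 0.05, an obtained value of 20.6 falls well above the critical value, so the rule returns "reject H0".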
Chi-square steps

Step6: a comparison of the obtained value
and the critical value is made

Obtained value: 20.6; critical value: 5.99
Chi-square steps

Steps 7 and 8: decision time

What is your conclusion, why and how to
interpret?
Another example

We’ll settle the age-old debate of whether
people can actually detect their favorite cola
based solely on taste. I blindfold 30 Coke lovers
and have them sample 3 colas. Is there a true
difference, or are these preference differences
explainable by chance?
Hypothesis


Null: There are no preferences: The
population is divided evenly among the
brands
Alternate: There are preferences: The
population is not divided evenly among the
brands
Chance Model


df = C -1 = 3 -1 = 2, set α = .05
For df = 2, $\chi^2_{crit} = 5.99$
Calculate Chi-Square
category   observed frequency (O)   expected frequency (E)   D (difference)   (O-E)^2   (O-E)^2/E
Coke       13                       10                       3                9         0.9
Pepsi      9                        10                       1                1         0.1
RC Cola    8                        10                       2                4         0.4
Total      30                       30                                                  1.4
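Written out term by term, the sum of (O - E)^2 / E over the three colas (Coke, Pepsi, RC Cola, each with an expected frequency of 10) is:

$$
\chi^2_{obt} = \frac{(13 - 10)^2}{10} + \frac{(9 - 10)^2}{10} + \frac{(8 - 10)^2}{10} = 0.9 + 0.1 + 0.4 = 1.4
$$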
Decision and Conclusion


$\chi^2_{crit} = 5.99$
$\chi^2_{obt} = 1.40$
$\chi^2_{obt} < \chi^2_{crit}$
Do not reject the null hypothesis: the observed
preferences are consistent with an even split among
the colas when the logos are removed.
Excel functions

CHIDIST(x, degrees of freedom)

CHIDIST(20.6, 2) = 3.36331E-05 < 0.05
CHIDIST(1.40, 2) = 0.496585308 > 0.05
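Outside Excel, the same right-tail probability can be computed directly. The sketch below uses only the Python standard library. The name `chidist_even_df` is hypothetical, and the function relies on the closed-form chi-square tail that holds for even degrees of freedom, so it does not handle odd df (a statistics library would be needed for that).

```python
import math

def chidist_even_df(x, df):
    """Right-tail chi-square probability, like CHIDIST, for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum over k = 0 .. df/2 - 1 of (x/2)^k / k!
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series

p_voucher = chidist_even_df(20.6, 2)   # school voucher example
p_cola = chidist_even_df(1.40, 2)      # cola example
print(p_voucher < 0.05, p_cola < 0.05)  # prints True False
```

A probability below 0.05 corresponds to an obtained value above the critical value, so both routes lead to the same decision.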
More nonparametric statistics

Table 15.1 (P297)