eBook - The Chi-Square Test
Common Business Analytics Applications
Contents
Business Analytics Applications for Chi-Square Test
1. Background: What is the chi-square test?
2. When can the chi-square test be used? 3 common business analytics problems
3. How to implement the chi-square test? A simple 5-step process
4. How does the chi-square test actually work? 3 core concepts which underpin the test
5. Be aware of … 2 assumptions on which the test is based
Copyright 2011
Background
• The chi-square test of independence can be used to check if two variables are related to each other.
• Functionally it is very similar to the coefficient of determination, R².
• However, the chi-square test is designed to work with categorical or nominal data
  • R² works only on numeric data
Business Analytics Uses
Three typical applications for chi-square test
• Geographic relationship to type of product sold
  • More winter boots are sold from a retail outlet located in the upper Midwest than from one in the South.
  • A slightly more complicated example: check if the type of gasoline sold in a neighborhood is indicative of the median income in the locality.
    • So variable 1 would be the type of gasoline and variable 2 would be income ranges (e.g. <40k, 41k-50k, etc.)
• Effect of product mix change (% of upscale, % mid-range and % volume items)
  • Compare sales revenues of each product type before and after the change in product mix.
  • Thus the categories in variable 1 would include all the product types and the categories in variable 2 would include period 1 and period 2.
• Verify the influence of gender on purchase decisions.
  • Are men the primary decision makers when it comes to purchasing big-ticket items?
  • Is gender a factor in color preference of a car?
    • Here variable 1 would be gender and variable 2 would be color.
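As a rough illustration of how two categorical variables become a table of counts, the sketch below cross-tabulates gender against car color. The records are invented for the example, and the names `records` and `table` are hypothetical.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred car color).
records = [
    ("female", "red"), ("female", "blue"), ("female", "red"),
    ("male", "blue"), ("male", "black"), ("male", "blue"),
    ("female", "black"), ("male", "red"),
]

# Count how often each (variable 1, variable 2) pair occurs.
pair_counts = Counter(records)

genders = sorted({g for g, _ in records})   # categories of variable 1
colors = sorted({c for _, c in records})    # categories of variable 2

# Rows are genders, columns are colors; each cell is a frequency.
table = [[pair_counts[(g, c)] for c in colors] for g in genders]

print("columns:", colors)
for g, row in zip(genders, table):
    print(g, row)
```

A table of frequencies like this one is the starting point for the test; the raw records themselves are never compared directly.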
Implementing the chi-square test
A 5-step process to actually apply the test
Step 1
• Identify the two variables of interest from the data table
Step 2
• Compute Margin summations
Step 3
• Build the contingency table
Step 4
• Compute the observed chi-square value
Step 5
• Compare the observed value to critical chi-square value
Business Problem
A specialty retail chain wants to determine whether its
strategy for changing the product mix has resulted in
increased revenues. Its products are categorized
into eight types according to price range. The
category prices range from $30 per item to $120+ per
item. Management decided that, in order to increase
sales, it needed to reduce its higher-priced
inventory ($120+ range) by 50%.
Has their strategy worked?
Step 1: Identify the X and Y variables
• Convention dictates that X's are usually the parameters that can be changed or controlled.
• In this case, the X is the strategy, and its data are the columns, which represent all sales before the strategy change and after the strategy change.
• Therefore the Y's are the sales by category, whose data are the rows, which represent the different price categories.
Step 2: Compute the Margin Summations
Sum each row and each column of the data table and enter these
totals on the "margins" of the table.
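A minimal sketch of the margin sums, assuming the data table is held as a list of rows. The counts are hypothetical (three price ranges instead of eight) and are not the retailer's figures.

```python
# Hypothetical sales counts: one row per price category,
# columns are [before strategy change, after strategy change].
observed = [
    [120, 130],  # low price range
    [90, 85],    # mid price range
    [40, 35],    # high price range
]

# Row margins: total sales per price category across both periods.
row_sums = [sum(row) for row in observed]

# Column margins: total sales per period across all categories.
col_sums = [sum(col) for col in zip(*observed)]

# Grand total N: sum of every cell in the table.
grand_total = sum(row_sums)

print(row_sums, col_sums, grand_total)
```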
Step 3: Complete the contingency table
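The contingency table, shown below, has the same
dimensions as the data table from step 1. Each cell holds the
expected frequency for that cell, computed from the margin sums
of step 2 as (row sum * column sum) / grand total.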
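One way to fill in such a table is sketched below, assuming each cell is the expected frequency (row sum * column sum) / grand total. The function name `expected_table` and the counts are hypothetical.

```python
def expected_table(observed):
    """Return expected frequencies with the same shape as `observed`."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    # Expected count for a cell = row sum * column sum / grand total.
    return [[r * c / n for c in col_sums] for r in row_sums]


# Hypothetical counts: rows are price categories, columns are before/after.
observed = [[120, 130], [90, 85], [40, 35]]
expected = expected_table(observed)

for row in expected:
    print(row)
```

The result has exactly as many rows and columns as the data table, which is why the two tables can be compared cell by cell in the next step.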
Step 4: Calculate the observed chi-square value
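For each cell, take the value from the original data and the
corresponding value from the contingency table of Step 3, and
compute the ratio (observed - expected)^2 / expected.
The observed chi-square value is the sum of all
these ratios = 0.8539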
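The sum of ratios can be sketched as follows, assuming the usual ratio (observed - expected)^2 / expected for each cell. The counts are hypothetical, so the result is not meant to reproduce the 0.8539 obtained from the retailer's data.

```python
def chi_square_statistic(observed):
    """Sum (O - E)^2 / E over every cell of a table of counts."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / n  # expected frequency
            total += (o - e) ** 2 / e          # this cell's ratio
    return total


# Hypothetical counts: rows are price categories, columns are before/after.
observed = [[120, 130], [90, 85], [40, 35]]
statistic = chi_square_statistic(observed)
print(round(statistic, 4))
```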
Step 5: Compare observed chi-square value to “critical” value
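The degrees of freedom are (number of rows - 1) * (number
of columns - 1) in our original data table:
df = (8-1)*(2-1) = 7
Let us use a 90% level of confidence, which means alpha = 0.1.
Observed chi-square value from step 4: 0.8539
Critical chi-square value* for 90% confidence and 7 degrees of freedom: 12.01
0.8539 < 12.01
Because the observed value is below the critical value, we cannot
reject the hypothesis that the two variables are independent: the
data show no evidence that the strategy change altered how sales
are distributed across the price categories.
*A table of critical values of chi-square is available at
http://www.itl.nist.gov/div898/handbook/eda/section3/eda3674.htm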
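Instead of reading the critical value from a table, it can be approximated numerically. The sketch below is one possible approach using only the standard library: it evaluates the chi-square cumulative distribution through the series for the regularized incomplete gamma function and then bisects for the requested confidence level. The helper names `chi2_cdf` and `chi2_critical` are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with `df` degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series: gamma(a, z) = z^a * e^-z * sum_n z^n / (a * (a+1) * ... * (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_critical(confidence, df):
    """Value below which a chi-square variable falls with prob. `confidence`."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < confidence:  # widen until the target is bracketed
        hi *= 2.0
    for _ in range(100):                  # bisection
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


rows, cols = 8, 2
df = (rows - 1) * (cols - 1)
critical = chi2_critical(0.90, df)
observed_value = 0.8539  # from step 4
reject_independence = observed_value > critical
print(df, round(critical, 3), reject_independence)
```

The computed critical value should land close to the tabulated figure quoted above, and the comparison leads to the same decision.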
How does the chi-square test work?
Based on 3 core concepts
Remember that the chi-square test is needed when the data are categorical (or nominal) in nature: for example, if a variable is the type of financial investment, its range of values could be stocks, bonds and cash.
Therefore the analysis involves counting occurrences (of stocks, bonds and cash) and comparing variables (type of investment, customer demographics, etc.) based on occurrences.
Thus the chi-square test works by keeping track of frequencies of occurrences.
Concept 1: Check if frequencies of occurrences are correlated
Concept 2: Multiplication law of probability
Concept 3: Joint probability of two events
Concept 1
Check if frequencies of
occurrences are correlated
The chi-square test checks if the frequencies of occurrences across
any pair of variables - such as type of investment and customer
demographic - are correlated.
Thus it is simply a means for comparing "categorical correlations".
Concept 2
Multiplication law of probability
If event A (purchasing stocks) happens, what
is the probability that event B (age being 35-44) also happens (correlation)?
The multiplication law of probabilities states
that if event A happening is independent of
event B, then the probability of A and B
happening together is simply (pA * pB)
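A small numeric illustration of the multiplication law, using invented customer counts: the product pA * pB is what the joint probability would be if the two events were independent, and it can be compared with the proportion actually observed.

```python
# Hypothetical counts of customers by (investment type, age group).
counts = {
    ("stocks", "35-44"): 30, ("stocks", "other"): 20,
    ("bonds", "35-44"): 10, ("bonds", "other"): 40,
}
n = sum(counts.values())

# pA: probability of event A (purchasing stocks).
p_a = sum(v for (inv, _), v in counts.items() if inv == "stocks") / n

# pB: probability of event B (age being 35-44).
p_b = sum(v for (_, age), v in counts.items() if age == "35-44") / n

# Joint probability predicted by the multiplication law under independence.
p_joint_if_independent = p_a * p_b

# Joint probability actually observed in the data.
p_joint_observed = counts[("stocks", "35-44")] / n

print(p_joint_if_independent, p_joint_observed)
```

In this made-up data the observed proportion differs from the product, which is the kind of gap the chi-square test measures across all cells at once.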
Concept 3
Joint probability of two events
Each cell in the contingency table first computes this joint probability.
The next step is to convert this joint probability into an "expected
frequency" which is simply (pA*pB*N) where N is the sum of all
occurrences in the dataset.
The test of independence between any two parameters is done by
checking if this expected frequency is the same as the actual observed
frequency for that cell in the table.
If all expected frequencies are equal (or very close) to the corresponding
observed frequencies, then the squared differences between them (hence
the name CHI-SQUARE), each divided by the expected frequency and summed
over all cells, will give a very low value. In such a case, we conclude
the two parameters are independent (or not related).
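Written out, the expected frequency and the test statistic take the form below. The notation is an assumption introduced for this sketch: O_ij is the observed count in row i and column j, R_i and C_j are the row and column sums, and N is the grand total.

```latex
% Expected frequency of cell (i, j) from the multiplication law
E_{ij} = p_A \, p_B \, N
       = \frac{R_i}{N} \cdot \frac{C_j}{N} \cdot N
       = \frac{R_i \, C_j}{N}

% Chi-square statistic: squared differences scaled by the expected frequency
\chi^2 = \sum_{i} \sum_{j} \frac{\left( O_{ij} - E_{ij} \right)^2}{E_{ij}}
```

This shows why the margin sums are all that is needed to build the table of expected frequencies.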
Key Takeaways
When to use, What to watch out for
• Use chi-square to test if two categorical variables are related or independent
• Chi-square test works on the multiplication law of probabilities
• Need to exercise caution when …
  • sample sizes are small, as indicated by more than 20% of the contingency cells having expected values < 5; Fisher's exact test may be more appropriate
  • the data are correlated. When you are looking to test differences in proportions among matched pairs in a before/after scenario, an appropriate choice would be McNemar's test
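The small-sample caution can be checked before running the test. The sketch below computes the share of cells whose expected value is below 5 and flags the table when that share exceeds 20%. The function name `share_of_small_expected` and the counts are hypothetical.

```python
def share_of_small_expected(observed, threshold=5.0):
    """Fraction of cells whose expected frequency is below `threshold`."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    cells = [r * c / n for r in row_sums for c in col_sums]
    small = sum(1 for e in cells if e < threshold)
    return small / len(cells)


# Hypothetical small-sample table of counts.
observed = [[12, 3], [9, 2], [20, 4]]
share = share_of_small_expected(observed)

# More than 20% of cells under 5: consider Fisher's exact test instead.
consider_exact_test = share > 0.20
print(share, consider_exact_test)
```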
Questions? Need More Information?
Feel free to contact us.
SimaFore Inc.
330 E Liberty
Ann Arbor, MI 48104
www.simafore.com
Twitter: complexMan