eBook - The Chi-Square Test
Common Business Analytics Applications
Copyright 2011

Contents: Business Analytics Applications for Chi-Square Test
1 Background: What is the chi-square test?
2 When can the chi-square test be used? 3 common business analytics problems
3 How to implement the chi-square test? A simple 5-step process
4 How does the chi-square test actually work? 3 core concepts which underpin the test
5 Be aware of … 2 assumptions on which the test is based

Background
• The chi-square test of independence can be used to check if two variables are related to each other.
• Functionally it plays a role similar to the coefficient of determination, R².
• However, the chi-square test is designed to work with categorical or nominal data, whereas R² works only on numeric data.

Business Analytics Uses
Three typical applications for the chi-square test

• Geographic relationship to type of product sold
  • More winter boots are sold from a retail outlet located in the upper Midwest than from one in the South.
  • A slightly more complicated example: check if the type of gasoline sold in a neighborhood is indicative of the median income in the locality.
  • So variable 1 would be the type of gasoline and variable 2 would be income ranges (e.g. 41k-50k, etc.).

• Effect of product mix change (% of upscale, % mid-range and % volume items)
  • Compare sales revenues of each product type before and after the change in product mix.
  • Thus the categories in variable 1 would include all the product types and the categories in variable 2 would include period 1 and period 2.

• Verify the influence of gender on purchase decisions
  • Are men the primary decision makers when it comes to purchasing big-ticket items?
  • Is gender a factor in color preference of a car?
  • Here variable 1 would be gender and variable 2 would be color.
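In each of the applications above, the raw data are pairs of categories, and the first task is to count how often each combination occurs. The following is a minimal sketch of that counting step for the gender and car color example. The records, the function name `cross_tabulate`, and the category labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred car color). Illustrative only.
records = [
    ("female", "red"), ("female", "blue"), ("male", "blue"),
    ("male", "black"), ("female", "red"), ("male", "black"),
    ("male", "red"), ("female", "blue"),
]


def cross_tabulate(pairs):
    """Count occurrences of each (variable 1, variable 2) combination.

    Returns the sorted row labels, the sorted column labels, and a table
    of counts with one row per variable 1 category.
    """
    counts = Counter(pairs)
    row_labels = sorted({first for first, _ in pairs})
    col_labels = sorted({second for _, second in pairs})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


row_labels, col_labels, table = cross_tabulate(records)

# Print the table of observed counts with its labels.
print("gender", col_labels)
for label, row in zip(row_labels, table):
    print(label, row)
```

The resulting table of observed counts is the starting point for the 5-step process described next.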
Implementing the chi-square test
A 5-step process to actually apply the test
• Step 1: Identify the two variables of interest from the data table
• Step 2: Compute margin summations
• Step 3: Build the contingency table
• Step 4: Compute the observed chi-square value
• Step 5: Compare the observed value to the critical chi-square value

Business Problem
A specialty retail chain wants to determine if their strategy for changing the product mix has resulted in increased revenues. Their products are categorized into eight types according to price range. The category prices range from $30 per item to $120+ per item. Management decided that in order to increase sales, they need to reduce their higher priced inventory ($120+ range) by 50%. Has their strategy worked?

Step 1: Identify the X and Y variables
• Convention dictates that the X's are usually the parameters that can be changed or controlled.
• In this case, X is the strategy, and its data are the columns, which represent all sales before the strategy change and after the strategy change.
• Therefore the Y's are the sales by category, whose data are the rows, which represent the different price categories.

Step 2: Compute the margin summations
Simply sum each row and each column and enter these sums on the "margins" of the table.

Step 3: Complete the contingency table
The contingency table has the same dimensions as the data table from Step 1. Each cell holds the frequency that would be expected if X and Y were independent: (row total × column total) / grand total.
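Steps 2 and 3 can be sketched in a few lines. The example below assumes that each cell of the table built in Step 3 holds the expected frequency (row total × column total) / grand total. The sales figures are hypothetical (three price categories rather than the eight in the business problem), and the function names are illustrative.

```python
def margins(table):
    """Step 2: return row totals, column totals, and the grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


def expected_frequencies(table):
    """Step 3: expected count per cell if rows and columns were independent."""
    row_totals, col_totals, grand_total = margins(table)
    return [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]


# Hypothetical units sold. Rows: three price categories.
# Columns: before the strategy change, after the strategy change.
observed = [
    [10, 20],
    [30, 40],
    [20, 80],
]

row_totals, col_totals, grand_total = margins(observed)
expected = expected_frequencies(observed)

print("row totals:", row_totals)
print("column totals:", col_totals)
for row in expected:
    print(row)
```

Note that the expected table has the same margins as the observed data; only the way the totals are distributed across the cells differs.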
Step 4: Calculate the observed chi-square value
Using the original data and the contingency table from Step 3, compute for each cell the ratio (observed - expected)² / expected. The observed chi-square value is the sum of all these ratios = 0.8539.

Step 5: Compare the observed chi-square value to the "critical" value
• The degrees of freedom are simply (number of rows - 1) * (number of columns - 1) in our original data table: df = (8-1)*(2-1) = 7
• Let us use a 90% level of confidence, which means alpha = 0.1
• Observed chi-square value from Step 4: 0.8539 < critical chi-square value* for 90% confidence and 7 degrees of freedom: 12.017
• Because the observed value is below the critical value, we cannot reject independence: at this confidence level the data show no relationship between the strategy change and the distribution of sales across price categories.
*A table of critical values of chi-square is available at http://www.itl.nist.gov/div898/handbook/eda/section3/eda3674.htm

How does the chi-square test work?
Based on 3 core concepts
Remember that the chi-square test is needed when the data are categorical (or nominal) in nature: for example, if a variable is the type of financial investment, its range of values could be stocks, bonds and cash. Therefore the analysis involves counting occurrences (of stocks, bonds and cash) and comparing variables (type of investment, customer demographics, etc.) based on occurrences.
• Concept 1: Check if frequencies of occurrences are correlated
• Concept 2: Multiplication law of probability
• Concept 3: Joint probability of two events
Thus the chi-square test works by keeping track of frequencies of occurrences.
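To make Steps 4 and 5 of the procedure concrete, here is a minimal sketch that computes the observed chi-square value and compares it to a critical value. The observed counts are hypothetical, not the retail chain's data, and the function names are illustrative. The critical values are the standard upper-tail values for alpha = 0.10, hard-coded for 1 to 7 degrees of freedom so that the sketch needs no statistics library.

```python
def chi_square_statistic(observed):
    """Step 4: sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic


def degrees_of_freedom(observed):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Critical chi-square values for alpha = 0.10 (90% confidence), keyed by df.
CRITICAL_AT_ALPHA_0_10 = {
    1: 2.706, 2: 4.605, 3: 6.251, 4: 7.779,
    5: 9.236, 6: 10.645, 7: 12.017,
}

# Hypothetical counts. Rows: price categories. Columns: before / after.
observed = [
    [10, 20],
    [30, 40],
    [20, 80],
]

statistic = chi_square_statistic(observed)
df = degrees_of_freedom(observed)
critical = CRITICAL_AT_ALPHA_0_10[df]

# Step 5: independence is rejected only if the statistic exceeds the critical value.
reject_independence = statistic > critical
print(f"chi-square = {statistic:.4f}, df = {df}, critical = {critical}")
print("reject independence:", reject_independence)
```

For the eBook's 8 x 2 table, `degrees_of_freedom` gives 7 and the lookup returns 12.017, so the observed value of 0.8539 would not lead to rejecting independence.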
Concept 1: Check if frequencies of occurrences are correlated
The chi-square test checks if the frequencies of occurrences across any pair of variables, such as type of investment and customer demographic, are correlated. Thus it is simply a means for comparing "categorical correlations".

Concept 2: Multiplication law of probability
If event A (purchasing stocks) happens, what is the probability that event B (age being 35-44) also happens? The multiplication law of probability states that if event A is independent of event B, then the probability of A and B happening together is simply (pA * pB).

Concept 3: Joint probability of two events
Each cell in the contingency table first computes this joint probability. The next step is to convert this joint probability into an "expected frequency", which is simply (pA * pB * N), where N is the sum of all occurrences in the dataset. The test of independence between any two parameters is done by checking if this expected frequency is the same as the actual observed frequency for that cell in the table. If all expected frequencies are equal (or very close) to the corresponding observed frequencies, then the squared differences between them (hence the name chi-square) will be very low. In such a case, we conclude the two parameters are independent (or not related).

Key Takeaways
When to use, what to watch out for
• Use chi-square to test if two categorical variables are related or independent
• The chi-square test works on the multiplication law of probabilities
Need to exercise caution when …
• Sample sizes are small, as indicated by more than 20% of the contingency cells having expected values < 5; a Fisher's exact test may be more appropriate.
• The data are correlated.
  When you are looking to test differences in proportions among matched pairs in a before/after scenario, an appropriate choice would be McNemar's test.

Questions? Need More Information? Feel free to contact us.
SimaFore Inc.
[email protected]
330 E Liberty
Ann Arbor, MI 48104
www.simafore.com
Twitter: complexMan