Cross Tabulation and Chi Square Test for Independence

Cross-tabulation
• Helps answer questions about whether two or more variables of interest are linked:
  – Is the type of mouthwash user (heavy or light) related to gender?
  – Is the preference for a certain flavor (cherry or lemon) related to the geographic region (north, south, east, west)?
  – Is income level associated with gender?
• Cross-tabulation determines association, not causality.

Dependent and Independent Variables
• The variable being studied is called the dependent variable or response variable.
• A variable that influences the dependent variable is called an independent variable.

Cross-tabulation
• Cross-tabulation of two or more variables is possible if the variables are discrete:
  – The frequency of one variable is subdivided by the categories of the other variable.
• Generally a cross-tabulation table has:
  – Row percentages
  – Column percentages
  – Total percentages
• Which one is better? It depends on which variable is considered independent.

Cross tabulation: GROUPINC * Gender Crosstabulation

GROUPINC            Statistic            Female    Male      Total
income <= 5         Count                10        9         19
                    % within GROUPINC    52.6%     47.4%     100.0%
                    % within Gender      55.6%     18.8%     28.8%
                    % of Total           15.2%     13.6%     28.8%
5 < income <= 10    Count                5         25        30
                    % within GROUPINC    16.7%     83.3%     100.0%
                    % within Gender      27.8%     52.1%     45.5%
                    % of Total           7.6%      37.9%     45.5%
income > 10         Count                3         14        17
                    % within GROUPINC    17.6%     82.4%     100.0%
                    % within Gender      16.7%     29.2%     25.8%
                    % of Total           4.5%      21.2%     25.8%
Total               Count                18        48        66
                    % within GROUPINC    27.3%     72.7%     100.0%
                    % within Gender      100.0%    100.0%    100.0%
                    % of Total           27.3%     72.7%     100.0%

Contingency Table
• A contingency table shows the joint distribution of two discrete variables.
• This distribution represents the probability of observing a case in each cell.
  – Probability is calculated as:

$$P = \frac{\text{Observed cases}}{\text{Total cases}}$$

Chi-square Test for Independence
• The chi-square test for independence determines whether two variables are associated or not.
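The row, column, and total percentages in the GROUPINC by Gender crosstabulation can be reproduced from the raw counts alone. The following is a minimal sketch in plain Python; the function name `crosstab_percentages` and the dictionary layout are illustrative choices, not part of the original material. The "total" percentage of a cell is simply the cell probability P = observed cases / total cases, expressed as a percentage.

```python
# Observed counts from the GROUPINC * Gender crosstabulation.
# Each row lists the counts for [Female, Male].
counts = {
    "income <= 5": [10, 9],
    "5 < income <= 10": [5, 25],
    "income > 10": [3, 14],
}


def crosstab_percentages(counts):
    """Return row, column, and total percentages for a table of counts."""
    n = sum(sum(row) for row in counts.values())              # total cases
    col_totals = [sum(col) for col in zip(*counts.values())]  # per-column totals
    result = {}
    for label, row in counts.items():
        row_total = sum(row)
        result[label] = {
            # Share of each cell within its row (% within GROUPINC).
            "row_pct": [100 * c / row_total for c in row],
            # Share of each cell within its column (% within Gender).
            "col_pct": [100 * c / t for c, t in zip(row, col_totals)],
            # Share of each cell in the whole sample (% of Total).
            "total_pct": [100 * c / n for c in row],
        }
    return result


percentages = crosstab_percentages(counts)
for label, stats in percentages.items():
    print(label)
    for name, values in stats.items():
        print("   ", name, [f"{v:.1f}%" for v in values])
```

Which of the three percentages to read depends on the question: if gender is treated as the independent variable, the column percentages (% within Gender) are the natural comparison.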
H0: The two variables are independent
H1: The two variables are not independent

Chi-square test results are unstable if a cell's expected count is lower than 5.

Chi-Square Test
Estimated cell frequency:

$$E_{ij} = \frac{R_i C_j}{n}$$

R_i = total observed frequency in the ith row
C_j = total observed frequency in the jth column
n = sample size
E_ij = estimated cell frequency

Chi-Square statistic

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

$\chi^2$ = chi-square statistic
O_i = observed frequency in the ith cell
E_i = expected frequency in the ith cell

Degrees of Freedom
d.f. = (R - 1)(C - 1)

Awareness of Tire Manufacturer's Brand (each cell: observed/expected)

            Men      Women    Total
Aware       50/39    10/21    60
Unaware     15/26    25/14    40
Total       65       35       100

Chi-Square Test: Differences Among Groups Example

$$\chi^2 = \frac{(50-39)^2}{39} + \frac{(10-21)^2}{21} + \frac{(15-26)^2}{26} + \frac{(25-14)^2}{14}$$

$$\chi^2 = 3.102 + 5.762 + 4.654 + 8.643 = 22.161$$

$$d.f. = (R-1)(C-1) = (2-1)(2-1) = 1$$

$\chi^2$ with 1 d.f. at the .05 level: critical value = 3.84

Chi-square Test for Independence
• Under H0, the test statistic approximately follows the chi-square ($\chi^2$) distribution.
• [Figure: chi-square distribution with 1 d.f.; the critical value is 3.84 and the computed $\chi^2$ = 22.16 falls in the rejection region, so H0 is rejected.]

Differences Between Groups when Comparing Means
• Ratio scaled dependent variables
• t-test
  – When groups are small
  – When population standard deviation is unknown
• z-test
  – When groups are large

Null Hypothesis About Mean Differences Between Groups

$$\mu_1 = \mu_2 \quad \text{or} \quad \mu_1 - \mu_2 = 0$$

t-Test for Difference of Means

$$t = \frac{\text{mean 1} - \text{mean 2}}{\text{variability of random means}}$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$

$\bar{X}_1$ = mean for Group 1
$\bar{X}_2$ = mean for Group 2
$S_{\bar{X}_1 - \bar{X}_2}$ = the pooled or combined standard error of the difference between means
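To make the t ratio concrete, here is a minimal sketch in Python. It assumes the standard pooled form of the standard error of the difference between means, and it uses the figures of this section's worked example, read here as group means 16.5 and 12.2, standard deviations 2.1 and 2.6, and sample sizes 21 and 14 (a reading that reproduces the stated standard error of .797). The function names are illustrative.

```python
import math


def pooled_standard_error(s1, s2, n1, n2):
    """Pooled standard error of the difference between two group means.

    s1 and s2 are the group standard deviations (so s1**2 is the variance
    of Group 1), and n1 and n2 are the group sample sizes.
    """
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (1 / n1 + 1 / n2))


def t_statistic(mean1, mean2, s1, s2, n1, n2):
    """t ratio: difference of the means over the pooled standard error."""
    return (mean1 - mean2) / pooled_standard_error(s1, s2, n1, n2)


# Figures assumed from the worked example (see the lead-in above).
se = pooled_standard_error(2.1, 2.6, 21, 14)
t = t_statistic(16.5, 12.2, 2.1, 2.6, 21, 14)
df = 21 + 14 - 2  # d.f. = n - k, with n = n1 + n2 and k = 2 groups

print(f"standard error = {se:.3f}, t = {t:.3f}, d.f. = {df}")
```

The helper takes standard deviations rather than variances only because the worked example states them that way; passing variances directly would be an equally reasonable design.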
Pooled Estimate of the Standard Error

$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

$S_1^2$ = the variance of Group 1
$S_2^2$ = the variance of Group 2
n_1 = the sample size of Group 1
n_2 = the sample size of Group 2

Degrees of Freedom
• d.f. = n - k
• where:
  – n = n_1 + n_2
  – k = number of groups

t-Test for Difference of Means Example

$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(20)(2.1)^2 + (13)(2.6)^2}{33}\right)\left(\frac{1}{21} + \frac{1}{14}\right)} = .797$$

$$t = \frac{16.5 - 12.2}{.797} = \frac{4.3}{.797} = 5.395$$

Comparing Two Groups when Comparing Proportions
• Percentage comparisons
• Sample proportion: p
• Population proportion: $\pi$

Differences Between Two Groups when Comparing Proportions
The hypothesis is:

$$H_0: \pi_1 = \pi_2$$

which may be restated as:

$$H_0: \pi_1 - \pi_2 = 0$$

Z-Test for Differences of Proportions

$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$$

p_1 = sample proportion of successes in Group 1
p_2 = sample proportion of successes in Group 2
($\pi_1 - \pi_2$) = hypothesized population proportion 1 minus hypothesized population proportion 2
$S_{p_1 - p_2}$ = pooled estimate of the standard error of the difference of proportions

$$S_{p_1 - p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

$\bar{p}$ = pooled estimate of the proportion of successes in a sample of both groups
$\bar{q}$ = (1 - $\bar{p}$), the pooled estimate of the proportion of failures in a sample of both groups
n_1 = sample size for Group 1
n_2 = sample size for Group 2

$$\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

A Z-Test for Differences of Proportions: Example

$$\bar{p} = \frac{(100)(.35) + (100)(.4)}{100 + 100} = .375$$

$$S_{p_1 - p_2} = \sqrt{(.375)(.625)\left(\frac{1}{100} + \frac{1}{100}\right)} = .068$$

Analysis of Variance
Hypothesis when comparing three groups:

$$\mu_1 = \mu_2 = \mu_3$$

Analysis of Variance F-Ratio

$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$

Analysis of Variance Sum of Squares

$$SS_{total} = SS_{within} + SS_{between}$$

Analysis of Variance Sum of Squares Total

$$SS_{total} = \sum_{i=1}^{n}\sum_{j=1}^{c}(X_{ij} - \bar{X})^2$$

X_ij = individual scores, i.e., the ith observation or test unit in the jth group
$\bar{X}$ = grand mean
n = number of observations or test units in a group
c = number of groups (or columns)

Analysis of Variance Sum of Squares Within

$$SS_{within} = \sum_{i=1}^{n}\sum_{j=1}^{c}(X_{ij} - \bar{X}_j)^2$$

X_ij = individual scores, i.e., the ith observation or test unit in the jth group
$\bar{X}_j$ = mean of the jth group
n = number of observations or test units in a group
c = number of groups (or columns)

Analysis of Variance Sum of Squares Between

$$SS_{between} = \sum_{j=1}^{c} n_j(\bar{X}_j - \bar{X})^2$$

$\bar{X}_j$ = mean of the jth group
$\bar{X}$ = grand mean
n_j = number of observations or test units in the jth group

Analysis of Variance Mean Squares Between

$$MS_{between} = \frac{SS_{between}}{c - 1}$$

Analysis of Variance Mean Square Within

$$MS_{within} = \frac{SS_{within}}{cn - c}$$

Analysis of Variance F-Ratio

$$F = \frac{MS_{between}}{MS_{within}}$$

A Test Market Experiment on Pricing
Sales in Units (thousands)

                           Regular Price $.99    Reduced Price $.89    Cents-Off Coupon, Regular Price
Test Market A, B, or C     130                   145                   153
Test Market D, E, or F     118                   143                   129
Test Market G, H, or I     87                    120                   96
Test Market J, K, or L     84                    131                   99
Mean                       104.75                134.75                119.25

Grand mean = 119.58

ANOVA Summary Table

Source of variation    Sum of squares    Degrees of freedom    Mean square
Between groups         SS_between        c - 1                 MS_between = SS_between / (c - 1)
Within groups          SS_within         cn - c                MS_within = SS_within / (cn - c)
Total                  SS_total          cn - 1

where c = number of groups and n = number of observations in a group.

$$F = \frac{MS_{between}}{MS_{within}}$$
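The sums of squares, mean squares, and F-ratio can be traced through the test market data step by step. The sketch below is a minimal plain-Python illustration, not the original author's computation. It assumes the sales figures are read column by column from the test market table (a reading that reproduces the stated group means of 104.75, 134.75, and 119.25), and the function name `one_way_anova` is hypothetical.

```python
# Sales in units (thousands), one list per pricing treatment.
# Assumption: values are read column by column from the test market table.
sales = {
    "regular price": [130, 118, 87, 84],
    "reduced price": [145, 143, 120, 131],
    "cents-off coupon": [153, 129, 96, 99],
}


def one_way_anova(groups):
    """Compute the one-way ANOVA summary quantities for a dict of groups."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Between: group size times squared distance of group mean from grand mean.
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        # Within: squared distance of each score from its own group mean.
        ss_within += sum((x - group_mean) ** 2 for x in values)

    c = len(groups)
    df_between = c - 1
    df_within = len(all_values) - c  # equals cn - c when every group has n scores
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "grand_mean": grand_mean,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


anova = one_way_anova(sales)
for name, value in anova.items():
    print(f"{name:>12}: {value:.3f}")
```

The resulting F would then be compared with the critical F value for (c - 1, cn - c) = (2, 9) degrees of freedom at the chosen significance level; that lookup is left out of the sketch.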