Worksheet 7: Contingency analysis (frequencies) - 2017

Example of Contingency Analysis

This example uses the Car Poll.CSV sample data table, which contains data collected from car polls. The data include attributes of the individuals polled, such as their sex, marital status, and age, and attributes of the cars they own, such as the country of origin, the size, and the type. Here we want to examine the relationship between car size (small, medium, and large) and the car's country of origin.

1) Graph the relationship between size of car (size) and country of the car's origin (country). Note that both are categorical variables and that you will be graphing the frequency of occurrence of each combination of size and country.
   a. Using Graph Builder, put country on the x-axis and size as an overlay variable.
   b. Make sure you select N as the summary statistic.
   c. What hypothesis is being tested (the null hypothesis)?
   d. What are the qualitative results?

2) Test your hypothesis using the Contingency platform (Fit Y by X).
   a. To launch the Fit Y by X platform, select Analyze > Fit Y by X.
   b. Put country in the X category and size in the Y category, so that the output matches the description below. Note both are categorical.
   c. Look at the output.
      i. The graph is a mosaic plot. A mosaic plot is a graphical representation of the two-way frequency table, or Contingency Table. It is divided into rectangles so that the height of each rectangle is proportional to the proportion of each level of the Y variable within that level of the X variable. The width of each rectangle is proportional to the proportion of observations in that level of the X variable.
         1) The proportions on the x-axis represent the number of observations for each level of the X variable, which is country.
         2) The proportions on the y-axis at right represent the overall proportions of Small, Medium, and Large cars for the combined levels (American, European, and Japanese).
         3) The scale of the y-axis at left shows the response probability, with the whole axis being a probability of one (representing the total sample).
      ii. Now look at the Contingency table. Note the following about Contingency tables: The Count, Total%, Col%, and Row% correspond to the data within each cell that has row and column headings (such as the cell under American and Large). The last column contains the total counts and percentages for each row. The bottom row contains the total counts and percentages for each column.
      iii. Now look at the Tests.
         1) What is your conclusion concerning your null hypothesis?
         2) If the null hypothesis is rejected, which combination of country of origin and size of car contributes most to that conclusion?
            i. Click on the red triangle next to the contingency table, then on Deviation and Cell Chi Square. The Deviation value is the difference between the observed count and the count expected if the null hypothesis is true (observed - expected). The cell chi-square, (observed - expected)^2 / expected, is the contribution of the cell to the overall chi-square value. Both values would be zero if the observed counts matched the expected counts exactly. Deviation values have direction (positive or negative); chi-square values do not.

Example of GENERALIZED (Poisson) Regression

This example uses the same data but in a different framework that allows you to look in more detail at the distributional patterns. In particular, you can assess all three terms: Size, Country, and the interaction between Size and Country. You need to restructure the data set to do this.

1) Use TABLES > SUMMARY, put Country and Size in the Group box, and leave the STATISTICS box empty. Run the summary.
2) Your new data table will have three columns: Size, Country, and N Rows.
   N Rows is the frequency of observations for each combination of Size and Country.
3) Using the new table, go to ANALYZE > FIT MODEL and put N Rows in the Y box and Size, Country, and Country*Size in the Model Effects box.
4) Use PERSONALITY = GENERALIZED and DISTRIBUTION = POISSON.
5) Run the model and open EFFECTS TEST.
   a. Which terms are significant?
   b. Do the results make sense? Are they more informative than the simple Contingency Analysis?
6) Now click the red triangle next to MAXIMUM LIKELIHOOD and click on Profilers > Profiler. Play with Country and see if the interaction between Country and Size makes sense.

USE of KS test to compare distributions

Here we want to compare the size frequency distributions of abalone for two sample areas: one in a Marine Protected Area (MPA) and another in an area of No Protection. We can do this using a KS (Kolmogorov-Smirnov) test.

1) Open "Abalone frequencies by MPA status". There are three columns: Status, Size (mm), and Frequency, which is the number of observations of abalone of a given size in an area of given status.
2) Now use FIT Y by X and put Size in the Y box and Status in the X box. Also put Frequency in the FREQ box. Click OK.
3) From the results, click on the red triangle, then on NONPARAMETRIC, then on KOLMOGOROV SMIRNOV TEST. What is the p-value? Is it significant?
4) You may be wondering what the test is actually doing. This may help:
   a. Click on the red triangle again and this time click on CDF PLOT. Now do the same for DENSITIES > COMPARE DENSITIES and DENSITIES > PROPORTION OF DENSITIES.
   b. The KS test compares the cumulative distribution functions (CDFs) of the two groups using the distance between them (D). If the maximum D is large enough, the two distributions are judged to differ. What was your hypothesis? Probably that the CDF for the MPA was lower than that for no protection, because that would indicate that there were more large individuals in the MPA (ask if this does not make sense).
      Based on the CDF, D, and p-value, is this hypothesis supported?
   c. Now look at the COMPARE DENSITIES and PROPORTION OF DENSITIES graphs. Do these also support your hypothesis?
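
To see what the KS test in step 4 is measuring, the distance D can be computed by hand from a frequency table laid out like the abalone file (size class plus a count per status). The sketch below is only an illustration in Python, not part of the JMP workflow: the function names are hypothetical, the counts are made up for demonstration and are not the abalone data, and it computes only D, not the p-value that JMP reports.

```python
from bisect import bisect_right
from itertools import accumulate


def ecdf_from_frequencies(sizes, freqs):
    """Build an empirical CDF function from size classes and their counts."""
    pairs = sorted(zip(sizes, freqs))
    xs = [s for s, _ in pairs]
    total = sum(f for _, f in pairs)
    # Running proportion of individuals at or below each size class.
    cum = [c / total for c in accumulate(f for _, f in pairs)]

    def cdf(x):
        k = bisect_right(xs, x)  # number of size classes <= x
        return cum[k - 1] if k else 0.0

    return cdf


def ks_distance(sizes_a, freqs_a, sizes_b, freqs_b):
    """Return (D, size_at_D, signed_gap), where signed_gap = CDF_a - CDF_b."""
    cdf_a = ecdf_from_frequencies(sizes_a, freqs_a)
    cdf_b = ecdf_from_frequencies(sizes_b, freqs_b)
    # The CDFs only change at observed sizes, so checking those is enough.
    grid = sorted(set(sizes_a) | set(sizes_b))
    gaps = [(cdf_a(x) - cdf_b(x), x) for x in grid]
    signed_gap, size_at_d = max(gaps, key=lambda g: abs(g[0]))
    return abs(signed_gap), size_at_d, signed_gap


# Made-up counts for illustration only (NOT the abalone data).
sizes_mm = [100, 120, 140, 160, 180]
mpa_counts = [2, 4, 8, 10, 6]            # hypothetical MPA sample
no_protection_counts = [8, 10, 6, 4, 2]  # hypothetical No Protection sample

d, size_at_d, signed_gap = ks_distance(
    sizes_mm, mpa_counts, sizes_mm, no_protection_counts
)
print(f"D = {d:.3f} at size {size_at_d} mm, MPA minus No Protection = {signed_gap:.3f}")
```

With these made-up counts the largest gap is negative, meaning the MPA curve lies below the No Protection curve at that size. That is the pattern the hypothesis in step 4b predicts when the MPA holds more large individuals.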