Contingency table test for multiple categories with arbitrary weights on the cells
Ryo Yamada(1), Takahisa Kawaguchi(2)
Statistical Genetics(1) and Human Disease Genomics(2), Center for Genomic Medicine, Department of Medicine, Kyoto University, Kyoto, Japan
Contact address: [email protected]

Motivation
How to test a complex categorical table
Example: Complex Categorical Phenotype

Abstract
The test of independence between two factors is one of the most basic tests in many fields. In genetics, it is routinely used to test genotype-phenotype correlation. The simplest case of genotype-phenotype correlation involves two phenotype categories (cases vs. controls) and the three diplotypes of a diallelic genetic variant, i.e., a 2x3 table test. Even for this simple table, multiple tests are available: Pearson's test with two degrees of freedom (df) and the genetic mode-specific tests with 1 df (dominant, recessive and additive tests). The MAX3 test and the MAX test, which are based on the three mode-specific tests, are also sometimes used. All of these tests analyze the same table, yet they give different results because they weight the cells of the table differently. In recent genetic epidemiology studies, phenotypes can be multi-categorical, reflecting disease stages or grades and combinations of multiple disease criteria, and genotypes can have more than three categories. It is therefore useful to handle NxM tables in the same way as the genetic mode-specific tests handle 2x3 tables. To generalize the approach from 2x3 tables to NxM tables, we examined the structure of the tests geometrically. For 2x3 tables, the contours of Pearson's 2-df statistic are ellipses in a two-dimensional space, whereas the contours of the 1-df tests are straight lines.
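The 2x3 case described above can be made concrete with a short sketch. It assumes the common large-sample form of the Cochran-Armitage trend statistic, N*r^2, where r is the correlation between case status and a genotype score, and it assumes genotype columns ordered AA, Aa, aa with "a" as the risk allele, so the additive, dominant and recessive modes use scores (0, 1, 2), (0, 1, 1) and (0, 0, 1). The function names and the 2x3 counts are hypothetical and chosen only for illustration; this is not presented as the authors' implementation.

```python
import math  # not strictly needed; kept for users extending the sketch

# Assumption: columns are genotypes AA, Aa, aa and allele "a" is the risk allele.
MODES = {
    "additive": (0, 1, 2),
    "dominant": (0, 1, 1),
    "recessive": (0, 0, 1),
}


def pearson_chi2(table):
    """Pearson's chi-square statistic (2 df for a 2x3 table)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for r_i, row in zip(rows, table):
        for c_j, obs in zip(cols, row):
            exp = r_i * c_j / n  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat


def trend_chi2(table, scores):
    """1-df trend statistic N * r^2 for a 2xK case-control table.

    Row 0 holds cases (y = 1), row 1 controls (y = 0); column j has score x_j.
    Assumed large-sample (N * r^2) form of the Cochran-Armitage trend test.
    """
    cases, controls = table
    cols = [a + b for a, b in zip(cases, controls)]
    n = sum(cols)
    mean_x = sum(c * x for c, x in zip(cols, scores)) / n
    mean_y = sum(cases) / n
    var_x = sum(c * x * x for c, x in zip(cols, scores)) / n - mean_x ** 2
    var_y = mean_y - mean_y ** 2  # y is 0/1, so E[y^2] = E[y]
    cov = sum(a * x for a, x in zip(cases, scores)) / n - mean_x * mean_y
    return n * cov ** 2 / (var_x * var_y)


def max3(table):
    """MAX3: the largest of the three mode-specific 1-df statistics."""
    return max(trend_chi2(table, s) for s in MODES.values())


if __name__ == "__main__":
    table = [[10, 5, 0],   # hypothetical cases (AA, Aa, aa)
             [0, 5, 10]]   # hypothetical controls
    print("Pearson (2 df):", pearson_chi2(table))
    for name, s in MODES.items():
        print(f"{name} (1 df):", trend_chi2(table, s))
    print("MAX3:", max3(table))
```

For this hypothetical table the script prints approximately 20 for Pearson's test and the additive test, 15 for the dominant and recessive tests, and 20 for MAX3. For a 2xK table, N*r^2 can never exceed Pearson's statistic, which matches the picture of straight 1-df contour lines that touch, but never cut inside, the Pearson ellipse of the same value.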
We generalized the two-dimensional picture to a df-dimensional one for NxM tables, where df = (N-1)x(M-1), and derived a method to calculate test statistics for NxM tables with arbitrary weights on the cells.

Disease categories C1 to C5 are defined by combinations of four disease criteria, R1 to R4. Genotype counts by disease category:

               C1     C2     C3     C4     C5   total
Genotype MM   200    180     90     90    270     830
         Mm  1260    840    420    420   1260    4200
         mm  1470    980    490    490   1470    4900
total        2930   2000   1000   1000   3000    9930

Tests are described with ellipsoids and spheres
Tables with the same χ2 value for the (N-1)(M-1)-df test (Pearson's test) lie on an ellipsoid, and tables with the same χ2 value for a 1-df test lie on parallel lines (hyperplanes in higher dimensions). After a change of coordinates these contours become spheres and tangent lines, respectively, so the values of the statistics are given geometrically.

Figure panels: Weight matrix of an NxM table; (N-1)(M-1) component test matrices of the MAX test for NxM tables.

2x3 table: MAX3 test and MAX test and their extension to higher dimensions
The three mode-specific tests (dominant, additive and recessive modes) draw three sets of parallel contour lines. The MAX3 test takes the maximum statistic among the three mode tests, so its contour lines consist of pieces of these three sets of parallel lines. The MAX test smooths the contour lines of the MAX3 test into arcs between the dominant and recessive modes.

Figure: Distribution of the statistics compared with the χ2 distributions with one and with (N-1)(M-1) degrees of freedom.

Extension to 2x4 table
Black dots: observed tables. Green triangles on the surface: the area of the continuous model. Tips of the green triangles: the surface normals for the discrete model. Red arcs: the shortest paths from the observed tables to the model. In the "discrete MAX test" the arcs concentrate on the tips; otherwise the arcs reach the edges or the tips of the model area.

References
Yamada, R. & Okada, Y. An optimal dose-effect mode trend test for SNP genotype tables. Genet Epidemiol 33, 114-127 (2009).
Hirosawa, K., Kawaguchi, T., Matsuda, F. & Yamada, R. Estimation of P-value of MAX test with double triangle diagram for 2x3 SNP case-control tables. Genet Epidemiol (in press).
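The geometric description of the NxM method (Pearson's statistic as a sphere in df-dimensional space, a weighted 1-df test as tangent planes) can be sketched in code, although the poster gives no formulas. The sketch below assumes the standard large-sample geometry of Pearson residuals: z_ij = (O_ij - E_ij)/sqrt(E_ij) lies in an (N-1)(M-1)-dimensional subspace and X^2 = |z|^2 is its squared radius. A weight matrix W defines the direction d = P(W * sqrt(E)) inside that subspace, and the 1-df statistic is the squared projection (z.d)^2/|d|^2, which can never exceed X^2. This may not be the authors' exact formulation. The function names and the weight matrix in the usage lines are hypothetical; the counts are the example table above.

```python
import math

# Sketch under the large-sample Pearson-residual geometry (an assumption);
# not necessarily the authors' exact formulation.


def expected_counts(table):
    """E_ij = r_i * c_j / n under independence with the observed margins."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r_i * c_j / n for c_j in cols] for r_i in rows], rows, cols, n


def project(mat, rows, cols, n):
    """Project an NxM array onto the (N-1)(M-1)-dimensional subspace of
    Pearson residuals: remove the component along v_j = sqrt(c_j/n) from
    every row, then along u_i = sqrt(r_i/n) from every column."""
    u = [math.sqrt(r / n) for r in rows]
    v = [math.sqrt(c / n) for c in cols]
    out = []
    for row in mat:
        d = sum(x * v_j for x, v_j in zip(row, v))
        out.append([x - d * v_j for x, v_j in zip(row, v)])
    for j in range(len(cols)):
        d = sum(out[i][j] * u[i] for i in range(len(rows)))
        for i in range(len(rows)):
            out[i][j] -= d * u[i]
    return out


def pearson_and_weighted(table, weights):
    """Return (Pearson X^2 on (N-1)(M-1) df, 1-df statistic for `weights`)."""
    exp, rows, cols, n = expected_counts(table)
    # Pearson residuals; their squared length is Pearson's X^2.
    z = [[(o - e) / math.sqrt(e) for o, e in zip(row_o, row_e)]
         for row_o, row_e in zip(table, exp)]
    pearson = sum(x * x for row in z for x in row)
    # sum_ij w_ij O_ij deviates from its null mean by a . z, a_ij = w_ij sqrt(E_ij).
    a = [[w * math.sqrt(e) for w, e in zip(row_w, row_e)]
         for row_w, row_e in zip(weights, exp)]
    d = project(a, rows, cols, n)
    norm2 = sum(x * x for row in d for x in row)
    scale = sum(x * x for row in a for x in row)
    if norm2 <= 1e-12 * scale:
        raise ValueError("weights carry no direction inside the df-dimensional space")
    dot = sum(x * y for row_d, row_z in zip(d, z) for x, y in zip(row_d, row_z))
    return pearson, dot * dot / norm2


if __name__ == "__main__":
    # Genotype rows MM, Mm, mm by disease categories C1..C5 (example table).
    table = [[200, 180, 90, 90, 270],
             [1260, 840, 420, 420, 1260],
             [1470, 980, 490, 490, 1470]]
    # Hypothetical weights: copies of M (2, 1, 0) times a C1-vs-C2..C5 contrast.
    weights = [[g * c for c in (0, 1, 1, 1, 1)] for g in (2, 1, 0)]
    x2, s1 = pearson_and_weighted(table, weights)
    print(f"Pearson X^2 = {x2:.2f} on 8 df; weighted statistic = {s1:.2f} on 1 df")
```

Any real-valued NxM weight matrix can be passed in; weights that are constant along rows or columns have no component inside the df-dimensional space and are rejected. With product weights on a 2x3 table (a case indicator times genotype scores), this statistic reduces to the N*r^2 trend statistic, which is one way to check the sketch.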