Learning Hierarchical Riffle Independent Groupings from Rankings
Jonathan Huang, Carlos Guestrin (Carnegie Mellon University)
ICML 2010, Haifa, Israel

American Psychological Association Elections
- Each ballot in election data is a ranked list of candidates [Diaconis, 89]
- 1980 election: 5,738 full ballots over 5 candidates (William Bevan, Ira Iscoe, Charles Kiesler, Max Siegle, Logan Wright)
- First-order matrix: Prob(candidate i was ranked j)
- Candidate 3 has the most 1st-place votes, and also many last-place votes
[Figure: probability distribution over the 120 permutations, and the 5x5 first-order matrix over candidates and ranks]

Factorial Possibilities
- n items means n! rankings:

  n   | n!          | memory required to store n! doubles
  5   | 120         | < 4 kilobytes
  15  | 1.31x10^12  | 1729 petabytes (!!)

- (Not to mention sample complexity issues...)
- Possible learning biases for taming complexity:
  - Parametric models? (e.g., Mallows, Plackett-Luce, ...)
  - Sparsity? [Reid, 79; Jagabathula, 08]
  - Independence / graphical models? [Huang, 09]

Full Independence on Rankings
- Graphical model for the joint ranking of 6 items {A, B, C, D, E, F}, with {A, B, C} and {D, E, F} independent
- Mutual exclusivity of ranks leads to a fully connected model within each independent set!
- Ranks of {A, B, C} form a permutation of {1, 2, 3}; ranks of {D, E, F} form a permutation of {4, 5, 6}

First-Order Independence Condition
- Sparsity: any permutation putting A, B, or C in ranks 4, 5, or 6 has zero probability!
- But... such sparsity is unlikely to exist in real data
[Figure: first-order matrices Prob(candidate i was ranked j), "not independent" vs. "independent"]

Drawing Independent Fruits/Veggies
- Veggies in ranks {1, 2}, fruits in ranks {3, 4} (veggies always better than fruits)
- Draw veggie rankings and fruit rankings independently:
  - Veggies: Artichoke > Broccoli
  - Fruits: Cherry > Dates
- Full independence: the fruit/veggie ranks are fixed ahead of time!
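The factorial blow-up and the fully independent generative process above can be sketched in a few lines of Python. This is a hypothetical illustration (function names are mine, not the authors'); uniform within-block orders stand in for arbitrary independent distributions:

```python
import math
import random

def table_memory_bytes(n):
    """Memory to store one 8-byte double per permutation of n items."""
    return math.factorial(n) * 8

def sample_fully_independent(top, bottom):
    """Full independence over rankings: items in `top` always occupy the
    first block of ranks, items in `bottom` the remaining ranks, and the
    two within-block orders are drawn independently (uniformly here, as a
    stand-in for arbitrary independent distributions)."""
    top, bottom = list(top), list(bottom)
    random.shuffle(top)     # a permutation of ranks 1..len(top)
    random.shuffle(bottom)  # a permutation of the remaining ranks
    return top + bottom     # joint ranking, best (rank 1) first
```

For example, `sample_fully_independent(("A", "B", "C"), ("D", "E", "F"))` always places A, B, C in ranks 1-3, which is exactly the zero-probability (sparsity) pattern the slide describes; and `table_memory_bytes(5)` is under 4 kB, matching the table, while n = 15 is already far beyond practical storage.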
- Form the joint ranking with the veggie and fruit positions fixed, e.g.:
  - Artichoke > Broccoli > Dates > Cherry
  - Artichoke > Broccoli > Cherry > Dates
  - Broccoli > Artichoke > Dates > Cherry

Riffled Independence
- Veggies: Artichoke > Broccoli; Fruits: Cherry > Dates
- Riffled independence model [Huang, Guestrin, 2009]: draw an interleaving of veggie/fruit positions (according to an interleaving distribution), e.g.:
  - Artichoke > Cherry > Broccoli > Dates

Riffle Shuffles
- Riffle shuffle (dovetail shuffle): cut the deck into two piles, then interleave the piles
- Riffle shuffles correspond to distributions over interleavings

American Psych. Assoc. Election (1980): Best KL Split
- Minimize KL(empirical || riffle independent approx.)
- Best split: {1,2,3,4,5} into {1,3,4,5} and {2}
- Candidate 3 fully independent vs. candidate 2 riffle independent
[Figure: probability distribution over the 120 permutations under the two approximations]

Irish House of Parliament Election 2002
- 64,081 votes, 14 candidates [Gormley, Murphy, 2006]
- Two main parties: Fianna Fail (FF) and Fine Gael (FG)
- Minor parties: Independents (I), Green Party (GP), Christian Solidarity (CS), Labour (L), Sinn Fein (SF)

Approximating the Irish Election
- n = 14 candidates; compare the "true" first-order marginals Prob(candidate i was ranked j) with the riffle independent approximation
- Sinn Fein and Christian Solidarity marginals are not well captured by a single riffle independent split!
- Are the major parties riffle independent of the minor parties?

Back to Fruits and Vegetables
- {banana, apple, orange, broccoli, carrot, lettuce} riffle independent of {candy, cookies}
- But is {banana, apple, orange} riffle independent of {candy, cookies}?? Is {broccoli, carrot, lettuce}??
- Need to factor out {candy, cookies} first!
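The riffled-independence generative step (draw the two rankings independently, then draw an interleaving) can be sketched as follows. This is my own minimal sketch: it uses a uniform interleaving for simplicity, whereas the model described above draws the interleaving from a learned distribution:

```python
import random

def riffle_merge(a, b):
    """Interleave rankings a and b with a uniformly random interleaving,
    preserving the relative order within each ranking (the riffled
    independence generative step; a learned model would use a non-uniform
    interleaving distribution instead of the uniform one used here)."""
    n = len(a) + len(b)
    slots = set(random.sample(range(n), len(a)))  # ranks given to a's items
    it_a, it_b = iter(a), iter(b)
    return [next(it_a) if pos in slots else next(it_b) for pos in range(n)]

veggies = ["Artichoke", "Broccoli"]  # independently drawn veggie ranking
fruits = ["Cherry", "Dates"]         # independently drawn fruit ranking
joint = riffle_merge(veggies, fruits)
# one possible outcome: Artichoke > Cherry > Broccoli > Dates
```

Whatever interleaving is drawn, Artichoke stays ahead of Broccoli and Cherry ahead of Dates: the interleaving only decides how the two blocks shuffle together.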
Hierarchical Decompositions
- All food items: {banana, apple, orange, broccoli, carrot, lettuce, candy, cookies}
  - Healthy foods: {banana, apple, orange, broccoli, carrot, lettuce}
    - Fruits: {banana, apple, orange}
    - Vegetables: {broccoli, carrot, lettuce}
  - Junk food: {candy, cookies}
- Fruits and vegetables are marginally riffle independent
- Generative process for the hierarchy:
  1. Rank the fruits
  2. Rank the vegetables
  3. Interleave fruits with vegetables (yielding a ranking of healthy foods)
  4. Rank the junk food
  5. Interleave healthy foods with junk food

Hierarchical Riffle Independent Models
- Encode intuitive independence constraints
- Can be learned with lower sample complexity
- Have interpretable parameters/structure

Contributions
- Structured representation: introduction of a hierarchical model based on riffled independence factorizations
- Structure learning objectives: an objective function for learning model structure from ranking data
- Structure learning algorithms: efficient algorithms for optimizing the proposed objective

Learning a Hierarchy
- Exponentially many hierarchies, each encoding a distinct set of independence assumptions (e.g., {1,2,3,4,5} split as {1,3,4,5}/{2}, as {1,2,3,4}/{5}, as {2,4,5}/{1,3}, ...)
- Problem statement: given i.i.d. ranked data, learn the hierarchical structure that generated the data
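The generative process above (rank each leaf, then interleave up the hierarchy) is naturally recursive. Below is a sketch under stated assumptions: uniform leaf rankings and uniform interleavings stand in for the learned leaf and interleaving distributions, and the tree encoding is my own:

```python
import random

def riffle_merge(a, b):
    # uniform interleaving preserving the order within a and within b
    slots = set(random.sample(range(len(a) + len(b)), len(a)))
    it_a, it_b = iter(a), iter(b)
    return [next(it_a) if p in slots else next(it_b)
            for p in range(len(a) + len(b))]

def sample_hierarchy(node):
    """A leaf is a list of items (ranked uniformly here, standing in for a
    learned leaf distribution); an internal node is a (left, right) pair
    whose children are ranked recursively and then interleaved."""
    if isinstance(node, list):
        leaf = node[:]
        random.shuffle(leaf)
        return leaf
    left, right = node
    return riffle_merge(sample_hierarchy(left), sample_hierarchy(right))

# The food hierarchy from the slide: (fruits, vegetables) under
# "healthy foods", riffled with "junk food".
tree = ((["banana", "apple", "orange"],
         ["broccoli", "carrot", "lettuce"]),
        ["candy", "cookies"])
ranking = sample_hierarchy(tree)
```

Each internal node contributes one interleaving draw, so the five-step process on the slide corresponds to the three nodes of this tree.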
Learning a Hierarchy (Our Approach)
- Top-down partitioning of the item set X = {1, ..., n}
- Binary splits at each stage, e.g., {1,2,3,4,5} -> {1,2,3,4} and {5}, then {1,2,3,4} -> {2,3,4} and {1}, ...

Hierarchical Decomposition: Best KL Hierarchy (APA Data)
- Minimize KL(empirical || hierarchical model)
- Learned hierarchy: {1,2,3,4,5} -> {1,3,4,5} and {2}; {1,3,4,5} -> {1,3} and {4,5}
- Interpretation: {1,3} research psychologists, {4,5} clinical psychologists, {2} community psychologists
- KL(true, best hierarchy) = 6.76e-02; TV(true, best hierarchy) = 1.44e-01
[Figure: "true" vs. learned first-order marginals over candidates and ranks]

KL-Based Objective Function
- KL objective: minimize KL(true distribution || riffle independent approx.)
- Need to search over exponentially many subsets!
- Algorithm: for each binary partitioning [A, B] of the item set:
  - Estimate the parameters (ranking probabilities for A and B, and interleaving probabilities)
  - Compute the log-likelihood of the data
  - Return the maximum likelihood partition
- If the hierarchical structure of A or B is unknown, we might not have enough samples to learn the parameters!

Finding Fully Independent Sets by Clustering
- Compute pairwise mutual informations, then partition the resulting graph
- But pairwise measures are unlikely to be effective for detecting riffled independence!
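The exhaustive KL/maximum-likelihood search over binary partitions can be sketched as follows. This is a plug-in illustration, not the authors' implementation: all three factors (relative rankings of A, of B, and the interleaving) are estimated by raw empirical frequencies, and the function names are mine:

```python
from collections import Counter
from itertools import combinations
from math import log

def project(ranking, subset):
    """Relative order of the items of `subset` within `ranking`."""
    return tuple(x for x in ranking if x in subset)

def interleaving(ranking, subset):
    """Which absolute ranks are occupied by items of `subset`."""
    return tuple(x in subset for x in ranking)

def log_lik(data, A, B):
    """Log-likelihood of the data under the riffle independent split
    [A, B], with each factor estimated by its empirical (maximum
    likelihood) frequencies."""
    n = len(data)
    total = 0.0
    for stat in (lambda r: project(r, A),
                 lambda r: project(r, B),
                 lambda r: interleaving(r, A)):
        counts = Counter(stat(r) for r in data)
        total += sum(c * log(c / n) for c in counts.values())
    return total

def best_split(data, items):
    """Exhaustive search over binary partitions (exponentially many!)."""
    best = None
    for k in range(1, len(items) // 2 + 1):
        for combo in combinations(items, k):
            A = set(combo)
            B = set(items) - A
            score = log_lik(data, A, B)
            if best is None or score > best[0]:
                best = (score, A, B)
    return best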
Why this won't work (for riffled independence):
- Pairs of candidates on opposite sides of the split can be strongly correlated in general (if I vote a Democrat up, I am likely to vote a Republican down)

Higher-Order Measure of Riffled Independence
- Key insight: riffled independence means absolute rankings in A are not informative about relative rankings in B
- Idea: measure the mutual information between singleton rankings and pairwise rankings, i.e., between the preference over Democrat i and the relative preference over Republicans j and k
- If i and (j, k) lie on opposite sides of the split, the mutual information is 0

Tripletwise Objective Function
- Objective function: sum the tripletwise mutual information terms crossing the candidate split [A, B]
- Triplets with all items in set A play no role in the objective
- The tripletwise measure is no longer obviously a graph cut problem...

Estimating the Objective
- The objective function is not directly available: the mutual information terms must be estimated from samples
- Theorem: if A and B are riffle independent and each "strongly connected", then the objective is minimized exactly at [A, B] with high probability, given polynomially many samples

Efficient Splitting: Anchors Heuristic
- Given two "anchor" elements of A, we can decide whether each remaining element lies in A or B (large vs. small estimated mutual information)
- Theorem: the anchors heuristic is guaranteed to recover the riffle independent split (under certain strong connectivity assumptions)
- In practice, the anchor elements a1, a2 are unknown!
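The tripletwise quantity above admits a simple plug-in estimator: the empirical mutual information between the absolute rank of one item and the binary relative order of two others. A minimal sketch (the function name and test data are mine; finite samples will give small positive values even under exact riffled independence):

```python
from collections import Counter
from math import log

def tripletwise_mi(data, i, j, k):
    """Empirical mutual information between the absolute rank of item i
    and the relative order of items j and k. Under riffled independence
    this is zero whenever i and {j, k} lie on opposite sides of the
    split."""
    n = len(data)
    joint, p_rank, p_rel = Counter(), Counter(), Counter()
    for r in data:
        x = r.index(i)               # absolute rank of i
        y = r.index(j) < r.index(k)  # True iff j is preferred to k
        joint[x, y] += 1
        p_rank[x] += 1
        p_rel[y] += 1
    return sum((c / n) * log(c * n / (p_rank[x] * p_rel[y]))
               for (x, y), c in joint.items())
```

Roughly, the anchors heuristic described above thresholds this kind of quantity: elements whose mutual information with the relative order of the two anchor elements is small are assigned to the opposite side of the split.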
Learning a Chain on Synthetic Data
- 16 items, 4 items in each leaf
- Log-likelihood vs. log10(# samples): the learned structure approaches the true structure, and both dominate a random 1-chain (with learned parameters)

Anchors on Irish Election Data
- Irish election hierarchy (first four splits):
  - {1,...,14} -> {12} (Sinn Fein) and {1,2,3,4,5,6,7,8,9,10,11,13,14}
  - -> {11} (Christian Solidarity) and {1,2,3,4,5,6,7,8,9,10,13,14}
  - -> {1,4,13} (Fianna Fail) and {2,3,5,6,7,8,9,10,14}
  - -> {2,5,6} (Fine Gael) and {3,7,8,9,10,14} (Independents, Labour, Green)
- Running time: brute force optimization 70.2s; anchors method 2.3s
[Figure: "true" vs. learned first-order marginals over the 14 candidates]

Meath Log-Likelihood Comparison
- Optimized hierarchy vs. optimized 1-chain, over # samples from 50 to 2000 (logarithmic scale); the optimized hierarchy attains higher log-likelihood

Bootstrapped Substructures: Irish Data
- Success rate vs. # samples (76 to 2001): the major party {FF, FG} leaves and the top partition are recovered first, then all leaves, and finally the full tree

Sushi Ranking
- Dataset: 5,000 preference rankings of 10 types of sushi
- Types: 1. Ebi (shrimp), 2. Anago (sea eel), 3. Maguro (tuna), 4. Ika (squid), 5. Uni (sea urchin), 6. Sake (salmon roe), 7. Tamago (egg), 8. Toro (fatty tuna), 9. Tekka-maki (tuna roll), 10. Kappa-maki (cucumber roll)
- First-order marginals Prob(sushi i was ranked j): fatty tuna (toro) is a favorite; no one likes cucumber roll!
Sushi Hierarchy
- {1,...,10} -> {2} (sea eel) and {1,3,4,5,6,7,8,9,10}
- {1,3,4,5,6,7,8,9,10} -> {4} (squid) and {1,3,5,6,7,8,9,10}
- {1,3,5,6,7,8,9,10} -> {5,6} (sea urchin, salmon roe) and {1,3,7,8,9,10}
- {1,3,7,8,9,10} -> {1} (shrimp) and {3,7,8,9,10}
- {3,7,8,9,10} -> {3,8,9} (tuna, fatty tuna, tuna roll) and {7,10} (egg, cucumber roll)

Riffled Independence...
- is a natural notion of independence for rankings
- can be exploited for efficient inference and low sample complexity
- approximately holds in many real datasets

Hierarchical Riffled Independence
- captures more structure in the data
- structure can be learned efficiently: related to clustering and to graphical model structure learning, with an efficient algorithm and a polynomial sample complexity result

Acknowledgements: Brendan Murphy and Claire Gormley provided the Irish voting datasets. Discussions with Marina Meila provided important initial ideas upon which this work is based.