Learning Hierarchical Riffle Independent Groupings from Rankings
Jonathan Huang, Carlos Guestrin
Carnegie Mellon University
ICML 2010, Haifa, Israel
American Psychological Association Elections
Each ballot in election data is a ranked list of candidates [Diaconis, 89]
5738 full ballots (1980 election), 5 candidates:
William Bevan, Ira Iscoe, Charles Kiesler, Max Siegle, Logan Wright
[Figure: probability distribution over the 120 permutations, and the
first-order matrix, Prob(candidate i was ranked j), candidates x ranks.
Candidate 3 has the most 1st place votes, and many last place votes.]
Factorial Possibilities
n items means n! rankings:

n     n!            Memory required to store n! doubles
5     120           <4 kilobytes
15    1.31x10^12    1729 petabytes (!!)

(Not to mention sample complexity issues…)
[Figure: probability distribution over the 120 permutations of 5 items]
Possible learning biases for taming complexity:
Parametric? (e.g., Mallows, Plackett-Luce, …)
Sparsity? [Reid79, Jagabathula08]
Independence/Graphical models? [Huang09]
Full independence on rankings
Graphical model for a joint ranking of 6 items, with {A,B,C}, {D,E,F}
independent. Mutual exclusivity of ranks leads to a fully connected model!
[Figure: graphical model with one node per item, e.g. "Rank of item C"]
First-order independence condition:
Ranks of {A,B,C} a permutation of {1,2,3}
Ranks of {D,E,F} a permutation of {4,5,6}
Sparsity: any permutation putting A, B, or C in ranks 4, 5, or 6 has zero
probability!
[Figure: first-order matrices Prob(candidate i was ranked j), candidates x
ranks: "Not independent" vs. "Independent"]
But… such sparsity is unlikely to exist in real data
Drawing independent fruits/veggies
Veggies in ranks {1,2}, Fruits in ranks {3,4}
(veggies always better than fruits)
Draw veggie rankings, fruit rankings independently:
Veggies: Artichoke > Broccoli
Fruits: Cherry > Dates
Full independence: fruit/veggie positions fixed ahead of time!
Form joint ranking of veggies and fruits, e.g.:
Artichoke (Veggie) > Broccoli (Veggie) > Date (Fruit) > Cherry (Fruit)
Artichoke (Veggie) > Broccoli (Veggie) > Cherry (Fruit) > Date (Fruit)
Broccoli (Veggie) > Artichoke (Veggie) > Date (Fruit) > Cherry (Fruit)
Riffled independence
Riffled independence model [Huang, Guestrin, 2009]:
Draw veggie rankings, fruit rankings independently:
Veggies: Artichoke > Broccoli
Fruits: Cherry > Dates
Draw interleaving of veggie/fruit positions (according to a distribution):
Artichoke (Veggie) > Cherry (Fruit) > Broccoli (Veggie) > Dates (Fruit)
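In symbols (a reconstruction from the description above, with notation
chosen here rather than copied from the paper): for a split (A, B), with
interleaving \tau_{A,B}(\sigma) and relative rankings \phi_A(\sigma),
\phi_B(\sigma), the model factors as

  P(\sigma) = m(\tau_{A,B}(\sigma)) \cdot f_A(\phi_A(\sigma)) \cdot g_B(\phi_B(\sigma))

where m is the interleaving distribution and f_A, g_B are the independent
relative ranking distributions.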
Riffle Shuffles
Riffle shuffle (dovetail shuffle):
Cut deck into two piles.
Interleave piles.
Riffle shuffles correspond to distributions over interleavings
[Figure: an interleaving distribution]
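As a concrete illustration, a minimal Python sketch of this generative
process for the fruit/vegetable example (the uniform choices below are
illustrative stand-ins for the learned ranking and interleaving
distributions):

  import random

  def riffle_sample(veggies, fruits):
      # Rank each set independently (uniformly at random in this sketch).
      a = random.sample(veggies, len(veggies))
      b = random.sample(fruits, len(fruits))
      # Draw an interleaving: which ranks go to veggies (uniform here).
      slots = [True] * len(a) + [False] * len(b)
      random.shuffle(slots)
      a_it, b_it = iter(a), iter(b)
      return [next(a_it) if s else next(b_it) for s in slots]

  # e.g. riffle_sample(["artichoke", "broccoli"], ["cherry", "dates"])
  # might return ['artichoke', 'cherry', 'broccoli', 'dates']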
American Psych. Assoc. Election (1980)
Best KL split
Minimize: KL(empirical || riffle indep. approx.)
Best split: {12345} -> {1345}, {2}
[Figure: probability distribution over the 120 permutations;
Candidate 3 fully independent vs. Candidate 2 riffle independent]
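Spelled out (using the factorization sketched earlier, itself a
reconstruction): the best split solves

  \min_{A,B} \; KL(\hat{p} \,\|\, m \cdot f_A \cdot g_B)

and since the factors are fit to the empirical distribution, minimizing
this KL is equivalent to choosing the maximum likelihood split.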
Irish House of Parliament election 2002
64,081 votes, 14 candidates
Two main parties:
Fianna Fail (FF)
Fine Gael (FG)
Minor parties:
Independents (I)
Green Party (GP)
Christian Solidarity (CS)
Labour (L)
Sinn Fein (SF)
[Gormley, Murphy, 2006]
Approximating the Irish Election
[Figure: first-order matrices Prob(candidate i was ranked j) for the n=14
candidates (FF, FG, I, FF, FG, FG, I, I, I, GP, CS, SF, FF, L):
"True" first order marginals vs. Riffle Independent approx.]
Sinn Fein, Christian Solidarity marginals not well captured by a single
riffle independent split!
Major parties riffle independent of minor parties?
Back to Fruits and Vegetables
{banana, apple, orange, broccoli, carrot, lettuce} riffle independent of
{candy, cookies}
{banana, apple, orange} riffle independent of {candy, cookies}??
{broccoli, carrot, lettuce} riffle independent of {candy, cookies}??
Need to factor out {candy, cookies} first!
Hierarchical Decompositions
All food items: banana, apple, orange, broccoli, carrot, lettuce, candy, cookies
  Healthy foods: banana, apple, orange, broccoli, carrot, lettuce
    Fruits: banana, apple, orange
    Vegetables: broccoli, carrot, lettuce
  Junk food: candy, cookies
Fruits, Vegetables marginally riffle independent
Generative process for hierarchy:
Rank fruits
Rank vegetables
Interleave fruits/vegetables (giving the Healthy foods ranking)
Rank junk food
Interleave with junk food
(a code sketch of this recursive process follows below)
Hierarchical Riffle Independent Models
Encode intuitive independence constraints
Can be learned with lower sample complexity
Have interpretable parameters/structure
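A minimal Python sketch of the recursive generative process, under
illustrative assumptions (leaves are ranked uniformly and interleavings
are uniform; the actual model attaches learned distributions to both):

  import random

  def sample_hierarchy(h):
      # Internal node: a (left, right) pair of sub-hierarchies to rank
      # recursively and then interleave (uniformly, as a stand-in).
      if isinstance(h, tuple):
          left = sample_hierarchy(h[0])
          right = sample_hierarchy(h[1])
          slots = [True] * len(left) + [False] * len(right)
          random.shuffle(slots)
          li, ri = iter(left), iter(right)
          return [next(li) if s else next(ri) for s in slots]
      # Leaf: a list of items, ranked uniformly at random in this sketch.
      return random.sample(h, len(h))

  foods = ((["banana", "apple", "orange"],
            ["broccoli", "carrot", "lettuce"]),
           ["candy", "cookies"])
  # sample_hierarchy(foods) yields one joint ranking of all 8 items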
Contributions
Structured Representation: Introduction of a hierarchical model
based on riffled independence factorizations
Structure Learning Objectives: An objective function for
learning model structure from ranking data
Structure Learning Algorithms: Efficient algorithms for
optimizing the proposed objective
Learning a Hierarchy
Exponentially many hierarchies, each encoding a distinct set of
independence assumptions
[Figure: example hierarchies over {12345}, e.g.
{12345} -> {1345}, {2}; then {1345} -> {13}, {45}
{12345} -> {1234}, {5}; then {1234} -> {234}, {1}
and many others]
Problem statement: given i.i.d. ranked data, learn the hierarchical
structure that generated the data
Learning a Hierarchy
Our Approach: top-down partitioning of item set X={1,…,n}
Binary splits at each stage, e.g.:
{12345} -> {1234}, {5}; then {1234} -> {234}, {1}
Hierarchical Decomposition
Best KL hierarchy
Minimize: KL(empirical || hierarchical model)
Learned hierarchy: {12345} -> {1345}, {2}; then {1345} -> {13}, {45}
{13}: Research psychologists
{2}: Community psychologists
{45}: Clinical psychologists
KL(true, best hierarchy)=6.76e-02, TV(true, best hierarchy)=1.44e-01
[Figure: "True" vs. learned first order marginals (candidates x ranks),
and probability distribution over the 120 permutations]
KL-based objective function
KL Objective:
Minimize KL(true distribution || riffle independent approx)
Need to search over exponentially many subsets!
Algorithm (sketched in code below):
For each binary partitioning of the item set [A,B]:
Estimate parameters (ranking probabilities for A, B, and interleaving
probabilities)
Compute log likelihood of data
Return maximum likelihood partition
If hierarchical structure of A or B is unknown, might not have enough
samples to learn parameters!
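A minimal sketch of this brute-force search in Python, assuming each
ranking is a tuple listing items from first place to last and fitting all
three factors by empirical frequencies (illustrative only; the exponential
loop over subsets is exactly the cost the later heuristics avoid):

  from collections import Counter
  from itertools import combinations
  from math import log

  def riffle_loglik(rankings, A):
      # Log-likelihood of the data under the riffle independent model for
      # the split [A, B], with each factor fit by empirical frequencies.
      A, n = set(A), len(rankings)
      rels_a = [tuple(x for x in r if x in A) for r in rankings]
      rels_b = [tuple(x for x in r if x not in A) for r in rankings]
      masks = [tuple(x in A for x in r) for r in rankings]
      return sum(c * log(c / n)
                 for factor in (Counter(rels_a), Counter(rels_b), Counter(masks))
                 for c in factor.values())

  def best_split(rankings):
      # Brute force over all binary partitions [A, B] of the item set.
      items = sorted(rankings[0])
      candidates = (A for k in range(1, len(items))
                    for A in combinations(items, k))
      return max(candidates, key=lambda A: riffle_loglik(rankings, A))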
Finding fully independent sets by clustering
Compute pairwise mutual informations
Partition resulting graph
Pairwise measures unlikely to be effective for detecting riffled
independence!
Why this won't work (for riffled independence):
Pairs of candidates on opposite sides of the split can be strongly
correlated in general
(If I vote up a Democrat, I am likely to vote down a Republican)
Higher order measure of riffled independence
Key insight:
Riffled independence means absolute rankings in A are not informative
about relative rankings in B
Idea: measure mutual information between singleton rankings (preference
over Democrat i) and pairwise rankings (relative preference over
Republicans j & k)
If i and (j,k) lie on opposite sides, mutual information = 0
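In symbols (a reconstruction of the condition described above, not a
verbatim formula from the slides): for a split (A, B) with i in A and
j, k in B, riffled independence implies

  I(\sigma(i);\, \mathbf{1}[\sigma(j) < \sigma(k)]) = 0

i.e. the absolute rank of i carries no information about the relative
order of j and k.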
Tripletwise objective function
Objective function: sum the tripletwise mutual informations over triples
that cross the split; a triple with all items in one set plays no role in
the objective.
Estimating the measure: What's the sample complexity?
Tripletwise measure: no longer obviously a graph cut problem…
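Written out (a reconstruction consistent with the description above; the
paper's exact normalization may differ):

  OBJ(A,B) = \sum_{i \in A;\, j,k \in B} \hat{I}(\sigma(i);\, \mathbf{1}[\sigma(j) < \sigma(k)])
           + \sum_{i \in B;\, j,k \in A} \hat{I}(\sigma(i);\, \mathbf{1}[\sigma(j) < \sigma(k)])

minimized over binary splits [A,B]; triples lying entirely inside A or
inside B do not appear.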
Estimating the objective
Objective function not directly available:
Need to estimate mutual information terms
Strongly connected:
Theorem: If A and B are riffle independent and each strongly connected,
then OBJ is minimized exactly at [A,B] with prob. at least … given …
samples.
Efficient Splitting: Anchors heuristic
Given two anchor elements of A, we can decide whether the rest of the
elements are in A or B: the tripletwise measure against the anchor pair is
large for elements of A, small for elements of B.
Theorem: Anchors heuristic guaranteed to recover riffle independent split
(under certain strong connectivity assumptions)
In practice, anchor elements a1, a2 unknown!
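A minimal Python sketch of the anchor idea (a sketch under stated
assumptions: anchors a1, a2 in A are given, the mutual information is a
plug-in estimate, and the threshold is an illustrative choice, not the
paper's):

  from collections import Counter
  from math import log

  def mutual_info(xs, ys):
      # Plug-in estimate of I(X; Y) from paired samples.
      n = len(xs)
      px, py = Counter(xs), Counter(ys)
      pxy = Counter(zip(xs, ys))
      return sum((c / n) * log(c * n / (px[x] * py[y]))
                 for (x, y), c in pxy.items())

  def anchors_split(rankings, a1, a2, thresh=0.01):
      # Assign every non-anchor item to A or B via the tripletwise
      # measure against the anchor pair (a1, a2).
      items = set(rankings[0])
      rel = [r.index(a1) < r.index(a2) for r in rankings]
      A, B = {a1, a2}, set()
      for x in items - {a1, a2}:
          ranks = [r.index(x) for r in rankings]  # absolute rank of x
          # If x is riffle independent of the anchors' set, its absolute
          # rank says nothing about their relative order.
          (B if mutual_info(ranks, rel) < thresh else A).add(x)
      return A, B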
Learning a chain on synthetic data
16 items, 4 items in each leaf
[Figure: log-likelihood vs. log10(# samples) from 1 to 5: the learned
structure approaches the true structure, and both dominate a random
1-chain (with learned parameters)]
Anchors on Irish election data
Irish Election hierarchy (first four splits):
{1,2,3,4,5,6,7,8,9,10,11,12,13,14}
-> {12} (Sinn Fein), {1,2,3,4,5,6,7,8,9,10,11,13,14}
-> {11} (Christian Solidarity), {1,2,3,4,5,6,7,8,9,10,13,14}
-> {1,4,13} (Fianna Fail), {2,3,5,6,7,8,9,10,14}
-> {2,5,6} (Fine Gael), {3,7,8,9,10,14} (Independents, Labour, Green)
[Figure: "True" vs. learned first order marginals (candidates x ranks)]
Running time:
Brute force optimization: 70.2s
Anchors method: 2.3s
Meath log likelihood comparison
[Figure: log-likelihood vs. # samples (logarithmic scale, 50 to 2000):
Optimized Hierarchy dominates Optimized 1-chain]
Bootstrapped substructures: Irish data
[Figure: success rate vs. # samples (76 to 2001), recovered earliest to
latest: Major party {FF, FG} leaves, top partition, all leaves, full tree]
Sushi ranking
Dataset: 5000 preference rankings of 10 types of sushi
Types:
1. Ebi (shrimp)
2. Anago (sea eel)
3. Maguro (tuna)
4. Ika (squid)
5. Uni (sea urchin)
6. Sake (salmon roe)
7. Tamago (egg)
8. Toro (fatty tuna)
9. Tekka-maki (tuna roll)
10. Kappa-maki (cucumber roll)
[Figure: first-order matrix, Prob(sushi i was ranked j), sushi x ranks.
Fatty tuna (Toro) is a favorite! No one likes cucumber roll!]
Sushi hierarchy
{1,2,3,4,5,6,7,8,9,10}
-> {2} (sea eel), {1,3,4,5,6,7,8,9,10}
-> {4} (squid), {1,3,5,6,7,8,9,10}
-> {5,6} (sea urchin, salmon roe), {1,3,7,8,9,10}
-> {1} (shrimp), {3,7,8,9,10}
-> {3,8,9} (tuna, fatty tuna, tuna roll), {7,10} (egg, cucumber roll)
Riffled Independence…
is a natural notion of independence for rankings
can be exploited for efficient inference,
low sample complexity
approximately holds in many real datasets
Hierarchical riffled independence
captures more structure in data
structure can be learned efficiently:
related to clustering and to graphical model structure learning
efficient algorithm & polynomial sample complexity result
Acknowledgements: Brendan Murphy and Claire Gormley provided the
Irish voting datasets. Discussions with Marina Meila provided important
initial ideas upon which this work is based.