Large Scale Multi-Label Classification via MetaLabeler
Lei Tang
Arizona State University
Suju Rajan and Vijay K. Narayanan
Yahoo! Data Mining & Research
Large Scale Multi-Label Classification
• Huge number of instances and categories
• Common for online content
– Query Categorization
– Web Page Classification
– Social Bookmark/Tag Recommendation
– Video Annotation/Organization
Challenges
• Multi-Class: thousands of categories
• Multi-Label: each instance can have more than one label
• Large Scale: huge number of instances and categories
– Our query categorization problem: 1.5M queries, 7K categories
– Yahoo! Directory: 792K docs, 246K categories (Liu et al. 05)
• Most existing multi-label methods do not scale
– structural SVM, mixture model, collective inference, maximum entropy model, etc.
• The simplest One-vs-Rest SVM is still widely used
One-vs-Rest SVM
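Training data (instance: labels):
– x1: C1, C3
– x2: C1, C2, C4
– x3: C2
– x4: C2, C4
One binary SVM per category, each trained on all instances (+ = positive, - = negative):
Classifier   x1   x2   x3   x4
SVM1 (C1)    +    +    -    -
SVM2 (C2)    -    +    +    +
SVM3 (C3)    +    -    -    -
SVM4 (C4)    -    +    -    +
• Predict: each of SVM1-SVM4 decides membership in its own category (C1-C4)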
One-vs-Rest SVM
• Pros:
– Simple, Fast, Scalable
– Each label is trained independently, easy to parallelize
• Cons:
– Highly skewed class distribution (few +, many -)
– Biased prediction scores
• Outputs a reasonably good ranking (Rifkin and Klautau 04)
– e.g. 4 categories C1, C2, C3, C4
– True Labels for x1: C1, C3
– Prediction Scores: {s1, s3} > {s2, s4}
• Predict the number of labels?
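The gap between biased scores and a good ranking can be illustrated with a small sketch. The score values below are made up for illustration (they are not from the slides), and both helper functions are hypothetical names: with skewed training data every score may fall below zero, so thresholding at zero returns nothing, while the ranking still puts the true labels C1 and C3 on top.

```python
def threshold_at_zero(scores):
    """Labels whose one-vs-rest score is positive (the usual SVM decision rule)."""
    return [label for label, s in scores.items() if s > 0]


def rank_labels(scores):
    """Labels sorted from the highest score to the lowest."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical scores for x1, whose true labels are C1 and C3.
# All scores are negative, mimicking the bias caused by few positives.
scores_x1 = {"C1": -0.1, "C2": -0.8, "C3": -0.3, "C4": -0.9}

predicted = threshold_at_zero(scores_x1)  # no label passes the threshold
ranking = rank_labels(scores_x1)          # C1 and C3 are still ranked first
```

What remains open after ranking is how many of the top labels to keep, which is the question the slide ends on.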
MetaLabeler Algorithm
1. Obtain a ranking of class membership for each instance
– Any generic ranking algorithm can be applied
– Use One-vs-Rest SVM
2. Build a Meta Model to predict the number of top classes
– Construct Meta Label
– Construct Meta Feature
– Build Meta Model
Meta Model – Training
Training queries and their labels:
• Q1 = affordable cocktail dress
– Labels: Formal wear, Women Clothing
• Q2 = cotton children jeans
– Labels: Children clothing
• Q3 = leather fashion in 1990s
– Labels: Fashion, Women Clothing, Leather Clothing
Meta data (Query: #labels):
– Q1: 2
– Q2: 1
– Q3: 3
• Meta-Model: One-vs-Rest SVM trained on the meta data
• Regression: how to handle predictions like 2.5 labels?
[Figure: taxonomy fragment with the nodes Clothing, Leather clothing, Women Clothing, Formal wear, Children Clothing, Fashion]
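The meta label construction on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes the meta label of a query is simply its number of labels, treated as a class for one-vs-rest training.

```python
def build_meta_labels(label_sets):
    """Meta label of each instance = how many labels it has."""
    return [len(labels) for labels in label_sets]


def one_vs_rest_targets(meta_labels):
    """One binary (+1/-1) target vector per distinct label count."""
    classes = sorted(set(meta_labels))
    return {k: [1 if m == k else -1 for m in meta_labels] for k in classes}


# The three training queries from the slide.
training = {
    "affordable cocktail dress": ["Formal wear", "Women Clothing"],
    "cotton children jeans": ["Children clothing"],
    "leather fashion in 1990s": ["Fashion", "Women Clothing", "Leather Clothing"],
}

meta_labels = build_meta_labels(list(training.values()))
targets = one_vs_rest_targets(meta_labels)
```

Treating the count as a class sidesteps fractional outputs such as 2.5 labels, which a regression model could produce.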
Meta Feature Construction
• Content-Based
– Use raw data
– Raw data contains all the info
• Score-Based
– Use prediction scores
– Bias with scores might be learned
– Example: scores (C1, C2, C3, C4) = (0.9, -0.2, 0.7, -0.6) give Meta Feature (0.9, -0.2, 0.7, -0.6)
• Rank-Based
– Use sorted prediction scores
– Example: scores (C1, C2, C3, C4) = (0.9, -0.2, 0.7, -0.6) give Meta Feature (0.9, 0.7, -0.2, -0.6)
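The score-based and rank-based constructions differ only in ordering, which a short sketch makes concrete. The function names are hypothetical; the scores are the ones shown on the slide. A content-based meta feature would simply reuse the instance's original feature vector, so it needs no transformation here.

```python
def score_based_meta_feature(scores, label_order):
    """Prediction scores kept in a fixed category order."""
    return [scores[label] for label in label_order]


def rank_based_meta_feature(scores):
    """Prediction scores sorted from highest to lowest, dropping category identity."""
    return sorted(scores.values(), reverse=True)


scores = {"C1": 0.9, "C2": -0.2, "C3": 0.7, "C4": -0.6}

score_feature = score_based_meta_feature(scores, ["C1", "C2", "C3", "C4"])
rank_feature = rank_based_meta_feature(scores)
```

In the score-based vector each position always refers to the same category, so a per-category bias can in principle be learned; in the rank-based vector each position refers to a rank instead.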
MetaLabeler Prediction
• Given one instance:
– Obtain the rankings for all labels;
– Use the meta model to predict the number of labels
– Pick the top-ranking labels
• MetaLabeler
– Easy to implement
– Use existing SVM package/software directly
– Can be combined with a hierarchical structure easily
• Simply build a Meta Model at each internal node
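The three prediction steps can be sketched as below. This assumes the ranking comes from per-label scores and that the meta model is any callable returning a label count; `predict_labels` and the stub meta model are hypothetical stand-ins, not part of any SVM package.

```python
def predict_labels(scores, meta_features, meta_model):
    """MetaLabeler-style prediction: rank labels, ask the meta model for a count, keep the top ones."""
    ranking = sorted(scores, key=scores.get, reverse=True)  # step 1: rank all labels
    k = meta_model(meta_features)                           # step 2: predicted number of labels
    return ranking[:k]                                      # step 3: top-ranking labels


# Hypothetical inputs: scores from four one-vs-rest SVMs and a stub meta model
# that always answers 2. A real meta model would be trained on meta labels.
scores = {"C1": 0.9, "C2": -0.2, "C3": 0.7, "C4": -0.6}
stub_meta_model = lambda features: 2

labels = predict_labels(scores, meta_features=None, meta_model=stub_meta_model)
```

In a hierarchical setup the same function could be called at each internal node with that node's own scores and meta model.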
Baseline Methods
• Existing thresholding methods (Yang 2001)
– Rank-based Cut (RCut)
• Output a fixed number of top-ranking labels for each instance
– Proportion-based Cut (PCut)
• For each label, choose a portion of test instances as positive
• Not applicable for online prediction
– Score-based Cut (SCut, a.k.a. threshold tuning)
• For each label, determine a threshold based on cross-validation
• Tends to overfit and is not very stable
• MetaLabeler: A local RCut method
– Customize the number of labels for each instance
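The three thresholding baselines can be contrasted in a few lines. This is a simplified sketch under stated assumptions: the function names, scores, and thresholds are illustrative, and `pcut` uses a plain per-label proportion rather than the exact formula in Yang (2001).

```python
def rcut(scores, k):
    """RCut: keep the same fixed number k of top-ranking labels for every instance."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def scut(scores, thresholds):
    """SCut: keep every label whose score reaches that label's tuned threshold."""
    return [label for label in scores if scores[label] >= thresholds[label]]


def pcut(batch_scores, label, proportion):
    """PCut (simplified): mark the top share of a whole test batch as positive for one label."""
    n_positive = int(proportion * len(batch_scores))
    order = sorted(range(len(batch_scores)),
                   key=lambda i: batch_scores[i][label], reverse=True)
    return order[:n_positive]


# Hypothetical scores for a batch of four test instances and two labels.
batch = [
    {"C1": 0.9, "C2": -0.2},
    {"C1": 0.1, "C2": 0.4},
    {"C1": -0.5, "C2": 0.8},
    {"C1": -0.7, "C2": -0.9},
]

rcut_labels = rcut(batch[0], k=1)
scut_labels = scut(batch[1], thresholds={"C1": 0.3, "C2": 0.0})
pcut_instances = pcut(batch, "C1", proportion=0.5)
```

`pcut` needs the whole batch before it can decide anything, which is why it does not fit online prediction, whereas `rcut` with a per-instance `k` is the form MetaLabeler takes.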
Publicly Available Benchmark Data
• Yahoo! Web Page Classification
– 11 data sets:
• each constructed from a top-level category
• 2nd level topics are the categories
– 16-32k instances, 6-15k features, 14-23 categories
– 1.2-1.6 labels per instance, maximum 17 labels
– Each label has at least 100 instances
• RCV1:
– A large scale text corpus
– 101 categories, 3.2 labels per instance
– For evaluation purposes, 3000 instances are used for training and 3000 for testing
– Highly skewed distribution (some labels have only 3-4 instances)
MetaLabeler of Different Meta Features
• Which type of meta feature is more predictive?
[Figure: two bar charts, one for Yahoo! and one for RCV1, comparing content-, score-, and rank-based meta features on Exact Match Ratio, Micro-F1, and Macro-F1]
• Content-based MetaLabeler outperforms other meta features
Performance Comparison
[Figure: two bar charts, one for RCV1 and one for Yahoo!, comparing SVM, RCut, SCut, and MetaLabeler on Exact Match Ratio, Micro-F1, and Macro-F1]
• MetaLabeler tends to outperform other methods
Bias with MetaLabeler
• The distribution of the number of labels is imbalanced
– Most instances have a small number of labels
– A small portion of instances have many more labels
• Imbalanced distribution leads to bias in MetaLabeler
– Prefers to predict fewer labels
– Only predicts many labels with strong confidence
[Figure: Label Distribution on Yahoo! Society Data; x-axis: Number of Labels (1-7); y-axis: Frequency (0-12000); series: Ground Truth, MetaLabeler Prediction]
Scalability Study
[Figure: Computation Time Comparison on Yahoo! Society Data; x-axis: Number of Samples for Training (1000-10000); y-axis: Computation Time (seconds, 0-2000); series: SVM, MetaLabeler, Threshold Tuning]
• Threshold tuning requires cross-validation, otherwise it overfits
• MetaLabeler simply adds some meta labels and learns One-vs-Rest SVMs
Scalability Study (cont.)
• Threshold tuning: cost grows linearly with the number of categories in the data
– E.g. 6000 categories -> 6000 thresholds to be tuned
• MetaLabeler: cost is bounded by the maximum number of labels of a single instance
– E.g. 6000 categories
– but one instance has at most 15 labels
– Just need to learn an additional 15 binary SVMs
• Meta Model is “independent” of number of categories
Application to Large Scale Query Categorization
• Query categorization problem:
– 1.5 million unique queries: 1M for training, 0.5M for testing
– 120k features
– An 8-level taxonomy of 6433 categories
• Multiple labels
– e.g. 0% interest credit card no transfer fee
• Financial Services/Credit, Loans and Debt/Credit/Credit Card/Balance Transfer
• Financial Services/Credit, Loans and Debt/Credit/Credit Card/Low Interest Card
• Financial Services/Credit, Loans and Debt/Credit/Credit Card/Low-No-fee Card
– 1.23 labels on average
– At most 26 labels
[Figure: pie chart of labels per query: 1 label 81%, 2 labels 16%, 3+ labels 3%]
Flat Model
• Flat Model: does not leverage the hierarchical structure
– Threshold tuning on training data alone takes 40 hours to finish, while MetaLabeler takes 2 hours.
[Figure: Micro-F1 (0-100) by Depth (1-8) for SVM, MetaLabeler, and Threshold Tuning]
Hierarchical Model - Training
• Step 1: Generate Training Data
• Step 2: Roll up labels
• Step 3: Create “Other” Category
• Step 4: Train One vs. Rest SVM
[Figure: taxonomy tree from Root down to a node N, showing the Training Data, the New Training Data, and the added “Other” category]
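Steps 2 and 3 can be sketched for a single node as follows. The slide only names the steps, so this rests on an assumption: a label below the node is rolled up to the child of the node that contains it, and a label equal to the node itself goes to “Other”. The function name and the tuple representation of category paths are hypothetical.

```python
def roll_up(label_path, node_path):
    """Map a full category path to the training class used at the node `node_path`.

    Returns the child of the node that contains the label, "Other" when the label
    is the node itself, or None when the label lies outside the node's subtree.
    """
    depth = len(node_path)
    if label_path[:depth] != node_path:
        return None            # instance is not training data for this node
    if len(label_path) == depth:
        return "Other"         # labeled at the node itself, not at any child
    return label_path[depth]   # roll the label up to the node's child


# Category path taken from the query categorization example.
credit_card = ("Financial Services", "Credit, Loans and Debt", "Credit", "Credit Card")
balance_transfer = credit_card + ("Balance Transfer",)

at_root = roll_up(balance_transfer, ())
at_credit_card = roll_up(balance_transfer, credit_card)
at_itself = roll_up(credit_card, credit_card)
elsewhere = roll_up(balance_transfer, ("Computing",))
```

Step 4 would then train one binary SVM per rolled-up class at each node, with “Other” treated like any other class.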
Hierarchical Model - Prediction
• Predict using SVMs trained at root level
[Figure: Query q descends the taxonomy from Root; the diagram's node labels are m1-m4, c1-c3, and “Other”, with "Stop !!!" markers where the descent ends]
• Stop if reaching a leaf node or “other” category
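The top-down descent with this stopping rule can be sketched as a short recursion. Everything here is illustrative: the taxonomy, the scores, and `score_children` are hypothetical stand-ins for the per-node SVMs, and children are followed when their score is positive (a MetaLabeler variant would instead keep the top-ranked children at each node).

```python
def predict_hierarchical(query, node, children, score_children):
    """Descend from `node`, following positively scored children until a leaf or "Other"."""
    kids = children.get(node, [])
    if not kids:
        return [node]                      # leaf node: stop
    scores = score_children(node, query)
    predictions = []
    for child in kids:
        if scores.get(child, -1.0) <= 0:
            continue                       # this child's SVM rejects the query
        if child == "Other":
            predictions.append(node)       # "Other": stop at the current node
        else:
            predictions.extend(
                predict_hierarchical(query, child, children, score_children))
    return predictions


# Hypothetical two-level taxonomy and fixed scores standing in for trained SVMs.
children = {
    "Root": ["Clothing", "Fashion", "Other"],
    "Clothing": ["Women Clothing", "Leather Clothing", "Other"],
}
toy_scores = {
    "Root": {"Clothing": 0.8, "Fashion": -0.6, "Other": -0.9},
    "Clothing": {"Women Clothing": 0.5, "Leather Clothing": 0.2, "Other": -0.4},
}

result = predict_hierarchical(
    "leather fashion in 1990s", "Root", children,
    lambda node, query: toy_scores[node])
```

Because a branch is abandoned as soon as its SVM rejects the query, only the classifiers along the visited paths are evaluated.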
Hierarchical Model + MetaLabeler
• Precision decreases by 1-2%, but recall improves by 10% at deeper levels.
[Figure: Performance (60-100) by Depth (1-8); series: MetaLabeler-Precision, MetaLabeler-Recall, SVM-Precision, SVM-Recall]
Features in MetaLabeler
Feature: Overstock.com
Related Categories:
– Mass Merchants/…/discount department stores
– Apparel & Jewelry
– Electronics & Appliances
– Home & Garden
– Books-Movies-Music-Tickets
Feature: Blizzard
Related Categories:
– Toys & Hobbies/…/Video Game
– Computing/…/Computer Game Software
– Entertainment & Social Event/…/Fast Food Restaurant
– Reference/News/Weather Information
Feature: Threading
Related Categories:
– Books-Movies-Music-Tickets/…/Computing Books
– Computing/…/Programming
– Health and Beauty/…/Unwanted Hair
– Toys and Hobbies/…/Sewing
Conclusions & Future Work
• MetaLabeler is promising for large-scale multi-label
classification
– Core idea: learn a meta model to predict the number of labels
– Simple, efficient and scalable
– Use existing SVM software directly
– Easy for practical deployment
• Future work
– How to optimize MetaLabeler for desired performance?
• E.g. > 95% precision
– Application to social networking related tasks
Questions?
References
• Liu, T., Yang, Y., Wan, H., Zeng, H., Chen, Z., and Ma, W. 2005.
Support vector machines classification with a very large-scale
taxonomy. SIGKDD Explor. Newsl. 7, 1 (Jun. 2005), 36-43.
• Rifkin, R. and Klautau, A. 2004. In Defense of One-Vs-All
Classification. J. Mach. Learn. Res. 5 (Dec. 2004), 101-141.
• Yang, Y. 2001. A study of thresholding strategies for text
categorization. In Proceedings of the 24th Annual International
ACM SIGIR Conference on Research and Development in
Information Retrieval (New Orleans, Louisiana, United States).
SIGIR '01. ACM, New York, NY, 137-145.