Download SOUP - WWW2009 EPrints

Large Scale Multi-Label Classification via MetaLabeler
Lei Tang, Arizona State University
Suju Rajan and Vijay K. Narayanan, Yahoo! Data Mining & Research

Large Scale Multi-Label Classification
• Huge number of instances and categories
• Common for online content
  – Query Categorization
  – Web Page Classification
  – Social Bookmark/Tag Recommendation
  – Video Annotation/Organization

Challenges
• Multi-Class: thousands of categories
• Multi-Label: an instance can have more than one label
• Large Scale: huge number of instances and categories
  – Our query categorization problem: 1.5M queries, 7K categories
  – Yahoo! Directory: 792K docs, 246K categories in Liu et al. 05
• Most existing multi-label methods do not scale
  – structural SVM, mixture model, collective inference, maximum entropy model, etc.
• The simplest One-vs-Rest SVM is still widely used

One-vs-Rest SVM
• Training data: x1: C1, C3; x2: C1, C2, C4; x3: C2; x4: C2, C4
• One binary training set per category:

        SVM1 (C1)   SVM2 (C2)   SVM3 (C3)   SVM4 (C4)
  x1        +           -           +           -
  x2        +           +           -           +
  x3        -           +           -           -
  x4        -           +           -           +

• Predict: each SVM scores a new instance for its own category

One-vs-Rest SVM
• Pros:
  – Simple, fast, scalable
  – Each label is trained independently, so training is easy to parallelize
• Cons:
  – Highly skewed class distribution (few +, many -)
  – Biased prediction scores
• Outputs a reasonably good ranking (Rifkin and Klautau 04)
  – e.g. 4 categories C1, C2, C3, C4
  – True labels for x1: C1, C3
  – Prediction scores: {s1, s3} > {s2, s4}
• Predict the number of labels?

MetaLabeler Algorithm
1. Obtain a ranking of class membership for each instance
  – Any generic ranking algorithm can be applied
  – Use One-vs-Rest SVM
2. Build a Meta Model to predict the number of top classes
  – Construct Meta Label
  – Construct Meta Feature
  – Build Meta Model
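The two steps of the algorithm can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the ranking scores are assumed to come from already-trained One-vs-Rest classifiers, and the meta model is replaced by a stand-in callable that always answers 2, purely for illustration.

```python
from typing import Callable, Dict, List, Sequence, Set


def build_meta_labels(label_sets: Sequence[Set[str]]) -> List[int]:
    """Meta label of a training instance = the number of labels it carries."""
    return [len(labels) for labels in label_sets]


def score_based_meta_feature(scores: Dict[str, float], categories: List[str]) -> List[float]:
    """Prediction scores in a fixed category order."""
    return [scores[c] for c in categories]


def rank_based_meta_feature(scores: Dict[str, float]) -> List[float]:
    """Prediction scores sorted from highest to lowest."""
    return sorted(scores.values(), reverse=True)


def top_k_labels(scores: Dict[str, float], k: int) -> List[str]:
    """The k highest-scoring categories, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max(k, 0)]


def metalabeler_predict(
    scores: Dict[str, float],
    predict_num_labels: Callable[[Dict[str, float]], int],
) -> List[str]:
    """Ask the meta model how many labels to output, then cut the ranking there."""
    return top_k_labels(scores, predict_num_labels(scores))


# Toy training data from the One-vs-Rest slide.
train_label_sets = [{"C1", "C3"}, {"C1", "C2", "C4"}, {"C2"}, {"C2", "C4"}]
meta_labels = build_meta_labels(train_label_sets)

# Example One-vs-Rest scores for one instance.
scores_x1 = {"C1": 0.9, "C2": -0.2, "C3": 0.7, "C4": -0.6}
rank_feature = rank_based_meta_feature(scores_x1)

# Hypothetical stand-in for a trained meta model: always predicts 2 labels.
predicted = metalabeler_predict(scores_x1, lambda s: 2)

print(meta_labels)   # [2, 3, 1, 2]
print(rank_feature)  # [0.9, 0.7, -0.2, -0.6]
print(predicted)     # ['C1', 'C3']
```

In a real system the stand-in callable would be a classifier trained on the meta labels, using content-, score-, or rank-based meta features as input.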
Meta Model – Training
• Q1 = affordable cocktail dress; Labels: Formal wear, Women Clothing
• Q2 = cotton children jeans; Labels: Children Clothing
• Q3 = leather fashion in 1990s; Labels: Fashion, Women Clothing, Leather Clothing
• Meta data (Query: #labels): Q1: 2, Q2: 1, Q3: 3
• Meta-Model: Regression (how to handle predictions like 2.5 labels?) or One-vs-Rest SVM

Meta Feature Construction
• Content-Based
  – Use raw data
  – Raw data contains all the info
• Score-Based
  – Use prediction scores
  – Bias with scores might be learned
  – e.g. scores C1 = 0.9, C2 = -0.2, C3 = 0.7, C4 = -0.6 give the meta feature (0.9, -0.2, 0.7, -0.6)
• Rank-Based
  – Use sorted prediction scores
  – e.g. the same scores give the meta feature (0.9, 0.7, -0.2, -0.6)

MetaLabeler Prediction
• Given one instance:
  – Obtain the rankings for all labels
  – Use the meta model to predict the number of labels
  – Pick the top-ranking labels
• MetaLabeler
  – Easy to implement
  – Uses existing SVM packages/software directly
  – Can be combined with a hierarchical structure easily
    • Simply build a Meta Model at each internal node

Baseline Methods
• Existing thresholding methods (Yang 2001)
  – Rank-based Cut (RCut)
    • Outputs a fixed number of top-ranking labels for each instance
  – Proportion-based Cut
    • For each label, choose a portion of test instances as positive
    • Not applicable for online prediction
  – Score-based Cut (SCut, a.k.a. threshold tuning)
    • For each label, determine a threshold based on cross-validation
    • Tends to overfit and is not very stable
• MetaLabeler: a local RCut method
  – Customizes the number of labels for each instance

Publicly Available Benchmark Data
• Yahoo! Web Page Classification
  – 11 data sets:
    • each constructed from a top-level category
    • 2nd level topics are the categories
  – 16-32k instances, 6-15k features, 14-23 categories
  – 1.2-1.6 labels per instance, maximum 17 labels
  – Each label has at least 100 instances
• RCV1
  – A large scale text corpus
  – 101 categories, 3.2 labels per instance
  – For evaluation purposes, use 3000 for training, 3000 for testing
  – Highly skewed distribution (some labels have only 3-4 instances)

MetaLabeler of Different Meta Features
• Which type of meta feature is more predictive?
[Figure: bar charts for Yahoo! and RCV1 comparing content-, score-, and rank-based meta features on Exact Match Ratio, Micro-F1, and Macro-F1]
• Content-based MetaLabeler outperforms other meta features

Performance Comparison
[Figure: bar charts for RCV1 and Yahoo! comparing SVM, RCut, SCut, and MetaLabeler on Exact Match Ratio, Micro-F1, and Macro-F1]
• MetaLabeler tends to outperform other methods

Bias with MetaLabeler
• The distribution of the number of labels is imbalanced
  – Most instances have a small number of labels
  – A small portion of instances have many more labels
• Imbalanced distribution leads to bias in MetaLabeler
  – Prefers to predict fewer labels
  – Only predicts many labels with strong confidence
[Figure: Label Distribution on Yahoo! Society Data; x-axis: Number of Labels (1-7); y-axis: Frequency; series: Ground Truth, MetaLabeler Prediction]

Scalability Study
[Figure: Computation Time Comparison on Yahoo! Society Data; x-axis: Number of Samples for Training (1000-10000); y-axis: Computation Time (seconds); series: SVM, MetaLabeler, Threshold Tuning]
• Threshold tuning requires cross-validation, otherwise it overfits
• MetaLabeler simply adds some meta labels and learns One-vs-Rest SVMs

Scalability Study (cont.)
• Threshold tuning: cost increases linearly with the number of categories in the data
  – E.g. 6000 categories -> 6000 thresholds to be tuned
• MetaLabeler: upper bounded by the maximum number of labels of one instance
  – E.g. 6000 categories, but one instance has at most 15 labels
  – Just need to learn 15 additional binary SVMs
• Meta Model is “independent” of the number of categories

Application to Large Scale Query Categorization
• Query categorization problem:
  – 1.5 million unique queries: 1M for training, 0.5M for testing
  – 120k features
  – An 8-level taxonomy of 6433 categories
• Multiple labels
  – e.g. 0% interest credit card no transfer fee
    • Financial Services/Credit, Loans and Debt/Credit/Credit Card/Balance Transfer
    • Financial Services/Credit, Loans and Debt/Credit/Credit Card/Low Interest Card
    • Financial Services/Credit, Loans and Debt/Credit/Credit Card/Low-No-fee Card
  – 1.23 labels on average
  – At most 26 labels
[Figure: pie chart of labels per query: 1 label 81%, 2 labels 16%, 3+ labels 3%]

Flat Model
• Flat Model: does not leverage the hierarchical structure
  – Threshold tuning on training data alone takes 40 hours to finish, while MetaLabeler costs 2 hours
[Figure: Micro-F1 by Depth (1-8); series: SVM, MetaLabeler, Threshold Tuning]

Hierarchical Model - Training
• Step 1: Generate Training Data
• Step 2: Roll up labels
• Step 3: Create “Other” Category
• Step 4: Train One vs. Rest SVM
[Figure: taxonomy tree from Root down to node N, showing Training Data converted into New Training Data]
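The label roll-up in Steps 2 and 3 can be sketched as follows. This is a guess at the mechanics, not the authors' code: the slides do not define the “Other” category, so the sketch assumes it collects instances labeled at the internal node itself rather than at one of its descendants. The helper name `roll_up` and the path-string representation of categories are also assumptions.

```python
from typing import List, Set

OTHER = "Other"  # assumed meaning: labeled at the node itself, not below it


def roll_up(label_paths: List[str], node: str) -> Set[str]:
    """Map an instance's full category paths to training labels for the
    classifier at `node`. Use node="" for the root of the taxonomy."""
    prefix = (node.rstrip("/") + "/") if node else ""
    targets: Set[str] = set()
    for path in label_paths:
        if node and path == node:
            targets.add(OTHER)
        elif path.startswith(prefix):
            # Keep only the child of `node` that this path passes through.
            targets.add(path[len(prefix):].split("/")[0])
    return targets


# Category paths of the example query "0% interest credit card no transfer fee".
card = "Financial Services/Credit, Loans and Debt/Credit/Credit Card"
query_labels = [
    card + "/Balance Transfer",
    card + "/Low Interest Card",
    card + "/Low-No-fee Card",
]

at_root = roll_up(query_labels, "")
at_card = roll_up(query_labels, card)
at_card_itself = roll_up([card], card)

print(at_root)          # {'Financial Services'}
print(sorted(at_card))  # ['Balance Transfer', 'Low Interest Card', 'Low-No-fee Card']
print(at_card_itself)   # {'Other'}
```

Under these assumptions the size of the rolled-up set is the meta label at that node: the example query needs one label at the root but three at the Credit Card node, which is why a separate meta model per internal node is useful.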
Hierarchical Model - Prediction
[Figure: query q routed down the taxonomy from Root through meta models m1-m4 toward categories c1-c3, starting with the SVMs trained at root level]
• Stop if reaching a leaf node or “other” category

Hierarchical Model + MetaLabeler
• Precision decreases by 1-2%, but recall is improved by 10% at deeper levels
[Figure: Performance by Depth (1-8); series: MetaLabeler-Precision, MetaLabeler-Recall, SVM-Precision, SVM-Recall]

Features in MetaLabeler
• Feature: Overstock.com; Related Categories:
  – Mass Merchants/…/discount department stores
  – Apparel & Jewelry
  – Electronics & Appliances
  – Home & Garden
  – Books-Movies-Music-Tickets
• Feature: Blizard; Related Categories:
  – Toys & Hobbies/…/Video Game
  – Computing/…/Computer Game Software
  – Entertainment & Social Event/…/Fast Food Restaurant
  – Reference/News/Weather Information
• Feature: Threading; Related Categories:
  – Books-Movies-Music-Tickets/…/Computing Books
  – Computing/…/Programming
  – Health and Beauty/…/Unwanted Hair
  – Toys and Hobbies/…/Sewing

Conclusions & Future Work
• MetaLabeler is promising for large-scale multi-label classification
  – Core idea: learn a meta model to predict the number of labels
  – Simple, efficient and scalable
  – Uses existing SVM software directly
  – Easy for practical deployment
• Future work
  – How to optimize MetaLabeler for desired performance?
    • E.g. > 95% precision
  – Application to social networking related tasks

Questions?

References
• Liu, T., Yang, Y., Wan, H., Zeng, H., Chen, Z., and Ma, W. 2005. Support vector machines classification with a very large-scale taxonomy. SIGKDD Explor. Newsl. 7, 1 (Jun. 2005), 36-43.
• Rifkin, R. and Klautau, A. 2004. In Defense of One-Vs-All Classification. J. Mach. Learn. Res. 5 (Dec. 2004), 101-141.
• Yang, Y. 2001. A study of thresholding strategies for text categorization. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New Orleans, Louisiana, United States). SIGIR '01. ACM, New York, NY, 137-145.
