Query Segmentation and Structured Annotation via NLP
Rifat Reza Joye, Panagiotis Papadimitriou

Problem
• Caloricious.com:
  – Semantic search engine for food items
• Free-text queries over structured data
  – Query: gluten free high protein bars
  – Data: Each food item is a database record with attributes name, brand, category, nutrients, allergens, ..
• Query segmentation and structured annotation
  – [gluten free] ALLERGEN  [high protein] NUTRIENT  [bars] CATEGORY

1st Approach: MEMM with Synthetic Training Data
• The task looks like an instance of NER (named entity recognition)
• Problem: No labeled queries to train the MEMM
• Solution: Generate synthetic labeled queries
  – Query study on 100 queries
    • 96% of queries contain 1–3 segments
    • In 98% of queries, one of the segments refers to Name, Category or Brand
  – Algorithm
    • Pick a food item at random
    • Pick 1–3 attributes and generate a query

2nd Approach: Segmentation & MaxEnt Classification
Query Segmentation
• Train a language model on the structured data text
• Use the model to find segment probabilities
• Find the maximum-likelihood (ML) segmentation through dynamic programming (DP)
  – Example query: gluten free high protein bars
Segment Annotation
• Annotate each segment with an attribute using a MaxEnt classifier
• Training: For each attribute, the training examples come from the corresponding entries of the database products
  – Example query: gluten free high protein bars

Results
Accuracy of Segment Classification
[Bar chart: accuracy (y-axis, 0 to 0.9) for 1st Approach, 2nd Approach (2-grams), 2nd Approach (3-grams); bar values not recoverable]

Conclusions – Future Work
• The combination of Language Model, Dynamic Programming and MaxEnt classification provides very good accuracy without labeled data
• It would be interesting to compare with NER on a big labeled set
• We also plan to compare with the state-of-the-art algorithm in the context of a research submission

More Results…
• Evangelos
• March 12, 2011 @ 9.14am
• 19.5 inches
• 6lbs 11oz
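
Sketch: Query Segmentation via DP
• The Query Segmentation step of the 2nd approach (segment probabilities plus dynamic programming) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The attribute values in ATTRIBUTE_VALUES, the function names, and the smoothing constant alpha are all hypothetical, and a simple smoothed phrase frequency stands in for the 2-gram/3-gram language model mentioned on the slides.

```python
import math
from collections import Counter

# Hypothetical attribute values standing in for the structured data text.
ATTRIBUTE_VALUES = [
    "gluten free", "gluten free", "high protein", "high protein",
    "bars", "bars", "protein bars", "low fat", "free range",
]

def build_phrase_counts(values, max_len=3):
    """Count every word n-gram (up to max_len words) in the attribute values."""
    counts = Counter()
    for value in values:
        words = value.split()
        for i in range(len(words)):
            for j in range(i + 1, min(i + max_len, len(words)) + 1):
                counts[" ".join(words[i:j])] += 1
    return counts

def segment_logprob(segment, counts, total, alpha=0.01):
    """Smoothed log-probability of one candidate segment (toy estimate).

    Assumption: unseen segments receive a small mass alpha ** n_words,
    so longer unseen segments are penalized more heavily.
    """
    n_words = len(segment.split())
    return math.log((counts[segment] + alpha ** n_words) / (total + 1.0))

def segment_query(query, counts, max_len=3):
    """Return the highest-scoring segmentation of the query via DP."""
    words = query.split()
    total = sum(counts.values())
    n = len(words)
    best = [0.0] + [-math.inf] * n  # best[i]: best score for words[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last segment
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            candidate = " ".join(words[j:i])
            score = best[j] + segment_logprob(candidate, counts, total)
            if score > best[i]:
                best[i], back[i] = score, j
    # Follow the back-pointers from the end of the query to recover segments.
    segments, i = [], n
    while i > 0:
        segments.append(" ".join(words[back[i]:i]))
        i = back[i]
    return segments[::-1]

counts = build_phrase_counts(ATTRIBUTE_VALUES)
print(segment_query("gluten free high protein bars", counts))
# → ['gluten free', 'high protein', 'bars']
```

• Each resulting segment would then be passed to the attribute classifier (MaxEnt on the slides) for annotation. Because every segment contributes a probability below 1, this scoring favors fewer, longer segments whenever they appear in the data.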