Bootstrapping Techniques for Text Mining
Active Learning for Named Entity Recognition
Markus Becker
January 28, 2004
Markus Becker Active Learning for NER Bootstrapping Techniques

Active Learning
• Supervised learning
  – Large amount of annotated data
  – Expensive
  – Time-consuming
• Active learning
  – Selects the most informative data points
  – Maximal reduction of error rate with minimal amount of labelling
  – Faster converging learning curves

Learning Curves
[Figure: accuracy plotted against the amount of labelled data]
Faster convergence
→ higher accuracy for the same amount of labelled data
→ less labelled data for the same level of accuracy

Overview
• Approaches to active learning
  – Statistically optimal solutions
  – Uncertainty sampling
  – Query by committee
  – Co-testing
  – Support vector machine based methods
• Level of annotation
• Conclusion
• Outlook

Statistically Optimal Solutions
Cohn et al (1996) construct queries such that variance is minimised.
Bias/variance decomposition (Geman et al, 1992):
error = noise + bias + variance
Reducing variance minimises future error, given an unbiased learner.
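For squared loss, the decomposition above has the standard textbook form below. This assumes a regression setting y = f(x) + ε with zero-mean noise of variance σ², independent of the training set D, and a learner whose prediction trained on D is written \hat{f}_D(x). Note that the "bias" term is the squared bias. For an unbiased learner that term vanishes, and since σ² is fixed, the variance is the only part of the error that querying can reduce.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat{f}_D(x)\big)^2\right]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}}
```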
• Computation of expected variance is difficult
• Construction of examples is infeasible in NLP

Uncertainty Sampling (1)
• Lewis & Gale (1994)
• Document classification
• Usefulness ≈ uncertainty of a single learner
  – Label probabilities close to 0.5 (binary classification)

Uncertainty Sampling (2)
• Scheffer et al (2001)
• Named entity recognition
• Usefulness ≈ small margin of HMM states
  – Difference between the best and second-best state
• Problem: ignores the rest of the state distribution

Uncertainty Sampling (3)
Hwa (2000): entropy of the output distribution
[Figure: example output distributions with high entropy and with low entropy]
• Near-uniform distribution (high entropy): low certainty
• Spiked distribution (low entropy): high certainty

Query by Committee
• Seung et al (1992)
• Applications to
  – Document classification
  – Part-of-speech tagging
  – Parse selection
• Usefulness ≈ disagreement of a committee of learners
  – Vote entropy: disagreement between the members' winning labels
  – KL-divergence: distance between the members' class output distributions

Co-Testing
• Muslea et al (2000)
• Applications to
  – Wrapper induction
• Maximise
  – Disagreement of the committee of learners
  – Certainty of the individual learners

Support Vector Machines
• Data points are vectors in an n-dimensional feature space
• The dividing hyperplane is a subspace of dimensionality n − 1
  – Labelling according to location with respect to the hyperplane
• Projection into higher-dimensional spaces for linear separability

SVM-Based Active Learning
[Figure: labelled (+, −) and unlabelled (?) data points on either side of the dividing hyperplane]
• Schohn & Cohn (2000), Tong & Koller (2001)
• Usefulness ≈ proximity to the hyperplane

Problem: Unrepresentative Data Points
[Figure: labelled (+, −) data points and an outlying unlabelled point (?)]
• Outliers may have high uncertainty
• Knowing their label is not advantageous for the classifier
• Need to model the distribution of data points

Level of Annotation (1)
What kind of data points are queried?
• Document level
  – Finn & Kushmerick (2003)
  – Allows use of well-defined document metrics
• Word level
  – Scheffer et al (2001)
  – Need to combine labelled with unlabelled data
  – Annotating single tokens may be frustrating

Level of Annotation (2)
• Phrase level
  – Annotation seems most natural
  – Uncertainty-based approaches are likely to suggest subsequences that are not a phrase or named entity
  – Co-testing would reduce this risk

Conclusion
• Promising for migrating to new domains and languages
• Application to NER is comparatively new

Outlook
Need to determine
• Appropriate active learning paradigm
• Level of annotation
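As a starting point for comparing candidate paradigms, the usefulness measures surveyed in these slides can be written as small scoring functions over model outputs. This is a minimal, illustrative sketch, not the code of any cited author. It assumes each learner already outputs a probability distribution over labels for every candidate data point (and, for the SVM case, a linear weight vector w and bias b), and it uses natural logarithms. All function names are hypothetical. `cotest_score` in particular is a made-up combination of the two criteria on the Co-Testing slide, not Muslea et al's procedure.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# Assumption: every probability list sums to 1 and has at least two entries.


def binary_uncertainty(p_positive: float) -> float:
    """Uncertainty sampling (binary): 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p_positive - 1.0)


def margin(probs: Sequence[float]) -> float:
    """Best minus second-best probability; a small margin means a useful point."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def entropy(probs: Sequence[float]) -> float:
    """Entropy of the output distribution; high entropy means low certainty."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def vote_entropy(votes: Sequence[str]) -> float:
    """Entropy of the committee's hard votes (each member's winning label)."""
    k = len(votes)
    return -sum((c / k) * math.log(c / k) for c in Counter(votes).values())


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_to_mean(dists: Sequence[Sequence[float]]) -> float:
    """Average KL divergence of each member's distribution from the committee mean."""
    mean = [sum(col) / len(dists) for col in zip(*dists)]
    return sum(kl_divergence(d, mean) for d in dists) / len(dists)


def cotest_score(p_view1: Sequence[float], p_view2: Sequence[float]) -> float:
    """HYPOTHETICAL: 0 if the two view-specific learners agree; otherwise the
    lower of their two confidences, so confident disagreement scores highest."""
    best1 = max(range(len(p_view1)), key=lambda i: p_view1[i])
    best2 = max(range(len(p_view2)), key=lambda i: p_view2[i])
    if best1 == best2:
        return 0.0
    return min(p_view1[best1], p_view2[best2])


def hyperplane_distance(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Distance of x from the hyperplane w.x + b = 0; closer means more useful."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))


def select(pool: Sequence, score: Callable, k: int = 1,
           most_useful: str = "max") -> List[int]:
    """Indices of the k most useful pool items; most_useful is 'max' or 'min'."""
    ranked = sorted(range(len(pool)), key=lambda i: score(pool[i]),
                    reverse=(most_useful == "max"))
    return ranked[:k]


# Usage: three hypothetical per-token label distributions.
pool = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.6, 0.3, 0.1]]
print(select(pool, margin, k=1, most_useful="min"))  # [1]
print(select(pool, entropy, k=1))                    # [1]
```

Here margin and entropy agree on the second token, but they need not in general: margin looks only at the top two labels, which is exactly the limitation noted on the Uncertainty Sampling (2) slide.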