Bootstrapping Techniques for Text Mining
Active Learning for Named Entity Recognition
Markus Becker
January 28, 2004
Active Learning
• Supervised Learning
– Requires large amounts of annotated data
– Expensive
– Time-consuming
• Active Learning
– Selects the most informative data points (see the sketch below)
– Maximal reduction of the error rate with a minimal amount of labelling
– Faster-converging learning curves
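
A minimal sketch of the pool-based selection loop this implies; the learner, oracle, and select function are placeholders (assumptions, not specified in the talk):

```python
# Pool-based active learning loop (illustrative sketch; the talk does not
# prescribe a particular learner or selection strategy).
def active_learning_loop(learner, labelled, unlabelled, oracle, select,
                         rounds=10, batch_size=10):
    for _ in range(rounds):
        learner.fit(labelled)                        # retrain on current labels
        ranked = sorted(unlabelled, key=lambda x: select(learner, x), reverse=True)
        for x in ranked[:batch_size]:                # most informative points
            labelled.append((x, oracle(x)))          # human annotator labels x
            unlabelled.remove(x)
    return learner
```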
Learning Curves
[Figure: learning curves plotting accuracy (y-axis) against the amount of labelled data (x-axis)]
Faster convergence means:
→ higher accuracy for the same amount of labelled data
→ less labelled data for the same level of accuracy
Overview
• Approaches to Active Learning
– Statistically optimal solutions
– Uncertainty sampling
– Query by committee
– Co-testing
– Support vector machine based methods
• Level of Annotation
• Conclusion
• Outlook
Statistically Optimal Solutions
Cohn et al (1996) construct queries such that variance is minimised.
Bias/Variance Decomposition (Geman et al, 1992)
error = noise + bias² + variance
Reducing variance minimises future error, given an unbiased learner.
• Computation of the expected variance is difficult
• Constructing arbitrary query examples is infeasible in NLP (a synthesised query is unlikely to be a well-formed sentence)
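
For reference, the decomposition can be written out as follows (the standard form from Geman et al., 1992; the notation here is mine, not the slide's):

```latex
% y = f(x) + \varepsilon, with noise \varepsilon of variance \sigma^2;
% D is the training set and \hat{f}_D the estimator learned from D.
\mathbb{E}\big[(\hat{f}_D(x) - y)^2\big]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\big[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\big]}_{\text{variance}}
```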
Uncertainty Sampling (1)
• Lewis & Gale (1994)
• Document classification
• Usefulness ≈ uncertainty of single learner
– Label probabilities close to 0.5 (binary classification)
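
A minimal sketch of this criterion; `predict_proba` stands in for any classifier that returns P(relevant | document) as a single number and is an assumption, not something named on the slide:

```python
# Uncertainty sampling (Lewis & Gale, 1994): query the documents whose
# predicted probability of the positive class is closest to 0.5.
def most_uncertain(model, pool, k=10):
    by_uncertainty = sorted(pool, key=lambda d: abs(model.predict_proba(d) - 0.5))
    return by_uncertainty[:k]          # closest to the decision boundary first
```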
Uncertainty Sampling (2)
• Scheffer et al (2001)
• Named entity recognition
• Usefulness ≈ small margin of HMM states
– Difference between best and second best state
• Problem: ignores rest of state distribution
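
As a sketch, the margin for one token's state distribution (the probability values below are illustrative placeholders):

```python
# Margin-based usefulness (Scheffer et al., 2001): difference between the
# best and second-best state probabilities; a small margin means the model
# is torn between two labels. Note that it ignores the rest of the distribution.
def margin(state_probs):
    best, second = sorted(state_probs, reverse=True)[:2]
    return best - second               # small value = informative example

margin([0.40, 0.38, 0.12, 0.10])       # 0.02: uncertain, worth querying
margin([0.90, 0.05, 0.03, 0.02])       # 0.85: confident, skip
```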
Uncertainty Sampling (3)
Hwa (2000), Entropy of output distribution
[Figure: two output distributions, one near-uniform (high entropy) and one spiked (low entropy)]
• Near uniform distribution (high entropy)
– low certainty
• Spiked distribution (low entropy)
– high certainty
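
A sketch of the entropy criterion; the two example distributions below are illustrative, not from the slide:

```python
import math

# Entropy of the output distribution (Hwa, 2000): high for near-uniform
# distributions (low certainty), low for spiked ones (high certainty).
def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

entropy([0.25, 0.25, 0.25, 0.25])   # ~1.39 (high entropy: query this one)
entropy([0.97, 0.01, 0.01, 0.01])   # ~0.17 (low entropy: model is confident)
```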
Query by Committee
• Seung et al (1992)
• Applications to
– Document classification
– Part-of-speech tagging
– Parse selection
• Usefulness ≈ disagreement of committee of learners
– Vote entropy: disagreement between winners
– KL-divergence: distance between class output distributions
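
Sketches of both disagreement measures; the KL variant compares each member's distribution to the committee mean, which is one standard formulation (an assumption, the slide does not spell it out):

```python
import math
from collections import Counter

def vote_entropy(votes):
    # votes: the winning label from each committee member, e.g. ["PER", "ORG", "PER"]
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def mean_kl_divergence(dists):
    # dists: one class output distribution per committee member
    mean = [sum(d[i] for d in dists) / len(dists) for i in range(len(dists[0]))]
    kl = lambda p, q: sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return sum(kl(d, mean) for d in dists) / len(dists)
```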
Co-Testing
• Muslea et al (2000)
• Applications to
– Wrapper induction
• Maximise
– Disagreement of committee of learners
– Certainty of individual learners
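
A sketch of the co-testing query rule; the two view models and their `predict_with_confidence` method are hypothetical placeholders, not an API from the paper:

```python
# Co-testing (after Muslea et al., 2000): query "contention points", i.e.
# examples on which two independent views disagree although each view is
# individually confident in its own prediction.
def contention_points(view_a, view_b, pool):
    picks = []
    for x in pool:
        label_a, conf_a = view_a.predict_with_confidence(x)   # hypothetical API
        label_b, conf_b = view_b.predict_with_confidence(x)
        if label_a != label_b:                 # the committee disagrees
            picks.append((min(conf_a, conf_b), x))
    picks.sort(reverse=True, key=lambda p: p[0])  # most mutually confident first
    return [x for _, x in picks]
```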
Support Vector Machines
• Data points are vectors in an n-dimensional feature space
• Dividing hyperplane is subspace of dimensionality n − 1
– Labelling according to location wrt hyperplane
• Projection into higher-dimension spaces for linear separability
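
Purely as an illustration (the talk names no toolkit), a kernel SVM in scikit-learn; the RBF kernel performs the implicit projection into a higher-dimensional space:

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [0, 1], [1, 0]]   # XOR-style data, not linearly separable
y = [0, 0, 1, 1]
clf = SVC(kernel="rbf").fit(X, y)      # implicit higher-dimensional projection
print(clf.predict([[0.9, 0.1]]))       # labelled by its side of the hyperplane
```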
SVM-Based Active Learning
[Figure: labelled points (+/−) separated by a hyperplane, with unlabelled points (?) scattered on both sides; those nearest the hyperplane are the candidate queries]
• Schohn & Cohn (2000), Tong & Koller (2001)
• Usefulness ≈ proximity to hyperplane
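
A sketch of the selection step, using scikit-learn's `decision_function` as one concrete way to obtain distances to the hyperplane (an assumption, not the authors' code):

```python
import numpy as np

# SVM-based active learning (Schohn & Cohn, 2000; Tong & Koller, 2001):
# query the unlabelled points that lie closest to the current hyperplane.
def closest_to_hyperplane(clf, X_pool, k=10):
    distances = np.abs(clf.decision_function(X_pool))  # unsigned margin distance
    return np.argsort(distances)[:k]                   # indices of the k closest
```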
Problem: Unrepresentative Data Points
[Figure: uncertain unlabelled points (?) lying far from the mass of labelled (+/−) points, i.e. outliers]
• Outliers may have high uncertainty
• Knowing their labels does not help the classifier much
• Need to model the distribution of data points (see the sketch below)
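
One widely used way to model the distribution is information-density weighting (popularised later, e.g. by Settles & Craven, 2008, so not from this talk); a sketch:

```python
import numpy as np

# Weight each point's uncertainty by its average similarity to the pool,
# so isolated outliers score lower than uncertain-but-typical points.
def information_density(uncertainty, similarity, beta=1.0):
    # uncertainty: shape (n,); similarity: precomputed (n, n) similarity matrix
    density = similarity.mean(axis=1)        # representativeness of each point
    return uncertainty * density ** beta     # high = uncertain AND representative
```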
Level of Annotation (1)
What kind of data points are queried?
• Document level
– Finn & Kushmerick (2003)
– Allows use of well-defined document metrics
• Word level
– Scheffer et al (2001)
– Need to combine labelled with unlabelled data
– Annotating single tokens may be frustrating
Level of Annotation (2)
• Phrase level
– Annotation seems most natural
– Uncertainty-based approaches are likely to suggest subsequences which are not a phrase or named entity
– Co-testing would reduce this risk
Conclusion
• Promising for migrating to new domains and languages
• Application to NER comparatively new
Outlook
Need to determine
• Appropriate active learning paradigm
• Level of annotation