Employing EM and Pool-Based Active Learning for Text Classification
Andrew McCallum, Kamal Nigam
Just Research and Carnegie Mellon University

Text Active Learning
• Many applications
• Scenario: ask for labels of a few documents
• While learning:
– Learner carefully selects an unlabeled document
– Trainer provides a label
– Learner rebuilds the classifier

Query-By-Committee (QBC)
• Label documents with high classification variance
• Iterate:
– Create a committee of classifiers
– Measure committee disagreement about the class of unlabeled documents
– Select a document for labeling
• Theoretical results promising [Seung et al. 92] [Freund et al. 97]

Text Framework
• "Bag of words" document representation
• Naïve Bayes classification (a code sketch follows the slides):

  P(class | doc) = P(class) ∏_{word ∈ doc} P(word | class) / P(doc)

• For each class, estimate P(word | class)

Outline: Our approach
• Create a committee by sampling from a distribution over classifiers
• Measure committee disagreement with the KL-divergence of the committee members to their mean
• Select documents from a large pool using both disagreement and density-weighting
• Add EM to use the documents not selected for labeling

Creating Committees
• Each class is a distribution over word frequencies
• For each committee member, construct each class by:
– Drawing from the Dirichlet distribution defined by the labeled data
(Diagram: the labeled data defines a distribution over classifiers; the MAP classifier and sampled members 1, 2, and 3 form the committee. A code sketch follows the slides.)

Measuring Committee Disagreement
• Kullback-Leibler divergence to the mean:
– Compares differences in how members "vote" for classes
– Considers the entire class distribution of each member
– Considers the "confidence" of the top-ranked class

  Disagreement = ∑_{k ∈ committee} ∑_{c ∈ classes} P_k(c) log [ P_k(c) / P_avg(c) ]

Selecting Documents
• Stream-based sampling:
– Disagreement ⇒ probability of selection
– Implicit (but crude) instance-distribution information
• Pool-based sampling:
– Select the highest-disagreement document in the pool
– Loses distribution information

Density-weighted pool-based sampling
• A balance of disagreement and distributional information
• Select documents by (a code sketch follows the slides):

  argmax_{d ∈ unlabeled} Density(d) · Disagreement(d)

• Calculate Density as the (geometric) average distance to all documents, where:

  Distance(d_i, d_j) = e^{−β · D[ P(word | d_j) ‖ P(word | d_i) ]}

Datasets and Protocol
• Reuters-21578 and a subset of Newsgroups
• One initial labeled document per class
• 200 iterations of active learning
(Diagram: example categories — Newsgroups "computers" subtree: mac, ibm, X, windows, graphics; Reuters: acq, trade, corn, ...)

QBC on Reuters
(Figure: learning curves for three Reuters categories — acq, P(+) = 0.25; trade, P(+) = 0.038; corn, P(+) = 0.018.)

Selection comparison on News5
(Figure: comparison of selection strategies on News5.)
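To make the Naïve Bayes slide concrete, here is a minimal Python sketch of training and posterior scoring. It is not the authors' code; the function names and the Laplace smoothing constant are my assumptions.

import math
from collections import Counter

def train_naive_bayes(docs, labels, vocab, alpha=1.0):
    """Estimate P(class) and P(word|class) from labeled token lists,
    with Laplace smoothing (alpha) over the vocabulary."""
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    totals = {c: 0 for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
        totals[c] += len(doc)
    def p_word(word, c):
        return (counts[c][word] + alpha) / (totals[c] + alpha * len(vocab))
    return classes, prior, p_word

def posterior(doc, classes, prior, p_word):
    """P(class|doc) ∝ P(class) · prod_w P(w|class), computed in log space
    and normalized so P(doc) need never be computed explicitly."""
    log_scores = {c: math.log(prior[c]) + sum(math.log(p_word(w, c)) for w in doc)
                  for c in classes}
    m = max(log_scores.values())
    exp = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}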
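The committee construction and the KL-to-the-mean disagreement from the slides could be sketched as follows. The data layout (per-class word-count arrays) and the committee size are assumptions on my part, not details given in the slides.

import numpy as np

def sample_committee(word_counts, n_members=3, alpha=1.0, rng=None):
    """For each committee member, draw each class's word distribution
    from the Dirichlet posterior defined by the labeled-data counts.

    word_counts: array (n_classes, n_words) of counts from labeled docs.
    Returns array (n_members, n_classes, n_words) of sampled P(word|class).
    """
    rng = rng or np.random.default_rng(0)
    return np.stack([
        np.stack([rng.dirichlet(counts + alpha) for counts in word_counts])
        for _ in range(n_members)
    ])

def kl_to_the_mean(member_posteriors):
    """Disagreement = sum_k sum_c P_k(c) log(P_k(c) / P_avg(c)).

    member_posteriors: array (n_members, n_classes) holding each member's
    class posterior P_k(class|doc) for one document.
    """
    p = np.clip(np.asarray(member_posteriors, dtype=float), 1e-12, None)
    p_avg = p.mean(axis=0)  # the committee's mean "vote"
    return float(np.sum(p * np.log(p / p_avg)))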
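One plausible reading of the density-weighted selection rule, as a sketch: the value of β and the smoothing of the per-document word distributions P(word|d) are assumptions, and the pairwise computation is written for clarity rather than efficiency.

import numpy as np

def density_weighted_pick(disagreement, doc_word_dists, beta=1.0):
    """Pick argmax_d Density(d) · Disagreement(d) over the unlabeled pool.

    Density(d_i) is the geometric mean over the pool of
    Distance(d_i, d_j) = exp(-beta * KL(P(word|d_j) || P(word|d_i))).
    doc_word_dists: (n_docs, n_words) smoothed word distributions, rows > 0.
    """
    p = np.asarray(doc_word_dists, dtype=float)
    log_p = np.log(p)
    # kl[i, j] = KL(p_j || p_i) = sum_w p_j(w) (log p_j(w) - log p_i(w))
    kl = (p[None, :, :] * (log_p[None, :, :] - log_p[:, None, :])).sum(axis=2)
    # Geometric mean of exp(-beta*kl) over j = exp of the arithmetic mean.
    density = np.exp(-beta * kl.mean(axis=1))
    return int(np.argmax(density * np.asarray(disagreement)))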
EM after Active Learning
• After active learning, only a few documents have been labeled
• Use EM to predict the labels of the remaining unlabeled documents
• Use all documents to build a new classification model, which is often more accurate
(A code sketch of this EM step follows the slides.)

QBC and EM on News5
(Figure: QBC with and without EM on News5.)

Related Work
• Active learning with text:
– [Dagan & Engelson 95]: QBC for part-of-speech tagging
– [Lewis & Gale 94]: pool-based, non-QBC
– [Liere & Tadepalli 97, 98]: QBC with Winnow and perceptrons
• EM with text:
– [Nigam et al. 98]: EM with unlabeled data

Conclusions & Future Work
• Small P(+) (a rare positive class) ⇒ greater benefit from active learning
• Leverage the unlabeled pool by:
– pool-based sampling
– density-weighting
– Expectation-Maximization
• Try different active learning approaches à la [Cohn et al. 96]
• Interleave EM and active learning
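A minimal sketch of the EM step the "EM after Active Learning" slide describes, using Naïve Bayes over word-count matrices: labeled counts are held fixed while unlabeled documents contribute fractionally via their current posteriors. The matrix layout, smoothing, and iteration count are my assumptions, not the paper's exact setup.

import numpy as np

def em_naive_bayes(X_labeled, y_labeled, X_unlabeled, n_classes,
                   n_iters=10, alpha=1.0):
    """Alternate E-steps (posterior labels for unlabeled docs) and
    M-steps (re-estimate the model from labeled + fractional counts).

    X_*: (n_docs, n_words) word-count matrices; y_labeled: int class ids.
    Returns (log_prior, log_p_word) of the final model.
    """
    n_words = X_labeled.shape[1]
    # Hard counts from the labeled data; these never change.
    base_word = np.zeros((n_classes, n_words))
    base_doc = np.zeros(n_classes)
    for x, y in zip(X_labeled, y_labeled):
        base_word[y] += x
        base_doc[y] += 1
    resp = None  # responsibilities P(class|doc) for unlabeled docs
    for _ in range(n_iters + 1):
        # M-step: smoothed estimates from hard + fractional counts.
        word = base_word + (resp.T @ X_unlabeled if resp is not None else 0)
        doc = base_doc + (resp.sum(axis=0) if resp is not None else 0)
        log_prior = np.log(doc + alpha) - np.log(doc.sum() + alpha * n_classes)
        log_pw = np.log(word + alpha) - np.log(
            word.sum(axis=1, keepdims=True) + alpha * n_words)
        # E-step: class posteriors for the unlabeled documents.
        log_post = X_unlabeled @ log_pw.T + log_prior
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return log_prior, log_pw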