PROJECT TITLE

ADDING DOMAIN KNOWLEDGE TO INDUCTIVE LEARNING METHODS FOR CLASSIFYING TEXTS

Kevin D. Ashley
University of Pittsburgh
Learning Research and Development Center
Graduate Program in Intelligent Systems

Contact Information

Kevin D. Ashley
3939 O'Hara Street
Pittsburgh, PA 15213
Phone: (412) 624-7496
Fax: (412) 624-9149
Email: [email protected]
http://www.lrdc.pitt.edu/Ashley/Default.htm

WWW PAGE

http://www.pitt.edu/~steffi/CBR/group.html

List of Supported Students and Staff (optional)

Stefanie Brüninghaus, GRA, University of Pittsburgh Graduate Program in Intelligent Systems

Project Award Information

* Award Number: NSF IIS-9987869
* Duration: 9/1/2000 -- 8/31/2001
* Title: Adding Domain Knowledge to Inductive Learning Methods for Classifying Texts

Keywords

case-based reasoning (CBR), automated case indexing, automated text classification, knowledge-guided machine learning, text-oriented CBR, factor-based text classification, legal information retrieval

Project Summary

The work improves current methods for learning to classify texts by incorporating knowledge from an expert domain model. The goal is automatically to classify the texts of legal opinions in terms of the factors that apply to the cases described. Factors, stereotypical fact patterns tending to strengthen or weaken the underlying legal claims in a case, and their relations to legal issues, are a kind of expert domain knowledge useful in legal argumentation. The program takes as inputs the raw texts of legal opinions and assigns as outputs the applicable factors. The program's training instances are drawn from a corpus of legal opinions whose textual descriptions of cases have been represented manually in terms of factors. The problem is hard because the language of the opinions is complex; the mere fact that an opinion discusses factors does not necessarily imply that those factors actually apply to the case. We employ ID3 to induce decision trees for classifying by factors.
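As a rough illustration of the ID3 step, the following minimal Python sketch induces a decision tree over bag-of-words features by choosing, at each node, the word whose presence yields the highest information gain. It is not the project's implementation: the four training sentences, the factor labels, and the function names (`tokens`, `id3`, `classify`) are all invented for illustration.

```python
import math
from collections import Counter


def tokens(text):
    """Represent a text as the set of lower-cased words it contains."""
    return set(text.lower().split())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on whether `feature` is present."""
    remainder = 0.0
    for present in (True, False):
        subset = [lab for row, lab in zip(rows, labels)
                  if (feature in row) == present]
        if subset:
            remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, features):
    """Return a nested-dict decision tree, or a class label at a leaf."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    if information_gain(rows, labels, best) < 1e-12:
        return majority  # no remaining word separates the classes
    rest = [f for f in features if f != best]
    tree = {"feature": best}
    for present in (True, False):
        idx = [i for i, row in enumerate(rows) if (best in row) == present]
        tree[present] = id3([rows[i] for i in idx],
                            [labels[i] for i in idx], rest)
    return tree


def classify(tree, row):
    """Follow present/absent branches until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree[tree["feature"] in row]
    return tree


# Hypothetical training texts, labeled by whether one invented factor applies.
TRAINING = [
    ("the employee signed a nondisclosure agreement", "applies"),
    ("the defendant signed a confidentiality agreement", "applies"),
    ("the plaintiff disclosed the formula to customers", "does-not-apply"),
    ("the formula was published in a trade journal", "does-not-apply"),
]

rows = [tokens(text) for text, _ in TRAINING]
labels = [label for _, label in TRAINING]
vocabulary = sorted(set().union(*rows))
tree = id3(rows, labels, vocabulary)

print(tree)
print(classify(tree, tokens("the manager signed an agreement")))
```

A tree built this way tests only whether single words occur, so it cannot tell a sentence that asserts a fact from one that denies it. This is one reason the difficulty noted above, that discussing a factor does not mean the factor applies, matters for the choice of text representation.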
We are exploring means of using information extraction techniques and certain linguistic information (e.g., about negation) to improve the text representation and classification performance.

Publications and Products

* Papers in Conference and Workshop Proceedings

Ashley, K.D. and Brüninghaus, S. (1998). "Developing Mapping and Evaluation Techniques for Textual CBR." In Proceedings of the AAAI-98 Workshop on Textual Case-Based Reasoning, pp. 20-23. AAAI Technical Report WS-98-12. AAAI Press, Menlo Park, CA.

Brüninghaus, S. and Ashley, K.D. (2001). "Improving the Representation of Legal Case Texts with Information Extraction Methods." To appear in Proceedings, Eighth International Conference on Artificial Intelligence and Law, Association for Computing Machinery, New York. St. Louis. May.

Brüninghaus, S. and Ashley, K.D. (1999a). "Toward Adding Knowledge to Learning Algorithms for Indexing Legal Cases." In Proceedings, Seventh International Conference on Artificial Intelligence and Law, Association for Computing Machinery, New York. Oslo. June. Donald H. Berman Award for Best Student Paper. http://www.pitt.edu/~steffi/papers/icail99.ps

Brüninghaus, S. and Ashley, K.D. (1999b). "Bootstrapping Case Base Development with Annotated Case Summaries." In Proceedings of the Third International Conference on Case-Based Reasoning. Munich, Germany. July. Outstanding Research Paper Award. http://www.pitt.edu/~steffi/papers/iccbr99.ps

Brüninghaus, S. and Ashley, K.D. (1998a). "Evaluation of Textual CBR Approaches." In Proceedings of the AAAI-98 Workshop on Textual Case-Based Reasoning, pp. 30-34. AAAI Technical Report WS-98-05. AAAI Press, Menlo Park, CA.

Brüninghaus, S. and Ashley, K.D. (1998b). "How Machine Learning Can be Beneficial for Textual Case-Based Reasoning." In Proceedings of the AAAI-98/ICML-98 Workshop on Learning for Text Categorization, pp. 71-74. AAAI Technical Report WS-98-05. AAAI Press, Menlo Park, CA.

* Invited Talks

Ashley, K.D. (2000). "Applying Textual Case-Based Reasoning and Information Extraction in Lessons Learned Systems." In Papers from the AAAI Workshop on Intelligent Lessons Learned Systems, pp. 1-4. Technical Report WS-00-03. AAAI Press, Menlo Park, CA.

Ashley, K.D. (1999). "Progress in Text-Based Case-Based Reasoning." Invited Talk for the Third International Conference on Case-Based Reasoning. Seeon, Germany. http://www.lrdc.pitt.edu/Ashley/TalkOverheads_files/v3_document.htm

Brüninghaus, S. (1998). "Case-Based Reasoning From Textual Documents." Invited Talk at the Sixth German Workshop on Case-Based Reasoning. Extended abstract published in Proceedings of the Sixth German Workshop on Case-Based Reasoning (GWCBR-98), pp. 55-58. Berlin, Germany. http://www.pitt.edu/~steffi/papers/slidesgwcbr98.ps

Project Impact

The project has enabled a graduate student in the University of Pittsburgh Graduate Program in Intelligent Systems to pursue her ideal research topic. Stefanie Brüninghaus is carrying out her Ph.D. dissertation project with this funding and plans to enter academia in AI/Computer Science, a field that needs more female faculty. This funding has already bolstered Ms. Brüninghaus’ professional experience and exposure with an invited talk and two "Best Paper" awards. The funding has enabled me to hire a number of law student assistants who have gained exposure to, and interest in, computational approaches to dealing with legal texts. Finally, the work will enable us to improve the intelligent tutoring system CATO, which teaches law students basic skills of legal argument, and to expand its database to include other legal domains.

Goals, Objectives, and Targeted Activities

We are exploring how to use information extraction methods, especially AutoSlog, and certain linguistic information (e.g., about negation) to improve the representation of the legal case texts.
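One way to picture how linguistic information about negation could change the text representation is the small Python sketch below. It prefixes the words that follow a negation cue with `NOT_`, so that a denied action and an asserted action produce different features. The cue list, the fixed three-word scope, and the function name `negation_features` are assumptions made for illustration only. They are not the project's method, and the sketch does not attempt to reproduce AutoSlog.

```python
import re

# Assumed, deliberately small list of negation cues (illustration only).
NEGATION_CUES = {"not", "no", "never", "without"}


def negation_features(sentence, scope=3):
    """Return word features; the `scope` words after a cue get a NOT_ prefix."""
    features = set()
    remaining = 0  # number of upcoming words still inside a negation scope
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word in NEGATION_CUES:
            remaining = scope
            continue
        if remaining > 0:
            features.add("NOT_" + word)
            remaining -= 1
        else:
            features.add(word)
    return features


asserted = negation_features("The defendant signed an agreement")
denied = negation_features("The defendant did not sign an agreement")

print(sorted(asserted))
print(sorted(denied))
```

With a plain bag-of-words representation both sentences would share the feature `agreement`. Here only the asserted sentence keeps it, while the denied sentence carries `NOT_agreement` instead. A fixed-length scope is a crude stand-in for real linguistic analysis of what a negation covers.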
Specifically, we are testing the hypotheses that (1) abstracting from the individual actors and events in cases, (2) capturing actions in multi-word features, and (3) recognizing negation can improve the text representation and classification performance.

Project References

(See Publications and Products above)

Area Background

Previously, we developed an expert model of case-based reasoning, which is the basis for an intelligent tutoring system that teaches law students to argue with previous cases available as texts. The texts are legal opinions in which judges record their decisions and rationales for litigated disputes. We have compiled a large corpus of full-text descriptions of cases and a parallel abstract representation of some important aspects of those cases, which captures their content and meaning. Our model of expert legal reasoning relates a set of factors, stereotypical fact patterns that tend to strengthen or weaken a legal claim, to the more abstract legal issues to which the factors are relevant. The evidence that factors apply to a given case consists of passages in the text of the opinions.

We constructed these resources in building the CATO program, an NSF PYI-supported intelligent tutoring environment designed to teach law students to make arguments with cases. CATO's Factor Hierarchy relates factors to more aggregated concepts and ultimately to legal issues raised by the legal claim. Together, factors and the Factor Hierarchy enable CATO to generate examples of legal arguments and to provide some feedback on a student's work. We think that, using the representation as guidance, a machine learning program trained on the corpus could learn to classify which factors and issues apply in new cases presented as raw texts.

Area References

Rissland, E.L. and Daniels, J. (1995). "A Hybrid CBR-IR Approach to Legal Information Retrieval." In Proceedings of the Fifth International Conference on AI and Law (ICAIL-95), pp. 52-61. ACM Press, New York, NY.

Smith, J.C., Gelbart, D., MacKimmon, K., Atherton, B., McClean, J., Shinehoft, M. and Quintana, L. (1995). "Artificial Intelligence and Legal Discourse: The Flexlaw Legal Text Management System." In Artificial Intelligence and Law, Volume 3, Number 1, pp. 55-95. Kluwer Academic Publishers, Dordrecht, The Netherlands.

Potential Related Projects

To be determined.