National Yunlin University of Science and Technology
Intelligent Database Systems Lab, N.Y.U.S.T. I. M.

Mining relational data from text – From strictly supervised to weakly supervised learning
Presenter: Shao-Wei Cheng
Authors: Zhu Zhang
IS 2008

Outline
* Motivation
* Objective
* Methodology
* Experiments
* Conclusion
* Personal Comments

Motivation
* The world today is full of information sources, which often represent the same information in different ways.
* Many relations are hidden in natural-language text.
* Supervised learning is usually preferred when applicable, but it is not always easy to acquire a large amount of labeled training data.
* Example relation (author, book):
  (1) "… Shakespeare's famous work Hamlet …"
  (2) "… A Brief History of Time was written by Stephen Hawking …"

Objectives
* Automatically classify relations between entities using machine learning techniques (SVM), especially weakly supervised learning algorithms:
  * Active learning
  * Bootstrapping
* Introduce random subspace-based algorithms for relation classification, in the context of both active learning and bootstrapping.
* Figure: relation types ROLE, PART, AT, NEAR, and SOC, shown with the example sentence "Shares of Disney, parent company of ABC, are up five eighths."

Methodology: Active learning
* Co-training algorithm (figure)
* RandSelect: random sampling strategy.
* ActiveLearnBaseline: presents the examples with the highest uncertainty to the user for annotation.
* ActiveLearnBagging: committee-based strategy; presents the examples with the most disagreement for further labeling.
* ActiveLearnSubspace: committee-based strategy in which the features are randomly sampled with probability p.

Methodology: Bootstrapping
* BootSelf-Y: the highest-probability label is assigned.
* BootBagging-Y: committee-based strategy; the highest-probability label is assigned.
* BootSubspace-Y: random sampling in feature space.
* The modified bootstrapping algorithms are named BootSelf-I, BootBagging-I, and BootSubspace-I, where "I" stands for "incremental".

Experiments
* Dataset: from the ACE corpus.
* Data treatment:
  * Parse the sentences into syntactic trees.
  * Convert the trees into chunklink format and generate feature vectors (figure example: "John hit the ball.").
  * Run the SVM model.

Experiments: Active learning

Experiments: Bootstrapping
* Some notes on the co-training algorithm.

Conclusion
* A variety of weakly supervised learning algorithms (active learning and bootstrapping) can take advantage of large amounts of unlabeled data when labeling is costly.
* The novel use of random subspace (RS)-based algorithms in weakly supervised learning showed an empirical advantage.

Personal Comments
* Advantage: the goal of the study is clearly stated.
* Drawback: some proper nouns are used without any explanation.
* Application: relation extraction; weakly supervised learning.
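The selection and labeling strategies from the Methodology slides can be illustrated with a small sketch. This is not the author's implementation. The paper uses SVMs on ACE feature vectors. Here a hypothetical nearest-centroid classifier stands in for the SVM so that the code needs only the Python standard library, and committee vote fractions stand in for class probabilities. All function names (train_centroids, subspace_committee, most_uncertain, most_disputed, bootstrap) are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical stand-in for the paper's SVM: a nearest-centroid classifier
# that only looks at the feature indices listed in `features`.
def train_centroids(X, y, features):
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += x[f]
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
    return centroids, features

def distances(model, x):
    # Squared Euclidean distance from x to each class centroid (selected features only).
    centroids, features = model
    return {lab: sum((x[f] - c[j]) ** 2 for j, f in enumerate(features))
            for lab, c in centroids.items()}

def predict(model, x):
    d = distances(model, x)
    return min(d, key=d.get)  # ties go to the label seen first in training

def margin(model, x):
    # Gap between the two nearest centroids; a small gap means high uncertainty.
    d = sorted(distances(model, x).values())
    return d[1] - d[0] if len(d) > 1 else float("inf")

def subspace_committee(X, y, n_members, p, rng):
    # Random subspace committee: each member keeps every feature with probability p.
    n_features = len(X[0])
    committee = []
    for _ in range(n_members):
        feats = [f for f in range(n_features) if rng.random() < p]
        if not feats:  # guard against an empty subspace
            feats = [rng.randrange(n_features)]
        committee.append(train_centroids(X, y, feats))
    return committee

def committee_proba(committee, x):
    # Vote fractions serve as a rough class-probability estimate.
    votes = Counter(predict(m, x) for m in committee)
    return {lab: c / len(committee) for lab, c in votes.items()}

def vote_entropy(committee, x):
    # Committee disagreement: entropy of the vote distribution (0 = unanimous).
    return -sum(q * math.log(q) for q in committee_proba(committee, x).values())

def most_uncertain(model, pool):
    # ActiveLearnBaseline-style query: the pool example with the smallest margin.
    return min(range(len(pool)), key=lambda i: margin(model, pool[i]))

def most_disputed(committee, pool):
    # Committee-style query: the pool example the committee disagrees on most.
    return max(range(len(pool)), key=lambda i: vote_entropy(committee, pool[i]))

def bootstrap(X, y, pool, rounds, k, n_members, p, rng):
    # Generic self-training loop: each round, retrain a subspace committee and move
    # the k most confident pool examples into the training set, each with its
    # highest-probability label (ties keep the original pool order).
    X, y, pool = list(X), list(y), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        committee = subspace_committee(X, y, n_members, p, rng)
        scored = []
        for i, x in enumerate(pool):
            proba = committee_proba(committee, x)
            label = max(proba, key=proba.get)
            scored.append((proba[label], i, label))
        scored.sort(key=lambda t: t[0], reverse=True)
        chosen = scored[:k]
        for _, i, label in chosen:
            X.append(pool[i])
            y.append(label)
        picked = {i for _, i, _ in chosen}
        pool = [x for i, x in enumerate(pool) if i not in picked]
    return X, y, pool
```

In an active learning loop, the index returned by most_uncertain (in the spirit of ActiveLearnBaseline) or most_disputed with a subspace committee (in the spirit of ActiveLearnSubspace) would go to an annotator. Picking rng.randrange(len(pool)) instead would correspond to RandSelect. Resampling training examples rather than features would give a bagging committee, as in ActiveLearnBagging. The slides do not describe how the -Y and -I bootstrapping variants differ in detail, so bootstrap shows only a generic self-training loop.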