Download Multi-Instance Learning

Theoretical Analysis of Multi-Instance Leaning 张敏灵周志华 [email protected] 南京大学软件新技术国家重点实验室 2002.10.11 Outline  Introduction  Theoretical analysis  PAC learning model  PAC learnablility of APR  Real-valued multi-instance learning  Future work Introduction  Origin   Multi-instance learning originated from the problem of “drug activity prediction”, and was first formalized by T. G. Dietterich et al. in their seminal paper “Solving the multiple-instance problem with axis-parallel rectangles”(1997) Later in 2001, J. D. Zuker and Y. Chevaleyre extended the concept of “multi-instance learning” to “multi-part learning”, and pointed out that many previously studied problems are “multi-part” problems rather than “multi-instance” ones. Introduction-cont’d  Drug activity prediction problem Fig.1. The shape of a molecule changes as it rotates it’s bonds  Comparisons Fig.2. Classical and multi-instance learning frameworks Introduction-cont’d  Experiment data Data set #dim #bags #pos bags #neg bags #instances #instances/bag max min ave musk1 166 92 47 45 476 40 2 5.17 musk2 166 102 39 63 6598 1044 1 64.69  APR(Axis-Parallel Rectangles) algorithms GFS elim-count APR(standard) GFS elim-kde APR(outside-in) Iterated discrim APR(inside-out) musk1: 92.4% musk2: 89.2% Fig.3. APR algorithms Introduction-cont’d  Various algorithms  APR (T. G. Dietterich et al.1997)  MULTINST (P. Auer 1997)  Diverse Density (O. Maron 1998)  Bayesian-kNN, Citation-kNN (J. Wang et al. 2000)  Relic (G. Ruffo 2000)  EM-DD (Q. Zhang & S. A. Goldman 2001)  …… Introduction-cont’d  Comparison on benchmark data sets Algorithms Musk1 (%correct) Musk2 (%correct) iterated-discrim APR 92.4 89.2 Citation-kNN 92.4 86.3 Diverse Density 88.9 82.5 RELIC 83.7 87.3 MULTINST 76.7 84.0 BP 75.0 67.7 C4.5 68.5 58.8 Fig.4. A comparison of several multi-instance learning algorithm Introduction-cont’d  Application area  Drug activity prediction (T. G. Dietterich et al. 1997)  Stock prediction (O. Maron 1998)  Learn a simple description of a person from a series of images (O. Maron 1998)  Natural scene classification (O. Maron & A. L. Ratan 1998)  Event prediction (G. M. Weiss & H. Hirsh 1998)  Data mining and computer security (G. Ruffo 2000)  …… Multi-instance learning has been regarded as the fourth machine learning framework parallel to supervised learning, unsupervised learning, and reinforcement learning. Theoretical analysis  PAC learning model  Definition and it’s properties  VC dimension  PAC learnability of APR  Real-valued multi-instance learning Theoretical Analysis－PAC model  Computational learning theory  L. G. Valiant (1984)    A theory of learnable Deductive learning Used for constructing a mathematical model of a cognitive process. W Actual example P Coded example M Fig.5. Diagram of a framework for learning 0/1 PAC model-cont’d  Definition of PAC learning We say that a learning algorithm L is a pac(probably approximately correct) learning algorithm for the hypothesis space H if, given   A confidence parameter δ (0< δ<1); An accuracy parameter ε (0< ε<1); then there is a positive integer mL＝ mL (δ,ε) such that  For any target concept t ∈H  For any probability distribution µ on X whenever m  mL , µm{s ∈ S(m,t) | er µ(L(s) , t)< ε}>1- δ PAC model-cont’d  Properties of a pac learning algorithm  It is probable that a useful training sample is presented.  One can only expect that the output hypothesis is approximately correct.  mL depends upon δ and ε, but not on t and µ. If there is a pac learning algorithm for a hypothesis space H, then we say that H is pac-learnable.  Efficient pac learning algorithm If the running time of a pac learning algorithm L is polynomial in 1/ δ and 1/ ε, then L is said to be efficient. It is usually necessary to require a pac learning algorithm to be efficient. PAC model-cont’d  VC dimension  VC (Vapnik-Chervonenkis) dimension of a hypothesis space H is a notion originally defined by Vapnik and Chervonenkis(1971), and was introduced into computational learning theory by Blumer et al.(1986)  VC dimension of a hypothesis space H, denoted by VCdim(H), describes the ‘expressive power’ of H in a sense.Generally, the greater of VCdim(H), the greater ‘expressive power’ of H, so H is more difficult to learn. PAC model-cont’d  Consistency If for any target concept t∈H and any training sample s=((x1,b1),(x2,b2), . . ., (xm,bm)) for t, the corresponding hypothesis L(s)∈H agrees with s, i.e. L(s)(xi)=t(xi)=bi, then we say that L is a consistent algorithm.  VC dimension and pac learnability H has finite VC dimension L is a consistent learning algorithm for H H is pac-learnable Theoretical Analysis－PAC learning of APR  Early work While T. G. Dietterich et al. have proposed three APR algorithms for multi-instance learning, P. M. Long & L. Tan (1997) had some theoretical analysis of the pac learnability of APR and showed that if,   Each instance in a bag is draw from a product distribution. All instance in a bag are drawn independently. then APR is pac learnable under the multi-instance nd ) learning framework with sample complexity O( d n log  nd ). and time complexity O( d n log  2 6 10 5 12 2 20 PAC learning of APR-cont’d  A hardness result Via the analysis of VC dimension, P. Auer et al.(1998) gave a much more efficient pac learning algorithm than with sample complexity O( d n log d ) and time complexity O( d n ). 2 2 3 2 2 2 More important, they proved that if the instances in a bag are not independent, then learning APR under multi-instance learning framework is as hard as learning DNF formulas, which is a NPComplete problem. PAC learning of APR-cont’d  A further reduction A. Blum & A. Kalai (1998) further studied the problem of pac learning APR from multi-instance examples, and proved that   If H is pac learnable from 1-sided (or 2-sided) random classification noise, then H is pac learnable from multiinstance examples. Via a reduction to the “Statistical Query” model ( M. Kearns 1993), APR is pac learnable from multi-instance d 2n examples with 3sample complexity O ( 2 ) and with time 2  complexity O ( d n2 ) .  PAC learning of APR-cont’d  Summary P. M. Long et al. P. Auer et al. A. Blum et al. Sample Time complexity complexity d 2n6 nd d 5n12 2 nd O( 10 log ) O( 20 log )     d 2n 2 d O( 2 log )   d 2n O( 2 )  Constrains Theoretical tools product distribution, p-concept, independent instances d 3n 2 O( 2 )  independent instances d 3n 2 O( 2 )  independent instances VC dimension VC dimension statistical query model, VC dimension Fig.6. A comparison of three theoretical algorithm Theoretical Analysis－ Real-valued multi-instance learning  Real-valued multi-instance learning    It is worthwhile to note that in several applications of the multiple instance problem, the actual predictions desired are real valued. For example, the binding affinity between a molecule and receptor is quantitative, so a real-valued label of binding strength is preferable. S. Ray & D. Page (2001) showed that the problem of multiinstance regression is NP-Complete, furthermore, D. R. Dooly et al. (2001) showed that learning from real-valued multi-instance examples is as hard as learning DNF. Nearly at the same time, R. A. Amar et al.(2001) extended the KNN, Citation-kNN and Diverse Density algorithms for real-valued multi-instance learning, they also provided a flexible procedure for generating chemically realistic artificial data sets and studied the performance of these modified algorithms on them. Future work  Further theoretical analysis of multi-instance     learning. Design multi-instance modifications for neural networks, decision trees, and other popular machine learning algorithms. Explore more issues which can be translated into multi-instance learning problems. Design appropriate bag generating methods. …… Thanks

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Multi-Instance Learning