Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Theoretical Analysis of Multi-Instance Leaning 张敏灵 周志华 [email protected] 南京大学软件新技术国家重点实验室 2002.10.11 Outline Introduction Theoretical analysis PAC learning model PAC learnablility of APR Real-valued multi-instance learning Future work Introduction Origin Multi-instance learning originated from the problem of “drug activity prediction”, and was first formalized by T. G. Dietterich et al. in their seminal paper “Solving the multiple-instance problem with axis-parallel rectangles”(1997) Later in 2001, J. D. Zuker and Y. Chevaleyre extended the concept of “multi-instance learning” to “multi-part learning”, and pointed out that many previously studied problems are “multi-part” problems rather than “multi-instance” ones. Introduction-cont’d Drug activity prediction problem Fig.1. The shape of a molecule changes as it rotates it’s bonds Comparisons Fig.2. Classical and multi-instance learning frameworks Introduction-cont’d Experiment data Data set #dim #bags #pos bags #neg bags #instances #instances/bag max min ave musk1 166 92 47 45 476 40 2 5.17 musk2 166 102 39 63 6598 1044 1 64.69 APR(Axis-Parallel Rectangles) algorithms GFS elim-count APR(standard) GFS elim-kde APR(outside-in) Iterated discrim APR(inside-out) musk1: 92.4% musk2: 89.2% Fig.3. APR algorithms Introduction-cont’d Various algorithms APR (T. G. Dietterich et al.1997) MULTINST (P. Auer 1997) Diverse Density (O. Maron 1998) Bayesian-kNN, Citation-kNN (J. Wang et al. 2000) Relic (G. Ruffo 2000) EM-DD (Q. Zhang & S. A. Goldman 2001) …… Introduction-cont’d Comparison on benchmark data sets Algorithms Musk1 (%correct) Musk2 (%correct) iterated-discrim APR 92.4 89.2 Citation-kNN 92.4 86.3 Diverse Density 88.9 82.5 RELIC 83.7 87.3 MULTINST 76.7 84.0 BP 75.0 67.7 C4.5 68.5 58.8 Fig.4. A comparison of several multi-instance learning algorithm Introduction-cont’d Application area Drug activity prediction (T. G. Dietterich et al. 1997) Stock prediction (O. Maron 1998) Learn a simple description of a person from a series of images (O. Maron 1998) Natural scene classification (O. Maron & A. L. Ratan 1998) Event prediction (G. M. Weiss & H. Hirsh 1998) Data mining and computer security (G. Ruffo 2000) …… Multi-instance learning has been regarded as the fourth machine learning framework parallel to supervised learning, unsupervised learning, and reinforcement learning. Theoretical analysis PAC learning model Definition and it’s properties VC dimension PAC learnability of APR Real-valued multi-instance learning Theoretical Analysis-PAC model Computational learning theory L. G. Valiant (1984) A theory of learnable Deductive learning Used for constructing a mathematical model of a cognitive process. W Actual example P Coded example M Fig.5. Diagram of a framework for learning 0/1 PAC model-cont’d Definition of PAC learning We say that a learning algorithm L is a pac(probably approximately correct) learning algorithm for the hypothesis space H if, given A confidence parameter δ (0< δ<1); An accuracy parameter ε (0< ε<1); then there is a positive integer mL= mL (δ,ε) such that For any target concept t ∈H For any probability distribution µ on X whenever m mL , µm{s ∈ S(m,t) | er µ(L(s) , t)< ε}>1- δ PAC model-cont’d Properties of a pac learning algorithm It is probable that a useful training sample is presented. One can only expect that the output hypothesis is approximately correct. mL depends upon δ and ε, but not on t and µ. If there is a pac learning algorithm for a hypothesis space H, then we say that H is pac-learnable. Efficient pac learning algorithm If the running time of a pac learning algorithm L is polynomial in 1/ δ and 1/ ε, then L is said to be efficient. It is usually necessary to require a pac learning algorithm to be efficient. PAC model-cont’d VC dimension VC (Vapnik-Chervonenkis) dimension of a hypothesis space H is a notion originally defined by Vapnik and Chervonenkis(1971), and was introduced into computational learning theory by Blumer et al.(1986) VC dimension of a hypothesis space H, denoted by VCdim(H), describes the ‘expressive power’ of H in a sense.Generally, the greater of VCdim(H), the greater ‘expressive power’ of H, so H is more difficult to learn. PAC model-cont’d Consistency If for any target concept t∈H and any training sample s=((x1,b1),(x2,b2), . . ., (xm,bm)) for t, the corresponding hypothesis L(s)∈H agrees with s, i.e. L(s)(xi)=t(xi)=bi, then we say that L is a consistent algorithm. VC dimension and pac learnability H has finite VC dimension L is a consistent learning algorithm for H H is pac-learnable Theoretical Analysis-PAC learning of APR Early work While T. G. Dietterich et al. have proposed three APR algorithms for multi-instance learning, P. M. Long & L. Tan (1997) had some theoretical analysis of the pac learnability of APR and showed that if, Each instance in a bag is draw from a product distribution. All instance in a bag are drawn independently. then APR is pac learnable under the multi-instance nd ) learning framework with sample complexity O( d n log nd ). and time complexity O( d n log 2 6 10 5 12 2 20 PAC learning of APR-cont’d A hardness result Via the analysis of VC dimension, P. Auer et al.(1998) gave a much more efficient pac learning algorithm than with sample complexity O( d n log d ) and time complexity O( d n ). 2 2 3 2 2 2 More important, they proved that if the instances in a bag are not independent, then learning APR under multi-instance learning framework is as hard as learning DNF formulas, which is a NPComplete problem. PAC learning of APR-cont’d A further reduction A. Blum & A. Kalai (1998) further studied the problem of pac learning APR from multi-instance examples, and proved that If H is pac learnable from 1-sided (or 2-sided) random classification noise, then H is pac learnable from multiinstance examples. Via a reduction to the “Statistical Query” model ( M. Kearns 1993), APR is pac learnable from multi-instance d 2n examples with 3sample complexity O ( 2 ) and with time 2 complexity O ( d n2 ) . PAC learning of APR-cont’d Summary P. M. Long et al. P. Auer et al. A. Blum et al. Sample Time complexity complexity d 2n6 nd d 5n12 2 nd O( 10 log ) O( 20 log ) d 2n 2 d O( 2 log ) d 2n O( 2 ) Constrains Theoretical tools product distribution, p-concept, independent instances d 3n 2 O( 2 ) independent instances d 3n 2 O( 2 ) independent instances VC dimension VC dimension statistical query model, VC dimension Fig.6. A comparison of three theoretical algorithm Theoretical Analysis- Real-valued multi-instance learning Real-valued multi-instance learning It is worthwhile to note that in several applications of the multiple instance problem, the actual predictions desired are real valued. For example, the binding affinity between a molecule and receptor is quantitative, so a real-valued label of binding strength is preferable. S. Ray & D. Page (2001) showed that the problem of multiinstance regression is NP-Complete, furthermore, D. R. Dooly et al. (2001) showed that learning from real-valued multi-instance examples is as hard as learning DNF. Nearly at the same time, R. A. Amar et al.(2001) extended the KNN, Citation-kNN and Diverse Density algorithms for real-valued multi-instance learning, they also provided a flexible procedure for generating chemically realistic artificial data sets and studied the performance of these modified algorithms on them. Future work Further theoretical analysis of multi-instance learning. Design multi-instance modifications for neural networks, decision trees, and other popular machine learning algorithms. Explore more issues which can be translated into multi-instance learning problems. Design appropriate bag generating methods. …… Thanks