Supplementary File for Graph Regularized Meta-path Based Transductive Regression in Heterogeneous Information Network

Mengting Wan∗, Yunbo Ouyang∗, Lance Kaplan†, Jiawei Han∗
∗ University of Illinois at Urbana-Champaign
† U.S. Army Research Laboratory

1 Theorem Proofs

Recall the constrained optimization problem
\[
\min \; J(w; f) = 2 \sum_{k=1}^{K} w_k f^T L^{(k)} f + \alpha_1 (f_L - y_L)^T (f_L - y_L) + \alpha_2 (f_U - \tilde{y}_U)^T \Sigma^{-1} (f_U - \tilde{y}_U),
\]
subject to \(\sum_{k=1}^{K} \exp(-w_k) = 1\).

We provide the following theorems and proofs.

Theorem 1.1. Suppose f is fixed. Then the objective problem J(w; f) with the constraint function \(\delta(w) = \sum_{k=1}^{K} \exp(-w_k) - 1 = 0\) is a convex optimization problem, and the global optimal solution is given by
\[
(1.1) \qquad w_k = -\log\left( \frac{f^T L^{(k)} f}{\sum_{k'=1}^{K} f^T L^{(k')} f} \right).
\]

Proof. The Hessian matrix of \(\delta(w)\) is a diagonal matrix with elements \(\{\exp(-w_k)\}_{k=1}^{K} > 0\). Moreover, when f is fixed the objective function is linear in w. Hence both the objective function and the constraint function are convex, so this problem is a convex optimization problem.

Let \(t = (t_1, \ldots, t_K)^T\) with \(t_k = \exp(-w_k)\), \(k = 1, \ldots, K\), and form the Lagrangian
\[
J_L(t; f) = 2 \sum_{k=1}^{K} (-\log t_k)\, f^T L^{(k)} f + \alpha_1 (f_L - y_L)^T (f_L - y_L) + \alpha_2 (f_U - \tilde{y}_U)^T \Sigma^{-1} (f_U - \tilde{y}_U) + \lambda \Big( \sum_{k=1}^{K} t_k - 1 \Big).
\]
From \(\frac{\partial}{\partial t_k} J_L(t; f) = 0\) we have \(2 f^T L^{(k)} f = \lambda t_k\). Since \(\sum_{k=1}^{K} t_k = 1\), summing over k gives \(\lambda = 2 \sum_{k'=1}^{K} f^T L^{(k')} f\); plugging in yields \(t_k = f^T L^{(k)} f / \sum_{k'} f^T L^{(k')} f\), and hence the solution (1.1) above. ∎

Theorem 1.2. Suppose w is fixed. Then the objective problem J(w; f) is a convex optimization problem, and the global optimal solution is given by solving the following linear system:
\[
(1.2) \qquad
\Big(2 \sum_{k=1}^{K} w_k + \alpha_1\Big) f_L = 2 \sum_{k=1}^{K} w_k \big(S_{11}^{(k)} f_L + S_{12}^{(k)} f_U\big) + \alpha_1 y_L;
\]
\[
\phantom{(1.2) \qquad}
\Big(2 \sum_{k=1}^{K} w_k + \alpha_2 \Sigma^{-1}\Big) f_U = 2 \sum_{k=1}^{K} w_k \big(S_{21}^{(k)} f_L + S_{22}^{(k)} f_U\big) + \alpha_2 \Sigma^{-1} \tilde{y}_U,
\]
where
\[
S^{(k)} = \begin{bmatrix} S_{11}^{(k)} & S_{12}^{(k)} \\ S_{21}^{(k)} & S_{22}^{(k)} \end{bmatrix}
\]
is partitioned according to labeled and unlabeled objects.

Proof. When w is fixed, J(w; f) is a non-negative quadratic function with Hessian matrix
\[
(1.3) \qquad H = 4 \sum_{k=1}^{K} w_k L^{(k)} + \begin{bmatrix} 2\alpha_1 I_n & O \\ O & 2\alpha_2 \Sigma^{-1} \end{bmatrix}.
\]
For any f and any k, \(f^T L^{(k)} f \ge 0\); therefore the Hessian matrix is positive semi-definite, which means J(w; f) is convex when w is fixed.

Given the partition of \(S^{(k)}\) above, we have
\[
L^{(k)} = \begin{bmatrix} I_n - S_{11}^{(k)} & -S_{12}^{(k)} \\ -S_{21}^{(k)} & I_m - S_{22}^{(k)} \end{bmatrix},
\]
so the smoothness term can be expanded as
\[
\Omega(w; f) = 2 \sum_{k=1}^{K} w_k \begin{bmatrix} f_L \\ f_U \end{bmatrix}^T \begin{bmatrix} I_n - S_{11}^{(k)} & -S_{12}^{(k)} \\ -S_{21}^{(k)} & I_m - S_{22}^{(k)} \end{bmatrix} \begin{bmatrix} f_L \\ f_U \end{bmatrix}
= 2 \sum_{k=1}^{K} w_k \Big( f_L^T (I_n - S_{11}^{(k)}) f_L + 2 f_U^T (-S_{21}^{(k)}) f_L + f_U^T (I_m - S_{22}^{(k)}) f_U \Big).
\]
Then from \(\frac{\partial}{\partial f_L} J(w; f) = 0\) and \(\frac{\partial}{\partial f_U} J(w; f) = 0\) we obtain the linear system (1.2) above. Given w, we can therefore derive f. ∎

2 Meta-path Selection

Notice that we only use some of the meta-path candidates in our experiments. Meta-path selection is necessary in our framework because different meta-paths have different meanings and make different contributions to the interaction between objects with respect to the target response variable. Indeed, a user-guided meta-path selection model has been studied in [1]. However, its probabilistic approach requires user-specified seeds and has a relatively high computational cost. In our framework, we suggest an intuitive selection framework that identifies the role of each meta-path using Lasso [2], the L1-constrained selection method for linear regression.

For any two different labeled objects \(x_u\) and \(x_v\), we can calculate a distance between \(y_u\) and \(y_v\); for example, \(dist_{u,v} = (y_u - y_v)^2\). This distance can be regarded as the response and the meta-path similarities \(R^{(k)}_{u,v}\) as predictors, where \(k = 1, \ldots, K\) and \(u, v = 1, 2, \ldots, n\). Let \(\widehat{dist}_{u,v} = b_0 + \sum_{k=1}^{K} b_k R^{(k)}_{u,v}\). The target is to minimize
\[
\sum_{(u,v)} \big(dist_{u,v} - \widehat{dist}_{u,v}\big)^2 + \lambda \sum_{k=1}^{K} |b_k|,
\]
where \(\lambda\) can be determined by cross-validation. Intuitively, a good meta-path \(P_k\) should be selected and its corresponding coefficient \(b_k\) should be negative, since a larger similarity along a relevant meta-path should correspond to a smaller response distance.

We also notice that when we have a large number of meta-path candidates, this method can achieve meta-path selection automatically and efficiently due to the sparsity property of Lasso. Unfortunately, real-world heterogeneous information networks usually contain a large number of links, so the regression model cannot be computed efficiently over all pairs. However, since the target of this framework is only to provide an intuition of each meta-path's contribution, we can randomly sample pairs and apply Lasso regression to the samples. Specifically, we randomly select 10 sets of 50,000 edges. The final meta-paths used in our Grempt model must be selected by more than half of these 10 models, with negative associated coefficients.

The original meta-path candidates for the IMDb data are movie-actor-movie (M-A1-M), movie-actress-movie (M-A2-M), movie-director-movie (M-D-M), movie-genre-movie (M-G-M), movie-studio-movie (M-S-M) and movie-writer-movie (M-W-M); the original meta-path candidates for the DBLP data are author-paper-author (A-P-A), author-term-author (A-T-A), author-venue-author (A-V-A), author-paper-(cite)-paper-(cited by)-paper-author (A-P→P←P-A) and author-paper-(cited by)-paper-(cite)-paper-author (A-P←P→P-A).

For the IMDb data, the result shows that M-A1-M, M-A2-M, M-D-M, M-G-M and M-W-M contribute significantly to the network-based consistency of log(boxoffice). The fact that M-S-M has not been selected shows that movies produced by the same studio may have significantly different box office values. For the DBLP data, we notice that A-P-A, A-V-A and A-P←P→P-A are selected by Lasso, which means that authors who have collaborated with each other, who have published papers in the same venues, or whose papers have been cited by the same papers tend to have similar academic influence, represented by log(#citation + 1).
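As an illustrative sketch of the alternating scheme, the closed-form w-update of Theorem 1.1 can be written in a few lines. This is not the paper's released code; the function and variable names are ours, and we assume NumPy arrays and precomputed meta-path Laplacians \(L^{(k)}\):

```python
import numpy as np

def update_w(f, laplacians):
    """w-update of Theorem 1.1: w_k = -log(f^T L^(k) f / sum_k' f^T L^(k') f)."""
    # Smoothness of the current prediction f along each meta-path graph.
    smooth = np.array([f @ L @ f for L in laplacians])
    return -np.log(smooth / smooth.sum())
```

By construction \(t_k = \exp(-w_k)\) sums to one, so the constraint is satisfied exactly, and a meta-path on which f is currently rough (large \(f^T L^{(k)} f\)) receives a smaller weight \(w_k\).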
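The f-update of Theorem 1.2 amounts to solving the linear system (1.2) with every f-term moved to the left-hand side. A minimal dense-matrix sketch under our own naming conventions, assuming NumPy, the similarity matrices \(S^{(k)}\) stored with the n labeled objects first, and a problem small enough for a direct solve:

```python
import numpy as np

def update_f(w, S_list, y_L, y_tilde_U, Sigma, alpha1, alpha2):
    """f-update of Theorem 1.2: solve the linear system (1.2) for f = (f_L, f_U).

    S_list holds the (n+m) x (n+m) meta-path similarity matrices S^(k),
    ordered labeled-first; Sigma is the covariance attached to the
    pseudo-targets y_tilde_U.
    """
    n, m = len(y_L), len(y_tilde_U)
    W = float(np.sum(w))
    S_bar = sum(wk * Sk for wk, Sk in zip(w, S_list))   # sum_k w_k S^(k)
    Sigma_inv = np.linalg.inv(Sigma)
    # Collect all f-terms of (1.2) on the left-hand side.
    A = -2.0 * S_bar
    A[:n, :n] += (2.0 * W + alpha1) * np.eye(n)
    A[n:, n:] += 2.0 * W * np.eye(m) + alpha2 * Sigma_inv
    b = np.concatenate([alpha1 * y_L, alpha2 * (Sigma_inv @ y_tilde_U)])
    f = np.linalg.solve(A, b)
    return f[:n], f[n:]
```

Iterating `update_w` and `update_f` until convergence realizes the alternating optimization described above; for large sparse networks one would replace the dense solve with an iterative solver.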
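The Lasso-based selection procedure of Section 2 can likewise be sketched in plain NumPy. The coordinate-descent solver below is a stand-in for any off-the-shelf Lasso implementation, and the fixed penalty `lam` is a placeholder for the cross-validated \(\lambda\); `R` is a precomputed (pairs × K) matrix of similarities \(R^{(k)}_{u,v}\) and `y` holds the squared label distances \(dist_{u,v}\). All names are ours:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Minimize ||y - X b||^2 + lam * sum_k |b_k| by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()                       # residual with b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:       # constant column: coefficient stays 0
                continue
            r += X[:, j] * b[j]        # put feature j back into the residual
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def select_meta_paths(R, y, lam=1.0, n_rounds=10, n_sample=50000, seed=0):
    """Keep meta-path k if b_k < 0 in more than half of the sampled Lasso fits."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(R.shape[1], dtype=int)
    for _ in range(n_rounds):
        idx = rng.choice(len(y), size=min(n_sample, len(y)), replace=False)
        Xc = R[idx] - R[idx].mean(axis=0)   # centering absorbs the intercept b0
        yc = y[idx] - y[idx].mean()
        votes += lasso_cd(Xc, yc, lam) < 0
    return votes > n_rounds // 2
```

The majority-vote-with-negative-sign rule mirrors the selection criterion stated above: a meta-path survives only if its coefficient is negative (high similarity predicts small response distance) in more than half of the resampled fits.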
References

[1] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, “Integrating meta-path selection with user-guided object clustering in heterogeneous information networks,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2012, pp. 1348–1356.
[2] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society, Series B (Methodological), pp. 267–288, 1996.