Supplementary File for Graph Regularized Meta-path Based
Transductive Regression in Heterogeneous Information Network
Mengting Wan*, Yunbo Ouyang*, Lance Kaplan†, Jiawei Han*
* University of Illinois at Urbana-Champaign
† U.S. Army Research Laboratory

1 Theorem Proofs

Recall the constrained optimization problem

  min J(w; f) = 2 \sum_{k=1}^{K} w_k f^T L^{(k)} f + \alpha_1 (f_L - y_L)^T (f_L - y_L) + \alpha_2 (f_U - \tilde{y}_U)^T \Sigma^{-1} (f_U - \tilde{y}_U),

  subject to \sum_{k=1}^{K} \exp(-w_k) = 1.

We provide the following theorems and proofs.
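For concreteness, the objective above can be evaluated in a few lines of numpy. This is a toy sketch with hypothetical names (`laplacians`, `objective`), not the authors' code; it assumes the relation L^(k) = I − S^(k) between the Laplacians and the normalized similarity matrices that is used later in the proof of Theorem 1.2.

```python
import numpy as np

def laplacians(S_list):
    """Build L^(k) = I - S^(k) from the normalized similarity matrices S^(k)."""
    return [np.eye(S.shape[0]) - S for S in S_list]

def objective(w, f, L_list, y_L, y_U_tilde, Sigma_inv, a1, a2):
    """Evaluate J(w; f); f stacks the labeled part f_L (first n entries)
    and the unlabeled part f_U."""
    n = len(y_L)
    f_L, f_U = f[:n], f[n:]
    # 2 * sum_k w_k f^T L^(k) f  (graph-regularization term)
    smooth = 2.0 * sum(wk * (f @ L @ f) for wk, L in zip(w, L_list))
    # a1 ||f_L - y_L||^2          (labeled fitting term)
    fit_L = a1 * (f_L - y_L) @ (f_L - y_L)
    # a2 (f_U - y~_U)^T Sigma^{-1} (f_U - y~_U)  (unlabeled prior term)
    fit_U = a2 * (f_U - y_U_tilde) @ Sigma_inv @ (f_U - y_U_tilde)
    return smooth + fit_L + fit_U
```

When f matches the targets exactly and every S^(k) = I (so every L^(k) = 0), all three terms vanish and J = 0, which is a quick sanity check on the implementation.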
Theorem 1.1. Suppose f is fixed. The objective problem J(w; f) with constraint function \delta(w) = \sum_{k=1}^{K} \exp(-w_k) - 1 = 0 is a convex optimization problem. The global optimal solution is given by

(1.1)  w_k = -\log\left( \frac{f^T L^{(k)} f}{\sum_{k=1}^{K} f^T L^{(k)} f} \right).

Proof. The Hessian matrix of \delta(w) is a diagonal matrix with elements \{\exp(-w_k)\}_{k=1}^{K} > 0. Besides, when f is fixed, the objective function is a linear function of w. Hence both the objective function and the constraint function are convex, and therefore this problem is a convex optimization problem.

Suppose

  J_L(w; f) = J(w; f) + \lambda \left( \sum_{k=1}^{K} \exp(-w_k) - 1 \right),

and substitute t = (t_1, ..., t_K)^T with t_k = \exp(-w_k), k = 1, ..., K. Then

  J_L(t; f) = 2 \sum_{k=1}^{K} (-\log t_k) f^T L^{(k)} f + \alpha_1 (f_L - y_L)^T (f_L - y_L) + \alpha_2 (f_U - \tilde{y}_U)^T \Sigma^{-1} (f_U - \tilde{y}_U) + \lambda \left( \sum_{k=1}^{K} t_k - 1 \right).

From \partial J_L(t; f) / \partial t_k = 0, we have

  2 f^T L^{(k)} f = \lambda t_k.

Since \sum_{k=1}^{K} t_k = 1, summing over k gives \lambda = 2 \sum_{k=1}^{K} f^T L^{(k)} f; plugging this back in yields the solution above.
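Update (1.1) is cheap to compute once the quadratic forms f^T L^(k) f are available. A minimal numpy sketch (function name and the `eps` guard are our own additions, not from the paper):

```python
import numpy as np

def update_w(f, L_list, eps=1e-12):
    """Closed-form weight update (1.1):
    w_k = -log(q_k / sum_j q_j), where q_k = f^T L^(k) f.
    eps guards against a quadratic form that is numerically zero."""
    q = np.array([max(f @ L @ f, eps) for L in L_list])
    return -np.log(q / q.sum())
```

By construction the resulting weights satisfy the constraint \sum_k \exp(-w_k) = 1, and meta-paths with smaller f^T L^{(k)} f (i.e., smoother f on that graph) receive larger weights.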
Theorem 1.2. Suppose w is fixed. The objective problem J(w; f) is a convex optimization problem. The global optimal solution is given by solving the following linear system:

(1.2)  \left( 2 \sum_{k=1}^{K} w_k + \alpha_1 \right) f_L = 2 \sum_{k=1}^{K} w_k \left( S_{11}^{(k)} f_L + S_{12}^{(k)} f_U \right) + \alpha_1 y_L;
       \left( 2 \sum_{k=1}^{K} w_k + \alpha_2 \Sigma^{-1} \right) f_U = 2 \sum_{k=1}^{K} w_k \left( S_{21}^{(k)} f_L + S_{22}^{(k)} f_U \right) + \alpha_2 \Sigma^{-1} \tilde{y}_U,

where

  S^{(k)} = \begin{bmatrix} S_{11}^{(k)} & S_{12}^{(k)} \\ S_{21}^{(k)} & S_{22}^{(k)} \end{bmatrix}

is partitioned according to labeled and unlabeled objects.

Proof. When w is fixed, J(w; f) is a non-negative quadratic function whose Hessian matrix is

(1.3)  H = 4 \sum_{k=1}^{K} w_k L^{(k)} + \begin{bmatrix} 2\alpha_1 I_n & O \\ O & 2\alpha_2 \Sigma^{-1} \end{bmatrix}.

For any f and any k, f^T L^{(k)} f \ge 0, so the Hessian matrix is positive semi-definite, which means J(w; f) is convex when w is fixed.

Given w, we can derive f. Since

  L^{(k)} = \begin{bmatrix} I_n - S_{11}^{(k)} & -S_{12}^{(k)} \\ -S_{21}^{(k)} & I_m - S_{22}^{(k)} \end{bmatrix},

the graph-regularization term can be written as

  \Omega(w; f) = 2 \sum_{k=1}^{K} w_k \begin{bmatrix} f_L^T & f_U^T \end{bmatrix} \begin{bmatrix} I_n - S_{11}^{(k)} & -S_{12}^{(k)} \\ -S_{21}^{(k)} & I_m - S_{22}^{(k)} \end{bmatrix} \begin{bmatrix} f_L \\ f_U \end{bmatrix}
             = 2 \sum_{k=1}^{K} w_k \left( f_L^T (I_n - S_{11}^{(k)}) f_L + 2 f_U^T (-S_{21}^{(k)}) f_L + f_U^T (I_m - S_{22}^{(k)}) f_U \right).

Then, setting \partial J_L(w; f) / \partial f_L = 0 and \partial J_L(w; f) / \partial f_U = 0 yields the linear system above.
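In practice, system (1.2) can be assembled into one (n+m)×(n+m) block matrix and solved directly. The following numpy sketch is our own illustration (names and shapes are assumptions): it precomputes \sum_k w_k S^{(k)} and then solves for f_L and f_U jointly.

```python
import numpy as np

def solve_f(w, S_list, y_L, y_U_tilde, Sigma_inv, a1, a2):
    """Solve the linear system (1.2) for f = (f_L, f_U) with w fixed.
    Each S in S_list is the full (n+m)x(n+m) matrix S^(k), labeled rows first."""
    n, m = len(y_L), len(y_U_tilde)
    W = sum(w)                                     # sum_k w_k
    Sw = sum(wk * S for wk, S in zip(w, S_list))   # sum_k w_k S^(k)
    A = np.zeros((n + m, n + m))
    # Labeled block: (2W + a1) f_L - 2 (S11w f_L + S12w f_U) = a1 y_L
    A[:n, :n] = (2 * W + a1) * np.eye(n) - 2 * Sw[:n, :n]
    A[:n, n:] = -2 * Sw[:n, n:]
    # Unlabeled block: (2W I + a2 Sigma^-1) f_U - 2 (S21w f_L + S22w f_U) = a2 Sigma^-1 y~_U
    A[n:, :n] = -2 * Sw[n:, :n]
    A[n:, n:] = 2 * W * np.eye(m) + a2 * Sigma_inv - 2 * Sw[n:, n:]
    b = np.concatenate([a1 * y_L, a2 * Sigma_inv @ y_U_tilde])
    f = np.linalg.solve(A, b)
    return f[:n], f[n:]
```

Alternating this solve with the weight update (1.1) gives one straightforward way to optimize J(w; f) over both blocks of variables.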
2 Meta-path Selection

Notice that we only use some of the meta-path candidates in our experiments. Meta-path selection is necessary in our framework because different meta-paths have different meanings and make different contributions to the interaction between objects with respect to the target response variable. Indeed, a user-guided meta-path selection model has been studied in [1]. However, its probabilistic approach requires user-specified seeds and has a relatively high computational cost. In our framework, we suggest an intuitive selection framework to identify the role of each meta-path using Lasso [2], the L1-constrained selection method for linear regression.

For any two different labeled objects x_u and x_v, we can calculate a distance between y_u and y_v; for example, dist_{u,v} = (y_u - y_v)^2. This distance can be regarded as the response and the meta-path based similarities R^{(k)}_{u,v} can be regarded as predictors, where k = 1, ..., K and u, v = 1, 2, ..., n. Let

  \hat{dist}_{u,v} = b_0 + \sum_{k=1}^{K} b_k R^{(k)}_{u,v}.

The target is to minimize

  \sum_{(u,v)} (dist_{u,v} - \hat{dist}_{u,v})^2 + \lambda \sum_{k=1}^{K} |b_k|,

where \lambda can be determined by cross-validation. Intuitively, a good meta-path P_k should be selected and its corresponding coefficient b_k should be negative, since a larger similarity along a useful meta-path should imply a smaller distance between responses.

We also notice that when we have a large number of meta-path candidates, this method can achieve meta-path selection automatically and efficiently due to the sparsity property of Lasso. Unfortunately, real-world heterogeneous information networks usually contain a large number of links, so the regression model cannot be computed efficiently on all pairs. However, since the target of this framework is only to provide an intuition of each meta-path's contribution, we can randomly sample some of the edges and apply Lasso regression to them. Specifically, we randomly select 10 sets of 50,000 edges. The final meta-paths used in our Grempt model should be selected by more than half of these 10 models, and the associated coefficients should be negative.

The original meta-path candidates for the IMDb data are movie-actor-movie (M-A1-M), movie-actress-movie (M-A2-M), movie-director-movie (M-D-M), movie-genre-movie (M-G-M), movie-studio-movie (M-S-M) and movie-writer-movie (M-W-M), and the original meta-path candidates for the DBLP data are author-paper-author (A-P-A), author-term-author (A-T-A), author-venue-author (A-V-A), author-paper-(cite)-paper-(cited by)-paper-author (A-P→P←P-A) and author-paper-(cited by)-paper-(cite)-paper-author (A-P←P→P-A). For the IMDb data, the result shows that M-A1-M, M-A2-M, M-D-M, M-G-M and M-W-M contribute significantly to the network-based consistency of log(boxoffice). The fact that M-S-M has not been selected shows that movies produced by the same studio may have significantly different box office values. For the DBLP data, we notice that A-P-A, A-V-A and A-P←P→P-A are selected by Lasso, which means that authors who collaborated with each other, who published papers in the same venues, or whose papers have been cited by the same papers tend to have similar academic influence, as represented by log(#citation + 1).
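The sampling-plus-majority-vote procedure above can be sketched end to end. The following is a simplified illustration with a hand-rolled coordinate-descent Lasso and hypothetical helper names (`lasso_cd`, `select_metapaths`), not the authors' implementation; in particular, it fixes λ rather than cross-validating it.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize sum_i (y_i - b0 - x_i^T b)^2 + lam * sum_k |b_k|
    by cyclic coordinate descent with soft-thresholding."""
    b0 = y.mean()
    Xc = X - X.mean(axis=0)        # center columns so b0 stays y.mean()
    yc = y - b0
    col_sq = (Xc ** 2).sum(axis=0) # assumes no constant (zero-variance) column
    b = np.zeros(X.shape[1])
    r = yc - Xc @ b                # current residual
    for _ in range(n_iter):
        for k in range(X.shape[1]):
            r += Xc[:, k] * b[k]                               # remove k's contribution
            rho = Xc[:, k] @ r
            b[k] = np.sign(rho) * max(abs(rho) - lam / 2, 0) / col_sq[k]
            r -= Xc[:, k] * b[k]
    return b0, b

def select_metapaths(R, d, lam, n_rounds=10, sample=50000, seed=0):
    """R: (n_pairs, K) meta-path similarities; d: (n_pairs,) squared distances.
    Keep meta-path k if b_k < 0 in more than half of the sampled Lasso fits."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(R.shape[1], dtype=int)
    for _ in range(n_rounds):
        idx = rng.choice(len(d), size=min(sample, len(d)), replace=False)
        _, b = lasso_cd(R[idx], d[idx], lam)
        votes += (b < 0).astype(int)
    return votes > n_rounds // 2
```

On synthetic data where only the first meta-path similarity actually shrinks the response distance, the vote should retain that path and discard the irrelevant ones.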
References

[1] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu, "Integrating meta-path selection with user-guided object clustering in heterogeneous information networks," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2012, pp. 1348-1356.

[2] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), pp. 267-288, 1996.