Download Complex Social Network Mining—Theory, Methodologies, and

Document related concepts

Network tap wikipedia , lookup

IEEE 1355 wikipedia , lookup

Airborne Networking wikipedia , lookup

Transcript
Complex Social Network Mining
—Theory, Methodologies, and Applications
Jie Tang
Department of Computer Science and Technology
Tsinghua University
Email: [email protected]
1
Social Networks
Mobile Web (2008-20)
Web 1.0 (1989)
Connecting via
mobiles…
Pages, hyperlinks
Relevance search
Web 2.0 (2004)
social networks
Blogs, micro-blogs
Web-based (or mobile-based) social networks already
become a bridge to connect our real daily life and the virtual
web space
2
Web-based Social Network Mining
—Theory, Methodologies, and Applications
2
Search/query
over networks
3
7
1
4
Social data
integration
Attribute/link
prediction
Social
knowledge
acquisition
6
5
Trust and privacy
Social dynamics
Social influence
analysis
Theoretical layer
Collective learning
Learning from users
Web1.0: Web of Pages
3
Social theory
Web 2.0: Web of People
Graphical models
Web 3.0: Web of Semantics
Outline
• ArnetMiner: Academic Social Network
• Core Techniques
– Knowledge Acquisition
– Semantic Integration
– Heterogeneous Ranking
– Social Influence Analysis
• Demo
4
ArnetMiner.org
- Academic research social network analysis and mining system
提供全面的研究者网络分析与挖掘功能
Papers published: ACM TKDD, KDD’08-10, SDM’09,
ICDM’07-09, CIKM’07-09, DKE, JIS
http://arnetminer.org/
5
Why Arnetminer.org?
“The information
need is not only
about publication…”
“Academic search is
treated as document
search, but ignore
semantics”
6
Examples – Expertise search
• When starting a
work in a new research topic;
• Or brainstorming for novel
ideas.
Researcher A
• Who are experts in this field?
• What are the top conferences in
the field?
• What are the best papers?
• What are the top research labs?
7
Examples – Citation network analysis
Researcher B
• an in-depth understanding
of the research field?
An Inverted Index
Implementation
Introduction of Modern
Information Retrieval
Topics
Filtered
Document Retrieval with
Frequency-Sorted Indexes
Parameterised Compression for
Sparse Bitmaps
Memory Efficient
Ranking
Topic 31: Ranking and Inverted Index
Topic 1 : Theory
Topic 27: Information retrieval
Topic 23: Index method
Signature les: An access Method
for Documents and
its Analytical Performance
Evaluation
Topic 21: Framework
Topic 34: Parallel computing
Self-Indexing Inverted Files for
Fast Text Retrieval
Topic 22: Compression
A Document-centric Approach
to Static Index Pruning in Text
Retrieval Systems
Other
Vector-space Ranking with
Effective Early Termination
Citation Relationship Type
Efficient Document Retrieval in
Main Memory
8
Static
Index Pruning for Information
Retrieval Systems
Basic theory
Comparable work
Other
Examples – Conference Suggestion
authors
Which conference
should we submit the
paper?
Researcher C
content
9
Examples – Reviewer Suggestion
KDD Committee
conference
Paper content
10
Who are best matching
reviewers for each
paper?
Our Social Network is Black
Social network without
role/relationship info, e.g. a
company’s email network
Latent relationship graph
How to
infer
CEO
Manager
Employee
Fortunately, user interactions form implicit groups
11
From BW to Color
12
3
2
1
13
Person Search
Basic Info.
Citation statistics
Research Interests
Social Network
Fundings
Publications
14
Expertise
Search
Finding experts,
expertise conferences,
and expertise papers
for ―information
retrieval‖
15
Course Search
Finding courses for
―data mining‖
16
Association Search
Finding associations
between persons
- high efficiency
- Top-K associations
Usage:
- to find a partner
- to find a person with
same interests
17
Sub-Graph Search
Sub graphs
18
Topic Browser
200 topics have been
discovered automatically
from the academic network
19
Academic Performance Measurement
Academic Statistics
Personal Statistics
20
Outline
• ArnetMiner: Academic Social Network
• Core Techniques
– Knowledge Acquisition
– Semantic Integration
– Heterogeneous Ranking
– Social Influence Analysis
• Demo
21
ArnetMiner: Overview
2
3
Modeling and Search Network
α
β
θ
Φ
A
Social Network Analysis
Social involution
analysis
Topic model
T
w
ad
x
z
c
Nd
μ
ψ
T
D
Academic
suggestion
Citation tracing
analysis
Expertise search
Social influence
analysis
Association...
cite
Access interface
write
write
Tree CRF...
Indexing
publish
RNKB
Storage
publish
ISWC
Social Network Extraction
cite
1
write
EOS...
publish
Semantic...
write
Name Disambiguation
Pc member
Profiling extraction
Publication extraction
Homepage finding
Data Sources
Papers
Homepage
22
ACM
Papers
DBLP
Papers
Libra
citation
Scholar
publish
write
Limin
Prof. Wangpublish
cite
Integration
SVM...
WWW
cite
publish
Metadata
Map-reduce--Distributed processing platform
Dr. Tang
Social Network Storage
write
Prof. Li
coauthor
Write
write
Annotation...
coauthor
IJCAI
CT1: Knowledge Acquisition from Social Web
(ACM TKDD, ISWC’06, ICDM’07, ACL’07, CIKM’07-08)
Ruud Bolle
2
Office: 1S-D58
Letters: IBM T.J.
Watson Information
Research Center
Contact
P.O. Box 704
Ruud Bolle
Office: 1S-D58 Yorktown Heights, NY 10598 USA
IBM T.J.
WatsonCenter
Research Center
Letters: Packages:
IBM T.J. Watson
Research
Skyline Drive
P.O. Box19704
Hawthorne,
NY10598
10532USA
USA
Yorktown
Heights, NY
Email:
Packages:
[email protected]
T.J. Watson Research Center
19 Skyline Drive
Ruud M. Bolle was born in Voorburg,
The Netherlands.
He received the Bachelor's
Hawthorne,
NY 10532 USA
Degree in Analog Electronics
1977 and the Master's Degree in Electrical
Email: [email protected]
Engineering in 1980, both from Delft University of Technology, Delft, The
In 1983
he received
Master'sEducational
Degree
in Applied
Mathematics and in
Ruud M.Netherlands.
Bolle was born
in Voorburg,
Thethe
Netherlands.
He received
thehistory
Bachelor's
the Ph.D.
in Electrical
Engineering
from Brown
University,
Providence, Rhode
Degree 1984
in Analog
Electronics
in 1977
and the Master's
Degree
in Electrical
Island.
In 1984
he from
became
Research of
Staff
Member atDelft,
the IBM
Engineering
in 1980,
both
Delfta University
Technology,
TheThomas J. Watson
Research
Center
in the Artificial
Intelligence
Department
the Computer
Netherlands.
In 1983
he received
the Master's
Degree
in Applied of
Mathematics
andScience
in
In 1988Engineering
he became from
manager
of University,
the newly formed
Exploratory
1984 theDepartment.
Ph.D. in Electrical
Brown
Providence,
RhodeComputer
Vision
whichaisResearch
part of theStaff
Math
Sciences
Department.
Island. In
1984Group
he became
Member
at the
IBM Thomas J. Watson
Research Center in the Artificial Intelligence Department of the Computer Science
Currently,
hishe
research
are
onformed
video database
indexing,
video
Department.
In 1988
becameinterests
manager
offocused
the newly
Exploratory
Computer
processing,
visual
interaction
and biometrics applications.
Vision Group
which is
part human-computer
of the Math Sciences
Department.
video database indexing
video processing
visual human-computer interaction
biometrics applications
1
Two questions:
IBM T.J. Watson
Research Center
IBM T.J. Watson Research
Center
19 Skyline Drive
Hawthorne, NY 10532 USA
Research_Interest
Research Staff
Affiliation
IBM T.J. Watson Research
Center
P.O. Box 704
Address Yorktown Heights,
NY 10598 USA
Address
http://researchweb.watson.ibm.com/
ecvg/people/bolle.html
Position
Homepage
Photo
Name
Ruud Bolle
[email protected]
Email
• How to accurately extract the researcher
1
profile information from the Web?
services
• How toAcademic
integrate
the information from different
sources?
Ruud
M. Bolle interests
is a Fellow
the IEEE
thedatabase
AIPR. Heindexing,
is Area Editor
Currently,
his research
areoffocused
onand
video
video of Computer
Vision
andhuman-computer
Image Understanding
and Associate
Editor applications.
of Pattern Recognition. Ruud
processing,
visual
interaction
and biometrics
M. Bolle is a Member of the IBM Academy of Technology.
Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer
Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud
M. Bolle is a Member of the IBM Academy of Technology.
1984
Brown University
Electrical Engineering
2006
Msuniv
Delft University of Technology
Sharat Chikkerur, Sharath Pankanti, Alan Jea, Nalini K. Ratha, Ruud M. Bolle: Fingerprint
49 EE Representation Using Localized Texture Features. ICPR (4) 2006: 521-524
2
Andrew Senior, Arun Hampapur, Ying-li Tian, Lisa Brown, Sharath Pankanti, Ruud M. Bolle:
48 EE Appearance models for occlusion handling. Image Vision Comput. 24(11): 1233-1243 (2006)
Electrical Engineering
Applied Mathematics
Co-author
47
46
...
23
Co-author
Publication 2#
Publication 1#
Title
Title
Cancelable Biometrics:
A Case Study in
Venue
Fingerprints
Date
End_page
Start_page
ICPR
2005
1Ruud M. Bolle, Jonathan H. Connell, Sharath Pankanti, Nalini K. Ratha, Andrew W. Senior:
EE The Relation between the ROC Curve and the CMC. AutoID 2005: 15-20
Sharat Chikkerur, Venu Govindaraju, Sharath Pankanti, Ruud M. Bolle, Nalini K. Ratha:
EE 2
Novel Approaches for Minutiae Verification in Fingerprint Images. WACV. 2005: 111-116
Bsdate
1977
Bsuniv
Delft University of Technology
Bsmajor
Msmajor
Msmajor
1Nalini K. Ratha, Jonathan Connell, Ruud M. Bolle, Sharat Chikkerur: Cancelable Biometrics:
50 EE A Case Study in Fingerprints. ICPR (4) 2006: 370-373
Ruud Bolle
Analog Electronics
1980
Publications
DBLP: Ruud Bolle
Phddate
Phduniv
Phdmajor
Msdate
370
Fingerprint
Representation Using
Localized Texture
Features
Venue
End_page
Start_page
2006
2006
521
ICPR
373
coauthor
Publication #3
affiliation
524
UIUC
Ruud Bolle
2
Publication #5
...
Date
coauthor
position
Professor
Researcher Network Extraction
70.60% of the researchers
have at least one homepage
or an introducing page
Research_Interest
Fax
Affiliation
Title
85.6% from
universities
14.4% from
companies
Start_page
71.9% are
homepages
End_page
40% are in lists
and tables
28.1% are
introducing
pages
60% are natural
language text
Phone
Postion
Publication_venue
Address
Person Photo
Email
Homepage
Publication
Name
Authored
Coauthor
Researcher
Bsdate
Bsuniv
Phddate
Phduniv
Phdmajor
Msdate
Bsmajor
Msuniv
Msmajor
Date
There are a large number of
person names having the
ambiguity problem
Even 3 ―Yi Li‖ graduated from
the author’s lab
70% moved at least one time
24
Our Approach Picture
– based on Markov Random Field
Markov Property:
Ya
P(Yi | Y j | Y j  Yi )
Yb
Yc
Special cases:
 P(Yi | Y j | Y j ~ Yi )
- Conditional Random Fields
- Hidden Markov Random
Fields
Ye
Yd
Yf
y4=2
y9=3
t -coauthor y7=2
y1=1
cite
coauthor
y10=3
y5=2
y6=2
co-conference y3=1
y2=1
coauthor
cite
coauthor
co-conference
cite
y11=3
y8=1
coauthor
x4
x9
x7
x1
Researcher Profiling
25
x5
Name Disambiguation
x3
x6
x2
x11
x8
x10
CT2: Semantic Integration
(IEEE TKDE, SIGMOD’09, IJCAI’09, ISWC’09)
26
RiMOM-A Tool for Semantic Integration
(OAEI’06-09)
Benchmark Results
1
0.5
0
Precsion
Recall
F-measure
1
0.8
0.6
0.4
0.2
0
Anatomy Results
Precision
Recall
Recall+
F-measure
agrafsa
Subtrack
“I’m really
surprised
byResults
the
1
good
0.8 results of these years
0.6
RiMOM,
you can compete withPrecision
0.4
http://keg.cs.tsinghua.edu.cn/project/RiMOM/
the0.2
top
systems
that make use
0
of such background knowledge.‖
27
CT3: Topic-based Heterogeneous Ranking
(Machine Learn. J, KDD’08, ICDM’08, CIKM’09, DKE)
Data
mining
Modeling using VSM
ery
Qu tor
vec
Search with
keyword
0
0
Principles of Data Mining.
DJ Hand - Drug Safety, 2007 - drugsafety.adisonline.com
Doc1
r
vecto
1
1
1 0
1
0
1 1
Doc3
1 0 1 0
1 0 vector
0 1
1
1
Do
1
1 vectoc4
Return
Advances in Knowledge Discovery and Data Mining
UM Fayyad, G Piatetsky-Shapiro, P Smyth, R…
Data Mining: Concepts and Techniques
J Han, M Kamber - 2001…
r
Search with
semantic
modeling
Experts
Expertise
conferences
Modeling using semantic topics
Topics
Data mining
Data
mining
0.4
Association Rules
0.2
Database systems
0.15
0.1
Data management
0.05
0.02 Web databases
Information systems
28
Return
Expertise
papers
Challenges
write
write
1. How to model the
heterogeneous academic
network?
Cite
Cite
Cite
write
write
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Co-author
Co-author
Co-author
Co-author
Co-author
Co-author
Co-write
Co-write
Cite
Cite
Cite
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Co-write
Co-write
2. How to capture the link
information for ranking
objects in the academic
network?
29
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Cite
Cite
Cite
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
write
write
publish
publish
publish
PC
PC member
member
PC
member
chair
chair
chair
publish
publish
publish
publish
publish
publish
Cite
Cite
Cite
Modeling the Academic Network
α
β
θ
Φ
A
θ
Φ
AC
T
α
x
z
c
Nd
μ
T
D
z
c
x
T
z
w
Nd
w
Nd
c
D
D
ψ
η,σ2
T
ACT1
conference
ACT2
Author-Conference-Topic Model [Tang et al., 08]
30
Φ
A
ad
x
β
θ
ad
w
Topic
β
words
authors
ad
α
ACT3
Integrating Topic Model into Random Walk
Random walk over the
academic network
Modeling academic
network with topics
write
Cite
write
------------------------------------------------------------------------------------------------------------------------
Co-author
-------------------------------------------------------------------------------------------------------------------------
Cite
Co-write
Co-author
Cite
+
--------------------------------------------------------------------------------------------------------------------
Co-write
------------------------------------------------------------------------------------------------
write
publish
publish
chair
PC member
publish
Author-Conference-Topic
Model [Tang et al., 08]
31
Cite
Combination Method 1
Author Graph Ge
Prof. Tang
Prof.
Wang
λdd
Conference
Graph Gc
IJCAI
Stage 1:
Random walk
ISWC
Jing Zhang
λed
λde
Paper Graph Gp λcd
Ranking score
WWW
EOS...
Tree CRF...
Association...
λdc
Combination by
multiplication
Topic layer
Data
mining
Stage 2.
Topic-based
relevance
Topic-based
relevance score
Query
ISWC
IJCAI
WWW
...
EOS...
Tree CRF...
Association...
Prof. Tang
Prof.
Wang
Jing Zhang
32
...
Combination Method 2
Author Graph Ge
Prof. Tang
Prof.
Wang
Conference
Graph Gc
λdd
Ranking score
ISWC
IJCAI
Jing Zhang
λed
λde
Paper Graph Gp λcd
WWW
EOS...
Tree CRF...
Association...
λdc
Transition probability
λdt
λtd
Hidden Theme
Graph Gt
pos
Web
service
λqt
Query:
ontology alignment
33
owl
λtq
Learning to Rank Experts
• Combining more information
Empirical loss
weight
feature

2
a
b 

min   1  zTi wT , xTi  xTi
  wT 2 


wT

 i 1

Model penalty
n2
Language model,
BM25, tf*idf
34
Heterogeneous Cross-domain Ranking
Query: “data mining”
Papers
Principles of Data Mining
Conferences
Data Mining: Concepts and
Techniques
KDD
?
SDM
?
Authors
ICDM
?
author/
paper
conf
P. Yu
PAKDD
?
?
Loss in one domain
Loss in another domain

min   1  z Si w S , x Sai  x Sbi   C  1  zTi wT , xTai  xTbi    W




w S , wT
i 1
 i 1
n1
Common feature space
 n1 
min   1  z Si w S , U T x Sai  x Sbi
w S , wT ,U
 i 1 

35
n2

n2

  C 1  z w , U T x a  x b

Ti
T
Ti
Ti


i 1

2,1 

2

  W


2 ,1 

2
Learning Algorithm
• Equivalent objective function:
Optimize the loss function for
each domain
Common space discovery
Optimize the weight
via the common space
36
Experimental Results
• Data sets
– Homogeneous Data
• LETOR 2.0: TREC2003, TREC2004, and OHSUMED
– Heterogeneous Data
• Academic network consisting of 14,134 authors, 10,716
papers, and 1,434 conferences.
– Heterogeneous Tasks
• Expert finding vs. Bole search
• Baselines
– RSVM
– Language model
37
Results on Homogeneous Data
38
Results on Heterogeneous Data
39
Results on Heterogeneous Tasks
• Expert finding verse Bole search (finding best supervisor)
• To obtain ground truth of bole for each query
– We sent emails to 50 senior researchers and 50 junior researchers
(91.6% are post doc or graduates)
– Average their feedbacks
40
CT4: Social Influence Analysis
(KDD’10, KDD’10, KDD’09, ICDM’09, JIS)
• How to quantify the influence between users?
• What is the relationship between users?
• How to discover topic distribution over links?
Dr. Tang
Association...
cite
• Can we predict the user’s actions?
SVM...
publish
write
write
publish WWW
Tree CRF...
publish
write
Limin
cite
publish
ISWC
cite
Prof. Wangpublish
cite
publish
write
EOS... Semantic...
write
Annotation...
write
Pc member
write
41
Prof. Li
coauthor
Write
coauthor
IJCAI
Topic-based Social Influence Analysis
• Social network -> Topical influence network
42
Social Influence Sub-graph on ―Data mining‖
43
Influential nodes on different topics
44
CT4: Social Influence Analysis
(KDD’10, KDD’10, KDD’09, ICDM’09, JIS)
• How to quantify the influence between users?
• What is the relationship between users?
• How to discover topic distribution over links?
Dr. Tang
Association...
cite
• Can we predict the user’s actions?
SVM...
publish
write
write
publish WWW
Tree CRF...
publish
write
Limin
cite
publish
ISWC
cite
Prof. Wangpublish
cite
publish
write
EOS... Semantic...
write
Annotation...
write
Pc member
write
45
Prof. Li
coauthor
Write
coauthor
IJCAI
Mining Advisor-Advisee Relationship
from Research Publication Networks
Input:T em poral
collaboration netw ork
O utput:R elationship analysis
1999
A da
22000
000
(0.9,[/,1998])
A da
B ob
(0.4,
[/,1998])
(0.5,[/,2000])
2000
2001
Y ing
2002 Sm ith
(0.8,[1999,2000])
Jerry
(0.7,
[2000,2001])
B ob
Y ing
2003
2004
46
(0.65,[2002,2004])
Sm ith
(0.49,
[/,1999])
Jerry
(0.2,
[2001,2003])
V isualized chorologicalhierarchies
Results
47
Application: visualization
RULE
48
TPFG
Bole Search In Arnetminer
An example on a real
system: Arnetminer
49
Performance improvement
Results (cont.)
50
CT4: Social Influence Analysis
(KDD’10, KDD’10, KDD’09, ICDM’09, JIS)
• How to quantify the influence between users?
• What is the relationship between users?
• How to discover topic distribution over links?
Dr. Tang
Association...
cite
• Can we predict the user’s actions?
SVM...
publish
write
write
publish WWW
Tree CRF...
publish
write
Limin
cite
publish
ISWC
cite
Prof. Wangpublish
cite
publish
write
EOS... Semantic...
write
Annotation...
write
Pc member
write
51
Prof. Li
coauthor
Write
coauthor
IJCAI
Examples – Topic distribution analysis over citations
Researcher A
• an in-depth understanding
of the research field?
An Inverted Index
Implementation
Introduction of Modern
Information Retrieval
Topics
Filtered
Document Retrieval with
Frequency-Sorted Indexes
Parameterised Compression for
Sparse Bitmaps
Memory Efficient
Ranking
Topic 31: Ranking and Inverted Index
Topic 1 : Theory
Topic 27: Information retrieval
Topic 23: Index method
Signature les: An access Method
for Documents and
its Analytical Performance
Evaluation
Topic 21: Framework
Self-Indexing Inverted Files for
Fast Text Retrieval
Topic 34: Parallel computing
VS.
Topic 22: Compression
A Document-centric Approach
to Static Index Pruning in Text
Retrieval Systems
Other
Vector-space Ranking with
Effective Early Termination
Citation Relationship Type
Efficient Document Retrieval in
Main Memory
Semantic citation network
52
Static
Index Pruning for Information
Retrieval Systems
Original citation network
Basic theory
Comparable work
Other
Problem: Link Semantic Analysis
Citation context
words
Link semantics
53
Topic modeling
over links
Pairwise Restricted Boltzmann Machines
(PRBMs)
Link category
Latent variables
defined over the
link to bridge the
two pages
Topic distribution
Example
54
Link context
words
Pairwise Restricted Boltzmann
Machines (PRBMs)
Accuracy of Link Categorization
Tested on Arnetminer
citation data
gPRBM: our approach
with generative
learning
dPRBM: our approach
with discriminative
learning
hPRBM: our approach
with hybrid learning
55
CT4: Social Influence Analysis
(KDD’10, KDD’10, KDD’09, ICDM’09, JIS)
• How to quantify the influence between users?
• What is the relationship between users?
• How to discover topic distribution over links?
Dr. Tang
Association...
cite
• Can we predict the user’s actions?
SVM...
publish
write
write
publish WWW
Tree CRF...
publish
write
Limin
cite
publish
ISWC
cite
Prof. Wangpublish
cite
publish
write
EOS... Semantic...
write
Annotation...
write
Pc member
write
56
Prof. Li
coauthor
Write
coauthor
IJCAI
What can we do in SNS?
57
Social Action
Twitter
Flickr
KDD
Add favorites
Comment on Haiti
Earthquake
58
Publish in KDD
Conference
Action
Comment on Haiti
Earthquake
Time t
Time t+1
1. Always watch news
2. Enjoy sports
3. …
59
Attribute
Results
60
Arnetminer Today
— A brief summary
61
ArnetMiner’s History
• 2006/5, V0.1 Perl-based CGI version
– Profile extraction, person/paper/conf. search
• 2006/8, V1.0 Java (Demo @ ASWC)
– Rewrite the above functions
• 2007/7, V2.0 (Demo @ KDD, ISWC)
– New: survey search, research interest, association search
• 2008/4, V3.0 (Demo @ WWW)
– Query understanding, New search GUI, log analysis
• 2008/11, V4.0 (Demo @ KDD, ICDM)
– Graph search, topic mining, NSFC/NSF
• 2009/4, V5.0 (Demo @ KDD)
– Bole/course search, profile editing, open resources, #citation
• 2009/12, V6.0
– Academic statistics, user feedbacks, refined ranking
• V7.0, coming soon
– Name disambiguation, reviewer assignment, supervisor suggestion, open API
62
ArnetMiner Today
* Arnetminer data:
> 0.6 M researcher profiles
> 3M papers
> 17M citation relationships
> 5K conferences
> 50M logs
Messages from Users
… I’ve happened to visit your Arnetminer, and
shocked. It was really impressive, its usefulness
and your works!!! … [from …@selab.snu.ac.kr]
…I would first of all congratulate you on the
excellent work you have done in Arnetminer and I
am much inspired… [from …@nu.edu.pk]
a survey by UK Southampton
… Arnetminer is one of my favorite tools to find
* Visits
come
from
more
than
190
Title: Semantic Technologies for Learning
and
Teaching
in Web
2.0. [from …@qlink.com]
folk
and
academic
relatives…
Top
10
countries
— Thanassis Tiropanis, Hugh Davis, Dave Millard, Mark Weal
countries
1.outside
Canada
Dr.
JieUSA
Tang,
…Exposing the expertise
of the institution
to the
world in order6.
to attract
* Continuously
+20% increase
of Dear
our example
papers (http://www.waset.org
)
2.include
China
Japan
funding
students. ArnetMiner is the Can
mostyou
representative
of7.such
tools
visits
per and
month
in your 3.
Arnetminer?...
at the moment…
Germany[from …@waset.org]
8. Taiwan
4.across
9.
France
.. top top!
I India
am very
interested
in your
ArnetMiner.
Contextualised
queries
and searches,
repositories
potentially
in
* >100,000
page
views
per day searches
that possible
give me
a bit of10.
yourItaly
social
5. of
UK
different departments or institutions, andIsmatching
people
for collaborative
data…
activities. Best example of the surveyed network
technologies
to[from
this …@cse.ust.hk]
end is ArnetMiner.
63
Opportunity: exploiting semantic web and social network
in the real-world
Web, relational data,
ontological data,
social data
Data Mining and Social Network techniques
Scientific
Literature
Social search
& mining
Advertisement
Mobile Context
Energy trend
analysis
Large-scale
Mining
Users cover >180
countries
>600K researcher
>3M papers
Social network
Extraction
Social network
Mining
Advertisement
Recommendation
Mobile search
& recommendation
Energy product
Evolution
Techniques
Trend
Scalable algorithms
for message tagging
and community
Discovery
Arnetminer.org
IBM
Sohu
Nokia
Oil Company
Google
科技信息资源内容监测与分析服务平台 (中国科技部信息情报研究所)
Search, browsing, complex query, integration, collaboration, trustable
analysis, decision support, intelligent services,
64
Arnetminer
PatentMiner
65
ScopusMiner
PubmedMiner
CheMiner
Representative Publications
•
•
•
•
•
•
•
•
•
•
•
•
Jie Tang, Jing Zhang, Ruoming Jin, Zi Yang, Keke Cai, Li Zhang, and Zhong Su. Topic Level Expertise
Search over Heterogeneous Networks. Machine Learning Journal.
Jie Tang, Limin Yao, Duo Zhang, and Jing Zhang. A Combination Approach to Web User Profiling. ACM
TKDD, 2010.
Juanzi Li, Jie Tang, Yi Li, Qiong Luo. RiMOM: A Dynamic Multi-Strategy Ontology Alignment Framework.
IEEE TKDE, 2009.
Chenhao Tan, Jie Tang, Jimeng Sun, Quan Lin, and Fengjiao Wang. Social Action Tracking via Noise
Tolerant Time-varying Factor Graphs. KDD’10.
Chi Wang, Jiawei Han, Yuntao Jia, Duo Zhang, Yintao Yu, Jie Tang, Jingyi Guo. Mining Advisor-Advisee
Relationships from Research Publication Networks. KDD’10.
Jie Tang, Jimeng Sun, Chi Wang, and Zi Yang. Social Influence Analysis in Large-scale Networks. KDD'09.
Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of
Academic Social Networks. KDD’08.
Jie Tang, Hang Li, Yunbo Cao, and Zhaohui Tang. Email Data Cleaning. KDD’05.
Jie Tang, Ho-fung Leung, Qiong Luo, Dewei Chen, and Jibin Gong. Towards Ontology Learning from
Folksonomies. IJCAI’09.
Qian Zhong, Hanyu Li, Juanzi Li, Guotong Xie, Jie Tang, Lizhu Zhou. A Gauss Function based Approach for
Unbalanced Ontology Matching. SIGMOD’09.
Feng Shi, Juanzi Li, Jie Tang. Actively Learning Ontology Matching via User Interaction. ISWC’09.
Chonghui Zhu, Jie Tang, Hang Li, Hwee Tou Ng, and Tiejun Zhao. A Unified Tagging Approach to Text
Normalization. ACL’07.
Others: ICDM’07-09, CIKM’07-09, SDM’09, ISWC’06, DKE, JIS, etc.
66
Thanks!
Demo: http://arnetminer.org
HP: http://keg.cs.tsinghua.edu.cn/persons/tj/
67