Research-Insight
Providing Insight on Research by Publication Network Analysis
Fangbo Tao, Kin Hou Lei, Yizhou Sun, Chi Wang, Tim Weninger, Jiawei Han
Motivation
• When doing research, which confusing parts could an “information system” help with?
  • What is your next research “big thing”?
  • Who should you collaborate with?
  • Which papers do you need to read? The latest ones? Related ones?
  • Which papers do you need to cite?
• Global insight? Personalized insight!
  • Previous work, common affiliations, social connections, papers already read, etc.
Data Source: CSR-Net
• DBLP Dataset
  • An information-rich CS publication network
• Mining Web Information
  • Hierarchical Affiliation Info: University-Department-Research Group
• Citation Data
  • ArnetMiner & CiteSeer
Functions We Want to Support
• Similarity Search
• Ranking-based Clustering & Classification
• Literature Recommendation
• Collaboration Prediction
• Academic Profile Generation
  • Historical Affiliations Prediction
  • Academic Family Discovery
System Architecture
Similarity Search
• Example
  • Given an author, find his/her top-k similar authors and explain why (by showing the corresponding meta-path and similarity measure).
  • Compare PathSim with other measures
    • SimRank, Personalized PageRank
• Potential Extension
  • Finding the top-k most related heterogeneous typed objects
  • “Christos Faloutsos” → related venues and terms
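For reference, PathSim scores two same-typed objects x and y under a symmetric meta-path as 2 × (number of path instances between x and y) / (number of path instances from x to itself + number from y to itself). The following is a minimal sketch for the author-paper-venue-paper-author meta-path, assuming a toy table of per-venue publication counts; the author names, counts, and function names are invented for illustration and are not part of Research-Insight.

```python
from collections import Counter

# Toy author -> venue publication counts (invented for illustration).
PUBS = {
    "alice": Counter({"KDD": 4, "VLDB": 2}),
    "bob": Counter({"KDD": 2, "VLDB": 1}),
    "carol": Counter({"SIGIR": 5}),
    "dave": Counter({"KDD": 1, "SIGIR": 1}),
}

def path_count(x, y):
    """Number of author-paper-venue-paper-author path instances between x and y."""
    return sum(n * PUBS[y][venue] for venue, n in PUBS[x].items())

def pathsim(x, y):
    """PathSim: 2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    denom = path_count(x, x) + path_count(y, y)
    return 2 * path_count(x, y) / denom if denom else 0.0

def top_k_similar(author, k=2):
    """Rank all other authors by PathSim to `author` and keep the best k."""
    scores = [(other, pathsim(author, other)) for other in PUBS if other != author]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
```

In this toy data, "bob" publishes in the same venues as "alice" in the same proportions, so he ranks above "dave", who shares only one venue with her. The per-venue products in `path_count` are the kind of evidence a system could show to explain a match.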
Ranking-based Clustering & Classification
• Example
  • Given a sub-network (DB, DM, IR, ML) and a desired number of clusters, perform clustering and show the top-k objects of each type. Do the same for the restricted network (DB).
  • Classification is similar.
• Potential Extension
  • User-provided constraints
    • Specify that a node belongs to a class/cluster
  • Different ranking rules
Literature Recommendation
• Traditional keyword-based search system (G-Scholar)
  • Measures the document similarity between query and paper
• Combine Network Structural Similarity & Document Similarity
  • Academic History
  • Research Community
Reading Recommendation
• Example: A young researcher has published 10 papers on two themes in three major conferences. He/she wants to know:
  • Newly published papers along his/her research line
  • Papers extending his/her research scope
  • Classical papers on the theme that he/she has not cited
  • Papers from the same group/university in a similar domain
• Planned Solution
  • Step 1: Find a set of term clusters in the researcher’s work
  • Step 2: Find authors/venues/papers that are reputed in these themes
  • Step 3: Recommend based on freshness, topic closeness, influence of the paper, and structural closeness
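Step 3 could be sketched as a weighted ranking over the four signals. The slides do not say how the signals are combined, so the linear combination, the weights, the linear freshness decay, and all names below are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    title: str
    year: int
    topic_closeness: float       # in [0, 1], e.g. overlap with the term clusters
    influence: float             # in [0, 1], e.g. a normalized citation count
    structural_closeness: float  # in [0, 1], e.g. a meta-path similarity

# Hypothetical weights; not taken from the slides.
WEIGHTS = {"freshness": 0.25, "topic": 0.35, "influence": 0.2, "structure": 0.2}

def freshness(year, current_year, horizon=10):
    """Linear decay from 1.0 (published this year) to 0.0 (at least `horizon` years old)."""
    age = max(0, current_year - year)
    return max(0.0, 1.0 - age / horizon)

def score(c, current_year):
    """Weighted sum of the four Step 3 signals."""
    return (WEIGHTS["freshness"] * freshness(c.year, current_year)
            + WEIGHTS["topic"] * c.topic_closeness
            + WEIGHTS["influence"] * c.influence
            + WEIGHTS["structure"] * c.structural_closeness)

def recommend(candidates, current_year, k=3):
    """Return the k highest-scoring candidates."""
    return sorted(candidates, key=lambda c: score(c, current_year), reverse=True)[:k]
```

Changing the weights shifts the list between the example use cases: a high freshness weight favors newly published papers, while a high influence weight surfaces classical papers.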
Citation Recommendation
• Example
  • Given the set of authors, the title, and the abstract of a planned paper, return the papers that should be cited. These may include influential original papers and recently published related ones.
• Solution:
  • Two-step approach [Yu et al.]: Use a meta-path-based feature space to interpret structural information and build up a discriminative term bucket for citation prediction.
  • Furthermore, combine both document similarity and structural similarity.
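One simple way to combine the two kinds of similarity is a convex mix of a text score and a network score. The sketch below is not the two-step approach of [Yu et al.]: it assumes cosine similarity over raw term counts as the document similarity and Jaccard overlap of author sets as a stand-in for structural similarity, and the `alpha` parameter and dictionary keys are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity of lowercase, whitespace-tokenized term-frequency vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def jaccard(set_a, set_b):
    """Overlap of two sets; used here as a stand-in for a structural similarity."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def combined_score(query, paper, alpha=0.5):
    """alpha weights document similarity; (1 - alpha) weights structural similarity."""
    doc = cosine_similarity(query["abstract"], paper["abstract"])
    struct = jaccard(set(query["authors"]), set(paper["authors"]))
    return alpha * doc + (1 - alpha) * struct
```

A meta-path-based measure could replace `jaccard` without changing `combined_score`, since both return a value in [0, 1].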
Collaboration Prediction
• For each researcher, given his/her publication history and affiliation, one may get:
  • Advisor and group mates
  • Other professors/students in the same institution in a related discipline
  • Researchers in the same field but with different affiliations
• For each recommended relationship, explain why the prediction is made by showing weighted paths.
• Potential Extension
  • Predict collaboration given a specific research theme.
Academic Family, Affiliation History
• Example
  • Given a researcher (Jure Leskovec), we present his current institution and likely time, as well as his historical institutions and academic family.
Academic Family, Affiliation History
• Our solution is based on a set of training data and a small set of rules.
  • “An advisor has more publications and a longer history than the advisee at the time of advising”
  • “Once an advisee becomes an advisor, he/she will never become an advisee again”
• The training data comes from web-mined current affiliations.
• Iterative constraint propagation will help uncover the hidden affiliation history and academic family.
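The two rules can be read as filters over candidate advisor-advisee pairs. The sketch below is a simplified illustration only: it assumes "history" means the year of first publication, applies the second rule as a lookup in a set of already-known advisors, and omits the training data and iterative propagation; all class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Researcher:
    name: str
    first_pub_year: int
    pubs_by_year: dict  # year -> number of papers published that year

def pubs_before(r, year):
    """Papers published strictly before `year`."""
    return sum(n for y, n in r.pubs_by_year.items() if y < year)

def may_advise(candidate, advisee, year):
    """Rule 1: the advisor has more publications and a longer history at advising time."""
    return (candidate.first_pub_year < advisee.first_pub_year
            and pubs_before(candidate, year) > pubs_before(advisee, year))

def filter_advisors(advisee, coauthors, year, known_advisors=frozenset()):
    """Keep coauthors that pass rule 1; rule 2 rules out anyone already known as an advisor."""
    if advisee.name in known_advisors:
        return []  # Rule 2: an advisor never becomes an advisee again.
    return [c.name for c in coauthors if may_advise(c, advisee, year)]
```

Under these assumptions, a propagation loop could call `filter_advisors` repeatedly, adding each newly inferred advisor to `known_advisors` so that later passes prune more candidates.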
Roadmap
• Data Cube:
  • Efficient summary
  • Highly structured data
• Rich Text:
  • Topic analysis, query answering
  • Common: ASRS, IMDB, Publication-Net, News…
• Network (HIN)
  • Good for mining; contains structural information
  • No information loss
One more thing: Rich text