Construction of Web-Based, Service-Oriented
Information Networks:
A Data Mining Perspective
(Abstract)
Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, IL 61801, U.S.A.
[email protected]
https://www.cs.uiuc.edu/homes/hanj
Abstract. Mining directly on the existing networks formed by explicit
webpage links on the World-Wide Web may not be very fruitful because of
the diversity and semantic heterogeneity of such web-links. However,
construction of service-oriented, semi-structured information networks
from the Web and mining on such networks may lead to many exciting
discoveries of useful information on the Web. This talk will discuss this
direction and its associated research opportunities.
The World-Wide Web can be viewed as a gigantic information network, where webpages are the nodes of the network, and links connecting those pages form an intertwined, gigantic network. However, due to the unstructured nature of such a
network and semantic heterogeneity of web-links, it is difficult to mine interesting
knowledge from such a network except for finding authoritative pages and hubs.
Alternatively, one can view the Web as a gigantic repository of multiple information sources, such as universities, governments, companies, news, services, sales
of commodities, and so on. An interesting problem is whether this view can provide new functions for web-based information services and, if it does, whether
one can construct such semi-structured information networks automatically or semi-automatically from the Web, and whether one can use this new kind
of network to derive interesting new information and expand web services.
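As an illustration of what "finding authoritative pages and hubs" involves on a plain link graph, the following is a minimal sketch in the style of Kleinberg's hubs-and-authorities (HITS) iteration. The toy graph `LINKS`, the function name `hits`, and the fixed iteration count are assumptions made for this example only; they are not taken from the talk.

```python
# Hypothetical toy web graph: page -> pages it links to.
LINKS = {
    "a": ["c", "d"],
    "b": ["c"],
    "c": [],
    "d": [],
}


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = sum(v * v for v in scores.values()) ** 0.5
    if norm == 0:
        return dict(scores)
    return {node: v / norm for node, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) score dicts after a fixed number of rounds."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        auth = _normalize(auth)
        # A page's hub score is the sum of the authority scores it links to.
        hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        hub = _normalize(hub)
    return hub, auth


hub_scores, auth_scores = hits(LINKS)
```

In this toy graph, page `c` receives links from both `a` and `b` and ends up with the highest authority score, while `a` links to the most authoritative pages and ends up as the strongest hub. Note that the computation uses link structure only and ignores what the links mean, which is the limitation the text points to.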
In this talk, we take this alternative view and examine the following issues:
(1) what the potential benefits are if one can construct service-oriented, semi-structured information networks from the World-Wide Web and perform data
mining on them, (2) whether it is possible to construct such service-oriented, semi-structured information networks from the World-Wide Web automatically or semi-automatically, and (3) what research problems arise in constructing
and mining Web-based, service-oriented, semi-structured information networks.
This view is motivated by our recent work on (1) mining semi-structured
heterogeneous information networks, and (2) discovery of entity Web pages and
their corresponding semantic structures from parallel path structures.
H. Gao et al. (Eds.): WAIM 2012, LNCS 7418, pp. 17–19, 2012.
© Springer-Verlag Berlin Heidelberg 2012
First, real world physical and abstract data objects are interconnected, forming
gigantic, interconnected networks. By structuring these data objects into multiple types, such networks become semi-structured heterogeneous information networks. Most real world applications that handle big data, including interconnected
social media and social networks, scientific, engineering, or medical information
systems, online e-commerce systems, and most database systems, can be structured into heterogeneous information networks. For example, in a medical care
network, objects of multiple types, such as patients, doctors, diseases, and medications, and links such as visits, diagnoses, and treatments are intertwined,
providing rich information and forming heterogeneous information networks. Effective analysis of large-scale heterogeneous information networks poses an interesting but critical challenge. Our recent studies show that the semi-structured
heterogeneous information network model leverages the rich semantics of typed
nodes and links in a network and can uncover surprisingly rich knowledge from
interconnected data. This heterogeneous network modeling will lead to the discovery of a set of new principles and methodologies for mining interconnected data.
The examples to be used in this discussion include (1) meta path-based similarity search, (2) rank-based clustering, (3) rank-based classification, (4) meta path-based link/relationship prediction, (5) relation strength-aware mining, as well as
a few other recent developments.
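To make meta path-based similarity search concrete, here is a minimal sketch on a toy version of the medical care network described above. It uses the PathSim measure for a symmetric meta path, s(x, y) = 2 * |paths from x to y| / (|paths from x to x| + |paths from y to y|), applied to the meta path Patient-Disease-Patient. The data in `DIAGNOSES` and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical typed links: (patient, disease) diagnosis edges.
DIAGNOSES = [
    ("p1", "flu"), ("p1", "asthma"),
    ("p2", "flu"), ("p2", "asthma"),
    ("p3", "flu"),
    ("p4", "diabetes"),
]


def path_counts(links):
    """Count Patient-Disease-Patient path instances for every patient pair."""
    patients_by_disease = defaultdict(list)
    for patient, disease in links:
        patients_by_disease[disease].append(patient)
    counts = defaultdict(int)
    for patients in patients_by_disease.values():
        # Every ordered pair sharing this disease is one path instance,
        # including the pair (x, x), which counts paths from x back to x.
        for a in patients:
            for b in patients:
                counts[(a, b)] += 1
    return counts


def pathsim(counts, x, y):
    """PathSim similarity of x and y under the counted symmetric meta path."""
    denom = counts.get((x, x), 0) + counts.get((y, y), 0)
    if denom == 0:
        return 0.0
    return 2 * counts.get((x, y), 0) / denom


COUNTS = path_counts(DIAGNOSES)
```

Here `pathsim(COUNTS, "p1", "p2")` is 1.0 because both patients have exactly the same diagnoses, whereas `p1` and `p3` share only one of `p1`'s two diseases and score 2/3. A different meta path, for example Patient-Doctor-Patient, would define a different notion of similarity over the same network, which is the point of typing the nodes and links.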
Second, it is not easy to automatically or semi-automatically construct service-oriented, semi-structured, heterogeneous information networks from the WWW.
However, given the enormous size and diversity of the WWW, it is impossible to
construct such information networks manually. Recently, there has been progress on
finding entity-pages and mining web structural information using the structural
and relational information on the Web. Specifically, given a Web site and an
entity-page (e.g., a department site and a faculty member's homepage), it is possible to find
all or almost all of the entity-pages of the same type (e.g., all faculty members
in the department) by growing parallel paths through the web graph and DOM
trees. By further developing such methodologies, it is possible that one can
construct service-oriented, semi-structured, heterogeneous information networks
from the WWW for many critical services. By integrating methodologies for
construction and mining of such web-based information networks, the quality of
both construction and mining of such information networks can be progressively
and mutually enhanced.
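The intuition behind finding entity-pages of the same type from one example can be sketched as follows. This is a deliberately simplified illustration, not the published parallel-path method: it assumes a single hub page, assumes each link is already annotated with the DOM path of its anchor element, and treats two links as "parallel" when their DOM paths match after positional indices are removed. The `SITE` data and all function names are hypothetical.

```python
import re

# Hypothetical site: page -> list of (DOM path of the <a> element, target URL).
SITE = {
    "/dept": [
        ("html/body/div[1]/ul/li[1]/a", "/people/alice"),
        ("html/body/div[1]/ul/li[2]/a", "/people/bob"),
        ("html/body/div[1]/ul/li[3]/a", "/people/carol"),
        ("html/body/div[2]/p/a", "/contact"),
    ],
}


def generalize(dom_path):
    """Strip positional indices so sibling elements share one pattern."""
    return re.sub(r"\[\d+\]", "", dom_path)


def parallel_entity_pages(site, hub, seed):
    """Return targets linked from `hub` via the same DOM pattern as `seed`."""
    links = site.get(hub, [])
    seed_patterns = {generalize(path) for path, target in links if target == seed}
    return sorted(
        {target for path, target in links if generalize(path) in seed_patterns}
    )


faculty_pages = parallel_entity_pages(SITE, "/dept", "/people/alice")
```

Given the seed `/people/alice`, the sketch returns all three faculty pages and excludes `/contact`, whose link sits in a different part of the DOM tree. A realistic system would also have to follow multi-hop paths through the web graph and tolerate noisy or inconsistent page templates.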
Finally, we point out some open research problems and promising research
directions and hope that the construction and mining of Web-based, service-oriented, semi-structured heterogeneous information networks will become an
interesting frontier in the research into Web-age information management systems.