Web Intelligence (WI)
Definition, Research Challenges
and Major Tools
Yang Chen
UNC Charlotte
Outline
• A brief history of Web Intelligence
• Motivations for WI
• Definition and Perspectives of WI
• Research Agenda
• Major Web Intelligence Tools
• Conclusion
A Brief History of WI
• 1999: Collaborative research initiatives
– Ning Zhong, Data Mining and Knowledge Systems
– Jiming Liu, Intelligent agents and multi-agents
– Yiyu Yao, Information retrieval and intelligent
information systems
• Combined research efforts with a common
goal: to create a new sub-discipline covering
theories and techniques related to Web
information.
• 2000: Publication of a two-page position
paper on WI (Zhong, Liu, Yao, Ohsuga,
COMPSAC 2000)
• 2001: First Asia-Pacific Conference on Web
Intelligence
• 2002: Publication of the first special issue on WI in
IEEE Computer
• 2002: Web Intelligence Consortium
• 2003: First edited book on WI
• 2005: The international WIC Institute
Motivation
• The sheer size of the Web
– Difficulties in storage, management, and
efficient, effective retrieval
• Complexity of the Web
– A heterogeneous collection of structured,
unstructured, semi-structured, interrelated,
and distributed Web documents
– Consisting of text, images, and sound
Motivation
Web Intelligence on the Web
Industrial Interests in WI
• Web Intelligence
– kis-lab.com/wi01/
• Web-Intelligence Home Page
– www.web-intelligence.com/
• Intelligence on the Web
– www.fas.org/irp/intelwww.html
• WIN: WEB INTELLIGENCE NETWORK (home)
– smarter.net/
• CatchTheWeb: Web Research, Web Intelligence Collaboration
– www.catchtheweb.com/
• Infonoia: Web Intelligence In Your Hands
– www.infonoia.com/myagent/en/baseframe.html
Motivations
• Data on the Web is growing at an
exponential rate.
• Industrial interest in WI is growing fast.
• Only a few academic papers address WI.
• We need to narrow the gap between
industry needs and academic research.
What Is Web Intelligence?
• Web Intelligence (WI) explores the fundamental
and practical impact that advanced Information
Technology (IT) and innovative Artificial
Intelligence (AI) will have on the Web:
– Integration of IT with AI
– Applications of AI on the Web
Web Intelligence System (based on Zhong's AWIC03
keynote talk)
An Example
Advanced Questions
• How does the customer enter the VIP portal, and
how can that be used to target products and manage
promotions and marketing campaigns?
• What is the semantic association between
the pages the customer visited?
• Is the visitor familiar with the website's
structure, or is he or she a new or random
user?
• Is the visitor a Web robot or a human user?
• …
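The robot question is often approached with simple heuristics in Web usage mining. A minimal sketch, assuming a hypothetical log of (visitor ID, path, user agent) records: a visitor is flagged as a robot if it requested /robots.txt or sent a user agent containing a crawler keyword. The record layout and keyword list are illustrative assumptions, not a complete detector.

```python
from dataclasses import dataclass

@dataclass
class Request:
    visitor_id: str  # e.g. an IP address or cookie ID (assumed log field)
    path: str
    user_agent: str

# Illustrative keywords only; real robot lists are far longer.
ROBOT_AGENT_HINTS = ("bot", "crawler", "spider")

def looks_like_robot(requests: list[Request]) -> bool:
    """True if any request fetched /robots.txt or used a crawler-like user agent."""
    for r in requests:
        if r.path == "/robots.txt":
            return True
        agent = r.user_agent.lower()
        if any(hint in agent for hint in ROBOT_AGENT_HINTS):
            return True
    return False

def classify_visitors(log: list[Request]) -> dict[str, str]:
    """Group requests by visitor and label each visitor 'robot' or 'human'."""
    by_visitor: dict[str, list[Request]] = {}
    for r in log:
        by_visitor.setdefault(r.visitor_id, []).append(r)
    return {v: "robot" if looks_like_robot(reqs) else "human"
            for v, reqs in by_visitor.items()}

log = [
    Request("10.0.0.1", "/robots.txt", "Mozilla/5.0"),
    Request("10.0.0.2", "/vip", "Mozilla/5.0 (Windows NT 10.0)"),
    Request("10.0.0.3", "/", "Googlebot/2.1"),
]
print(classify_visitors(log))
# → {'10.0.0.1': 'robot', '10.0.0.2': 'human', '10.0.0.3': 'robot'}
```

Real robots can hide their user agent, so heuristics like these are usually combined with behavioral signals such as request rate.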
Advanced WI System
• Making dynamic recommendations to a
Web user based on the user's profile and
usage behavior;
• Automatically modifying a website's
content and organization;
• Combining Web usage data with
marketing data to show how visitors
use a website.
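One simple way to make usage-based recommendations is co-visitation counting: pages that often appear together in the same session are recommended together. The sketch below assumes sessions have already been reconstructed as lists of page paths; the function names and data are hypothetical, and it ignores user profiles and marketing data.

```python
from collections import Counter
from itertools import combinations

def build_covisits(sessions: list[list[str]]) -> Counter:
    """Count, for each ordered pair of distinct pages, the sessions containing both."""
    counts: Counter = Counter()
    for session in sessions:
        pages = sorted(set(session))  # repeat views within a session count once
        for a, b in combinations(pages, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(current: str, covisits: Counter, k: int = 3) -> list[str]:
    """Top-k pages most often co-visited with `current` (ties broken alphabetically)."""
    scores = {b: n for (a, b), n in covisits.items() if a == current}
    return sorted(scores, key=lambda page: (-scores[page], page))[:k]

sessions = [
    ["/home", "/laptops", "/cart"],
    ["/home", "/laptops"],
    ["/laptops", "/mice"],
]
print(recommend("/laptops", build_covisits(sessions)))  # → ['/home', '/cart', '/mice']
```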
Advanced WI System
Perspectives of WI
• WI can be classified into four categories
(based on Russell & Norvig's scheme)
Research Agenda of WI
• Semantic Web mining and automatic
construction of ontologies
• Social network intelligence
The Semantic Web
• The Semantic Web is based on languages
that make more of a page's semantic content
available in machine-readable formats for
agent-based computing.
• A "semantic" language ties the
information on a page to machine-readable
semantics (an ontology).
Components of Semantic Web
• A unifying data model such as RDF.
• Languages with defined semantics, built on
RDF, such as OWL (DAML+OIL).
• Ontologies of standardized terminology for
marking up Web resources.
• Tools that assist in generating and processing
semantic markup.
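RDF models data as (subject, predicate, object) triples. As an illustration only, here is a toy in-memory triple store with wildcard pattern matching; the `ex:` names are hypothetical, and a real application would use an RDF library and a query language such as SPARQL rather than this class.

```python
from typing import Optional

Triple = tuple[str, str, str]

class TripleStore:
    """Toy in-memory set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Triples matching the pattern, sorted; None acts as a wildcard."""
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))

store = TripleStore()
store.add("ex:page1", "dc:creator", "ex:alice")
store.add("ex:alice", "rdf:type", "foaf:Person")
store.add("ex:page2", "dc:creator", "ex:alice")
# Which resources did ex:alice create?
print([s for s, _, _ in store.match(p="dc:creator", o="ex:alice")])  # → ['ex:page1', 'ex:page2']
```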
Ontologies provide the semantic backbone for
Semantic Web applications.
Ontologies offer
• Communication
– Normative models, Networks of relationships
• Sharing & Reuse
– Specifications, Reliability
• Control
– Classification; finding, sharing, and
discovering relationships
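Classification with an ontology often comes down to following subclass links. A minimal sketch, assuming a hypothetical product ontology stored as a dict from each class to its direct superclasses (analogous to rdfs:subClassOf), computes every inferred superclass of a class:

```python
def superclasses(cls: str, subclass_of: dict[str, set[str]]) -> set[str]:
    """All direct and inferred superclasses of `cls` (transitive closure)."""
    seen: set[str] = set()
    stack = list(subclass_of.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # `seen` also guards against cyclic definitions
            seen.add(parent)
            stack.extend(subclass_of.get(parent, ()))
    return seen

# Hypothetical product ontology: class -> direct superclasses.
ontology = {
    "Laptop": {"Computer"},
    "Computer": {"Product"},
    "Mouse": {"Accessory"},
    "Accessory": {"Product"},
}
print(sorted(superclasses("Laptop", ontology)))  # → ['Computer', 'Product']
```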
Categories of Ontologies
• A domain-specific ontology describes a well-defined technical or business domain.
• A task ontology might be either domain-specific
or reconstructed from a set of domain-specific
ontologies to meet the requirements of a task.
• A universal ontology describes knowledge at a
higher, more general level.
The Web as a Graph
• We can view the Web as a directed social
network that connects people (or organizations
and other social entities).
• Research questions:
– How big is the graph? (outdegree and indegree)
– Can we browse from any page to any other? (How many clicks?)
– Can we exploit the structure of the Web? (searching and mining)
– How can we discover and manage Web communities?
– What does the Web graph reveal about social dynamics?
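The first two questions can be made concrete on a small link graph. The sketch below uses a hypothetical four-page graph stored as an adjacency dict; it counts each page's indegree and outdegree and uses breadth-first search to find the minimum number of clicks between two pages (None if the target cannot be reached by following links).

```python
from collections import deque
from typing import Optional

Graph = dict[str, list[str]]  # page -> pages it links to

def degrees(graph: Graph) -> dict[str, tuple[int, int]]:
    """Map each page to (indegree, outdegree)."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    indeg = dict.fromkeys(nodes, 0)
    for targets in graph.values():
        for v in targets:
            indeg[v] += 1
    return {n: (indeg[n], len(graph.get(n, []))) for n in nodes}

def clicks(graph: Graph, start: str, goal: str) -> Optional[int]:
    """Fewest links to follow from start to goal (BFS), or None if unreachable."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == goal:
            return dist[page]
        for nxt in graph.get(page, []):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return None

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A"]}
print(degrees(web)["A"])      # → (2, 2)
print(clicks(web, "B", "A"))  # → 2
print(clicks(web, "A", "D"))  # → None, since no page links to D
```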
Social Network Intelligence
Social Network
Major Web Intelligence Tools
• I. Collection
– Offline Explorer
– SpidersRUs (AI Lab)
– Google Scholar
• II. Analysis (Data and Text Mining)
– Google APIs
– Google Translation
– GATE
– Arizona Noun Phraser (AI Lab)
– Self-Organizing Map, SOM (AI Lab)
– Weka
• III. Visualization
– NetDraw
– JUNG
– Analyst's Notebook and Starlight
Collection: Offline Explorer
Screenshot: the Offline Explorer project list and project properties setup window, showing download URLs; file filters, URL filters, and other advanced properties; download level; and file modification check.
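The download level setting limits how many links away from the start URL the tool will go. The following sketch is not Offline Explorer's implementation; it only illustrates a breadth-first, level-limited crawl with a URL filter, using a hypothetical in-memory site in place of real HTTP requests and HTML parsing.

```python
from collections import deque
from typing import Callable

def crawl(start: str,
          links: Callable[[str], list[str]],
          max_level: int,
          url_filter: Callable[[str], bool] = lambda url: True) -> dict[str, int]:
    """Map each URL to be downloaded to its link distance (level) from `start`."""
    level = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if level[url] >= max_level:
            continue  # pages at the download level are fetched but not expanded
        for nxt in links(url):
            if nxt not in level and url_filter(nxt):
                level[nxt] = level[url] + 1
                queue.append(nxt)
    return level

# Hypothetical site: page -> outgoing links (stands in for fetching and parsing HTML).
site = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b.zip",
                            "http://other.org/"],
    "http://example.com/a": ["http://example.com/a/1"],
    "http://example.com/a/1": ["http://example.com/a/2"],
}

def same_site_no_zip(url: str) -> bool:
    """Hypothetical URL filter: stay on example.com and skip .zip files."""
    return url.startswith("http://example.com/") and not url.endswith(".zip")

print(crawl("http://example.com/", lambda u: site.get(u, []), 2, same_site_no_zip))
# → {'http://example.com/': 0, 'http://example.com/a': 1, 'http://example.com/a/1': 2}
```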
Analysis: Google APIs
• Google provides many APIs to help you quickly develop your own applications:
http://code.google.com/more/
• Examples of Google APIs:
– Google API for Inlink: Discovers which pages link to your website.
– Google Data APIs: Provide a simple, standard protocol for reading and writing
data on the Web. Several Google services provide a Google Data API, including
Google Base, Blogger, Google Calendar, Google Spreadsheets, and Picasa Web
Albums.
– Google AJAX Search API: Uses JavaScript to embed a simple, dynamic Google
search box and display search results in your own Web pages.
– Google Analytics: Allows users to gather, view, and analyze data about their
website traffic. Users can see which content gets the most visits, as well as average
page views and time on site per visit.
– Google Safe Browsing APIs: Allow client applications to check URLs against
Google's constantly updated blacklists of suspected phishing and malware
pages.
– YouTube Data API: Integrates online videos from YouTube into your
applications.
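The traffic figures mentioned for Google Analytics are simple aggregates. The sketch below does not use the Google Analytics API; it computes the most visited page, average page views per visit, and average time on site from a hypothetical list of visit records.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Visit:
    pages: list[str]        # pages viewed during the visit, in order
    seconds_on_site: float

def traffic_summary(visits: list[Visit]) -> dict:
    """Most visited page, average page views per visit, and average time on site."""
    if not visits:
        return {"most_visited": None, "avg_pageviews": 0.0, "avg_time_on_site": 0.0}
    views = Counter(page for v in visits for page in v.pages)
    n = len(visits)
    return {
        "most_visited": views.most_common(1)[0][0] if views else None,
        "avg_pageviews": sum(len(v.pages) for v in visits) / n,
        "avg_time_on_site": sum(v.seconds_on_site for v in visits) / n,
    }

visits = [Visit(["/", "/blog", "/blog/post-1"], 120.0), Visit(["/blog"], 30.0)]
print(traffic_summary(visits))
# → {'most_visited': '/blog', 'avg_pageviews': 2.0, 'avg_time_on_site': 75.0}
```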
GATE (General Architecture for Text Engineering)
• Information Extraction tasks:
– Named Entity Recognition (NE)
• Finds names, places, dates, etc.
– Co-reference Resolution (CO)
• Identifies identity relations between entities in texts.
– Template Element Construction (TE)
• Adds descriptive information to NE results (using CO).
– Template Relation Construction (TR)
• Finds relations between TE entities.
– Scenario Template Production (ST)
• Fits TE and TR results into specified event scenarios.
• GATE also includes:
– Parsers, stemmers, and Information Retrieval tools;
– Tools for visualizing and manipulating ontologies; and
– Evaluation and benchmarking tools.
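Rule-based named entity recognition commonly draws on gazetteer lookup (lists of known names). The following much-simplified sketch is not GATE's API: it matches a hypothetical gazetteer against raw text and keeps the longest match when entries overlap, so "UNC Charlotte" is tagged as an organization rather than "Charlotte" as a location.

```python
import re

# Hypothetical gazetteer lists: entity type -> known names.
GAZETTEER = {
    "Location": ["Charlotte", "Arizona"],
    "Organization": ["UNC Charlotte", "University of Arizona"],
}

def find_entities(text: str) -> list[tuple[int, int, str, str]]:
    """(start, end, type, name) for gazetteer hits; longest match wins overlaps."""
    hits = []
    for etype, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                hits.append((m.start(), m.end(), etype, name))
    hits.sort(key=lambda h: (-(h[1] - h[0]), h[0]))  # longest first, then leftmost
    chosen: list[tuple[int, int, str, str]] = []
    for h in hits:
        if all(h[1] <= s or h[0] >= e for s, e, _, _ in chosen):  # no overlap
            chosen.append(h)
    return sorted(chosen)

text = "Yang Chen is at UNC Charlotte; SOM came from the University of Arizona."
print([(etype, name) for _, _, etype, name in find_entities(text)])
# → [('Organization', 'UNC Charlotte'), ('Organization', 'University of Arizona')]
```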
GATE
Screenshot: attributes, project information, and results display
SOM
• The multi-level self-organizing map (SOM) neural network
algorithm was developed by the Artificial Intelligence Lab at
the University of Arizona.
– In its 2D map display, topics with similar co-occurrence
patterns are positioned closer together, and more important
topics occupy larger regions.
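The core SOM training loop is short. The sketch below is a basic single-level Kohonen SOM, not the AI Lab's multi-level algorithm: each input pulls its best-matching unit and that unit's grid neighbors toward it, with a learning rate and Gaussian neighborhood radius that shrink over time. The document vectors and parameter defaults are illustrative assumptions.

```python
import math
import random

Vector = list[float]

def best_unit(weights: list[list[Vector]], x: Vector) -> tuple[int, int]:
    """Grid coordinates (row, col) of the unit whose weight vector is closest to x."""
    cells = ((i, j) for i in range(len(weights)) for j in range(len(weights[0])))
    return min(cells, key=lambda ij: sum((w - v) ** 2
                                         for w, v in zip(weights[ij[0]][ij[1]], x)))

def train_som(data: list[Vector], rows: int, cols: int, epochs: int = 100,
              lr0: float = 0.5, seed: int = 0) -> list[list[Vector]]:
    """Train a rows x cols Kohonen SOM on equal-length input vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                  # learning rate decays linearly
        radius = radius0 * (1 - frac) + 1e-3   # neighborhood shrinks over time
        for x in rng.sample(data, len(data)):  # visit inputs in random order
            bi, bj = best_unit(weights, x)
            for i in range(rows):
                for j in range(cols):
                    d2 = (i - bi) ** 2 + (j - bj) ** 2
                    h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood
                    w = weights[i][j]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
    return weights

# Hypothetical term-weight vectors for four documents on two topics.
docs = {"a1": [1.0, 0.0, 0.0], "a2": [0.9, 0.1, 0.0],
        "b1": [0.0, 0.0, 1.0], "b2": [0.0, 0.1, 0.9]}
som = train_som(list(docs.values()), rows=4, cols=4)
for name, vec in docs.items():
    print(name, best_unit(som, vec))
```

In a topic map, the units that win documents from the same topic form that topic's region; counting documents per region gives the size shown on the display.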
SOM
Screenshot: each topic occupies a region of the map, labeled with the number of documents belonging to that topic; warm colors represent new topics.
Visualization: JUNG
• The Java Universal Network/Graph Framework (JUNG) is a
software library for the modeling, analysis, and visualization of data
that can be represented as a graph or network. It was developed at the
School of Information and Computer Science at the University of
California, Irvine.
http://jung.sourceforge.net/index.html
• The current distribution of JUNG includes implementations of a
number of algorithms from graph theory, data mining, and social
network analysis:
– Clustering
– Decomposition
– Optimization
– Random Graph Generation
– Statistical Analysis
– Calculation of Network Distances, Flows, and Importance
Measures (Centrality, PageRank, HITS, etc.)
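JUNG itself is a Java library; as a language-neutral illustration of one of the importance measures it lists, here is PageRank computed by power iteration. This is not JUNG's API: the damping factor 0.85 is a conventional choice, pages with no outlinks are assumed to spread their rank evenly over all pages, and the example graph is hypothetical.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """PageRank by power iteration; pages without outlinks spread rank evenly."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        # Teleport share plus the evenly spread rank of dangling pages.
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for u, targets in graph.items():
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank

links = {"A": ["B"], "B": ["A"], "C": ["A"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # → A
```

The ranks always sum to 1, so they can be read as the long-run share of time a random surfer spends on each page.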
JUNG
Examples of visualization types
Conclusion
• The marriage of hypertext and the Internet
led to a revolution: the Web.
• The marriage of Artificial Intelligence and
advanced Information Technology, on the
platform of the Web, will lead to another
paradigm shift: the Intelligent and Wisdom
Web.
Thank You
Any Questions?