Chapter 1
Introduction

The Web redefines the meanings and processes of business, commerce, marketing, publishing, education, research, government, and development, as well as other aspects of our daily life.

What’s the difference?

New challenges of the web
• Size
• Complexity
• We need to modify or enhance existing theories and technologies to deal with the size and complexity of the web

What is WI?
“Web Intelligence (WI) exploits Artificial Intelligence (AI) and advanced Information Technology (IT) on the Web and Internet.”
[Diagram labels: AI, IT, WI]

Web Intelligence (WI)
• The term WI was conceived in late 1999
• A recent subdiscipline of computer science; the first WI conference was the Asia-Pacific Conference on WI-2001

Intelligent Web
• Learning new knowledge from the Web
• Searching for relevant information
• Personalized web pages
• Learning about individual users

Information Retrieval

Information Retrieval (IR)
• Information retrieval techniques began to develop as soon as information archives started to build up.
  ◦ Catalogues, indexes, tables of contents
• Computerized information storage and retrieval from the 1950s and 60s
• Renewed interest after the advent of the Web

Figure 1.1 Timeline of information and retrieval (Courtesy of Ned Fielden, San Francisco State University)

Modern Information Retrieval
• Document representation
• Query representation
• Retrieval model
• Similarity between document and query
• Rank the documents
• Performance evaluation of the retrieval process
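
The steps listed above can be sketched in a few lines. This is a minimal illustration, assuming a bag-of-words document representation with TF-IDF weights and cosine similarity as the retrieval model; these are common choices, not ones the slides prescribe. The sample documents and the function names (tokenize, idf_weights, vectorize, cosine, rank) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def idf_weights(docs):
    """Inverse document frequency for every term in the tokenized collection."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def vectorize(tokens, idf):
    """Represent a token list as a sparse TF-IDF vector (dict of term -> weight)."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, documents):
    """Return (score, index) pairs for the documents, best match first."""
    docs = [tokenize(d) for d in documents]
    idf = idf_weights(docs)
    doc_vecs = [vectorize(d, idf) for d in docs]
    q_vec = vectorize(tokenize(query), idf)
    scores = [(cosine(q_vec, dv), i) for i, dv in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Hypothetical three-document collection.
documents = [
    "web usage mining studies web logs",
    "page rank models the link structure of the web",
    "cooking recipes for pasta",
]
ranking = rank("web logs", documents)
```

Here the query and the documents share one representation, so the same similarity function can score any query against any document. Performance evaluation, the last step on the slide, is not covered by this sketch.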

Semantic Web

Keywords versus Semantics
• Traditional IR is limited by keywords
• Key phrases can be used to introduce a bit of semantics
• The Semantic Web is an emerging area
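
The difference between keywords and key phrases can be shown with a small sketch. It assumes that a keyword query matches when all of its words occur anywhere in the text, while a key-phrase query matches only when the words occur next to each other in order. The two function names and the sample texts are hypothetical.

```python
def keyword_match(query, text):
    """True if every query word occurs somewhere in the text, in any order."""
    words = set(text.lower().split())
    return all(w in words for w in query.lower().split())


def phrase_match(query, text):
    """True if the query words occur in the text as one contiguous phrase."""
    q = query.lower().split()
    t = text.lower().split()
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


doc_a = "data mining applied to the web"
doc_b = "web mining discovers knowledge from web data"

keyword_hits = [keyword_match("web mining", d) for d in (doc_a, doc_b)]
phrase_hits = [phrase_match("web mining", d) for d in (doc_a, doc_b)]
```

Both texts contain the words "web" and "mining", so the keyword query matches both. Only the second contains the phrase "web mining", which is the small amount of extra meaning a key phrase carries.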

Semantic Web
• The Semantic Web was proposed by Tim Berners-Lee, the developer of the World Wide Web
• The Semantic Web is concerned with the representation of data on the World Wide Web
• Involves the W3C, researchers, and industrial partners

Web Mining

Data Mining Applied to Web
• Data mining is the process of discovering knowledge from large amounts of data
• Widely used in commercial and scientific applications
• Adjustments need to be made for the Web

Data Mining
• Clustering: Finding natural groupings of users or pages
• Classification and prediction: Determining the class or behavior of a user or resource
• Associations: Determining which URLs tend to be requested together
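
The association task above can be illustrated with a minimal sketch that counts how often two URLs appear in the same session. It assumes the requests have already been grouped into sessions; the function name co_requested_pairs, the min_support threshold, and the sample sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_requested_pairs(sessions, min_support=2):
    """Count URL pairs seen in the same session; keep pairs found in at least min_support sessions."""
    counts = Counter()
    for urls in sessions:
        # Sort the distinct URLs so each pair is always counted under one key.
        for pair in combinations(sorted(set(urls)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical sessions, each a list of URLs requested by one visitor.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart", "/checkout"],
]
frequent = co_requested_pairs(sessions)
```

With these sessions, only the pairs ("/cart", "/products") and ("/home", "/products") reach the threshold of two sessions. Full association-rule mining would also compute measures such as confidence, which this sketch leaves out.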

Web Mining
• Web content mining
  ◦ Applied to primary data on the Web: text and multimedia documents
• Web structure mining
  ◦ Hyperlink analysis
• Web usage mining
  ◦ Secondary data consisting of user interaction with the Web
  ◦ User profiles

Figure 1.2 Web mining classifications (Courtesy of O. Romanko, 2002)

Web Usage Mining

Web Usage Mining
• Study of data generated by the surfer’s sessions or behaviors
• Works with the secondary data from users’ communications with the Web
  ◦ Web logs, proxy-server logs, browser logs
• A Web-access log is an inventory of page-reference data
  ◦ Referred to as clickstream data, as each entry corresponds to a mouse click
• Cookies
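
To make the idea of clickstream data concrete, the sketch below turns a web-access log into one click sequence per client. It assumes the log uses the Common Log Format, which the slides do not specify, and it identifies a client by host address only. The log lines are made up for illustration, and the names LOG_LINE and clickstreams are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout (Common Log Format):
# host ident authuser [timestamp] "METHOD path PROTOCOL" status bytes
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def clickstreams(lines):
    """Group requested paths by client host, in the order they appear in the log."""
    streams = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        streams[match.group("host")].append(match.group("path"))
    return dict(streams)


# Made-up log entries for illustration only.
sample_log = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2000:13:55:40 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:56:02 -0700] "GET /products.html HTTP/1.0" 200 5120',
    'not a log line',
    '192.0.2.1 - - [10/Oct/2000:13:57:15 -0700] "GET /cart.html HTTP/1.0" 200 1043',
]
streams = clickstreams(sample_log)
```

Grouping by host is only an approximation of a user, since several people can share one address. Cookies are one way to tell such visitors apart.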

Figure 1.3 High level web usage mining process (Courtesy of Srivastava et al., 2000)

Web Usage Mining
• Logs can be observed from two angles:
  ◦ Server: to improve the design of a website
  ◦ Client: assessing a client’s sequence of clicks
• Useful for caching of pages
• Efficient loading of Web pages
• Helps organizations efficiently market their products on the Web
• Can supply essential information on how to restructure a website

Applications of Web Usage Mining
Figure 1.4 Applications of web usage mining (Courtesy of O. Romanko, 2002; Srivastava et al., 2000)

Web Content Mining

Web Content Mining
• Text mining
  ◦ Traditional information retrieval
  ◦ Semantic Web
• Multimedia
  ◦ Images
  ◦ Audio
  ◦ Video
• Web crawlers
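
A web crawler gathers the content to be mined by following links from page to page. The sketch below is a minimal illustration, assuming a breadth-first visiting order and a caller-supplied fetch function; the slides do not describe a particular crawler design. The names LinkExtractor, extract_links, and crawl are hypothetical, and the three-page site is held in memory so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URLs of all <a href="..."> links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute link targets found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; fetch maps a URL to its HTML, or None if unavailable."""
    seen, queue, order = {start_url}, deque([start_url]), []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved
        order.append(url)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical site stored in a dict instead of being fetched over HTTP.
site = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.com/a.html": '<a href="/b.html">B</a> <a href="/">Home</a>',
    "http://example.com/b.html": '<a href="/missing.html">Missing</a>',
}
visited = crawl("http://example.com/", site.get)
```

A real crawler would replace site.get with an HTTP fetch and would also need politeness rules such as rate limits, which are outside the scope of this sketch.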

Figure 1.5 Architecture of a search engine (Courtesy of O. Romanko, 2002)

Web Structure Mining

Web-Structure Mining
• Finding the model underlying the link structures of the Web, which can be used to:
  ◦ classify web pages
  ◦ identify similarities and relationships between various websites

Web Structure Mining
• Algorithms
  ◦ PageRank
  ◦ HITS
  ◦ CLEVER
• Primarily used to model web topology
• Useful as a technique for computing the rank of every web page
• Assumption: if one web page points to another web page, then the former is endorsing the significance of the latter.
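
The assumption above, that a link acts as an endorsement, is the basis of PageRank. The following is a minimal power-iteration sketch, assuming the commonly used damping factor of 0.85 and a fixed number of iterations; it is an illustration, not a production implementation. The function name pagerank and the four-page link graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_ranks = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each out-link passes on an equal share of the page's rank.
                share = damping * ranks[page] / len(targets)
                for t in targets:
                    new_ranks[t] += share
            else:
                # A page with no out-links spreads its rank evenly over all pages.
                for t in pages:
                    new_ranks[t] += damping * ranks[page] / n
        ranks = new_ranks
    return ranks


# Hypothetical four-page site: every other page links to "home".
links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "contact"],
    "contact": ["home"],
}
ranks = pagerank(links)
```

In this graph "home" receives links from all three other pages and ends up with the highest rank, while "contact", linked only from "products", ranks lowest. The ranks always sum to 1.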

Why Web Intelligence?

Build Better Web Sites Using Intelligent Technologies
• Better keyword and key-phrase based search
• Multimedia information retrieval using Web content mining
• Analyze shopping trends using data mining
• Improve access to a website by studying Web usage
• Improve structure using Web structure mining

Benefits of Intelligent Web
• Matching existing resources to a visitor’s interests
• Boost the value of visitors
• Enhance the visitor’s experience on the web site
• Achieve targeted resource management
• Test the significance of content and web site architecture