Chapter 1 Introduction

The Web redefines the meanings and processes of business, commerce, marketing, publishing, education, research, government, and development, as well as other aspects of our daily life.

What’s the difference?

New Challenges of the Web
* Size
* Complexity
* We need to modify or enhance existing theories and technologies to deal with the size and complexity of the Web

What is WI?
“Web Intelligence (WI) exploits Artificial Intelligence (AI) and advanced Information Technology (IT) on the Web and Internet.”
[Diagram labels: AI, IT, WI]

Web Intelligence (WI)
* The term WI was conceived in late 1999
* A recent subdiscipline of computer science
* The first WI conference was the Asia-Pacific Conference on WI (WI-2001)

Intelligent Web
* Learning new knowledge from the Web
* Searching for relevant information
* Personalized web pages
* Learning about individual users

Information Retrieval

Information Retrieval (IR)
* Information retrieval techniques began to develop as soon as information archives started to be built
* Catalogues, indexes, tables of contents
* Computerized information storage and retrieval since the 1950s and 1960s
* Renewed interest after the advent of the Web

Figure 1.1 Timeline of information and retrieval (Courtesy of Ned Fielden, San Francisco State University)

Modern Information Retrieval
* Document representation
* Query representation
* Retrieval model
* Similarity between document and query
* Ranking the documents
* Performance evaluation of the retrieval process

Semantic Web

Keywords versus Semantics
* Traditional IR is limited by keywords
* Key phrases can be used to introduce a bit of semantics
* The Semantic Web is an emerging area

Semantic Web
* The Semantic Web was proposed by Tim Berners-Lee, the developer of the World Wide Web
* The Semantic Web is concerned with the representation of data on the World Wide Web
* W3C, researchers, and industrial partners

Web Mining

Data Mining Applied to the Web
* Data mining is the process of discovering knowledge from large amounts of data
* Used significantly in commercial and scientific applications
* Adjustments need to be made for the Web

Data Mining
* Clustering: finding natural groupings of users or pages
* Classification and prediction: determining the class or behavior of a user or resource
* Associations: determining which URLs tend to be requested together

Web Mining
* Web content mining
  * Applied to primary data on the Web: text and multimedia documents
* Web structure mining
  * Hyperlink analysis
* Web usage mining
  * Secondary data consisting of user interaction with the Web
  * User profiles

Figure 1.2 Web mining classifications (Courtesy of O. Romanko, 2002)

Web Usage Mining
* Study of data generated by surfers’ sessions or behaviors
* Works with secondary data from users’ communications with the Web
  * Web logs, proxy-server logs, browser logs
* A Web-access log is an inventory of page-reference data, referred to as clickstream data because each entry corresponds to a mouse click
* Cookies

Figure 1.3 High-level web usage mining process (Courtesy of Srivastava et al., 2000)

Web Usage Mining
* Logs can be observed from two angles:
  * Server: to advance the design of a website
  * Client: to assess a client’s sequence of clicks
* Useful for caching of pages
* Efficient loading of web pages
* Helps organizations market their products efficiently on the Web
* Can supply essential information on how to restructure a website

Applications of Web Usage Mining

Figure 1.4 Applications of web usage mining (Courtesy of O. Romanko, 2002; Courtesy of Srivastava et al., 2000)

Web Content Mining
* Text mining
* Traditional information retrieval
* Semantic Web
* Multimedia
  * Images
  * Audio
  * Video
* Web crawlers

Figure 1.5 Architecture of a search engine (Courtesy of O. Romanko, 2002)

Web Structure Mining
* Finding the model underlying the link structures of the Web
* Classifying web pages
* Similarity and relationships between various websites

Web Structure Mining Algorithms
* PageRank
* HITS
* CLEVER
* Primarily used to model web topology
* Useful as a technique for computing the rank of every web page
* Assumption: if one web page points to another web page, then the former is approving the significance of the latter

Why Web Intelligence?

Build Better Websites Using Intelligent Technologies
* Better keyword and key-phrase based search
* Multimedia information retrieval using Web content mining
* Analyzing shopping trends using data mining
* Improving access to a website by studying Web usage
* Improved structure using Web structure mining

Benefits of Intelligent Web
* Matching existing resources to a visitor’s interests
* Boosting the value of visitors
* Enhancing the visitor’s experience on the website
* Achieving targeted resource management
* Testing the significance of content and website architecture
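
The assumption stated under Web Structure Mining Algorithms, that a page linking to another page approves the significance of that page, can be illustrated with a small PageRank-style computation. The following is only a simplified sketch and not the production algorithm: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the four-page toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified power-iteration PageRank (illustrative sketch).

    links maps each page to the list of pages it points to.
    damping and iterations are assumed values, not taken from the slides.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared by everyone.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Each page passes an equal share of its rank to the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: C is linked to by A, B, and D; nothing links to D.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph, page C receives the most incoming links and ends up with the highest rank, while page D, which no page points to, keeps only the baseline share. The ranks sum to 1, so they can be read as the relative significance of each page under the stated assumption.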