Download Chapter 1 - WordPress.com

Chapter 1 Introduction

The Web redefines the meanings and processes of business, commerce, marketing, publishing, education, research, government, and development, as well as other aspects of our daily life.

What’s the difference?

New challenges of the Web
• Size
• Complexity
• We need to modify or enhance existing theories and technologies to deal with the size and complexity of the Web

What is WI?
“Web Intelligence (WI) exploits Artificial Intelligence (AI) and advanced Information Technology (IT) on the Web and Internet.”
[Diagram labels: AI, IT, WI]

Web Intelligence (WI)
• The term WI was conceived in late 1999
• A recent subdiscipline of computer science; the first WI conference was the Asia-Pacific Conference on WI-2001

Intelligent Web
• Learning new knowledge from the Web
• Searching for relevant information
• Personalized web pages
• Learning about individual users

Information Retrieval (IR)
• Information retrieval techniques emerged as soon as information archives began to accumulate
• Catalogues, indexes, tables of contents
• Computerized information storage and retrieval since the 1950s and 1960s
• Renewed interest after the advent of the Web

Figure 1.1 Timeline of information and retrieval (Courtesy of Ned Fielden, San Francisco State University)

Modern Information Retrieval
• Document representation
• Query representation
• Retrieval model
• Similarity between document and query
• Rank the documents
• Performance evaluation of the retrieval process

Semantic Web

Keywords versus Semantics
• Traditional IR is limited by keywords
• Key phrases can be used to introduce a bit of semantics
• The Semantic Web is an emerging area

Semantic Web
• The Semantic Web was proposed by Tim Berners-Lee, the developer of the World Wide Web
• The Semantic Web is concerned with the representation of data on the World Wide Web
• W3C, researchers and industrial partners

Web Mining

Data Mining Applied to Web
• Data mining is the process of discovering knowledge from large amounts of data
• Used significantly in commercial and scientific applications
• Adjustments need to be made for the Web

Data Mining
• Clustering: Finding natural groupings of users or pages
• Classification and prediction: Determining the class or behavior of a user or resource
• Associations: Determining which URLs tend to be requested together

Web Mining
• Web content mining
  • Applied to primary data on the Web, text and multimedia documents
• Web structure mining
  • Hyperlink analysis
• Web usage mining
  • Secondary data consisting of user interaction with the Web
  • User profiles

Figure 1.2 Web mining classifications (Courtesy of O. Romanko, 2002)

Web Usage Mining
• Study of data generated by the surfer’s sessions or behaviors
• Works with the secondary data from users’ interactions with the Web
  • Web logs, proxy-server logs, browser logs
• A Web-access log is an inventory of page-reference data
  • Referred to as clickstream data, as each entry corresponds to a mouse click
• Cookies

Figure 1.3 High level web usage mining process (Courtesy of Srivastava et al., 2000)

Web Usage Mining
• Logs can be observed from two angles:
  • Server: to improve the design of a website
  • Client: to assess a client’s sequence of clicks
• Useful for caching of pages
• Efficient loading of Web pages
• Helps organizations efficiently market their products on the Web
• Can supply essential information on how to restructure a website

Applications of Web Usage Mining
Figure 1.4 Applications of web usage mining (Courtesy of O. Romanko, 2002; Courtesy of Srivastava et al., 2000)

Web Content Mining
• Text mining
  • Traditional information retrieval
  • Semantic Web
• Multimedia
  • Images
  • Audio
  • Video
• Web crawlers

Figure 1.5 Architecture of a search engine (Courtesy of O. Romanko, 2002)

Web Structure Mining
Finding the model underlying the link structures of the Web
• Classifying web pages
• Similarity and relationships between various websites

Web Structure Mining
• Algorithms
  • PageRank
  • HITS
  • CLEVER
• Primarily used to model web topology; useful as a technique for computing the rank of every web page
• Assumption: if one web page points to another web page, then the former is approving the significance of the latter

Why Web Intelligence?

Build Better Web Sites Using Intelligent Technologies
• Better keyword and key-phrase based search
• Multimedia information retrieval using Web content mining
• Analyze shopping trends using data mining
• Improve access to the website by studying Web usage
• Improved structure using Web structure mining

Benefits of Intelligent Web
• Matching existing resources to a visitor’s interests
• Boost the value of visitors
• Enhance the visitor’s experience on the web site
• Achieve targeted resource management
• Test the significance of content and web site architecture
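
The Web Structure Mining slides name PageRank and state its assumption: a page that links to another page is approving the significance of that page. The sketch below illustrates that idea with a simple power iteration in Python. It is a minimal illustration, not the production algorithm of any search engine. The function name `pagerank`, the damping value 0.85, the iteration count, and the four-page `toy_web` graph are all assumptions introduced here for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return an approximate PageRank score for every page in `links`.

    `links` maps each page to the list of pages it points to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution of rank.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site, invented for illustration only.
toy_web = {
    "home": ["products", "about"],
    "products": ["home"],
    "about": ["home", "products"],
    "orphan": ["home"],
}

scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph every other page links to "home", so it ends up with the highest score, while "orphan", which no page links to, keeps only the baseline share. That is the approval assumption from the slide at work.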
