International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 4, April 2012)

Web Mining

¹Sonia Sharama
¹Assistant Professor in Computer Science, Bebe Nanakai University College, Mithra (Kapurthala)

Abstract --- This paper discusses web mining, a set of techniques used to extract knowledge from Web data. Web mining can be broadly defined as the discovery and analysis of useful information from the World Wide Web. It covers the automatic search of information resources available online, i.e., Web content mining, and the discovery of user access patterns from Web servers, i.e., Web usage mining. More generally, Web mining is the extraction of useful patterns and implicit information related to the World Wide Web. Web content mining is the process of extracting knowledge from the content of documents or their descriptions. Finally, Web usage mining, also known as Web log mining, is the process of extracting interesting patterns from web access logs.

Keywords --- Web Mining, Web Content Mining, Web Usage Mining, Web Structure Mining, Uses of Web Mining.

I. INTRODUCTION
With the growth of information resources available on the World Wide Web, it has become necessary for users to rely on automated tools to find the desired information resources and to track and analyze their usage patterns. These factors give rise to the need for server-side and client-side intelligent systems that can mine for knowledge. Web mining can be defined as the discovery and analysis of useful information from the World Wide Web. This covers the automatic search of information resources available online, i.e., Web content mining, and the discovery of user access patterns from Web servers, i.e., Web usage mining.

II. WEB MINING TAXONOMY
Web mining includes the analysis of the server logs of a website, whereas data mining in general uses techniques to find relationships in large amounts of data.

[Figure: Web mining taxonomy, with branches Web Content Mining, Web Usage Mining and Web Structure Mining]
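As a rough illustration of the server-log analysis mentioned above, the following Python sketch parses a few access-log lines and derives two simple usage statistics: hits per page and the sequence of pages each visitor requested. The log lines, host addresses, page paths and the helper names `parse_log`, `page_hits` and `navigation_paths` are hypothetical and do not come from the paper. The Common Log Format layout is likewise an assumption about how the server writes its logs.

```python
import re
from collections import Counter, defaultdict

# Hypothetical access-log lines in Common Log Format (made-up hosts and paths).
SAMPLE_LOG = """\
10.0.0.1 - - [12/Apr/2012:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1043
10.0.0.1 - - [12/Apr/2012:10:00:09 +0000] "GET /products.html HTTP/1.1" 200 2310
10.0.0.2 - - [12/Apr/2012:10:01:15 +0000] "GET /index.html HTTP/1.1" 200 1043
10.0.0.1 - - [12/Apr/2012:10:02:30 +0000] "GET /cart.html HTTP/1.1" 200 512
10.0.0.2 - - [12/Apr/2012:10:03:02 +0000] "GET /missing.html HTTP/1.1" 404 0
"""

# Assumed line layout: host, identity, user, [timestamp], "request", status, size.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log(text):
    """Return (host, path, status) tuples for every line that matches LINE_RE."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            records.append((match["host"], match["path"], int(match["status"])))
    return records


def page_hits(records):
    """Count successful (2xx) requests per page."""
    return Counter(path for _, path, status in records if 200 <= status < 300)


def navigation_paths(records):
    """Group the requested pages by host, preserving log order."""
    paths = defaultdict(list)
    for host, path, _ in records:
        paths[host].append(path)
    return dict(paths)


records = parse_log(SAMPLE_LOG)
hits = page_hits(records)
paths = navigation_paths(records)

print(hits.most_common())
for host, pages in paths.items():
    print(host, "->", " > ".join(pages))
```

Grouping requests by host is only a crude stand-in for identifying a user. A fuller treatment would also draw on referrer logs, agent logs or client-side cookies.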
III. WHAT IS WEB MINING
Web mining is the extraction of interesting and potentially useful patterns and implicit information from artifacts or activity related to the World Wide Web. There are roughly three knowledge discovery domains that pertain to web mining: Web content mining, Web structure mining, and Web usage mining. Web content mining is the process of extracting knowledge from the content of documents or their descriptions. Web document text mining and resource discovery based on concept indexing or agent-based technology may also fall into this category. Web structure mining is the process of inferring knowledge from the organization of the World Wide Web and the links between references and referents in the Web.

IV. WEB CONTENT MINING
Web content mining is related to both data mining and text mining. It is related to data mining because many data mining techniques can be applied to Web content, and to text mining because much of the web content is text. Web data may involve different types of data: text, image, audio, video, metadata and hyperlinks. Web content mining discovers useful information from web contents, data and documents. It involves three main processes: preprocessing the data before mining (for example, feature selection), processing, and post-processing. Post-processing the data can reduce ambiguous search results. Web content mining also improves the content search of other tools such as search engines.

V. WEB USAGE MINING
Web usage mining is also known as Web log mining. It is used to discover user navigation patterns from web data and to predict user behavior while the user interacts with the web. It helps to improve large collections of resources. Typical sources of data are automatically generated data stored in server access logs, referrer logs, agent logs and client-side cookies.

VI. WEB STRUCTURE MINING
The structure of a typical Web graph consists of Web pages as nodes and hyperlinks as edges connecting two related pages. The Web consists not only of pages, but also of hyperlinks pointing from one page to another. Web structure mining is the process of discovering information from this link structure. It is used to find information about web pages and to draw inferences from hyperlinks. It discovers the link structure of the hyperlinks at the inter-document level and generates a structural summary of the website and its pages. It is used for retrieving pages that are not only relevant but also of high quality, or authoritative on the topic.

VII. USES OF WEB MINING
The potential of web mining lies in the application of existing and new data mining algorithms to Internet data, which include Internet server logs as well as external data on customers, sales and products. Web mining may be sub-divided into web content mining, web structure mining and web usage mining. Web content mining is the extraction of information from Internet pages. Web structure mining is the application of data mining to reconstruct the structure of a web site or sites. Web usage mining is the mining of log files and associated data from a particular web site to discover knowledge about browser and buyer behavior on that site. In short, web mining can be seen as applying existing analysis techniques together with cutting-edge technology to the plethora of data that the Internet is generating. The various uses of web mining are:
• Using web mining, companies can establish better customer relationships by giving customers exactly what they need.
• This technology has enabled e-commerce to do personalized marketing, which eventually results in higher trade.
• Companies can find, attract and retain customers, and they can save on production costs by utilizing the acquired insight into customer requirements.
• The predictive capability of a mining application can benefit society by identifying criminal activities.
• Companies can understand the needs of the customer better and react to those needs faster.

VIII. SUMMARY
Web mining plays a considerable role in the area of one-to-one marketing through content personalization. This paper discussed web mining and its sub-division into web content mining, web structure mining and web usage mining. Web content mining is the extraction of information from Internet pages. Web structure mining is the application of data mining to reconstruct the structure of a web site or sites. Web usage mining is the mining of log files and associated data from a particular web site to discover knowledge about browser and buyer behavior on that site. The paper also explained the various uses of web mining.