International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 4, April 2012)
Web Mining
Sonia Sharama
Assistant Professor in Computer Science, Bebe Nanakai University College, Mithra, (Kapurthala)
Abstract --- This paper discusses web mining, a set of
techniques for extracting knowledge from Web data. Web
mining can be broadly defined as the discovery and analysis
of useful information from the World Wide Web. It covers
the automatic search of information resources available
online, i.e., Web content mining, and the discovery of user
access patterns from Web servers, i.e., Web usage mining.
More generally, Web mining is the extraction of useful
patterns and implicit information from artifacts or activity
related to the World Wide Web. Web content mining is the
process of extracting knowledge from the content of
documents or their descriptions. Finally, Web usage mining,
also known as Web log mining, is the process of extracting
interesting patterns from Web access logs.
Keywords --- Web Mining, Web Content Mining, Web Usage
Mining, Web Structure Mining, Uses of Web Mining.
I.
INTRODUCTION
With the growth of information resources available on the
World Wide Web, it has become necessary for users to
rely on automated tools to find the desired information
resources and to track and analyze their usage patterns.
These factors create a need for server-side and client-side
intelligent systems that can effectively mine for knowledge.
Web mining can be defined as the discovery and analysis of
useful information from the World Wide Web. This
covers the automatic search of information resources
available online, i.e., Web content mining, and the discovery
of user access patterns from Web servers, i.e., Web usage
mining.
II.
WEB MINING TAXONOMY
Web mining involves the analysis of the server logs of a
website, whereas data mining in general involves using
techniques to find relationships in large amounts of data.
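The analysis of web server logs mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the paper does not name a log format, so the Common Log Format is assumed here, and the sample entries, the pattern name LOG_PATTERN, and the helper functions are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumption: logs use the Common Log Format; the paper names no format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical sample entries, invented for illustration.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Apr/2012:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [10/Apr/2012:10:00:09 +0000] "GET /products.html HTTP/1.1" 200 2310',
    '10.0.0.2 - - [10/Apr/2012:10:01:15 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [10/Apr/2012:10:02:30 +0000] "GET /cart.html HTTP/1.1" 200 512',
    'malformed line',
]


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count successful requests per page and collect each host's page sequence."""
    page_hits = Counter()
    paths_by_host = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["status"] != "200":
            continue  # skip malformed lines and non-successful requests
        page_hits[entry["path"]] += 1
        paths_by_host[entry["host"]].append(entry["path"])
    return page_hits, dict(paths_by_host)


page_hits, paths_by_host = summarize(SAMPLE_LOG)
print(page_hits.most_common(1))
print(paths_by_host["10.0.0.1"])
```

The per-host page sequences are the raw material for the navigation patterns that Web usage mining looks for. Grouping by host is a simplification, since several users may share one address.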
[Figure: taxonomy of Web mining, with Web Mining branching into Web Content Mining, Web Structure Mining, and Web Usage Mining]
III.
WHAT IS WEB MINING
Web mining is the extraction of interesting and
potentially useful patterns and implicit information from
artifacts or activity related to the World Wide Web.
There are roughly three knowledge discovery domains
that pertain to Web mining: Web content mining, Web
structure mining, and Web usage mining. Web content
mining is the process of extracting knowledge from the
content of documents or their descriptions. Web
document text mining and resource discovery based on
concept indexing or agent-based technology may also
fall into this category. Web structure mining is the process
of inferring knowledge from the organization of the World
Wide Web and the links between references and referents
in the Web.
IV.
WEB CONTENT MINING
Web content mining is related to both data mining and text
mining. It is related to data mining because many data
mining techniques can be applied to Web content, and to
text mining because much Web content is text. Web
content may comprise several types of data: text, image,
audio, video, metadata, and hyperlinks. Web content mining
discovers useful information from Web contents, data, and
documents. It mainly involves three stages: preprocessing
the data before mining (for example, feature selection),
processing, and post-processing, which can reduce
ambiguous search results. It also improves the content
search of other tools such as search engines.
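The three stages of Web content mining described in this paper (preprocessing, processing, and post-processing) can be illustrated with a small sketch. It is not the author's method: the stop-word list, the sample page, and the function names are all assumptions made for illustration, and simple term counting stands in for whatever mining technique a real system would apply.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: a tiny hand-picked stop-word list; real systems use larger ones.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to", "from", "in"}


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def preprocess(html):
    """Preprocessing: strip tags, lowercase, tokenize, drop stop words."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    tokens = re.findall(r"[a-z]+", " ".join(extractor.chunks).lower())
    return [t for t in tokens if t not in STOP_WORDS]


def process(tokens):
    """Processing: count how often each remaining term occurs."""
    return Counter(tokens)


def postprocess(term_counts, top_n=3):
    """Post-processing: keep only the top_n most frequent terms."""
    return [term for term, _ in term_counts.most_common(top_n)]


# Hypothetical page, invented for illustration.
PAGE = (
    "<html><body><h1>Web Mining</h1>"
    "<p>Web mining is the mining of Web data.</p></body></html>"
)

keywords = postprocess(process(preprocess(PAGE)))
print(keywords)
```

Dropping stop words during preprocessing is a very simple form of feature selection, and trimming the result to a few frequent terms is one way post-processing can cut down ambiguous results.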
V.
WEB USAGE MINING
Web usage mining, also known as Web log mining, is
used to discover user navigation patterns from
Web data and to predict user behavior while the user
interacts with the Web. It helps to improve large
collections of resources. Typical sources of data are
automatically generated data stored in server access
logs, referrer logs, agent logs, and client-side cookies.
VI.
WEB STRUCTURE MINING
The structure of a typical Web graph consists of Web
pages as nodes and hyperlinks as edges connecting two
related pages. Web structure mining is the process of
discovering information from the structure of the Web.
It is used to find information about Web pages and to
draw inferences from hyperlinks. The Web consists
not only of pages, but also of hyperlinks pointing from
one page to another. Web structure mining discovers the
link structure of hyperlinks at the inter-document level in
order to generate a structural summary of websites and Web
pages. It is used for retrieving pages that are not only
relevant but also of high quality, or authoritative on the topic.
VII.
USES OF WEB MINING
The potential of Web mining lies in the application of
existing and new data mining algorithms to Internet data,
which include Internet server logs as well as external data
on customers, sales, and products. Web mining may be
subdivided into Web content mining, Web structure
mining, and Web usage mining. Web content mining is
the extraction of information from Internet pages. Web
structure mining is the application of data mining to
reconstruct the structure of a website or sites. Web usage
mining is the mining of log files and associated data from
a particular website to discover knowledge about browser
and buyer behavior on that site. In short, Web mining
applies existing analysis techniques, together with
cutting-edge technology, to the plethora of data that
the Internet is generating. The various uses of Web
mining are:
•Using Web mining, companies can establish better
customer relationships by giving customers exactly what
they need.
•This technology has enabled e-commerce to do
personalized marketing, which eventually results in
higher trade volumes.
•Companies can find, attract, and retain customers, and
they can save on production costs by utilizing the
acquired insight into customer requirements.
•The predictive capability of mining applications can
benefit society by identifying criminal activities.
•Companies can understand the needs of the customer
better and react to customer needs faster.
VIII.
SUMMARY
Web mining plays a considerable role in one-to-one
marketing through content personalization. This paper
discussed Web mining and its subdivision into Web content
mining, Web structure mining, and Web usage mining. Web
content mining is the extraction of information from
Internet pages. Web structure mining is the application of
data mining to reconstruct the structure of a website or
sites. Web usage mining is the mining of log files and
associated data from a particular website to discover
knowledge about browser and buyer behavior on that site.
The paper also explained the various uses of Web mining.