ISSN:2229-6093
G. N. Shinde,Inamdar S.A, Int. J. Comp. Tech. Appl., Vol 2 (2), 280-283
Web Data Mining Using An Intelligent Information System Design
G. N. Shinde
Indira Gandhi College, CIDCO, Nanded-431603,
Maharashtra, INDIA
[email protected]
Inamdar S.A.
School of Computational Science
Swami Ramanand Teerth Marathwada University,
Nanded, Maharashtra, INDIA
[email protected]
Abstract: Using the large amounts of information on the Web efficiently, so that information processing becomes intelligent, personalized and automatic, is one of the most important applications of current data mining technology. Model Driven Architecture (MDA), which is used for code generation, has many benefits over traditional software development methods. An intelligent information mining system is built around data mining. In this paper, the concept of Web data mining is introduced and the role of MDA is defined. The proposed architecture uses MDA with J2EE (Java 2 Platform, Enterprise Edition) to describe the behavior of agents. JADE (Java Agent DEvelopment Framework) provides a standard for developing multi-agent systems (MAS). In addition, the Agent Unified Modeling Language (AUML) is used to define agent roles effectively.
Keywords: Multi-agent systems, J2EE, JADE, Agent Unified Modeling Language (AUML)
I. INTRODUCTION
The WWW serves as a huge, widely distributed, global information service center for news, advertisements, consumer information, financial management, education, government, e-commerce, and many other services [14]. With the rapid increase of information on the WWW, Web mining has gradually become more and more important within data mining. Web mining can be classified into three domains: Web structure mining, Web content mining and Web usage mining. Web usage mining generally involves three tasks: preprocessing, knowledge discovery and pattern analysis.
II. WEB MINING TECHNOLOGY
Data mining is the extraction of potential, previously unknown, useful information, patterns and trends from abundant, incomplete, noisy and random data. Web mining technology is an important branch of data mining. The Web contains a huge amount of data, and applying data mining technology to the Web, namely Web mining technology, has become one of the most important research areas with the rapid development of the Internet [2]. The Web grows and evolves faster than we would like and expect, posing scalability and relevance problems for Web search engines.
A. Web-based data mining concept
Web usage mining mines Web logs to discover users' access patterns by analyzing and exploring regularities in Web log records. The Web can be seen as a structure containing information about hyperlinks, Web usage information and the Web contents themselves. Web site usage data contain records of how users have visited a Web site [9, 11].
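Discovering access patterns from usage records can be sketched as follows. This is a minimal illustration rather than the method used in the paper: the record layout (user, timestamp in seconds, page), the 30-minute session timeout and the function names `sessionize` and `transition_counts` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed inactivity threshold


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, page) records into per-user visit sessions."""
    by_user = defaultdict(list)
    for user, ts, page in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, page))

    sessions = []
    for visits in by_user.values():
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                # A long gap closes the current session and starts a new one.
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions


def transition_counts(sessions):
    """Count how often one page is followed directly by another."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts


# Hypothetical usage records: (user, seconds since start, page)
records = [
    ("u1", 0, "/home"), ("u1", 60, "/products"), ("u1", 120, "/cart"),
    ("u2", 10, "/home"), ("u2", 50, "/products"),
    ("u1", 10000, "/home"),
]
sessions = sessionize(records)
counts = transition_counts(sessions)
print(counts.most_common(1))
```

In this toy data the most frequent transition is from `/home` to `/products`, which is the kind of regularity a usage-mining step would report.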
B. Agent technology and multi-agent system
Agents perform some action or activity on behalf of a user of a computer system. An agent is an entity that runs in a dynamic environment and has a high degree of autonomy. Agent software is a type of computer program that simulates intelligent human behaviour and provides the corresponding services. An agent has the characteristics of autonomy, reactivity, initiative and sociality. A multi-agent system is composed of a number of agents with a certain organizational structure and is an effective way of tackling complex systems. Web search engines are one of the most popular services that help users find useful information on the Web.
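The agent characteristics named above can be illustrated with a toy multi-agent system. The sketch below is only an assumption-laden illustration in plain Python: the classes `Agent`, `MultiAgentSystem`, `Searcher` and `User`, the in-memory message queue and the canned search result are all hypothetical and are not taken from the paper or from JADE.

```python
from collections import deque


class MultiAgentSystem:
    """A tiny container that routes messages between registered agents."""

    def __init__(self):
        self.agents = {}

    def register(self, agent):
        self.agents[agent.name] = agent

    def deliver(self, sender, to, content):
        self.agents[to].inbox.append((sender, content))

    def run(self, rounds=3):
        for _ in range(rounds):
            for agent in list(self.agents.values()):
                agent.step()


class Agent:
    def __init__(self, name, system):
        self.name = name
        self.system = system
        self.inbox = deque()
        system.register(self)

    def send(self, to, content):
        # Sociality: agents cooperate by exchanging messages.
        self.system.deliver(self.name, to, content)

    def step(self):
        # Autonomy: each agent processes its own inbox in its own way.
        while self.inbox:
            sender, content = self.inbox.popleft()
            self.react(sender, content)

    def react(self, sender, content):
        # Reactivity: subclasses decide how to respond to a message.
        pass


class Searcher(Agent):
    def react(self, sender, query):
        pages = {"web mining": ["/wcm", "/wsm", "/wum"]}  # stand-in for a real search
        self.send(sender, pages.get(query, []))


class User(Agent):
    def __init__(self, name, system):
        super().__init__(name, system)
        self.results = None

    def react(self, sender, content):
        self.results = content


mas = MultiAgentSystem()
user = User("user", mas)
searcher = Searcher("searcher", mas)
user.send("searcher", "web mining")  # initiative: the user agent starts the exchange
mas.run()
print(user.results)
```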
C. Web data mining categories
Web mining research can be classified into three categories: Web content mining (WCM), Web structure mining (WSM), and Web usage mining (WUM). Web content mining refers to the discovery of useful information from the content of Web pages or Web documents, including text, image, audio and video [4].
Figure 1. Taxonomy of Web mining
The agents used in Web mining can be grouped into three classes:
• Intelligent search agents: Agents of this class automatically search the Web for information relevant to a particular search query, using domain characteristics and user profiles to organize and interpret the discovered information.
• Information filtering/categorisation agents: These agents use a number of techniques and the characteristics of hypertext documents to automatically retrieve, filter and categorize Web documents according to some predefined criteria or user interaction.
• Personalized web agents: This type of Web agent learns user preferences and then automatically discovers Web documents and resources based on the created user profile.
III. WEB-BASED DATA MINING PROCESS
Web mining uses data mining techniques to explore interesting information and potential patterns in the contents of Web pages, which helps people extract knowledge and improve Web site design.
Figure 2. Web mining functions
A. Data acquisition
Data acquisition consists of three relatively independent processes: data search, data selection and data collection.
B. Data preprocessing
Data preprocessing transforms the results of resource discovery into a form that can be mined.
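A preprocessing step of this kind might look like the sketch below. It assumes, purely for illustration, that the acquired data are Web server log lines in Common Log Format; the filtering rules (keep only status 200, drop image and style requests) and the names `LOG_PATTERN`, `IGNORED_SUFFIXES` and `preprocess` are assumptions, not details given in the paper.

```python
import re

# Common Log Format: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed noise


def preprocess(lines):
    """Parse raw log lines and keep only successful page requests."""
    cleaned = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that cannot be parsed
        if match.group("status") != "200":
            continue  # skip failed or redirected requests
        path = match.group("path").split("?", 1)[0]  # drop the query string
        if path.lower().endswith(IGNORED_SUFFIXES):
            continue  # skip embedded resources such as images
        cleaned.append({
            "host": match.group("host"),
            "time": match.group("time"),
            "path": path,
        })
    return cleaned


raw = [
    '10.0.0.1 - - [10/Oct/2010:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2010:13:55:37 +0000] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.2 - - [10/Oct/2010:13:56:01 +0000] "GET /missing.html HTTP/1.0" 404 0',
    'corrupted line',
]
cleaned = preprocess(raw)
print(cleaned)
```

Only the request for `/index.html` survives; the image request, the failed request and the unparseable line are discarded before mining.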
C. Data mining
Data mining refers to finding hidden, potentially useful information in a large amount of data. Web mining technology is an important branch of data mining.
D. Analysis and evaluation
The analysis and evaluation module analyzes the credibility and effectiveness of the knowledge patterns obtained by data mining.
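The paper does not say how credibility and effectiveness are measured. One common choice, shown here only as an assumed stand-in, is to score a mined rule by its support and confidence over the visit sessions; the functions `support` and `confidence` and the sample sessions are hypothetical.

```python
def support(sessions, pages):
    """Fraction of sessions that contain every page in `pages`."""
    wanted = set(pages)
    return sum(wanted <= set(session) for session in sessions) / len(sessions)


def confidence(sessions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    base = support(sessions, antecedent)
    if base == 0:
        return 0.0
    return support(sessions, set(antecedent) | set(consequent)) / base


# Hypothetical visit sessions
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]

# Evaluate the rule "/products -> /cart"
rule_support = support(sessions, ["/products", "/cart"])
rule_confidence = confidence(sessions, ["/products"], ["/cart"])
print(rule_support, rule_confidence)
```

A rule with low support or low confidence would be treated as unreliable and filtered out before it is presented to the user.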
E. Knowledge formulation
The knowledge expression module presents the knowledge patterns mined from the Web data by the data mining tools.
IV. INTELLIGENCE MINING SYSTEM DESIGN
The model is designed to overcome the limitations of traditional search engines, and its purpose is to provide an intelligent information service system. The basic idea is that the first step analyzes the Web pages of a particular Web site and extracts the necessary parameters: the title, the markers that delimit the beginning and end of the text, and the link addresses [11].
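Extracting such parameters from a page can be sketched with Python's standard `html.parser` module. The class name `PageParameters` and the sample page are assumptions for illustration; the sketch covers only the title and link addresses, and the text delimiters mentioned above are left out because the paper does not specify them.

```python
from html.parser import HTMLParser


class PageParameters(HTMLParser):
    """Collect the page title and the link addresses of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page content
html = """<html><head><title>Web Mining News</title></head>
<body><p>Intro</p><a href="/article/1">First</a>
<a href="http://example.com/2">Second</a></body></html>"""

parser = PageParameters()
parser.feed(html)
print(parser.title)
print(parser.links)
```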
Figure 3. Information intelligence mining system structure
A. Approach
The concept of agent roles is used to capture the behavior of agents. The roles of agents are defined by their behavior.
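The idea that a role is defined by its behaviors can be sketched as below. This is a plain-Python illustration under stated assumptions, not the MDA or JADE code of the proposed design: the classes `Role` and `RoleAgent`, the behaviors `search` and `keep_allowed`, and the sample index are all hypothetical.

```python
class Role:
    """A role is a named set of behaviours (behaviour name -> callable)."""

    def __init__(self, name, behaviours):
        self.name = name
        self.behaviours = dict(behaviours)


class RoleAgent:
    """An agent that can only do what its adopted roles allow."""

    def __init__(self, name):
        self.name = name
        self.roles = {}

    def adopt(self, role):
        self.roles[role.name] = role

    def perform(self, behaviour, *args):
        for role in self.roles.values():
            if behaviour in role.behaviours:
                return role.behaviours[behaviour](*args)
        raise ValueError(f"{self.name} has no role providing {behaviour!r}")


def search(query, index):
    """Return the addresses whose text mentions the query."""
    return [url for url, text in index.items() if query in text]


def keep_allowed(urls, allowed_prefix):
    """Keep only addresses under an allowed prefix."""
    return [url for url in urls if url.startswith(allowed_prefix)]


searcher = Role("searcher", {"search": search})
filterer = Role("filter", {"filter": keep_allowed})

agent = RoleAgent("miner")
agent.adopt(searcher)
agent.adopt(filterer)

# Hypothetical index of page address -> page text
index = {
    "/docs/wum": "web usage mining",
    "/blog/tips": "mining tips",
    "/docs/wcm": "web content mining",
}
hits = agent.perform("search", "mining", index)
result = agent.perform("filter", hits, "/docs/")
print(result)
```

Keeping behaviors inside roles means an agent's capabilities change by adopting or dropping a role, without modifying the agent class itself.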
B. Implementation of proposed design
Model Driven Architecture (MDA), which is used for code generation, has many benefits over traditional software development methods. The proposed architecture uses MDA with J2EE (Java 2 Platform, Enterprise Edition) to describe the behavior of agents. JADE (Java Agent DEvelopment Framework) provides a standard for developing multi-agent systems (MAS) [16]. In addition, the Agent Unified Modeling Language (AUML) is used to define agent roles effectively.
V. CONCLUSIONS
MDA is a promising approach for software development. MDA with J2EE (Java 2 Platform, Enterprise Edition) is used to describe the behavior of agents. JADE (Java Agent DEvelopment Framework) provides a standard for developing multi-agent systems (MAS). Web usage mining, like Web mining as a whole, is a new research field with a long way to go. The development of the Internet gives Web-based data warehouse and data mining technology a broad application scope. With the rapid development of the Internet and communications technology, research on Web-based data mining will go further in depth and extend to areas such as Web site design. In addition, the Agent Unified Modeling Language (AUML) is used to define agent roles effectively.
REFERENCES
[1] Li Zhan, Liu Zhijing, ‘Web Mining Based on Multi-Agents’, IEEE Computer Society (2003).
[2] Margaret H. Dunham and Sridhar, Data Mining: Introductory and Advanced Topics, Prentice Hall, ISBN 81-7758-785-4, chaps. 1, 7, pp. 3, 4, 195-218.
[3] Yan Li, Xin-Zhong Chen, Bing-Ru Yang, ‘Research on Web Mining-Based Intelligent Search Engine’, Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, IEEE (2002).
[4] Wang Bin, Liu Zhijing, ‘Web Mining Research’, International Conference on Computational Intelligence and Multimedia Applications, IEEE (2003).
[5] Hiroyuki Kawano, ‘Web Archiving Strategies by using Web Mining Techniques’, IEEE (2003).
[6] Sung Ho Ha, Sung Min Bae, Sang Chan Park, ‘Web Mining for Distance Education’, ICMIT, IEEE (2000).
[7] Sanjay Kumar Madria, ‘Web Mining: A Bird’s Eye View’, white paper (2008).
[8] Arun K Pujari, Data Mining Techniques, Universities Press (India) Limited, ISBN 81-7371-380-4.
[9] Wang Jicheng, Huang Yuan, Wu Gangshan and Zhang Fuyan, ‘Web Mining: Knowledge Discovery on the Web’, IEEE (1999).
[10] Jakub Snopek, Ivan Jelínek, ‘Web Access Predictive Models’, International Conference on Computer Systems and Technologies – CompSysTech (2005).
[11] Xinlin Zhang, Xiangdong Yin, ‘Design of an Information Intelligent System based on Web Data Mining’, International Conference on Computer Science and Information Technology (2008).
[12] Feng Zhang, Hui-You Chang, ‘Research and Development in Web Usage Mining System--Key Issues and Proposed Solutions: A Survey’, International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002.
[13] Wu Gangshan, Huang Yuan, Shian-Shyong Tseng, Zhang Fuyan, ‘A knowledge sharing and collaboration system model based on Internet’, IEEE (1999).
[14] Lizhen Liu, Junjie Chen, Hantao Song, ‘The Research of Web Mining’, Proceedings of the 4th World Congress on Intelligent Control and Automation, June 10-14, 2002, Shanghai, P.R. China, IEEE.
[15] Shakirah Mohd Taib, Soon-Ja Yeom, Byeong-Ho Kang, ‘Elimination of Redundant Information for Web Data Mining’, Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC’05), IEEE.
[16] James Huamonte, Kevin Smith, ‘The Use of Roles to Model Agent Behaviors for Model Driven Architecture’, IEEE (2005).