Knowledge Extraction using Data Mining Techniques
Prof. R. A. Gangurde, Prof. M. R. Sonar
Department of MCA, K K Wagh Institute of Engineering Education and Research, Nashik, Maharashtra, India
Abstract: Data mining is a logical process that finds useful patterns in large amounts of data. It is the process of extracting previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions. Data mining is a computer-assisted process that digs through and analyzes enormous sets of data and then extracts knowledge from them. Various data mining techniques are used to extract useful knowledge from continuously growing databases and data warehouses. This extracted knowledge is useful in research as well as in organizations. In this paper, the authors review the literature on data mining techniques such as Classification, Clustering, Association Rules and Prediction.
Keywords: Knowledge discovery, Classification, Clustering, Association Rule, Prediction.
1. INTRODUCTION
In the modern era, people deal with vast amounts of data in different formats every day, and they make decisions by analyzing these data. Data mining is the process of extracting useful information and patterns from huge amounts of data. It is also called the knowledge discovery process, knowledge mining from data, knowledge extraction, or data/pattern analysis.
In the information age, knowledge is becoming a fundamental organizational resource that provides a competitive advantage, giving rise to knowledge management (KM) initiatives. Many organizations collect and store huge amounts of data; however, they are often unable to discover the valuable information hidden in these data and transform it into useful knowledge. Managing knowledge resources can be a challenge. Data mining is a process of sorting and picking out meaningful and useful information from a large pool of data.
Through data mining, we can identify trends or patterns in the data and thus propose a corresponding, optimal plan for the enterprise. As part of data mining research, this paper surveys the data mining techniques used in knowledge extraction.
2. KNOWLEDGE EXTRACTION PROCESS
Knowledge discovery is a process that extracts implicit, previously unknown and potentially useful information from data. The knowledge discovery process is described as follows:
- Data coming from a variety of sources is integrated into a single data store called a data warehouse or data mart. The data stored in the data warehouse/data mart is called the target data.
- The target data is then pre-processed and transformed into a standard format.
- Data mining algorithms process the data and output patterns or rules.
- These patterns and rules are then interpreted as new, useful knowledge or information.
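The four steps above can be sketched as a small pipeline. This is an illustrative toy, not the authors' method: the two source tables, their field names and the "mining" step (flagging customers whose total spend exceeds the average total) are hypothetical stand-ins for a real data warehouse and a real mining algorithm.

```python
from statistics import mean

# Hypothetical operational sources with inconsistent field names and types.
store_sales = [{"customer": "c1", "amount": "120.0"}, {"customer": "c2", "amount": "35.5"}]
online_sales = [{"cust_id": "c1", "total": 80},
                {"cust_id": "c3", "total": 15},
                {"cust_id": None, "total": 50}]

def integrate(store, online):
    """Step 1: load all sources into one 'target data' table of (source, customer, amount)."""
    target = [("store", r["customer"], r["amount"]) for r in store]
    target += [("online", r["cust_id"], r["total"]) for r in online]
    return target

def preprocess(target):
    """Step 2: drop rows without a customer and convert amounts to a standard float format."""
    return [{"customer": c, "amount": float(a)} for _, c, a in target if c is not None]

def mine(rows):
    """Step 3 (toy): total spend per customer, keeping only totals above the mean total."""
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    avg = mean(totals.values())
    return {c: t for c, t in totals.items() if t > avg}

def interpret(patterns):
    """Step 4: turn raw patterns into statements a decision-maker can use."""
    return [f"{c} is a high-value customer (total spend {t:.2f})"
            for c, t in sorted(patterns.items())]

knowledge = interpret(mine(preprocess(integrate(store_sales, online_sales))))
print(knowledge)
```

In a real system, step 1 would be an ETL load into the warehouse and step 3 would be one of the techniques surveyed in Section 3.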
As we can see, data mining is the heart of the knowledge discovery process. Using data mining, we can find useful patterns in large volumes of data and interpret them as useful knowledge and information.
3. DATA MINING TECHNIQUES
Various data mining techniques, including classification, clustering, association, prediction and sequential patterns, have recently been developed and are used in projects for knowledge discovery from databases.
3.1. Classification
Classification is a classic data mining technique based on machine learning. It is the process of finding common properties among a set of objects in a database and assigning the objects to different classes according to a classification model. The objective of classification is first to analyze the training data and develop an accurate description, or model, for each class using the features available in the data. Such class descriptions are then used to classify future test data in the database or to develop a better description for each class in the database.
Basically, classification is used to assign each item in a set of data to one of a predefined set of classes or groups. For example, teachers classify students' grades as A, B, C, D or F. Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks and statistics. In classification, we build software that can learn how to classify data items into groups. For example, we can apply classification to the task: "given all past records of employees who left the company, predict which current employees are likely to leave in the future." In this case, we divide the employees' records into two groups, "leave" and "stay", and then ask our data mining software to classify the current employees into one of these groups.
In recent years, many advanced classification techniques have been developed, such as:
- Regression
- Distance-based methods
- Decision Trees
- Fuzzy sets
- Neural Networks
- Support Vector Machines
3.2. Clustering
Clustering is the process of grouping physical or abstract objects into classes of similar objects. The term "Unsupervised Classification" is also often used for it; the term comes from the distinction between clustering and classification. Although both methods produce groups of objects with similar properties, in classification the classes are predefined, whereas in clustering we do not know the output classes, or even how many there are, in advance.
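The supervised side of this contrast can be made concrete with the employee example from Section 3.1. The sketch below is a minimal distance-based classifier (a nearest-centroid classifier, one simple member of the distance-based family listed there), assuming two hypothetical numeric features per employee: years at the company and last salary hike in percent. The features, values and function names are illustrative and do not come from the paper. Because the classes "leave" and "stay" are known in advance, training only averages the labelled records of each class into a centroid, and a new employee is assigned to the nearest centroid.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical labelled history: (years_at_company, last_salary_hike_percent) -> known class.
training = [
    ((1.0, 2.0), "leave"),
    ((2.0, 1.0), "leave"),
    ((1.5, 3.0), "leave"),
    ((8.0, 10.0), "stay"),
    ((7.0, 12.0), "stay"),
    ((9.0, 9.0), "stay"),
]

def train_nearest_centroid(records):
    """Learn one centroid (mean feature vector) for each predefined class."""
    sums, counts = {}, {}
    for features, label in records:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def classify(model, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: dist(model[label], features))

model = train_nearest_centroid(training)
print(classify(model, (2.0, 2.5)))   # a current employee -> leave
print(classify(model, (6.5, 11.0)))  # a current employee -> stay
```

K-Means, listed below as a partitioning method, also works with centroids, but it has to discover the groups without any labels.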
Cluster analysis helps to construct a meaningful partitioning of a large set of objects, following a divide-and-conquer methodology that decomposes a large-scale system into smaller components to simplify design and implementation.
Clustering is the process of arranging items into groups whose members are similar in some way. A cluster is therefore a collection of items that are similar to one another and dissimilar to the objects belonging to other clusters. Similarities and dissimilarities are evaluated based on the attribute values describing the objects. Clustering deals with finding structure in a collection of unlabeled data. A common approach, used by centroid-based techniques such as K-Means, is to find a cluster centre that characterizes each cluster.
Various data clustering methods:
1) Partitioning method – It divides the data into a given number of groups, each of which contains at least one object. E.g. K-Means Clustering.
2) Hierarchical method – It creates a hierarchical decomposition of the data using either a bottom-up (agglomerative) or a top-down (divisive) approach. E.g. Hierarchical Clustering.
3) Density-based method – A cluster continues to grow as long as the density of objects in its neighbourhood exceeds some threshold. E.g. DBSCAN Clustering.
4) Grid-based method – It quantizes the object space into a finite number of cells that form a grid structure. The main advantage of this approach is its fast processing time. E.g. STING Clustering.
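The partitioning method can be illustrated with K-Means. The following is a minimal sketch of Lloyd's iteration in plain Python, assuming k = 2 and hand-picked initial centres; the points and the `k_means` helper are hypothetical. Production implementations usually add smarter initialization (for example k-means++) and several restarts. Each pass assigns every point to its nearest centre and then moves each centre to the mean of its points, which is the "find a centre that characterizes each cluster" idea described above.

```python
from math import dist

def k_means(points, initial_centres, max_iter=100):
    """Lloyd's K-Means: alternate assignment and centre-update steps until no centre moves."""
    centres = [tuple(c) for c in initial_centres]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its points
        # (an empty cluster keeps its previous centre).
        new_centres = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centres[i]
            for i, members in enumerate(clusters)
        ]
        if new_centres == centres:  # converged
            break
        centres = new_centres
    return centres, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centres, clusters = k_means(points, initial_centres=[points[0], points[3]])
print(centres)
print(clusters)
```

The number of clusters is fixed by the caller here, which is exactly the parameter that partitioning methods require and that density-based methods such as DBSCAN avoid.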
3.3. Association Rule
Association rule mining is one of the most widely used data mining methods for finding hidden or required patterns in large volumes of data. It finds relationships among various data attributes in a huge set of items in a database, and it can identify a large number of interesting associations among itemsets. A typical example of association rule mining is market basket analysis. Association rule mining helps to explore the relationships among different products in transaction databases and to discover buyer behaviour, such as how the purchase of one commodity affects the purchase of other goods. The results can be applied to shelf layout, storage arrangement and the classification of users according to their buying patterns.
Association rules have many practical applications, including Market Basket Analysis, Recommendation Systems, Classification, XML Mining and Share Market analysis. In the classical model, support is measured so that every transaction in the dataset is treated equally. The classical formulation of association rule mining uses the support-confidence framework and reduces rule mining to the discovery of frequent itemsets. Rule support and confidence are two measures of rule interestingness: association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold. The association rule mining procedure can be completed in four steps:
1. Prepare the data and select the required data.
2. Generate the itemsets that determine the rule constraints for knowledge.
3. Mine the frequent k-itemsets using the new database.
4. Generate the association rules that make up the knowledge base.
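Steps 3 and 4 can be sketched with an Apriori-style level-wise search, one common way of mining frequent itemsets; the paper does not name a specific algorithm, so this choice is an assumption. For a rule X ⇒ Y over a set of transactions, support is the fraction of transactions containing every item of X ∪ Y, and confidence is support(X ∪ Y) / support(X). The basket data, thresholds and function names below are hypothetical.

```python
from itertools import combinations

# Hypothetical market-basket transactions (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk", "eggs"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def frequent_itemsets(db, min_support):
    """Level-wise search: frequent k-itemsets generate the (k+1)-itemset candidates."""
    items = sorted({i for t in db for i in t})
    level = [frozenset([i]) for i in items if support({i}, db) >= min_support]
    frequent, k = {}, 1
    while level:
        frequent.update({s: support(s, db) for s in level})
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
        level = [c for c in candidates if support(c, db) >= min_support]
    return frequent

def association_rules(frequent, min_confidence):
    """Rules X -> Y from each frequent itemset, with confidence = support(X u Y) / support(X)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]  # lhs is frequent because its superset is
                if conf >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), sup, conf))
    return rules

freq = frequent_itemsets(transactions, min_support=0.4)
rules = association_rules(freq, min_confidence=0.7)
for lhs, rhs, sup, conf in rules:
    print(f"{sorted(lhs)} -> {sorted(rhs)}  support={sup:.2f} confidence={conf:.2f}")
```

The prune step is what makes the search practical: any superset of an infrequent itemset, such as {bread, eggs} here, is never counted.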
The types of association rules are:
1. Multilevel association rule
2. Multidimensional association rule
3. Quantitative association rule
3.4. Prediction
Regression techniques can be used for prediction. Regression predicts the value of a dependent (response) variable from one or more independent (predictor) variables, where the variables are numeric. There are various forms of regression:
1. Linear Regression
2. Multiple Regression
3. Weighted Regression
4. Polynomial Regression
5. Non-Parametric Regression
6. Robust Regression
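As a minimal sketch of the simplest form, linear regression with one numeric predictor, the ordinary least-squares estimates are b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. The data (advertising spend versus sales) and the helper name are hypothetical. The other forms listed above extend this idea, for example with several predictors (multiple regression) or per-observation weights (weighted regression).

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x with a single numeric predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = y_mean - b1 * x_mean    # intercept
    return b0, b1

# Hypothetical data: advertising spend (predictor) vs. sales (response).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_linear_regression(spend, sales)
predicted = b0 + b1 * 6.0  # predict the response for an unseen spend of 6.0
print(f"sales = {b0:.2f} + {b1:.2f} * spend; prediction at 6.0 = {predicted:.2f}")
```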
4. CONCLUSION
Data mining has a wide field of application in almost every industry where data are generated in enormous volumes. That is why data mining is considered one of the most important frontiers in database and information systems, and one of the most promising interdisciplinary developments in information technology. Data mining techniques such as classification, clustering, association rules and prediction help find patterns that can be used to anticipate future business trends and support growth. In this paper, we have presented an overview of selected data mining techniques.