Download A Literature Review on Data Mining and its Techniques

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

K-means clustering wikipedia , lookup

Cluster analysis wikipedia , lookup

Transcript
Research Paper
Engineering
Volume : 5 | Issue : 6 | June 2015 | ISSN - 2249-555X
A Literature Review on Data Mining and its
Techniques
Keywords
temporal mining, spatial mining, clustering, prediction
Shraddha Soni
IIPS, DAVV
ABSTRACT Data mining has received immense attraction of business professionals with its great ability to dig out
the hidden information and pattern related to their customer behavior, sales and future trends. One of
the challenging tasks in data mining is to choose correct technique to be applied. This paper focuses on some popular
data mining techniques with their applications and algorithms proposed for them. The significance of this review is to
provide concise literature on data mining techniques as a valuable source to future researchers.
INTRODUCTION
With the advancement of information technology and
increased application of high speed computers in diversified fields, there has been an unexpected growth in the
amount of data that is being generated in the repositories.
The data may be in the form of audio, text or video also
in different format, which is both complex and large. There
is a need to convert this data into information by analyzing it from various dimensions, identifying the relationships
or grouping the data into categories. This information can
then be used in different decision making processes.
Data mining is to discover knowledge that is of interest
from large amounts of data stored in data repositories [1].
Mining refers to extraction of something which is of value
,hence , data mining is to pull out valuable information
from data. Numerous data mining tools are available in the
market to predict future trends and assist decision making,
that further help organizations to make proactive decisions
by looking into past and present data [2]. The varied application areas of data mining are marketing/sales, customer
relationship management, banking, insurance, fraud detection, bioinformatics and many more.
Following are the uses of Data mining
Spatial data mining: Spatial data mining refers to discovering and extract implicit, significant and unforeseen patterns
from large spatial database [3].
Temporal data mining: The Process of extracting useful
and previously undiscovered information from a large dataset, along with some temporal reasoning [4].
Sequence data mining : A type of data mining where various patterns which are present in the data are discovered.
It could be an association between variables or the way in
which events are occurring [5].
Intention mining: To determine computer user intention
from the past data pertaining in interacting with the system [6].
Literature on Data mining techniques:
Han et al. provided a comprehensive survey, in database
perspective, on the data mining techniques developed
recently [7]. Several major kinds of data mining methods,
including generalization, characterization, classification,
clustering, association, evolution, pattern matching, data
visualization, and meta-rule guided mining, was reviewed
730 X INDIAN JOURNAL OF APPLIED RESEARCH
by them. Techniques for mining knowledge in different
kinds of databases, included relational, transaction, objectoriented, spatial, and active databases, as well as global
information systems, was examined by them [7].
Clustering is the most commonly used technique of data
mining under which patterns are discovered in the underlying data.[8] Sidhu et al. presented that how clustering was
carried out and the applications of clustering. They also
provided us with a framework for the mixed attributes clustering problem and also showed us that how the customer
data can be clustered identifying the high-profit, high-value and low-risk customer [8].
Huang presented an algorithm, called k-modes, to extend
the k-means paradigm to categorical domains. He introduced new dissimilarity measures to deal with categorical
objects, replace means of clusters with modes, and use
a frequency based method to update modes in the clustering process to minimize the clustering cost function..
Experimented on a very large health insurance data set
consisted of half a million records and 34 categorical attributes showed that the algorithm was scalable in terms of
both the number of clusters and the number of records[9]
Berkhin surveyed that concentrated on clustering algorithms from a data mining perspective. [10]
In [11] k-prototypes algorithm was proposed, which was
based on the k-means paradigm but removed the numeric data limitation whilst preserved its efficiency. In that
algorithm, objects were clustered against k prototypes. A
method was developed to dynamically update the k prototypes in order to maximize the intra cluster similarity of
objects.
In [12] the efficiency and scalability issues were addressed
by proposing a data classification method which integrated
attribute oriented induction, relevance analysis, and the
induction of decision trees. Such an integration lead to
efficient, high quality, multiple level classification of large
amounts of data, the relaxation of the requirement of perfect training sets, and the elegant handling of continuous
and noisy data.
Antonie et al. presented some experiments for tumor detection in digital mammography. They investigated the use
of different data mining techniques, for anomaly detection
Research Paper
and classification. The experiments they conducted demonstrated the use and effectiveness of association rule mining in image categorization [13].
Kohavi et al. described the two most commonly used systems for induction of decision trees for classification: C4.5
and CART. They highlighted the methods and different
decisions made in each system with respect to splitting
criteria, pruning, noise handling, and other differentiating
features[14].
Volume : 5 | Issue : 6 | June 2015 | ISSN - 2249-555X
Conclusion
This paper briefly reviewed data mining techniques with
applications and algorithms proposed. Proper selection of
data mining technique and domain knowledge is an important consideration to make effective utilization of data
mining. This work will provide a pathway for beginners in
this area.
Ma et al. proposed to integrate classification and association rule mining techniques. The integration was done by
focusing on mining a special subset of association rules,
called class association rules (CARs). An efficient algorithm
was also given to build a classifier based on the set of discovered CAR [15].
In [16] two new algorithms were presented for solving the problem of discovering association rules between items in a large database of sales transactions .The algorithms were fundamentally
different
from the known algorithms. They showed how the best
features of the two proposed algorithms could be combined into a hybrid algorithm, called AprioriHybrid .
Agrawal et al. proposed an efficient algorithm that generated all significant association rules between items in the
database. The algorithm incorporated buffer management
and novel estimation and pruning techniques [17].
Yavas et al. proposed an algorithm for predicting the next
inter-cell movement of a mobile user in a Personal Communication Systems network. The performance of the proposed algorithm was evaluated through simulation as compared to two other prediction methods [18].
Nanopoulos et al. presented a new context for the interpretation of Web pre fetching algorithms as Markov predictors. They identify the factors that affect the performance of
Web pre fetching algorithms and proposed a new algorithm
called WM,,, which was based on data mining and was
proved to be a generalization of existing ones [19].
REFERENCE
[1] Bhise, R. B., S. S. Thorat, and A. K. Supekar. “Importance of data mining in higher education system.” IOSR Journal Of Humanities And
Social Science (IOSR-JHSS) ISSN (2013): 2279-0837. | [2] Padhy, Neelamadhab, Dr Mishra, and Rasmita Panigrahi. "The survey of data mining
applications and feature scope." arXiv preprint arXiv:1211.5723 (2012). | [3] Sumathi, N., R. Geetha, and S. Sathiya Bama. "Spatial data mining—techniques trends
and its applications." Journal of Computer Applications 1, no. 4 (2008): 28-30 | [4]Shaikh,Yasmin ,and Sanjay Tanwani. "INTERACTIVE TEMPORAL MINING OF
WORKFLOW LOGS." (2013). | [5]http://en.wikipedia.org/wiki/Sequential_Pattern_Mining | [6] http://en.wikipedia.org/wiki/Intention_mining | [7] Han, Jiawei. "Data
mining techniques." In ACM SIGMOD Record, vol. 25, no. 2, p. 545. ACM, 1996 | [8] Sidhu, Nimrat Kaur, and Rajneet Kaur. "Clustering In Data Mining." | [9] Huang,
Zhexue. "A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining." In DMKD, p. 0. 1997. | [10] Berkhin, Pavel. "A survey of clustering
data mining techniques." In Grouping multidimensional data, pp. 25-71. Springer Berlin Heidelberg, 2006. | [11] Huang, Zhexue. "Clustering large data sets with
mixed numeric and categorical values." In Proceedings of the 1st Pacific-Asia Conference on Knowledge Discovery and Data Mining,(PAKDD), pp. 21-34. 1997. |
[12] Kamber, Micheline, Lara Winstone, Wan Gong, Shan Cheng, and Jiawei Han. "Generalization and decision tree induction: efficient classification in data mining."
In Research Issues in Data Engineering, 1997. Proceedings. Seventh International Workshop on, pp. 111-120. IEEE, 1997. | [13] Antonie, Maria-Luiza, Osmar R.
Zaiane, and Alexandru Coman. "Application of Data Mining Techniques for Medical Image Classification." In Proceedings of the Second International Workshop
on Multimedia Data Mining, MDM/KDD'2001, August 26th, 2001, San Francisco, CA, USA, pp. 94-101. 2001. | [14] Kohavi, Ronny, and J. Ross Quinlan. "Data
mining tasks and methods: Classification: decision-tree discovery." In Handbook of data mining and knowledge discovery, pp. 267-276. Oxford University Press,
Inc., 2002. | [15] Ma, Bing Liu Wynne Hsu Yiming. "Integrating classification and association rule mining." In Proceedings of the 4th. 1998. | [16] Agrawal, Rakesh,
and Ramakrishnan Srikant. "Fast algorithms for mining association rules." In Proc. 20th int. conf. very large data bases, VLDB, vol. 1215, pp. 487-499. 1994. | [17]
Agrawal, Rakesh, Tomasz Imieliński, and Arun Swami. "Mining association rules between sets of items in large databases." In ACM SIGMOD Record, vol. 22, no. 2,
pp. 207-216. ACM, 1993. | [18] Yavaş, Gökhan, Dimitrios Katsaros, Özgür Ulusoy, and Yannis Manolopoulos. "A data mining approach for location prediction in mobile
environments." Data & Knowledge Engineering 54, no. 2 (2005): 121-146 | [19] Nanopoulos, Alexandros, Dimitrios Katsaros, and Yannis Manolopoulos. "A data
mining algorithm for generalized web prefetching." Knowledge and Data Engineering, IEEE Transactions on 15, no. 5 (2003): 1155-1169. |
INDIAN JOURNAL OF APPLIED RESEARCH X 731