ISSN 2319-7080
International Journal of Computer Science and Communication Engineering
Volume 5 issue 1(February 2016 issue)
ANALYSIS OF DATA MINING TRENDS, APPLICATIONS,
BENEFITS AND ISSUES
Dinesh Bhardwaj¹, Sunil Mahajan²
¹,²Assistant Professor, Department of Computer Science & Information Technology, SSM College, Dinanagar, (Punjab) India
ABSTRACT: In recent times, Information Technology has come to play
a very important role in every aspect of human life. It is
essential to gather data from different sources. This
data can be stored and maintained to generate information
and knowledge. Data mining has become an essential factor
in various fields including business, education, health care,
finance and science. Data mining is part of the knowledge
discovery process that offers a new way to look at data.
Knowledge Discovery in Databases is the process of finding
knowledge in massive amounts of data, and data mining is
the core of this process. Data mining can be used to mine
understandable, meaningful patterns from large databases,
and these patterns may then be converted into knowledge.
Data mining works with a data warehouse, and the whole
process is divided into an action plan to be performed on the data:
selection, transformation, mining and results interpretation.
In this paper, we review different types of applications
of data mining, describe the areas in which data mining
concepts are used, and discuss its issues.
KEYWORDS: Data Mining, Knowledge discovery, Trends
in data mining.
I. INTRODUCTION
Simply storing information in a data warehouse does not provide
the benefits an organization is seeking. Data mining is
the process of extracting previously unknown information from large
databases and using it to make organizational decisions [1]. A key
feature of this definition is that data mining is concerned with the
discovery of hidden, unexpected patterns in data. Data
mining began its life in specialist applications such as geological
research and meteorological research. More recently it has been
applied in a number of areas of industry and commerce [2]. To
generate information, a massive collection of data is required. The
data can range from simple forms, such as numerical data, figures and text
documents, to more complex forms, such as spatial data, multimedia
data and hypertext documents. With large amounts of data stored
in databases, files and other repositories, it is increasingly
important to develop powerful tools for the analysis and
interpretation of such data and for the extraction of interesting
knowledge and patterns that could help in decision making.
Data mining is a set of activities or tools used to find new, hidden,
unexpected or unusual patterns in data [3].
Data mining brings many advantages when used in specific
areas. Besides these advantages, data mining also has
disadvantages, such as privacy, security and misuse of
information. This paper also discusses data mining techniques
such as predictive modeling, data mining tools, applications and
trends in data mining, major issues in
data mining, notable uses of data mining, and the conclusion.
II. KNOWLEDGE DISCOVERY PROCESS
Fig 1: Data mining process [4]
The various processes are:
• Data cleaning: Remove noise, that is, unwanted data.
• Data integration: Combine multiple data sources.
• Data selection: Select the data relevant to the task from the database.
• Data transformation: Convert the data into an appropriate form
that will be easy to mine.
• Data mining: Apply a process such as association, regression or
classification to extract data patterns.
• Pattern evaluation: Evaluate the output of the data mining
process and identify the interesting patterns.
• Knowledge representation: Various techniques are used to
present the mined knowledge to the user [5].
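The steps listed above can be sketched as a chain of small functions. The following is a minimal illustration only, not the process of any particular tool: the records, field names, age cut-off and spend threshold are all hypothetical, and the "mining" step is reduced to a group average.

```python
from statistics import mean

# Hypothetical toy records from two data sources (illustrative values only).
branch_a = [{"id": 1, "age": 25, "spend": 120.0},
            {"id": 2, "age": None, "spend": 80.0}]   # noisy record: age missing
branch_b = [{"id": 3, "age": 52, "spend": 400.0},
            {"id": 4, "age": 47, "spend": 390.0}]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: turn numeric age into a categorical group."""
    return [{"age_group": "young" if r["age"] < 40 else "older",
             "spend": r["spend"]} for r in records]

def mine(records):
    """Data mining (toy version): average spend per age group."""
    groups = {}
    for r in records:
        groups.setdefault(r["age_group"], []).append(r["spend"])
    return {group: mean(values) for group, values in groups.items()}

def evaluate(patterns, threshold):
    """Pattern evaluation: keep only groups whose average meets the threshold."""
    return {g: v for g, v in patterns.items() if v >= threshold}

def run_pipeline():
    data = integrate(clean(branch_a), clean(branch_b))
    data = transform(select(data, ["age", "spend"]))
    return evaluate(mine(data), threshold=200.0)

# Knowledge representation: here simply printed for the user.
print(run_pipeline())
```

Each function corresponds to one listed step, so a step can be replaced (for example, a real classifier in place of `mine`) without touching the others.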
www.ijcsce.org
Fig 2: Steps in Data Mining process [6]
III. ARCHITECTURE OF DATA MINING
Data mining is described as the process of discovering or extracting
interesting knowledge from large amounts of data stored in
multiple data sources such as file systems, databases and data
warehouses. This knowledge brings many benefits to
business strategy, scientific and medical research, governments
and individuals. The architecture contains modules for secure,
thread-safe communication, database connectivity, organized
data management and efficient data analysis for generating a
global mining model [7].
Fig 3: Architecture of data mining [7]
IV. DATA MINING ADVANTAGES AND DISADVANTAGES
1. Advantages
The advantages of using data mining in applications such as
banking, manufacturing and production, marketing and health care
are as follows [8]:
1) Banking: Data mining supports the banking sector by
searching large databases to discover previously unknown
patterns and by automating the process of finding predictive information.
Data mining helps to forecast levels of bad loans and fraudulent
credit card use, to predict credit card spending by new
customers, and to predict which kinds of customers will best respond to
new loans offered by the banks.
2) Marketing: Data mining helps the marketing sector by
classifying customer demographics, which can be used to predict
which customers will respond to a mailing or buy a particular
product. This is very helpful for the growth of a business.
3) Health Care: Data mining supports the health care sector
by correlating the demographics of
patients with critical illnesses, developing better insights into
symptoms and their causes, and learning how to provide proper
treatments.
4) Insurance: Data mining assists the insurance sector in predicting
fraudulent claims and medical coverage costs, classifying the
important factors that affect medical coverage, and predicting
which customers will buy new policies [9].
2. Disadvantages of Data Mining
The disadvantages of data mining are explained as follows [10]:
1) Privacy Issues
One of the disadvantages is the issue of personal privacy. In recent
years, with the boom of the internet, concerns about privacy
have increased tremendously. Because of these concerns,
individuals such as internet users, employees and customers are afraid
that an unknown person may gain access to their personal
information and then use that information in an unethical way
that may cause them harm.
2) Security Issues
Another major disadvantage is security, which is always a
major concern in information technology. Companies hold a lot
of personal information about their employees and customers,
including social security numbers, birthdates and payroll data, and
much of it is also available online. However, they often do not have sufficient
security systems in place to protect this information. There have
been many cases where hackers accessed and stole the personal data
of customers [10].
V. CHALLENGES OF DATA MINING
Data mining faces many challenges, which are
listed as follows [11]:
• Scalability
• Complex and Heterogeneous Data
• Data Quality
• Data Ownership and Distribution
• Dimensionality
• Privacy preservation [12].
VI. DATA MINING TECHNIQUES
Data mining draws its techniques and methods from the following
related disciplines and technologies [13]:
(1) Statistical Methods
Data mining often involves a certain degree of statistical
processing, such as data sampling and modeling, to test assumptions
and control error. Many statistical methods, including descriptive
statistics, probability theory, regression analysis and time series
analysis, play an important role in data mining.
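As a small illustration of the regression analysis mentioned above, the sketch below fits a straight line by ordinary least squares. The function name and the sample points are assumptions made for this example and are not taken from the cited sources.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

Once fitted, the line can be used for prediction, for example `slope * 5 + intercept` for an unseen x of 5.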
(2) Decision Tree
The decision tree method is mainly used for data classification.
It is generally divided into two stages: tree construction and tree
pruning. First, the training data are used to generate test functions,
and branches are created according to the different values of each test.
Compared with other classification methods, decision tree
classification is faster, is more easily converted into simple,
easy-to-understand classification rules and into database queries,
and can give very good classification results,
especially in high-dimensional problem areas.
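The construction stage can be illustrated with a single split. The sketch below chooses the threshold that minimizes Gini impurity, which is one common splitting criterion and is assumed here because the text does not name one. The resulting split is then written as a rule. The feature name, data and labels are hypothetical, and pruning is not shown.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' test."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must send records to both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

def as_rule(feature, threshold, values, labels):
    """Express the split as a readable rule using the majority label per branch."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    majority = lambda ls: max(set(ls), key=ls.count)
    return f"IF {feature} <= {threshold} THEN {majority(left)} ELSE {majority(right)}"

# Hypothetical training data: income (in thousands) and whether a loan was repaid.
incomes = [20, 25, 30, 60, 70, 80]
repaid = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(incomes, repaid)
print(as_rule("income", threshold, incomes, repaid))
```

The printed rule has the same shape as a WHERE clause, which is why decision trees are easy to convert into database queries.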
(3) Neural Network
An artificial neural network mimics the structure of a biological
neural network. The network is trained to learn a nonlinear prediction
model, and in data mining it can be used for classification,
clustering, feature extraction and other operations.
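The idea of training can be shown with a single artificial neuron. The sketch below uses the classic perceptron learning rule on the logical AND function. It is a deliberately minimal example: one neuron can only learn a linear decision boundary, and practical networks obtain nonlinear models by stacking many such units with nonlinear activations. The function names and training settings are assumptions for this illustration.

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum of the inputs exceeds zero, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train(samples, epochs=20, lr=1):
    """Perceptron rule: nudge weights by the error on each sample."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Training set for logical AND: the output is 1 only when both inputs are 1.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(and_samples)
print([predict(weights, bias, x) for x, _ in and_samples])
```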
(4) Genetic Algorithm
A genetic algorithm is an optimization technique that uses
concepts from biological evolution to carry out
a series of searches and arrive at an optimized solution.
To implement a genetic algorithm, the problem to be solved is first
encoded (the encodings are called chromosomes) and an initial population
is generated. The fitness of each individual is then calculated, and chromosome
replication, exchange (crossover) and mutation operations generate new
individuals. This process is repeated until the best, or a sufficiently
good, individual is found. Data mining tasks can often be
expressed as search problems, so the powerful search capability
of genetic algorithms can be used to find an optimal solution.
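The loop described above (encoding, initial population, fitness, replication, exchange, mutation, repeat) can be sketched as follows. The problem is a toy one chosen for illustration: chromosomes are bit lists and fitness is the number of 1 bits. Tournament selection, single-point crossover, elitism and all parameter values are assumptions of this sketch, not choices made by the cited sources.

```python
import random

def fitness(chromosome):
    """Toy fitness: the number of 1 bits in the chromosome."""
    return sum(chromosome)

def evolve(length=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)  # seeded so that runs are repeatable
    # Encoding and initial population: random bit lists.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = [max(fitness(c) for c in population)]
    for _ in range(generations):
        # Elitism: carry the best individual over unchanged.
        new_population = [max(population, key=fitness)[:]]
        while len(new_population) < pop_size:
            # Replication: pick each parent as the best of three random individuals.
            parent1 = max(rng.sample(population, 3), key=fitness)
            parent2 = max(rng.sample(population, 3), key=fitness)
            # Exchange: single-point crossover.
            cut = rng.randint(1, length - 1)
            child = parent1[:cut] + parent2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            new_population.append(child)
        population = new_population
        history.append(max(fitness(c) for c in population))
    return max(population, key=fitness), history

best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the best individual is always carried over, the best fitness recorded in `history` never decreases from one generation to the next.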
(6) Fuzzy Set
Fuzzy sets are an important way of representing and processing
uncertainty in data. Fuzzy set theory uses degrees of membership to
describe gradual transitions between categories, giving a
precise mathematical description of linguistic fuzziness [14].
Fuzzy sets can not only deal with incomplete, noisy or
imprecise data, but can also help build models of data uncertainty
that are more flexible and perform more smoothly than traditional
methods [15].
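A degree of membership can be made concrete with a triangular membership function, one of the simplest shapes in common use. The sketch below is illustrative only: the fuzzy sets "warm" and "hot" and their breakpoints are hypothetical, and the minimum and maximum operators are the standard fuzzy AND and OR.

```python
def triangular(x, a, b, c):
    """Membership rises linearly from a to the peak b, then falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets over temperature in degrees Celsius.
def warm(t):
    return triangular(t, 15, 25, 35)

def hot(t):
    return triangular(t, 25, 35, 45)

def fuzzy_and(m1, m2):
    return min(m1, m2)

def fuzzy_or(m1, m2):
    return max(m1, m2)

for t in (10, 20, 25, 30):
    print(t, warm(t), hot(t), fuzzy_or(warm(t), hot(t)))
```

A temperature of 30 belongs partly to both sets, which is the gradual transition that a crisp threshold cannot express.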
VII. DATA MINING TOOLS
A. Categories of Data Mining Tools
Most data mining tools can be classified into three
categories: traditional data mining tools, dashboards and
text-mining tools [16]:
A. Traditional Data Mining Tools
Traditional data mining programs help companies establish
data patterns and trends by using a variety of complex algorithms
and techniques. Some of these tools are installed on desktop
computers to monitor data and highlight trends, and others
capture information residing outside a database. The majority of
these programs are available in Windows and UNIX versions.
However, some software specializes in one operating system
only, and some works with only one database
type. Most of the software, however, can handle any data
using online analytical processing or a similar technology [16].
B. Dashboards
Dashboards are normally installed on computers to monitor
information in a database. They reflect data changes and
updates on screen in the form of a chart or table, enabling
the user to see how the business is performing.
Historical data can be referenced and checked against the current
status in order to see changes in the business. Dashboards are
therefore easy to use and particularly appealing to managers who
want an overview of the company’s performance.
C. Text-Mining Tools
The third type of data mining tool is called a text-mining tool
because of its ability to mine data from different kinds of text,
from Microsoft Word and Acrobat PDF documents to
simple text files. These tools scan the content
and convert the selected data into a format that is compatible with
the tool's database, without the need to open different applications [17].
Current open tools:
The following are open-source tools [18]:
B. Weka
Weka is a Java-based software package capable of working under
various operating systems. It contains tools for data preprocessing,
classification, regression, clustering, association
rules and visualization. The algorithms can either be applied
directly to a dataset or called from a user’s Java code [19].
C. Orange
Orange is an open-source data mining and visualisation package
with an active community that helps both novices and experts with
their analysis. It runs on various platforms,
including Windows, Mac OS X and GNU/Linux, and
it is packed with data analytics features. It enables the design of data
analysis processes through user-friendly visual programming or
Python scripting, so it can also be used as a scripting environment
for data mining tasks. It implements most major
data mining algorithms and offers many visualisations,
from scatter plots, bar charts and trees to dendrograms, networks
and heatmaps. It has specialised add-ons such as Bioorange for
bioinformatics [20].
VIII. APPLICATIONS IN DATA MINING
There is large scope for the application of data mining in different
areas, as follows:
1) In Medical Science:
In medical science there is large scope for the application of data
mining. Diagnosis of disease, health care, patient profiling and
history generation are a few examples. Mammography is
the method used in breast cancer detection. Radiologists face many
difficulties in detecting tumors,
which is why CAM (Computer Aided Methods) can help
medical staff [21].
2) In Web Education:
In the 21st century, learners are using data mining
techniques, which are among the best learning methods of this era.
This makes it possible to increase the awareness of learners.
Web education will see rapid growth in the application of
data mining methods to educational chats, which is both feasible
and can bring improvements to learning environments in the 21st
century [22].
3) Malicious Executables as a Threat:
A malicious executable is a threat to a system’s security: it damages
the system or obtains
sensitive information without the user’s permission. Data
mining methods are used to accurately
detect malicious executables before they run [23].
4) Sports Data Mining:
Data mining and its techniques are also applied in
sports. Data mining is used not only
for business purposes but also in sports.
Around the world a huge number of
games are played, with national and
international games scheduled every day,
so a huge amount of data has to be maintained
[24].
IX. TRENDS IN DATA MINING
Table 1: Data Mining Trends Comparative Statements [25]
| Data Mining Trends | Algorithms/Techniques Employed | Data Formats | Computing Resources |
| --- | --- | --- | --- |
| Past | Statistical, Machine Learning Techniques | Numerical data and structured data stored in traditional databases | Evolution of 4G PL and various related techniques |
| Current | Statistical, Machine Learning, Artificial Intelligence, Pattern Recognition Techniques | Heterogeneous data formats, including structured, semi-structured and unstructured data | High-speed networks, high-end storage devices and parallel, distributed computing etc. |
| Future | Soft Computing techniques like Fuzzy logic, Neural Networks and Genetic Programming | Complex data objects, including high-dimensional data, high-speed data streams, sequences, noise in time series, graphs and multi-instance objects | Multi-agent technologies and Cloud Computing |
X. FUTURE WORK
Today’s competition is one of the most important challenges
facing all organizations and industries that deal with data mining issues.
To address these issues, the following problems should
be widely studied [26]:
a) Privacy and accuracy are in tension; improving one
usually incurs a cost in the other. How to apply various
optimizations to achieve a trade-off should be researched in depth.
b) In distributed privacy-preserving data mining, efficiency
is an essential issue. More efficient algorithms should be
developed that balance disclosure cost,
computation cost and communication cost.
c) Side effects are unavoidable in the data sanitization process. How
to reduce their negative impact on privacy preservation needs to
be considered carefully. Metrics are also needed for
measuring the side effects that result from data processing [27].
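The tension in item a) can be made concrete with randomized response, a classic survey technique that is used here purely as an illustration and is not a method proposed in the cited work. Each respondent answers a yes/no question truthfully with probability p_truth and gives the opposite answer otherwise. The function names and numbers below are assumptions of this sketch.

```python
def observed_yes_rate(true_rate, p_truth):
    """Expected share of 'yes' answers after randomization."""
    return p_truth * true_rate + (1 - p_truth) * (1 - true_rate)

def estimate_true_rate(observed_rate, p_truth):
    """Invert the randomization to recover the true rate (requires p_truth != 0.5)."""
    return (observed_rate - (1 - p_truth)) / (2 * p_truth - 1)

def estimator_variance(observed_rate, p_truth, n):
    """Variance of the estimate from n respondents; it grows as p_truth nears 0.5."""
    return observed_rate * (1 - observed_rate) / (n * (2 * p_truth - 1) ** 2)

true_rate, n = 0.3, 1000
for p_truth in (0.9, 0.75, 0.6):  # a lower p_truth gives each respondent more privacy
    observed = observed_yes_rate(true_rate, p_truth)
    print(p_truth,
          estimate_true_rate(observed, p_truth),
          estimator_variance(observed, p_truth, n))
```

The estimate stays centred on the true rate, but its variance rises as answers become more random: stronger privacy is paid for with lower accuracy, which is the trade-off to be optimized.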
XI. CONCLUSION
Data mining has become an important tool for extracting
useful information from the huge amounts of data available
today. In this paper we reviewed data mining
trends and applications from its inception to the future. The
review focuses on the active and promising areas of data mining.
Data mining may also help to extract information from the Internet, which
has become part of our lives. The ability to automate data
mining techniques, and the value added by using them, make data mining
attractive in many areas, especially in science and
business, where huge amounts of data are available. In both the scientific and
the industrial world, its applications have become widespread.
Privacy protection certainly deserves a solid amount of attention,
but it should not lead to an exaggerated apprehension of data
mining. After all, the possibilities and opportunities of data
mining are too valuable to forgo, for example in the development cycle
of new medicines. These techniques are still the subject of further
research, but we expect that they will rapidly make the transition
into business environments.
REFERENCES
[1] Michael Goebel et al., “A Survey of Data Mining and
Knowledge Discovery Software Tools,” Department of
Computer Science, University of Auckland, SIGKDD
Explorations, ACM SIGKDD, June 1999.
[2] Aparna S. Varde, “Challenging Research Issues in Data
Mining, Databases and Information Retrieval,” Department
of Computer Science, Montclair State University, Montclair,
NJ, USA.
[3] Dr. A. Bharati et al., “A Survey on Crime Data Analysis of
Data Mining Using Clustering Techniques,” International
Journal of Advance Research in Computer Science and
Management Studies, Volume 2, Issue 8, August 2014,
ISSN: 2321-7782.
[4] Monika D. Khatri et al., “History and Current and Future
Trends of Data Mining Techniques,” International Journal of
Advance Research in Computer Science and Management
Studies, Volume 2, Issue 3, March 2014, ISSN: 2321-7782.
[5] Jiawei Han and Micheline Kamber, Data Mining: Concepts
and Techniques, second edition, University of Illinois at
Urbana-Champaign.
[6] Anand V. Saurkar et al., “A Review Paper on Various Data
Mining Techniques,” International Journal of Advanced
Research in Computer Science and Software Engineering,
Volume 4, Issue 4, April 2014, ISSN: 2277 128X.
[7] Mafruz Zaman Ashrafi, David Taniar, Kate A. Smith, “Data
Mining Architecture for Clustered Environments,”
PARA '02: Proceedings of the 6th International
Conference on Applied Parallel Computing, Advanced
Scientific Computing, pp. 89-98, Springer-Verlag,
London, UK, 2002.
[8] Dileep Kumar Singh et al., “Data Security and Privacy in
Data Mining: Research Issues & Preparation,” International
Journal of Computer Trends and Technology, Volume 4, Issue 2, 2013, ISSN: 2231-2803.
[9] Yujie Zheng, “Clustering Methods in Data Mining with its
Applications in High Education,” 2012 International
Conference on Education Technology and Computer
(ICETC2012), IPCSIT vol. 43 (2012), IACSIT
Press, Singapore.
[10] Richard A. Johnson and Dean W. Wichern, Applied Multivariate
Statistical Analysis (5th Ed.), 2003.
[11] Guttman L., “The quantification of a class of attributes: A
theory and method of scale construction,” in The Committee
on Social Adjustment (ed.), The Prediction of Personal
Adjustment. New York: Social Science Research Council,
1941.
[12] Karimella Vikram and Niraj Upadhayaya, “Data Mining
Tools and Techniques: a review,” Computer Engineering
and Intelligent Systems, Vol. 2, No. 8, 2011, pp. 31-39.
[13] “Advantages & Disadvantages of Data Mining?” (2006)
[online].
[14] Jiawei Han and Jing Gao, “Research Challenges for Data
Mining in Science and Engineering,” Chapter 8, pp. 1-8.
[15] Kusiak, A., Kernstine, K.H., Kern, J.A., McLaughlin, K.A.,
and Tseng, T.L., “Data Mining: Medical and Engineering
Case Studies,” Proceedings of the Industrial Engineering
Research 2000 Conference, Cleveland, Ohio, pp. 1-7, May
21-23, 2000.
[16] Romero, C., Ventura, S. and De-Bra, P., “Knowledge
Discovery with Genetic Programming for Providing
Feedback to Courseware Authors,” Kluwer Academic
Publishers, Printed in the Netherlands, 30/08/2004.
[17] Neelamadhab Padhy et al., “The Survey of Data Mining
Applications and Feature Scope,” International Journal of
Computer Science, Engineering and Information
Technology (IJCSEIT), Vol. 2, No. 3, June 2012, DOI:
10.5121/ijcseit.2012.2303.
[18] Cai, W. and Li, L., “Anomaly Detection using TCP Header
Information,” STAT753 Class Project Paper, May 2004.
Nandi, T., Rao, C. B. and Ramchandran, S., “Comparative
genomics using data mining tools,” Journal of Bio-Science,
Indian Academy of Sciences, Vol. 27, No. 1, Suppl. 1,
pp. 15-25, February 2002.
[19] Robert P. Schumaker, Osama K. Solieman, Hsinchun Chen,
Springer.
[20] Content Technology and its Applications, Volume 4,
Number 9, December 2010.
[21] Anmol Kumar et al., “Data Mining: Various Issues and
Challenges for Future: A Short Discussion on Data Mining
Issues for Future Work,” International Journal of Emerging
Technology and Advanced Engineering (ISSN 2250-2459
(Online)), Volume 4, Special Issue 1, February 2014,
International Conference on Advanced Developments in
Engineering and Technology (ICADET-14), India.
[22] Sangeeta Goele, Nisha Chanana, “Data Mining Trend in
Past, Current and Future,” International Journal of
Computing & Business Research, in Proc. I-Society 2012,
2012.