Indian Journal of Science and Technology, Vol 9(28), DOI: 10.17485/ijst/2016/v9i28/88874, July 2016
ISSN (Print) : 0974-6846
ISSN (Online) : 0974-5645
Algorithmic Approach to Data Mining and
Classification Techniques
Amit Verma*, Iqbaldeep Kaur and Amandeep Kaur
Department of Computer Science and Engineering, Chandigarh Engineering College,
Mohali - 140307, Punjab, India;
[email protected], [email protected], [email protected]
*Author for correspondence
Abstract
Objective/Background: This paper traces the progression from simple data access to data mining, from earlier years to the present. The main aim of this paper is a comparative study of the tools, techniques and algorithms used for the analysis of huge amounts of data. Methods/Statistical Analysis: Different methods of data mining have been studied and discussed, including decision trees, neural networks, regression and clustering techniques, which are implemented on different tools for fraud detection. Different algorithms used for data mining, such as AdaBoost, PageRank and K-means, are also discussed. To generate relevant information from data streams, a frequent pattern generation tree algorithm is also implemented and discussed. Findings: Out of the many available algorithms, the decision tree has been found to be the most suitable for mining data, provided the data is restricted to some thousands of entries. Its most prominent advantage lies in its clear illustration in the form of a graphical tree with an inherent tree structure. However, the concern about ambiguity should be dealt with carefully so that consistency is maintained. Applications: Data mining is helpful in various ways for the extraction of relevant data. The various areas where data mining is being used are also discussed in the paper. Future Scope: The scope of the paper extends to an exhaustive survey and analysis of the available empirical and conceptual techniques and tools in the area of data mining.
Keywords: Association Rule Mining, Classification, Clustering, Data, Data Mining, Decision tree, Neural Network
1. Introduction
Data is a collection of facts, such as numbers, words, measurements, observations or even just descriptions of things. Data is mainly of two types: quantitative (numerical data) and qualitative (descriptive information). Data can be presented in written form, as graphs, tables or pictures, as bits in electronic memory, or simply as facts held in the mind. Data is the plural of datum, a single piece of information, although the term data is generally used in both singular and plural senses. Before computers were invented, data was stored on paper, and a large amount of it is still available in paper form; we call it books. Students also write information on the pages of their notebooks, which is also data. In today's world all organizations are turning to computers. They want to store their data to keep records and to manipulate the stored data in order to improve their services. Shops store data on buying and selling to keep track of the money coming in and going out and of the profit they make. Banks store data to keep track of money transfers, which can be accessed only by particular persons. Keeping data on hard disks or other computer storage makes it faster to upload and download, more secure and trustworthy, and more convenient; it can also be transmitted securely over long distances. Any person can check his bank account details just by using his computer or mobile from anywhere in the world. Almost all organizations are continually storing data, which has made the volume of data extremely vast. The Internet is one medium through which that data can be accessed from anywhere in the world in a secure, cheap and convenient manner.
Nature has enabled humans to collect, store, sense, feel, see, hear and exchange information from their environment. Earlier, humans stored that information in their minds in the form of facts. Later they began sharing data through gestures and then, in the course of evolution, through speech, and eventually through gesture and speech together. At that time there was no way to store spoken data in tangible form, so people tried to store data by making drawings on rocks; later, leaves were used to store data in textual form, but this was not safe for long periods. Slowly the importance of storing data for record keeping, so that future generations could also use it, was realized. Data was then written on animal skin, because it lasts longer, but this was not a permanent solution either, and so paper was invented. In earlier times writing was done with flower colours or other natural colours; slowly the pencil and pen were invented. Scientists, organizations and others stored data to keep records, but the records were destructible and time consuming to manipulate, which led to electronic devices for storing and sharing data. Data that was earlier stored in vocal form could be captured with musical instruments and shared in the form of radio signals. Slowly telephones were developed to share data, and later television to share stored data in pictorial form. Magnetic tapes and magnetic disks were used to store audio and visual data. Computers made it very fast and safe to store data and to work on it. With the advancement of technology, data came to be stored in large volumes, and large amounts of data can now be stored or edited in textual, graphical, pictorial, audio or video form. Today we use pen drives, memory cards and cloud computing techniques to store and retrieve data.
Punch cards were invented in 1725 and were used for information storage in 1832. In 1890, the scientist Herman Hollerith was the first to invent a punch card that could be read by a machine. In the 20th century magnetic tape was invented; it was first used in 1951 and replaced punch cards, a single tape being able to store more data than 10,000 punch cards.
In the future we can expect 'holographic layers', which would allow data to be encoded on layers of tiny holograms and stored for more than thirty years. Another storage technique could be 'quantum storage', which would be extremely small in size and could not be read even by the smallest microscope.
1.1 Taxonomies of Data
Data may be sequential data, time series data, temporal data, spatio-temporal data, an audio signal or a video signal. These1 are explained in Table 1.
Table 1. Taxonomies of data
DTY | DEF | ITN
Sda | GEAS | Mar, Dfl, Mtsd
Tsd | MTI | Oti, Sc
Tda | OCTP | AHB, WONLA
Std | MSTI | Fsh, Tmg
DTY = Types of data, DEF = Definition, ITN = Instances, Sda = Sequential data, GEAS = Groups of elements are arranged or structured in a sequence.
1.2 Concept of Data Mining
Data mining introduces the concept2 of mining or extracting relevant knowledge from data that is present in huge amounts, so it is also termed Knowledge Discovery from Data (KDD). The main aim of data mining is to mine a small set of valuable chunks from a large amount of raw data.
1.3 Evolutionary Stages of Data Mining
After a long period of hard work by researchers, a number of data mining techniques have emerged. In earlier times data was simply collected into computers, which later extended to data access. Today, data can be navigated in real time. The evolution3 of data mining is explained in Table 2. The core components of data mining technology have been developing for decades in research areas such as statistics, artificial intelligence and machine learning.
Table 2. Evolution of data mining
STG | BCN | ETG | PPD | CTR
Cd (1960s) | CTR | Com, Tap, Dsk | IBM, CDC | SDD
Ad (1980s) | SAT | Rdb, Sql, ODBC | Ora, Syb, Ifo, Mcs, IBM | Ddr
Nd (1990s) | CS and CP | Olap, Mdd, Wd | Plt, IRI, Arb, Rdb, Evt | Ddm
Md (2000) | ESRTF and IF | Alg, Cmul, Dma | Lkh, Nus, IBM, SGI | Prp, Pid
STG = Stage, BCN = Business Concern, ETG = Enabling technology, PPD = Product providers, CTR = Characteristics, Cd = Data Collection.
1.4 Types of Data Mining System
Data mining systems can be grouped into4 a number of categories. These are explained below:
• Classification of data mining systems according to the type of data source mined: In today's scenario a huge amount of data of a similar nature is available in organizations, which makes it very difficult to extract relevant information, so there is a need to group that data according to its type.
• Classification of data mining systems according to the data model: Data should be classified according to the data models, such as the Object Model, Object Oriented Data Model, Relational Data Model and Hierarchical Data Model/W data model.
• Classification of data mining systems according to the kind of knowledge discovered: Here data is classified according to the data mining functionality, such as characterization, discrimination, association, classification and clustering.
• Classification of data mining systems according to the mining techniques used: Data is grouped according to the techniques used, such as genetic algorithms, statistics, visualization, database-oriented or data-warehouse-oriented methods, machine learning and neural networks.
1.5 Data Mining and Knowledge Discovery Process
As data mining aims to mine relevant knowledge from raw data, knowledge discovery2 as a process is an iterative sequence of the following steps (a small code sketch of this pipeline follows the list):
• Data cleaning: removes noise and inconsistent data.
• Data integration: combines the multiple sources from which data is collected.
• Data selection: retrieves the data relevant to the analysis from the database.
• Data transformation: transforms the data into a form suitable for mining, using operations such as aggregation or summary.
• Data mining: applies intelligent methods to extract frequent patterns.
• Pattern evaluation: analyses the generated patterns to identify those that really represent knowledge, based on some measures.
• Knowledge presentation: presents the discovered knowledge to the user using techniques such as visualization and knowledge representation.
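By way of illustration only, the following minimal Python sketch walks through these steps with pandas and scikit-learn. The file names and column names (transactions.csv, customers.csv, customer_id, amount, items) are hypothetical and not taken from the paper; it is a sketch of the pipeline, not the authors' implementation.

# Hypothetical sketch of the knowledge discovery steps listed above.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Data cleaning: remove noisy and inconsistent records (here, rows with missing values).
raw = pd.read_csv("transactions.csv")              # hypothetical source
clean = raw.dropna()

# Data integration: combine a second hypothetical source on a shared key.
customers = pd.read_csv("customers.csv")           # hypothetical source
merged = clean.merge(customers, on="customer_id", how="inner")

# Data selection: retrieve only the attributes relevant to the task.
selected = merged[["customer_id", "amount", "items"]]

# Data transformation: aggregate (summarise) per customer and scale the summary.
summary = selected.groupby("customer_id").agg(total=("amount", "sum"),
                                              visits=("items", "count"))
scaled = StandardScaler().fit_transform(summary)

# Data mining: apply an intelligent method (k-means) to find patterns.
summary["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

# Pattern evaluation and knowledge presentation: summarise each discovered group.
print(summary.groupby("cluster").mean())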
1.6 Data Flow Diagram (Data Mining)
Level-0
In the level-0 DFD, the primary functionality of data mining is shown: the user first submits a query (the user query), data mining processes this query, and relevant information is returned as per the user's requirement.
Figure 1. Data flow diagram of data mining level-0.
This DFD explains the basic concept of data mining: how the information flows and how, after the whole process, information is retrieved by the user.
Level-1
In the level-1 DFD, the data mining process is explained further. The user puts a query for the retrieval of information, and data mining tools such as RapidMiner, Weka, KNIME and Orange help to retrieve the data from the database.
Information retrieval can be done with the help of various data mining techniques such as classification, clustering, decision trees and neural networks; these techniques help to retrieve relevant data as per the user's requirements.
Level-2
In the level-2 DFD, the algorithms used in data mining are named; they help to retrieve data from the database and are used under particular techniques. These include artificial neural networks (Kohonen clustering algorithm, learning algorithms), classification (genetic algorithms, Bayesian networks, neural networks), clustering (relocation algorithms, K-medoids methods, K-means methods) and association rules (multilevel association rules, multi-dimensional association rules).
Figure 2. Data flow diagram of data mining level-1.
Figure 3. Data flow diagram of data mining level-2.
2. Data Mining Tools
There are a number of tools for data mining. Excel is one commercial tool used for data mining. Modern data mining tools such as RapidMiner, Weka and R are software based and also have graphical integrated environments; these features are effectively used for mining relevant data from large amounts of data. All the data mining tools are compatible with the common environments and can be installed on Windows, Mac OS and Linux very easily.
The tables below list the active tools that are used for data mining, arranged according to the release date of each tool. The tools are elaborated in Table 3 (from 1988 to 2004), Table 4 (from 2006 to 2012) and Table 5 (from 2013 to 2015).
Table 3. From 1988 to 2004
SNO | NAM | RDT | LAG | TYP | FTR | PRS
1 | Gno | 1988 | HLPL | CLNP | Ncp | CLI
2 | Wek | 1993 | Jav | Mal | Dap, Cla, Reg, Clu, Asr and Vis | JDTSQL, FOC
3 | Rpr | 1993 | C, For, R | Sgt | Lnm, Cst, Tsa, Cla, Clu | FSP
4 | Gdm | 1999 | Pyt | Cot | GUI and Adm | FOS
5 | NLTK | 2001 | Pyt | Tpr, Cla, Toz, Stm, Tag, Par, Str | Snp | LNL
6 | ONN | 2003 | C++ | Nen | Dam, Pra | FRDA
7 | Kni | 2004 | Jav, Ecl | Com, Daa, Fmw | Dat, Ini, Dta, Ppa, Vis, Rep | Ggu
8 | Kel | 2004 | Jav | Mls | Kex | OS
SNO = Serial Number, NAM = Name, RDT = Release Date, LAG = Language, TYP = Type, FTR = Features, PRS = Pros, Gno = GNU Octave.
Table 4. From 2006 to 2012
SNO | NAM | RDT | LAG | TYP | FTR | PRS
1 | Ram | 2006 | Jav | Mal, Dam, Tem, Pra, Bua | Atf | Osl
2 | Scl | 2007 | Pyt | Mls | Cla, Reg, Cal | OSML, Sil, Eft
3 | Rgu | 2007 | Rpr | ETD, Rla | DST, Val, Tst | GUI
4 | Ora | 2009 | Pyt | Mal for Bio and Tem | Daa | OSNE
5 | CMSR | 2010 | Jav | Prm, Dav, Rme, Sda | Som | Ost
6 | Mlt | 2011 | Jav | Mlr, Iex | Algo, Ecp | OSS
7 | Mlf | 2012 | Jav | Mlp | Acn, HTMLrt | Cli
8 | Mlpy | 2012 | Pyt | SUpr | Mdl, Mtb, Usb | OS
SNO = Serial Number, NAM = Name, RDT = Release Date, LAG = Language, TYP = Type, FTR = Features, PRS = Pros, Ram = Rapid Miner, Jav = Java.
Table 5. From 2013 to 2015
SNO | NAM | RDT | LAG | TYP | FTR | PRS
1 | Grl | 2013 | C++ | Dcf | Vda, Psr | Hpf
2 | Dlib | 2014 | C++ | Lib, Mal | Trd, NAlg, GUI | OSS
3 | UIMA | 2014 | Jav | Lid, Lss, Sbt | Auc | WCns
4 | ELKI | 2014 | Jav | Aca | Odt, PAlg | Hpf, Sbl
5 | Lis | 2014 | Jav | Mal | Cla, Pes | Git
6 | Lil | 2015 | C++ | Mal | Aps, Pes | Sit
7 | VWB | 2015 | Pyt | Mal | LAlg, OAlg | OS
8 | Shg | 2015 | C++ | IMmd, Mal | Reg, Cla | OS, Sbl
9 | Apm | 2015 | Jav | Mal | Cft, Clu, Cla | Sbl
Grl = Graph Lab, Dcf = Distributed Computation Framework, Vda = Visualization of Data, Psr = Predictive Services, Hpf = High performance.
These, then, are some of the tools used earlier and today for data mining, and which are good and popular choices for extracting useful information from huge amounts of data.
3. Data Mining Techniques
Data mining adopts its techniques from many research areas, including statistics, machine learning, database systems, rough sets, visualization and neural networks. These are explained in Table 6.
Table 6. Data mining techniques
NAM | ALG | SAP | PRS | CNS
Det | Ida | ATEC | FTOT | Obe
Clu | Rea, Prc, Kmm, Kem | Mar, Par, Daa, Img | AC and FDDG | LClu and Lac
Cla | Gea, Rbi, Nen, Gea, Ban | Bla, Mar, Dft | Frd, Cra | ETC
Asr | Mar, Qar, Qar | Cad, Crm, Csb | Rgv | Rlh
Ann | Kca, Lea | Smp, Tsp | Is, Nnp | Nnr, CHtd
NAM = Name, ALG = Algorithm, SAP = Specific application, PRS = Pros, CNS = Cons, Det = Decision trees, Ida = ID3 algorithm.
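As a hedged illustration of the decision tree entry in Table 6 (a sketch only; it uses scikit-learn's CART-style DecisionTreeClassifier rather than ID3, and synthetic data rather than any dataset from the surveyed papers):

# Illustrative decision-tree classification on synthetic data (not from the paper).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A few thousand entries and a shallow depth keep the tree readable.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
print(export_text(tree))   # the learned tree printed as easily illustrated if/else rules

The printed rules exhibit the clear tree-shaped illustration that the Findings cite as the main advantage of decision trees.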
4. Review of Literature
In2, it is shown how data mining and knowledge discovery are related to each other and to other fields such as machine learning and statistics. A method to discover knowledge from a database through data mining is given, and the data mining step of the KDD process is described. An experiment was performed on a 'Loan Dataset' using a linear classification boundary; a clustering technique was also applied to this dataset and three clusters were shown as a result. Specialized methods with particular algorithms that can be implemented on a dataset are also explained (a small code sketch of this kind of experiment follows).
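A minimal sketch, on synthetic data standing in for the loan dataset used in2, of a linear classification boundary plus k-means clustering; the feature names income and debt are hypothetical:

# Illustrative stand-in for the loan-dataset experiment described above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
income = rng.normal(50, 15, 500)                 # hypothetical applicant income
debt = rng.normal(20, 8, 500)                    # hypothetical applicant debt
X = np.column_stack([income, debt])
y = (income - debt > 25).astype(int)             # synthetic "loan repaid" label

linear = LogisticRegression().fit(X, y)          # a linear classification boundary
print("boundary coefficients:", linear.coef_, "intercept:", linear.intercept_)

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))   # three clusters, as reported in the study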
In5, fourteen data mining tools for different platforms are discussed. The tools solve a binary classification problem, a multi-class classification problem and a noiseless estimation problem. Multiple traits were collected in five categories: usability, interoperability, flexibility, capability and accuracy. After selection of the techniques, tools can be selected by a developer to develop a particular product. The performance of the tools according to the related technique is also categorized in Table 7.
Table 7. Data mining tool evaluation summary
Tch | Tol | Cpb | Usb | Ipb | Fbl | Acc | Ovl | Prc
Tre | Crt | 1 | + | - | 1+ | 1 | 1+ | 995
Tre | Scn | +- | 1 | 1 | - | 0 | + | 695
Tre | See | + | +- | + | +- | 1 | + | 440
Tre | Tra | + | + | 1+ | + | 1+ | 1+ | Med = 845
Rul | Wzy | + | 1+ | + | +- | - | + | 4000
Rul | Dmd | 1+ | 11 | 1 | +- | + | 1+ | 25000
Rul | Dms | - | 0 | +- | - | 1 | - | 75
Rul | Rua | + | + | 1+ | - | + | + | Med = 4000
Nrl | Ns2 | - | + | - | - | 11 | +- | 395
Nrl | Plp | +- | - | - | +- | + | +- | 495
Nrl | Prw | 1+ | 1 | 11 | + | 11 | 1 | 10,000
Nrl | Nua | +- | + | + | +- | 1 | +- | Med = 495
Pln | Mqe | 1 | + | + | 1+ | 1 | 1+ | 5,950
Pln | Ns2 | +- | + | + | + | 1 | + | 495
Pln | Gno | +- | + | 0 | +- | 11 | +- | 4,900
Pln | Pna | +- | 1- | - | + | 1 | +- | Med = 2.698
 | Ova | + | + | + | +- | 1+ | + | Med = 845
1 = Good, 0 = Poor, 11 = Excellent, + = Average, - = Needs Improvement, None = Does not exist, NE = Not Evaluated, Med = Median, Ova = Overall Average.
In6, a tool for fraud detection is proposed. Five data mining tools were compared for a fraud detection application. Tool selection was done on the basis of computer system environment, intended end user and ease of use. The data mining tools selected, and the algorithms they contain, are shown in Table 8.
Table 8. Algorithms implemented
Algo | IBM | ISL | SAS | TMC | Unica
Dct | 1 | 1 | 1 | 1 | -
Nun | 1 | 1 | 1 | 1 | 1
Reg | 0 | 1 | 1 | - | 1
Rab | 12 | - | - | - | 1
Nnb | - | - | 1 | 1 | 1
MKM | - | 1 | 1 | - | 1
Clu | 1 | 1 | - | - | 1
Asr | 1 | 1 | - | - | -
1 = Yes, - = Not Available, 0 = Accessed in data analysis only, 2 = estimation only (not for classification).
Algo = Algorithm, Dct = Decision Trees, Nun = Neural Network, Reg = Regression.
Ease of use was evaluated in five categories, and the comparison of the tools across these categories is shown in Table 9.
Table 9. Ease of use comparison
Ctg | IBM | ISL | SAS | TMC | Unica
Dlm | 3.1 | 3.7 | 3.7 | 3.1 | 3.9
Mdb | 3.1 | 4.6 | 3.9 | 3.2 | 4.8
Mdu | 3.2 | 4.2 | 2.6 | 3.8 | 3.8
Tcs | 3.0 | 4.0 | 2.8 | 3.2 | 4.7
Ous | 3.1 | 4.1 | 3.1 | 3.4 | 4.2
Ctg = Category, Dlm = Data load and manipulation, Mdb = Model building, Mdu = Model understanding, Tcs = Technical support, Ous = Overall usability.
In7, real-time signal planning using a clustering technique is proposed. To identify the time of day automatically, a cluster analysis approach was used. Sets of sensors were used in the traffic signal system, giving a high-resolution view of system state. The CART tool was used to automatically generate TOD (time of day) intervals and was also used for plan development and maintenance. The numeric TOD representations are shown in Table 10.
Table 10. Numeric TOD representations
GSY | TGS
1 | 22:30-2:30
2 | 2:30-5:00
3 | 5:00-7:30
4 | 7:30-10:00
5 | 10:00-12:30
6 | 12:30-15:00
7 | 15:00-17:30
8 | 17:30-20:00
9 | 20:00-22:30
GSY = Graph symbol, TGS = Time of graph symbol.
In8, the Algorithm Development and Mining (ADaM) toolkit is proposed. This toolkit can be used with scientific data and supports data mining methods such as classification and clustering. It can also be used for image processing, data cleaning and feature reduction. The architecture and design of ADaM are included and its application is discussed. A case study of cumulus cloud detection using satellite images was also taken up and its results were calculated.
In9, a study of teenage drivers and senior drivers is discussed. Metadata was collected and analysed thoroughly, and data mining techniques were used for this kind of problem. Roadway accidents were calculated on the basis of drivers' age, gender, perception of road signs, alcohol use, medical conditions and fragility. Based on this, descriptive statistics such as frequency, mean and standard deviation were calculated (a small pandas sketch of such descriptive statistics follows Table 11). The variables of the descriptive statistics are shown in Table 11.
Table 11. Descriptive statistics of the variables
VAR | NSP | MEAN | SDV
Agr | 127 | 3.13 | 0.678
Dtr | 127 | 1.03 | 0.250
Bpa | 127 | 0.42 | 0.495
Drv | 127 | 1.00 | 0.000
Aln | 127 | 0.83 | 0.373
Gls | 127 | 0.83 | 0.380
Dnt | 127 | 0.88 | 0.324
Mle | 125 | 2.68 | 0.980
Dwe | 125 | 2.66 | 0.569
Crs | 127 | 0.27 | 0.495
VAR = Variables, NSP = Number in sample, MEAN = Mean, SDV = Standard deviation, Agr = Age groups, Dtr = District, Bpa = Back pain, Drv = Drive, Aln = Alone, Gls = Glasses, Dnt = Drive at night, Mle = Miles, Dwe = Days a week, Crs = Crashes.
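For illustration only, descriptive statistics such as those in Tables 11 and 12 can be produced with a few lines of pandas; the column names and values here are hypothetical stand-ins for the survey variables:

# Hypothetical sketch: frequency, mean and standard deviation of survey variables.
import pandas as pd

drivers = pd.DataFrame({
    "age_group": [3, 3, 2, 4, 3, 2],      # hypothetical coded values
    "miles": [2, 3, 1, 4, 3, 2],
    "crash": [0, 1, 0, 0, 1, 0],
})
print(drivers.describe())                  # count, mean, std, min, max per variable
print(drivers["crash"].value_counts())     # frequency of each value
print(drivers.groupby("crash").mean())     # compare respondents with and without crashes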
This is an analysis between the respondents who were involved in crashes and those who were not; it is shown in Table 12.
Table 12. Descriptive analysis between the respondents with crashes and those not involved in crashes
CRH | | N | RNG | MNM | Max | MEAN | SER | STD | VAR
NON | Agr | 96 | 2 | 2 | 4 | 3.07 | 0.068 | 0.669 | 0.447
NON | Dtr | 96 | 2 | 1 | 3 | 1.04 | 0.029 | 0.287 | 0.082
NON | Bpa | 94 | 1 | 0 | 1 | 0.44 | 0.051 | 0.499 | 0.249
NON | Drv | 96 | 0 | 1 | 1 | 1.00 | 0.000 | 0.000 | 0.000
NON | Aln | 96 | 1 | 0 | 1 | 0.81 | 0.040 | 0.392 | 0.154
NON | Gls | 96 | 1 | 0 | 1 | 0.83 | 0.038 | 0.375 | 0.140
NON | Dnt | 96 | 1 | 0 | 1 | 0.87 | 0.034 | 0.332 | 0.111
NON | Mle | 94 | 3 | 1 | 4 | 2.73 | 0.102 | 0.986 | 0.972
NON | Dwe | 94 | 2 | 1 | 3 | 2.63 | 0.057 | 0.548 | 0.301
ACC | Agr | 31 | 2 | 2 | 4 | 3.29 | 0.124 | 0.693 | 0.480
ACC | Dtr | 31 | 0 | 1 | 1 | 1.00 | 0.000 | 0.000 | 0.000
ACC | Bpa | 31 | 1 | 0 | 1 | 0.39 | 0.089 | 0.495 | 0.245
ACC | Drv | 30 | 0 | 1 | 1 | 1.00 | 0.3000 | 0.000 | 0.000
ACC | Aln | 30 | 1 | 0 | 1 | 0.90 | 0.056 | 0.305 | 0.093
ACC | Gls | 31 | 1 | 0 | 1 | 0.81 | 0.072 | 0.402 | 0.161
ACC | Dnt | 31 | 1 | 0 | 1 | 0.90 | 0.054 | 0.301 | 0.090
ACC | Mle | 31 | 3 | 1 | 4 | 2.52 | 0.173 | 0.962 | 0.925
ACC | Dwe | 31 | 2 | 1 | 3 | 2.68 | 0.108 | 0.599 | 0.359
CRH = Crashes, STS = Statistic, RNG = Range, MNM = Minimum, Max = Maximum, MEAN = Mean, SER = Standard error, STD = Standard deviation, VAR = Variance, Non = None, Agr = Age groups, Dtr = District, Bpa = Back pain, Drv = Drive, Aln = Alone, Gls = Glasses, Dnt = Drive at night, Mle = Miles, Dwe = Days a week, Acc = Accident.
A comparison of the predicted crashes with the actual crash values for each record is shown in Table 13.
Table 13. Comparing predicted crashes with actual crash values for each record
RPR | NRD | ACC
Crt | 49 | 76.56
Wrg | 15 | 23.44
Ttl | 64 | 100.00
RPR = Result of prediction, NRD = Number of records, ACC = Accuracy, Crt = Correct, Wrg = Wrong, Ttl = Total.
The coincidence matrix for the predicted crash values is shown in Table 14.
Table 14. Coincidence matrix for predicted crash values
        | 0 (NCR) | 1 (CRH) | NAR
0 (NCR) | 44 | 4 | 48
1 (CRH) | 11 | 5 | 16
NAR     | 55 | 9 | 64
NCR = No crash, CRH = Crash, NAR = Number of actual records.
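As a small worked check of Tables 13 and 14 (illustrative only): accuracy is the share of records on the diagonal of the coincidence matrix, (44 + 5) / 64 ≈ 76.56%, which matches the 'Correct' row of Table 13. In Python:

# Recomputing the prediction accuracy from the coincidence matrix in Table 14.
import numpy as np

matrix = np.array([[44, 4],    # actual no-crash: predicted no-crash, predicted crash
                   [11, 5]])   # actual crash:    predicted no-crash, predicted crash
correct = np.trace(matrix)                 # 44 + 5 = 49 records predicted correctly
total = matrix.sum()                       # 64 records in total
print(round(100 * correct / total, 2))     # 76.56, as in Table 13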
In10, ten algorithms of data mining are introduced. The algorithms are described and the impact of each algorithm is discussed. The introduced algorithms cover classification, clustering, association analysis and statistical learning.
AdaBoost Algorithm
// DSET = data set with m examples, TL = learning algorithm, T = number of learning rounds (epochs)
// Dt = weight distribution over the training examples, Zt = normalisation factor, ER = weighted error
Initialise D1(i) = 1/m for every training example i;
For (t = 1; t <= T; t++)
{
    Train TL on DSET using the weight distribution Dt;
    Calculate the weighted error ER of the resulting hypothesis ht(x);
    If (ER is acceptable)
    {
        Update the weights: Dt+1(i) = Dt(i) / Zt x (factor that increases the weight of misclassified examples);
        Calculate h(x) as the combined weighted vote of the hypotheses obtained so far;
    }
    Else
    {
        Exit;
    }
}
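A hedged sketch of how the same boosting idea can be run in practice with scikit-learn's AdaBoostClassifier (a library call rather than the pseudocode above, on synthetic data):

# Illustrative AdaBoost run on synthetic data; T learning rounds = n_estimators.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each round reweights the training examples and fits another weak learner.
model = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))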
Page-Rank Algorithm
// i = page, hln = hyperlinks pointing into a page, hlo = hyperlinks going out of a page,
// P1 ... Pl = the pages of the graph
If (i > 1)
{
    For (Pi = 1; Pi <= Pl; Pi++)
    {
        Calculate hln, where hln ∈ Pi;    // count the links pointing to page Pi
    }
}
If (i < 1)
{
    For (Pi = 1; Pi <= Pl; Pi++)
    {
        Calculate hlo, where hlo ∈ Pi;    // count the links going out of page Pi
    }
}
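A minimal, self-contained power-iteration sketch of PageRank (an illustration on a tiny four-page link graph invented here, not the authors' code):

# Tiny PageRank power iteration; the four-page link graph is made up for illustration.
import numpy as np

links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}   # page -> pages it links to
n, d = 4, 0.85                                # number of pages, damping factor
rank = np.full(n, 1.0 / n)

for _ in range(50):                           # iterate until the ranks stabilise
    new = np.full(n, (1.0 - d) / n)
    for page, outs in links.items():
        for target in outs:
            new[target] += d * rank[page] / len(outs)
    rank = new

print(rank, rank.sum())                       # page 2 collects the most rank; the total stays 1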
K-Means Algorithm
// Vi = vector (document vector Dj), Ci = centroid vector, Cc = cluster centroid (i = 1 to n)
For (Vi = 1; Vi <= Vic; Vi++)                 // for every vector
{
    For (Ci = 1; Ci <= Cic; Ci++)             // for every centroid
    {
        If (Ci is the nearest centroid) { Select Cc and assign the vector to it; }
        Else { Exit the comparison and continue with the next centroid; }
    }
}
Calculate each dj: recompute centroid Ci as the mean of the document vectors Dj assigned to it (i = 1 to r);
Repeat until the centroids no longer change.
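A compact NumPy sketch of the same k-means loop (illustrative only; the vectors are random synthetic data):

# Plain k-means in NumPy: assign each vector to its nearest centroid, then
# recompute each centroid as the mean of its assigned vectors, and repeat.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(300, 2))                      # synthetic document vectors
centroids = vectors[rng.choice(len(vectors), 3, replace=False)]

for _ in range(20):
    distances = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
    assignment = distances.argmin(axis=1)                # nearest cluster centroid
    centroids = np.array([vectors[assignment == k].mean(axis=0) for k in range(3)])

print(np.bincount(assignment), centroids)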
In11, it is discussed how the Geographical Data Mining Analyst (GeoDMA) is used for remote sensing image analysis. For spatial data mining, GeoDMA uses decision tree strategies that connect images with geographical data types using a data warehouse. To improve classification accuracy, a new approach based on polar coordinate transformation was proposed. Various tables were described as outputs of segmentation-based spatial features and landscape-based features.
In12, three key issues for managerial personnel investigating data mining tools are defined: task fit, technology use and habit. To investigate a tool, the Task Technology Fit (TTF) model and the Expectation-Confirmation Model (ECM) were used to study the continued use of Data Mining Tools (DMT), with user satisfaction and perceived usefulness as the main predictors. Multiple hypotheses were taken for decision making. Convergent validity is shown in Table 15.
Table 15. Convergent validity
CST | ITM | FLD | TVL
Cnf | CON1 | 0.918 | 38.10
Cnf | CON2 | 0.916 | 40.93
Cnf | CON3 | 0.884 | 30.79
Cit | CI1 | 0.803 | 15.40
Cit | CI2 | 0.870 | 33.59
Cit | CI3 | 0.769 | 16.93
Hbt | HAB1 | 0.948 | 79.06
Hbt | HAB2 | 0.940 | 74.66
Hbt | HAB3 | 0.806 | 20.19
Ust | SAT1 | 0.891 | 28.71
Ust | SAT2 | 0.927 | 51.10
Ust | SAT3 | 0.934 | 66.79
Ttf | TTF2 | 0.777 | 12.88
Ttf | TTF3 | 0.847 | 19.07
Ttf | TTF | 0.832 | 17.49
CST | MEAN | STD | C.R | AVE | CRB
Cnf | 5.48 | 1.02 | 0.93 | 0.82 | 0.89
Cit | 5.56 | 1.09 | 0.86 | 0.68 | 0.76
Hbt | 4.60 | 1.55 | 0.93 | 0.81 | 0.88
Ust | 5.07 | 1.20 | 0.96 | 0.84 | 0.94
Ttf | 5.27 | 1.21 | 0.91 | 0.67 | 0.87
CST = Construct, ITM = Items, FLD = Factor loading, TVL = T-Value, MEAN = Mean, STD = Standard deviation, CRB = Cronbach's, Cnf = Confirmation, Cit = Continuance intention, Hbt = Habit, Ust = User satisfaction, Ttf = Task technology fit.
The results of the comparative models are shown in Table 16.
Table 16. Results of the comparative models
MDL | R2
Pst | 0.528
TTF | 0.413
ECM | 0.404
Hbt | 0.360
TTF+ECM | 0.445
TTF+Hbt | 0.474
ECM+Hbt | 0.458
MDL = Models, Pst = Present, Hbt = Habit.
In13, students' learning behaviour is discussed. In today's online learning environment it is very difficult to uncover and analyse hidden information manually from a large amount of data, so the latest data mining tools and approaches are used in educational research. This paper proposes to use the Google Analytics tool together with data mining techniques in a blog environment to fetch log data for analysis.
In1, a frequent pattern generation tree algorithm for extracting relevant information from a data stream is proposed. Stream data was defined as coming from Facebook, Twitter, the Internet, relay chats, ATM transactions, weather forecasting and stock market prediction, and can also be related to medicine. Different parameters were taken for handling data streams, such as data access, data speed, available memory, data modelling and sampling. A case study was also taken up to generate frequent patterns from a huge amount of data.
Decision Tree: Frequent Pattern Generation Tree
// RN = root node, LT = list of transactions, SW = sliding window, DT = data set,
// MSet = item set, SValue = support value, UTN = user threshold,
// n = node, np = parent node (P ∈ E), E = current node, m = number of items
For (n = 1; n <= m; n++)
{
    Create the root node RN for the sliding window SW, where RN ∈ m;
}
For (LT = 1; LT <= LTD; LT++)
{
    Scan the data set DT;
}
If (DT == D)                       // the transactions match the data set
{
    Compare the item set MSet;
}
Else
    Exit;
If (SValue < UTN)                  // support value below the user threshold
{
    Remove the node: nS = n(i-1);
}
Else
    Retain node ni;
For (nS = 1; nS <= n(i+1); nS++)
{
    Consider the parent node np;
    If (its support value is below the threshold) { Remove np(i-1) = ni; }
    Else { Retain np(i-1); }
}
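A hedged, pure-Python sketch of the idea behind the algorithm above: count item sets inside a sliding window over a transaction stream and prune those whose support value falls below a user threshold (the transactions are invented for illustration):

# Illustrative sliding-window frequent-pattern counting (not the authors' code).
from collections import Counter
from itertools import combinations

stream = [("milk", "bread"), ("milk", "butter"), ("bread", "butter"),
          ("milk", "bread", "butter"), ("milk", "bread")]   # invented transactions
window_size, user_threshold = 4, 2                          # SW size and UTN

for end in range(window_size, len(stream) + 1):
    window = stream[end - window_size:end]                  # the current sliding window
    counts = Counter()
    for transaction in window:                              # scan the transactions (LT)
        for size in (1, 2):
            counts.update(combinations(sorted(transaction), size))
    # Retain only item sets whose support value reaches the user threshold.
    frequent = {items: n for items, n in counts.items() if n >= user_threshold}
    print(f"window ending at {end}: {frequent}")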
In14, it is discussed how higher education institutions calculate the success rate of students in order to decide whether they should continue a particular course; this can be done using data mining tools. The work also focuses on small student data sets, which other authors found difficult to mine. A comparison of MS Excel sheets and the data mining tool Weka was carried out.
Table 17. Key influencer report for final grade
CLM | VLU | FVR | RIM
Syr | 2012–13 | Ept | 100
Fpt | 92 | 10 | 100
Apt | 39–42 | 7 | 100
Fpt | 77 | 8 | 100
Fpt | 85 | 9 | 100
Ept | 9 | 2 | 100
Fpt | 53 | 2 | 100
Ept | 13 | 2 | 100
Fpt | 51 | 2 | 100
Ept | 8 | 4 | 100
Fpt | 41 | 4 | 100
Ept | 21 | 4 | 100
Fpt | 66 | 6 | 100
CLM = Column, VLU = Value, FVR = Favors, RIM = Relative impact, Syr = Study year, Fpt = Final points, Apt = Activities points, Ept = Exam points, Ept = Empty.
In15, the implemented algorithms covering data mining techniques such as classification, clustering, visualization and feature selection are compared. The advantages and disadvantages of each tool are also described, and an attempt is made to investigate which tool is best. The data mining procedures and algorithms supported by the tools are given in Table 18.
Table 18. Data mining algorithms and procedures supported by the tools
CTG | NAM | RPR | R | WEK | ORG | KNM
DIM | Tfl | 1 | 1 | 1 | 1 | 1
DIM | Sin | 1 | 00 | 1 | 1 | 1
DIM | Ssh | 1 | 00 | 0 | 0 | 00
FSl | Flt | 1 | 00 | 1 | 1 | 1
FSl | Wrp | 1 | 1 | 1 | 0 | 1
FTF | PCA | 1 | 1 | 1 | 1 | 1
FTF | ICA | 1 | 1 | 0 | 0 | 0
FTF | MDS | 0 | 1 | 1 | 00 | 1
CRL | Rul | 1 | 00 | 1 | 0 | 00
CRL | Part | 00 | 00 | 1 | 0 | 1
CRL | Rpr | 1 | 00 | 1 | 0 | 00
BNT | Nbv | 1 | 1 | 1 | 1 | 1
BNT | Fbn | 00 | 0 | 1 | 0 | 00
BNT | AOde | 00 | 00 | 1 | 0 | 00
ESM | Bgg | 1 | 00 | 1 | 1 | 00
ESM | Abt | 1 | 00 | 1 | 1 | 00
ESM | Rft | 1 | 00 | 1 | 1 | 1
ESM | Rrt | 0 | 1 | 0 | 0 | 0
CTG = Category, NAM = Name, RPR = Rapid miner, WEK = Weka, ORG = Orange, KNM = Knime, DIM = Data import, Tfl = Textual files, Sin = Specific input format, Ssh = Spread sheet, FSl = Feature selection, Flt = Filters, Wrp = Wrappers.
Advanced and specialized data mining tasks are
shown in Table 19.
Table 19. Support for specialized and advanced data mining tasks
NAM | RPR | R | WEK | ORG | KNM | SLN
Bdt | T | B | T | - | B | T
Gmg | - | B | B | - | B | -
Sda | - | B | - | - | B | T
Tsa | T | Y, B | T | - | Y | T
Dst | Y | B | T | B | B | Y
Tmg | B | B | T | B | B | Y
Dlg | - | T | - | - | - | T
NAM = Name, RPR = Rapid miner, WEK = Weka, ORG = Orange, KNM = Knime, SLN = Scikit learn, Bdt = Big data, Gmg = Graph mining, Sda = Spatial data analysis, Tsa = Time-series analysis, Dst = Data streams, Tmg = Text mining, Dlg = Deep learning.
In16, various tools are compared on the basis of file formats, supported operating systems and general features. Tools such as Orange, Weka, dlib and RapidMiner are discussed. Different features of various clustering algorithms are also explained in Table 20.
Table 20. Different features of various clustering algorithms
CTG | ALGO | TDT | HHD | HND
Hch | Brh | 0 | 0 | 0
Hch | Cre | 0 | 1 | 1
Hch | Rck | C | 0 | 0
Ptn | FCM | 0 | 0 | 0
Ptn | Kmn | 0 | 0 | 0
Ptn | PAM | 0 | 0 | 0
Grd | Ogd | SL | 1 | 1
Grd | Clq | 0 | 1 | 0
Grd | Stg | S | 0 | 1
Irl | EM | SL | 1 | 0
Irl | Cwb | 0 | 0 | 0
Irl | Cst | 0 | 0 | 0
Dst | Dcn | 0 | 0 | 0
Dst | Dcl | 0 | 0 | 1
Dst | Ots | 0 | 0 | 1
Sl = Special, 1 = Yes, 0 = No, L = Large, S = Small.
CTG = Categories, ALGO = Algorithm, TDT = Type of data, HHD = Handling high dimensionality, HND = Handling noisy data, Hch = Hierarchical, Brh = Birch, Cre = Cure.
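As a hedged illustration of how two of the algorithms listed in Table 20, k-means (Kmn) and DBSCAN (Dcn), can be tried side by side with scikit-learn on synthetic data (the parameter values are arbitrary and not taken from the paper):

# Illustrative comparison of two clustering algorithms from Table 20.
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)
X = np.vstack([X, np.random.default_rng(0).uniform(-10, 10, size=(30, 2))])  # add scatter

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

print("k-means cluster sizes:", np.bincount(kmeans_labels))
print("DBSCAN clusters found:", len(set(dbscan_labels) - {-1}),
      "| points left unassigned (label -1):", int(np.sum(dbscan_labels == -1)))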
5. Conclusion
In this paper a vast survey of data mining techniques and related tools has been presented. The history of data mining shows how the evolutionary steps became integrated with new state-of-the-art techniques. All strata of the work on data mining, and algorithms such as ADABOOST, DECISION TREE and PAGE RANKING, lead to the cue that automation of the work can be achieved through a supervised learning approach in amalgamation with neural networks.
6. References
1. Phridvi Raj MSB, Guru Rao CV. Data mining–past, present and future – A typical survey on data streams. Procedia
Technology, 2014; 12:255–63.
2. Usama F, Piatetsky-Shapiro G, Smyth P. From data mining
to knowledge discovery in databases. American Association
for Artificial Intelligence.1996; 17(3):37–54.
3. Chris R, Wang JC, Yen DC. Data mining techniques for
customer relationship management. Technology in Society.
2002; 24(4):483–502.
4. Padhy N, Mishra P, Panigrahi P. The survey of data mining applications and feature scope. International Journal
of Computer Science, Engineering and Information
Technology. 2012; 2(3):43–58.
5. King M, Elder J, Gomolka B, Schmidt E, Summers M,
Toop K. Evaluation of fourteen desktop data mining tools.
IEEE International Conference on Systems, Man and
Cybernetics. San Diego, CA. 1998; 3:2927–32.
6. Abbott DW, Matkovsky IP, Elder JF. An evaluation of
high-end data mining tools for fraud detection. IEEE
International Conference on Systems, Man and Cybernetics.
1998; 3:2836–41.
7. Hauser AT, Scherer TW. Data mining tools for realtime traffic signal decision support and maintenance.
IEEE International Conference on Systems, Man and
Cybernetics. 2001; 3:1471–7.
8. Rushing J, Ramachandran R, Nair U, Graves S, Welch R, Lin
H. ADaM: A Data Mining toolkit for scientists and engineers. Computers and Geosciences. 2005; 31(5):607–18.
9. Bayam E, Liebowitz J, Agresti W. Older drivers and accidents: A meta analysis and data mining application on
traffic accident data. Expert Systems with Applications,
2005; 29(3):598–629.
10. Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H,
McLachlan GJ, et al. Top 10 algorithms in data mining.
Knowledge and Information Systems. 2008; 14(1):1–37.
11. Korting TS, Fonseca LM, Camara G. GeoDMA - Geographic
Data Mining Analyst. Computers and Geosciences. 2013;
57:133–45.
12. Huang TCK, Wu IL, Chou CC. Investigating use continuance of data mining tools. International Journal of
Information Management. 2013; 33(5):791–801.
13. Mohamad SK, Tasir Z. Educational data mining: A review.
Procedia-Social and Behavioral Sciences. 2013; 97:320–4.
14. Natek S, Zwilling M. Student data mining solution–
knowledge management system related to higher
education institutions. Expert Systems with Applications.
2014; 41(14):6400–7.
15. Jovic A, Brkic K, Bogunovic N. An overview of free software tools for general data mining. 37th International
Convention on Information and Communication
Technology, Electronics and Microelectronics (MIPRO);
2014. p. 1112–7.
16. Gera M, Goel S. Data mining-techniques, methods and algorithms: A review on tools and their validity. International
Journal of Computer Applications. 2015; 113(18):22–9.
17. Sajana T, Sheela Rani CM, Narayana KV. A survey on clustering techniques for big data mining. Indian Journal of
Science and Technology. 2016 Jan; 9(3).
18. Hariharan R, Mahesh C, Prasenna P, Vinoth Kumar R.
Enhancing privacy preservation in data mining using cluster based greedy method in hierarchical approach. Indian
Journal of Science and Technology. 2016 Jan; 9(3).
19. Murugananthan V, Shiva Kumar BL. An adaptive educational data mining technique for mining educational data
models in E-learning systems. Indian Journal of Science
and Technology. 2016 Jan; 9(3).
20. Sivakumar S, Venkataraman S, Selvaraj R. Predictive modeling of student dropout indicators in educational data
mining using improved decision tree. Indian Journal of
Science and Technology. 2016 Jan; 9(4).
21. Undavia JN, Dolia P, Patel A. Customized prediction model
to predict post-graduation course for graduating students
using decision tree classifier. Indian Journal of Science and
Technology. 2016 Mar; 9(12).
22. Alzahrani AS, Qureshi MS. Privacy preserving optimized
rules mining from decision tables and decision trees. Indian
Journal of Science and Technology. 2012 Jun; 5(6).
23. Verma A, Kaur I, Singh I. Comparative analysis of data mining tools and techniques for information retrieval. Indian Journal of Science and Technology. 2016; 9(11).
24. Purusothaman G, Krishnakumari P. A survey of data mining techniques on risk prediction: Heart disease. Indian Journal of Science and Technology. 2015; 8(12).
25. Lohita K, Sree AA, Poojitha D, Devi TR, Umamakeswari A. Performance analysis of various data mining techniques in the prediction of heart disease. Indian Journal of Science and Technology. 2015; 8(35).
26. Murugananthan V, Kumar BLS. An adaptive educational data mining technique for mining educational data models in E-learning systems. Indian Journal of Science and Technology. 2016; 9(3).
27. Rajalakshmi V, Mala GSA. Anonymization by data relocation using sub-clustering for privacy preserving data mining. Indian Journal of Science and Technology. 2014; 7(7).
28. Chakradeo SN, Abraham RM, Rani BA, Manjula R. Data mining: Building social network. Indian Journal of Science and Technology. 2015; 8(2).
29. Kholghi M, Hassanzadeh H, Keyvanpour MR. Classification and evaluation of data mining techniques for data stream requirements. International Symposium on Computer Communication Control and Automation (3CA); Tainan. 2010. p. 474–8.
30. Purusothaman G, Krishnakumari P. A survey of data mining techniques on risk prediction: Heart disease. Indian Journal of Science and Technology. 2015; 8(12).
Abbreviations
DTY = Types of data, DEF = Definition, ITN = Instances Sda =
Sequential data, GEAS = Groups of elements are arranged
or structure in a sequence, Mar = Memory array, Dfl = Disk
file, Mtsd = Magnetic tape data storage, Tsd = Time Series
Data, MTI = Consisting of successive measurements made
over a time interval, Oti = Ocean tides, Sc = Counts of sunspots, Tda = Temporal Data, OCTP = Indicates the progress
of object characteristic over time period, AHB = Ageing
of Human beings, WONLA = Wearing out of any non living article, Std = Spatio-temporal data, MSTI = Manages
both space and time information like Tracking of moving
objects, Fsh = Flocking sheep, Tmg = Traffic management.
STG = Stage, BCN = Business concern, ETG = Enabling technology,
PPD = Product providers, CTR = Characteristics, Cd = Data
collection, CTR = Calculation of total revenue or average revenue over a period of time, Com = Computers, Tap =
Tapes, Dsk = Disks, SDD = Static data delivery, Ad = Access
data, SAT = Sales in particular area during any specified
time period, Rdb = Relational databases (RDBMS), Sql =
Structured Query Language (SQL), Ora = Oracle, Syb =
Sybase, Ifo = Informix, Mcs = Microsoft, Ddr = Dynamic
data delivery at record level, Nd = Data Navigation, CS
and CP = Calculate regional sales for a specified period
and comparisons with its peers, Olap = On-line analytic
processing (OLAP), Mdd = Multidimensional databases,
Wd = Data warehouses, Plt = Pilot, Rdb = Redbrick, Arb =
Arbor, Evt = Evolutionary Technologies, Ddm = Dynamic
data delivery at multiple levels, Md = Data Mining, ESRTF
and IF = Estimation of next sale on the basis of real-time
feedback and information exchange, Alg = Advanced
algorithms, Cmul = Multiprocessor computers, Dma =
Massive databases, Lkh = Lockheed, Nus = Numerous startups (nascent industry), Prp = Prospective, Pid = Proactive
information delivery. SNO = Serial Number, NAM = Name,
RDT = Release Date, LAG = Language, TYP = Type , FTR =
Features, PRS = Pros, Gno = GNU Octave, HLPL = High
level programming language, CLNP = CLI for linear and
non-linear numerical problems, Ncp = Made for numerical computation, CLI = Command line interface, Wek =
Weka, Dap = Data preprocessing, Cla = classification, Reg =
Regression, Clu = clustering, Asr = association rules, Vis =
visualization, JDTSQL = JDBC can be connected through
SQL, FOC = Free of cost, Gdm = Gnome datamine tools,
Pyt = Python, Cot = Collection of tools, Adm = data mining applications, FOS = Free open source software, NLTK
= NLTK (Natural language toolkit), Tpr = Text processing libraries for Cla = Classification, Toz = Tokenization,
Stm = Stemming, Tag = Tagging, Par = Parsing, Str semantic
reasoning, Snp = Symbolic and Statistical natural processing, LNL = An amazing library to play with natural
language, ONN = OpenNN, Nen = Neural network, Dam =
Data mining, Pra = Predictive analytics, FRDA = Provides
framework for research and development of algorithms,
Kni = Knime (Konstanz Information Miner), Jav = Java based
on Ecl = Eclipse, Com = Comprehensive, Daa = Data analytics, Fmw = Framework Dat = Data transformation, Ini =
Initial Investigation, Dta = data access, Ppa = powerful predictive analytics, Rep = Reporting, Ggu = Provide graphical
user interface, Kel = Keel (Knowledge Extraction based on
Evolutionary Learning), Mls = Machine learning software
tools, Kex = knowledge extraction, OS = open source. Ram
= Rapid Miner, Jav = Java, Mal = Machine learning, Dam =
data mining, Tem = text mining, Pra = predictive analytics,
Bua = business analytics, Atf = Offers advanced analytics
through template-based frameworks, Osl = Offered as a
service rather than as local software, Scl = Scikit-learn, Pyt
= Python, Mls = Machine learning, Cla = Classification,
Reg = Regression, Cal = Clustering algorithms, OSML =
Open Source Machine Learning Library, Sil = Simple,
Eft = Efficient tool, Rgu = Rattle GUI, Rpr = R programming, ETD = Edit and Test Data, Rla = R language, DST =
Dataset can be partitioned for training, Val = Validation,
Tst = Testing, GUI = Graphical User Interface, Ora =
Orange, Pyt = Python, Bio = bioinformatics, Daa = Data analytics, OSNE = Open Source Tool for Novice and Experts,
CMSR = CMSR (Cramer Modeling Segmentation and
Rules) data miner, Prm = Predictive modeling, Dav = Data
visualization, Rme = Rule based model evaluation, Sda =
Statistical data analysis, Som = Self Organizing Maps, Ost =
Open source tool, Mlt = Mallet, Mlr = Machine learning,
Iex = Information Extraction, Algo = Wide variety of algorithms. Grl = Graph Lab, Dcf = Distributed computation
framework, Vda = Visualization of data, Psr = Predictive
services, Hpf = High performance, Dlib = Dlib, Lib =
Library, Mal = Machine learning Trd = Threading, NAlg =
Numerical Algorithms, GUI = Graphical User Interfaces,
OSS = Open Source Software, UIMA (Unstructured
Information Management Architecture), Jav = JAVA, Lid
= Language Identification, Lss = Language specific segmentation, Sbt = Sentence boundary detection, Auc =
Analyse unstructured content such as audio, video and
text, WCns = Wrap components as network services, Aca
= Advanced cluster analysis, Odt = Outlier detection,
PAlg = Parameterizable Algorithms, Hpf = High performance, Sbl = Scalability, Lis = Libsvn, Cla = Classification,
Pes = Probability estimates, Git = Graphic interface, Lil =
Liblinear, Aps = Automatic parameter selection, Pes =
Probability estimates, Sit = Simple interface, VWB = Vowpal
wabbit, Pyt = Python, LAlg = Multiple learning algorithms,
OAlg = Multiple optimization algorithms, OS = Open
Source, Shg = Shogun, IMmd = Implementation of Hidden
Markov Models, Reg = Regression, Cla = Classification,
Apm = Apache Mahout, Cft = Collaborative filtering, Clu
= Clustering Cla = Classification. Ecp = Evaluating classifier performance, OSS = Open Source Software, Mlf
= ML-Flex, Mlp = Machine learning packages, Acn =
Computing nodes are analysed in parallel, HTMLrt =
HTML reports, Cli = Command-Line Interface, Mlpy =
Mlpy, Pyt = Python, SUpr = Supervised and Unsupervised
problems, Mdl = Modularity, Mtb = Maintainability, Usb
= Usability, OS = Open Source. 1 = Good, 0 = Poor, 11 =
Excellent, + = Average, - = Needs Improvement, None = Does not exist, NE = Not Evaluated, Med = Median, Ova
= Overall Average, Tch = Technology, Tol = Tools, Cpb =
Capability, Usb = Usability, Ipb = Interoperability, Fbl =
Flexibility, Acc = Accuracy, Ovl = Overall, Prc = Price in
form of dollar, Tre = Tree, Crt = Cart, Scn = Scenario, See =
See5, Tra = Tree Average, Rul = Rule, Rua = Rule Average Wzy
WizWhy, Dmd = Datamind, Dms = DMSK, Nrl = Neural, Ns2
= NeuroShell 2, Plp = PcOLPARS, Prw = PRW, Nua = Neural
average, Pln = Poly Net, Mqe = MQ Expert, Gno = Gnosis,
Pna = Poly Net Average
Ctg = Category, Dlm = Data Load and Manipulation,
Mdb = Model Building, Mdu = Model Understanding, Tcs =
Technical Support, Ous = Overall Usability, GSY = Graph
Symbol, TGS = Time of Graph Symbol.
VAR = Variables, NSP = Number in Sample, MEAN =
Mean, SDV = Standard deviation, Agr = Age groups, Dtr =
District, Bpa = Back pain, Drv = Drive, Aln = Alone, Gls =
Glasses, Dnt = Drive at night, Mle = Miles, Dwe = Days a
week, Crs = Crashes, CRH = Crashes, STS = Statistic, RNG =
Range, MNM = Minimum, MEAN = Mean, SER = Standard
error, STD = Standard, VAR = Variance, Non = None, Agr =
Age groups, Dtr = District, Bpa = Back pain, Drv = Drive,
Aln = Alone, Gls = Glasses, Dnt = Drive at night, Mle =
Miles, Dwe = Days a week, Acc = Accident, RPR = Result of
prediction, NRD = Number of records, ACC = Accuracy, Crt
= Correct, Wrg = Wrong, Ttl = Total, NCR = No crash, CRH =
Crash, NAR = Number of actual records, CST = Construct,
ITM = Items, FLD = Factor loading, TVL = T-value, MEAN =
Mean, STD = Standard deviation, CRB = Cronbach’s, Cnf =
Confirmation, Cit = Continuance intention, Hbt = Habit,
Ust = User satisfaction, Ttf = Task technology fit., MDL =
Models, Pst = Present, Hbt = Habit, CLM = Column, VLU
= Value, FVR = Favors, RIM = Relative impact, Syr = Study
year, Fpt = Final points, Apt = Activities points, Ept = Exam
points, Ept = Empty, CTG = Category, NAM = Name, RPR =
Rapid miner, WEK = Weka, ORG = Orange, KNM = Knime,
DIM = Data import, Tfl = Textual files, Sin = Specific input
format, Ssh = Spread sheet, FSl = Feature selection, Flt =
Filters, Wrp = Wrappers, FTF = Feature tranformation, CRL
= Classification rules, Rul = Rule, Part = Part, Rpr = Ripper,
BNT = Bayesian networks, Nbv = Naïve Bayes, Fbn = Full
Bayesian Netwok, AOde = AODE, ESM = Ensemble, Bgg =
Bagging, LRN = Learning, Abt = AdaBoost, Rft = Random
Forest, Rrt = Rotation Forest, NAM = Name, RPR = Rapid
Miner, WEK = Weka, ORG = Orange, KNM = Knime, SLN
= Scikit learn, Bdt = Big data, Gmg = Graph mining, Sda
= Spatial data analysis, Tsa = Time-series analysis, Dst =
Data streams, Tmg = Text mining, Dlg = Deep learning,
Sl-Special, 1-Yes, 0-No, L-Large,S-Small, CTG = Categories,
ALGO = Algorithm, TDT = Type of data, HHD = Handling
high dimensionnality, HND = Handling noisy data, Hch =
Hierarchical, Brh = Birch, Cre = Cure, Rck = Rock, Ptn =
Partitioning, Kmn = K-Mean, Grd = Grid, Ogd = Optigrid,
Clq = Clique, Stg = Sting, Irl = Iterative relocation, Cwb =
Cobweb, Cst = Classit, Dst = Density, Dcn = DBSCAN, Dcl
= DCCLASD. Ots = Optics.