Indian Journal of Science and Technology, Vol 9(28), DOI: 10.17485/ijst/2016/v9i28/88874, July 2016. ISSN (Print): 0974-6846, ISSN (Online): 0974-5645

Algorithmic Approach to Data Mining and Classification Techniques

Amit Verma*, Iqbaldeep Kaur and Amandeep Kaur
Department of Computer Science and Engineering, Chandigarh Engineering College, Mohali - 140307, Punjab, India; [email protected], [email protected], [email protected]

Abstract
Objective/Background: This paper traces the progression from simple data access to data mining, from past years to the present. Its main aim is a comparative study of the tools, techniques and algorithms used for the analysis of huge amounts of data. Methods/Statistical Analysis: Different data mining methods, including decision tree, neural network, regression and clustering techniques, have been studied and discussed, together with their implementation on different tools for fraud detection. Algorithms used for data mining, such as AdaBoost, PageRank and K-means, are also discussed. To generate relevant information from data streams, a frequent pattern generation tree algorithm is also implemented and discussed. Findings: Of the many available algorithms, the decision tree has been found to be the most suitable for mining data, provided the data is restricted to a few thousand entries. Its most prominent advantage lies in its clear illustration in the form of a graphical tree, with its inherent tree-structure capability. However, the concern about ambiguity should be dealt with carefully to maintain consistency. Applications: Data mining helps in the extraction of relevant data in various ways; the areas where data mining is being used are also discussed in the paper. Future Scope: The scope of the paper extends to an exhaustive survey and analysis of the available empirical and conceptual techniques and tools in the area of data mining.
Keywords: Association Rule Mining, Classification, Clustering, Data, Data Mining, Decision Tree, Neural Network

*Author for correspondence

1. Introduction
Data is a collection of facts, such as numbers, words, measurements, observations or even just descriptions of things. Data is mainly of two types: quantitative (numerical data) and qualitative (descriptive information). Data can be shown in written form, as graphs, as tables or in pictorial form, as bits in electronic memory, or it can be simple facts in the mind. Data is the plural of datum; a datum is a single piece of information, though the term data is generally used in both singular and plural senses. In earlier times, before computers were invented, data was stored on paper, and it is still available on a large scale in paper form: we call it books. Students also write information on the pages of their notebooks, which is also data. In today's world all organizations are connected to computers. They want to store their data to keep records and to manipulate the stored data to improve their services. Shops store data on buying and selling to keep track of the money coming in and going out and of the profit they are making. Banks store data to keep track of money transfers, accessible only to particular persons. Keeping data on hard disks or other computer storage makes it faster to upload and download; it is more secure and trustworthy, it can be transmitted over long distances securely, and it is more convenient. Any person can check his bank account details just by using his computer or mobile from anywhere in the world. Almost all organizations are continually storing data, and this has made the volume of data extremely vast.
The Internet is one medium used to access that data from anywhere in the world in a secure, cheap and convenient form. Nature has enabled humans to collect, store, sense, feel, see, hear and exchange the information in their environment. Earlier, they stored that information in their minds in the form of facts. Later they started sharing data by gestures, and then, in the course of evolution, they used speech to share data; after that, data was shared in both gesture and speech form. At that time there was no way to store spoken data in tangible form, so attempts were made to store data by making drawings on rocks; later, leaves were used to store data in textual form. But this was still not safe for long. Gradually the importance of storing data for record keeping was felt, so that future generations could also use it. Data was then stored in written form on animal skin, because it could last longer, but that was not a permanent solution, and this led to the invention of paper. In earlier times, writing was done with flower colors or other colors from nature; gradually the pencil and pen were invented. Data was stored by scientists, organizations and others to keep records, but the records were destructible and manipulating them was time consuming. So electronic devices to store and share data were invented. Earlier, data was stored in vocal form, and musical instruments that could store data were invented. Data was shared in the form of radio signals. In time, telephones were invented to share data; after that, the television was invented to share stored data in pictorial form. Magnetic tapes and magnetic disks were used to store vocal and visual data. Computers made it very fast to store data and to work on it, and very safe to store it. Gradually, with the advancement of technology, data came to be stored in large volumes. Large amounts of data can be stored or edited in textual, graphical, pictorial, audio or video form.
Now we use pen drives, memory cards and cloud computing techniques to store and retrieve data. Punch cards came into use around the 18th-19th century: they were invented in 1725 and used for information storage from 1832. In 1890, the scientist Herman Hollerith was the first to invent a punch card that could be read by a machine. In the 20th century magnetic tapes were invented; they were first used in 1951 and replaced punch cards, a single tape having the capacity to store the data of more than 10,000 punch cards. In the future we can expect 'holographic layers', which would allow data to be encoded on layers of tiny holograms and would have the capacity to store data for more than thirty years. Another storage technique could be 'quantum storage'; it would be extremely small in size and could not be read by even the smallest microscope.

1.1 Taxonomies of Data
Data may be sequential data, time series, temporal, spatio-temporal, an audio signal or a video signal. These [1] are explained in Table 1.

Table 1. Taxonomies of data
Type of data (DTY)            Definition (DEF)    Instances (ITN)
Sequential data (Sda)         GEAS                Mar, Dfl, Mtsd
Time-series data (Tsd)        MTI                 Oti, Sc
Temporal data (Tda)           OCTP                AHB, WONLA
Spatio-temporal data (Std)    MSTI                Fsh, Tmg
DTY = Types of data, DEF = Definition, ITN = Instances, Sda = Sequential data, GEAS = Groups of elements are arranged or structured in a sequence.

1.2 Concept of Data Mining
Data mining introduces the concept [2] of mining, or extracting, relevant knowledge from data that is present in huge amounts; it is therefore also termed Knowledge Discovery from Data (KDD). The main aim of data mining is to mine a small set of valuable chunks from a large amount of raw data.

1.3 Evolutionary Stages of Data Mining
After long, hard work by researchers, a number of data mining techniques have emerged. In earlier times data was collected into computers, and this extended to data access; today, data can be navigated in real time. The evolution [3] of data mining is explained in Table 2.
The core components of data mining technology have been developing for decades in research areas such as statistics, artificial intelligence and machine learning.

Table 2. Evolution of data mining
STG          BCN            ETG               PPD                        CTR
Cd (1960s)   CTR            Com, Tap, Dsk     IBM, CDC                   SDD
Ad (1980s)   SAT            Rdb, Sql, ODBC    Ora, Syb, Ifo, Mcs, IBM    Ddr
Nd (1990s)   CS and CP      Olap, Mdd, Wd     Plt, IRI, Arb, Rdb, Evt    Ddm
Md (2000)    ESRTF and IF   Alg, Cmul, Dma    Lkh, Nus, IBM, SGI         Prp, Pid
STG = Stage, BCN = Business concern, ETG = Enabling technology, PPD = Product providers, CTR = Characteristics, Cd = Data collection, Ad = Data access, Nd = Navigation of data, Md = Data mining.

1.4 Types of Data Mining System
Data mining systems can be labeled into a number of categories [4]. These are explained below:
• Classification of data mining systems according to the type of data source mined: In today's scenario a huge amount of data, often of a similar nature, is available in an organization, which makes it very difficult to extract relevant information; the data therefore needs to be grouped according to its type.
• Classification of data mining systems according to the data model: Data should be classified according to data models such as the object model, object-oriented data model, relational data model and hierarchical data model.
• Classification of data mining systems according to the kind of knowledge discovered: Data is classified according to the data mining functionality, such as characterization, discrimination, association, classification, clustering, etc.
• Classification of data mining systems according to the mining techniques used: Data is grouped according to techniques such as genetic algorithms, statistics, visualization, database-oriented or data-warehouse-oriented approaches, machine learning and neural networks.
1.5 Data Mining and Knowledge Discovery Process
Data mining mines relevant knowledge from raw material. Knowledge discovery [2] as a process is an iterative sequence of the following steps:
• Data cleaning: removes noise and inconsistent data.
• Data integration: combines the multiple sources from which data is collected.
• Data selection: retrieves the data relevant to the analysis task from the database.
• Data transformation: transforms the data into a form suitable for mining, using operations such as aggregation or summarization.
• Data mining: applies intelligent methods to extract frequent patterns.
• Pattern evaluation: analyzes the generated patterns to identify those that really represent knowledge, based on some measures.
• Knowledge presentation: presents the mined knowledge to the user using techniques such as visualization and knowledge representation.

1.6 Data Flow Diagram (Data Mining)
Level-0: The level-0 DFD explains the primary functionality of data mining: the user puts a query (the user query), data mining processes this query, and relevant information is returned as per the user's requirement. This DFD explains the basic concept of data mining, namely how information flows and how, after a long process, information is retrieved by the user.

Figure 1. Data flow diagram of data mining level-0.

Level-1: The level-1 DFD explains the data mining process further. The user puts a query for the retrieval of information, and data mining tools like RapidMiner, Weka, KNIME and Orange help to retrieve the data from the database.
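The KDD sequence above can be traced end to end in a few lines. The following Python sketch is only an illustration of the steps on invented shop records; the field names, sources and support threshold are assumptions for the example, not taken from the paper:

```python
# Minimal sketch of the KDD process: cleaning, integration, selection,
# transformation, mining (frequent-value counting) and pattern evaluation.
from collections import Counter

def clean(records):
    # Data cleaning: drop records with missing (None) values.
    return [r for r in records if None not in r.values()]

def integrate(*sources):
    # Data integration: combine records from multiple sources.
    return [r for src in sources for r in src]

def select(records, field):
    # Data selection: keep only the field relevant to the mining task.
    return [r[field] for r in records]

def transform(values):
    # Data transformation: normalize to a canonical form (lowercase).
    return [v.lower() for v in values]

def mine(values, min_support):
    # Data mining + pattern evaluation: keep values occurring
    # at least min_support times.
    counts = Counter(values)
    return {v: c for v, c in counts.items() if c >= min_support}

# Two invented "sources" of purchase records.
shop = [{"item": "Bread"}, {"item": "Milk"}, {"item": None}]
bank = [{"item": "milk"}, {"item": "bread"}, {"item": "bread"}]

data = integrate(clean(shop), clean(bank))
patterns = mine(transform(select(data, "item")), min_support=2)
print(patterns)  # frequent items with their support counts
```

Each function stands in for one KDD step; a real pipeline would replace them with database queries and a proper mining algorithm.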
Information retrieval can be done with the help of various data mining techniques such as classification, clustering, decision trees and neural networks; these techniques help to retrieve relevant data as per user requirements.

Figure 2. Data flow diagram of data mining level-1.

Level-2: The level-2 DFD names the algorithms used in data mining that help to retrieve data from the database; these algorithms are used under particular techniques. They include artificial neural networks (algorithms: Kohonen clustering algorithm, learning algorithms), classification (algorithms: genetic algorithms, Bayesian networks, neural networks), clustering (algorithms: relocation algorithms, K-medoids methods, K-means methods) and association rules (algorithms: multilevel association rule, multi-dimensional association rule, quantitative association rule).

Figure 3. Data flow diagram of data mining level-2.

2. Data Mining Tools
There are a number of tools for data mining. Excel is one commercial tool for data mining. Modern data mining tools like RapidMiner, Weka and R are software based and also have graphical integrated environments. These features are effectively used for mining the relevant data from a large amount of data.
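The association rule algorithms named in the level-2 DFD all rest on two measures, support and confidence. A minimal sketch, assuming an invented market-basket list rather than data from the paper:

```python
# Support and confidence for a single association rule A -> B,
# the measures underlying the association-rule algorithms above.
def support(transactions, itemset):
    # Fraction of transactions that contain every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    # Estimated P(consequent | antecedent) over the transactions.
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

# Invented market-basket transactions.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]

print(support(baskets, {"bread", "milk"}))       # 0.5
print(confidence(baskets, {"bread"}, {"milk"}))  # 2/3
```

Multilevel, multi-dimensional and quantitative variants differ in how itemsets are formed, but all prune rules against thresholds on these two measures.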
All the data mining tools are compatible with every environment: they can all be installed on Windows, Mac OS and Linux very easily. The active tools used for data mining are listed below, arranged according to the release date of the tool: Table 3 covers the years 1988 to 2004, Table 4 the years 2006 to 2012, and Table 5 the years 2013 to 2015.

Table 3. Tools from 1988 to 2004
SNO  NAM   RDT   LAG         TYP            FTR                                      PRS
1    Gno   1988  HLPL        CLNP           Ncp                                      CLI
2    Wek   1993  Jav         Mal
3    Rpr   1993  C, For, R   Sgt            Lnm, Cst, Tsa, Cla, Clu                  FSP
4    Gdm   1999  Pyt         Cot            GUI and Adm                              FOS
5    NLTK  2001  Pyt         Tpr, Cla, Toz, Stm, Tag, Par, Str   Snp                 LNL
6    ONN   2003  C++         Nen            Dam, Pra                                 FRDA
7    Kni   2004  Jav, Ecl    Com, Daa Fmw   Dat, Ini, Dta, Ppa, Vis, Rep             Ggu
8    Kel   2004  Jav         Mls Kex        Dap, Cal, Reg, Clu, Asr and Vis          OS, JDTSQL, FOC
SNO = Serial Number, NAM = Name, RDT = Release Date, LAG = Language, TYP = Type, FTR = Features, PRS = Pros, Gno = GNU Octave.

Table 4. Tools from 2006 to 2012
SNO  NAM   RDT   LAG  TYP                     FTR                      PRS
1    Ram   2006  Jav  Mal, Da, Tem, Pra, Bua  Atf                      Osl
2    Scl   2007  Pyt  Mls                     Cla, Reg, Cal            OSML, Sil, Eft
3    Rgu   2007  Rpr  ETD, Rla                DST, Val, Tst            GUI
4    Ora   2009  Pyt  Mal for Bio and Tem     Daa                      OSNE
5    CMSR  2010  Jav  Prm, Dav, Rme, Sda      Som                      Ost
6    Mlt   2011  Jav  Mlr                     Iex, Algo, Ecp           OSS
7    Mlf   2012  Jav  Mlp                     Acn, HTMLrt              Cli
8    Mlpy  2012  Pyt  SUpr                    Mdl, Mtb, Usb            OS
SNO = Serial Number, NAM = Name, RDT = Release Date, LAG = Language, TYP = Type, FTR = Features, PRS = Pros, Ram = RapidMiner, Jav = Java.
Table 5. Tools from 2013 to 2015
SNO  NAM   RDT   LAG  TYP         FTR              PRS
1    Grl   2013  C++  Dcf         Vda, Psr         Hpf
2    Dlib  2014  C++  Lib, Mal    Trd, NAlg, GUI   OSS
3    UIMA  2014  Jav  Lid, Lss    Sbt, Auc         WCns
4    ELKI  2014  Jav  Aca         Odt, PAlg        Hpf, Sbl
5    Lis   2014  Jav  Mal         Cla, Pes         Git
6    Lil   2015  C++  Mal         Aps, Pes         Sit
7    VWB   2015  Pyt  Mal         LAlg, OAlg       OS
8    Shg   2015  C++  IMmd, Mal   Reg, Cla         OS, Sbl
9    Apm   2015  Jav  Mal         Cft, Clu, Cla    Sbl
Grl = GraphLab, Dcf = Distributed Computation Framework, Vda = Visualization of Data, Psr = Predictive Services, Hpf = High performance.

These, then, are some of the tools, past and present, that are good and popular for the extraction of useful information from huge amounts of data.

3. Data Mining Techniques
Data mining adopts its techniques from many research areas, including statistics, machine learning, database systems, rough sets, visualization and neural networks. These are explained in Table 6.

Table 6. Data mining techniques
NAM  ALG                   SAP                  PRS           CNS
Det  Ida                   ATEC                 FTOT          Obe
Clu  Rea, Prc, Kmm, Kem    Mar, Par, Daa, Img   AC and FDDG   LClu and Lac
Cla  Gea, Rbi, Nen, Ban    Bla, Mar, Dft        Frd, Cra      ETC
Asr  Mar, Qar              Cad, Crm, Csb        Rgv           Rlh
Ann  Kca, Lea              Smp, Tsp             Is, Nnp       Nnr, CHtd
NAM = Name, ALG = Algorithm, SAP = Specific application, PRS = Pros, CNS = Cons, Det = Decision trees, Ida = ID3 algorithm.

4. Review of Literature
In [2], it was proposed how data mining and knowledge discovery are related to each other and to other fields such as machine learning and statistics. A method to discover knowledge from a database through data mining is given, along with the data mining steps of the KDD process. An experiment was performed on a 'Loan Dataset' using a linear classification boundary; a clustering technique was also applied to this dataset, and three clusters were shown in the result.
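Table 6 lists the ID3 algorithm under decision trees. ID3 grows the tree by repeatedly splitting on the attribute with the highest information gain; that core computation can be sketched as follows (the loan records are invented for illustration, only loosely echoing the 'Loan Dataset' experiment):

```python
# Entropy and information gain, the attribute-selection criterion of ID3.
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels.
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(records, attribute, target):
    # Gain = H(target) minus the weighted entropy within each attribute value.
    labels = [r[target] for r in records]
    gain = entropy(labels)
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        gain -= len(subset) / len(records) * entropy(subset)
    return gain

# Invented loan records; ID3 would split on the attribute with highest gain.
loans = [{"income": "high", "repaid": "yes"},
         {"income": "high", "repaid": "yes"},
         {"income": "low",  "repaid": "no"},
         {"income": "low",  "repaid": "yes"}]
print(information_gain(loans, "income", "repaid"))
```

A full ID3 implementation would recurse on each attribute value's subset until the labels are pure, which is what yields the graphical tree the paper highlights as the technique's main advantage.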
Specialized methods, with particular algorithms that can be implemented on a dataset, are also explained. In [5], fourteen data mining tools for different platforms were discussed. The tools solve a binary classification problem, a multi-class classification problem and a noiseless estimation problem. Multiple traits were collected in five categories: usability, interoperability, flexibility, capability and accuracy. After selecting a technique, a tool can be selected by a developer to develop a particular product. The performance of the tools according to the related technique was also categorized, as in Table 7.

Table 7. Data mining tool evaluation summary
Tch: Tre, Rul, Nrl, Pln
Tol  Cpb  Usb  Ipb  Fbl  Acc  Ovl   Prc
Crt  1    +    -    1+   1    1+    995
Scn  +-   1    1    1 -  0    +     695
See  +    +-   +    +-   1    +     440
Tra  +    +    1+   +    1+   1+    Med = 845
Wzy  +    1+   +    +-   -    +     4000
Dmd  1+   11   1    +-   +    1+    25000
Dms  -    0    +-   -    1    -     75
Rua  +    +    1+   -    +    +     Med = 4000
Ns2  -    +    -    -    11   +-    395
Plp  +-   -    -    +-   +    +-    495
Prw  1+   1    11   +    11   1     10,000
Nua  +-   +    +    +-   1    +-    Med = 495
Mqe  1    +    +    1+   1    1+    5,950
Ns2  +-   +    +    +    1    +     495
Gno  +-   +    0    +-   11   +-    4,900
Pna  +-   1-   -    +    1    +-    Med = 2.698
Ova  +    +    +    +-   1+   +     Med = 845
1 = Good, 0 = Poor, 11 = Excellent, + = Average, - = Needs improvement, None = Does not exist, NE = Not evaluated, Med = Median, Ova = Overall average.

In [6], a tool for fraud detection was proposed. Five data mining tools were compared for the fraud detection application. Tool selection was done based on the computer system environment, the intended end user and ease of use.

In [7], real-time signal planning using a clustering technique was proposed. To identify the time of day automatically, a cluster analysis approach was used, with a set of sensors in the traffic signal system providing high-resolution system state.
The CART tool was used to automatically generate time-of-day (TOD) intervals and was also used for plan development and maintenance. The data mining tools selected in [6] and the algorithms they contain are shown in Table 8; the numeric TOD representations from [7] are shown in Table 10.

Table 8. Algorithms implemented
Algo  IBM   ISL  SAS  TMC  Unica
Dct   1     1    1    1    -
Nun   1     1    1    1    1
Reg   0     1    1    -    1
Rab   1(2)  -    -    -    1
Nnb   -     -    1    1    1
MKM   -     1    1    -    1
Clu   1     1    -    -    1
Asr   1     1    -    -    -
1 = Yes, - = Not available, 0 = Accessed in data analysis only, 2 = Estimation only (not for classification). Algo = Algorithm, Dct = Decision trees, Nun = Neural network, Reg = Regression.

Ease of use was evaluated in five categories; the comparison is shown in Table 9.

Table 9. Ease of use comparison
Ctg  IBM  ISL  SAS  TMC  Unica
Dlm  3.1  3.7  3.7  3.1  3.9
Mdb  3.1  4.6  3.9  3.2  4.8
Mdu  3.2  4.2  2.6  3.8  3.8
Tcs  3.0  4.0  2.8  3.2  4.7
Ous  3.1  4.1  3.1  3.4  4.2
Ctg = Category, Dlm = Data load and manipulation, Mdb = Model building, Mdu = Model understanding, Tcs = Technical support, Ous = Overall usability.

Table 10. Numeric TOD representations
GSY  TGS
1    22:30-2:30
2    2:30-5:00
3    5:00-7:30
4    7:30-10:00
5    10:00-12:30
6    12:30-15:00
7    15:00-17:30
8    17:30-20:00
9    20:00-22:30
GSY = Graph symbol, TGS = Time of graph symbol.

In [8], the Algorithm Development and Mining (ADaM) toolkit was proposed. This toolkit can be used with scientific data for data mining methods such as classification and clustering, and for image processing, data cleaning and feature reduction. The architecture and design of ADaM were included and its applications discussed. A case study on cumulus cloud detection using satellite images was also taken up and its results calculated.

In [9], a study of teenage drivers and senior drivers was discussed. A meta-analysis dataset was analyzed thoroughly, using data mining techniques suited to these kinds of problems. Roadway accidents were analyzed on the basis of the drivers' age, gender, perception of road signs, alcohol use, medical conditions and fragility.
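Descriptive measures of the kind reported in such driver studies (sample size, mean, standard deviation, standard error) take only a few lines to compute; the coded responses below are invented for illustration, not the study's data:

```python
# Descriptive statistics for one survey variable:
# sample size, mean, sample standard deviation and standard error.
from math import sqrt
from statistics import mean, stdev

# Invented coded responses (e.g. age-group codes), not the paper's data.
ages = [3, 3, 4, 2, 3, 4, 3, 2]

n = len(ages)
m = mean(ages)
sd = stdev(ages)      # sample standard deviation (n - 1 denominator)
se = sd / sqrt(n)     # standard error of the mean
print(n, m, round(sd, 3), round(se, 3))
```

Running this per variable reproduces the layout of a descriptive-statistics table: one row of n, mean and dispersion measures per variable.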
Based on the above, descriptive statistics such as frequency, mean and standard deviation were calculated. The variables of the descriptive statistics are shown in Table 11.

Table 11. Descriptive statistics of the variables
VAR  NSP  MEAN  SDV
Agr  127  3.13  0.678
Dtr  127  1.03  0.250
Bpa  127  0.42  0.495
Drv  127  1.00  0.000
Aln  127  0.83  0.373
Gls  127  0.83  0.380
Dnt  127  0.88  0.324
Mle  125  2.68  0.980
Dwe  125  2.66  0.569
Crs  127  0.27  0.495
VAR = Variables, NSP = Number in sample, MEAN = Mean, SDV = Standard deviation, Agr = Age groups, Dtr = District, Bpa = Back pain, Drv = Drive, Aln = Alone, Gls = Glasses, Dnt = Drive at night, Mle = Miles, Dwe = Days a week, Crs = Crashes.

Table 12 compares the respondents involved in crashes with those not involved.

Table 12. Descriptive analysis between the respondents with crashes and those not involved in crashes
CRH   VAR  N   RNG  MNM  MXM  MEAN  SER    STD    VARC
None  Agr  96  2    2    4    3.07  0.068  0.669  0.447
None  Dtr  96  2    1    3    1.04  0.029  0.287  0.082
None  Bpa  94  1    0    1    0.44  0.051  0.499  0.249
None  Drv  96  0    1    1    1.00  0.000  0.000  0.000
None  Aln  96  1    0    1    0.81  0.040  0.392  0.154
None  Gls  96  1    0    1    0.83  0.038  0.375  0.140
None  Dnt  96  1    0    1    0.87  0.034  0.332  0.111
None  Mle  94  3    1    4    2.73  0.102  0.986  0.972
None  Dwe  94  2    1    3    2.63  0.057  0.548  0.301
Acc   Agr  31  2    2    4    3.29  0.124  0.693  0.480
Acc   Dtr  31  0    1    1    1.00  0.000  0.000  0.000
Acc   Bpa  31  1    0    1    0.39  0.089  0.495  0.245
Acc   Drv  30  0    1    1    1.00  0.000  0.000  0.000
Acc   Aln  30  1    0    1    0.90  0.056  0.305  0.093
Acc   Gls  31  1    0    1    0.81  0.072  0.402  0.161
Acc   Dnt  31  1    0    1    0.90  0.054  0.301  0.090
Acc   Mle  31  3    1    4    2.52  0.173  0.962  0.925
Acc   Dwe  31  2    1    3    2.68  0.108  0.599  0.359
CRH = Crashes (None = not involved, Acc = accident), RNG = Range, MNM = Minimum, MXM = Maximum, SER = Standard error, STD = Standard deviation, VARC = Variance; Agr = Age groups, Dtr = District, Bpa = Back pain, Drv = Drive, Aln = Alone, Gls = Glasses, Dnt = Drive at night, Mle = Miles, Dwe = Days a week.

A comparison of predicted crashes with the actual crash values for each record is shown in Table 13.

Table 13. Comparing predicted crashes with actual crash values for each record
RPR  NRD  ACC
Crt  49   76.56
Wrg  15   23.44
Ttl  64   100.00
RPR = Result of prediction, NRD = Number of records, ACC = Accuracy, Crt = Correct, Wrg = Wrong, Ttl = Total.

The coincidence matrix for the predicted crash values is shown in Table 14.

Table 14. Coincidence matrix for predicted crash values
          0 (NCR)  1 (CRH)  NAR
0 (NCR)   44       4        48
1 (CRH)   11       5        16
NAR       55       9        64
NCR = No crash, CRH = Crash, NAR = Number of actual records.

In [10], ten algorithms of data mining were introduced. Each algorithm was described and its impact discussed. The introduced algorithms cover classification, clustering, association analysis and statistical learning.

AdaBoost algorithm
    Input: a training set of m examples, a base learning algorithm L, and T learning rounds (epochs)
    Initialize the weight distribution D1(i) = 1/m for every example i
    For t = 1 to T:
        train a weak learner h_t on the dataset using distribution D_t
        calculate the weighted error ER_t of h_t
        update the weights: increase those of misclassified examples, decrease the rest,
        and divide by the normalization factor Z_t so that D_(t+1) remains a distribution
    Output the combined classifier h(x)

PageRank algorithm
    Input: pages P_1 ... P_l connected by hyperlinks
    For each page P_i:
        calculate hln, the hyperlinks pointing into P_i
        calculate hlo, the hyperlinks pointing out of P_i
    A page's rank grows with the ranks of the pages linking to it, each contributing
    its own rank divided by its out-link count hlo

K-means algorithm
    Input: n document vectors D_j and the number of clusters k
    Select k initial cluster centroid vectors C_i (i = 1 to k)
    Repeat:
        assign each vector to the cluster with the nearest centroid C_c
        recompute each centroid C_i as the mean of the vectors assigned to it
    Until the assignments no longer change
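The K-means steps can be made concrete in a short sketch; the two-dimensional vectors below are invented, and the initial centroids are picked by hand where a real implementation would choose them randomly:

```python
# K-means: assign each vector to the nearest centroid, then recompute
# each centroid as the mean of its assigned vectors, until stable.
def kmeans(points, centroids):
    while True:
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid with the smallest squared distance to p.
            nearest = min(range(len(centroids)),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Mean of each cluster; keep the old centroid if a cluster is empty.
        new = [tuple(sum(xs) / len(xs) for xs in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:
            return centroids, clusters
        centroids = new

# Invented 2-D document vectors; initial centroids chosen from the data.
pts = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids, clusters = kmeans(pts, [(1.0, 1.0), (8.0, 8.0)])
print(centroids)  # one centre per group of nearby points
```

The loop terminates because each pass either moves a centroid or reproduces the previous assignment exactly, matching the "until the assignments no longer change" condition.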
In [11], the use of the Geographic Data Mining Analyst (GeoDMA) for remote sensing image analysis was discussed. For spatial data mining, GeoDMA uses decision tree strategies that connect images with geographic data types using a data warehouse. To improve classification accuracy, a new approach based on polar coordinate transformation was proposed. Various tables were described as the output of segmentation-based spatial features and landscape-based features.

In [12], three key issues were defined for managerial personnel investigating data mining tools: task fit, technology use and habit. To investigate a tool, the Task Technology Fit (TTF) model and the Expectation-Confirmation Model (ECM) were used for the continued use of Data Mining Tools (DMT), with user satisfaction and perceived usefulness as the main predictors. Multiple hypotheses were taken for decision making. Convergent validity is shown in Table 15.

Table 15. Convergent validity
CST  ITM   FLD    TVL     MEAN  STD   C.R   AVE   CRB
Cnf  CON1  0.918  38.10   5.48  1.02  0.93  0.82  0.89
     CON2  0.916  40.93
     CON3  0.884  30.79
Cit  CI1   0.803  15.40   5.56  1.09  0.86  0.68  0.76
     CI2   0.870  33.59
     CI3   0.769  16.93
Hbt  HAB1  0.948  79.06   4.60  1.55  0.93  0.81  0.88
     HAB2  0.940  74.66
     HAB3  0.806  20.19
Ust  SAT1  0.891  28.71   5.07  1.20  0.96  0.84  0.94
     SAT2  0.927  51.10
     SAT3  0.934  66.79
Ttf  TTF2  0.777  12.88   5.27  1.21  0.91  0.67  0.87
     TTF3  0.847  19.07
     TTF   0.832  17.49
CST = Construct, ITM = Items, FLD = Factor loading, TVL = T-value, MEAN = Mean, STD = Standard deviation, CRB = Cronbach's alpha; Cnf = Confirmation, Cit = Continuance intention, Hbt = Habit, Ust = User satisfaction, Ttf = Task technology fit.

The results of the comparative models are shown in Table 16.
Table 16. Results of the comparative models
MDL        R2
Pst        0.528
TTF        0.413
ECM        0.404
Hbt        0.360
TTF+ECM    0.445
TTF+Hbt    0.474
ECM+Hbt    0.458
MDL = Models, Pst = Present, Hbt = Habit.

In [13], students' learning behavior was discussed. In today's online learning environment it is very difficult to uncover and analyze hidden information manually from a large amount of data, so the latest data mining tools and approaches are used in educational research. The paper proposes to use the Google Analytics tool together with data mining techniques in a blog environment to fetch log data for analysis.

In [1], an algorithm was proposed for a frequent pattern generation tree of relevant information from a data stream. Stream data was exemplified by Facebook, Twitter, Internet relay chats, ATM transactions, weather forecasting and stock market prediction, and can also relate to medicine. Different parameters were considered for handling data streams, such as data access, data speed, available memory, data modeling and sampling. A case study was also taken up to generate frequent patterns from a huge amount of data.

Decision tree: frequent pattern generation tree
    Input: a data set DT, a list of transactions LT, a sliding window SW and a user threshold UTN
    For each of the m items (n = 1 to m): create a root node RN, where RN belongs to m
    For each transaction (LT = 1 to LTD): scan the data set DT
    If DT matches the stored data set D: compare the item set MSet
    If the support value SValue of a node is less than the user threshold UTN:
        remove the node (nS = n_(i-1))
    Else: retain the node n_i
    For each remaining node (nS = 1 to n_(i+1)):
        consider the parent node nP (nP belongs to E, the current node)
        remove nP_(i-1) = n_i, else retain nP_(i-1)

In [14], it is discussed how higher education institutions calculate the success rate of students to decide whether they should continue a particular course. This can be done by using data mining tools.
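The pruning idea in the frequent pattern generation tree above (count support inside a sliding window and drop items below the user threshold) can be sketched without the tree itself; the stream, window size and threshold below are invented for illustration:

```python
# Frequent patterns over a data stream with a sliding window: count item
# support inside the current window and prune items below the threshold.
from collections import Counter, deque

def frequent_in_window(stream, window_size, min_support):
    window = deque(maxlen=window_size)   # sliding window of transactions
    for transaction in stream:
        window.append(transaction)       # old transactions fall out automatically
    counts = Counter(item for t in window for item in set(t))
    # Prune: retain only items whose support meets the user threshold.
    return {item: c for item, c in counts.items() if c >= min_support}

# Invented stream of transactions; only the last 3 are in the window.
stream = [["a", "b"], ["b", "c"], ["a", "b"], ["b"], ["a", "c"]]
print(frequent_in_window(stream, window_size=3, min_support=2))
```

The tree in [1] extends this idea to itemsets by sharing prefixes between transactions, but the remove/retain decision against the user threshold is the same.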
The work also focuses on small student data sets, which, according to other authors, are difficult to mine. A comparison of MS Excel sheets and the data mining tool Weka was done.

Table 17. Key influencer report for final grade
CLM  VLU      FVR  RIM
Syr  2012-13
Ept  100
Fpt  92       10   100
Apt  39-42    7    100
Fpt  77       8    100
Fpt  85       9    100
Ept  9        2    100
Fpt  53       2    100
Ept  13       2    100
Fpt  51       2    100
Ept  8        4    100
Fpt  41       4    100
Ept  21       4    100
Fpt  66       6    100
CLM = Column, VLU = Value, FVR = Favors, RIM = Relative impact, Syr = Study year, Fpt = Final points, Apt = Activities points, Ept = Exam points.

In [15], the implemented algorithms covering data mining techniques such as classification, clustering, visualization and feature selection were compared. The work also describes the advantages and disadvantages of each tool and tries to investigate which tool is best. The data mining procedures and algorithms supported by the tools are given in Table 18.

Table 18. Data mining algorithms and procedures supported by the tools
CTG  NAM   RPR  R   WEK  ORG  KNM
DIM  Tfl   1    1   1    1    1
DIM  Sin   1    00  1    1    1
DIM  Ssh   1    00  0    0    00
FSl  Flt   1    00  1    1    1
FSl  Wrp   1    1   1    0    1
FTF  PCA   1    1   1    1    1
FTF  ICA   1    1   0    0    0
FTF  MDS   0    1   1    00   1 1
CRL  Rul   1    00  1    0    00
CRL  Part  00   00  1    0    1
CRL  Rpr   1    00  1    0    00
BNT  Nbv   1    1   1    1    1
BNT  Fbn   00   0   1    0    00
BNT  AOde  00   00  1    0    00
ESM  Bgg   1    00  1    1    00
ESM  Abt   1    00  1    1    00
ESM  Rft   1    00  1    1    1
LRN  Rrt   0    1   0    0    0
CTG = Category, NAM = Name, RPR = RapidMiner, WEK = Weka, ORG = Orange, KNM = KNIME, DIM = Data import, Tfl = Textual files, Sin = Specific input format, Ssh = Spreadsheet, FSl = Feature selection, Flt = Filters, Wrp = Wrappers.

Advanced and specialized data mining tasks are shown in Table 19.
Table 19. Support for specialized and advanced data mining tasks
NAM  RPR  R    WEK  ORG  KNM  SLN
Bdt  T    B    T    -    B    T
Gmg  -    B    B    -    B    -
Sda  -    B    -    -    B    T
Tsa  T    Y,B  T    -    Y    T
Dst  Y    B    T    B    B    Y
Tmg  B    B    T    B    B    Y
Dlg  -    T    -    -    -    T
NAM = Name, RPR = RapidMiner, WEK = Weka, ORG = Orange, KNM = KNIME, SLN = Scikit-learn, Bdt = Big data, Gmg = Graph mining, Sda = Spatial data analysis, Tsa = Time-series analysis, Dst = Data streams, Tmg = Text mining, Dlg = Deep learning.

In [16], various tools were compared on the basis of file formats, operating systems supported and general features. Tools such as Orange, Weka, dlib and RapidMiner were discussed. The different features of various clustering algorithms are also explained in Table 20.

Table 20. Different features of various clustering algorithms
CTG  ALGO  TDT  HHD  HND
Hch  Brh   0    0    0
Hch  Cre   0    1    1
Hch  Rck   C    0    0
Ptn  FCM   0    0    0
Ptn  Kmn   0    0    0
Ptn  PAM   0    0    0
Grd  Ogd   SL   1    1
Grd  Clq   0    1    0
Grd  Stg   S    0    1
Irl  EM    SL   1    0
Irl  Cwb   0    0    0
Irl  Cst   0    0    0
Dst  Dcn   0    0    0
Dst  Dcl   0    0    1
Dst  Ots   0    0    1
SL = Special, 1 = Yes, 0 = No, L = Large, S = Small; CTG = Categories, ALGO = Algorithm, TDT = Type of data, HHD = Handling high dimensionality, HND = Handling noisy data, Hch = Hierarchical, Brh = BIRCH, Cre = CURE.

5. Conclusion
In this paper a vast survey of data mining techniques and related tools has been presented. The history of data mining shows how the evolutionary steps became integrated with new state-of-the-art techniques. All strata of the work on data mining and on algorithms like AdaBoost, decision tree and PageRank lead to the cue that automation of the work can be achieved by a supervised learning approach in amalgamation with neural networks.

6. References
1. Phridvi Raj MSB, Guru Rao CV. Data mining - past, present and future - a typical survey on data streams. Procedia Technology. 2014; 12:255-63.
2. Usama F, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery in databases.
American Association for Artificial Intelligence. 1996; 17(3):37–54.
3. Chris R, Wang JC, Yen DC. Data mining techniques for customer relationship management. Technology in Society. 2002; 24(4):483–502.
4. Padhy N, Mishra P, Panigrahi P. The survey of data mining applications and feature scope. International Journal of Computer Science, Engineering and Information Technology. 2012; 2(3):43–58.
5. King M, Elder J, Gomolka B, Schmidt E, Summers M, Toop K. Evaluation of fourteen desktop data mining tools. IEEE International Conference on Systems, Man and Cybernetics. San Diego, CA. 1998; 3:2927–32.
6. Abbott DW, Matkovsky IP, Elder JF. An evaluation of high-end data mining tools for fraud detection. IEEE International Conference on Systems, Man and Cybernetics. 1998; 3:2836–41.
7. Hauser AT, Scherer TW. Data mining tools for real-time traffic signal decision support and maintenance. IEEE International Conference on Systems, Man and Cybernetics. 2001; 3:1471–7.
8. Rushing J, Ramachandran R, Nair U, Graves S, Welch R, Lin H. ADaM: A data mining toolkit for scientists and engineers. Computers and Geosciences. 2005; 31(5):607–18.
9. Bayam E, Liebowitz J, Agresti W. Older drivers and accidents: A meta analysis and data mining application on traffic accident data. Expert Systems with Applications. 2005; 29(3):598–629.
10. Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, et al. Top 10 algorithms in data mining. Knowledge and Information Systems. 2008; 14(1):1–37.
11. Korting TS, Fonseca LM, Camara G. GeoDMA – Geographic Data Mining Analyst. Computers and Geosciences. 2013; 57:133–45.
12. Huang TCK, Wu IL, Chou CC. Investigating use continuance of data mining tools. International Journal of Information Management. 2013; 33(5):791–801.
13. Mohamad SK, Tasir Z. Educational data mining: A review. Procedia - Social and Behavioral Sciences. 2013; 97:320–4.
14. Natek S, Zwilling M.
Student data mining solution – knowledge management system related to higher education institutions. Expert Systems with Applications. 2014; 41(14):6400–7.
15. Jovic A, Brkic K, Bogunovic N. An overview of free software tools for general data mining. 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO); 2014. p. 1112–7.
16. Gera M, Goel S. Data mining - techniques, methods and algorithms: A review on tools and their validity. International Journal of Computer Applications. 2015; 113(18):22–9.
17. Sajana T, Sheela Rani CM, Narayana KV. A survey on clustering techniques for big data mining. Indian Journal of Science and Technology. 2016 Jan; 9(3).
18. Hariharan R, Mahesh C, Prasenna P, Vinoth Kumar R. Enhancing privacy preservation in data mining using cluster based greedy method in hierarchical approach. Indian Journal of Science and Technology. 2016 Jan; 9(3).
19. Murugananthan V, Shiva Kumar BL. An adaptive educational data mining technique for mining educational data models in E-learning systems. Indian Journal of Science and Technology. 2016 Jan; 9(3).
20. Sivakumar S, Venkataraman S, Selvaraj R. Predictive modeling of student dropout indicators in educational data mining using improved decision tree. Indian Journal of Science and Technology. 2016 Jan; 9(4).
21. Undavia JN, Dolia P, Patel A. Customized prediction model to predict post-graduation course for graduating students using decision tree classifier. Indian Journal of Science and Technology. 2016 Mar; 9(12).
22. Alzahrani AS, Qureshi MS. Privacy preserving optimized rules mining from decision tables and decision trees. Indian Journal of Science and Technology. 2012 Jun; 5(6).
23. Verma A, Kaur I, Singh I. Comparative analysis of data mining tools and techniques for information retrieval. Indian Journal of Science and Technology.
2016; 9(11).
24. Purusothaman G, Krishnakumari P. A survey of data mining techniques on risk prediction: Heart disease. Indian Journal of Science and Technology. 2015; 8(12).
25. Lohita K, Sree AA, Poojitha D, Devi TR, Umamakeswari A. Performance analysis of various data mining techniques in the prediction of heart disease. Indian Journal of Science and Technology. 2015; 8(35).
26. Murugananthan V, Kumar BLS. An adaptive educational data mining technique for mining educational data models in E-learning systems. Indian Journal of Science and Technology. 2016; 9(3).
27. Rajalakshmi V, Mala GSA. Anonymization by data relocation using sub-clustering for privacy preserving data mining. Indian Journal of Science and Technology. 2014; 7(7).
28. Chakradeo SN, Abraham RM, Rani BA, Manjula R. Data mining: Building social network. Indian Journal of Science and Technology. 2015; 8(2).
29. Kholghi M, Hassanzadeh H, Keyvanpour MR. Classification and evaluation of data mining techniques for data stream requirements. International Symposium on Computer Communication Control and Automation (3CA); Tainan. 2010. p. 474–8.
30. Purusothaman G, Krishnakumari P. A survey of data mining techniques on risk prediction: Heart disease. Indian Journal of Science and Technology. 2015; 8(12).

Abbreviations
DTY = Types of data, DEF = Definition, ITN = Instances, Sda = Sequential data, GEAS = Groups of elements are arranged or structured in a sequence, Mar = Memory array, Dfl = Disk file, Mtsd = Magnetic tape data storage, Tsd = Time series data, MTI = Consisting of successive measurements made over a time interval, Oti = Ocean tides, Sc = Counts of sunspots, Tda = Temporal data, OCTP = Indicates the progress of an object characteristic over a time period, AHB = Ageing of human beings, WONLA = Wearing out of any non-living article, Std = Spatio-temporal data, MSTI = Manages both space and time information, like tracking of moving objects, Fsh = Flocking sheep, Tmg = Traffic management, STG =, BCN = Business concern, ETG = Enabling technology, PPD = Product providers, CTR = Characteristics, Cd = Data collection, CTR = Calculation of total revenue or average revenue over a period of time, Com = Computers, Tap = Tapes, Dsk = Disks, SDD = Static data delivery, Ad = Access data, SAT = Sales in a particular area during any specified time period, Rdb = Relational databases (RDBMS), Sql = Structured Query Language (SQL), Ora = Oracle, Syb = Sybase, Ifo = Informix, Mcs = Microsoft, Ddr = Dynamic data delivery at record level, Nd = Data navigation, CS and CP = Calculate regional sales for a specified period and comparisons with its peers, Olap = On-line analytic processing (OLAP), Mdd = Multidimensional databases, Wd = Data warehouses, Plt = Pilot, Rdb = Redbrick, Arb = Arbor, Evt = Evolutionary Technologies, Ddm = Dynamic data delivery at multiple levels, Md = Data mining, ESRTF and IF = Estimation of next sale on the basis of real-time feedback and information exchange, Alg = Advanced algorithms, Cmul = Multiprocessor computers, Dma = Massive databases, Lkh = Lockheed, Nus = Numerous startups (nascent industry), Prp = Prospective, Pid = Proactive information delivery.
SNO = Serial Number, NAM = Name, RDT = Release Date, LAG = Language, TYP = Type, FTR = Features, PRS = Pros, Gno = GNU Octave, HLPL = High level programming language, CLNP = CLI for linear and non-linear numerical problems, Ncp = Made for numerical computation, CLI = Command line interface, Wek = Weka, Dap = Data preprocessing, Cla = Classification, Reg = Regression, Clu = Clustering, Asr = Association rules, Vis = Visualization, JDTSQL = JDBC can be connected through SQL, FOC = Free of cost, Gdm = Gnome datamine tools, Pyt = Python, Cot = Collection of tools, Adm = Data mining applications, FOS = Free open source software, NLTK = NLTK (Natural Language Toolkit), Tpr = Text processing libraries for classification, Toz = Tokenization, Stm = Stemming, Tag = Tagging, Par = Parsing, Str = Semantic reasoning, Snp = Symbolic and statistical natural language processing, LNL = An amazing library to play with natural language, ONN = OpenNN, Nen = Neural network, Dam = Data mining, Pra = Predictive analytics, FRDA = Provides framework for research and development of algorithms, Kni = Knime (Konstanz Information Miner), Jav = Java, based on Ecl = Eclipse, Com = Comprehensive, Daa = Data analytics, Fmw = Framework, Dat = Data transformation, Ini = Initial investigation, Dta = Data access, Ppa = Powerful predictive analytics, Rep = Reporting, Ggu = Provides graphical user interface, Kel = Keel (Knowledge Extraction based on Evolutionary Learning), Mls = Machine learning software tools, Kex = Knowledge extraction, OS = Open source.
Ram = Rapid Miner, Jav = Java, Mal = Machine learning, Dam = Data mining, Tem = Text mining, Pra = Predictive analytics, Bua = Business analytics, Atf = Offers advanced analytics through template-based frameworks, Osl = Offered as a service rather than as local software, Scl = Scikit-learn, Pyt = Python, Mls = Machine learning, Cla = Classification, Reg = Regression, Cal = Clustering algorithms, OSML = Open source machine learning library, Sil = Simple, Eft = Efficient tool, Rgu = Rattle GUI, Rpr = R programming, ETD = Edit and test data, Rla = R language, DST = Dataset can be partitioned for training, Val = Validation, Tst = Testing, GUI = Graphical user interface, Ora = Orange, Pyt = Python, Bio = Bioinformatics, Daa = Data analytics, OSNE = Open source tool for novices and experts, CMSR = CMSR (Cramer Modeling Segmentation and Rules) data miner, Prm = Predictive modeling, Dav = Data visualization, Rme = Rule based model evaluation, Sda = Statistical data analysis, Som = Self-organizing maps, Ost = Open source tool, Mlt = Mallet, Mlr = Machine learning, Iex = Information extraction, Algo = Wide variety of algorithms.
Grl = GraphLab, Dcf = Distributed computation framework, Vda = Visualization of data, Psr = Predictive services, Hpf = High performance, Dlib = Dlib, Lib = Library, Mal = Machine learning, Trd = Threading, NAlg = Numerical algorithms, GUI = Graphical user interfaces, OSS = Open source software, UIMA = UIMA (Unstructured Information Management Architecture), Jav = Java, Lid = Language identification, Lss = Language specific segmentation, Sbt = Sentence boundary detection, Auc = Analyse unstructured content such as audio, video and text, WCns = Wrap components as network services, Aca = Advanced cluster analysis, Odt = Outlier detection, PAlg = Parameterizable algorithms, Hpf = High performance, Sbl = Scalability, Lis = Libsvm, Cla = Classification, Pes = Probability estimates, Git = Graphic interface, Lil = Liblinear, Aps = Automatic parameter selection, Pes = Probability estimates, Sit = Simple interface, VWB = Vowpal Wabbit, Pyt = Python, LAlg = Multiple learning algorithms, OAlg = Multiple optimization algorithms, OS = Open source, Shg = Shogun, IMmd = Implementation of hidden Markov models, Reg = Regression, Cla = Classification, Apm = Apache Mahout, Cft = Collaborative filtering, Clu = Clustering, Cla = Classification, Ecp = Evaluating classifier performance, OSS = Open source software, Mlf = ML-Flex, Mlp = Machine learning packages, Acn = Computing nodes are analysed in parallel, HTMLrt = HTML reports, Cli = Command-line interface, Mlpy = Mlpy, Pyt = Python, SUpr = Supervised and unsupervised problems, Mdl = Modularity, Mtb = Maintainability, Usb = Usability, OS = Open source.
= Good, 0 = Poor, 11 = Excellent, + = Average, - = Needs improvement, None = Does not exist, NE = Not evaluated, Med = Median, Ova = Overall average, Tch = Technology, Tol = Tools, Cpb = Capability, Usb = Usability, Ipb = Interoperability, Fbl = Flexibility, Acc = Accuracy, Ovl = Overall, Prc = Price in dollars, Tre = Tree, Crt = Cart, Scn = Scenario, See = See5, Tra = Tree average, Rul = Rule, Rua = Rule average, Wzy = WizWhy, Dmd = Datamind, Dms = DMSK, Nrl = Neural, Ns2 = NeuroShell 2, Plp = PcOLPARS, Prw = PRW, Nua = Neural average, Pln = Poly Net, Mqe = MQ Expert, Gno = Gnosis, Pna = Poly Net average, Ctg = Category, Dlm = Data load and manipulation, Mdb = Model building, Mdu = Model understanding, Tcs = Technical support, Ous = Overall usability, GSY = Graph symbol, TGS = Time of graph symbol, VAR = Variables, NSP = Number in sample, MEAN = Mean, SDV = Standard deviation, Agr = Age groups, Dtr = District, Bpa = Back pain, Drv = Drive, Aln = Alone, Gls = Glasses, Dnt = Drive at night, Mle = Miles, Dwe = Days a week, Crs = Crashes, CRH = Crashes, STS = Statistic, RNG = Range, MNM = Minimum, MEAN = Mean, SER = Standard error, STD = Standard, VAR = Variance, Non = None, Agr = Age groups, Dtr = District, Bpa = Back pain, Drv = Drive, Aln = Alone, Gls = Glasses, Dnt = Drive at night, Mle = Miles, Dwe = Days a week, Acc = Accident, RPR = Result of prediction, NRD = Number of records, ACC = Accuracy, Crt = Correct, Wrg = Wrong, Ttl = Total, NCR = No crash, CRH = Crash, NAR = Number of actual records, CST = Construct, ITM = Items, FLD = Factor loading, TVL = T-value, MEAN = Mean, STD = Standard deviation, CRB = Cronbach's, Cnf = Confirmation, Cit = Continuance intention, Hbt = Habit, Ust = User satisfaction, Ttf = Task technology fit, MDL = Models, Pst = Present, Hbt = Habit, CLM = Column, VLU = Value, FVR = Favors, RIM = Relative impact, Syr = Study year,
Fpt = Final points, Apt = Activities points, Ept = Exam points, Ept = Empty, CTG = Category, NAM = Name, RPR = Rapid Miner, WEK = Weka, ORG = Orange, KNM = Knime, DIM = Data import, Tfl = Textual files, Sin = Specific input format, Ssh = Spread sheet, FSl = Feature selection, Flt = Filters, Wrp = Wrappers, FTF = Feature transformation, CRL = Classification rules, Rul = Rule, Part = Part, Rpr = Ripper, BNT = Bayesian networks, Nbv = Naïve Bayes, Fbn = Full Bayesian network, AOde = AODE, ESM = Ensemble, Bgg = Bagging, LRN = Learning, Abt = AdaBoost, Rft = Random Forest, Rrt = Rotation Forest, NAM = Name, RPR = Rapid Miner, WEK = Weka, ORG = Orange, KNM = Knime, SLN = Scikit-learn, Bdt = Big data, Gmg = Graph mining, Sda = Spatial data analysis, Tsa = Time-series analysis, Dst = Data streams, Tmg = Text mining, Dlg = Deep learning, Sl = Special, 1 = Yes, 0 = No, L = Large, S = Small, CTG = Categories, ALGO = Algorithm, TDT = Type of data, HHD = Handling high dimensionality, HND = Handling noisy data, Hch = Hierarchical, Brh = Birch, Cre = Cure, Rck = Rock, Ptn = Partitioning, Kmn = K-Means, Grd = Grid, Ogd = Optigrid, Clq = Clique, Stg = Sting, Irl = Iterative relocation, Cwb = Cobweb, Cst = Classit, Dst = Density, Dcn = DBSCAN, Dcl = DBCLASD, Ots = Optics.
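The clustering keys above include Kmn = K-Means, one of the partitioning algorithms compared in Table 20. As a minimal illustrative sketch only (not taken from any of the surveyed tools; the kmeans function and the toy two-group data set below are our own hypothetical example), Lloyd's K-Means iteration can be written in plain Python:

```python
# Illustrative sketch of K-Means (Lloyd's algorithm) on 2-D points.
import random

def kmeans(points, k, iters=100, seed=0):
    """Cluster a list of (x, y) tuples into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initial centroids: k random points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new = [(sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
               if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:                   # converged: assignments are stable
            break
        centroids = new
    return centroids, clusters

# Two obvious groups, around (0, 0) and (10, 10).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(data, k=2)
print(sorted(len(c) for c in clusters))        # -> [3, 3]
```

On this toy data the assignment and update steps stabilise after a couple of iterations, splitting the six points evenly between the two groups; note that, unlike the density-based methods keyed above (DBSCAN, OPTICS), K-Means must be told the number of clusters k in advance.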