Survey
International Journal of Science and Engineering Research (IJ0SER), Vol 3 Issue 11 November-2015 3221 5687, (P) 3221 568X

Improved And Ensemble Methods For Time Series Classification With COTE

Yamunadevi S1, Dr. S. Uma2, Bhuvaneshwari P3
1Student, 2Head of the Department, 3Assistant Professor, Department of Computer Science and Engineering, Hindusthan Institute of Technology, Coimbatore, Tamilnadu, India

Abstract - The main objective of this research is to analyze the state-of-the-art techniques for time series classification and to identify an efficient method that overcomes the major issues. The study also aims to improve classification accuracy and reduce time complexity using an ensemble algorithm. Methods: Different methods that provide efficient time series classification results are analyzed and compared in order to find an appropriate and effective method. Findings: Various research works have been analyzed and evaluated. From the analysis, the voting ensemble method is found to be better for time series classification, and it achieves superior performance in terms of precision, recall, f-measure and accuracy. Application/Improvements: The findings of this work show that the voting method provides better results than the other approaches.

Keywords: Data mining, time series classification, ensemble and voting method

1. INTRODUCTION

Data mining is used to extract important information from large repositories. It can be performed on data represented in quantitative, textual or multimedia forms. Data mining applications can use several parameters to examine the data, including association, sequence analysis, classification, clustering and forecasting. Association refers to patterns where one event is connected to another event, such as purchasing a pen and a bag.
Sequence or path analysis refers to patterns where one event leads to another event, such as the birth of a child and buying diapers. Classification is the identification of new patterns, such as a coincidence between duct tape purchases and plastic sheeting purchases. Clustering is finding and visually documenting groups of previously unknown facts, such as geographic location and brand preferences. Forecasting is discovering patterns from which one can make reasonable predictions regarding future activities. Data mining tools are used to forecast future trends and behaviors. In general, data mining is used in a wide range of profiling practices such as marketing, surveillance, fraud detection and scientific discovery.

An important concept is that building a mining model is part of a larger process that includes everything from defining the basic problem that the model will solve to deploying the model into a working environment. Data mining methods are the result of a long process of research and product development. This development began when business data was first stored on computers, continued with improvements in data access, and more recently produced technologies that allow users to navigate through their data in real time.

Classification is the process of analyzing data and assigning it to different classes. It is used to predict class labels based on a specified model. Time series classification is defined as constructing a classification model from labeled time series and using this model to predict the labels of unlabeled time series.
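As a rough illustration of the definition above, the following minimal sketch builds a "model" from labeled series and uses it to label an unlabeled one. It assumes a one-nearest-neighbor rule with Euclidean distance on equal-length series; the function names and the toy data are hypothetical and are not taken from any of the surveyed papers.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_1nn(train, query):
    """Return the label of the training series closest to `query`.

    `train` is a list of (series, label) pairs; the labeled series
    themselves act as the classification model.
    """
    _, nearest_label = min(train, key=lambda pair: euclidean(pair[0], query))
    return nearest_label


# Toy labeled data (hypothetical values, for illustration only).
train = [
    ([0.0, 1.0, 2.0, 3.0], "rising"),
    ([3.0, 2.0, 1.0, 0.0], "falling"),
]

print(classify_1nn(train, [0.1, 0.9, 2.2, 2.8]))
```

The nearest-neighbor rule is used here only because it is the simplest baseline mentioned in the surveyed work; any other classifier could take its place.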
Feature-based methods first extract features from the time series data and then apply classification methods such as support vector machines, k-nearest neighbor, regression and decision trees to the feature set. A novel time series primitive called time series shapelets is used to identify the most informative subsequences [1]. Time series similarity is an important topic that has received great attention in data mining applications, and several time series representations and distance measures have been proposed. Earlier research focused on shape-based similarity matching for low-dimensional datasets [2]. For high-dimensional datasets, ensemble approaches and shapelet approaches have been introduced. Several data mining concepts and techniques are used to evaluate time series classification data.

2. ANALYSIS OF RESEARCH METHODOLOGY

In [3], Abdullah Mueen et al. (2011) discussed logical shapelets for time series classification. The work addresses the time complexity of shapelet discovery by introducing a new algorithm that finds shapelets in less time than existing methods by an order of magnitude. The proposed algorithm is based on intelligent caching and reuse of computations, and on admissible pruning of the search space. Because the algorithm is so fast, it creates an opportunity to consider more expressive shapelet queries. In particular, the work shows for the first time an augmented shapelet representation, called logical shapelets, that distinguishes the data based on conjunctions or disjunctions of shapelets. It demonstrates the efficiency of the approach on the standard benchmark datasets used for these problems, and presents several case studies where logical shapelets significantly outperform the original shapelet representation and other time series classification methods.
The utility of the idea is illustrated in domains as varied as motion identification, robotics, and biometrics. This technique has therefore been identified as an improved method, since the approach offers a detailed study of higher classification accuracy on the given datasets. Its effectiveness is better than that of the conventional approaches. However, it does not improve the worst-case complexity.

In [4], Lexiang Ye et al. (2011) suggested a novel method based on time series shapelets. Classification of time series has attracted great interest over the past decade. The simple nearest neighbor approach is hard to beat on many time series problems, particularly for high-dimensional datasets, but it has drawbacks. First, the nearest neighbor approach requires storing and searching the complete dataset, resulting in high time and space complexity, which restricts its applicability, particularly on resource-restricted sensors. Second, beyond simple classification accuracy, it is often desirable to gain some insight into the data and to make the classification result more understandable, which the global characteristics of the nearest neighbor cannot provide. This research introduced a novel time series primitive, time series shapelets, which addresses these restrictions. Informally, shapelets are time series subsequences which are in some sense maximally representative of a class. The distance to the shapelet, rather than the distance to the nearest neighbor, can be used to classify objects. Extensive empirical evaluations in various domains show that classification algorithms based on the time series shapelet primitive can be interpretable, more accurate, and considerably faster.

In [5], Jason Lines et al. (2012) presented a shapelet transform for time series classification.
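The idea from [4] of classifying by the distance to a shapelet, rather than by the distance to the nearest neighbor, can be sketched as follows. This is a simplified illustration under stated assumptions: the shapelet and the distance threshold are taken as given (the discovery procedure is not shown), subsequences are compared with plain Euclidean distance without the normalization used in the shapelet literature, and all names and values are hypothetical.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any
    equal-length subsequence of `series`."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, d)
    return best


def classify_by_shapelet(series, shapelet, threshold, near_label, far_label):
    """Assign `near_label` when the series contains a close match to the shapelet."""
    if shapelet_distance(series, shapelet) <= threshold:
        return near_label
    return far_label


# Hypothetical shapelet: a single spike.
spike = [0.0, 5.0, 0.0]
with_spike = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0]
flat = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

for s in (with_spike, flat):
    print(classify_by_shapelet(s, spike, 1.0, "spike", "no spike"))
```

Because only the shapelet and a threshold are needed at prediction time, the training set does not have to be stored or searched, which is the storage advantage over the nearest neighbor approach noted in [4].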
The problem of time series classification (TSC), where any real-valued ordered data is considered a time series, presents a specific machine learning challenge, as the ordering of variables is often crucial in finding the best discriminating features. One of the most promising recent techniques is to find shapelets within a dataset. A shapelet is a time series subsequence that is identified as being representative of class membership. Earlier research in this area embedded the process of finding shapelets within a decision tree. This work proposes disconnecting the process of finding shapelets from the classification algorithm by means of a shapelet transformation. It describes how to extract the k best shapelets from a dataset in a single pass and how to use these shapelets to transform the data by computing the distances from a series to each shapelet. The research demonstrates that transformation into this new data space can improve classification accuracy while maintaining the explanatory power provided by shapelets.

In [6], Mustafa Gokce Baydogan et al. (2013) discussed a bag-of-features framework to classify time series. Time series classification is an important task with many challenging applications. A nearest neighbor (NN) classifier with dynamic time warping (DTW) distance is a strong solution in this research area. On the other hand, feature-based approaches have been proposed both as classifiers and to provide insight into the series, but these approaches have problems handling translations and dilations in local patterns. To overcome these issues, the work proposed a framework to classify time series based on a bag-of-features representation (TSBF). Multiple subsequences selected from random locations and of random lengths are partitioned into shorter intervals to capture the local information.
Consequently, features computed from these subsequences measure properties at different locations and dilations when viewed from the original series. This provides a feature-based approach that can handle warping, although differently from DTW. Moreover, a supervised learner, which handles mixed data types and different units, integrates location information into a compact codebook through class probability estimates. In addition, relevant global features can easily supplement the codebook. TSBF is compared to NN classifiers and other alternatives such as bag-of-words strategies, sparse spatial sample kernels and shapelets.

In [7], Houtao Deng et al. (2013) presented a time series forest for classification and feature extraction. A tree-ensemble technique, named time series forest (TSF), is introduced for time series classification. TSF uses a combination of entropy gain and a distance measure, referred to as the Entrance (entropy and distance) gain, for evaluating the splits. Important features that capture temporal characteristics are extracted from a high-dimensional feature space. The work illustrates that the Entrance gain increases the accuracy of TSF. TSF randomly samples features at each tree node, has computational complexity linear in the length of the time series, and can be built using parallel computing methods. The temporal importance curve is proposed to capture the temporal characteristics useful for classification. Finally, the work shows that TSF using simple features such as mean, standard deviation and slope is computationally efficient and outperforms strong competitors such as one-nearest-neighbor classifiers with dynamic time warping.

In [8], Jon Hills et al. (2014) discussed classification of time series by shapelet transformation. Time-series classification (TSC) problems present a particular challenge for classification algorithms: how to measure similarity between series.
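The simple interval features that [7] reports for TSF (mean, standard deviation and slope) can be illustrated with a short sketch. The code below only shows how such features might be computed over randomly sampled intervals; it does not implement the Entrance gain or the tree ensemble itself. The use of the population standard deviation, the least-squares slope against the time index, and the minimum interval length of two points are assumptions made for this example, and all names are hypothetical.

```python
import random
import statistics


def interval_features(series, start, end):
    """Mean, standard deviation and least-squares slope of series[start:end]."""
    window = series[start:end]
    n = len(window)
    mean = statistics.mean(window)
    std = statistics.pstdev(window)  # population standard deviation (assumption)
    x_mean = (n - 1) / 2
    denom = sum((x - x_mean) ** 2 for x in range(n))
    if denom == 0:
        slope = 0.0  # a single-point interval has no slope
    else:
        slope = sum((x - x_mean) * (y - mean) for x, y in enumerate(window)) / denom
    return mean, std, slope


def random_intervals(length, count, rng):
    """Sample `count` random (start, end) intervals covering at least two points."""
    intervals = []
    for _ in range(count):
        start = rng.randrange(0, length - 1)
        end = rng.randrange(start + 2, length + 1)
        intervals.append((start, end))
    return intervals


rng = random.Random(0)  # fixed seed so the sketch is repeatable
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
for start, end in random_intervals(len(series), 3, rng):
    print((start, end), interval_features(series, start, end))
```

Each feature costs time proportional to the interval length, which is consistent with the linear dependence on series length noted for TSF.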
A shapelet is a time-series subsequence that allows for TSC based on local, phase-independent similarity in shape. Shapelet-based classification uses the similarity between a shapelet and a series as a discriminatory feature. One advantage of the shapelet method is that shapelets are understandable and can provide insight into the problem domain. The original shapelet-based classifier embeds the shapelet-discovery approach in a decision tree and uses information gain to evaluate the quality of candidates, finding a new shapelet at each node of the tree via an enumerative search. Subsequent research concentrated mainly on methods to speed up the search. This work instead analyzes how best to use the shapelet primitive to build classifiers. It introduces a single-scan shapelet algorithm that finds the best k shapelets, which are used to construct a transformed dataset where each of the k features represents the distance between a time series and a shapelet. The main benefits over the embedded method are that the transformed data can be used in conjunction with any classifier, and that there is no recursive search for shapelets. The work shows that the transformed data, in conjunction with more complex classifiers, produces better accuracy than the embedded shapelet tree. It also evaluates three similarity measures that produce results equivalent to information gain in less time.

In [9], Anthony Bagnall et al. (2015) suggested time series classification with the collective of transformation-based ensembles (COTE). It is mainly aimed at increasing the classification accuracy over preceding research. The approach builds on the observation that TSC problems become easier after the series are transformed into alternative data spaces in which discriminatory features are more easily detected.
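The transformed dataset described in [8], where each of the k features is the distance between a time series and a shapelet, can be sketched as below. The sketch assumes that the k shapelets have already been selected (the single-scan search and the quality measures are not shown) and uses an unnormalized Euclidean distance; the names and the toy values are hypothetical.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any
    equal-length subsequence of `series`."""
    m = len(shapelet)
    return min(
        math.sqrt(sum((series[i + j] - shapelet[j]) ** 2 for j in range(m)))
        for i in range(len(series) - m + 1)
    )


def shapelet_transform(dataset, shapelets):
    """Map every series to a vector of k distances, one per shapelet."""
    return [[shapelet_distance(series, s) for s in shapelets] for series in dataset]


# Hypothetical shapelets and series, for illustration only.
shapelets = [[0.0, 5.0, 0.0], [1.0, 2.0, 3.0]]
dataset = [
    [0.0, 0.0, 5.0, 0.0, 0.0],
    [0.0, 1.0, 2.0, 3.0, 4.0],
]

for row in shapelet_transform(dataset, shapelets):
    print(row)
```

The resulting rows are ordinary fixed-length feature vectors, so they can be passed to any classifier, which is the benefit over the embedded shapelet tree that [8] emphasizes.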
COTE contains classifiers constructed in the time, frequency, change, and shapelet transformation domains, combined in alternative ensemble structures. The flat collective of transform-based ensembles (flat-COTE) weights the vote of each classifier by its cross-validation accuracy on the training data. The accuracy ranks of ensembles constructed on each transform domain are compared with flat-COTE and, for benchmarking, with 1-NN using Euclidean distance and dynamic time warping with the warping window set through cross-validation. The mean rank of flat-COTE is significantly better than that of all the other classifiers. The information provided by the shapelet transform and, to a lesser extent, the power spectrum and change domains, provides discriminatory features that are hard, if not impossible, to detect in the time domain. COTE provides better performance on the specified datasets than the other methods. The voting method produces higher performance in terms of accuracy, precision, recall and f-measure than the preceding research. This method has therefore been identified as the better technique, since the approach offers a detailed analysis of higher classification performance on the specified datasets. Its efficiency is better than that of the other techniques, and it also provides a pathway for further improvement.

3. COMPARISON METHODOLOGIES

This section gives an overview of the merits and demerits of the research methodologies whose functioning was discussed in depth in the previous section. From the following table, a better approach that offers considerable improvement can be identified. The table helps in identifying the efficient technique as well as in indicating the future enhancements that can be performed to overcome its shortcomings.

Table 1. Comparison of Research Methodologies

| S.No | Title of paper | Author name | Merits | Demerits |
|---|---|---|---|---|
| 1 | Logical-shapelets: an expressive primitive for time series classification | Mueen, Abdullah, Eamonn Keogh, and Neal Young | Efficiency is improved | It has worst-case complexity |
| 2 | Time series shapelets: a novel technique that allows accurate, interpretable and fast classification | Ye, Lexiang, and Eamonn Keogh | Space complexity is reduced | Training takes a long time |
| 3 | A shapelet transform for time series classification | Lines, Jason, et al. | Classification accuracy is increased | Performance degrades on multi-class problems |
| 4 | A bag-of-features framework to classify time series | Baydogan, Mustafa Gokce, George Runger, and Eugene Tuv | It deals with high-dimensional data | It does not suit particular datasets |
| 5 | A time series forest for classification and feature extraction | Deng, Houtao, et al. | High quality results | |
| 6 | Classification of time series by shapelet transformation | Hills, Jon, et al. | Essential features are selected; highly interpretable results; it handles noisy datasets effectively; it handles large volumes of data | Space efficiency is reduced; it is a slow process |
| 7 | Time-Series Classification with COTE: The Collective of Transformation-Based Ensembles | Bagnall, Anthony, et al. | Accuracy is increased; space storage is more robust; overall performance is improved; time complexity is reduced | It requires further improvement |

4. CONCLUSION

This section concludes the survey of recent progress in time series classification problems. Time series classification is an important problem, and various classification methods have been applied to it in research.
Classification accuracy and performance have been improved significantly by using various techniques. In this analysis, a survey of different methodologies was conducted, and various performance parameters were used to identify the approach that gives better classification results. The voting ensemble method provides higher classification performance than the other techniques.

5. ACKNOWLEDGEMENT

We, the authors, affirm that this is our own work and that there is no conflict of interest.

6. REFERENCES

1. Ye, Lexiang, and Eamonn Keogh, Time series shapelets: a new primitive for data mining, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2009.
2. Lin, Jessica, Rohan Khade, and Yuan Li, Rotation-invariant similarity in time series using bag-of-patterns representation, Journal of Intelligent Information Systems 39.2 (2012): 287-315.
3. Mueen, Abdullah, Eamonn Keogh, and Neal Young, Logical-shapelets: an expressive primitive for time series classification, Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2011.
4. Ye, Lexiang, and Eamonn Keogh, Time series shapelets: a novel technique that allows accurate, interpretable and fast classification, Data Mining and Knowledge Discovery 22.1-2 (2011): 149-182.
5. Lines, Jason, et al., A shapelet transform for time series classification, Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2012.
6. Baydogan, Mustafa Gokce, George Runger, and Eugene Tuv, A bag-of-features framework to classify time series, IEEE Transactions on Pattern Analysis and Machine Intelligence 35.11 (2013): 2796-2802.
7. Deng, Houtao, et al., A time series forest for classification and feature extraction, Information Sciences 239 (2013): 142-153.
8. Hills, Jon, et al., Classification of time series by shapelet transformation, Data Mining and Knowledge Discovery 28.4 (2014): 851-881.
9. Bagnall, Anthony, et al., Time-Series Classification with COTE: The Collective of Transformation-Based Ensembles, IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 9, 2015.