International Journal of Electronics and Electrical Engineering Vol. 4, No. 5, October 2016

A Novel Staged Modeling Mechanism for Process Object

Kun Zhang, Kai Wang, Qingbei Guo, Lianjiang Zhu, and Shouning Qu
Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
Email: {ise_zhangk, ise_wangk, ise_guoqb, ise_zhulj, qsn}@ujn.edu.cn

Abstract—A cloud-model-based staged modeling mechanism for process objects is proposed in this paper. The scheme aims to improve the efficiency and effectiveness of process object modeling on large-scale data. In this mechanism, the cloud model is used to characterize the features of the time-series data of process objects in the industrial area. A k-means-based data partition approach is then proposed for data preprocessing, and the partitioned data are used as the input of the modeling technique. The different stages can be modeled in parallel, which improves modeling efficiency and allows the model to be described more precisely. Experiments and analysis show the efficiency and effectiveness of the staged modeling mechanism.

Index Terms—process object, large-scale data, cloud model, mathematical modeling, staged modeling

Manuscript received July 12, 2015; revised January 14, 2016.
©2016 Int. J. Electron. Electr. Eng. doi: 10.18178/ijeee.4.5.454-458

I. INTRODUCTION

With the development and maturity of cloud computing and big data, more and more industries adopt the concepts of big data and cloud computing to collect, transfer, store, and manage data for further study. Discovering useful or latent knowledge and rules is the key to improving the efficiency and effectiveness of large-scale data usage. Especially in the industrial area, the uncertainty, complexity, nonlinearity, and strong correlations of industrial processes make modeling difficult when the data volume is large and the process is complex.

There are several challenges for industrial process modeling in this situation. First, the data volume is large, so finding appropriate features and sample data for modeling is a challenge. Second, the data are time dependent. In some situations, data belonging to different time periods have different impacts on the mathematical model, so how to evaluate the impact of data from different time periods and how to incorporate that impact into the modeling is another challenge. Both concerns relate to the efficiency of modeling.

To improve the efficiency and effectiveness of the mathematical modeling of industrial process objects, data preprocessing is required. In the industrial area, the time-series data of a process object can be clustered according to features customized for the data mining purpose, and parallel algorithms can then be used to efficiently find the interesting rules or latent trends hidden in the large data sets.

In this paper, the time-series data are preprocessed before the modeling phase. They are first fragmented into atom units of a fixed time length, such as 12 or 24 hours. The data of each atom unit are modeled to obtain their features, and adjacent units are then clustered into a larger unit according to feature similarity. Clustering adjacent atom units in this way builds a hierarchical clustering structure, and the clustered data can then be modeled separately. With this strategy, a time-phased mathematical model can achieve higher accuracy.

The contributions of this paper are as follows. First, we propose a novel cloud-model-based clustering approach for time-series data in the industrial area. The cloud model is used to describe the features of the data from a certain period and then serves as the merging criterion. Clustering adjacent time-series data with similar features based on the cloud model reduces the size of the input data for each mathematical model and thus improves efficiency. Second, we propose a staged mathematical modeling strategy for industrial process objects. Compared with a holistic, integral mathematical model, the staged model can be more accurate, and it can serve as the basis for adaptive or progressive modeling of industrial process objects, which is our future work. Finally, we apply this approach to the mathematical modeling of an industrial process object in the power industry, and the experiments in that area show the effect of the approach.

The rest of this paper is organized as follows. Section II surveys related work. Section III gives the main idea of our approach. The detailed solution is presented in Section IV. Experiments and analysis are given in Section V, and Section VI concludes the paper.

II. RELATED WORKS

Research on time-series data has attracted increasing attention. For time-series data in the industrial area, there has been much research on the representation, measurement, classification, clustering, forecasting, and modeling of time-series data [1]. Chen et al. proposed the flexible neural tree [2] for modeling time-series data, and a parallel evolving algorithm for the flexible neural tree was later proposed [3] to improve modeling efficiency, especially for large data sets. The basic idea of the flexible neural tree is to construct a flexible tree that models the relationship between the root-node variable and the leaf-node variables.
The structure of the flexible neural tree and the parameters of each node can be trained on large-scale data to obtain a more appropriate model. However, this approach treats the training data as a whole and does not take the characteristics of the training data themselves into account.

Many methods exist for extracting the characteristics of data, including clustering. For time-series data, traditional clustering approaches can either be adapted to time-series clustering, or the time-series data can be transformed into static data before clustering [4]; the latter are called feature-based or model-based approaches. Traditional clustering algorithms such as k-means [5], [6], DBSCAN [7], and OPTICS [8] can be used for feature extraction from time-series data. Before clustering, the time-series data can be transformed into an appropriate form, and many transformation approaches exist. Li proposed the cloud model [9], an effective tool for transforming between qualitative concepts and their quantitative expressions. The digital characteristics of a cloud, namely the expected value (Ex), entropy (En), and hyper-entropy (He), integrate the fuzziness and randomness of linguistic concepts in a unified way. Given sample data, the backward cloud generator can be used to obtain the cloud model [10], which is used to express the features of the time-series data.

For time-series modeling, the flexible neural tree can be used for modeling and prediction [11]. A scheme for mining state association rules of process objects has also been proposed [12]; it defines an algorithm flow for discovering the state association rules hidden in large data sets, and the relationships between states can then be modeled with the flexible neural tree. However, states belonging to different contexts should be handled separately.
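The flexible neural tree used as the modeling technique in this paper [2], [11] can be illustrated with a small forward-evaluation sketch. It assumes the flexible neuron form as we understand it from [11]: a flexible neuron operator +n computes the weighted sum net of its n children and outputs f(a, b, net) = exp(-((net - a)/b)^2), while leaf nodes return input variables. The class names, tree shape, weights, and parameters below are hypothetical, and the structure and parameter search described in [2], [3] is omitted.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    """Terminal node: returns the input variable x[index]."""
    index: int

    def evaluate(self, x: Sequence[float]) -> float:
        return x[self.index]


@dataclass
class FlexibleNeuron:
    """Flexible neuron operator +n, where n = len(children) (assumed form, see lead-in)."""
    children: Sequence["Node"]
    weights: Sequence[float]
    a: float  # centre of the flexible activation function
    b: float  # width of the flexible activation function

    def evaluate(self, x: Sequence[float]) -> float:
        # Total excitation: weighted sum of the children's outputs.
        net = sum(w * child.evaluate(x) for w, child in zip(self.weights, self.children))
        # Flexible activation f(a, b, net) = exp(-((net - a) / b)^2).
        return math.exp(-(((net - self.a) / self.b) ** 2))


Node = Union[Leaf, FlexibleNeuron]

# Hypothetical tree: root = +2(x0, +2(x1, x2)).
inner = FlexibleNeuron([Leaf(1), Leaf(2)], weights=[0.5, 0.5], a=0.0, b=1.0)
root = FlexibleNeuron([Leaf(0), inner], weights=[1.0, 1.0], a=1.0, b=2.0)
y = root.evaluate([0.5, 1.0, 1.0])
print(round(y, 4))  # prints 0.9956
```

In a full flexible neural tree, both the tree shape (which operators and leaves appear) and the numeric parameters (weights, a, b) are searched, which is where the large-scale training cost discussed above comes from.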
Modeling time-series data involves many problems. Owing to the complexity, nonlinearity, and strong correlations of time-series data in the industrial area, modeling such data is becoming increasingly difficult, especially with the explosion of data volume. Facing these new challenges, new mechanisms for modeling the time-series data of process objects at large scale should be studied, in which the internal characteristics of the time-series data are learned and used to support the modeling process. In this paper, we further distinguish the different phases of time-series data and propose a staged modeling mechanism, in which the characteristics of each phase are modeled by the cloud model.

III. PROCESS OBJECT MODELING ARCHITECTURE

A. Process Object Modeling

In our previous work [12], we proposed a scheme for discovering the state association rules of a process object, which aims to uncover the hidden close relationships between different links of the process object. The scheme consists of the following main steps: data sampling, timing analysis, clustering, association rule mining, association chain mining, and state association rule generation, as shown in Fig. 1.

Figure 1. A schema for mining state association rules of process object (blocks: Raw Data; Data Sampling; Time Series Computing; Clustering; Association Rules Mining; Association Chains Mining; State Association Rules Computing; State Association Rules).

B. Staged Process Object Modeling

The reasons for staged modeling of process objects are as follows. First, owing to the complexity of the process object itself, the rules hidden in large-scale process object data are complex, which makes them more difficult to discover when the large-scale data set is treated as a whole. Second, because the process object itself has staged features, the rules can be described more precisely if they are described stage by stage according to these features.
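A toy illustration of the second reason is sketched below in Python with NumPy. It is not the authors' experiment: the two-stage process, its coefficients, and the use of ordinary least-squares lines instead of a flexible neural tree are assumptions chosen only to show why one holistic model can fit staged data worse than one model per stage. The stage labels are given directly here, whereas in this paper they come from context clustering.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-stage process: the input/output relation changes halfway through.
n = 1000
x = rng.uniform(0.0, 1.0, n)
stage = np.repeat([0, 1], n // 2)  # stage labels, assumed known for this toy example
y = np.where(stage == 0, 2.0 * x + 1.0, -3.0 * x + 4.0) + rng.normal(0.0, 0.05, n)


def fit_mse(x, y):
    """Fit a straight line y = c1 * x + c0 by least squares and return its mean squared error."""
    coeffs = np.polyfit(x, y, 1)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


# Non-staged modeling: one model for all the data.
non_staged_mse = fit_mse(x, y)
# Staged modeling: one model per stage, errors averaged over the two equal-sized stages.
staged_mse = float(np.mean([fit_mse(x[stage == s], y[stage == s]) for s in (0, 1)]))

print(f"non-staged MSE: {non_staged_mse:.4f}, staged MSE: {staged_mse:.4f}")
```

The holistic line can only follow the average of the two stage relations, so its error stays large however much data it sees, while each per-stage line only has to absorb the measurement noise.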
The staged modeling mechanism is therefore proposed, as shown in Fig. 2, while the previous, non-staged modeling mechanism is shown in Fig. 3.

Figure 2. Staged modeling mechanism paradigm (blocks: Time-Series Data; Data Partitioning; Data 1, Data 2, …, Data n; Modeling; Model 1, Model 2, …, Model n).

Figure 3. Non-staged modeling (blocks: Time-Series Data; Modelling; Model).

The stage division can be based on many different features. We propose a cloud-model-based method to divide the stages; the details are presented in Section IV.

IV. STAGE-BASED MODELING FOR PROCESS OBJECT

A. Staged Modeling Architecture

The staged modeling architecture is shown in Fig. 4. The stage-based modeling mechanism includes the following steps. First, the cloud model is used to characterize the features of the time-series data in the industrial area. Then a data partition approach is applied for data preprocessing. The partitioned data, which may be successive in time, are used as the input of a modeling technique such as the flexible neural tree.

Figure 4. Staged modeling architecture (blocks: Time-Series Data; Data Partitioning into Data 1, Data 2, …, Data n; Cloud Model based Feature Extraction yielding Atom Context 1, Atom Context 2, …, Atom Context n; K-means Clustering yielding Context Clusters Context 1, …, Context m; Flexible Neural Tree based Modelling yielding the Staged Model Model stage-1, …, Model stage-m).

B. Cloud Model Based Data Feature Extraction

In the industrial area, time-series data can be characterized by the cloud model.

Definition 1. Atom Period. An atom period is a duration that cannot be further segmented; the data from an atom period are used as a whole. The atom period should be customized according to expert knowledge and the application area, e.g., one minute, one hour, or one day.

Using the backward cloud generator [10], the three feature parameters of the cloud model, Ex, En, and He, can be obtained. The algorithm is shown in Table I.

TABLE I. BACKWARD CLOUD GENERATOR ALGORITHM
Algorithm 1: Backward cloud generator algorithm
Input:
D: data set with n data points, i.e., D = {d_i | 1 ≤ i ≤ n, d_i = (x_1, x_2, …, x_L)}
Y: membership degree set of D, Y = {y_i | y_i is the membership degree of d_i in D}
Output:
Ex: expected value
En: entropy
He: hyper-entropy
Algorithm:
(1) Using D and Y, fit the cloud expectation curve $y = \exp\left(-\frac{(x - E_x)^2}{2E_n^2}\right)$, and then obtain $E_x$.
(2) Drop the data points with $y > 0.999$, retaining $m$ data points.
(3) For each retained point, obtain $E_{n_i}' = \frac{|x_i - E_x|}{\sqrt{-2\ln y_i}}$.
(4) Obtain $E_n = \frac{1}{m}\sum_{i=1}^{m} E_{n_i}'$.
(5) Obtain $H_e = \sqrt{\frac{1}{m-1}\sum_{i=1}^{m}\left(E_{n_i}' - E_n\right)^2}$.

C. Clustering

Based on the atom period, the data from the same atom period are analyzed as a whole. The data from successive atom periods can be merged into a larger unit if they have the same or similar features, and data with the same or similar features can be considered as a whole in modeling. The data from an atom period form an entity called an atom context. We can therefore use the cloud model to transform the raw data sets into data sets of atom contexts, and these atom contexts can then be clustered according to their similarity.

TABLE II. CONTEXT CLUSTERING ALGORITHM BASED ON K-MEANS
Algorithm 2: Context clustering algorithm based on k-means
Input:
k: number of clusters
D: data set with n atom contexts
Output:
C: cluster set with k clusters
Algorithm:
(1) Select k atom contexts from D at random as the initial cluster centroids
(2) repeat
(3) Assign every atom context to the most similar cluster according to the mean values of the clusters
(4) Update the mean value of each cluster
(5) until no more changes happen
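Algorithms 1 and 2 can be sketched together in Python with NumPy. This is a one-dimensional sketch under several assumptions: each atom context is a set of scalar cloud drops x with given membership degrees y; step (1) of Table I is implemented by a least-squares parabola fit to ln y, which is one possible way to fit the expectation curve (the paper does not specify the fitting method); and the drops are synthesized with the forward normal cloud generator rather than taken from real process data. The helper names forward_cloud, backward_cloud, and kmeans_contexts are hypothetical. Each atom context is then represented by its (Ex, En, He) vector and clustered with plain k-means under Euclidean distance.

```python
import numpy as np


def forward_cloud(ex, en, he, n, rng):
    """Forward normal cloud generator, used here only to synthesize test drops."""
    en_i = rng.normal(en, he, n)                    # per-drop entropy En'_i ~ N(En, He^2)
    x = rng.normal(ex, np.abs(en_i))                # drop x_i ~ N(Ex, En'_i^2)
    y = np.exp(-((x - ex) ** 2) / (2 * en_i ** 2))  # membership degree of x_i
    return x, y


def backward_cloud(x, y, y_max=0.999):
    """Algorithm 1 (Table I), one-dimensional: estimate (Ex, En, He) from drops (x, y)."""
    # Step (1): ln y = -(x - Ex)^2 / (2 En^2) is a parabola in x, so fit ln y by
    # least squares (on centred x for numerical stability) and take its vertex as Ex.
    mu = x.mean()
    a, b, _ = np.polyfit(x - mu, np.log(y), 2)
    ex = mu - b / (2 * a)
    # Step (2): drop points with y > y_max, where -2 ln y is close to zero.
    keep = y <= y_max
    # Step (3): En'_i = |x_i - Ex| / sqrt(-2 ln y_i) for the m retained points.
    en_i = np.abs(x[keep] - ex) / np.sqrt(-2 * np.log(y[keep]))
    # Steps (4)-(5): En is the mean of En'_i, He their sample standard deviation (m - 1).
    return np.array([ex, en_i.mean(), en_i.std(ddof=1)])


def kmeans_contexts(features, k, rng, max_iter=100):
    """Algorithm 2 (Table II): k-means over atom-context feature vectors."""
    # Step (1): pick k atom contexts at random as the initial centroids.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    labels = np.full(len(features), -1)
    for _ in range(max_iter):  # step (2): repeat
        # Step (3): assign each context to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # step (5): until nothing changes
            break
        labels = new_labels
        # Step (4): move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return labels, centroids


rng = np.random.default_rng(0)
# Hypothetical atom contexts: four drawn around Ex = 10 and four around Ex = 30.
true_params = [(10.0, 2.0, 0.1)] * 4 + [(30.0, 1.0, 0.05)] * 4
features = np.array([backward_cloud(*forward_cloud(ex, en, he, 2000, rng))
                     for ex, en, he in true_params])
labels, centroids = kmeans_contexts(features, k=2, rng=rng)
print(np.round(features, 2))
print(labels)
```

The features are not rescaled here, so Ex dominates the distance. With real data, Ex, En, and He may live on very different scales, and normalizing them before clustering is a design choice this sketch leaves open.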
We use the k-means method to cluster atom contexts into larger contexts, and the data from the same context are modeled as a whole. The k-means-based clustering method is shown in Table II. In this approach, the value of k can be customized by domain experts according to the actual process.

D. Flexible Neural Tree Based Modelling

Based on the flexible neural tree [1], we model every context obtained from the context clustering algorithm. The algorithm is shown in Table III.

TABLE III. FLEXIBLE NEURAL TREE BASED MODELING
Algorithm 3: Flexible neural tree based modeling
Input:
C: cluster set with k clusters of contexts
Conf: initial parameters of the flexible neural tree
Output:
M: staged model, i.e., M = {M_i | 1 ≤ i ≤ k, M_i is the model of clustered context i in C}
Algorithm:
(1) Initialize the flexible neural tree parameters with Conf, and initialize M
(2) repeat
(3) Select an unmodeled context C_i from C
(4) Model C_i with the flexible neural tree to obtain the corresponding model M_i
(5) Put M_i into M
(6) until every clustered context in C is modeled

To improve the efficiency of flexible-neural-tree-based modeling, the parallel evolving algorithm [3] can be applied.

V. EXPERIMENTS AND ANALYSIS

We performed experiments to show that the staged modeling mechanism is effective. The power generation system of an electric power plant is a typical process-industry system, and the whole process flow of the power system generates large-scale multidimensional data. We previously performed experiments on the historical data of a subsystem of a power plant to mine state association rules [12]. Based on these rules, we model the process object using the staged modeling mechanism proposed in this paper.

In this experiment, we select 13000 records for demonstration. The value of k in k-means clustering is set to 2 for simplicity, and a flexible neural tree model is constructed with variable X3 as its output. The result is shown in Fig. 5. It shows that the staged modeling mechanism is more accurate than non-staged modeling, because the staged mechanism takes more information into account. However, more experiments and analysis are needed to establish the efficiency and effectiveness of the staged modeling mechanism.

Figure 5. Comparison between non-staged modeling and staged modeling.

VI. CONCLUSION

This paper proposed a cloud-model-based staged modeling mechanism for process objects in the industrial area. First, the time-series data from the industrial area are divided into data units according to the atom period customized by experts. The data units are then clustered using k-means, and the data units from the same cluster are modeled as a whole to obtain a specific mathematical model based on the flexible neural tree. From the point of view of time, the time-series data are thus modeled as a piecewise function, which is what we call the staged modeling mechanism. The staged modeling mechanism can be more precise than a unified modeling mechanism.

Further work remains. First, the clustering method could be made more precise; hierarchical or density-based clustering methods could be used to improve the effectiveness of the data partition. Second, based on the staged modeling technology, the context could be used to describe the stage information.

ACKNOWLEDGEMENTS

This work is supported by the Foundation for Doctors of University of Jinan under Grant No. XBS1316.

REFERENCES

[1] P. Esling and C. Agon, “Time-series data mining,” ACM Comput. Surv., vol. 45, no. 1, pp. 1-34, 2012.
[2] Y. Chen, B. Yang, and J. Dong, “Evolving flexible neural networks using ant programming and PSO algorithm,” LNCS, vol. 3173, pp. 211-216, 2004.
[3] L. Peng, B. Yang, L. Zhang, and Y. Chen, “A parallel evolving algorithm for flexible neural tree,” Parallel Computing, vol. 37, pp. 653-666, 2011.
[4] T. W. Liao, “Clustering of time series data - A survey,” Pattern Recognition, vol. 38, no. 11, pp. 1857-1874, 2005.
[5] S. P. Lloyd, “Least squares quantization in PCM,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129-136, 1982.
[6] T. Yang and J. Wang, “Clustering unsynchronized time series subsequences with phase shift weighted spherical k-means algorithm,” JCP, vol. 9, no. 5, pp. 1103-1108, 2014.
[7] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proc. KDD, 1996, pp. 226-231.
[8] M. Ankerst, M. M. Breunig, H. P. Kriegel, and J. Sander, “OPTICS: Ordering points to identify the clustering structure,” in Proc. SIGMOD, 1999, pp. 49-60.
[9] D. Li, K. Di, D. Li, and X. Shi, “Mining association rules with linguistic cloud models,” Journal of Software, vol. 11, no. 2, pp. 143-158, 2000.
[10] H. Lu, Y. Wang, D. Li, and C. Liu, “The application of backward cloud in qualitative evaluation,” Chinese Journal of Computers, vol. 26, no. 8, pp. 1009-1014, 2003.
[11] Y. Chen, B. Yang, J. Dong, and A. Abraham, “Time-series forecasting using flexible neural tree model,” Inf. Sci., vol. 174, no. 3-4, pp. 219-235, 2005.
[12] Q. Song, Q. Guo, K. Wang, T. Du, S. Qu, and Y. Zhang, “A scheme for mining state association rules of process object based on big data,” Journal of Computer and Communications, vol. 2, pp. 17-24, 2014.

Kai Wang is a lecturer in the School of Information Science and Engineering, University of Jinan, Jinan, China. His research interests are data mining and big data.

Qingbei Guo is a lecturer in the School of Information Science and Engineering, University of Jinan, Jinan, China. His research interests are data mining, big data, and sensor networks.
Lianjiang Zhu is a lecturer in the School of Information Science and Engineering, University of Jinan, Jinan, China. His research interests are data mining and big data.

Shouning Qu is a professor in the School of Information Science and Engineering, University of Jinan, Jinan, China. His research interests are data mining and big data.

Kun Zhang is a lecturer in the School of Information Science and Engineering, University of Jinan, Jinan, China. He received his doctoral degree in computer software and theory from Shandong University in 2012. His research interests are data mining, data privacy, cloud computing, and web services.