International Journal of Electronics and Electrical Engineering Vol. 4, No. 5, October 2016
A Novel Staged Modeling Mechanism for Process Object
Kun Zhang, Kai Wang, Qingbei Guo, Lianjiang Zhu, and Shouning Qu
Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
Email: {ise_zhangk, ise_wangk, ise_guoqb, ise_zhulj, qsn}@ujn.edu.cn
Abstract—A cloud model based staged modeling mechanism for process objects is proposed in this paper. The scheme aims to improve the efficiency and effectiveness of process object modeling on large-scale data. In this mechanism, the cloud model is used to characterize the features of the time-series data of process objects in the industrial area. Then a data partition approach based on k-means is proposed for data preprocessing. The partitioned data can be used as the input of the modeling technique. The different stages can be modeled in parallel, which improves the efficiency of modeling and describes the model more precisely. Experiments and analysis show the efficiency and effectiveness of our staged modeling mechanism.
Index Terms—process object, large-scale data, cloud model, mathematical modeling, staged modeling
I. INTRODUCTION
With the development and maturity of cloud computing and big data, more and more industries adopt the concepts of big data and cloud computing to collect, transfer, store and manage data for further study and research. Discovering useful or potential knowledge and rules is the key to improving the efficiency and effectiveness of large-scale data usage. In the industrial area in particular, the uncertainty, complexity, nonlinearity and strong correlation of industrial processes make modeling difficult when the data volume is large and the process is complex.
There are several challenges for industrial process modeling in such situations. First, the data volume is large, so finding appropriate features and sample data for modeling is a challenge. Second, the data are time dependent. In some situations, data belonging to different time periods have different impacts on the mathematical model, so how to evaluate the impact of data from different time periods and how to apply that impact to the modeling is another challenge. Both concerns affect the efficiency of modeling.
To improve the efficiency and effectiveness of mathematical modeling of industrial process objects, data preprocessing must be done. In the industrial area, the time-series data from a process object can be clustered according to features customized for the data mining purpose. Then parallel algorithms can be used to efficiently seek the interesting rules or potential trends hidden in the large data sets.
In order to improve the efficiency of mathematical modeling for industrial process objects, time-series data should be preprocessed before the modeling phase. In this paper, time-series data are fragmented into atom units of a fixed time length, such as 12 hours or 24 hours. The data from each atom unit are then modeled to obtain their features, and adjacent units are clustered into the same, bigger unit according to feature similarity. After clustering adjacent atom units, a hierarchical clustering structure can be constructed, and the clustered data can then be modeled separately. Using this strategy, a time-phased mathematical model can achieve more accurate results.
The contributions of this paper are as follows. First, we propose a novel cloud model based clustering approach for time-series data in the industrial area. The cloud model is used to describe the features of the data from a certain time period and then serves as the merging criterion. Clustering adjacent time-series data with similar cloud model features reduces the volume of input data for mathematical modeling and thereby improves efficiency. Second, we propose a staged mathematical modeling strategy for industrial process objects. Compared with a holistic, integral mathematical model, the staged model can achieve more accurate results. The staged model can also serve as the basis for adaptive or progressive modeling of industrial process objects, which is our future work. Finally, we apply this approach to the mathematical modeling of an industrial process object in the power industry, and experiments in the power industry show the effect of the approach.
This paper is organized as follows. Section II surveys related works. Section III gives the main idea of our approach. The detailed solution is presented in Section IV. Experiments and analysis are given in Section V, and Section VI concludes the paper.
Manuscript received July 12, 2015; revised January 14, 2016.
©2016 Int. J. Electron. Electr. Eng.
doi: 10.18178/ijeee.4.5.454-458
II. RELATED WORKS
Research on time-series data is attracting more and more attention. For time-series data in the industrial area, there have been many research issues, including the representation, measurement, classification, clustering, forecasting and modeling of time-series data [1].
Chen et al. proposed the flexible neural tree [2] for modeling time-series data, and later a parallel evolving algorithm [3] for the flexible neural tree to improve modeling efficiency, especially for large data sets. The basic idea of the flexible neural tree is to construct a flexible tree that models the relationship between the root node variable and the leaf node variables. The structure of the flexible neural tree and the parameters of each node can be trained on large-scale data to obtain a more appropriate model. However, this approach treats the training data as a whole and does not pay attention to the characteristics of the training data itself.
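To make the tree-structured model more concrete, the following is a minimal sketch of evaluating a flexible neural tree. It assumes the Gaussian flexible activation f(a, b, net) = exp(-((net - a)/b)^2) commonly used in flexible neural tree formulations; the class names `Leaf` and `FlexNeuron` are hypothetical, and the evolutionary search over structure and parameters described in [2], [3] is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Leaf:
    """Leaf node: passes through one input variable x[index]."""
    index: int

    def evaluate(self, x: Sequence[float]) -> float:
        return x[self.index]


@dataclass
class FlexNeuron:
    """Internal node: weighted sum of its children, passed through the
    flexible activation f(a, b, net) = exp(-((net - a) / b) ** 2)."""
    children: List["Node"]
    weights: List[float]
    a: float
    b: float

    def evaluate(self, x: Sequence[float]) -> float:
        net = sum(w * child.evaluate(x) for w, child in zip(self.weights, self.children))
        return math.exp(-(((net - self.a) / self.b) ** 2))


Node = Union[Leaf, FlexNeuron]

# Hand-built example tree: the root combines leaf x0 with a neuron over (x1, x2).
# In FNT training, both this structure and (weights, a, b) would be evolved.
tree = FlexNeuron(
    children=[Leaf(0), FlexNeuron([Leaf(1), Leaf(2)], [0.5, 0.5], a=0.0, b=1.0)],
    weights=[1.0, 1.0],
    a=0.5,
    b=2.0,
)
value = tree.evaluate([0.2, 0.4, 0.6])
print(value)
```

The root output is the modeled variable and the leaves are the input variables, which is the root/leaf relationship described above.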
Many methods exist for extracting the characteristics of data, including clustering. For time-series data, traditional clustering approaches can either be adapted to cluster time-series data directly, or the time-series data can be transformed into static data before clustering [4]. The latter is called the feature-based or model-based approach. Traditional clustering approaches that can be used for feature extraction of time-series data include k-means [5], [6], DBSCAN [7], OPTICS [8] and other clustering algorithms.
Before clustering, time-series data can be transformed into an appropriate form, and many transformation approaches exist. Li proposed the cloud model [9], an effective tool for transforming between qualitative concepts and their quantitative expressions. The digital characteristics of a cloud, the expected value (Ex), entropy (En) and hyper-entropy (He), integrate the fuzziness and randomness of linguistic concepts in a unified way. Given sample data, the backward cloud generator can be used to obtain the backward cloud model [10], which is used to express the features of the time-series data.
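As an illustration of how the three digital characteristics turn a qualitative concept into quantitative data, here is a minimal sketch of the standard forward normal cloud generator: for each drop, an entropy sample En' ~ N(En, He^2) is drawn, then x ~ N(Ex, En'^2), with certainty degree y = exp(-(x - Ex)^2 / (2En'^2)). The function name and the NumPy-based implementation are our own assumptions, not taken from [9] or [10].

```python
import numpy as np


def forward_normal_cloud(ex: float, en: float, he: float, n: int, seed: int = 0):
    """Generate n cloud drops (x_i, y_i) from the characteristics (Ex, En, He).

    x_i is the quantitative value of a drop; y_i is its certainty
    (membership) degree with respect to the qualitative concept.
    """
    rng = np.random.default_rng(seed)
    # Second-order randomness: each drop gets its own entropy sample En'.
    en_prime = rng.normal(en, he, size=n)
    # Drop position around the expectation, spread by |En'|.
    x = rng.normal(ex, np.abs(en_prime))
    # Certainty degree of each drop.
    y = np.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))
    return x, y


x, y = forward_normal_cloud(ex=50.0, en=2.0, he=0.2, n=1000)
print(x.mean(), y.min(), y.max())
```

The backward cloud generator is the inverse direction: it estimates (Ex, En, He) from drops like these.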
For time-series modeling, the flexible neural tree can be used for modeling and prediction [11]. A schema for mining state association rules of process objects has also been proposed [12]; it provides an algorithm flow to discover the state association rules hidden in large data sets. Based on the flexible neural tree, the relationships between states can be modeled. However, states belonging to different contexts should be processed specifically.
There are many problems involved in modeling time-series data. Owing to the complexity, nonlinearity and high relevancy of time-series data in the industrial area, modeling such data is becoming more and more difficult, especially with the explosion of data volume. To meet these new challenges, new modeling mechanisms for the time-series data of process objects on large-scale data should be studied, and the internal characteristics of the time-series data should be learned and used to support the modeling process. In this paper, we further distinguish the different phases of time-series data and propose a staged modeling mechanism. The characteristics of each phase are modeled by the cloud model.
III. PROCESS OBJECT MODELING ARCHITECTURE
A. Process Object Modeling
In our previous work [12], we proposed a scheme that discovers the state association rules of process objects. The scheme aimed to uncover the hidden close relationships among different links of the process object. It consists of the following main steps: data sampling, timing analysis, clustering, association rule mining, association chain mining and state association rule generation, as shown in Fig. 1.
Figure 1. A schema for mining state association rules of process object.
B. Staged Process Object Modeling
The reasons for staged modeling of process objects are as follows. First, the rules hidden in large-scale process object data are complex owing to the complexity of the process object itself, which makes them more difficult to discover from the large-scale data set as a whole. Second, because the process object itself evolves in stages, the rules can be described more precisely when they are described stage by stage according to these features. We therefore propose the staged modeling mechanism shown in Fig. 2; the previous modeling mechanism is shown in Fig. 3.
Figure 2. Staged modeling mechanism paradigm.
Figure 3. Non-staged modeling.
The basis of stage division can be diverse, and many features could serve as the basis for dividing stages. We propose a cloud model based method to divide the stages; the details are presented in Section IV.
IV. STAGE-BASED MODELING FOR PROCESS OBJECT
A. Staged Modeling Architecture
The staged modeling architecture is shown in Fig. 4. The stage-based modeling mechanism consists of the following steps. First, the cloud model is used to characterize the features of time-series data in the industrial area. Then we propose a data partition approach for data preprocessing. The partitioned data, which may be successive in time, are used as the input of a modeling technique such as the flexible neural tree.
B. Cloud Model Based Data Feature Extraction
In the industrial area, time-series data can be characterized by the cloud model.
Definition 1. Atom Period. An atom period is a duration that cannot be further segmented, and the data from an atom period are used as a whole. An atom period should be customized by experts according to the application area, e.g., one minute, one hour, or one day.
Using the backward cloud generator [10], the three feature parameters of the cloud model, Ex, En and He, can be obtained. The algorithm is shown in Table I.
TABLE I. BACKWARD CLOUD GENERATOR ALGORITHM
Algorithm 1: Backward cloud generator algorithm
Input:
D: data set with n data points, i.e., D = {d_i | 1 ≤ i ≤ n, and d_i = (x_1, x_2, …, x_L)}
Y: membership degree set of D, Y = {y_i | y_i is the membership degree of d_i in D}
Output:
Ex: expected value
En: entropy
He: hyper-entropy
Algorithm
(1) Using D and Y, fit the cloud expectation curve $y = e^{-\frac{(x-E_x)^2}{2E_n^2}}$, and then get $E_x$
(2) Drop the data points with y > 0.999; m data points remain
(3) Get $E'_{n_i}$ using the equation $E'_{n_i} = \frac{|x_i - E_x|}{\sqrt{-2\ln y_i}}$
(4) Get $E_n$ using the equation $E_n = \frac{1}{m}\sum_{i=1}^{m} E'_{n_i}$
(5) Get $H_e$ using the equation $H_e = \sqrt{\frac{1}{m-1}\sum_{i=1}^{m}\left(E'_{n_i} - E_n\right)^2}$
C. Clustering
Based on the atom period, the data from the same atom period can be analyzed as a whole. The data from successive atom periods can be merged into a larger unit if they have the same or similar features, and data with the same or similar features can be considered as a whole in modeling.
The data from an atom period form an entity called an atom context. We use atom contexts to transform the raw data sets into data sets of atom contexts based on the cloud model. These atom contexts are then clustered according to their similarity.
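The transformation of raw data into atom contexts can be sketched as follows. This is a minimal illustration under several assumptions: one variable per atom period (the paper's data points may have L dimensions), membership degrees Y already available for each point, and step (1) of Table I implemented by fitting a quadratic to ln y with `numpy.polyfit`, since ln y = -(x - Ex)^2 / (2En^2) is quadratic in x. The function names `backward_cloud`, `atom_contexts` and `drops` are hypothetical, and the synthetic series exists only to make the example runnable.

```python
import numpy as np


def backward_cloud(x: np.ndarray, y: np.ndarray):
    """Estimate (Ex, En, He) from drops x with certainty degrees y (Table I steps)."""
    # Step (1): ln y = -(x - Ex)^2 / (2 En^2) is quadratic in x, so fit
    # ln y = c2*x^2 + c1*x + c0 and take the vertex Ex = -c1 / (2*c2).
    c2, c1, _ = np.polyfit(x, np.log(y), 2)
    ex = -c1 / (2.0 * c2)
    # Step (2): drop points with y > 0.999 (ln y close to 0 makes En' unstable).
    keep = y <= 0.999
    xm, ym = x[keep], y[keep]
    # Step (3): per-point entropy En'_i = |x_i - Ex| / sqrt(-2 ln y_i).
    en_i = np.abs(xm - ex) / np.sqrt(-2.0 * np.log(ym))
    # Step (4): En is the mean of En'_i; step (5): He is their sample std (m - 1).
    return ex, en_i.mean(), en_i.std(ddof=1)


def atom_contexts(x: np.ndarray, y: np.ndarray, period: int) -> np.ndarray:
    """Split the series into consecutive atom periods of `period` samples and
    describe each one by its (Ex, En, He) triple, i.e. its atom context."""
    n_periods = len(x) // period  # a trailing partial period is ignored
    return np.array([
        backward_cloud(x[i * period:(i + 1) * period], y[i * period:(i + 1) * period])
        for i in range(n_periods)
    ])


# Hypothetical synthetic series: forward-cloud drops from two regimes,
# four atom periods of 200 samples each.
rng = np.random.default_rng(1)


def drops(ex, en, he, n):
    en_p = rng.normal(en, he, n)
    x = rng.normal(ex, np.abs(en_p))
    return x, np.exp(-((x - ex) ** 2) / (2.0 * en_p ** 2))


parts = [drops(10.0, 1.0, 0.1, 200), drops(10.0, 1.0, 0.1, 200),
         drops(30.0, 2.0, 0.2, 200), drops(30.0, 2.0, 0.2, 200)]
x = np.concatenate([p[0] for p in parts])
y = np.concatenate([p[1] for p in parts])
contexts = atom_contexts(x, y, period=200)
print(np.round(contexts, 2))  # one (Ex, En, He) row per atom period
```

Each row of `contexts` is one atom context; the rows are what the clustering step groups together.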
Figure 4. Staged modeling architecture.
TABLE II. CONTEXT CLUSTERING ALGORITHM BASED ON K-MEANS
Algorithm 2: Context clustering algorithm based on K-means
Input:
k: number of clusters
D: data set with n atom contexts
Output:
C: cluster set with k clusters
Algorithm
(1) Randomly select k atom contexts from D as the initial cluster centroids
(2) repeat
(3) Assign every atom context to the most similar cluster according to the mean values of the clusters
(4) Update the mean value of each cluster
(5) until no more changes happen
We use the k-means method to cluster atom contexts into larger contexts. The data from the same context can be modeled as a whole. The clustering method based on k-means is shown in Table II.
In this approach, the value of k can be customized by domain experts according to the actual process.
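Algorithm 2 can be sketched in plain NumPy as follows. The sketch assumes each atom context is represented by its (Ex, En, He) triple, uses Euclidean distance as the similarity measure, and standardizes the three features first so that Ex does not dominate. These choices, the function name `cluster_contexts`, and the example triples are illustrative assumptions rather than details given in the paper.

```python
import numpy as np


def cluster_contexts(contexts: np.ndarray, k: int, seed: int = 0, max_iter: int = 100):
    """k-means over atom contexts (rows of (Ex, En, He)); returns one label per context."""
    # Standardize each feature (assumption: keeps Ex from dominating En and He).
    std = contexts.std(axis=0)
    feats = (contexts - contexts.mean(axis=0)) / np.where(std > 0, std, 1.0)
    rng = np.random.default_rng(seed)
    # (1) Randomly select k atom contexts as the initial centroids.
    centroids = feats[rng.choice(len(feats), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):  # (2) repeat
        # (3) Assign every context to the nearest (most similar) centroid.
        dist = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        # (5) until no more changes happen
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # (4) Update the mean of each cluster (an emptied cluster keeps its centroid).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = feats[labels == j].mean(axis=0)
    return labels


# Hypothetical atom contexts for six consecutive periods from two operating regimes.
contexts = np.array([
    [10.0, 1.0, 0.10], [10.2, 1.1, 0.12], [9.9, 0.9, 0.10],
    [30.0, 2.0, 0.20], [29.8, 2.1, 0.22], [30.1, 1.9, 0.20],
])
labels = cluster_contexts(contexts, k=2)
print(labels)
```

Consecutive periods that share a label form one larger context, i.e. one stage to be modeled as a whole.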
D. Flexible Neural Tree Based Modelling
Based on the flexible neural tree [1], we model every context obtained from the context clustering algorithm. The algorithm is shown in Table III.
TABLE III. FLEXIBLE NEURAL TREE BASED MODELING
Algorithm 3: Flexible neural tree based modeling
Input:
C: cluster set with k clusters of contexts
Conf: initial parameters of the flexible neural tree
Output:
M: staged model, i.e., M = {Mi | 1 ≤ i ≤ k, and Mi is the model of clustered context i in C}
Algorithm
(1) Initialize the flexible neural tree parameters with Conf, and initialize M
(2) repeat
(3) Select an unmodeled context Ci from C
(4) Model Ci with the flexible neural tree to get the corresponding model Mi
(5) Put Mi into M
(6) until every clustered context in C is modeled
In order to improve the efficiency of flexible neural tree based modeling, the parallel evolving algorithm [3] can be applied.
V. EXPERIMENTS AND ANALYSIS
We have performed experiments to show that the staged modeling mechanism is effective. The power generation system of the electric power industry is a typical process industry system, and its whole process flow generates large-scale multi-dimensional data. We have previously performed experiments on the historical data of a subsystem of a power plant to mine state association rules [12]. Based on these rules, we model the process object using the staged modeling mechanism proposed in this paper.
In this experiment, we select 13000 records for demonstration. The k value in k-means clustering is set to 2 for simplicity, and we use a flexible neural tree model with the variable X3 as its output. The result is shown in Fig. 5. It shows that the staged modeling mechanism is more accurate than non-staged modeling, because the staged modeling mechanism takes more information into account. However, more experiments and analysis are needed to prove the efficiency and effectiveness of the staged modeling mechanism.
Figure 5. Comparison between non-staged modeling and staged modeling.
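The per-cluster loop of Algorithm 3 can be sketched as follows. Because a flexible neural tree implementation does not fit in a short example, this sketch substitutes an ordinary least-squares linear model (`numpy.linalg.lstsq`) for the flexible neural tree. The stage labels are assumed to come from the context clustering step, and the function names and the synthetic two-stage data are hypothetical. The sketch only illustrates the staged (piecewise) structure; it does not reproduce the experiment above.

```python
import numpy as np


def fit_linear(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Stand-in for flexible neural tree training: least squares with a bias term."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    return np.column_stack([X, np.ones(len(X))]) @ coef


def staged_model(X: np.ndarray, y: np.ndarray, stage: np.ndarray) -> dict:
    """Algorithm 3: fit one model M_i per clustered context (stage label) i."""
    M = {}
    for i in np.unique(stage):  # repeat until every clustered context is modeled
        mask = stage == i
        M[int(i)] = fit_linear(X[mask], y[mask])
    return M


def predict_staged(M: dict, X: np.ndarray, stage: np.ndarray) -> np.ndarray:
    """Piecewise prediction: each sample is predicted by the model of its own stage."""
    out = np.empty(len(X))
    for i, coef in M.items():
        mask = stage == i
        out[mask] = predict_linear(coef, X[mask])
    return out


# Hypothetical data: the input-output relation changes between two stages.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(400, 2))
stage = np.repeat([0, 1], 200)
y = np.where(stage == 0, X @ [2.0, -1.0] + 1.0, X @ [-3.0, 0.5] + 4.0)
y = y + rng.normal(0, 0.05, size=400)

M = staged_model(X, y, stage)
mse_staged = np.mean((predict_staged(M, X, stage) - y) ** 2)
mse_single = np.mean((predict_linear(fit_linear(X, y), X) - y) ** 2)
print(f"staged MSE = {mse_staged:.4f}, single-model MSE = {mse_single:.4f}")
```

Because each stage's model is fitted independently, the loop body could also be distributed across workers, in line with the parallel modeling of stages mentioned in the abstract.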
VI. CONCLUSION
This paper proposed a staged modeling mechanism for process objects in the industrial area based on the cloud model. First, time-series data from the industrial area are divided into data units according to the atom period customized by experts. Then the data units are clustered using k-means to obtain different clusters. The data units from the same cluster are modeled as a whole to obtain a specific mathematical model based on the flexible neural tree. From the point of view of time, the time-series data are thus modeled as a piecewise function, which is the so-called staged modeling mechanism. The staged modeling mechanism can be more precise than a unified modeling mechanism.
Several issues remain for future work. First, the clustering method could be made more precise; hierarchical or density-based clustering methods could be used to improve the effectiveness of data partitioning. Second, based on the staged modeling technology, the context could be used to describe the stage information.
ACKNOWLEDGEMENTS
This work is supported by Foundation for Doctors of University of Jinan under Grant No. XBS1316.
REFERENCES
[1] P. Esling and C. Agon, “Time-Series data mining,” ACM Comput. Surv., vol. 45, no. 1, pp. 1-34, 2012.
[2] Y. Chen, B. Yang, and J. Dong, “Evolving flexible neural networks using ant programming and PSO algorithm,” LNCS, vol. 3173, pp. 211-216, 2004.
[3] L. Peng, B. Yang, L. Zhang, and Y. Chen, “A parallel evolving algorithm for flexible neural tree,” Parallel Computing, vol. 37, pp. 653-666, 2011.
[4] T. W. Liao, “Clustering of time series data - A survey,” Pattern Recognition, vol. 38, no. 11, pp. 1857-1874, 2005.
[5] S. P. Lloyd, “Least squares quantization in PCM,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129-136, 1982.
[6] T. Yang and J. Wang, “Clustering unsynchronized time series subsequences with phase shift weighted spherical k-means algorithm,” JCP, vol. 9, no. 5, pp. 1103-1108, 2014.
[7] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proc. KDD, 1996, pp. 226-231.
[8] M. Ankerst, M. M. Breunig, H. P. Kriegel, and J. Sander, “OPTICS: Ordering points to identify the clustering structure,” in Proc. SIGMOD, 1999, pp. 49-60.
[9] D. Li, K. Di, D. Li, and X. Shi, “Mining association rules with linguistic cloud models,” Journal of Software, vol. 11, no. 2, pp. 143-158, 2000.
[10] H. Lu, Y. Wang, D. Li, and C. Liu, “The application of backward cloud in qualitative evaluation,” Chinese Journal of Computers, vol. 26, no. 8, pp. 1009-1014, 2003.
[11] Y. Chen, B. Yang, J. Dong, and A. Abraham, “Time-Series forecasting using flexible neural tree model,” Inf. Sci., vol. 174, no. 3-4, pp. 219-235, 2005.
[12] Q. Song, Q. Guo, K. Wang, T. Du, S. Qu, and Y. Zhang, “A scheme for mining state association rules of process object based on big data,” Journal of Computer and Communications, vol. 2, pp. 17-24, 2014.
Kai Wang is a lecturer in the School of Information Science and Engineering, University of Jinan, Jinan, China. His research interests include data mining and big data.
Qingbei Guo is a lecturer in the School of Information Science and Engineering, University of Jinan, Jinan, China. His research interests include data mining, big data, and sensor networks.
Lianjiang Zhu is a lecturer in the School of Information Science and Engineering, University of Jinan, Jinan, China. His research interests include data mining and big data.
Shouning Qu is a professor in the School of Information Science and Engineering, University of Jinan, Jinan, China. His research interests include data mining and big data.
Kun Zhang is a lecturer in the School of Information Science and Engineering, University of Jinan, Jinan, China. He received his doctoral degree in computer software and theory from Shandong University in 2012. His research interests include data mining, data privacy, cloud computing, and web services.