Download 5. MR FP-Growth algorithm for mining

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
2016China International Conference on Electricity Distribution (CICED 2016)
Xian Sep.
2016
Operation and Maintenance Analysis
for Power Communication Networks based on Big Data
Zhan SHI1
Siya XU2
1
Power Grid Control Center of Guangdong China South Power Grid
2
State Key Laboratory of Network and Switching Technology, Beijing University of Posts and
Telecommunications
Abstract
To support the reliable operation and reasonable construction demands of power communication
networks, this paper uses Big Data technologies to analyze the equipment operation and maintenance status
of power communications networks based on the equipment ledger, failure and maintenance data. Firstly,
the features of equipment operation and maintenance data of power communication networks are analyzed,
and then the equipment operation and maintenance analysis indicators and influence factors are proposed.
Next, the data preprocessing is completed using correlation analysis and principal component analysis. At
the same time, the data mining approach combined with MapReduce technology and FP-growth algorithm
is raised. Finally, the actual network data is mined using this method, and according to the data mining
result, the equipment operation and maintenance situation of power communication networks is analyzed
from the points of maintenance effect, maintenance cost and time distribution of maintenance, etc.
1.
Introduction
Currently, data analysis of power communication network operation and maintenance is still based
on the traditional experience-based judgment. And it’s lack of efficient mining and analysis means in
the era of big data.
2.
2.1.
Related works and description of the problem
Related works
The application of big data in smart grid is in its infancy. Big data research on electric power
communications network is merely about the framework and application analysis, and lack of specific
embodiments. Thus, we need to use the operation and maintenance information of the existing electric
power communication network equipment firstly. Then, we can apply the appropriate big data
processing scheme, and analyze the correlation between the different indicators. Finally, we try to
provide reasonable suggestions for equipment maintenance.
2.2.
Description of the problem
This article will regard the accounting, fault and maintenance data in different power
communication network systems as the input source. The effect of operation and maintenance, costs,
relationship in time and space are obtained via processing big data. Then we complete data
pre-processing through data cleaning, data integration and data statute according to data characteristics,
CICED2010 Session x
Page1/3
Paper No xxx
2016China International Conference on Electricity Distribution (CICED 2016)
Xian Sep.
2016
and obtain a valid data set. Based on data set, we accomplish the processing of large data sets by
combining the MapReduce and FP-growth technology, and get effective analysis results.
3.
Indicators and influencing factors of equipment operation and maintenance
Based on the maintenance characteristics of power communication network, the indicators and
influencing factors of equipment operation and maintenance are proposed. The indicators and
influencing factors include maintenance efficiency, maintenance costs and maintenance time
distribution. The three aspects are used to analyze the operation and maintenance of power
communication equipment.
3.1.
Maintenance efficiency
3.2.
Maintenance costs
(1) Time overhead; (2) Integrated cost; (3) Time distribution of maintenance
4.
Data pretreatment
To improve the quality of data mining, the data preprocessing is generated. Based on the theory of
data mining, the data pretreatment process is divided into three steps, as data cleaning, data integration
and data statute.
4.1.
Ddata cleaning
4.2.
Data integration
4.3.
Data statute
5.
MR FP-Growth algorithm for mining
By FP-Growth method, the problem of looking for long-frequency mode is converted into
recursively searching for some short modes in smaller condition database, and then connects suffix.
The most infrequent entries are used for suffix, which offers better selectivity.
The persuade coed of FP-Growth algorithm
FP-Growth algorithm
Input: object databases D, min_sup minimum supported threshold
Output: Complete set of frequent patterns
1
scann object databases D, construct FP-Tree’s root node
2
3
4
5
6
CICED2010 Session x
Page2/3
If Tree Contain a single path P then
For combination of path P ( b )
b È a and support_count=min_value
Else For every ai Î Linki
b = ai È a and support _ count = ai .support_ count
Paper No xxx
2016China International Conference on Electricity Distribution (CICED 2016)
Xian Sep.
2016
FP-Tree= T ree b
If T ree b ¹ Æ
7
FP_Growth( T ree b , b )
The MapReduce technology and FP-Growth algorithm are combined in this paper. Parallel
computing characteristic of Hadoop cluster is used to accelerate the convergence rate. Therefore, the
enhanced FP-Growthi algorithm (MR FP-Growth algorithm) is proposed with specific ideas as follows:
(1) implementation methods:
a) Dividing object databases, and distributing them to different nodes in the cluster
b) Computing support count for each node, aggregating these data to the same node, and
then constructing frequent item sets
c) Obtaining local frequent item sets by using FP-Growth algorithm
(2) implementation process:
a) TDB will be evenly distributed to different working nodes
b) Distributedly Computing F_List
c) Parallelly computing local frequent item sets. The affairs set will be distributed to the
same node to build FT i by Map. Reduce constructs SubFP-Tree.
6.
Result analysis
This paper selects fault data, equipment maintenance data and related resources ledger data as the
input source. All data is between 2010 and 2014 from a power company. Analyze the data by terms of
maintenance effect, maintenance costs and time distribution of the maintenance workload.
7.
Conclusion
In this paper, big data processing technique is used to analyze operation and maintenance of
power communication network equipment. Firstly, desired attribute statute is adopted to pretreatment
data, and to cluster the important attributes. Then, indicators and influencing factors of equipment
operation and maintenance are proposed, including maintenance efficiency, maintenance costs and
maintenance time distribution. The data mining method is proposed by combining the MapReduce
technology and the Apriori algorithm. Finally, fault data, equipment maintenance data and related
resources ledger data are analyzed by the proposed method. Analysis results can provide effective
support for operation and maintenance of network.
Key words:
Power telecommunication network, Equipment operation and maintenance, Big Data, Data mining
Author’s brief introduction and contact information
Zhan
SHI,
is
an
engineer
of
Power
Grid
Control Center of Guangdon
China South Power Grid, Guangzhou, China. His current research interests include Communication
and Information Systems. Email: [email protected].
XU Siya, is a Ph.D Student of State Key Laboratory of Networking and Switching Technology, Beijing
University of Posts and Telecommunications, China. Her current research interests include Smart Grid
and Network Management. Email: [email protected].
CICED2010 Session x
Page3/3
Paper No xxx