CCBD 2016
The 7th International Conference on Cloud
Computing and Big Data
Book of Program
November 16-18, 2016
Macau, China
Sponsored by
Contents
Conference Committee
Conference Schedule
Keynotes
Oral Presentation Sessions
Poster Session
Macau Big Data Public Forum
Abstract
Index
    Keynote Speakers
    Paper Session
Conference Committee
Honorary Chairs
Macau, China
Wei Zhao
University of Macau
China
Wen Gao
National Natural Science Foundation of China
China
Hong Mei
Chinese Academy of Sciences
General Chairs
Macau, China
Lionel Ni
University of Macau
America
Geoffrey Charles Fox
Indiana University
Hong Kong, China
Benjamin W. Wah
The Chinese University of Hong Kong
Taiwan
Tei-Wei Kuo
National Taiwan University
Program Chairs
Macau, China
Yuan Yan Tang
University of Macau
Italy
Ernesto Damiani
University of Milan
America
Chengzhong Xu
Wayne State University
China
Chunming Hu
Beihang University
Local Organization Chairs
Macau, China
Chi Man Pun
University of Macau
Macau, China
Jian Tao Zhou
University of Macau
Long Chen
University of Macau
Yibo Zhang
University of Macau
Publicity Chairs
Macau, China
Publication Chairs
Macau, China
Finance and Registration Chairs
Macau, China
Leong Hou U
University of Macau
Steering Committee
Macau, China
C. L. Philip Chen
University of Macau
China
Runhua Lin
Chinese Institute of Electronics
China
Guangnan Ni
Academician, Chinese Academy of
Engineering
China
Rulin Liu
Chinese Institute of Electronics
China
Ke Liu
National Natural Science Foundation of China
United Kingdom
Ameer Al-Nemrat
University of East London
Australia
Bahman Javadi
University of Western Sydney
China
Bing Tang
Hunan University of Science and Technology
China
Chao Yin
Jiujiang University
China
Chen Xu
East China Normal University
America
Dakai Zhu
University of Texas at San Antonio
India
Deo Prakash Vidyarthi
Jawaharlal Nehru University
Singapore
Feida Zhu
Singapore Management University
Cyprus
George Pallis
University of Cyprus
China
Hao Chen
Nankai University
China
Haofen Wang
East China University of Science and
Technology
Taiwan
Hung-Chang Hsiao
National Cheng Kung University
China
Jingwei Zhang
East China Normal University
America
Jinoh Kim
Texas A&M University-Commerce
Australia
John Yearwood
Deakin University
America
Jyh-Haw Yeh
Boise State University
China
Kai Chen
Shanghai Jiao Tong University
Switzerland
Katarzyna Wac
University of Geneva
China
Kejun Dong
Chinese Academy of Sciences
China
Li Xu
Fujian Normal University
America
Prasad Kulkarni
University of Kansas
America
Seetharami Seelam
IBM Research
America
Serban Maerean
IBM System & Technology Group
Technical Committee
America
Tom Hacker
Purdue University
Brazil
Weigang Li
University of Brasilia
China
Xiaojun Hei
Huazhong University of Science and
Technology
China
Xijin Tang
CAS Academy of Mathematics & Systems
Science
Australia
Xinyi Huang
University of Wollongong
China
Yanming Shen
Dalian University of Technology
China
Yi Wang
Tsinghua University
China
Ying Yan
Microsoft Research Asia
America
Yuan Ding
Google
China
Yuwei Peng
Wuhan University
China
Zhipeng Gao
Beijing University of Posts and
Telecommunications
America
Abdelhalim Amer
Argonne National Laboratory
United Kingdom
Ahmad Afsahi
Queen's University
America
Gagan Agrawal
The Ohio State University
America
Haiying Shen
Clemson University
America
Krishna Kant
Temple University
America
Michela Taufer
University of Delaware
America
Rong Ge
Marquette University
America
Sangmin Seo
Argonne National Laboratory
Spain
Toni Cortes
Barcelona Supercomputing Center
America
Xin Yuan
Florida State University
China
Yunquan Zhang
Chinese Academy of Sciences
America
Zhiling Lan
Illinois Institute of Technology
Australia
Andrzej Goscinski
Deakin University
Taiwan
Ching-Hsien Hsu
Chung Hua University
South Africa
Ekow Otoo
University of Witwatersrand
America
Farokh Bastani
UT Dallas
Malaysia
Hairulnizam Mahdin
UTHM
America
Jerry Gao
San Jose State University
China
Jianxin Li
Beihang University
America
Jicheng Fu
University of Central Oklahoma
Saudi Arabia
M. Shamim Hossain
King Saud University
United Kingdom
Omer Rana
Cardiff University
United Kingdom
Paul Townend
University of Leeds
Canada
Ruppa Thulasiram
University of Manitoba
Taiwan
San-Yih Hwang
National Sun Yat-sen University
China
Wei He
Shandong University
America
Yinong Chen
Arizona State University
Australia
Yun Yang
Swinburne University of Technology
America
Zhonghang Xia
Western Kentucky University
Taiwan
Fu-Hau Hsu
National Central University
America
Geoffrey Charles Fox
Indiana University
United Kingdom
Shakeel Ahmad
De Montfort University
Italy
Danilo Ardagna
Politecnico di Milano
China
Xiaoying Bai
Tsinghua University
New Zealand
Jim Buchan
Auckland University of Technology
China
Jian Cao
Shanghai Jiao Tong University
America
Keke Chen
Wright State University
China
Yixiang Chen
East China Normal University
China
Wanchun Dou
Nanjing University
Austria
Schahram Dustdar
Vienna University of Technology
China
Yanbo Han
North China University of Technology
China
Qing He
Chinese Academy of Sciences
China
Yuan He
Tsinghua University
Taiwan
Robert C.H. Hsu
Chung Hua University
America
Dijiang Huang
Arizona State University
China
Hai Jin
Huazhong University of Science and
Technology
China
Xiaoyuan Jing
Wuhan University
Germany
Jan Jurjens
Technical University of Dortmund
New Zealand
Dong Seong Kim
University of Canterbury
Germany
Luigi Lo Lacono
Cologne University of Applied Sciences
China
Jinhu Lu
Chinese Academy of Sciences
United Kingdom
Graham Morgan
University of Newcastle upon Tyne
Denmark
Neeli Prasad
Aalborg University
America
Omer F. Rana
Cardiff University
China
Chao Peng
East China Normal University
Germany
Michael Resch
University of Stuttgart
China
Qinbao Song
Xi’an Jiaotong University
America
C. Chiu Tan
Temple University
China
Weiqin Tong
Shanghai University
America
Wei-Tek Tsai
Arizona State University
America
Andy Ju An Wang
Southern Polytechnic State University
China
Qing Wang
Chinese Academy of Sciences
America
Zhengping Wu
University of Bridgeport
America
Haiyong Xie
Yale University
China
Shengwu Xiong
Wuhan University of Technology
China
He Zhang
Nanjing University
China
Xiaolong Zhang
Wuhan University of Science and Technology
United Kingdom
Hong Zhu
Oxford Brookes University
Australia
Yang Xiang
Deakin University
China
Li Zhang
Tsinghua University
China
Ziran Zhao
Tsinghua University
China
Jianping Gu
Nuctech Company Limited
CCBD 2016, November 16-18, Macau, China
Schedule
16 Nov 2016 (Wednesday)
8:30   Opening
9:00   Keynote 1: Carlo Ghezzi [1]
10:00  Coffee break
10:30  Presentation Session 1: Knowledge Discovery & Data Engineering in Cloud Computing and Big Data – Part I (6 talks in E4-G062)
12:10  Break
13:20  Presentation Session 2: Knowledge Discovery & Data Engineering in Cloud Computing and Big Data – Part II (4 talks in E4-G062)
15:00  Coffee break
15:30  Presentation Session 3: Software Engineering, Tools & Services for Cloud Computing and Big Data – Part I (5 talks in E4-G062)
17:10  Moving to N1
17:30  Poster Session (32 posters in N1)
19:00  Banquet in Fortune Inn (N1)

17 Nov 2016 (Thursday)
9:00   Keynote 2: Xin Yao [2]
10:00  Coffee break
10:30  Presentation Session 4: Software Engineering, Tools & Services for Cloud Computing and Big Data – Part II (6 talks in E4-G062)
12:10  Break
13:20  Presentation Session 5: Architecture & Foundation of Cloud Computing and Big Data (5 talks in E4-G062)
Afternoon  Historic Centre of Macau; Old Macau (S8 – 1st Floor 1001)

18 Nov 2016 (Friday)
9:00   Keynote 3: Ziran Zhao [3]
10:00  Coffee break
10:30  Presentation Session 6: Business Models and Applications for Cloud Computing and Big Data (6 talks in E4-G062)
12:10  Break
13:20  Presentation Session 7: Security, Privacy, Trust & Quality of Cloud Computing and Big Data (5 talks in E4-G062)
15:00  Short break
16:00  Macau Big Data Public Forum (E4-G078)
18:00  Cocktails at Tromba Rija*, Macau Tower
End

Notes: The conference is in E4-G062. Length of presentation: 14 minutes; Q&A: 1 minute; setup: 1 minute.
*After the cocktail there will be two shuttle buses: (1) Holiday Inn @ Cotai Central and (2) The
Postgraduate Guest House @ University of Macau
Keynotes
Keynote 1: Tolerating Uncertainty via Evolvable-by-Design Software
Room: E4-G062, 9:00 – 10:00, Wednesday, 16 Nov. 2016
Carlo Ghezzi, Politecnico di Milano
Abstract: Uncertainty is ubiquitous when software is designed. Requirements are often uncertain and
volatile. Assumptions about the behavior of the environment in which the software will be embedded are
also often uncertain. The virtual platform on which the software will be operated may likewise be subject to
uncertain operating conditions. Design-time uncertainty is resolved during operation, and often the way it
is resolved changes over time. This leads to the need for software to evolve continuously, to keep
guaranteeing satisfaction of its quality goals. Evolution can partly be self-managed, by adding self-adaptive
capabilities to the software. This requires a careful upfront analysis to understand what the sources of
uncertainty are, how they can be resolved during operation, and how they can be managed through dynamic
reconfigurations. Whenever self-adaptation cannot solve the problems, designers must be in the loop to
provide new solutions that can be dynamically incorporated in the running system. The talk provides a
holistic view of how to handle uncertainty, which is based on the notion of perpetual development and
adaptation. It shows that existing approaches to software development need to be rethought to respond to
these challenges. The traditional separation between development and operation (design time and run time)
blurs and even fades. The talk especially focuses on modeling and verification, which need to be rethought
in the light of perpetual development and evolution. It also focuses on achieving self-adaptation to support
continuous satisfaction of non-functional requirements --- such as reliability, performance, energy
consumption --- in the context of virtualized environments (cloud computing, service-oriented computing).
Biography: Carlo Ghezzi is an ACM Fellow (1999), an IEEE Fellow (2005), a member of the European
Academy of Sciences and of the Italian Academy of Sciences. He received the ACM SIGSOFT Outstanding
Research Award (2015) and the Distinguished Service Award (2006). He is the current President of
Informatics Europe. He is a regular member of the program committees of flagship conferences in the
software engineering field, such as ICSE and ESEC/FSE, for which he also served as Program and
General Chair. He has been the Editor-in-Chief of the ACM Trans. on Software Engineering and
Methodology and an associate editor of IEEE Trans. on Software Engineering. Currently he is an
Associate Editor of the Communications of the ACM and Science of Computer Programming. Ghezzi’s
research has mostly focused on different aspects of software engineering. He co-authored over 200
papers and 8 books. He coordinated several national and international research projects. He has been the
recipient of an ERC Advanced Grant.
Keynote 2: From Ensemble Learning to Learning in the Model Space
Room: E4-G062, 9:00 – 10:00, Thursday, 17 Nov. 2016
Xin Yao, Southern University of Science and Technology of China
Abstract: Ensemble learning has been shown to be very effective in solving many challenging regression
and classification problems. Multi-objective learning offers not only a novel method to construct and learn
ensembles automatically, but also better ways to balance accuracy and diversity in an ensemble. This talk
introduces the basic ideas behind multi-objective learning. It describes how ensembles can be used in
mining data streams from the point of view of online learning. In particular, the importance of diversity in
online learning is demonstrated. Finally, a novel approach to data stream mining is presented --- learning
in the model space, which can handle very challenging data streams. The effectiveness of such an
approach is illustrated by concrete examples in cognitive fault diagnosis.
Biography: Xin Yao is a Chair Professor of Computer Science at the Southern University of Science and
Technology in Shenzhen, China. He is an IEEE Fellow and was the President (2014-15) of the IEEE Computational
Intelligence Society (CIS). His work won the 2001 IEEE Donald G. Fink Prize Paper Award, 2010 IEEE
Transactions on Evolutionary Computation Outstanding Paper Award, 2010 BT Gordon Radley Award for
Best Author of Innovation (Finalist), 2011 and 2015 IEEE Transactions on Neural Networks Outstanding
Paper Awards, and many other best paper awards. He won the prestigious Royal Society Wolfson
Research Merit Award in 2012 and the IEEE CIS Evolutionary Computation Pioneer Award in 2013. He
was the Editor-in-Chief (2003-08) of IEEE Transactions on Evolutionary Computation and is an Associate
Editor or Editorial Member of more than ten other journals. His major research interests include evolutionary
computation, ensemble learning, and their applications, especially in software engineering.
Keynote 3: Human Millimeter-wave Holographic Imaging and Automatic Target Recognition
Room: E4-G062, 9:00 – 10:00, Friday, 18 Nov. 2016
Ziran Zhao, Duty Director of Institute for Security Detection Technology of Tsinghua University
Abstract: Millimeter-wave (MMW) holographic imaging is one of the most effective methods for human
inspection because it can acquire three-dimensional images of the human body in a single scan. Thanks to
its high penetration through fabrics and its contrast in reflectivity, contraband such as guns and explosives
carried on the body is easily distinguished in MMW images. Moreover, millimeter waves are non-ionizing
radiation and pose no potential health threat. Our imaging system utilizes a linear antenna array to improve
scanning speed. Image reconstruction is achieved via the Fast Fourier Transform (FFT) and spatial
spherical-wave expansion. A linear antenna array, however, introduces artifacts into the reconstructed
images, and system errors and background scattering can also degrade MMW images. We propose a set
of calibration and denoising methods to eliminate these influences; our experiments indicate that these
methods markedly improve image quality.
Automatic Target Recognition (ATR) based on MMW holographic images is a key step toward meeting the
requirements of intelligent devices. Object detection methods designed for color images are not very
effective on human-body MMW images, so we propose a synthetic object detection method for MMW
images based on machine learning. Since previous work suggests that both multi-layer models and sparse
coding can improve recognition accuracy, we select saliency, SIFT, and HOG features to describe MMW
images and build a two-layer model to encode these features. The encoded features are fed to a linear SVM
for target/non-target classification. Because the amount of training data affects the accuracy of the SVM
classifier, we build a training set of over 30,000 human-body MMW images, generated from the 3,154
original images via several image-augmentation techniques. Experimental results show that training-set
augmentation improves the overall target detection rate in MMW images from 70% to 85%, demonstrating
the effectiveness of our method.
Biography: Dr. Zhao Ziran received his B.S. and Ph.D. from Tsinghua University in 1998 and 2004,
respectively. In 1994 he joined Tsinghua University, and he became an associate professor in 2008. He
received a joint appointment as executive deputy director of the Institute for Security Detection Technology
in 2012. Dr. Zhao's research interest is generally in the area of imaging and detection technology. In
particular, he works to apply new image-reconstruction algorithms to a wide range of applications, including
millimeter-wave imaging, terahertz imaging, radiation imaging, and cosmic-ray muon tomography. He has
been devoted to solving scientific and technical problems in security detection technology and to providing
high-tech equipment for anti-terrorism. He won the National Patent Gold Award in 2009 and is a main
member of the Tsinghua University radiation-imaging innovative research team, which won the National
Science and Technology Progress Award (Innovative Research Team) in 2013.
Oral Presentation Sessions
Each presentation has 16 minutes: 1-minute setup + 14-minute oral presentation + 1-minute Q&A.
Presentation Session 1
Knowledge Discovery & Data Engineering in Cloud Computing and Big Data – Part I
10:30 – 12:10, Wednesday, 16 Nov. 2016
Room: E4-G062
Chair: Jingzhi Guo

10:30-10:46  [5]   Yi Tan: Multi-view Clustering via Co-regularized Nonnegative Matrix Factorization with Correlation Constraint
10:46-11:02  [13]  Anyong Qin: Minimum Description Length Principle Based Atomic Norm for Synthetic Low-rank Matrix Recovery
11:02-11:18  [16]  Huapeng Yu: Transfer Learning for Face Identification with Deep Face Model
11:18-11:34  [22]  Luyan Xiao: When Taxi Meets Bus: Night Bus Stop Planning over Large-scale Traffic Data
11:34-11:50  [23]  Li Zhang: Large-scale Classification of Cargo Images Using Ensemble of Exemplar-SVMs
11:50-12:06  [24]  Manhua Jiang: Characterizing On-Bus WiFi Passenger Behaviors by Approximate Search and Cluster Analysis
Presentation Session 2
Knowledge Discovery & Data Engineering in Cloud Computing and Big Data – Part II
13:20 – 15:00, Wednesday, 16 Nov. 2016
Room: E4-G062
Chair: Ryan U

13:20-13:36  [28]  Chang Lu: Data Mining Applied to Oil Well Using K-means and DBSCAN
13:36-13:52  [39]  Qingquan Lai: Using Weighted SVM for Identifying User from Gait with Smart Phone
13:52-14:08  [43]  Yunpeng Shen: Learning the Distribution of Data for Embedding
14:08-14:24  [56]  Zhenyu Liao: Event Detection on Online Videos using Crowdsourced Time-Sync Comment
Presentation Session 3
Software Engineering, Tools & Services for Cloud Computing and Big Data – Part I
15:30 – 17:10, Wednesday, 16 Nov. 2016
Room: E4-G062
Chair: Bob Zhang

15:30-15:46  [11]  Zhigang Xu: A VM Scheduling Strategy Based on Hierarchy and Load for OpenStack
15:46-16:02  [14]  Xing Liu: Hitchhike: An I/O Scheduler Enabling Writeback for Small Synchronous Writes
16:02-16:18  [22]  Xutian Zhuang: Queries over Large-scale Incremental Data of Hybrid Granularities
16:18-16:34  [36]  Xichun Yue: An Optimized Approach to Protect Virtual Machine Image Integrity in Cloud Computing
16:34-16:50  [41]  Chan-Fu Kuo: On Construction of an Energy Monitoring Service Using Big Data Technology for Smart Campus
Presentation Session 4
Software Engineering, Tools & Services for Cloud Computing and Big Data – Part II
10:30 – 12:10, Thursday, 17 Nov. 2016
Room: E4-G062
Chair: Zhiguo Gong

10:30-10:46  [51]  Jou-Fan Chen: Financial Time-series Data Analysis using Deep Convolutional Neural Networks
10:46-11:02  [53]  Bo Li: Performance Comparison and Analysis of Yarn's Schedulers with Stress Cases
11:02-11:18  [59]  Shaohuai Shi: Benchmarking State-of-the-Art Deep Learning Software Tools
11:18-11:34  [62]  Enqing Tang: Performance Comparison between Five NoSQL Databases
11:34-11:50  [69]  Manyi Cai: A Protocol for Extending Analytics Capability of SQL Database
11:50-12:06  [48]  Yu-Fu Chen: Binary Classification and Data Analysis for Modeling Calendar Anomalies in Financial Markets
Presentation Session 5
Architecture & Foundation of Cloud Computing and Big Data
13:20 – 14:40, Thursday, 17 Nov. 2016
Room: E4-G062
Chair: Jiantao Zhou

13:20-13:36  [8]   Li Zhang: Low Complexity WSSOR-based Linear Precoding for Massive MIMO Systems
13:36-13:52  [25]  Ni Luo: Affinity Propagation Clustering for Intelligent Portfolio Diversification and Investment Risk Reduction
13:52-14:08  [33]  Binyang Li: IMFSSC: An In-Memory Distributed File System Framework for Super Computing
14:08-14:24  [55]  Tsz Fai Chow: Utilizing Real-Time Travel Information, Mobile Applications and Wearable Devices for Smart Public Transportation
14:24-14:40  [50]  Chin Chou: Performance Modeling for Spark Using SVM
Presentation Session 6
Business Models and Applications for Cloud Computing and Big Data
10:30 – 12:10, Friday, 18 Nov. 2016
Room: E4-G062
Chair: Szu-Hao Huang

10:30-10:46  [49]  Szu-Hao Huang: Decision Support System for Real-Time Trading based on On-Line Learning and Parallel Computing Techniques
10:46-11:02  [64]  Xiaoxue Hu: Efficient Power Allocation under Global Power Cap and Application-Level Power Budget
11:02-11:18  [66]  Mei-Chen Wu: Trend Behavior Research by Pattern Analysis in Financial Big Data - A Case Study of Taiwan Index Futures Market
11:18-11:34  [67]  Yu-Hsiang Hsu: Applying Market Profile Theory to Analyze Financial Big Data and Discover Financial Market Trading Behavior - A Case Study of Taiwan Futures Market
11:34-11:50  [70]  Simon Fong: Competitive Intelligence Study on Macau Food and Beverage Industry
11:50-12:06  [71]  Guoshuai Zhao: Finding Optimal Meteorological Observation Locations by Multi-Source Urban Big Data Analysis
Presentation Session 7
Security, Privacy, Trust & Quality of Cloud Computing and Big Data
13:20 – 15:00, Friday, 18 Nov. 2016
Room: E4-G062
Chair: Pattarasinee Bhattarakosol

13:20-13:36  [17]  Siping Shi: Design and Implementation of A Role-Based Access Control for Categorized Resource in Smart Community Systems
13:36-13:52  [34]  Weidian Zhan: A Secure and VM-Supervising VDI System Based on OpenStack
13:52-14:08  [60]  Tipaporn Juengchareonpoon: A Mobile Cloud System for Enhancing Multimedia File Transfer with IP Protection
14:08-14:24  [61]  Wenhan Zhu: Distinguish True or False 4K Resolution using Frequency Domain Analysis and Free-Energy Modelling
14:24-14:40  [68]  Lin Yang: Protecting Link Privacy for Large Correlated Social Networks
Poster Session
Room: N1, 17:30 – 19:00, Wednesday, 16 Nov. 2016

[4]  Energy Saving of Elevator Group Under up-peak Flow Based on Geese-PSO (Chunzhi Wang, Hubei University of Technology)
[6]  An ACO-based Link Load-Balancing Algorithm in SDN (Chunzhi Wang, Hubei University of Technology)
[7]  A Cost-effective Approach of Building Multitenant Oriented Lightweight Virtual HPC Cluster (Rongzhen Li, National University of Defense Technology)
[9]  Multi-view Latent Space Learning based on Local Discriminant Embedding (Xinge You, Huazhong University of Science and Technology)
[10] Improving Government-Data Learning via Distributed Clustering Analysis (Yurong Zhong, Institute of Electronic Science and Technology)
[12] Hadoop-MapReduce Job Scheduling Algorithms Survey (Ehab Mohamed, Beijing University of Aeronautics and Astronautics)
[15] A New Template Update Scheme for Visual Tracking (Xiaohuan Lu, Harbin Institute of Technology Shenzhen Graduate School)
[18] A Robust Appearance Model for Object Tracking (Yi Li, Harbin Institute of Technology Shenzhen Graduate School)
[19] GA-Based Sweep Coverage Scheme in WSN (Peng Huang, Sichuan Agricultural University)
[20] A Short Text Similarity Algorithm for Finding Similar Police 110 Incidents (Lei Duan, Beijing University of Aeronautics and Astronautics)
[26] An Improved K-means Text Clustering Algorithm by Optimizing Initial Cluster Centers (Caiquan Xiong, Hubei University of Technology)
[27] The Implementation of Air Pollution Monitoring Service Using Hybrid Database Converter (Jia-Yow Weng, Tunghai University)
[29] Super Resolution Reconstruction of Brain MR Image based on Convolution Sparse Network (Chang Liu, Chengdu University)
[30] Evacuation Behaviors and Link Selection Strategy based on Artificial Fish Swarm Algorithm (Xinlu Zong, Hubei University of Technology)
[31] A Synthetic Targets Detection Method for Human Millimeter-wave Holographic Imaging System (Li Zheng, Nuctech Company Limited)
[32] An Efficient Distributed Clustering Protocol Based on Game-Theory for Wireless Sensor Networks (Xuegang Wu, Chongqing University)
[35] Performance Evaluation for Distributed Join Based on MapReduce (Jingwei Zhang, Guilin University of Electronic Technology)
[37] Breaking the Top-k Restriction of the kNN Hidden Databases (Zhiguo Gong, University of Macau)
[38] Online Fake Drug Detection System in Heterogeneous Platforms using Big Data Analysis (Yubin Zhao, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
[40] Blood Pressure Monitoring on the Cloud System in Elderly Community Centres: A Data Capturing Platform for Application Research in Public Health (Kelvin Tsoi, Chinese University of Hong Kong)
[42] A Smart Cloud Robotic System based on Cloud Computing Services (Lujia Wang, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
[44] On Blind Quality Assessment of JPEG Images (Guangtao Zhai, Shanghai Jiao Tong University)
[45] Research on The Application of Distributed Self-adaptive Task Allocation Mechanism in Distribution Automation System (Haitian Li, North China Electric Power University)
[46] Noise-Robust SLIC Superpixel for Natural Images (Jiantao Zhou, University of Macau)
[47] Big Data Analysis on Radiographic Image Quality (Jianping Gu, Nuctech Company Limited)
[52] A Practical Model for Analyzing Push-based Virtual Machine Live Migration (Cho-Chin Lin, National Ilan University)
[54] Classification of Parkinson's disease and Essential Tremor Based on Structural MRI (Li Zhang, Chengdu University)
[57] Synthetic Data Generator for Classification Rules Learning (Runzong Liu, Chongqing University)
[58] A Flash Light System for Individuals with Visual Impairment Based on TPVM (Wenbin Fang, Shanghai Jiao Tong University)
[63] Collective Extraction for Opinion Targets and Opinion Words from Online Reviews (Xiangxiang Jiang, Guilin University of Electronic Technology)
[65] An Adaptive Tone Mapping Algorithm Based on Gaussian Filter (Chang Liu, Chongqing University)
[72] Research on Algorithm of PSO in Image Segmentation of Cement-Based (Xiaojie Deng, Hubei University of Technology)
Macao Big Data Public Forum 2016
澳門大數據公開論壇 2016
Big Data is an emerging field in which innovative technology offers new ways to address the inherent
problems of working with huge amounts of data, and new ways to reuse and extract value from information.
The 1st Macao Big Data Public Forum will be held on November 18, 2016 at the University of Macau,
jointly with CCBD 2016. The forum will discuss the current state of Big Data and its potential future
development.
Date: November 18, 2016 16:00-18:00
Venue: E4-G078
Chair: Prof. Lionel Ni, Vice Rector of the University of Macau
Speakers:
Prof. Qiang Yang, Hong Kong University of Science and Technology
Dr. Yu Zheng, Microsoft Research
Prof. Lei Chen, Hong Kong University of Science and Technology
Agenda:
15:30-16:00  Guests Arrival & Registration
16:00-16:10  Welcome Speech by Prof. Lionel Ni
16:10-16:40  Keynote 1: Prof. Qiang Yang, Hong Kong University of Science and Technology
16:40-17:10  Keynote 2: Dr. Yu Zheng, Microsoft Research
17:10-17:40  Keynote 3: Prof. Lei Chen, Hong Kong University of Science and Technology
17:40-17:55  Joint Q&A Session
17:55-18:00  Closing by Prof. Lionel Ni
Official Hotel:
Holiday Inn Macao Cotai Central
Organizers:
University of Macau
Macao Convention & Exhibition Association
Abstract
[1] Tolerating Uncertainty via Evolvable-by-Design Software
Carlo Ghezzi, Politecnico di Milano
Abstract: Uncertainty is ubiquitous when software is designed. Requirements are often uncertain and
volatile. Assumptions about the behavior of the environment in which the software will be embedded are
also often uncertain. The virtual platform on which the software will be operated may likewise be subject to
uncertain operating conditions. Design-time uncertainty is resolved during operation, and often the way it
is resolved changes over time. This leads to the need for software to evolve continuously, to keep
guaranteeing satisfaction of its quality goals. Evolution can partly be self-managed, by adding self-adaptive
capabilities to the software. This requires a careful upfront analysis to understand what the sources of
uncertainty are, how they can be resolved during operation, and how they can be managed through dynamic
reconfigurations. Whenever self-adaptation cannot solve the problems, designers must be in the loop to
provide new solutions that can be dynamically incorporated in the running system. The talk provides a
holistic view of how to handle uncertainty, which is based on the notion of perpetual development and
adaptation. It shows that existing approaches to software development need to be rethought to respond to
these challenges. The traditional separation between development and operation (design time and run time)
blurs and even fades. The talk especially focuses on modeling and verification, which need to be rethought
in the light of perpetual development and evolution. It also focuses on achieving self-adaptation to support
continuous satisfaction of non-functional requirements --- such as reliability, performance, energy
consumption --- in the context of virtualized environments (cloud computing, service-oriented computing).
[2] From Ensemble Learning to Learning in the Model Space
Xin Yao, Southern University of Science and Technology
Abstract: Ensemble learning has been shown to be very effective in solving many challenging regression
and classification problems. Multi-objective learning offers not only a novel method to construct and learn
ensembles automatically, but also better ways to balance accuracy and diversity in an ensemble. This talk
introduces the basic ideas behind multi-objective learning. It describes how ensembles can be used in
mining data streams from the point of view of online learning. In particular, the importance of diversity in
online learning is demonstrated. Finally, a novel approach to data stream mining is presented --- learning
in the model space, which can handle very challenging data streams. The effectiveness of such an
approach is illustrated by concrete examples in cognitive fault diagnosis.
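The role of diversity in an ensemble can be illustrated with a toy sketch. The code below is illustrative only and is not the speaker's method: it trains simple decision stumps on bootstrap resamples of a 1-D toy dataset and combines them by majority vote, showing how resampling alone already yields a diverse ensemble. All names and data here are hypothetical.

```python
# Toy majority-vote ensemble of decision stumps (illustrative sketch only).
import random

def train_stump(points):
    """Pick the threshold on x that best separates the labeled sample."""
    best = (None, 0)
    for thr, _ in points:
        correct = sum((x > thr) == y for x, y in points)
        correct = max(correct, len(points) - correct)  # allow flipped polarity
        if correct > best[1]:
            best = (thr, correct)
    thr = best[0]
    pos = sum(1 for x, y in points if (x > thr) == y)
    flip = pos < len(points) / 2  # flip polarity if the inverse rule fits better
    return lambda x: (x > thr) != flip

def ensemble_predict(stumps, x):
    votes = sum(1 for s in stumps if s(x))
    return votes * 2 > len(stumps)  # majority vote

random.seed(0)
data = [(x, x > 5) for x in range(11)]  # toy 1-D data: label is True when x > 5
# Bootstrap resampling makes each stump see (and fit) slightly different data.
stumps = [train_stump(random.choices(data, k=len(data))) for _ in range(15)]
acc = sum(ensemble_predict(stumps, x) == y for x, y in data) / len(data)
print(f"ensemble accuracy on toy data: {acc:.2f}")
```

Each stump alone may misplace its threshold, but because bootstrap samples differ, the stumps disagree in different places and the vote recovers the true boundary.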
[3] Human Millimeter-wave Holographic Imaging and Automatic Target Recognition
Ziran Zhao, Duty Director of Institute for Security Detection Technology of Tsinghua University
Abstract: Millimeter-wave (MMW) holographic imaging is one of the most effective methods for human
inspection because it can acquire three-dimensional images of the human body in a single scan. Thanks to
its high penetration through fabrics and its contrast in reflectivity, contraband such as guns and explosives
carried on the body is easily distinguished in MMW images. Moreover, millimeter waves are non-ionizing
radiation and pose no potential health threat. Our imaging system utilizes a linear antenna array to improve
scanning speed. Image reconstruction is achieved via the Fast Fourier Transform (FFT) and spatial
spherical-wave expansion. A linear antenna array, however, introduces artifacts into the reconstructed
images, and system errors and background scattering can also degrade MMW images. We propose a set
of calibration and denoising methods to eliminate these influences; our experiments indicate that these
methods markedly improve image quality.
Automatic Target Recognition (ATR) based on MMW holographic images is a key step toward meeting the
requirements of intelligent devices. Object detection methods designed for color images are not very
effective on human-body MMW images, so we propose a synthetic object detection method for MMW
images based on machine learning. Since previous work suggests that both multi-layer models and sparse
coding can improve recognition accuracy, we select saliency, SIFT, and HOG features to describe MMW
images and build a two-layer model to encode these features. The encoded features are fed to a linear SVM
for target/non-target classification. Because the amount of training data affects the accuracy of the SVM
classifier, we build a training set of over 30,000 human-body MMW images, generated from the 3,154
original images via several image-augmentation techniques. Experimental results show that training-set
augmentation improves the overall target detection rate in MMW images from 70% to 85%, demonstrating
the effectiveness of our method.
[4] Energy Saving of Elevator Group Under up-peak Flow Based on Geese-PSO
Chunzhi Wang, Hubei University of Technology
Abstract: Vertical elevators are commonly used in high-rise buildings. An Elevator Group Control System
(EGCS) dispatches elevator cars to answer passenger calls on different floors. Optimizing an EGCS aims at
improving its transport capacity and service quality, which is a typical combinatorial optimization problem.
Particle Swarm Optimization (PSO) is well suited to combinatorial optimization, but it easily falls into local
optima. In this paper, inspired by the flight characteristics of goose flocks, we propose an improved PSO
algorithm named Geese-PSO. A novel coding method offers a natural way to express the problem. Finally,
to realize energy-efficient elevator group optimization, we derive the energy-saving and time-cost functions,
build the elevator group control model, and give the optimization scheme. Simulation results demonstrate
the effectiveness of the approach.
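The goose-inspired refinements and the elevator cost functions are specific to the paper, but the PSO loop they build on is standard. The following sketch shows plain PSO minimizing a toy objective; the function, bounds, and parameter values are illustrative assumptions, not taken from the paper.

```python
import random

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over [-10, 10]^dim with standard PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-10, 10) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position so far
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective: the sphere function, whose minimum is 0 at the origin.
best, best_val = pso(lambda x: sum(v * v for v in x), dim=2)
```

Geese-PSO replaces the single global-best attractor with neighbor-following behavior modeled on a goose flock; the update loop above is the part that variant modifies.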
[5] Multi-view Clustering via Co-regularized Nonnegative Matrix Factorization with Correlation
Constraint
Yi Tan, Guizhou Normal University
Abstract: With the increasing availability of multi-view nonnegative data in practical applications, multi-view
learning based on nonnegative matrix factorization (NMF) has attracted increasing attention. However,
previous works are either difficult to generate meaningful clustering results in terms of views with
heterogeneous quality, or sensitive to noise. To address these problems, we propose a co-regularized
nonnegative matrix factorization method with correlation constraint (CO-NMFCC) for multi-view clustering,
which jointly exploits both consistent and complementary information across multiple views. Different from
previous works, we aim at integrating information from multiple views efficiently and making it more robust
to the presence of noisy views. More specifically, correlation constraint is imposed on the low-dimensional
space to learn a common representation shared by multiple views. Meanwhile, we exploit the
complementary information of multiple views through the co-regularization to accommodate the imbalance
of the quality of views. Experiments on two real datasets demonstrate that CO-NMFCC is an effective and
promising algorithm for practical applications.
[6] An ACO-based Link Load-Balancing Algorithm in SDN
Chunzhi Wang, Hubei University of Technology
Abstract: Software-Defined Networking (SDN) is a novel network architecture that separates the data and
control planes via OpenFlow. Centralized control enables the acquisition and allocation of global network
resources, so link load balancing is less difficult in SDN than in traditional networks. This paper proposes a
link load-balancing algorithm based on Ant Colony Optimization (LLBACO). The algorithm uses the search
rule of ACO and takes link load, delay and packet loss as the factors guiding an ant's choice of the next
node. To maintain link load balance and reduce end-to-end transmission delay, the ants find the widest and
shortest of all candidate paths. Simulation results show that, compared with existing algorithms, LLBACO
can balance link load effectively, improve Quality of Service (QoS) and decrease network overhead.
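As a rough illustration of the ant transition rule described above, the sketch below picks the next hop by roulette-wheel selection over a desirability score combining pheromone with a heuristic that penalizes load, delay and packet loss. The weighting of the three factors is an assumption for illustration; LLBACO's exact formula is not reproduced here.

```python
import random

def desirability(tau, load, delay, loss, alpha=1.0, beta=2.0):
    """Combine pheromone tau with a heuristic eta that penalizes link load,
    delay and packet loss (the 0.5/0.3/0.2 weights are illustrative)."""
    eta = 1.0 / (1.0 + 0.5 * load + 0.3 * delay + 0.2 * loss)
    return (tau ** alpha) * (eta ** beta)

def choose_next(links, rng):
    """links: {node: (tau, load, delay, loss)}; roulette-wheel selection."""
    weights = {n: desirability(*v) for n, v in links.items()}
    total = sum(weights.values())
    r = rng.random() * total
    acc = 0.0
    for node, w in weights.items():
        acc += w
        if acc >= r:
            return node
    return node  # fallback for floating-point rounding

rng = random.Random(1)
links = {"B": (1.0, 0.1, 0.05, 0.0),   # lightly loaded, low-latency link
         "C": (1.0, 0.9, 0.40, 0.2)}   # congested, lossy link
picks = [choose_next(links, rng) for _ in range(1000)]
```

With equal pheromone, the lightly loaded link "B" is chosen far more often, which is the mechanism that steers ants toward wide, short paths.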
[7] A Cost-effective Approach of Building Multi-tenant Oriented Lightweight Virtual HPC Cluster
Rongzhen Li, National University of Defense Technology
Abstract: HPC is considered increasingly important, but only a small set of large enterprises and
governments can use this high-performance approach. To deliver HPC as a service and solve the
software-dependency problems that rigidly restrict the usage of HPC applications, this paper builds on a
Fat-Tree network topology and a virtual HPC cluster model to provide a cloud HPC delivery model that
resolves the dependencies of the HPC application software stack without destroying the initial HPC
environments. Extensive experiments were conducted, and the results validate the feasibility and efficiency
of our approach.
[8] Low Complexity WSSOR-based Linear Precoding for Massive MIMO Systems
Li Zhang, Anhui University
Abstract: In massive MIMO systems, where the base station has hundreds of antennas and serves many
users, regularized zero-forcing (RZF) precoding achieves high performance but suffers from high complexity
due to the required inversion of a large matrix. To address this problem, we propose a precoding method
based on weighted symmetric successive over-relaxation (WSSOR) to approximate the matrix inversion.
The proposed method reduces the computational complexity by about one order of magnitude while
approaching RZF precoding. We also propose a simple way to choose the optimal relaxation parameter in
massive MIMO systems, and the chosen weighting factor depends only on the system configuration
parameters. Simulation results show that the proposed WSSOR-based precoding approaches the
near-optimal performance of RZF precoding within a small number of iterations.
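The core idea, replacing an explicit matrix inversion with a few relaxation sweeps, can be illustrated with plain SOR on a small symmetric positive definite system. The paper's weighted symmetric variant adds a backward sweep and a tuned weighting factor, which this sketch omits; the matrix and right-hand side are toy stand-ins for the regularized Gram matrix and received signal in RZF.

```python
def sor_solve(A, b, omega=1.2, iters=50):
    """Solve A x = b by successive over-relaxation. Convergence is
    guaranteed for symmetric positive definite A and 0 < omega < 2."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            # Gauss-Seidel residual for row i, using already-updated entries.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] += omega * ((b[i] - s) / A[i][i] - x[i])
    return x

# Small SPD system standing in for the large matrix that RZF would invert.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = sor_solve(A, b)
```

Each sweep costs O(n^2) versus O(n^3) for a direct inversion, which is where the order-of-magnitude complexity reduction claimed above comes from when only a few sweeps are needed.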
[9] Multi-view Latent Space Learning based on Local Discriminant Embedding
Xinge You, Huazhong University of Science and Technology
Abstract: In many computer vision systems, one object can be described by different features or by features
extracted from different sources. These varying features or sources usually exhibit heterogeneous
properties and can be referred to as multi-view data of the object. An individual view usually captures one
particular aspect and cannot describe the problem completely, whereas multi-view data can contain
complete and complementary information. This gives rise to the need to combine the information in
multi-view data to better describe the problem, as well as to discover the connections and differences
between views. The complementary principle and the consensus principle are two important principles for
effective multi-view learning algorithms. When individual views capture unique but incomplete information,
they may degrade learning performance, so simply concatenating multiple views into a single view is not an
ideal solution. In this paper, we propose a multi-view latent space learning algorithm which assumes that all
views are generated from the same latent space via distinct transformations. Under this assumption, our
algorithm performs well even when individual views are incomplete, and the learned space retains the
valuable information of each view while capturing the underlying connections between views. Owing to the
local discriminant embedding of the input space, this multi-view latent space is well suited to classification
and recognition problems. The proposed algorithm is evaluated on two tasks, indoor scene classification
and abnormal object classification, on the MIT Scene 67 and Abnormal Objects databases, respectively.
Extensive experiments show that our algorithm achieves competitive improvements over many other
outstanding methods.
[10] Improving Government-Data Learning via Distributed Clustering Analysis
Yurong Zhong, Institute of Electronic Science and Technology
Abstract: Clustering analysis is of great value, and the volume of large-scale government data that must be
handled by cluster analysis keeps growing, so efficient large-scale analysis techniques are needed. The
traditional serial programming model scales poorly and cannot satisfy the computing and storage demands
of large-scale government-data processing. Distributed computing technology, represented by MapReduce,
scales well, greatly improves the execution efficiency of data-intensive algorithms, and exploits the
computing power of clusters built from commodity hardware. Against the background of a "data platform for
public petitions", this paper studies how to combine cluster analysis with massive government data,
extracting useful information from the characteristics hidden in the data and providing comprehensive
analyses for system managers and decision makers. We focus on combining a basic distributed clustering
algorithm with the TF-IDF algorithm, and develop a case-feature analysis module based on distributed
clustering. Using this module, cases are clustered according to their characteristics, and several pieces of
hidden information are obtained from the clustering results.
[11] An VM Scheduling Strategy Based on Hierarchy and Load for OpenStack
Zhigang Xu, Beijing University of Aeronautics and Astronautics
Abstract: In a cloud computing environment, one of the most important modules is the scheduler. As the
most popular open-source cloud platform, OpenStack provides a wealth of scheduling strategies, but none
of them considers the hierarchies of VMs and hosts, through which we can guarantee VM security; nor is
any of them based on the network load of the host. This paper proposes a scheduling strategy based on
hierarchies and load. We define service levels and security levels for VMs and hosts, then filter out the
hosts whose levels do not match those of the VM. Each remaining host receives a weight value according
to its overall load across CPU, memory, disk and network, and the host with the highest weight is selected
to create the VM. We built a prototype system on OpenStack to demonstrate our design and test our
solution. The experiments show that VMs are created on appropriate hosts.
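The filter-then-weigh flow described above can be sketched as follows. The field names, level values, and equal weighting of the four load metrics are hypothetical stand-ins, not OpenStack's actual scheduler API.

```python
def schedule(vm, hosts):
    """Filter hosts whose service/security levels cannot host the VM,
    then pick the remaining host with the most free capacity."""
    candidates = [h for h in hosts
                  if h["service_level"] >= vm["service_level"]
                  and h["security_level"] >= vm["security_level"]]
    if not candidates:
        return None

    def weight(h):
        # Equal-weight combination of free CPU, memory, disk and network.
        return (h["free_cpu"] + h["free_mem"] + h["free_disk"] + h["free_net"]) / 4.0

    return max(candidates, key=weight)["name"]

vm = {"service_level": 2, "security_level": 1}
hosts = [
    {"name": "h1", "service_level": 2, "security_level": 2,
     "free_cpu": 0.8, "free_mem": 0.7, "free_disk": 0.9, "free_net": 0.6},
    {"name": "h2", "service_level": 3, "security_level": 1,
     "free_cpu": 0.2, "free_mem": 0.3, "free_disk": 0.4, "free_net": 0.1},
    {"name": "h3", "service_level": 1, "security_level": 3,  # filtered: service level too low
     "free_cpu": 0.9, "free_mem": 0.9, "free_disk": 0.9, "free_net": 0.9},
]
chosen = schedule(vm, hosts)  # h3 is filtered out; h1 wins on weight
```

This mirrors OpenStack's general filter/weigher split; the paper's contribution is adding the hierarchy filter and the network term to the weighing step.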
[12] Hadoop-MapReduce Job Scheduling Algorithms Survey
Ehab Mohamed, Beijing University of Aeronautics and Astronautics
Abstract: The era of big data computing is becoming a fact of daily life. As data-intensive computing
becomes a reality in many scientific fields, finding an efficient strategy for massive data computing systems
has become a multi-objective optimization problem. Processing such huge data on distributed hardware
clusters such as clouds requires a powerful computation model like Hadoop MapReduce. In this paper, we
study the various schedulers developed for Hadoop in cloud environments, together with their features and
issues. Most existing studies consider performance improvement from a single point of view (scheduling,
data locality, data correctness, etc.), but very little of the literature addresses multi-objective improvements
(quality requirements, scheduling entities, and dynamic environment adaptation), especially in
heterogeneous parallel and distributed systems. Hadoop and MapReduce are two important pillars of big
data for handling structured and unstructured data, and creating an algorithm for node selection is essential
to optimizing MapReduce performance. This paper surveys previous work on Hadoop-MapReduce
scheduling and offers some suggestions for improving it.
[13] Minimum Description Length Principle Based Atomic Norm for Synthetic Low-rank Matrix
Recovery
Anyong Qin, Chongqing University
Abstract: Recovering the underlying low-rank structure of clean data corrupted with sparse noise/outliers
has attracted increasing interest. However, in many low-rank problems, neither the exact rank of the
estimated matrix nor the locations and values of the outliers are known, and conventional methods fail to
separate the low-rank and sparse components, especially under gross outliers. We therefore exploit the
minimum description length (MDL) principle together with the atomic norm to overcome these limitations.
In this paper, we first apply the atomic norm to find all candidate atoms of the low-rank and sparse terms,
and then minimize the description length of the model and the residual in order to select the appropriate
atoms of the low-rank and sparse matrices. Experimental results on synthetic datasets demonstrate the
effectiveness and robustness of the proposed method.
[14] Hitchhike: an I/O Scheduler Enabling Writeback for Small Synchronous Writes
Xing Liu, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Abstract: Small synchronous writes are pervasive and manifest at various levels of the software stack, from
device drivers to application software. Given the block interface, these writes can cause serious write
amplification, excess disk seeks or flash wear, and expensive flush operations, which together severely
degrade overall I/O performance. To address these issues, we present Hitchhike, a novel block I/O
scheduler that identifies small writes and embeds them into other data blocks via compression techniques.
With Hitchhike, a small write and another write complete in one atomic block operation, eliminating the write
amplification and the overhead of excess disk seeks. We implemented Hitchhike on top of the CFQ and
Deadline I/O schedulers in Linux 2.6.32 and evaluated it with the Filebench benchmark. Our results show
that, compared to traditional approaches, Hitchhike significantly improves the performance of small
synchronous writes.
[15] A New Template Update Scheme for Visual Tracking
Xiaohuan Lu, Harbin Institute of Technology Shenzhen Graduate School
Abstract: Under the particle filter framework, single-object tracking can be divided into two phases: sparse
representation, which can be regarded as matching evaluation, and template update, which models the
appearance changes of the target. Template update is the most direct and basic phase for ensuring
high-quality tracking, yet most template update schemes cannot capture the latest appearance of the target,
leading to low-quality tracking. In this paper, we propose a new template update scheme that captures the
latest trends of the target. Experimental results on popular benchmark video sequences show that the
proposed template update scheme is feasible and effective.
[16] Transfer Learning for Face Identification with Deep Face Model
Huapeng Yu, Chengdu University
Abstract: Deep face models learned on big datasets surpass humans on face recognition over difficult
unconstrained face datasets. In practice, however, we often lack the resources to learn such a complex
model, or we have only very limited training samples (sometimes a single one per class) for a specific face
recognition task. In this paper, we address these problems by transferring an already learned deep face
model to the specific task at hand. We empirically transfer the hierarchical representations of a deep face
model as the source model and then learn higher-layer representations on a small task-specific training set
to obtain the final target model. Experiments on face identification with a small public dataset and practical
real-world faces verify the effectiveness and efficiency of our transfer learning approach. We also
empirically explore an important open problem: the attributes and transferability of features from different
layers of a deep model. We argue that lower-layer features are local and general, while higher-layer
features are global and specific, embracing both intra-class invariance and inter-class discrimination. The
results of unsupervised feature visualization and supervised face identification strongly support this view.
[17] Design and Implementation of a Role-Based Access Control for Categorized Resource in Smart
Community Systems
Siping Shi, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Abstract: With the progressive development of smart communities, the security of smart community
systems has become an important issue. Role-based access control is one way to address it. However,
existing implementations of role-based access control are not fine-grained and take no account of resource
category information; since every resource is authorized in the same way, such models cannot meet the
security requirements of smart community systems. In this paper, we propose an improved role-based
access control model for categorized resources that addresses the special requirements of smart
community systems. The model integrates community category information into the definition of roles so as
to limit the number of roles. It was fully implemented in a community management system with 14,500
users from 14 communities. We compared our system with Spring Security, an existing open-source
security framework, and demonstrated the advantages of our access control model.
[18] A Robust Appearance Model for Object Tracking
Yi Li, Harbin Institute of Technology Shenzhen Graduate School
Abstract: The patch strategy is widely adopted in visual tracking to address partial occlusion. However,
most patch-based tracking methods either assume that all patches share the same importance or use a
simple prior to compute each patch's importance, which can degrade tracking performance when the target
object is non-rigid or background information is included in the initial bounding box. To this end, we build an
importance-aware appearance model over target patches and background patches that adaptively
evaluates the importance of each patch by means of local self-similarity. In addition, we propose a novel
bi-directional multi-voting scheme, integrating a multi-voting scheme with a two-side agreement scheme, to
produce a reliable target-background confidence map. Combining the importance-aware appearance model
with the bi-directional multi-voting scheme yields a robust patch-based tracking method. Experimental
results demonstrate that the proposed method outperforms other state-of-the-art methods on a set of
challenging tracking tasks.
[19] GA-Based Sweep Coverage Scheme in WSN
Peng Huang, Sichuan Agricultural University
Abstract: The minimum-number-of-required-sensors problem in sweep coverage, one of the important
coverage problems in WSNs, uses a small number of mobile sensor nodes to satisfy both POI (Point of
Interest) coverage and data delivery, and is a dynamic coverage problem. Finding the minimum number of
uniform-speed mobile sensor nodes that guarantees sweep coverage has been proved NP-hard. In this
paper, we investigate this problem so as to minimize the number of mobile sensor nodes with limited data
buffer size while satisfying dynamic POI coverage and data delivery simultaneously. We propose a
GA-based Sweep Coverage scheme (GASC) to solve the problem: random route generation first creates
initial routes over the POIs, and a Genetic Algorithm then optimizes these routes. Computational results
show that the proposed GASC approach outperforms previously published methods.
[20] A Short Text Similarity Algorithm for Finding Similar Police 110 Incidents
Lei Duan, Beijing University of Aeronautics and Astronautics
Abstract: Finding similar police 110 incidents in an incident dataset plays an important role in recognizing
related cases, from which investigators can find more clues and make better decisions on police
deployment. We aim to find 110 incidents whose case features and semantics are similar to those of a given
incident. A short-text similarity algorithm called Police Incident Mover's Distance is presented, developed
from the Word Mover's Distance (WMD), a semantic similarity algorithm. To emphasize the significance of
case features in incident text, the method introduces traditional term frequency-inverse document frequency
(TF-IDF) scores as term weights in the WMD. The algorithm is verified on a practical dataset from the public
security department, and experiments show that it is effective and improves the accuracy of finding similar
police incidents.
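As a sketch of the term-weighting step, the snippet below computes normalized TF-IDF weights for short tokenized texts; in the full method these weights would replace WMD's uniform bag-of-words weights. The smoothing scheme and the toy tokens are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Per-document TF-IDF weights, normalized to sum to 1.
    `docs` is a list of token lists."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    # Smoothed IDF so terms present in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    weights = []
    for doc in docs:
        tf = Counter(doc)
        w = {t: (tf[t] / len(doc)) * idf[t] for t in tf}
        norm = sum(w.values())
        weights.append({t: v / norm for t, v in w.items()})
    return weights

docs = [["theft", "bicycle", "street"],
        ["theft", "wallet", "bus"],
        ["noise", "complaint", "street"]]
w = tfidf_weights(docs)
```

Terms distinctive to a single incident (here "bicycle") receive more weight than terms shared across incidents (here "theft"), which is exactly the emphasis on case features that motivates plugging TF-IDF into WMD.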
[21] When Taxi Meets Bus: Night Bus Stop Planning over Large-scale Traffic Data
Luyan Xiao, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Abstract: With more and more citizens traveling for life or work at night, there is a big gap between the
demand for and supply of public transportation services in China. In this paper, we address the problem of
night-bus stop planning by investigating the characteristics of taxi GPS trajectories and transactions, rather
than relying on subjective and costly surveys of citizen mobility patterns. Our method has two stages. In the
first stage, we extract Pick-up and Drop-off Records (PDRs) from the taxi GPS trajectories and transactions
to capture citizens' travel patterns at night. In the second stage, we propose DC-DBSCAN, an improved
DBSCAN clustering algorithm incorporating a Distance Constraint, to detect hot locations in the PDR
dataset as candidate night-bus stops. We take the service range of a bus stop into consideration and
optimize the candidates with respect to cost and convenience factors. Finally, our experiments demonstrate
that our method is valid and performs better than K-means.
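For reference, a plain DBSCAN pass over 2-D points looks like the sketch below. DC-DBSCAN's distance constraint between candidate stops is an extension not reproduced here, and the points and thresholds are toy values.

```python
def dbscan(points, eps, min_pts):
    """Plain DBSCAN on 2-D points; returns one label per point (-1 = noise)."""
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (x, y) in enumerate(points)
                if (x - xi) ** 2 + (y - yi) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1              # provisional noise; may become a border point
            continue
        cluster += 1                    # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise reclaimed as a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:      # j is also a core point: keep expanding
                queue.extend(jn)
    return labels

# Two dense pick-up/drop-off hotspots plus one isolated record.
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
labels = dbscan(pts, eps=0.5, min_pts=3)
```

In the stop-planning setting, each resulting cluster of PDRs is a hot location, and the isolated record is discarded as noise rather than proposed as a stop.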
[22] Queries over Large-scale Incremental Data of Hybrid Granularities
Xutian Zhuang, South China Normal University
Abstract: The development of Internet and Web systems in recent years has made dealing with large-scale
data difficult and challenging. What usually needs to be processed is incremental data, whose scale grows
over time. Many general queries are traditionally evaluated over the full raw dataset, which becomes a
serious performance challenge as the data increases substantially. Since such data is never updated after
generation, this paper proposes a query model, the hybrid-granularity model, in which data and queries are
preprocessed into intermediate result sets of different granularities. With query transformation, submitted
queries can take advantage of these intermediate results to obtain the required final results. We also
describe query transformation and the methods for finding the best-performing plan under the
hybrid-granularity model for a specific query. Finally, we analyze and experimentally verify the performance
advantage of the proposed model over the original one. The proposed solution is used in several practical
systems, which shows that it guarantees the correctness of query results while significantly improving query
response efficiency.
[23] Large-scale Classification of Cargo Images Using Ensemble of Exemplar-SVMs
Li Zhang, Tsinghua University
Abstract: This paper develops a large-scale classification algorithm for cargo X-ray images using an
ensemble of exemplar-SVMs. Large-scale, fine-grained classification is very helpful for customs, improving
inspection efficiency and relieving inspectors. However, the big intra-class variation and small inter-class
variation of cargo images make it almost impossible to classify them using traditional per-class SVMs,
although typical images with salient, representative features of certain classes can easily be distinguished
from others. Inspired by the ensemble of exemplar-SVMs for object detection, we develop a classification
method using an ensemble of exemplar-SVMs over typical image patches. We first define typical image
patches and discuss how to extract them. Then, for each typical image patch, a linear SVM is trained using
that patch as the only positive sample and all others as negatives. In the classification step, a fast detection
method based on WTA hashing is used: images are first matched to typical patches and then assigned to
the category of the corresponding typical image patches. A semantic tree built according to the HS code is
used to trade off specificity for accuracy.
[24] Characterizing On-Bus WiFi Passenger Behaviors by Approximate Search and Cluster Analysis
Manhua Jiang, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Abstract: On-bus WiFi has emerged as a promising market in recent years, and it is interesting to
investigate the characteristics of bus passengers through the spatio-temporal data collected by smart WiFi
devices. In this paper, we analyze passenger behavior logs, including WiFi connections, disconnections,
web authorizations, data traffic, and so on. We aggregate these activities into online events by uncovering
the relations among them. We then describe all the trips of bus passengers and cluster the trips by the
passengers' Origin-Destination pairs (ODs) to find the distribution of their points of interest. Our results
show that only 8.33% of on-bus WiFi passengers browse the web. An average connection lasts only 6
minutes, but in total a passenger spends about 25 minutes a day on on-bus WiFi. Passengers' average data
traffic is periodic: it grows slowly through the week until Sunday and drops on Monday, indicating that
passengers prefer to use on-bus WiFi at weekends. 44.7% of the passengers are active in only one place,
and 39% are active in two places.
[25] Performance Modeling for Spark Using SVM
Ni Luo, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Abstract: Spark is now widely used in many enterprises. Although Spark is much faster than Hadoop for
some applications, its configuration parameters can have a great impact on its performance, owing to the
number and complexity of the parameters and the varied characteristics of applications. Unfortunately, no
research has yet been conducted on predicting Spark's performance from its configuration. In this paper,
we employ a machine learning method, the Support Vector Machine (SVM), to build performance models
for Spark. The training configurations are collected by running Spark applications with randomly modified
and combined property values; in doing so, we also determine the range of each property and gain a
deeper understanding of how these properties work in Spark. We additionally model Spark's performance
with an Artificial Neural Network (ANN) and find that the error rate of the ANN is on average 1.98 times that
of the SVM across three workloads from HiBench.
[26] An Improved K-means text clustering algorithm By Optimizing initial cluster centers
Caiquan Xiong, Hubei University of Technology
Abstract: The K-means clustering algorithm is an influential algorithm in data mining. Traditional K-means is
sensitive to the initial cluster centers, so the clustering result depends excessively on them. To overcome
this shortcoming, this paper proposes an improved K-means text clustering algorithm that optimizes the
initial cluster centers. The algorithm first calculates the density of each data object in the dataset and
identifies isolated points. After removing all isolated points, a set of high-density data objects is obtained,
from which k high-density objects that are maximally distant from one another are chosen as the initial
cluster centers. The experimental results show that the improved K-means algorithm improves the stability
and accuracy of text clustering.
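The seeding procedure described above, dropping isolated points and then picking k mutually distant high-density objects, can be sketched as follows. The radius and density thresholds, and the farthest-point heuristic used for "maximally distant", are illustrative assumptions.

```python
def pick_initial_centers(points, k, radius=1.0, min_density=2):
    """Density-based K-means seeding: discard isolated points, then
    greedily pick k high-density points that are far apart."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Keep points with at least `min_density` neighbors within `radius`.
    dense = [p for p in points
             if sum(d2(p, q) <= radius ** 2 for q in points) - 1 >= min_density]
    # Start from the densest point, then repeatedly add the dense point
    # farthest from the centers chosen so far.
    centers = [max(dense, key=lambda p: sum(d2(p, q) <= radius ** 2 for q in dense))]
    while len(centers) < k:
        centers.append(max(dense, key=lambda p: min(d2(p, c) for c in centers)))
    return centers

pts = [(0, 0), (0.2, 0), (0, 0.2), (0.2, 0.2),   # dense blob A
       (5, 5), (5.2, 5), (5, 5.2), (5.2, 5.2),   # dense blob B
       (20, 20)]                                  # isolated point (removed)
centers = pick_initial_centers(pts, k=2)
```

On this toy data the isolated point can never be chosen as a seed, and the two centers land in different blobs, which is what stabilizes the subsequent K-means iterations.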
[27] The Implementation of Air Pollution Monitoring Service Using Hybrid Database Converter
Jia-Yow Weng, Tunghai University
Abstract: As air pollution becomes more serious and harms human health, people have started to pay
attention to real-time monitoring of air pollution factors and to the recording and analysis of the data. Our
system frequently fetches data from air pollution monitoring stations, and as the data grows rapidly, a
relational database (RDB) struggles to process it. To keep monitoring running smoothly, unconsolidated
historical data must be removed from the RDB, yet historical data is an important target when analyzing air
pollution. How to dump data to NoSQL without changing the RDB system therefore becomes important. To
this end, this paper proposes an air pollution monitoring system that uses a Hadoop cluster to dump and
back up data from the RDB to NoSQL, reducing the load on the RDB while maintaining service
performance. The dump must proceed without affecting real-time monitoring, so we focus on keeping the
web service uninterrupted. We improve efficiency by optimizing the dump method, and the backup allows
the service to restart quickly via MapReduce and distributed databases when the RDB is impaired. Three
different conversion modes are compared to select the best data conversion for our system. Finally, the air
pollution monitoring system provides information about variations in air pollution factors, serving as an
important basis for environmental detection and analysis and helping people live in a more comfortable
environment.
[28] Data Mining Applied to Oil Well Using K-means and DBSCAN
Chang Lu, Beijing Institute of Technology
Abstract: Oil is essential to our lives, mainly through transportation, so the productivity of oil wells is very
important. Classifying oil wells makes it easier to manage them and ensure good productivity. Machine
learning is an emerging technology for analyzing data, in which clustering is a good way to perform
classification. This paper applies two clustering methods to data from the Dagang oil wells and analyzes not
only the classification results but also the choice of method for future analysis.
[29] Super Resolution Reconstruction of Brain MR Image based on Convolution Sparse Network
Chang Liu, Chengdu University
Abstract: To recover high-resolution MR images from their low-resolution counterparts, this paper proposes
a super-resolution reconstruction method based on a convolutional neural network. In the proposed
network, convolution operations and non-linear mappings are employed to adapt naturally to MR images
and to learn the end-to-end mapping from low- to high-resolution images. On one hand, the convolution
operation is natural for image processing; on the other, the non-linear mapping helps explore the non-linear
relationship between low- and high-resolution images and enhances the sparsity of the feature
representation. The experiments demonstrate that the proposed convolutional sparse network can restore
detail in low-resolution MR images and achieves better performance for super-resolution reconstruction.
[30] Evacuation Behaviors and Link Selection Strategy Based On Artificial Fish Swarm Algorithm
Xinlu Zong, Hubei University of Technology
Abstract: Accidents have occurred frequently in public places in recent years, causing heavy casualties and economic losses, so it is necessary to research effective evacuation. Providing an evacuation route guidance strategy for the evacuated individuals is the key to emergency evacuation. This paper proposes an evacuation model based on the Artificial Fish Swarm Algorithm (AFSA). Each evacuee is defined as an intelligent artificial fish, and the preying, swarming and following behaviors of the artificial fish swarm are used to simulate the mental activity, path selection and behavioral preferences of individuals. The model embodies the characteristics of evacuation rules and the uncertainties of the process, yielding optimal path planning. In this paper, we take Zhuankou Stadium as the experimental environment and simulate the evacuation process under an emergency. The simulation results show that, compared with existing algorithms, this method can balance congestion and improve the efficiency and fidelity of evacuation.
[31] A Synthetic Targets Detection Method for Human Millimeter-wave Holographic Imaging System
Li Zheng, Nuctech Company Limited.
Abstract: Automatic Target Recognition (ATR) technology is of great significance in security inspection, but traditional object detection methods have proved inefficient on human-body millimeter-wave images. In this paper, we propose a synthetic object detection method for millimeter-wave images. We choose
saliency, SIFT and HOG features to form image descriptors. According to sparse representation, the
features are encoded again and fed to a linear SVM for target/non-target classification. Previous works
proved that the amount of training samples would influence the efficiency of SVM classifiers. Thus, we
utilize several simulating methods for data augmentation, aiming to increase the number of training samples
before training linear SVM classifiers. The experimental results show that our approach is efficient in target
detection of human body millimeter-wave images. Moreover, classifiers trained on larger sets with simulated
samples have better performance in classification on our testing dataset.
[32] An Efficient Distributed Clustering Protocol Based on Game-Theory for Wireless Sensor
Networks
Xuegang Wu, Chongqing University
Abstract: Clustering has been known as an effective way to reduce energy dissipation and prolong network
lifetime in wireless sensor networks (WSNs). Game theory (GT) has been used to find the optimal solutions
to clustering problems. However, earlier studies do not consider the residual energy of nodes when calculating the equilibrium probability. Besides, from the perspective of energy consumption, new definitions of payoffs in the local clustering games are required when calculating the equilibrium probability. Based on these considerations, we combine the equilibrium probability attained by playing local clustering games with a cost-dependent exponential function to obtain each node's probability of becoming a cluster head (CH), using payoff definitions formulated from the perspective of energy consumption. On this basis, we propose an efficient distributed, game-theory-based clustering protocol (EDGC) for wireless sensor networks.
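The hybrid cluster-head probability could take a form like the sketch below; the paper's exact payoff and cost definitions are not given here, so the formula is only an assumed illustrative shape:

```python
import math

def ch_probability(p_eq, e_res, e_init, alpha=1.0):
    """Illustrative hybrid: the equilibrium probability from the local
    clustering game, discounted by a cost-dependent exponential term
    that grows as residual energy e_res falls below the initial e_init.
    alpha is a hypothetical tuning constant."""
    cost = alpha * (1.0 - e_res / e_init)   # higher cost when energy is low
    return p_eq * math.exp(-cost)

# a node at full energy keeps the game-theoretic probability;
# a depleted node volunteers as cluster head far less often
p_full = ch_probability(0.2, e_res=1.0, e_init=1.0)
p_low = ch_probability(0.2, e_res=0.2, e_init=1.0)
```

The intent, matching the abstract, is that two equally "rational" nodes differ in their CH probability once residual energy enters the cost term.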
[33] IMFSSC: An In-Memory Distributed File System Framework for Super Computing
Binyang Li, Beijing University of Aeronautics and Astronautics
Abstract: Supercomputing has been widely implemented in theoretical physics, theoretical chemistry,
climate modeling, biology simulation and medicine research for high-performance and energy-efficient
computing. Many scientific applications are I/O-sensitive, and users have to tolerate high latency when the supercomputing center's storage processes thousands of I/O requests. In this paper, IMFSSC, an in-memory distributed file system framework for supercomputing, is proposed to improve latency for I/O-sensitive applications and relieve congestion when I/O requests burst. IMFSSC consists of modules for multiple master-slave support and load balancing among a huge number of computing nodes, and uses memory space to store data to minimize latency. Additional features will be added in the near future to provide better support for large-scale computer systems. Finally, the performance of the framework is evaluated, showing high scalability and good I/O performance.
[34] A Secure and VM-Supervising VDI System Based on OpenStack
Weidian Zhan, Beijing University of Aeronautics and Astronautics
Abstract: Against the background of data explosion and cloud computing, this paper investigates a branch
of the cloud computing technology which is known as VDI (virtual desktop infrastructure). Users can access
data and information via cloud desktops on endpoint devices. The paper studies OpenStack, a famous open-source cloud platform that has been widely used, and introduces a secure, optimized and highly available VDI system based on it. The system provides responsive and highly available desktop connections through multi-threaded VM operations, and enhances the security of the user-login process with security labels and device authentication.
[35] Performance Evaluation for Distributed Join Based on MapReduce
Jingwei Zhang, Guilin University of Electronic Technology
Abstract: Inner join is a fundamental and frequent operation in large-scale data analysis, and MapReduce is the most widely available framework for such analysis. A variety of inner-join algorithms have been put forward for the MapReduce environment. Usually, those algorithms are designed for specific scenarios, but an inner join can show very different performance as data volume, reference ratio, data skew rate, and running environment vary. This paper summarizes and implements these well-known join algorithms in a uniform MapReduce environment. Considering the number of tables, broadcast cost, data skew, join rate and related factors, we designed and conducted a large number of experiments to compare the time performance of the join algorithms. Based on the experimental results, we analyze and summarize the performance and applicability of the algorithms in different scenarios, providing a reference for performance improvement in large-scale data analysis under different circumstances.
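The classic reduce-side (repartition) join that such comparisons typically include can be mimicked in a few lines; this is a generic illustration of the MapReduce join pattern, not code from the paper:

```python
from collections import defaultdict

def map_phase(table_name, rows, key_index):
    """Map: tag each row with its join key and source table."""
    for row in rows:
        yield row[key_index], (table_name, row)

def reduce_phase(shuffled):
    """Reduce: for each key, emit the cross product of both tables' rows."""
    for key, tagged in shuffled.items():
        left = [r for t, r in tagged if t == "L"]
        right = [r for t, r in tagged if t == "R"]
        for l in left:
            for r in right:
                yield key, l, r

def mapreduce_join(left_rows, right_rows):
    shuffled = defaultdict(list)            # simulated shuffle: group by key
    for k, v in map_phase("L", left_rows, 0):
        shuffled[k].append(v)
    for k, v in map_phase("R", right_rows, 0):
        shuffled[k].append(v)
    return list(reduce_phase(shuffled))

orders = [(1, "book"), (2, "pen")]
users = [(1, "Ann"), (3, "Bob")]
joined = mapreduce_join(orders, users)      # inner join on the first column
```

Broadcast (map-side) joins avoid the shuffle by replicating the smaller table to every mapper, which is exactly the kind of trade-off the paper's experiments measure.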
[36] An Optimized Approach to Protect Virtual Machine Image Integrity in Cloud Computing
Xichun Yue, Beijing University of Aeronautics and Astronautics
Abstract: The development of cloud computing is surely unprecedented in the IT industry, with many companies adapting to this new technology and undoubtedly benefiting from it. Meanwhile, the security of cloud platforms has become one of their concerns. As an important underlying component, the virtual machine image is also in need of special protection. In this paper, we propose an optimized approach to protect virtual machine image integrity. In the approach, we propose an architecture for integrity protection, optimize a hardware environment as the fundamental deployment environment, design a measurement module to measure and verify images, and design a strategy module to handle the results. Finally, we integrate the approach with OpenStack and evaluate its security and performance. The experiments demonstrate that our approach protects image integrity well, and its measurement is about three times faster than the ordinary approach at the cost of slightly more resource consumption.
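The measure-and-verify cycle of such a measurement module can be illustrated with a plain content hash (SHA-256 here is an assumption for illustration; the paper's measurement and hardware details are not reproduced):

```python
import hashlib

def measure(image_bytes):
    """Measurement: hash the image content (a stand-in for the paper's
    measurement module, which runs in an optimized hardware environment)."""
    return hashlib.sha256(image_bytes).hexdigest()

def verify(image_bytes, trusted_digest):
    """Strategy-module decision: accept the image only if its measurement
    matches the trusted reference value recorded at registration time."""
    return measure(image_bytes) == trusted_digest

image = b"toy virtual machine image contents"
reference = measure(image)                  # recorded when the image is registered
ok_clean = verify(image, reference)         # untampered image passes
ok_tampered = verify(image + b"!", reference)  # any modification is detected
```

In a real deployment the reference digests would be stored and compared in a trusted component, so an attacker who modifies the image cannot also rewrite the expected value.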
[37] Breaking the Top-k Restriction of the kNN Hidden Databases
Zhiguo Gong, University of Macau
Abstract: With the increasing development of location-based services (LBS), spatial data have become accessible on the web. Often, such services provide a public interface which allows users to find the k nearest points to an arbitrary query point. These services may be abstractly modeled as a hidden database behind a kNN query interface; we refer to this as a kNN hidden database. The kNN interface is the only way to access such hidden databases and can be quite restrictive. A key restriction enforced by such an interface is the top-k output constraint: given an arbitrary query, the system only returns the k nearest points to the query point (where k is typically a small number such as 10 or 50). This restriction prevents many third-party services from being developed over the hidden databases. In this paper, we investigate the interesting problem of "breaking" the kNN restriction of such web databases to find more than k nearest points. To the best of our knowledge, this is the first work to study this problem over kNN hidden databases. We design a set of algorithms which efficiently address the problem. Beyond that, we perform a set of experiments over synthetic and real-world datasets which illustrate the effectiveness of our algorithms.
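To make the idea concrete, here is a simplified 1-D sketch of crawling past the top-k limit: cover a growing interval around the query with overlapping kNN calls until more than k points are confirmed. This only illustrates the general strategy, not the paper's algorithms, and it assumes distinct points and a database holding at least m points:

```python
def make_oracle(data, k):
    """A top-k interface: returns only the k points nearest to x."""
    return lambda x: sorted(data, key=lambda p: abs(p - x))[:k]

def crawl_range(oracle, lo, hi, k):
    """Recursively retrieve every point inside [lo, hi] via the oracle."""
    c = (lo + hi) / 2.0
    res = oracle(c)
    r = max(abs(p - c) for p in res)
    pts = {p for p in res if lo <= p <= hi}
    if len(res) < k or r >= (hi - lo) / 2.0:
        return pts                          # the k answers cover the interval
    # otherwise points may remain in the two uncovered flanks
    return pts | crawl_range(oracle, lo, c - r, k) \
               | crawl_range(oracle, c + r, hi, k)

def more_than_k_nn(oracle, q, m, k, r0=1.0):
    """Find the m > k nearest points to q by growing a crawled window."""
    radius = r0
    while True:
        pts = crawl_range(oracle, q - radius, q + radius, k)
        if len(pts) >= m:
            # any point outside the window is farther than the m found
            return sorted(pts, key=lambda p: abs(p - q))[:m]
        radius *= 2     # assumes the database holds at least m points

oracle = make_oracle([1, 3, 4, 7, 8, 9, 15, 20], k=3)
top5 = more_than_k_nn(oracle, q=8, m=5, k=3)
```

The key observation is the same one the abstract exploits: each kNN answer certifies that a ball around the query contains no unreturned points, so carefully placed queries can jointly certify a region larger than any single top-k answer.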
[38] Online Fake Drug Detection System in Heterogeneous Platforms using Big Data Analysis
Yubin Zhao, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Abstract: The widespread use of the internet provides extensive heterogeneous platforms for drug sales. The internet has greatly facilitated the development of merchandise sales; meanwhile, many fake drug sellers that have been strongly restricted in the market by law enforcement agencies build their own sales platforms on the internet. To combat fake drug websites and reduce time and human resource consumption, it is necessary to screen and identify drug information on the internet automatically. In this paper, we develop an automatic drug information screening and content analysis system which extracts information online, mines hidden relationships, and finds the sources of the sellers. Our major contributions are as follows: (1) We apply a focused crawler technique to transform the unstructured data on drug websites into structured data, which is stored in a local database. (2) An integrated fake drug identification method is proposed which consists of an image recognition module and an information retrieval module. Based on this method, fake drug websites are not only identified one by one, but the hidden connections among multiple platforms are also extracted. Experimental results demonstrate that our system can successfully identify a large number of fake drug websites.
[39] Using Weighted SVM for Identifying User from Gait with Smart Phone
Qingquan Lai, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Abstract: With the development of authentication technology, fingerprint and speech authentication have been applied to most smart devices, which means we are stepping into the era of biometric authentication. As a stable biometric feature, gait has been used to establish authentication models in many studies. Most of these studies extract cycles or statistics from the gait data and use them as features in an authentication process built on simple machine learning algorithms. The approach presented in this paper extracts frequency-series features from accelerometer gait data and uses a Weighted Support Vector Machine to recognize users. Further, this paper uses the same methodology to perform the experiment, which shows an improved performance of 3.57% EER.
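The frequency-series feature extraction step might look like the following naive-DFT sketch (illustrative only; the paper's window length, sensor axes and bin selection are assumptions here):

```python
import cmath
import math

def dft_magnitudes(signal):
    """Naive DFT; returns the magnitude of each frequency bin."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * f * t / n)
                    for t in range(n)))
            for f in range(n)]

def gait_features(acc_window, n_bins=8):
    """Frequency-series features: magnitudes of the first few bins
    (DC bin dropped), normalized by the window length."""
    mags = dft_magnitudes(acc_window)
    n = len(acc_window)
    return [m / n for m in mags[1:n_bins + 1]]

# synthetic 2 Hz 'step' oscillation sampled at 32 Hz for one second
window = [math.sin(2 * math.pi * 2 * t / 32) for t in range(32)]
feats = gait_features(window)
```

Such feature vectors, one per walking window, would then be fed to the weighted SVM; the class weights let the classifier compensate for having far fewer genuine-user windows than impostor windows.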
[40] Blood Pressure Monitoring on the Cloud System in Elderly Community Centres: A Data
Capturing Platform for Application Research in Public Health
Kelvin Tsoi, Chinese University of Hong Kong
Abstract: Technology on the cloud frameworks in healthcare data management and analytics has opened
new horizons for public health research. Hypertension is a significant modifiable risk factor for
cardiovascular diseases. Nowadays, telemonitoring blood pressure (BP) has been suggested as an
effective tool for BP control. However, elderly people always have difficulties when using electronic health
monitoring devices at home. BP data capturing with cloud technology in elderly community centres, under guidance and with a healthcare provider alert function, is a pioneering effort. In this study, the infrastructure of data collection is constructed on the cloud to capture behavioral data on BP meter use together with BP readings. BP data are generated by daily BP measurement and uploaded to the cloud. All personal characteristics, electronic health records, BP data and call logs with nurses can be encrypted and stored on the cloud. The remote platform on the cloud can provide efficient analytic performance on a huge volume of data created at high velocity in a population-based study. Data mining on the BP measurements will help us better understand how to control hypertension. This platform can also potentially be used in other epidemiological studies in public health.
[41] On Construction of an Energy Monitoring Service Using Big Data Technology for Smart
Campus
Chan-Fu Kuo, Tunghai University
Abstract: The prosperity of modern human civilization is attributed to the huge amount of resources and
energy. With the increasing population and technological advancements, the demand for energy will
definitely continue to increase, and saving energy has become an important issue. Reducing expenses by cutting electricity consumption and unnecessary energy use is very important for institutions such as universities. In this work, we propose a system that collects electricity usage data in campus buildings through smart meters and environmental sensors and processes the huge amount of data with big data processing techniques. We introduce cloud computing and a big data processing architecture as the solution for building a real-time energy monitoring system for a smart campus, using the Hadoop ecosystem to improve the capacity of big data storage and processing in our system. We compared the performance of Hive and
capacity of big data storage and processing for our system. We compared the performance of Hive and
HBase for searching energy data, and the performance of relational database and big data distributed
database for data search. We also present a method to identify abnormal electrical conditions through the MapReduce framework, and compare the performance difference between Spark and Hadoop in real-time processing. The proposed system has been implemented at Tunghai University. The system interface vividly displays the electricity usage of campus buildings, so users can monitor current and historical electricity usage in the campus at any time and from any place.
[42] A Smart Cloud Robotic System based on Cloud Computing Services
Lujia Wang, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Abstract: In this paper, we present a smart service robotic system based on cloud computing services.
The design and implementation of infrastructure, computation components and communication
components are introduced. The proposed system can offload complex computation and storage from the robots to the cloud and provide various services to the robots. The computation components dynamically allocate resources to the robots, while the communication components allow easy access to the cloud and provide flexible resource management. Furthermore, we model the task-scheduling problem and propose a max-heaps algorithm. The simulation results demonstrate that the proposed algorithm minimizes the overall task cost.
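The paper's max-heaps algorithm is not reproduced here, but the general heap-based greedy scheduling idea, giving each offloaded task to the currently least-loaded cloud worker, can be sketched as follows (the task costs are hypothetical):

```python
import heapq

def schedule(task_costs, n_workers):
    """Greedy heap scheduling: sort tasks by descending cost, then assign
    each task to the least-loaded worker (a min-heap of (load, worker))."""
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    for cost in sorted(task_costs, reverse=True):
        load, w = heapq.heappop(heap)       # currently least-loaded worker
        assignment[w].append(cost)
        heapq.heappush(heap, (load + cost, w))
    makespan = max(sum(tasks) for tasks in assignment.values())
    return assignment, makespan

# hypothetical costs (e.g., seconds of robot computation to offload)
assignment, makespan = schedule([7, 4, 6, 3, 2, 8], n_workers=2)
```

Sorting tasks largest-first before the greedy assignment is the standard longest-processing-time heuristic, which keeps worker loads balanced without solving the NP-hard scheduling problem exactly.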
[43] Learning the Distribution of Data for Embedding
Yunpeng Shen, Chongqing University
Abstract: One of the central problems in machine learning and pattern recognition is how to deal with high-dimensional data, whether for visualization or for classification and clustering. Most dimensionality reduction techniques, designed to cope with the curse of dimensionality, are based on the Euclidean distance metric. In this work, we propose an unsupervised nonlinear dimensionality reduction method that attempts to preserve the distribution of the input data, called distribution preserving embedding (DPE). This is done by minimizing the dissimilarity between the densities estimated in the original and embedded spaces. In theory, patterns in data can effectively be described by the distribution of the data; therefore, DPE is able to discover the intrinsic patterns (structures) of data, both global and local. Additionally, DPE extends naturally to the out-of-sample problem. Extensive experiments on different data sets, compared with other competing methods, are reported to demonstrate the effectiveness of the proposed approach.
[44] On Blind Quality Assessment of JPEG Images
Guangtao Zhai, Shanghai Jiao Tong University
Abstract: JPEG is still the most widely used image compression format, and perceptual quality assessment for JPEG images has been studied extensively over the last two decades. While a large number of no-reference perceptual quality metrics have been proposed over the years, this paper shows that on existing image quality databases, statistically, the performance of many of those metrics is no better than the quality factor (Q) for JPEG images, as used in the popular implementation by the IJG (Independent JPEG Group). It should be noted that Q, or the quantization table computed from Q, is almost always available at the decoder end, so we focus our analysis on no-reference, or blind, quality assessment metrics. This research highlights the fact that despite the progress achieved in the area, JPEG quality assessment is still a topic worth revisiting and investigating further.
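For reference, the IJG mapping between Q and the luminance quantization table, and a simple way to estimate Q back from a table, can be sketched as follows (the inversion by averaging is an illustrative assumption, not the paper's method):

```python
# Standard JPEG luminance base quantization table (Annex K), row-major
BASE = [16, 11, 10, 16, 24, 40, 51, 61,   12, 12, 14, 19, 26, 58, 60, 55,
        14, 13, 16, 24, 40, 57, 69, 56,   14, 17, 22, 29, 51, 87, 80, 62,
        18, 22, 37, 56, 68, 109, 103, 77, 24, 35, 55, 64, 81, 104, 113, 92,
        49, 64, 78, 87, 103, 121, 120, 101, 72, 92, 95, 98, 112, 100, 103, 99]

def table_from_q(q):
    """IJG scaling: build the luminance table used at quality factor q."""
    scale = 5000 // q if q < 50 else 200 - 2 * q
    return [max(1, min(255, (b * scale + 50) // 100)) for b in BASE]

def estimate_q(table):
    """Invert the scaling: average the implied scale over all 64 entries,
    then map the scale back to a quality factor."""
    scale = sum(100.0 * t / b for t, b in zip(table, BASE)) / len(BASE)
    q = (200 - scale) / 2 if scale <= 100 else 5000 / scale
    return round(q)
```

This is why Q is "almost always available at the decoder end": the quantization tables are stored in the JPEG header, and the scaling relation above recovers Q from them up to rounding.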
[45] Research on the Application of Distributed Self-adaptive Task Allocation Mechanism in
Distribution Automation System
Haitian Li, North China Electric Power University
Abstract: With the development of the distribution network and the intelligentization of terminal devices in distribution automation systems, it has become a key point to research task allocation methods and self-adaptive task allocation models from the perspective of distributed systems, in order to improve the utilization of devices across the whole distribution network and optimize system performance on intelligent terminal devices and the master station. This paper proposes a mathematical model of the system performance index and puts forward four typical distributed self-adaptive task allocation models for distribution automation systems, the OQOD, OQND, NQND and NQOD models, according to the impact factors in the mathematical model and existing distributed self-adaptive algorithms such as the self-adaptive Min-Min algorithm. This paper also analyzes the structural characteristics and application environments of three of the distributed self-adaptive task allocation models, and further puts forward feasible suggestions for future research on each model.
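As background, the self-adaptive Min-Min algorithm mentioned above builds on plain Min-Min scheduling, which can be sketched as follows (the expected-time matrix is hypothetical):

```python
def min_min(etc):
    """Min-Min scheduling: etc[t][d] is the expected time to compute task t
    on device d. Repeatedly commit the (task, device) pair with the
    smallest completion time, updating device ready times as we go."""
    n_tasks, n_devices = len(etc), len(etc[0])
    ready = [0.0] * n_devices               # ready time of each device
    unscheduled = set(range(n_tasks))
    plan = {}
    while unscheduled:
        # minimum completion time over every unscheduled (task, device) pair
        finish, t, d = min((ready[d] + etc[t][d], t, d)
                           for t in unscheduled for d in range(n_devices))
        plan[t] = d
        ready[d] = finish
        unscheduled.remove(t)
    return plan, max(ready)

# hypothetical expected-time-to-compute matrix: 3 tasks x 2 devices
plan, makespan = min_min([[4, 6], [3, 5], [8, 2]])
```

A self-adaptive variant would additionally refresh the expected-time matrix from observed device load, which is the adaptation the paper's models organize around.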
[46] Noise-Robust SLIC Superpixel for Natural Images
Jiantao Zhou, University of Macau
Abstract: Superpixel algorithms aim to semantically group neighboring pixels into coherent regions, which can significantly boost the performance of subsequent vision processing tasks such as image segmentation. Recently, simple linear iterative clustering (SLIC) [1] has drawn huge attention for its state-of-the-art segmentation performance and high computational efficiency. However, the performance of SLIC degrades dramatically on noisy images. In this work, we propose three measures to improve the robustness of SLIC against noise: 1) a new pixel intensity distance measurement designed by explicitly considering the within-cluster noise variance; 2) a refined spatial distance measurement that exploits the variation of pixel locations in a cluster; and 3) a noise-robust estimator that updates the cluster centers by excluding possible outliers caused by noise. Extensive experimental results on synthetic noisy images validate the effectiveness of these improvements. In addition, we apply the proposed noise-robust SLIC to a superpixel-based noise level estimation task to demonstrate its practical usage.
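The third measure, a noise-robust center update, can be illustrated with a simple outlier-excluding mean (an assumed stand-in for the paper's estimator; the z threshold is arbitrary):

```python
def robust_center(values, z=2.0):
    """Noise-robust cluster-center update: drop members whose intensity
    deviates from the current mean by more than z standard deviations,
    then average the rest."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5
    kept = [v for v in values if abs(v - mean) <= z * std] or values
    return sum(kept) / len(kept)

pixels = [100, 102, 98, 101, 99, 250]       # one impulse-noise outlier
center = robust_center(pixels)
```

Where a plain mean would be dragged toward 125 by the single corrupted pixel, the trimmed update stays at the true cluster intensity, which is exactly why noisy SLIC centers benefit from outlier exclusion.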
[47] Big Data Analysis on Radiographic Image Quality
Jianping Gu, Nuctech Company Limited.
Abstract: Mass data generated by in-service radiographic products in routine work contain information on image quality. Analyzing these data can supplement the time-consuming radiographic Quality Assurance Test Procedure to evaluate image quality, understand product performance on site, locate risks, and give manufacturers directions for follow-up actions. This article illustrates methodologies for extracting information from mass data and for applying big data to visual quality tracking, analysis, control, and risk mitigation.
[48] Binary Classification and Data Analysis for Modeling Calendar Anomalies in Financial Markets
Yu-Fu Chen, National Chiao Tung University
Abstract: This paper studies the day-of-the-week effect by means of several binary classification algorithms in order to build an effective and efficient decision support system for trading. The approach utilizes an intelligent data-driven model to predict the influence of calendar anomalies and develop profitable investment strategies. Advanced technologies, such as time-series feature extraction, machine learning, and binary classification, are used to improve system performance and make the evaluation of the trading simulation trustworthy. Through experiments on the component stocks of the S&P 500, the results show that accuracy can reach 70% when adopting two discriminant feature representation methods, "multi-day technical indicators" and "intra-day trading profile." The binary classification method based on an LDA-Linear Prior kernel outperforms other learning techniques and provides investors with stable and profitable portfolios at low risk. In addition, we believe this paper is a FinTech example which combines advanced interdisciplinary research on financial anomalies and big data analysis technology.
[49] Decision Support System for Real-Time Trading based on On-Line Learning and Parallel
Computing Techniques
Szu-Hao Huang, National Chiao Tung University
Abstract: A novel intraday algorithmic trading strategy is developed in this paper based on various machine learning techniques and parallel computing architectures. The proposed binary classification framework predicts the price trend of Taiwan stock index futures thirty minutes ahead. Traditional learning-based approaches collect all samples during the training period as learning material; the major contribution of this paper is to collect a subset of similar historical financial data to train the real-time trading model. This goal is achieved by an online learning technique, which must compute an accurate model within a training time limit. In addition, the proposed joint-AdaBoost algorithm improves system performance based on the concepts of paired feature learning and planar weak classifier design. The core execution components of this algorithm can be further accelerated with the Open Computing Language (OpenCL) parallel computing platform. The experimental results show that the proposed learning algorithm improves the prediction accuracy of the final classifier from 53.8% to 61.68%. Compared to a pure CPU implementation, the OpenCL version, which uses the CPU and a GPGPU simultaneously, reduces the calculation time by a factor of around 83. This efficiency improvement decreases the delay before an investment opportunity can be acted on, a critical issue in real-time financial decision support applications. To sum up, this paper proposes a novel learning framework based on the joint-AdaBoost algorithm with similar learning samples and OpenCL parallel computation. The extended financial decision support system is also shown, in our simulation experiments on Taiwan stock index futures, to trade effectively and efficiently.
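Joint-AdaBoost itself is not reproduced here, but the underlying AdaBoost loop with simple threshold (stump) weak classifiers can be sketched as follows; the toy "trend" data are invented for illustration:

```python
import math

def stump_predict(x, feat, thresh, sign):
    return sign if x[feat] <= thresh else -sign

def train_adaboost(X, y, rounds=3):
    """AdaBoost with decision stumps (an illustrative stand-in for the
    paper's joint-AdaBoost with planar weak classifiers)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for feat in range(len(X[0])):       # exhaustive stump search
            for thresh in sorted({x[feat] for x in X}):
                for sign in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if stump_predict(xi, feat, thresh, sign) != yi)
                    if best is None or err < best[0]:
                        best = (err, feat, thresh, sign)
        err, feat, thresh, sign = best
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, feat, thresh, sign))
        # re-weight: emphasize the samples this stump got wrong
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, feat, thresh, sign))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    score = sum(a * stump_predict(x, f, t, s) for a, f, t, s in model)
    return 1 if score >= 0 else -1

# toy 'price trend' data: label +1 roughly when both features are high
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 1], [1, 2], [0, 2]]
y = [-1, -1, -1, -1, 1, 1, 1, -1]
model = train_adaboost(X, y, rounds=3)
```

The stump search is the part the paper accelerates with OpenCL: each candidate (feature, threshold, sign) is evaluated independently, so the weighted-error computations parallelize naturally across GPGPU threads.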
[50] Affinity Propagation Clustering for Intelligent Portfolio Diversification and Investment Risk
Reduction
Chin Chou, National Chiao Tung University
Abstract: In this paper, an intelligent portfolio selection method based on the Affinity Propagation clustering algorithm is proposed to solve the stable investment problem. The goal of this work is to minimize the volatility of a portfolio selected from the component stocks of the S&P 500 index. Each stock is viewed as a node in a graph, and similarity measurements of stock price variations between companies are used as the edge weights. The Affinity Propagation algorithm solves this graph clustering problem by repeatedly updating the responsibility and availability message-passing matrices. This research tries to find the most representative and discriminant features to model stock similarity; the tested features fall into two major categories, time-series covariance and technical indicators. Historical price and trading volume data are used to simulate portfolio selection and volatility measurement. After grouping the investment targets into a small set of clusters, the selection process chooses a fixed number of stocks from different clusters to form the portfolio. The experimental results show that, with proper similarity features, the proposed system generates more stable portfolios than average cases with similar settings.
[51] Financial Time-series Data Analysis using Deep Convolutional Neural Networks
Jou-Fan Chen, National Chiao Tung University
Abstract: A novel financial time-series analysis method based on deep learning is proposed in this paper. In recent years, the explosive growth of deep learning research has led to successful applications in various artificial intelligence and multimedia fields, such as visual recognition, robot vision, and natural language processing. In this paper, we focus on time-series data processing and prediction in financial markets. Traditional feature extraction approaches in intelligent trading decision support systems apply technical indicators and expert rules to extract numerical features. The major contribution of this paper is to improve the algorithmic trading framework with the proposed planar feature representation methods and deep convolutional neural networks (CNNs). The proposed system is implemented and benchmarked on historical datasets of Taiwan stock index futures. The experimental results show that the deep learning technique is effective in our trading simulation application and may have greater potential for modeling noisy financial data and complex social science problems. In the future, we expect the proposed methods and deep learning framework to be applied to more innovative applications in the next financial technology (FinTech) generation.
[52] A Practical Model for Analyzing Push-based Virtual Machine Live Migration
Cho-Chin Lin, National Ilan University
Abstract: Cloud computing employs virtualization technology to satisfy customers' service requests. Virtual machine live migration can provide non-stop service when unexpected events occur. The cost of live migration is measured by the total number of duplicated pages and the impact caused by downtime. In this paper, a practical model for analyzing push-based virtual machine live migration is proposed. Based on the model, the patterns in the number of duplicated memory frames across iterations are analyzed for various dirtying frequencies. In addition, our model, which abstracts the live migration strategy into a policy function, is useful for developing formal methods for complex analysis.
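The push-based pre-copy process being modeled can be simulated in a few lines (an illustrative simulation under a constant dirty rate, not the paper's analytical model; all parameters are hypothetical):

```python
def precopy_migration(pages, dirty_rate, bandwidth, stop_threshold=50,
                      max_rounds=30):
    """Simulate push-based pre-copy live migration.

    pages: total memory pages; dirty_rate, bandwidth: pages per second.
    Each round re-sends the pages dirtied during the previous round;
    migration stops (and the VM pauses) once the dirty set is small.
    Returns (total pages sent, downtime in seconds)."""
    to_send = pages
    total_sent = 0
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break
        total_sent += to_send
        round_time = to_send / bandwidth
        to_send = min(pages, dirty_rate * round_time)  # dirtied meanwhile
    downtime = to_send / bandwidth   # final stop-and-copy of the dirty set
    return total_sent + to_send, downtime

# a working set dirtying 1000 pages/s migrated over a 4000 pages/s link
sent, downtime = precopy_migration(pages=100_000, dirty_rate=1000,
                                   bandwidth=4000)
```

With dirty rate below bandwidth the rounds form a geometric series, so the total duplicated pages converge and downtime shrinks each iteration; when the dirty rate approaches the bandwidth, iterating stops paying off, which is the trade-off a policy function must capture.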
[53] Performance Comparison and Analysis of Yarn's Schedulers with Stress Cases
Bo Li, Beijing University of Posts and Telecommunications
Abstract: Hadoop, a popular distributed storage and computing platform, has been widely used in many companies. Yarn is the resource management platform in Hadoop and plays an important role in resource management, since it affects the cluster's energy efficiency and its usability for applications. The schedulers are the brain of Yarn: they manage and schedule cluster resources for applications. In this paper, we conduct experiments to compare and analyze the performance of Yarn's schedulers. We use various scenarios to demonstrate the strengths and weaknesses of each scheduler in terms of response speed, cluster efficiency, scheduler specialization, etc. Experimental results demonstrate that the FIFO Scheduler has better performance and data-locality awareness for batch-job processing than the other schedulers, but the Capacity Scheduler and the Fair Scheduler offer better response speed and cluster usability than the FIFO Scheduler, which suffers from a starvation problem in mixed scenarios.
[54] Classification of Parkinson's disease and Essential Tremor Based on Structural MRI
Li Zhang, Chengdu University
Abstract: Parkinson's disease (PD) and essential tremor (ET) are two tremor disorders that often confuse doctors in clinical diagnosis. Early experiments on structural MRI have shown that Parkinson's disease causes pathological changes in the brain region named Caudate_R (a part of the basal ganglia) while essential tremor does not. Although there is much research on distinguishing PD and ET, it has not achieved automatic classification of the two diseases; big data brings new opportunities to this problem. To this end, we propose a machine learning framework based on principal component analysis (PCA) and Support Vector Machines (SVM) for the classification of Parkinson's disease and essential tremor. The framework is a two-stage method: first, PCA is used to extract discriminative features from the structural MRI data; then an SVM classifier is employed to classify PD and ET. We used both statistical analysis and the machine learning method to test the differences between PD and ET in specific brain regions. As a result, the machine learning method performs better at extracting the differential brain regions, and the highest classification accuracy in those regions reaches 93.75%.
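The first stage, PCA feature extraction, reduces to finding the dominant eigenvectors of the data covariance; a minimal power-iteration sketch, with invented 2-D data standing in for MRI-derived features, is:

```python
def first_component(X, iters=100):
    """First principal component via power iteration on the covariance
    matrix (a minimal stand-in for the PCA feature-extraction stage)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # sample covariance matrix (biased, divided by n)
    C = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / n
          for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]           # converges to the top eigenvector
    return v

# hypothetical measurements varying together along one direction
X = [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 4.8]]
v = first_component(X)
```

Projecting each subject's high-dimensional MRI feature vector onto the top few such components gives the compact representation the SVM stage classifies.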
[55] Utilizing Real-Time Travel Information, Mobile Applications and Wearable Devices for Smart
Public Transportation
Tsz Fai Chow, Chinese University of Hong Kong
Abstract: We propose a cloud platform that utilizes real-time travel information, a mobile application and
wearable devices for smart public transportation. This platform is capable of retrieving the required data
automatically, reporting real-time public transportation information and providing users with personalized
recommendations for using public transit. Novel features of this platform include measuring the user's current walking speed and using real-time estimated arrival times of public transit at different locations for travel recommendations. We also present our ongoing work on developing the proposed platform for the public transportation system in Hong Kong. We aim to develop this platform to aid passengers' decisions and reduce their journey times, thereby improving their commuting experience and encouraging the use of public transportation.
[56] Event Detection on Online Videos using Crowdsourced Time-Sync Comment
Zhenyu Liao, Tongji University
Abstract: In recent years, more and more people like to watch videos online because of their convenience and social features. With limited entertainment time, viewers increasingly prefer to watch a few highlight segments rather than an entire video. However, manually extracting the highlight segments is very time-consuming because the number of videos uploaded to the internet is huge. In this paper, we propose a model for event detection in videos using time-sync comments provided by online users. The model first extracts three features of time-sync comments; it then analyzes user behavior relevance in the time series to find the video shots that people are most interested in; finally, it introduces a metric, and its optimization, to score video shots for event detection. Experiments on several movies show that the events detected by our method coincide with the highlights in the movies.
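The intuition behind time-sync-comment event detection is that highlights attract bursts of comments at the same video timestamp. A minimal density-based sketch (far simpler than the paper's three-feature model, and purely illustrative) is:

```python
from collections import Counter

def detect_events(comment_times, bin_size=10, top_k=3):
    """Bin time-sync comments by video timestamp (seconds) and
    return the start times of the top_k densest bins, i.e. the
    shots viewers commented on most."""
    bins = Counter(int(t // bin_size) * bin_size for t in comment_times)
    return [start for start, _ in bins.most_common(top_k)]

# Toy timestamps: a burst around 120 s suggests a highlight there.
times = [5, 118, 119, 120, 121, 122, 125, 300, 301, 640]
print(detect_events(times, bin_size=10, top_k=1))  # -> [120]
```

The paper's model additionally weighs comment content and user-behavior relevance, not just raw density.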
[57] Synthetic Data Generator for Classification Rules Learning
Runzong Liu, Chongqing University
Abstract: Standard data sets are useful for empirically evaluating classification rules learning algorithms. However, there is still no standard data set general enough for diverse situations: data sets from the real world are limited to specific applications, and their numbers of attributes, rules and samples are fixed. A data generator is proposed here to produce synthetic data sets that can be as large as the experiments demand. The numbers of attributes, rules and samples of the synthetic data sets can be easily changed to meet the evaluation needs of different learning algorithms. In the generator, related attributes are created first; then rules are created based on the attributes, and samples are produced following the rules. Three decision tree algorithms are evaluated using synthetic data sets produced by the proposed data generator.
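The attributes-then-rules-then-samples construction can be sketched as follows. This is a toy sketch under assumed simplifications (each rule fixes a single attribute-value pair; the function name and parameters are invented), not the proposed generator itself:

```python
import random

def generate(n_attrs=4, n_values=3, n_rules=3, n_samples=100, seed=1):
    """Create random attributes, derive class-assignment rules over
    them, then emit samples labeled by the first matching rule."""
    rng = random.Random(seed)
    # Each rule fixes one (attribute, value) pair and assigns a class.
    rules = [(rng.randrange(n_attrs), rng.randrange(n_values), c % 2)
             for c in range(n_rules)]
    samples = []
    for _ in range(n_samples):
        x = [rng.randrange(n_values) for _ in range(n_attrs)]
        # Label comes from the first rule the sample satisfies.
        label = next((c for a, v, c in rules if x[a] == v), 0)
        samples.append((x, label))
    return samples

data = generate()
print(len(data), data[0])
```

Scaling `n_attrs`, `n_rules` and `n_samples` is exactly how such a generator meets the demands of different evaluation experiments.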
[58] A Flash Light System for Individuals with Visual Impairment Based on TPVM
Wenbin Fang, Shanghai Jiao Tong University
Abstract: We propose a flashlight system to aid visually impaired people using the paradigm of temporal psychovisual modulation (TPVM), a new display mode that exploits the limited flicker fusion frequency of human eyes and the high refresh rate of modern display devices to achieve a visual bifurcation effect. Structured light in the visible spectrum is projected onto the road surface, a synchronized camera detects its deformation, and the recognition system then calculates the road flatness (e.g. smooth road, up- or down-stairs). To minimize the visual disturbance to other people, the TPVM display technique effectively conceals the structured light, making the device appear to be an ordinary flashlight to other observers. Working in the visible spectrum also minimizes the cost of the camera and the projector. We design a fast and reliable recognition system based on the Iterative Dichotomiser 3 (ID3) algorithm to classify the pavement ahead as smooth road, wall ahead, or up/down-stairs. Experimental results are presented to validate the proposed system.
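ID3 builds a decision tree by information-gain splits. scikit-learn does not ship ID3 itself, but an entropy-criterion decision tree approximates it; the features and labels below are invented stand-ins for the structured-light measurements, not the paper's actual feature set:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features derived from the deformed structured-light
# pattern: [mean stripe shift, stripe-shift variance]. Labels:
# 0 = smooth road, 1 = wall ahead, 2 = stairs.
X = [[0.0, 0.1], [0.1, 0.0], [5.0, 0.2], [5.2, 0.1],
     [2.0, 3.0], [2.2, 3.3]]
y = [0, 0, 1, 1, 2, 2]

# criterion="entropy" gives ID3-style information-gain splits.
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(tree.predict([[5.1, 0.15]]))  # -> [1], i.e. wall ahead
```

A shallow tree like this is what makes the recognition fast enough to run on an embedded flashlight device.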
[59] Benchmarking State-of-the-Art Deep Learning Software Tools
Shaohuai Shi, Hong Kong Baptist University
Abstract: Deep learning has proven a successful machine learning method for a variety of tasks, and its popularity has brought numerous open-source deep learning software tools to the public. Training a deep network is usually a very time-consuming process. To address the huge computational challenge in deep learning, many tools exploit hardware features such as multi-core CPUs and many-core GPUs to shorten the training time. However, different tools exhibit different features and running performance when training different types of deep networks on different hardware platforms, which makes it difficult for end users to select an appropriate pairing of software and hardware. In this paper, we make a comparative study of state-of-the-art GPU-accelerated deep learning software tools, including Caffe, CNTK, TensorFlow, and Torch. We benchmark the running performance of these tools with three popular types of neural networks on two CPU platforms and three GPU platforms. Our contribution is two-fold. First, for deep learning end users, our benchmarking results can serve as a guide to selecting an appropriate software tool and hardware platform. Second, for deep learning software developers, our in-depth analysis points out possible future directions for further optimizing training performance.
[60] A Mobile Cloud System for Enhancing Multimedia File Transfer with IP Protection
Tipaporn Juengchareonpoon, Chulalongkorn University
Abstract: Quickly transferring and sharing large multimedia files on the mobile network is a challenge. Most users encounter a long delay before a file starts playing, which creates a poor user experience. One common mechanism is to buffer the file in internal memory before playing it; hence, playing a large media file without long delay or interruption remains elusive. This paper proposes a new mechanism, named STEM, that enhances media file sharing and transfer speed over the Internet on smartphones and other mobile devices. In addition, the technique protects the intellectual property of the transferred media.
[61] Distinguish True or False 4K Resolution using Frequency Domain Analysis and Free-Energy
Modelling
Wenhan Zhu, Shanghai Jiao Tong University
Abstract: With the prevalence of Ultra-High Definition (UHD) display terminals, 4K-resolution (3840 × 2160 pixels) content has become a major selling point for online video media. However, due to the scarcity of native UHD content, a large number of false 4K videos are circulating on the web. Such '4K' content, usually upscaled from lower resolutions, frustrates enthusiastic consumers and in fact wastes limited bandwidth resources. In this paper, we propose to use frequency domain analysis to distinguish native 4K content from false content. The basic assumption is that true 4K content has much more high-frequency response than upscaled versions. We use free-energy modelling to approximate the human viewing process so as to minimize the impact of the structural complexity of visual content. We set up a database containing more than 1,000 original 4K frames together with versions upscaled using many widely used interpolation algorithms. Experimental results show that the proposed method achieves an accuracy rate higher than 90%.
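The core assumption, that upscaled frames lack high-frequency energy, can be illustrated with a 2-D FFT. This sketch (the function, cutoff and the nearest-neighbor "upscale" are illustrative assumptions, not the paper's method, which further adds free-energy modelling) compares the high-frequency energy ratio of broadband content against a 2x pixel-replicated version:

```python
import numpy as np

def high_freq_ratio(img, cutoff=0.25):
    """Fraction of spectral energy above `cutoff` of the Nyquist
    frequency; upscaled frames concentrate energy at low frequencies."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    r = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    energy = np.abs(f) ** 2
    return energy[r > cutoff].sum() / energy.sum()

rng = np.random.default_rng(0)
native = rng.normal(size=(128, 128))                    # broadband detail
upscaled = np.kron(native[::2, ::2], np.ones((2, 2)))   # 2x nearest-neighbor upscale
print(high_freq_ratio(native) > high_freq_ratio(upscaled))  # -> True
```

Thresholding such a ratio is the simplest possible true-vs-false 4K decision rule.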
[62] Performance Comparison between Five NoSQL Databases
Enqing Tang, Tsinghua University
Abstract: NoSQL databases and their related technologies have developed rapidly in recent years and are widely applied in many scenarios thanks to their BASE (Basic Availability, Soft state, Eventual consistency) properties. At present there are more than 225 kinds of NoSQL databases, and this overwhelming number, together with their constantly updated versions, makes it challenging to compare their performance and choose an appropriate one. This paper evaluates the performance of five NoSQL clusters (Redis, MongoDB, Couchbase, Cassandra, HBase) using the measurement tool YCSB (Yahoo! Cloud Serving Benchmark), explains the experimental results by analyzing each database's data model and mechanisms, and provides advice to NoSQL developers and users.
[63] Collective Extraction for Opinion Targets and Opinion Words from Online Reviews
Xiangxiang Jiang, Guilin University of Electronic Technology
Abstract: Online reviews are very important for many Web applications, and extracting opinion targets and opinion words from them is one of the core tasks in review analysis and mining. Traditional extraction methods fall into two categories: pipeline-based methods and propagation-based ones. The former extract opinion targets and opinion words separately, ignoring the opinion relations between them. The latter extract opinion targets and opinion words iteratively by exploiting nearest-neighbor rules or syntactic patterns, which can lead to poor results due to the limitations of a predefined window size and the propagation of dependency-parsing errors. To address these shortcomings, we propose a collective extraction method for opinion targets and opinion words based on the word alignment model. To tackle the time-consuming and error-prone problem of manual annotation, we further devise a semi-supervised extraction method based on active learning. Finally, we carry out a series of experiments on real-world datasets to validate the effectiveness of the proposed methods.
[64] Efficient Power Allocation under Global Power Cap and Application-Level Power Budget
Xiaoxue Hu, Beijing University of Aeronautics and Astronautics
Abstract: Web-related applications, which are typically multi-server, highly parallel and long-running, are common in datacenters. They compute over large-scale datasets and consume substantial energy. Until now, most research has focused on trading a loss of performance for energy savings; however, managing power is more important than merely reducing it. In this paper, we add energy consumption to the list of managed resources and help administrators control the power profiles of web-related applications in the datacenter. Tenants specify a power budget and a corresponding response-time target for their applications before renting servers. We design strategies that keep every application in the cluster running under both a global power cap and its own power budget. We first propose a Global Feedback Power Allocation Policy that periodically allocates the global power cap among the applications. We also devise a Local Efficient Power Policy that determines the application-level power cap and allocates it among the servers running the application. Budget left over from each period can be used in the rest of the tenancy to raise the application-level power cap and thus minimize response time. We use Shell, WWW, DNS and Mail workloads to evaluate our policies.
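The global-cap allocation step can be sketched in a one-shot form. This is a simplified illustration of the idea (proportional scaling when the cap is binding), not the paper's periodic feedback policy, and the wattages are invented:

```python
def allocate(global_cap, demands):
    """Allocate a global power cap among applications: if total demand
    fits under the cap, grant every budget in full; otherwise shrink
    all budgets proportionally (a one-shot stand-in for the periodic
    feedback allocation)."""
    total = sum(demands)
    if total <= global_cap:
        return list(demands)          # cap is not binding
    scale = global_cap / total        # shrink all budgets proportionally
    return [d * scale for d in demands]

# Three applications request 120 W, 80 W and 100 W under a 240 W cap.
print(allocate(240, [120, 80, 100]))  # -> [96.0, 64.0, 80.0]
```

A real feedback policy would re-run this each period, redistributing unused budget toward applications missing their response-time targets.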
[65] An Adaptive Tone Mapping Algorithm Based on Gaussian Filter
Chang Liu, Chongqing University
Abstract: A new adaptive tone mapping (TM) algorithm based on a Gaussian filter is proposed to display High Dynamic Range (HDR) images on conventional digital display devices. Unlike conventional luminance mapping functions, the proposed algorithm uses a separable two-dimensional Gaussian filter and an empirical parameter to obtain better detail and faster operation. The Gaussian filter is mainly used for edge-preserving smoothing, while the empirical parameter adaptively adjusts the overexposed luminance image after mapping. Multiple regression models link the empirical parameter to image statistics, namely the mean, logarithmic mean and variance of the luminance values. Experimental results show that the proposed algorithm retains acceptable image contrast and color information and, moreover, outperforms previous methods in running speed.
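The Gaussian-filter style of tone mapping is usually a log-domain base/detail decomposition: smooth the log-luminance, compress only the smooth base layer, and keep the detail layer. The sketch below illustrates that generic scheme under assumed parameters (`sigma`, compression factor `k`, and the fake HDR input are all invented), not the paper's specific mapping function or its empirical parameter:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def tone_map(lum, sigma=5.0, k=0.6):
    """Log-domain base/detail split with a (separable) Gaussian filter:
    compress the smooth base layer by factor k, keep local detail."""
    log_l = np.log(lum)
    base = gaussian_filter(log_l, sigma=sigma)   # smooth base layer
    detail = log_l - base                        # preserved local detail
    out = np.exp(k * base + detail)              # compress base only
    return out / out.max()                       # normalize to [0, 1]

# Fake HDR luminance with a wide dynamic range.
hdr = np.exp(np.random.default_rng(0).normal(0, 2, size=(64, 64)))
ldr = tone_map(hdr)
print(ldr.min() > 0.0, ldr.max())
```

Because a 2-D Gaussian is separable into two 1-D passes, this filtering is cheap, which is the source of the running-speed advantage the abstract claims.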
[66] Trend Behavior Research by Pattern Analysis in Financial Big Data - A Case Study of the Taiwan
Index Futures Market
Mei-Chen Wu, National Chiao Tung University
Abstract: Market structure provides concrete information about the market. Price patterns can be viewed as evidence of the supply and demand state of the market: price shifts higher as demand exceeds the available supply, and vice versa. These patterns convey valuable information about what is going to happen in the market. The purpose of this study is to investigate the underlying relation between price patterns in the Taiwan Futures Exchange (TAIFEX) futures index market and the trends that follow them; the direction of the price shift following a pattern is forecast through supervised learning and tested with an artificial neural network (ANN). This research applies change-point analysis (CPA), from statistics, together with the theory of perceptually important points (PIP). CPA finds the locations where shifts in value occur; the PIP algorithm then performs feature extraction on the pattern, and the resulting PIPs are fed to the ANN to forecast the following trends. For comparison, a control model is built on an online time-segmentation algorithm. The results show that robust patterns found by CPA can forecast market trend direction with up to 83.6% accuracy, indicating that TAIFEX futures market directions can be forecast from robust historical price patterns and thus rejecting the hypothesis that the TAIFEX futures index market follows a random walk. The control model can also forecast, but not as accurately as the CPA method. In conclusion, analyzing the patterns reflected in the market effectively provides valuable insights into its trend behavior.
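The PIP feature-extraction step mentioned above starts from a series' endpoints and repeatedly keeps the point that deviates most from the chord between its already-selected neighbors. A minimal sketch (using vertical distance; the paper may use a different distance measure) is:

```python
def pip(series, k):
    """Perceptually important points: start from the endpoints and
    repeatedly add the point with the largest vertical distance to
    the line joining its two nearest selected neighbors."""
    idx = [0, len(series) - 1]
    while len(idx) < k:
        best, best_d = None, -1.0
        for a, b in zip(idx, idx[1:]):
            for i in range(a + 1, b):
                # vertical distance from point i to the chord (a, b)
                line = series[a] + (series[b] - series[a]) * (i - a) / (b - a)
                d = abs(series[i] - line)
                if d > best_d:
                    best, best_d = i, d
        if best is None:
            break                     # no interior points left
        idx = sorted(idx + [best])
    return idx

prices = [1, 2, 3, 10, 3, 2, 1, 0, 1, 2]
print(pip(prices, 4))  # -> [0, 3, 4, 9]: the spike at index 3 is kept
```

Reducing a pattern to a handful of PIP indices is what makes it a compact feature vector for the ANN.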
[67] Applying Market Profile Theory to Analyze Financial Big Data and Discover Financial Market
Trading Behavior - A Case Study of Taiwan Futures Market
Yu-Hsiang Hsu, National Chiao Tung University
Abstract: With financial markets constantly changing, prices are affected by many factors whose direction we cannot predict, especially during market corrections. Investors who want to make profits can look for relatively low-risk entry points. Based on Market Profile Theory, this thesis studies the displacement of the point of control relative to the points of control of historical trading days, together with changes in time-price-opportunity (TPO) counts, to find the best extremely short-term entry and exit points. Through experiments and statistical analysis, the thesis seeks potential market behavioral knowledge that can help traders profit in very short-term trading, and confirms that the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) futures market does not satisfy the weak-form efficient market hypothesis. We find that the point of control of a historical trading day can serve as a reference entry point, and that using the five-day point of control as a reference yields better profit performance over the whole trading history, showing that the point of control is the price accepted by the most traders. When the point of control shifts to a new price in the very short term, the difference between the TPO counts on either side of the point of control can also serve as a reference entry point.
[68] Protecting Link Privacy for Large Correlated Social Networks
Lin Yang, Harbin Institute of Technology Shenzhen Graduate School
Abstract: Privacy is widely studied in various research domains, and growing research effort is being devoted to protecting privacy in social networks. Most existing approaches, however, seldom consider that social data may be correlated. In this paper, we make the first attempt to study this issue by modeling such correlation as the probability that a vertex could be a potential friend of a given vertex. By not allowing a potential-friend vertex to be selected as a neighbor in the perturbed graph, we protect not only the direct neighbors but also highly correlated indirect neighbor vertices. We then define privacy and utility measurements for evaluating whether a perturbed graph is good. Experiments on three datasets, compared with a state-of-the-art algorithm, demonstrate that our approach achieves especially good results on dense graphs while remaining comparable to the baseline elsewhere.
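The key constraint, never selecting a "potential friend" as a replacement neighbor during perturbation, can be sketched on a toy adjacency structure. Everything here (the function, the directed-neighbor representation, the graphs) is an invented illustration of that one constraint, not the paper's perturbation algorithm:

```python
import random

def perturb(adj, potential, k=1, seed=0):
    """Replace up to k neighbors of each vertex with random vertices,
    excluding real neighbors AND 'potential friend' (highly correlated)
    candidates, so indirect neighbors stay protected too."""
    rng = random.Random(seed)
    nodes = list(adj)
    out = {v: set(ns) for v, ns in adj.items()}
    for v in nodes:
        # Correlated vertices are banned as replacement neighbors.
        banned = adj[v] | potential.get(v, set()) | {v}
        candidates = [u for u in nodes if u not in banned]
        for old in rng.sample(sorted(adj[v]), min(k, len(adj[v]))):
            if candidates:
                out[v].discard(old)
                out[v].add(rng.choice(candidates))
    return out

adj = {1: {2}, 2: {1}, 3: {2}, 4: {2}, 5: {1}}
potential = {1: {3}}          # vertex 3 is a likely potential friend of 1
g = perturb(adj, potential)
print(3 in g[1])  # -> False: the potential friend never becomes a neighbor
```

The privacy gain is precisely that an attacker observing `g` learns neither vertex 1's real neighbor nor its highly correlated candidate.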
[69] A Protocol for Extending Analytics Capability of SQL Database
Manyi Cai, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Abstract: To extend the big data analytics capability of SQL databases, we propose an interaction protocol and communication framework called Dex. Using the Dex protocol, we can integrate a SQL database system and a big data system into a unified data analysis platform. The integrated system allows users to call complex analytics functions available in the big data analytics system with a simple SQL statement. We prototype our idea on PostgreSQL and Spark and demonstrate promising performance gains over pure SQL UDF solutions.
[70] Competitive Intelligence Study on Macau Food and Beverage Industry
Simon Fong, University of Macau
Abstract: Due to the dynamic nature of commerce, it is important to capture useful external information to support strategy building and decision making in today's competitive, data-rich market. This report reviews and summarizes current competitive intelligence developments, mainly new concepts and tools, and proposes a system for the food and beverage industry aimed at obtaining competitive advantage in the Macau market.
[71] Finding Optimal Meteorological Observation Locations by Multi-Source Urban Big Data
Analysis
Guoshuai Zhao, Xi’an Jiaotong University
Abstract: In this paper, we address the site-selection problem for meteorological observation stations by recommending candidate locations. The purpose of these stations is meteorological observation and prediction in regions that currently lack them. Two specific problems are therefore solved: how to predict the meteorology of regions without stations using the known meteorological data of other regions, and how to select the best locations for new observation stations. We design an extensible two-stage framework for station placement comprising a prediction model and a selection model, which makes it convenient for decision makers to add further real-life factors. We consider not only selecting the locations that yield the most accurate predictions but also minimizing the cost of building new observation stations. We evaluate the proposed approach using real meteorological data from Shaanxi province. The experimental results show that our model outperforms commonly used existing methods.
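The first sub-problem, estimating meteorology at a location with no station from surrounding stations, is classically handled by spatial interpolation. A minimal inverse-distance-weighting sketch (a generic stand-in, not the paper's prediction model; station coordinates and values are invented) is:

```python
def idw_predict(stations, target, power=2):
    """Inverse-distance-weighted estimate of a measurement at `target`
    from (x, y, value) tuples of existing stations: nearer stations
    contribute more to the prediction."""
    num = den = 0.0
    for x, y, v in stations:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0:
            return v                  # target sits exactly on a station
        w = 1.0 / d2 ** (power / 2)   # weight decays with distance
        num += w * v
        den += w
    return num / den

# Two stations reading 10.0 and 14.0; the midpoint interpolates to 12.0.
stations = [(0, 0, 10.0), (2, 0, 14.0)]
print(idw_predict(stations, (1, 0)))  # -> 12.0
```

A selection model can then score candidate sites by how much such predictions improve once a new station is (hypothetically) placed there, net of construction cost.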
[72] Research on Algorithm of PSO in Image Segmentation of Cement-Based
Xiaojie Deng, Hubei University of Technology
Abstract: This paper adopts the Otsu segmentation method. To verify the superiority of chaos particle swarm optimization (PSO), test functions are used before segmentation to check the accuracy and efficiency of the chaos PSO algorithm. The Otsu method is then optimized with, and compared across, four optimization algorithms in order to select the best image segmentation, providing a scalable processing platform for future research.
Index
Keynote Speakers
Carlo Ghezzi [1]
Xin Yao [2]
Ziran Zhao [3]
Paper Session
Chunzhi Wang [4]
Yi Tan [5]
Chunzhi Wang [6]
Rongzhen Li [7]
Li Zhang [8]
Xinge You [9]
Yurong Zhong [10]
Zhigang Xu [11]
Ehab Mohamed [12]
Anyong Qin [13]
Xing Liu [14]
Xiaohuan Lu [15]
Huapeng Yu [16]
Siping Shi [17]
Yi Li [18]
Peng Huang [19]
Lei Duan [20]
Luyan Xiao [21]
Xutian Zhuang [22]
Li Zhang [23]
Manhua Jiang [24]
Ni Luo [25]
Caiquan Xiong [26]
Jia-Yow Weng [27]
Chang Lu [28]
Chang Liu [29]
Xinlu Zong [30]
Li Zheng [31]
Xuegang Wu [32]
Binyang Li [33]
Weidian Zhan [34]
Jingwei Zhang [35]
Xichun Yue [36]
Zhiguo Gong [37]
Yubin Zhao [38]
Qingquan Lai [39]
Kelvin Tsoi [40]
Chan-Fu Kuo [41]
Lujia Wang [42]
Yunpeng Shen [43]
Guangtao Zhai [44]
Haitian Li [45]
Jiantao Zhou [46]
Jianping Gu [47]
Yu-Fu Chen [48]
Szu-Hao Huang [49]
Chin Chou [50]
Jou-Fan Chen [51]
Cho-Chin Lin [52]
Bo Li [53]
Li Zhang [54]
Tsz Fai Chow [55]
Zhenyu Liao [56]
Runzong Liu [57]
Wenbin Fang [58]
Shaohuai Shi [59]
Tipaporn Juengchareonpoon [60]
Wenhan Zhu [61]
Enqing Tang [62]
Xiangxiang Jiang [63]
Xiaoxue Hu [64]
Chang Liu [65]
Mei-Chen Wu [66]
Yu-Hsiang Hsu [67]
Lin Yang [68]
Manyi Cai [69]
Simon Fong [70]
Guoshuai Zhao [71]
Xiaojie Deng [72]