An Efficient Framework for Mining
Massive Trajectory of Moving Objects
Yang Zhang
Engineer, Chinese Academy of Sciences, Beijing
Yang Zhang received his Master's degree from Nankai University in 2011. His research interests
include data mining, parallel computing and pattern recognition. His work on mining big data
has been published in renowned international conferences and journals, such as KDD BigMine
and CEUS. His results on handwriting recognition have been published at renowned
international conferences, such as CCPR, ICPR and ICDAR. He won First Prize in the Tianjin
section of the Contemporary Undergraduate Mathematical Contest in Modeling.
ABSTRACT:
With the rapid development of positioning technology, collecting accurate trajectory data of
moving objects has become very convenient. However, processing and analyzing such
large-scale trajectory data efficiently remains a challenge for both researchers and practitioners.
In this talk, I will present a novel, efficient data processing framework consisting of three
modules: (1) a data distribution module, (2) a data transformation module, and (3) an I/O
performance improvement module. Specifically, in the data distribution module, I design a
two-step consistent hashing algorithm that jointly considers load balancing, data locality and
scalability when distributing data to computing nodes with many disks; in the data
transformation module, I will present a parallel strategy for the linear referencing algorithm,
whose subtasks are loosely coupled, which makes the parallel implementation easy and keeps
communication cost low; and in the I/O module, I will show a compression-aware model for
improving I/O performance. I will present experiments and analysis on a dataset containing
1.114 TB of synthetic data and 578 GB of real taxi GPS data.
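
For readers unfamiliar with consistent hashing, the sketch below shows the standard single-step technique with virtual nodes. It is not the two-step algorithm from the talk, whose details the abstract does not give. The class name `HashRing`, the `replicas` parameter and the node and key names are all hypothetical, chosen only for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Plain consistent hashing with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas   # virtual nodes per physical node
        self._positions = []       # sorted ring positions
        self._owner = {}           # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place `replicas` virtual nodes for `node` on the ring."""
        for i in range(self.replicas):
            pos = _hash(f"{node}#{i}")
            if pos in self._owner:
                continue  # position already taken; skip this virtual node
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node):
        """Drop every virtual node that belongs to `node`."""
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        """Return the node owning the first ring position after the key's hash."""
        if not self._positions:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._positions, _hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.lookup("taxi-000123")  # one of the three node names
```

The property that matters for scalability is that adding a node only moves keys onto the new node; keys never shuffle between the existing nodes. Raising `replicas` tends to even out the load at the cost of a larger ring.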
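
Linear referencing, as the term is commonly used in GIS, expresses a point as a distance measured along a line (for example a road) instead of as a pair of coordinates. The sketch below is a minimal planar version under that assumption; it is not the parallel strategy from the talk. The function names `locate_point` and `reference_points` and the sample coordinates are hypothetical.

```python
import math


def locate_point(point, polyline):
    """Return (measure, offset) of `point` relative to `polyline`.

    measure: distance along the polyline to the closest location on it.
    offset:  distance from the point to that closest location.
    Coordinates are assumed to be planar (e.g. projected metres).
    """
    px, py = point
    best_measure, best_offset = 0.0, math.inf
    travelled = 0.0  # total length of the segments already passed
    for (ax, ay), (bx, by) in zip(polyline, polyline[1:]):
        dx, dy = bx - ax, by - ay
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip degenerate segments
        # Fraction along the segment of the orthogonal projection,
        # clamped to [0, 1] so the location stays on the segment.
        t = ((px - ax) * dx + (py - ay) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        offset = math.hypot(px - (ax + t * dx), py - (ay + t * dy))
        if offset < best_offset:
            best_measure, best_offset = travelled + t * seg_len, offset
        travelled += seg_len
    return best_measure, best_offset


def reference_points(points, polyline):
    """Reference each point on its own; no call depends on another."""
    return [locate_point(p, polyline) for p in points]


road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
results = reference_points([(5.0, 2.0), (12.0, 5.0)], road)
```

Because each call to `locate_point` reads only its own point and the shared polyline, the list comprehension in `reference_points` could in principle be split across workers with no communication between them, which is one plausible reading of the low coupling the abstract mentions.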