An Efficient Framework for Mining Massive Trajectory of Moving Objects

Yang Zhang
Engineer, Chinese Academy of Sciences, Beijing

Yang Zhang received his Master's degree from Nankai University in 2011. His research interests include data mining, parallel computing and pattern recognition. His work on mining big data has been published in renowned international conferences and journals, such as KDD BigMine and CEUS. His results on handwriting recognition have been published in renowned international conferences, such as CCPR, ICPR and ICDAR. He won first prize in the Tianjin section of the Contemporary Undergraduate Mathematical Contest in Modeling.

ABSTRACT: With the fast development of positioning technology, accurately collecting trajectory data of moving objects has become very convenient. However, efficiently processing and analyzing such large-scale trajectory data is a challenging task for both researchers and practitioners. In this talk, I will present a novel and efficient data processing framework that consists of three modules: (1) a data distribution module, (2) a data transformation module, and (3) an I/O performance improvement module. Specifically, in the data distribution module, I design a two-step consistent hashing algorithm that jointly considers load balancing, data locality and scalability when distributing data to computing nodes with many disks. In the data transformation module, I will present a parallel strategy for the linear referencing algorithm; the subtasks of this strategy are loosely coupled, which makes the parallel implementation easy and keeps the communication cost low. In the I/O module, I will show a compression-aware model for improving I/O performance. I will present experiments and analysis on a dataset containing 1.114 TB of synthetic data and 578 GB of real taxi GPS data.
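
To make the data distribution idea more concrete, here is a minimal sketch of consistent hashing applied in two steps. The abstract does not spell out what the two steps are, so the reading used here (first pick a computing node, then pick a disk on that node) is an assumption, not the speaker's algorithm. The names HashRing and place, the number of virtual nodes, and the use of MD5 as the hash function are all hypothetical choices made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Plain consistent-hash ring with virtual nodes (illustrative helper)."""

    def __init__(self, members=(), replicas=100):
        self.replicas = replicas  # virtual nodes per member, smooths the load
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> member name
        for member in members:
            self.add(member)

    def add(self, member: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{member}#{i}")
            if point in self._owner:
                continue  # skip the (extremely unlikely) position collision
            bisect.insort(self._points, point)
            self._owner[point] = member

    def remove(self, member: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != member]
        self._owner = {p: m for p, m in self._owner.items() if m != member}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring is empty")
        # First ring position clockwise from the key, wrapping around at the end.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


def place(key: str, node_ring: HashRing, disk_rings: dict) -> tuple:
    """Assumed two-step placement: choose a node, then a disk on that node."""
    node = node_ring.lookup(key)
    # Salt the key with the node name so the second step is decorrelated
    # from the first one.
    disk = disk_rings[node].lookup(f"{node}/{key}")
    return node, disk


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    node_ring = HashRing(nodes)
    disk_rings = {n: HashRing([f"disk{d}" for d in range(4)]) for n in nodes}
    for trajectory_id in ["taxi-17", "taxi-42", "taxi-99"]:
        print(trajectory_id, place(trajectory_id, node_ring, disk_rings))
```

The scalability property that makes consistent hashing attractive here is that adding a node only moves keys onto the new node; keys owned by the existing nodes stay where they are. The sketch does not model data locality or disk capacity, both of which the talk says the actual algorithm takes into account.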