Parallel Programming Models
Monica Borra

Outline
• Shared Memory Models: Revision
• Comparison of shared memory models
• Distributed Memory Models
• Parallelism in Big Data Technologies
• Conclusion

Shared Memory Models
• Multi-threaded: POSIX Threads (Pthreads), TBB, OpenMP
• Multi-processors: Cilk, ArBB, CUDA, Microsoft Parallel Patterns
• Compiler directives and library functions (PRAM-like)

Comparative Study I
• Most commercially available general-purpose computers include hardware features that increase parallelism:
• Hyperthreading, multi-core, ccNUMA architecture: general-purpose threading
• CPU vector instructions and GPUs: SIMD

Compared models
• Four parallel programming models were selected. Each exploits different hardware parallel features from those mentioned earlier, and each requires a different level of programming skill.
• OpenMP, Intel TBB: parallel threads on multicore systems
• Intel ArBB: threads + multicore SIMD features
• CUDA: SIMD GPU features

CUDA
• CUDA (Compute Unified Device Architecture) is a C/C++ programming model and API (Application Programming Interface) introduced by NVIDIA to let software developers write general-purpose applications that run on the massively parallel hardware of GPUs.
• GPUs are best suited to data-parallel applications, also known as SIMD (Single Instruction, Multiple Data).
• Threads running in parallel use extremely fast shared memory for communication.

Evaluations
• 4 benchmarks: matrix multiplication, simple 2D convolution, histogram computation, and Mandelbrot
• Different underlying computer architectures

Comparison between OpenMP/TBB and ArBB/CUDA for simple 2D convolution

Comparison Summary
• OpenMP and TBB show much lower performance than ArBB and CUDA.
• TBB seems to perform worse than OpenMP on single-socket architectures; the situation reverses on ccNUMA architectures, where TBB shows a significant improvement.
ArBB and CUDA:
• ArBB performance tends to be comparable with CUDA performance in most cases, although it is normally lower.
• Hence, there is evidence that carefully designed applications that exploit the TLP and SIMD features of a top-range multicore, multisocket architecture (as ArBB applications do) may approach the performance of a top-range CUDA GPGPU.

Comparative Study II
• OpenMP, Pthreads, and Microsoft Parallel Patterns APIs
• Computation of matrix multiplication
• Performed on an Intel i5 processor
• Measured execution time and speedup

Experimental Results

Distributed Parallel Computing
• Cluster based
• Message Passing Interface (MPI): the de facto standard
• More advantageous when communication between the nodes is high
• Originally designed for HPC

Apache Hadoop
• Parallel processing for Big Data
• Implementation of a programming model, “MapReduce”

Why is parallelism in Big Data important?
• Innumerable sources: RFID, sensors, social networking
• Volume, velocity, and variety

Apache Hadoop
• Framework that allows for the distributed parallel processing of large data sets
• Batch-processes raw unstructured data
• Highly reliable and scalable
• Consists of 4 modules: common utilities, storage, resource management, and processing

Parallel Case Study
Can we take advantage of MPI to overcome communication overhead in Big Data technologies?
Challenges:
1. Is it worth speeding up communication?
   a. Percentage of time taken by communication alone
   b. Comparison of achievable latency and peak bandwidth for point-to-point communication through MPI against Hadoop
2. How difficult is it to adapt MPI to Hadoop, and what are the minimal extensions to the MPI standard?
   • A pair of new MPI calls supporting Hadoop data communication, specified via key-value pairs

Contributions of the case study:
• Abstracting the requirements of the communication model
• Dichotomic, dynamic, data-centric bipartite model
• Key-value pair based
• Novel design of DataMPI: a high-performance communication library
• Various benchmarks to prove efficiency and ease of use

Contributions:

Comparison: DataMPI vs. Hadoop
• Several representative Big Data benchmarks: WordCount, TeraSort, K-means, Top K, PageRank
• Compared on various parameters: efficiency, fault tolerance, ease of use

Comparisons for TeraSort
• Both Hadoop and DataMPI exhibit similar trends.
• DataMPI shows better results in all cases.

Results
• Efficiency: DataMPI speeds up varied Big Data workloads and improves job execution time by 31%-41%.
• Fault tolerance: DataMPI supports fault tolerance. Evaluations show that DataMPI-FT can attain a 21% improvement over Hadoop.
• Scalability: DataMPI scales as well as Hadoop and achieves a 40% performance improvement.
• Flexibility: the coding complexity of using DataMPI is on par with that of using traditional Hadoop.

Conclusion
• The efficiency of a model in shared-memory parallel computing depends on the type of program and on how well it uses the underlying hardware parallel processing features.
• Extending MPI to computation-heavy problems such as big data mining is much more efficient than using the traditional frameworks.
• Shared memory models are easy to implement, but MPI gives the best results for more complex problems.

References
• L. Sanchez, J. Fernandez, R. Sotomayor, J. D. Garcia, “A Comparative Evaluation of Parallel Programming Models for Shared-Memory Architectures”, IEEE 10th International Symposium on Parallel and Distributed Processing with Applications, 2012, pp. 363-374
• M. Sharma, P. Soni, “Comparative Study of Parallel Programming Models to Compute Complex Algorithm”, IEEE International Journal of Computer Applications, 2014, pp. 174-180
• Apache Hadoop, hadoop.apache.org
• Xiaoyi Lu, Fan Liang, Bing Wang, Li Zha, Zhiwei Xu, “DataMPI: Extending MPI to Hadoop-like Big Data Computing”, IEEE 28th International Parallel and Distributed Processing Symposium, 2014, pp. 829-838
• Lorin Hochstein, Victor R. Basili, Uzi Vishkin, John Gilbert, “A pilot study to compare programming effort for two parallel programming models”, The Journal of Systems and Software, 2008

Questions?
Thank you!!
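
Backup: MapReduce WordCount sketch
The Hadoop slides name MapReduce as the programming model and WordCount as one of the benchmarks, and the case study centers on moving key-value pairs between tasks. The following is a minimal single-process sketch of that data flow in Python, offered only as an illustration. It is not Hadoop's or DataMPI's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the num_reducers parameter are hypothetical, and the phases run sequentially here, whereas a real framework runs map and reduce tasks in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) key-value pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs, num_reducers):
    """Shuffle: send each pair to the reducer chosen by hashing its key.

    All values for a given key end up in the same partition, which is the
    communication step the case study aims to speed up.
    """
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def reduce_phase(partition):
    """Reduce: sum the values collected for each key in one partition."""
    return {key: sum(values) for key, values in partition.items()}


def word_count(documents, num_reducers=2):
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    counts = {}
    for partition in shuffle(pairs, num_reducers):
        counts.update(reduce_phase(partition))
    return counts
```

For example, sorted(word_count(["big data big", "data mpi"]).items()) gives [('big', 2), ('data', 2), ('mpi', 1)]. Because each key is routed to exactly one partition, the per-partition results can be merged without conflicts, which is what lets the reduce tasks run independently in a distributed setting.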