Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization

A. Epshteyn¹, M. Garzaran¹, G. DeJong¹, D. Padua¹, G. Ren¹, X. Li¹, K. Yotov², K. Pingali²
¹ University of Illinois at Urbana-Champaign   ² Cornell University

Two approaches to code optimization:
• Empirical Search
  – E.g., execute and measure different versions of MM code with different tile sizes.
  – Slow
  – Accurate because of feedback
• Models
  – E.g., calculate the best tile size for MM as a function of cache size.
  – Fast
  – May be inaccurate
  – No verification through feedback

Hybrid Approach
• Faster than empirical search
• More accurate than the model
  – Use the model as a prior
  – Use active sampling to minimize the amount of searching

Why is Speed Important?
• Adaptation may have to be applied at runtime, where running time is critical.
• Adaptation may have to be applied at compile time (e.g., with feedback from a fast simulator).
• Library routines can be used as a benchmark to evaluate alternative machine designs.

Problem: Matrix Multiplication
• Tiling improves the locality of references.
• Cache blocking (NB): the matrix is decomposed into smaller subblocks of size NB×NB.
• Matrix multiplication is an illustrative example for testing the hybrid approach.
• Ultimate goal: a learning compiler that specializes itself to its installation environment, user profile, etc.

Empirical Search: ATLAS
• Try tiling parameters NB in the range 16 ... min(80, L1 cache size) in steps of 4.

Model (Yotov et al.)
• Compute the NB which optimizes the use of the L1 cache.
• Constructed by analyzing the memory access trace of the matrix multiplication code.
• Formula: choose the largest NB such that
    ⌈NB² / L1 line size⌉ + 3·⌈NB / L1 line size⌉ + 1 ≤ L1 size / L1 line size
• Has been extended to optimize the use of the L2 cache.

Model in action:
• Performance curve (figure).
• Vertical lines mark the model-predicted L1 and L2 blocking factors.
• Whether to tile for the L1 or the L2 cache depends on the architecture and the application.

Hybrid approach
• Model performance with a family of regression curves.
• Regression (nonparametric): minimize the average error.
• Regression (ML): maintain a distribution over regression curves and pick the most likely curve.

Regression (Bayesian)
• Prior distribution P(curve) over regression curves
  – Makes regression curves with model-predicted maxima more likely.
• Posterior distribution given the data (Bayes' rule):
  – P(curve|data) = P(data|curve) · P(curve) / P(data)
• Pick the maximum a posteriori curve:
  – picks curves with peaks in model-predicted locations when the data sample is small;
  – picks curves which fit the data best when the sample is large.

Active sampling
• Objectives:
  1) Sample at lower tile sizes – takes less time.
  2) Explore – don't oversample in the same region.
  3) Get information about the dominant peak.

Solution: Potential Fields (objectives 1, 2)
• Positive charge at the origin.
• Negative charges at previously sampled points.
• Sample at the point which minimizes the field.

Potential Fields (objective 3)
• Positive charge in the region of the dominant peak.
• How do we know which peak dominates? From the distribution over regression curves:
  – we can compute P(peak1 is located at x), P(peak2 is located at x), P(peak1 is of height h), P(peak2 is of height h);
  – hence, we can compute P(peak1 dominates peak2).
• Impose a positive charge in the region of each peak, proportional to its probability of domination.

Results I – Regression Curves (figure)

Results II – Time, Performance

Performance (MFLOPS):
          Model    Hybrid   ATLAS
  Sparc   376.66   851.04   832.63
  SGI     499.81   553.15   505.4

Search time (mins):
          Model    Hybrid   ATLAS
  Sparc   0:00     3:12     8:59
  SGI     0:00     14:02    59:00

• Sparc – actual improvement due to the hybrid search for NB: ~10%.
• SGI – improvement over both the model and ATLAS, due to choosing to tile for the L2 cache.

Results III – Library Performance (figure)

Conclusion
• Hybrid approach: incorporates the model as a prior.
• Active sampling: actively picks samples in the most informative region.
• Decreases the search time of the empirical search and improves on the model's performance.
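As a minimal sketch of how the two ingredients fit together, the Python code below first computes the model-predicted NB from the L1 capacity constraint of the Model slide, then refines it with a small empirical scan centered on that prediction. This is an illustration under stated assumptions, not the authors' implementation: the 8-byte element size, the `measure` callback interface, and the neighborhood-scan refinement are hypothetical stand-ins (the actual system uses Bayesian regression over performance curves and potential-field active sampling).

```python
import math

def model_nb(l1_size_bytes, line_size_bytes, elem_size=8):
    """Largest NB satisfying the L1 capacity constraint
    ceil(NB^2/B) + 3*ceil(NB/B) + 1 <= C/B,
    where B is the L1 line size and C the L1 capacity, in elements.
    elem_size=8 (double precision) is an assumption."""
    B = line_size_bytes // elem_size
    C = l1_size_bytes // elem_size
    nb, n = 0, 1
    while math.ceil(n * n / B) + 3 * math.ceil(n / B) + 1 <= C / B:
        nb = n       # n still fits; remember it and try the next size
        n += 1
    return nb

def hybrid_search(measure, model_guess, radius=8, step=4):
    """Empirical refinement around the model's prediction: instead of
    scanning the full ATLAS range, benchmark only a neighborhood of the
    model-predicted NB. `measure(nb)` is a user-supplied benchmark
    returning MFLOPS for tile size nb (hypothetical interface)."""
    lo = max(4, model_guess - radius)
    candidates = range(lo, model_guess + radius + 1, step)
    return max(candidates, key=measure)  # best-measured tile size

if __name__ == "__main__":
    guess = model_nb(32 * 1024, 64)          # e.g., 32 KB L1, 64-byte lines
    # Synthetic benchmark with a peak near the prediction, for demo only.
    best = hybrid_search(lambda nb: -(nb - 59) ** 2, guess)
    print(guess, best)
```

The point of the sketch is the cost argument from the slides: the model gives a free starting point, so the empirical phase needs only a handful of timed runs around it rather than the full 16..80 sweep.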