COMPUTER ARCHITECTURE (T120B125)
Assoc. Prof. Stasys Maciulevičius, Computer Dept.
[email protected]
2012-2014 ©S.Maciulevičius

Computer performance

Performance. What is it? How can it be evaluated or measured?
• Calculation time; it includes everything: CPU time, memory and drive access time, input/output time, and time spent by the operating system;
• CPU time includes:
  - CPU time for user tasks;
  - CPU time for system tasks.

CPU time:

$$T_{CPU} = N_{CPU} \times \tau = N_{CPU} / F$$

• $N_{CPU}$ - number of clocks used by the executing program
• $\tau$ - clock time
• $F$ - clock rate ($\tau = 1/F$)

Average clocks per instruction:

$$CPI = N_{CPU} / N$$

• $CPI$ - average number of clocks per instruction
• $N$ - number of instructions executed by the program

Replacing $N_{CPU}$ by $N \times CPI$, we get:

$$T_{CPU} = N \times CPI \times \tau \quad \text{or} \quad T_{CPU} = N \times CPI / F$$

When the instructions fall into groups, with group $i$ contributing $N_i$ instructions at $CPI_i$ clocks each:

$$CPI = \frac{\sum_i (CPI_i \times N_i)}{N} = \sum_i \left( CPI_i \times \frac{N_i}{N} \right)$$

Gibson mix

The Gibson Mix was produced by J. Gibson of IBM for scientific applications. The variation used here has the following weighting:

Group of instructions          Rate
Fixed Point Add/Subtract       0.330
Fixed Point Multiply           0.006
Fixed Point Divide             0.002
Branch                         0.065
Compare                        0.040
Transfer (8 characters)        0.175
Shift                          0.046
Logical                        0.017
Modification                   0.190
Floating Point Add             0.073
Floating Point Multiply        0.040
Floating Point Divide          0.016

MIPS metrics

$$MIPS = \frac{N}{T \times 10^6} = \frac{F}{CPI \times 10^6}$$

This measure is easy to understand.
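The CPI, CPU time and MIPS formulas above can be tied together with a small calculation. The sketch below is illustrative only: the weights are the Gibson mix rates from the table, but the per-group clock counts in `CPI_BY_GROUP`, the 100 MHz clock rate and the instruction count are made-up assumptions, not data for any real machine.

```python
# Gibson mix rates N_i / N, taken from the table above.
GIBSON_MIX = {
    "fixed_add_sub": 0.330, "fixed_mul": 0.006, "fixed_div": 0.002,
    "branch": 0.065, "compare": 0.040, "transfer": 0.175,
    "shift": 0.046, "logical": 0.017, "modification": 0.190,
    "float_add": 0.073, "float_mul": 0.040, "float_div": 0.016,
}

# HYPOTHETICAL clocks per instruction for each group (assumed for illustration).
CPI_BY_GROUP = {
    "fixed_add_sub": 1, "fixed_mul": 4, "fixed_div": 12,
    "branch": 2, "compare": 1, "transfer": 3,
    "shift": 1, "logical": 1, "modification": 1,
    "float_add": 3, "float_mul": 5, "float_div": 20,
}


def average_cpi(mix, cpi_by_group):
    """CPI = sum over groups of CPI_i * (N_i / N)."""
    return sum(mix[group] * cpi_by_group[group] for group in mix)


def cpu_time(n_instructions, cpi, clock_rate_hz):
    """T_CPU = N * CPI / F, in seconds."""
    return n_instructions * cpi / clock_rate_hz


def mips(cpi, clock_rate_hz):
    """MIPS = F / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


if __name__ == "__main__":
    clock_rate_hz = 100e6          # assumed clock rate F: 100 MHz
    n_instructions = 50_000_000    # assumed instruction count N

    cpi = average_cpi(GIBSON_MIX, CPI_BY_GROUP)
    t_cpu = cpu_time(n_instructions, cpi, clock_rate_hz)

    print(f"average CPI: {cpi:.3f}")
    print(f"CPU time:    {t_cpu:.3f} s")
    print(f"MIPS:        {mips(cpi, clock_rate_hz):.1f}")
```

Both forms of the MIPS formula give the same value here, since N / (T × 10^6) reduces to F / (CPI × 10^6) once T is replaced by N × CPI / F.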
However, it should be mentioned that:
• MIPS speeds are highly dependent on the instruction set, so comparing computers that have different instruction sets is problematic;
• effective MIPS speeds are highly dependent on the programming language used;
• in some cases MIPS even varies inversely with actual performance.

MIPS speeds of some CPUs:

Processor                                       MIPS      Clock      MIPS/MHz   Year
AMD Athlon                                      3,561     1.2 GHz    3.0        2000
Pentium 4 Extreme Edition                       9,726     3.2 GHz    3.0        2003
Intel Core 2 Extreme X6800                      27,079    2.93 GHz   9.2        2006
Intel Core 2 Extreme QX9770 (Quad core)         59,455    3.2 GHz    18.6       2008
AMD Phenom II X4 940 Black Edition              42,820    3.0 GHz    14.3       2009
Intel Core i7 Extreme 3960X (Hex core)          177,730   3.33 GHz   53.4       2011

MFLOPS metrics

MFLOPS is an acronym meaning Millions of FLoating point Operations Per Second:

$$MFLOPS = \frac{N}{T \times 10^6}$$

where $N$ here is the number of floating point operations executed.

It should be taken into account that:
• different processors may have different floating point instruction sets (the Cray-2 doesn't have division, the Motorola 68882 has, etc.);
• the duration of floating point instructions varies over a wide range.

Intel Processor Numbers

The processor number is one of several factors, along with processor brand, specific system configurations and system-level benchmarks, to be considered when choosing the right processor for your computing needs.
Intel processor numbers are based on a variety of features that may include the processor's underlying architecture, cache, Front Side Bus, clock speed, power and other Intel technologies.

A processor number represents a broad set of features that can influence the overall computing experience, but it is not a measurement of performance.

Processor numbers for the 1st generation Intel® Core™ i7 brand have the i7 identifier followed by a three digit numerical sequence.

The table below explains the alpha prefixes used for the Intel Core 2 processor families:

Alpha Prefix   Description
QX             Desktop or mobile quad-core extreme performance processors
X              Desktop or mobile dual-core extreme performance processors
Q              Desktop quad-core high performance processors
E              Desktop energy efficient dual-core processors with TDP greater than or equal to 55 W
T              Mobile highly energy efficient processors with TDP 30-39 W
P              Mobile highly energy efficient processors with TDP 20-29 W
L              Mobile highly energy efficient processors with TDP 12-19 W
U              Mobile ultra high energy efficient processors with TDP less than or equal to 11.9 W
S              Mobile small form-factor processors with 22x22 BGA package

Processor numbers for the 3rd generation Intel Core processor family have an alpha/numerical identifier followed by a four digit numerical sequence (3xxx), and may have an alpha suffix depending on the processor.
The table below explains the alpha suffixes used for the 3rd generation Intel Core processor family:

Alpha Suffix   Description
K              Unlocked
QM             Quad-Core Mobile
S              Performance optimized lifestyle
T              Power optimized lifestyle

Intel Atom Processor Numbers

Processor numbers for the Intel Atom processor family are categorized by a three digit numerical sequence. Netbook-class Intel Atom processors have an alpha prefix of N, and an alpha prefix of Z indicates that the processor is for Mobile Internet Devices (MIDs).

Intel Xeon and Itanium Numbers

Intel Xeon and Intel Itanium processor numbers are categorized in four digit numerical sequences, and may have an alpha prefix to indicate power and performance:

Alpha Prefix   Description
X              Performance
E              Mainstream (rack-optimized)
L              Power-Optimized

Processor Family             Number Sequence   System Type
Intel® Itanium® processor    9000              Multi-processor and dual-processor
Intel® Xeon® processor       7000              Multi-processor
Intel® Xeon® processor       5000              Dual-processor
Intel® Xeon® processor       3000              Single-processor

Intel Xeon Phi Coprocessor Numbers

[Figure: Intel Xeon Phi coprocessor numbers]

AMD Opteron processor numbers

Series        100 Series              200 Series              800 Series
Scalability   1-way                   Up to 2-way             Up to 8-way
Socket        Socket 939*             Socket 940              Socket 940
Performance   100 Series Benchmarks   200 Series Benchmarks   800 Series Benchmarks

Single-Core Options

Frequency   100 Series   200 Series   800 Series
1.6 GHz     -            Model 242    Model 842
1.8 GHz     Model 144    Model 244    Model 844
2.0 GHz     Model 146    Model 246    Model 846
2.2 GHz     Model 148    Model 248    Model 848
2.4 GHz     Model 150    Model 250    Model 850
2.6 GHz     Model 152    Model 252    Model 852
2.8 GHz     Model 154    Model 254    Model 854

Dual-Core Options

Frequency   100 Series   200 Series   800 Series
1.8 GHz     Model 165    Model 265    Model 865
2.0 GHz     Model 170    Model 270    Model 870
2.2 GHz     Model 175    Model 275    Model 875
2.4 GHz     Model 180    Model 280    Model 880
2.6 GHz     Model 185    Model 285    Model 885

Processor performance

In the paper "The Fundamentals of Performance" (http://www.devx.com/Intel/Article/30831?trk=DXRSS_LATEST) the performance of modern microprocessors is characterized as follows:

performance = clock speed × IPC × (number of cores × effectiveness)

• IPC - number of instructions executed per clock
• The effectiveness multiplier for dual-core CPUs equals 1.5-1.7

Benchmarks

Who develops them:
1. Manufacturers
2. Users
3. Special institutions

4 types of benchmarks:
1) Real applications
2) Kernels
3) Toy (game-like) benchmarks
4) Synthetic tests

Some benchmarks for evaluating processor performance:
• Dhrystone - for integer arithmetic performance
• Whetstone - for floating-point arithmetic performance
• Livermore Loops - a benchmark for parallel computers (based on applied physics tasks)
• Linpack - a software library for performing numerical linear algebra (problem size 100×100, 1000×1000, ...) on digital computers
• NAS Parallel Benchmarks - a set of benchmarks targeting performance evaluation of highly parallel supercomputers

Benchmarks - PCMark 7

PCMark 7 includes more than 25 individual workloads combined into 7 separate tests to give different views of system performance:
• The PCMark test measures overall system performance and returns an official PCMark score.
• The Lightweight test measures the capabilities of entry level systems unable to run the full PCMark suite.
• The Entertainment test measures system performance in entertainment, media and gaming scenarios.
• The Creativity test measures performance in typical creativity scenarios involving images and video.
• The Productivity test measures system performance in scenarios using the Internet and office applications.
• The Computation test contains workloads that isolate the computation performance of the system.
• The Storage test contains workloads that isolate the performance of the PC's storage system.

Benchmarks - PCMark 8

PCMark 8 Basic Edition (free):
• Complete performance measurement for your PC.
• Includes Home, Creative and Work benchmarks.
• Test everything from tablets to desktop PCs.
• Easy to use, no technical know-how needed.
• Free online account to manage your results.

Benchmarks - SYSmark 2012

SYSmark 2012 is an application-based benchmark that reflects usage patterns of business users in the areas of office productivity, data/financial analysis, system management, media creation, 3D modeling and web development.

SYSmark 2012 is a ground-up development and features the latest and most popular applications from each of these fields.

SYSmark 2012 v1.5 supports Microsoft Windows 7 and Windows 8.

SPEC

The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers.
SPEC develops benchmark suites and also reviews and publishes submitted results from its member organizations and other benchmark licensees.

SPEC benchmarks

SPEC CPU2006 is designed to provide performance measurements that can be used to compare compute-intensive workloads on different computer systems. SPEC CPU2006 contains two benchmark suites:
• CINT2006 for measuring and comparing compute-intensive integer performance, and
• CFP2006 for measuring and comparing compute-intensive floating point performance.

SPECint and SPECfp benchmarks differ in the percentage of FP operations.

SiSoft Sandra 2013

Sandra means "System ANalyser, Diagnostic and Reporting Assistant."

SiSoftware Sandra 2013 is the latest version of the utility, which includes remote analysis, benchmarking and diagnostic features for PCs, servers, Pocket PC, Smartphone, small office/home office (SOHO) networks and enterprise networks.

It supports Win32 x86, Win64 x64, WinCE and ARM platforms.

What benchmarks do engineers use when reviewing processors?

In "Intel's Second-Gen Core CPUs: The Sandy Bridge Review", performance was evaluated in the following areas:
• PCMark Vantage - Memory, Gaming, Productivity, …
• 3DMark11 - Graphics, Physics, Performance, …
• SiSoftware Sandra 2011 - Processor Arithmetic, Multimedia, Cryptography, Memory, …
• Content Creation
• Productivity - OCR, WinZip, WinRar, …
• Media Encoding
• Games - Metro 2033, F1 2010 (DX11), Aliens vs. Predator (DX11)
• Power Consumption

Speedup

Amdahl's law, also known as Amdahl's argument, is named after computer architect Gene Amdahl. It is used to find the maximum expected improvement to an overall system when only part of the system is improved:

$$S = \frac{1}{(1 - p) + p / k}$$

Here:
• $S$ - resulting overall speedup,
• $p$ - proportion of the computation that benefits from the improvement,
• $k$ - speedup of the improved part of the computation.

Speedup of parallelized implementations

Amdahl's law is a model for the relationship between the expected speedup of parallelized implementations of an algorithm relative to the serial algorithm, under the assumption that the problem size remains the same when parallelized.

[Figure: speedup of parallelized implementations]

TOP500

The TOP500 project ranks and details the 500 most powerful (non-distributed) computer systems in the world (see www.top500.org).

The project was started in 1993 and publishes an updated list of the supercomputers twice a year.

The LINPACK Benchmarks are a measure of a system's floating point computing power.
They measure how fast a computer solves a dense n by n system of linear equations Ax = b, which is a common task in engineering.

Green500 List

The Green500 list ranks computers from the TOP500 list of supercomputers in terms of energy efficiency.

The November 2013 release of the Green500 List (http://www.green500.org/lists/green201311) shows that the top of the list is dominated by heterogeneous supercomputers, those that combine two or more types of processing elements, such as a traditional processor or central processing unit (CPU) combined with a graphical processing unit (GPU) or coprocessor.

Green500 rank 1: 4,503.17 MFLOPS/W, total power 27.78 kW
  Site: GSIC Center, Tokyo Institute of Technology
  Computer: TSUBAME-KFC - LX 1U4GPU/104Re-1G Cluster, Intel Xeon E5-2620v2 6C 2.100GHz, NVIDIA K20x

Green500 rank 2: 3,631.86 MFLOPS/W, total power 52.62 kW
  Site: Cambridge University
  Computer: Wilkes - Dell T620 Cluster, Intel Xeon E5-2630v2 6C 2.600GHz, NVIDIA K20

Green500 rank 3: 3,517.84 MFLOPS/W, total power 78.77 kW
  Site: Center for Computational Sciences, University of Tsukuba
  Computer: HA-PACS TCA - Cray 3623G4-SM Cluster, Intel Xeon E5-2680v2 10C 2.800GHz, NVIDIA K20x

Green500 rank 4: 3,185.91 MFLOPS/W, total power 1,753.66 kW
  Site: Swiss National Supercomputing Centre (CSCS)
  Computer: Piz Daint - Cray XC30, Xeon E5-2670 8C 2.600GHz, NVIDIA K20x (Level 3 measurement data available)

Green500 rank 5: 3,130.95 MFLOPS/W, total power 81.41 kW
  Site: ROMEO HPC Center - Champagne-Ardenne
  Computer: romeo - Bull R421-E3 Cluster, Intel Xeon E5-2650v2 8C 2.600GHz, NVIDIA K20x
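Since the Green500 list gives both an efficiency (MFLOPS/W) and a total power (kW) for each system, the floating point rate sustained during the power measurement follows by multiplication. The sketch below is a rough illustration under that assumption; the function name `implied_tflops` is made up for this example, and the result is an implied figure that need not equal the performance a system reports on the TOP500 list.

```python
# (rank, system name, MFLOPS per watt, total power in kW), from the list above.
GREEN500_TOP5 = [
    (1, "TSUBAME-KFC", 4503.17, 27.78),
    (2, "Wilkes", 3631.86, 52.62),
    (3, "HA-PACS TCA", 3517.84, 78.77),
    (4, "Piz Daint", 3185.91, 1753.66),
    (5, "romeo", 3130.95, 81.41),
]


def implied_tflops(mflops_per_watt, power_kw):
    """Implied sustained rate in TFLOPS: (MFLOPS/W) * W, converted to TFLOPS."""
    watts = power_kw * 1000.0           # kW -> W
    mflops = mflops_per_watt * watts    # MFLOPS/W * W -> MFLOPS
    return mflops / 1e6                 # MFLOPS -> TFLOPS


if __name__ == "__main__":
    for rank, name, efficiency, power_kw in GREEN500_TOP5:
        tflops = implied_tflops(efficiency, power_kw)
        print(f"{rank}. {name:<12} {efficiency:>9.2f} MFLOPS/W "
              f"x {power_kw:>8.2f} kW -> {tflops:,.1f} TFLOPS")
```

The calculation makes one point of the ranking visible: the fourth system draws far more power than the others, so it delivers a much higher absolute rate even though its efficiency is lower than that of the top three.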