Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Basics and Architectures RISC and CISC Processor 2 Introduction There are two fundamentally different ways of designing CPUs The CPU can be designed to have an instruction set with: very basic instructions OR a wide range of complex instructions Exercise – List typical instructions for each case add, move-data etc multiply, dsp orientated instructions etc 3 CISC Processor • • • • Complex Instruction Set Computer (CISC). The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible. This is achieved by building processor hardware that is capable of understanding and executing a series of operations. For this particular task, a CISC processor would come prepared with a specific instruction (we'll call it "MULT"). When executed, this instruction loads the two values into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate register. Thus, the entire task of multiplying two numbers can be completed with one instruction: MULT data1, data2 • MULT is what is known as a "complex instruction." It operates directly on the computer's memory banks and does not require the programmer to explicitly call any loading or storing functions. It closely resembles a command in a higher level language. For instance, if we let "a" represent the value of “data1” and "b" represent the value of “data2”, then this command is identical to the C statement "a = a * b." 4 • One of the primary advantages of this system is that the compiler has to do very little work to translate a high-level language statement into assembly. Because the length of the code is relatively short, very little RAM is required to store instructions. 5 RISC Processor • Reduce Instruction Set Computer (RISC). • RISC processors only use simple instructions that can be executed within one clock cycle. • Thus, the "MULT" command described above could be divided into three separate commands: "LOAD," which moves data from the memory bank to a register, "PROD," which finds the product of two operands located within the registers, and "STORE," which moves data from a register to the memory banks. In order to perform the exact series of steps described in the CISC approach, a programmer would need to code four lines of assembly: LOAD A, data1 LOAD B, data2 PROD A, B STORE data1, A 6 CISC v/s RISC CISC RISC • Emphasis on hardware. • Emphasis on software. • Includes multi-clock complex instructions • Single-clock, reduced instruction only • Memory-to-memory: "LOAD" and "STORE" incorporated in instructions • Register to register: "LOAD" and "STORE" are independent instructions • Small code sizes, high cycles per second • Low cycles per second, large code sizes • Transistors used for storing complex instructions • Spends more transistors on memory registers 7 Conclusion CISC v/s RISC • At first, this may seem like a much less efficient way of completing the operation. Because there are more lines of code, more RAM is needed to store the assembly level instructions. The compiler must also perform more work to convert a high-level language statement into code of this form. • However, the RISC strategy also brings some very important advantages. Because each instruction requires only one clock cycle to execute, the entire program will execute in approximately the same amount of time as the multi-cycle "MULT" command. • These RISC "reduced instructions" require less transistors of hardware space than the complex instructions, leaving more room for general purpose registers. Because all of the instructions execute in a uniform amount of time (i.e. one clock), pipelining is possible. 8 Performance Equation The following equation is commonly used for expressing a computer's performance ability: The CISC approach attempts to minimize the number of instructions per program, sacrificing the number of cycles per instruction. RISC does the opposite, reducing the cycles per instruction at the cost of the number of instructions per program. 9 The Overall RISC Advantage • Today, the Intel x86 is arguable the only chip which retains CISC architecture. This is primarily due to advancements in other areas of computer technology. • The price of RAM has decreased dramatically. In 1977, 1MB of DRAM cost about $5,000. By 1994, the same amount of memory cost only $6 (when adjusted for inflation). • Compiler technology has also become more sophisticated, so that the RISC use of RAM and emphasis on software has become ideal. 10 PIPELINING CONCEPT Pipelining Concept Fetch • The instruction is fetched from memory Decode Execute • The instruction is decoded and the data path • control signals prepared for the next cycle • The operands are read from the register bank, shifted, combined in the ALU and • The result written back 3 stage Pipelining • That is basically 3 stage pipelining, – Fetch – Decode – Execute How it works???? 3 Basic Architecture 1. Super Scalar Architecture/Von-Neuman Architecture 2. Very Long Instruction word Architecture (VLIW)/ HARWARD Architecture 3. Single Instruction Multiple data (SIMD) Superscalar Architecture • Superscalar processing is the ability to initiate multiple instructions during the same clock cycle. • A superscalar architecture consists of a number of pipelines that are working in parallel. • Depending on the number and kind of parallel units available, a certain number of instructions can be executed in parallel. • It is known as Instruction level parallelism (ILP). Degree of Superscalar Architecture Superscalar Execution Advantage of Superscalar Architecture • Maximum Performance • Propagation delay is less. Disadvantage of Superscalar Architecture • Problem in Scheduling. Example of Superscalar Architecture • Athlon(AMD) processor has these type of Architecture Very Long Instruction Word (VLIW) Architecture/ HARWARD Architecture • Problem in Superscalar Architecture : – Problem of Scheduling which operation and which one have to wait and has done sequentially after others. • A typical VLIW (very long instruction word) machine has instruction words hundreds of bits in length. • Every VLIW contains enough bits to specify not just one machine operation but several to be performed simultaneously. Operand 3C Operand 3B Operand 3A Operation 3 Operand 2C Operand 2B Operand 2A Operation 2 Operand 1C Operand 1B Operand 1A Operation 1 How it can be achieved ? The Word format of VLIW Architecture • It’s name implies: a machine language instruction format that is fixed in length(as in RISC Arch.) but much longer than 32 bit to 64 bits. • The format of VLIW word has fixed length and each instructions are scheduled to execute. Now compiler has to more work than a control unit. • It is compiler’s job to analyze the program for data and resources and pack the slots of each VLIW with as many concurrently executable operations as possible Advantages of VLIW • • • • Reducing delay by Scheduling. Allow to work the system at higher frequency. It can execute more operation per cycle. Increases CPU Clock speed. Disadvantages of VLIW • Poor code density ( Number of bits required to store each instruction). • Wasted bit fields. • Take up more space. SIMD Architecture • SIMD stands for single instruction, multiple data. • It is the ability to do the same instruction on multiple pieces of data. This can lead to significantly better performance, since it is using less cycles with the same amount of data. • This is done by packing the numbers into a vector. For instance, to add {1,2,3,4} to {5,6,7,8} you could add 1 to 5, 2 to 6, and so on. Or you could use SIMD and add 1,2,3,4 to 5,6,7,8, and the result gets stored, when you can then unpack it and store it in a register. This is why it is sometimes called vector math. • SIMD has far-reaching applications; although the bulk and focus has been on multimedia. Why? Because it is an area of computing that needs as much computing power as possible, is popular, and in most cases, it is necessary to compute a lot of data at once. • SIMD works by performing an instruction on multiple pieces of data at the same time by packing a vector with data and sending each data parallel to one another. Application of SIMD • Two such real world examples of usage of SIMD are FastFourier Transform (FFT) and three-dimensional transform. • Fast Fourier Transform is used primarily with applications dealing with waveforms, such as Digital Signal Processors (DSPs), radar, sonar, and more popularly, audio/mpeg encoding (MP3s).