
Computer Structure, Fall 2015, Lecture 5
Ran Canetti

Where are we, and where are we headed?

[Figure: course roadmap. Panels: a Booth-encoded multiplier (multiplicand register, 34-bit ALU, HI/LO registers, Booth encoder); single/multicycle datapaths; pipelining (overlapping IFetch/Dcd, Exec, Mem, WB stages); memory systems, illustrated by the processor-memory performance gap, 1980-2000 (µProc performance grows 60%/yr, 2X/1.5 yr, "Moore's Law"; DRAM 9%/yr, 2X/10 yrs; the gap grows 50%/yr); I/O.]

What will we learn?
• How a computer is built.
• How to analyze computer performance.
• Topics that affect modern processors (cache, pipeline).

Textbook: Computer Organization & Design: The Hardware/Software Interface, David A. Patterson and John L. Hennessy,
Third Edition, 2005.

Anatomy: the 5 components of any computer (since 1946)
• Processor: Control (the "brain") and Datapath (the "brawn").
• Memory: where programs and data live when running.
• Devices: Input (keyboard, mouse), Output (display, printer), and Disk (where programs and data live when not running).

Instruction Sets: A Thin Interface
[Figure: layers of a computer system. Software: Application (iTunes), Operating System (Mac OS X), Compiler, Assembler. Hardware: Processor, Memory, I/O system; Datapath & Control; Digital Design; Circuit Design; Transistors. The Instruction Set Architecture is the thin interface between the software and the hardware.]
• Syntax: ADD $8 $9 $10
• Semantics: $8 = $9 + $10

"R-Format"
    Field size:  6 bits   5 bits  5 bits  5 bits  5 bits  6 bits
    Bitfield:    opcode   rs      rt      rd      shamt   funct
    Binary:      000000   01001   01010   01000   00000   100000
    Hexadecimal: 012A4020

Levels of Representation
High Level Language Program → (Compiler) → Assembly Language Program → (Assembler) → Machine Language Program → (Machine Interpretation) → Control Signal Specification
• High-level language: easy to program; not one-to-one with machine language; depends on the compiler/optimizer; portable.
    temp = v[k]; v[k] = v[k+1]; v[k+1] = temp;
• Assembly language: 1:1 with machine language; more readable than machine language.
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
• Machine language: one 32-bit binary word per instruction (the slide shows sample bit patterns).
• Control signal specification, e.g.: ALUOP[0:3] <= InstReg[9:11] & MASK

MIPS Instruction Set
• Use the MIPS ISA document as the final word on the ISA; it is available on the course web site.

Instruction Set Architecture
• Different machine languages are very similar to one another.
• We will study the machine language of the MIPS processor, developed in the early 1980s (used by Silicon Graphics, NEC, Sony).
• RISC vs. CISC:
  – RISC: Reduced Instruction Set Computer (e.g., MIPS)
  – CISC: Complex Instruction Set Computer (e.g., 8086)
• The motto: "less is more". That is, a smaller instruction set, with simpler instructions, enables better performance: the hardware needed to execute such instructions is simpler, so speed increases and the required silicon area (= cost) shrinks.

Hardware implements semantics...
Syntax: ADD $8 $9 $10; semantics: $8 = $9 + $10
• Instruction Fetch: fetch the next instruction from memory: 012A4020.
• Instruction Decode: decode the fields (opcode, rs, rt, rd, shamt, funct) to get ADD $8 $9 $10.
• Operand Fetch: "retrieve" the register values $9 and $10.
• Execute: add $9 to $10.
• Result Store: place this sum in $8.
• Next Instruction: prepare to fetch the instruction that follows the ADD in the program.

Why is the ISA important?
• Code size:
  – Long instructions may take more time to fetch.
  – Larger code requires a larger memory (important in small devices, e.g., cell phones).
• Number of instructions (IC):
  – Reducing IC reduces execution time (assuming the same CPI and frequency).
• Code "simplicity":
  – A simple HW implementation leads to higher frequency and lower power.
  – Code optimization can be applied more effectively to "simple code".

The impact of the ISA: RISC vs. CISC

CISC Processors
CISC: Complex Instruction Set Computer
• The idea: a high-level machine language.
• Characteristics:
  – Many instruction types, with many addressing modes.
  – Some of the instructions are complex: they perform complex tasks and require many cycles.
  – ALU operations directly on memory.
  – Usually uses a limited number of registers.
  – Variable-length instructions: common instructions get short codes, which saves code length.
• Example: x86

CISC Drawbacks
• Compilers do not take advantage of the complex instructions and the complex indexing methods.
• Implementing complex instructions and complex addressing modes complicates the processor and slows down the simple, common instructions. This contradicts the corollary of Amdahl's law: Make The Common Case Fast.
• Variable-length instructions are a real pain in the neck:
  – It is difficult to decode several
    instructions in parallel.
  – As long as an instruction is not decoded, its length is unknown, so it is unknown where the instruction ends and where the next instruction starts.
  – An instruction may not fit into the "right behavior" of the memory hierarchy (discussed in the next lectures).
• Examples: VAX, x86 (!?!)

RISC Processors
RISC: Reduced Instruction Set Computer
• The idea: simple instructions enable fast hardware.
• Characteristics:
  – A small instruction set, with only a few instruction formats.
  – Simple instructions that execute simple tasks and require a single cycle (with a pipeline).
  – A few indexing methods.
  – ALU operations on registers only; memory is accessed using Load and Store instructions only.
  – Many orthogonal registers.
  – Three-address machine: Add dst, src1, src2.
  – Fixed-length instructions.
• Examples: MIPS™, Sparc™, Alpha™, PowerPC™

RISC Processors (Cont.)
• Simple architecture → simple micro-architecture:
  – Simple, small and fast control logic.
  – Simpler to design and validate.
  – Room for on-die caches (instruction cache + data cache), so data and instruction accesses can proceed in parallel.
  – Shorter time-to-market.
• Using a smart compiler:
  – Better pipeline usage.
  – Better register allocation.
• Existing RISC processors are not "pure" RISC; e.g., they support division, which takes many cycles.

So, what is better, RISC or CISC?
• Today CISC architectures (x86) run as fast as RISC (or even faster).
• The main reasons:
  – They translate CISC instructions into RISC-like instructions (ucode).
  – CISC architectures use a "RISC-like engine" internally.
• We will discuss this kind of solution later in this course.
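To make the R-format example from earlier in this lecture concrete (ADD $8 $9 $10 encoded as 012A4020), here is a small C sketch of the Instruction Decode, Operand Fetch, Execute and Result Store steps. The names RFormat, encode_r, decode_r and execute_add are hypothetical helpers, the register file is assumed to be a plain array of 32 words, and the sketch ignores the overflow exception that a real MIPS ADD raises.

```c
#include <assert.h>
#include <stdint.h>

/* The six fields of a MIPS R-format instruction (6,5,5,5,5,6 bits wide). */
typedef struct {
    uint32_t opcode, rs, rt, rd, shamt, funct;
} RFormat;

/* Pack the fields into one 32-bit word, opcode in the top 6 bits. */
uint32_t encode_r(RFormat f) {
    assert(f.opcode < 64 && f.rs < 32 && f.rt < 32 &&
           f.rd < 32 && f.shamt < 32 && f.funct < 64);
    return (f.opcode << 26) | (f.rs << 21) | (f.rt << 16) |
           (f.rd << 11) | (f.shamt << 6) | f.funct;
}

/* Instruction Decode: recover each field with a shift and a mask. */
RFormat decode_r(uint32_t word) {
    RFormat f;
    f.opcode = (word >> 26) & 0x3Fu;
    f.rs     = (word >> 21) & 0x1Fu;
    f.rt     = (word >> 16) & 0x1Fu;
    f.rd     = (word >> 11) & 0x1Fu;
    f.shamt  = (word >> 6) & 0x1Fu;
    f.funct  = word & 0x3Fu;
    return f;
}

/* Operand Fetch, Execute and Result Store for ADD (opcode 0, funct 32).
   Returns 1 if the word was an ADD, 0 otherwise. The sum simply wraps
   modulo 2^32 here; a real ADD would trap on signed overflow. */
int execute_add(uint32_t word, uint32_t regs[32]) {
    RFormat f = decode_r(word);
    if (f.opcode != 0 || f.funct != 32) return 0;
    uint32_t sum = regs[f.rs] + regs[f.rt];  /* "retrieve" $rs and $rt, add */
    if (f.rd != 0) regs[f.rd] = sum;         /* $zero always stays 0 */
    return 1;
}
```

For example, with regs[9] = 5 and regs[10] = 7, execute_add(0x012A4020, regs) decodes rs = 9, rt = 10, rd = 8 and leaves 12 in regs[8].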
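The remark that reducing IC reduces execution time "assuming the same CPI and frequency" is one instance of the standard performance equation, CPU time = IC × CPI / clock rate. The sketch below only evaluates that equation; the function name is hypothetical, and any RISC/CISC numbers fed to it are illustrative assumptions, not measurements.

```c
#include <assert.h>

/* CPU time = instruction count (IC) x cycles per instruction (CPI) / clock rate. */
double cpu_time_seconds(double instruction_count, double cpi, double clock_hz) {
    assert(instruction_count >= 0.0 && cpi > 0.0 && clock_hz > 0.0);
    return instruction_count * cpi / clock_hz;
}
```

With purely hypothetical inputs at 1 GHz, a CISC-style program with IC = 1.0e9 and CPI = 2.0 takes 2.0 s, while a RISC-style version that needs 30% more instructions (IC = 1.3e9) at CPI = 1.0 takes 1.3 s. Which side wins depends on all three factors, not on IC alone.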
Design principle no. 1: Simplicity favors Regularity
• Arithmetic operations:
  – MIPS:
        add a,b,c         # a=b+c
        addi a,b,100      # a=b+100
  – 8086:
        ADD EAX,B         # EAX=EAX+B
• We prefer a simple mechanism with minimal instructions such as R3 = R1 op R2 over a machine that is easier to program, allowing as many variables per instruction as one wants, for example R5 = (R1 op1 R2) op2 (R3 op3 R4), but that is very hard to design and implement.

Design principle no. 2: Smaller is faster
• Arithmetic operations are allowed only on registers.
• The operands can be registers or a (single) constant.
• There are 32 registers in total.
• Register = word = 32 bits = 4 bytes.
• Registers are denoted by $: either $0-$31, or by conventional names related to their roles. The roles of these registers are agreed upon in all programs (in C, for example):
  – $s0…$s7: the program's C variables.
  – $t0…$t7: temporary variables.
• Example: f=(g+h)-(k+j), with $s0=f, $s1=g, $s2=h, $s3=k, $s4=j:
        add $t0,$s1,$s2
        add $t1,$s3,$s4
        sub $s0,$t0,$t1
  The example shows how the statement f=(g+h)-(k+j) is represented by assembly instructions.
• The registers are the heart of the processor. Accessing them is faster than accessing memory. Three registers are accessed "simultaneously": two are read and the third is written.

Policy of Use Conventions
    Name       Register number   Usage
    $zero      0                 the constant value 0
    $v0-$v1    2-3               values for results and expression evaluation
    $a0-$a3    4-7               arguments
    $t0-$t7    8-15              temporaries
    $s0-$s7    16-23             saved
    $t8-$t9    24-25             more temporaries
    $gp        28                global pointer
    $sp        29                stack pointer
    $fp        30                frame pointer
    $ra        31                return address
Additional registers: $at = $1, reserved for the assembler, and $k0, $k1 = $26, $27, reserved for the operating system.

Memory
• Memory: a large array.
• A memory address: an index into the array.
• Byte addressing: the index is in bytes; each of the addresses 0, 1, 2, 3, 4, 5, 6, ...
  holds 8 bits of data.
• Maximum memory size: 2^30 words = 2^32 bytes.

Accessing memory: Load and Store instructions
• LW loads a word, but the memory address is in bytes.
        lw $s1,100($s2)     # $s1=Memory[$s2+100]
        sw $s1,100($s2)     # Memory[$s2+100]=$s1
  base register = pointer to the array; offset = position in the array.
• Another example, where save is an array of words:
        lw $8,save($9)      # Temporary reg $8:=save[i]
  Here the constant save is the array's starting address and $9 holds the position in the array (in bytes).

Memory Instructions: LW
    LW $1,22($2)    ("I-Format": opcode | rs | rt | offset)
• Instruction Fetch: fetch the load instruction from memory.
• Instruction Decode: decode the fields to get LW $1, 22($2).
• Operand Fetch: "retrieve" the register value $2.
• Execute: compute the memory address 22 + $2.
• Result Store: load the contents of that memory address into $1.
• Next Instruction: prepare to fetch the instruction that follows the LW in the program.

Memory access: byte order
• MIPS = Big endian; INTEL = Little endian.
[Figure: array A of 32-bit words at byte addresses 0, 4, 8, 12, 16 (A[0], A[1], A[2], ...). Within a word, big endian numbers the bytes 0 1 2 3 starting from the most significant byte; little endian numbers them 3 2 1 0.]
• Example: A is an array of words, the address of A is in $s3, and h is in $s2.
  C code: A[2] = h + A[2];
  MIPS code:
        lw  $t0, 8($s3)     # $t0=Memory[$s3+8], i.e. A[2]
        add $t0, $s2, $t0   # $t0=$s2+$t0
        sw  $t0, 8($s3)     # Memory[$s3+8]=$t0
• Reading a byte: there are also instructions such as lb (load byte) and sb (store byte).
• Useful for reading a char: in ASCII (American Standard Code for Information Interchange) a char is one byte.
• In Unicode, a char is 2 bytes.

The program in memory
• The program is stored in memory exactly like data: memory holds data, programs, compilers, editors, etc.
[Figure: Processor connected to Memory.]

Executing a program
• A special register, the PC (Program Counter), holds the address of the instruction.
• A whole word is read from memory.
• The PC is advanced.

A rule worth remembering
In MIPS assembly:
• code addresses are encoded in words;
• data addresses are encoded in bytes.
When a MIPS processor accesses memory, it requests the address in bytes.

Branch Instructions: BEQ
• Instruction Fetch: fetch the branch instruction from memory.
• Instruction Decode: decode the fields (opcode, rs, rt, offset; "I-Format") to get BEQ $1, $2, 25.
• Operand Fetch: "retrieve" the register values $1, $2.
• Execute: compute whether we take the branch: $1 == $2 ?
• Next Instruction: ALWAYS prepare to fetch the instruction that follows the BEQ in the program ("delayed branch"). IF we take the branch, the instruction we fetch AFTER that one is at PC + 4 + 100 (the offset 25 counts words: 25 × 4 = 100 bytes). PC = "Program Counter".

Branch vs. Jump
• Jump: an "absolute" unconditional jump.
        j label
• Branch: a conditional relative jump.
        bne $1,$2,label     # $1!=$2 go to label
• Example ($s3=h, $s4=i, $s5=j):
        if (i!=j) h=i+j; else h=i-j;
  compiles to
                beq $s4, $s5, Lab1
                add $s3, $s4, $s5
                j   Lab2
        Lab1:   sub $s3, $s4, $s5
        Lab2:   ...

Addresses in Branches and Jumps
• Instructions:
        bne $t4,$t5,Label   # next instruction is at Label if $t4 != $t5
        beq $t4,$t5,Label   # next instruction is at Label if $t4 = $t5
        j   Label           # next instruction is at Label
• Formats:
        I:  op | rs | rt | 16-bit address
        J:  op | 26-bit address
• Assumption: most branches are local jumps. Hence a branch is a relative jump within a range of 2^16 words (a signed 16-bit offset, ±2^15 words around PC + 4).
        beq $s1,$s2,25      # if ($s1 == $s2) go to PC + 4 + 25*4

Design principle no. 3: Good design sometimes demands compromises
• Reduce the number of different instruction types; keep the instructions similar to one another.
        R:  op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)
        I:  op(6) | rs(5) | rt(5) | 16-bit address
        J:  op(6) | 26-bit address
• Example: lw $s1, 32($s2)    # $s1=$17, $s2=$18
        op=35 | rs=18 | rt=17 | 16-bit number=32

Encoding example
        Loop: lw  $8, save($19)     # $8=save[i]
              bne $8, $21, Exit     # Goto Exit if save[i]<>k
              add $19, $19, $20     # i:=i+j
              j   Loop              # Goto Loop
        Exit:
  With SAVE = 1000 and the code starting at address 80,000:
        Address   Machine code (decimal fields)
        80,000    35 | 19 | 8  | 1000
        80,004    5  | 8  | 21 | 2
        80,008    0  | 19 | 20 | 19 | 0 | 32
        80,012    2  | 20,000
        80,016    ...
Branch-if-less-than?
• slt (set less than):
        slt $t0, $s1, $s2      # if $s1 < $s2 then $t0 = 1 else $t0 = 0
• blt (branch less than) can be built from it:
        blt $s0,$s1, Less
  is implemented as
        slt $at,$s0,$s1        # $at gets 1 if $s0<$s1
        bne $at,$zero, Less    # go to Less if $at != 0
• blt is a pseudo instruction. The assembler uses $at (= $1) for pseudo instructions.
• More examples of pseudo instructions:
        bne $8,$21,far_adrs
  is equivalent to
        beq $8,$21,nxt
        j   far_adrs
  nxt:
  and
        move $t1,$s4
  is equivalent to
        add $t1,$s4,$zero

Exercise
Question: below is code written in C, and two translations of it into MIPS assembly language:
        while (save[i]!=k) do i=i+j ;
        save: array [0..100] of word
  $19 serves as i, $20 as j, and $21 as k.
  (muli, multiply by an immediate, is written as on the slide; it is not a native MIPS instruction, and sll $9,$19,2 would compute i*4.)
First translation:
        Loop: muli $9,$19,4       # Temporary reg $9:=i*4
              lw   $8,save($9)    # Temporary reg $8:=save[i]
              beq  $8,$21,Exit    # Goto Exit if save[i] = k
              add  $19,$19,$20    # i:=i+j
              j    Loop           # Goto Loop
        Exit:

Exercise (cont.)
Second translation:
              muli $9,$19,4       # Temporary reg $9:=i*4
              lw   $8,save($9)    # Temporary reg $8:=save[i]
              beq  $8,$21,Exit    # Goto Exit if save[i] = k
        Loop: add  $19,$19,$20    # i:=i+j
              muli $9,$19,4       # Temporary reg $9:=i*4
              lw   $8,save($9)    # Temporary reg $8:=save[i]
              bne  $8,$21,Loop    # Goto Loop if save[i]!=k
        Exit:
• Question: assuming the loop executes 10 times, how many instructions are executed in each of the translations?

To summarize: MIPS operands
• 32 registers: $s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra, $at.
  Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $zero always equals 0. Register $at is reserved for the assembler to handle large constants.
• 2^30 memory words: Memory[0], Memory[4], ..., Memory[4294967292].
  Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.
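The memory slides (byte addressing, A[2] at byte offset 8, MIPS big endian vs. Intel little endian) can be modeled with memory as a plain byte array. This is a minimal sketch under that assumption; word_address, store_word_big_endian, store_word_little_endian and load_word_big_endian are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Byte address of element `index` of a word array starting at `base`:
   consecutive words are 4 bytes apart, so A[2] is at base + 8. */
uint32_t word_address(uint32_t base, uint32_t index) {
    return base + 4u * index;
}

/* Big endian (MIPS): the most significant byte goes to the lowest address. */
void store_word_big_endian(uint8_t *mem, uint32_t addr, uint32_t value) {
    assert(addr % 4 == 0);  /* lw/sw require word-aligned addresses */
    mem[addr]     = (uint8_t)(value >> 24);
    mem[addr + 1] = (uint8_t)(value >> 16);
    mem[addr + 2] = (uint8_t)(value >> 8);
    mem[addr + 3] = (uint8_t)value;
}

/* Little endian (Intel): the least significant byte goes to the lowest address. */
void store_word_little_endian(uint8_t *mem, uint32_t addr, uint32_t value) {
    assert(addr % 4 == 0);
    mem[addr]     = (uint8_t)value;
    mem[addr + 1] = (uint8_t)(value >> 8);
    mem[addr + 2] = (uint8_t)(value >> 16);
    mem[addr + 3] = (uint8_t)(value >> 24);
}

/* Inverse of store_word_big_endian: reassemble the word, high byte first. */
uint32_t load_word_big_endian(const uint8_t *mem, uint32_t addr) {
    assert(addr % 4 == 0);
    return ((uint32_t)mem[addr] << 24) | ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8) | (uint32_t)mem[addr + 3];
}
```

Storing 0x11223344 big endian at word_address(0, 2) puts 0x11 at byte 8 and 0x44 at byte 11; the little-endian store of the same value at address 0 puts 0x44 at byte 0.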
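The I-format and J-format layouts and the encoding example at address 80,000 can be checked with a few lines of C. The helpers encode_i, encode_j and branch_offset_words are hypothetical; the opcodes 35 (lw), 5 (bne) and 2 (j) are the decimal values shown in the encoding example, and addresses are assumed to be below 2^31.

```c
#include <assert.h>
#include <stdint.h>

/* I-format: op(6) | rs(5) | rt(5) | 16-bit immediate. Negative immediates
   are stored as the low 16 bits of their two's-complement form. */
uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, int32_t imm) {
    assert(op < 64 && rs < 32 && rt < 32);
    assert(imm >= -32768 && imm <= 32767);
    return (op << 26) | (rs << 21) | (rt << 16) | ((uint32_t)imm & 0xFFFFu);
}

/* J-format: op(6) | 26-bit target, where the target is a word address. */
uint32_t encode_j(uint32_t op, uint32_t word_target) {
    assert(op < 64 && word_target < (1u << 26));
    return (op << 26) | word_target;
}

/* Branch offset in words, measured from the instruction after the branch,
   matching "go to PC + 4 + offset*4". */
int32_t branch_offset_words(uint32_t branch_addr, uint32_t target_addr) {
    assert(branch_addr % 4 == 0 && target_addr % 4 == 0);
    return ((int32_t)target_addr - (int32_t)(branch_addr + 4u)) / 4;
}
```

For the loop example, the bne at 80,004 targets Exit at 80,016, so its offset is (80,016 - 80,008) / 4 = 2, and j Loop stores 80,000 / 4 = 20,000 because code addresses are encoded in words. The same helper gives lw $s1, 32($s2) as 0x8E510020.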
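One way to reason about the exercise is to count instructions by mirroring each translation's control flow in C. This sketch assumes "the loop executes 10 times" means the body i=i+j runs 10 times; the function names and the sample data (j = 1, save[0..9] != k, save[10] == k) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Each `n +=` line counts the MIPS instructions named in its comment;
   the C control flow mirrors the branches of each translation. */

/* First translation: Loop: muli, lw, beq Exit, add, j Loop. */
long count_first_translation(const int *save, int i, int j, int k) {
    assert(save != NULL);
    long n = 0;
    for (;;) {
        n += 3;                  /* muli $9,$19,4 ; lw $8,save($9) ; beq $8,$21,Exit */
        if (save[i] == k) break; /* beq taken: leave the loop */
        i += j;
        n += 2;                  /* add $19,$19,$20 ; j Loop */
    }
    return n;
}

/* Second translation: test once before the loop,
   then Loop: add, muli, lw, bne Loop. */
long count_second_translation(const int *save, int i, int j, int k) {
    assert(save != NULL);
    long n = 3;                  /* muli ; lw ; beq $8,$21,Exit */
    if (save[i] == k) return n;  /* beq taken: the body never runs */
    do {
        i += j;
        n += 4;                  /* add ; muli ; lw ; bne $8,$21,Loop */
    } while (save[i] != k);      /* bne taken: go around again */
    return n;
}
```

Under these assumptions the first translation executes 10 × 5 + 3 = 53 instructions and the second 3 + 10 × 4 = 43: moving the test to the bottom of the loop removes the j from every iteration.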
Assembler
[Figure: A.asm → assembler → A.obj; B.asm → assembler → B.obj; A.obj, B.obj and C.lib (c.obj) → linker → P.exe → loader → Memory.]

Example
  A.asm:    m: .word 2
               sw $7, m($3)
               j k
  B.asm:    s: .word 3,4
               lw $1, s($2)
            k: add $1,$2,$3
  After the assembler, each object file uses addresses relative to its own start:
  A.obj:    data 2; code: sw $7,0($3), plus the jump to k, which the linker must resolve.
  B.obj:    data 3,4; code: lw $1,0($2), add $1,$2,$3.
  After the linker, P.exe holds the data 2,3,4 and the code:
               sw  $7,0($3)
               j   3
               lw  $1,4($2)
               add $1,$2,$3
  (s has moved to byte address 4 and k to code word 3: code addresses are in words, data addresses in bytes.)

Object file in Unix
• Object file header
• Text segment
• Data segment
• Relocation information
• Symbol table
• Debugging information

Preprocessing
• Macro:
• Code:
• After preprocessing:

The compilation process

MIPS memory layout in the computer

Calling procedure:
Callee procedure: Sub1:

Single Cycle
[Figure: single-cycle MIPS datapath. PC and instruction memory; instruction fields [31–26] to Control, [25–21] and [20–16] to Read register 1/2, [15–11] to the write-register mux (RegDst), [15–0] to Sign extend (16 → 32 bits), and [5–0] to ALU control; Registers, ALU (Zero, ALU result), data memory; a PC+4 adder, a branch adder fed through Shift left 2, and a jump address built from instruction [25–0] shifted left 2 (26 → 28 bits) and PC+4 [31–28]. Control signals: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite.]
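The PC-update part of the single-cycle datapath (the PC+4 adder, the branch adder with Shift left 2, and the jump address formed from PC+4[31–28] and instruction[25–0]) can be sketched as a pure function. This is an illustration, not the course's reference model: next_pc is a hypothetical name, it models beq only (Zero = 1 means the operands are equal), and, like the single-cycle figure, it ignores the delayed branch mentioned on the BEQ slide.

```c
#include <assert.h>
#include <stdint.h>

/* Next-PC selection as drawn in the single-cycle datapath:
   - PC + 4 from the first adder;
   - branch target = PC + 4 + (sign-extended instruction[15-0] << 2),
     chosen when Branch is asserted and the ALU's Zero output is 1;
   - jump address = PC+4[31-28] concatenated with instruction[25-0] << 2,
     chosen when Jump is asserted. */
uint32_t next_pc(uint32_t pc, uint32_t instr, int branch, int zero, int jump) {
    assert(pc % 4 == 0);
    uint32_t pc_plus_4 = pc + 4u;

    int32_t offset = (int32_t)(instr & 0xFFFFu);  /* instruction[15-0] */
    if (offset & 0x8000) offset -= 0x10000;       /* sign extend 16 -> 32 bits */
    uint32_t branch_target = pc_plus_4 + ((uint32_t)offset << 2);

    uint32_t jump_target = (pc_plus_4 & 0xF0000000u) |
                           ((instr & 0x03FFFFFFu) << 2);

    uint32_t after_branch_mux = (branch && zero) ? branch_target : pc_plus_4;
    return jump ? jump_target : after_branch_mux;
}
```

For beq $1,$2,25 at address 0 (encoded as 0x10220019), a taken branch yields 4 + 100 = 104, matching the "PC + 4 + 100" on the BEQ slide; for the j at 80,012 in the encoding example (0x08004E20), the result is 80,000.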
