ENGR6859 COMPUTER ENGINEERING FUNDAMENTALS –
COMPUTER ARCHITECTURE
Problem Set #0; RV; Issued: Mon. Sep. 11, 2006
Problems to be submitted on Oct. 6, 2006: 0, 7, 9, 14 & 16
0. Problem 1.2 in the textbook.
1. Problem 1.7 in the textbook.
2. Problem 1.17 in the textbook.
3. Discuss the issues involved in choosing the sizes of the entities that can be accessed in memory
in a 64-bit processor. Recall that this issue is distinct from the choice of operand sizes.
4. Read Appendix D (on the web). Comment in detail on the choices the Intel 80x86 architects
made on the issue raised in the previous problem. Note that the Intel microprocessors do not follow
the load-store architecture, so one may question whether this comparison is fair.
5. Problem 2.5 in the textbook.
6. Problem 2.6 in the textbook.
7. Problem 2.11 in the textbook.
8. Problem 2.12 in the textbook.
9. We are contemplating adding a floating-point divide unit to a processor to improve its
performance on the application for which the processor is going to be used. In this application,
40% of the executed instructions are floating-point instructions, and 5% of these are
expected to be divisions. All other floating-point instructions execute as fast as the integer
instructions, i.e., in one clock cycle, whereas division is currently implemented in microcode that
takes 10 clock cycles to execute. If we go ahead with the plan, division can
also be performed in one clock cycle. However, the designers tell us that an inexpensive
implementation of the floating-point divide unit stretches the clock cycle by 20%. Determine whether
it would be beneficial to incorporate this enhancement, ignoring the additional cost involved.
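Hint: one way to set up the comparison in Problem 9 is to compute the average time per instruction before and after the change. The sketch below is only an illustration under two assumptions that the problem does not state outright: "5% of these" is read as 5% of the floating-point instructions, and the instruction count is unchanged by the enhancement. The function name and parameters are made up for this example.

```python
def time_per_instruction(fp_fraction, div_fraction_of_fp, div_cycles,
                         other_cycles=1.0, clock_stretch=0.0):
    """Average time per instruction, in units of the ORIGINAL clock period.

    Assumption: divisions are a fraction of the floating-point instructions,
    and every non-divide instruction takes `other_cycles` cycles.
    """
    div_fraction = fp_fraction * div_fraction_of_fp
    cpi = (1.0 - div_fraction) * other_cycles + div_fraction * div_cycles
    # A stretched clock makes every cycle (1 + clock_stretch) times longer.
    return cpi * (1.0 + clock_stretch)


# Microcoded divide: 10 cycles, original clock.
before = time_per_instruction(0.40, 0.05, div_cycles=10)
# Hardware divide: 1 cycle, clock period 20% longer.
after = time_per_instruction(0.40, 0.05, div_cycles=1, clock_stretch=0.20)

speedup = before / after
print(f"before={before:.3f}  after={after:.3f}  speedup={speedup:.3f}")
```

A speedup below 1 means the stretched clock costs more than the faster divide saves. Under the assumptions above the ratio comes out slightly under 1, but a different reading of the 5% figure changes the numbers.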
10. Discuss why processors based on load-store architectures allow information to be accessed from
memory in various sizes, but limit the operands in ALU operations to the word size. Also discuss
why many high-performance architectures require aligned memory access.
11. Assume that we make an enhancement to a computer that improves some mode of execution by a
factor of 10. Enhanced mode is used 50% of the time, measured as a percentage of the execution
time when the enhanced mode is in use. Recall that Amdahl’s law depends on the fraction of the
original, unenhanced, execution time that could make use of the enhanced mode.
a. What is the speedup we have obtained from fast mode?
b. What percentage of the original execution time has been converted to fast mode?
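Hint: the subtlety in Problem 11 is that the 50% figure is measured on the enhanced run, not the original one. The sketch below shows one way to convert between the two views and then cross-check the result against the usual form of Amdahl's law. It is an illustration only; the function names are hypothetical.

```python
def speedup_from_enhanced_time(enhanced_share_of_new_time, mode_speedup):
    """Return (overall speedup, fraction of ORIGINAL time that was enhanced).

    The enhanced run is normalized to 1 time unit.
    """
    unenhanced = 1.0 - enhanced_share_of_new_time
    # Work done in enhanced mode would have taken mode_speedup times longer.
    enhanced_originally = enhanced_share_of_new_time * mode_speedup
    original_time = unenhanced + enhanced_originally
    overall_speedup = original_time / 1.0
    original_fraction = enhanced_originally / original_time
    return overall_speedup, original_fraction


def amdahl(fraction_enhanced, mode_speedup):
    """Amdahl's law, with the fraction measured on the original run."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / mode_speedup)


overall, fraction = speedup_from_enhanced_time(0.50, 10)
print(f"overall speedup = {overall:.2f}")
print(f"fraction of original time enhanced = {fraction:.1%}")
# Cross-check: feeding the original-time fraction into Amdahl's law
# should reproduce the same overall speedup.
print(f"Amdahl cross-check = {amdahl(fraction, 10):.2f}")
```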
12. You are considering an enhancement to the implementation of the divide operation in the processor
your company is designing for a particular application. Assume that a divide instruction takes 40
cycles before the enhancement. It has been estimated that divide instructions account for 3% of all
instructions, and that the average execution time of all other instructions is 2 clock cycles.
a. Calculate the percentage of the total execution time spent on divide instructions.
b. You have determined that it is possible to reduce the number of cycles required for division to
8, but that this would require a 10% increase in the clock cycle time. Nothing else will be
affected. Would you proceed with this enhancement? Why?
c. Calculate the maximum percentage decrease in the clock frequency that would still make the
above enhancement (reducing divide time to 8 clock cycles) attractive.
d. State Amdahl’s law – any form is fine.
e. Suppose you are considering another modification that would reduce the number of clock
cycles needed for division to 10, without imposing any penalty on the clock cycle time.
Calculate the speedup in this case.
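Hint: the parts of Problem 12 all reduce to comparing average CPI values, scaled by any change in cycle time. The sketch below is one possible setup, not a model answer. The helper name is hypothetical, and the break-even calculation for part (c) is evaluated with the same divide latency as part (b), which is an assumption of this sketch; pass a different cycle count if another value is intended.

```python
def average_cpi(divide_fraction, divide_cycles, other_cycles):
    """Average cycles per instruction for a two-class instruction mix."""
    return divide_fraction * divide_cycles + (1.0 - divide_fraction) * other_cycles


DIV_FRACTION = 0.03   # divides are 3% of all instructions
OTHER_CYCLES = 2      # average for everything else

# (a) share of execution time spent in divides before the enhancement
base = average_cpi(DIV_FRACTION, 40, OTHER_CYCLES)
divide_time_share = DIV_FRACTION * 40 / base

# (b) faster divide, but each cycle is 10% longer
enhanced = average_cpi(DIV_FRACTION, 8, OTHER_CYCLES)
speedup_b = base / (enhanced * 1.10)

# (c) break-even: the new design wins while f_new / f_old > enhanced / base
max_freq_decrease = 1.0 - enhanced / base

# (e) 10-cycle divide with no clock penalty
speedup_e = base / average_cpi(DIV_FRACTION, 10, OTHER_CYCLES)

print(f"(a) divide share of time = {divide_time_share:.1%}")
print(f"(b) speedup = {speedup_b:.3f}")
print(f"(c) max frequency decrease = {max_freq_decrease:.1%}")
print(f"(e) speedup = {speedup_e:.3f}")
```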
13. Measurements have shown that a certain load-store machine uses 45% ALU operations, 20% load
operations, 10% store operations, and 25% branch operations. The execution times of these
operations are 1, 2, 2, and 2 cycles, respectively. Assume that an optimizing compiler for this
machine discards 40% of the arithmetic logic unit (ALU) instructions, although it cannot reduce
loads, stores, or branches. Ignore system issues, and assume a 1 ns clock cycle time.
a. Calculate the CPI and MIPS rating of the unoptimized code.
b. Calculate the CPI and MIPS rating of optimized code.
c. What are the execution times with and without optimization?
d. Considering the MIPS ratings and execution times computed above, comment on whether
optimization improves performance.
14. Measurements have shown that a certain load-store machine uses 40% ALU operations, 25% load
operations, 10% store operations, and 25% branch operations. The processor takes 1 clock cycle to
execute each ALU instruction, but 2 clock cycles to run each of the other instructions. Assume that
an optimizing compiler for this machine discards 40% of the arithmetic logic unit (ALU)
instructions, although it cannot reduce loads, stores, or branches. Ignore system issues, and assume
a 1 ns clock cycle time.
a. Calculate CPI and MIPS ratings of the unoptimized code.
b. Calculate CPI and MIPS ratings of the optimized code.
c. What are the execution times with and without optimization?
d. Considering the MIPS ratings and execution times computed above, comment on whether
optimization improves performance.
e. Discuss how optimizing compilers, in general, improve performance.
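Hint: for the instruction-mix problems, it helps to keep instruction counts relative to the unoptimized program, so that removing ALU instructions shrinks the total count as well as the cycle count. The sketch below uses the mix from Problem 14; the function and variable names are hypothetical, and "execution time" is reported per unit of original instruction count because the problem gives no absolute count.

```python
def cpi_and_mips(mix, cycles, clock_ns=1.0):
    """Return (CPI, MIPS rating, relative execution time in ns).

    mix    : relative instruction counts per class
    cycles : cycles per instruction for each class
    """
    total_count = sum(mix.values())
    total_cycles = sum(mix[name] * cycles[name] for name in mix)
    cpi = total_cycles / total_count
    # MIPS = clock rate / (CPI * 10^6); a 1 ns clock is 1000 MHz.
    mips = 1000.0 / (cpi * clock_ns)
    time_ns = total_cycles * clock_ns
    return cpi, mips, time_ns


cycles = {"alu": 1, "load": 2, "store": 2, "branch": 2}
unoptimized = {"alu": 0.40, "load": 0.25, "store": 0.10, "branch": 0.25}

# The compiler discards 40% of the ALU instructions and nothing else.
optimized = dict(unoptimized, alu=unoptimized["alu"] * (1.0 - 0.40))

cpi_u, mips_u, time_u = cpi_and_mips(unoptimized, cycles)
cpi_o, mips_o, time_o = cpi_and_mips(optimized, cycles)

print(f"unoptimized: CPI={cpi_u:.3f}  MIPS={mips_u:.1f}  time={time_u:.2f}")
print(f"optimized:   CPI={cpi_o:.3f}  MIPS={mips_o:.1f}  time={time_o:.2f}")
```

Comparing the MIPS column against the time column is the point of part (d): the two metrics need not move in the same direction. The same function can be applied to the mix in Problem 13 by changing the dictionary.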
15. Discuss the advantages and limitations of using a fixed instruction length in a processor.
16. Consider the following code written in MIPS64 assembly language.
      DADDI  R2, R0, #2000
      DADD   R3, R0, R0
Here: LB     R1, 10000(R3)
      SB     20000(R3), R1
      DADDI  R3, R3, #1
      BNE    R3, R2, Here
a. Write clearly, in one sentence, what the above code accomplishes.
b. Rewrite the above code so that 64-bit load and store, LD and SD, are used in lieu of the byte
operations above.
c. Because MIPS64 allocates only a 16-bit signed field for the displacement in an instruction,
the load and store instructions cannot be used as shown above if an array address is a large
number, say 100000. Rewrite the code so that the same task is accomplished
when the array addresses are large, without changing the number of instructions within the loop.
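Hint: the constraint in part (c) of Problem 16 can be checked numerically. The short sketch below tests whether a constant fits in a signed two's-complement field of a given width; the helper name is hypothetical and the code is only an illustration of the range check, not of the rewritten assembly.

```python
def fits_signed(value, bits=16):
    """True if `value` is representable as a signed two's-complement
    integer that is `bits` wide."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= value <= highest


# Displacements used in the original loop, plus the large address from (c).
for displacement in (10000, 20000, 100000):
    verdict = "fits" if fits_signed(displacement) else "does NOT fit"
    print(f"{displacement:>6} {verdict} in a 16-bit signed displacement")
```

Any address that fails this check has to reach the load or store through a register rather than through the displacement field.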