Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
计算机系统结构 -设计概念 <<上海大学计算机系统结构>> 课程组 5/23/2017 1 计算机系统结构的定义 Amdahl提出:计算机系统结构是从程序设计 者所看到的计算机的属性,即概念性结构和功 能特性。这实际上是计算机系统的外特性。 从计算机系统的层次结构概念出发,不同级的 程序设计者所看到的计算机属性显然是不一样 的, “系统结构”就是指计算机系统中对各级 之间界面的定义及其上、下的功能分配。 例:图1-8中M2级:机器语言级计算机。其界 面之上是所有软件功能,界面之下是所有硬件 和固件的功能。 5/23/2017 2 计算机组成-1 计算机组成(Computer Organization)指 计算机系统结构的逻辑实现,包括机器 级内的数据通道和控制信号的组成及逻 辑设计,它着眼于机器级内各时间的时 序方式与控制机构、各部件功能及相互 联系。 5/23/2017 3 计算机组成-2 计算机组成还应包括:数据通路宽度; 根据速度、造价、使用状况设置专用部 件,例如是否设置乘法器、除法器、浮 点运算协处理器、 I/O处理器等;部件共 享和并行执行;控制器结构(组合逻辑、 PLA、微程序)、单处理机或多处理机、 指令先取技术和预估、预判技术应用等 组成方式的选择;可靠性技术;芯片的 集成度和速度的选择。 5/23/2017 4 计算机实现 计算机实现(Computer Implementation) 指计算机组成的物理实现,包括处理机、 主存等部件的物理结构,芯片的集成度 和速度,芯片、模块、插件、底板的划 分与连接,专用芯片的设计,微组装技 术,总线驱动,电源、通风降温、整机 装配技术等,它着眼于芯片技术和组装 技术。 5/23/2017 5 三者之间的关系 计算机系统结构、组成和实现是三个不同的概 念。系统结构是计算机系统的软、硬件界面; 计算机组成是计算机系统结构的逻辑实现;计 算机实现是计算机组成的物理实现。他们各自 有不同的内容,但又有紧密的关系。 例如:指令系统功能的确定属于系统结构,而 指令的实现,如取指、取操作数、运算、送结 果等具体操作及其时序属于组成,而实现这些 指令功能的具体电路、器件设计及装配技术等 属于实现。 5/23/2017 6 计算机等级与设计思想 计算机等级的发展遵循以下三种不同的设计思想。 (1)在本等级范围内以合理的价格获得尽可能好的 性能,逐渐向高档机发展,称为最佳性能价格比设 计; (2)只求保持一定的合用的性能而争取最低价格, 称为最低价格设计,其结果往往是从低档向下分化 出新的计算机等级; (3)以获取最高性能为主要目标而不惜增加价格, 称为最高性能设计,以至于产生当时最高等级计算 机。 5/23/2017 7 系列机概念 先设计一种系统结构(机器属性),而后按这种系统 结构设计它的系统软件,按器件状况和硬件技术研 究这种结构的各种实现方法,并按照速度、价格等 不同要求,分别提供不同速度、不同配置的各挡机 器。(系列机必须保证用户看到的机器属性一致) 例:IBM AS/400 5/23/2017 8 IBM 360 (1964年) 系列中各机型(规模由小到大,功能从弱 到强,包括20、30、40、50、65、75等6 个型号,后来扩充了25、85、91、195等 型号)具有兼容性 5/23/2017 9 系列机的优点 1。在使用共同系统软件的基础上,解决程序的兼容性问题; 2。在统一数据结构和指令系统的基础上,便于组成多机系统和网络; 3。使用标准的总线规程,实现接插件和扩展功能卡的兼容,便于实 现OEM(Original Equipment Manufacture)。 4。扩大计算机应用领域,提供用户在同系列的多种机型内选用最合 适的机器的可能性; 5。有利于机器的使用、维护和人员培训 6。有利于计算机升级换代; 7。有利于提高劳动生产率,增加产量、降低成本、促进计算机的发 展。 5/23/2017 10 模拟与仿真-1 系列机能实现程序移植,其原因在于系列机有 相同的系统结构。如果要求程序能在具有不同 系统结构的机器间相互移植,就要求做到在某 系统结构之上实现另一种系统结构,即实现另 一种机器的属性。 仿真是用微程序解释,其解释程序在微程序存 储器;模拟是用机器语言程序解释,其解释程 序在主存储器。 5/23/2017 11 模拟与仿真-2 模拟(Simulation) B虚拟机(Virtual Machine) A 宿主机(Host Machine) B的一条机器指令用A的一段机器 语言程序去解释执行-->模拟。 5/23/2017 12 模拟与仿真-3 仿真(Emulation) B目的机(Target Machine) A宿主机(Host Machine) B的一条机器指令用A的一段 微程序去解释执行-->仿真。 5/23/2017 13 M5:高级语言 M4以上应用 M4:汇编 M3:OS M3:OS M2:机器语言 B 虚拟机 模拟 M2:机器语言 仿真 M1:微程序 A 宿主机 5/23/2017 14 Introduction-1 Computer technology has made incredible progress in the roughly 55 years since the first general-purpose electronic computer was created. Today, less than a thousand dollars will purchase a personal computer that has more performance, more main memory, and more disk storage than a computer bought in 1980 for 1 million dollars. This rapid rate of improvement has come both from advances in the technology used to build computers and from innovation (创新)in computer design. 5/23/2017 15 Introduction-2 During the first 25 years of electronic computers, both forces made a major contribution; but beginning in about 1970, computer designers became largely dependent upon integrated circuit technology(集成电路技术). During the 1970s, performance continued to improve at about 25% to 30% per year for the mainframes(主机系统) and minicomputers (小型机) that dominated the industry. 5/23/2017 16 Introduction-3 The late 1970s saw the emergence of the microprocessor (微型机). The ability of the microprocessor to ride the improvements in integrated circuit technology more closely than the less integrated mainframes and minicomputers led to a higher rate of improvement— roughly 35% growth per year in performance. This growth rate, combined with the cost (成本) advantages of a mass-produced microprocessor, led to an increasing fraction of the computer business being based on microprocessors. 5/23/2017 17 Introduction-3 In addition, two significant changes in the computer marketplace made it easier than ever before to be commercially successful with a new architecture. First, the virtual elimination of assembly language programming reduced the need for object-code compatibility. Second, the creation of standardized, vendorindependent operating systems, such as UNIX and its clone, Linux, lowered the cost and risk of bringing out a new architecture. 5/23/2017 18 Introduction-4 These changes made it possible to successfully develop a new set of architectures, called RISC (Reduced Instruction Set Computer) architectures, in the early 1980s. The RISC-based machines focused the attention of designers on two critical performance techniques, the exploitation of instruction-level parallelism(指令级并 行) (initially through pipelining(流水线)and later through multiple instruction issue(多指令发射)) and the use of caches (initially in simple forms and later using more sophisticated organizations and optimizations). 5/23/2017 19 Introduction-5 The combination of architectural and organizational enhancements has led to 20 years of sustained growth in performance at an annual rate of over 50%. 5/23/2017 20 Introduction-6 Figure 1.1 shows the effect of this difference in performance growth rates. 5/23/2017 21 Figure 1.1 shows the effect of this difference in performance growth rates. 5/23/2017 22 Introduction-7 First, it has signifi-cantly enhanced the capability available to computer users. For many applications, the highest-performance microprocessors of today outperform the supercomputer(超级计算机) of less than 10 years ago. Second, this dramatic rate of improvement has led to the dominance of microprocessor-based computers across the entire range of the computer design. 5/23/2017 23 Introduction-8 Workstations(工作站) and PCs have emerged as major products in the computer industry. Minicomputers, which were traditionally made from off-the-shelf logic or from gate arrays(门 阵列), have been replaced by servers made using microprocessors. Mainframes have been almost completely replaced with multiprocessors consisting of small numbers of off-the-shelf microprocessors. 5/23/2017 24 Introduction-9 Even high-end supercomputers are being built with collections of microprocessors. Freedom from compatibility with old designs and the use of microprocessor technology led to a renaissance in computer design, which emphasized both architectural innovation and efficient use of technology improvements. 5/23/2017 25 Introduction-10 This renaissance is responsible for the higher performance growth shown in Figure 1.1—a rate that is unprecedented in the computer industry. This rate of growth has compounded so that by 2001, the difference between the highest-performance microprocessors and what would have been obtained by relying solely on technology, including improved circuit design, was about a factor of 15. 5/23/2017 26 Introduction-11 In the last few years, the tremendous improvement in integrated circuit capability has allowed older, less-streamlined architectures, such as the x86 (or IA-32) architecture, to adopt many of the innovations first pioneered in the RISC designs. (用新技术手段改造过时的结构) 5/23/2017 27 Introduction-12 As we will see, modern x86 processors basically consist of a front end that fetches and decodes x86 instructions and maps them into simple ALU(算术逻辑单元), memory access(存储器访问), or branch operations(分支操作) that can be executed on a RISC-style pipelined processor. 5/23/2017 28 Introduction-13 Beginning in the late 1990s, as transistor counts soared(晶体管数量的迅猛增长), the overhead (in transistors) of interpreting the more complex x86 architecture became negligible(微不足道的) as a percentage of the total transistor count of a modern microprocessor. 5/23/2017 29 Introduction-14 The architectural ideas and accompanying compiler improvements that have made this incredible growth rate possible. The dramatic revolution has been the development of a quantitative approach (定量 方法)to computer design and analysis that uses empirical observations (经验观察能力力)of programs, experimentation, and simulation as its tools(工具). 5/23/2017 30 Introduction-15 Sustaining the recent improvements in cost and performance will require continuing innovations in computer design. We believe such innovations will be founded on this quantitative approach to computer design. (我们相信这种创新是建立在对计算 机设计的定量探求上的) 5/23/2017 31 Introduction The Changing Face of Computing In the 1960s, the dominant form of computing was on large mainframes— machines costing millions of dollars and stored in computer rooms with multiple operators overseeing their support. Typical applications included business data processing (商务数据处理)and largescale scientific computing(大规模科学计 算). 5/23/2017 32 Introduction The Changing Face of Computing The 1970s saw the birth of the minicomputer, a smaller-sized machine initially focused on applications in scientific laboratories, but rapidly branching out as the technology of time-sharing(分时)— multiple users(多用户) sharing a computer interactively through independent terminals(独立终端)—became widespread. 5/23/2017 33 Introduction The Changing Face of Computing The 1980s saw the rise of the desktop computer(台式机) based on microprocessors, in the form of both personal computers and workstations. The individually owned desktop computer replaced time-sharing and led to the rise of servers(服务器)— computers that provided larger-scale services such as reliable, longterm file storage and access, larger memory, and more computing power(计算能 5/23/2017 34 力). Introduction The Changing Face of Computing The 1990s saw the emergence of the Internet and the World Wide Web, the first successful handheld computing devices (personal digital assistants or PDAs), and the emergence of high-performance digital consumer electronics, from video games to set-top boxes(机顶盒). 5/23/2017 35 Introduction The Changing Face of Computing Not since the creation of the personal computer more than 20 years ago have we seen such dramatic changes in the way computers appear and in how they are used. These changes in computer use have led to three different computing markets(计算市场) ( desktop computing , servers , Embedded computers ), each characterized by different applications(应用), requirements(需求), and computing technologies(计算技术). 5/23/2017 36 Introduction Changing Face for Desktop Computing The first, and still the largest market in dollar terms, is desktop computing. Desktop computing spans from low-end systems that sell for under $1000 to high-end, heavily configured workstations that may sell for over $10,000. Throughout this range in price and capability, the desktop market tends to be driven to optimize priceperformance. 5/23/2017 37 Introduction Changing Face for Desktop Computing This combination of performance (measured primarily in terms of compute performance and graphics performance) and price of a system is what matters most to customers in this market, and hence to computer designers. As a result, desktop systems often are where the newest, highest-performance microprocessors appear, as well as where recently cost-reduced microprocessors and systems appear first. 5/23/2017 38 Introduction Changing Face for Desktop Computing Desktop computing also tends to be reasonably well characterized in terms of applications and benchmarking, though the increasing use of Web-centric, interactive applications poses new challenges in performance evaluation. The PC portion of the desktop space seems recently to have become focused on clock rate as the direct measure of performance, and this focus can lead to poor decisions by consumers as well as by designers who respond to this predilection. 5/23/2017 39 Introduction Changing Face for Servers As the shift to desktop computing occurred, the role of servers to provide larger-scale and more reliable file and computing services grew. The emergence of the World Wide Web accelerated this trend because of the tremendous growth in demand for Web servers and the growth in sophistication of Web-based services. Such servers have become the backbone of large-scale enterprise computing, replacing the traditional mainframe. 5/23/2017 40 Introduction Changing Face for Servers For servers, different characteristics are important. First, availability is critical. The term “availability(有效性),” which means that the system can reliably and effectively provide a service. This term is to be distinguished from “reliability,” which says that the system never fails. Parts of large-scale systems unavoidably fail; the challenge in a server is to maintain system availability in the face of component failures, usually through the use of redundancy. 5/23/2017 41 Introduction Changing Face for Servers Why is availability crucial? Consider the servers running Yahoo!, taking orders for Cisco, or running auctions on eBay. Obviously such systems must be operating seven days a week, 24 hours a day. Failure of such a server system is far more catastrophic than failure of a single desktop. Although it is hard to estimate the cost of downtime, Figure 1.2 shows one analysis, assuming that downtime is distributed uniformly and does not occur solely during idle times. 5/23/2017 42 Introduction Changing Face for Servers As we can see, the estimated costs of an unavailable system are high, and the estimated costs in Figure 1.2 are purely lost revenue and do not account for the cost of unhappy customers! 5/23/2017 43 Introduction Changing Face for Servers A second key feature of server systems is an emphasis on scalability(可扩展性). Server systems often grow over their lifetime in response to a growing demand for the services they support or an increase in functional requirements. Thus, the ability to scale up the computing capacity, the memory, the storage, and the I/O bandwidth of a server is crucial. 5/23/2017 44 Introduction Changing Face for Servers Lastly, servers are designed for efficient throughput(吞吐量). That is, the overall performance of the server—in terms of transactions(交互) per minute or Web pages served per second—is what is crucial. Responsiveness to an individual request remains important, but overall efficiency and cost-effectiveness, as determined by how many requests can be handled in a unit time, are the key metrics for most servers. 5/23/2017 45 Introduction Changing Face for Embedded Computers Embedded computers(嵌入式计算机)—computers lodged in other devices where the presence of the computers is not immediately obvious—are the fastest growing portion of the computer market. These devices range from everyday machines (most microwaves, most washing machines, most printers, most networking switches, and all cars contain simple embedded microprocessors) to handheld digital devices (such as palmtops, cell phones, and smart cards) to video games and digital set-top boxes. 5/23/2017 46 Introduction Changing Face for Embedded Computers Although in some applications (such as palmtops) the computers are programmable, in many embedded applications the only programming occurs in connection with the initial loading of the application code or a later software upgrade of that application. Thus, the application can usually be carefully tuned for the processor and system. 5/23/2017 47 Introduction Changing Face for Embedded Computers This process sometimes includes limited use of assembly language(汇编语言) in key loops, although time-to-market pressures and good software engineering practice usually restrict such assembly language coding to a small fraction of the application. This use of assembly language, together with the presence of standardized operating systems(标准化操作系统), and a large code base has meant that instruction set compatibility(指令系统兼容性) has become an important concern in the embedded market. Simply put, like other computing applications, software costs are often a large part of the total 5/23/2017 48 cost of an embedded system. Introduction Changing Face for Embedded Computers Embedded computers have the widest range of processing power and cost—from low-end (低端)8-bit and 16-bit processors that may cost less than a dollar, to full 32-bit microprocessors capable of executing 50 million instructions per second that cost under 10 dollars, to high-end (高端) embedded processors that cost hundreds of dollars and can execute a billion instructions per second for the newest video game or for a high-end network switch. Although the range of computing power in the embedded computing market is very large, price is a key factor in the design of computers for this space. Performance requirements do exist, of course, but the primary goal is often meeting the performance need at a minimum price, rather than achieving higher performance at a higher price. 5/23/2017 49 Introduction Changing Face for Embedded Computers Often, the performance requirement in an embedded application is a real-time (实时)requirement. A real-time performance requirement is one where a segment of the application has an absolute maximum execution time that is allowed. For example, in a digital set-top box the time to process each video frame (视帧)is limited, since the processor must accept and process the next frame shortly. In some applications, a more sophisticated requirement exists: the average time for a particular task is constrained as well as the number of instances when some maximum time is exceeded. Such approaches (sometimes called soft real-time ) arise when it is possible to occasionally miss the time constraint on an event, as long as not too many are missed. 5/23/2017 50 Introduction Changing Face for Embedded Computers Real-time performance tends to be highly application dependent. It is usually measured (测量)using either from the application or from a standardized benchmark(标准化评 测) .With the growth in the use of embedded microprocessors, a wide range of benchmark kernels requirements exist, from the ability to run small, limited code segments to the ability to perform well on applications involving tens to hundreds of thousands of lines of code. 5/23/2017 51 Introduction Changing Face for Embedded Computers Two other key characteristics exist in many embedded applications: the need to minimize memory(最小化存储器) and the need to minimize power (最小化功耗). Although the emphasis on low power is frequently driven by the use of batteries(电 池), the need to use less expensive packaging (plastic versus ceramic) and the absence of a fan (风扇) for cooling(冷却) also limit total power consumption. 5/23/2017 52 Introduction Changing Face for Embedded Computers In many embedded applications, the memory can be a substantial portion of the system cost, and it is important to optimize memory size in such cases. Sometimes the application is expected to fit totally in the memory on the processor chip; other times the application needs to fit totally in a small off-chip memory. In any event, the importance of memory size translates to an emphasis on code size, since data size is dictated by the application. Some architectures have special instruction set capabilities to reduce code size. Larger memories also mean more power, and optimizing power is often critical in embedded applications. 5/23/2017 53 Introduction Changing Face for Embedded Computers Another important trend in embedded systems is the use of processor cores together with application-specific circuitry(专用电路芯片). Often an application’s functional and performance requirements are met by combining a custom hardware solution(用户硬件解决方 案) together with software running on a standardized embedded processor core, which is designed to interface to such special-purpose hardware. 5/23/2017 54 Introduction Changing Face for Embedded Computers In practice, embedded problems are usually solved by one of three approaches: 1. The designer uses a combined hardware/software solution that includes some custom hardware and an embedded processor core that is integrated with the custom hardware, often on the same chip. 2. The designer uses custom software running on an offthe-shelf(通用) embedded processor. 3. The designer uses a digital signal processor (DSPs) and custom software for the processor. Digital signal processors are processors specially tailored for signalprocessing applications. 5/23/2017 55 The Task of the Computer Designer-1 The task the computer designer faces is a complex one: Determine what attributes (属性)are important for a new machine, then design a machine to maximize performance while staying within cost and power constraints. This task has many aspects, including instruction set design, functional organization, logic design, and implementation. 5/23/2017 56 The Task of the Computer Designer-2 The implementation may encompass integrated circuit design, packaging(封 装), power, and cooling. Optimizing the design requires familiarity with a very wide range of technologies, from compilers and operating systems to logic design and packaging. 5/23/2017 57 The Task of the Computer Designer-3 In the past, the term computer architecture often referred only to instruction set design. Other aspects of computer design were called implementation(实现), often insinuating that implementation is uninteresting or less challenging. We believe this view is not only incorrect, but is even responsible for mistakes in the design of new instruction sets. 5/23/2017 58 The Task of the Computer Designer-4 The architect’s or designer’s job is much more than instruction set design, and the technical hurdles in the other aspects of the project are certainly as challenging as those encountered in instruction set design. This challenge is particularly acute at the present, when the differences among instruction sets are small and when there are three rather distinct application areas. 5/23/2017 59 The Task of the Computer Designer-5 The implementation of a machine has two components: organization and hardware. The term organization includes the high-level aspects of a computer’s design, such as the memory system, the bus structure, and the design of the internal CPU (where arithmetic, logic, branching, and data transfer are implemented). 5/23/2017 60 The Task of the Computer Designer-6 For example, two embedded processors with identical instruction set architectures but very different organizations are the NEC VR 5432 and the NEC VR 4122. Both processors implement the MIPS64 instruction set, but they have very different pipeline and cache organizations. In addition, the 4122 implements the floating-point instructions in software rather than hardware! 5/23/2017 61 The Task of the Computer Designer-7 Hardware is used to refer to the specifics of a machine, including the detailed logic design and the packaging technology of the machine. Often a line of machines contains machines with identical instruction set architectures and nearly identical organizations, but they differ in the detailed hardware implementation. 5/23/2017 62 The Task of the Computer Designer-8 For example, the Pentium II and Celeron are nearly identical, but offer different clock rates and different memory systems, making the Celeron more effective for low-end computers. Term architecture is intended to cover all three aspects of computer design— instruction set architecture, organization, and hardware. 5/23/2017 63 The Task of the Computer Designer-9 Computer architects must design a computer to meet functional requirements as well as price, power, and performance goals. Often, they also have to determine what the functional requirements are, which can be a major task. The requirements may be specific features inspired by the market. 5/23/2017 64 The Task of the Computer Designer-10 Application software often drives the choice of certain functional requirements by determining how the machine will be used. If a large body of software exists for a certain instruction set architecture, the architect may decide that a new machine should implement an existing instruction set. 5/23/2017 65 The Task of the Computer Designer-11 The presence of a large market for a particular class of applications might encourage the designers to incorporate requirements that would make the machine competitive in that market. Figure 1.4 summarizes some requirements that need to be considered in designing a new machine. Many of these requirements and features will be examined in depth in later chapters. 5/23/2017 66 The Task of the Computer Designer-12 Once a set of functional requirements has been established, the architect must try to optimize(优化) the design. Which design choices are optimal depends, of course, on the choice of metrics. The changes in the computer applications space over the last decade have dramatically changed the metrics. Although desktop computers remain focused on optimizing costperformance(性能-价格) as measured by a single user, servers focus on availability, scalability, and throughput cost-performance, and embedded computers are driven by price and often power issues. 5/23/2017 67 The Task of the Computer Designer-13 These differences and the diversity and size of these different markets lead to fundamentally different design efforts. For the desktop market, much of the effort goes into designing a leading-edge microprocessor and into the graphics and I/O system that integrate with the microprocessor. 5/23/2017 68 The Task of the Computer Designer-14 In the server area, the focus is on integrating state-of-the-art microprocessors, often in a multiprocessor architecture, and designing scalable and highly available I/O systems to accompany the processors. 5/23/2017 69 The Task of the Computer Designer-15 In the embedded processor market, the challenge lies in adopting the high-end microprocessor techniques to deliver most of the performance at a lower fraction of the price, while paying attention to demanding limits on power and sometimes a need for high-performance graphics or video processing. 5/23/2017 70 The Task of the Computer Designer-16 In addition to performance and cost, designers must be aware of important trends in both the implementation technology and the use of computers. Such trends not only impact future cost, but also determine the longevity of an architecture. 5/23/2017 71 Technology Trends-1 If an instruction set architecture is to be successful, it must be designed to survive rapid changes in computer technology. After all, a successful new instruction set architecture may last decades—the core of the IBM mainframe has been in use for more than 35 years. An architect must plan for technology changes that can increase the lifetime of a successful computer. 5/23/2017 72 Technology Trends-2 To plan for the evolution of a machine, the designer must be especially aware of rapidly occurring changes in implementation technology. Four implementation technologies, which change at a dramatic pace, are critical to modern implementations: 5/23/2017 73 Technology Trends-3 1. Integrated circuit logic technology— Transistor density(密度) increases by about 35% per year, quadrupling in somewhat over four years. Increases in die size (模板大小)are less predictable and slower, ranging from 10% to 20% per year. The combined effect is a growth rate in transistor count on a chip (片上晶体管数)of about 55% per year. Device speed scales(速率) more slowly. 5/23/2017 74 Technology Trends-4 2. Semiconductor DRAM (dynamic randomaccess memory)—Density increases by between 40% and 60% per year, quadrupling in three to four years. Cycle time has improved very slowly, decreasing by about one-third in 10 years. Bandwidth(带宽) per chip increases about twice as fast as latency decreases. In addition, changes to the DRAM interface have also improved the bandwidth. 5/23/2017 75 Technology Trends-5 3. Magnetic disk technology —Recently, disk density has been improving by more than 100% per year, quadrupling in two years. Prior to 1990, density increased by about 30% per year, doubling in three years. It appears that disk technology will continue the faster density growth rate for some time to come. Access time (访问时间)has improved by one-third in 10 years. 5/23/2017 76 Technology Trends-6 4. Network technology —Network performance depends both on the performance of switches and on the performance of the transmission system. Both latency(延迟) and bandwidth(带宽) can be improved, though recently bandwidth has been the primary focus. For many years, networking technology appeared to improve slowly: for example, it took about 10 years for Ethernet technology to move from 10 Mb to 100 Mb. The increased importance of networking has led to a faster rate of progress, with 1 Gb Ethernet becoming available about five years after 100 Mb. The Internet infrastructure in the United States has seen even faster growth (roughly doubling in bandwidth every year), both through the use of optical media(光介质) and through the deployment of much more switching hardware. 5/23/2017 77 Technology Trends-7 These rapidly changing technologies impact the design of a microprocessor that may, with speed and technology enhancements, have a lifetime of five or more years. Even within the span of a single product cycle for a computing system (two years of design and two to three years of production), key technologies, such as DRAM, change sufficiently that the designer must plan for these changes. Indeed, designers often design for the next technology, knowing that when a product begins shipping in volume that next technology may be the most costeffective or may have performance advantages. Traditionally, cost has 5/23/2017 decreased at about the rate at which density 78 Technology Trends-8 Although technology improves fairly continuously, the impact of these improvements is sometimes seen in discrete leaps, as a threshold that allows a new capability is reached. . 5/23/2017 79 Technology Trends-9 For example, when MOS technology reached the point where it could put between 25,000 and 50,000 transistors on a single chip in the early 1980s, it became possible to build a 32-bit microprocessor on a single chip. By the late 1980s, first-level caches could go on chip. By eliminating chip crossings within the processor and between the processor and the cache, a dramatic increase in cost-performance and performance/power was possible. This design was simply infeasible until the technology reached a certain point. Such technology thresholds are not rare and have a significant impact on a wide variety of design decisions. 5/23/2017 80 Cost, Price, and Their Trends-1 Although there are computer designs where costs tend to be less important—specifically supercomputers—costsensitive designs are of growing significance: More than half the PCs sold in 1999 were priced at less than $1000, and the average price of a 32-bit microprocessor for an embedded application is in the tens of dollars. Indeed, in the past 15 years, the use of technology improvements to achieve lower cost, as well as increased performance, has been a major theme in the computer industry. 5/23/2017 81 Cost, Price, and Their Trends-2 Textbooks often ignore the cost half of costperformance because costs change, thereby dating books, and because the issues are subtle and differ across industry segments. Yet an understanding of cost and its factors is essential for designers to be able to make intelligent decisions about whether or not a new feature should be included in designs where cost is an issue. 5/23/2017 82 Cost, Price, and Their Trends-3 We focuses on cost and price, specifically on the relationship between price and cost: price is what you sell a finished good for, and cost is the amount spent to produce it, including overhead. We also discuss the major trends and factors that affect cost and how it changes over time. The exercises and examples use specific cost data that will change over time, though the basic determinants of cost are less time sensitive. 5/23/2017 83 An Example Cost, Price, and Their Trends-3 System Cabinet Processor board I/O devices 5/23/2017 Software Subsystem Fraction of total Sheet metal, plastic 2% Power supply, fans 2% Cables, nuts, bolts 1% Shipping box, manuals 1% Subtotal 6% Processor 22% DRAM(128MB) 5% Video card 5% Motherboard with basic I/O support, networking 5% Subtotal 37% Keyboard and mouse 3% Monitor 19% Hard disk(20GB) 9% DVD drive 6% Subtotal 37% 84 OS + Basic Office Suite 20% Amdahl’s Law 1、Make the common case fast (加快经常 性事件的速度) Improving the frequent event, rather than the rare event, will obviously help performance. 5/23/2017 85 Amdahl’s Law Amdahl’s Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction(部分、比例) of the time the faster mode can be used. 该定律表示:系统中某一部件由于采用某种 更快的执行方式后整个系统性能的提高与这 种执行方式的使用频率或占总执行时间的比 例有关。 5/23/2017 86 Amdahl’s Law 使用频度:应用对象、统计手段 改进使用频度最高的部件,可获得最大 的效率 形式化描述的主要指标——加速比 5/23/2017 87 Performance for entire task Speedup= using the enhancement when possible Performance for entire task without using the enhancement Execution time for entire task Speedup= without using the enhancement Execution time for entire task using the enhancement when possible 5/23/2017 88 Speedup(加速比) 加速比=(采用改进措施后的性能)/ (没有采用改进措施前的性能) = (没有采用改进措施前执行某任 务的时间)/ (采用改进措施后执行某任务的 时间) 5/23/2017 89 two factors-1 1. The fraction of the computation time in the original machine that can be converted to take advantage of the enhancement。 计算机执行某个任务的总时间中可被改进部 分的时间所占的百分比。 For example, if 20 seconds of the execution time of a program that takes 60 seconds in total can use an enhancement, the fraction is 20/60. This value, which we will call Fractionenhanced, is always less than or equal to 1. 5/23/2017 90 two factors-2 2. The improvement gained by the enhanced execution mode; that is, how much faster the task would run if the enhanced mode were used for the entire program. 改进部分采用改进措施后比没有采用改进措施 前性能提高倍数。 For example, if 20 seconds of the execution time of a program that takes 60 seconds in total can use an enhancement, the fraction is 20/60. We will call this value, which is always greater than 1, Speedupenhanced. 5/23/2017 91 The execution time using the original machine with the enhanced mode will be the time spent using the unenhanced portion of the machine plus the time spent using the enhancement. 5/23/2017 92 Example 1: Suppose that we are considering an enhancement to the processor of a server system used for Web serving. The new CPU is 10 times faster on computation in the Web serving application than the original processor. Assuming that the original CPU is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement? Answer Fractionenhanced = 0.4 Speedupenhanced = 10 5/23/2017 93 Example 2: A common transformation required in graphics engines is square root. Implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics. Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics benchmark.One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for a total of 50% of the execution time for the application. The design team believes that they can make all FP instructions run 1.6 times faster with the same effort as required for the fast square root. Compare these two design alternatives. 5/23/2017 94 Answer: 5/23/2017 95 3、The CPU Performance Equation (CPU性能公式) CPU time =CPU clock cycles for a program(CPU时钟周期总数) ×Clock cycle time(时钟周期) 时钟频率 5/23/2017 96 三要素 CPU性能取决于3个要素: Clock cycle time—Hardware technology and organization clock cycles per instruction(CPI)—Organization and instruction set architecture Instruction count(IC)—Instruction set architecture and compiler technology 5/23/2017 97 CPU time CPU time =Instruction count ×Clock cycle time ×Cycles per instruction 5/23/2017 98 5/23/2017 99 Example Suppose we have made the following measurements: Frequency of FP operations (other than FPSQR) = 25% Average CPI of FP operations = 4.0 Average CPI of other instructions = 1.33 Frequency of FPSQR= 2% CPI of FPSQR = 20 Assume that the two design alternatives are to decrease the CPI of FPSQR to 2 or to decrease the average CPI of all FP operations to 2.5. Compare these two design alternatives using the CPU 5/23/2017 performance equation. 100 =2-(4.0-2.5) ×25%=1.625 5/23/2017 101 局部性原理 4、程序访问的局部性原理 经统计:一段时间90%的时间去执行10% 的程序代码,即大部分时间是访问程序 的局部空间。 程序访问的局部性是构建存储体系和建立 Cache的理论基础。 5/23/2017 102 参考 中文书pp.9~13:计算机系统的结构、组 成、实现,等级、系列机、模拟与仿真 中文书pp.13~15:计算机系统设计的定 量原则 外文书pp. 1~17 , 21~24, 39~42 : Fundamentals of Computer Design 5/23/2017 103