Download Computer_Architecture

UNIT 3 1. SIMD ARRAY PROCESSORS A synchronous array of parallel processors is called an array processor which consists of multiple processing elements (PE) under supervision of one control unit (CU). An array processor can handle single instruction multiple data streams (SIMD). It is also known as SIMD computers. SIMD appears in 2 basic architectural organization Array Processors using random access memory Associative Processors 1.1 SIMD COMPUTER ORGANIZATION All N identical processors operate under the control of a single instruction stream issued by a central control unit. There are N data streams; one per processor, so different data can be used in each processor. Figure 15.1 Configuration of SIMD Array Processor • The processors operate synchronously and a global clock is used to ensure lockstep operation, i.e., at each step (global clock tick) all processors execute the same instruction, each on a different datum. • Array processors such as the ICL DAP, Connection Machine CM-200, and MasPar are SIMD computers. • SIMD machines are particularly useful at exploiting data parallelism to solve problems having a regular structure in which the same instructions are applied to subsets of data. The same instruction is issued to all 4 processors (add two numbers), and all processors execute the instructions simultaneously. It takes one step to add the matrices, compared with 4 steps on a SISD machine. • In this example the instruction is simple, but in general it could be more complex such as merging two lists of numbers. • The data may be simple (one number) or complex (several numbers). • Sometimes it may be necessary to have only a subset of the processors execute an instruction, i.e., only some data needs to be operated on for that instruction. This information can be encoded in the instruction itself indicating whether – the processor is active (execute the instruction) – the processor is inactive (wait for the next instruction) • SIMD machines usually have 1000's of very simple processors. Shared memory SIMD machines are unrealistic because of the cost and difficulty in arranging for efficient access to shared memory for so many processors. There are no commercial shared memory SIMD machines. • MIMD machines use more powerful processors and shared memory machines exist for small numbers of processors (up to about 100). 1.2 MASKING & DATA ROUTING MECHANISM Each processor PE has its own memory PEMi. It has a set of working registers and flags names Ai, Bi ,Ci. It contains an Arithmetic and Logic unit Si and a local index register Ii, an address register Di and a data routing register Ri. The Ri of each PEi is connected to the Rj of other PEs via the interconnection network. When data transfer among PEs occurs, it is the content of Ri registers that are being transferred. Some array processor may use 2 routing register, one for input and the other for output. Each PEi is either active or in the inactive mode during each instruction cycle. If a PEi is active, it executes the instruction broadcast to it by the CU. If a PEi is inactive, it will not execute the broadcast instruction. The masking schemes are used to specify the status flag Si of PEi. The conventions Si = 1 is chosen for an active PEi and Si = 0 for an inactive PEi. In the CU, there is a global index register I and a Masking register M. The M register has N bits. The physical length of a vector is determined by the number of PEs. The CU performs the segmentation of a long vector into vector loops, the setting of a global address, and the offset increment. Two processes are employed Master Process: o Holds pool of tasks for worker processes to do o Sends worker a task when requested o Collects results from workers Worker Process: repeatedly does the following o Gets task from master process o Performs computation o Sends results to master Worker processes do not know before runtime which portion of array they will handle or how many tasks they will perform. Dynamic load balancing occurs at run time: the faster tasks will get more work to do. 1.3 INTER PC COMMUNICATION These are fundamental decisions in determining the appropriate structure of an interconnection network for an SIMD machine. The decisions are made between Operation Modes Control Strategies Switching methodologies Network Topologies Operation Mode The operations modes of interconnection networks can be of 3 categories Synchronous Asynchronous Combined. Synchronous communication is needed for establishing communication paths synchronously for either a data manipulating function or for a data instruction broadcast. Asynchronous communication is needed for multiprocessing in which connection requests are issued dynamically. A system may also be designed to facilitate both synchronous and asynchronous operations. Control Strategy A typical interconnection networks consists of number of switching elements and interconnecting links. Interconnection functions are realized by properly setting control of the switching elements. The control setting function can be of 2 types Distributed Control managed by individual switching element. Centralized Control managed by a centralized Switching Technology The 2 major switching methodologies are Circuit switching Packet switching In circuit switching a physical path is actually established between source and destination before transmission of data. This is suitable for bulk data transmission. In packet switching, data is put in a packet and routed through interconnection network without establishing a physical connection path. This is suitable for short messages. Network Topology A network can be depicted by a graph which nodes represent switching points and edges represent communication links. The topologies can be of 2 types Static Dynamic In static link between 2 processors are passive and dedicated buses cannot be reconfigured for direct connection to other processors. In dynamic link, configuration can be made by setting the network's active switching elements. 2. SIMD INTERCONNECTION NETWORK Various interconnection networks have been suggested for SIMD computers 1.1 STATIC NETWORKS & DYNAMIC NETWORKS The topological structure of an SIMD array processor is mainly characterized by the data routing network used in interconnecting the processing elements. Interconnection Networks • Parallel computers with many processors do not use shared memory hardware. • Instead each processor has its own local memory and data communication takes place via message passing over an interconnection network. • The characteristics of the interconnection network are important in determining the performance of a multicomputer. • If network is too slow for an application, processors may have to wait for data to arrive. Interconnection Networks and Message Passing In this case each processor has its own private (local) memory and there is no global, shared memory. The processors need to be connected in some way to allow them to communicate data. The SIMD interconnection networks are classified into the following 2 categories based on network topologies Static Networks Dynamic Networks Static Networks Topologies in the static networks can be classified according to the dimension required for layout. One dimensional topologies include Linear array Two dimensional topologies include The ring Star Tree Mesh Systolic Array Thee dimensional topologies include Completely connected chordal ring 3 cube 3 cube connected cycle Dynamic Networks 2 classes of dynamic networks are single stage multi stage Single Stage Networks A single stage switching network with N input selectors (IS) and N output selectors (OS). Each is essentially a 1- to-D demultiplexer and each OS is an M-to-1 multiplexer. Cross bar network is a single stage network. The single stage network is also called as recirculating network. Data items may have to recirculate through the single stage several times before reaching their final destinations. The number of recirculation depends on the connectivity in the single stage network. Multistage Networks Multi-stage networks are based on the so called shuffle-exchange switching element, which is basically a 2 x 2 crossbar. Multiple layers of these elements are connected and form the network. Many stages of interconnected switches form a multistage SIMD networks. Three characterizing feature describe multistage networks The Switch Box Network Topology Control Structure Each box is essentially an interchange device with 2 inputs and 2 outputs. There are 4 states in a switch box. They are Straight Exchange Upper Broadcast Lower broadcast. Figure 16.4 A two-by-two switching box and its four interconnection states Multistage Networks Banyan Baseline Cube Delta Flip Indirect cube Omega All multistage networks that are based on shuffle-exchange elements, is a blocking network because not all possible input-output connections can be made at the same time, since one path might block another (in contrast to a crossbar, which is nonblocking). Using crossbars instead of the shuffle-exchange elements, it is possible, to build nonblocking networks. Such networks are called CLOS-networks. A multistage network is capable of connecting an arbitrary input terminal to an arbitrary output terminal. Multistage networks can be One sided Two Sided The one sided network called as full switches, have input-output ports on the same side. The two sided network have an input side and output side and can be divided into three classes Blocking Arrangeable Non- Blocking In Blocking networks, simultaneous connections of more than one terminal pair may result conflicts in the use of network communication links. Examples : Data Manipulator, Flip, N cube, omega In rearrangeable network, a network can perform all possible connections between inputs and outputs by rearranging its existing connections so that a connection path for a new input-output pair can always be established. Example : Benes Network A non –blocking can handle all possible connections without blocking. Example : Clos Network (a) Omega Network (b) Benes Network (c) Clos Network 1.2 MESH CONNECTED ILLIAC NETWORK In a mesh network nodes are arranged as a q-dimensional lattice, and communication is allowed only between neighboring nodes. In a periodic mesh, nodes on the edge of the mesh have wrap-around connections to nodes on the other side. This is sometimes called a toroidal mesh. Mesh Metrics For a q-dimensional non-periodic lattice with kq nodes: • Network connectivity = q • Network diameter = q(k-1) • Network narrowness = k/2 • Bisection width = kq-1 • Expansion Increment = kq-1 • Edges per node = 2q The topology is formerly described by the four routing functions: • R+ : I i+1 mod N=> (0,1,2…,14,15) • R- : I i-1 mod N=> (15,14,…,2,1,0) • R+r: I i+r mod N=> (0,4,8,12)(1,5,9,13)(2,6,10,14)(3,7,11,15) • R-r: I i-r mod N=> (15,11,7,3)(14,10,6,2)(13,9,5,1)(12,8,4,0) where (i j k)(l m n)=> i-->j,j-->k,k-->i ; l-->m,m-->n,n-->l and r = N Each PEi is directly connected to its four neighbors in the mesh network. Horizontally, all PEs of all rows form a linear circular list as governed by the following two permutations, each with a single cycle of order N. The permutation cycles (a b c) (d e) stands for permutation a b, b c, c a and d e, e d in a circular fashion with each pair of parentheses. R+1 = (0 1 2….N-1) R–1 = (N-1 ….. 2 1 0) For the example betwork of N = 16 and r = 16 = 4, the shift by a distance of four is specifies by the followimg permutations, each with four cycles of order four each: R+4 = (0 4 8 12)(1 5 9 13)(2 6 10 14)(3 7 11 15) R –4 = (12 8 4 0)(13 9 5 1)(14 10 6 2)(15 11 7 3) 1.3 CUBE NETWORK The cube network consists of m functions defined by cubei (Pm-I ..Pi+1 PiPi-i ... Po) = Pm-i ….Pi+iP iPi-i …..Po for 0 <= i < m. When the PE addresses are considered as the corners of an m-dimensional cube this network connects each PE to its m neighbors. Dimensionality of a cube “n” = log2N • Communication links connect each pair of PEs with addresses that differ in 1 bit only (distance of 1). • Examples It takes <= log N steps to rotate data from any PE to another. • Routing functions are formerly described as: Ci (An-1 …. A1 A0)= An-1…Ai+1 A’i Ai-1……A0 V i =0,1,2,…,n-1 Example: N=8 => n=3 Two functions straight and exchange switch boxes are used in constructing multistage cube networks. The stages are numbered as 0 at input end and increased to n-1 at the output stage. 1.4 PARREL SHIFTER & DATA MANIPULATOR 1.5 SHUFFLE EXCHANGE & OMEGA NETWORK A shuffle-exchange network consists of n=2k nodes, and two kinds of connections. 1.Routing • Have wide applications in implementing parallel algorithms such as FFT, sorting, AT, and polynomial evaluations. • Two routing functions: shuffle “S” and Exchange “E”. Shuffle: S(A)=S(An-1…A1A0)=A.n-2…A1A0An-1, 0<A<1 Which is effectively like shuffling the bottom half of a card deck into the top half. Exchange: E(An-1…A1A0)= (An-1…A1A0’)=C0 . The complement of LSM means data exchange between 2 PEs with adjacent addresses. Exchange connections links nodes whose numbers differ in their lowest bit. Perfect shuffle connections link node i with node 2i mod(n-1), except for node n-1 which is connected to itself. Example of n = 8 An N by N omega network, consists of n identical stages, where each stage is a perfect shuffle interconnection followed by a column of N/2 four-function interchange boxes under individual box control. The shuffle connects output P n-l...Pl P0 of stage i to input P n-2...PlP0Pn-l of stage i-1. Each interchange box in an omega network is controlled by the n-bit destination tags associated with the data on its input lines. The perfect shuffle cuts the deck into 2 halves from the center and remixes them evenly. The inverse perfect shuffle does the opposite to restore the original ordering. 8-Node Shuffle-Exchange Network Below is an 8-node shuffle-exchange network, in which shuffle links are shown with solid lines, and exchange links with dashed lines. Shuffle-Exchange Networks • What is the origin of the name “shuffle-exchange”? • Consider a deck of 8 cards numbered 0, 1, 2,…,7. The deck is divided into two halves and shuffled perfectly, giving the order: 0, 4, 1, 5, 2, 6, 3, 7 • The final position of a card i can be found by following the shuffle link of node i in a shuffleexchange network. Shuffle-Exchange Networks • Let ak-1, ak-2,…, a1, a0 be the address of a node in a shuffle-exchange network in binary. • A datum at this node will be at node number ak-2,…, a1, a0, ak-1 after a shuffle operation. • This corresponds to a cyclic leftward shift in the binary address. • After k shuffle operations we get back to the node we started with, and nodes through which we pass are called a necklace. 3. ASSOCIATIVE ARRAY PROCESSOR 1.1 ASSOCIATIVE MENMORY ORGANIZATION 1.2 ASSOCIATIVE PROCESSOR (PEPE & TRANS) 1.3 ASSOCIATIVE SEARCHING ALGORITHM

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Computer_Architecture