6CCS3PAL-7CCSPDA: Answers

Exercises for week 2:

1) Sum of n = 2^k numbers using n/2 processors. Input array A=[1,2,3,4,5,6,7,8]
1.1) Show each step to output
1.2) Show total time
1.3) Show total work

Answer 1:
Using algorithm P-Sum shown in the first slide of 2.PRAM-CBtree/3.SumMatrix.pdf. An array of size n can be solved in k = log2(n) steps. For the given size of n=8 this is k=3 steps.

1.1:
a) Copy the elements of array A to an array B to retain the original data. B=[1,2,3,4,5,6,7,8]
b) Step 1: for 1 ≤ i ≤ 8/2^1 (4 times) add up elements [2i-1] and [2i] of [1, 2, 3, 4, 5, 6, 7, 8]. B=[3, 7, 11, 15, …]
   Step 2: for 1 ≤ i ≤ 8/2^2 (2 times) add up elements [2i-1] and [2i] of [3, 7, 11, 15, …]. B=[10, 26, …]
   Step 3: for 1 ≤ i ≤ 8/2^3 (1 time) add up elements [2i-1] and [2i] of [10, 26, …]. B=[36, …]
c) Output the answer at B[1] = 36

1.2:
The time for a) is constant (a single command). T(a) = 1
The time for b) is k=3 (once per loop iteration). T(b) = k = log2(n)
The time for c) is constant (a single command). T(c) = 1
The total time for n elements is thus T_total = 1 + log2(n) + 1 = Θ(log2(n))

1.3:
The work for a) is n, the size of the copied array. W(a) = n
The work for b) is the number of additions done (red arrow pairs). This is determined by the line "for 1 ≤ i ≤ 8/2^h" in the algorithm. Running this k times and increasing h every iteration, the overall number of additions is n/2 + n/4 + n/8 + … + 2 + 1 = n-1. For example, adding 3 numbers together takes 2 additions in total. W(b) = n-1
The work for c) is constant (single, known location). W(c) = 1
The total work for n elements is thus W_total = n + (n-1) + 1 = 2n = Θ(n)

Estimated running time with n elements on p processors: T_p(n) = W_total/p + T_total = Θ(n/p + log2(n))

Exercises for week 3:

1) Find the maximum member of array A=[3, 17, 27, 6] using a parallel EREW solution.
1.1) Show each step to output

Answer 1:
1.1: Using algorithm EREW Max-of-Array shown on the fourth slide of 2.PRAM-CBtree/5.tree-algs.pdf. The size of the given data is n=4.
We construct an array A of size 2n-1 and place the original data into the array from A[n] to A[2n-1]. Elements A[1] to A[n-1] are empty.
m = log2(n) = 2 and k = m - 1 = 1
For k = 1 step -1 to 0 (each step decreases the value of k until it reaches 0):
Step k=1) For all j such that 2^1 ≤ j ≤ 2^2 - 1, in parallel find the maximum of the pair A[2j] and A[2j+1] and place it into A[j].
   j=2: A=[…,…,…,3,17,27,6]. A[4] is 3 and A[5] is 17. 17>3, thus A[2] = 17
   j=3: A=[…,…,…,3,17,27,6]. A[6] is 27 and A[7] is 6. 27>6, thus A[3] = 27
Step k=0) For all j such that 2^0 ≤ j ≤ 2^1 - 1, in parallel find the maximum of the pair A[2j] and A[2j+1] and place it into A[j].
   j=1: A=[…,17,27,3,17,27,6]. A[2] is 17 and A[3] is 27. 27>17, thus A[1] = 27
End of array. Output A[1] = 27

2) Find the maximum member of array A=[10, 9, 13, 62, 6, 3, 1, 14] using a parallel EREW solution.
2.1) Show each step to output
2.2) Show total time
2.3) Show total work

Answer 2:
2.1: Using the same algorithm as above, an array can be presented as a tree with the root at A[1] and the leaves in A[2^l] to A[2^(l+1) - 1], where l is the last level of the tree, counting from 0 at the root.
n=8, m = log2(8) = 3, k = 3-1 = 2
A=[…, 10, 9, 13, 62, 6, 3, 1, 14] can be presented as a tree with these eight values as leaves A[8] to A[15].
Step k = 2) For all j such that 2^2 ≤ j ≤ 2^3 - 1, in parallel find the maximum of the pair A[2j] and A[2j+1] and place it into A[j]. This gives A[4]=10, A[5]=62, A[6]=6, A[7]=14.
Step k = 1) For all j such that 2^1 ≤ j ≤ 2^2 - 1, in parallel find the maximum of the pair A[2j] and A[2j+1] and place it into A[j]. This gives A[2]=62, A[3]=14.
Step k = 0) For all j such that 2^0 ≤ j ≤ 2^1 - 1, in parallel find the maximum of the pair A[2j] and A[2j+1] and place it into A[j]. This gives A[1]=62.
Output A[1] = 62
2.2: Total time: the number of parallel pairwise comparison steps = Θ(log2(n))
2.3: Total work: the size of the original array of elements = Θ(n)

3) Find the maximum member at the smallest index of array A=[16, 112, 8, 112, 112] using a CRCW parallel solution

Answer 3:
Using the algorithm CRCW Max-of-Array on slide 5 of 2.PRAM-CBtree/5.tree-algs.pdf
As the algorithm compares each element with the rest, it can be graphically presented as a complete graph.
1) Initialise array M with 0s, indicating that every member could potentially be the largest value.
2) For all ordered pairs, compare their values in parallel and assign M = 1 to the smaller member of each pair. Here M = [1, 0, 1, 0, 0].
3) In parallel, consider and compare the indices of all members that have an M value of 0. Assign 1 to the members that have a larger index. Here M = [1, 0, 1, 1, 1].
4) Output the member with the M value of 0: index 2 with value 112

4) Find if value x=5 is in array A=[2, -1, 5, 7, 33, 0, 5, 5] using an EREW parallel solution.
4.1) What problems occur for this solution?
4.2) What three algorithms would a complete solution use to find the smallest index of a value X in an array A?

Answer 4:
Using the algorithm EREW Is-X-In-Array from slide 7 of 2.PRAM-CBtree/5.tree-algs.pdf
Temp = [5, 5, 5, 5, 5, 5, 5, 5]
A = [2, -1, 5, 7, 33, 0, 5, 5]
In parallel compare the members at each index i = 1…n of arrays Temp and A: if the elements are the same, assign value i to Temp[i], else assign value ∞.
After comparison Temp = [∞,∞,3,∞,∞,∞,7,8]
4.1) The problems that occur are the initialization of the array Temp with values of x and the final retrieval of the answer after comparison.
4.2) To resolve the two issues the final solution utilizes 3 consecutive algorithms (from the slides):
a) EREW Broadcast query item X to array Temp[1::n]. This initializes the Temp array.
b) EREW Is-X-In-List for array L[1::n]. This marks the Temp entries where array L is equal to X.
c) EREW Min-Binary-fan-in with array Temp[1::n]. This returns the smallest marked Temp entry.

5) Find the minimum index of a solution for exercise 4) using a binary fan-in solution.

Answer 5:
a) Using algorithm EREW Broadcast on slide 8 from 2.PRAM-CBtree/5.tree-algs.pdf
x=5, size of array = 8, k = log2(size) = 3
Temp[1] = 5. Temp = [5]
For i = 0…2
   For 2^i + 1 ≤ j ≤ 2^(i+1)
      Temp[j] = Temp[j - 2^i]
i=0, 2≤j≤2: Temp[2] = Temp[1]. Temp = [5, 5]
i=1, 3≤j≤4: Temp[3] = Temp[1], Temp[4] = Temp[2]. Temp = [5, 5, 5, 5]
i=2, 5≤j≤8: Temp[5] = Temp[1], Temp[6] = Temp[2], Temp[7] = Temp[3], Temp[8] = Temp[4]. Temp = [5, 5, 5, 5, 5, 5, 5, 5]

b) Using the algorithm EREW Is-X-In-Array from slide 7 of 2.PRAM-CBtree/5.tree-algs.pdf (same as Answer 4)
Temp = [5, 5, 5, 5, 5, 5, 5, 5]
A = [2, -1, 5, 7, 33, 0, 5, 5]
In parallel compare the members at each index i = 1…n of arrays Temp and A: if the elements are the same, assign value i to Temp[i], else assign value ∞.
After comparison Temp = [∞,∞,3,∞,∞,∞,7,8]

c) Using the algorithm EREW Min-Binary-fan-in from slide 9 of 2.PRAM-CBtree/5.tree-algs.pdf
Temp = [∞,∞,3,∞,∞,∞,7,8], n=8, k=3
for j = 1…3
   for all 1 ≤ i ≤ n/2^j
      Compare each pair Temp[2i-1] and Temp[2i] and assign the smaller value to Temp[i]
j=1, 1≤i≤4:
   Temp[1] ≤ Temp[2] thus Temp[1] = Temp[1]
   Temp[3] ≤ Temp[4] thus Temp[2] = Temp[3]
   Temp[5] ≤ Temp[6] thus Temp[3] = Temp[5]
   Temp[7] ≤ Temp[8] thus Temp[4] = Temp[7]
   Temp = [∞,3,∞,7,…]
j=2, 1≤i≤2:
   Temp[1] ≥ Temp[2] thus Temp[1] = Temp[2]
   Temp[3] ≥ Temp[4] thus Temp[2] = Temp[4]
   Temp = [3,7,…]
j=3, 1≤i≤1:
   Temp[1] ≤ Temp[2] thus Temp[1] = Temp[1]
   Temp = [3,…]
Output answer Temp[1] = 3. The smallest occurrence of x = 5 in array A = [2, -1, 5, 7, 33, 0, 5, 5] is at index 3.
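The three phases of Answer 5 (broadcast, mark, binary fan-in minimum) can be sketched as a sequential Python simulation. This is only an illustration under stated assumptions: the function names are invented here and do not come from the slides, n is assumed to be a power of two, and each inner loop stands for the work that one PRAM step would spread over separate processors.

```python
import math

INF = math.inf  # stands for the "∞" marker used in the answers


def broadcast(x, n):
    """Copy x into temp[1..n] by doubling; n is assumed to be a power of two."""
    temp = [None] * (n + 1)            # slot 0 is unused so indices match the text
    temp[1] = x
    for i in range(n.bit_length() - 1):            # i = 0 .. log2(n) - 1
        # On a PRAM each j below is handled by its own processor in one step.
        for j in range(2**i + 1, 2**(i + 1) + 1):
            temp[j] = temp[j - 2**i]
    return temp


def mark_matches(temp, a):
    """Set temp[i] = i where a's i-th element equals the broadcast value, else INF."""
    n = len(a)
    return [None] + [i if temp[i] == a[i - 1] else INF for i in range(1, n + 1)]


def min_fan_in(temp):
    """Binary fan-in: after round j, temp[1..n/2^j] hold the pairwise minima."""
    temp = list(temp)
    n = len(temp) - 1
    for j in range(1, n.bit_length()):              # j = 1 .. log2(n)
        for i in range(1, n // 2**j + 1):
            temp[i] = min(temp[2 * i - 1], temp[2 * i])
    return temp[1]


def smallest_index(x, a):
    """1-based index of the first occurrence of x in a, or INF if x is absent."""
    return min_fan_in(mark_matches(broadcast(x, len(a)), a))


print(smallest_index(5, [2, -1, 5, 7, 33, 0, 5, 5]))  # 3
```

Running the loops in increasing i or j order is safe in this sequential form because every cell that is written in a round is never read later in the same round.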
6) Find the smallest index of occurrence x=3 in the array A=[3, 16, 3, 29, 57, 8, 3, 4] using 3 EREW parallel algorithms. Show work for all three major steps.

Answer 6:
a) Using algorithm EREW Broadcast on slide 8 from 2.PRAM-CBtree/5.tree-algs.pdf
x=3, size of array = 8, k = log2(size) = 3
Temp[1] = 3. Temp = [3]
For i = 0…2
   For 2^i + 1 ≤ j ≤ 2^(i+1)
      Temp[j] = Temp[j - 2^i]
i=0, 2≤j≤2: Temp[2] = Temp[1]. Temp = [3, 3]
i=1, 3≤j≤4: Temp[3] = Temp[1], Temp[4] = Temp[2]. Temp = [3, 3, 3, 3]
i=2, 5≤j≤8: Temp[5] = Temp[1], Temp[6] = Temp[2], Temp[7] = Temp[3], Temp[8] = Temp[4]. Temp = [3, 3, 3, 3, 3, 3, 3, 3]

b) Using the algorithm EREW Is-X-In-Array from slide 7 of 2.PRAM-CBtree/5.tree-algs.pdf (same as Answer 4)
Temp = [3, 3, 3, 3, 3, 3, 3, 3]
A = [3, 16, 3, 29, 57, 8, 3, 4]
In parallel compare the members at each index i = 1…n of arrays Temp and A: if the elements are the same, assign value i to Temp[i], else assign value ∞.
After comparison Temp = [1,∞,3,∞,∞,∞,7,∞]

c) Using the algorithm EREW Min-Binary-fan-in from slide 9 of 2.PRAM-CBtree/5.tree-algs.pdf
Temp = [1,∞,3,∞,∞,∞,7,∞], n=8, k=3
for j = 1…3
   for all 1 ≤ i ≤ n/2^j
      Compare each pair Temp[2i-1] and Temp[2i] and assign the smaller value to Temp[i]
j=1, 1≤i≤4:
   Temp[1] ≤ Temp[2] thus Temp[1] = Temp[1]
   Temp[3] ≤ Temp[4] thus Temp[2] = Temp[3]
   Temp[5] ≤ Temp[6] thus Temp[3] = Temp[5]
   Temp[7] ≤ Temp[8] thus Temp[4] = Temp[7]
   Temp = [1,3,∞,7,…]
j=2, 1≤i≤2:
   Temp[1] ≤ Temp[2] thus Temp[1] = Temp[1]
   Temp[3] ≥ Temp[4] thus Temp[2] = Temp[4]
   Temp = [1,7,…]
j=3, 1≤i≤1:
   Temp[1] ≤ Temp[2] thus Temp[1] = Temp[1]
   Temp = [1,…]
Output answer Temp[1] = 1. The smallest occurrence of x = 3 in array A=[3, 16, 3, 29, 57, 8, 3, 4] is at index 1.

Exercises for week 4:

1) Find the prefix sums of array A=[3, 7, 8, 3, 9, 2, 3, 1] using an EREW binary tree solution.
Answer 1:
Using algorithm EREW-Prefix-Sum from the last slide of 2.PRAM-CBtree/5.tree-algs.pdf
The array A can be represented as a tree with the values at indices n…2n-1 as the leaves: n=8, m=3
For k = m - 1 step -1 to 0 we assign the elements 1…n-1 the value of the sum of their children (see Answer 1 from week 2 above). This gives the internal nodes A[1…7] = [36, 21, 15, 10, 11, 11, 4].
We then initialise an array B such that B[1] = A[1].
We then consider the members of B at indices j = 2…2n-1. If the index is odd, we assign the member the value at index (j-1)/2, that is, the value of its parent. For example, for j = 3: B[3] = B[1]. If the considered index is even, as is the case with j = 2, we assign the member the value of its parent in array B minus the value of its sibling in array A. The value for j = 2 is thus B[2] = B[1] - A[3].
Calculating each level of the tree in parallel and moving away from the root we obtain the internal nodes B[1…7] = [36, 21, 36, 10, 21, 32, 36].
The answer is the array of members at indices n…2n-1 of array B, or the leaves of the tree: [3, 10, 18, 21, 30, 32, 35, 36]

2) Find the prefix sums of array A=[7, 2, 3, 8, 7, 4, 6, 16]
//This data is different from the one in the original class.

Answer 2:
Adding up the children of parents we obtain a tree with internal nodes A[1…7] = [53, 20, 33, 9, 11, 11, 22].
Initialising array B, moving down its tree and assigning new values based on indices we obtain a tree with internal nodes B[1…7] = [53, 20, 53, 9, 20, 31, 53].
Output the answer of the leaves: [7, 9, 12, 20, 27, 31, 37, 53]

3) Show the structure of a linked list with the parent array P=[1, 3, 1, 2]

Answer 3:
Starting from the root (the element with P(k) = k), the list is: 1 - 3 - 2 - 4

4) Show the structure of a complete binary tree with the parent array P=[1,1,1,2,2,3,3]

Answer 4:
The root is 1. The children of 1 are 2 and 3, the children of 2 are 4 and 5, and the children of 3 are 6 and 7.

Exercises for week 5:

1) Given the parent array P=[5, 1, 6, 2, 7, 4, 8, 8]:
a) show the structure of the underlying data structure
b) find the distance from the root of each member using a parallel ranking algorithm. Show each step.
Answer 1:
a) The structure of list L starting from the root is: 8 - 7 - 5 - 1 - 2 - 4 - 6 - 3
b) Using algorithm List-Rank from slide 4 of 2.PRAM-CBtree/8.pointerjump.pdf
Notation:
k: element
P(k): parent of k
PP(k): parent of parent of k
dist: distance from root
_dist: new distance
_P(k): new parent of k
rank: rank of element

Step 0: Initialise all dist values to 1 except where k = P(k), which has dist=0

k   P(k)   dist   PP(k)
8   8      0      8
7   8      1      8
5   7      1      8
1   5      1      7
2   1      1      5
4   2      1      1
6   4      1      2
3   6      1      4

Step 1: Where P(k) does not equal PP(k) assign dist(k) the value dist(k)+dist(P(k)). Assign P(k) the value of PP(k)

k   dist   P(k)   PP(k)   P(k)=PP(k)?   _dist   _P(k)
8   0      8      8       Y             0       8
7   1      8      8       Y             1       8
5   1      7      8       N             2       8
1   1      5      7       N             2       7
2   1      1      5       N             2       5
4   1      2      1       N             2       1
6   1      4      2       N             2       2
3   1      6      4       N             2       4

Step 2: Where P(k) does not equal PP(k) assign dist(k) the value dist(k)+dist(P(k)). Assign P(k) the value of PP(k)

k   dist   P(k)   PP(k)   P(k)=PP(k)?   _dist   _P(k)
8   0      8      8       Y             0       8
7   1      8      8       Y             1       8
5   2      8      8       Y             2       8
1   2      7      8       N             3       8
2   2      5      8       N             4       8
4   2      1      7       N             4       7
6   2      2      5       N             4       5
3   2      4      1       N             4       1

Step 3: Where P(k) does not equal PP(k) assign dist(k) the value dist(k)+dist(P(k)). Assign P(k) the value of PP(k)

k   dist   P(k)   PP(k)   P(k)=PP(k)?   _dist   _P(k)
8   0      8      8       Y             0       8
7   1      8      8       Y             1       8
5   2      8      8       Y             2       8
1   3      8      8       Y             3       8
2   4      8      8       Y             4       8
4   4      7      8       N             5       8
6   4      5      8       N             6       8
3   4      1      8       N             7       8

Output answer:

k   rank
8   0
7   1
5   2
1   3
2   4
4   5
6   6
3   7

2) Given the parent array P=[1,1,1,2,8,3,4,6] show graphically the steps of an algorithm to find the root(s) of the underlying data structure.

Answer 2:
Using the algorithm FOREST-ROOT from the last slide of 2.PRAM-CBtree/8.pointerjump.pdf.
Original structure: P=[1,1,1,2,8,3,4,6]
For all elements in parallel, set P(k) to P(P(k)):
1) P=[1,1,1,1,6,1,2,3]
2) P=[1,1,1,1,1,1,1,1]
Every element now points to the root, element 1.

3) For the given interconnected network M, show the steps of an algorithm that finds the smallest member. What is the optimal running time of such an algorithm and why?

M=
3    17   6    8
500  72   64   11
25   32   4    1
16   2    10   8

Answer 3:
Using the algorithm MIN-2D-MESH from slide 3 of 3.IC-networks/2.mesh.pdf.
q=4
a) For columns j = 0…3 in parallel:
      For rows i = 2…0 sequentially:
         Compare elements at [i, j] and [i + 1, j] and copy the smaller value to [i, j]
b) For columns j = 2…0 sequentially:
      Compare elements at [0, j] and [0, j+1] and copy the smaller value to [0, j]
Output answer at [0,0]

a) j = 0…3
i = 2:
3    17   6    8
500  72   64   11
16   2    4    1
16   2    10   8

i = 1:
3    17   6    8
16   2    4    1
16   2    4    1
16   2    10   8

i = 0:
3    2    4    1
16   2    4    1
16   2    4    1
16   2    10   8

b) First row after each step:
j = 2: 3 2 1 1
j = 1: 3 1 1 1
j = 0: 1 1 1 1

Output answer at [0,0] = 1
The optimal running time is width - 1 + height - 1 = width + height - 2 (here 4 + 4 - 2 = 6 steps), as the algorithm must traverse the width and height of the mesh/matrix in order to sequentially compare elements in pairs.

4) Find the prefix sums of the 2D mesh M given below using a parallel algorithm. Show each step.

M=
1    4    2
3    7    6
11   2    4

Answer 4:
Using the algorithm Prefix computation from slide 9 of 3.IC-networks/2.mesh.pdf
1) In parallel add up the members of each row such that:
   For 1 ≤ j ≤ q-1: S[i,j] = S[i,j-1] + X[i,j]

1    5    7
3    10   16
11   13   17

2) Add up the members of the last column such that:
   For 1 ≤ i ≤ q-1: S[i,q-1] = S[i,q-1] + S[i-1,q-1]

1    5    7
3    10   23
11   13   40

3) Factor the last column across the rows in parallel:
   For 1 ≤ i ≤ q-1 in parallel do
      For 0 ≤ j ≤ q-2 do
         S[i,j] = S[i,j] + S[i-1,q-1] (i.e. add the sum at the end of the previous row)

1    5    7
10   17   23
34   36   40

5) Find the prefix sums of the 2D mesh M given below using a parallel algorithm. Show each step.
M=
7    1    3    4
16   9    11   23
8    7    13   10
5    6    4    0

Answer 5:
Using the algorithm Prefix computation from slide 9 of 3.IC-networks/2.mesh.pdf
1) In parallel add up the members of each row such that:
   For 1 ≤ j ≤ q-1: S[i,j] = S[i,j-1] + X[i,j]

7    8    11   15
16   25   36   59
8    15   28   38
5    11   15   15

2) Add up the members of the last column such that:
   For 1 ≤ i ≤ q-1: S[i,q-1] = S[i,q-1] + S[i-1,q-1]

7    8    11   15
16   25   36   74
8    15   28   112
5    11   15   127

3) Factor the last column across the rows in parallel:
   For 1 ≤ i ≤ q-1 in parallel do
      For 0 ≤ j ≤ q-2 do
         S[i,j] = S[i,j] + S[i-1,q-1] (i.e. add the sum at the end of the previous row)

7    8    11   15
31   40   51   74
82   89   102  112
117  123  127  127

6) Describe a parallel solution for matrix-vector multiplication and apply it to the matrix-vector pair M-V given. Show each step.

M=                V=
925  26   31      1
16   5    11      2
27   9    3       3

Answer 6:
Using algorithm Matrix Vector Multiplication from page 1 of 3.IC-networks/3.processor-ring-mult.pdf
1) Split up the matrix into its columns: M1 = (925, 16, 27), M2 = (26, 5, 9), M3 = (31, 11, 3)
2) Split up the vector: V1 = 1, V2 = 2, V3 = 3
3) In parallel calculate the matching pairs Mi * Vi:
   M1 * V1 = (925, 16, 27)
   M2 * V2 = (52, 10, 18)
   M3 * V3 = (93, 33, 9)
4) Add up all obtained columns: (925, 16, 27) + (52, 10, 18) + (93, 33, 9) = (1070, 59, 54)

7) Given the matrix A and a network of 4 processors denoted by x, show the illustrated steps of matrix multiplication on 1D networks.

A=                x=
1    2    3    4      3
0    2    2    0      3
5    3    6    4      4
1    1    1    0      2

Answer 7:
Explanation brought out in the last slides of 3.IC-networks/2.mesh.pdf
The columns of the original matrix are shifted. The procedure then considers the matrix from bottom to top, multiplies the matrix members that have arrived by the network value in the corresponding location, and passes the answer on to the next network position to be added to the product computed there. The final network position places its overall result into a final answer construct.
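The pipelined procedure described in Answer 7 can be sketched as a small Python simulation. It is a sketch under assumptions rather than the algorithm from the slides: the function name is made up for illustration, and A and x are read from exercise 7 as the 4x4 matrix with rows (1 2 3 4), (0 2 2 0), (5 3 6 4), (1 1 1 0) and the network values (3, 3, 4, 2), a reading that reproduces the final answer 29 14 56 10.

```python
def linear_array_matvec(a, x):
    """Simulate y = A*x on a 1D array of len(x) processors.

    Processor j holds x[j]. Rows enter at processor 0 from bottom to top,
    one per step, and each partial sum moves one processor to the right
    per step. Returns (y, number_of_compute_steps).
    """
    rows, cols = len(a), len(x)
    order = list(range(rows - 1, -1, -1))   # bottom row first
    pipe = [0] * cols                       # value produced by each processor
    y = [None] * rows
    steps = rows + cols - 1
    for t in range(steps):
        # Go right to left so each processor reads its neighbour's
        # value from the previous step, as in a synchronous network.
        for j in range(cols - 1, -1, -1):
            r = t - j                       # which entering row is at processor j
            if 0 <= r < rows:
                row = order[r]
                incoming = pipe[j - 1] if j > 0 else 0
                pipe[j] = incoming + x[j] * a[row][j]
                if j == cols - 1:           # last processor emits the result
                    y[row] = pipe[j]
    return y, steps


A = [[1, 2, 3, 4], [0, 2, 2, 0], [5, 3, 6, 4], [1, 1, 1, 0]]
x = [3, 3, 4, 2]
print(linear_array_matvec(A, x))  # ([29, 14, 56, 10], 7)
```

The simulation counts 7 compute steps for this input; the worked answer lists an eighth step in which the last result is only written out.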
Network values: x1 = 3, x2 = 3, x3 = 4, x4 = 2. Processor Pj receives column j of A, read from bottom to top, one step later than processor Pj-1:
Column 1: 1 0 5 1
Column 2: 2 2 3 1
Column 3: 3 2 6 1
Column 4: 4 0 4 0

Step   P1 (x=3)   P2 (x=3)   P3 (x=4)   P4 (x=2)   Output
1      3*1
2      3*5        3*1+3
3      3*0        3*3+15     4*1+6
4      3*1        3*2+0      4*6+24     2*0+10
5                 3*2+3      4*2+6      2*4+48     10
6                            4*3+9      2*0+14     56 10
7                                       2*4+21     14 56 10
8                                                  29 14 56 10

8) Given the matrix A and a network of 3 processors denoted by x, show the illustrated steps of matrix multiplication on 1D networks.

A=
1    3    7
2    4    8
6    8    10
1    2    3

x= ½ ¼ ¼

Answer 8:
The final answer is: 3 4 15/2 7/4

Exercises for week 6:

1) Given a statement: A[i] = 2B[i] + 3, where A and B are arrays and A[i] is element i of A:
1.1) write pseudo code for an algorithm that would use multi threading to implement this statement.
1.2) draw a computation dag* for this algorithm given an input in the form (B, i, j, A) where B is an array of size 4, indexed [1…4], i and j are indices and A is the output in the form of an array.

Answer 1:
1.1:
Begin Parallel-Alg (B,i,j,A)
   if i = j
      A[i] = 2B[i] + 3
   else {
      m = floor((i + j) / 2)
      spawn Parallel-Alg (B,i,m,A)
      Parallel-Alg (B,m+1,j,A)
      Sync
   }
end
1.2: