Download Parallel Euler tour and Post Ordering for Parallel Tree Accumulations

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Lattice model (finance) wikipedia , lookup

B-tree wikipedia , lookup

Quadtree wikipedia , lookup

Red–black tree wikipedia , lookup

Interval tree wikipedia , lookup

Binary tree wikipedia , lookup

Binary search tree wikipedia , lookup

Transcript
Parallel Euler tour and Post Ordering for
Parallel Tree Accumulations
An implementation technical report
Sinan Al-Saffar & David Bader
University Of New Mexico
Dec. 2003
Introduction
Tree accumulation is the process of aggregating data placed in tree
nodes according to their tree structure. The aggregation can be a
simple arithmetic operation or a more complex function. This is similar
to arithmetic expression evaluation except there is one operation to be
performed on the entire tree and is thus implied and internal nodes do
not contain arithmetic operations. We have written a program to
implement accumulations on tree data structures in parallel.
We developed and tested our code on a fourteen-processor Sun SMP
and implemented parallelism using the threaded SIMPLE[1] library. We
report our results and describe our work in this report.
To implement tree accumulation and generally other algorithms in
parallel one requires input to be arranged and preprocessed in a
certain way, for example a tree needs to be supplied in a post
ordering[2] or preordering of its nodes. These operations are as
important as the main task one is addressing since if they are not
done efficiently, they become bottle-necks for performance. Some
authors wave their hand and assume a certain property exists in the
data after a “preprocessing step”. We address these aspects in this
report in detail.
We demonstrate how to obtain the Euler tour of a tree in parallel and
how to use this tour to obtain the post order of the tree nodes in
parallel. We then show how to perform parallel tree accumulations also
in parallel using this obtained post ordering.
Input:
We wrote treeGen, a small program that generates random trees to be
used on our runs. By a random tree we mean a tree whose nodes
contain an arbitrary value and have an arbitrary (up to a specifiable
max degree) number of child nodes. The arbitrary values in the nodes
can be used later in tree accumulation or other operations. We only
show the node numbers in the figure below. The weights or values of
the nodes are irrelevant to the tour. An example input tree is shown in
figure-1 below along with the output file from treeGen specifying it.
The DIMACS file format is simply an edges list.
c
c
p
n
e
e
n
e
e
n
e
n
e
e
n
n
n
n
tree file generated by
treeGen
format 8 7
0 53
0 1
0 2
1 14
1 3
1 4
2 27
2 5
3 2
3
3 6
3 7
4 31
5 85
6 30
7 11
6
0
1
2
4
5
7
Figure-1 (the input tree)
The Euler Tour
The Euler tour is a traversal of a graph such that each edge of this
graph is visited exactly once. The Euler tour technique is essential for
different parallel computations on graphs. For example, computing the
post order of the tree which is equivalent to conducting a depth first
search is inherently sequential since a processor cannot determine the
rank or order of its nodes without waiting for the output of the
previous processor so it can learn the starting rank of its first node.
The starting rank depends on the nodes appearing before in the depth
first search. A post ordering and other properties can be obtained with
the Euler tour technique without the need for a depth first search.
To obtain an Euler tour of the undirected tree on which we are to
perform the accumulation, we transform this tree into a directed graph
first by replacing each edge in the tree with two arcs opposed to each
other in direction. This results in an Eulerian graph since every node in
the graph will be of even degree. Figure-3 shows the input tree of
figure-1 transformed in this manner.
Result of the tour:
The output of performing the Euler tour will be a successor array
containing elements equal to twice the number of edges in the input
tree (since each edge will be replaced with two directed arcs for the
Euler tour). Each cell in this successor array represents an edge (the
index is the edge number) which points to the edge that succeeds it in
the tour. For example if Successor[2] = 7, then edge number 7
succeeds edge number 2 in the Euler tour. The successor array in
figure-3 represents the Euler tour of edges: 1-4-9-13-10-14-8-5-11-32-7-12-6.
Data Structures:
Graphs are usually represented using an adjacency list data structure.
We augment the typical adjacency list structure to facilitate the
parallel computation of the successor array (the Euler tour). For this
purpose we make three modification. The first two are suggested by[3].
The modifications to the adjacency list are:
1. Make the adjacency list circular
2. Add reverse pointers to each edge aÆb this pointer references
the reverse edge in the graph, edge bÆa.
3. The edge list is contiguous and the location of each edge in it
uniquely identifies that edge. We found this condition to be
necessary for the algorithm to work.
The single edge structure will thus look as follows:
Figure-2
struct edge{
int val;
struct edge * next;
struct edge * nextR;
}
val
nextR
next
nextR is the reverse pointer and val is the second node in the edge. If
‘val’ were node ‘c’ and nextR leads to an edge whose ‘val’ happens to
be the value ‘a’ then we know this node represents the edge aÆc and
its index in the edgeList[] array (see figure-3) is the edge number in
the input tree. This is important since the successor array holding the
Euler tour will refer to the tour by edge numbers. The next pointer
simply points to the next edge in the list and is made circular.
Our program starts out be reading the DIMACS tree file into memory
and storing the tree in the adjacency list illustrated in figure-3. the
program then performs discovers the Euler tour in parallel by running
the following code:
pardo (i, 0, numEdgesX2, 1){
succ[i].succ= edgeList[i].nextR->next- edgeList;
succ[i].prefix = 0;
}
Note the subtracting of the edgeList value in the second line. This is
why we mentioned that each edge’s index or location in memory is
significant, it corresponds to the number of that edge in the tree.
The successor array elements have two pointers, the successor value
which points to the next edge, trivially. And the prefix value which
may contain any weights to be assigned to the edges. This code is
abstracted in the algorithm in figure-3 without the prefix line.
The prefix or weight of each node is initialized to zero. We need the
prefix value of each edge to do post ordering on the tour once it is
found.
Obtaining the post ordering from the tour:
We assign the value of 1 to upward pointing edges and the value of 0
to the downward pointing edges. We then perform parallel prefix sums
on the tree. This results in the post ordering of the nodes. If
preordering were desired, it could be obtained by assigning the
upwardly pointing edges 1’s and 0’s to the edges pointing downwards
and then performing parallel prefix sums on the edges. The parallel
prefix sums is provided by David Helman and is an implementation of
the parallel prefix sums algorithm in [3].
The following code summarizes the calculation of the post order given
the successor array containing the correct Euler tour and prefix sums
in the prefix field:
// compute post order
pardo(i,0,numEdges*2,1){
post[succ[i].prefix] = edgeList[i].nextR->val;
}
This code block results in the post[] array containing the list of tree
nodes in post order. Next Page we show an illustration of the data
structures discussed so far and needed for the Euler tour and obtaining
the post ordering.
Edge # 2 in the
tree occurs in the
2nd position in the
EdgeList array.
a
1
2
3
13
a
4
d
5
e
6
a
7
f
8
b
9
g
10
h
f
10
g
3
12
e
14
c
7
11
d
9
2
c
5
8
b
6
b
4
1
a
h
b
c
Above: the input tree with
numbered edges.
d
Right: its adjacency and edge lists
made circular and reverse edge
pointers added.
e
f
11
b
Below: the successor array
representing the Euler tour of the
tree. We can obtain this in parallel
as the algorithm below shows.
g
12
c
h
13
d
AdjList[]
14
d
EdgeList[]
The EdgeList array above occupies contiguous memory and
the index of each edge is the edge number in the graph.
Successor[]
4
7
2
9
11
1
12
5
13
14
3
6
10
8
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Algorithm for parallel Euler tour using the structures above:
For every edge, e, in the tree do in parallel {
Successor [ e ] = ( EdgeList [ e] . reversePtr ) . nextPtr ;
}
Figure-3
EdgeList[12]
Successor[12] = 6
Because:
We follow the reverse
pointer in the
EdgeList[12] cell. This
leads to the cell at index
7. Its next pointer goes
one cell above (index 6).
Equivekently the nodes
read c--->a
Tree Accumulation
Thus far we arranged our data in a post ordering. This makes it
possible to divide the data among processors in a way necessary for
performing tree accumulation in parallel, the next step. To perform
tree accumulation we identify the leaf nodes in the post ordering of the
tree and partition those among the processors where each processor
will keep pointers to its part of the shared array containing this post
ordering.
To perform upwards tree accumulation in parallel, each processor
starts with its set of leaf nodes and aggregates the data in each node
with its parent and then marks the child as being removed. The
parent’s child count is decremented and the parent is flagged as a new
leaf node to be processed in the next iteration when its child’s count
becomes zero. Two arrays are kept, leaves0 and leaves1. Leaves0
holds the current leaves and leave1 holds the leaves for the coming
iteration. The array pointers are swapped after each iteration.
Repeating this process and going up the tree till the root is reached
will complete the upward accumulation.
Below is the sample of the code performing the above task.
pardo(i,0,*numLeaves0,1){
//1. aggrigate your data with parent.
parent = edgeListPtrs[leaves0[j]]->val;
nodeValue[parent] = nodeValue[parent]
OPR nodeValue[leaves0[j]];
//2 update the child count for that parent
childCount[parent] = childCount[parent]-1;
}
if (childCount[parent] == 0){ // parent has become leaf
block1[id].size = block1[id].size +1;
leaves1[k] = parent;
k++;
}
j++;
Results
We present performance results for carrying out parallel Euler tour and
post ordering of the input tree on a trees of size 2^22 nodes. We ran
our parallel code on three different types of trees, random, caterpillar
(linear), and full trees. The results are shown below.
Performance on random trees
20
Seconds
15
Accumulation
10
Euler and post order
5
0
1
2
4
6
8
10
12
14
Number of Processors
One observes that our performance does achieve good speedups up to 8 processors but it
seems to level off thereafter. We suspect this could be an artifact of the system we ran on
since parallel expression evaluation code that is known to scale on other systems up to 64
processors had the similar speedup curved as ours shown here.
Performance on full trees
14
12
Seconds
10
8
Accumulation
6
Euler and post ordering
4
2
0
1
2
4
6
8
10
Number of Processors
12
14
Performance on caterpillar trees
14
12
Seconds
10
8
Accumulation
6
Euler and Post Order
4
2
0
1
2
4
6
8
10
12
14
Number of Processors
Another aspect to notice is that running times are dominated by the
time it takes to construct the Euler tour and computing the post
ordering thus one should not ignore these processes.
In this report we demonstrated how preprocessing can be and should
be conducted in parallel and how it can be used in the main task being
parallelized.
The choice of data structures, as always, is tightly related to the kinds
of operations we wish to perform in the data. We used augmented
adjacency lists to perform parallel Euler tour and post ordering. Then
we used this post ordering in parallel tree accumulation. We hope this
technical report will assist those trying to implement these operations
in the future.
References
•
•
•
[1] SIMPLE: A Methodology for Programming High Performance
Algorithms on Clusters of Symmetric Multiprocessors (SMPs)
(1999) David A. Bader, Joseph JaJa.
[2] F. Sevilgen, S. Aluru and N. Futamura, "Distributed memory
tree accumulations," Proc. Parallel and Distributed Computing
Systems, pp. 389-395, 1999.
[3] Introduction to Parallel Algorithms. Joseph JaJa, 1992