Efficient Data Structures for Storing the Partitions of Integers

Rung-Bin Lin
Computer Science and Engineering
Yuan Ze University, Chung-Li, 320 Taiwan
[email protected]

Abstract

Algorithms for enumerating the partitions of a positive integer n have long been known. However, data structures for storing the partitions have not received due attention. In this paper, several data structures, ranging from the most intuitive to the most efficient, are proposed. The space and time complexity for creating the most efficient data structure is O(n^2). The space complexity is low enough to make it possible to store all the partitions of an integer up to several ten thousands. This data structure can be used to enumerate the partitions of any integer smaller than or equal to n.

1 Introduction

Partitioning an integer n is to divide it into its constituent parts, which are all positive integers. Algorithms for enumerating all the partitions of an integer, or only the partitions satisfying a restriction, have long been known [1,2]. However, data structures for storing the partitions have not received due attention. In this paper, we investigate four data structures for storing all the partitions of an integer. What we mean by a data structure here is one that can be used to enumerate the partitions without doing any arithmetic operation except for computing an index into an array. The enumeration can be done either with a restriction or without any restriction.

Four data structures are investigated. The most efficient data structure need only store 0.75n^2 + 3n + 3 integers if n is even, or 0.75n^2 + 3n + 2.25 if n is odd. It is created without exhaustively enumerating all the partitions. The time complexity for creating this data structure is the same as the space complexity. The complexity is low enough to make it possible to store all the partitions of an integer up to several ten thousands.

The rest of this paper is organized as follows. In Section 2 we describe how the four data structures are created and derive the amount of memory and time needed to create each of them. In Section 3 we discuss how the proposed data structure can be used to enumerate the partitions of an integer. In Section 4 we present some experimental data about the amount of memory needed for these data structures. The last section draws some conclusions.

2 Four Data Structures

Let the set of partitions of a positive integer n be denoted by Ω. Any element w ∈ Ω is denoted as w = ⟨y_1, y_2, ..., y_k⟩, where

    y_1 + y_2 + ... + y_k = n,  with y_i ∈ I, y_i > 0 for i = 1, ..., k, and k ≤ n,

and y_i is called a part of partition w. The parts of a partition are not necessarily distinct, nor do they have a fixed order. However, to ease the presentation, we assume y_1 ≤ y_2 ≤ ... ≤ y_k. Note that the value of k can vary from one partition to another. For example, the partitions of 6 are ⟨1,1,1,1,1,1⟩, ⟨1,1,1,1,2⟩, ⟨1,1,1,3⟩, ⟨1,1,2,2⟩, ⟨1,1,4⟩, ⟨1,2,3⟩, ⟨1,5⟩, ⟨2,2,2⟩, ⟨2,4⟩, ⟨3,3⟩, and ⟨6⟩.

In the following we discuss how the partitions of an integer can be stored in a computer. Four data structures are investigated. They are called the direct linear, multiplicity linear, tree, and diagram structures. The first two store data in a linear array, so they are called linear structures.

2.1 Linear structures

Given the set of partitions of an integer n, say Ω = {w_1, w_2, ..., w_p}, the partitions can be stored in a one-dimensional array in the form |w_1| w_1 |w_2| w_2 ... |w_p| w_p, where |w_i| denotes the number of parts in w_i. For example, the partitions of 6 can be stored as 6,1,1,1,1,1,1,5,1,1,1,1,2,4,1,1,1,3,4,1,1,2,2,3,1,1,4,3,1,2,3,2,1,5,3,2,2,2,2,2,4,2,3,3,1,6. In total, it needs 46 integers to store all the partitions. This data structure is called direct linear in this paper. It can be created using an algorithm that enumerates the partitions in lexicographic order.
Two partitions ⟨x_1, x_2, ..., x_k⟩ and ⟨y_1, y_2, ..., y_l⟩ are said to be in lexicographic order if there exists a d ≤ min(k, l) such that x_i = y_i for i < d and x_d < y_d. Once the data structure is created, it can be employed to enumerate the partitions one by one, by first reading the number of parts in a partition and then retrieving the parts in sequence. This data structure is not amenable to other types of enumeration with a restriction, such as enumerating the partitions whose smallest part is larger than a certain number.

A partition of an integer can also be represented by the repetitive parts contained in the partition. For example, the partition ⟨1,1,1,3⟩ can be denoted by (1,3)(3,1), where (1,3) means that part 1 occurs three times in this partition and (3,1) means that part 3 occurs only once, i.e., 1's multiplicity is 3 and 3's multiplicity is 1, respectively. Thus, all the partitions of an integer can be stored in an array as |w'_1| (w'_1 : m_1) |w'_2| (w'_2 : m_2) ... |w'_p| (w'_p : m_p), where w'_i is the set of distinct parts in partition w_i and |w'_i| denotes the number of distinct parts; (w'_i : m_i) represents all the pairs of (part, multiplicity) in partition w_i. For example, the partitions of 6 can be stored as 1,1,6,2,1,4,2,1,2,1,3,3,1,2,1,2,2,2,2,1,2,4,1,3,1,1,2,1,3,1,2,1,1,5,1,1,2,3,2,2,1,4,1,1,3,2,1,6,1. It needs 49 integers to store all the partitions of integer 6. This data structure is called multiplicity linear in this paper. Similar to the direct linear structure, it is not amenable to enumerations with certain restrictions. The space and time complexity for creating and storing a linear data structure is proportional to the number of partitions of an integer.

2.2 Tree structure

Here we propose a tree structure to store all the partitions of an integer. The basic idea comes from the observation that two partitions of an integer may differ in only a few parts. For example, ⟨1,1,1,1,1,1⟩ and ⟨1,1,1,1,2⟩ differ in only two parts.
In this situation, a sequence of branches in a tree can be used to store the parts common to two partitions. For example, a tree that stores all the partitions of 6 is shown in Figure 1. Here, a tree node is denoted by (y,Y), where y is a part of a partition and Y is the number that remains to be divided into parts at least as large as y. For the root node, y is not a part; it simply denotes the least number into which Y should be divided. Before discussing how this tree is constructed, let us see how it can be used to enumerate all the partitions of 6. Starting from the root, we traverse the tree depth-first. When a leaf node is visited, we print out the parts encountered along the path from the root to the leaf. These parts, except the one stored in the root, form a partition of 6, and the path length is equal to the number of parts in the partition. The path length from the root to a leaf is defined as the number of edges traversed from the root to the leaf. For example, the path (1,6)(1,5)(1,4)(2,2)(2,0) represents the partition ⟨1,1,2,2⟩, and the number of parts in this partition is 4. The root is denoted by (1,n) and a leaf is denoted by (y,0) for y ≤ n.

A general partition tree of integer n is presented in Figure 2, where f(.) denotes the floor function. The pseudo code for creating such a tree is presented in Figure 3. Some detailed explanations of the algorithm will be given when we elaborate on the proofs of the lemmas below. In the following we give some definitions and lemmas that are used to prove a theorem on the number of nodes in a partition tree.

Definition 1: A node without any child is called a leaf node and is denoted by (y,0).

Definition 2: A node with at least one child is called an internal node and is denoted by (y,Y) with Y > 0.

    (1,6): (1,5) (2,4) (3,3) (6,0)
    (1,5): (1,4) (2,3) (5,0)
    (1,4): (1,3) (2,2) (4,0)
    (1,3): (1,2) (3,0)
    (1,2): (1,1) (2,0)
    (1,1): (1,0)
    (2,4): (2,2) (4,0)
    (2,3): (3,0)
    (3,3): (3,0)
    (2,2): (2,0)   [each of the two occurrences of (2,2) is a distinct node]

Figure 1.
A partition tree for integer 6.

[Figure 2 shows the general partition tree of integer n: the root (1,n) has children (1,n-1), (2,n-2), ..., (f(n/2), n-f(n/2)) and the leaf (n,0); each internal node (y,Y) in turn has children (j,Y-j) for j = y, ..., f(Y/2) and the leaf (Y,0).]

Figure 2. A partition tree of integer n.

void integer_partition_tree(int n) {
(1)   create a node labeled with (1,n);
(2)   put the node into a queue Q;
(3)   while (Q is not empty) {
(4)     remove a node (y,Y) from Q;
(5)     for (j = y; j <= f(Y/2); j++) {
(6)       add a child node (j,Y-j) to the parent node (y,Y);
(7)       put the node (j,Y-j) into Q; // an internal node
        } // end of for
(8)     add a child node (Y,0) to (y,Y); // a leaf node
      } // end of while
}

Figure 3. An algorithm for creating a partition tree.

Lemma 1: If (y,Y) is an internal node, then 0 < y ≤ Y.

Proof: It is clear that 0 < y ≤ Y holds if (y,Y) is the root node. Initially, the queue contains only the root node (1,n). For any node (y,Y) removed from the queue, a leaf child (Y,0) (see line 8 in Figure 3) will be created for it, and thus (y,Y) is an internal node. In particular, the root node is an internal node. Furthermore, for any y ≤ j ≤ f(Y/2), lines 6 and 7 in Figure 3 create a node (j,Y-j), which is also put into the queue. The condition y ≤ j ≤ f(Y/2) implies that Y - j ≥ Y/2 and 0 < y ≤ j ≤ Y - j. Therefore, any node (y,Y) removed from the queue is an internal node and has the property 0 < y ≤ Y.

Lemma 2: Given a node (y_p, Y_p) and its child node (y_c, Y_c), we have y_p ≤ y_c.

Proof: From lines 4 through 8, and also from Lemma 1, a child node is created only when the child's part is greater than or equal to its parent's part. Thus, we have y_p ≤ y_c.

Lemma 3: ⟨y_1, y_2, ..., y_k⟩ with y_1 ≤ y_2 ≤ ... ≤ y_k is a partition of integer n if and only if (y_0,Y_0)(y_1,Y_1)(y_2,Y_2)...(y_k,Y_k) is a path from the root to a leaf.

Proof: Suppose we have a path (y_0,Y_0)(y_1,Y_1)(y_2,Y_2)...(y_k,Y_k). Based on Lemma 2, we have y_1 ≤ y_2 ≤ ... ≤ y_k.
Since (y_{i+1}, Y_{i+1}) is a child node of (y_i, Y_i), we have Y_i = y_{i+1} + Y_{i+1} (see line 6 in Figure 3). Using this relation, we can easily derive that n = Y_0 = y_1 + Y_1 = y_1 + y_2 + Y_2 = ... = y_1 + y_2 + ... + y_k, and thus ⟨y_1, y_2, ..., y_k⟩ is a partition of n. Conversely, suppose ⟨y_1, y_2, ..., y_k⟩ with y_1 ≤ y_2 ≤ ... ≤ y_k is a partition of n. We can find a transition from (y_0, Y_0) = (1,n) to (y_1, Y_0 - y_1), where y_1 ∈ {1, 2, 3, ..., f(n/2)} ∪ {n}. In general, for any two parts y_i and y_{i+1} we can find a transition from (y_i, Y_i) to (y_{i+1}, Y_{i+1}) = (y_{i+1}, Y_i - y_{i+1}), based on the tasks done in lines 4 through 8 of Figure 3. Recursively, we can find a path (y_0,Y_0)(y_1,Y_1)(y_2,Y_2)...(y_k,Y_k).

Lemma 4: The total number of partitions of an integer is equal to the number of leaf nodes in its partition tree.

Proof: This is an obvious consequence of Lemma 3.

Theorem 1: The total number of nodes needed to store all the partitions of an integer is twice the number of leaf nodes in its partition tree.

Proof: To prove this theorem, it is sufficient to show that every leaf node has exactly one parent, which is an internal node, and that every internal node has exactly one child that is a leaf node. From the proof of Lemma 1, we know that an internal node has exactly one leaf child. Since the data structure obtained by the algorithm in Figure 3 is a tree, a leaf node has exactly one parent, which is an internal node. As a consequence, the number of internal nodes is the same as the number of leaf nodes. Using Lemma 4, we complete the proof of this theorem.

It is clear that the space and time complexity for creating and storing a partition tree is proportional to the number of partitions. In our implementation, each node in a partition tree actually contains the three fields shown below.
struct tree_node {
    int part;                   // a part in a partition
    int num_of_children;        // the number of child nodes
    struct tree_node *next;     // points to the beginning of the child nodes
};

Here, part is used to store a part. All the child nodes of an internal node are stored in an array in ascending order of their parts, and the field num_of_children registers the number of children an internal node has; next is a pointer used to locate the starting address of the array that holds the child nodes. During enumeration, num_of_children is used to decide whether the end of an array has been reached. The number of children of an internal node (y,Y) can be computed as follows:

    if (y <= f(Y/2)) num_of_children = f(Y/2) - y + 2;
    else num_of_children = 1;

For example, the number of children of the internal node (2,4) is 2, and it is 1 for (2,3). By storing the number of children rather than the value of Y, we can speed up the enumeration of partitions. Figure 4 shows a partition tree of this kind, with each node denoted as [part, num_of_children] and all the children of a node stored as an array. Although this kind of data structure is what we have implemented, it is not instrumental in helping us understand the algorithm. Therefore, we will still use the representation of Figure 1 in our presentation.

Figure 4. An implementation of the data structure for the partition tree of integer 6.

2.3 Diagram Structure

As can be seen from Figure 1, the partitions below node (2,4) form a sub-structure of the partitions below node (1,4). We can in fact create a data structure for (2,4) simply by using a pointer to the data structure for (1,4) and remembering where the shared data structure begins. In this manner we can create a data structure for the partitions of an integer without actually enumerating all the partitions. Because of the sharing of data structure, the new structure is no longer a tree. We thus call it a partition diagram; it is in fact a directed acyclic graph. Figure 5 shows an example of data structure sharing in the partition diagram for integer 6. The total number of nodes in this partition diagram is 16.

Figure 5. A partition tree with data structure sharing.

To be able to perform enumeration correctly, we must know the place where the shared data structure begins. This starting place can be stored in a node to facilitate enumeration. To facilitate our discussion, the following definitions are given.

Definition 3: The node (y,Y) with the largest Y is called the anchored node of a partition diagram. The anchored node is also the node last added to the partition diagram. For example, (1,6) is the anchored node of the partition diagram of integer 6.

Definition 4: A node (y,Y) with Y = 0 is called a terminal node.

An algorithm for creating such a partition diagram is given in Figure 6. We create a partition diagram recursively for each of the integers from 1 to n. When we build the partition diagram for integer m, i.e., the partition diagram with anchored node (1,m), we have to create a terminal node (m,0) and also the internal nodes (2,m-2), (3,m-3), ..., (f(m/2), m-f(m/2)) that store the pointers to the shared data structures. Note that node (1,m-1) was created in the previous iteration. The shared data structure pointed to by an internal node (h,m-h) for 2 ≤ h ≤ f(m/2) is located in the partition diagram rooted at (1,m-h), which has been created previously. Therefore, the sharing of data structure can be done easily. For example, given m = 6, before creating node (1,6) we have to create nodes (1,5), (2,4), (3,3), and (6,0), and the sharing is done as shown in Figure 5.
Given an internal node (y,Y) with Y ≥ y > 1, we immediately know that (y,Y) shares a data structure belonging to node (1,Y). However, we do not yet know where the sharing begins, i.e., which children of (1,Y) are also children of (y,Y). We can derive this information from (y,Y) quite easily. If y ≤ f(Y/2), data structure sharing starts from the y-th child of (1,Y); otherwise, it starts from the (f(Y/2)+1)-th child, i.e., the last child of (1,Y). For example, given a node (2,4), data structure sharing starts from the second child of (1,4) because 2 ≤ f(4/2), whereas given a node (2,3), data structure sharing starts from the second child, i.e., the last child of (1,3).

void integer_partition_diagram(int n) {
(1)   for (i = 1; i <= n; i++) {
(2)     create a node (i,0); // a terminal node
(3)     for (j = 2; j <= f(i/2); j++)
(4)       create a node (j,i-j) with a pointer pointing to (1,i-j);
(5)     create a node (1,i) with pointers pointing to (1,i-1) and
          all the nodes created for i;
      } // end of for
}

Figure 6. An algorithm for creating a partition diagram.

The enumeration can be done just as for a partition tree. That is, a path from the anchored node of a partition diagram to any of the terminal nodes defines a partition of integer n. In fact, if a partition diagram is expanded based on Lemma 2, the corresponding partition tree is generated.

Theorem 2: The total number of nodes in a partition diagram is equal to 0.25n^2 + n + 1 if n is even, or 0.25n^2 + n + 0.75 if n is odd.

Proof: According to the algorithm presented in Figure 6, we need f(i/2) + 1 nodes to store all sharing information for each i. We also need to create the anchored node in the final iteration. Using simple arithmetic, we can easily derive the above result.

Based on Theorem 2, we can create a partition diagram for an integer up to several ten thousands on a PC with 512M bytes of main memory.
Clearly, the space and time complexity for creating and storing a partition diagram is O(n^2). In our implementation, each node in a partition diagram has the three fields shown below:

struct diagram_node {
    int part;                     // a part in a partition
    int num_of_children;          // the number of child nodes
    struct diagram_node *next;    // points to the beginning of the shared array
};

Similar to the creation of a partition tree, the nodes having the same parent are put into an array. The next field is used to directly locate the beginning of the shared nodes in an array. This is somewhat different from what is stated in the algorithm in Figure 6; however, such an implementation incurs the least computation during enumeration. The field num_of_children is used to detect whether the last element of an array has been reached. A partition diagram of this kind is given in Figure 7. If a pointer is treated as an integer, the number of integers needed to form a partition diagram is equal to 0.75n^2 + 3n + 3 if n is even, or 0.75n^2 + 3n + 2.25 if n is odd.

Figure 7. An implementation of the data structure for the partition diagram of integer 6.

3 Enumerating Partition Diagram

It is clear that if a partition diagram (tree) is created for integer n, it can be used to enumerate the partitions of any integer not larger than n. Since a partition diagram is a concise representation of a partition tree, any enumeration done on a partition diagram can also be done on a partition tree. Therefore, our discussion is made primarily in terms of a partition tree. Various kinds of enumeration can be performed on a partition tree. The simplest one is to employ a depth-first search to list all the partitions in lexicographic order. During the depth-first search, the child nodes of an internal node must be visited in ascending order of their parts. By contrast, a partition tree is not amenable to enumerating the partitions in reverse lexicographic order.
The enumeration of the partitions with a smallest-part or largest-part restriction can also be done efficiently. For enumeration with a smallest-part restriction, we need only visit the branches whose parts are greater than or equal to a given number; any subtree rooted at a node whose part is smaller than that number can be pruned completely. This is the reason why this kind of enumeration can be performed efficiently. For example, to find the partitions of integer 6 whose parts are not less than 2, the traversal of the subtree rooted at (1,5) can be eliminated entirely. The enumeration with a largest-part restriction can be done similarly. Enumeration of the partitions consisting of only distinct parts can also be done efficiently: when the depth-first search encounters a subpath containing two nodes with the same part, the traversal of that subtree can be abandoned.

By contrast, the enumeration of the partitions with an even (odd) number of parts cannot be done efficiently. To carry out such an enumeration, we need to enumerate all the paths from the anchored node to a terminal node and determine the length of each path; if the path length is even (odd), the associated partition has an even (odd) number of parts. On the other hand, the enumeration of the partitions consisting of only even (odd) parts can be done efficiently, because the traversal of a subtree is terminated as soon as the part of its root node is odd (even).

4 Experimental Study

In this section we implement all the data structures discussed above. Figure 8 gives the number of integers needed to store all the partitions of an integer; the Y-axis is in log10 scale. As one can see, the partition diagram needs about six orders of magnitude less memory than the partition tree for n = 120, and this gap widens as n increases. The partition tree needs several times less memory than the multiplicity linear structure, and the direct linear structure needs the largest amount of memory. It is clear that only the partition diagram can store all the partitions of an integer of several hundreds or more. We have in fact created a partition diagram for an integer up to 30 thousand on a PC with 512M bytes of main memory.

[Figure 8: number of integers stored (log10 scale) versus the integer n, for n from 5 to about 120, with one curve each for the direct linear, multiplicity linear, tree, and diagram structures.]

Figure 8. Number of integers used for different data structures.

5 Conclusions

In this paper we have studied four different data structures for storing all the partitions of an integer n. The space and time complexity for creating a linear structure or a partition tree is proportional to the number of partitions, whereas the complexity for creating a partition diagram is only O(n^2). This complexity allows us to create a partition diagram that can store all the partitions of an integer up to several ten thousands.

References

[1] D. Stanton and D. White, Constructive Combinatorics, Springer-Verlag, Berlin, 1986.
[2] C. L. Liu, Introduction to Combinatorial Mathematics, McGraw-Hill, 1968.