Efficient Data Structures for Storing the Partitions of Integers
Rung-Bin Lin
Computer Science and Engineering
Yuan Ze University, Chung-Li, 320 Taiwan
[email protected]
Abstract
Algorithms for enumerating the partitions of a positive integer n have long been known. However, the data structure used to store the partitions has not received due attention. In this paper, several data structures, ranging from the most intuitive to the most efficient, are proposed. The space and time complexity for creating the most efficient data structure is O(n²). The space complexity is low enough to make it possible to store all the partitions of an integer up to several tens of thousands. This data structure can be used to enumerate the partitions of any integer smaller than or equal to n.
1 Introduction
Partitioning an integer n means dividing it into constituent parts, all of which are positive integers. Algorithms for enumerating all the partitions of an integer, or only the partitions satisfying a restriction, have long been known [1,2]. However, the data structure used to store the partitions has not received due attention. In this paper, we investigate four data structures for storing all the partitions of an integer. By a data structure we mean here one that can be used to enumerate the partitions without performing any arithmetic operation except computing an index into an array. The enumeration can be done either with a restriction or without any restriction. Four data structures are investigated. The most efficient one needs to store only 0.75n² + 3n + 3 integers if n is even, or 0.75n² + 3n + 2.25 integers if n is odd, and it is created without exhaustively enumerating all the partitions. The time complexity for creating this data structure is the same as its space complexity. The complexity is low enough to make it possible to store all the partitions of an integer up to several tens of thousands.
The rest of this paper is organized as follows. In Section 2 we describe how the four data structures are created and derive the amount of memory and time needed to create each of them. In Section 3 we discuss how the proposed data structure can be used to enumerate the partitions of an integer. In Section 4 we present some experimental data about the amount of memory needed by these data structures. The last section draws some conclusions.
2 Four Data Structures
Let the set of partitions of a positive integer n be denoted by Ω. Any element w ∈ Ω is written as w = 〈y_1, y_2, ..., y_k〉, where ∑_{i=1}^{k} y_i = n, y_i ∈ I, y_i > 0 for i = 1, ..., k ≤ n, and each y_i is called a part of partition w. The parts of a partition are not necessarily distinct, nor do they have a fixed order. However, to ease the presentation, we assume y_1 ≤ y_2 ≤ ... ≤ y_k. Note that the value of k can
vary from one partition to another. For example, the
partitions of 6 are 〈1,1,1,1,1,1〉, 〈1,1,1,1,2〉, 〈1,1,1,3〉,
〈1,1,2,2〉, 〈1,1,4〉, 〈1,2,3〉, 〈1,5〉, 〈2,2,2〉, 〈2,4〉, 〈3,3〉,
and 〈6〉. In the following we will discuss how the
partitions of an integer can be stored in a computer.
Four data structures are investigated. They are called
direct linear, multiplicity linear, tree, and diagram
structures. The first two structures store data in a
linear array, so they are called linear structures.
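For concreteness, the following is a minimal recursive enumerator, not taken from the paper (the function name and the MAXPARTS bound are illustrative), that lists the partitions of n as non-decreasing sequences of parts, in the same order as the list of partitions of 6 above.

#include <stdio.h>

#define MAXPARTS 128          /* illustrative bound on the number of parts */

/* List all partitions of n with parts >= min, extending the prefix parts[0..len-1]. */
static void list_partitions(int n, int min, int *parts, int len) {
    if (n == 0) {                          /* a complete partition has been formed */
        for (int i = 0; i < len; i++)
            printf("%d%c", parts[i], i + 1 == len ? '\n' : ',');
        return;
    }
    for (int y = min; y <= n; y++) {       /* the next part is at least as large as the previous one */
        parts[len] = y;
        list_partitions(n - y, y, parts, len + 1);
    }
}

int main(void) {
    int parts[MAXPARTS];
    list_partitions(6, 1, parts, 0);       /* prints the 11 partitions of 6 */
    return 0;
}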
2.1 Linear structures
Given the set of partitions of an integer n, say Ω = {w_1, w_2, ..., w_p}, the partitions can be stored in a one-dimensional array in the form of |w_1| w_1 |w_2| w_2 ... |w_p| w_p, where |w_i| denotes the number of parts in w_i. For example, the partitions of 6 can be stored as 6,1,1,1,1,1,1,5,1,1,1,1,2,4,1,1,1,3,4,1,1,2,2,3,1,1,4,3,1,2,3,2,1,5,3,2,2,2,2,2,4,2,3,3,1,6. In total, 46 integers are needed to store all the partitions. This data structure is called direct linear in this paper. It can be created using an algorithm that enumerates the partitions in lexicographic order. Two partitions 〈x_1, x_2, ..., x_k〉 and 〈y_1, y_2, ..., y_l〉 are said to be in lexicographic order if there exists a d ≤ min(k, l) such that x_i = y_i for i < d and x_d < y_d. Once the data structure is created, it can be employed to enumerate the partitions one by one, by first reading the number of parts in a partition and then retrieving the parts in sequence. This data structure is not amenable to other types of enumeration with a restriction, such as enumerating the partitions whose smallest part is larger than a certain number.
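To make the layout concrete, here is a minimal sketch (ours, not the paper's; the names are illustrative) that walks a direct linear array and prints each stored partition; direct6 is the 46-integer encoding of the partitions of 6 given above.

#include <stdio.h>

/* The 46-integer direct linear encoding of the partitions of 6 from the text. */
static const int direct6[] = {
    6,1,1,1,1,1,1, 5,1,1,1,1,2, 4,1,1,1,3, 4,1,1,2,2, 3,1,1,4,
    3,1,2,3, 2,1,5, 3,2,2,2, 2,2,4, 2,3,3, 1,6
};

/* Print every partition stored in a direct linear array of the given length. */
static void print_direct_linear(const int *a, int len) {
    int i = 0;
    while (i < len) {
        int k = a[i++];                     /* number of parts in this partition */
        for (int j = 0; j < k; j++)
            printf("%d%c", a[i++], j + 1 == k ? '\n' : ',');
    }
}

int main(void) {
    print_direct_linear(direct6, sizeof direct6 / sizeof direct6[0]);
    return 0;
}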
A partition of an integer can also be represented by its distinct parts together with their multiplicities. For example, the partition 〈1,1,1,3〉 can be denoted by (1,3)(3,1), where (1,3) means that part 1 occurs three times in this partition and (3,1) means that part 3 occurs only once, i.e., 1's multiplicity is 3 and 3's multiplicity is 1. Thus, all the partitions of an integer can be stored in an array as |w_1'| (w_1' : m_1) |w_2'| (w_2' : m_2) ... |w_p'| (w_p' : m_p), where w_i' is the set of distinct parts in partition w_i, |w_i'| denotes the number of distinct parts, and (w_i' : m_i) represents all the (part, multiplicity) pairs of partition w_i. For example, the partitions of 6 can be stored as 1,1,6,2,1,4,2,1,2,1,3,3,1,2,1,2,2,2,2,1,2,4,1,3,1,1,2,1,3,1,2,1,1,5,1,1,2,3,2,2,1,4,1,1,3,2,1,6,1. In total, 49 integers are needed to store all the partitions of integer 6. This data structure is called multiplicity linear in this paper. Like the direct linear structure, it is not amenable to enumerations with certain restrictions. The space and time complexity for creating and storing a linear data structure is proportional to the number of partitions of the integer.
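A matching sketch (again ours, with illustrative names) for the multiplicity linear layout, which expands each (part, multiplicity) pair while printing; mult6 is the 49-integer encoding of the partitions of 6 given above.

#include <stdio.h>

/* The 49-integer multiplicity linear encoding of the partitions of 6 from the text. */
static const int mult6[] = {
    1,1,6, 2,1,4,2,1, 2,1,3,3,1, 2,1,2,2,2, 2,1,2,4,1,
    3,1,1,2,1,3,1, 2,1,1,5,1, 1,2,3, 2,2,1,4,1, 1,3,2, 1,6,1
};

/* Print every partition stored in a multiplicity linear array of the given length. */
static void print_multiplicity_linear(const int *a, int len) {
    int i = 0;
    while (i < len) {
        int d = a[i++];                          /* number of distinct parts */
        int first = 1;
        for (int j = 0; j < d; j++) {
            int part = a[i++], mult = a[i++];    /* a (part, multiplicity) pair */
            for (int m = 0; m < mult; m++) {
                printf("%s%d", first ? "" : ",", part);
                first = 0;
            }
        }
        printf("\n");
    }
}

int main(void) {
    print_multiplicity_linear(mult6, sizeof mult6 / sizeof mult6[0]);
    return 0;
}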
2.2 Tree structure
Here we propose a tree structure to store all the partitions of an integer. The basic idea comes from the observation that two partitions of an integer may differ in only a few parts. For example, 〈1,1,1,1,1,1〉 and 〈1,1,1,1,2〉 differ in only two parts. In this situation, a sequence of branches in a tree can be used to store the parts common to two partitions. For example, a tree that stores all the partitions of 6 is shown in Figure 1. Here, a tree node is denoted by (y,Y), where y is a part of a partition and Y is the number remaining to be divided into parts that are at least as large as y. For the root node, y is not a part and simply denotes the least number into which Y should be divided. Prior to discussing how this tree is constructed, let us see how it can be used to enumerate all the partitions of 6. Starting from the root, we traverse the tree in depth-first order. When a leaf node is visited, we print out the parts encountered along the path from the root to the leaf. These parts, except the one stored in the root, form a partition of 6, and the path length is equal to the number of parts in the partition. The path length from the root to a leaf is defined as the number of edges traversed from the root to the leaf. For example, the path (1,6)(1,5)(1,4)(2,2)(2,0) represents the partition 〈1,1,2,2〉, and the number of parts in this partition is 4. The root is denoted by (1,n) and a leaf is denoted by (y,0) for some y ≤ n. A general partition tree of integer n is presented in Figure 2, where f(·) denotes the floor function. The pseudo code for creating such a tree is presented in Figure 3. Some detailed explanations of the algorithm will be given when we elaborate on the proofs of the lemmas below.
In the following we will give some definitions and
lemmas that are used to prove a theorem that gives
the number of nodes in a partition tree.
Definition 1: A node without any child is called a
leaf node and is denoted by (y,0).
Definition 2: A node with at least one child is called an internal node and is denoted by (y,Y) with Y > 0.
Figure 1. A partition tree for integer 6.
Figure 2. A partition tree of integer n.
void integer_partition_tree (int n) {
(1)  create a node labeled with (1,n);
(2)  put the node into a queue Q;
(3)  while (Q is not empty) {
(4)      remove a node (y,Y) from Q;
(5)      for (j = y; j <= f(Y/2); j++) {
(6)          add a child node (j,Y-j) to (y,Y);
(7)          put the node (j,Y-j) into Q;   // an internal node
         } // end of for
(8)      add a child node (Y,0) to (y,Y);   // a leaf node
     } // end of while
}
Figure 3. An algorithm for creating a partition tree.
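For reference, a runnable C rendering of this procedure is sketched below. It is not the paper's implementation: the node layout (an explicit (y,Y) label plus a child array) and the recursive construction are illustrative assumptions, but it produces the same tree as the queue-based pseudo code of Figure 3.

#include <stdlib.h>

/* Illustrative node type: the (y,Y) label plus an array of children. */
struct node {
    int y, Y;               /* the label (y,Y) of the node           */
    int nchildren;          /* number of children (0 for a leaf)     */
    struct node *children;  /* array of children, NULL for a leaf    */
};

/* Build the subtree rooted at the already-labeled node p, following Figure 3. */
static void build(struct node *p) {
    int y = p->y, Y = p->Y;
    if (Y == 0) {                              /* (y,0) is a leaf */
        p->nchildren = 0;
        p->children = NULL;
        return;
    }
    int half = Y / 2;                          /* f(Y/2) */
    int nch = (y <= half) ? half - y + 2 : 1;  /* children (j,Y-j), j = y..f(Y/2), plus (Y,0) */
    p->nchildren = nch;
    p->children = malloc((size_t)nch * sizeof *p->children);  /* error handling omitted */
    int idx = 0;
    for (int j = y; j <= half; j++, idx++) {   /* the internal children of lines (5)-(7) */
        p->children[idx].y = j;
        p->children[idx].Y = Y - j;
        build(&p->children[idx]);
    }
    p->children[idx].y = Y;                    /* the leaf child of line (8) */
    p->children[idx].Y = 0;
    build(&p->children[idx]);
}

/* Create the partition tree of n, rooted at (1,n). */
struct node *partition_tree(int n) {
    struct node *root = malloc(sizeof *root);
    root->y = 1;
    root->Y = n;
    build(root);
    return root;
}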
Lemma 1: If (y,Y) is an internal node, then 0 < y ≤ Y.
Proof: It is clear that 0 < y ≤ Y holds if (y,Y) is the root node. Initially, the queue contains only the root node (1,n). For any node (y,Y) removed from the queue, a leaf child (Y,0) will be created for it (see line 8 in Figure 3), and thus (y,Y) is an internal node. In particular, the root node is an internal node. Furthermore, for any j with y ≤ j ≤ f(Y/2), lines 6 and 7 in Figure 3 create a node (j,Y-j), which is also put into the queue. The condition y ≤ j ≤ f(Y/2) implies that Y - j ≥ Y/2 and 0 < y ≤ j ≤ Y - j. Therefore, any node (y,Y) removed from the queue is an internal node and has the property 0 < y ≤ Y. □
Lemma 2: Given a node (y_p, Y_p) and its child node (y_c, Y_c), we have y_p ≤ y_c.
Proof: From lines 4, 5, 6, 7, and 8 in Figure 3, and also from Lemma 1, a child node is created only when the child's part is greater than or equal to its parent's part. Thus, we have y_p ≤ y_c. □
Lemma 3: 〈y_1, y_2, ..., y_k〉 with y_1 ≤ y_2 ≤ ... ≤ y_k is a partition of integer n if and only if (y_0,Y_0)(y_1,Y_1)(y_2,Y_2)...(y_k,Y_k) is a path from the root to a leaf.
Proof: Suppose we have a path (y_0,Y_0)(y_1,Y_1)(y_2,Y_2)...(y_k,Y_k). By Lemma 2, we have y_1 ≤ y_2 ≤ ... ≤ y_k. Since (y_{i+1},Y_{i+1}) is a child node of (y_i,Y_i), we have Y_i = y_{i+1} + Y_{i+1} (see line 6 in Figure 3). Using this relation, we can easily derive that n = Y_0 = y_1 + Y_1 = y_1 + y_2 + Y_2 = ... = ∑_{i=1}^{k} y_i, and thus 〈y_1, y_2, ..., y_k〉 is a partition of n. Conversely, suppose 〈y_1, y_2, ..., y_k〉 with y_1 ≤ y_2 ≤ ... ≤ y_k is a partition of n. We can find a transition from (y_0,Y_0) = (1,n) to (y_1, Y_0 - y_1), where y_1 ∈ {1, 2, 3, ..., f(n/2), n}. In general, for any two parts y_i and y_{i+1} we can find a transition from (y_i,Y_i) to (y_{i+1},Y_{i+1}) = (y_{i+1}, Y_i - y_{i+1}), based on the steps in lines 4 to 8 of Figure 3. Recursively, we can find a path (y_0,Y_0)(y_1,Y_1)(y_2,Y_2)...(y_k,Y_k). □
Lemma 4: The total number of partitions of an
integer is equal to the number of leaf nodes in a
partition tree.
Proof: This is an obvious consequence of Lemma 3. □
Theorem 1: The total number of nodes needed to store all the partitions of an integer is twice the number of leaf nodes in its partition tree.
Proof: To prove this theorem, it is sufficient to show that every leaf node has exactly one internal node as its parent and that every internal node has exactly one leaf node among its children. From the proof of Lemma 1, we know that an internal node has exactly one child that is a leaf node. Since the data structure obtained by the algorithm in Figure 3 is a tree, a leaf node has exactly one parent node, which is an internal node. As a consequence, the number of internal nodes is the same as the number of leaf nodes. Using Lemma 4, we complete the proof of this theorem. □
It is clear that the space and time complexity for creating and storing a partition tree is proportional to the number of partitions. In our implementation, each node in a partition tree actually contains the three fields shown below.
struct tree_node {
    int part;                   // a part in a partition
    int num_of_children;        // the number of child nodes
    struct tree_node *next;     // points to the beginning of the array of child nodes
};
Here, part stores a part of a partition; all the child nodes of an internal node are stored in an array in ascending order of their parts; num_of_children records the number of children an internal node has; and next is a pointer used to locate the starting address of the array that holds the child nodes. During enumeration, num_of_children is used to decide whether the end of an array has been reached. The number of children of an internal node (y,Y) can be computed as follows:
if (y <= f(Y/2))
    num_of_children = f(Y/2) - y + 2;
else
    num_of_children = 1;
For example, the number of children of the internal node (2,4) is 2, and it is 1 for (2,3). By storing the number of children rather than the value of Y, we can speed up the enumeration of partitions. Figure 4 shows a partition tree of this kind, in which a node is denoted as [part, num_of_children]. Although this is the data structure we have actually implemented, it is not as helpful for understanding the algorithm. Therefore, we will continue to use the representation of Figure 1 in our presentation.
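As an illustration of the depth-first enumeration described above, the following sketch prints every partition stored in a tree built over struct tree_node. It assumes that next points to the array of children sorted by part, that leaves have num_of_children equal to 0, that the caller starts at the (1,n) root (whose own part is never printed), and that parts points to a buffer of at least n integers; the function name is ours, not the paper's.

#include <stdio.h>

/* Print all partitions reachable from node; parts[0..len-1] holds the parts
 * collected so far along the path from the root (excluding the root itself). */
static void enumerate(const struct tree_node *node, int *parts, int len) {
    if (node->num_of_children == 0) {            /* a leaf: the path spells one partition */
        for (int i = 0; i < len; i++)
            printf("%d%c", parts[i], i + 1 == len ? '\n' : ',');
        return;
    }
    for (int c = 0; c < node->num_of_children; c++) {
        const struct tree_node *child = &node->next[c];
        parts[len] = child->part;                /* the child's part joins the partition */
        enumerate(child, parts, len + 1);
    }
}

/* Usage: int buf[64]; enumerate(root, buf, 0);  lists the partitions in
 * lexicographic order, since children are visited in ascending order of part. */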
Figure 4. An implementation of the data structure for the partition tree of integer 6.
Figure 5. A partition tree with data structure sharing.
2.3 Diagram Structure
As can be seen from Figure 1, the partitions stored below node (2,4) form a sub-structure of those stored below node (1,4). We can in fact create a data structure for (2,4) simply by using a pointer to the data structure for (1,4) and remembering where the shared data structure begins. In this manner we can create a data structure for the partitions of an integer without actually enumerating all the partitions. Because of the sharing of data structure, the new data structure is no longer a tree. We thus call it a partition diagram, which is in fact a directed acyclic graph. Figure 5 shows an example of data structure sharing in a partition diagram for integer 6. The total number of nodes in this partition diagram is 16. To perform enumeration correctly, we must know the place where the shared data structure begins. This starting place can be stored in a node to facilitate enumeration.
To facilitate our discussion, the following definitions are given.
Definition 3: The node (y,Y) that has the largest Y is called the anchored node of a partition diagram. The anchored node is also the node last added to a partition diagram. For example, (1,6) is the anchored node of the partition diagram of integer 6.
Definition 4: A node (y,Y) with Y = 0 is called a terminal node.
An algorithm for creating such a partition diagram is given in Figure 6. We create a partition diagram recursively for each of the integers from 1 to n. When we build a partition diagram for integer m, i.e., a partition diagram with the anchored node (1,m), we have to create a terminal node (m,0) and also the internal nodes (2,m-2), (3,m-3), ..., (f(m/2), m-f(m/2)) that store the pointers to the shared data structures. Note that node (1,m-1) was created in the previous iteration. The shared data structure pointed to by an internal node (h, m-h), for 2 ≤ h ≤ f(m/2), is located in the partition diagram rooted at (1,m-h), which has been created previously. Therefore, the sharing of data structure can be done easily. For example, given m = 6, before creating node (1,6) we have to create nodes (1,5), (2,4), (3,3), and (6,0), and the sharing is done as shown in Figure 5.
Given an internal node (y,Y) with Y ≥ y > 1, we immediately know that (y,Y) shares a data structure belonging to node (1,Y). However, we do not know where the sharing begins, i.e., which children of (1,Y) are also children of (y,Y). We can derive this information from (y,Y) quite easily. If y ≤ f(Y/2), data structure sharing starts from the y-th child of (1,Y); otherwise, it starts from the (f(Y/2)+1)-th child, i.e., the last child of (1,Y). For example, given the node (2,4), data structure sharing starts from the second child of (1,4) because 2 ≤ f(4/2), whereas given the node (2,3), it starts from the second child, i.e., the last child of (1,3).
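The rule just stated fits in a few lines of C; share_start is an illustrative helper name of ours, and the child index is counted from 1 as in the text.

/* Index (counted from 1) of the child of (1,Y) at which the data structure
 * shared by an internal node (y,Y), with Y >= y > 1, begins. */
static int share_start(int y, int Y) {
    int half = Y / 2;                   /* f(Y/2) */
    return (y <= half) ? y : half + 1;  /* otherwise: the last child of (1,Y) */
}
/* share_start(2,4) == 2 and share_start(2,3) == 2, matching the examples above. */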
void integer_partition_diagram (int n) {
(1)  for (i = 1; i <= n; i++) {
(2)      create a node (i,0);   // a terminal node
(3)      for (j = 2; j <= f(i/2); j++)
(4)          create a node (j,i-j) with a pointer pointing to (1,i-j);
(5)      create a node (1,i) with pointers pointing to (1,i-1) and
         to all the nodes created for i;
     } // end of for
}
Figure 6. Algorithm for creating a partition diagram.
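As a quick sanity check on the procedure of Figure 6, the small sketch below (ours, not the paper's) mirrors its loop structure and merely counts the nodes it would create; for n = 6 it returns 16, the node count quoted above.

/* Count the nodes created by the procedure of Figure 6 for a given n. */
static long count_diagram_nodes(int n) {
    long count = 0;
    for (int i = 1; i <= n; i++) {
        count += 1;                     /* the terminal node (i,0)  */
        for (int j = 2; j <= i / 2; j++)
            count += 1;                 /* the sharing node (j,i-j) */
        count += 1;                     /* the node (1,i)           */
    }
    return count;                       /* 0.25*n*n + n + 1 when n is even */
}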
The enumeration can be done just as for a partition tree. That is, a path from the anchored node of a partition diagram to any of the terminal nodes defines a partition of integer n. In fact, if a partition diagram is expanded based on Lemma 2, the corresponding partition tree will be generated.
Theorem 2: The total number of nodes in a partition diagram is equal to 0.25n² + n + 1 if n is even, or 0.25n² + n + 0.75 if n is odd.
Proof: According to the algorithm presented in Figure 6, we need f(i/2) + 1 nodes to store all the sharing information for each i. We also need to create the anchored node in the final iteration. Using simple arithmetic, we can easily derive the above result. □
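Spelling out that arithmetic, following the proof's accounting of f(i/2) + 1 nodes per iteration plus the anchored node:

\sum_{i=1}^{n}\left(\left\lfloor \tfrac{i}{2}\right\rfloor + 1\right) + 1
  = \left\lfloor \tfrac{n^{2}}{4} \right\rfloor + n + 1
  = \begin{cases} 0.25\,n^{2} + n + 1 & \text{if } n \text{ is even},\\ 0.25\,n^{2} + n + 0.75 & \text{if } n \text{ is odd}. \end{cases}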
Based on Theorem 2, we can create a partition diagram for an integer up to several tens of thousands on a PC with 512 MB of main memory. Clearly, the space and time complexity for creating and storing a partition diagram is O(n²).
In our implementation, each node in a partition diagram has the three fields shown below:
struct diagram_node {
    int part;                   // a part in a partition
    int num_of_children;        // the number of child nodes
    struct diagram_node *next;  // points to the beginning of the shared array
};
Similar to the creation of a partition tree, the nodes having the same parent are put into an array. The next field is used to locate directly the beginning of the shared nodes in such an array. This is somewhat different from what is stated in the algorithm in Figure 6; however, this implementation incurs the least computation during enumeration. The field num_of_children is used to check whether the last element of an array has been reached. A partition diagram of this kind is given in Figure 7. If a pointer is treated as an integer, each node occupies three integers, so the number of integers needed to form a partition diagram is 0.75n² + 3n + 3 if n is even, or 0.75n² + 3n + 2.25 if n is odd.
Figure 7. An implementation of the data structure for the partition diagram of integer 6.
3 Enumerating a Partition Diagram
It is clear that if a partition diagram (tree) is created for integer n, this partition diagram (tree) can be used to enumerate the partitions of any integer not larger than n. Since a partition diagram is a concise representation of a partition tree, any enumeration that can be carried out on a partition tree can also be carried out on the corresponding partition diagram. Therefore, our discussion will be made primarily in terms of a partition tree.
Various kinds of enumeration can be performed on a partition tree. The simplest one is to employ a depth-first search to list all the partitions in lexicographic order. During the depth-first search, the child nodes of an internal node must be visited in ascending order of their parts. In contrast, a partition tree is not amenable to enumerating the partitions in reverse lexicographic order.
The enumeration of the partitions under a smallest-part or largest-part restriction can also be done efficiently. For enumeration with a smallest-part restriction, we need only visit the branches whose parts are greater than or equal to a certain number; any subtree rooted at a node whose part is smaller than that number can be pruned completely. This is the reason why this kind of enumeration can be performed efficiently.
For example, to find the partitions of integer 6 whose parts are not less than 2, the traversal of the subtree rooted at (1,5) can be eliminated completely. The enumeration with a largest-part restriction can be done similarly. Enumeration of the partitions that consist of only distinct parts can also be done efficiently: whenever the depth-first search reaches a node whose part equals that of its parent, the traversal of the subtree rooted at that node can be eliminated completely.
The enumeration of the partitions with an even (odd) number of parts cannot be done efficiently. To carry out such an enumeration, we need to enumerate all the paths from the anchored node to a terminal node in order to find the length of each path; if the path length is even (odd), the associated partition has an even (odd) number of parts. In contrast, the enumeration of the partitions consisting of only even (odd) parts can be done efficiently, because the traversal of any subtree can be terminated as soon as the part of the subtree's root node is odd (even).
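As a concrete sketch of the pruning described in this section, the following variant of the depth-first enumeration (using the tree_node layout of Section 2.2; the function name and the min parameter are ours) lists only the partitions whose smallest part is at least min.

#include <stdio.h>

/* Print only the partitions whose parts are all >= min. Children whose part
 * is smaller than min are skipped together with their entire subtrees. */
static void enumerate_min_part(const struct tree_node *node, int min,
                               int *parts, int len) {
    if (node->num_of_children == 0) {            /* a leaf: one admissible partition */
        for (int i = 0; i < len; i++)
            printf("%d%c", parts[i], i + 1 == len ? '\n' : ',');
        return;
    }
    for (int c = 0; c < node->num_of_children; c++) {
        const struct tree_node *child = &node->next[c];
        if (child->part < min)
            continue;                            /* prune the subtree rooted at this child */
        parts[len] = child->part;
        enumerate_min_part(child, min, parts, len + 1);
    }
}

/* Usage: enumerate_min_part(root_of_tree_for_6, 2, buf, 0) prints
 * 2,2,2  2,4  3,3  and 6; the whole subtree rooted at (1,5) is skipped. */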
4 Experimental Study
In this section we report on an implementation of all the data structures discussed above. Figure 8 gives the number of integers needed to store all the partitions of an integer; the Y-axis is in log₁₀ scale. As one can see, the partition diagram needs about six orders of magnitude less memory than the partition tree for n = 120, and this gap widens as n increases. The partition tree needs several times less memory than the multiplicity linear structure, and the direct linear structure needs the largest amount of memory. It is clear that only the partition diagram can store all the partitions of an integer as large as several hundred. We have in fact created a partition diagram for an integer up to 30,000 on a PC with 512 MB of main memory.
5 Conclusions
In this paper we have studied four different data structures for storing all the partitions of an integer n. The space and time complexity for creating a linear structure or a partition tree is proportional to the number of partitions, whereas the complexity for creating a partition diagram is only O(n²). This low complexity allows us to create a partition diagram that can store all the partitions of an integer up to several tens of thousands.
Figure 8. Number of integers used for different data structures (Y-axis: number of integers stored, in log₁₀ scale; X-axis: the integer n, from 5 to 105; curves: direct linear, multiplicity linear, tree, and diagram).
References
[1] D. Stanton and D. White. Constructive Combinatorics. Springer-Verlag, Berlin, 1986.
[2] C. L. Liu. Introduction to Combinatorial Mathematics. McGraw-Hill, 1968.