Department of Computer Science and Engineering

Persistent Data Structures
By
Brett Keck
Caly Nguyen
Lisa Radden
Abstract
Persistent data structures allow past versions of a dynamic set to be maintained as it is
updated. Information, therefore, no longer has to be flattened to be stored, and data that has been
altered is no longer lost. Our example consists of two parts. The first part implements a fully
persistent data structure using a binary search tree and the node-copying method. This method
involves assigning a unique version number to each node in the tree, and maintaining a data
structure of these version numbers so that no previous alteration is lost in the tree. Three basic
operations are incorporated into the node-copying method: link, delete, and pull. These
operations, in turn, are combined into the increasing subtree method so as to ensure an O(log n)
running time (where n is the number of leaves). The second part of the example is a
demonstration of a current software program (Adobe Photoshop 5.0) that actually uses persistent
data structures to perform a certain task. The in-class demonstration shows an application of the
‘History’ function in Photoshop on an image with multiple layers.
Introduction
This document contains the definition, description, and implementation of persistent data
structures. It is divided into three sections: an overview, in which persistent data structures are
defined and explained; an implementation section, in which the methods and algorithms used to
create a persistent data structure are described; and an appendix, which describes our example.
A list of references follows the appendix, followed by a hard copy of our program and
biographical sketches of the authors.
Overview of Persistent Data Structures:
Persistent data structures allow for more efficient storage and retrieval of information.
Before the dawn of persistent data structures, for data held in a complex data structure to be
stored permanently, the structure first had to be flattened and then copied into a file (Dearle
1996, p. 64). Reloading the data also required code that reversed the flattening and rebuilt the
original structure. Finally, as with most early applications, revisions to the data structure were
ephemeral: the sequence of revisions leading to the current version was not kept, so each change
permanently overwrote the version before it. Now, with persistent data structures, information no
longer has to be flattened to be stored, and data that has been altered is no longer lost. The key to
all these improvements lies at the heart of a persistent structure: its ability to maintain past
versions of a dynamic set as it is updated.
One way to implement a persistent data structure is with a binary tree. There are two
possible methods of updating the tree. In the first method, the program makes a copy of the entire
set each time an element is added. In the second method, the program maintains a separate root
for every version of the set. This way, data that is common to several versions does not have to
be duplicated (Dearle 1996, p. 64); only the nodes on the path from the root to the changed
position need to be copied.
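The second method can be illustrated with a minimal Python sketch of a persistent binary search tree that uses path copying. The names `Node`, `insert`, `contains`, and `versions` are our own hypothetical choices, not taken from the cited sources. Nodes are immutable, each insert copies only the nodes on the search path, and the list `versions` keeps one root per version, so subtrees that an update does not touch are shared between versions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Immutable BST node: no version is ever modified in place."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Return the root of a new version that also contains key.

    Only the nodes on the search path are copied; every other subtree
    is shared with the previous version.
    """
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present: nothing to copy


def contains(root: Optional[Node], key: int) -> bool:
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


# versions[i] is the root of version i; version 0 is the empty tree.
versions: List[Optional[Node]] = [None]
for k in [50, 30, 70, 20, 40]:
    versions.append(insert(versions[-1], k))
```

After the loop, `contains(versions[2], 40)` is `False` while `contains(versions[5], 40)` is `True`, and `versions[5].right is versions[4].right` holds because the subtree rooted at 70 is shared rather than copied.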
Basically, each node of a binary tree can be seen as a container. The container holds not
only the data that its location represents, but also a list of pointers that orders the containers
relative to one another. The main purpose of linking the containers this way is to allow data to be
shared between them. Thus, the list of pointers is the key to the persistent data structure (Dearle
1996, p. 65). By following different pointers, different data can be accessed, so the structures
that existed before a given update remain reachable. Since data may persist for an arbitrary length
of time, the original data structures used by applications may be maintained in their original
form. Subsequent work on saved documents simply involves having the applications reattach
themselves to the persistent data structures (Kakkad 1992, p. 146).
A data structure supports two types of operations: queries and updates. A query retrieves
information, whereas an update changes the information. If queries and updates can be done only
on the current version, the data structure is ephemeral. In a partially persistent data structure,
queries can be performed on any version, but updates can be performed only on the most recent
version. In a fully persistent data structure, both queries and updates can be performed on any
version. Therefore, a fully persistent data structure is the best choice when reconstructing past
history, because any past version can be both examined and changed (Driscoll et al. 1994, p. 944).
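The difference between partial and full persistence can be sketched with a small version store. This is a hypothetical illustration (`VersionedList` and its methods are our own names): each version is kept as an immutable Python tuple, which copies the whole list on every update and is therefore not space-efficient; it only shows which versions may be queried and which may be updated.

```python
from typing import List, Tuple


class VersionedList:
    """Stores every version of a list as an immutable tuple.

    Whole tuples are copied on each update for clarity, so this is not
    space-efficient; it only shows which versions can be updated.
    """

    def __init__(self, fully_persistent: bool) -> None:
        self.fully_persistent = fully_persistent
        self.versions: List[Tuple[int, ...]] = [()]  # version 0 is the empty list
        self.parent: List[int] = [-1]  # version each one was derived from

    def query(self, version: int) -> Tuple[int, ...]:
        # Queries may read any version under both kinds of persistence.
        return self.versions[version]

    def append(self, version: int, item: int) -> int:
        """Create a new version from `version` with item appended; return its number."""
        latest = len(self.versions) - 1
        if not self.fully_persistent and version != latest:
            raise ValueError("partially persistent: only the latest version can be updated")
        self.versions.append(self.versions[version] + (item,))
        self.parent.append(version)
        return len(self.versions) - 1


full = VersionedList(fully_persistent=True)
v1 = full.append(0, 1)   # (1,)
v2 = full.append(v1, 2)  # (1, 2)
v3 = full.append(v1, 9)  # a branch from v1: (1, 9)

partial = VersionedList(fully_persistent=False)
p1 = partial.append(0, 1)
try:
    partial.append(0, 5)  # version 0 is no longer the latest
    branched = True
except ValueError:
    branched = False
```

Here `v3` branches off `v1`, which only the fully persistent store allows; the partially persistent store rejects the same kind of update with a `ValueError`.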
Implementation of Persistent Data Structures:
The method used here to implement a fully persistent data structure is the node-copying
method. The node-copying method involves assigning a unique version number to each node in
the tree and maintaining a data structure of those version numbers. Each version number points
to a particular node, and the version number is used to navigate through the tree. This type of
navigation is made possible by the way the tree is built. The tree is composed of fully persistent
lists as internal nodes, and its leaves are the elements of the list to be represented. Each list is
represented as a rooted ordered tree with the list elements in the leaves, so that a left-to-right
(inorder) traversal of the leaves visits the elements in list order. Each internal node of the tree
contains a header node that points to the first element of a singly linked list of pointers to its
children. In the root, the list elements of the children that are not leaves are also singly linked
together by additional pointers. The header node also has pointers to the first and last of these
nonleaf children, and to the last child. Every tree is 3-collapsible, and the i-th subtree of the root
has more than twice as many leaves as the (i-1)-st. In a 3-collapsible tree, repeatedly applying the
“delete, then pull three times” sequence until the tree is empty keeps the first child of the root a
leaf each time a delete is applied. The size of the tree is recorded and stored. See Diagram A for a
graphical representation (Driscoll et al. 1994, p. 944).
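The list-as-tree representation can be sketched as follows, under simplifying assumptions: each internal node keeps its ordered children in a Python tuple instead of the header node, singly linked child list, and extra pointers to nonleaf children described above, and a dictionary stands in for the table of version numbers. All names (`Leaf`, `Node`, `make_node`, `to_list`, `versions`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    value: str


@dataclass(frozen=True)
class Node:
    children: Tuple[Union[Leaf, "Node"], ...]  # ordered, left to right
    size: int  # number of leaves below this node, recorded when built


def make_node(*children: Union[Leaf, Node]) -> Node:
    """Build an internal node and record its leaf count."""
    size = sum(1 if isinstance(c, Leaf) else c.size for c in children)
    return Node(tuple(children), size)


def to_list(t: Union[Leaf, Node]) -> List[str]:
    """Visit the leaves from left to right; this is the order of the list."""
    if isinstance(t, Leaf):
        return [t.value]
    out: List[str] = []
    for c in t.children:
        out.extend(to_list(c))
    return out


# Hypothetical version table: version number -> root of that version.
v0 = make_node(Leaf("a"), make_node(Leaf("b"), Leaf("c")))
v1 = make_node(Leaf("z"), *v0.children)  # new root; v0's subtrees are shared
versions: Dict[int, Node] = {0: v0, 1: v1}
```

Version 1 prepends an element by building only a new root; the subtree holding `b` and `c` belongs to both versions.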
Three basic operations are used to manipulate the tree: link, delete, and pull. Link takes two trees
and makes the root of the second tree the last child of the root of the first. This keeps the
beginning of the list easily accessible. The delete operation removes the first child of the root,
provided that it is a leaf. The pull operation brings subtrees up to the root and ensures that the
first child of the root is always a leaf (Driscoll et al. 1994, p. 958).
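A minimal sketch of the three operations follows, written so that no existing tree is ever modified: each call returns a new root and shares the untouched subtrees. The names are hypothetical, and the behaviour of `pull` is our own assumption: it detaches the first child of the root's first nonleaf child and makes it a child of the root immediately before its old parent. The exact definitions in Driscoll et al. (1994) may differ, so treat this as an illustration rather than their algorithm.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    value: str


@dataclass(frozen=True)
class Node:
    children: Tuple[Union[Leaf, "Node"], ...]  # ordered, left to right


def leaves(t: Union[Leaf, Node]) -> List[str]:
    """List elements in left-to-right leaf order."""
    if isinstance(t, Leaf):
        return [t.value]
    return [v for c in t.children for v in leaves(c)]


def link(t1: Node, t2: Node) -> Node:
    """Make the root of t2 the last child of the root of t1 (t1 is not modified)."""
    return Node(t1.children + (t2,))


def delete(t: Node) -> Tuple[str, Node]:
    """Remove the first child of the root, which must be a leaf."""
    if not t.children or not isinstance(t.children[0], Leaf):
        raise ValueError("delete needs a leaf as the first child of the root")
    return t.children[0].value, Node(t.children[1:])


def pull(t: Node) -> Node:
    """ASSUMED reading of pull: move the first child of the root's first
    nonleaf child up to the root, just before its old parent."""
    for i, x in enumerate(t.children):
        if isinstance(x, Node):
            moved, rest = x.children[0], x.children[1:]
            kept = (Node(rest),) if rest else ()  # drop x if it is now empty
            return Node(t.children[:i] + (moved,) + kept + t.children[i + 1:])
    return t  # every child of the root is already a leaf


a = Node((Leaf("a"), Leaf("b")))
b = Node((Leaf("c"), Leaf("d")))
ab = link(a, b)

t = Node((Node((Leaf("x"), Leaf("y"))), Leaf("z")))
p = pull(t)  # first child of the root is now the leaf "x"
value, rest = delete(p)
```

Because every operation builds new nodes instead of changing old ones, `t` still describes the list `x, y, z` after the pull and delete.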
These operations are combined into the increasing subtree method so as to ensure an
O(log n) running time (where n is the number of leaves). A faster running time of O(log log k),
where k is the number of list operations performed before a catenation, could be obtained by
implementing the finger-tree method on a red-black tree, but this requires considerably more
machinery (Driscoll et al. 1994, p. 958).
In the increasing subtree method, the subtrees of the root increase exponentially in size
from left to right, so the left-most leaves are nearest to the root. The method supports two list
operations, pop and catenate. Popping uses the pull function repeatedly until the first child of the
root is a leaf, and then uses the delete function to remove that leaf. Catenating is accomplished by
linking the subtrees of the second list's tree into the tree of the first list, one at a time, via the
link function.
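Pop and catenate can be sketched on top of compact versions of the three primitives, repeated here so the block runs on its own. The names are hypothetical, and `pull` again follows our own assumed reading (it moves the first child of the root's first nonleaf child up to the root, just before its old parent), which may differ from the definition in Driscoll et al. (1994). This sketch shows only the control flow of pop and catenate; it does not maintain the exponential growth of the root's subtrees, so it does not achieve the O(log n) bound.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    value: str


@dataclass(frozen=True)
class Node:
    children: Tuple[Union[Leaf, "Node"], ...]


def leaves(t: Union[Leaf, Node]) -> List[str]:
    if isinstance(t, Leaf):
        return [t.value]
    return [v for c in t.children for v in leaves(c)]


def link(t: Node, sub: Union[Leaf, Node]) -> Node:
    return Node(t.children + (sub,))


def delete(t: Node) -> Tuple[str, Node]:
    first = t.children[0]
    if not isinstance(first, Leaf):
        raise ValueError("first child of the root must be a leaf")
    return first.value, Node(t.children[1:])


def pull(t: Node) -> Node:
    # ASSUMED reading of pull (see lead-in).
    i = next(i for i, c in enumerate(t.children) if isinstance(c, Node))
    x = t.children[i]
    kept = (Node(x.children[1:]),) if len(x.children) > 1 else ()
    return Node(t.children[:i] + (x.children[0],) + kept + t.children[i + 1:])


def pop(t: Node) -> Tuple[str, Node]:
    """Pull until the first child of the root is a leaf, then delete that leaf."""
    while not isinstance(t.children[0], Leaf):
        t = pull(t)
    return delete(t)


def catenate(t1: Node, t2: Node) -> Node:
    """Link the subtrees of t2's root onto t1 one at a time, left to right."""
    for sub in t2.children:
        t1 = link(t1, sub)
    return t1


left = Node((Leaf("a"), Node((Leaf("b"), Leaf("c")))))
right = Node((Node((Leaf("d"), Leaf("e"))), Leaf("f")))
both = catenate(left, right)
value, rest = pop(right)  # right itself is left unchanged
```

Both operations return new roots, so `left` and `right` remain valid versions after `both` and `rest` are built.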
Appendix
Description of Example
The example consists of two parts. The first part is an example program that shows a
simple way to implement a persistent data structure. The second part is an in-class
demonstration of a current software program that actually uses persistent data structures to
perform a certain task.
Our implementation of a basic dynamic structure is built around a binary tree. Within the
binary tree, each node is a vector of pointers that allows data to be accessed in different orders
without copying the data itself every time an update is made. The data itself is kept in a separate
structure. Keeping the ordering and the data separate allows the program to retrieve data in an
order that corresponds to an ordering in the binary tree. This data can then be manipulated to
reach whatever state we want.
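The separation of ordering from data can be illustrated with a hypothetical Python sketch; it is not the program attached to this report. Records live once in a shared list, tree nodes store only an index ("pointer") into that list, and two trees give two different orders over the same records. Inserting copies only the index nodes on the search path, never the records, so an older root still yields the older ordering.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class OrderNode:
    """Tree node that holds only an index ("pointer") into the shared data list."""
    idx: int
    left: Optional["OrderNode"] = None
    right: Optional["OrderNode"] = None


Record = Dict[str, Any]


def insert(root: Optional[OrderNode], idx: int, data: List[Record],
           key: Callable[[Record], Any]) -> OrderNode:
    """Return a new version of the ordering; copies index nodes on the path, never records."""
    if root is None:
        return OrderNode(idx)
    if key(data[idx]) < key(data[root.idx]):
        return OrderNode(root.idx, insert(root.left, idx, data, key), root.right)
    return OrderNode(root.idx, root.left, insert(root.right, idx, data, key))


def walk(root: Optional[OrderNode], data: List[Record]) -> List[Record]:
    """Return the records in the order given by this version of the tree."""
    if root is None:
        return []
    return walk(root.left, data) + [data[root.idx]] + walk(root.right, data)


def by_name(r: Record) -> Any:
    return r["name"]


def by_qty(r: Record) -> Any:
    return r["qty"]


data: List[Record] = [{"name": "pear", "qty": 1}, {"name": "apple", "qty": 2}]
name_order: Optional[OrderNode] = None
qty_order: Optional[OrderNode] = None
for i in range(len(data)):
    name_order = insert(name_order, i, data, by_name)
    qty_order = insert(qty_order, i, data, by_qty)

old_name_order = name_order  # keep the old version's root
data.append({"name": "plum", "qty": 3})
name_order = insert(name_order, 2, data, by_name)
```

Walking `name_order` gives apple, pear, plum, while `old_name_order` still gives apple, pear; both return the very same record objects stored in `data`.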
For the second part of the example, a demonstration will be done using Adobe Photoshop 5.0.
When working with Photoshop, the program itself keeps a list of all the events, or actions, that
have been performed. Accessing the History menu in Photoshop allows the user to return to any
previous state of their project. So, if you were doctoring an image and decided that somewhere
along the way you had made a mistake, you would be able to retrace your steps and start back up
at any point you wanted. The history function in Photoshop, however, is not fully persistent.
When you backtrack and begin to work on your project from a previous stage, all the operations
that were performed after your new starting point are lost. Photoshop begins anew at that point
and branches off from the original structure, without keeping the old branch.
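This non-persistent behaviour can be modelled with a simple linear undo history. The sketch below is hypothetical and models only the behaviour described above, not Photoshop's implementation: stepping back is harmless, but performing a new action from an earlier state discards every later state.

```python
from typing import List


class LinearHistory:
    """A linear undo history: acting from an earlier state discards all later states."""

    def __init__(self, initial: str) -> None:
        self.states: List[str] = [initial]
        self.current = 0  # index of the state currently shown

    def go_to(self, index: int) -> str:
        # Stepping back is harmless: later states stay listed until the next action.
        self.current = index
        return self.states[index]

    def act(self, action: str) -> None:
        # A new action from an earlier state throws away every later state.
        del self.states[self.current + 1:]
        self.states.append(self.states[self.current] + " > " + action)
        self.current += 1


h = LinearHistory("open")
for step in ["add layer 1", "add layer 2", "add layer 3"]:
    h.act(step)
h.go_to(1)
before_branch = len(h.states)  # 4: all states are still available
h.act("blur")                  # the states after "add layer 1" are discarded
```

A fully persistent history would instead keep the discarded states as a separate branch, so both lines of work could still be revisited.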
The in-class demonstration will be of an image created in Photoshop with multiple layers.
After adding layers, we will travel back to the beginning, middle, and end points to show that the
records of what we did before and after each point are kept intact.
References
Driscoll, James R.; Sleator, Daniel D.; Tarjan, Robert E. “Fully Persistent Lists with
Catenation.” Journal of the Association for Computing Machinery, Vol. 41, No. 5, September
1994, pp. 943-959.
Dearle, Alan; Hulse, David; Lindström, Anders; Norris, Stephen; Rosenberg, John. “Operating
System Support for Persistent and Recoverable Computations.” Communications of the ACM,
Vol. 39, September 1996.
Kakkad, Sheetal V.; Singhal, Vivek; Wilson, Paul R. “Texas: Good, Fast, Cheap Persistence for
C++.” Addendum to the Proceedings, October 1992.
Lisa Radden (originally from Westwood, Massachusetts) is a senior at the University
of Notre Dame, and will be graduating in the spring of 2000 with a Bachelor of Science in
science-computing, and a Bachelor of Fine Arts in design. As an SCCO major, she studied the
basic sciences (biology, chemistry, physics, and geology) as well as a sequence of five
engineering courses in the software theory field. As a BFA major, she studied the digital arts,
gaining experience in such software as Adobe Photoshop, Adobe Illustrator, and Director 7.0.
She will be taking a Digital 3D class in the spring to finish off her portfolio. She is interested in
pursuing a career or further education in the computer graphics industry, especially computer
animation in film. She currently works for the University of Notre Dame Web Administration
producing graphics for its official web site, and editing its content and style.
She, too, was fated to become partners with the infamous Dr. Keck and his sidekick
Number Two in their programming adventures. Together, one day, these three mighty heroes
will conquer the evil Bjarne and the world of C++ and be very, very rich. Stay tuned…