Persistent Data Structures

By Brett Keck, Caly Nguyen, and Lisa Radden

Abstract

Persistent data structures allow past versions of a dynamic set to be maintained as it is updated. Information, therefore, no longer has to be flattened to be stored, and data that has been altered is no longer lost. Our example consists of two parts. The first part implements a fully persistent data structure using a binary search tree and the node-copying method. This method involves assigning a unique version number to each node in the tree and maintaining a data structure of these version numbers, so that no previous alteration to the tree is lost. Three basic operations are incorporated into the node-copying method: link, delete, and pull. These operations, in turn, are combined into the increasing subtree method so as to ensure an O(log n) running time (where n is the number of leaves). The second part of the example is a demonstration of a current software program (Adobe Photoshop 5.0) that actually uses persistent data structures to perform a certain task. The in-class demonstration shows an application of the ‘History’ function in Photoshop on an image with multiple layers.

Introduction

This document contains the definition, description, and implementation of persistent data structures. It is broken down into three sections: an overview, in which persistent data structures are defined and explained; an implementation section, in which the methods and algorithms used to create a persistent data structure are described; and an appendix, which describes our example. A list of references, a hard copy of our program, and biographical sketches of the authors follow the appendix.

Overview of Persistent Data Structures:

Persistent data structures allow for more efficient storage and retrieval of information.
Before the dawn of persistent data structures, storing data permanently in a complex data structure required that the structure first be flattened and then copied into a file (Dearle 1996, p. 64). There also had to be code corresponding to the original flattening code in order to reload the structure. And finally, as with most early applications, revisions to the data structure were ephemeral: the sequence of revisions leading to the current one was not kept, and thus changes were permanent. Now, with persistent data structures, information no longer has to be flattened to be stored, and data that has been altered is no longer lost. The key to all these improvements lies within the heart of a persistent structure: its ability to maintain past versions of a dynamic set as it is updated.

One way to implement a persistent data structure is with a binary tree. There are two possible methods of updating the tree. In the first method, each time an element is added to the tree, the program makes a copy of the entire set that was modified. In the second method, the program maintains a separate root for every version of the set. This way, data that is common to several versions does not have to be duplicated (Dearle 1996, p. 64).

For a binary tree, each node can be seen as a container. The container holds not only the data that the location represents, but also a list of pointers that allows some type of ordering of the containers to be formed. The main purpose of mapping the containers is to allow data to be shared between them. Thus, the list of pointers is the key to the persistent data structure (Dearle 1996, p. 65). By following the pointers, different data can be accessed. This way, a record of the structures that existed before certain updates were made remains available.
Since data may persist for an arbitrary length of time, the original data structures used by applications may be maintained in their original form. Subsequent work on saved documents simply involves having the applications reattach themselves to the persistent data structures (Kakkad 1992, p. 146).

A data structure uses two types of operations: queries and updates. A query retrieves information, whereas an update changes the information. If queries and updates are only done on the current version, the data structure is ephemeral. In a partially persistent data structure, queries can be performed on any version, but updates can only be performed on the most recent version. In a fully persistent data structure, both queries and updates can be performed on any version, so it is the most efficient choice when reconstructing past history (Driscoll et al. 1994, p. 944).

Implementation of Persistent Data Structures:

The best method to implement a fully persistent data structure is the node-copying method. The node-copying method involves assigning a unique version number to each node in the tree and maintaining a data structure of those version numbers. Each version number is a pointer to a particular node. The version number is used to navigate through the tree, and this navigation is made possible by the way the tree is built.

The binary tree that is used has fully persistent lists as its internal nodes, and its leaves are the elements of the list to be represented. Each list is represented as a rooted ordered tree with the list elements in the leaves, so that a left-to-right traversal is an inorder traversal. Each internal node of the tree contains a header node that points to the first element of a singly linked list of pointers to its children. In the root, the list elements of the children that are not leaves are singly linked together by additional pointers. The header node also has pointers to the first and last of these nonleaf children, and to the last child.
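As a brief aside, the difference between partial and full persistence described above can be made concrete with a small sketch. It is an assumption-laden illustration, not the node-copying method: the class `VersionedStack` and its methods are hypothetical names, and a linked stack stands in for the tree so that the version rules are easy to see. Version numbers index a table of pointers, one per version.

```python
class VersionedStack:
    """Keeps every version of a stack. Versions are numbered from 0 (empty)."""

    def __init__(self, fully_persistent):
        self.fully_persistent = fully_persistent
        # heads[v] points to the top cell of version v.
        # A cell is a (value, next_cell) pair; None is the empty stack.
        self.heads = [None]

    def query(self, version):
        """Queries are allowed on any version: list its items, top first."""
        items = []
        cell = self.heads[version]
        while cell is not None:
            value, cell = cell
            items.append(value)
        return items

    def push(self, version, value):
        """Update `version` and return the number of the newly created version."""
        latest = len(self.heads) - 1
        if not self.fully_persistent and version != latest:
            raise ValueError("partially persistent: only the latest "
                             "version can be updated")
        # The new cell points at the old top, so the old version is shared.
        self.heads.append((value, self.heads[version]))
        return len(self.heads) - 1


full = VersionedStack(fully_persistent=True)
v1 = full.push(0, "a")
v2 = full.push(v1, "b")   # one branch built on version 1
v3 = full.push(v1, "c")   # a second branch built on the same old version
```

With full persistence, versions 2 and 3 both extend version 1, so the version history forms a tree. With `fully_persistent=False`, the second `push` onto version 1 would raise an error, although `query` would still work on every version.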
Every tree is 3-collapsible, and the ith tree has more than twice as many leaves as the (i-1)st tree. A 3-collapsible tree ensures that the first child of the root is a leaf by applying the “delete and pull three times” sequence until the tree is empty. The size of the tree is recorded and stored. See Diagram A for a graphical representation (Driscoll et al. 1994, p. 944).

Three basic operations are used to manipulate the tree: link, delete, and pull. Link takes two trees and makes the root of the second tree the last child of the root of the first. This method keeps the beginning of the list easily accessible. The delete operation removes the first child of the root, as long as it is a leaf. The pull operation brings subtrees up to the root and ensures that the first child of the root is always a leaf (Driscoll et al. 1994, p. 958).

These operations are combined into the increasing subtree method so as to ensure an O(log n) running time (where n is the number of leaves). A faster running time of O(log log k) could be obtained by implementing the finger-tree method on a red-black tree, but this requires a heavy amount of machinery to accomplish (Driscoll et al. 1994, p. 958). In the increasing subtree method, the subtrees of the root increase exponentially in size from left to right, with the left-most leaves nearest to the root. Two functions are built on the basic operations: pop and catenate. Popping uses the pull function repeatedly until the left child of the root is a leaf, and then uses the delete function to remove that leaf. Catenating is accomplished by moving the subtrees of the second list into the tree of the first list, one at a time, via the link function.

Appendix: Description of Example

The example consists of two parts. The first part is an example program that shows a simple way to implement a persistent data structure.
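Returning briefly to the implementation section: the link, delete, and pull operations, and the pop and catenate functions built from them, can be sketched as follows. This is a simplified illustration and not the algorithm of Driscoll et al. In particular, the `pull` shown here simply hoists all children of the first child up to the root, which is our own simplifying assumption, and nothing here maintains the exponentially increasing subtree sizes, so the O(log n) bound is not achieved. The `Tree` type and the `leaves` helper are hypothetical names. Trees are immutable, so every operation returns a new version and leaves the old one intact.

```python
from typing import NamedTuple


class Tree(NamedTuple):
    """An immutable internal node; anything that is not a Tree is a leaf."""
    children: tuple


def is_leaf(x):
    return not isinstance(x, Tree)


def link(t1, t2):
    """Make t2 the last child of the root of t1."""
    return Tree(t1.children + (t2,))


def delete(t):
    """Remove the first child of the root, which must be a leaf."""
    if not is_leaf(t.children[0]):
        raise ValueError("first child of the root is not a leaf")
    return Tree(t.children[1:])


def pull(t):
    """Simplified pull: replace the first child by its own children."""
    first = t.children[0]
    if is_leaf(first):
        return t
    return Tree(first.children + t.children[1:])


def pop(t):
    """Pull until the first child is a leaf, then delete that leaf."""
    while t.children and not is_leaf(t.children[0]):
        t = pull(t)
    if not t.children:
        raise IndexError("pop from an empty list")
    return t.children[0], delete(t)


def catenate(t1, t2):
    """Link the subtrees of t2 onto t1, one at a time."""
    for subtree in t2.children:
        t1 = link(t1, subtree)
    return t1


def leaves(t):
    """List the leaves from left to right (the represented list)."""
    out = []
    for child in t.children:
        if is_leaf(child):
            out.append(child)
        else:
            out.extend(leaves(child))
    return out
```

For example, catenating `Tree((1, Tree((2, 3))))` with `Tree((4, Tree((5,))))` yields a tree whose leaves read 1 through 5 from left to right, and popping from the result returns a new tree while the catenated version remains available.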
The second part is an in-class demonstration of a current software program that actually uses persistent data structures to perform a certain task.

The implementation of a basic dynamic structure revolves around a binary tree. Within the binary tree, each node is a vector of pointers that allows data to be accessed in different orders without having to directly copy the data of an update every time something is changed. The data itself is kept in a different structure. Keeping the ordering and the data separate allows a program to retrieve data in an order that corresponds to an ordering in the binary tree. This data can then be manipulated according to the state that we want to reach.

For the second part of the example, a demonstration will be done using Adobe Photoshop 5.0. When working with Photoshop, the program itself keeps a list of all the events, or actions, that have been performed. Accessing the history menu in Photoshop allows the user to return to any previous instance in their project. So, if you were doctoring an image and decided that somewhere along the way you messed up, you would be able to retrace your steps and start back up at any point you wanted. The history function in Photoshop, however, is not fully persistent. When you backtrack and begin to work on your project from a previous stage, all the operations that were performed ahead of your new starting point are lost. Photoshop begins anew at that point and branches off from the original structure.

The in-class demonstration will be of an image created in Photoshop with multiple layers. After adding layers, we will travel back to the beginning, middle, and end points to show that the records of what we did before and after are kept intact.

References

Driscoll, James R.; Sleator, Daniel D.; Tarjan, Robert E. “Fully Persistent Lists with Catenation.” Journal of the Association for Computing Machinery. Vol. 41, No. 5, September 1994, pp. 943-959.
Dearle, Alan; Hulse, David; Linderstrom, Anders; Norris, Stephen; Rosenberg, John. “Operating System Support for Persistent and Recoverable Computations.” Communications of the ACM. Vol. 39, September 1996.

Kakkad, Sheetal V.; Singhal, Vivek; Wilson, Paul R. “Texas: Good, Fast, Cheap Persistence for C++.” Addendum to the Proceedings. October 1992.

Lisa Radden (originally from Westwood, Massachusetts) is a senior at the University of Notre Dame and will be graduating in the spring of 2000 with a Bachelor of Science in science-computing and a Bachelor of Fine Arts in design. As an SCCO major, she studied the basic sciences (biology, chemistry, physics, and geology) as well as a sequence of five engineering courses in the software theory field. As a BFA major, she studied the digital arts, gaining experience in such software as Adobe Photoshop, Adobe Illustrator, and Director 7.0. She will be taking a Digital 3D class in the spring to finish off her portfolio. She is interested in pursuing a career or further education in the computer graphics industry, especially computer animation in film. She currently works for the University of Notre Dame Web Administration, producing graphics for its official web site and editing its content and style. She, too, was fated to become partners with the infamous Dr. Keck and his sidekick Number Two in their programming adventures. Together, one day, these three mighty heroes will conquer the evil Bjarne and the world of C++ and be very, very rich. Stay tuned…