Persistence and Objects

Group presentation given by Barry Myles, Gary Stewart and Stephanie Dunsire

Introduction

This presentation gives an in-depth view of persistence.
- Coupling the programming power of an object-oriented language such as Java or C++ with the data storage and manipulation mechanisms of a relational database causes difficulties.
- The two handle data differently: programming languages work with transient data, whereas relational databases work mainly with persistent data.
- They also use different modelling paradigms: a relational model cannot be mapped directly onto an OO model.

What is Persistence

Persistence has different meanings.

With respect to a single piece of data:
"With respect to a piece of data it means the length of time for which that datum exists." [Ref: Object Database and ODMG approach - Richard Cooper]

With respect to an information system:
"A persistent programming language (PPL) is a programming language that includes a persistent memory area (e.g., a heap of objects) that outlives the execution of any individual program." [Ref: Working with Persistent Objects - J. Eliot B. Moss]

Types of Persistence

How do programs write data to secondary storage? The three types of persistence are:
- Session Persistence
- File Persistence
- Orthogonal Persistence

Session Persistence

Session persistence allows the state of a program to be resumed from a snapshot of the system. It is used where the entire state must be restored and/or where the workspace is small.

Examples:
- Windows hibernation mode
- Smalltalk implementations
- Machine emulators, which are able to capture the state of the machine they are emulating

Problems:
- It is not designed for multiple users modifying the same system.
- The user has no control over what is and is not saved.

Because it cannot allow concurrent users to access the same workspace, session persistence is not suitable for database systems.
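
As a rough illustration of session persistence, the sketch below saves an entire workspace as one snapshot and later resumes from it. It is only a stand-in: real session persistence (hibernation, a Smalltalk image) captures the whole program or machine state, whereas here the "workspace" is an ordinary dictionary and Python's pickle module plays the part of the snapshot mechanism. The names save_session, resume_session and workspace are hypothetical and do not come from the presentation.

```python
import pickle
import tempfile
from pathlib import Path


def save_session(workspace: dict, path: Path) -> None:
    """Write the whole workspace to disk as a single snapshot."""
    with path.open("wb") as f:
        pickle.dump(workspace, f)


def resume_session(path: Path) -> dict:
    """Restore the whole workspace from the last snapshot."""
    with path.open("rb") as f:
        return pickle.load(f)


# Hypothetical workspace contents, for illustration only.
workspace = {"counter": 41, "open_documents": ["notes.txt"], "history": [1, 2, 3]}

snapshot = Path(tempfile.mkdtemp()) / "session.snapshot"
save_session(workspace, snapshot)

# Work done after the snapshot is lost unless a new snapshot is taken.
workspace["counter"] += 1

restored = resume_session(snapshot)
```

The snapshot is all-or-nothing: there is no way to choose which parts of the workspace are saved, and two users writing snapshots to the same file would simply overwrite each other, which mirrors the problems listed above.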

File Persistence

- The most widely used way of storing data, and the traditional way of making programming-language data persistent.
- The data is extracted from the program and written to secondary storage.
- It normally has to be converted again before it can be placed back in the program as data.

Problems:
- Converting data to and from the application's format can be expensive in both processor time and designer time.
- Open access to the file may let other programs corrupt it, although access to a file in this manner is not always bad practice and can be useful.
- There are two models of the data, one for the internal data and another for the file, so the simplicity of a single model view is lost.

Orthogonal Persistence

- Concerned with storing data on disk in the same structure as it had in memory, so that any in-memory data can be stored without transformation.
- If a class is marked as persistent, all the data in the lower classes is made persistent too. This maintains referential integrity.
- The same logical data structure exists in main memory as on disk.
- Any data value can be made persistent or transient, irrespective of its type.
- Only the areas of code or data that need to be addressed are brought into memory; the rest remains on disk until required.
- The data model of the database can represent the class structure; the database itself represents the class hierarchy.
- As a result, the persistent store is a single representation of the data.

Distinguishing Transient and Persistent Data

Implementation varies greatly between the different OODBMSs.
The two main criteria are:
- How little effort it takes for the programmer to make objects persistent
- How efficiently the system stores the data

A gain on one of these criteria may cost performance or functionality on the other.

Marking Persistence

An object can be marked as persistent in one of two ways:
- It is explicitly declared persistent by the programmer or the programming language.
- It is reachable from another persistent object, and must therefore persist; otherwise a dangling reference would result.

All objects outside these two sets are transient. Not all languages support persistence by reachability; those that do not either force the programmer to check all references or allow dangling references.

Marking Persistence: Explicitly

Several methods of marking persistence are in use, and some systems allow more than one. The seven main methods are:
- Named Root Objects
- Persistent Classes
- Persistence Declared at Object Creation
- Persistent Root Class
- Persistent Shadow Classes
- Persistence By Explicit Storage
- System-Provided Persistent Roots

Persistent Shadow Classes

- The system automatically creates a persistent class for every class that exists.
- The user can then create an instance from either the persistent class or the transient class.
- Once created as one kind, an object cannot be changed to the other (though it should be possible to copy the object across).
- This ensures that all classes in the language are persistence-capable.

Persistence By Explicit Storage

- Provides a method that can be called at any time to store an object, making it persistent.
- Similar to file persistence, but should be orthogonal, unlike file persistence.
- Has the run-time cost of copying the requested memory into the persistent store.
- One of the most flexible systems; it should allow all objects to be made persistent.
- Supports 'upgrading' transient objects into persistent objects.

System-Provided Persistent Roots

- Provides a mechanism for adding objects to a persistent store: the store has some method that lets the user add an object to it.
- An object is persistent by virtue of being in this store.
- The store should accept all object types, so any object can be made persistent at run time and 'upgraded' from a transient object.
- The system-provided persistent roots should be globally available.
- ODMG uses this approach for its Java and Smalltalk implementations.

Persistence By Reachability

If an object is persistent, all objects that it refers to must also be made persistent, or its references would become invalid.
- Explicitly persistent objects therefore dictate what is made persistent from them, and a cascading effect may occur.
- This should be handled entirely by the system, although some systems may make the programmer confirm what is to be made persistent.

Persistence by reachability provides:
- Referential integrity, by ensuring that objects referred to by a persistent object are retained
- A reduced workload for the programmer, who no longer has to state explicitly which objects are to be made persistent

It may, however, store unwanted data that increases the size of the database. Programmers may be able to mark fields in objects that are never to be made persistent (for example, with the transient keyword in Java).

Storing Code and Data

Storing code as well as data is important because databases can hold complex data. Figure 1 shows four kinds of system; code is shown as ovals and data as rectangles.

Figure 1

System (a) in Figure 1 is an unprotected file system; it offers a flexible environment in which code and data can be freely mixed.
System (b) in Figure 1 is a traditional DBMS that controls part of the file system, which it uses to store and protect the data. Application code remains outside the control of the DBMS.
System (c) in Figure 1 is an orthogonally persistent language, which allows code to be protected as well as data, and allows code and data to be freely mixed.
System (d) in Figure 1 is an OODBMS, which provides a class structure in which the mixture of code and data is controlled.

Systems that exclude program code from the structure of the database often fall into disuse, because they make creating and maintaining applications exceptionally costly and error-prone.

Removing Data

An important facility that a database must provide is a way of removing unwanted data. The method used depends on the system:
- Explicit deletion of data: systems can allow explicit deletion of objects and classes. Some systems allow this in the belief that the programmer can achieve the most efficient storage of data. Integrity violations can still be avoided; in C++, for example, a destructor can be used to remove data explicitly without causing a deletion violation.
- Garbage collection: systems can also provide an automatic method of removing data. Garbage collection is commonly used in an OODBMS that uses persistence by reachability.

Types of Garbage Collection

Mark-and-Sweep

This involves two stages:
- Mark: starting at the persistent root objects, mark everything referred to by the roots as required data, then recursively mark the objects referred to by already marked objects until every reachable object is marked.
- Sweep: remove the unmarked data, so that the space is freed and returned to the free pool for future reallocation.

The problem with mark-and-sweep is that it tends to fragment memory.

Stop-and-Copy

This method both collects garbage and defragments memory. Memory is divided into two regions, an active region and an inactive region. The "stop" part begins when the memory in the active region is exhausted; the "copy" part copies all of the live objects from the active region to the inactive region.
The final step is to switch the active and inactive regions. Because only the live objects are copied, anything left behind is garbage and is discarded. The problem with this method is the cost of doubling the size of the heap (active plus inactive).

Reference Counting

Every object keeps a count of the references made to it: the count is incremented each time a new reference is made and decremented each time a reference is removed. When the count reaches zero the object is unreachable, so it is garbage and can be removed. There are two main problems with this method: the counts take up space and must be maintained, and there is no guarantee that every unreachable object has a count of zero (objects in a reference cycle keep each other's counts above zero).

Mark-and-Compact

This method again consists of two phases. The first phase, mark, starts at the persistent root objects, marks everything referred to by the roots as required data, and then recursively marks the objects referred to by already marked objects. The second phase, compaction, treats everything left unmarked as garbage and compacts memory by moving all the live objects into contiguous memory locations. This method eliminates fragmentation and does not incur the extra space cost of stop-and-copy.

Example Garbage Collection

An example of garbage collection is a course being cancelled at Napier. All students on the course can either be relocated to other courses or leave (in this scenario we ignore the students who leave), and the course attribute of each student object is changed to the new course.
Once all the objects representing the members of the course have been updated, no references are left from the student objects to the old course object. It can then be removed from the object holding the set of courses and, as no references to it remain, it becomes unusable data (garbage).

Figure 2 shows an object graph with garbage and non-garbage objects.

Figure 2

References

- Doug Barry, ODMG 2.0: A Standard for Object Storage, Component Strategies (July 1998). http://www.odmg.org/library/readingroom/Article%20%20Components%20Strategies%20-%20July98.html
- Vern Martin, Garbage Collection In Java, December 2, 1997. http://trident.mcs.kent.edu/~vmartin/proj/proj.html
- A presentation on mark-and-compact. http://cselab.snu.ac.kr/project/ipctv/javagc/gcoverview/sld010.htm
- Claude Delobel, Databases: From Relational to Object Oriented Systems (1991).
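
Returning to the mark-and-sweep stages and the cancelled-course example, the following is a minimal sketch of how the two stages might be written, assuming a toy object store held in a Python list. The Obj class, the mark and sweep functions and all object names are hypothetical and are not taken from any particular OODBMS. Two unreachable objects that refer to each other are included to show a case that plain reference counting would miss.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: objects are compared by identity
class Obj:
    name: str
    refs: list = field(default_factory=list)


def mark(roots):
    """Mark stage: collect the ids of every object reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in marked:
            continue
        marked.add(id(obj))
        stack.extend(obj.refs)
    return marked


def sweep(store, marked):
    """Sweep stage: keep marked objects, drop everything else."""
    return [obj for obj in store if id(obj) in marked]


# Hypothetical object graph for the cancelled-course scenario.
old_course = Obj("Old Course")
new_course = Obj("New Course")
courses = Obj("courses", [old_course, new_course])   # persistent root
alice = Obj("Alice", [old_course])
bob = Obj("Bob", [old_course])
students = Obj("students", [alice, bob])             # persistent root

# Unreachable cycle: each keeps the other's reference count above zero.
x = Obj("x")
y = Obj("y", [x])
x.refs.append(y)

store = [courses, students, old_course, new_course, alice, bob, x, y]
roots = [courses, students]

# Students are relocated, but the set of courses still refers to the old course.
alice.refs = [new_course]
bob.refs = [new_course]
live_before = [o.name for o in sweep(store, mark(roots))]

# The old course is removed from the set of courses, so nothing refers to it.
courses.refs.remove(old_course)
store = sweep(store, mark(roots))
live_after = [o.name for o in store]
```

In this sketch the old course survives the first collection because the set of courses still refers to it, and disappears only after that last reference is removed. The cycle of x and y is collected in both runs because neither object is reachable from a root.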