Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Object-Oriented Database • • • • • New Database Applications Object-Oriented Data Models Object-Oriented Languages Persistent Programming Languages Persistent C++ Systems CIS-552 Introduction 1 New Database Applications • Data models designed for data-processing-style applications are not adequate for new technologies such as computer-aided design, computer-aided software engineering, multimedia, and image database, and document/hypertext databases. • These new applications requirement the database system to handle features such as: – Complex data types – Data encapsulation and abstract data structures – Novel methods for indexing and querying CIS-552 Introduction 2 Object-Oriented Data Model • Loosely speaking, an object corresponds to an entity in the E-R model. • The object-oriented paradigm is based on encapsulating code and data related to an object into a single unit. • The object-oriented data model is a logical model (like the E/R model). • Adaptation of the object-oriented programming paradigm (e.g. Smalltalk, C++) to database systems. CIS-552 Introduction 3 Object Identity • An object retains its identity even if some or all of the values of the variables or definitions of methods change over time. • Object identity is a stronger notion of identity than in programming languages or data models not based on object orientation. – Value – data value; used in relational systems. – Name – supplied by user; used for variables in procedures. – Build-in – identity built into data model or programming language • No user-supplied identifier is required. • Form of identity used in object-oriented systems. CIS-552 Introduction 4 Object Identifiers Object identifiers used to uniquely identify objects – Can be stored as a field of an object, to refer to another object. – E.g., the spouse field of a person object may be an identifier of another person object – Can be system generated (created by database) or external (such as social-security number) CIS-552 Introduction 5 Object Containment bicycle wheel rim spokes tire brake lever pad gear frame cable • Each component in a design may contain other components • Can be modeled as containment of objects. Objects containing other objects are called complex or composite objects. • Multiple levels of containment create a containment hierarchy: links interpreted as is-part-of, not is-a. • Allows data to be viewed at different granularities by different users. CIS-552 Introduction 6 Object-Oriented Languages • Object-oriented concepts can be used as a design tool, and be encoded into, for example, a relational database (analogous to modeling data with E/R diagram and then converting to a set of relations). • The concepts of object orientation can be incorporated into a programming language that is used to manipulate the database. – Object-relational systems – add complex types and object-orientation to relational languages. – Persistent programming languages – extend objectoriented programming language to deal with databases by adding concepts such as persistence and collections. CIS-552 Introduction 7 OO-DBMS • Save objects created by an OOP language to disk (make objects persistent). • Ensure that if an object is saved, all of the objects it references are saved. • Allow saved objects (and the objects they reference) to be retrieved from disk. • Provide transaction management and concurrency control to maintain data integrity. CIS-552 Introduction 8 Persistent Programming Language • Persistent programming languages: – Allow objects to be created and stored in a database without any explicit format changes (format changes are carried out transparently). – Allow objects to be manipulated in-memory – do not need to explicitly load from or store to the database. – Allow data to be manipulated directly from the programming language without having to go though a data manipulation language like SQL. • Due to power of most programming languages, it is easy to make programming errors that damage the database. • Complexity of languages makes automatic high-level optimization more difficult. • Do not support declarative querying very well CIS-552 Introduction 9 Persistence of Objects Approaches to make transient objects persistent include establishing persistence by: – Class – declare all objects of a class to be persistent; simple but inflexible. – Creation – extend the syntax for creating transient objects to create persistent objects. – Marking – an object that is to persist beyond program execution is marked as persistent before program termination. – Reference – declare (root) persistent objects; objects are persistent if they are referred to (directly or indirectly) from a root object. CIS-552 Introduction 10 Object Identity and Pointers • A persistent object is assigned a persistent object identifier. • Degrees of permanence of identity: – Intraprocedure – identity persists only during the execution of a single procedure. – Intraprogram – identity persists only during execution of a single program or query. – Interprogram – identity persists from one program execution to another. – Persistent – identity persists through program executions and structural reorganizations of data; required for object-oriented systems. CIS-552 Introduction 11 Object Identity and Pointers (Cont.) • In O-O languages such as C++, an object identifier is actually an in-memory pointer. • Persistent pointer – persists beyond program execution; can be thought as a pointer into the database. CIS-552 Introduction 12 Storage and Access of Persistent Objects How to find objects in the database: • Name objects (as you would name files) – cannot scale to large number of objects. – Typically given only to class extents and other collections of objects, but not to objects. • Expose object identifiers or persistent pointers to the objects – can be stored externally. – All objects have object identifiers. CIS-552 Introduction 13 Storage and Access of Persistent Objects (Cont.) How to find objects in the database (Cont): • Store collections of objects and allow programs to iterate over the collections to find required objects. – Model collections of objects as collection types – Class extent – the collection of all objects belonging to the class; usually maintained for all classes that can have persistent objects. CIS-552 Introduction 14 Persistent C++ System • C++ language allows support for persistence to be added without changing the language – declare a class called Persistent_Object with attributes and methods to support persistence – Overloading - ability to redefine standard function names and operators (i.e., +, -, the pointer dereference operator ) when applied to new types • Providing persistence without extending the C++ language is – relatively easy to implement – but more difficult to use CIS-552 Introduction 15 ODMG C++ Object Definition Language • Standardized language extensions to C++ to support persistence • ODMG standard attempts to extend C++ as little as possible, providing most functionality via template classes and class libraries • Templates class Ref<class> used to specify references (persistent pointers) • Template class Set<class> used to define sets of objects. Provides methods such as insert_element and delete_element. • The C++ object definition language (ODL) extends the C++ type definition syntax in minor ways. Example: Use notation inverse to specify referential integrity constraints. CIS-552 Introduction 16 ODMG C++ ODL: Example Class Person : public Persistent Object { public: String name; String address; }; class Customer : public Person { public: Date member_from; int customer_id; Ref<Branch> home_branch; Set<Ref<Account>> accounts inverse Account::owners; }; CIS-552 Introduction 17 ODMG C++: Example (Cont.) Class Account : public Persistent_Object { private: int balance; public: int number; Set<Ref<Customer>> owners inverse Customer::accounts; int find_balance(); int update_balance(int delta); } CIS-552 Introduction 18 ODMG C++ Object Manipulation Language • Uses persistent versions of C++ operators such as new(db). Ref<Account> account = new(bank_db) Account; new allocates the object in the specified database, rather than in memory • Dereference operator when applied on a Ref<Customer> object in memory (if not already present) and returns in-memory pointer to the object. • Constructor for a class – a special method to initialize objects when they are created; called automatically when new is executed • Destructor for a class – a special method that is called when objects in the class are deleted. CIS-552 Introduction 19 ODMG C++ OML: Example int create_account_owner(String name, String address) { Database * bank_db; bank_db = Database::open(“Bank-DB”); Transaction Trans; Trans.begin(); Ref<Account> account = new(bank_db) Account; Ref<Customer> cust = new(bank_db) Customer; cust->name = name; cust->address = address; cust->accounts.insert_element(account); account->owners.insert_element(cust); … Code to initialize customer_id, account number, etc. Trans.commit(); } CIS-552 Introduction 20 ODMG C++ OML: Example of Iterators int print_customers() { Database * bank_db; bank_db = Database::open(“Bank-DB”); Transaction Trans; Trans.begin(); Iterator<Ref<Customer>> iter = Customer::all_customer.create_iterator(); Ref<Customer> p; while (iter.next(p)) { print_cust(p); } Trans.commit(); } • Iterator construct helps step through objects in a collection CIS-552 Introduction 21 Mapping of Objects to Files • Mapping objects to files is similar to mapping tuples to files in a relational system; object data can be stored using file structures. • Objects in O-O databases may lack uniformity and may be very large; such objects have to be managed differently from records in a relational system. – Set fields with a small number of elements may be implemented using data structures such as linked lists. – Set fields with a larger number of elements may be implemented as B-trees, or as separate relations in the database. – Set fields can also be eliminated at the storage level by normalization. CIS-552 Introduction 22 Mapping of Objects to Files (Cont.) • Objects are identified by an object identifier (OID); the storage system needs a mechanism to locate an object given its OID. – logical identifiers do not directly specify an object’s physical location; must maintain an index that maps an OID to the object’s actual location. – physical identifiers encode the location of the object so the object can be found directly. Physical OIDs typically have the following part: 1. a volume or file identifier 2. a page identifier within the volume or file 3. an offset within the page CIS-552 Introduction 23 Management of Persistent Pointers • Physical OIDs may have a unique identifier. This identifier is stored in the object also and is used to detect references via dangling pointers. Object Physical Object Identifier Vol. Page Unique-Id Offset Unique-Id Data …… (a) General Structure Location 6.32.45608 Unique-Id 51 Data … data … Good OID 6.32.45608 51 Bad OID 6.32.45608 50 (b) Example of use CIS-552 Introduction 24 Management of Persistent Pointers (Cont.) • Implement persistent pointers using OIDs; persistent pointers are substantially longer than are in-memory pointers • Pointer swizzling cuts down on cost of locating persistent objects already in memory. • Software swizzling (swizzling on pointer dereference) – When a persistent pointer is first dereferenced, it is swizzled (replaced by an in-memory pointer) after the object is located in memory. – Subsequent dereferences of the same pointer become cheap – The physical location of an object in memory must not change if swizzled pointers point to it; the solution is to pin pages in memory – When an object is written back to disk, any swizzled pointers it contains need to be unswizzled. CIS-552 Introduction 25 Hardware Swizzling • Persistent pointers in objects need the same amount of space as in-memory pointers – extra storage external to the object is used to store rest of pointer information. • Uses virtual memory translation mechanism to efficiently and transparently convert between persistent pointers and in-memory pointers. • All persistent pointers in a page are swizzled when the page is first read in. – Thus programmers have to work with just one type of pointer, i.e. in-memory pointer. – Some of the swizzled pointers may point to virtual memory addresses that are currently not allocated any real memory. CIS-552 Introduction 26 Hardware Swizzling • Persistent pointer is conceptually split into two parts: a page identifier, and an offset within the page. – The page identifier in a pointer is a short indirect pointer: each page has a translation table that provides a mapping from the short page identifiers to full database page identifiers. – Translation table for a page is small (at most 1024 pointers in a 4096 byte page with 4 byte pointers) – Multiple pointers in a page to the same page share same entry in the translation table. CIS-552 Introduction 27 Hardware Swizzling (Cont.) Page ID Off. 2395 255 Page ID Off. 4867 020 Page ID Off. 4867 170 Object 1 Object 2 Object 3 PageID Translation Table 2395 4867 FullPageID 679.34.28000 519.56.84000 • Page image when on disk (before swizzling) CIS-552 Introduction 28 Hardware Swizzling (Cont.) • When an in-memory pointer is dereferenced, if the operating system detects the page it points to has not yet been allocated storage, a segmentation violation occurs. • mmap call associates function to be called on segmentation violation • The function allocates storage for the page and reads in the page from disk. • Swizzling is then done for all persistent pointers in the page (located using object type information). – If pointer points to a page not already allocated a virtual memory address, a virtual memory address is allocated (preferably the address in the short page identifier if it is unused). Storage is not yet allocated for the page. – The page identifier in pointer (and translation table entry) are changed to the virtual memory address of the page. CIS-552 Introduction 29 Hardware Swizzling (Cont.) Page ID Off. 5001 255 Page ID Off. 4867 020 Page ID Off. 4867 170 Object 1 Object 2 Object 3 PageID Translation Table 5001 4867 FullPageID 679.34.28000 519.56.84000 Page image after swizzling • Page with short page identifier 2395 was allocated address 5001. Observe change in pointers and translation table. • Page with short page identifier 4867 has been allocated address 4867. No change in pointer and translation table. CIS-552 Introduction 30 Hardware Swizzling (Cont.) • After swizzling, all short page identifiers point to virtual memory address allocated for the page – Functions accessing the objects need not know it has persistent pointers! – Can reuse existing code and libraries that use in-memory pointers. • If all pages are allocated the same address as in the short page identifier, no changes required in the page! • No need for deswizzling – page after swizzling can be saved back directly to disk • A process should not access more pages than size of virtual memory – reuse of virtual memory addresses for other pages is expensive. CIS-552 Introduction 31 Disk versus Memory Structure of Objects • The format in which objects are stored in memory may be different from the format in which they are stored on disk in the database. Reasons are : – software swizzling – structure of persistent and in-memory pointers are different – database accessible from different machines, with different data representations • Make the physical representation of objects in the database independent of the machine and the compiler. • Can transparently convert from disk representation to form required on the specific machine, language, and compiler, when the object (or page) is brought into memory. CIS-552 Introduction 32 Large Objects • Very large objects are called binary large objects (blobs) because they typically contain binary data. Examples include: – text documents – Graphical data such as images and computer aided designs – audio and video data • Large objects may need to be stored in a contiguous sequence of bytes when brought into memory. – If an object is bigger than a page, contiguous pages of the buffer pool must be allocated to store it. – May be preferable to disallow direct access to data, and only allow access through a file-system-like API, to remove need for contiguous storage. CIS-552 Introduction 33 Modifying Large Objects • Use B-tree structures to represent object: permits reading the entire object as well as updating, inserting, and deleting bytes from specified regions of the object. • Special-purpose application programs outside the database are used to manipulate large objects: – Text data treated as a byte string manipulated by editors and formatters. – Graphical data is represented as a bit map or as a set of geometric objects; can be managed within the database system or by special software (e.g. VLSI design). – Audio/video data is typically created and displayed by separate application software and modified using special purpose editing software. – checkout/checkin method for concurrency and version control CIS-552 Introduction 34