Approaches to Persistence in Java

Philip Johnson
Collaborative Software Development Laboratory
Information and Computer Sciences
University of Hawaii
Honolulu HI 96822

Part 1: Small Scale Persistence

Small scale persistence:
• dozens to hundreds of users
• single machine
• All persistent objects can fit in memory at once.
• No transaction, rollback, fail-over
• Cheap and fast to implement

Large scale persistence:
• thousands to millions of users
• clusters of machines, shared caching
• lazy/incremental loading of objects
• Transaction, rollback, fail-over required
• Costly and time-consuming to implement

Common motivations

Persistence across application restarts/failure
• Avoid data loss
Checkpoints
• Allow rollback to previous state
Transfer of data between multiple applications
• Synchronous vs. asynchronous
  - Example: network vs. file system
• Application-specific vs. portable
  - Example: Serialized object vs. XML
Caching of intermediate results
• Avoid computation loss

Some flavors of persistence

Simple, key-value:
• java.util.Properties
• java.util.prefs package
• JNDI
Object-based:
• java.io.Serializable
• JavaBean persistence
• Java Data Objects (JDO)
XML-based:
• JDOM
• JAXB
Database:
• Relational
• Object-oriented
Enterprise Java Beans:
• Entity & Session beans
• CMP & BMP

Persistence creates development issues

Persistence tends to slow down development.
• Adds cost & risk to major design changes.
• Tends to “lock in” early (bad) design decisions.
Why is persistence a problem?
• The Object-Relational Impedance Mismatch
• Multiple design issues and constraints
How can we maintain development velocity in the face of the need for persistence?
• A “Late-binding persistence” development strategy

Object-Relational Impedance Mismatch

Object paradigm:
• Networks of objects with state and behavior
• Processing via: traversal
• Classes, inheritance, polymorphism, etc.
Relational paradigm:
• Tables of entities with only data.
• Processing via: selection/joining of rows
• Tables, columns, keys, indices, etc.
The intrinsic differences between the paradigms create design problems.

Example

Consider a family tree.
Consider the query “Return all of the grandchildren of Family Member X”
Which representation would make this query easiest to implement?
• A network (tree) of family members
• A set of database tables and SQL statements

Addressing the OO-DB IM

1. Eliminate the OO:
• User interface manipulates SQL.
• Pros: single paradigm, simplicity
• Cons: complexity of “advanced” processing (stored procedures, etc.)
2. Eliminate the relational DB:
• OODBs, Serialized objects, etc.
• Pros: single paradigm, simplicity
• Cons: potential loss of relational data integrity (normalization)

Addressing the OO-DB IM

3. Hide the DB:
• Object-to-relational mappings, JDO, EJB...
• Pros: Allows use of back-end RDBMSs
• Cons: Complexity, lock-in, overhead
4. Stop whining and just deal with it:
• Manual mapping between objects and tables
• Pros: Flexibility
• Cons: Maintenance and complexity

Choice of persistence depends upon many design issues

Simplicity:
• How complicated to set up for me? For my users?
Financial cost:
• Do I have to pay for it? Do my users?
Data specificity:
• What kinds of data am I saving? Can I use a “special purpose” persistence mechanism?
Design lock-in:
• How much code will I have to change if I need to change my mechanism?
Longevity:
• How long must the persistent data exist?
Scalability:
• What usage level do I expect over the next six months?
Integrity:
• Do I require transactions? Rollback? Fail-over?
OO-Relational impedance mismatch:
• Do I mind the cost?
Optimization:
• Do I need something faster than a relational database?

One development approach: Late-binding persistence

Initial development: No persistence.
• Deploy initial versions to users on a “trial basis” with no persistence guarantees.
Early “live” releases: simple, “non-scalable”.
• Enable data migration.
• Determine true bottlenecks/integrity issues.
• Maintain application evolvability.
Ongoing development:
• Think about multiple persistence approaches.
• Example: Preferences + XML + RDBMS
• Each approach optimized to persistence requirements.
Applicability of this approach depends upon the nature of the system/requirements!

A bird's-eye view of selected persistence mechanisms

Preference and configuration data

java.util.Properties:
• Well known, easy to use
• No standards as to where data should reside
• Problems for backup, or transfer to other machines.
JNDI (Java Naming and Directory Interface):
• Back-end neutrality
• Large, complicated to set up
java.util.prefs (JDK 1.4):
• Back-end neutrality of JNDI
• Simplicity of java.util.Properties
• Can be invoked by multiple threads safely

Object-based persistence: java.io.Serializable

Pros:
• Converts an object (and all internal objects) into a stream of bytes that can be later deserialized into a copy of the original object (and all internal objects).
• Fast, simple, compact representation of an object graph.
• May be a great choice for *temporary* storage of data.
Cons:
• Creates long-term maintenance issues
• Harder to evolve objects and maintain backward compatibility with the serialized representation.
• See “Effective Java”, Chapter 10, for a good description of issues with Serialization

XML file-based persistence: JDOM and JAXB

Pros:
• Very high level of data portability.
• Simple
Cons:
• Space-inefficient
• Complex graph structures problematic.
For data structures:
• JDOM
For bi-directional object mappings:
• JAXB

Java-based RDBMS

Most important one is Derby.
• http://db.apache.org/derby
• Will be included in JDK 1.6!
Can run either ‘embedded’ in your application JVM or as a stand-alone network server.
• To embed, just add derby.jar to your classpath!
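
Before turning to frameworks, the two simplest mechanisms surveyed above (java.util.Properties for key-value data and java.io.Serializable for object graphs) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not part of the original slides: the class names SmallScalePersistenceSketch and FamilyMember are hypothetical, and in-memory streams stand in for the files a real application would use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class SmallScalePersistenceSketch {

    // Hypothetical domain class; the name and fields are illustrative only.
    static class FamilyMember implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<FamilyMember> children = new ArrayList<>();

        FamilyMember(String name) {
            this.name = name;
        }
    }

    // Key-value persistence: write the properties as text, then read them back.
    static Properties roundTripProperties(Properties original) {
        try {
            StringWriter out = new StringWriter();
            original.store(out, "sketch settings");
            Properties copy = new Properties();
            copy.load(new StringReader(out.toString()));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Object-based persistence: serialize the whole object graph to bytes,
    // then deserialize it into an independent copy.
    static FamilyMember roundTripSerialized(FamilyMember root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FamilyMember) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        settings.setProperty("theme", "dark");
        System.out.println(roundTripProperties(settings).getProperty("theme"));

        // Three generations: Alice -> Bob -> Carol.
        FamilyMember alice = new FamilyMember("Alice");
        FamilyMember bob = new FamilyMember("Bob");
        alice.children.add(bob);
        bob.children.add(new FamilyMember("Carol"));

        FamilyMember copy = roundTripSerialized(alice);
        System.out.println(copy.children.get(0).children.get(0).name);
    }
}
```

Note that the serialized form is tied to the class definition (here via serialVersionUID), which is the maintenance cost the Serializable slide warns about; the properties text has no such coupling but can only hold flat string pairs.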

Open Source Java Persistence Frameworks

Hibernate (www.hibernate.org):
• Object to RDBMS binding
• “Hibernate Query Language”
• Claims to be very fast, very scalable, very efficient.
• Most popular open source framework for object/relational mapping in Java.

Others

Enterprise Java Beans
• Public standard framework
• Simple reference implementation
• Support for clustering, fail-over, etc. in distributed applications.
Firestorm/DAO
• Automatically generates Java source code for accessing relational databases.

Things to think about

Sometimes simple is better
• Try the least complicated persistence mechanism first.
Sometimes you can mix and match
• Not all data must necessarily be persisted the same way
You can evolve your solution over time
• Especially if you design your system to encapsulate your persistence mechanism.

Things to think about

Your persistence strategy might depend on context:
Java First:
• You’ve developed/inherited some Java code and need someplace to store the objects.
  - Hibernate
Database First:
• You’ve developed/inherited a database and want to access it in Java.
  - Firestorm/DAO
Spaghetti Junction:
• You’ve inherited Java code and a DB and want to put the two together.
  - Uh oh.

Things to think about

IF: Client-side, single thread, simple structure, installation simplicity
• DB optional, consider XML.
IF: Multiple clients need access to data
• DB highly recommended
IF: Transaction support, fail-over, etc:
• DB required
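
The advice to encapsulate the persistence mechanism, together with the “late-binding persistence” strategy described earlier, might look roughly like the following in code. This is only a sketch under stated assumptions: the MemberStore interface, both implementations, and the greet method are hypothetical names invented for illustration. The application talks to the interface; an in-memory store serves initial development, and a java.util.Properties file store serves early “live” releases.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

class LateBindingPersistenceSketch {

    // Hypothetical interface: the only persistence API the application sees.
    interface MemberStore {
        void save(String id, String name);
        Optional<String> find(String id);
    }

    // Stage 1, initial development: no persistence; data is lost on restart.
    static class InMemoryMemberStore implements MemberStore {
        private final Map<String, String> members = new HashMap<>();

        public void save(String id, String name) {
            members.put(id, name);
        }

        public Optional<String> find(String id) {
            return Optional.ofNullable(members.get(id));
        }
    }

    // Stage 2, early "live" releases: simple, non-scalable file persistence.
    static class PropertiesFileMemberStore implements MemberStore {
        private final Path file;
        private final Properties members = new Properties();

        PropertiesFileMemberStore(Path file) {
            this.file = file;
            if (Files.exists(file)) {
                try (InputStream in = Files.newInputStream(file)) {
                    members.load(in);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }

        public void save(String id, String name) {
            members.setProperty(id, name);
            // Rewrite the whole file on every save: fine for small data only.
            try (OutputStream out = Files.newOutputStream(file)) {
                members.store(out, "family members");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public Optional<String> find(String id) {
            return Optional.ofNullable(members.getProperty(id));
        }
    }

    // Application code depends only on the interface, so the store behind it
    // can be replaced without touching this method.
    static String greet(MemberStore store, String id) {
        return store.find(id).map(name -> "Hello, " + name).orElse("Unknown member");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("members", ".properties");

        MemberStore store = new PropertiesFileMemberStore(file);
        store.save("m1", "Alice");

        // A fresh instance reading the same file simulates an application restart.
        MemberStore afterRestart = new PropertiesFileMemberStore(file);
        System.out.println(greet(afterRestart, "m1"));

        Files.deleteIfExists(file);
    }
}
```

A database-backed implementation (for example over JDBC, or generated by an object/relational tool) could later be added as a third MemberStore once the real bottlenecks and integrity requirements are known, with greet and the rest of the application unchanged.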