Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Developing ObjectRelational Database Applications Paul Brown Chief Plumber INFORMIX Software What We Will Talk About • Plenty of Focus on Features & Functionality • Question is, “How to use it all?” Object-Relational technology makes a fresh look at the process of database analysis and design useful. • What is different, what is the same? • Many things are the same . . . • . . . some things are new. Developers using an ORDBMS benefit by taking a more holistic approach to development than was common with SQL-92 DBMS products. Informix user.conference 2 Overview of Tutorial • Design Objectives of Databases • How are OR Database Similar/Different? • What about the theory? • Methodology Description • • • • Steps of a development procedure Analysis and design Implementation advice Examples as we go through • Architecture Advice • Back-end and middle-ware • Component-centric development 3 Informix user.conference Design Objectives Meet the Needs of End Users • Complete • The Database needs to model everything that is relevant to the problem domain (PD). • Correct • The Database needs to be an accurate model of the PD. • Consistent • The database model should not contain redundant data. • Flexible • The database should adapt to changing requirements. • Efficient • Fast, scaleable and simple to administer. Informix user.conference 4 Object-Relational Data Model • Based on Relational Model of Codd [1971] • Relation as storage abstraction and organizing principle • Declarative programming as interface model • An Object-Relational DBMS Provides • Physical abstraction (no byte level details for ORSQL) • Formal modeling techniques (normalization) 20 years of R&D have addressed most performance, reliability and scalability issues. Informix user.conference 5 Theoretic Overview ( 1/2 ) • Domains ( kinds or classes of data objects) • { a, b, c . . . } • Attributes (named columns of a domain) • { A, B, C . . . } • Dom ( A ) a • Relation Schema (set of attributes) • { . . . X, Y, Z } where X {A, B, C . . } • Relation (set of n-ary tuples over a relation schema) • { . . P, Q, R . . } For P over X, P Dom ( Xi ) • We also say t: X Dom ( X ) Informix user.conference 6 Theoretic Overview ( 2/ 2 ) Operations to Manipulate Relations • Projection (select sub-set of columns from relation) • Given P over X, and Y X, < P, Y > Q over Y s.t. t P, t’ Q where t ( A ) = t’ ( A ) A Y. • Restriction (select sub-set of tuples from relation ) • Given P over X, and ( X ), < P, > Q over X s.t. t Q, (t) true. • Join ( cross-product of two relations ) • Given P over X, and Q over Y, < P, Q > R over ( X Y ) s.t. t R, t’ P and t’’ Q where t [ X ] = t’, and t [ Y ] = t’’. • Plus (Aggregate, Intersection, Division, Union . .) Informix user.conference 7 Enough Greek! What does it look like in practice? • Declarative language programming • Dynamic, run-time interpretation CREATE TABLE Employees ( Id Employee_Number PRIMARY KEY, Name PersonName NOT NULL, Date_of_Birth DATE NOT NULL, Address MailAddress NOT NULL, LivesAt st_point NOT NULL, Resume Document NOT NULL, Voice_Key Audio_Recording NOT NULL, Holidays SET ( Period NOT NULL ) ); How do you get to here, starting with a bunch of users, a small herd of programmers, and a budget. Informix user.conference 8 Development Road-map 1. Analyze the Problem Domain • Document the high level structure (E-ER) • Describe each kind of data (UML Class Diagrams) 2. Design and Implementation • Use User-Defined Types (UDT) and User-Defined Functions (UDF) for Objects • Combine Objects into Schema Tables/Views • Queries to map to user views 3. Verify Each Step • Prototype UDT/UDFs • Normalize • Test performance against production scale loads Informix user.conference 9 Outline of Procedure: 3 Tier Model . CREATE OPAQUE TYPE PersonName ( internallength = variable, maxlen = 64 ); CREATE FUNCTION getFamilyName ( . . CREATE TABLE Employees ( Id Employee_Id PRIMARY KEY, Dept Dept_Id REFERENCES Dept ( Id ), Name PersonName NOT NULL, DOB DATE NOT NULL . ); . . 1. Conceptual 2. Logical 3. Physical Informix user.conference 10 Phase 1: Conceptual Modeling • Describe User Views of Problem Domain • Pictures instead of words • Multiple Semantic Model Possibilities • Extended Entity-Relationship (E-ER) Modeling • Object-Role Modeling (ORM or NIAM) • Universal Modeling Language (UML) • Use E-ER in this Tutorial • Familiar to most developers and analysts • Concepts introduced here apply everywhere Informix user.conference 11 E-ER Diagrams for Dummies Entity: Principle data objects in the problem domain. Typically identified with a noun term; Branch, Employee, etc Branch Branch Product Employees Branch has_a Relationship: Association between Entities. Typically identified with a verb term; Manufacturers, WorksFor, etc. has_a relationships can have additional attributes. Full_Time Production 12 Sales is_a Relationship: Inheritance relationship between Entities. Informix user.conference 1.1 Describe High-Level E-ER worked for Branch Employees When Contractors manufactures Full_Time Customer Production Product Sales sold Qty Informix user.conference 13 1.2 Describe Each Entity Employee Products Customer Name Kind Name Kind Name Kind Id Employ_Num Id Cust_Num Id UPC_Code Name Person_Name Contact Person_Name Name LabelString Address Mail_Addr Address Deliver_Addr Dimensions Physical_Size Resume Document Bill_Address Mail_Addr Capacity Mass Lives_At Geo_Point Available_In SET{ Color } Date_of_Birth DATE Price Currency For each entity, name its attributes, and decide what kind of data is in each of them. Informix user.conference 14 E-ER Modeling Review • This is Really Nothing New • Better modeling of is_a relationships (inheritance) • What Other Tasks? • Note Logical Keys in Entities • Record arity of has_a relationships (1:N, N:M, etc) • Record mandatory and optional relationships • What is New? Resist the temptation to analyze down to SQL-92 types. Within E-ER modeling, maintain a degree Informix of abstraction. user.conference 15 Handling Multiple Conceptual Views • Multiple Semantic Model Possibilities • Extended Entity-Relationship Modeling • Object-Role Modeling • Universal Modeling Language • What about their Object Model? User Object Models often contain elements of interest to OR database developers. A UDT defined in the database need not be used in a table. Informix user.conference 16 The ‘Object’ Concept “The most brutalized term in computer science” • What is an ‘Object’? • Atomic unit of meaning encapsulating both state and behavior. • Objects mostly map to domains in Relational Theory • Why is the Concept Useful? • Intuitive way of describing ‘things’ in the data model • These intuitions can drive user-defined type and user-defined function design. Informix user.conference 17 1.3 Catalog the Kinds of Data Product_Number Color Geo_Point Currency Product String Employee_Num Document Bonus_Plan Employees Physical_Size Person_Name Customer_Num Period Full_Time_Emp Mass Mail_Address Delivery_Address Quantity Sales Create a list of all of the different kinds of data identified in the schema and in the object models. Some of these will be in more than one entity. Informix user.conference 18 1.4 Use UML Class Diagrams Mail_Address Delivery_Address + Address_Line_One String + Address_Line_One String + Address_Line_Two String + Address_Line_Two String + City String + City String + State String + State String + ZipCode + Country Zip_Code String + Mail_Address(String, String, String, Zip_Code) + approximateMatches () + Label () + Equal () - Compare() + ZipCode + Country Zip_Code + Delivery_Notes Document String + Delivery_Address(String, String, String, Zip_Code, Document) + approximateMatches () + Label () + Equal () - Compare() Informix user.conference 19 Class Diagram Details State Elements (Data Structure) Constructors Extract Data Modifiers Operators Kind of Data in Element GeoPoint - Longitude FLOAT - Latitude FLOAT - Ellipsoid Ellispoid_Enum + GeoPoint ( String ) -> GeoPoint, + GeoPoint( FLOAT, FLOAT) -> GeoPoint, + Latitude ( GeoPoint ) -> FLOAT, + Longitude ( GeoPoint ) -> FLOAT + Distance ( GeoPoint, GeoPoint ) -> FLOAT + Quadrant ( GeoPoint ) -> CHAR(2) + SetLatitude( FLOAT ) -> GeoPoint + SetLongitude ( FLOAT ) -> GeoPoint [ Spatial Operators, R-Tree Support ] [ Compare(), Equal, NotEqual ] Interface Methods Why These? Informix user.conference 20 Implementation Tip # 1 • Object Classes are used in OR-SQL queries • SELECT DISTINCT, UNION are SQL operations • Merge-Join and Hash-Join for internal efficiencies Always create a Compare() for any object that has an Equal(). The resulting order might be meaningless, but within the context of the query processor it can make sense. Informix user.conference 21 1.5 Minimize Set of Object Classes • Use Detailed Specifications to Identify • synonyms and • antonyms • Identify ‘like’ Object Classes • Inheritance allows re-use • { Mail_Address, Delivery_Address } • Describe Algorithms for all Behaviors • What does it mean to have ‘Equal’ UPC? • Space efficient internal representation Informix user.conference 22 Workload and Common Queries • Document Common Queries and Processes • Helps to flesh out Object Class specifications • Useful for identifying additional kinds of data • Useful for later functionality/scalability testing SELECT E.Name FROM Employees E WHERE Birthday(E.Date_of_Birth) = Birthday(TODAY); Birthday + Month INTEGER + Day INTEGER - From_Leap_Year Note: Types don’t have to be used to define tables. They can be used to extend SQL. 23 boolean + Birthday( DATE ) -> Birthday , + Print( Birthday ) -> String, + Equal( Birthday,Birthday)-> Informix boolean . etc . user.conference By this Point you Out to Have: • An E-ER Model Representing High Level View • List of entities, and their structure • Understanding of relationships among entities • List of rules-- keys, constraints etc • Class Diagrams of each Distinct Kind of Data • Described in terms of Structure, and Behavior • Analysis of algorithms in methods Next Step: Transform this conceptual model into a sound OR database design and implementation. Informix user.conference 24 Phase 2: Database Implementation • UDTs and UDFs for each Domain • Use DataBlade extension libraries • Develop business objects from scratch • Build Relational Schema • Naïve translation of E-ER conceptual model • Normalize the initial model • Some special considerations • Performance and Scalability Testing • Create volume data quickly • Workload test harness Informix user.conference 25 2.1 Implement Domains/Kinds of Data • Several Mechanisms to Choose From • Do I buy a DataBlade™ product? • Do I build objects from scratch? • What about the client side? • If Build, Which Technique? • Built-in type, ROW TYPE, DISTINCT TYPE, OPAQUE TYPE • Each mechanism has different properties • How to Implement the UDFs? • SPL, Java or ‘C’? Informix user.conference 26 Implementation Options Mechanism Example Built-In Types Strengths and Weaknesses INTEGER, VARCHAR, DATE etc. These Mature and high performance because they are standardized in the SQL-92 language are compiled into the ORDBMS. But they specification. are very simple. Good building-blocks for other types. DISTINCT CREATE DISTINCT TYPE Simple to create, and useful when what you String AS VARCHAR(32); want is something very close to another type. ROW TYPES CREATE ROW TYPE Address ( Relatively easy to use means of combining Address_Line_One String NOT NULL, pre-existing types into a more complex Address_Line_Two String NOT objects, and enforcing rules about contents. NULL, ROW TYPEs have several drawbacks that City String NOT NULL, makes them a poor choice for types to define State String, columns. ZipCode PostCode, Country String NOT NULL ); Java Classes Combination of Java UDRs with opaque More complex to develop, but an excellent data storage. choice when you want code that runs in both the outside, and inside, the DBMS. OPAQUE CREATE OPAQUE TYPE GeoPoint ( Most complex to develop, but these are the TYPES internallength = 16 most powerful in terms of performance, Informix ); scalability and the range ofuser.conference object sizes that can be supported. 27 Implementation Tip # 2 • DISTINCT TYPES Re-Use Pre-Existing Types • Create EXPLICIT Cast between Parent and DISTINCT • Overload that CAST to Enforce Data Integrity CREATE DISTINCT TYPE Quantity AS INTEGER; -CREATE FUNCTION INTEGER2Quantity ( Arg1 INTEGER ) RETURNS Quantity IF ( Arg1 < 0 ) THEN RAISE EXCEPTION –746,0, “Error: Quantity must be non-negative”; END IF; RETURN Arg1::Quantity; END FUNCTION; -DROP CAST ( INTEGER AS Quantity ); Informix CREATE IMPLICIT CAST user.conference ( INTEGER AS Quantity WITH INTEGER2Quantity ); 28 UDF Implementation Performance • Choice of Three Languages • Stored Procedure Language, Java and ‘C’ • Experiment: Pop( INTEGER ) and IsPrime ( INTEGER ) Seconds to Compute 10,000 Performance of UDR Languages 1000 'C' Prime Performance 800 'C' Pop Performance 600 Java Prime Performance 400 Java Pop Performance SPL Prime Performance 200 SPL Pop Performance 0 4 5 6 7 8 9 10 Log ( Size ( Value s ) ) Informix user.conference 29 Implementation Tip # 3 • Avoid SQL Callbacks Inside User-Defined Functions • UDFs with SQL are variant and can’t be parallelized • SQL inside a UDF kind of violates the philosophy CREATE OPAQUE TYPE Mass ( internallength = 16, alignment = 8 ); --- The code implementing this external function includes all -- of the conversion rates compiled into it. This permits it -- to perform conversion operations hyper-fast. -CREATE FUNCTION Compare (Mass ,Mass) RETURNS INTEGER WITH ( NOT VARIANT, PARALLELIZABLE ) EXTERNAL NAME ”$INFORMIXDIR/extend/Massbin/Mass.bld(MassCompare)"Informix user.conference LANGUAGE C; 30 2.2 Create the Schema Tables • Combine the Types into Schema Tables • Simply CREATE TABLE if it is not in hierarchy • Create Hierarchy of ROW TYPE instances if it is • For simple has_a relationships, extend the dependent table with additional column for the first table’s primary key column. • For multi-variant has_a relationships, create a table with one column per key, and extra columns Similar procedure to the one used transforming ER to RDBMS, except in the handling of Inheritance. Informix user.conference 31 A Note on Keys • The Relational Key Concept is Important • Identifies correctness constraints within schema • Drives normalization, which helps ensure consistency • Formerly, a Key is defined as: A column where no two values are equal. Products Id B-SH-3001 B-P-312-A B-P-312-L B-PACK-411 Name Shoe Box Small Pizza Large Pizza Computer Dimensions ’18 IN x 6 IN x 6 IN’ ’21 IN x 21 IN x 2 IN’ ’26 IN x 26 IN x 2 IN’ ’40 IN x 40 IN x 40 IN’ Capacity ’11 LB’ ‘3 LB’ ‘4 LB’ ’50 LB’ Available_In {white, black} {brown} {brown} {brown,white} Price $US 0.12 $US 0.25 $US 0.25 $US 5.00 Informix user.conference 32 Example Schema (1/3) -CREATE TABLE Products ( Id Product_Num NOT NULL PRIMARY KEY, Name LabelString NOT NULL, Dimensions Physical_Size NOT NULL, Capacity Mass NOT NULL, Available_In SET{Color NOT NULL}, Price Currency NOT NULL ); -CREATE TABLE Customers ( Id Customer_Num NOT NULL PRIMARY KEY, Contact Person_Name NOT NULL, Address Mail_Address NOT NULL, Delivery Delivery_Address ); Informix user.conference 33 Example Schema (2/3) -CREATE ROW TYPE Sales_Employee_Type ( Base_Salary Currency NOT NULL, Sales_YTD Currency NOT NULL ) UNDER Full_Time_Employee_Type; -CREATE TABLE Sales_Reps USING Sales_Employee_Type UNDER Full_Timers; -CREATE TABLE Product_Sales_to_Customers ( Customer Customer_Num NOT NULL REFERENCES Customers ( Id ), Product Product_Num NOT NULL REFERENCES Products ( Id ), Sales_Rep Employee_Num NOT NULL REFERENCES Sales_Reps ( Id ), Qty_Sold Quantity NOT NULL ); Informix user.conference 34 Example Schema (3/3) --- Query against the Product.Capacity column showing -- behavior of the Mass object class invoked. -SELECT P.Id, P.Capacity FROM Products P Mass WHERE P.Capacity > ’3 KG’ ORDER BY P.Capacity ASC; { Ounce, LB, Gram, KG . .} - Unit - Quantity Id::Product_Num Capacity::Mass B-SH-3001 B-PACK-411 35 ’11 LB’ ’50 LB’ float + Mass( String ) -> Mass ,+ Print( Mass ) -> String, + Equal( Mass, Mass ) -> boolean, + LessThan( Mass, Mass ) -> boolean, + LessThanOrEqual( Mass, Mass ) ->boolean, + GreaterThan( Mass, Mass ) ->boolean, + GreaterThanOrEqual( Mass, Mass ) -> boolean, + NotEqual( Mass, Mass ) -> boolean, - Compare( Mass, Mass ) -> INTEGER, + Plus( Mass, Mass ) ->Mass, + Minus( Mass, Mass ) ->Mass, Informix + Divide ( Mass, INTEGER ) ->Mass, user.conference + Times ( Mass, INTEGER ) -> Mass 2.3 Normalization • Normalization Concept & Procedure Same • Key, Whole Key, Nothing but the Key • Keys can be User-defined Types (of course) • New “corner cases”: spatial key (topography), temporal key • What about COLLECTION Attributes? • Good for domains of limited cardinality • { Work Days, Sizes_Available, Other_Phone_Nums } • Bigger sets, bigger domains, less useful • No Indexing! • Nested Tables are Right Out. COLLECTION works well for UDF varargs. Informix user.conference 36 2.4 Load and Test (and test) • Can Use ORDBMS to Generate Data INSERT INTO Products ( Name, Dimensions, Capacity, Available_In, SELECT MakeString( Random(1000), 24 ), Random_Dimensions(), Random_Mass(), Random_Color_Set(), Random_Currency() FROM TABLE(SET{0,1,2,3,4,5,6,7,8,9}) N1 ( TABLE(SET{0,1,2,3,4,5,6,7,8,9}) N2 ( TABLE(SET{0,1,2,3,4,5,6,7,8,9}) N3 ( Price ) Num ), Num ), Num ); • Physical Tuning • IDS.2000 Little Different from 7.X • Remember to partition SBLOB data, as well as table data (text indexing particularly!) Informix user.conference 37 Organizing Development Teams • Developers Writing Extensions Get own Instance • Writing UDFs means debugging, which will halt the IDS instance. • Testing with the DBMS • Development Staging Process • Individual Instances, to Test Environment • only released to Production when happy. • Looks like Mainframe development. • Prototyping an Annealing • SPL pass 1, Java or ‘C’ for what’s slow Informix user.conference 38 Architectural View JDBC, ODBC, ESQL/C, RMI, Foo-ML, etc Java, C, etc Application Process JDBC, ODBC, ESQL/C, RMI Java or SPL Application Process UDF ‘C’, SPL, Java SQL JDBC (if Java) SQL Types in Schema Built-in, DISTINCT, or OPAQUE UDF ‘C’, SPL, Java Informix user.conference 39 Err..What was that? The Following Principles Seem to Work: • No SQL Inside a UDT Method • Objects are orthogonal to schema structure • Do not put SQL inside ‘C’ User-defined Routines • No performance advantage • Makes development hard • UI calls App Logic calls SQL Still Applies • Impedance mis-match between SQL DBMS and languages like Java are minimized. Informix user.conference 40 Implementation Tip # 4 • User-defined Functions Work in SPL • Any ‘C’ or Java (or SPL) UDF can be called as subroutine • Most useful with iterator functions CREATE FUNCTION GetFirstName ( Arg1 LVARCHAR ) RETURNING lvarchar; DEFINE lIntVals LIST(lvarchar NOT NULL); DEFINE lvRetVal lvarchar; LET lIntVals = Split(Arg1, ' ' ); FOREACH cursor1 FOR SELECT * INTO lvRetVal FROM TABLE(lIntVals) EXIT FOREACH; END FOREACH; RETURN lvRetVal; END FUNCTION; -EXECUTE FUNCTION GetFirstName (“John Fitzgerald Kennedy”) John Informix user.conference 41 On the Client Side • 4GL is Perfectly Reasonable for ORDBMS • OR-SQL queries can take & return strings & numbers, and can invoke stored procedures. • ‘C’, C++ • ESQL/C, ODBC, CLI are workable • DataBlade™ products ship with client libraries and header files • Java • JDBC 2.0 is majestic mojo • Type <-> Class mapping can make the interaction seamless Informix user.conference 42 ‘Organic’ Applications Another way to look at Database Development • What is an ORDBMS? • An ORDBMS is a Software Back-Plane . . . • supporting a declarative/query-centric interface. • How do you use it? • Combine objects to solve problems. • No more system-wide upgrades. • Component-centric development. Informix user.conference 43 Summary and Conclusions • ORDBMS Development Is • Best approached in a ‘holistic’ fashion. • Consider the design of at high-level (relational) • and low-level (objects and components) • More Implementation Tips http://www.informix.com/idn http://www.iiug.org/software http://examples.informix.com/ Informix user.conference 44 References and Further Reading INFORMIX Tech Notes: Q1, Q2 of 2000 contains two long papers describing this in more detail. Roy, Jacques Server-Side Programming in C: INFORMIX Dynamic Server 2000 INFORMIX Press. 1999 And lastly! Brown, Paul G. Developing Object-Relational Database Applications INFORMIX Press. Dec, 2000. Informix user.conference 45