DAMA International Symposium & Wilshire Meta-Data Conference GETITLE 1
2004 May 2-6, Los Angeles, California

Conducting Database Design Project Meetings
---o---
Gordon C. Everest
Carlson School of Management, University of Minnesota
© 2004 [email protected]

Outline DBPROJ 2
• Database Design Project Meetings
  – Initial Meeting(s) followed by extended series of meetings
  – Process and Product
  – Explaining the Objective, Purpose, Principles, and Benefits of Data Modeling
  – Based on many actual experiences, but focusing on one
  – Concluding with Guidelines and Best Practices
Then expand our view:
• Interviews
• Accelerated Group Meetings
  – Comparative study
  – Lessons learned
• Advice from others
  – Simsion, Moody, Moriarty, Barden

Data Modeling Project: The Context DBPROJ 3
[Diagram: development phases, with feedback between them]
PLANNING & ANALYSIS
  Enterprise Data Model
  • Global view
  • Set Priorities
  Feedback: to flesh out piece by piece
PRIORITIES, SCOPE
  Carve out a piece:
  • Understandable
  • Doable
  • Priority
  • Greatest payoff
DESIGN (MODELING)
  DATA; PROCESS / BEHAVIOR; USER INTERFACE; PLATFORM
  "REPOSITORY" DESIGN DATABASE
CONSTRUCTION (GENERATION)
OPERATION & MAINTENANCE

The Process DBPROJ 4
• Global Architecture
  – Inventory and set priorities
• Choose a User Application Area
• Obtain User Top Management Support
  – INITIAL MEETING: with user area managers and experts, IS Dept representative, and database design expert/facilitator
  – Explain the project, process, deliverables, benefits, and expected project time duration (variable!)
  – Obtain required commitment of people
• Conduct Kickoff Training Session
• Begin an Extended series of Database Design Project Meetings

Initial Meeting(s) DBPROJ 5
First with User Top Management, then with User Domain Experts.
EXPLAIN THE FOLLOWING:
• Objective of Data Modeling
  – To accurately and completely model a chosen user domain
  – Within a defined Scope
• Purpose of Data Modeling
  – To understand the chosen user domain
  – Prelude to building a database
• The Benefits of Data Modeling
• The Process
  – Finding Entities (nouns) and Relationships (verbs)
  – Adding attributes (roles in relationships), and constraints
• The Product
  – Design documentation – diagram and supporting narrative
  – Notational scheme

Modeling DMOD 6
MODEL = Abstract (Re).present.(ation)
[Diagram: Reality (mental models) -> MODELING PROCESS -> MODEL]
• Re.present: Knowledge in the head -> Knowledge in the world
• Knowledge externalized, formalized, shared.
• What drives or guides the process?

The Modeling Process DMOD 7
[Diagram: Real World Universe of Discourse -> perception, selection/filtering -> MODELING PROCESS -> MODEL]
MODELING SCHEME = METHODOLOGY + REPRESENTATIONAL FORMS
• METHODOLOGY: Steps/Tasks + Milestones + Deliverables
• REPRESENTATIONAL FORMS: Narrative, Graphical Diagram, Formal Language Statements (the Syntax)
• Context, Constructs, Composition, Constraints

Data Modeling Constructs DMOD 8
What to look for:
• ENTITY (OBJECT)
  – characteristics: IDENTIFIER, ATTRIBUTE (Data Item)
• RELATIONSHIP
  – characteristics: [ FOREIGN KEY ]
Relative emphasis differentiates Data Modeling approaches, e.g. ER modeling focuses on Entities and Relationships, de-emphasizing or hiding Attributes.

Data Modeling Process DMOD 9
PERCEIVE -> Mental Model
EXTERNALIZE -> Conceptual Model (ORM/ER diagram)
  (map)
FORMALIZE -> Logical Data Model (relational tables)
IMPLEMENT -> Physical Model (define database to a DBMS)

Objective of Data Modeling DMOD 10
(WHAT we are trying to do)
TO ACCURATELY AND COMPLETELY MODEL SOME PORTION OF THE REAL WORLD UNIVERSE OF DISCOURSE (UoD) OF INTEREST TO SOME ORGANIZATION OR COMMUNITY OF USERS.
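
The "map" step between EXTERNALIZE and FORMALIZE on the Data Modeling Process slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's tooling: the `Entity` and `Relationship` classes, the `to_tables` function, and the CUSTOMER / ORDER / ITEM model are hypothetical names chosen for the sketch. It applies two standard mapping rules: a 1:M relationship becomes a foreign key in the child table, and an M:N relationship becomes a separate intersection table.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    identifier: str            # attribute that identifies each instance
    attributes: tuple = ()     # other descriptive attributes


@dataclass
class Relationship:
    name: str
    parent: str                # entity on the "one" side (either side for M:N)
    child: str                 # entity on the "many" side
    kind: str = "1:M"          # "1:M" or "M:N"


def to_tables(entities, relationships):
    """Map a conceptual model to relational tables: {table: [columns]}."""
    ids = {e.name: e.identifier for e in entities}
    tables = {e.name: [e.identifier, *e.attributes] for e in entities}
    for r in relationships:
        if r.kind == "1:M":
            # Rule 1: the child table carries the parent's identifier as a foreign key.
            tables[r.child].append(ids[r.parent])
        elif r.kind == "M:N":
            # Rule 2: an intersection table holds one foreign key per participant.
            tables[f"{r.parent}_{r.child}"] = [ids[r.parent], ids[r.child]]
        else:
            raise ValueError(f"unsupported relationship kind: {r.kind}")
    return tables


# Hypothetical conceptual model (illustrative names only).
entities = [
    Entity("CUSTOMER", "customer_number", ("customer_name",)),
    Entity("ORDER", "order_number", ("order_date",)),
    Entity("ITEM", "item_number", ("item_description",)),
]
relationships = [
    Relationship("places", parent="CUSTOMER", child="ORDER", kind="1:M"),
    Relationship("contains", parent="ORDER", child="ITEM", kind="M:N"),
]

logical_model = to_tables(entities, relationships)
for table, columns in logical_model.items():
    print(table, columns)
```

Keeping the conceptual model as plain data and the mapping as a separate function mirrors the slide's separation of stages: the conceptual description stays free of implementation detail, and the logical tables are derived from it.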
Purpose of Data Modeling DMOD 11
(WHY we do it)
DUAL, CONFLICTING PURPOSES DRIVE THE PROCESS:
PRIMARY (USER):
• Facilitate Human Communication, Understanding, Validation
  – capture and present meaning, the semantics of a model
  – direct representation of only essential model semantics
  PRESENTATION CHARACTERISTICS:
  – scoping and presenting subparts of a Model
  – unfolding presentation at different levels of abstraction or detail
  – visual prominence in proportion to semantic importance
SECONDARY (SCHEMA, DATABASE):
• Basis for Implementation - defining & creating a Database
  – complete in all the necessary details
  – construction/generation able to be fully automated

Purpose of Modeling DMOD 12
Satzinger2e, SA&D, Fig 5.2, p.149.
FIRST STEP in the DESIGN phase of Systems Development (BUILDING)*
• Capture semantics – all relevant, important details
• Document – record and remember
• Understand – learn, raise questions, record answers, refine
• Communicate – shared with all interested parties
  – Users, stakeholders, management, developers
• Validate – a complete and accurate representation
  – Internal validation – consistent with the modeling rules
  – External validation – Who can do this?
• Blueprint to Build
* Some say that Modeling begins in the Analysis phase.
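
The secondary purpose above asks for a model complete enough that construction can be fully automated. A rough sketch of that idea follows, assuming Python's standard `sqlite3` module stands in for the target DBMS. The `schema` dictionary, the `generate_ddl` helper, and the parcel / appraisal tables are hypothetical, invented for illustration; they are not the project's actual design.

```python
import sqlite3

# Hypothetical logical model: table -> list of (column, type, constraint).
schema = {
    "parcel": [
        ("parcel_id", "INTEGER", "PRIMARY KEY"),
        ("county", "TEXT", "NOT NULL"),
    ],
    "appraisal": [
        ("appraisal_id", "INTEGER", "PRIMARY KEY"),
        ("parcel_id", "INTEGER", "NOT NULL REFERENCES parcel(parcel_id)"),
        ("amount", "REAL", ""),
    ],
}


def generate_ddl(model):
    """Turn the model into CREATE TABLE statements, one per table."""
    statements = []
    for table, columns in model.items():
        cols = ", ".join(" ".join(part for part in col if part) for col in columns)
        statements.append(f"CREATE TABLE {table} ({cols});")
    return statements


ddl = generate_ddl(schema)

# Hand the generated definition to the DBMS (an in-memory SQLite database).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
for stmt in ddl:
    conn.execute(stmt)

# The stored schema now describes the database; read it back to confirm.
created = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(created)

# A row that violates the foreign key is rejected by the DBMS.
conn.execute("INSERT INTO parcel (parcel_id, county) VALUES (1, 'Example County')")
conn.execute("INSERT INTO appraisal (parcel_id, amount) VALUES (1, 125000.0)")
try:
    conn.execute("INSERT INTO appraisal (parcel_id, amount) VALUES (99, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Because the DDL is derived entirely from the model, a revision to the model regenerates the definition with no hand editing, which is the sense in which the slide calls generation "fully automated".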
Data Model to Database Realization DMOD 13
[Diagram: DATA MODEL -> DATABASE DEFINER -> DDL stmts (Database Definition Language) -> DataBase Management System -> DATABASE "Schema" DEFINITION, which describes the DATABASE; data input -> DataBase Management System -> DATABASE]

Data Modeling Principles DBPROJ 14
• Done at the highest conceptual level
• Done at the schema level augmented with sample data populations
• Involve all interested parties (not just one department or application)
• Easier for users to learn data modeling than for IS professionals/data modelers to learn the business
• Capture all possible/expressible semantics
• Users (collectively) will always know more
• Be inclusive (within the defined Scope)

Stages of Data Modeling DMOD 15
Start at the highest Conceptual Level!
USER Domain Knowledge
-> CONCEPTUAL (ORM)
   • Objects
   • Obj. ID’s
   • Roles/Relships
   • (Fnl. Dep)
   NO clustering => NO “attributes”
-> “LOGICAL” (ER): CLUSTERED, Attribs in Records
-> RELATIONAL
   ER                            RELATIONAL
   MultiValued, Nested   - - ->  Flat (1NF)
   Ternaries             - - ->  Binary only
   M:N                   - - ->  1:Many only
   Relationships w/attributes - - -> Foreign Keys
   Sub/SupTypes
   Normalized (2,3,4); Primary Keys
-> PHYSICAL
   • Implementation in/for a DBMS
   • Denormalize (for performance)
   + triggers, stored procedures
SCHEMA -> DATABASE

Record-Based Data Modeling DMOD 16
• Commonly called Entity Relationship (ER) Modeling
• Attributes clustered into Entity Records (or Tables)
• Focus on Entities and Relationships (hence ER), suppressing attributes in ER Diagrams (hence no explicit representation of identifiers), leaving open the nature of the intra-record structure.
• Most general case allows:
  – “Nested” Multivalued attributes or repeating groups
    Hence not in first normal form (1NF) (should still satisfy other normal forms – 2NF, 3NF, …)
  – Direct representation of M:N relationships between entities
  – Attributed relationships (i.e., with attributes)
  – Ternary (and higher) relationships
  – Subtypes and supertypes
• Restricting all of the above gives the Relational Model
  – Atomic (single-valued) attributes; binary relationships (FKey)
=> Often, ER Diagrams are Relational Table Diagrams

Choosing a Relationship Notation DMOD 17
Everest-DM: p.224.
Candidate suggestions for ‘one-to-many’ (1:M):
[Diagram: notations for a 1:M relationship between ENTITY1 (PARENT) and ENTITY2 (CHILD), from Bachman 1969, Nijssen 1974, Chen 1976, Kroenke, IDEF1X, and SilverRun, plus The “Fork” (Everest 1976)]
CRITERIA:
• NOT imply direction, access path, or physical representation
• Visually intuitive to aid human understanding
• Printable
• International

Benefits of Data Modeling DMOD 18
• Users gain a better understanding of their area.
• Greater system success with user involvement.
• Platform for communication between users and designers.
• Separation of information-oriented specifications from economic / performance / implementation considerations.
• Determines the content of the database.
• Solid base for information systems development.
• Database more viable/stable; Greater evolvability for handling changes in the developed information system.
• A basis for integration
• Data modeling is a small part of the total IS development effort, but, when done “right,” can reduce overall development costs and downstream maintenance costs. When done poorly, the downstream impacts can be disastrous and costly.

The Chosen User Area DBPROJ 19
After conducting a survey of existing applications and databases, evaluating them, and setting priorities.
• Department of Transportation, Right of Way Division
• Functions:
  – Appraisal, Direct Purchase, Leasing, Relocation, Sale, Demolition, Reconveyance, Legal Owners, Condemnation
• Manageable Scope; Not too Complex
• Great Need; Potentially High Payoff
• 100 People; Mostly Manual Operations
• One Large COBOL file (1971) on Magnetic Tape
• 110,000 parcels of land; 250 attributes (Data Items)
• Several Manual Files on Floating Carts

The Data Modeling Process DMOD 20
GATHERING INFORMATION
• Once the SCOPE and OBJECTIVES are set
• and understanding the modeling constructs to use
How to determine the INFORMATION REQUIREMENTS?
• Where would you go?
• Where would you look?
• What would you look for?
• Who would you talk to?
• What would you ask?

Database Design Process – Two Approaches DMOD 21
BOTTOM-UP:
  DFDs, Sample forms, reports, files, ...
  -> LIST of data items (“Data Dictionary”)
  -> CLUSTER DATA ITEMS
TOP-DOWN:
  REALITY: User Domain of interest
  -> Look, Listen; Perceive, Filter
  -> FIND ENTITIES (The pivotal construct in Data Modeling)
Then: ADD RELATIONSHIPS -> “Conceptual Model Diagram”
USER-DOMAIN EXPERT <-> DATABASE DESIGNER: Ask questions, Talk, echo, validate

Different Kinds of Entity (Types) DMOD 22
WATSON2-ch.7, p.176-9.
• Independent / Base / Reference
  – Exists / is of interest… for some duration of time
  – Frequently the starting point; most important to users
• Dependent
  – Depends on some other entity(s) for existence, and
  – Perhaps for identification (Watson notation: )
• Association (“Intersection”)
  – Represents a Many:Many binary (or more) relationship
  – May be something meaningful in the users world
• Event or “Transaction”
  – A happening at a point in time
  – Number of instances grows endlessly
• Summary
  – to contain summary (derived) information
• Generalization (“Aggregate”, Supertype) or
• Specialization (“Subordinate”, Subtype)

The Processing Continuum – Choosing Entities ISUSE 23
  Transaction EVENTS     Standing ENTITIES      AGGREGATIONS, DERIVATIONS
  (FLOW data)            (LEVEL/STATUS)         (SUMMARY data)
  e.g.:
  hire, fire             Employee               workforce growth
  sales                  Product, Inventory     stockouts
• DESIGN ISSUE: calculating derived information
  – at input/update time - when transaction event captured & recorded
  – at output/retrieval time - when output data is requested
• Sometimes we don’t record event transactions at all
  – of no interest
  – just record the effect of the event transaction, e.g. marriage
• We don’t usually store summary data
  – calculated at retrieval request time
  – except in Data Warehousing/OLAP for better response time

Steps in the Modeling Process DMOD 24
The A B C D E F G procedure:
• Ask & Analyze
• Bounce Back & Forth with/among user domain experts
• Comprehend what they are saying; Verbalize
• Design - Diagram & Document in Dictionary with narrative
• Evaluate against rules of construction & user experts
• Formalize in a Data Model (mapping for implementation)
• Generate a definition for implementation in a DBMS

List of Data Items (Bottom-up Design) DMOD ISDATAD 25
• UNORGANIZED, UNSTRUCTURED
  e.g. the “Data Dictionary” derived from DFDs
• ORGANIZED, CLUSTERED
Add ATTRIBUTES ...
Data items:
  Customer Number, Customer Name, Billing Address, Customer Phone, Shipping Address, Credit Limit
  Salesperson ID, Salesperson Name, Salesperson Address, Salesperson Phone, Commission Rate
  Order Number, Order Date, Ship Date, Terms, Gross Amount of Order
  Inventory Item Number, Item Description, Price, Bin Location
  Quantity Ordered of …
ENTITIES: CUSTOMER, SALESPERSON, ORDER, ITEM, ORDER LINE ITEM
RELATIONSHIPS: calls on, places, contains

The Product DBPROJ 26
Documentation
  – produced according to a set of Guidelines (See Appendix to Everest paper)
  – structured to facilitate incremental updates - Hierarchical organization, dated, modular sections
• Scope and Objectives
  – Use Cases; Major Processes (Setup, Retrieval & Reporting, Update/Maintenance/Transaction processing, Archival)
• Global Data Model Diagram
  – Top-down unfolding presentation
• Narrative Description of:
  – Entities
  – Relationships
  – Attributes
• Formal Definition in a Data Dictionary / Repository
  – Preferably using a CASE Tool
• Generated Schema (DDL Script) for a target DBMS

User Experiences and Activities DBPROJ 27
• Users got excited
• Learning and Self-Confidence grew
• Relationship with central IS support unit
• One user forged ahead early
• Anxious to buy equipment and install systems
• The Product: Documentation - 40 entities, 400 pages
• Used a CASE Tool to support data modeling

Sample Data Model (Excelerator 1.9) DMODPRE 28
[Diagram: Minnesota DOT Right of Way Database Structure, Gordon C. Everest. Relationship lines and their frequency annotations (e.g. 10%, rare) are not reproduced. Entities shown:]
  AUTHMAP      Authorization Map
  MAINTDIST    Maintenance District
  COUNTY       County (Num | Code...)
  ROADSECT     Road Section (Cty# | RS#)
  AGREEMENT    Agreement
  PROJECTS     Project Actions
  RWPROJ       R/W Project (900's or Dash #)
  COMORDACT    Commissioners Orders Action
  PMSSPROJ     PMSS Project
  FEDPROJ      Federal Project
  PARCEL       Interest in a Land Parcel
  COMMORDER    Commissioners Order
  INTHOLDER    Interest Holder
  PARTY INT    Party to Interest
  PARTY NAD    Party Name & Address
  APPACTION    Appraisal Action & Cert
  APPRAISAL    Appraisal
  APPRAISER    Appraiser
  COMREPORT    Commissioners Report
  EMDOMACT     Em Domain Action: St vs.
  PETITION     Petition & Lis Pendens
  FINALCERT    Final Certificate
  TRIALSETL    Trial and Settlement
  EDPARCTRK    EmDom Parcel Tracking
  CHARGEID     Charge Identifier
  LEASE        Lease
  COMMWORK     Commissioner Hours Worked
  COMMISSION   Commissioner
  COMASSIGN    Commissioner Assignment
  OCCUPANT     Occupant Relocation
  DIRPURCH     Direct Purchase
  SUPHOUSING   Supplemental Housing
  RELOCPMTS    Relocation Payments & Appls
  LESSEE       Lessee
  MEMBERS      Household Members
  OCCATTRNY    Occupant Attorney
  IMPROVEMENT  Improvements on R/W Parcel
  REMOVCONT    Removal Contract
  SALESACT     Sales Action
  CONTRACTOR   Contractor
  OTHERBIDS    Other Bids
LEGEND: One )----------E( many; Dependent (D); Orphan; Foreign ID (F) -->

Data Modeling DMOD 29
GUIDELINES for GATHERING & RECORDING Information:
1. PERCEPTIONS IN MINDS OF KEY USERS
2. EXISTING FILES/SCREENS/FORMS/REPORTS ONLY CLUES
3. DOCUMENTATION GUIDELINES
   – Parts and organization
   – Diagramming conventions
4. GROWING THE DOCUMENTATION: ENTITIES FIRST
5. DISCOVERING ENTITIES
   – What is a file?
6. NAMING AND DESCRIBING ENTITIES
7. FOLLOWING THE RULES FOR LOGICAL DATABASE DESIGN
8. UNCONSTRAINED BY IMPLEMENTATION/SYSTEM LIMITATIONS
9. LOOKING FOR THE EXTREMES; NOT THE TYPICAL
10. SEEKING CONSENSUS AMONG THE USERS

Conducting Data Modeling Project Meetings DBPROJ 30
BEST PRACTICES:
• Get user top management support & commitment
• Don’t limit to a fixed deadline
• Get the “right” people to the table; ask what they ‘do’
• Set and agree on the project scope early
• Be inclusive in the design
• Break expectation that it will all be implemented
• Biweekly, ½ day meetings
• Focus on finding entities, relationships, & characteristics
• Grow the documentation (not meeting minutes) following guidelines, modeling scheme, and notation
• Facilitator – an “outsider” (know the process, not the domain)
• Scribe – an “insider” (so organization takes ownership)
  CAUTION: must be open, balanced, willing to record all viewpoints
• Use a data modeling CASE tool to iterate on revisions to diagrams and documentation

Gathering Business User Requirements DBPROJ 31
from user domain experts (not IS people):
(+ advantage)
• Interviews
  + everyone gets heard
  ° one at a time
    + requires less interviewee time
  ° small group (homogeneous)
    + interaction stimulates ideas
• Facilitated Group Sessions
  ° Accelerated
    + less elapsed time (intensive 1-3 days)
    + creative brainstorming
    + raise issues
    + set priorities (voting)
    ? build consensus? resolve issues?
  ° Extended
    + to achieve common, accepted design

Interviews vs. Group Meetings DBPROJ 32
[Chart: # PARTICIPANTS (1, 2-3, 5-10, Many) plotted against # MEETINGS (1, 2 (Follow-up), Many)]
The “sweet” spots:
• ACCELERATED (“JAD” session) for brainstorming, straw votes, and setting priorities
• EXTENDED for Database Design
• Interviews: Managers, Executives, Visionaries

Interviews: Preparation DBPROJ 33
• Understand Background
  – the business - its strategic direction
  – the industry - trends, competition
  – the organization - formal and real organization charts
  – the history - any prior initiatives
  ––> Still, IS people/interviewers DO NOT presume to know everything, and DO pretend to know nothing (to ask the “dumb” questions).
• Select Interviewees
  – horizontal and vertical cross section
  – the visionaries; the thorns in the side; the power users
• Project Kick-Off Meeting
  – with impacted users and their management
  – introduced by user management sponsor
  – convey commitment, scope, expectations, required user involvement
• Pre-Interview Letter
  – from project sponsor: internal respected authority
  – logistics and what to bring
• Plan a Structured Interview

Conducting the Interview DBPROJ 34
• Think through what you need to discover
• Prompting: single sheet of topics/questions
• Lead Interviewer + Scribe + Observers
• REVIEW Project Purpose and Scope
• Let USERS TALK about what they DO, what they know (stay within their comfort zone… initially)
• then LISTEN carefully for expressions of:
  – vision, strategies, priorities, strengths, problems, suggestions for improvement, …
• ASK the classic questions:
  – why, how (much), who, where, when, what if, what then.
• FLAG the nouns and verbs
  – Nouns become entities
  – Verbs become relationships

Accelerated vs. Extended Design Approaches DBPROJ 35
  TASK       DATA PLANNING                      DETAILED DESIGN
  SCOPE      Division-wide data model           Forest inventory database
  SCHEME     Entity-Relationship Modeling       Extended E-R Modeling
  APPROACH   Accelerated Workshop               Extended Project Meetings
  DURATION   5 consecutive days                 Biweekly 1/2 day - 6 months
  PEOPLE     76 participants from               11 participants from
  ORGS       Forestry + 10 other agencies       Forestry, Fish & Wildlife
  LEADER/S   2 facilitators (also as scribes)   1 facilitator (also as scribe)

Results: Accelerated Approach for Data Planning & Modeling DBPROJ 36
  TASK                          TIME (days)
                                Planned   Actual
  Intro / Kickoff / Training    1         1
  Define ENTITIES               1         2 1/2*
  Define ATTRIBUTES             1         3/4
  Define RELATIONSHIPS          1         3/4
  Partition and Prioritize      1         0
*Difficult and contentious, so facilitators decided arbitrarily to move on.
Entity definitions were incomplete, missing, or poorly stated, with no consensus reached, which hindered definition of attributes and relationships in the remainder of the workshop. A global data model was not produced, nor were detailed design projects defined and prioritized. The contractor promised to develop these later in the final report of the workshop.

User Surveys DBPROJ 37
• Data planning workshop unsatisfactory; final report omitted "where used" matrix, and global data model was useless. Contractor released. No user validation.
• Comparison of user survey results confounded by contractor's apparent lack of experience, preparation, organization and management of the workshop. Novice facilitators in both approaches.
• Extended bi-weekly meetings: participants willing to do it on another project, felt this project was completed but uncomfortable stating that a good data model had been produced.

Lessons Learned DBPROJ 38
• Accelerated approach may be good for eliciting information requirements and setting priorities, not for database design.
• Difficult to reach consensus with a broad scope ... necessitating 76 participants.
• Experienced, prepared facilitator is critical... the accelerated approach is unforgiving for the novice.
• Clearly define and communicate organizational goals, expectations, and outcomes / deliverables.
• User domain experts: get the best; use as needed.
• Top management support to ensure good participation.
• Facilitator: expert in the process, but not the domain.
• Dedicated scribe from within the organization… to take ownership.
• Select the first design project with a manageable scope to ensure success, and increase future management and user buy-in.
• Consider using a blend of the two approaches.

Advice from Others DBPROJ 39
• Terry Moriarty
• Dan Moody
• Graeme Simsion
• Dick Barden

Conflicting Objectives in IS Development DBPROJ 40
T. Moriarty, “… Data Modelers!” Intelligent Enterprise (3:1), 2000 Jan.
• User Domain / Subject Matter Experts (SME’s)
Application Systems
• (Business) Process Analysis
• Process Models (DFD’s)
Object-Oriented Development
• Implement in OO Programming Languages
• Object Models; Use Cases (UML)
• Class Diagrams
• State Transition Diagrams
Data Warehousing / Data Marts
• Multi-dimensional models
Data Modeling
• Focus on Data
• Singular Objective (NOT implementation)
• Precise Thinking
• Rich Semantics - probe for hidden meaning
• (Shared) (Integrated) “Enterprise” Models
• Normalized ER Diagrams
RELATIONAL DATA MODELS
• Implementation in RDBMS
What do Data Modelers bring to the table?
  – Strengths and Perceived Disadvantages
How to get invited to the table, to be involved in IS Development?

Dan Moody’s “Seven Habits” DBPROJ 41
• IMMERSE yourself in the client/user environment
  – See it for yourself
• CHALLENGE
  – Generate alternatives, test the boundaries, find the exceptions
• GENERALIZE, discern the underlying similarities of entities
  – Keep it simple, reduce the number of entities
• TEST out the model; have users validate the model
  – Examine every relationship … in both directions
• LIMIT the Time and set the Scope up front
  – Know when to stop
• INTEGRATE with existing systems and databases
  – Keep an eye on the big picture
• COMPLETE – resolve ambiguities; handle the exceptions
  – Follow the job through to completion
Daniel MOODY, “The Seven Habits of Highly Effective Data Modelers,” Database Programming and Design (9:10), 1996 October, pages 57 – 64.
Summarized in: Richard Watson, Data Management, 2nd ed, Wiley, 1999, p.185.

Simsion’s Foundation Principles DBPROJ 42
G. Simsion, Database Programming & Design (9:2), 1996 Feb.
• Data Modeling is about Design
  – Different designers may produce different solutions; there is no single correct model for a given situation; thus need quality criteria to make an objective choice.
• Data Modeling is Important… and NOT Optional
  – Data modelers believe it; problem is persuading other stakeholders
• Data Modeling is a Discipline… requiring expertise
  – Requiring Training, Practice, Experience. … Users can’t model!
    But... NOT just knowing and applying some rules and conventions; witness the difficulty in using data modeling CASE tools
• Data Modelers use Patterns … DW is a dimensional model
  – e.g., hierarchies, M:N, assemblies (ring fact), orders/warehouses
• Subtypes help… Level of Generalization is critical
• Logical DB Design is the Data Modeler’s Responsibility
  – DBA’s for physical design, implementation in a DBMS, performance
• Corporate (Enterprise) Data Modeling is different
  – Purpose – understand global architecture; integrate; set priorities

Data Modeling as Design DBPROJ 43
• “Data Modeling is a Design activity” – Graeme Simsion
• Analysis seeks to discover the (one) truth, represented in a model
• Our perceptions of reality differ; Modeling Schemes are imperfect
• Design involves Choice – e.g., Entity/Object Types; Sub/Supertypes
• Need Criteria

Criteria for Choosing a Quality Design DBPROJ 44
• Follows the rules of construction (“grammar”) of the data modeling scheme.
• Accurate model of the users’ real world domain of interest
• Complete… within the defined scope
• Enables enforcement of (business) rules
• Non-redundant
• Stable
• Flexible
• Extensible
• Understandable
• Simple
• Unambiguous
• Basis for an efficient, workable implementation
SOME OF THESE ARE IN CONFLICT and INVOLVE TRADEOFFS.

“Baloney Detection Kit” – Dick Barden DBPROJ 45
Adapted from Carl Sagan, “The Fine Art of Baloney Detection,” The Demon-Haunted World: Science as a Candle in the Dark, 1996.
1. Seek out independent confirmation of the ‘facts’
2. Encourage substantive debate on the evidence
3. Be fair to the process; treat each expert equally
4. Spin more than one way of looking at your UoD
5. Seek out others for critical feedback – challenge
6. Populate – gather example data for the facts
7. Everything in a chain of argument must fit
8. Use the simpler one when two equally model the data
9. Ask how the examples can be falsified
10. Can others understand and accept the model?

References DBPROJ 46
• Matthew H. Pelkki, Gordon C. Everest, Dietmar W. Rose, “Using Accelerated and Extended Approaches for Data Planning and Design,” The Compiler (13:3), 1995 Fall.
• Terry Moriarty, “Data Modeling is Dead! Long Live Data Modelers!” Intelligent Enterprise (3:1), 2000 Jan 1.
• Daniel Moody, “Seven Habits of Highly Effective Data Modelers,” Database Programming and Design (9:10), 1996 October.
• Graeme Simsion, “Data Modeling: Testing the Foundations,” Database Programming & Design (9:2), 1996 February.
• Dick Barden, “Baloney Detection Kit,” Journal of Conceptual Modeling (10), 1999 August. www.inconcept.com/jcm