CvLvP - May 31, 2016 Stages of Data Modeling Page 1 Presentation to DAMA, Minnesota, 2016/06

Conceptual vs. Logical vs. Physical Stages of Data Modeling
DAMA - Minnesota, 2016 June
Gordon C. Everest
Professor Emeritus of MIS and Database
Carlson School of Management, University of Minnesota
[email protected]
http://geverest.umn.edu

Outline [2]
Goals of this presentation [slide#]
• Levels of Data Models [4]
  – Conceptual vs. Logical vs. Physical Data Models
  – Role of Abstraction in Conceptual Models; examples [10]
• Data Modeling [23]
• Data Modeling Schemes [30]
• Data Models – focus and name for [35]
• Stages of Data Models/Modeling [40]
  – A continuum of introducing modeling constructs
  – Starts with a user narrative => elementary fact sentences
    Objects (nouns) – instances, types, populations, sub/supertypes
    Relationships (verbs) => Characteristics and Constraints
    Attributes – where do they fit?
    Identifiers, Keys, Foreign Keys
© Gordon C. Everest, All rights reserved.

Stages of Data Modeling: Introducing Data Elements or Constraints [3]
[Figure: precedence diagram of data modeling elements, grouped into Fact Modeling, ER/Relational Modeling and Physical Modeling. It runs from user narratives, sentences (“facts”), nouns and verbs, through object instances, object types and names, sub/supertypes, relationship types and names, role names, object identifiers, dependency, multiplicity and the related constraints, to attributes, entity records, keys, foreign keys (1NF), relational tables, column names, data types, indexes, partitioning, distribution and denormalization. Constraints are defined after the element they constrain is introduced.]

Common Understanding - Levels [4]
See David Hay video.
• Conceptual – high-level, enterprise-wide, abstract model
• Physical – how data is stored in some database system
• Logical – adding detail to the conceptual model, … free of physical implementation details which do not contribute to the logical understanding of the data model
  – Often considered the ER or Relational Model
Generally depicted as a pyramid, implying levels of models: Conceptual, Logical, Physical.
Let’s look at the generic meaning of these terms, but first…

Conceptual Data Model? Logical? [5]
[Figure: Minnesota DOT Right of Way Database Structure (Gordon C. Everest), a data structure diagram of record types including Maintenance District, County, Road Section, Authorization Map, R/W Project, PMSS Project, Federal Project, Interest in a Land Parcel, Interest Holder, Party to Interest, Party Name & Address, Appraisal, Appraiser, Appraisal Action & Cert, Direct Purchase, Occupant Relocation, Relocation Payments & Appls, Supplemental Housing, Lease, Lessee, Household Members, Commissioners Order, Commissioner, Petition & Lis Pendens, Final Certificate, Trial and Settlement, Improvements on R/W Parcel, Removal Contract, Contractor, Sales Action and Other Bids, annotated with occurrence frequencies (e.g., 10%, rare, 5/yr). Legend: one )----E( many; Dependent (D); Orphan; Foreign ID (F).]

Conceptual Data Model? [6]
[Figure: a single entity, PARCEL (Interest in a Parcel of Land).]

Conceptual - Definition [7]
CONCEPTUAL (Merriam-Webster, Dictionary.com):
• Consisting of, relating to, or concerned with concepts*; abstract.
• Concerned with the definitions or relations of concepts, rather than the facts.
Synonyms: theoretical, visual, imaginary. Antonyms: real; facts.
*CONCEPT: an idea of what something is or how it works; something formed in the mind; a mental image.
If it is mental, how do we document and communicate it?
If it has entities/objects, relationships, identifiers and domains, is it already logical?
If we add attributes and foreign keys, it is now relational.

Logical - Definition [8]
LOGICAL
• Of or according to the rules of logic or formal argument; characterized by or capable of clear, sound reasoning.
Synonyms: natural, reasonable, sensible, understandable*
Logical Data Model – a model of some user domain** that is complete and understandable in the detail needed to represent that domain, built according to and consistent with some formal modeling scheme, within a defined scope.
*Understandable – defined, documented, communicated.
**The area of the business being modeled: real world, user world, domain of discourse, subject area, …

Physical Data Model [9]
• How data will be encoded and stored
• Implemented in some data system (DBMS, NoSQL, …)
• Dealing with storage and processing performance, volumetrics (time and space), partitioning, distribution
Physical vs. Logical separation
• Historically, to better understand physically stored data existing on punched cards, tape, etc., the notion of a logical representation was introduced to strip away storage considerations and focus on documenting just the logical aspects of the data.
• The logical was derived from, as a representation of, the physical.
Physical Data Model – a stored representation of a Logical data model.

Abstraction [10]
ABSTRACTION* = “leaving something out”; hiding.
In Designing/Developing a Data Model (you can’t do it all at once):
• Start with high-level preliminary sketches (top down). Details are still presumed to be present, yet to be added.
• Work on one part or subject area at a time.
• Could also start with some details (bottom up).
• Once built, (how) do you maintain the Conceptual Model? Is it useful?
In Presenting a Data Model (already completed in all its detail):
• Start with a high-level view, then successively add detail –> VERTICAL ABSTRACTION
• One part at a time –> HORIZONTAL ABSTRACTION
*Webster Dictionary: abstract (n) summary; shortened version. abstraction (n) the act of taking away. abstract (v) to take out, remove something. abstract (adj) as in abstract object (vs. concrete) – a different meaning, not useful here.

Sample Data Model [11]
[Figure: the Minnesota DOT Right of Way Database Structure diagram of slide 5, with a FOCUS label.]
What could improve this presentation?

2.a HORIZONTAL ABSTRACTION – Partitioning [12]
• Fencing off a part of the diagram:
  – Often helpful to have some overlap of the partitions
[Figure: one partition of the diagram: PARCEL (Interest in a Land Parcel), APPRAISAL, APPACTION (Appraisal Action & Cert), APPRAISER and DIRPURCH (Direct Purchase).]

2.b DEPTH – Vertical Levels of Abstraction [13]
Drilling down on parts for increasing levels of detail.
[Figure: high-level diagram of Authorization Map (graphic), District, Construction Project, Parcel of Land, Interest in a Parcel of Land, Interest Holder ("Owner"), Direct Purchase Offer, Certified Appraisal, Appraisal, Appraisal Action, Appraiser, Relocation, Occupant, Supplemental Housing Payment, Legal Agreement, Commissioners Orders, Condemnation Action, Improvements on Land Parcel, Improvement Removal Contract, Sales Action, Contractor and Other Bids.]
How many objects here?
Adding Attributes: APPRAISER (NAME, ADDRESS, RATINGS, FEE RATES), in full:
APPRAISER: ID NUM; NAME, PERSON; ADDRESS, MAILING; PHONE; ALTPHONE; NAME-COMPANY (OPT); DATE OF LAST APPRAISAL (der); QUALIFICATION RATING; EVALUATION RATING; TESTIMONY RATING; HOURLY FEE; WORK AGREEMENT NAM; EXPIRATION DATE

HECB Student Database [14]
[Figure: detailed data model diagram of the HECB Student Database.]
FIRST: What two things are missing from this diagram?
Is this how you would first present the data model to your users?
What entity or entities are the most important?

HECB Student Database [15]
Unfolding detail from the most important:
[Figure: STUDENT (past, present or prospective?) related to POST-SECONDARY EDUCATIONAL INSTITUTION, PROGRAM (Degree/Diploma/Certif), FINANCIAL AID, ENROLLMENT and COMPLETION.]

Student-Course High-Level Data Model [16]
Start with the major:
• Entities, and
• Relationships
[Figure: STUDENT, COURSE and INSTRUCTOR with their relationships.]
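The two presentation techniques of slides 10 to 13 can be sketched as operations that choose what to show from one underlying model. This is a minimal sketch under stated assumptions: the entity names come from the Student-Course example, but the attribute subsets and the relationship verbs other than "in" (makes, of, teaches) are hypothetical, and `horizontal` and `vertical` are illustrative names, not part of any tool. Neither function alters the full model; each returns a filtered view, in the spirit of abstraction as "leaving something out".

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    # entity name -> attribute names (hypothetical, abridged)
    entities: dict[str, list[str]]
    # (entity, verb, entity) triples for binary relationships
    relationships: list[tuple[str, str, str]] = field(default_factory=list)


def horizontal(model: Model, subject_area: set[str], overlap: bool = True) -> Model:
    """Fence off one part of the model (a subject area).

    With overlap=True, entities directly related to the subject area are
    kept too, so adjacent partitions share a boundary.
    """
    keep = set(subject_area)
    if overlap:
        for a, _, b in model.relationships:
            if a in subject_area or b in subject_area:
                keep.update((a, b))
    return Model(
        {e: attrs for e, attrs in model.entities.items() if e in keep},
        [r for r in model.relationships if r[0] in keep and r[2] in keep],
    )


def vertical(model: Model, show_attributes: bool) -> Model:
    """Hide (or show) attribute detail; the full model keeps it."""
    return Model(
        {e: (list(attrs) if show_attributes else []) for e, attrs in model.entities.items()},
        list(model.relationships),
    )


full = Model(
    entities={
        "STUDENT": ["StudentID", "Name", "Address", "Major"],
        "REGISTRATION": ["Grade"],
        "COURSE OFFERING": ["Year", "Term", "Section", "Room"],
        "COURSE": ["Course#", "Title", "Credits"],
        "INSTRUCTOR": ["SSN", "Name", "Phone", "Dept"],
    },
    relationships=[
        ("STUDENT", "makes", "REGISTRATION"),          # hypothetical verb
        ("REGISTRATION", "in", "COURSE OFFERING"),
        ("COURSE OFFERING", "of", "COURSE"),           # hypothetical verb
        ("INSTRUCTOR", "teaches", "COURSE OFFERING"),  # hypothetical verb
    ],
)

high_level = vertical(full, show_attributes=False)
student_area = horizontal(full, {"STUDENT"})
print(sorted(student_area.entities))  # ['REGISTRATION', 'STUDENT']
print(high_level.entities["COURSE"])  # []
```

Composing the two (a horizontal partition shown without attributes) gives the kind of stepwise, part-at-a-time presentation the slides recommend.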
Student-Course Data Model [17]
Adding Intersection Entities:
• to resolve M:N Relationships
• to store additional attributes
[Figure: STUDENT – REGISTRATION in > COURSE OFFERING, with COURSE OFFERING related to COURSE and INSTRUCTOR.]

Student-Course Data Model [18]
• Adding Attributes
[Figure: the same entities with attributes:
  STUDENT: StudentID, Name, Address, Major
  REGISTRATION: Grade
  COURSE OFFERING: Year, Term, Section, Building, Room, Days-of-week, TimeStart, TimeEnd
  COURSE: Course#, Title, Credits
  INSTRUCTOR: SSN, Name, Address, Phone, Dept]

Extended Student-Course Data Model [19]
[Figure: the model extended with TEXTBOOK (ISBN, Title, Author(s), Course# (FK)), AUTHOR and DEPT (DeptNo, Name, Office Number); STUDENT Address broken into City, State, Zipcode; REGISTRATION with StudentID (FK) and CourseOffID (FK); COURSE with Description; INSTRUCTOR with Dept (FK).]

Student-Course Database – Table Diagram [20]
Adding Attributes & Foreign Keys. Diagram of the Schema:
  STUDENT (Student ID, Name, Address, Major, GPA)
  REGISTRATION (Course ID, Student ID, Grade)
  COURSE OFFERING (Course#, Year, Term, Section, Building, Room, Days, Time Start, Control, Enrollment, Instructor SSN)
  COURSE (Course#, Title, Description, Credits)
  INSTRUCTOR (SSN, LastName, FirstName, Address, Phone, Dept)
LEGEND: ENTITY NAME (upper case); Identifier (bold face); Attributes (not bold face); Foreign Key; Identifier; M:1 relationship.
What if you move the arrow head to the other end of the arc?

ORM Data Model – Presentation [21]
[Figure: ORM diagram with object types EMPLOYEE (number), DEPT (number), SALARY (dollars), SKILL (code), DESCRIPTION (name), LIMIT {1000 .. 9999} and RATING {1 .. 10}, and fact types such as: Employee earns Salary; Employee works in Dept; Dept is headed by Employee (role BOSS); Employee reports to / supervises Employee; Employee possesses Skill (<=5), objectified as "EmployeeSkill!" with a proficiency Rating; Skill has Description; a spending Limit.]
A major criticism of NIAM/ORM, both by protagonists and proponents, is that it is too detailed, a bottom-up design, BUT… ER Diagrams usually hide the details of attributes and most constraints.
So, present the ORM model using a series of top-down unfolding … abstractions.

Abstractions of ORM Data Model [22]
1. Hide "Terminal" (M:1) Objects (=> Attributes)
2. Hide Reference Modes
3. Hide Constraints
4. Hide Less Important Objects & Predicates
  – Subtypes
  – Objectified Predicates
  – Reflexive Relationships
5. Hide all Predicates, leaving BASE Entities!
6. Add back Multiplicity characteristics on relationships
=> A High-level Abstract “Conceptual” Data Model... an ER Diagram?!!!
[Figure: the ORM diagram of slide 21 at successive stages of abstraction.]
Is this the same data model we started with?

Levels (or Stages?) of Data Models [23]
• Reality – the real world User Domain, infinitely complex
• Mental Model – in our minds
  – must be formally documented so we can communicate it to others
• Conceptual Model – "natural", unconstrained, initial
  – independent of physical storage and implementation
• Logical Model – according to a modeling scheme
  – e.g., the E-R or Relational Model (most popular today)
• Physical Model – defining storage characteristics
  – encoding, storage structure and access methods (indexes, etc.)
• Implementation Model – for a given DataStore Manager
  – memory organization (blocking, buffering, partitioning, distribution, etc.)

Objective of Data Modeling [24]
(WHAT we are trying to do)
TO ACCURATELY AND COMPLETELY MODEL SOME PORTION OF THE REAL WORLD UNIVERSE OF DISCOURSE (UoD) (the USER DOMAIN) OF INTEREST TO SOME ORGANIZATION OR COMMUNITY OF USERS.

Modeling is Choosing... [25]
REALITY is infinite, complex, multidimensional, detailed, so we must CHOOSE:
• SCOPE / Boundary – where to look
• FOCUS – what to look for
• DEPTH / Resolution – how much detail to look for
... based upon our PURPOSE.

A Model is an Abstraction [26]
TWO PERSPECTIVES of Data:
[Figure: an abstract “Conceptual” view of the Real World (the Mental Model) is represented by a “Logical” “DATA” MODEL; its REALIZATION is a Physical (Storage) Model of concrete symbols stored on some medium.]
Both realities are infinitely complex. We NEED some constructs to look for and use in modeling.
Sometimes we have the data, and try to find what it means.

Modeling Process – the HOW [27]
MODEL = Abstract (Re)present(ation)
[Figure: Reality (infinitely complex) → MODELING PROCESS → MODEL. Knowledge in the head (mental models) is re-presented as knowledge in the world: knowledge externalized, formalized, shared.]
What drives or guides the process?

The Modeling Process [28]
[Figure: the Real World Universe of Discourse passes through perception and selection/filtering into the MODELING PROCESS, which produces the MODEL. The process is guided by a MODELING SCHEME (Context, Constructs, Composition, Constraints), a METHODOLOGY (Steps/Tasks + Milestones + Deliverables) and REPRESENTATIONAL FORMS (Narrative, Graphical Diagram, Formal Language Statements: the Syntax).]
The Semantics are most important.
The SEMANTICS of a data model can only be seen through the presentation, the SYNTAX.

Data Model to Database Realization [29]
[Figure: a DATABASE DEFINER expresses the DATA MODEL in a Database Definition Language; the DDL statements are processed by the DataBase Management System into the "Schema" DATABASE DEFINITION, which describes the DATABASE; data input and queries also pass through the DBMS.]

Data Modeling Schemes [30]
• ALL data modeling activity and data management tools are driven or guided by some Data Modeling Scheme.
• Think of it as a Meta Model (or Meta-Meta-Data).
• It tells you what to look for, what constructs to use, how to put them together (compose) with what constraints, and how to represent all that syntactically.
• Logical rules, however formal or informal
• May be developed independently of any implementation; not based on any particular implementation
• Many variants within families
Since all data modeling is driven by some modeling scheme, i.e., by some logical rules for building a model, all data models are logical models!
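Slide 29 shows a data model realized through DDL statements processed by a DBMS. As a hedged illustration, the sketch below realizes a subset of the Student-Course schema of slides 17 to 20 with Python's built-in sqlite3 module: REGISTRATION is the intersection entity that resolves the M:N relationship between STUDENT and COURSE OFFERING and carries Grade, and the M:1 arcs become foreign keys. The snake_case names, the column types, the sample rows and the surrogate key course_off_id (modeled on CourseOffID in slide 19) are assumptions.

```python
import sqlite3

# Hypothetical DDL for part of the slide 20 schema (instructor omitted).
DDL = """
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    major      TEXT
);
CREATE TABLE course (
    course_no  TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    credits    INTEGER
);
CREATE TABLE course_offering (
    course_off_id INTEGER PRIMARY KEY,
    course_no     TEXT NOT NULL REFERENCES course(course_no),  -- M:1 to COURSE
    year          INTEGER NOT NULL,
    term          TEXT NOT NULL,
    section       INTEGER NOT NULL,
    UNIQUE (course_no, year, term, section)                    -- natural identifier
);
-- Intersection entity: resolves STUDENT M:N COURSE OFFERING
-- and holds the relationship's own attribute (grade).
CREATE TABLE registration (
    student_id    INTEGER NOT NULL REFERENCES student(student_id),
    course_off_id INTEGER NOT NULL REFERENCES course_offering(course_off_id),
    grade         TEXT,
    PRIMARY KEY (student_id, course_off_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(DDL)

conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Ann", "MIS"), (2, "Bo", "CS")])
conn.execute("INSERT INTO course VALUES ('MIS6000', 'Data Modeling', 3)")
conn.executemany("INSERT INTO course_offering VALUES (?, ?, ?, ?, ?)",
                 [(10, "MIS6000", 2016, "Fall", 1), (11, "MIS6000", 2016, "Fall", 2)])
# Many students per offering, many offerings per student.
conn.executemany("INSERT INTO registration VALUES (?, ?, ?)",
                 [(1, 10, "A"), (2, 10, None), (1, 11, "B")])

try:
    conn.execute("INSERT INTO registration VALUES (3, 10, NULL)")  # no student 3
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

rows = conn.execute("""
    SELECT s.name, o.section, r.grade
    FROM registration r
    JOIN student s ON s.student_id = r.student_id
    JOIN course_offering o ON o.course_off_id = r.course_off_id
    ORDER BY s.name, o.section
""").fetchall()
print(rows)  # [('Ann', 1, 'A'), ('Ann', 2, 'B'), ('Bo', 1, None)]
```

The intersection table and its foreign keys are exactly the kind of construct the later slides treat as implementation artifacts rather than business semantics; they appear here only because the model is being realized in a relational DBMS.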
The Many Faces of Databases [31]
[Figure: logical database structures arranged by generation: No File (0), Flat File (FORTRAN), Single File / Hierarchical File (COBOL), Network (CODASYL), Multi-File Relational (ANSI SQL), Object-Oriented (UML), Multi-Dimensional (“Star” schema, Snowflake) and Fact Modeling (ORM).]
What do all these have in common? => Logical Database Structures

Data Modeling Schemes – Types & Examples [32]
• Developed along generation lines
SCHEME – Examples
• Flat file – Fortran (1956?), spreadsheet (VisiCalc, Multiplan>Excel, Quattro)
• Hierarchy – COBOL*(1960), System 2000, HQL*
• Network – CODASYL*(1971), IDMS, ANSI-NDL*, IMS (DL/1), Adabas
• O-O ext. – OO-COBOL*, ANSI-SQL:1999*, UML*
• E-R – Chen*(1976), IE*(Finkelstein), Barker, IDEF1X*(ERwin), ER Studio
• Relational (SQL) – Codd*(1970), SEQUEL*(1976), Oracle (1979), DB2, ANSI-SQL*(1986), Sybase, SQL Server, Dbase II
• (Inverted) Fully indexed – not really a logical scheme; Model 204, CASE 360 (IBM)
• Dimensional – as a “Cube”: EXPRESS(6) (MDS>IRI>Oracle), Multiplan(3), MicroStrategy; as a Relational Model = Star Schema* (R. Kimball), Red Brick
• Fact-Based – NIAM*(1976), ORM*(1989), NORMA, FCO-IM
• NoSQL – a family of tools to overcome the limitations of SQL tools. Each tool has its own modeling scheme: key-value pair, columnar, document (XML, hierarchical), graph (nodes & edges).
*initially not an implementation but a concept paper or language specification
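The difference between fact-based and ER/relational schemes, and step 1 of slide 22 (hide "terminal" M:1 objects as attributes), can be sketched mechanically. The Python below treats a fact-based model as a list of binary fact types, loosely based on the slide 21 example, and folds every M:1 fact type whose target object plays no other role into an attribute of its source; the remaining fact types stay relationships between entities. The FactType class and the "plays a role in exactly one fact type" test for terminal objects are assumptions for illustration only; fact-based tools such as NORMA use their own, richer mapping procedures.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FactType:
    """A binary fact type read as: subject <verb> obj.

    functional=True marks an M:1 fact type: each subject instance plays
    the role at most once (a uniqueness constraint on the subject role).
    """
    subject: str
    verb: str
    obj: str
    functional: bool


def fold_terminal_objects(facts: list[FactType]) -> tuple[dict[str, list[str]], list[FactType]]:
    """Hide 'terminal' M:1 objects by turning them into attributes.

    Assumption: an object is terminal if it plays a role in exactly one
    fact type. Functional fact types pointing at a terminal object become
    attributes of the subject; all other fact types stay relationships.
    """
    appearances = Counter()
    for f in facts:
        appearances[f.subject] += 1
        appearances[f.obj] += 1

    entities: dict[str, list[str]] = {}
    relationships: list[FactType] = []
    for f in facts:
        entities.setdefault(f.subject, [])
        if f.functional and appearances[f.obj] == 1:
            entities[f.subject].append(f.obj)   # terminal object -> attribute
        else:
            entities.setdefault(f.obj, [])      # object stays an entity
            relationships.append(f)
    return entities, relationships


# Simplified, hypothetical rendering of part of the slide 21 ORM model.
facts = [
    FactType("Employee", "earns", "Salary", functional=True),
    FactType("Employee", "works in", "Dept", functional=True),
    FactType("Dept", "is headed by", "Employee", functional=True),
    FactType("Employee", "reports to", "Employee", functional=True),
    FactType("Employee", "possesses", "Skill", functional=False),  # M:N
    FactType("Skill", "has", "Description", functional=True),
]

entities, relationships = fold_terminal_objects(facts)
for name, attributes in entities.items():
    print(name, attributes)
for r in relationships:
    print(r.subject, r.verb, r.obj)
```

Salary and Description disappear into Employee and Skill as attributes, while Dept survives as an entity because it plays two roles; the result is an ER-style view of the same facts.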
Modeling Schemes in NoSQL Tools [33]
• NoSQL refers to a family of tools designed to handle Big Data, “Unstructured” Data, fast (faster than SQL tools)
• Based on particular data storage schemes
• Vendors have augmented their Relational/SQL tools with some of these storage schemes, and other models
• One driver is OO programming languages, which handle objects of varying structure and complexity; mapping OO to a relational structure is inadequate.
Physical Scheme – Examples
• Key-Value pair (value: any complex structure; essentially an index) – Dynamo, Redis, Riak, LevelDB
• Graph (O1 O2 R triplets) – Neo4j, OrientDB, Infinite Graph, Mark Logic
• “Document” (XML, JSON) – MongoDB, CouchBase, Mark Logic
• (Wide) Column stores, the inverse (“dual”) of tables – Cassandra, HBase

Criteria for a Data Modeling Scheme [34]
• Simple, understandable – for human communication
• Comprehensive – can model every phenomenon in the user domain, e.g., overlapping populations ==> generalization
• Direct – visually intuitive, unambiguous, e.g., “Fork” for manyness ───<
  – without spurious, artificial, intermediate constructs, e.g., intersection entity (for M:N), foreign key (redundant with an arc)
• Minimal – at most one way to model a given phenomenon
• Consistent – uses the same syntax for similar phenomena, e.g., for dependency within a record, between records, and S/Stypes
• Universal – independent of language

Outward Facing vs. Inward Facing [35]
Outward Facing – to the business user domain
Inward Facing – to existing, stored data
• Historically we had data on punched cards or paper tape, and needed a representation which transcended its physical storage, hence logical data models. Still inward facing.
• Next we found logical models too complex and needed to simplify, particularly at the beginning stages of development, hence conceptual data models, even before logical models.
• Then we realized that these models were really representations of things in the business user domain, hence outward facing.
• The modern approach to data modeling is to begin by modeling user domains independent of any physical storage or implementation considerations.
• More recently, we collect massive amounts of data (BIG data); it already exists. Now the challenge is to process it efficiently (hence NoSQL tools) and apply analytics to make sense of it.
NOTE: Someone designed the stored data, so where are the definitions?

Data Model – Outward Facing [36]
Initially a data model is outward facing, to the business.
• Whether modeling big data, fast data, thick data, NoSQL data or Relational data (SQL) … (these are all representations for physical implementation), you still need to know, understand and document the business.
• The “first stage” data model is a fully detailed model of the business, independent of any physical implementation, BUT… capturing rich, detailed semantics which describe the user domain in the model.

“Data” Modeling is NOT about: [37]
• Scope – what do we mean by “enterprise-wide”?
• Simplified, high-level, abstract, “conceptual”
  – Whether in data model development, or
  – A matter of presentation, choosing to hide detail
• Syntax – the chosen notation to represent a Data Model
  – Data Modeling is about Semantics – meaning
  – The same semantic can have several notations, e.g., multiplicity in a relationship: --< (fork), ‘M’, *, -->>
• Storage, Physical Implementation, Performance
  – Only exogenous information to represent the user domain
  – No unnecessary, artificial, spurious constructs introduced on the path to implementation, e.g., FKey, 1NF, entity records!
  – User-facing, NOT database/datastore-facing
Though these are all important aspects of Data Modeling.

What to Call our Model? [38]
• Conceptual is a scoping and presentation issue.
• Physical is not part of the Data Model.
• Logical is what we are left with. But all models are logical!
• Data Model – but not always a model of data, particularly when the database has not yet been built.
None of these adjectives is helpful when referring to data models, so… How do we distinguish types of data models? What do we call the initial complete data model?

A Business Data Model [39]
For our initial but complete, detailed data model:
• Capturing all exogenous* information about the user domain which is of interest
• Capturing only exogenous information about the user domain
• Our mental models need to be externalized and formally documented to be communicated.
• Hence, we need a modeling scheme with a rich Syntax to represent the Semantics of the user domain in the model.
• Devoid of anything relating to physical storage, technology, encoding, implementation, etc.
Let’s call it a “Business Data Model” (G. Witt). Halpin calls it the Conceptual Data Model.
*exogenous: relating to, developed or derived from external factors; originating from outside.

Introducing Design Elements [40]
As we move through the continuum of Conceptual ==> Logical ==> Physical:
• How do we rationalize the many differences and alternatives in logical data models?
• Logical data models differ based on which modeling elements are included in the model, i.e., on the modeling scheme.
SO
• Let’s lay out the various design elements in a precedence graph reflecting their order of introduction.

Data Modeling Constructs [41]
What to look for: relative emphasis differentiates Data Modeling Schemes.
• ER modeling focuses on Entities and Relationships, de-emphasizing, even hiding, Attributes.
• Relational (restricted ER, 1NF) focuses on Entities and Attributes, relegating Relationships to Foreign Keys.
• Object Role Modeling (ORM) folds Attribute and Entity into Object.
[Figure: the constructs ENTITY (OBJECT), RELATIONSHIP, IDENTIFIER, [FOREIGN KEY] and ATTRIBUTE (Data Item), each with its characteristics.]
What about VALUE?

Traditional “Levels” of Data Models [42]
[Figure: Reality (User Domain) and Mental Model, then Conceptual, Logical and Physical, then Database / Managed Datastore.]
Are intersection/associative entities or foreign keys part of the logical model or the physical model?
Let’s forget levels, and focus on the ordering of design elements.

Introducing Elements of a Data Model [43]
[Figure: a continuum for introducing the elements of a data model, from Reality (User Domain) and Mental Model to Database. One point marks where we have all the essential exogenous semantic information from the user domain to build a complete data model; a later point marks where we begin to introduce additional data elements to physically implement the data model (build a database).]
What to call the first point? Business Data Model.
All of these are logical becoming physical.

Data Modeling Constructs [44]
• “Things” – objects, entities, attributes
  – Names of things
  – Populations (or types) of things, Domains (of values)
  – Subtypes/Supertypes to model overlapping populations
• Relationships
  – Names of relationships, Names of Object Roles
  – Characteristics/constraints (dependency, multiplicity)
  – Ternary+++ relationships
(Up to this point: the Business Data Model. Below: moving to implementation.)
• Entity records
  – Clustered attributes (based on relationships)
• Identifiers
  – Encoded representation of instances of things (IDs)
  – Keys, Foreign keys

Stages of Data Modeling: Introducing Data Elements or Constraints [45]
[Figure: the Fact Modeling portion of the precedence diagram of slide 3: scope; user narratives, sentences (“facts”), nouns and verbs; object instances, object types and object names; sub/supertypes and S/Stype constraints; relationship types and names; object role names; object identifiers; object population constraints; relationship and role set constraints; clustering; multiplicity; dependency (?). Constraints are defined after the element they constrain is introduced.]

Stages of Data Modeling: Introducing Data Elements or Constraints [46]
[Figure: the complete precedence diagram of slide 3, spanning Fact Modeling, ER/Relational Modeling and Physical Modeling.]

Observations on the Diagram [47]
• It shows the precedence ordering of the introduction of data modeling elements.
• At some point we have all the exogenous semantic information needed to complete the model. Up to that point our model is outward, or user, facing.
• Anything introduced later is physical realization or implementation, i.e., inward facing.
Stages:
• Fact Modeling
• >> controversial elements in between
• ER / Relational Modeling
• Physical Modeling
• Implementation Modeling

Start with User Narratives [48]
Begin with statements from User Domain Experts (SMEs) within a defined, agreed-upon scope; they are the primary source of knowledge about the world being modeled.
Then find the Model Elements:
• Analyze the vocabulary in the user narratives – develop agreed-upon definitions
• Break down user narratives => into elementary fact sentences
• Extract the nouns => they become Things or Objects
• Extract the verbs => they become Relationships
• Extract other words/phrases => they become Constraints

Verbalize User Descriptions [49]
(Legend: noun, verb, constraint.)
GIVEN A DESCRIPTION FROM THE USER(S):
Famous Foods, a small, specialty food wholesaler, fills orders for restaurants. Customers have names, addresses, etc. An order can include several products. Products have unique SKU numbers, descriptions, manufacturer, etc. The company has one big warehouse with many rooms on several floors.
Each product is stored in only one bin location in the warehouse, but it can change frequently. Multiple products may be stored in the same bin. Bin numbers are only unique within a room, hence the same number can be used in different rooms. Since the bin locations can be hard to find in a room (could be on a shelf, on the floor, in a cabinet or cooler, hanging from the ceiling, etc.), and the rooms can be hard to find in the warehouse (with many hallways, doors, tunnels, split levels, mezzanines, etc.), explicit location directions must be recorded for each room and for each bin in the room. Location information is a textual narrative and is used by the pickers who run around gathering the items to fill an order. Each product has its own standard price but it may be modified by applying a discount (a fraction) on any individual order. The discount can be different for each of the products on an order, and for the same product on different orders. The quantity of each product on an order is recorded (it is not the quantity on hand or in inventory). Terms indicates the number of days during which a standard discount can be taken on the payment. The terms can vary from one customer to the next, and from one order to the next for the same customer.

Establishing the Vocabulary [50]
Before we can develop a data model, we must first carefully define our terms so we can talk about the domain:
• From a business perspective
• By the user domain or subject matter experts
  – Listen to what they say when talking about the domain
• Definitions will initially be fuzzy, with areas of disagreement
  – Requiring some discussion and negotiation to come to a common understanding, and documenting that
  – The most difficult and important aspect of data modeling
Call it a business glossary or [data] dictionary? However, a glossary usually covers only nouns (objects). We also need to define the relationships (the mortar that holds the bricks, the nouns, together) and the constraints.

Objects [51]
• Encompass Entities and Attributes… independent of the entities described
• Derived from the nouns in the user narratives
• A single instance (of what population?) … or a population of individual instances
  – e.g., given the noun 'George': until it is associated with a particular, defined population, it is just a string of characters
• Each Object Population is uniquely named
• Define the population: criteria for inclusion of members, how we know we have one, what is not included, etc.
  e.g., does Employee include retired, suspended, laid-off, contract, visitor, temp?
• Not concerned (yet) with how the members are represented, i.e., identifiers (surrogate lexical encodings)
  e.g., Days of the Week has 7 members; one of them may be represented by Tuesday, Tues, Tue, Tu, Mardi, Martes, …

Objects – 2 [52]
• Grouping individual instances into populations does not occur naturally in the real world. The designer chooses to include members based on some common characteristics, for some purpose(s)
• By convention we name object type populations with a singular noun; this makes it easier to build sentences
• NOTE: in general, individual object populations could overlap, i.e., an individual could be a member of multiple populations
  e.g., an Employee could also be a Customer or a Shareholder
  This is handled using Subtype/Supertype constructs
USER NARRATIVES in the Domain of Discourse => NOUNS => OBJECT instances => OBJECT TYPES => NAMES
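As a rough illustration of the steps in "Start with User Narratives" and "Objects", the Python sketch below assumes the analyst has already broken the Famous Foods narrative into elementary fact sentences. Each sentence is hand-coded as a (noun, verb phrase, noun) triple. The `FactSentence` class, the `extract_model_elements` helper and the particular sentences are hypothetical paraphrases. They are not part of the presentation's method. The sketch simply collects the nouns as candidate object types and the verb phrases as candidate relationship types.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class FactSentence:
    """One elementary fact sentence, pre-split by the analyst (hypothetical format)."""
    subject: str  # noun: candidate object type
    verb: str     # verb phrase: candidate relationship reading
    obj: str      # noun: candidate object type


def extract_model_elements(
    facts: List[FactSentence],
) -> Tuple[Set[str], Set[Tuple[str, str, str]]]:
    """Collect nouns as object types and (noun, verb, noun) triples as relationship types."""
    object_types: Set[str] = set()
    relationship_types: Set[Tuple[str, str, str]] = set()
    for fact in facts:
        object_types.update((fact.subject, fact.obj))
        relationship_types.add((fact.subject, fact.verb, fact.obj))
    return object_types, relationship_types


# Hypothetical paraphrases of the Famous Foods narrative;
# object type names follow the singular-noun convention.
facts = [
    FactSentence("Customer", "has", "Name"),
    FactSentence("Order", "includes", "Product"),
    FactSentence("Product", "has", "SKU Number"),
    FactSentence("Product", "has", "Standard Price"),
    FactSentence("Product", "is stored in", "Bin"),
    FactSentence("Bin", "is in", "Room"),
]

object_types, relationship_types = extract_model_elements(facts)
print(sorted(object_types))
# ['Bin', 'Customer', 'Name', 'Order', 'Product', 'Room', 'SKU Number', 'Standard Price']
print(len(relationship_types))  # 6
```

Name, SKU Number and Standard Price appear as object types alongside Customer and Product. This matches the point that Objects encompass both entities and attributes. Whether they later become attributes is decided further along, through multiplicity and clustering.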
Subtypes/Supertypes [53]
See Dataversity Webinar
• Data modeling schemes assume Object Populations are strictly disjoint, i.e., an individual member of an Object Population cannot be a member of any other Object Population
• We know that is not always true
  e.g., a Person can be an Employee, a Customer, and a Shareholder
  If these are modeled as separate populations, redundancy results, which can lead to inconsistent data. Maintaining consistent data then becomes a user responsibility
• The S/Stype construct is used to formally represent overlapping populations. It depends only upon the nature of the defined Object populations.
• A Supertype is a generalization of its Subtypes. Several constraints can be defined on S/Stypes.
OBJECT TYPES => SUBTYPE/SUPERTYPES => S/Stype CONSTRAINTS

Relationships [54]
• A "connection" between or among members of one or more object populations.
  Arity = the number of Roles played by Objects participating in the Relationship, e.g., Unary, Binary, Ternary, etc.
[Figure: a binary RELATIONSHIP TYPE between object types X and A; RELATIONSHIP instances: all valid X-A pairs (in the R/W)]
What are the characteristics of the relationship 'X-A'?

Relationship Names and Object Roles [55]
• Naming all Relationships and Roles is not necessary; the Object Names can be used as a default: Relationship "X-Y"
  – But user narratives will include verb phrases that reference relationships, making it easier to form sentences when talking about the user domain.
  – Except if there are multiple relationships on X-Y
    e.g., "Employee works in Dept" and "Employee heads Dept", in which case the Employee plays the role of Boss in the "heads" relationship.
    NOTE: role order matters, e.g., a binary relationship has two readings
  – Except if the same Object type plays multiple roles
    e.g., "Person is parent of Person"; then we must name the relationship or distinguish the roles as Parent and Child
• Object Role names are nouns within the context of a relationship
USER NARRATIVES in the Domain of Discourse => OBJECT TYPES => VERBS => RELATIONSHIP TYPES => NAMES => ROLE NAMES

Constraints on a Relationship [56]
• The defaults are the least constrained
  – Multiple: every object instance may participate more than once, e.g., many-to-many (M:N) for a binary relationship
  – Optional: an object instance need not participate in the relationship
• The Constraints would be the opposites:
  – Exclusive: at most one
  – Dependent (Mandatory, Required, …): at least one
There are many different notations, sometimes confusing. The combination is called 'Cardinality' [min:max], a notational convenience.
RELATIONSHIP TYPES => EXCLUSIVITY Constraint => DEPENDENCY Constraint

What is an Attribute? [57]
An ATTRIBUTE is … of what?
an OBJECT... playing a ROLE in a RELATIONSHIP with some (other) OBJECT.
What comes first?
RELATIONSHIPS => MULTIPLICITY => CLUSTERING => ENTITY records => ATTRIBUTES

Data Modeling Constructs [58]
What to look for:
• ENTITY (Object)
• DOMAIN
• IDENTIFIER
• RELATIONSHIP [FOREIGN KEY]
• characteristics: ATTRIBUTE (Data Item)
A Day of the Week: Tuesday, Tues, Tu, Mardi, Martes...
What's the difference?

Stages of Data Modeling: Introducing Data Elements or Constraints [59]
[Diagram: repeats the slide [3] diagram; stages Fact Modeling, ER/Relational Modeling, Physical Modeling; constraints are defined after the element they apply to is introduced]

Conceptual vs. Logical vs. Physical Data Models
Questions?
©Gordon C. Everest
Professor Emeritus
Carlson School of Management
University of Minnesota
[email protected]
http://geverest.umn.edu
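The defaults and constraints in "Constraints on a Relationship" can be made concrete with a minimal Python sketch. It checks one role of a binary relationship against a [min:max] cardinality. The example uses the Famous Foods fact "Each product is stored in only one bin location", which makes the Product role exclusive (at most one). The sketch also assumes that every product must be stored in some bin. The narrative does not say this explicitly, so the Product role becomes [1:1] while the Bin role keeps the default [0:*]. `RoleConstraint`, `violations` and the sample data are hypothetical. Bins are identified by (room, bin number) because bin numbers are only unique within a room.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class RoleConstraint:
    """Hypothetical [min:max] cardinality for one role of a binary relationship."""
    min_count: int = 0               # 0 = optional (default); 1 = dependent/mandatory
    max_count: Optional[int] = None  # None = multiple (default); 1 = exclusive


def violations(
    population: Iterable[Hashable],
    pairs: Set[Tuple[Hashable, Hashable]],
    role_index: int,
    constraint: RoleConstraint,
) -> List[Tuple[Hashable, int]]:
    """Return (member, count) for each member of the population whose participation
    in the given role (0 = first role, 1 = second role) breaks [min:max]."""
    counts = Counter(pair[role_index] for pair in pairs)
    bad = []
    for member in sorted(population):
        n = counts[member]  # Counter yields 0 for members that never participate
        too_few = n < constraint.min_count
        too_many = constraint.max_count is not None and n > constraint.max_count
        if too_few or too_many:
            bad.append((member, n))
    return bad


# Hypothetical sample data; a bin is identified by (room, bin number).
products = {"P1", "P2", "P3"}
bins = {("A", 7), ("B", 7), ("C", 1)}
stored_in = {
    ("P1", ("A", 7)),
    ("P2", ("A", 7)),  # multiple products may share a bin
    ("P2", ("B", 7)),  # breaks exclusivity: P2 is in two bins
}                      # P3 is in no bin: breaks the assumed dependency

product_role = RoleConstraint(min_count=1, max_count=1)  # [1:1]; the minimum of 1 is an assumption
bin_role = RoleConstraint()                              # default [0:*]

print(violations(products, stored_in, 0, product_role))  # [('P2', 2), ('P3', 0)]
print(violations(bins, stored_in, 1, bin_role))          # []
```

Defaulting `RoleConstraint` to min 0 and no max mirrors the point that the defaults are the least constrained. Exclusivity and dependency are then added one role at a time.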