Data Modeling Overview
By: Dave Wentzel

What we will accomplish
Review of DBMS
Issues related to DBMS
Entity Relationship Modeling
– Process flow
– Model types
– Component definition
Selecting entities and attributes
Defining relationships

What we will accomplish
Defining cardinality
Selecting primary keys
Review of recursive relationships, weak entities, and ternary relationships
Participation constraints
ERwin notation
NULL issues
The physical model

What we will accomplish
Generalization / Specialization
Transaction processing
Normalization rules
History issues

What is data?
Data
– Raw facts. Can be described, observed, and measured.
Information
– Data organized in a form that is useful for decision making
– The meaning behind the data
– Something new, not previously observed, that is created based on the data
Knowledge
– Information that is used for decision making

What is a Database?
Collection of interrelated data
Data which can be visualized in a table format
Contains relationships between data
Can be of any size and varying complexity
Can be maintained manually or by computer

Database Management System (DBMS)
Collection of programs (software) that allows users to create and maintain a database
Supports data:
– Definition: specification of data types, structures, and constraints
– Construction: storing of the data itself
– Manipulation: updating and querying of the data
Defines itself: contains a catalog which describes its data
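To make the definition / construction / manipulation / catalog distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption chosen only because it runs without a server (the deck itself refers to SQL Server), and the Customer table is hypothetical. In SQLite the catalog is the sqlite_master table; other products expose their catalog differently, for example through INFORMATION_SCHEMA views.

```python
import sqlite3

# In-memory SQLite database used purely for illustration;
# the Customer table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")

# Definition: declare data types, structure, and constraints.
conn.execute(
    "CREATE TABLE Customer ("
    " CustomerID INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL)"
)

# Construction: store the data itself.
conn.execute("INSERT INTO Customer (CustomerID, Name) VALUES (1, 'Bob')")

# Manipulation: update and query the data.
conn.execute("UPDATE Customer SET Name = 'Robert' WHERE CustomerID = 1")
names = [row[0] for row in conn.execute("SELECT Name FROM Customer")]

# Catalog: SQLite describes its own tables in sqlite_master (metadata).
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(names)    # ['Robert']
print(catalog)  # [('table', 'Customer')]
```

The last query reads data about data: the DBMS answers it from its own catalog rather than from anything the user stored.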
Components of a DBMS
Catalog
– Maintains information about the data in the database
– Considered data about data (metadata)
Databases
– Collection of related tables
Tables
– Rows and columns containing data

Issues in DBMS
Data independence
Query optimization
– Improve efficiency
– Faster responses
Transaction management
– Sequence of operations that are treated as a unit
– Once the first step is completed, the second step must also complete; otherwise the first step is undone (ROLLBACK mechanism)
– Example: transferring bank funds (the withdrawal from one account and the deposit to the other succeed or fail together)

Issues in DBMS (continued)
Transaction management
– Concurrency
– Recovery
Controlled redundancy
– A goal of database design is to minimize redundancy (duplicate data)
Integrity constraints
– Include business rules and data rules

Issues in DBMS (continued)
Security and privacy
– Protect against unauthorized access
Data / database administration
– Involves managing people, data, performance, security, etc.

Entity Relationship Modeling
(figure: sample ER diagram with the entities Employee, Person, Account, and Transaction)

Data Model
Tool for describing data, its relationships, semantics, and integrity constraints
Provides for data abstraction
Hides details of data storage

Why use an ER Model?
Easy to use for modeling database design
Succinct representation of the database layout
Good communication tool among project team members
Most CASE tools support ER modeling
Implementation independent

Categories of Data Models
Logical model
– Conceptual data model
– High-level model
– Closest view the user has of the data
Physical model
– Low-level model
– Defines how data is stored

Steps in Database Design
Mini World -> Requirements Collection and Analysis, which produces two tracks:
– Database Requirements -> Logical Model (DBMS independent) -> Data Model Mapping (DBMS specific) -> Physical Design -> Internal Schema
– Functional Requirements -> Functional Analysis -> High Level Transaction Requirement (DBMS independent) -> Application Program Design (DBMS specific, API) -> Transaction Implementation -> Application Programs

ER Modeling is composed of
Entity (table)
Attribute (field)
Relationship
– Binary relationships
– Cardinality of relationships

What is an entity?
Conceptual definition
– A distinguishable object that exists
Operational definition
– A business object that has properties we are interested in storing
Physical definition
– A set of related data forming a table composed of attributes (fields)

Entities
Primary THINGS of a business about which users need to record data
Objects about which the business is interested in tracking information
When an ER diagram is translated into a relational model, the entities become the tables.

Selecting Entities
Nouns are candidate entities
Possible classes of entities:
– People who carry out some function (employees, students, customers)
– Places (cities, offices, routes)
– Things which are tangible physical objects (equipment, products, buildings)
– Organizations (teams, suppliers, departments)

Selecting Entities (continued)
– Events which occur at a given date/time or have steps (employee promotions, project phases, account payments)
– Concepts which are intangible ideas used to keep track of business activities (projects, accounts, complaints)

Questions to ask...
What things do we need to keep data about?
What things are essential to the organization?
What things do we talk about in the organization?
What questions do we have that reports can help answer?
What information should the reports contain?

Naming Entities
Use a SINGULAR noun
Meaningful but intuitive
Avoid names which may be misinterpreted within the problem domain
Follow organizational / industry trends
Do not try to rename entities within an organization
Avoid overused names such as Task, Form, Operation, Schedule...

Is it an entity to worry about?
Decide whether an entity is relevant to your problem domain by determining whether it has attributes you need to track
If it does not have attributes you need to track, it is NOT a valid entity for your problem

Is it really an entity?
Can you define attributes for it?
– An attribute is a piece of information that we are interested in tracking about an entity. It is a property of an entity.
– In general, if two objects differ by one attribute, they are separate entities.
Does it participate in a relationship?
– Two entities that are related somehow interact with one another.
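The earlier point that entities become tables when an ER diagram is translated into a relational model can be sketched as a small mapping in which each attribute becomes a column and the identifying attribute becomes the primary key. This is an illustration only: the entities dictionary and the entity_to_ddl helper are hypothetical, the convention that the first attribute listed is the identifier is an assumption, and SQLite (via Python's sqlite3) stands in for the target DBMS.

```python
import sqlite3

# Hypothetical entity descriptions: entity name -> list of (attribute, data type).
# Assumption: the first attribute in each list is the identifier (primary key).
entities = {
    "Customer": [("CustomerID", "INTEGER"), ("Name", "TEXT"), ("City", "TEXT")],
    "Account": [("AccountID", "INTEGER"), ("Balance", "NUMERIC")],
}


def entity_to_ddl(name, attributes):
    """Map one entity to CREATE TABLE: entity -> table, attribute -> column."""
    key, key_type = attributes[0]
    columns = [f"{key} {key_type} PRIMARY KEY"]
    columns += [f"{attr} {dtype}" for attr, dtype in attributes[1:]]
    return f"CREATE TABLE {name} ({', '.join(columns)})"


conn = sqlite3.connect(":memory:")
for name, attributes in entities.items():
    conn.execute(entity_to_ddl(name, attributes))

# Each entity is now a table whose columns are the entity's attributes.
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
# PRAGMA table_info returns one row per column; index 1 holds the column name.
customer_columns = [row[1] for row in conn.execute("PRAGMA table_info(Customer)")]

print(tables)            # ['Account', 'Customer']
print(customer_columns)  # ['CustomerID', 'Name', 'City']
```

An object with no attributes to list would produce a table with nothing but a key, which is a quick way to spot a candidate that fails the "Is it an entity to worry about?" test.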
Attributes
Properties of an object (entity)
Each attribute has a data type (char, int, datetime)
Each attribute in an RDBMS (relational database management system) has only one value at a time (atomic)

Categories of Attributes
Descriptive
– Property of the entity that helps describe the entity
Identifying (key attributes)
– Property of the entity that helps uniquely identify the entity
– Normally short
– If one does not exist, it MUST be created
– If creating a key, use a numeric/integer data type

Types of Attributes
Atomic
– Indivisible value
– Most desired state
Composite
– Can be divided into smaller parts
– Need to be converted into atomic attributes

Types of Attributes (continued)
Multi-valued
– Multiple instances of an attribute
– Normally create another entity
Derived
– Can be determined from the value of another attribute or attributes
– In most cases, do NOT store derived attributes

Naming Attributes
Use a noun, adjective, or adverb
Name should be unique database-wide
Use attribute names consistently
Use singular names
Define a naming convention for the organization

Rules for Entity Analysis
Every noun is a candidate for an entity
Every entity should be relevant to the problem
If an object has only one property of importance, it should be considered an attribute of another entity
If an object has only one data instance (1 row), do not model it as an entity
If an object needs a unique identifier, model it as an entity

Relationships
The way entities interact with one another
An association between two or more entities
Depict business interactions between entities
They DO NOT represent business flow

Relationships (continued)
The number of entities associated through a relationship defines its degree (unary, binary, ternary, n-ary)
Cardinality defines the maximum number of entity instances that can participate in the relationship

How to Identify a Relationship
Ask what action or verb describes how one entity interacts with another
Three types of relationships to consider:
– Existence (Employee HAS Children)
– Functional (Professor TEACHES Course)
– Event (Customer PLACES Order)
Ignore verbs not important to the organization

More on Relationships
Relationships and cardinality constraints represent business rules
When naming a relationship, use an active verb in the present tense
Relationships are read bi-directionally

Example notes:
Together, the customer and account tables form a schema: the structure / layout of a logical database design.
Note the attributes. Order DOES NOT MATTER, but convention puts the primary key first.
No duplicate attributes.
No duplicate tuples (rows).
Relationship: the same attribute name (or a different attribute name with the same meaning) in 2 tables.

Cardinality Constraints
Express the MAXIMUM number of entities that can be associated with another entity via a relationship
Also known as mapping constraints
Types:
– 1:1 (one to one)
– 1:N (one to many)
– M:N (many to many)

The Key to It All
Identifiers...
– Attribute(s) which uniquely identify a record
– An entity may have multiple identifiers
– Every entity MUST have at least one
– Can be made up of more than one attribute

Candidate vs. Primary Keys
Both are identifiers
Candidate keys are all of the identifiers that uniquely identify the record; they are the choices for the primary key
The primary key is the one candidate key selected to always uniquely identify the record

Selecting the Primary Key
In general we create a primary key; however, when choosing among candidate keys:
– Choose the attribute most widely used in queries
– Select the shorter data type
– If one does not exist, one must be created
– Select a MINIMAL key if using compound attributes (not recommended)

Key Requirements and Preferences
Known at all times
Can NOT be null
Should not be changed
Shorter is better
Numeric / integer is better
Avoid keys containing the letters O, I, Z, S, which can be confused with the digits 0, 1, 2, 5
If a key includes time, it should be in 24-hour format
Avoid carrying meaning

With this all said...
It is difficult to come up with a primary key based on real attributes which will not change over time (phone numbers, SSNs, addresses, driver’s license numbers…)
In most cases it is best to create the primary key
In SQL Server, you can use an identity column, which generates a sequential number

Primary Keys and Relationships
In a 1:1 relationship, the primary key of one of the entities (either one) must migrate to the other entity
In a 1:N relationship, the primary key of the 1 side must migrate to the entity on the N side
In an M:N relationship, the keys of both entities are used to identify a new entity, which resolves the M:N into two 1:N relationships

Foreign Key
When a key migrates to another entity, it is called a foreign key (FK)
A foreign key CAN be null if it is not part of the entity’s primary key
If the FK value is NOT null, that value MUST exist in the table in which it is the primary key. This is called Referential Integrity (RI).

Recursive Relationships
An entity having a relationship with itself (e.g., Employee SUPERVISES Employee)
The same entity participates more than once in a relationship type, in different roles
The same cardinalities (1:1, 1:N, M:N) occur in recursive relationships

Weak Entity Type
Entity that does not have a key attribute of its own
Identified by its relationship with another entity
Created for multi-valued attributes and time-dependent attributes
A weak entity has EXISTENCE dependence on the parent: it exists only if the owner entity exists.
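Foreign keys, referential integrity, and the existence dependence of a weak entity can be seen together in a small sketch. It assumes SQLite through Python's sqlite3 module; the Employee and Child tables and the SeqNo qualifier are hypothetical, modeling Employee HAS Children as a weak entity keyed by the owner's key plus a sequence number. ON DELETE CASCADE is one way (an assumption here, not a rule from the deck) to enforce that a weak entity exists only while its owner does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Owner (strong) entity with its own key.
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")

# Weak entity: identified by the owner's migrated key plus a qualifier (SeqNo).
# ON DELETE CASCADE models existence dependence: no owner, no child rows.
conn.execute("""
    CREATE TABLE Child (
        EmployeeID INTEGER NOT NULL
                   REFERENCES Employee (EmployeeID) ON DELETE CASCADE,
        SeqNo      INTEGER NOT NULL,
        Name       TEXT NOT NULL,
        PRIMARY KEY (EmployeeID, SeqNo)
    )
""")

conn.execute("INSERT INTO Employee VALUES (1, 'Bob')")
conn.execute("INSERT INTO Child VALUES (1, 1, 'Ann')")
conn.execute("INSERT INTO Child VALUES (1, 2, 'Tom')")

# Referential integrity: a non-null FK value must exist as a PK in the parent.
try:
    conn.execute("INSERT INTO Child VALUES (99, 1, 'Orphan')")
    ri_enforced = False
except sqlite3.IntegrityError:
    ri_enforced = True

# Deleting the owner also removes its weak-entity rows.
conn.execute("DELETE FROM Employee WHERE EmployeeID = 1")
remaining = conn.execute("SELECT COUNT(*) FROM Child").fetchone()[0]

print(ri_enforced, remaining)  # True 0
```

Because EmployeeID is part of Child's primary key, it can never be null here, which matches the rule that an FK may be null only when it is not part of the primary key.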
Primary Keys of Weak Entities
Can use the primary key of the owner entity along with a qualifier such as a sequence number or date/time
Can create a surrogate key, but make sure you migrate the key of the parent

Ternary Relationship
Relationship among 3 entities
Differs from 3 binary relationships
States that all three entities occur at the same time
Must be converted to binary relationships

Creating Binary Relationships from a Ternary Relationship

Participation Constraints
Specify whether the existence of an entity depends on its being related to another entity via a relationship
Note the minimum cardinality
– Total participation (mandatory)
– Partial participation (optional)

Identifying Participation Constraints
Can entity A exist without entity B?
– If no, A has total participation in the relationship
– If yes, A has partial participation in the relationship

Identifying Relationships in ERwin
An identifying relationship is a relationship between two tables in which an instance of the child table is identified through its association with a parent table. The child table depends on the parent table for its identity and cannot exist without it; the parent's primary key migrates into the child's primary key.
In an identifying relationship, one instance of the parent table is related to multiple instances of the child.

Non-Identifying Relationships in ERwin
A non-identifying relationship is a relationship between two tables in which an instance of the child table is not identified through its association with a parent table. The child table does not depend on the parent table for its identity and can exist without it.
In a non-identifying relationship, one instance of the parent table is related to multiple instances of the child.

Optional Non-Identifying
In an optional non-identifying relationship, the columns that are migrated into the non-key area of the child table are not required in the child table. This means that nulls are allowed in the foreign key.
ERwin draws an optional non-identifying relationship differently depending on the notation used for your diagram.

Mandatory Non-Identifying
In a mandatory non-identifying relationship, the columns that are migrated into the non-key area of the child table are required in the child table. This means that the foreign key cannot be null.

ERwin Notation
(figure: ERwin cardinality symbols, e.g. one to 0, 1, or M, drawn for identifying and non-identifying relationships, with nulls and with no nulls)

To Null or Not to Null…
NULL means no value
Two types of null values
– Unknown
– None (does not exist or not applicable)

Null Examples
Employee
e#  name   salary  spouse
1   Bob    10,000  Mary
2   Jack   20,000  Kate
3   Mary   30,000  NULL
4   Kelly  NULL    John
Questions:
• How many people make more than 15K?
• What is the average salary?
• Is Mary married?

Problems with NULL
Null values are ambiguous
More programming is required to deal with NULL values
Try to use UNKNOWN or NONE if applicable

Getting Physical…
Converting the logical data model into the physical data model

Things to do when converting
Identify data type
– Is it a string (character field) or a number?
– Use varchar() or char()?
– Dates are dates, not strings
Identify data length
– Consider growth over time and maximum size requirements
Identify value constraints (valid ranges, values, etc.)
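The Null Examples table can be turned into a physical table to show how SQL actually answers its questions. This is a sketch assuming SQLite via Python's sqlite3; the column names EmpNo, Name, Salary, and Spouse are hypothetical stand-ins for e#, name, salary, and spouse, and the CHECK constraint is an illustrative value constraint rather than one taken from the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Physical version of the Employee example: explicit data types, a value
# constraint (CHECK), and nullable columns where a value may be missing.
# Note: SQLite accepts VARCHAR(50) but does not enforce the length.
conn.execute("""
    CREATE TABLE Employee (
        EmpNo  INTEGER PRIMARY KEY,
        Name   VARCHAR(50) NOT NULL,
        Salary NUMERIC CHECK (Salary >= 0),
        Spouse VARCHAR(50)
    )
""")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (1, "Bob", 10000, "Mary"),
        (2, "Jack", 20000, "Kate"),
        (3, "Mary", 30000, None),   # NULL spouse: unmarried, or just unknown?
        (4, "Kelly", None, "John"),  # NULL salary: unknown
    ],
)

# NULL > 15000 evaluates to unknown, so Kelly is left out of the count
# even though she might earn more than 15K.
over_15k = conn.execute(
    "SELECT COUNT(*) FROM Employee WHERE Salary > 15000").fetchone()[0]
# AVG skips NULLs, so it averages 3 salaries, not 4.
avg_salary = conn.execute("SELECT AVG(Salary) FROM Employee").fetchone()[0]
# Comparing with = NULL never matches; IS NULL must be used instead.
eq_null = conn.execute(
    "SELECT COUNT(*) FROM Employee WHERE Spouse = NULL").fetchone()[0]
is_null = conn.execute(
    "SELECT COUNT(*) FROM Employee WHERE Spouse IS NULL").fetchone()[0]

print(over_15k, avg_salary, eq_null, is_null)  # 2 20000.0 0 1
```

Both numeric answers are incomplete: Kelly might earn more than 15K, and the average silently covers only three of four employees. The Spouse column cannot distinguish "Mary is unmarried" from "Mary's spouse is unknown", which is the ambiguity listed under Problems with NULL.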
Things to do when converting (continued)
Follow proper naming conventions
Determine indexes
Consider combining entities in a 1:1 relationship
Roll up generalization / specialization hierarchies
Add organizational attributes, if any

Indexes
An index is a physical access structure
Makes queries more efficient
Things to consider when creating indexes:
– Create an index for each PK
– Create an index for each FK
– Create an index for each AK (alternate key) which will be used in queries
– Try to minimize the number of indexes (update overhead)

Specialization / Generalization
Inheritance / Abstraction
Subclasses / Superclasses

Specialization / Generalization
Two processes resulting in the same model
Specialization is a top-down approach: can a high-level entity be broken down?
Generalization is a bottom-up approach: can entities be combined at a higher level?

Notes on Generalization/Specialization
The key of a subclass is always the key of the superclass
Subclasses can participate in their own relationships
Participation in a subclass can be either inclusive or exclusive
Exclusive subclasses should be defined by a type
Multiple inheritance is not allowed in most modeling tools
When converting to physical, the hierarchy could be combined into one entity

Database Operations
CRUD
– Create (Insert)
– Read
– Update (Modify)
– Delete
Transactions cannot violate any integrity constraints
Several operations may be grouped into a transaction
Operations may propagate to maintain integrity constraints
If update violations occur:
– Cancel the operation (Restrict)
– Perform additional updates / deletes so the violation is corrected (Cascade)
– Execute a user-specified operation to correct it (Trigger)
– Perform the operation but inform the user

Normalization - What’s normal...
Normalization
A process to design a highly desirable relational schema using functional dependencies
Guidelines for relational database design which:
– Minimize redundancy
– Avoid potential inconsistency
– Help predict data behavior problems
– Avoid update anomalies

Update Anomalies
Insert extra values
Add redundant records
Delete records not intended
Change a fact more than once, possibly in multiple tables
Miss changing a fact which is repeated multiple times

Normal Forms (moving down the list generally increases the # of tables and joins)
First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form
Fourth Normal Form
Fifth Normal Form

First Normal Form
A relation is in 1NF if it contains only scalar (atomic) values:
– One value for an attribute
– No repeating groups
– No composite attributes
– No multi-valued attributes
To convert to 1NF:
– Create 1 table for each repeating group, adding the PK of the original table
– Remove the repeating group from the original table

Example of Non-1NF w/ Conversion
Non-1NF
Dname           Dnumber  DMGRSSN    Dlocations
Research        5        333445555  {Bellaire, Sugarland, Houston}
Administration  4        987654321  {Stafford, Voorhees}
Headquarters    1        888665555  {Houston}

1NF (note redundancy)
Dname           Dnumber  DMGRSSN    Dlocations
Research        5        333445555  Bellaire
Research        5        333445555  Sugarland
Research        5        333445555  Houston
Administration  4        987654321  Stafford
Administration  4        987654321  Voorhees
Headquarters    1        888665555  Houston

Example of Non-1NF
EmployeeProject (non-1NF)
SSN        Ename            Pnumber  Hours
123456789  Smith, John      1        32.5
                            2        7.5
666885555  Narayan, Ramesh  3        40
453223344  English, Joyce   1        20
                            2        20

Conversion
SSN        Ename
123456789  Smith, John
666885555  Narayan, Ramesh
453223344  English, Joyce

SSN        Pnumber  Hours
123456789  1        32.5
123456789  2        7.5
666885555  3        40
453223344  1        20
453223344  2        20

Second Normal Form
All non-key attributes in the relation have a functional dependency on the complete PK
Each non-key attribute is uniquely defined by all components of the primary key

Example of Non-2NF w/ Conversion
EmployeeProject (SSN, Pnumber, Hours, Ename, Pname, Plocation), primary key {SSN, Pnumber}
– FD1: {SSN, Pnumber} -> Hours
– FD2: SSN -> Ename
– FD3: Pnumber -> {Pname, Plocation}
Conversion to 2NF
– EP1 (SSN, Pnumber, Hours)
– EP2 (SSN, Ename)
– EP3 (Pnumber, Pname, Plocation)

Third Normal Form
Every non-key attribute (one that does not participate in the primary key) is mutually independent
Irreducibly dependent on the primary key

Example of Non-3NF w/ Conversion
Example: Lots (PropertyID#, CountyName, Lot#, Area, Price, TaxRate)
2NF
– Lots1 (PropertyID#, CountyName, Lot#, Area, Price)
– Lots2 (CountyName, TaxRate)
3NF
– Lots1A (PropertyID#, CountyName, Lot#, Area)
– Lots1B (Area, Price)
– Lots2 (CountyName, TaxRate)

Maintaining History
Maintaining history can serve one of two purposes:
– Tracking changes in the entity over time
– Tracking record history in order to maintain inactive records over time and maintain RI
Tracking changes in an entity over time is very difficult and requires significant storage
Tracking inactive records is our standard here and provides value to the end user

Examples of History…
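The inactive-record approach to history can be sketched as a flag column used instead of a physical delete. This assumes SQLite through Python's sqlite3; the Customer and CustomerOrder tables and the IsActive column are hypothetical names, not the specific standard referred to above. What it illustrates is the RI argument: the retired customer row stays in place, so historical orders still resolve to a valid parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: a customer referenced by historical orders.
conn.execute("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        IsActive   INTEGER NOT NULL DEFAULT 1 CHECK (IsActive IN (0, 1))
    )
""")
conn.execute("""
    CREATE TABLE CustomerOrder (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID)
    )
""")
conn.execute("INSERT INTO Customer (CustomerID, Name) VALUES (1, 'Bob')")
conn.execute("INSERT INTO CustomerOrder VALUES (100, 1)")

# A physical DELETE would orphan order 100; with SQLite's default
# NO ACTION rule the delete is rejected (the Restrict behavior).
try:
    conn.execute("DELETE FROM Customer WHERE CustomerID = 1")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# Instead, retire the customer by flagging the row as inactive.
conn.execute("UPDATE Customer SET IsActive = 0 WHERE CustomerID = 1")

# Current lists filter on the flag; history still joins to the parent row.
active = conn.execute(
    "SELECT COUNT(*) FROM Customer WHERE IsActive = 1").fetchone()[0]
order_owner = conn.execute("""
    SELECT c.Name
    FROM CustomerOrder AS o
    JOIN Customer AS c ON c.CustomerID = o.CustomerID
""").fetchone()[0]

print(delete_blocked, active, order_owner)  # True 0 Bob
```

Queries that should show only current records must filter on the flag, which is the main cost of this design compared with deleting rows.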