Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Overview of Database Design By Nazife Dimililer Database Management System A DBMS is a data storage and retrieval system which permits data to be stored nonredundantly while making it appear to the user as if the data is well-integrated. Database Management System Application #1 Application #2 Application #3 DBMS DBMS manages data resources like an operating system manages hardware resources Database containing centralized shared data Advantages of Database Approach • Program-Data Independence – Metadata stored in DBMS, so applications don’t need to worry about data formats – Data queries/updates managed by DBMS so programs don’t need to process data access routines – Results in: increased application development and maintenance productivity • Minimal Data Redundancy – Leads to increased data integrity/consistency Advantages of Database Approach • Improved Data Sharing – Different users get different views of the data • Enforcement of Standards – All data access is done in the same way • Improved Data Quality – Constraints, data validation rules • Better Data Accessibility/ Responsiveness – Use of standard data query language (SQL) • Security, Backup/Recovery, Concurrency – Disaster recovery is easier Costs and Risks of the Database Approach • Up-front costs: – Installation Management Cost and Complexity – Conversion Costs • Ongoing Costs – Requires New, Specialized Personnel – Need for Explicit Backup and Recovery • Organizational Conflict – Old habits die hard The Range of Database Applications • Personal Database – standalone desktop database • Workgroup Database – local area network (<25 users) • Department Database – local area network (25-100 users) • Enterprise Database – wide-area network (hundreds or thousands of users) Evolution of DB Systems • • • • • • • • Flat files - 1960s - 1980s Hierarchical – 1970s - 1990s Network – 1970s - 1990s Relational – 1980s - present Object-oriented – 1990s - present Object-relational – 1990s - present Data warehousing – 1980s - present Web-enabled – 1990s - present Database Design Phases • Conceptual Design Model the data without any physical considerations for each user view. • Logical Design Choose the data model that will be used and modify the conceptual data model to fit the data model without any other physical considerations. Validate the model using normalization and transaction requirements. • Physical Design Choose the actual DBMS and implement the data model efficiently. Performance, security and reliability are key issues. Systems Development Life Cycle Project Identification and Selection Project Initiation and Planning Analysis Logical Design Physical Design Implementation Maintenance Systems Development Life Cycle Project Identification and Selection Purpose --preliminary understanding Deliverable –request for project Project Initiation and Planning Analysis Database activity – enterprise modeling Logical Design First step in database development Specifies scope and general content Overall picture of organizational data, not specific design Entity-relationship diagram Descriptions of entity types Relationships between entities Business rules Physical Design Implementation Maintenance Systems Development Life Cycle Project Identification and Selection Purpose – state business situation and solution Deliverable – request for analysis Project Initiation and Planning Analysis Logical Design Physical Design Database activity – conceptual data modeling Implementation Maintenance Systems Development Life Cycle Project Identification and Selection Purpose –thorough analysis Deliverable – functional system specifications Project Initiation and Planning Analysis Logical Design Physical Design Database activity – conceptual data modeling Implementation Maintenance Systems Development Life Cycle Project Identification and Selection Project Initiation and Planning Purpose –information requirements structure Deliverable – detailed design specifications Analysis Logical Design Physical Design Database activity – logical database design Implementation Maintenance Systems Development Life Cycle Purpose –develop technology specs Deliverable – program/data structures, technology purchases, organization redesigns Project Identification and Selection Project Initiation and Planning Analysis Logical Design Physical Design Database activity – physical database design Implementation Maintenance Systems Development Life Cycle Purpose –programming, testing, training, installation, documenting Deliverable – operational programs, documentation, training materials Project Identification and Selection Project Initiation and Planning Analysis Logical Design Physical Design Database activity – database implementation Implementation Maintenance Systems Development Life Cycle Project Identification and Selection Purpose –monitor, repair, enhance Deliverable – periodic audits Project Initiation and Planning Analysis Logical Design Physical Design Database activity – database maintenance Implementation Maintenance Simplified Database Development Procedure Start Draw ERD Convert to Relational Schema Validate using Normalization Validate against user transactions Stop Documentation Entity Document Entity Name Description Aliases Occurrence Name of entity A short Description of entity Other names the users used to refer to this entity A common Situation where this entity can be found Instructor Employees teaching courses Lecturer, professor Instructors work in departments Documentation Relationship Document Entity Type Relationship Type Entity Type Cardinality Participation (Optionality) Name Of participating Entity : Entity A Name of Name of participating Entity : Entity B Cardinality from Entity A to Entity B 1:1 1:M M:1 Participation constraints on the relationship from Entity A to Entity B (Optionalities) Full (F) : Manadatory Entity (min>0) Partial (P) : Optional Entity (min=0) Instructor workFor Department M:1 P:F Relationship Documentation Attribute Document Entity Names of Attributes Description Data type and length Constraint Name of Entity List of all attributes of the entity Description of each attribute Data type of each attribute. It is possible to use domain names you have described in the domain document Primary , Unique and Secondary Key. (Secondary Keys are used to search for the entity) Student Student Id Uniquely identifies a student. 6 fixed character Primary Key Name Full name of student 50 variable character Secondary Index Gender Gender of student 1 fixed character Documentation Attribute Document Continued Names of Attributes Default Value Alias Null Value? (Yes or No) Derived? List of all attributes of the entity Default value for attributes Other names, the users used for the attribute Yes : Null values are allowed No: Null values are not allowed Yes: It is derived No: It is not derived Student Id No Name No Gender cgpa ‘F’ Sex Yes Cumulative grade Yes Yes Documentation Attribute Domain Document Domain Name Domain Characteristics Examples of allowed values Name of Domain for attributes Description of domain Illustrative examples Cgpa domain 3 digit floating point between 0.00 and 4.00 3.33, 4.00 Gender 1 character string (‘F’ or ‘M’) M, F APPLY NORMALIZATION If you normalized your database design, you can be more confident that there will be no redundancy in the final database. There are three basic checks that most relational database designers carry out. •1NF REMOVE REPEATING GROUPS •2NF REMOVE PARTIAL DEPENDENCIES •3NF REMOVE NON-KEY DEPENDENCIES USEFUL DESIGN PRINCIPLES 1-Faitfulness Entity sets, their attributes and relationships should reflect reality •Don’t attach pointless attributes. Employee Number-of-legs •Relationships between attributes depend on the policy of organization. employee employee works works department department USEFUL DESIGN PRINCIPLES 2- Avoid Redundancy id Student name Advisor advisorname name 3- Simplicity counts Avoid introducing more elements into your design Than is absolutely necessary. id Student name kindergarten highschool id USEFUL DESIGN PRINCIPLES 4- Choosing the right relationship Adding to our design every possible relationship is not often a good idea. • It can cause to redundancy. •Resulting database could require more space to store redundant elements. •Modifying the database could become more complex. USEFUL DESIGN PRINCIPLES 5- Picking the right kind of element. title studioname movie studioaddress year movie title year Repeat studio name and address for each movie If studio doesn't have any movie, we lost the address of the movie studio filmedBy studioname studioaddress USEFUL DESIGN PRINCIPLES 5- Picking the right kind of element. Example 1: Using Multiway relationship Example 1: If entity set has only one attribute and relationship is 1:N relationship. f E E K F e e K e k k K k e f k F K Helpful pointers • Transform “complex” attributes to entities. ServiceRecord ServiceRecord ServiceDate EquipmentNo EmployeeName Description ServiceDate EquipmentNo EmployeeNo Description Employee performedBy EmployeeId FirstName LastName Hiredate Helpful pointers • Use lookup entities(tables) for frequently used data. s tude nt Id Name Country Em ploye e Id Name Salary Country student Id Name CountryCode Em ployee Id Name Salary CountryCode sresident country code Name eresident Helpful pointers • Split compound attributes custom er custom er customerId Name Address customerId Name StreetAddress City Country Helpful pointers • Transform weak entities to strong entities Invoice InvoiceNo InvoiceDate CustomerId payments Invoice InvoiceNo InvoiceDate CustomerId payments Ins tallm e nt Ins tallm e nt InvoiceNo InstallmentDate Amount InstallmentId InvoiceNo InstallmentDate Amount Helpful pointers • Add History Instructor id Name Instructor id Name Title TitleChanges InstructorTitle ChangeDate Title Some helpful pointers • Use consistent naming rules for all entities,relationships and attributes • Choose primary keys intelligently. Primary keys should NOT change over time. • Choose appropriate data types for attributes Intelligent vs Surrogate Keys • A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key aka intelligent key. • "Surrogate key" may also be known as "System-generated key", "Database Sequence number", "Synthetic key", Data Models unique that useidentifier". surrogate keys "Technical key" or an "Arbitrary, have more normalization • primary keys are hardusually to change. errors.because not only are • Intelligent keys suffer from this problem they used as primary and foreign keys but they also have some business meaning associated with them • The biggest advantage for intelligent keys is that users understand what they mean whereas surrogate keys don't make any business sense. Surrogate vs. Intelligent Keys Natural keys: • are more logical • can sometimes can mean fewer joins • help to encourage good modeling • are traditional/user friendly • make snooping around in the data easier Surrogate keys: • are shorter • are easier to join • take less storage • enable natural key fields to be easily changed • are what Object Oriented (and object relational) databases use Some helpful pointers : Physical Database Design • Purpose - translate the logical description of data into the technical specifications for storing and retrieving data • Goal - create a design for storing data that will provide adequate performance and insure database integrity, security and recoverability Some helpful pointers : Physical Design Process Inputs Normalized Volume Decisions relations Attribute data types estimates Physical record descriptions Attribute definitions Response time Data expectations security needs Backup/recovery needs Integrity expectations DBMS (doesn’t always match logical design) technology used Leads to File organizations Indexes and database architectures Query optimization Some helpful pointers : Designing Fields • Field: smallest unit of data in database • Field design –Choosing data type –Coding, compression, encryption –Controlling data integrity Some helpful pointers : Field Data Integrity • • • • Default value - assumed value if no explicit value Range control – allowable value limitations (constraints or validation rules) Null value control – allowing or prohibiting empty fields Referential integrity – range control (and null value allowances) for foreign-key to primary-key matchups Some Helpful Pointers : Denormalization • Transforming normalized relations into unnormalized physical record specifications • Benefits: – Can improve performance (speed) be reducing number of table lookups (i.e reduce number of necessary join queries) • Costs (due to data duplication) – Wasted storage space – Data integrity/consistency threats • Common denormalization opportunities – One-to-one relationship – Many-to-many relationship with attributes – Reference data (1:N relationship where 1-side has data not used in any other relationship) Common Design problems • • • • Misplaced relationships Incorrect Cardinalities Missing Relationships Overuse of specialized data modeling tools (ex: Inheritance, multiway relationships) • Redundant Relationships Goals of Database Development • • • • Develop a Common Vocabulary Define the meaning of Data Ensure Data Quality Find an Efficient Implementation Final Word • Remember that the goal of the DB development is to produce a DB that provides an important resource for an organization. • The DB should be designed so that it can serve the customers and other team members efficiently.