* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Database Management Systems - Health Information Technology
Oracle Database wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
Relational model wikipedia , lookup
Clusterpoint wikipedia , lookup
Database Management Systems Pierce College CIS260 • Course Description: Concepts and theory of relational database management systems (RDBMS) including the analysis and design of relational database systems. This is a project-based class. Entity Relationship modeling and advanced Microsoft Access techniques, in preparation for the Microsoft Office Specialist exam, will be covered. Course discussion and hands-on case studies in the healthcare industry with comparison to other industries, as applicable, provides practical knowledge and experience. • Course Content Outline/Topics: A. Database Management Systems B. Database Development C. Business rules and user requirements D. Entity Relationship Diagrams E. Normalization F. Database Design Patterns G. Validate and manage data H. Data Queries I. Database Forms, Views and Reports J. Database Security and Administration K. Software, Data and User Testing 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 2 Learning Objectives A1. Discuss database management systems and database administration. B1. Analyze, design, and create relational databases to meet industry and customer needs using current relational database management system software. C1. Identify the business rules and customer requirements to be included in the data model. D1. Differentiate and create conceptual data models, logical data models and physical data models. E1. Normalize relationships in tables. F1. Use database design patterns in data modeling. F2. Use modeling/diagramming software to model data. G1. Validate, import, convert, and export data from one application to another H1. Create data queries that sort, filter, manipulate and calculate data I1. Develop effective queries, forms, reports and custom user interfaces for databases. J1. Discuss ethics and security issues and regulations surrounding data and databases. K1. Test the integrity of the database design. 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 3 CAHIMS Requirements • Case studies, critiques/peer reviews, diagrams/models, group or team discussions/debates, individual projects/assignments, online research, team projects, computer labs (CAHIMS 3.3, 5.2, 6.1, 6.2, 7.1, 7.3, 7.4, 9.4) 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 4 Module 1: Database Management Systems Lecture 1: Define and Discuss Databases and Database Management Systems (DBMS) 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 5 Database versus Database Management System(DBMS) • Database “a comprehensive collection of related data organized for convenient access, generally in a computer.”, ~Dictionary.com, “Database” • Database System aka Database Management System (DBMS) “a software system designed to allow the definition, creation, querying, update, and administration of databases.” ~Wikipedia.com, “Database” 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 6 Variations of Database Management Systems • Relational Database and Relational Database Management System (RDBMS) • The most common type of DBMS. • “A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily. A relational database is created using the relational model. The software used in a relational database is called a relational database management system (RDBMS). “ • “First defined in June 1970 by Edgar Codd, of IBM's San Jose Research Laboratory. Codd's view of what qualifies as an RDBMS is summarized in Codd's 12 rules.” Wikipedia.com, “Relational Database” 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 7 Variations of Database Management Systems • Object Relational Database Management System (ORDBMS) • “An object-relational database (ORD), or object-relational database management system (ORDBMS), is … similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. • An object-relational database can be said to provide a middle ground between relational databases and object-oriented databases (OODBMS). In object-relational databases, the approach is essentially that of relational databases: the data resides in the database and is manipulated collectively with queries in a query language;…” • Object Oriented Database Management System (OODBMS) • “…at the other extreme are OODBMSes in which the database is essentially a persistent object store for software written in an object-oriented programming language, with a programming API for storing and retrieving objects, and little or no specific support for querying.” http://en.wikipedia.org/wiki/ORDBMS, 3/26/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 8 Common RDBMS According to research company Gartner, the five leading commercial relational database vendors by revenue in 2011 were Oracle (48.8%), IBM (20.2%), Microsoft (17.0%), SAP including Sybase (4.6%), and Teradata (3.7%). According to Gartner, in 2008, the percentage of database sites using any given technology were (a given site may deploy multiple technologies): • Oracle Database - 70% • Microsoft SQL Server - 68% • MySQL (Oracle Corporation) - 50% • IBM DB2 - 39% • IBM Informix - 18% • Adaptive Server Enterprise (Sybase Corporation) - 15% • Sybase IQ - 14% • Teradata - 11% • Amazon Relational Database Service is a database as a service offering MySQL and Oracle database engines. ~ Source Wikipedia.com “RDBMS”, 3/22/2013 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 9 Data Warehouse “A database designed to support decision making in an organization. Data from the production databases are copied to the data warehouse so that queries can be performed without disturbing the performance or the stability of the production systems.” http://www.pcmag.com/encyclopedia/term/40866/data-warehouse, 4/18/13 4/19/2013 5:00 PM https://upload.wikimedia.org/wikipedia/commons/4/46/Data_warehouse_overview.JPG, 4/18/2013 Pierce College - CIS260 Database Management Systems 10 Data Marts, Cubes and Mining Data Marts “Data warehouses can become enormous with hundreds of gigabytes of transactions. As a result, subsets, known as "data marts," are often created for just one department or product line.” http://www.pcmag.com/encyclopedia/term/40866/data-warehouse, 4/18/13 Data Cubes “An OLAP cube is an array of data understood in terms of its 0 or more dimensions…A cube can be considered a generalization of a three-dimensional spreadsheet.” http://en.wikipedia.org/wiki/OLAP_cube, 4/18/13 Data Mining “Data mining …is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.” http://en.wikipedia.org/wiki/Data_mining, 4/18/2013 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 11 Business Intelligence/Business Analytics “Business intelligence, or BI, is an umbrella term that refers to a variety of software applications used to analyze an organization’s raw data. BI as a discipline is made up of several related activities, including data mining, online analytical processing, querying and reporting. Companies use BI to improve decision making, cut costs and identify new business opportunities. BI is more than just corporate reporting and more than a set of tools to coax data out of enterprise systems. CIOs use BI to identify inefficient business processes that are ripe for re-engineering.” http://www.cio.com/article/40296/Business_Intelligence_Definition_and_Solutions, 4/18/13 “Thomas Davenport [an American academic and author specializing in analytics] argues that business intelligence should be divided into querying, reporting, OLAP, an "alerts" tool, and business analytics. In this definition, business analytics is the subset of BI based on statistics, prediction, and optimization.” http://en.wikipedia.org/wiki/Business_intelligence#Business_intelligence_and_data_warehousing, 4/18/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 12 Big Data “Every minute, 48 hours of video are uploaded onto YouTube. 204 million e-mail messages are sent and 600 new websites generated. 600,000 pieces of content are shared on Facebook, and more than 100,000 tweets are sent. And that does not even begin to scratch the surface of data generation, which spans to sensors, medical records, corporate databases, and more.” http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/, 4/18/13 “By 2015, the average hospital will have two-thirds of a petabyte (665 terabytes) of patient data, 80% of which will be unstructured data like CT scans and X-rays.” http://www.forbes.com/sites/netapp/2013/04/17/healthcare-big-data/, 4/18/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 13 Module 2: Database Administration Lecture 1: Database Administration 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 14 Database Administration Wikipedia defines database administration as “ the function of managing and maintaining database management systems (DBMS) software” and lists the following database administrator (DBA) responsibilities: “DBA Responsibilities • • • • • • • • • • • • Installation, configuration and upgrading of Database server software and related products. Evaluate Database features and Database related products. Establish and maintain sound backup and recovery policies and procedures. Take care of the Database design and implementation. Implement and maintain database security (create and maintain users and roles, assign privileges). Database tuning and performance monitoring. Application tuning and performance monitoring. Setup and maintain documentation and standards. Plan growth and changes (capacity planning). Work as part of a team and provide 24x7 support when required. Do general technical troubleshooting and give cons. Database recovery.” Source: Wikipedia, 3/18/13, http://en.wikipedia.org/wiki/Database_administration_and_automation 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 15 Module 2: Database Administration Lecture 2: Governance, Policies, and Procedures 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 16 Governance, Policies, and Procedures • To perform DBA responsibilities, there needs to be an understanding of expectations and consequences. • Who sets those expectations, who enforces those expectations, and who implements those expectations is the purpose behind governance, policies and procedures. Who is responsible for those expectations? EVERY stakeholder. • Even the DBA has specific roles for helping set these expectations as shown in the list of responsibilities: • “Establish and maintain sound backup and recovery policies and procedures. • Setup and maintain documentation and standards.” Source: Wikipedia, 3/18/13, http://en.wikipedia.org/wiki/Database_administration_and_automation 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 17 Example: Why? Governance Our college will comply with FERPA regulations and protect the privacy of student education records. What? Policies Grades will not be posted. How? Procedures Student grades will be mailed to the student three days after final exams. 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 18 Governance • Governance addresses all levels of the organization and answers the question “why are we setting these expectations?” • “IT governance enables an organization to attain three vital objectives: regulatory and legal compliance, operational excellence, and risk optimization.” [1] • “An IT governance framework should not exist in isolation from either the overarching corporate governance model or the ERM [enterprise risk management] model.”[2] [1], [2]“IT Excellence Starts with Governance”, An Ernst & Young White Paper, By Nick Robinson, Manager, Technology & Security Risk Services, http://www.technologyexecutivesclub.com/Articles/itgovernance/excellence.php 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 19 Policies and Procedures “A set of policies are principles, rules, and guidelines formulated or adopted by an organization to reach its long-term goals and typically published in a booklet or other form that is widely accessible. Policies and procedures are designed to influence and determine all major decisions and actions, and all activities take place within the boundaries set by them. Procedures are the specific methods employed to express policies in action in day-to-day operations of the organization. Together, policies and procedures ensure that a point of view held by the governing body of an organization is translated into steps that result in an outcome compatible with that view. “ Businessdictionary.com, http://www.businessdictionary.com/definition/policies-and-procedures.html, 3/18/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 20 Module 2: Database Administration Lecture 3: Database Career Paths and Certification 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 21 Healthcare Database Career Paths and Certification Career Paths • database administrator • database developer • database analyst • data analyst • business analyst • application analyst 4/19/2013 5:00 PM Certifications • HIMSS CAHIMS • Microsoft MTA • Microsoft MOS • Microsoft MCSA • CompTIA HIT Pierce College - CIS260 Database Management Systems 22 Module 2: Database Administration Lecture 4: CAHIMS 9.4 Staying Current with Technology and Industry 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 23 Module 3: Data Management Lecture 1: What is Data Management? 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 24 Data Management “Data management (DM) is the business function of planning for, controlling, and delivering data and information assets. This function includes • • • • The disciplines of development, execution and supervision of plans, policies, programs, projects, processes, practices and procedures that control, protect, deliver and enhance the value of data and information assets.“ “DAMA Guide to the Data Management Body of Knowledge“, DAMA International, 2010, pg. 4 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 25 Data, Information, Knowledge The DAMA-DMBOK Guide defines • data as the “representation of facts as text, numbers, graphics, images, sound or video” • information as “data in context”. “This context includes …definition…format…timeframe…relevance” • knowledge as “information in perspective, integrated into a viewpoint based on the recognition and interpretation of patterns, such as trends, formed with other information and experience.” “We gain in knowledge when we understand the significance of information.” “DAMA Guide to the Data Management Body of Knowledge“, DAMA International, 2010, pg. 4 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 26 Adding Value “The Business Value of Data What's the value of your organization's data? The ability of business and IT managers to answer that question directly correlates to the success of their company's business continuity and data recovery efforts.” “The Business Value of Data”, by Michael Croy, Forsythe Solutions Group, Inc., ,http://www.technologyexecutivesclub.com/Articles/management/artBusinessValueofData.php, 03/25/13 “Data has value only when it is actually used, or can be useful in the future. All data lifecycle stages have associated costs and risks, but only the “use” stage adds business value.” “DAMA Guide to the Data Management Body of Knowledge“, DAMA International, 2010, pg. 3 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 27 Big Data “is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” “Examples include Big Science, web logs, RFID, sensor networks, social networks, social data (due to the social data revolution), Internet text and documents, Internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and often interdisciplinary scientific research, military surveillance, forecasting drive times for new home buyers, medical records, photography archives, video archives, and large-scale e-commerce.” http://en.wikipedia.org/wiki/Big_data, 3/26/13 “A recent McKinsey report found that value gained from data in the US health care sector alone could be more than US $300 billion every year. But traditional tools aren’t enough to manage these extremely large amounts of fast-changing information.” http://www.net-security.org/secworld.php?id=14594 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 28 Module 3: Data Management Lecture 2: Data Governance 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 29 Data Governance “Data Governance is "a system of decision rights and accountabilities for informationrelated processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.“” Datagovernance.org, http://www.datagovernance.com/adg_data_governance_basics.html, 03/18/13 “Data governance is an approach that public and private entities can use to organize one or more aspects of their data management efforts, including business intelligence (BI), data security and privacy, master data management (MDM), and data quality (DQ) management.” Microsoft Corporation, http://www.microsoft.com/en-us/download/details.aspx?id=10985, 03/18/13 “A Governance Plan typically includes your objectives, administration and maintenance, policies, support, stakeholder teams and other elements.” “Governance”, http://www.webopedia.com/TERM/V/validation.html , 03/15/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 30 Parts of Data Governance © DAMA International 2010 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 31 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 32 Module 3: Data Management Lecture 3: Data Validation 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 33 Data Validation Validation (n.) Verification that something is correct or conforms to a certain standard. In data collection or data entry, it is the process of ensuring that the data that are entered fall within the accepted boundaries of the application collecting the data. “Validation”, http://www.webopedia.com/TERM/V/validation.html , 03/15/13 “…data validation is the process of ensuring that a program operates on clean, correct and useful data.” “Data validation”, http://en.wikipedia.org/wiki/Data_validation, 3/13/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 34 Module 3: Data Management Lecture 4: The Data Dictionary 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 35 Data Dictionary “Database about a database. A data dictionary defines the structure of the database itself (not that of the data held in the database) and is used in control and maintenance of large databases. Among other items of information, it records (1) (2) (3) (4) what data is stored, name, description, and characteristics of each data element, types of relationships between data elements, access rights and frequency of access. Also called system dictionary when used in the context of a system design.” “Data dictionary”, http://www.businessdictionary.com/definition/data-dictionary.html#ixzz2NuokM0Dn , 3/18/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 36 Reasons for a data dictionary • ““High-Availability and Disaster Recovery (HA/DR) or what is often called “Business Continuity.”” • “to determine authoritative sources” • “to gather the disparate elements and cull them into a single reporting entity” • “for systems optimization” • “No one group knows where all of the data is, which of it is authoritative, and which you should track. “ “Developing a Data Dictionary”, Apr 8, 2011, http://www.informit.com/guides/content.aspx?g=sqlserver&seqNum=382 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 37 Guidelines for Developing the Data Dictionary Recommended guidelines from the e-HIM Workgroup of the American Health Information Management Association (AHIMA), 2006; 1. Design a plan: Preplan the development, implementation, and maintenance of the data dictionary. 2. Develop an enterprise data dictionary; integrate common data elements across the entire institution to ensure consistency. 3. Ensure collaborative involvement: Make sure there is support from all key stakeholders. 4. Develop an approvals process: Ensure a documentation tail for all decision, updates, and maintenance. 5. Identify and retain details of data versions: Version control is important. 6. Design for flexibility and growth. 7. Design room for expansion of field values. 8. Follow established ISO/International Electro technical Commission (IEC) 11179 guidelines for metadata registry: to promote interoperability follow standards. 9. Adopt nationally recognized standards. 10. Beware of differing standards for the same concepts. 11. Use geographic codes and conform to the National Spatial Data Infrastructure and the Federal Geographic Data Committee. 12. Test the information system: Develop a test plan to ensure the system supports the data dictionary. 13. Provide ongoing education and training. 14. Assess the extend to which the data elements maintain consistency and avoid duplication. Source: Health IT Workforce Curriculum, Version 2.0, Spring 2011, Component 11/Unit 8-2. 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 38 Key Points in the Guidelines: Involve Stakeholders • Involve all stakeholders in the discussion of creating the data dictionary. • Stakeholders may include data creators, owners, users which will affect: -Departments (represented across either facility or enterprise) -Outside collaborating agencies/facilities -Public health agencies -Clinical providers including all specialties -HIM administrative support services -Reimbursement support service -Legal support services -IT support services Source: Health IT Workforce Curriculum, Version 2.0, Spring 2011, Component 11/Unit 8-2 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 39 Key Points in Guidelines: Train Employees • Ongoing education of long term employees to assure compliance is an important tactic for maintaining quality • New employees should also receive appropriate training to involve them in assuring the quality of data being collected Source: Health IT Workforce Curriculum, Version 2.0, Spring 2011, Component 11/Unit 8-2 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 40 Data Dictionary Examples and Best Practices California Department of Pesticide Regulation “Data Dictionary” http://www.cdpr.ca.gov/docs/enforce/residue/datadict.htm Northwest Environmental Data Network (NED) “Best practices for data dictionary definitions and usage” http://www.pnamp.org/sites/default/files/best_practices_for_data_dictionary_definitions_and_usage_version_1.1_2006-11-14.pdf 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 41 Module 4: Stakeholders & Requirements Lecture 1: Business, Functional, and Technical Requirements 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 42 Define Stakeholders • Sponsor(s) – person(s) with decision and funding authority • Subject Matter Expert(s) – person(s) who know the content • End User(s)- person(s), or representatives of user groups, who will utilize or benefit from the new product or process • Support Staff – all other persons needed to implement and/or maintain the product or process, i.e. training, IT, HR 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 43 Documenting Requirements BUSINESS, FUNCTIONAL, AND TECHNICAL REQUIREMENTS • Variety of titles exist for these requirements documents, but the objectives are the same. • Business Requirements • Answers the question “how does this project fill an organizational/department need?” • High level. Used for management approval and resource allocation. • Functional Requirements • Answers the question “what do the users need?” • End User level. Used for capturing and verifying the users’ requirements. • Technical Requirements • Answers the question “what does the technology (hardware/software) need?” • Highly detailed. Used for programmers, developers, networking, DBA’s, testers, and all other needed IT staff. • For small deliverables, all three may be described in one document, for complex projects, may need many versions and levels. 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 44 Requirements Tips and Templates • “Top 10 Writing good requirements tips”, http://www.requirementone.com/Blog/2012/01/15/Top-10-Writing-good-requirements-tips, 3/18/13 • “Business Requirements Document: A High-level Review”, http://www.isixsigma.com/implementation/project-selection-tracking/business-requirements-document-high-level-review/, 3/25/13 • Centers for Medicare & Medicaid Services, Requirements Document (template) http://www.hsd.state.nm.us/pdf/hcr/HIX/CMS%20Requirements%20Document.pdf, 3/25/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 45 Module 4: Stakeholders & Requirements Lecture 2: CAHIMS 3.3 Business, User and Technical Requirements 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 46 Module 4: Stakeholders & Requirements Lecture 3: CAHIMS 9.1 Business Communication and Ethics 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 47 Module 4: Stakeholders & Requirements Lecture 4: CAHIMS 9.3 Professionalism and Customer Service 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 48 Module 5: Data Modeling Lecture 1: Data Modeling 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 49 Why Model? Earth House 4/19/2013 5:00 PM Molecule Pierce College - CIS260 Database Management Systems 50 Data Modeling “Communication and precision are the two key benefits that make a data model so important.” Data Modeling Made Simple: A Practical Guide for Business and IT Professionals, Technics Publications, LLC , 2nd Edition, Steve Hoberman, pg. 37 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 51 Transfer User Requirements into Data Models Three types of data models: Conceptual Model–simplest model; used to communicate and validate requirements with stakeholders • A Conceptual data model is the most abstract form of data model. It is helpful for communicating ideas to a wide range of stakeholders because of its simplicity. Therefore platform-specific information, such as data types, is omitted from a Conceptual data model.” http://www.sparxsystems.com/enterprise_architect_user_guide/9.3/domain_based_models/conceptual_data_model.html, 3/25/13 Logical Model-more complex model; follows formal database design rules • “Logical data modeling is the process of documenting the comprehensive business information requirements in an accurate and consistent format. “ “Data modeling”, http://pic.dhe.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z10.doc.intro%2Fsrc%2Ftpc%2Fdb2z_datamodeling.htm, 3/25/13 Physical Model-the most complex model; performance specific and vendor specific model • “The physical design of your database optimizes performance while ensuring data integrity by avoiding unnecessary data redundancies. During physical design, you transform the entities into tables, the instances into rows, and the attributes into columns.” “Physical database design”, http://pic.dhe.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z10.doc.intro%2Fsrc%2Ftpc%2Fdb2z_datamodeling.htm, 3/25/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 52 “In Summary The conceptual model is concerned with the real world view and understanding of data; the logical model is a generalized formal structure in the rules of information science; the physical model specifies how this will be executed in a particular DBMS instance.” http://www.aisintl.com/case/CDM-PDM.html, 3/25/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 53 Data Modeling Standards • Several notation standards exist for data modeling • • • • • • • • • • Peter Chen’s Bachman notation Barker's Notation EXPRESS IDEF1X Martin notation (min, max)-notation of Jean-Raymond Abrial in 1974 UML class diagrams Merise Object-Role Modeling) http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model, 3/13/2013 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 54 Module 5: Data Modeling Lecture 2: Entity Relationship Modeling 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 55 The Entity Relationship(ER) Model • The ER model shows information to be collected in the database (entity) and its relationship with other information collected. • Peter Chen, 1976, originator of the entity relationship model 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 56 Identify the “Entities” Identify the “entities” – the nouns – the title of the information being collected (patient, appointment, prescription, physician, etc.). 1. Draw a box for each entity and label with the entity name. 2. Label using the singular spelling of the noun and capitalize the noun.* Figure 1: The design of the "box" will depend on the software used to create it. 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 57 Identify the “Relationships” Identify the “relationships” – the verbs; how one piece of data or information interacts/relates with another piece of information Draw a line between entities to show relationship. 1.Label the line with verbs that describe the relationship. 2.The first verb is for reading left to right; the second verb is for reading right to left. Figure 2: The Patient has an Appointment. The Appointment is for a Patient. 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 58 Identify the “Cardinality” Identify the “cardinality” – the number of entities allowed in the relationship. 1.A single line touching an entity means “ONE”. 2.A line ending with three small lines, referred to as “crow’s feet”, means “Zero or more”. Once created, this can be set to a different minimum such as “One or more” or “Three or more”. Figure 3: Single point at line end means "ONE"; crow’s feet, means "many" or "one or more". 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 59 Identify the “Optional/Optionality” Identify “optional/optionality” – whether the relationship is required or not. 1. Microsoft VISIO uses the “O” to show “optional” as seen by the entity Appointment. 2. The “||” means one required. Figure 4: The Patient may have "one or more“ Appointments; the Appointment must have one Patient. 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 60 Add the “Attributes” Add the “attributes” – descriptors of the entity. 1. Label the attributes as singular tense. 2. Don’t put spaces or symbols between words if more than one is needed for clarity. 3. Type or write as “camel case” – first letter of each word is upper case, all other letters are lower case. Figure 5: We identify the Patient by his/her name; we identify the Appointment by the date, time and Physician . 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 61 Add “Primary Key (PK)” and “Foreign Key (FK)” Add “primary key (PK)” – use an attribute of the entity, or create a new attribute, that uniquely identifies the entity. 1. A new attribute, primary key ID, is usually created for most entities because none of the attributes identified are guaranteed to always be “unique”. 2. Add “foreign key(FK)” –this is the Primary key of the parent table in a relationship. The PK of the parent (in this case “Patient”) is added to the child table (in this case “Appointment”) thereby becoming the FK. Figure 6: 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 62 Unique Identifiers (UID) • A unique identifier/unique ID/UID is a number or combination of numbers and letters that when used will only identify one entity or record. • Examples of ID’s we think uniquely identify us: • Driver’s license • Social Security Number • Telephone number • Why they might not be unique: • http://www.idanalytics.com/news-and-events/news-releases/2010/8-11-2010.php • CustomerID, or PatientID, or AccountNumber are examples of new identifiers created for the purpose of keeping the information unique. 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 63 Read the Data Models Practice reading the diagrams. 1. Keep nouns singular when starting each sentence. 2. Read from left to right, then from right to left. The sentences must make sense in both directions. Noun A may/must have relationship(s) with some number of Noun B. ** Bike must be sold with one or more Wheel(s) *** A Wheel May Be sold with A Bike Wheel Bike is sold with / is part of 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 64 Module 5: Data Modeling Lecture 3: Introduction to Microsoft VISIO 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 65 Diagramming Software There are many ER diagramming tools. • Free software ER diagramming tools that can interpret and generate ER models and SQL and do database analysis • Proprietary ER diagramming tools • Free software diagram tools just draw the shapes without having any knowledge of what they mean, nor do they generate SQL. http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model#ER_diagramming_tools • Microsoft VISIO is a proprietary diagramming tool that is free to students in college programs covered under the Microsoft Academic license. 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 66 Diagramming Software • Get to know VISIO http://office.microsoft.com/en-us/visio-help/get-to-know-visio-RZ001126777.aspx • Visio 2007 training courses http://office.microsoft.com/en-us/visio-help/visio-2007-training-courses-HA010214368.aspx?CTT=1 • Create a Database Model (also known as Entity Relationship diagram) http://office.microsoft.com/en-us/visio-help/create-a-database-model-also-known-as-entity-relationship-diagram-HA010115477.aspx • Make the switch to VISIO 2010 http://office.microsoft.com/en-us/access-help/download-office-2010-training-HA101901726.aspx?CTT=1 • Make the switch to VISIO 2013 http://office.microsoft.com/en-us/support/make-the-switch-to-visio-2013-RZ102925050.aspx?CTT=5&origin=HA104032123 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 67 Module 5: Data Modeling Lecture 4: Normalization 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 68 Normalization “Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.” http://databases.about.com/od/specificproducts/a/normalization.htm ”Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations.” http://support.microsoft.com/kb/283878 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 69 Normalization Rules “There are a few rules for database normalization. Each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is considered to be in "third normal form." Although other levels of normalization are possible, third normal form is considered the highest level necessary for most applications. “ http://support.microsoft.com/kb/283878 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 70 First Normal Form (1NF) “First Normal Form • Eliminate repeating groups in individual tables. • Create a separate table for each set of related data. • Identify each set of related data with a primary key. Do not use multiple fields in a single table to store similar data.” http://support.microsoft.com/kb/283878 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 71 1NF Example Not 1NF 1NF Student Student StudentID Student Name CourseID SD12345 Sally Doe Sally Doe CIS260, CIS270, CIS280 JD23456 Jefferson Doe Jefferson Doe CIS260, ENG101 Multiple entries for one field is a violation of 1NF. Split into two tables using a student unique identifier to tie the information together. 4/19/2013 5:00 PM Course CourseID CourseTitle StudentID CIS260 Database SD12345 CIS270 Programming SD12345 CIS280 Analysis SD12345 CIS260 Database JD23456 ENG101 English JD23456 Pierce College - CIS260 Database Management Systems 72 Second Normal Form (2NF) “Second Normal Form • Create separate tables for sets of values that apply to multiple records. • Relate these tables with a foreign key. Records should not depend on anything other than a table's primary key (a compound key, if necessary).” http://support.microsoft.com/kb/283878 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 73 2NF Example Not 2NF Student StudentID Student SD12345 Sally Doe JD23456 Jefferson Doe 2NF Each Student is unique; expanded Course table violates 2NF. Enrollment StudentID CourseID SD12345 CIS260 SD12345 CIS270 SD12345 CIS280 JD23456 CIS260 Course CourseID CourseTitle StudentID JD23456 ENG101 CIS260 Database SD12345 CIS270 Programming SD12345 Course CourseID CourseTitle CIS280 Analysis SD12345 CIS260 Database CIS260 Database JD23456 CIS270 Programming ENG101 English JD23456 CIS280 Analysis ENG101 English 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems Now each Enrollment is unique, and each Course is unique. 74 Third Normal Form (3NF) “Third Normal Form • Eliminate fields that do not depend on the key. Values in a record that are not part of that record's key do not belong in the table.” http://support.microsoft.com/kb/283878 • “A memorable statement of Codd's definition of 3NF, paralleling the traditional pledge to give true evidence in a court of law, was given by Bill Kent: "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key." A common variation supplements this definition with the oath: "so help me Codd".” http://en.wikipedia.org/wiki/Third_normal_form 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 75 3NF Example Not 3NF Enrollment Enrollment EnrollmentID Expanded Enrollment table not 3NF because Instructor is dependent on Instructor ID, not CourseID, and the record not dependent on a unique record key. StudentID CourseID InstructorID Instructor SD12345 CIS260 FAC123 SD12345 CIS270 SD12345 3NF StudentID CourseID InstructorID 2013Fall001 SD12345 CIS260 FAC123 2013Fall002 SD12345 CIS270 FAC234 2013Fall003 SD12345 CIS280 FAC345 2013Fall004 JD23456 CIS260 FAC456 2013Fall005 JD23456 ENG101 FAC567 Schmidt Instructor InstructorID Instructor FAC234 Jones FAC123 Schmidt CIS280 FAC345 Nguyen FAC234 Jones JD23456 CIS260 FAC456 Doe FAC345 Nguyen JD23456 ENG101 FAC567 Chen FAC456 Doe FAC567 Chen 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 76 Finished Normalized Entities Course CourseID CourseTitle CIS260 Database CIS270 Programming CIS280 Analysis ENG101 English 4/19/2013 5:00 PM Each entity now holds unique data. Student StudentID Student SD12345 Sally Doe JD23456 Jefferson Doe Enrollment Instructor InstructorID Instructor EnrollmentID FAC123 Schmidt FAC234 StudentID CourseID InstructorID 2013Fall001 SD12345 CIS260 FAC123 Jones 2013Fall002 SD12345 CIS270 FAC234 FAC345 Nguyen 2013Fall003 SD12345 CIS280 FAC345 FAC456 Doe 2013Fall004 JD23456 CIS260 FAC456 FAC567 Chen 2013Fall005 JD23456 ENG101 FAC567 Pierce College - CIS260 Database Management Systems 77 Matching ER Diagram Student PK StudentID has / is of Student Enrollment Instructor PK has / is of PK PK,FK1 PK,FK3 EnrollmentID StudentID CourseID FK2 InstructorID Course has / is of PK CourseID CourseTitle InstructorID Instructor 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 78 Normalization Exception “From a relational model point of view, it is standard to have tables that are in Third Normal Form. Normalized physical design provides the greatest ease of maintenance, and databases in this form are clearly understood by developers. However, a fully normalized design may not always yield the best performance. Sybase recommends that you design databases for Third Normal Form, however, if performance issues arise, you may have to denormalize to solve them.” Copyright © 2003. Sybase Inc. All rights reserved. http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc20020_1251/html/databases/databases215.htm 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 79 Module 5: Lecture 5: Other Modeling Concepts 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 80 Many-to-Many Relationships “In a many-to-many relationship, a record in one table relates to multiple records in a second table, and a record in the second table relates to multiple records in the first table. This type of relationship requires a third table, called a junction table. The junction table contains the primary keys from the other two tables as its foreign keys.” http://office.microsoft.com/en-us/access-help/table-that-data-RZ006149432.aspx?section=26, 4/16/2013 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 81 Junction/Intersection Tables 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 82 Lookup Tables Lookup tables are generally tables that contain rarely changing data and are considered the master or reference list.* An example could be a zip code table with associated city, state, county, country, latitude/longitude information. Many different tables and/or databases could use this same table for lookup and for presenting various combinations of the fields on different reports. • Lookup tables generally are single topic, used when the data rarely changes, when multiple applications could share the same information, to minimize the likelihood of typos, and to minimize storage for other tables needing to reference this table. • The relationship between a lookup table and another entity table is strictly a lookup relationship. 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 83 Lookup Table Examples OrderStatusID OrderStatus StateNamesID StateNameShort StateNameLong 1 Ordered 1 AL Alabama 2 Back Ordered 2 AK Alaska 3 Shipped 3 AZ Arizona 4 Cancelled 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 84 Recursion The Supervisor is also an Employee, so a lot of duplication of information. 4/19/2013 5:00 PM By adding the EmployeeID of the Supervisor to the field SupervisorID, the relationship is established with only one extra field needed. Pierce College - CIS260 Database Management Systems 85 Supertypes and Subtypes PATIENT *First Name *Last Name *Gender *Birthdate MALE FEMALE *Number of Pregnancies 4/19/2013 5:00 PM INFANT *Head Circumference Pierce College - CIS260 Database Management Systems 86 Module 5: Data Modeling Lecture 6: Data Patterns 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 87 Patterns/Templates 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 88 Data Patterns "Thus Universal Patterns for Data Modeling are reusable guides that provide a data modeling template for very prevalent or "universal" themes that occur in data modeling.“ The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling, Len Silverston, Paul Agnew, Wiley; 1 edition (January 9, 2009) , page 5 Watch Youtube video of Len Silverston, author of The Data Model Resource Book, http://www.youtube.com/watch?v=D5Rgpl6humE 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 89 Example Data Pattern – Electronic Medical Records Source: http://www.databaseanswers.org/data_models/electronic_medical_records/index.htm, 4/16/2013 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 90