DBMS history and models
NURS6803 Clinical DB Design
Katherine Sward, PhD, RN

Models
• Provide a high-level description of a system that is intended to be understandable by a variety of users
  – Physical models: how will we configure hardware/software
  – Conceptual models (DB architecture, data models): how will we configure the data structures within the DBMS
• Main purposes of data models are
  – to assist in understanding the meaning (semantics) of the data
  – to facilitate communication about requirements

Terminology sidetrack
• Data model, information model
• Often used interchangeably, sometimes not
• Sometimes used this way
  – Information model: abstract, conceptual, protocol neutral (ERD, ontologies)
  – Data model: concrete, includes implementation and protocol-specific details (logical and physical design)

Goals of DB design
• Regardless of hardware and architecture decisions, general goals typically include
  – Minimize duplication of data
  – Minimize space
  – Increase ease of change
  – Maximize accessibility
  – No loss of data or creation of spurious data (data integrity)
  – Maximize speed of data/information retrieval

History
• History of DBMS is related to the development of different database architectures
• Progressive but not exclusive series of events
  – File based systems
  – Hierarchical and Network models
  – Relational model
  – Others: EAV, Object/Object-relational

File based systems
• Discussed previously. Records defined by each individual application. Record structure tied tightly to application code (program-data dependence)
• Still in use for some purposes

Flat file databases
• A simple database model with one table
• Each row represents one “instance”
• Columns are attributes. If the attribute is multi-valued, use multiple columns
• Still used.
  Common way to set up statistical datasets and simple research studies

Flat file example

  Name          ID num    Date of Admission   Diagnosis 1    Diagnosis 2   Diagnosis 3
  Smith, John   1234567   8/15/2009           abd pain       fever         vomiting
  Oyl, Olive    98765     8/15/2009           injury L leg

  ID num    Admission   abd pain   fever   Vomiting   Leg injury
  1234567   8/15/2009   Yes        Yes     Yes        No
  98765     8/15/2009   No         No      No         Yes

Benefits: easy, simple, familiar
Drawbacks: similar to file-based systems with respect to duplication and potential for anomalies; hard to model a complex system

Hierarchical
• Roots: 1960s moon-landing projects, to handle large volumes of data?
  – North American Aviation (later Rockwell International) developed software based on the concept that components can be bundled together to form larger components until a final product is assembled
  – IBM joined NAA and expanded the software into Information Management System (IMS)
• Envision as an upside down tree
  – Multiple tables related in a tree structure
  – Each child table has 1 parent

Hierarchical
• Inverted tree view
  Drugs
    Antibiotics
    Analgesics
      Narcotic
      OTC
        Aspirin
        Acetaminophen

Another way to visualize
• Like folders on your computer
  Drugs
    Analgesics
      Narcotics
      OTC
        Aspirin
        Acetaminophen

Hierarchical
• Benefits
  – Fast (physical pointers explicitly link records)
  – Automatic integrity
  – Allows inheritance and inference; good for decision support
• Potential problems
  – May be difficult to model complex situations
  – Because each child has only 1 parent, data may be stored in redundant locations (for example, aspirin is an OTC analgesic … but it’s also an anti-inflammatory, and an antipyretic…)
  – Must travel through the hierarchy to access target data, so users need to understand the hierarchy

Network
• Mid 1960s: Charles Bachman at GE developed Integrated Data Store (IDS)
• Needed to model more complex systems than the hierarchical model accommodated
• Network model allowed multiple parents for 1 child.
  – Aspirin is an OTC analgesic AND an anti-inflammatory AND an anti-thrombotic…

Meanwhile…
• Conference on Data Systems Languages (CODASYL)
  – Consortium formed in 1959; its Data Base Task Group (established under that name in 1967) developed standard specifications for DBs
• Schema: overall organization of the DB as seen by the DBA (includes system tables, etc.)
• Subschema: parts of the DB seen by applications (user tables)
• Data management language(s)

Languages
• DDL: Data Definition Language
  – Enables the DBA to define and modify structures
• DML: Data Manipulation Language
  – Enables users to manipulate content

Relational
• 1970: E. F. Codd (IBM) published a landmark paper describing the relational data model
  – Intended to address some of the limitations of earlier approaches
  – Based on sound mathematical principles
  – Quickly shown to be feasible and practical
  – Led to development of SQL, the standard language for relational DBMSs

Relational
• Any type of relationship can be modeled (sibling, aggregation/specialization, part-of, etc.)
• Based on set theory
• Uses LOGICAL pointers between tables instead of physical pointers. That makes it flexible and more easily modifiable

Relational
• Benefits
  – Predictable outputs
  – Flexible
  – Data independence (logical pointers)
    • Logical and physical design are separate.
    • Add new fields without needing to reestablish pointers.
    • Can change gracefully as organizations evolve.
      The rest of the design is not changed when you change or delete 1 table
  – Multiple layers of data integrity

Relational
• Drawbacks
  – Can be slower than other models
  – Consumes more disk space
  – Hard to determine how changes in structure impact performance
• Still by far the most widely used architecture for databases
• Basis for most large current DBMSs

Object oriented, Object/relational
• Objects encapsulate structure (attributes) & behavior (operations)
• Very flexible, allows user-defined and complex data types
  – Person object: might hold demographic info
  – Decision object: info about a decision/alert (when it was made, type of decision, what person it concerned)
  – Hypertonic saline decision object: extends decision with specifics; includes the complex data type called saline lab values
  – Saline lab values are a set of items (array), modeled inside this object rather than as a separate item

OODB
• The object-oriented database (OODB) paradigm is a combination of object-oriented programming and persistent data storage.
  It includes the seamless treatment of both persistent data, as found in databases, and transient data, as found in executing programs
  http://www.unixspace.com/context/databases.html
• For application programmers, this puts the application code and database code in the same environment
• On the other hand, you can lose the advantage of a clear division between the application code and the database

E-A-V
• EAV (E-A-V)
  – Entity, attribute, value
  – In the relational model, a row is a set of facts (attributes)
  – In the EAV model, each attribute (fact) is a separate row, containing
    • Entity (who the fact is about)
    • Attribute (what type of fact it is)
    • Value

Example

  Entity                Attribute            Value
  Patient   Date        Survey   Question    Response
  1         8/1/2011    2        5           A

EAV – benefits & uses
• Extremely flexible, allows dynamic data structures
• Good when the number of potential attributes for the entity is much greater than the number of attributes that apply to any given instance (lots of nulls)
• Good for facts that have transitory existence
• The basis of Web cookies, the Microsoft Windows Registry, and tagged data interchange formats such as ASN.1.
• XML can be regarded as a form of EAV design that supports nesting of attributes to an arbitrary degree.
• EAV models are important components of Electronic Patient Record Systems, such as the HELP system at Intermountain
  http://ycmi.med.yale.edu/nadkarni/eav_cr_contents.htm

EAV - disadvantages
• Harder for people to understand
• Can be complex when you need to link related items
• Can be slow to query
• Note: EAV can be transformed to relational tables.
  But… it can take some programming

Summary: DBMS advantages
• Control redundancy
• Data consistency, integrity
• Data sharing, improved security
• Data independence
• Improved efficiency – common functions are in one location
• Enforcement of standards
• Balance conflicting requirements

Summary: DBMS disadvantages
• Complexity
• Size
• Cost
  – Direct, hardware costs, costs of conversion
• Performance
• Higher impact of failure
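
Supplement: EAV to relational, sketched
Returning to the note on the EAV slides that EAV can be transformed to relational tables with some programming: the following is a minimal sketch of that pivot, not a production design. The sample rows, the attribute names (temperature_c, heart_rate) and the helper pivot_eav are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical EAV rows: (entity, attribute, value). One fact per row.
eav_rows = [
    ("patient-1", "temperature_c", "38.4"),
    ("patient-1", "heart_rate", "92"),
    ("patient-2", "heart_rate", "71"),
]


def pivot_eav(rows):
    """Turn EAV triples into one relational-style row (a dict) per entity."""
    # Every attribute seen anywhere becomes a column of the result.
    attributes = sorted({attr for _, attr, _ in rows})

    by_entity = defaultdict(dict)
    for entity, attr, value in rows:
        # Assumption: if an attribute repeats for an entity, the last value wins.
        by_entity[entity][attr] = value

    # Attributes with no recorded fact become None (a NULL in a real table).
    return [
        {"entity": entity, **{a: facts.get(a) for a in attributes}}
        for entity, facts in sorted(by_entity.items())
    ]


table = pivot_eav(eav_rows)
for row in table:
    print(row)
```

Note how patient-2 ends up with an empty temperature_c column: the nulls that EAV avoids storing reappear once the data is pivoted into a relational shape.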