DBMS history and models
NURS6803 Clinical DB Design
Katherine Sward, PhD, RN

Models
• Provide a high-level description of a system that is intended to be understandable by a variety of users
  – Physical models: how will we configure hardware/software
  – Conceptual models (DB architecture, data models): how will we configure the data structures within the DBMS
• Main purposes of data models are
  – to assist in understanding the meaning (semantics) of the data
  – to facilitate communication about requirements

Terminology sidetrack
• Data model, information model
• Often used interchangeably, sometimes not
• Sometimes used this way
  – Information model: abstract, conceptual, protocol neutral (ERD, ontologies)
  – Data model: concrete, include implementation and protocol-specific details (logical and physical design)

Goals of DB design
• Regardless of hardware and architecture decisions, general goals typically include
  – Minimize duplication of data
  – Minimize space
  – Increase ease of change
  – Maximize accessibility
  – No loss of data or creation of spurious data (data integrity)
  – Maximize speed of data/information retrieval

History
• History of DBMS is related to the development of different database architectures
• Progressive but not exclusive series of events
  – File based systems
  – Hierarchical and Network models
  – Relational model
  – Others: EAV, Object/Object-relational

File based systems
• We have previously discussed. Records defined by each individual application. Record structure tied tightly to application code (program-data dependence)
• Still in use for some purposes

Flat file databases
• A simple database model with one table
• Each row represents one “instance”
• Columns are attributes. If the attribute is multi-valued, use multiple columns
• Still used.
  Common way to set up statistical datasets, simple research studies

Flat file example

Name         ID num   Date of Admission  Diagnosis 1   Diagnosis 2  Diagnosis 3
Smith, John  1234567  8/15/2009          abd pain      fever        vomiting
Oyl, Olive   98765    8/15/2009          injury L leg

ID num   Admission  abd pain  fever  Vomiting  Leg injury
1234567  8/15/2009  Yes       Yes    Yes       No
98765    8/15/2009  No        No     No        Yes

Benefits: easy, simple, familiar
Drawbacks: similar to file-based system re duplication, potential for anomalies; hard to model a complex system

Hierarchical
• Roots: 1960s moon-landing projects to handle large volume of data?
  – North American Aviation (later Rockwell International) developed software based on the concept that components can be bundled together to form larger components until a final product is assembled
  – IBM joined NAA and expanded it into the Information Management System (IMS)
• Envision as an upside down tree
  – Multiple tables related in a tree structure
  – Each child table has 1 parent

Hierarchical
• Inverted Tree view

Drugs
  Antibiotics
  Analgesics
    Narcotic
    OTC
      Aspirin
      Acetaminophen

Another way to visualize
• Like folders on your computer

Drugs
  Analgesics
    Narcotics
    OTC
      Aspirin
      Acetaminophen

Hierarchical
• Benefits
  – Fast (physical pointers explicitly link records)
  – Automatic integrity
  – Allows inheritance and inference; good for decision support
• Potential Problems:
  – May be difficult to model complex situations
  – Because each child has only 1 parent, data may be stored in redundant locations (For example, aspirin is an OTC analgesic … but it’s also an anti-inflammatory, and an antipyretic…)
  – Must travel through the hierarchy to access target data; users need to understand the hierarchy

Network
• Mid-1960s: GE (Charles Bachman) developed Integrated Data Store (IDS)
• Needed to model more complex systems than the hierarchical model accommodated
• Network model allowed multiple parents for 1 child.
  – Aspirin is an OTC analgesic AND an anti-inflammatory AND an anti-thrombotic…

Meanwhile…
• Conference on Data Systems Languages (CODASYL)
  – World wide group (founded 1959) whose Data Base Task Group, so named in 1967, developed standard specifications for DB
• Schema: overall organization of the DB as seen by the DBA (includes system tables, etc.)
• Subschema: parts of the DB seen by applications (user tables)
• Data management language(s)

Languages
• DDL: Data Definition Language
  – Enables the DBA to manipulate structures
• DML: Data Manipulation Language
  – Enables users to manipulate content

Relational
• 1970: E.F. Codd (IBM) wrote a landmark paper describing the relational data model
  – Intended to address some of the limitations of earlier approaches
  – Based on sound mathematical principles
  – Quickly shown to be pragmatically feasible and practical
  – Led to development of SQL, the standard language for relational DBMS

Relational
• Any type of relationship can be modeled (sibling, aggregation/specialization, part-of, etc.)
• Based on set theory
• Uses LOGICAL pointers between tables instead of physical pointers. That makes it flexible and more easily modifiable

Relational
• Benefits
  – Predictable outputs
  – Flexible
  – Data independence (logical pointers).
    • Logical and physical design separate.
    • Add new fields: don’t need to reestablish pointers.
    • Can change gracefully as organizations evolve.
      Rest of design not changed when you change or delete 1 table
  – Multiple layers of data integrity

Relational
• Drawbacks
  – Can be slower than other models
  – Consumes more disk space
  – Hard to determine how changes in structure impact performance
• Still by far the most widely used architecture for databases
• Basis for most large current DBMS

Object oriented, Object/relational
• Objects encapsulate structure (attributes) & behavior (operations)
• Very flexible, allows user-defined and complex data types

Person object: might hold demographic info
Decision object: info about a decision/alert (when it was made, type of decision, what person it concerned)
Hypertonic saline decision object: extends decision with specifics; includes the complex data type called saline lab values
Saline lab values are a set of items (array), modeled as inside this object rather than as a separate item

OODB
• The object-oriented database (OODB) paradigm is a combination of object-oriented programming and persistent data storage.
  It includes the seamless treatment of both persistent data, as found in databases, and transient data, as found in executing programs
  http://www.unixspace.com/context/databases.html
• For application programmers, this puts the application code and database code in the same environment
• On the other hand, you can lose the advantage of a clear division between the application code and the database

E-A-V
• EAV (E-A-V): Entity, attribute, value
  – In the relational model, a row is a set of facts (attributes)
  – In the EAV model, each attribute (fact) is a separate row, containing
    • Entity (who the fact is about)
    • Attribute (what type of fact it is)
    • Value

Example

Entity               Attribute          Value
Patient  Date        Survey  Question   Response
1        8/1/2011    2       5          A

EAV – benefits & uses
• Extremely flexible, allows dynamic data structures
• Good when the number of potential attributes for the entity is much greater than the number of attributes that apply to any given instance (lots of nulls)
• Good for facts that have transitory existence
• The basis of Web cookies, the Microsoft Windows Registry, and tagged data interchange formats such as ASN.1.
• XML can be regarded as a form of EAV design that supports nesting of attributes to an arbitrary degree.
• EAV models are important components of Electronic Patient Record Systems, such as the HELP system at Intermountain
  http://ycmi.med.yale.edu/nadkarni/eav_cr_contents.htm

EAV - disadvantages
• Harder for people to understand
• Can be complex when you need to link related items
• Can be slow to query
• Note: EAV can be transformed to relational tables.
  But…it can take some programming

Summary: DBMS advantages
• Control redundancy
• Data consistency, integrity
• Data sharing, improved security
• Data independence
• Improved efficiency: common functions are in one location
• Enforcement of standards
• Balance conflicting requirements

Summary: DBMS disadvantages
• Complexity
• Size
• Cost
  – Direct, hardware costs, costs of conversion
• Performance
• Higher impact of failure
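
Worked example (not part of the original slides): the EAV slides note that EAV data can be transformed to relational tables with some programming. The Python sketch below shows one possible way to do that pivot. The entity names, attribute names, values, and the function name pivot_eav are all hypothetical, chosen only for illustration; a real system would more likely do this inside the DBMS.

```python
from collections import defaultdict

# Hypothetical EAV rows: (entity, attribute, value). All data is made up.
eav_rows = [
    ("patient-1", "temperature", "38.5"),
    ("patient-1", "heart_rate", "92"),
    ("patient-2", "temperature", "37.0"),
]


def pivot_eav(rows):
    """Turn EAV triples into one relational-style row (dict) per entity."""
    # Every attribute seen anywhere becomes a column of the result.
    attributes = sorted({attr for _, attr, _ in rows})

    # Group the facts by the entity they describe.
    by_entity = defaultdict(dict)
    for entity, attr, value in rows:
        by_entity[entity][attr] = value

    # Attributes with no recorded fact become None (the relational NULL).
    return [
        {"entity": entity, **{a: facts.get(a) for a in attributes}}
        for entity, facts in sorted(by_entity.items())
    ]


table = pivot_eav(eav_rows)
for row in table:
    print(row)
```

In this sketch patient-2 has no heart_rate fact, so that column comes out as None. This is the "lots of nulls" situation the benefits slide mentions: the EAV form stores only the facts that exist, while the relational form has to reserve a column for every possible attribute.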