DBMS history and models
NURS6803
Clinical DB Design
Katherine Sward, PhD, RN
1
Models
• Provide a high-level description of a system that is
intended to be understandable by a variety of users
– Physical models: how will we configure
hardware/software
– Conceptual models (DB architecture, data models): how
will we configure the data structures within the DBMS
• Main purposes of data models are
– to assist in understanding the meaning (semantics) of the
data
– to facilitate communication about requirements
2
Terminology sidetrack
• Data model, information model
• Often used interchangeably, sometimes not
• Sometimes used this way
– Information model: abstract, conceptual,
protocol neutral (ERD, ontologies)
– Data model: concrete, includes implementation
and protocol-specific details (logical and
physical design)
3
Goals of DB design
• Regardless of hardware and architecture
decisions, general goals typically include
– Minimize duplication of data
– Minimize space
– Increase ease of change
– Maximize accessibility
– No loss of data or creation of spurious data (data
integrity)
– Maximize speed of data/information retrieval
4
History
• History of DBMS is related to the
development of different database
architectures
• Progressive but not exclusive series of
events
– File based systems
– Hierarchical and Network models
– Relational model
– Others: EAV, Object/Object-relational
5
File based systems
• Discussed previously: records are defined by
each individual application. Record structure is
tied tightly to application code
(program-data dependence)
• Still in use for some purposes
6
Flat file databases
• A simple database model with one table
• Each row represents one “instance”
• Columns are attributes. If the attribute is
multi-valued, use multiple columns
• Still used. Common way to set up statistical
datasets, simple research studies
7
Flat file example
Name          ID num    Date of Admission   Diagnosis 1    Diagnosis 2   Diagnosis 3
Smith, John   1234567   8/15/2009           abd pain       fever         vomiting
Oyl, Olive    98765     8/15/2009           injury L leg

ID num    Admission   abd pain   fever   Vomiting   Leg injury
1234567   8/15/2009   Yes        Yes     Yes        No
98765     8/15/2009   No         No      No         Yes
Benefits: easy, simple, familiar
Drawbacks: similar to file-based system re duplication,
potential for anomalies; hard to model a complex system
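
The multi-valued column problem can be sketched in a few lines of Python. This is only an illustration: the data comes from the first layout above, while the shortened column names (`dx1`, `dx2`, `dx3`) and the helper `patients_with` are hypothetical.

```python
import csv
import io

# Data copied from the first flat-file layout; column names are shortened
# here for illustration and are not part of the original slides.
FLAT_FILE = """name,id_num,admit_date,dx1,dx2,dx3
"Smith, John",1234567,8/15/2009,abd pain,fever,vomiting
"Oyl, Olive",98765,8/15/2009,injury L leg,,
"""

rows = list(csv.DictReader(io.StringIO(FLAT_FILE)))


def patients_with(diagnosis):
    """Return ID numbers of rows where any diagnosis column matches."""
    # Every query has to know about every diagnosis column.
    dx_columns = ("dx1", "dx2", "dx3")
    return [r["id_num"] for r in rows
            if diagnosis in (r[c] for c in dx_columns)]


print(patients_with("fever"))
```

A patient with a fourth diagnosis would need a new column in the file and a matching change to `dx_columns`, which is one reason a flat file becomes hard to maintain as the system gets more complex.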
8
Hierarchical
• Roots: 1960s moon-landing projects to handle
large volume of data?
– North American Aviation (now Rockwell
International) developed software based on the concept that
components can be bundled together to form larger
components until a final product is assembled
– IBM joined NAA and expanded the software into the
Information Management System (IMS)
• Envision as an upside down tree
– Multiple tables related in a tree structure
– Each child table has 1 parent
9
Hierarchical
• Inverted Tree view
Drugs
    Antibiotics
    Analgesics
        Narcotic
        OTC
            Aspirin
            Acetaminophen
10
Another way to visualize
• Like folders on your computer
Drugs
    Analgesics
        Narcotics
        OTC
            Aspirin
            Acetaminophen
11
Hierarchical
• Benefits
– Fast (Physical pointers explicitly link records)
– Automatic integrity
– Allows inheritance and inference; good for
decision support
• Potential Problems:
– May be difficult to model complex situations
– Because each child has only 1 parent, data may be
stored in redundant locations (for example, aspirin is an
OTC analgesic … but it’s also an anti-inflammatory and an
antipyretic…)
– Must travel through the hierarchy to access target data;
users need to understand the hierarchy
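
A rough sketch of the one-parent rule, using the drug tree from the earlier slides. A Python dictionary stands in for the physical pointers of a real hierarchical DBMS, and the function names are hypothetical.

```python
# Each record stores exactly one parent, as in the hierarchical model.
PARENT = {
    "Antibiotics": "Drugs",
    "Analgesics": "Drugs",
    "Narcotic": "Analgesics",
    "OTC": "Analgesics",
    "Aspirin": "OTC",
    "Acetaminophen": "OTC",
}


def path_from_root(node):
    """Walk parent links upward, then reverse to get the root-to-node path."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def inherits_from(node, ancestor):
    """Inference by inheritance: a node 'is a' member of each ancestor."""
    return ancestor in path_from_root(node)[:-1]


print(path_from_root("Aspirin"))
print(inherits_from("Aspirin", "Analgesics"))
```

Because `PARENT` holds a single value per key, there is no place to record that aspirin is also an anti-inflammatory without storing aspirin a second time under another branch.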
12
Network
• Mid-1960s: Charles Bachman (GE)
developed the Integrated Data Store (IDS)
• Needed to model more complex systems
than the hierarchical model accommodated
• Network model allows multiple parents for
1 child.
– Aspirin is an OTC analgesic AND an anti-inflammatory AND an anti-thrombotic…
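
As a minimal sketch of the difference from the hierarchical model, each child below keeps a list of parents instead of a single one. The dictionary and the helper `children_of` are illustrative only and do not reflect how IDS stored its links.

```python
# Each child may point to several parents (network model).
PARENTS = {
    "Aspirin": ["OTC analgesic", "Anti-inflammatory", "Anti-thrombotic"],
    "Acetaminophen": ["OTC analgesic"],
}


def children_of(parent):
    """Return every child linked to the given parent, in alphabetical order."""
    return sorted(child for child, parents in PARENTS.items()
                  if parent in parents)


print(children_of("OTC analgesic"))
print(children_of("Anti-thrombotic"))
```

Aspirin is stored once and reached from three different parents, so no redundant copy is needed.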
13
Meanwhile…
• Conference on Data Systems Languages
(CODASYL)
– Consortium formed in 1959; its Data Base Task Group (DBTG),
named in 1967, developed standard specifications for DB
• Schema: overall organization of the DB as seen by
the DBA (includes system tables, etc.)
• Subschema: parts of the DB seen by applications
(user tables)
• Data management language(s)
14
Languages
• DDL: Data definition language
– Enables the DBA to define and change structures
• DML: Data Manipulation language
– Enables users to manipulate content
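
The DDL/DML split can be illustrated with SQL run through Python's built-in `sqlite3` module. SQL is used here only because it is widely available; it is not the CODASYL language, and the `patient` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines structure (table and column names are made up for this sketch).
conn.execute("""
    CREATE TABLE patient (
        id_num INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    )
""")

# DML: manipulates the content held inside that structure.
conn.execute("INSERT INTO patient (id_num, name) VALUES (?, ?)",
             (1234567, "Smith, John"))
conn.execute("UPDATE patient SET name = ? WHERE id_num = ?",
             ("Smith, John Q.", 1234567))

names = [row[0] for row in conn.execute("SELECT name FROM patient")]
print(names)
```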
15
Relational
• 1970: E.F. Codd (IBM) wrote landmark
paper describing relational data model
– Intended to address some of the limitations of
earlier approaches
– Based on sound mathematical principles
– Quickly shown to be pragmatically feasible and
practical
– Led to development of SQL, the standard query
language for relational DBMS
16
Relational
• Any type of relationship can be modeled
(sibling, aggregation/specialization, part-of,
etc.)
• Based on set theory
• Uses LOGICAL pointers between tables
instead of physical pointers. That makes it
flexible and more easily modifiable
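
A small sketch of logical pointers, assuming SQLite through Python's `sqlite3` module. The table names are hypothetical; the aspirin classes are the ones mentioned on the hierarchical slide. Rows are linked by matching key values rather than by storage addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drug (drug_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE drug_class (class_id INTEGER PRIMARY KEY, name TEXT);
    -- Linking table: each row pairs two key values (logical pointers).
    CREATE TABLE drug_in_class (
        drug_id  INTEGER REFERENCES drug(drug_id),
        class_id INTEGER REFERENCES drug_class(class_id),
        PRIMARY KEY (drug_id, class_id)
    );
    INSERT INTO drug VALUES (1, 'Aspirin'), (2, 'Acetaminophen');
    INSERT INTO drug_class VALUES
        (10, 'OTC analgesic'), (11, 'Anti-inflammatory'), (12, 'Antipyretic');
    INSERT INTO drug_in_class VALUES (1, 10), (1, 11), (1, 12), (2, 10);
""")

# Follow the logical pointers with joins on the shared key values.
classes = [row[0] for row in conn.execute("""
    SELECT c.name
    FROM drug d
    JOIN drug_in_class dc ON dc.drug_id = d.drug_id
    JOIN drug_class c ON c.class_id = dc.class_id
    WHERE d.name = 'Aspirin'
    ORDER BY c.name
""")]
print(classes)
```

Aspirin appears once in `drug` and is related to three classes, which the one-parent hierarchical model could only do by duplicating the record.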
17
Relational
• Benefits
– Predictable Outputs
– Flexible
– Data independence (logical pointers).
• Logical and physical design separate.
• Adding new fields does not require reestablishing pointers.
• Can change gracefully as organizations evolve: the rest
of the design is not changed when you change or delete 1
table
– Multiple layers of data integrity
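
One way to see data independence in practice, again as a sketch using SQLite via `sqlite3` with a hypothetical `patient` table: a lookup written before a column is added keeps working unchanged afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id_num INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO patient VALUES (98765, 'Oyl, Olive')")


def find_name(id_num):
    """Look up a name by key; written before the structure changes."""
    row = conn.execute(
        "SELECT name FROM patient WHERE id_num = ?", (id_num,)
    ).fetchone()
    return row[0] if row else None


before = find_name(98765)

# Structural change: add a field. No pointers have to be rebuilt,
# and the existing query is not edited.
conn.execute("ALTER TABLE patient ADD COLUMN admit_date TEXT")

after = find_name(98765)
print(before == after)
```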
18
Relational
• Drawbacks
– Can be slower than other models
– Consumes more disk space
– Hard to determine how changes in structure
impact performance
• Still by far the most widely used architecture
for databases
• Basis for most large current DBMS
19
Object oriented, Object/relational
• Objects encapsulate structure (attributes) &
behavior (operations)
• Very flexible, allows user-defined and
complex data types
– Person object: might hold demographic info
– Decision object: info about a decision/alert
(when it was made, type of decision, what person it concerned)
– Hypertonic saline decision object: extends decision with specifics;
includes the complex data type called saline lab values
– Saline lab values are a set of items (array), modeled inside
this object rather than as a separate item
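
The three objects described above might look roughly like this in Python. The class and field names are guesses based on the slide text, and the date and lab values are placeholder numbers, not real data.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Person:
    """Holds demographic info (only a name in this sketch)."""
    name: str


@dataclass
class Decision:
    """Structure (attributes) plus behavior (the summary operation)."""
    made_at: datetime
    kind: str
    person: Person

    def summary(self) -> str:
        return f"{self.kind} for {self.person.name}"


@dataclass
class HypertonicSalineDecision(Decision):
    """Extends Decision; the lab values live inside the object as a list."""
    saline_lab_values: List[float] = field(default_factory=list)

    def latest_value(self) -> Optional[float]:
        return self.saline_lab_values[-1] if self.saline_lab_values else None


decision = HypertonicSalineDecision(
    made_at=datetime(2011, 8, 1),
    kind="alert",
    person=Person("Smith, John"),
    saline_lab_values=[138.0, 141.0],  # placeholder values
)
print(decision.summary())
print(decision.latest_value())
```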
20
OODB
• The object-oriented database (OODB) paradigm is
a combination of object-oriented programming and
persistent data storage. It includes the seamless
treatment of both persistent data, as found in
databases, and transient data, as found in executing
programs
http://www.unixspace.com/context/databases.html
• For application programmers, this puts the
application code and database code in the same
environment
• On the other hand, you can lose the advantage of
clear division between the application code and the
database
21
E-A-V
• EAV (E-A-V)
– Entity, attribute, value
– In relational model, a row is a set of facts
(attributes)
– In EAV model, each attribute (fact) is a separate
row, containing
• Entity (who the fact is about)
• Attribute (what type of fact is it)
• Value
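
As a minimal sketch, one wide relational-style row can be split into entity-attribute-value triples like this. The helper `to_eav` and the sample row are illustrative; attributes with no value simply produce no row.

```python
def to_eav(entity_id, row):
    """Split one wide row into (entity, attribute, value) triples, skipping nulls."""
    return [(entity_id, attr, value)
            for attr, value in row.items()
            if value is not None]


# One wide row: a set of facts about patient 1234567.
wide_row = {"abd pain": "Yes", "fever": "Yes", "leg injury": None}

triples = to_eav(1234567, wide_row)
for triple in triples:
    print(triple)
```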
22
Example
Entity                        Attribute   Value
Patient   Date       Survey   Question    Response
1         8/1/2011   2        5           A
23
EAV – benefits & uses
• Extremely flexible, allows dynamic data structures
• Good when the number of potential attributes for the entity
is much greater than the number of attributes that apply to
any given instance (lots of nulls)
• Good for facts that have transitory existence
• The basis of Web cookies, the Microsoft Windows
Registry, and tagged data interchange formats such as
ASN.1.
• XML can be regarded as a form of EAV design that
supports nesting of attributes to an arbitrary degree.
• EAV models are important components of Electronic
Patient Record Systems, such as the HELP system at
Intermountain
http://ycmi.med.yale.edu/nadkarni/eav_cr_contents.htm
24
EAV - disadvantages
• Harder for people to understand
• Can be complex when you need to link
related items
• Can be slow to query
• Note: EAV can be transformed to relational
tables. But…it can take some programming
25
Summary: DBMS advantages
• Control redundancy
• Data consistency, integrity
• Data sharing, improved security
• Data independence
• Improved efficiency – common functions
are in one location
• Enforcement of standards
• Balance conflicting requirements
26
Summary: DBMS disadvantages
• Complexity
• Size
• Cost
– Direct, hardware costs, costs of conversion
• Performance
• Higher impact of failure
27