DBMS history and models
NURS6803
Clinical DB Design
Katherine Sward, PhD, RN
1
Models
• Provide a high-level description of a system that is
intended to be understandable by a variety of users
– Physical models: how will we configure
hardware/software
– Conceptual models (DB architecture, data models): how
will we configure the data structures within the DBMS
• Main purposes of data models are
– to assist in understanding the meaning (semantics) of the
data
– to facilitate communication about requirements
2
Terminology sidetrack
• Data model, information model
• Often used interchangeably, sometimes not
• Sometimes used this way
– Information model: abstract, conceptual,
protocol neutral (ERD, ontologies)
– Data model: concrete, includes implementation
and protocol-specific details (logical and
physical design)
3
Goals of DB design
• Regardless of hardware and architecture
decisions, general goals typically include
– Minimize duplication of data
– Minimize space
– Increase ease of change
– Maximize accessibility
– No loss of data or creation of spurious data (data
integrity)
– Maximize speed of data/information retrieval
4
History
• History of DBMS is related to the
development of different database
architectures
• Progressive but not exclusive series of
events
– File based systems
– Hierarchical and Network models
– Relational model
– Others: EAV, Object/Object-relational
5
File based systems
• Discussed previously. Records are
defined by each individual application, so the
record structure is tied tightly to the application
code (program-data dependence)
• Still in use for some purposes
6
Flat file databases
• A simple database model with one table
• Each row represents one “instance”
• Columns are attributes. If the attribute is
multi-valued, use multiple columns
• Still used. Common way to set up statistical
datasets, simple research studies
7
Flat file example
Name         ID num   Date of Admission  Diagnosis 1   Diagnosis 2  Diagnosis 3
Smith, John  1234567  8/15/2009          abd pain      fever        vomiting
Oyl, Olive   98765    8/15/2009          injury L leg

ID num   Admission  abd pain  fever  Vomiting  Leg injury
1234567  8/15/2009  Yes       Yes    Yes       No
98765    8/15/2009  No        No     No        Yes
Benefits: easy, simple, familiar
Drawbacks: similar to file-based system re duplication,
potential for anomalies; hard to model a complex system
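A minimal sketch of why multi-valued attributes are awkward in a flat file. It assumes the first layout above is held as a list of dictionaries; the field names (`dx1`, `dx2`, `dx3`) and the helper `patients_with` are hypothetical, chosen only for illustration.

```python
# Hypothetical in-memory version of the first flat-file layout:
# one column per diagnosis "slot".
flat_rows = [
    {"name": "Smith, John", "id": 1234567, "admitted": "8/15/2009",
     "dx1": "abd pain", "dx2": "fever", "dx3": "vomiting"},
    {"name": "Oyl, Olive", "id": 98765, "admitted": "8/15/2009",
     "dx1": "injury L leg", "dx2": None, "dx3": None},
]

def patients_with(rows, diagnosis):
    """Return IDs of patients whose diagnosis appears in ANY slot."""
    slots = ("dx1", "dx2", "dx3")  # must be edited if a 4th slot is added
    return [r["id"] for r in rows if any(r[s] == diagnosis for s in slots)]

fever_ids = patients_with(flat_rows, "fever")
leg_ids = patients_with(flat_rows, "injury L leg")
```

Every query has to check each diagnosis column, and a patient with a fourth diagnosis would force a change to both the file layout and the query code.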
8
Hierarchical
• Roots: 1960s moon-landing projects, to handle
large volumes of data
– North American Aviation (now Rockwell
International) developed software based on the concept that
components can be bundled together to form larger
components until a final product is assembled
– IBM joined NAA and expanded the software into the Information
Management System (IMS)
• Envision as an upside down tree
– Multiple tables related in a tree structure
– Each child table has 1 parent
9
Hierarchical
• Inverted tree view
Drugs
    Antibiotics
    Analgesics
        Narcotic
        OTC
            Aspirin
            Acetaminophen
10
Another way to visualize
• Like folders on your computer
Drugs
    Analgesics
        Narcotics
        OTC
            Aspirin
            Acetaminophen
11
Hierarchical
• Benefits
– Fast (Physical pointers explicitly link records)
– Automatic integrity
– Allows inheritance and inference, which makes it good for
decision support
• Potential Problems:
– May be difficult to model complex situations
– Because each child has only 1 parent, data may be
stored in redundant locations (for example, aspirin is an
OTC analgesic, but it is also an anti-inflammatory and
an antipyretic)
– Must travel through the hierarchy to access target data, so
users need to understand the hierarchy
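The two problems listed above can be sketched with a nested dictionary standing in for the hierarchy. This is an illustration only, not how IMS stores data; the `Anti-inflammatories` branch and the helper `find_paths` are hypothetical additions.

```python
# Hypothetical drug hierarchy: each child has exactly one parent.
tree = {
    "Drugs": {
        "Antibiotics": {},
        "Analgesics": {
            "Narcotic": {},
            "OTC": {"Aspirin": {}, "Acetaminophen": {}},
        },
    }
}

def find_paths(node, target, prefix=()):
    """Collect every root-to-target path by walking down from the root."""
    paths = []
    for name, children in node.items():
        path = prefix + (name,)
        if name == target:
            paths.append(path)
        paths.extend(find_paths(children, target, path))
    return paths

# Reaching aspirin means travelling Drugs -> Analgesics -> OTC -> Aspirin.
aspirin_paths = find_paths(tree, "Aspirin")

# Filing aspirin as an anti-inflammatory too requires a second, redundant copy.
tree["Drugs"]["Anti-inflammatories"] = {"Aspirin": {}}
aspirin_paths_after = find_paths(tree, "Aspirin")
```

After the second branch is added, aspirin exists in two places, and any update to it has to be made in both.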
12
Network
• Mid-1960s: Charles Bachman at GE
developed the Integrated Data Store (IDS)
• Needed to model more complex systems
than the hierarchical model accommodated
• Network model allows multiple parents for
1 child.
– Aspirin is an OTC analgesic AND an anti-inflammatory AND an anti-thrombotic…
13
Meanwhile…
• Conference on Data Systems Languages
(CODASYL)
– Consortium formed in 1959; its Data Base Task Group
(DBTG, so named in 1967) developed standard specifications for DB
• Schema: overall organization of the DB as seen by
the DBA (includes system tables, etc.)
• Subschema: parts of the DB seen by applications
(user tables)
• Data management language(s)
14
Languages
• DDL: Data definition language
– Enables the DBA to manipulate structures
• DML: Data Manipulation language
– Enables users to manipulate content
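A minimal sketch of the DDL/DML split, using SQL through Python's built-in `sqlite3` module. The `patient` table and its columns are hypothetical; SQLite is used only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: statements that define or change STRUCTURE.
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("ALTER TABLE patient ADD COLUMN admitted TEXT")

# DML: statements that read or change CONTENT.
conn.execute(
    "INSERT INTO patient (id, name, admitted) VALUES (?, ?, ?)",
    (1234567, "Smith, John", "2009-08-15"),
)
conn.execute("UPDATE patient SET name = ? WHERE id = ?", ("Smith, John Q.", 1234567))
rows = conn.execute("SELECT id, name, admitted FROM patient").fetchall()
```

`CREATE` and `ALTER` are the kind of statement a DBA issues; `INSERT`, `UPDATE`, and `SELECT` are what applications and users issue against the resulting structure.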
15
Relational
• 1970: E.F. Codd (IBM) published a landmark
paper describing the relational data model
– Intended to address some of the limitations of
earlier approaches
– Based on sound mathematical principles
– Quickly shown to be feasible and
practical
– Led to development of SQL, the standard
query language for relational DBMS
16
Relational
• Any type of relationship can be modeled
(sibling, aggregation/specialization, part-of,
etc.)
• Based on set theory
• Uses LOGICAL pointers between tables
instead of physical pointers. That makes it
flexible and more easily modifiable
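A minimal sketch of logical pointers, assuming three hypothetical tables (`drug`, `drug_class`, `drug_in_class`) in an in-memory SQLite database. Rows refer to each other by matching key values, so aspirin can belong to several classes without being stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the references
conn.executescript("""
CREATE TABLE drug (drug_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE drug_class (class_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Linking table: each row is a logical pointer pair (key values, not addresses).
CREATE TABLE drug_in_class (
    drug_id  INTEGER NOT NULL REFERENCES drug(drug_id),
    class_id INTEGER NOT NULL REFERENCES drug_class(class_id),
    PRIMARY KEY (drug_id, class_id)
);
INSERT INTO drug VALUES (1, 'Aspirin'), (2, 'Acetaminophen');
INSERT INTO drug_class VALUES (10, 'OTC analgesic'), (20, 'Anti-inflammatory');
INSERT INTO drug_in_class VALUES (1, 10), (1, 20), (2, 10);
""")

# The join follows the key values at query time.
aspirin_classes = [row[0] for row in conn.execute("""
    SELECT c.name
    FROM drug d
    JOIN drug_in_class dc ON dc.drug_id = d.drug_id
    JOIN drug_class c ON c.class_id = dc.class_id
    WHERE d.name = 'Aspirin'
    ORDER BY c.name
""")]
```

Adding a new class for aspirin is one more row in `drug_in_class`; no existing row or table definition has to change.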
17
Relational
• Benefits
– Predictable Outputs
– Flexible
– Data independence (logical pointers).
• Logical and physical design separate.
• Add new fields - don’t need to reestablish pointers.
• Can change gracefully as organizations evolve. Rest
of design not changed when you change or delete 1
table
– Multiple layers of data integrity
18
Relational
• Drawbacks
– Can be slower than other models
– Consumes more disk space
– Hard to determine how changes in structure
impact performance
• Still by far the most widely used architecture
for databases
• Basis for most large current DBMS
19
Object oriented, Object/relational
• Objects encapsulate structure (attributes) &
behavior (operations)
• Very flexible, allows user-defined and
complex data types
– Person object: might hold demographic info
– Decision object: info about a decision/alert
(when it was made, type of decision, what person it concerned)
– Hypertonic saline decision object: extends the decision object with specifics,
including the complex data type called saline lab values
– Saline lab values are a set of items (array), modeled inside
this object rather than as a separate item
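The object example above can be sketched with Python dataclasses. All class names, field names, and sample values here are hypothetical illustrations of the slide's description, not an actual clinical schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class Person:
    """Holds demographic info."""
    person_id: int
    name: str

@dataclass
class Decision:
    """Structure (attributes) plus behavior (the summary method)."""
    made_at: datetime
    decision_type: str
    person: Person

    def summary(self) -> str:
        return f"{self.decision_type} for {self.person.name}"

@dataclass
class HypertonicSalineDecision(Decision):
    """Extends Decision; the lab values live inside the object as a list."""
    saline_lab_values: List[float] = field(default_factory=list)

    def latest_value(self) -> Optional[float]:
        return self.saline_lab_values[-1] if self.saline_lab_values else None

# Illustrative values only.
patient = Person(1, "Smith, John")
decision = HypertonicSalineDecision(
    datetime(2011, 8, 1, 9, 30), "alert", patient, [128.0, 131.5]
)
```

`HypertonicSalineDecision` inherits `summary()` from `Decision` and adds its own complex attribute, which is the inheritance and user-defined typing the slide describes.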
20
OODB
• The object-oriented database (OODB) paradigm is
a combination of object-oriented programming and
persistent data storage. It includes the seamless
treatment of both persistent data, as found in
databases, and transient data, as found in executing
programs
http://www.unixspace.com/context/databases.html
• For application programmers, this puts the
application code and database code in the same
environment
• On the other hand, you can lose the advantage of
clear division between the application code and the
database
21
E-A-V
• EAV (E-A-V)
– Entity, attribute, value
– In relational model, a row is a set of facts
(attributes)
– In EAV model, each attribute (fact) is separate
row, containing
• Entity (who the fact is about)
• Attribute (what type of fact is it)
• Value
22
Example
Entity                      Attribute  Value
Patient  Date      Survey   Question   Response
1        8/1/2011  2        5          A
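A minimal sketch of EAV storage, assuming a single hypothetical `eav` table in an in-memory SQLite database. The entity labels and attribute names are made up for illustration, and the entity is simplified to one text key rather than the patient/date/survey combination shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per fact: who it is about, what kind of fact, and the value.
conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        ("patient1", "question_5", "A"),
        ("patient1", "question_6", "C"),
        ("patient2", "question_5", "B"),
    ],
)

# A brand-new attribute needs no schema change, just another row.
conn.execute("INSERT INTO eav VALUES (?, ?, ?)", ("patient2", "allergy", "penicillin"))

q5_answers = conn.execute(
    "SELECT entity, value FROM eav WHERE attribute = ? ORDER BY entity",
    ("question_5",),
).fetchall()
```

Attributes that do not apply to an entity simply have no row, which is why the design suits sparse data.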
23
EAV – benefits & uses
• Extremely flexible, allows dynamic data structures
• Good when the number of potential attributes for the entity
is much greater than the number of attributes that apply to
any given instance (lots of nulls)
• Good for facts that have transitory existence
• The basis of Web cookies, the Microsoft Windows
Registry, and tagged data interchange formats such as
ASN.1.
• XML can be regarded as a form of EAV design that
supports nesting of attributes to an arbitrary degree.
• EAV models are important components of Electronic
Patient Record Systems, such as the HELP system at
Intermountain
http://ycmi.med.yale.edu/nadkarni/eav_cr_contents.htm
24
EAV - disadvantages
• Harder for people to understand
• Can be complex when you need to link
related items
• Can be slow to query
• Note: EAV can be transformed to relational
tables. But…it can take some programming
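The kind of programming involved can be sketched as a pivot from EAV rows to one row per entity. The sample rows and the helper `pivot` are hypothetical; a production system would also have to handle data types and repeated attributes.

```python
from collections import defaultdict

# Hypothetical EAV facts: (entity, attribute, value).
eav_rows = [
    ("patient1", "question_5", "A"),
    ("patient1", "question_6", "C"),
    ("patient2", "question_5", "B"),
]

def pivot(rows):
    """Turn EAV facts into relational-style rows, one per entity."""
    attributes = sorted({attr for _, attr, _ in rows})  # become the columns
    by_entity = defaultdict(dict)
    for entity, attr, value in rows:
        by_entity[entity][attr] = value
    table = [
        # Missing facts become None, the equivalent of a NULL column.
        {"entity": entity, **{a: facts.get(a) for a in attributes}}
        for entity, facts in sorted(by_entity.items())
    ]
    return attributes, table

columns, table = pivot(eav_rows)
```

The columns are only known after scanning the data, which is one reason this transformation is harder to express than an ordinary relational query.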
25
Summary: DBMS advantages
• Control redundancy
• Data consistency, integrity
• Data sharing, improved security
• Data independence
• Improved efficiency – common functions
are in one location
• Enforcement of standards
• Balance conflicting requirements
26
Summary: DBMS disadvantages
• Complexity
• Size
• Cost
– Direct, hardware costs, costs of conversion
• Performance
• Higher impact of failure
27