Evolution of Databases
Chapter1
Contents
Evolution of Databases
Content
What is a Data Model?
Disadvantages of File Systems
Major advantages of DBMS
Hierarchical model
Parts-Suppliers example
Network data model
Network model
CODASYL Network Model
Network Database Schema
Parts-Suppliers example in CODASYL
Network Model
Characteristics of Network model
Disadvantages of Hierarchical and Network models
Relational data models
Characteristics of Relational Data Model
Relational DBMS (1980s)
Object Oriented data models
Object-Oriented Database Schema
Objects in OO Database
Advantages of OO Databases
Evolution of Data models
Evolution of Database Technology
What Is Data Mining?
Query Language component of evolution
DMQL
XML
Find names of salesmen over 40 in "Outland" region
Query response?
Current concerns of database community
Developments contributing to massive databases
Bioinformatics
Problems
Intelligence
Data avalanche pressures machine to evolve intelligence
Pace of machine evolution?
Evolution of Databases
 Data Models
 Languages
Objective
 To show the evolution of Data models and languages from simple file
systems to more advanced types.
Content
 Data Models and Languages
o File system
o Hierarchical
o Network
o Relational
o Object Oriented
o OOQL
o DMQL
o XML-QL
What is a Data Model?
 A Data model is a collection of tools for describing
o data
o data relationships
o data semantics
o data constraints
 Examples of data models
o Entity-Relationship model
o Relational model
o Object-oriented model
o Network model
o Hierarchical model
Disadvantages of File Systems
 Uncontrolled data redundancy, data inconsistency
 Poor data sharing
 Difficult to keep up with changes
o If the structure of the data changed (ex: adding more fields),
programs that were using the file had to change
 Low productivity
 High maintenance cost
 Applications have to enforce referential integrity constraints themselves
 No common error recovery procedure (rollback)
 Severe dependence between programs and data
Major advantages of DBMS
 Redundancy control
 Ad hoc queries
 Resilience: data is protected from failures
 Data sharing and concurrent access
 Data integrity and security
 Separation of applications from the DB
Hierarchical model
 Hierarchical model uses trees.
 A tree represents parent/child relationships
o For example, a car consists of body, engine, transmission, etc.
 Pointers were used to link a parent to its children or a child to another
child
 Retrieving the data in a hierarchical database required navigating
through the records, moving up, down, and sideways one record at a
time
 The most popular hierarchical database was Information Management
System (IMS) introduced in 1968
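
The pointer-based navigation described above can be sketched in Python. This is only an illustration under stated assumptions, not the IMS interface: the `Node` class and its `parent`, `first_child`, and `next_sibling` pointers are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(eq=False)
class Node:
    """One record in a hierarchy, linked by pointers (hypothetical layout)."""
    name: str
    parent: Optional["Node"] = None
    first_child: Optional["Node"] = None
    next_sibling: Optional["Node"] = None

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        if self.first_child is None:
            self.first_child = child
        else:
            # Walk sideways to the last child and link the new one after it.
            last = self.first_child
            while last.next_sibling is not None:
                last = last.next_sibling
            last.next_sibling = child
        return child


def children(node: Node) -> Iterator[Node]:
    """Move down once, then sideways, one record at a time."""
    current = node.first_child
    while current is not None:
        yield current
        current = current.next_sibling


# The car example from the text: one parent with three children.
car = Node("car")
for part in ("body", "engine", "transmission"):
    car.add_child(part)

child_names = [c.name for c in children(car)]
```

Reaching the third child requires passing through the first two, which is the record-at-a-time navigation the slide describes.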
Parts-Suppliers example
 Data is represented to the user in the form of a set of tree structures and
operators for traversing paths
 Each child can be reached from the parent
 A child node cannot exist without its parent node
 The parts-suppliers example requires two hierarchical trees
Q1: Find supplier numbers for suppliers who supply part P2.
get [next] part where P#=P2;
do until no more suppliers under this part;
    get next supplier under this part;
    print S#;
end;
Q2: Find part numbers for parts supplied by supplier S2.
do until no more parts;
    get next part;
    get [next] supplier under this part where S#=S2;
    if found then print P#;
end;
 Although the two queries are symmetric, the two procedures are not.
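
The asymmetry can be made concrete with a small Python sketch. It assumes a single hierarchy with parts as parents and suppliers as children, stored here as a plain dictionary (a stand-in for the pointer structure, not how a hierarchical DBMS stores data). The shipment data follows the parts-suppliers tables used in the relational section of this chapter; the function names are illustrative.

```python
# Hierarchy with parts as parents and suppliers as children.
parts_tree = {
    "P1": ["S1", "S2"],
    "P2": ["S1", "S2", "S3"],
    "P3": ["S1"],
    "P4": [],
}


def suppliers_of(part_no):
    """Q1: go straight to one parent, then list its children."""
    return list(parts_tree.get(part_no, []))


def parts_supplied_by(supplier_no):
    """Q2: scan every parent and search its children for the supplier."""
    found = []
    for part_no, suppliers in parts_tree.items():
        if supplier_no in suppliers:
            found.append(part_no)
    return found


q1 = suppliers_of("P2")
q2 = parts_supplied_by("S2")
```

Q1 touches one parent record, while Q2 has to visit every part in the database, even though the two questions mirror each other.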
 Problems with some operations:
 Insertion - enter a new supplier S4
o Not possible until we know which parts S4 provides
o The parent is not known
o Workaround: use a dummy part record as the parent of S4
 Deletion – delete supplier S3, which provides P2 where QTY=200
o Logically possible
o But it also deletes all other information about S3 (S3 no longer
exists anywhere)
o Another problem: deleting a parent deletes all of its
dependents/children
 Update – update the city of S1 from C1 to C4
o All copies of S1 have to be updated
o Propagating the update increases processing
o If not all copies are updated, the data becomes inconsistent
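
A minimal Python sketch of the update and deletion problems, assuming each part holds its own copy of every supplier record beneath it. The dictionary layout, the function names, and the cities for S2 and S3 are made up for illustration; only S1's move from C1 to C4 and the S3/P2 shipment come from the example above.

```python
# Each part (parent) holds its own copy of every supplier record (child).
hierarchy = {
    "P1": [{"S#": "S1", "CITY": "C1"}, {"S#": "S2", "CITY": "C2"}],
    "P2": [{"S#": "S1", "CITY": "C1"}, {"S#": "S3", "CITY": "C3", "QTY": 200}],
}


def update_city(tree, supplier_no, new_city):
    """Propagate the change to every copy; return how many were touched."""
    touched = 0
    for suppliers in tree.values():
        for record in suppliers:
            if record["S#"] == supplier_no:
                record["CITY"] = new_city
                touched += 1
    return touched


def delete_shipment(tree, part_no, supplier_no):
    """Remove one child record; the supplier's own data goes with it."""
    tree[part_no] = [r for r in tree[part_no] if r["S#"] != supplier_no]


def known_suppliers(tree):
    """Collect every supplier number that still appears somewhere."""
    return {r["S#"] for suppliers in tree.values() for r in suppliers}


copies_updated = update_city(hierarchy, "S1", "C4")
delete_shipment(hierarchy, "P2", "S3")
remaining = known_suppliers(hierarchy)
```

Changing one city required two writes, and deleting a single shipment removed the only record of S3.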
 Characteristics of Hierarchical database:
o Simple Structure
o Best suited to environments where 1:n relationship exists
o Performance
o Traversing starts from parent
o Symmetric queries do not have symmetric processing
o Problems with some operations (Insert, delete, update)
Network data model
Network model
 Hierarchical databases could not meet the demands of some
business-oriented environments.
 For example, in an order processing company, a single order might
participate in more than one parent/child relationship.
 For instance, a particular order should be linked to
o The customer who placed it
o The sales person who took it
o The product ordered
o This could not be done by IMS
 To deal with these situations, network data model was developed:
children could have more than one parent
• Example of parent/child relationship in network database models
CODASYL Network Model
 In 1971, the Conference on Data Systems Languages (CODASYL) published
an official standard for network databases, which became known as the
CODASYL model
 A programmer would access the network database as follows:
o Find a specific parent record by key (e.g. customer number)
o Move down to the first child in a particular set (the first order
placed by this customer)
o Move sideways from one child to the next in the set (the next
order placed by this customer)
o Move up from a child to its parent in another set (the salesperson
who took the order)
Network Database Schema
Parts-Suppliers example in CODASYL
 Conceptual design is based on concepts of sets
 Consider the set as a type of tree in which each level holds one type
of record
 Records at the highest and lowest levels are the parent/owner records
 Records at the middle level are the children/member records
 Owner record is linked to the first member record according to some
order
 Member records are connected together
 Last member is connected to the owner
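
The owner/member linkage just described forms a ring, which can be sketched in Python as follows. This is an assumption-laden illustration, not CODASYL's data definition language: the `Record` class, its single `next` pointer, and the customer and order keys are all hypothetical.

```python
class Record:
    """A record that takes part in one ring (hypothetical layout)."""

    def __init__(self, key):
        self.key = key
        self.next = self  # an owner with no members points to itself


def insert_member(owner, member):
    """Append a member to the end of the owner's ring."""
    last = owner
    while last.next is not owner:
        last = last.next
    last.next = member
    member.next = owner  # the last member links back to the owner


def members(owner):
    """Follow the ring from the owner until it comes back around."""
    current = owner.next
    while current is not owner:
        yield current
        current = current.next


customer = Record("C100")
for order_no in ("O1", "O2", "O3"):
    insert_member(customer, Record(order_no))

order_keys = [m.key for m in members(customer)]
empty_owner = Record("S4")
```

Because the ring closes on the owner, a program can move from any member back to its owner by following `next` pointers, and an owner can exist with no members at all.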
Q1: Find supplier numbers for suppliers who supply part P2.
get [next] part where P#=P2;
do until no more connectors under this part;
    get next connector under this part;
    get supplier over this connector;
    print S#;
end;
Q2: Find part numbers for parts supplied by supplier S2.
get [next] supplier where S#=S2;
do until no more connectors under this supplier;
    get next connector under this supplier;
    get part over this connector;
    print P#;
end;
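
The two network procedures mirror each other, as a short Python sketch shows. It assumes the connector records can be modelled as a flat list of (supplier, part, quantity) tuples standing in for the pointer chains; the function names are illustrative, and the data matches the SP table in the relational section.

```python
# Connector records carry the quantity and link one supplier to one part.
connectors = [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
]


def connectors_under(owner_no, position):
    """Yield the connectors in the set owned by a supplier (0) or a part (1)."""
    for connector in connectors:
        if connector[position] == owner_no:
            yield connector


def suppliers_of(part_no):
    """Q1: start at the part, walk its connectors, step up to each supplier."""
    return [s for s, _, _ in connectors_under(part_no, position=1)]


def parts_supplied_by(supplier_no):
    """Q2: the mirror image, starting from the supplier."""
    return [p for _, p, _ in connectors_under(supplier_no, position=0)]


q1 = suppliers_of("P2")
q2 = parts_supplied_by("S2")
```

Both functions have the same shape, which is the symmetry the hierarchical version lacked, at the cost of an extra record type to navigate.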
Network Model
 Queries are symmetric but more complex than in the hierarchical model
 Operations:
o Insertion - Enter a new supplier S4
 Does not have hierarchical problems
 Can insert a new supplier without knowing what parts it
supplies
 i.e., insert a new record for it and set its link to itself
o Deletion – delete supplier S3 which provides P2 where QTY=200
 No problem- does not cause S3 to be deleted
o Update – update city S1 from C1 to C4
 No problem – only stored once in the DB
Characteristics of Network model
 Flexibility to represent a two way 1:n relationship
 Performance
 Symmetric queries exist
 Insertion causes no problem
 Greater complexity
 For some queries, there is a path selection problem
Disadvantages of Hierarchical and Network models
 They have rigid structure:
o The structure of the records had to be known in advance.
o Changing the database structure required rebuilding the entire
database
 Querying the database was not always easy. Retrieving even simple
information from the database could require the programmer to write
a lot of code
o Some of this code was quite complicated
Relational data models
Characteristics of Relational Data Model
 Most of the current DB systems are relational
 Data is perceived by the user as tables
 Operators generate new tables from old
 Data and their relationships are represented by records
 Retrieval is simple – ad hoc queries
 Based on mathematical concepts
 Queries are symmetric on simple flat files
 Operations: no problems with insertion, deletion, update
S
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris

P
P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12       London
P2   Bolt    Green   17       Paris
P3   Screw   Blue    17       Rome
P4   Screw   Red     14       London

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S2   P1   300
S2   P2   400
S3   P2   200
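
The three tables can be loaded into an in-memory SQLite database with Python's standard `sqlite3` module to show that the two queries become symmetric. This is a sketch, not the chapter's own code: the column names `sno` and `pno` replace `S#` and `P#` (which are not plain SQL identifiers), and the key declarations are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT);
    CREATE TABLE P  (pno TEXT PRIMARY KEY, pname TEXT, color TEXT,
                     weight INTEGER, city TEXT);
    CREATE TABLE SP (sno TEXT REFERENCES S, pno TEXT REFERENCES P, qty INTEGER,
                     PRIMARY KEY (sno, pno));
""")
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
])
conn.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"),
    ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"),
    ("P4", "Screw", "Red", 14, "London"),
])
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
])

# Q1: supplier numbers for suppliers who supply part P2.
q1 = [row[0] for row in conn.execute(
    "SELECT sno FROM SP WHERE pno = ? ORDER BY sno", ("P2",))]

# Q2: part numbers for parts supplied by supplier S2.
q2 = [row[0] for row in conn.execute(
    "SELECT pno FROM SP WHERE sno = ? ORDER BY pno", ("S2",))]
```

The two statements differ only in which column is selected and which is filtered, and neither says anything about how to navigate to the rows.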
Relational DBMS (1980s)
Student (ID char(30), Name char(30), DOB date,
Address char(40), GPA number)

Student
ID   Name    DOB       Address          GPA
s1   Jose    2/3/67    Stone Mountain   3.7
s2   Alice   3/12/72   Buck Head        4.0
s3   Tom     10/2/78   Dunwoody         3.0
s4   Sue     4/6/45    Atlanta          2.9
s5   Steve   9/7/71    Stone Mountain   3.5
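
As a sketch of an ad hoc query against this table, the schema can be declared in SQLite through Python's `sqlite3` module. The GPA threshold of 3.5 and the variable names are illustrative assumptions; dates are stored as the text shown in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        ID      CHAR(30),
        Name    CHAR(30),
        DOB     DATE,
        Address CHAR(40),
        GPA     NUMBER
    )
""")
students = [
    ("s1", "Jose", "2/3/67", "Stone Mountain", 3.7),
    ("s2", "Alice", "3/12/72", "Buck Head", 4.0),
    ("s3", "Tom", "10/2/78", "Dunwoody", 3.0),
    ("s4", "Sue", "4/6/45", "Atlanta", 2.9),
    ("s5", "Steve", "9/7/71", "Stone Mountain", 3.5),
]
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?, ?)", students)

# An ad hoc question nobody planned for when the table was designed.
high_gpa = [name for (name,) in conn.execute(
    "SELECT Name FROM Student WHERE GPA >= 3.5 ORDER BY Name")]
```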
Object Oriented data models
 Incorporates features from object-oriented programming (1980s)
o classes (tables) and objects (table rows)
o complex attributes (objects, sets, lists, etc.)
o encapsulation
o incremental class definitions via inheritance hierarchies and
networks
o polymorphism
 Many-to-many relationships directly represented
 Relationships via logical inclusion
 Commercial products:
o Jasmine (Computer Associates, 1998)
o Gemstone (Gemstone Systems Inc. -- SUN Microsystems Inc.)
o Many relational products claim "object-oriented database features",
e.g. Microsoft’s SQL-Server and Access
Object-Oriented Database Schema
Objects in OO Database
Query: Find salesmen over 40 in region "Outland"
Smalltalk syntax:
TheSalesmen do: [:S | ((S age > 40) & (S region name = 'Outland')) ifTrue: [
    S name display.
    Newline display
  ]
]
C++ syntax:
S = firstSalesman(TheSalesmen);
while (S != nullptr) {
    if ((S->age > 40) && (S->region->name == "Outland"))
        std::cout << S->name << std::endl;
    S = nextSalesman(TheSalesmen);
}
OQL syntax (Object-oriented SQL):
select S.name
from S in TheSalesmen
where S.age > 40 and S.region.name = "Outland";
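
For comparison, the same query can be sketched in Python, where following `s.region.name` plays the role of the OQL path expression. The `Region` and `Salesman` classes and the sample people, ages, and regions are hypothetical data invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str


@dataclass
class Salesman:
    name: str
    age: int
    region: Region  # a reference to another object, not a foreign key


outland = Region("Outland")
seattle = Region("Seattle")
the_salesmen = [
    Salesman("Alan Alsop", 45, outland),
    Salesman("Tom Jones", 42, seattle),
    Salesman("Dana Young", 31, outland),
]

# select S.name from S in TheSalesmen
# where S.age > 40 and S.region.name = "Outland"
names = [s.name for s in the_salesmen
         if s.age > 40 and s.region.name == "Outland"]
```

As in OQL, no join is written: the relationship to the region is reached by navigating the object reference.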
Advantages of OO Databases
 Group data processes
 Understand complex objects
 Easy to maintain and change
 Improve productivity
 Examples : ONTOS, GemStone, ObjectStore from Object Design,
OpenODB
Evolution of Data models
• Decreasing technical details in queries
• Increasing use of application vocabulary
• Data objects have attributes and behavior of real-world counterparts
Evolution of Database Technology
 1960s:
o Data collection, database creation, IMS and network DBMS
 1970s:
o Relational data model, relational DBMS implementation
 1980s:
o RDBMS, advanced data models (extended-relational, OO,
deductive, etc.) and application-oriented DBMS (spatial,
scientific, engineering, etc.)
 1990s—2000s:
o Data mining and data warehousing, multimedia databases, and
Web databases, XML databases.
What Is Data Mining?
 Data mining (knowledge discovery in databases):
o Extraction of interesting information / knowledge or patterns from
data in large databases
 Alternative names
o knowledge discovery(mining) in databases (KDD),
o knowledge extraction,
o data/pattern analysis,
o information harvesting,
o business intelligence, etc.
Query Language component of evolution
Data Mining example:
Characterize sales quantities by quadrant and salesman age
DMQL:
mine characteristics as regionAgeBreakout
analyze sum(S.Quantity)
in relevance to R.Quadrant, T.age
from Sale S, Salesman T, Region R
where S.SalesmanID = T.SalesmanID and T.RegID = R.RegID
Roughly equivalent to:
select sum(S.Quantity)
from Sale S, Salesman T, Region R
where S.SalesmanID = T.SalesmanID
and T.RegID = R.RegID
group by R.Quadrant, T.age
except:
 age is quantized
o can generalize any dimension to obtain small number of possible
values
 can mine comparisons (similar breakouts for competing categories)
 can mine associations
o e.g. percent of VCR sales that accompany television sales
 Classifications
o identify natural clusters
 can specify measures of interest
o e.g. support, confidence, minimal intercluster distance
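
The "roughly equivalent" SQL, with age quantized into ten-year bands, can be tried out in SQLite from Python. This is a sketch of the SQL side only, since DMQL itself is not available in any standard library; the table contents and the `age_band` expression are made-up assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Region   (RegID INTEGER PRIMARY KEY, Quadrant TEXT);
    CREATE TABLE Salesman (SalesmanID INTEGER PRIMARY KEY, age INTEGER,
                           RegID INTEGER);
    CREATE TABLE Sale     (SalesmanID INTEGER, Quantity INTEGER);
    INSERT INTO Region   VALUES (1, 'Northwest'), (2, 'Southeast');
    INSERT INTO Salesman VALUES (10, 42, 1), (11, 47, 1), (12, 33, 2);
    INSERT INTO Sale     VALUES (10, 5), (10, 7), (11, 3), (12, 4);
""")

# Integer division quantizes age into bands: 42 and 47 both fall in 40.
breakout = conn.execute("""
    SELECT R.Quadrant, (T.age / 10) * 10 AS age_band, SUM(S.Quantity)
    FROM Sale S, Salesman T, Region R
    WHERE S.SalesmanID = T.SalesmanID AND T.RegID = R.RegID
    GROUP BY R.Quadrant, age_band
    ORDER BY R.Quadrant, age_band
""").fetchall()
```

Without the quantization step, each distinct age would form its own group, which is the generalization a data mining language performs automatically.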
DMQL
 DMQL retains SQL syntax for locating relevant data
 includes statistical measures of interest level
 tailors language to needs of Customer Relationship Management
(CRM)
Query Language component of evolution
XML
self-describing text via embedded structure tags, e.g.
<marketing>
  <salesman>
    <name> Tom Jones </name>
    <age> 42 </age>
    <region>
      <name> Seattle </name>
      <quadrant> Northwest </quadrant>
    </region>
    <department>
      <name> televisions </name>
      <division> electronics </division>
    </department>
  </salesman>
</marketing>
Find names of salesmen over 40 in "Outland" region.
XML-QL:
CONSTRUCT <result> {
  WHERE
    <marketing>
      <salesman>
        <name> $x </name>
        <age> $y </age>
        <region>
          <name> Outland </name>
        </region>
      </salesman>
    </marketing> IN "www.publisher.com/markdb.xml", $y > 40
  CONSTRUCT <name> $x </name>
} </result>
Query response?
<result>
<name> Alan Alsop </name>
<name> Barbara Benson </name>
<name> Cindy Carson </name>
</result>
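
XML-QL never became a widely deployed language, so as a hedged illustration the same pattern match can be written with Python's standard `xml.etree.ElementTree` module. The inline document below is hypothetical sample data (it is not the contents of the URL in the query), and the ages are invented.

```python
import xml.etree.ElementTree as ET

document = """
<marketing>
  <salesman>
    <name>Alan Alsop</name><age>45</age>
    <region><name>Outland</name></region>
  </salesman>
  <salesman>
    <name>Tom Jones</name><age>42</age>
    <region><name>Seattle</name></region>
  </salesman>
  <salesman>
    <name>Dana Young</name><age>31</age>
    <region><name>Outland</name></region>
  </salesman>
</marketing>
"""

root = ET.fromstring(document)
result = ET.Element("result")
for salesman in root.findall("salesman"):
    age = int(salesman.findtext("age"))            # binds $y
    region = salesman.findtext("region/name")
    if age > 40 and region == "Outland":
        # CONSTRUCT <name> $x </name> inside <result>
        ET.SubElement(result, "name").text = salesman.findtext("name")

names = [n.text for n in result]
```

The WHERE pattern becomes path lookups on each `salesman` element, and the CONSTRUCT clause becomes building a new `result` tree.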
Current concerns of database community
 Very large data repositories
 Data integration across heterogeneous sources, especially web sources
o graphics, images, video
 Legacy data access and transformation
 Optimizing structure
o highly structured (tables)
o semistructured (XML documents)
o unstructured (natural language text)
 Common data-exchange templates
Developments contributing to massive databases:
 graphics, images, video
o e.g. LightSurf (Philippe Kahn)
 exchanges images over cell phones
 archives 10 terabytes of data on company servers
 data atoms are discernible chunks (images)
 require specialized agents to interrogate, e.g. face recognition
algorithms
 consumer transactional data
o e.g. Teradata division of National Cash Register
 analyzes purchasing patterns and correlates them with
demographics
 biotechnology and bioinformatics
o Biotechnology is a general term describing the directed
modification of biological processes.
o Bioinformatics is the application of statistics and computer
science to the field of molecular biology
o e.g. GeneMine project at the University of California, Los Angeles
 integrates heterogeneous databases across the web
 correlates subsequences with known patterns
 suggests interesting associations, as opposed to
responding to queries
 30,000 genes
 3,000,000,000 base pairs to sequence
 the number of correlated, or partially correlated, patterns is
exponentially larger
Bioinformatics
 The primary goal of bioinformatics is to increase the understanding of
biological processes.
 What sets it apart from other approaches, however, is its focus on
developing and applying computationally intensive techniques (e.g.,
pattern recognition, data mining, machine learning algorithms, and
visualization) to achieve this goal.
 Major research efforts in the field include sequence alignment, gene
finding, genome assembly, drug design, drug discovery, protein
structure alignment, protein structure prediction, prediction of gene
expression and protein-protein interactions, genome-wide association
studies and the modeling of evolution.
Problems
 Truly vast data volumes
 No opportunity to augment data stream with structure information
 System overwhelmed with data volume that requires simple
categorization
 Urgent need to comprehend the higher-level information inherent in the
data stream, i.e., a need for intelligence
Intelligence
 Intelligence is the ability to reduce input data streams to a manageable
size while retaining detail sufficient to the tasks at hand.
 Intelligence presumably requires organization and categorization of the
input data streams. These are database query operations.
Data avalanche pressures machine to evolve intelligence
 For machine: inventiveness with data streams
o e.g. "Find associations, useful at human level, that
concern diet and disease"
vs.
"Find proportion of heart attacks under age 50
by quantized daily calorie intake"
 The latter roughly illustrates the capability of today’s data mining
systems
Pace of machine evolution?
 Human brain achieves equivalent of 0.1 peta-ops
 Moore’s Law (transistor counts, and loosely processing power, double
every 18-24 months) gives the required power within the next few decades.
 IBM currently developing a 1.0 petaflop machine for studying protein
folding (Blue Gene)
 Optimists suggest superintelligent machines in the first half of the 21st
century
 If successful, this approach provides the required intelligent data stream
reduction without human understanding of the mechanism.
 Will system require emotion or consciousness to motivate the learning
algorithms?