Evolution of Databases
Chapter 1
Contents
Evolution of Databases
Content
What is a Data Model?
Disadvantages of File Systems
Major advantages of DBMS
Hierarchical model
Parts-Suppliers example
Network data model
Network model
CODASYL Network Model
Network Database Schema
Parts-Suppliers example in CODASYL
Network Model
Characteristics of Network model
Disadvantages of Hierarchical and Network models
Relational data models
Characteristic of Relational Data Model
Relational DBMS (1980’s)
Object Oriented data models
Object-Oriented Database Schema
Objects in OO Database
Advantages of OO Databases
Evolution of Data models
Evolution of Database Technology
What Is Data Mining?
Query Language component of evolution
DMQL
XML
Find names of salesman over 40 in “Outland” region
Query response?
Current concerns of database community
Developments contributing to massive databases:
Bioinformatics
Problems
Intelligence
Data avalanche pressures machine to evolve intelligence
Pace of machine evolution?
Evolution of Databases
Data Models
Languages
Objective
To show the evolution of Data models and languages from simple file
systems to more advanced types.
Content
Data Models and Languages
o File system
o Hierarchical
o Network
o Relational
o Object Oriented
o OOQL
o DMQL
o XML-QL
What is a Data Model?
A data model is a collection of tools for describing:
o data
o data relationships
o data semantics
o data constraints
Examples of data models:
o Entity-Relationship model
o Relational model
o Object-oriented model
o Network model
o Hierarchical model
Disadvantages of File Systems
Uncontrolled data redundancy, data inconsistency
Poor data sharing
Difficult to keep up with changes
o If the structure of the data changed (e.g., more fields were added),
programs that used the file had to change
Low productivity
High maintenance cost
Applications have to enforce referential integrity constraints themselves
No common error recovery procedure (rollback)
Severe dependence between programs and data
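The dependence between programs and data can be sketched in Python. This is only an illustration under stated assumptions: the fixed-width record layout, the field widths, and the sample values are hypothetical and do not come from the original notes.

```python
# Hypothetical fixed-width layout: name (10 characters), then salary (6 characters).
OLD_RECORD = "Smith".ljust(10) + "050000"
# The same record after a 4-character department field is added before the salary.
NEW_RECORD = "Smith".ljust(10) + "SALE" + "050000"


def read_salary(record):
    """Parse the salary using offsets hard-coded in the program."""
    return int(record[10:16])


def try_read_salary(record):
    """Return the salary, or None if the hard-coded layout no longer fits."""
    try:
        return read_salary(record)
    except ValueError:
        return None


print(try_read_salary(OLD_RECORD))  # the layout the program was written for
print(try_read_salary(NEW_RECORD))  # the offsets now point into the new field
```

Every program that reads the file carries its own copy of the layout, so a change to the file structure forces a change to each of them.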
Major advantages of DBMS
Redundancy control
Ad hoc queries
Resilience: data is protected from failure
Data sharing and concurrent access
Data integrity and security
Separation of applications from the DB
Hierarchical model
The hierarchical model uses trees.
A tree represents parent/child relationships
o For example, a car consists of body, engine, transmission, etc.
Pointers were used to link a parent to its children or a child to another
child
Retrieving the data in a hierarchical database required navigating
through the records, moving up, down, and sideways one record at a
time
The most popular hierarchical database was IBM's Information Management
System (IMS), introduced in 1968
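The pointer-based navigation described above can be sketched as follows. This is a minimal Python illustration, not how IMS was implemented; the Record class and its method names are hypothetical, and only the car example comes from the notes.

```python
class Record:
    """One record in a hierarchy, linked to its neighbours by pointers."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # pointer up
        self.children = []        # pointers down

    def add_child(self, name):
        child = Record(name, parent=self)
        self.children.append(child)
        return child

    def next_sibling(self):
        """Pointer sideways: the next child of the same parent, if any."""
        if self.parent is None:
            return None
        siblings = self.parent.children
        position = siblings.index(self)
        return siblings[position + 1] if position + 1 < len(siblings) else None


car = Record("car")
body = car.add_child("body")
engine = car.add_child("engine")
transmission = car.add_child("transmission")


def visit_children(parent):
    """Move down to the first child, then sideways one record at a time."""
    visited = []
    current = parent.children[0] if parent.children else None
    while current is not None:
        visited.append(current.name)
        current = current.next_sibling()
    return visited


print(visit_children(car))
print(engine.parent.name)  # moving up from a child to its parent
```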
Parts-Suppliers example
Data is represented to the user in the form of a set of tree structures and
operators for traversing paths
Each child can be reached from its parent
Without a parent node, a child node cannot exist
The parts-suppliers data requires two hierarchical trees
Q1: Find supplier numbers for suppliers who supply part P2.
    get [next] part where P#=P2;
    do until no more suppliers under this part;
        get next supplier under this part;
        print S#;
    end;
Q2: Find part numbers for parts supplied by supplier S2.
    do until no more parts;
        get next part;
        get [next] supplier under this part where S#=S2;
        if found then print P#;
    end;
Although the two queries are symmetric, the two procedures are not.
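The asymmetry between Q1 and Q2 can be sketched in Python. This is an illustration only: the dictionary stands in for a hierarchy with parts as parents and suppliers as children, the function names are hypothetical, and the shipment data follows the SP table used later in the chapter.

```python
# Hierarchy: each part (parent) owns the suppliers (children) that supply it.
PARTS = {
    "P1": ["S1", "S2"],
    "P2": ["S1", "S2", "S3"],
    "P3": ["S1"],
    "P4": [],
}


def suppliers_of_part(part_no):
    """Q1: go straight to the part, then scan only its children."""
    return list(PARTS.get(part_no, []))


def parts_of_supplier(supplier_no):
    """Q2: there is no direct path to a supplier, so visit every part
    and search the suppliers stored under it."""
    found = []
    for part_no, suppliers in PARTS.items():
        if supplier_no in suppliers:
            found.append(part_no)
    return found


print(suppliers_of_part("P2"))
print(parts_of_supplier("S2"))
```

Q1 touches one parent and its children; Q2 has to walk the whole hierarchy, even though the two questions are mirror images of each other.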
Problems with some operations:
Insertion: enter a new supplier S4
o Not possible until we know which parts S4 supplies
o The parent is not known
o Workaround: use a dummy part record as the parent of S4
Deletion: delete the record showing that supplier S3 supplies P2 with QTY=200
o Logically possible
o But it also deletes all other information about S3 (S3 no longer
exists anywhere)
o Another problem: deleting a parent deletes all of its
dependent (child) records
Update: change the city of S1 from C1 to C4
o All copies of S1 have to be updated
o Propagating the update increases processing
o If not all copies are updated, the data becomes inconsistent
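The update and deletion problems can be sketched in Python. This is a minimal illustration, assuming each part stores its own copy of every supplier record beneath it; the helper names are hypothetical, and the cities other than C1 and C4 are made up for the example.

```python
# Each part holds its own copy of every supplier record beneath it.
tree = {
    "P1": [{"s_no": "S1", "city": "C1"}, {"s_no": "S2", "city": "C2"}],
    "P2": [{"s_no": "S1", "city": "C1"}, {"s_no": "S3", "city": "C2"}],
}


def update_city(tree, s_no, new_city, parts=None):
    """Change the city in the copies of s_no under the given parts
    (all parts when parts is None). Returns the number of copies changed."""
    changed = 0
    for p_no in (parts if parts is not None else tree):
        for rec in tree[p_no]:
            if rec["s_no"] == s_no:
                rec["city"] = new_city
                changed += 1
    return changed


def cities_of(tree, s_no):
    """All distinct cities recorded for one supplier across the hierarchy."""
    return {rec["city"] for recs in tree.values() for rec in recs
            if rec["s_no"] == s_no}


# Update anomaly: only the copy under P1 is changed.
partial = update_city(tree, "S1", "C4", parts=["P1"])
print(cities_of(tree, "S1"))  # two cities for one supplier: inconsistent

# Deletion anomaly: removing S3's only shipment removes S3 itself.
tree["P2"] = [rec for rec in tree["P2"] if rec["s_no"] != "S3"]
print(cities_of(tree, "S3"))  # empty: S3 no longer exists anywhere
```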
Characteristics of Hierarchical database:
o Simple Structure
o Best suited to environments where 1:n relationship exists
o Performance
o Traversing starts from parent
o Symmetric queries do not have symmetric processing
o Problems with some operations (Insert, delete, update)
Network data model
Network model
Hierarchical databases could not meet the demands of some
business-oriented environments.
For example, in an order processing company, a single order might
participate in more than one parent/child relationship.
For instance, a particular order should be linked to
o The customer who placed it
o The sales person who took it
o The product ordered
o This could not be done in IMS
To deal with these situations, the network data model was developed,
in which a child could have more than one parent
• Example of parent/child relationship in network database models
CODASYL Network Model
In 1971, the Conference on Data Systems Languages (CODASYL) published an
official standard for network databases, which became known as the
CODASYL model
A programmer would access the network database as follows:
o Find a specific parent record by key (e.g., customer number)
o Move down to the first child in a particular set (the first order
placed by this customer)
o Move sideways from one child to the next in the set (the next
order placed by this customer)
o Move up from a child to its parent in another set (the salesperson
who took the order)
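The four navigation steps can be sketched in Python. This is not CODASYL DML; the Rec class, the connect helper, the set names, and the sample keys are all hypothetical and serve only to show a child record that belongs to two sets at once.

```python
class Rec:
    """A record that can own members in some sets and be a member in others."""

    def __init__(self, key):
        self.key = key
        self.owner = {}    # set name -> owner record (pointer up)
        self.members = {}  # set name -> ordered member records (pointers down)


def connect(set_name, owner, member):
    """Insert member into the named set owned by owner."""
    owner.members.setdefault(set_name, []).append(member)
    member.owner[set_name] = owner


customers = {"C1": Rec("C1")}
salespeople = {"SP7": Rec("SP7"), "SP9": Rec("SP9")}
order1, order2 = Rec("O1"), Rec("O2")

# Each order has two parents: the customer and the salesperson.
connect("customer-orders", customers["C1"], order1)
connect("customer-orders", customers["C1"], order2)
connect("salesperson-orders", salespeople["SP7"], order1)
connect("salesperson-orders", salespeople["SP9"], order2)


def salespeople_for_customer(cust_key):
    parent = customers[cust_key]                       # find parent by key
    trail = []
    for order in parent.members.get("customer-orders", []):  # down, then sideways
        seller = order.owner["salesperson-orders"]     # up, in another set
        trail.append((order.key, seller.key))
    return trail


print(salespeople_for_customer("C1"))
```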
Network Database Schema
Parts-Suppliers example in CODASYL
Conceptual design is based on the concept of sets
Consider a set as a kind of tree in which each level holds one type
of record
Records at the highest and lowest levels are the parent/owner records
Records at the middle level are the children/member records
Owner record is linked to the first member record according to some
order
Member records are connected together
Last member is connected to the owner
Q1: Find supplier numbers for suppliers who supply part P2.
    get [next] part where P#=P2;
    do until no more connectors under this part;
        get next connector under this part;
        get supplier over this connector;
        print S#;
    end;
Q2: Find part numbers for parts supplied by supplier S2.
    get [next] supplier where S#=S2;
    do until no more connectors under this supplier;
        get next connector under this supplier;
        get part over this connector;
        print P#;
    end;
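The symmetry of the two network procedures can be sketched in Python. This is an illustration under stated assumptions: the two dictionaries stand in for the part-owned and supplier-owned sets of connector records, the function names are hypothetical, and the shipment data follows the SP table used later in the chapter.

```python
# Connector records, one per shipment: (supplier, part, quantity).
CONNECTORS = [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
]

# Every connector is a member of one part-owned set and one supplier-owned set.
by_part, by_supplier = {}, {}
for conn in CONNECTORS:
    s_no, p_no, _qty = conn
    by_part.setdefault(p_no, []).append(conn)
    by_supplier.setdefault(s_no, []).append(conn)


def suppliers_of_part(p_no):
    """Q1: walk the connectors under the part, then go up to each supplier."""
    return [s for (s, _p, _q) in by_part.get(p_no, [])]


def parts_of_supplier(s_no):
    """Q2: the mirror image, walking connectors under the supplier."""
    return [p for (_s, p, _q) in by_supplier.get(s_no, [])]


print(suppliers_of_part("P2"))
print(parts_of_supplier("S2"))

# A new supplier can exist as an owner with an empty set of connectors.
by_supplier["S4"] = []
print(parts_of_supplier("S4"))
```

Both procedures have the same shape, at the cost of an extra record type (the connector) and an extra navigation step.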
Network Model
Queries are symmetric but more complex than in the hierarchical model
Operations:
o Insertion: enter a new supplier S4
  Does not have the hierarchical model's problems
  Can insert a new supplier without knowing what parts it
  supplies, i.e., insert a new record for it and set its link to itself
o Deletion: delete the record showing that supplier S3 supplies P2 with QTY=200
  No problem: does not cause S3 to be deleted
o Update: change the city of S1 from C1 to C4
  No problem: the city is stored only once in the DB
Characteristics of Network model
Flexibility to represent a two way 1:n relationship
Performance
Symmetric queries exist
Insertion causes no problem
Greater complexity
For some queries, there is a path selection problem
Disadvantages of Hierarchical and Network models
They have a rigid structure:
o The structure of the records had to be known in advance.
o Changing the database structure required rebuilding the entire
database
Querying the database was not always easy. Retrieving even simple
information from the database could require the programmer to write a
lot of code
o Some of this code was quite complicated
Relational data models
Characteristic of Relational Data Model
Most of the current DB systems are relational
Data is perceived by the user as tables
Operators generate new tables from old
Data and the relationships among them are both represented by records (rows)
Retrieval is simple – ad hoc queries
Based on mathematical concepts
Queries are symmetric on simple flat files
Operations: no problems with insertion, deletion, update
S
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris

P
P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12       London
P2   Bolt    Green   17       Paris
P3   Screw   Blue    17       Rome
P4   Screw   Red     14       London

SP
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S2   P1   300
S2   P2   400
S3   P2   200
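The S, P, and SP tables can be loaded into a relational engine to show that the two symmetric queries have symmetric formulations. This is a sketch using Python's standard sqlite3 module; the column names s_no and p_no replace S# and P# (an assumption made because # is awkward in SQL identifiers), and the supplier S4 row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S  (s_no TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT);
CREATE TABLE P  (p_no TEXT PRIMARY KEY, pname TEXT, color TEXT,
                 weight INTEGER, city TEXT);
CREATE TABLE SP (s_no TEXT REFERENCES S, p_no TEXT REFERENCES P, qty INTEGER,
                 PRIMARY KEY (s_no, p_no));
""")
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
])
conn.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London"),
])
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
])

# Q1 and Q2 differ only in which column is selected and which is filtered.
q1 = [row[0] for row in conn.execute(
    "SELECT s_no FROM SP WHERE p_no = ? ORDER BY s_no", ("P2",))]
q2 = [row[0] for row in conn.execute(
    "SELECT p_no FROM SP WHERE s_no = ? ORDER BY p_no", ("S2",))]
print(q1, q2)

# Insertion: a supplier with no shipments is just a row in S (hypothetical S4).
conn.execute("INSERT INTO S VALUES ('S4', 'NewSupplier', 20, 'London')")
# Update: the city is stored once, so exactly one row changes.
updated = conn.execute("UPDATE S SET city = 'C4' WHERE s_no = 'S1'").rowcount
# Deletion: removing S3's only shipment leaves the S3 row in S untouched.
conn.execute("DELETE FROM SP WHERE s_no = 'S3' AND p_no = 'P2'")
s3_left = conn.execute("SELECT COUNT(*) FROM S WHERE s_no = 'S3'").fetchone()[0]
print(updated, s3_left)
```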
Relational DBMS (1980’s)
Student (ID char(30), Name char(30), DOB date,
Address char(40), GPA number)

Student
ID   Name    DOB       Address          GPA
s1   Jose    2/3/67    Stone Mountain   3.7
s2   Alice   3/12/72   Buck Head        4.0
s3   Tom     10/2/78   Dunwoody         3.0
s4   Sue     4/6/45    Atlanta          2.9
s5   Steve   9/7/71    Stone Mountain   3.5
Object Oriented data models
Incorporates features from object-oriented programming (1980s)
o classes (tables) and objects (table rows)
o complex attributes (objects, sets, lists, etc.)
o encapsulation
o incremental class definitions via inheritance hierarchies and
networks
o polymorphism
Many-to-many relationships directly represented
Relationships via logical inclusion
Commercial products:
o Jasmine (Computer Associates, 1998)
o Gemstone (Gemstone Systems Inc. -- SUN Microsystems Inc.)
o Many relational products claim “object-oriented database features”,
e.g. Microsoft’s SQL Server and Access
Object-Oriented Database Schema
Objects in OO Database
Query: Find salesmen over 40 in region “Outland”
SmallTalk syntax:
    TheSalesmen do: [:S | ((S age > 40) & (S region name = 'Outland')) ifTrue: [
        S name display.
        Newline display
      ]
    ]
C++ syntax:
    // Assumes firstSalesman/nextSalesman return a Salesman* (nullptr at the end)
    // and that the name fields are std::string.
    Salesman* S = firstSalesman(TheSalesmen);
    while (S != nullptr) {
        if (S->age > 40 && S->region.name == "Outland")
            std::cout << S->name << std::endl;
        S = nextSalesman(TheSalesmen);
    }
OQL syntax (Object-oriented SQL):
    select S.name
    from S in TheSalesmen
    where S.age > 40 and S.region.name = "Outland";
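The same query can be sketched in Python to show how an object reference (the salesman's region) is followed directly rather than joined. The class definitions and all sample data here are hypothetical; only the query condition comes from the notes.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str


@dataclass
class Salesman:
    name: str
    age: int
    region: Region  # complex attribute: a reference to another object


outland = Region("Outland")
seattle = Region("Seattle")

# Hypothetical extent of the Salesman class.
the_salesmen = [
    Salesman("Alan Alsop", 45, outland),
    Salesman("Tom Jones", 42, seattle),
    Salesman("Dana Doe", 35, outland),
]

# Mirrors the OQL query: select S.name from S in TheSalesmen where ...
names = [s.name for s in the_salesmen
         if s.age > 40 and s.region.name == "Outland"]
print(names)
```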
Advantages of OO Databases
Group data with the processes that act on it
Support complex objects
Easy to maintain and change
Improve productivity
Examples: ONTOS, GemStone, ObjectStore (from Object Design),
OpenODB
Evolution of Data models
• Decreasing technical details in queries
• Increasing use of application vocabulary
• Data objects have attributes and behavior of real-world counterparts
Evolution of Database Technology
1960s:
o Data collection, database creation, IMS and network DBMS
1970s:
o Relational data model, relational DBMS implementation
1980s:
o RDBMS, advanced data models (extended-relational, OO,
deductive, etc.) and application-oriented DBMS (spatial,
scientific, engineering, etc.)
1990s-2000s:
o Data mining and data warehousing, multimedia databases,
Web databases, and XML databases.
What Is Data Mining?
Data mining (knowledge discovery in databases):
o Extraction of interesting information / knowledge or patterns from
data in large databases
Alternative names
o knowledge discovery (mining) in databases (KDD),
o knowledge extraction,
o data/pattern analysis,
o information harvesting,
o business intelligence, etc.
Query Language component of evolution
Data Mining example:
Characterize sales quantities by quadrant and salesman age
DMQL:
    mine characteristics as regionAgeBreakout
    analyze sum(S.Quantity)
    in relevance to R.Quadrant, T.age
    from Sale S, Salesman T, Region R
    where S.SalesmanID = T.SalesmanID and T.RegID = R.RegID
Roughly equivalent to:
    select R.Quadrant, T.age, sum(S.Quantity)
    from Sale S, Salesman T, Region R
    where S.SalesmanID = T.SalesmanID
    and T.RegID = R.RegID
    group by R.Quadrant, T.age
except:
age is quantized
o can generalize any dimension to obtain a small number of possible
values
can mine comparisons (similar breakouts for competing categories)
associations
o e.g. percentage of VCR sales that accompany television sales
classifications
o identify natural clusters
can specify measures of interest
o e.g. support, confidence, minimal intercluster distance
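The "roughly equivalent" SQL with a quantized age can be sketched with Python's standard sqlite3 module. This is only an approximation of what a DMQL engine would do: the ten-year age bands are an assumed quantization, and the table contents are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Region   (RegID INTEGER PRIMARY KEY, Quadrant TEXT);
CREATE TABLE Salesman (SalesmanID INTEGER PRIMARY KEY, age INTEGER, RegID INTEGER);
CREATE TABLE Sale     (SalesmanID INTEGER, Quantity INTEGER);
INSERT INTO Region   VALUES (1, 'Northwest'), (2, 'Southeast');
INSERT INTO Salesman VALUES (10, 42, 1), (11, 47, 1), (12, 29, 2);
INSERT INTO Sale     VALUES (10, 5), (10, 3), (11, 4), (12, 7);
""")

# Integer division maps each age to the start of its ten-year band,
# so 42 and 47 both fall into the band labelled 40.
rows = conn.execute("""
SELECT R.Quadrant, (T.age / 10) * 10 AS age_band, SUM(S.Quantity)
FROM Sale S, Salesman T, Region R
WHERE S.SalesmanID = T.SalesmanID AND T.RegID = R.RegID
GROUP BY R.Quadrant, age_band
ORDER BY R.Quadrant, age_band
""").fetchall()

for quadrant, age_band, total in rows:
    print(quadrant, age_band, total)
```

Grouping on the raw age would produce one row per distinct age; generalizing the dimension to bands keeps the number of groups small enough to read.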
DMQL
DMQL retains SQL syntax for locating relevant data
includes statistical measures of interest level
tailors the language to the needs of Customer Relationship Management
(CRM)
Query Language component of evolution
XML
self-describing text via embedded structure tags, e.g.
<marketing>
  <salesman>
    <name> Tom Jones </name>
    <age> 42 </age>
    <region>
      <name> Seattle </name>
      <quadrant> Northwest </quadrant>
    </region>
    <department>
      <name> televisions </name>
      <division> electronics </division>
    </department>
  </salesman>
</marketing>
Find names of salesmen over 40 in “Outland” region.
XML-QL:
CONSTRUCT <result> {
  WHERE
    <marketing>
      <salesman>
        <name> $x </name>
        <age> $y </age>
        <region>
          <name> Outland </name>
        </region>
      </salesman>
    </marketing> IN "www.publisher.com/markdb.xml", $y > 40
  CONSTRUCT <name> $x </name>
} </result>
Query response?
<result>
<name> Alan Alsop </name>
<name> Barbara Benson </name>
<name> Cindy Carson </name>
</result>
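The pattern matching performed by the XML-QL query can be approximated in Python with the standard xml.etree.ElementTree module. This is a sketch, not an XML-QL engine: the function name is hypothetical, and the inline document is made-up sample data standing in for markdb.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of markdb.xml.
DOC = """
<marketing>
  <salesman>
    <name> Alan Alsop </name>
    <age> 45 </age>
    <region><name> Outland </name></region>
  </salesman>
  <salesman>
    <name> Tom Jones </name>
    <age> 42 </age>
    <region><name> Seattle </name></region>
  </salesman>
  <salesman>
    <name> Dana Doe </name>
    <age> 35 </age>
    <region><name> Outland </name></region>
  </salesman>
</marketing>
"""


def names_over_40_in(region, xml_text):
    """Build a <result> element holding the <name> of each matching salesman."""
    root = ET.fromstring(xml_text)
    result = ET.Element("result")
    for salesman in root.findall("salesman"):
        age = int(salesman.findtext("age"))                     # binds $y
        region_name = salesman.findtext("region/name").strip()
        if region_name == region and age > 40:
            name = salesman.findtext("name").strip()            # binds $x
            ET.SubElement(result, "name").text = name
    return result


result = names_over_40_in("Outland", DOC)
print(ET.tostring(result, encoding="unicode"))
```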
Current concerns of database community
Very large data repositories
Data integration across heterogeneous sources, especially web sources
o graphics, images, video
Legacy data access and transformation
Optimizing structure
highly structured (tables)
semistructured (XML documents)
unstructured (natural language text)
Common data-exchange templates
Developments contributing to massive databases:
graphics, images, video
e.g. LightSurf (Philippe Kahn)
exchanges images over cell phones
archives 10 terabytes of data on company servers
data atoms are discernible chunks (images)
require specialized agents to interrogate
e.g. face recognition algorithms
consumer transactional data
e.g. Teradata division of National Cash Register
analyzes purchasing patterns and correlates with
demographics
biotechnology and bioinformatics
o Biotechnology is a general term describing the directed
modification of biological processes.
o Bioinformatics is the application of statistics and computer
science to the field of molecular biology
e.g. GeneMine projects at the University of California, Los Angeles
integrates heterogeneous databases across the web
correlates subsequences with known patterns
suggests interesting associations, as opposed to
responding to queries
30,000 genes
3,000,000,000 base pairs to sequence
the number of correlated, or partially correlated, patterns is
exponentially larger
Bioinformatics
The primary goal of bioinformatics is to increase the understanding of
biological processes.
What sets it apart from other approaches, however, is its focus on
developing and applying computationally intensive techniques (e.g.,
pattern recognition, data mining, machine learning algorithms, and
visualization) to achieve this goal.
Major research efforts in the field include sequence alignment, gene
finding, genome assembly, drug design, drug discovery, protein
structure alignment, protein structure prediction, prediction of gene
expression and protein-protein interactions, genome-wide association
studies and the modeling of evolution.
Problems
Truly vast data volumes
No opportunity to augment data stream with structure information
System overwhelmed with data volume that requires simple
categorization
Urgent need to comprehend the higher-level information inherent in the
data stream, i.e., intelligence is required
Intelligence
Intelligence is the ability to reduce input data streams to a manageable
size while retaining detail sufficient to the tasks at hand.
Intelligence presumably requires organization and categorization of the
input data streams. These are database query operations.
Data avalanche pressures machine to evolve intelligence
For the machine: inventiveness with data streams
o e.g. Find associations, useful at the human level, that
concern diet and disease
vs.
Find the proportion of heart attacks under age 50
by quantized daily calorie intake
roughly illustrates capability of today’s data mining systems
Pace of machine evolution?
The human brain achieves the equivalent of 0.1 peta-ops
Moore’s Law (processor speeds double every 18-24 months)
gives the required power within the next few decades.
IBM is currently developing a 1.0 petaflop machine for studying protein
folding (Blue Gene)
Optimists suggest superintelligent machines in the first half of the 21st
century
If successful, this approach provides the required intelligent data stream
reduction without human understanding of the mechanism.
Will the system require emotion or consciousness to motivate the learning
algorithms?