Introduction to Database Systems
Slides adapted from Loreto Bravo
What is a database?
- A database is an organized collection of data for one or more uses.
- A database organizes its data according to a data model.
  - A data model is a collection of conceptual tools for describing data, data relationships, data semantics, and data constraints.
  - Components:
    - structural part
    - manipulative part
    - integrity rules
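The three components of a data model can be illustrated with a small relational example. This is only a sketch using Python's built-in sqlite3 module; the `student` table, its columns, and the sample rows are hypothetical and not part of the slides.

```python
import sqlite3

def build_demo():
    """Return an in-memory database illustrating the three components."""
    conn = sqlite3.connect(":memory:")
    # Structural part: relations (tables) and their attributes.
    # Integrity rules: PRIMARY KEY, NOT NULL and CHECK constraints.
    conn.execute(
        """
        CREATE TABLE student (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            credits INTEGER NOT NULL CHECK (credits >= 0)
        )
        """
    )
    # Manipulative part: operations that insert and retrieve data.
    conn.execute("INSERT INTO student (id, name, credits) VALUES (1, 'Mary Johnson', 30)")
    conn.commit()
    return conn

def violates_integrity(conn):
    """Try to store negative credits; report whether the DBMS rejected it."""
    try:
        conn.execute("INSERT INTO student (id, name, credits) VALUES (2, 'Bad Row', -5)")
    except sqlite3.IntegrityError:
        return True
    return False

conn = build_demo()
names = [row[0] for row in conn.execute("SELECT name FROM student")]
rejected = violates_integrity(conn)
```

The point of the sketch is that the integrity rule is enforced by the database itself, so no application can store a row that breaks it.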
Data Models
- Types of Data Models:
  - Describe data at the conceptual and external levels
    - Object-based Data Models
      - Entity-relationship model, Object-oriented model, Semantic data model, Functional data model
    - Record-based Data Models
      - Relational, network, and hierarchical data models, etc.
  - Describe data at the internal level
    - Unifying model or frame memory.
Historical Perspective -- Before 1960
- File systems
  - Problems:
    - data redundancy
    - data is separated: it cannot be easily combined
    - high cost of propagating updates
    - update anomalies and inconsistencies
    - no abstract data model
    - requires knowledge of storage details
    - no standard query language
    - need to enforce security policies in which different users have permission to access different subsets of the data
File Systems
[Figure: two separate applications, each with its own users and files: client processing with the client files, and loans processing with the loan files.]
For each loan, the information of the client is stored again: redundancy.
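The redundancy described above leads directly to update anomalies. The following is a minimal sketch in plain Python; the record layout, the client name, and the addresses are invented for illustration and do not come from the slides.

```python
# Hypothetical flat "loan file": every loan repeats the client's details.
loan_file = [
    {"loan_id": 1, "client": "Ana Diaz", "address": "12 Oak St", "amount": 5000},
    {"loan_id": 2, "client": "Ana Diaz", "address": "12 Oak St", "amount": 900},
]
# Separate "client file" kept by the client-processing application.
client_file = [{"client": "Ana Diaz", "address": "12 Oak St"}]

def move_client(clients, name, new_address):
    """Update the address in the client file only, as that application would."""
    for record in clients:
        if record["client"] == name:
            record["address"] = new_address

def addresses_on_record(name):
    """Collect every distinct address stored for a client across both files."""
    found = {r["address"] for r in client_file if r["client"] == name}
    found |= {r["address"] for r in loan_file if r["client"] == name}
    return found

move_client(client_file, "Ana Diaz", "7 Elm Ave")
conflicting = addresses_on_record("Ana Diaz")
```

After the update, `conflicting` holds two different addresses for the same client, because the loan file still carries the old copies.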
File Systems
- Enroll “Mary Johnson” in “CSE444”:
  Write a C program to do the following:
    Read ‘students.txt’
    Read ‘courses.txt’
    Find & update the record “Mary Johnson”
    Find & update the record “CSE444”
    Write ‘students.txt’
    Write ‘courses.txt’
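The slide asks for a C program; the sketch below follows the same six steps in Python to keep the example short. The `key:value` line format, the helper names, and the sample contents of the two files are assumptions made for illustration only.

```python
import os
import tempfile

def read_records(path):
    """Read 'key:value' lines into a dict (assumed, hypothetical file format)."""
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                key, _, value = line.partition(":")
                records[key] = value
    return records

def write_records(path, records):
    """Rewrite the whole file from the in-memory dict."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in records.items():
            f.write(f"{key}:{value}\n")

def enroll(student, course, students_path, courses_path):
    students = read_records(students_path)            # Read students.txt
    courses = read_records(courses_path)              # Read courses.txt
    taken = [c for c in students[student].split(",") if c]
    taken.append(course)
    students[student] = ",".join(taken)               # Update the student record
    courses[course] = str(int(courses[course]) + 1)   # Update the course record
    write_records(students_path, students)            # Write students.txt
    write_records(courses_path, courses)              # Write courses.txt

# Demo in a temporary directory with made-up starting data.
workdir = tempfile.mkdtemp()
students_path = os.path.join(workdir, "students.txt")
courses_path = os.path.join(workdir, "courses.txt")
write_records(students_path, {"Mary Johnson": "CSE143"})
write_records(courses_path, {"CSE444": "41"})
enroll("Mary Johnson", "CSE444", students_path, courses_path)
result_students = read_records(students_path)
result_courses = read_records(courses_path)
```

Note that the program has to know the storage format of both files and rewrites each file in full, which is exactly the kind of dependence on storage details listed among the file-system problems.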
File Systems
- System crashes:
    Read ‘students.txt’
    Read ‘courses.txt’
    Find & update the record “Mary Johnson”
    Find & update the record “CSE444”
    Write ‘students.txt’
    Write ‘courses.txt’
    CRASH !
  - What is the problem?
- Large data sets (say 50GB)
  - What is the problem?
- Simultaneous access by many users
  - Need locks
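A DBMS addresses the crash problem with transactions: either both updates happen or neither does. The sketch below uses Python's sqlite3 module and simulates the crash with an exception; a real crash is a process or power failure, which the DBMS handles through its own recovery mechanisms. The table names and counters are hypothetical.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT PRIMARY KEY, courses INTEGER)")
    conn.execute("CREATE TABLE courses (code TEXT PRIMARY KEY, enrolled INTEGER)")
    conn.execute("INSERT INTO students VALUES ('Mary Johnson', 3)")
    conn.execute("INSERT INTO courses VALUES ('CSE444', 41)")
    conn.commit()
    return conn

def enroll(conn, student, course, crash_midway=False):
    """Apply both updates in one transaction; undo both if anything fails."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE students SET courses = courses + 1 WHERE name = ?", (student,)
            )
            if crash_midway:
                raise RuntimeError("simulated crash between the two writes")
            conn.execute(
                "UPDATE courses SET enrolled = enrolled + 1 WHERE code = ?", (course,)
            )
    except RuntimeError:
        pass  # the partial update has been rolled back

def snapshot(conn):
    """Return (courses taken by the student, students enrolled in the course)."""
    s = conn.execute("SELECT courses FROM students").fetchone()[0]
    c = conn.execute("SELECT enrolled FROM courses").fetchone()[0]
    return s, c

conn = make_db()
enroll(conn, "Mary Johnson", "CSE444", crash_midway=True)
after_crash = snapshot(conn)
enroll(conn, "Mary Johnson", "CSE444")
after_success = snapshot(conn)
```

With the two plain files, a failure between the two writes would leave the student enrolled in a course whose record does not know about her; here the interrupted attempt leaves both counters unchanged.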
1960
- Hierarchical Databases
  - Developed by North American Rockwell and IBM as the IMS (Information Management System)
    - IMS formed the basis for the hierarchical data model
    - Still available: http://www-01.ibm.com/software/data/ims/
  - American Airlines and IBM jointly developed SABRE for making airline reservations
    - SABRE is used today to populate Web-based travel services such as Travelocity
  - Based on a tree structure
  - Problems:
    - Changes in data structure require changes in the application programs that access that structure
    - No many-to-many relationships
    - Programmers must be thoroughly familiar with the database structure.
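Why a tree structure cannot express many-to-many relationships can be seen in a toy example. This is only an illustration with nested Python dictionaries, not how IMS stores segments; the course codes and student names are made up.

```python
# Hypothetical hierarchy: each course (parent) owns its student records (children).
hierarchy = {
    "CSE444": {"students": ["Mary Johnson", "Li Wei"]},
    "CSE143": {"students": ["Mary Johnson"]},
}

def courses_of(student):
    """Answering 'which courses does this student take?' means walking every parent."""
    return sorted(code for code, seg in hierarchy.items() if student in seg["students"])

def copies_of(student):
    """Count how many times the same student is stored under different parents."""
    return sum(seg["students"].count(student) for seg in hierarchy.values())

mary_courses = courses_of("Mary Johnson")
mary_copies = copies_of("Mary Johnson")
```

Because a child can have only one parent, a student taking two courses has to be stored twice, and any query that starts from the student must know the tree layout and scan it.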
1960
- Network Databases
  - Integrated Data Store, the first general-purpose DBMS, designed by Charles Bachman at GE
    - Formed the basis for the network data model
    - Bachman received the Turing Award in 1973 for his work in the database area
  - Extension of the hierarchical data model
    - Based on acyclic digraph
  - Standardized (1971) by the CODASYL group (Conference on Data Systems Languages)
  - Advantage: many-to-many relationships are implemented
  - Problems: “Navigation” is even harder
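The "navigation" problem can be sketched with records connected by explicit pointers. The classes and names below are hypothetical and only imitate the style of access in a network database; they are not the CODASYL interface.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    key: str
    links: list = field(default_factory=list)  # pointers to enrollment link records

@dataclass
class Enrollment:
    student: Record
    course: Record

def connect(student, course):
    """Create a link record reachable from both a student and a course."""
    link = Enrollment(student, course)
    student.links.append(link)
    course.links.append(link)

def classmates(student):
    """Navigate student -> enrollments -> course -> enrollments -> students."""
    found = set()
    for enrollment in student.links:
        for other in enrollment.course.links:
            if other.student is not student:
                found.add(other.student.key)
    return sorted(found)

mary, li = Record("Mary Johnson"), Record("Li Wei")
cse444 = Record("CSE444")
connect(mary, cse444)
connect(li, cse444)
mary_classmates = classmates(mary)
```

Many-to-many relationships now work without duplicating records, but every query is a hand-written walk along pointers, so the programmer must know exactly how the records are linked.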
Problems with the first DBMSs
- Access to the database was through low-level pointer operations
- Storage details depended on the type of data to be stored
- Adding a field to the DB required rewriting the underlying access/modification scheme
- Emphasis on records to be processed, not overall structure
- Users had to know the physical structure of the DB in order to query for information
- Overall, the first DBMSs were very complex and inflexible, which made life difficult when it came to adding new applications or reorganizing the data
1970
- Relational Databases
  - Edgar Codd, at IBM, proposed the relational data model.
  - Codd's paper: “A Relational Model of Data for Large Shared Data Banks.”
    - “It provides a means of describing data with its natural structure only -- that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation on the other.” (Codd 1970)
  - In other words, the Relational Model consisted of:
    - Data independence from hardware and storage implementation
    - High-level, nonprocedural language for accessing data. Instead of processing one record at a time, a programmer could use the language to specify single operations that would be performed across the entire data set.
  - Codd won the 1981 Turing Award.
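The difference between record-at-a-time processing and a nonprocedural language can be shown side by side. This is a sketch using Python's sqlite3 module; the `loan` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (id INTEGER PRIMARY KEY, client TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan (id, client, amount) VALUES (?, ?, ?)",
    [(1, "Ana", 5000), (2, "Ben", 900), (3, "Ana", 1200)],
)

def total_record_at_a_time(client):
    """Procedural style: visit each record and decide what to do with it."""
    total = 0
    for _id, owner, amount in conn.execute("SELECT id, client, amount FROM loan"):
        if owner == client:
            total += amount
    return total

def total_declarative(client):
    """Nonprocedural style: state the result wanted; the DBMS chooses how to get it."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM loan WHERE client = ?", (client,)
    ).fetchone()
    return row[0]

loop_total = total_record_at_a_time("Ana")
sql_total = total_declarative("Ana")
```

Both functions return the same total, but only the second leaves the DBMS free to pick an access path, which is what makes data independence possible.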
Codd vs. IBM
- Codd’s model had an immediate impact on research; however, to gain legitimacy within the field, it had to survive at least two battles:
  - One in the technical community at large
  - One within IBM
- Within IBM
  - Conflict with the existing product IMS, in which IBM had invested heavily
  - New technology had to prove itself before replacing an existing revenue-producing product
  - Codd published his paper in the open literature because no one at IBM (himself included) recognized its eventual impact
  - The outside technical community showed that the idea had great potential
Codd vs. IBM
- Within IBM
  - IBM declared IMS its sole strategic product, setting up Codd and his ideas as counter to company goals
  - Codd spoke out in spite of IBM’s dissatisfaction and promoted the relational model to computer scientists. He arranged a public debate between himself and Charles Bachman, who at the time was a key proponent of the CODASYL standard.
  - The debate produced further criticism from IBM for undermining its goals, but also established his relational model as a cornerstone for the technical community.
- Finally, two main relational prototypes emerged in the 70’s
  - Ingres from UC-Berkeley
  - System R from IBM
System R
- Prototype intended to provide a high-level, nonnavigational, data-independent interface to many users simultaneously, with high integrity and robustness.
- Led to a query language called SEQUEL (Structured English Query Language), later renamed Structured Query Language (SQL) for legal reasons. Now a standard for database access.
- The project finished with the conclusion that relational databases were a feasible commercial product
- Eventually evolved into SQL/DS, which later became DB2
Ingres
- Two scientists at UC-Berkeley, Michael Stonebraker and Eugene Wong, became interested in relational databases
- Used QUEL as its query language
- Similar to System R, but based on different hardware and operating system
- Developers eventually branched off to form Ingres Corp, Sybase, MS SQL Server, Britton-Lee.
System R and Ingres inspired the development of virtually all commercial relational databases, including those from Sybase, Informix, Tandem, and even Microsoft’s SQL Server
Where’s Oracle!?
- Larry Ellison learned of IBM’s work and founded Relational Software Inc. in 1977 in California
- Their first product was a relational database based on IBM’s System R model and SQL technology
- Released in 1979, it was the first commercial RDBMS, beating IBM to the market by 2 years.
- In the 1980’s the company was renamed Oracle Corporation. Throughout the 80’s, new features were added and performance improved as the price of hardware came down, and Oracle became the largest independent RDBMS vendor.
1975
- ANSI-SPARC Three-Level Architecture
  - Views describe how users see the data.
  - Conceptual schema defines logical structure
    - Describes what data is stored and relationships among the data.
  - Physical schema describes the files and indexes used.
    - Describes how the data is stored in the database
[Figure: the three levels: View 1, View 2, View 3; Conceptual Schema; Physical Schema.]
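The three levels can be approximated in a relational system, where tables play the role of the conceptual schema, a view gives an external level, and an index belongs to the physical level. The sketch below uses Python's sqlite3 module; all table, view, and index names and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual schema: what data is stored and how it is related.
conn.execute("CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
conn.execute(
    "CREATE TABLE loan (id INTEGER PRIMARY KEY, "
    "client_id INTEGER NOT NULL REFERENCES client(id), amount INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO client VALUES (?, ?, ?)", [(1, "Ana", "Lima"), (2, "Ben", "Quito")]
)
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)", [(10, 1, 5000), (11, 1, 1200), (12, 2, 900)]
)

# External level: a view showing one group of users only what they need.
conn.execute(
    "CREATE VIEW client_debt AS "
    "SELECT c.name AS name, SUM(l.amount) AS total "
    "FROM client c JOIN loan l ON l.client_id = c.id GROUP BY c.id"
)

def debts():
    """Query through the view, without referring to tables or indexes."""
    return conn.execute("SELECT name, total FROM client_debt ORDER BY name").fetchall()

before_index = debts()
# Physical level: an index changes how data is accessed, not what queries return.
conn.execute("CREATE INDEX loan_by_client ON loan (client_id)")
after_index = debts()
```

The query against the view gives the same answer before and after the index is created, which is the separation between levels that the architecture is meant to provide.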
1976
! 
Entity-Relationship(ER) Models
" 
" 
Entity-Relationship(ER) Models
" 
" 
! 
Proposed by Peter Chen for database design giving an important
insight into conceptual data models
Allows the designer to concentrate on the use of data instead of the
logical table structure
1976
! 
1976
Proposed by Peter Chen for database design giving an important
insight into conceptual data models
Allows the designer to concentrate on the use of data instead of the
logical table structure
Entity-Relationship(ER) Models
" 
" 
Proposed by Peter Chen for database design giving an important
insight into conceptual data models
Allows the designer to concentrate on the use of data instead of the
logical table structure
1976
- Entity-Relationship (ER) Models
  - Proposed by Peter Chen for database design, giving an important insight into conceptual data models.
  - Allows the designer to concentrate on the use of data instead of the logical table structure.
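As a rough illustration of thinking in entities and relationships before thinking in tables, the sketch below models two entities and one relationship as Python dataclasses. Student, Course, Enrollment and courses_of are hypothetical names invented for this example; they do not come from the slides or from Chen's paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Student:  # entity
    student_id: int
    name: str


@dataclass(frozen=True)
class Course:  # entity
    code: str
    title: str


@dataclass(frozen=True)
class Enrollment:  # relationship "Student enrolls in Course"
    student: Student
    course: Course
    grade: Optional[float] = None  # attribute of the relationship itself


def courses_of(student: Student, enrollments: List[Enrollment]) -> List[str]:
    """Return the codes of the courses a student is enrolled in."""
    return [e.course.code for e in enrollments if e.student == student]


ana = Student(1, "Ana")
db101 = Course("DB101", "Introduction to Database Systems")
enrollments = [Enrollment(ana, db101)]
print(courses_of(ana, enrollments))
```

How these entities would later map to tables and keys is a separate, later design step.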
1980's
- Birth of the IBM PC. The RDBMS market begins to boom.
- SQL becomes standardized through ANSI (American National Standards Institute) and ISO (International Organization for Standardization).
- By the mid-80's it had become apparent that there were some fields (medicine, multimedia, physics) where relational databases were not practical, due to the types of data involved.
  - More flexibility was needed in how their data was represented and accessed.
- This led to research in object-oriented databases, in which users could define their own methods of access to data and how to represent and manipulate it. This coincided with the appearance of object-oriented programming languages such as C++.
1990’s
- Considerable research into more powerful query languages and richer data models, with emphasis on supporting complex analysis of data from all parts of an enterprise.
- The first OODBMSs start to appear from companies like Objectivity. Object-relational DBMS hybrids also begin to appear.
- Several vendors (e.g., IBM's DB2, Oracle 8, Informix UDS) extended their systems with the ability to store new data types such as images and text, and to ask more complex queries.
- New application areas: data warehousing and OLAP (Online Analytical Processing, a category of software tools that provides analysis of data stored in a database), internet, multimedia, etc.
- Development of personal/small-business productivity tools such as Excel and Access from Microsoft.
Late 90’s-2000’s
- XML
  - Starts incorporation (as middleware or enabled DBMS) in 1997
    - Data Junction, ADO, Delphi
    - Oracle 8i, 9i, MS Access 2002, SQL Server 2000, DB2, Informix
  - Native XML DBMS, 2000
    - TigerLogic XDMS, Raining Data, Tamino, Software AG, Birdstep
- Large investment in internet companies fuels a tools-market boom for Web/Internet/DB connectors:
  - Active Server Pages, FrontPage, Java Servlets, JDBC, JavaBeans, ColdFusion, Dreamweaver, Oracle Developer 2000, etc.
- Open source projects come online, with widespread use of gcc, CGI, Apache, MySQL
- Three main companies dominate the large DB market: IBM, Microsoft, and Oracle
2010’s…
- Big Data:
  - Google processes 20 PB a day (2008)
  - Wayback Machine has 3 PB + 100 TB/month (3/2009)
  - eBay has 6.5 PB of user data + 50 TB/day (5/2009)
  - Facebook has 36 PB of user data + 80-90 TB/day (6/2010)
- New ways for efficient query answering are needed:
  - For example:
    - INSERT only, no UPDATEs/DELETEs
    - No JOINs, thereby reducing query time
      - This involves de-normalizing data
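The insert-only, join-free style can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not how any of the systems named on the slide are built: the events list stands in for an append-only store, and record_price and current_price are hypothetical helper names.

```python
# Append-only log: new facts are inserted, nothing is updated or deleted.
events = []


def record_price(item, seller_name, price):
    # Denormalized record: the seller's name is copied into every event,
    # so a read needs no join against a separate sellers table.
    events.append(
        {"seq": len(events), "item": item, "seller_name": seller_name, "price": price}
    )


def current_price(item):
    # The most recently inserted event for the item is the current state.
    for event in reversed(events):
        if event["item"] == item:
            return event
    return None


record_price("book", "Ana", 10.0)
record_price("book", "Ana", 8.0)  # a "change" is just another insert

latest = current_price("book")
print(latest["seller_name"], latest["price"], len(events))
```

The price of this design is redundancy: the seller's name is stored once per event, and old versions are never reclaimed unless a separate compaction step is added.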
Understanding the data: units
Name        Standard SI   Binary usage
Kilobyte    10^3          2^10
Megabyte    10^6          2^20
Gigabyte    10^9          2^30
Terabyte    10^12         2^40
Petabyte    10^15         2^50
Exabyte     10^18         2^60
Zettabyte   10^21         2^70
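The two columns drift apart as the units grow. The short sketch below recomputes both values from the unit's position in the table and reports the relative gap. The function names si_bytes, binary_bytes and gap_percent are made up for this example.

```python
UNITS = [
    "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
    "Petabyte", "Exabyte", "Zettabyte",
]


def si_bytes(name):
    # Standard SI: each step multiplies by 10^3.
    return 10 ** (3 * (UNITS.index(name) + 1))


def binary_bytes(name):
    # Binary usage: each step multiplies by 2^10.
    return 2 ** (10 * (UNITS.index(name) + 1))


def gap_percent(name):
    # How much larger the binary value is than the SI value, in percent.
    si = si_bytes(name)
    return 100 * (binary_bytes(name) - si) / si


for unit in UNITS:
    print(f"{unit:<10} SI={si_bytes(unit):.0e}  gap={gap_percent(unit):.1f}%")
```

For a kilobyte the gap is 2.4% (1024 versus 1000 bytes), and it widens with every step up the table.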
Human Scale
KILO 10^3 (2^10)
  Cellular memory
  Text (email, document)
MEGA 10^6 (2^20)
  Book, Picture
GIGA 10^9 (2^30)
  RAM, Good video
(This is our world)
More
TERA 10^12 (2^40)
-- Library of Congress (USA): 160 TB
-- Daily internet traffic: 100 TB
-- Wikipedia: 6 TB dump (2010)
-- 3-D movie Monsters vs. Aliens (required 100 TB of disk)
It is not a human scale, but it is still usual for any normal company.
Even More…
PETA 10^15 (2^50)
- World of Warcraft uses 1.3 PB to maintain its game.
- Internet Archive: 3 PB (growing by 100 TB per month).
- Google processes 24 petabytes per day.
- 1/2 PB: enough to film the life of a person (100 years in high definition).
- Facebook has 60 billion images, that is, 1.5 PB.
- AT&T transfers around 19 petabytes per day.
2010’s
- NoSQL
  - Stands for Not Only SQL
  - Class of non-relational data storage systems
  - Usually do not require a fixed table schema, nor do they use the concept of joins
- NoSQL movement started from:
  - BigTable (Google)
  - Dynamo (Amazon)
    - Gossip protocol (discovery and error detection)
    - Distributed key-value data store
    - Eventual consistency
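Eventual consistency in a distributed key-value store can be sketched with a toy model. This is a deliberate simplification and not Dynamo's actual protocol: the Replica class, its methods and the integer version numbers are all assumptions made for illustration. The pairwise exchange is used here to spread the data itself, whereas the slide mentions gossip for discovery and error detection.

```python
class Replica:
    """One node of a toy key-value store (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version, value)

    def put(self, key, value, version):
        # Keep the entry with the highest version number.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def gossip_with(self, other):
        # Exchange entries in both directions; newer versions win.
        for key, (version, value) in list(self.store.items()):
            other.put(key, value, version)
        for key, (version, value) in list(other.store.items()):
            self.put(key, value, version)


replicas = [Replica(name) for name in "ABC"]

# A write lands on a single replica first.
replicas[0].put("user:1", "Ana", version=1)
stale_read = replicas[2].get("user:1")  # replica C has not heard of it yet

# One round of pairwise exchanges around a ring: A-B, B-C, C-A.
for i, replica in enumerate(replicas):
    replica.gossip_with(replicas[(i + 1) % len(replicas)])

converged = [r.get("user:1") for r in replicas]
print(stale_read, converged)
```

Between the write and the exchange round, a read from replica C misses the new value. After the round, all replicas agree, which is the sense of "eventual" in the slide.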
NoSQL solutions
- NoSQL solutions fall into two major areas:
  - Key/Value, or 'the big hash table'.
  - Schema-less, which comes in multiple flavors: column-based, document-based or graph-based.
- In NoSQL solutions we are giving up:
  - joins
  - group by
  - order by
  - ACID transactions
  - SQL as a sometimes frustrating but still powerful query language
  - easy integration with other applications that support SQL
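To make the trade-off concrete, the sketch below answers the same question twice: once with SQL (join, group by, order by) using Python's built-in sqlite3 module, and once against a plain dictionary standing in for a key/value store, where the application has to do that work itself. The table names, keys and sample data are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Relational version: the database performs the join, grouping and ordering.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Ana'), (2, 'Luis');
    INSERT INTO purchase VALUES (1, 10.0), (1, 5.0), (2, 7.0);
    """
)
sql_totals = conn.execute(
    "SELECT c.name, SUM(p.amount) FROM customer c "
    "JOIN purchase p ON p.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()

# Key/value version: a dict stands in for 'the big hash table'.
# Records are denormalized (customer name stored in each purchase),
# and grouping and ordering are done in application code.
kv_store = {
    "purchase:1": {"customer_name": "Ana", "amount": 10.0},
    "purchase:2": {"customer_name": "Ana", "amount": 5.0},
    "purchase:3": {"customer_name": "Luis", "amount": 7.0},
}
totals = defaultdict(float)
for record in kv_store.values():
    totals[record["customer_name"]] += record["amount"]
kv_totals = sorted(totals.items())

print(sql_totals)
print(kv_totals)
```

Both approaches give the same totals, but in the second one the query logic lives in the application rather than in a declarative query language.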
A lot has been left out!
[Timeline figure: 1970's to 2000's]
References
- "The History of Databases" by Patrick RogersOstema
- Database Management Systems, R. Ramakrishnan and J. Gehrke (slides)