Introduction to Database Systems
Slides adapted from Loreto Bravo

What is a database?
* A database is an organized collection of data for one or more uses.
* A database system organizes the data in a database according to a data model.
* A data model is a collection of conceptual tools for describing data, data relationships, data semantics and data constraints. Components:
  - structural part
  - manipulative part
  - integrity rules
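As a rough analogy only, the three components of a data model can be mapped onto plain C for a hypothetical `Student` record: a struct plays the structural part, insert and lookup functions play the manipulative part, and a check run before every insert plays the integrity rules. The type, field names, and the two rules (unique `id`, non-negative `credits`) are assumptions chosen for illustration; a real DBMS declares these parts rather than hand-coding them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Structural part: the shape of each record (hypothetical Student type). */
typedef struct {
    int  id;          /* assumed unique key */
    char name[32];
    int  credits;     /* assumed non-negative */
} Student;

#define MAX_STUDENTS 100

typedef struct {
    Student rows[MAX_STUDENTS];
    size_t  count;
} StudentTable;

/* Build a record; the name is truncated to fit the fixed-size field. */
static Student make_student(int id, const char *name, int credits) {
    Student s = {.id = id, .credits = credits};   /* name is zero-filled */
    strncpy(s.name, name, sizeof s.name - 1);     /* last byte stays '\0' */
    return s;
}

/* Integrity rules: conditions every stored record must satisfy. */
static bool satisfies_rules(const StudentTable *t, const Student *s) {
    if (s->credits < 0) return false;
    for (size_t i = 0; i < t->count; i++)
        if (t->rows[i].id == s->id) return false; /* key must be unique */
    return true;
}

/* Manipulative part: operations that change or query the data. */
static bool insert_student(StudentTable *t, Student s) {
    if (t->count == MAX_STUDENTS || !satisfies_rules(t, &s)) return false;
    t->rows[t->count++] = s;
    return true;
}

static const Student *find_student(const StudentTable *t, int id) {
    for (size_t i = 0; i < t->count; i++)
        if (t->rows[i].id == id) return &t->rows[i];
    return NULL;
}
```

With an empty table, inserting student 1 succeeds, while a second record with id 1 or a record with negative credits is rejected by the integrity check and never stored.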
Data Models
* Types of data models:
  - Describe data at the conceptual and external levels:
    Object-based data models: entity-relationship model, object-oriented model, semantic data model, functional data model
    Record-based data models: relational, network, and hierarchical data models, etc.
  - Describe data at the internal level:
    Physical data models: unifying model or frame memory

Historical Perspective -- Before 1960
* File systems
* Problems:
  - data redundancy
  - data is separated: files cannot be easily combined
  - high cost of propagating updates
  - update anomalies and inconsistencies
  - no abstract data model: requires knowledge of storage details
  - no standard query language
  - no support for enforcing security policies in which different users have permission to access different subsets of the data
File Systems
[Diagram: users work with a "Client processing" program that reads and writes the Client Files, and with a "Loans processing" program that reads and writes the Loan Files.]
* For each loan, the information of the client is stored again: redundancy.
File Systems
* Enroll “Mary Johnson” in “CSE444”: write a C program to do the following:
  - Read ‘students.txt’
  - Read ‘courses.txt’
  - Find & update the record “Mary Johnson”
  - Find & update the record “CSE444”
  - Write ‘students.txt’
  - Write ‘courses.txt’

File Systems
* What is the problem?
  - System crashes: if the program crashes partway through, one file may be written and the other not
  - Simultaneous access by many users: need locks
  - Large data sets (say 50GB)
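The six enrollment steps above can be sketched in C as follows. This is a minimal sketch under assumptions: a hypothetical flat-file format with one `key|count` line per record (a student name with the number of courses taken, or a course code with its enrollment count), and hypothetical helper names `read_file`, `write_file`, `find_and_update`, and `enroll`. The comment inside `enroll` marks where a crash or a second concurrent user causes the problems listed on the slide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RECORDS 1000
#define MAX_KEY 64

/* Hypothetical flat-file record: one "key|count" line per record. */
typedef struct {
    char key[MAX_KEY];
    int  count;
} Record;

/* Read every record of a file into memory; returns the number read or -1. */
static int read_file(const char *path, Record recs[], int max) {
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    int n = 0;
    /* " %63[^|]" skips the previous newline, then reads the key up to '|'. */
    while (n < max && fscanf(f, " %63[^|]|%d", recs[n].key, &recs[n].count) == 2)
        n++;
    fclose(f);
    return n;
}

/* Overwrite a whole file with the in-memory records; returns 0 on success. */
static int write_file(const char *path, const Record recs[], int n) {
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    for (int i = 0; i < n; i++)
        fprintf(f, "%s|%d\n", recs[i].key, recs[i].count);
    return fclose(f) == 0 ? 0 : -1;
}

/* Linear search for a key and increment its counter; returns 0 if found. */
static int find_and_update(Record recs[], int n, const char *key) {
    for (int i = 0; i < n; i++)
        if (strcmp(recs[i].key, key) == 0) { recs[i].count++; return 0; }
    return -1;
}

/* Enroll a student in a course the way a file-based program must. */
static int enroll(const char *student, const char *course) {
    static Record students[MAX_RECORDS], courses[MAX_RECORDS];
    int ns = read_file("students.txt", students, MAX_RECORDS);
    int nc = read_file("courses.txt", courses, MAX_RECORDS);
    if (ns < 0 || nc < 0) return -1;
    if (find_and_update(students, ns, student) != 0) return -1;
    if (find_and_update(courses, nc, course) != 0) return -1;
    /* A crash between these two writes leaves the files inconsistent, and
       nothing prevents a second user from running enroll() at the same time. */
    if (write_file("students.txt", students, ns) != 0) return -1;
    return write_file("courses.txt", courses, nc);
}
```

For example, with `students.txt` containing `Mary Johnson|2` and `courses.txt` containing `CSE444|40`, a successful `enroll("Mary Johnson", "CSE444")` rewrites them as `Mary Johnson|3` and `CSE444|41`. Note that every update reads and rewrites both files in full, which is also why large data sets are a problem.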
1960: Hierarchical Databases
* Developed by North American Rockwell and IBM as IMS (Information Management System)
  - IMS formed the basis for the hierarchical data model
  - Still available! http://www-01.ibm.com/software/data/ims/
* American Airlines and IBM jointly developed SABRE for making airline reservations
  - SABRE is used today to populate Web-based travel services such as Travelocity
* Based on a tree structure
* Problems:
  - Changes in data structure require changes in the application programs that access that structure
  - No many-to-many relationships
  - Programmers must be thoroughly familiar with the database structure
1960: Network Databases
* Integrated Data Store (IDS), the first general-purpose DBMS, designed by Charles Bachman at GE
  - Formed the basis for the network data model
  - Bachman received the Turing Award in 1973 for his work in the database area
* Extension of the hierarchical data model
  - Based on an acyclic directed graph
  - Standardized (1971) by the CODASYL group (Conference on Data Systems Languages)
* Advantage: many-to-many relationships are implemented
* Problems: “navigation” is even harder

Problems with the First DBMSs
* Access to the database was through low-level pointer operations
* Storage details depended on the type of data to be stored
* Adding a field to the DB required rewriting the underlying access/modification scheme
* Emphasis on the records to be processed, not on the overall structure
* Users had to know the physical structure of the DB in order to query for information
* Overall, the first DBMSs were very complex and inflexible, which made life difficult when it came to adding new applications or reorganizing the data
1970: Relational Databases
* Edgar Codd, at IBM, proposed the relational data model in his paper “A Relational Model of Data for Large Shared Data Banks.”
  - “It provides a means of describing data with its natural structure only-that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation on the other.” (Codd 1970)
* In other words, the relational model consisted of:
  - Data independence from hardware and storage implementation
  - A high-level, nonprocedural language for accessing data. Instead of processing one record at a time, a programmer could use the language to specify single operations that would be performed across the entire data set.
* Codd won the 1981 Turing Award.

Codd vs. IBM
* Codd’s model had an immediate impact on research; however, to gain legitimacy within the field, it had to survive at least two battles:
  - one in the technical community at large
  - one within IBM
* Within IBM:
  - Conflict with the existing product IMS, which had been heavily invested in
  - The new technology had to prove itself before replacing an existing revenue-producing product
  - Codd published his paper in the open literature because no one at IBM (himself included) recognized its eventual impact
  - Outside IBM, the technical community showed that the idea had great potential

Codd vs. IBM
* Within IBM:
  - IBM declared IMS its sole strategic product, setting up Codd and his ideas as counter to company goals
  - Codd spoke out in spite of IBM’s dissatisfaction and promoted the relational model to computer scientists. He arranged a public debate between himself and Charles Bachman, who at the time was a key proponent of the CODASYL standard.
  - The debate drew further criticism from IBM for undermining its goals, but it also established his relational model as a cornerstone of the technical community.
* Finally, two main relational prototypes emerged in the 70’s:
  - Ingres from UC Berkeley
  - System R from IBM
System R
* A prototype intended to provide a high-level, nonnavigational, data-independent interface to many users simultaneously, with high integrity and robustness
* Led to a query language called SEQUEL (Structured English Query Language), later renamed SQL (Structured Query Language) for legal reasons. It is now the standard for database access.
* The project concluded that relational databases were a feasible commercial product
* Eventually evolved into SQL/DS, which later became DB2
Ingres
* Two scientists at UC Berkeley, Michael Stonebraker and Eugene Wong, became interested in relational databases
* Used QUEL as its query language
* Similar to System R, but based on different hardware and a different operating system
* Developers eventually branched off to form Ingres Corp, Sybase, MS SQL Server, and Britton-Lee
* System R and Ingres inspired the development of virtually all commercial relational databases, including those from Sybase, Informix, Tandem, and even Microsoft’s SQL Server
Where’s Oracle?
* Larry Ellison learned of IBM’s work and in 1977 founded, in California, the company that became Relational Software Inc.
* Its first product was a relational database based on IBM’s System R model and SQL technology
* Released in 1979, it was the first commercial RDBMS, beating IBM to market by two years
* In the 1980’s the company was renamed Oracle; throughout the 80’s new features were added and performance improved as the price of hardware came down, and Oracle became the largest independent RDBMS vendor
1975: ANSI-SPARC Three-Level Architecture
[Diagram: View 1, View 2 and View 3 sit on top of the Conceptual Schema, which sits on top of the Physical Schema.]
* Views describe how users see the data.
* The conceptual schema defines the logical structure:
  - describes what data is stored and the relationships among the data.
* The physical schema describes the files and indexes used:
  - describes how the data is stored in the database.
1976: Entity-Relationship (ER) Models
* Proposed by Peter Chen for database design, giving important insight into conceptual data models
* Allows the designer to concentrate on the use of data instead of the logical table structure

1980's
* Birth of the IBM PC.
* The RDBMS market begins to boom.
* SQL becomes standardized through ANSI (American National Standards Institute) and ISO (International Organization for Standardization).
* By the mid-80’s it had become apparent that there were some fields (medicine, multimedia, physics) where relational databases were not practical, due to the types of data involved.
  - More flexibility was needed in how their data was represented and accessed.
  - This led to research in object-oriented databases, in which users could define their own methods of accessing data and of representing and manipulating it.
  - This coincided with the appearance of object-oriented programming languages such as C++.
1990s
- Considerable research into more powerful query languages and richer data models, with emphasis on supporting complex analysis of data from all parts of an enterprise.
- The first OODBMSs start to appear, from companies like Objectivity.
- Object-relational DBMS hybrids also begin to appear. Several vendors (e.g., IBM's DB2, Oracle 8, Informix UDS) extended their systems with the ability to store new data types, such as images and text, and to ask more complex queries.
- New application areas: data warehousing and OLAP (Online Analytical Processing, a category of software tools that provides analysis of data stored in a database), the internet, multimedia, etc.
- Development of personal/small-business productivity tools such as Excel and Access from Microsoft.
Late 1990s-2000s
- XML
  - Starts being incorporated (as middleware or in XML-enabled DBMSs) in 1997.
  - Native XML DBMSs, 2000: TigerLogic XDMS, Raining Data, Tamino, Software AG, Birdstep.
- Large investment in internet companies fuels a tools-market boom for Web/Internet/DB connectors:
  - Active Server Pages, FrontPage, Java Servlets, JDBC, JavaBeans, ColdFusion, Dreamweaver, Oracle Developer 2000, etc.
  - Data Junction, ADO, Delphi
  - Oracle 8i, 9i, MS Access 2002, SQL Server 2000, DB2, Informix
- Open source projects come online, with widespread use of gcc, CGI, Apache and MySQL.
- Three main companies dominate the large DB market: IBM, Microsoft, and Oracle.
2010s: Big Data
- Big Data:
  - Google processes 20 PB a day (2008).
  - The Wayback Machine has 3 PB + 100 TB/month (3/2009).
  - eBay has 6.5 PB of user data + 50 TB/day (5/2009).
  - Facebook has 36 PB of user data + 80-90 TB/day (6/2010).
- New ways of answering queries efficiently are needed. For example:
  - INSERT only, no UPDATEs/DELETEs
  - No JOINs, thereby reducing query time
    - This involves de-normalizing data
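The two techniques in the last bullets (insert-only storage and de-normalized records that avoid joins) can be sketched in a few lines of Python. OrderRecord and AppendOnlyLog are hypothetical names chosen for illustration; this is not how any particular big-data system is implemented, only a small model of the trade-off.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class OrderRecord:
    """Hypothetical de-normalized record: the customer's fields are copied
    into every order, so a query about orders needs no join."""
    order_id: int
    customer_name: str  # duplicated from the customer entity
    customer_city: str  # duplicated from the customer entity
    total: float


class AppendOnlyLog:
    """INSERT-only store: records are appended, never updated or deleted."""

    def __init__(self) -> None:
        self._records: List[OrderRecord] = []

    def insert(self, record: OrderRecord) -> None:
        self._records.append(record)

    def scan(self, predicate: Callable[[OrderRecord], bool]) -> List[OrderRecord]:
        # A query is a single pass over the stored records; there is no join step.
        return [r for r in self._records if predicate(r)]


if __name__ == "__main__":
    log = AppendOnlyLog()
    log.insert(OrderRecord(10, "Ada", "London", 25.0))
    log.insert(OrderRecord(11, "Ted", "Paris", 40.0))
    log.insert(OrderRecord(12, "Ada", "London", 15.0))
    london = log.scan(lambda r: r.customer_city == "London")
    print([r.order_id for r in london])  # [10, 12]
```

The price of avoiding joins is redundancy: if a customer moves, the old records keep the old city, and because the store is insert-only, a correction has to be written as new records rather than as an update.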
Understanding data: measurements

Name        SI standard   Binary usage
Kilobyte    10^3          2^10
Megabyte    10^6          2^20
Gigabyte    10^9          2^30
Terabyte    10^12         2^40
Petabyte    10^15         2^50
Exabyte     10^18         2^60
Zettabyte   10^21         2^70

Human Scale (This is our world)
- KILO 10^3 (2^10): cellular memory, text (email, document)
- MEGA 10^6 (2^20): book, picture
- GIGA 10^9 (2^30): RAM, good video

More
TERA 10^12 (2^40)
- Library of Congress (USA): 160 TB
- Daily internet traffic (100 TB)
- Wikipedia: 6 terabyte dump (2010)
- 3-D movie Monsters Vs Aliens (needed 100 TB of disk)
It is not a human scale, but still it is usual for any normal company.
Even More…
PETA 10^15 (2^50)
- World of Warcraft uses 1.3 PB to store its game.
- Internet Archive: 3 PB, growing by 100 TB per month.
- Google processes 24 petabytes per day.
- 1/2 PB: to film the life of a person (100 years in high definition).
- Facebook has 60 billion images, that is, 1.5 PB.
- AT&T transfers around 19 petabytes per day.
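The gap between the SI and binary columns of the measurement table grows with every prefix, which matters when comparing figures like the ones above. A minimal Python sketch, assuming only the two conventions listed in the table (the function names are illustrative):

```python
# Prefixes in table order: kilo is step k = 1, zetta is step k = 7.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]


def si_bytes(prefix: str) -> int:
    """Bytes in one unit under the SI (decimal) convention: 10^(3k)."""
    k = PREFIXES.index(prefix) + 1
    return 10 ** (3 * k)


def binary_bytes(prefix: str) -> int:
    """Bytes in one unit under the binary convention: 2^(10k)."""
    k = PREFIXES.index(prefix) + 1
    return 2 ** (10 * k)


if __name__ == "__main__":
    for p in PREFIXES:
        gap = binary_bytes(p) / si_bytes(p) - 1
        print(f"{p:>5}: SI = {si_bytes(p):.3e} B, binary = {binary_bytes(p):.3e} B, "
              f"binary unit is {gap:.1%} larger")
```

For example, 1.5 PB in the SI sense is `1.5 * si_bytes("peta") / binary_bytes("peta")`, roughly 1.33 binary petabytes. The IEC names the binary units kibibyte, mebibyte, gibibyte, and so on (KiB, MiB, GiB) to avoid this ambiguity.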
2010s: NoSQL
- Stands for Not Only SQL.
- A class of non-relational data storage systems.
- They usually do not require a fixed table schema, nor do they use the concept of joins.
- The NoSQL movement started from:
  - BigTable (Google)
  - Dynamo (Amazon)
    - Gossip protocol (discovery and error detection)
    - Distributed key-value data store
    - Eventual consistency

NoSQL solutions
- NoSQL solutions fall into two major areas:
  - Key/Value, or 'the big hash table'.
  - Schema-less, which comes in multiple flavors: column-based, document-based or graph-based.
- In NoSQL solutions we are giving up:
  - joins
  - group by
  - order by
  - ACID transactions
  - SQL as a sometimes frustrating but still powerful query language
  - easy integration with other applications that support SQL
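A distributed key/value store with eventual consistency can be modelled as several replicas, each one just a hash table, that converge by exchanging entries. The Python sketch below is a simplification under stated assumptions: Replica, put, get and sync_with are hypothetical names, conflicts are resolved by last-write-wins on a (clock, node) version, and replicas converge through a pairwise anti-entropy exchange. This is not Dynamo's actual protocol; Dynamo versions data with vector clocks and lets the application reconcile conflicting versions, and it uses gossip for membership and failure detection, as the slide notes.

```python
from typing import Any, Dict, Hashable, Optional, Tuple

# (logical clock, writer node name); assumed unique for every write.
Version = Tuple[int, str]


class Replica:
    """One node of a hypothetical replicated key/value store.
    Its state is a hash table: key -> (version, value)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[Hashable, Tuple[Version, Any]] = {}

    def put(self, key: Hashable, value: Any, version: Version) -> None:
        # Last-write-wins: keep whichever value carries the higher version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_with(self, other: "Replica") -> None:
        """Pairwise anti-entropy: each side adopts any newer entries of the other."""
        for key, (version, value) in list(other.data.items()):
            self.put(key, value, version)
        for key, (version, value) in list(self.data.items()):
            other.put(key, value, version)


if __name__ == "__main__":
    a, b, c = Replica("a"), Replica("b"), Replica("c")
    a.put("cart:42", ["book"], (1, "a"))
    print(c.get("cart:42"))  # None: c has not seen the write yet (stale read)
    a.sync_with(b)
    b.sync_with(c)
    print(c.get("cart:42"))  # ['book']: the replicas have converged
```

Reads may return stale data until replicas have synchronized, which is exactly the relaxation that "eventual consistency" refers to; there are no joins or multi-key transactions here either.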
A lot has been left out!
[Figure: timeline from the 1970s to the 2000s]