Evolution of Databases
Chapter 1

Contents
  Evolution of Databases
  Content
  What is a Data Model?
  Disadvantages of File Systems
  Major advantages of DBMS
  Hierarchical model
  Parts-Suppliers example
  Network data model
  Network model
  CODASYL Network Model
  Network Database Schema
  Parts-Suppliers example in CODASYL
  Network Model
  Characteristics of Network model
  Disadvantages of Hierarchical and Network models
  Relational data models
  Characteristic of Relational Data Model
  Relational DBMS (1980's)
  Object Oriented data models
  Object-Oriented Database Schema
  Objects in OO Database
  Advantages of OO Databases
  Evolution of Data models
  Evolution of Database Technology
  What Is Data Mining?
  Query Language component of evolution
  DMQL
  XML
  Find names of salesman over 40 in "Outland" region
  Query response?
  Current concerns of database community
  Developments contributing to massive databases
  Bioinformatics
  Problems
  Intelligence
  Data avalanche pressures machine to evolve intelligence
  Pace of machine evolution?

Evolution of Databases: Data Models and Languages

Objective
To show the evolution of data models and languages from simple file systems to more advanced types.

Content
• Data Models and Languages
  o File system
  o Hierarchical
  o Network
  o Relational
  o Object Oriented
  o OOQL
  o DMQL
  o XML-QL

What is a Data Model?
• A data model is a collection of tools for describing
  o data
  o data relationships
  o data semantics
  o data constraints
• E.g. Data Models
  o Entity-Relationship model
  o Relational model
  o Object-oriented model
  o Network model
  o Hierarchical model

Disadvantages of File Systems
• Uncontrolled data redundancy, data inconsistency
• Poor data sharing
• Difficult to keep up with changes
  o If the structure of the data changed (e.g., adding more fields), programs that were using the file had to change
• Low productivity
• High maintenance cost
• Applications have to enforce referential integrity constraints
• No common error recovery procedure (rollback)
• Severe dependence between programs and data

Major advantages of DBMS
• Redundancy control
• Ad hoc queries
• Resilience: protect data from failure
• Data sharing and concurrent access
• Data integrity and security
• Separation of applications from the DB

Hierarchical model
• The hierarchical model uses trees. A tree represents parent/child relationships
  o For example, a car consists of body, engine, transmission, etc.
• Pointers were used to link a parent to its children or a child to another child
• Retrieving the data in a hierarchical database required navigating through the records, moving up, down, and sideways one record at a time
• The most popular hierarchical database was the Information Management System (IMS), introduced in 1968

Parts-Suppliers example
• Data is represented to the user in the form of a set of tree structures and operators for traversing paths
• Each child can be reached from the parent
• Without a parent node, a child node does not exist
• The parts-suppliers data requires two hierarchical trees

Q1: Find supplier numbers for suppliers who supply part P2.

    get [next] part where P#=P2;
    do until no more suppliers under this part;
       get next supplier under this part;
       print S#;
    end;

Q2: Find part numbers for parts supplied by supplier S2.

    do until no more parts;
       get next part;
       get [next] supplier under this part where S#=S2;
       if found then print P#;
    end;

Although the queries are symmetric, the two procedures are not.
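
The asymmetry between Q1 and Q2 can be sketched in Python. This is only an illustration under stated assumptions, not IMS code: the dictionary HIERARCHY, its sample records, and the two function names are invented for the example. Parts are the parents and supplier records are the children, so Q1 reads the children of one parent, while Q2 has to visit every parent and search under each one.

```python
# Hypothetical sample data: each part (parent) owns its supplier records (children).
HIERARCHY = {
    "P1": [{"S#": "S1", "QTY": 300}, {"S#": "S2", "QTY": 300}],
    "P2": [{"S#": "S1", "QTY": 200}, {"S#": "S2", "QTY": 400}, {"S#": "S3", "QTY": 200}],
    "P3": [{"S#": "S1", "QTY": 400}],
    "P4": [],
}


def suppliers_of_part(hierarchy, part_no):
    """Q1: go straight to one parent and list its children."""
    return [child["S#"] for child in hierarchy.get(part_no, [])]


def parts_of_supplier(hierarchy, supplier_no):
    """Q2: visit every parent and search its children for the supplier."""
    found = []
    for part_no, children in hierarchy.items():
        if any(child["S#"] == supplier_no for child in children):
            found.append(part_no)
    return found
```

With this sample data, suppliers_of_part(HIERARCHY, "P2") touches a single parent, whereas parts_of_supplier(HIERARCHY, "S2") scans all four parts even though the two questions mirror each other.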

Problems with some operations:
• Insertion: enter a new supplier S4
  o Not possible until we know which parts S4 provides
  o The parent is not known
  o Workaround: use a dummy part record as the parent of S4
• Deletion: delete supplier S3, which provides P2 with QTY=200
  o Logically possible
  o But it also deletes the other information about S3 (S3 no longer exists anywhere)
  o Another problem: deleting a parent deletes all of its dependents/children
• Update: update the city of S1 from C1 to C4
  o All copies of S1 have to be updated
  o Propagating the update increases processing
  o If not all copies are updated, the data becomes inconsistent

Characteristics of Hierarchical database:
  o Simple structure
  o Best suited to environments where 1:n relationships exist
  o Performance
  o Traversing starts from the parent
  o Symmetric queries do not have symmetric processing
  o Problems with some operations (insert, delete, update)

Network data model

Network model
• Hierarchical databases could not meet the demands of some business-oriented environments. For example, in an order-processing company, a single order might participate in more than one parent/child relationship.
• For instance, a particular order should be linked to
  o The customer who placed it
  o The salesperson who took it
  o The product ordered
• This could not be done by IMS
• To deal with these situations, the network data model was developed: children could have more than one parent
• Example of parent/child relationships in network database models

CODASYL Network Model
• In 1971, the Conference on Data Systems Languages (CODASYL) published an official standard for network databases, which became known as the CODASYL model
• A programmer would access the network database as follows:
  o Find a specific parent record by key (e.g., customer number)
  o Move down to the first child in a particular set (the first order placed by this customer)
  o Move sideways from one child to the next in the set (the next order placed by this customer)
  o Move up from a child to its parent in another set (the salesperson who took the order)

Network Database Schema
[Figure: network database schema]

Parts-Suppliers example in CODASYL
• Conceptual design is based on the concept of sets
• Consider the set as a type of tree, where each level holds one type of record
• Records at the highest and lowest levels are the parent/owner records
• Records at the middle level are the children/member records
• The owner record is linked to the first member record according to some order
• Member records are connected together
• The last member is connected to the owner

Q1: Find supplier numbers for suppliers who supply part P2.

    get [next] part where P#=P2;
    do until no more connectors under this part;
       get next connector under this part;
       get supplier over this connector;
       print S#;
    end;

Q2: Find part numbers for parts supplied by supplier S2.

    get [next] supplier where S#=S2;
    do until no more connectors under this supplier;
       get next connector under this supplier;
       get part over this connector;
       print P#;
    end;

Network Model
• Queries are symmetric but more complex than hierarchical
• Operations:
  o Insertion: enter a new supplier S4
     - Does not have the hierarchical problems
     - Can insert a new supplier without knowing what parts it supplies, i.e., insert a new record for it and set its link to itself
  o Deletion: delete supplier S3, which provides P2 with QTY=200
     - No problem: does not cause S3 to be deleted
  o Update: update the city of S1 from C1 to C4
     - No problem: the city is stored only once in the DB

Characteristics of Network model
• Flexibility to represent a two-way 1:n relationship
• Performance
• Symmetric queries exist
• Insertion causes no problem
• Greater complexity
• For some queries, there is a path selection problem

Disadvantages of Hierarchical and Network models
• They have a rigid structure:
  o The structure of the records had to be known in advance.
  o Changing the database structure required rebuilding the entire database
• Querying the database was not always easy.
• Retrieving simple information from the database could require the programmer to write a lot of code
  o Some of this code was quite complicated

Relational data models

Characteristic of Relational Data Model
• Most of the current DB systems are relational
• Data is perceived by the user as tables
• Operators generate new tables from old
• Data and their relationships are represented by records
• Retrieval is simple: ad hoc queries
• Based on mathematical concepts
• Queries are symmetric on simple flat files
• Operations: no problems with insertion, deletion, update

S
    S#   SNAME   STATUS   CITY
    S1   Smith   20       London
    S2   Jones   10       Paris
    S3   Blake   30       Paris

SP
    S#   P#   QTY
    S1   P1   300
    S1   P2   200
    S1   P3   400
    S2   P1   300
    S2   P2   400
    S3   P2   200

P
    P#   PNAME   COLOR   WEIGHT   CITY
    P1   Nut     Red     12       London
    P2   Bolt    Green   17       Paris
    P3   Screw   Blue    17       Rome
    P4   Screw   Red     14       London

Relational DBMS (1980's)

Student (ID char(30), Name char(30), DOB date, Address char(40), GPA number)

Student
    ID   Name    DOB       Address          GPA
    s1   Jose    2/3/67    Stone Mountain   3.7
    s2   Alice   3/12/72   Buck Head        4.0
    s3   Tom     10/2/78   Dunwoody         3.0
    s4   Sue     4/6/45    Atlanta          2.9
    s5   Steve   9/7/71    Stone Mountain   3.5

Object Oriented data models
• Incorporates features from object-oriented programming (1980s)
  o classes (tables) and objects (table rows)
  o complex attributes (objects, sets, lists, etc.)
  o encapsulation
  o incremental class definitions via inheritance hierarchies and networks
  o polymorphism
• Many-to-many relationships directly represented
• Relationships via logical inclusion
• Commercial products:
  o Jasmine (Computer Associates, 1998)
  o Gemstone (Gemstone Systems Inc. -- SUN Microsystems Inc.)
  o Many relational products claim "object-oriented database features", e.g.
    Microsoft's SQL Server and Access

Object-Oriented Database Schema
[Figure: object-oriented database schema]

Objects in OO Database
[Figure: objects in an OO database]

Query: Find salesmen over 40 in region "Outland"

SmallTalk syntax:

    TheSalesmen do: [ :S |
       ((S age > 40) & (S region name = 'Outland'))
          ifTrue: [ S name display. Newline display ] ]

C++ syntax:

    S = firstSalesman(TheSalesmen);
    while (S != NULL) {
       if ((S->age > 40) && (S->region->name == "Outland"))
          cout << S->name << endl;
       S = nextSalesman(TheSalesmen);
    }

OQL syntax (Object-oriented SQL):

    select S.name
    from S in TheSalesmen
    where S.age > 40 and S.region.name = "Outland";

Advantages of OO Databases
• Group data and processes
• Understand complex objects
• Easy to maintain and change
• Improve productivity
• Examples: ONTOS, GemStone, ObjectStore from Object Design, OpenODB

Evolution of Data models
• Decreasing technical details in queries
• Increasing use of application vocabulary
• Data objects have attributes and behavior of real-world counterparts

Evolution of Database Technology
• 1960s:
  o Data collection, database creation, IMS and network DBMS
• 1970s:
  o Relational data model, relational DBMS implementation
• 1980s:
  o RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s-2000s:
  o Data mining and data warehousing, multimedia databases, Web databases, and XML databases

What Is Data Mining?
• Data mining (knowledge discovery in databases):
  o Extraction of interesting information, knowledge, or patterns from data in large databases
• Alternative names
  o knowledge discovery (mining) in databases (KDD),
  o knowledge extraction,
  o data/pattern analysis,
  o information harvesting,
  o business intelligence, etc.
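
As one concrete reading of "extraction of patterns from data", the sketch below scores a candidate association between two items by its support (the fraction of transactions containing both items) and its confidence (among transactions containing the first item, the fraction that also contain the second). It is a minimal illustration and not the method of any particular data mining product: the TRANSACTIONS list, the item names, and the function names are all assumptions made for the example.

```python
# Hypothetical market-basket data: each transaction is the set of items bought together.
TRANSACTIONS = [
    frozenset({"television", "VCR"}),
    frozenset({"television", "VCR", "cables"}),
    frozenset({"television"}),
    frozenset({"VCR"}),
    frozenset({"cables"}),
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing antecedent, the fraction also containing consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero rather than divide by zero
    return support(transactions, antecedent | consequent) / base
```

On this sample, television and VCR appear together in 2 of 5 transactions (support 0.4), and 2 of the 3 television purchases include a VCR (confidence of about 0.67). A mining system would compute such measures over many candidate item pairs and report only those above chosen thresholds.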

Query Language component of evolution

Data Mining example: characterize sales quantities by quadrant and salesman age

DMQL:

    mine characteristics as regionAgeBreakout
    analyze sum(S.Quantity)
    in relevance to R.Quadrant, T.age
    from Sale S, Salesman T, Region R
    where S.SalesmanID = T.SalesmanID and T.RegID = R.RegID

Roughly equivalent to:

    select sum(S.Quantity)
    from Sale S, Salesman T, Region R
    where S.SalesmanID = T.SalesmanID and T.RegID = R.RegID
    group by R.Quadrant, T.age

except:
• age is quantized
  o can generalize any dimension to obtain a small number of possible values
• can mine comparisons (similar breakouts for competing categories)
• associations
  o e.g. percent of VCR sales that accompany television sales
• classifications
  o identify natural clusters
• can specify measures of interest
  o e.g. support, confidence, minimal intercluster distance

DMQL
• retains SQL syntax for locating relevant data
• includes statistical measures of interest level
• tailors language to needs of Customer Relationship Management (CRM)

Query Language component of evolution

XML
• self-describing text via embedded structure tags, e.g.

    <marketing>
      <salesman>
        <name> Tom Jones </name>
        <age> 42 </age>
        <region>
          <name> Seattle </name>
          <quadrant> Northwest </quadrant>
        </region>
        <department>
          <name> televisions </name>
          <division> electronics </division>
        </department>
      </salesman>
    </marketing>

Find names of salesmen over 40 in "Outland" region.

XML-QL:

    CONSTRUCT <result> {
      WHERE <marketing>
              <salesman>
                <name> $x </name>
                <age> $y </age>
                <region> <name> Outland </name> </region>
              </salesman>
            </marketing> IN "www.publisher.com/markdb.xml",
            $y > 40
      CONSTRUCT <name> $x </name>
    } </result>

Query response?

    <result>
      <name> Alan Alsop </name>
      <name> Barbara Benson </name>
      <name> Cindy Carson </name>
    </result>

Current concerns of database community
• Very large data repositories
• Data integration across heterogeneous sources, especially web sources
  o graphics, images, video
• Legacy data access and transformation
• Optimizing structure
  o highly structured (tables)
  o semistructured (XML documents)
  o unstructured (natural language text)
• Common data-exchange templates

Developments contributing to massive databases:
• graphics, images, video
  o e.g. LightSurf (Philippe Kahn)
     - exchanges images over cell phones
     - archives 10 terabytes of data on company servers
  o data atoms are discernible chunks (images)
  o require specialized agents to interrogate, e.g. face recognition algorithms
• consumer transactional data
  o e.g. Teradata division of National Cash Register
     - analyzes purchasing patterns and correlates them with demographics
• biotechnology and bioinformatics
  o Biotechnology is a general term describing the directed modification of biological processes.
  o Bioinformatics is the application of statistics and computer science to the field of molecular biology
  o e.g. GeneMine projects at University of California, LA
     - integrates heterogeneous databases across the web
     - correlates subsequences with known patterns
     - suggests interesting associations, as opposed to responding to queries
  o 30,000 genes
  o 3,000,000,000 base pairs to sequence
  o correlated, or partially correlated, patterns are exponentially larger

Bioinformatics
The primary goal of bioinformatics is to increase the understanding of biological processes. What sets it apart from other approaches, however, is its focus on developing and applying computationally intensive techniques (e.g., pattern recognition, data mining, machine learning algorithms, and visualization) to achieve this goal.
Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies, and the modeling of evolution.

Problems
• Truly vast data volumes
• No opportunity to augment the data stream with structure information
• System overwhelmed with data volumes that require simple categorization
• Urgent need to comprehend higher-level information inherent in the data stream, i.e., the task requires intelligence

Intelligence
• Intelligence is the ability to reduce input data streams to a manageable size while retaining detail sufficient to the tasks at hand.
• Intelligence presumably requires organization and categorization of the input data streams.
• These are database query operations.

Data avalanche pressures machine to evolve intelligence
• For the machine: inventiveness with data streams
  o e.g. "Find associations, useful at human level, that concern diet and disease"
    vs. "Find proportion of heart attacks under age 50 by quantized daily calorie intake"
  o The latter roughly illustrates the capability of today's data mining systems

Pace of machine evolution?
• The human brain achieves the equivalent of 0.1 peta-ops
• Moore's Law (transistor counts, and roughly processor performance, double every 18-24 months) gives the required power in the next few decades.
• IBM is currently developing a 1.0 petaflop machine for studying protein folding (Blue Gene)
• Optimists suggest superintelligent machines in the first half of the 21st century
• If successful, this approach provides the required intelligent data stream reduction without human understanding of the mechanism.
• Will the system require emotion or consciousness to motivate the learning algorithms?