Download Slides from Lecture 24 - Courses - University of California, Berkeley
Object-Oriented, Intelligent and Object-Relational Database Models
University of California, Berkeley
School of Information Management and Systems
SIMS 257: Database Management
IS 257 – Spring 2004 2004.04.20 - SLIDE 1

Lecture Outline
• Review
  – Applications for Data Warehouses
  – Data Mining
• Thanks again to lecture notes from Joachim Hammer of the University of Florida
• Object Oriented DBMS
• Inverted File and Flat File DBMS
• Object-Relational DBMS (revisited)
• Intelligent DBMS

What is Decision Support?
• Technology that will help managers and planners make decisions regarding the organization and its operations based on data in the Data Warehouse.
  – What was the sales volume for each product, by state and city, over the last two years?
  – What effects will a 5% price discount have on our future income for product X?
• An increasingly common term is KDD – Knowledge Discovery in Databases

Conventional Query Tools
• Ad-hoc queries and reports using conventional database tools
  – E.g. Access queries.
• Typical database designs include fixed sets of reports and queries to support them
  – The end-user is often not given the ability to do ad-hoc queries

OLAP
• Online Analytical Processing
  – Intended to provide multidimensional views of the data
  – I.e., the “Data Cube”
  – The PivotTables in MS Excel are examples of OLAP tools

[Figure: Data Cube]

Operations on Data Cubes
• Slicing the cube
  – Extracts a 2D table from the multidimensional data cube
  – Example…
• Drill-Down
  – Analyzing a given set of data at a finer level of detail

Star Schema
• Typical design for the derived layer of a Data Warehouse or Mart for Decision Support
  – Particularly suited to ad-hoc queries
  – Dimensional data separate from fact or event data
• Fact tables contain factual or quantitative data about the business
• Dimension tables hold data about the subjects of the business
• Typically there is one Fact table with multiple dimension tables

Star Schema for multidimensional data
Fact Table: OrderNo, Salespersonid, Customerno, ProdNo, Datekey, Cityname, Quantity, TotalPrice
Order: OrderNo, OrderDate, …
Customer: CustomerName, CustomerAddress, City, …
Salesperson: SalespersonID, SalespersonName, City, Quota
Product: ProdNo, ProdName, Category, Description, …
City: CityName, State, Country, …
Date: DateKey, Day, Month, Year, …

Data Mining
• Data mining is knowledge discovery rather than question answering
  – May have no pre-formulated questions
  – Derived from
    • Traditional Statistics
    • Artificial intelligence
    • Computer graphics (visualization)

Goals of Data Mining
• Explanatory
  – Explain some observed event or situation
    • Why have the sales of SUVs increased in California but not in Oregon?
• Confirmatory
  – To confirm a hypothesis
    • Whether 2-income families are more likely to buy family medical coverage
• Exploratory
  – To analyze data for new or unexpected relationships
    • What spending patterns seem to indicate credit card fraud?

Data Mining Applications
• Profiling Populations
• Analysis of business trends
• Target marketing
• Usage Analysis
• Campaign effectiveness
• Product affinity

Data Mining Algorithms
• Market Basket Analysis
• Memory-based reasoning
• Cluster detection
• Link analysis
• Decision trees and rule induction algorithms
• Neural Networks
• Genetic algorithms

Market Basket Analysis
• A type of clustering used to predict purchase patterns.
• Identify the products likely to be purchased in conjunction with other products
  – E.g., the famous (and apocryphal) story that men who buy diapers on Friday nights also buy beer.

Memory-based reasoning
• Use known instances of a model to make predictions about unknown instances.
• Could be used for sales forecasting or fraud detection by working from known cases to predict new cases

Cluster detection
• Finds data records that are similar to each other.
• K-nearest neighbors (where K is the number of nearest similar records considered, with nearness measured by a mathematical distance) is one example of a similarity-based algorithm

Link analysis
• Follows relationships between records to discover patterns
• Link analysis can provide the basis for various affinity marketing programs
• Similar to Markov transition analysis methods where probabilities are calculated for each observed transition.
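
The Markov-style calculation mentioned for link analysis can be sketched in a few lines. This is a minimal illustration only, not a method taken from the lecture: the function name `transition_probabilities` and the page-visit paths are hypothetical. It counts each observed transition between linked records and divides by the number of transitions leaving the same record.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next | current) from observed sequences of linked records."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair every record with the record that follows it in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


# Hypothetical observed paths (e.g. pages visited in one session).
paths = [
    ["home", "catalog", "cart"],
    ["home", "catalog", "home"],
    ["home", "cart"],
]
probs = transition_probabilities(paths)
```

Here `probs["home"]` maps each record seen after "home" to its estimated probability. Records that never lead anywhere do not appear as keys.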

Decision trees and rule induction algorithms
• Pull rules out of a mass of data using classification and regression trees (CART) or Chi-Square automatic interaction detectors (CHAID)
• These algorithms produce explicit rules, which make understanding the results simpler

Neural Networks
• Attempt to model neurons in the brain
• Learn from a training set and then can be used to detect patterns inherent in that training set
• Neural nets are effective when the data is shapeless and lacking any apparent patterns
• May be hard to understand results

Genetic algorithms
• Imitate natural selection processes to evolve models using
  – Selection
  – Crossover
  – Mutation
• Each new generation inherits traits from the previous ones until only the most predictive survive.

Object-Oriented DBMS Basic Concepts
• Each real-world entity is modeled by an object. Each object is associated with a unique identifier (sometimes called the object ID or OID)
• Each object has a set of instance attributes (or instance variables) and methods.
  – The value of an attribute can be an object or set of objects. Thus complex objects can be constructed from aggregations of other objects.
  – The set of attributes of the object and the set of methods represent the object structure and behavior, respectively
• The attribute values of an object represent the object’s status.
  – Status is accessed or modified by sending messages to the object to invoke the corresponding methods
• Objects sharing the same structure and behavior are grouped into classes.
  – A class represents a template for a set of similar objects.
  – Each object is an instance of some class.
• A class can be defined as a specialization of one or more classes.
  – A class defined as a specialization is called a subclass and inherits attributes and methods from its superclass(es).
• An OODBMS is a DBMS that directly supports a model based on the object-oriented paradigm.
  – Like any DBMS it must provide persistent storage for objects and their descriptions (schema).
  – The system must also provide a language for schema definition and for manipulation of objects and their schema
  – It will usually include a query language, indexing capabilities, etc.

Generalization Hierarchy
employee: Employee No, Name, Address, Date hired, Date of Birth; calculateAge
  Hourly: Hourly Rate; calculateWage
  Salaried: Annual Salary, Stock Option; calculateStockBenefit
  consultant: Contract No., Date Hired; AllocateToContract

OODBMS
• Many available commercially:
  – Gemstone, Polyhedra, Objectivity/DB, MetaKit, ObjectDB, etc.
• Many Open Source:
  – SHORE, GOODS (Generic Object Oriented Database System), The Zope Object DataBase (ZODB), Ozone, etc.
• If you are interested in finding out more about OODBMS
  – See http://cbbrowne.com/info/oodbms.html

Example: Ozone
• Version 1 of the MMM datastore used for the phone project in 202 this year was based on Ozone.
• “The Ozone Database Project is a open initiative for the creation of an open source, Java based, object-oriented database management system.”
• Definitely a “work in progress”

Example: Ozone
• “ozone is a fully featured, object-oriented database management system completely implemented in Java and distributed under an open source license. The ozone project aims to evolve a database system that allows developers to build pure object-oriented, pure Java database applications. Just program your Java objects and let them run in a transactional database environment.”
• “ozone includes a fully W3C compliant DOM implementation that allows you to store XML data. You can use any XML tool to provide and access these data. Support classes for Apache Xerces-J and Xalan-J are included.”
• “Besides the native API, ozone provides a ODMG 3.0 interface. Although not fully ODMG compliant it helps you to port applications to/from ozone.”
• “ozone does not depend on any back-end database or mapping technology to actually save objects. It contains its own clustered storage and cache system to handle persistent Java objects.”
• From http://www.ozone-db.org/frames/home/what.html

Example: Ozone
• Database objects are the persistent objects designed by developers to fulfill their application logic needs. Database objects implement a given interface (in more concrete terms, a Java interface that extends org.ozoneDB.OzoneRemote), and this interface is the "visible" side of database objects.
• There is only one instance of a database object, which lives inside the database server. This database object is controlled via proxy objects. A given proxy object represents its corresponding database object - inside the client applications and inside other database objects. A proxy object can be seen as a persistent reference.
• Proxy classes are automatically generated out of the database classes by the Ozone post-processor and implement the same public interface as their respective database object counterpart - which means that they also implement the OzoneRemote interface that their corresponding database object implements. All ozone API methods return proxies for the actual database object inside the database. Therefore, the client deals with proxies only. However, this is transparent to the client: proxies can be used as if they were the actual database objects, since they implement the same interface.
• Database objects are different from ordinary Java objects (other systems and specs, like JDO, respectively call them "primary" and "secondary", or "first-class" and "second-class"). Only one instance of a given database object reference exists in the database, as opposed to standard Java objects, which are treated in a "by-copy" fashion each time they are serialized. By analogy, database objects are a bit like rows in a relational database table, and members of these database objects that are standard Java objects correspond to the columns in the row - database object members would correspond to links to other tables, if we push the analogy.
• From: http://sourceforge.net/docman/display_doc.php?docid=10743&group_id=39695

Example: Ozone
[Figure: Ozone Architecture]
From: http://sourceforge.net/docman/display_doc.php?docid=10743&group_id=39695

Example: Ozone
• The Ozone architecture, very generally represented by the preceding diagram, has four main layers:
• Client – This is the client application area; the client obtains a connection to an Ozone server, a connection that can be shared by many threads.
The client application interacts with the database API to create, delete, update and search persistent objects in the underlying Ozone storage.
• Network – The network layer is where the Ozone protocol plays a role similar to RMI. It carries method invocation information targeted at persistent objects, in addition to all other commands relayed to the Ozone server.
• Server – The server manages client connections, security, transactions, and incoming method invocations from the clients. If required, it is in charge of invoking methods on persistent objects, therefore tightly interacting with the underlying object storage facility. The server maintains a transactionally safe environment for multiple clients that access persistent objects through a remote proxy.
• Storage – The storage system is always accessed through an Ozone server. The storage is responsible for object persistence, clustering, object identification, and other tasks pertaining to low-level database-like operations.
• From: http://sourceforge.net/docman/display_doc.php?docid=10743&group_id=39695

Inverted File DBMS
• Usually similar to Hierarchic DBMS in record structure
  – Support for repeating groups of fields and multiple value fields
• All access is via inverted file indexes to DBS specified fields.
• Examples: ADABAS DBMS from Software AG -- used in the MELVYL system

Flat File DBMS
• Data is stored as a simple file of records.
  – Records usually have a simple structure
• May support indexing of fields in the records.
  – May also support scanning of the data
• No mechanisms for relating data between files.
• Usually easy to use and simple to set up

Object Relational Databases
• Began with UniSQL/X unified object-oriented and relational system
• Some systems (like OpenODB from HP) were Object systems built on top of Relational databases.
• Miro/Montage/Illustra built on Postgres.
• Informix Buys Illustra. (DataBlades)
• Oracle Hires away Informix Programmers. (Cartridges)

PostgreSQL
• Derived from POSTGRES
  – Developed at Berkeley by Mike Stonebraker and his students (EECS) starting in 1986
• Postgres95
  – Andrew Yu and Jolly Chen adapted POSTGRES to SQL and greatly improved the code base
• PostgreSQL
  – Name changed in 1996, and since that time the system has been expanded to support most SQL92 and many SQL99 features

Object Relational Data Model
• Class, instance, attribute, method, and integrity constraints
• OID per instance
• Encapsulation
• Multiple inheritance hierarchy of classes
• Class references via OID object references
• Set-Valued attributes
• Abstract Data Types

PostgreSQL Classes
• The fundamental notion in Postgres is that of a class, which is a named collection of object instances. Each instance has the same collection of named attributes, and each attribute is of a specific type. Furthermore, each instance has a permanent object identifier (OID) that is unique throughout the installation. Because SQL syntax refers to tables, we will use the terms table and class interchangeably. Likewise, an SQL row is an instance and SQL columns are attributes.

Creating a Class
• You can create a new class by specifying the class name, along with all attribute names and their types:
    CREATE TABLE weather (
        city     varchar(80),
        temp_lo  int,    -- low temperature
        temp_hi  int,    -- high temperature
        prcp     real,   -- precipitation
        date     date
    );

PostgreSQL
• Postgres can be customized with an arbitrary number of user-defined data types. Consequently, type names are not syntactical keywords, except where required to support special cases in the SQL92 standard.
• So far, the Postgres CREATE command looks exactly like the command used to create a table in a traditional relational system. However, we will presently see that classes have properties that are extensions of the relational model.

PostgreSQL
• All of the usual SQL commands for creating, searching and modifying classes (tables) are available. With some additions…
• Inheritance
• Non-Atomic Values
• User defined functions and operators

Inheritance
    CREATE TABLE cities (
        name        text,
        population  float,
        altitude    int    -- (in ft)
    );

    CREATE TABLE capitals (
        state  char(2)
    ) INHERITS (cities);

Inheritance
• In Postgres, a class can inherit from zero or more other classes.
• A query can reference either
  – all instances of a class
  – or all instances of a class plus all of its descendants

Inheritance
• For example, the following query finds all the cities that are situated at an altitude over 500ft:
    SELECT name, altitude FROM cities WHERE altitude > 500;
    +----------+----------+
    |name      | altitude |
    +----------+----------+
    |Las Vegas |     2174 |
    +----------+----------+
    |Mariposa  |     1953 |
    +----------+----------+

Inheritance
• On the other hand, to find the names of all cities, including state capitals, that are located at an altitude over 500ft, the query is:
    SELECT c.name, c.altitude FROM cities* c WHERE c.altitude > 500;
  which returns:
    +----------+----------+
    |name      | altitude |
    +----------+----------+
    |Las Vegas |     2174 |
    +----------+----------+
    |Mariposa  |     1953 |
    +----------+----------+
    |Madison   |      845 |
    +----------+----------+

Inheritance
• The "*" after cities in the preceding query indicates that the query should be run over cities and all classes below cities in the inheritance hierarchy
• Many of the PostgreSQL commands (SELECT, UPDATE, DELETE, etc.) support this inheritance notation using "*"

Non-Atomic Values
• One of the tenets of the relational model is that the attributes of a relation are atomic
  – I.e. only a single value for a given row and column
• Postgres does not have this restriction: attributes can themselves contain subvalues that can be accessed from the query language
  – Examples include arrays and other complex data types.

Non-Atomic Values - Arrays
• Postgres allows attributes of an instance to be defined as fixed-length or variable-length multidimensional arrays. Arrays of any base type or user-defined type can be created. To illustrate their use, we first create a class with arrays of base types.
    CREATE TABLE SAL_EMP (
        name            text,
        pay_by_quarter  int4[],
        schedule        text[][]
    );

Non-Atomic Values - Arrays
• The preceding SQL command will create a class named SAL_EMP with a text string (name), a one-dimensional array of int4 (pay_by_quarter), which represents the employee's salary by quarter, and a two-dimensional array of text (schedule), which represents the employee's weekly schedule.
• Now we do some INSERTs; note that when appending to an array, we enclose the values within braces and separate them by commas.

Inserting into Arrays
    INSERT INTO SAL_EMP VALUES ('Bill', '{10000, 10000, 10000, 10000}', '{{"meeting", "lunch"}, {}}');
    INSERT INTO SAL_EMP VALUES ('Carol', '{20000, 25000, 25000, 25000}', '{{"talk", "consult"}, {"meeting"}}');

Querying Arrays
• This query retrieves the names of the employees whose pay changed in the second quarter:
    SELECT name FROM SAL_EMP WHERE SAL_EMP.pay_by_quarter[1] <> SAL_EMP.pay_by_quarter[2];
    +------+
    |name  |
    +------+
    |Carol |
    +------+

Querying Arrays
• This query retrieves the third quarter pay of all employees:
    SELECT SAL_EMP.pay_by_quarter[3] FROM SAL_EMP;
    +---------------+
    |pay_by_quarter |
    +---------------+
    |10000          |
    +---------------+
    |25000          |
    +---------------+

Querying Arrays
• We can also access arbitrary slices of an array, or subarrays. This query retrieves the first item on Bill's schedule for the first two days of the week.
    SELECT SAL_EMP.schedule[1:2][1:1] FROM SAL_EMP WHERE SAL_EMP.name = 'Bill';
    +-------------------+
    |schedule           |
    +-------------------+
    |{{"meeting"},{""}} |
    +-------------------+

Intelligent Database Systems
• Intelligent DBS are intended to handle more than just data, and may be used in tasks involving large amounts of information where analysis and “discovery” are needed.
The following is based on “Intelligent Databases” by Kamran Parsaye, Mark Chignell, Setrag Khoshafian and Harry Wong, AI Expert, March 1990, v. 5, no. 3, pp. 38-47

Intelligent Database Systems
• They represent the evolution and merging of several technologies:
  – Automatic Information Discovery
  – Hypermedia
  – Object Orientation
  – Expert Systems
  – Conventional DBMS

Intelligent Database Systems
[Figure: Intelligent Databases, with Automatic discovery, Expert Systems, Hypermedia, Traditional Databases, Object Orientation]

Intelligent Database Architecture
[Figure: High-Level Tools, High-Level User Interface, Intelligent Database Engine]

Environment Components
[Figure: Flexible queries, Data Dictionary, Error detection, Automatic Discovery, Concept Dictionary]

Intelligent Databases
• Data Dictionary contains the system metadata
• Concept Dictionary defines ‘virtual fields’ based on approximate definitions
• Data Analysis and discovery
  – Find patterns
  – Detect errors
  – Process queries

Intelligent Databases
• Automatic Discovery
  – Data comprehension
  – Form Hypotheses
  – Make queries
  – View results and perhaps modify hypotheses
  – Repeat

Intelligent Databases
• Automatic Error Detection
  – Integrity Constraints
  – Rule systems
  – Analysis of data for anomalies

Intelligent Databases
• Flexible Query Processing
  – Approximate and “fuzzy” queries
    • SELECT NAME, AGE, TELEPHONE FROM PERSONEL WHERE NAME = ‘Dovid Smith’ and AGE IS-CLOSE-TO 19;
    • confidence factors
  – Ranked query results

Intelligent Databases
• Intelligent User Interfaces
  – Hyperlinked data in the data/knowledge base
  – Multimedia presentations
  – Dynamic linking of related information

Intelligent Databases
• Intelligent Database Engine
  – OO support
  – Inference features
  – Global optimization
  – Rule manager
  – Explanation manager
  – Transaction manager
  – Metadata manager
  – Access module
  – Multimedia manager
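
The approximate query shown under Flexible Query Processing (a near-match on NAME, AGE IS-CLOSE-TO 19, confidence factors, ranked results) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the mechanism described by Parsaye et al.: the sample rows, the five-year tolerance, the 0.5 threshold, and the rule of taking the minimum of the two scores as the confidence factor are all hypothetical choices. Name similarity uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical rows standing in for the PERSONEL table on the slide.
PERSONEL = [
    {"NAME": "David Smith", "AGE": 20, "TELEPHONE": "555-0101"},
    {"NAME": "Dovid Smith", "AGE": 45, "TELEPHONE": "555-0102"},
    {"NAME": "Maria Lopez", "AGE": 19, "TELEPHONE": "555-0103"},
]


def name_similarity(a, b):
    """Similarity in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def age_closeness(age, target, tolerance=5):
    """1.0 at the target age, falling linearly to 0.0 at `tolerance` years away."""
    return max(0.0, 1.0 - abs(age - target) / tolerance)


def fuzzy_select(rows, name, age, threshold=0.5):
    """Return (confidence, row) pairs ranked from most to least confident."""
    scored = []
    for row in rows:
        # Assumed combination rule: fuzzy AND as the minimum of both scores.
        confidence = min(name_similarity(row["NAME"], name),
                         age_closeness(row["AGE"], age))
        if confidence >= threshold:
            scored.append((confidence, row))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


results = fuzzy_select(PERSONEL, "Dovid Smith", 19)
```

With these sample rows, only "David Smith" survives: the exact name match fails on age, and the exact age match fails on name. A different combination rule (for example a weighted average) would rank partial matches instead of discarding them.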