FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT
Lecture 16
TIM 50 Autumn 2012
Tuesday, November 20, 2012

Announcement
1. The grades for every assignment will be given in eCommons.
2. It is important to check the webpage for the latest information and assignment changes.
3. No office hours on Wednesday and Friday (11/21, 11/23). No class on Thanksgiving Day, Thursday 11/22.

Final Exam
• 1st choice: Friday, December 7
• 2nd choice: Monday, December 10
• The date depends on schedule permission
• Format is the same as the midterm
• Coverage: up to the midterm, 30% or less; after the midterm, 70% or more

Topics of Business Intelligence
• The problems of managing data resources in a traditional file environment
• Important database design principles
• The database management system
• The capabilities and value of a database management system
• Tools and technologies for accessing information from databases
• Business intelligence, data mining
• The role of information policy, data administration, and data quality assurance in the management of a firm's data resources

Foundation of Business Intelligence
File management system (division-oriented paper file systems, manual processing, file processing procedures)
• Data redundancy
• Data inconsistency
• Program-data dependence
• Lack of flexibility
• Poor security
• Lack of data sharing and availability
• System inefficiencies
• Longer business cycle
• No firm-wide information or data access
• No data security
• No decisions based on integrated data and information
• High business process expenditure
Database systems (DBMS, SQL; relational DB, object-oriented DB)
• Intelligence from collection of data
• Information management
• Business applications
• Data integrity control
• Business data maintenance
• Less redundancy
• Data integrity
• Efficiency
• Data confidentiality

Organizing Data in a Traditional File Environment
File organization concepts
• Database: Group of related files
• File: Group of records of the same type
• Record: Group of related fields
• Field: Group of characters forming word(s) or a number
• Entity: Person, place, or thing on which we store information; a record describes an entity
• Attribute: Each characteristic, or quality, describing an entity
  E.g., the attributes Date or Grade belong to the entity COURSE

THE DATA HIERARCHY
A computer system organizes data in a hierarchy that starts with the bit, which represents either a 0 or a 1. Bits can be grouped to form a byte to represent one character, number, or symbol. Bytes can be grouped to form a field, and related fields can be grouped to form a record. Related records can be collected to form a file, and related files can be organized into a database.

Information as Processed Data

Problems with the traditional file environment
Old business process: files are maintained separately by different departments
• Data redundancy: Presence of duplicate data in multiple files
• Data inconsistency: The same attribute has different values in different files
• Program-data dependence: A change in a program requires changes to the data accessed by that program
• Lack of flexibility
• Poor security
• Lack of data sharing and availability

TRADITIONAL FILE PROCESSING
The use of a traditional approach to file processing encourages each functional area in a corporation to develop specialized applications. Each application requires a unique data file that is likely to be a subset of the master file. These subsets of the master file lead to data redundancy and inconsistency, processing inflexibility, and wasted storage resources.
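The redundancy and inconsistency problems described above can be illustrated with a small Python sketch. The department names, the customer record, and the field names are hypothetical and chosen only for illustration; this is a minimal sketch, not a model of any particular system.

```python
# Hypothetical example: two departments each keep their own copy of one customer.
sales_file = {"C001": {"name": "Ann Lee", "address": "12 Oak St"}}
billing_file = {"C001": {"name": "Ann Lee", "address": "12 Oak St"}}  # redundant copy

# The customer moves; only the sales department updates its file.
sales_file["C001"]["address"] = "99 Pine Ave"


def find_inconsistencies(file_a, file_b):
    """Return (key, field) pairs whose values differ between the two files."""
    diffs = []
    for key in sorted(file_a.keys() & file_b.keys()):
        for field in sorted(file_a[key]):
            if file_a[key][field] != file_b[key].get(field):
                diffs.append((key, field))
    return diffs


inconsistencies = find_inconsistencies(sales_file, billing_file)
print(inconsistencies)  # [('C001', 'address')]
```

With a single shared copy of the record there would be nothing to compare, which is the point of centralizing the data.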
Business Processes with Old Data Processing
• System inefficiencies
• Longer business cycle
• No firm-wide information or data access
• No data security
• No decisions based on integrated data and information
• High business process expenditure

Introduction of Data Processing System
• Database: collection of persistent data from business divisions
• Database Management System (DBMS): software system that supports creation, population, and querying of a database

The Database Approach to Data Management
Database
• Serves many business applications by centralizing data and controlling redundant data across division boundaries
Database management system (DBMS)
• Interfaces between applications and physical data files
• Separates logical and physical views of data
• Solves problems of the traditional file environment
  - Controls redundancy
  - Eliminates inconsistency
  - Uncouples programs and data
  - Enables the organization to centrally manage data and data security

Definition
Although it is difficult to give a universally agreed definition of a database, we use the following common definition:
Definition: A database is a collection of related, logically coherent data used by the application programs in an organization.

DATABASE ARCHITECTURE
The American National Standards Institute/Standards Planning and Requirements Committee (ANSI/SPARC) has established a three-level architecture for a DBMS: internal, conceptual and external.
Figure: Database architecture

Hardware
The hardware is the physical computer system that allows access to data.
Software
The software is the actual program that allows users to access, maintain and update data. In addition, the software controls which user can access which parts of the data in the database.
Data
The data in a database is stored physically on the storage devices. In a database, data is a separate entity from the software that accesses it.
Users
In a DBMS, the term users has a broad meaning.
We can divide users into two categories: end users and application programs.
Procedures
The last component of a DBMS is a set of procedures or rules that should be clearly defined and followed by the users of the database.

Advantages of databases
Compared with the flat-file system, a database system has several advantages.
Less redundancy
In a flat-file system there is a lot of redundancy. For example, in the flat-file system for a university, the names of professors and students are stored in more than one file.
Avoidance of inconsistency
If the same piece of information is stored in more than one place, then any change in the data needs to be made in all places where that data is stored. A missed copy leaves the data inconsistent.
Efficiency
A database is usually more efficient than a flat-file system, because a piece of information is stored in fewer locations.
Data integrity
In a database system it is easier to maintain data integrity, because a piece of data is stored in fewer locations. Data integrity also covers guidelines for data retention, which specify or guarantee how long data can be retained.
Confidentiality
It is easier to maintain the confidentiality of the information if the storage of data is centralized in one location.
Evolution of Database Technologies
Evolution of database systems
• 2000 and beyond: multi-tier, client-server
• Distributed environments
• Web-based
• Content-addressable storage, data mining

DATA BASE MODEL OVERVIEW
• ER-Model
• Hierarchical Model
• Network Model
• Relational Model
• Object-Oriented Model(s)

ER-Model
• Data Structures
• Integrity Constraints
• Operations
The ER-Model is extremely successful as a database design model
• Translation algorithms to many data models
• Commercial database design tools, e.g., ERwin
• No generally accepted query language
• No database system is based on the model
ER: Entity-Relationship

ER-Model - Integrity Constraints
Figure: ER diagram notation for integrity constraints, showing
• cardinality: 1:n for E1:E2 in R
• total participation of E2 in R
• weak entity type E2; identifying relationship type R
• entity types E, E2 and E3 with the markers d, x and p: disjoint, exclusion, partition
• key attribute A

Hierarchical Database Model
In the hierarchical model, data is organized as an inverted tree. Each entity has only one parent but can have several children. At the top of the hierarchy, there is one entity, which is called the root.
Figure: An example of the hierarchical model representing a university

Network Database Model
In the network model, the entities are organized in a graph, in which some entities can be accessed through several paths (Figure 14.4).
Figure: An example of the network model representing a university

Object-Oriented Databases (OODB)
An object-oriented database tries to keep the advantages of the relational model and at the same time allows applications to access structured data. In an object-oriented database, objects and their relations are defined. In addition, each object can have attributes that can be expressed as fields.
XML
The query language normally used for object-oriented databases is XML (Extensible Markup Language).
As we discussed in Chapter 6, XML was originally designed to add markup information to text documents, but it has also found its application as a query language in databases. XML can represent data with nested structures.

Object-Oriented Model
• Based on the object-oriented paradigm, e.g., Simula, Smalltalk, C++, Java
• The object-oriented model has an object-oriented repository model; it adds persistence and database capabilities (see ODMG-93, ODL, OQL)
• Object-oriented commercial systems include GemStone, Ontos, Orion-2, Statice, Versant, O2

Relational Database Model
In the relational model, data is organized in two-dimensional tables called relations. The tables or relations are, however, related to each other, as we will see shortly.
Figure: An example of the relational model representing a university

Relational DBMS
• Represents data as two-dimensional tables called relations or files. In the relational database management system (RDBMS), the data is represented as a set of relations.
• Each table contains data on an entity and its attributes
Table: grid of columns and rows
• Rows (tuples): Records for different entities
• Fields (columns): Represent attributes of the entity
• Key field: Field used to uniquely identify each record
• Primary key: Field in table used for key fields
• Foreign key: Primary key used in a second table as a look-up field to identify records from the original table

Relations
A relation appears as a two-dimensional table. The RDBMS organizes the data so that its external view is a set of relations or tables. This does not mean that data is stored as tables: the physical storage of the data is independent of the way in which the data is logically organized.
Figure: An example of a relation
A relation in an RDBMS has the following features:
• Name. Each relation in a relational database should have a name that is unique among other relations.
• Attributes. Each column in a relation is called an attribute. The attributes are the column headings in the table in Figure 14.6.
• Tuples. Each row in a relation is called a tuple. A tuple defines a collection of attribute values. The total number of rows in a relation is called the cardinality of the relation. Note that the cardinality of a relation changes when tuples are added or deleted. This makes the database dynamic.

Schemas
• The name of a relation and the set of attributes for a relation is called a schema.
• We show the schema for the relation with the relation name followed by a parenthesized list of its attributes.
• Movies (title, year, length)
• Relational database schema = collection of relation schemas.

RELATIONAL DATABASE TABLES
A relational database organizes data in the form of two-dimensional tables. Illustrated here are tables for the entities SUPPLIER and PART showing how they represent each entity and its attributes. Supplier Number is a primary key for the SUPPLIER table and a foreign key for the PART table.

Operations of a Relational DBMS
Three basic operations are used to develop useful sets of data
• SELECT: Creates a subset of data consisting of all records that meet stated criteria
• JOIN: Combines relational tables to provide the user with more information than is available in individual tables
• PROJECT: Creates a subset of columns in a table, creating tables with only the information specified

THE THREE BASIC OPERATIONS OF A RELATIONAL DBMS
The select, join, and project operations enable data from two different tables to be combined and only selected attributes to be displayed.
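The three operations above can be sketched in plain Python, treating each table as a list of dictionaries. This is a minimal illustration, not how a DBMS implements them: the SUPPLIER and PART rows, the column names, and the helper names `select`, `project` and `join` are all assumptions made for the example.

```python
# Hypothetical sample data; the supplier and part values are invented for illustration.
suppliers = [
    {"supplier_number": 1, "supplier_name": "Acme"},
    {"supplier_number": 2, "supplier_name": "Bolt Co"},
]
parts = [
    {"part_number": 137, "part_name": "Door latch", "supplier_number": 1},
    {"part_number": 150, "part_name": "Door seal", "supplier_number": 2},
    {"part_number": 152, "part_name": "Compressor", "supplier_number": 1},
]


def select(relation, predicate):
    """SELECT: keep only the rows that meet the stated criterion."""
    return [row for row in relation if predicate(row)]


def project(relation, columns):
    """PROJECT: keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in relation]


def join(left, right, key):
    """JOIN: combine rows from two tables that share the same key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Select two parts, join them with their suppliers, then project three columns.
wanted = select(parts, lambda row: row["part_number"] in (137, 150))
combined = join(wanted, suppliers, "supplier_number")
result = project(combined, ["part_number", "part_name", "supplier_name"])
for row in result:
    print(row)
```

Here Supplier Number plays the role described in the text: it is the primary key of the supplier rows and the foreign key that the part rows use to look them up. Unlike a true relational projection, this `project` helper does not remove duplicate rows.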
Relational Database Example
• Relational Database Management System (RDBMS)
  - Consists of a number of tables and a single schema (definition of tables and attributes)
  - Students (sid, name, login, age, gpa): Students identifies the table; sid, name, login, age, gpa identify the attributes; sid is the primary key

An Example Table
• Students (sid: string, name: string, login: string, age: integer, gpa: real)

S1
sid    name     login          age  gpa
50000  Dave     dave@cs        19   3.3
53666  Jones    jones@cs       18   3.4
53688  Smith    smith@ee       18   3.2
53650  Smith    smith@math     19   3.8
53831  Madayan  madayan@music  11   1.8
53832  Guldu    guldu@music    12   2.0

Another table: Courses
• Courses (cid, instructor, quarter, dept)

cid          instructor  quarter    dept
Carnatic101  Jane        Fall 06    Music
Reggae203    Bob         Summer 06  Music
Topology101  Mary        Spring 06  Math
History105   Alice       Fall 06    History

Keys
• Primary key: minimal subset of fields that is a unique identifier for a tuple
  - sid is the primary key for Students
  - cid is the primary key for Courses
• Foreign key: connections between tables
  - Courses (cid, instructor, quarter, dept)
  - Students (sid, name, login, age, gpa)
  - How do we express which students take each course?
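As a rough illustration of what "unique identifier" means in practice, the sketch below declares sid as the primary key of Students using Python's built-in sqlite3 module. SQLite is an assumption made for convenience here (the slides do not name a DBMS for this example), and only a few of the rows from S1 are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Students ("
    " sid TEXT PRIMARY KEY,"
    " name TEXT, login TEXT, age INTEGER, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@ee", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)

# Two students may share a name, but a second row with an existing sid is rejected.
try:
    conn.execute("INSERT INTO Students VALUES ('53666', 'Other', 'other@cs', 20, 3.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(duplicate_rejected, row_count)  # True 3
```

The two Smith rows are accepted because name is not a key; only the sid value has to be unique.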
Many to many relationships
• In general, need a new table: Enrolled (cid, grade, studid)
• studid is a foreign key that references sid in the Student table

Enrolled
cid          grade  studid
Carnatic101  C      53831
Reggae203    B      53832
Topology112  A      53650
History105   B      53666

Student
sid    name     login
50000  Dave     dave@cs
53666  Jones    jones@cs
53688  Smith    smith@ee
53650  Smith    smith@math
53831  Madayan  madayan@music
53832  Guldu    guldu@music

Relational Algebra
• Collection of operators for specifying queries
• A query describes a step-by-step procedure for computing the answer (i.e., operational)
• Each operator accepts one or two relations as input and returns a relation as output
• A relational algebra expression is composed of multiple operators

Basic operators
• Selection: return rows that meet some condition
• Projection: return column values
• Union
• Cross product
• Difference
• Other operators can be defined in terms of the basic operators

Simplified Schema Example
• Courses (cid, instructor, quarter, dept)
• Students (sid, name, gpa)
• Enrolled (cid, grade, studid)

Set Operations
• Union (R ∪ S)
  - All tuples in R or S (or both)
  - R and S must have the same number of fields
  - Corresponding fields must have the same domains
• Intersection (R ∩ S)
  - All tuples in both R and S
• Set difference (R - S)
  - Tuples in R and not in S

Set Operations (continued)
• Cross product or Cartesian product (R × S)
  - All fields in R followed by all fields in S
  - One tuple (r, s) for each pair of tuples r ∈ R, s ∈ S

Selection
Select students with gpa higher than 3.3 from S1: σ gpa>3.3 (S1)

S1
sid    name     gpa
50000  Dave     3.3
53666  Jones    3.4
53688  Smith    3.2
53650  Smith    3.8
53831  Madayan  1.8
53832  Guldu    2.0

Result
sid    name   gpa
53666  Jones  3.4
53650  Smith  3.8

Projection
Project name and gpa of all students in S1: π name,gpa (S1)

Result
name     gpa
Dave     3.3
Jones    3.4
Smith    3.2
Smith    3.8
Madayan  1.8
Guldu    2.0

Combine Selection and Projection
• Project name and gpa of students in S1 with gpa higher than 3.3: π name,gpa (σ gpa>3.3 (S1))

Result
name   gpa
Jones  3.4
Smith  3.8

Example: Intersection
S1 is the table shown under Selection.

S2
sid    name   gpa
53666  Jones  3.4
53688  Smith  3.2
53700  Tom    3.5
53777  Jerry  2.8
53832  Guldu  2.0

S1 ∩ S2 =
sid    name   gpa
53666  Jones  3.4
53688  Smith  3.2
53832  Guldu  2.0

Joins
• Combine information from two or more tables
• Example: students enrolled in courses: S1 ⋈ S1.sid=E.studid E, where S1 is the table shown under Selection and E is the Enrolled table

E
cid          grade  studid
Carnatic101  C      53831
Reggae203    B      53832
Topology112  A      53650
History105   B      53666

Result
sid    name     gpa  cid          grade  studid
53666  Jones    3.4  History105   B      53666
53650  Smith    3.8  Topology112  A      53650
53831  Madayan  1.8  Carnatic101  C      53831
53832  Guldu    2.0  Reggae203    B      53832

Relational Data Model: summary
• Relation as table
• Rows = tuples
• Columns = components
• Names of columns = attributes
• Relation name + set of attribute names = schema, written REL (A1, A2, ..., An)
Figure: a relation REL (A1, A2, ..., An) drawn as a table, with labels for attributes, tuple, arity and cardinality

• Set theoretic
• Domain: set of values, like a data type
• Cartesian product (or product): D1 × D2 × ... × Dn
• n-tuples (V1, V2, ..., Vn) such that V1 ∈ D1, V2 ∈ D2, ..., Vn ∈ Dn
  - Relation = subset of the Cartesian product of one or more domains
• FINITE only; empty set allowed
  - Tuples = members of a relation instance
  - Arity = number of domains
  - Components = values in a tuple
  - Domains correspond
with attributes
  - Cardinality = number of tuples

What is an Object-Oriented Database (OODB)?
• A database system that incorporates all the important object-oriented concepts
• Some additional features
  - Unique object identifiers
  - Persistent object handling

Object-Oriented Concepts
• Abstract data types
  - Class definition, provides extension to complex attribute types
• Encapsulation
  - Implementation of operations and object structure hidden
• Inheritance
  - Sharing of data within hierarchy scope, supports code reusability
• Polymorphism
  - Operator overloading

Object-Oriented DBMS (OODBMS)
• Stores data and procedures as objects
• Objects can be graphics, multimedia, Java applets
• Relatively slow compared with relational DBMS for processing large numbers of transactions
• Hybrid object-relational DBMS: Provides capabilities of both OODBMS and relational DBMS

Object Relationships
Object-Oriented Databases
• Support data abstraction, encapsulation, and inheritance.
• Allow object identification and communication.
• Reuse and modify objects.
• Deal with complex data types.

Object Relationships
Figure: class representation and object inheritance for the class Employee
• Attributes: Name, Parents, Date of Birth, Sex
• Methods: GetAge(), ComputeSalary()
Nelson Caballero - 4/16/2001

Advantages of OODBS
• Designer can specify the structure of objects and their behavior (methods)
• Multimedia contents
• Better interaction with object-oriented languages such as Java and C++
• Definition of complex and user-defined types
• Encapsulation of operations and user-defined methods

Relational and Object-Oriented Databases
Database Management System: A software system that enables users to create and maintain the database.
Relational
• Decision support applications.
• Ordinary business applications.
• Applications that integrate with legacy systems.
• Conservative implementations.
Object Oriented
• Engineering design applications.
• Multimedia applications.
• Knowledge bases.
• Applications with demanding distribution and concurrency.
• Applications that require advanced features.
• Electronic devices with embedded software.
Source: Object-Oriented Modeling and Design for Database Applications. Blaha, M. and Premerlani, W.

Database management system (DBMS)
• A specific type of software for creating, storing, organizing, and accessing data from a database
• Separates the logical and physical views of the data
• Logical view: how end users view data
• Physical view: how data are actually structured and organized
• Examples of DBMS: Microsoft Access, DB2, Oracle Database, Microsoft SQL Server, MySQL

HUMAN RESOURCES DATABASE WITH MULTIPLE VIEWS
A single human resources database provides many different views of data, depending on the information requirements of the user. Illustrated here are two possible views, one of interest to a benefits specialist and one of interest to a member of the company's payroll department.

Capabilities of Database Management Systems
• Data definition capability: Specifies the structure of database content; used to create tables and define characteristics of fields
• Data dictionary: Automated or manual file storing definitions of data elements and their characteristics
• Data manipulation language (DML): Used to add, change, delete, and retrieve data from the database

Metadata
• Data that describes the properties or characteristics of other data
• Does not include sample data
• Allows database designers and users to understand the meaning of the data

Structured Query Language (SQL)
• Microsoft Access provides user tools for generating SQL
• Many DBMS have report generation capabilities for creating polished reports (Crystal Reports)
• Each database will have a set of schemas associated with a catalog.
• Schema = the structure that contains descriptions of objects created by a user (base tables, views, constraints)

Structured Query Language
Structured Query Language (SQL) is the language standardized by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) for use on relational databases. It is a declarative rather than procedural language, which means that users declare what they want without having to write a step-by-step procedure. The first commercial implementation of SQL was released in 1979 by the company that later became Oracle Corporation, and various versions of SQL have been released since then.

SQL Is:
• The standard and most common language for relational database management systems
• An SQL-based relational database application involves a user interface, a set of tables in the database, and an RDBMS with an SQL capability
• Within the RDBMS, SQL is used to create the tables, translate user requests, maintain the data dictionary and system catalog, update and maintain the tables, establish security, and carry out backup and recovery procedures
Figure: A simplified schematic of a typical SQL environment

3 types of SQL commands
• Data Definition Language (DDL) commands: define a database, including creating, altering, and dropping tables and establishing constraints
• Data Manipulation Language (DML) commands: maintain and query a database
• Data Control Language (DCL) commands: control a database, including administering privileges and committing data

Insert
The insert operation is a unary operation; that is, it is applied to a single relation. The operation inserts a new tuple into the relation. The insert operation uses the following format: INSERT INTO relation-name VALUES (value1, value2, ...)
Figure 14.7 An example of an insert operation

Delete
The delete operation is also a unary operation. The operation deletes a tuple defined by a criterion from the relation. The delete operation uses the following format: DELETE FROM relation-name WHERE criteria
Figure: An example of a delete operation

Update
The update operation is also a unary operation that is applied to a single relation. The operation changes the value of some attributes of a tuple. The update operation uses the following format: UPDATE relation-name SET attribute1 = value1, attribute2 = value2, ... WHERE criteria
Figure: An example of an update operation

Select
The select operation is a unary operation. The tuples (rows) in the resulting relation are a subset of the tuples in the original relation.
Figure: An example of a select operation

Project
The project operation is also a unary operation and creates another relation. The attributes (columns) in the resulting relation are a subset of the attributes in the original relation.
Figure 14.11 An example of a project operation

Join
The join operation is a binary operation that combines two relations on common attributes.
Figure: An example of a join operation

Union
The union operation takes two relations with the same set of attributes and creates a new relation containing every tuple that appears in either of them.
Figure: An example of a union operation

Intersection
The intersection operation takes two relations and creates a new relation, which is the intersection of the two.
Figure: An example of an intersection operation

Difference
The difference operation is applied to two relations with the same attributes. The tuples in the resulting relation are those that are in the first relation but not the second.
Figure 14.15 An example of a difference operation

DATABASE DESIGN
The design of any database is a lengthy and involved task that can only be done through a step-by-step process. The first step normally involves interviewing potential users of the database. The second step is to build an entity-relationship model (ERM) that defines the entities, the attributes of those entities and the relationship between those entities.
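Before moving further into design, the data manipulation operations described in this section (insert, update, delete, select, project, union, intersection and difference) can be tried out with Python's built-in sqlite3 module. The sketch below is illustrative only: SQLite is an assumed choice of DBMS, the rows are a subset of the S1 and S2 tables shown earlier, and the particular update and delete criteria are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL: two relations with the same attributes, so the set operations apply.
cur.execute("CREATE TABLE S1 (sid TEXT PRIMARY KEY, name TEXT, gpa REAL)")
cur.execute("CREATE TABLE S2 (sid TEXT PRIMARY KEY, name TEXT, gpa REAL)")

# Insert: add new tuples to a relation.
cur.executemany("INSERT INTO S1 VALUES (?, ?, ?)", [
    ("53666", "Jones", 3.4), ("53688", "Smith", 3.2), ("53650", "Smith", 3.8),
    ("53831", "Madayan", 1.8), ("53832", "Guldu", 2.0),
])
cur.executemany("INSERT INTO S2 VALUES (?, ?, ?)", [
    ("53666", "Jones", 3.4), ("53688", "Smith", 3.2), ("53700", "Tom", 3.5),
    ("53777", "Jerry", 2.8), ("53832", "Guldu", 2.0),
])

# Update: change an attribute value in the tuples that match a criterion.
cur.execute("UPDATE S1 SET gpa = 1.9 WHERE sid = '53831'")
# Delete: remove the tuples that match a criterion.
cur.execute("DELETE FROM S1 WHERE sid = '53650'")

# Select (choose rows) combined with project (choose columns).
high_gpa = cur.execute(
    "SELECT name, gpa FROM S1 WHERE gpa > 3.3 ORDER BY sid").fetchall()


def sids(query):
    """Run a query that returns one column and flatten it into a list."""
    return [row[0] for row in cur.execute(query).fetchall()]


union = sids("SELECT sid FROM S1 UNION SELECT sid FROM S2 ORDER BY sid")
intersection = sids("SELECT sid FROM S1 INTERSECT SELECT sid FROM S2 ORDER BY sid")
difference = sids("SELECT sid FROM S1 EXCEPT SELECT sid FROM S2 ORDER BY sid")

print(high_gpa)      # [('Jones', 3.4)]
print(intersection)  # ['53666', '53688', '53832']
print(difference)    # ['53831']
```

Each statement says what result is wanted rather than how to compute it, which is the declarative style described above. In SQL the difference operation is spelled EXCEPT.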
Designing Databases
• Conceptual (logical) design: Abstract model from the business perspective
• Physical design: How the database is arranged on direct-access storage devices
Design process identifies
• Relationships among data elements, redundant database elements
• Most efficient way to group data elements to meet business requirements and the needs of application programs
Normalization
• Streamlining complex groupings of data to minimize redundant data elements and awkward many-to-many relationships

Entity-relationship models (ERM)
Database Design
In this step, the database designer creates an entity-relationship (E-R) diagram to show the entities for which information needs to be stored and the relationship between those entities. E-R diagrams use several geometric shapes, but we use only a few of them here:
• Rectangles represent entity sets
• Ellipses represent attributes
• Diamonds represent relationship sets
• Lines link attributes to entity sets and link entity sets to relationship sets
Figure: A very simple E-R diagram with three entity sets, their attributes and the relationship between the entity sets.

From E-R diagrams to relations
After the E-R diagram has been finalized, relations (tables) in the relational database can be created.
Relations for entity sets
For each entity set in the E-R diagram, we create a relation (table) in which there are n columns related to the n attributes defined for that set.
Figure: Entities, attributes and relationships in an E-R diagram
We can have three relations (tables), one for each entity set defined in the figure.
Figure: Relations for entity set
Relations for relationship sets
For each relationship set in the E-R diagram, we create a relation (table). This relation has one column for the key of each entity set involved in this relationship and also one column for each attribute of the relationship itself if the relationship has attributes (not in our case).
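The mapping rules above can be sketched with the Students, Courses and Enrolled relations from the earlier slides, again using Python's built-in sqlite3 module as an assumed DBMS. Students and Courses stand for entity sets and Enrolled for a relationship set that happens to carry an attribute of its own (grade). Only enrollments whose course identifiers also appear in the Courses table are loaded, and the composite primary key on Enrolled is a design assumption rather than something stated in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- One relation per entity set, one column per attribute.
CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, instructor TEXT, quarter TEXT, dept TEXT);
-- One relation per relationship set: a column for the key of each
-- participating entity set, plus a column for the relationship's own attribute.
CREATE TABLE Enrolled (
    cid    TEXT REFERENCES Courses(cid),
    studid TEXT REFERENCES Students(sid),
    grade  TEXT,
    PRIMARY KEY (cid, studid)
);
""")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    ("53666", "Jones", 3.4), ("53650", "Smith", 3.8),
    ("53831", "Madayan", 1.8), ("53832", "Guldu", 2.0),
])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?, ?)", [
    ("Carnatic101", "Jane", "Fall 06", "Music"),
    ("Reggae203", "Bob", "Summer 06", "Music"),
    ("History105", "Alice", "Fall 06", "History"),
])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("Carnatic101", "53831", "C"),
    ("Reggae203", "53832", "B"),
    ("History105", "53666", "B"),
])

# Which students take which course? Join through the relationship table.
enrollments = conn.execute(
    "SELECT s.name, e.cid, e.grade"
    " FROM Students AS s JOIN Enrolled AS e ON s.sid = e.studid"
    " ORDER BY s.sid"
).fetchall()

# An enrollment that points at a non-existent student is rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('History105', '99999', 'A')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(enrollments)
print(orphan_rejected)  # True
```

Keeping the relationship in its own table lets a student appear in many courses and a course hold many students without repeating either entity's attributes.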
The relations for these relationship sets are added to the previous relations for the entity sets, as shown in the figure.
Figure: Relations for E-R diagram

Normalization
Normalization is the process by which a given set of relations is transformed into a new set of relations with a more solid structure. Normalization is needed to allow any relation in the database to be represented, to allow a language like SQL to use powerful retrieval operations composed of atomic operations, to remove anomalies in insertion, deletion, and updating, and to reduce the need for restructuring the database as new data types are added.

First normal form (1NF)
When we transform entities or relationships into tabular relations, there may be some relations in which there is more than one value at the intersection of a row and a column. First normal form requires a single value in each such cell.
Figure 14.19 An example of 1NF

Second normal form (2NF)
In each relation we need to have a key (called a primary key) on which all other attributes (column values) depend. For example, if the ID of a student is given, it should be possible to find the student's name.
Figure: An example of 2NF

Other normal forms
Other normal forms use more complicated dependencies among attributes. We leave these dependencies to books dedicated to the discussion of database topics.

AN UNNORMALIZED RELATION FOR ORDER
An unnormalized relation contains repeating groups. For example, there can be many parts and suppliers for each order. There is only a one-to-one correspondence between Order_Number and Order_Date.

NORMALIZED TABLES CREATED FROM ORDER
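The step from an unnormalized ORDER relation to normalized tables can be sketched in plain Python. All of the sample values below are hypothetical, as is the choice to split the data into ORDER, LINE_ITEM, PART and SUPPLIER tables keyed the way shown; the sketch only illustrates how repeating groups are pulled out into their own relations.

```python
# Hypothetical unnormalized ORDER rows: each order carries a repeating group of parts.
unnormalized_orders = [
    {
        "order_number": 1001,
        "order_date": "2012-11-20",
        "items": [
            {"part_number": 137, "part_name": "Door latch", "quantity": 2,
             "supplier_number": 1, "supplier_name": "Acme"},
            {"part_number": 150, "part_name": "Door seal", "quantity": 5,
             "supplier_number": 2, "supplier_name": "Bolt Co"},
        ],
    },
    {
        "order_number": 1002,
        "order_date": "2012-11-21",
        "items": [
            {"part_number": 137, "part_name": "Door latch", "quantity": 1,
             "supplier_number": 1, "supplier_name": "Acme"},
        ],
    },
]


def normalize(orders):
    """Split the repeating groups into ORDER, LINE_ITEM, PART and SUPPLIER tables."""
    order_table = {}       # order_number -> order_date
    line_item_table = []   # (order_number, part_number, quantity)
    part_table = {}        # part_number -> (part_name, supplier_number)
    supplier_table = {}    # supplier_number -> supplier_name
    for order in orders:
        order_table[order["order_number"]] = order["order_date"]
        for item in order["items"]:
            line_item_table.append(
                (order["order_number"], item["part_number"], item["quantity"]))
            part_table[item["part_number"]] = (
                item["part_name"], item["supplier_number"])
            supplier_table[item["supplier_number"]] = item["supplier_name"]
    return order_table, line_item_table, part_table, supplier_table


order_table, line_item_table, part_table, supplier_table = normalize(unnormalized_orders)
print(len(line_item_table), len(part_table), len(supplier_table))  # 3 2 2
```

After the split, the name of part 137 and of supplier 1 is stored once, even though the part appears on two orders.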
Entity-relationship diagram
• Used by database designers to document the data model
• Illustrates relationships between entities

Map binary relationships
• The procedure for representing relationships depends on both the degree of the relationships (unary, binary, ternary) and the cardinalities of the relationships

Map binary one-to-one relationships (1:1)
• In a 1:1 relationship, the association in one direction is nearly always optional, whilst the association in the other direction is mandatory
• In the relation on the optional side of the relationship, include the foreign key of the entity type that has the mandatory participation in the 1:1 relationship
• Any attributes associated with the relationship itself are also included in the same relation as the foreign key
• The following figure shows a binary 1:1 relationship between NURSE and CARE_CENTER, where each care centre must have a nurse who is in charge of that centre. The association from care centre to nurse is therefore mandatory, while the association from nurse to care centre is optional (since any nurse may or may not be in charge of a care centre)
Figure: Mapping a binary 1:1 relationship
Figure: Binary 1:1 relationship

Map binary one-to-many (1:M) relationships
• First create a relation for each of the two entity types participating in the relationship
• Next include the primary key attribute(s) of the entity on the one-side as a foreign key in the relation that is on the many-side
• The 'Submits' relationship in the following figure shows the primary key Customer_ID of CUSTOMER (the one-side) included as a foreign key in ORDER (the many-side) (signified by the arrow)
Figure: Example of mapping a 1:M relationship
Figure: Relationship between customers and orders. Note the mandatory one.

Map binary many-to-many (M:N) relationships
• If such a relationship exists between entity types A and B, we create a new relation C, then include as foreign keys in C the primary keys for A and B; these attributes together become the primary key of C
• In the following figure, relations are first created for VENDOR and RAW_MATERIALS, then a relation QUOTE is created for the 'Supplies' relationship, with a primary key formed from the combination of Vendor_ID and Material_ID (the primary keys of VENDOR and RAW_MATERIALS). These are foreign keys that point to the respective primary keys
Figure: Example of mapping an M:N relationship
Figure: ER diagram (M:N). The Supplies relationship will need to become a separate relation.

AN ENTITY-RELATIONSHIP DIAGRAM
This graphic shows an example of an entity-relationship diagram. It shows that one ORDER can contain many LINE_ITEMs. (A PART can be ordered many times and appear many times as a line item in a single order.) Each LINE_ITEM can contain only one PART. Each PART can have only one SUPPLIER, but many PARTs can be provided by the same SUPPLIER. This diagram shows the relationships between the entities SUPPLIER, PART, LINE_ITEM, and ORDER that might be used to model the database.

Distributing databases
Storing a database in more than one place
• Partitioned: Separate locations store different parts of the database
• Replicated: Central database duplicated in its entirety at different locations

Distributed Databases
There are alternative ways of distributing a database. The central database can be partitioned (a) so that each remote processor has the necessary data to serve its own local needs. The central database also can be replicated (b) at all remote locations.
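The difference between the two distribution strategies can be sketched in a few lines of Python. The customer records, the region field used to decide where a record belongs, and the helper names `partition` and `replicate` are all hypothetical; a real distributed DBMS also has to keep the copies synchronized, which this sketch leaves out.

```python
# Hypothetical central database: customer records tagged with a region.
central = [
    {"id": 1, "region": "east", "name": "Ann"},
    {"id": 2, "region": "west", "name": "Bob"},
    {"id": 3, "region": "east", "name": "Cy"},
]
locations = ["east", "west"]


def partition(records, sites):
    """Partitioned: each site stores only the records for its own region."""
    return {site: [r for r in records if r["region"] == site] for site in sites}


def replicate(records, sites):
    """Replicated: each site stores a full copy of the central database."""
    return {site: [dict(r) for r in records] for site in sites}


partitioned = partition(central, locations)
replicated = replicate(central, locations)

print({site: len(rows) for site, rows in partitioned.items()})  # {'east': 2, 'west': 1}
print({site: len(rows) for site, rows in replicated.items()})   # {'east': 3, 'west': 3}
```

Partitioning stores each record once but limits a site to its local data, while replication gives every site everything at the cost of duplicated storage.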
Using Databases to Improve Business Performance and Decision Making
• Very large databases and systems require special capabilities, tools
– To analyze large quantities of data
– To access data from multiple systems
• Three key techniques
1. Data warehousing
2. Data mining
3. Tools for accessing internal databases through the Web

DATABASE MANAGEMENT SYSTEM TOOLS
Five software components:
1. DBMS engine
2. Data definition subsystem
3. Data manipulation subsystem
4. Application generation subsystem
5. Data administration subsystem

DBMS Engine
• DBMS engine – accepts logical requests from the various other DBMS subsystems, converts them into their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device
• DBMS engine separates the logical from the physical
• Physical view – how information is physically arranged, stored, and accessed on some type of storage device
• Logical view – how you as a knowledge worker need to arrange and access information
• With a database, you only concern yourself with your logical view

Data Definition Subsystem
• Data definition subsystem – helps you create and maintain the data dictionary and define the structure of the files in a database
• You must create a data dictionary before entering information into a database
• Module J covers this for Microsoft Access

Data Manipulation Subsystem
• Data manipulation subsystem – helps you add, change, and delete information
• This is your primary DBMS interface as you work with a database
– Views
– Report generators
– QBE tools
– SQL

Views
• View – allows you to see the contents of a database file
– Make whatever changes you want
– Perform simple sorting
– Query to find the location of information
– Looks similar to a workbook with no row numbers

Report Generators
• Report generator – helps you quickly define formats of reports and what information you want to see in a report
• You can save report formats and generate reports at any time with up‐to‐date information

QBE Tools
• Query‐by‐example (QBE) tool – helps you graphically design the answer to a question
• “What driver most often delivers concrete to Triple A Homes?”

SQL
• Structured query language (SQL) – standardized fourth‐generation language found in most DBMSs
• Performs the same task as a QBE tool
– But uses a sentence structure instead of a point‐and‐click interface
• SQL is used mostly by IT people

Application Generation Subsystem
• Application generation subsystem – contains facilities to help you develop transaction‐intensive applications
– Data entry screens (called forms)
– Programming languages
• Used mostly by IT specialists

Data Administration Subsystem
• Data administration subsystem – helps you manage the overall database environment
– Backup and recovery
– Security management
– Query optimization
– Concurrency control
– Change management
• Backup and recovery
– Periodically back up information
– Recover a database if a failure occurs
• Security management
– Who has access to what information
– Who can perform certain tasks (e.g., add, change, or delete) on information
• Query optimization
– Restructure physical view of information to optimize response times to queries
• Concurrency control
– What happens if two people make changes to the same information at the same time?
• Change management
– What is the effect of structural changes to a database?
– What if you add a new column?
– What happens if you delete a column?
– What happens if you change a column’s attributes?
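
The change‐management questions above can be explored on a throwaway database. The following is a minimal sketch using Python's built-in sqlite3 module; the delivery table, its columns, the driver name, and the truck_id column are all hypothetical, chosen only to echo the Triple A Homes question. It looks at one case only, adding a new column, and how existing rows and existing queries react.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delivery (delivery_id INTEGER PRIMARY KEY, driver TEXT, customer TEXT)"
)
conn.execute("INSERT INTO delivery VALUES (1, 'Evaline', 'Triple A Homes')")

# Row as seen before the structural change: three fields.
before = conn.execute("SELECT * FROM delivery").fetchone()

# Structural change: add a new column with a default value.
conn.execute("ALTER TABLE delivery ADD COLUMN truck_id TEXT DEFAULT 'unassigned'")

# The same row now has four fields; the existing row takes the default.
after = conn.execute("SELECT * FROM delivery").fetchone()

# A query that names its columns returns the same shape as before the change.
named = conn.execute("SELECT driver, customer FROM delivery").fetchone()

print(before)
print(after)
print(named)
```

One reading of this sketch is that code relying on the position or number of fields returned by `SELECT *` is exposed to structural changes, while code that names its columns is not affected by an added column. Deleting a column or changing its attributes can break both kinds of code, which is why these changes need to be managed.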
DATA WAREHOUSES AND DATA MINING
• Data warehouses support OLAP and decision making
• Data warehouses do not support OLTP
• Data‐mining tools are the tools you use to work with a data warehouse
– DBMS software works with a database
– Data‐mining tools work with a data warehouse

What Is a Data Warehouse?
• Data warehouse – logical collection of information
– gathered from operational databases
– used to create business intelligence that supports business analysis activities and decision‐making tasks

Components of a Data Warehouse
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall

Data Warehouse Summary
• Multidimensional
• Rows and columns
• Also layers
• Many times called hypercubes

The Database Approach to Data Management

MULTIDIMENSIONAL DATA MODEL
The view that is showing is product versus region. If you rotate the cube 90 degrees, the face that will show is product versus actual and projected sales. If you rotate the cube 90 degrees again, you will see region versus actual and projected sales. Other views are possible.

Functions
• Online transaction processing (OLTP) – the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information
– Databases support OLTP
– Operational database – databases that support OLTP
• Online analytical processing (OLAP) – the manipulation of information to support decision making
– Databases can support some OLAP
– Data warehouses only support OLAP, not OLTP
– Why? Data warehouses are special forms of databases that support decision making

Online analytical processing (OLAP)
• Supports multidimensional data analysis
– Viewing data using multiple dimensions
– Each aspect of information (product, pricing, cost, region, time period) is a different dimension
– E.g., how many washers were sold in the East in June compared with other regions?
• OLAP enables rapid, online answers to ad hoc queries

Data marts:
• Subset of data warehouse
• Summarized or highly focused portion of firm’s data for use by specific population of users
• Typically focuses on single subject or line of business

Data warehouse:
• Stores current and historical data from many core operational transaction systems
• Consolidates and standardizes information for use across enterprise, but data cannot be altered
• Data warehouse system will provide query, analysis, and reporting tools

Data Marts
• Data warehouses can support all of an organization’s information
• Data marts have subsets of an organizationwide data warehouse
• Data mart – subset of a data warehouse in which only a focused portion of the data warehouse information is kept

Components of a Data Mart

Object in Business Information Systems
Business Intelligence (BI):
• Tools for consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions
• E.g., Harrah’s Entertainment analyzes customers to develop gambling profiles and identify most profitable customers
• Principal tools include:
– Software for database query and reporting
– Online analytical processing (OLAP)
– Data mining

More definition
Data mining:
• More discovery driven than OLAP
• Finds hidden patterns, relationships in large databases and infers rules to predict future behavior
• E.g., Finding patterns in customer data for one‐to‐one marketing campaigns or to identify profitable customers.
• Types of information obtainable from data mining
– Associations, Sequences, Classification, Clustering, Forecasting
Predictive analysis in data mining:
• Uses data mining techniques, historical data, and assumptions about future conditions to predict outcomes of events
• E.g., Probability a customer will respond to an offer

Information Vs. Intelligence

What Are Data‐Mining Tools?
• Data‐mining tools – software tools that you use to query information in a data warehouse
– Query‐and‐reporting tools
– Intelligent agents
– Multidimensional analysis tools
– Statistical tools

Converging Disciplines

Query‐And‐Reporting Tools
• Query‐and‐reporting tools – similar to QBE tools, SQL, and report generators in the typical database environment

Intelligent Agents
• Use various artificial intelligence tools such as neural networks and fuzzy logic to form the basis for “information discovery” and building business intelligence
• Help you find hidden patterns in information

Multidimensional Analysis Tools
• Multidimensional analysis (MDA) tools – slice‐and‐dice techniques that allow you to view multidimensional information from different perspectives
– Bring new layers to the front
– Reorganize rows and columns

Statistical Tools
• Help you apply various mathematical models to the information stored in a data warehouse to discover new information
– Regression
– Analysis of variance
– And so on

Enterprise Application Integration
• “Re‐architecting” existing programs so that an intermediate layer, termed middleware, is developed between the applications and the databases
• Applications are designed to make calls to the middleware layer rather than to the other applications
• Streamlines the maintenance process because changes to an application will not affect all the interfaces connected to it
© Gabriele Piccoli

Meta Data Operations

The EAI Approach
Figure labels: Legacy Application, Legacy Application, ERP, SCM, Middleware, Database 1, Database 2
SCM: Supply Chain Management
ERP: Enterprise Resource Planning

CRM Infrastructure

DSS Characteristics and Capabilities

DSS Components
Data Management Subsystem
• DSS database
• DBMS
• Data directory
• Query facility

A Web‐Based DSS Architecture

Expert Systems vs. DSS
Expert System
• Inject expert knowledge into a computer system.
• Automate decision making.
• The decision environments have structure
• The alternatives and goals are often established in advance.
• The expert system can eventually replace the human decision maker.
Decision Support System
• Extract or gain knowledge from a computer system
• Facilitates decision making
• Unstructured environment
• Alternatives may not be fully realized yet
• Use goals and the system data to establish alternatives and outcomes, so a good decision can be made

Artificial Intelligence and Decision Support Systems in Business are attached in the Appendix

The Web and Documents Are Data Warehouses Too

WHAT CAN BUSINESSES LEARN FROM TEXT MINING?
Text mining
• Extracts key elements from large unstructured data sets (e.g., stored e‐mails)
• What challenges does the increase in unstructured data present for businesses?
• How does text mining improve decision making?
• What kinds of companies are most likely to benefit from text mining software?
• In what ways could text mining potentially lead to the erosion of personal information privacy?

Web mining
• Discovery and analysis of useful patterns and information from WWW
– E.g., to understand customer behavior, evaluate effectiveness of Web site, etc.
• Web content mining
– Knowledge extracted from content of Web pages
• Web structure mining
– E.g., links to and from Web page
• Web usage mining
– User interaction data recorded by Web server

Databases and the Web
• Many companies use the Web to make some internal databases available to customers or partners
• Typical configuration includes:
– Web server
– Application server/middleware/CGI scripts
– Database server (hosting DBMS)
• Advantages of using Web for database access:
– Ease of use of browser software
– Web interface requires few or no changes to database
– Inexpensive to add Web interface to system

Firms use the Web to make information from their internal databases available to customers and partners
• Middleware and other software make this possible
• Database servers
• CGI (Common Gateway Interface)
• Web interfaces provide familiarity to users and savings over redesigning and rebuilding legacy systems

LINKING INTERNAL DATABASES TO THE WEB
Users access an organization’s internal database through the Web using their desktop PCs and Web browser software.
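
The middleware step in this configuration, which sits between the Web server and the database server, can be sketched as a single function. This is only an illustration under stated assumptions: the part table, its rows, the /parts URL, and the function names make_database and handle_request are hypothetical, and an in-memory SQLite database stands in for the organization's internal database. No real Web server is started; the function simply shows how a browser request might be translated into SQL and the result into HTML.

```python
import html
import sqlite3
from urllib.parse import parse_qs, urlparse


def make_database() -> sqlite3.Connection:
    """Stand-in for the internal database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE part (part_id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO part VALUES (?, ?, ?)",
        [(1, "Washer", 0.25), (2, "Bolt <M8>", 0.4)],
    )
    return conn


def handle_request(url: str, conn: sqlite3.Connection) -> str:
    """Middleware step: turn a browser request into SQL and the rows into HTML."""
    params = parse_qs(urlparse(url).query)
    term = params.get("name", [""])[0]
    # A parameterized query keeps user input out of the SQL text itself.
    rows = conn.execute(
        "SELECT name, price FROM part WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    ).fetchall()
    # Escape values so that data from the database cannot inject markup.
    items = "".join(
        f"<li>{html.escape(name)}: {price:.2f}</li>" for name, price in rows
    )
    return f"<ul>{items}</ul>"


conn = make_database()
print(handle_request("/parts?name=wash", conn))
```

The database itself is unchanged by adding this layer, which matches the point above that a Web interface requires few or no changes to the database.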
Managing Data Resources
Establishing an information policy
• Firm’s rules, procedures, roles for sharing, managing, standardizing data
• Data administration: Firm function responsible for specific policies and procedures to manage data
• Data governance: Policies and processes for managing availability, usability, integrity, and security of enterprise data, especially as it relates to government regulations
• Database administration: Defining, organizing, implementing, maintaining database; performed by database design and management group

Nature and Quality of Data
• Basic: True data
• Good: Many (file, record)
• Better: Organized (database, data warehouse)
• Best: Analysis, intelligence (data mining, intelligence)

MANAGING THE INFORMATION RESOURCE
• Information is an organizational resource
• Just like people, capital, and equipment
• It must be managed effectively based on true data and systems
• Who should oversee your organization’s information resource?
– Chief information officer (CIO) – oversees an organization’s information resource
– Data administration – plans for, oversees the development of, and monitors the information resource
– Database administration – technical and operational aspects of managing information
• Is information ownership a consideration?
– If you create information, you “own” it
– You will also share it with others
– Because you “own” it, you are responsible for its quality
• How “clean” must your information be?
– Duplicate information (records) must be eliminated
– Inaccurate information must be corrected
– Information forms the basis of business intelligence
– If your business intelligence is bad, you will make poor decisions

Ensuring data quality
• More than 25% of critical data in Fortune 1000 company databases are inaccurate or incomplete
• Most data quality problems stem from faulty input
• Before a new database is in place, need to:
– Identify and correct faulty data
– Establish better routines for editing data once database in operation
• Data quality audit:
– Structured survey of the accuracy and level of completeness of the data in an information system
– Survey samples from data files, or
– Survey end users for perceptions of quality
• Data cleansing
– Software to detect and correct data that are incorrect, incomplete, improperly formatted, or redundant
– Enforces consistency among different sets of data from separate information systems

CREDIT BUREAU ERRORS: BIG PEOPLE PROBLEMS
• Assess the business impact of credit bureaus’ data quality problems for the credit bureaus, for lenders, for individuals.
• Are any ethical issues raised by credit bureaus’ data quality problems?
• Analyze the people, organization, and technology factors responsible for credit bureaus’ data quality problems.
• What can be done to solve these problems?

Data Mining as a Career Opportunity
• Knowledge of data mining can be a substantial career opportunity for you
– Query and Analysis and Enterprise Analytic Tools (Business Objects)
– Business Intelligence and Information Access tools (SAS)
– Many in Cognos (the data warehouse leader)
– PowerAnalyzer (Informatica)
SAS: Statistical Analysis System

Review
• Describe how a relational database organizes data and compare its benefits
• Identify and describe the principles of a database management system.
• Evaluate tools and technologies for providing information from databases to improve business performance and decision making.
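
The data cleansing idea described under "Ensuring data quality" (detecting data that are incomplete, improperly formatted, or redundant) can be sketched in a few lines. This is a minimal illustration, not a description of any commercial cleansing product: the cleanse function, the record fields, the sample records, and the rule that a valid phone number has exactly 10 digits are all assumptions made for the example.

```python
def cleanse(records):
    """Return (clean, rejected) after normalisation and de-duplication.

    Assumed rules for this sketch: a record needs a non-empty name and a
    10-digit phone number; duplicates are detected on (name, phone).
    """
    seen = set()
    clean, rejected = [], []
    for rec in records:
        # Fix formatting: collapse whitespace, consistent capitalisation.
        name = " ".join((rec.get("name") or "").split()).title()
        # Fix formatting: keep digits only.
        phone = "".join(ch for ch in (rec.get("phone") or "") if ch.isdigit())
        if not name or len(phone) != 10:
            rejected.append(rec)  # incomplete: needs manual correction
            continue
        key = (name.lower(), phone)
        if key in seen:
            continue  # redundant: same customer already kept
        seen.add(key)
        clean.append({"name": name, "phone": phone})
    return clean, rejected


records = [
    {"name": "  triple a   homes ", "phone": "(831) 555-0100"},
    {"name": "Triple A Homes", "phone": "831-555-0100"},  # duplicate
    {"name": "Acme Supply", "phone": "555-0199"},          # incomplete phone
    {"name": "", "phone": "8315550123"},                   # missing name
]
clean, rejected = cleanse(records)
print(clean)
print(len(rejected), "records need manual correction")
```

Rejected records are returned rather than silently dropped, which fits the point that most data quality problems stem from faulty input and have to be corrected at the source.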
CAN YOU…
• Describe business intelligence and its role
• Compare databases and data warehouses by OLTP and OLAP
• Define 5 software components of a DBMS
• List/describe key characteristics of a data warehouse
• Define 4 major types of data‐mining tools
• List key considerations in managing information as a resource

Appendix for Business Intelligence
DSS: Decision Support Systems and AI: Artificial Intelligence in Business

AI in Business
Some Commercial Applications
• Decision Support
• Expert Systems
• Information Retrieval
• Virtual Reality
• Robotics
I’m ready to do some business

Overview of AI
• Goal of AI – develop computer systems that exhibit intelligence or simulate the ability to think
• AI pioneered by Computer Science
• But, AI involves a combination of
– Computer Science, Biology, Psychology, Linguistics, Mathematics, Engineering

What really is Intelligence?
• Specifically, what are the signs of Intelligent Behavior?
• Think about it for a while

Which of the following is the best example of intelligent behavior?
1. Ability to add numbers
2. Ability to see and recognize objects
3. Ability to adapt to surroundings
4. Ability to learn from mistakes

What really is Intelligence?
• You are about to start an online chat (IM) with two entities:
– One entity is a human
– The other is a computer
• After hours of conversation, you cannot tell which entity is a computer.
• Does this mean the computer is Intelligent?

Intelligent Behavior
• What are some of the signs, attributes, or characteristics of Intelligent Behavior?

Characteristics of Intelligent Behavior
1. Learn from experience & apply the knowledge
– Computer can automatically improve performance based on experience
– Machine Learning
– Computational Learning
2. Handle complex situations
– Computer systems can often handle complexity better than humans
– Consider a process control system that must simultaneously track 100 different system variables.
3. Solve problems when important information is missing
– Computer systems can find patterns and deal with all sorts of missing information
4. React quickly & correctly to new situations; acquire & apply knowledge
– Here is where computers start to fail. Adapting to completely new situations is a problem for computer systems.
– It’s very difficult to design a computer system that can combine, connect, and acquire knowledge to solve completely new problems
5. Determine what is important
6. Exhibit creativity and imagination
7. Process visual information efficiently
8. Use reason to solve problems
These are some other characteristics that humans possess. Computer systems have a lot of catching up to do.

Which of the following do computers need to catch up on?
1. Determine what is important
2. Exhibit creativity and imagination
3. Process visual information efficiently
4. Use reason to solve problems

AI in Business
• AI continues to improve and evolve.
• Scientists and Engineers are pushing the envelope of what is possible.
• In Business, there is a better understanding of the capabilities of Intelligent Computer Systems
• It is important to know which types of problems are suited for humans, and which are suited for Computers.

Human Intelligence vs. AI
Attribute | Human Intelligence | Artificial Intelligence
Use a variety of information sources | High | High
Ability to acquire large amounts of external info. | Medium | High
Ability to do rapid, accurate, and complex calculations | Low | High
Ability to transfer information rapidly | Low | High
Ability to use sensors or senses | High | Medium
Creativity or imagination | High | Low
Ability to learn from experience | High | Medium
Ability to be adaptive | High | Medium

AI: Application Domains

AI: Commercial Domains
• Decision Support
– Integrating the advantages of AI with Human Intelligence.
– More intelligent interfaces
– More intelligent processing for massive data
• Information Retrieval
– Automatic simplification for massive data
– Natural language technology: computer can speak our language.
• Virtual Reality
– Better training environment from pilots to doctors
• Robotics
– Bringing the precision and speed of computers into the physical world
– Goes beyond manufacturing and assembly lines: Baggage Inspection, Bomb Removal, Replacement Limbs.

Expert Systems
• The idea is to inject expert knowledge into a computer system.
• The primary purpose is to automate decision making.
• The decision environments have structure
• The alternatives and goals are often established in advance.

Expert Systems vs. DSS
Expert System
• Inject expert knowledge into a computer system.
• Automate decision making.
• The decision environments have structure
• The alternatives and goals are often established in advance.
• The expert system can eventually replace the human decision maker.
Decision Support System
• Extract or gain knowledge from a computer system
• Facilitates decision making
• Unstructured environment
• Alternatives may not be fully realized yet
• Use goals and the system data to establish alternatives and outcomes, so a good decision can be made

What is the biggest difference between a Decision Support System and an MIS?
1. DSSs are interactive and ad hoc
2. DSSs focus on transforming information into knowledge
3. MISs focus on transforming data into information
4. All of the above

What is the biggest difference between an MIS and a TPS?
1. In a TPS there is no analysis
2. An MIS focuses on reports
3. A TPS focuses on updating a database
4. All of the above

How is the analysis different for an MIS vs. a DSS?
1. MIS: Analysis involves computing aggregates
2. MIS: Analysis involves creating useful charts and graphs
3. DSS: Connects information with decisions
4. DSS: Builds scenarios

Some Interesting Applications of Expert Systems
• Triage – Medical Diagnosis (Medical Expert System)
– User enters symptoms
– System makes diagnosis
– Doctors’ collective expertise is captured in the system
• Patriot Missile Guidance System
– Radar identifies Scud missile
– System steers the Patriot missile so that it intercepts the Scud missile
– Laws of physics and expert knowledge about missile trajectory are captured in the system
• Financial Decision Making
– Currency Trading

Expert System Categories
• Decision Making
– buy/sell
– risk/no risk
– rain/no rain
• Trouble Shooting / Diagnosis
– Hello, welcome to Dell; how can I help you?
– Suddenly an idiot seems like an expert.
• Selection/Classification
– Tell me what you see, expert system figures out what it really is...
• Process Monitoring and Control
– Robot control, assembly‐line control, missile control
• Design/Configuration
– Specify what you want, expert system figures out specifically how to do it.

Expert System Components
Figure labels: User (or Non‐expert, Robot, Missile); Expert System Software: User Interface, Engine, Knowledge base

Expert System Development Process
Figure labels: Knowledge Acquisition Program; Expert or Knowledge Engineer; Raw Data or Facts

Expert System vs. DSS
Figure labels: Someone with Knowledge (Decision Maker); DSS Software: Model Base, User Interface, Analytical & Statistical Models, Engine; DSS Processes: Data Management, Extraction, Generation, Validation, etc.; Raw Data or Facts
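
The contrast between the two kinds of system can be sketched in code. In this minimal illustration, the expert system side is a knowledge base of if‐then rules applied by a small engine that returns a decision, while the DSS side runs a model over several alternatives and leaves the choice to the decision maker. Everything specific here is hypothetical: the credit rules, the profit model, the numbers, and the function names expert_decide and dss_scenarios are assumptions made for the example, not content from the slides.

```python
# Expert system: the knowledge base holds rules captured from an expert.
KNOWLEDGE_BASE = [
    # (condition on the facts, conclusion) - hypothetical credit rules
    (lambda f: f["late_payments"] > 2, "reject"),
    (lambda f: f["income"] >= 3 * f["payment"], "approve"),
]


def expert_decide(facts, default="refer to human"):
    """Engine: apply the rules in order and automate the decision."""
    for condition, conclusion in KNOWLEDGE_BASE:
        if condition(facts):
            return conclusion
    return default


# DSS: a model from the model base evaluates alternatives (scenarios).
def profit_model(price, units, unit_cost, fixed_cost):
    return (price - unit_cost) * units - fixed_cost


def dss_scenarios(alternatives, unit_cost, fixed_cost):
    """Return the projected outcome of each alternative; a human decides."""
    return {
        name: profit_model(price, units, unit_cost, fixed_cost)
        for name, (price, units) in alternatives.items()
    }


decision = expert_decide({"late_payments": 0, "income": 4000, "payment": 1000})
outcomes = dss_scenarios(
    {"low price": (10, 1000), "high price": (15, 600)},
    unit_cost=6,
    fixed_cost=2000,
)
print(decision)
print(outcomes)
```

The expert system returns a single conclusion, whereas the DSS returns a table of outcomes for each alternative, which reflects the distinction drawn above between automating and facilitating decision making.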