* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Document
Versant Object Database wikipedia , lookup
Expense and cost recovery system (ECRS) wikipedia , lookup
Data Protection Act, 2012 wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Data center wikipedia , lookup
Data analysis wikipedia , lookup
Relational model wikipedia , lookup
3D optical data storage wikipedia , lookup
Information privacy law wikipedia , lookup
Data vault modeling wikipedia , lookup
Clusterpoint wikipedia , lookup
Databases Chapter 5 Objectives Understanding the strategic advantage of using Databases? How does it support operations? How does it support management? What are the “best practices” today? Concepts Major types of databases. Data warehouses and data mining. Logical data elements. Fundamental database structures. Database development. Databases are Important and Exciting Q: Why are databases so important? A: Today’s business enterprises cannot survive or succeed without quality data about their internal operations and external environment. Q: Why are database so exciting? A: This question makes no sense; its like asking why are Brad Pitt and Angelina Jolie so hot? Data Resource Management Definition: A managerial activity that applies information systems technologies to the task of managing an organization’s data resources to meet the information needs of their business stakeholders. Deep down inside, you all want to be data resource managers; you really do; trust me! Foundation Data Concepts Character – single alphabetic, numeric or other symbol Field – group of related characters Entity – person, place, object or event Attribute – characteristic of an entity Foundation Data Concepts Record – collection of attributes that describe an entity File – group of related records Database – integrated collection of logically related data elements Logical Data Elements Entities and Relationships Entities and Relationships Entities Real world things with attributes Abstract things that can be recorded A Bill, A Person, A Building, A Product, A Fishmonger, A Recipe Relationships How are the entities related How are entities connected logically Relationships Parent has Child Customer makes An Order Student takes A Class Fishmonger works at A Dock Recipe includes An Ingredient Brad Pitt dates Angelina Jolie Types of Databases Types of Databases Operational – store detailed data needed to support the business processes and operations of a company Distributed – databases that are replicated and distributed in whole or in part to network servers at a variety of sites External – contain a wealth of information available from commercial online services and from many sources on the World Wide Web Hypermedia – consist of hyperlinked pages of multimedia Data Warehouse Definition: Large database that stores data that have been extracted from the various operational, external, and other databases of an organization Data Warehouse System Data Mart Definition: Databases that hold subsets of data from a data warehouse that focus on specific aspects of a company, such as a department or a business process Data Warehouse & Data Marts Data Mining Definition: Analyzing the data in a data warehouse to reveal hidden patterns and trends in historical business activity Data Mining Data Mining Uses Perform “market-basket analysis” to identify new product bundles. Find root causes to quality or manufacturing problems. Prevent customer attrition and acquire new customers. Cross-sell to existing customers. Profile customers with more accuracy. Traditional File Processing Definition: Data are organized, stored, and processed in independent files of data records Flat file BUS100,ACCT 1,SMITH J.,M 10-11 BUS100,ACCT 1,JONES A., T 11-12 BUS200,ACCT 2,SMITH J., T 1-2 BUS200,ACCT 2,ANDERSON B.,T 3-4 CSI120,PROG 1,WILLIAMS C.,W 4-6 Problems with “flat files” Concurrent access update problems Serialization Searching for a record – sequential Inserting a record Update/change size of record Deleting a record Problems of File Processing Data Redundancy – duplicate data requires an update to be made to all files storing that data Lack of Data Integration – data stored in separate files require special programs for output making ad hoc reporting difficult Data Dependence – programs must include information about how the data is stored so a change in storage format requires a change in programs Database Management Approach Definition: Consolidates data records into one database that can be accessed by many different application programs. Software interface between users and databases Data definition is stored once, separately from application programs Database Management Approach Database Management Software (DBMS) Definition: Software that controls the creation, maintenance, and use of databases Database Interrogation Definition: Capability of a DBMS to report information from the database in response to end users’ requests Query Language – allows easy, immediate access to ad hoc data requests Report Generator - allows quick, easy specification of a report format for information users have requested Natural Language vs. SQL Queries Database Structures Hierarchical – relationships between records form a hierarchy or treelike structure Network – data can be accessed by one of several paths because any data element or record can be related to any number of other data elements Relational Database Structure Definition: All data elements within the database are viewed as being stored in the form of simple tables Relational Database M : N relationship N M EMPLOYEE WORKS_ ON PROJECT Modeling Relationships using Tables 1:1 1:Many (1:N) Many : Many (N:M) Instructors InstID LName FName 100 Babych Dave 200 Sundin Susan 300 Griffey Rose 400 Chan Tom Compensation InstID Salary Benefits 100 60000 YES One to One 200 45000 YES PK 300 75000 NO 400 39000 YES Instructors InstID 100 LName Babych FName Dave Requires FOREIGN KEY 200 Sundin Susan 300 Griffey Rose 400 Chan Tom One to Many Sections SectID InstID Term C195 100 S-05 C220 300 S-05 C295 300 S-05 F750 400 S-05 Many-to-many Instructor/Courses No-No’s: PK violation Instr Name Course Name Time Smith Smith Jones CSIS114 CSIS120 CSIS114 MW 1-2 TR 3-4 MW 9-10 Many-to-many Instructor/Courses No-No’s: Multiple entry fields Instr Name Course Name Time Smith Jones CSIS114, CSIS120 CSIS114 MW 1-2, TR 3-4 MW 9-10 Instructors InstID LName 100 Babych 200 Sundin 300 Griffey 400 Chan FName Dave Susan Rose Tom Courses CrsID Title CPSC310 CPSC440 ENGL150 MKTG350 MIS Intro. Prog. Short story Marketing Section – linking Table SecID InstID CrsID Term C195 100 MKTG350 F-00 C220 300 CPSC310 F-00 C295 300 CPSC440 W-00 F750 400 CPSC440 S-00 FK FK Searching a file UNSORTED file: Sequential search On average, must read N/2 records Very inefficient Search Keys Primary keys – unique (SSN) or SID ! ORDERS the Table and is always indexed Must be unique Indexed allows fast search Secondary keys – together make the record unique (NAME, BIRTHDAY) Fig 5.2 Search a sorted file (using direct access) Can use Binary Search for a record Skip to “Middle” of file Compare search key with record If smaller – record in first half of file If larger – record in second half of file Continue until record found or no more records to search Efficient – about Log base 2(N) probes Indexing Adding additional navigation information Help locate records quickly Primary Index: Contains KEY field and BLOCK number Only KEY of first record in BLOCK Do binary search on (smaller) Index file Primary Key Index Non-dense Block anchor key, Pointer Aaron, Ed Adams, John ... Aaron, Ed ENTIRE RECORD Other names... ENTIRE RECORD Acosta, Marc ENTIRE RECORD Blocks on Disk Adams, John ENTIRE RECORD Other names... ENTIRE RECORD Akers, Jan ENTIRE RECORD Secondary index Need way to also quickly find records by an attribute that the file is not stored by. (nonprimary index) EX: file stored by customer number, need to reference by name as well. ABLE 1, CHARLIE BAKER 2, PAUL CHARLIE 3, DAVID DAVID 4, ABLE MARY 5, BAKER PAUL 6, MARY Dense Secondary Index Database approach: advantages Centralized repository Program/db independence Can form new data relationships Reduce redundancy Data integrity – transaction rollback Data recovery – log files More database advantages Logical view of data – Schemas Standardized access routines Security Concurrency – record locking DBMS utilities – reorganize/backup etc. Disadvantages??? $$$$ Additional staff All eggs in one basket Lock in to a specific vendor Example Design database tables that model an advisor at Siena and student advisees. Advisor Attributes: Professor Employee ID (PK) Name Department Student Siena ID (PK) Name, address, Major, e-mail address etc. Advisor ID (FK) Design tables to model Actors in Movies Movie table Title or movie-id [PK] Studio Year filmed, etc. Actor table Actor ID [PK] Actor name Address Age, etc Acted-In Link table Movie Title [FK] Actor ID [FK] Compensation Role the actor played, etc. Database Management Systems (DBMSs) re-cap Provide a user view Provide tools to create & modify the database Store & retrieve data Manipulate data Produce reports Data Design Issues Content: What data should be collected? Access: What data should be given to what users? Logical structure: How will the data be organized to make sense to a particular user? Physical organization: Where will the data actually be located? What indexes to build? Fig 5.11 Fig 5.17 Selecting a DBMS Database size Number of concurrent users Performance Integration Selecting a DBMS Features Vendor Cost