Database Concepts - Information Systems
Database Introduction

Objectives
• To understand the nature and characteristics of databases
• To survey some important and interesting database applications
• To gain a general understanding of tables and relationships
• To define the term database management system (DBMS) and describe the functions of a DBMS
• To define the term database and describe what is contained within the database
• To define the term metadata and provide examples of metadata
• To define and understand database design from existing data

TRADITIONAL FILE PROCESSING SYSTEMS

CASE STUDY: Sleepy Valley High School

Information Split Among Different Departments
• We will look at the offices at Sleepy Valley H.S. that keep files on the teachers at Sleepy Valley:
– Principal's Office
– Assistant Principal's Office
– Personnel Department
– Payroll Department

Where is data stored?
• The Principal, Mr. Halprin, uses a regular notebook.
• The Assistant Principal, Ms. Lewis, uses index cards.
• The Personnel Office Director, Ms. Lauren, uses a Windows word processing program.
• The Payroll Office Director, Mr. Smith, uses a Lotus 1-2-3 spreadsheet.

Principal
Mr. Halprin is the Principal. He keeps two very important files on teachers:
• A Teacher File with current data on all teachers.
• A Job History File for every teacher who has taught at the school.

Mr. Halprin's Teacher File
Teacher ID# | Teacher Name | Address | Degree | Dept. Code
001 | Sill, Bill | 1 Apple Rd. | M.A. | 3
012 | Lan, Jackie | 18 Lord Rd. | M.A. | 2
014 | James, Hal | 15 Pine Terrace | M.S. | 5
027 | Lan, Hennie | 20 Lord Rd. | M.A. | 1
035 | Gold, Amy | 21 Deer Lane | M.S. | 3
041 | Doe, Jon | 4 High Street | M.S. | 2

Mr. Halprin's Job History File (Pay Scale)
ID# | Scale | Date Attained
001 | 2 | 9/15/1995
001 | 3 | 9/15/1996
001 | 4 | 9/15/2001
012 | 4 | 12/15/2008
014 | 1 | 1/15/1999
014 | 2 | 9/15/2002
027 | 1 | 9/15/1994
035 | 3 | 1/15/1987
042 | 2 | 9/15/1995
044 | 1 | 9/1/1996

Why maintain data?
One reason is REPORTS. The world demands REPORTS. For Mr. Halprin, the Teachers Union demands an annual salary report giving each teacher's salary scale history.

Teacher's Job History Report (Teacher Salary Scale Information)
Teach. ID | Name | Address | Degree | Current Scale | Scale History (Scale, Date)
001 | Sill, Bill | 1 Apple Road | M.A. | 4 | 2 (9/15/95), 3 (9/15/96), 4 (9/15/01)
012 | Lan, Jackie | 18 Lord Rd. | M.A. | 4 | 4 (12/15/08)
014 | James, Hal | 15 Pine Terrace | M.S. | 2 | 1 (1/15/99), 2 (9/15/02)

Assistant Principal
Ms. Lewis is the Asst. Principal. One of her assignments is to maintain a "Speakers List". She maintains 2 files:
– Teacher-Speaker File
– Seminar Topic File

Ms. Lewis' Teacher File (Speaker File)
Teacher ID | Teacher Name | Teacher Address | Highest Degree | Home Phone
001 | Sill, Bill | 1 Apple Rd. | M.A. | 648-2174
012 | Lan, Jackie | 18 Lord Rd. | Ph.D. | 334-9392
014 | James, Hal | 15 Pine Terrace | M.S. | 647-1267
027 | Lan, Hennie | 20 Lord Rd. | M.A. | 325-1111
035 | Silver, Amy | 65 Bliss Ave. | M.S. | 246-1342
042 | Doe, John | 4 High Street | M.S. | 371-1994

Do You See Any Problems With These Different Files?
• Look closely at Teacher ID 035 in both Mr. Halprin's Teacher File and Ms. Lewis's Teacher File.
– Can you explain the differences?
– Why did they occur?
– Is this a desirable feature for this file processing system?
• Also look at the "Doe" entries in both Teacher Files.

Ms. Lewis' Seminar Topic File (Seminar Information)
Teacher ID | Seminar Title | Audience Level | Date Last Given
001 | Crisis in Bosnia | H-So | 1/8/2003
012 | Shakespeare's Poetry | H-Jr | 5/2/2009
014 | Finding Primes | H-Fr | 10/4/2001
014 | Non-Euclidean Geometry | C-So | 4/15/2002
014 | Digital Photography | H-Sr | 11/1/2002
027 | Modern Portuguese Poetry | H-Fr | 4/4/2002
035 | Hypergeometric Invariant Transformations | C-Sr | 5/3/2003

Personnel Office
Ms. Lauren directs the Personnel Office. Among several files, she maintains the following 2 files:
– Personnel Teachers File
– Insurance Plan File

Personnel's Teacher File (Teacher / Medical Plan)
Teacher ID | Teacher Name | Medical Plan # | Dependent Coverage?
001 | Sill, Bill | 3 | N
012 | Lan, Jackie | 3 | Y
014 | James, Hal | 1 | Y
027 | Lan, Hennie | 2 | N
035 | Silver, Amy | 2 | Y
042 | Doe, John | 4 | Y

Insurance File (Medical Plan Information)
Medical Plan # | Plan Name
1 | Best Medical
2 | Good Health
3 | USMEDCO
4 | National Medical
5 | Republic Care

Payroll Office
Mr. Smith is the Director of the Payroll Office. He maintains two files of interest to us:
– a Teacher Master File
– a Department File

Payroll Dept's Teacher File (Salary)
Teach. ID | Name | Address | Zip Code | Salary | Year to Date | Dept. #
001 | Sill, Bill | 1 Apple Rd. | 07102 | $54,613 | $27,502 | 3
012 | Lan, Jackie | 18 Lord Rd. | 07122 | $46,215 | $28,102 | 2
014 | James, Hal | 17 Main Street | 07145 | $48,112 | $24,016 | 5
027 | Lan, Hennie | 20 Lord Rd. | 07104 | $44,000 | $22,032 | 4
035 | Gold, Amy | 21 Dear Lane | 07103 | $42,465 | $26,217 | 5
042 | Doe, John | 4 High Street | 07102 | $40,810 | $25,419 | 5

Payroll's Department File
Dept # | Dept. Name | Dept. Loc. | Phone Ext.
1 | Science | 207 | 3111
2 | English | 315 | 3164
3 | Social Studies | 117 | 3152
4 | Foreign Languages | 215 | 3123
5 | Mathematics | 461 | 3154

File Processing Problems
Redundancy (duplication of data):
• wasteful of space (storage)
• update inefficiencies (when a teacher moves to a new address or changes her name, the teacher's "record" must be changed in each place it is stored)
• data inconsistency (different addresses for the same teacher in different files)

Legacy Programs
Programs to work on these files at Sleepy Valley H.S.:
• The Payroll Department has written some lengthy Pascal programs to access their files and perform queries and reports.
• The Personnel Department has written some C programs to access their files and perform queries and reports.

Sleepy Valley H.S. - continued
The Board of Education asks the Principal to provide a report giving, for EACH department:
• the scale 1, 2, 3 and 4 teachers, listed separately
• the date of promotion to current rank for each teacher
• the seminar topics for each teacher
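The "data inconsistency" problem can be made concrete with a short check. This is a minimal Python sketch. It assumes each office's teacher file has been loaded into a dictionary keyed by Teacher ID. The dictionary names and the `find_inconsistencies` helper are hypothetical. The names and addresses are taken from the Principal's, Assistant Principal's and Payroll files. The check flags every ID whose records disagree between offices or that is missing from some office.

```python
# Hypothetical sketch: each office's teacher file as a dict keyed by Teacher ID.
# Values are (name, address) pairs copied from the Sleepy Valley files.
principal = {
    "001": ("Sill, Bill", "1 Apple Rd."),
    "012": ("Lan, Jackie", "18 Lord Rd."),
    "014": ("James, Hal", "15 Pine Terrace"),
    "027": ("Lan, Hennie", "20 Lord Rd."),
    "035": ("Gold, Amy", "21 Deer Lane"),
    "041": ("Doe, Jon", "4 High Street"),
}
asst_principal = {
    "001": ("Sill, Bill", "1 Apple Rd."),
    "012": ("Lan, Jackie", "18 Lord Rd."),
    "014": ("James, Hal", "15 Pine Terrace"),
    "027": ("Lan, Hennie", "20 Lord Rd."),
    "035": ("Silver, Amy", "65 Bliss Ave."),
    "042": ("Doe, John", "4 High Street"),
}
payroll = {
    "001": ("Sill, Bill", "1 Apple Rd."),
    "012": ("Lan, Jackie", "18 Lord Rd."),
    "014": ("James, Hal", "17 Main Street"),
    "027": ("Lan, Hennie", "20 Lord Rd."),
    "035": ("Gold, Amy", "21 Dear Lane"),
    "042": ("Doe, John", "4 High Street"),
}


def find_inconsistencies(files):
    """Return {teacher_id: {office: record or None}} for every ID whose
    records differ between offices or are missing from some office."""
    all_ids = set().union(*files.values())  # iterating a dict yields its keys
    problems = {}
    for tid in sorted(all_ids):
        records = {office: f.get(tid) for office, f in files.items()}
        if len(set(records.values())) > 1:  # a missing record (None) also counts
            problems[tid] = records
    return problems


report = find_inconsistencies(
    {"Principal": principal, "Asst. Principal": asst_principal, "Payroll": payroll}
)
for tid, records in report.items():
    print(tid, records)
```

With this data the check flags four IDs:
- 014, which has a different address in Payroll.
- 035, which is Gold in two files and Silver in the third.
- 041 and 042, the two "Doe" entries, each missing from at least one office's file.

A single shared database avoids this reconciliation step, because each teacher is stored only once.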
How To Create the Board of Education Requested Report
We must access files from 3 different offices, since no single office has all the files needed to create this report:
• From the Payroll Office files, we can obtain, for each department, the teachers' names, IDs and departments.
• From the Principal's Office files, we can obtain, for each teacher, the current salary scale and the date this salary scale became effective.
• From the Asst. Principal's Office files, we can obtain the seminar topics for each teacher.

A Potential Solution…
Build a SINGLE pool of interrelated files, rather than a SEPARATE collection of files.
• (This is an INFORMAL description of a database.)

The Teacher File in a Database
ID# | NAM | ADD | SALARY | YTD | DEPT # | DEG | SCL | INS | DEP
001 | Sill, Bill | 1 Apple Rd. | 54613 | 27500 | 3 | M.A. | 4 | 3 | N
012 | Lan, Jackie | 18 Lord Rd. | 46215 | 24256 | 2 | PhD | 4 | 3 | Y
014 | etc.

How would other files appear?
The other files would have the identical structure (schema) as they had in their original departments:
• The Department File from Payroll
• The Insurance File from Personnel
• The Seminar Topic File from the Asst. Principal
• The Job History File from the Principal's Office
• etc.

The DEPARTMENT "schema"
Dept # | Dept. Name | Dept. Loc. | Dept. Phone
1 | Science | 207 | 3111
2 | English | 315 | 3164
3 | Social Studies | 117 | 3152
or DEPARTMENT (Dept#, DeptName, DeptLoc, Phone)

Important Notes:
• We eliminate redundancy in the database by listing each teacher only once.
• The report requested by the Board of Education is much easier to produce, as all of the needed data is present in the single database.
• The teacher "table" has "relationships" with other "tables."

Why Use A Database? (Summary)
• The purpose of a database is to help people and organizations keep track of things.
• Problems of using lists to store data:
– Data inconsistencies
– Data privacy: the departments want to share some, but not all, of their data
• Databases store data in single-theme tables.
• Tables are related through primary and foreign keys.

Database Theory: Entities, Attributes and Relationships
An ENTITY is an object that exists and is distinguishable from other objects.
• "a person, place or thing"
Examples of entities include individual teachers, departments, seminar topics, etc.
An ENTITY SET is the set of all entities of the same type.
• ALL of the teachers at Sleepy Valley HS constitute an ENTITY SET. Each individual teacher is an ENTITY.

Database Theory: Attributes
An ATTRIBUTE is a defining property or quality of an entity (we will see that relationships are permitted to have attributes also).
• Attributes of Teachers might include their ID#, address, degree, department #, etc.
• Attributes of Departments might include their name, room #, telephone, chairman, etc.
The DOMAIN OF AN ATTRIBUTE is the set of allowable values for that attribute.

Database Theory: Relationships
A RELATIONSHIP is an association between entities in different entity sets.
– There is a RELATIONSHIP between entities (teachers) in the TEACHER entity set and entities (departments) in the DEPARTMENT entity set: each teacher (e.g., Jackie Lan) is associated with a given department (English).
A MAPPING CARDINALITY is the number of entities with which another entity can be associated in a relationship.

Lakeview Equipment List

Lakeview List Issues
• Four "themes" in the list (Jobs, Contractors, Equipment, and Rentals)
• Suppose KH Services changes its phone #?
• Suppose we delete RB Partnership's row?
• Can a new contractor be added if there is no rental?
• What about the missing data for the scaffolding rental?
• Are the different daily rental rates for the backhoe rental valid?
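Once the files live in one pool of related tables, the Board of Education report becomes a single query that follows the primary/foreign-key links. The sketch below uses Python's built-in `sqlite3` module as a stand-in DBMS. Several details are assumptions made for illustration:
- the table and column names;
- the ISO date format;
- the choice to load only teachers 001, 012 and 014.

The values themselves come from the Sleepy Valley files.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
PRAGMA foreign_keys = ON;  -- SQLite enforces foreign keys only when asked
CREATE TABLE DEPARTMENT (DeptNo INTEGER PRIMARY KEY, DeptName TEXT NOT NULL);
CREATE TABLE TEACHER (
    TeacherID TEXT PRIMARY KEY,
    Name      TEXT NOT NULL,
    DeptNo    INTEGER REFERENCES DEPARTMENT(DeptNo)
);
CREATE TABLE JOB_HISTORY (
    TeacherID    TEXT REFERENCES TEACHER(TeacherID),
    Scale        INTEGER,
    DateAttained TEXT,  -- ISO dates (assumed) so MAX() finds the latest
    PRIMARY KEY (TeacherID, Scale)
);
CREATE TABLE SEMINAR (TeacherID TEXT REFERENCES TEACHER(TeacherID), Title TEXT);

INSERT INTO DEPARTMENT VALUES (2, 'English'), (3, 'Social Studies'), (5, 'Mathematics');
INSERT INTO TEACHER VALUES
    ('001', 'Sill, Bill', 3), ('012', 'Lan, Jackie', 2), ('014', 'James, Hal', 5);
INSERT INTO JOB_HISTORY VALUES
    ('001', 2, '1995-09-15'), ('001', 3, '1996-09-15'), ('001', 4, '2001-09-15'),
    ('012', 4, '2008-12-15'),
    ('014', 1, '1999-01-15'), ('014', 2, '2002-09-15');
INSERT INTO SEMINAR VALUES
    ('001', 'Crisis in Bosnia'), ('012', 'Shakespeare''s Poetry'),
    ('014', 'Finding Primes'), ('014', 'Non-Euclidean Geometry'),
    ('014', 'Digital Photography');
""")

# For each department: every teacher's current scale, the date it was
# attained, and that teacher's seminar topics.
REPORT_SQL = """
SELECT d.DeptName, h.Scale, t.Name, h.DateAttained,
       (SELECT GROUP_CONCAT(s.Title, '; ') FROM SEMINAR s
         WHERE s.TeacherID = t.TeacherID) AS Seminars
FROM TEACHER t
JOIN DEPARTMENT d  ON d.DeptNo = t.DeptNo
JOIN JOB_HISTORY h ON h.TeacherID = t.TeacherID
WHERE h.DateAttained = (SELECT MAX(DateAttained) FROM JOB_HISTORY
                         WHERE TeacherID = t.TeacherID)
ORDER BY d.DeptName, h.Scale, t.Name
"""
rows = con.execute(REPORT_SQL).fetchall()
for row in rows:
    print(row)
```

Each row pairs a department with one teacher's current scale, promotion date and seminars. No data is copied between offices. SQLite does not guarantee the order of `GROUP_CONCAT`, so the order of the seminar titles within a row may vary.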
Four themes – four tables?
• JOB (Name, …)
• CONTRACTOR (Contractor, Phone, …)
• EQUIPMENT (EquipmentType, EquipmentNumber, DailyRate, …)
• RENTAL (StartDate, EndDate, Days, …)

Unique Identifiers, Relationships
• JOB (JobID, Name)
• CONTRACTOR (ContractorID, Contractor, Phone, Street, City, State, Zip)
• EQUIPMENT (EquipmentID, EquipmentType, EquipmentNumber, DailyRate)
• RENTAL (ID, JobID, ContractorID, EquipmentID, StartDate, EndDate, Days)

Populated Lakeview Database Tables in Microsoft Access, With Relationships

Sample SQL Query

Database Definition
A database is a structure that can house information about multiple types of entities, as well as the relationships among those entities.

Database System Components

DBMS
• DBMS: Database Management System
– Create the database, tables, and supporting structures
– Read and update database data
– Maintain database structures
– Enforce rules
– Control concurrency
– Provide security
– Perform backup and recovery
• Examples: Oracle, DB2, Microsoft Access, SQL Server

Database
• A database is a self-describing collection of related records or tables, containing:
– User data
– Metadata: data about the structure of the database
– Indexes and related structures
– Stored procedures: program modules stored within the database
– Triggers: procedures that are executed when a particular data activity occurs
– Application metadata: data describing application elements such as forms and reports

Kroenke-Auer Definition of a Database (Important!)
"A database is a self-describing collection of integrated records."

An Analogy
Is a Library a collection of books and periodicals?
• A Library also contains a card catalog, which describes the books in the Library. So, a Library might be considered a self-describing collection of books and periodicals.

Self Describing
In addition to source data, a database contains a description of its own structure.
– This description is referred to as a data dictionary, a data directory, or metadata.

Different Types of Databases
Database Type | Example | Typical Number of Concurrent Users | Typical Size
Personal | Mary Richards House Painting | 1 | < 10 megabytes
Work Group | Treble Clef Music | < 25 | < 100 megabytes
Organizational | Licensing & Registration | Hundreds to thousands | 1 trillion bytes

Mary Richards House Painting
CUSTOMER (CUST-ID, CustomerName, Street, City, State, Zip, PhoneNumber, SOURCE-ID)
JOB (JOB-ID, JobDate, Description, AmtBilled, AmtPaid, CUST-ID)
SOURCE (SOURCE-ID, Name, PhoneNumber)

Sea View Yacht Sales - Work Group
BOAT-TYPE (BOAT-TYPE-ID, Description, NewPurchasePrice, NumberPurchased)
SAILBOAT (SAILBOAT-ID, BoatName, DateListed, Year, AskingPrice, CurrentLocation, Description, BOAT-TYPE-ID)
EQUIPMENT (EQUIPMENT-ID, ItemName, Item-Desc, SAILBOAT-ID)
SALESPERSON (SALESPERSON-ID, Name, AreaCode, LocalNumber)
CUSTOMER (CUSTOMER-ID, Custname, AreaCode, LocalNum, Street, City, State, Zip, Notes, SALESPERSON-ID)
CUSTOMER-BOAT-INT (CUSTOMER-ID, BOAT-TYPE-ID)
CUSTOMER-SAILBOAT-INT (CUSTOMER-ID, SAILBOAT-ID)

Database Management System
A Database Management System (DBMS) is a software product through which users interact with a database.
• By "interact with a database" we mean the user can create, edit, query, produce reports, and perform other functions with a database.

Desktop Database History
• Ashton-Tate (1979) introduces dBase II.
• In the 1980s, dBase, Paradox and FoxPro are the "major players".
• Microsoft buys out FoxPro.
• Borland (Paradox) buys out Ashton-Tate.
• Windows versions of DBMS products appear.
• Microsoft brings out Access 1.0 at a price of $99.
• Currently MS Access is the most widely used microcomputer database product.

Relational Database Model
• Developed by E. F. Codd in 1970
• Based upon concepts in a branch of mathematics called Relational Algebra
• A (simple) methodology for structuring and processing a database
• Data is stored (conceptually) as tables.
• Relationships between tables are "visible" in the data.

Sample Student & Course Tables
STUDENT
StudentId | StudentName | Phone
100 | Jones, Mary | 323-0098
200 | Parks, Franklin | 232-9987
300 | Thomas, Martha | 887-4484

COURSE
Course# | CourseName | Semester | Grade | StudentId
BD100 | Intro MIS | F | A | 100
BA402 | Accting Seminar | S | C | 200
BF315 | Mgmt Principles | S | B | 200
BD100 | Intro MIS | F | C | 300
BD200 | Database | S | B | 300
BA150 | Intro to Accting | F | B | 100
MA102 | Intro Calculus | F | B | 100
CS100 | Intro Comp Sci | F | A | 300

Example of a Data Model

Departments Write to Single DB
[Figure: Payroll, Personnel and Principal's programs each access the shared Database directly]
Without a DBMS, we would still have individual departments writing their own programs to interact directly with the database.

Interacting with the Database Through the DBMS
[Figure: Payroll, Personnel and Principal's programs reach the Database only through the Database Management System]

Database Views
A VIEW is some subset of the database that a DBMS permits a particular user, or set of users, to see.
• (It is actually the Database Administrator who decides upon the views.)
Another name for a view is a subschema (a subset of a schema).

Teacher Database
ID# | NAM | ADD | SALARY | YTD | DEPT # | DEG | SCL | INS | DEP
001 | Sill, Bill | 1 Apple Rd. | 44613 | 17500 | 3 | M.A. | 4 | 3 | N
012 | Lan, Jackie | 18 Lord Rd. | 36215 | 14256 | 2 | PhD | 4 | 3 | Y
014 | etc.

A View of the Teacher Database with NO Salary Information
ID# | NAM | ADD | DEPT # | DEG | SCL | INS | DEP
001 | Sill, Bill | 1 Apple Rd. | 3 | M.A. | 4 | 3 | N
012 | Lan, Jackie | 18 Lord Rd. | 2 | PhD | 4 | 3 | Y
014 | etc.

DB View Benefits
• provides the user with a "simpler" structure containing only relevant information
• protects sensitive data from users who should not be allowed to see such data
• note that a DBMS will permit other security features, such as password protection, so a user may be able to view salary data but not change it

A Second DBMS Feature
Most DBMSs allow you to specify INTEGRITY CONSTRAINTS on attribute values (i.e., you can specify the domain of an attribute).
• Example: you can specify that SALARY for a teacher be numeric, non-negative and below $123,456.
Specifying integrity constraints provides some protection against data-entry errors. A primary concern of database designers is maintaining the integrity of the database.

Relational Databases
• The purpose of a database is to help people and organizations track things of interest to them.
• A relational database stores data in tables, which have rows and columns (like a spreadsheet). A database typically has many different tables, where each table stores data about a different thing.
• Each row in a table stores data about an occurrence or instance of the thing of interest (an entity); rows are also known as records.
• A column of a table stores an attribute common to all of the rows in the table.
– For example, in a STUDENT table, we might have columns/attributes for FirstName, LastName, Major, GPA, EmailAddress, etc.
• A database stores data and other things.

A Student Table and a Class Table

A Database Has Data and Relationships
• A database contains both data (stored in tables) and relationships between the data in the tables.
• There could be several types of relationships between data.
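Both DBMS features can be tried out with SQLite through Python's `sqlite3` module:
- a view that hides salary information;
- an integrity constraint on the SALARY domain.

This is a sketch under stated assumptions. The TEACHER columns are a hypothetical subset of the Teacher Database above. SQLite has no user accounts or GRANT permissions, so here the view only demonstrates the "subset" idea. Controlling *who* may query which view is left to server DBMS products such as Oracle or SQL Server.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TEACHER (
    ID     TEXT PRIMARY KEY,
    Name   TEXT NOT NULL,
    Salary INTEGER NOT NULL
           CHECK (Salary >= 0 AND Salary < 123456),  -- the SALARY domain
    DeptNo INTEGER,
    Degree TEXT
);
INSERT INTO TEACHER VALUES
    ('001', 'Sill, Bill', 44613, 3, 'M.A.'),
    ('012', 'Lan, Jackie', 36215, 2, 'PhD');

-- A view (subschema) that omits the salary column
CREATE VIEW TEACHER_NO_SALARY AS
    SELECT ID, Name, DeptNo, Degree FROM TEACHER;
""")

cur = con.execute("SELECT * FROM TEACHER_NO_SALARY ORDER BY ID")
view_columns = [col[0] for col in cur.description]
view_rows = cur.fetchall()
print(view_columns)
print(view_rows)


def try_insert(salary):
    """Return True if the DBMS accepts a teacher row with this salary."""
    try:
        con.execute(
            "INSERT INTO TEACHER VALUES (?, 'Test, Row', ?, 1, 'B.A.')",
            (f"T{salary}", salary),
        )
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


print(try_insert(-5), try_insert(200000), try_insert(50000))  # False False True
```

The out-of-domain salaries are rejected by the DBMS itself, not by application code. This is the protection against data-entry errors described above.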
Tables have Relationships
• Notice that the tables below do not have explicit relationships, and thus the grades in the GRADE table are meaningless.

Naming Conventions
• Table names are written in all capital letters:
– STUDENT, CLASS, GRADE
• Column names are written with an initial capital letter, and compound names are written with a capital letter on each word:
– Term, Section, ClassNumber, StudentName

Adding Relationships
• We link data in one table with data in another table to form relationships.
• The links are accomplished by having an attribute in one table point to an attribute in another table.
• Each row in a table has an attribute (or set of attributes) that serves as its unique identifier, known as the primary key.
– So, in the STUDENT table, the attribute StudentNumber serves as the primary key, while the attribute ClassNumber serves as the primary key of the CLASS table.
– Could ClassName serve as the primary key of the CLASS table? (The answer is "No". Do you see why?)
– Could LastName serve as the primary key of the STUDENT table? Will LastName ALWAYS uniquely identify a record?
– Could EmailAddress serve as a primary key? (What if a student did not have an email address?)

Foreign Keys
• A foreign key is an attribute in one table that points to an attribute (typically the primary key) in another table.
• Foreign keys enable us to form relationships between tables.

Database (With Relationships) For Students & Grades

Databases Create Information From Data
• Data = recorded facts and figures
• Information = knowledge derived from data, i.e., data presented in a meaningful context, where we may have applied some form of processing to the data, such as computing averages, sorting, grouping, etc.
• Databases record data, but they do so in such a way that we can produce information from the data.
– The data on STUDENTs, CLASSes and GRADEs could produce information about each student's GPA.

Database Examples

Types of Databases
• Single-User Database Applications
– Support only one user at a time
• Multi-User Database Applications
– Support multiple users concurrently
• E-Commerce Database Applications
• Similar database design concepts apply to all of the above types.

Components of a Database System
• A database system generally contains 4 components:
– Users (from end-users to database administrators)
– The database application
– The Database Management System (such as Microsoft Access, Oracle, etc.)
– The data (the collection of facts stored in the database)
• Aside: database systems will also have metadata (data about data) and procedures/rules that govern the overall design and usage of the database.

What is a DBMS (Database Management System)?
• A DBMS is a collection of programs that manages the database structure and controls access to the data stored in the database.
• The DBMS is the intermediary between the user and the database: it receives requests from users and translates them into the operations required to fulfill those requests.
– The DBMS makes it possible to share data among multiple applications or users.
– The DBMS hides much of the database's internal complexity from end-users.
– The DBMS makes data management more efficient and effective.

The Components

Components of a DBMS

Using SQL
SQL (Structured Query Language) is an internationally recognized language used by commercial DBMS products. Database applications typically send SQL statements to the DBMS for processing.
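The following sketch shows data being turned into information. It computes each student's GPA from values in the Sample Student & Course Tables. The Python program plays the role of the database application: it only composes SQL and hands it to the DBMS, here SQLite via the built-in `sqlite3` module. Several things are assumptions, because the slides do not specify how GPA is computed:
- the table names ENROLLMENT and GRADE_POINTS;
- the 4-point grade mapping;
- the unweighted average, which ignores credit hours.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE STUDENT (StudentID INTEGER PRIMARY KEY, StudentName TEXT);
CREATE TABLE ENROLLMENT (
    CourseNo TEXT, Semester TEXT, Grade TEXT,
    StudentID INTEGER REFERENCES STUDENT(StudentID)
);
-- Assumed 4-point scale (not given in the slides)
CREATE TABLE GRADE_POINTS (Grade TEXT PRIMARY KEY, Points REAL);

INSERT INTO STUDENT VALUES
    (100, 'Jones, Mary'), (200, 'Parks, Franklin'), (300, 'Thomas, Martha');
INSERT INTO ENROLLMENT VALUES
    ('BD100', 'F', 'A', 100), ('BA402', 'S', 'C', 200), ('BF315', 'S', 'B', 200),
    ('BD100', 'F', 'C', 300), ('BD200', 'S', 'B', 300), ('BA150', 'F', 'B', 100),
    ('MA102', 'F', 'B', 100), ('CS100', 'F', 'A', 300);
INSERT INTO GRADE_POINTS VALUES ('A', 4), ('B', 3), ('C', 2), ('D', 1), ('F', 0);
""")

# Information derived from data: an unweighted GPA per student.
gpa_rows = con.execute("""
SELECT s.StudentName, ROUND(AVG(g.Points), 2) AS GPA
FROM STUDENT s
JOIN ENROLLMENT e   ON e.StudentID = s.StudentID
JOIN GRADE_POINTS g ON g.Grade = e.Grade
GROUP BY s.StudentID, s.StudentName
ORDER BY s.StudentID
""").fetchall()

for name, gpa in gpa_rows:
    print(f"{name}: {gpa}")
```

Jones, Mary (A, B, B) averages to 3.33. Parks, Franklin (C, B) averages to 2.5. Thomas, Martha (C, B, A) averages to 3.0. None of these GPAs is stored anywhere; they are derived when the query runs.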
Summary: Database Applications, DBMS, and SQL
• Applications are the computer programs that users work with.
• The Database Management System (DBMS) creates, processes and administers databases.
• Structured Query Language (SQL) is an internationally recognized standard database language that is used by virtually all commercial relational DBMSs.

Some Prominent DBMS Products
• Microsoft Access
• Microsoft SQL Server
• IBM DB2
• Oracle Corporation's ORACLE
• MySQL

DBMS Power vs. Ease of Use

Database Applications and SQL
• Some typical functions of database application programs:
– Create and process forms for entering data
• The form presents the data entry details in an easy-to-use format.
• Behind the form, the application generates the SQL statements needed to insert and update the data in the tables underlying the form.
– Process user queries (a query is a question the user asks of a database)
• The application processes the query and sends it to the DBMS.
– Create and process reports for the end-user
• The application queries the DBMS (using SQL) for the data needed in the report, and formats the query results according to the end-user's specifications.

An Example Data Entry Form for Our Student-Class Database

Producing Reports
• A Class Grade Report might be generated as follows: [Figure: class grade report]

Processing an SQL Query
We present an SQL query to list the last name, first name and email address of all students having a student number greater than 2. (We will cover SQL later; this is only a small preview.)
    SELECT LastName, FirstName, EmailAddress
    FROM STUDENT
    WHERE StudentNumber > 2;
The result might look like: [Figure: query result]

The Database Summary
• DEFINITION: A database is a self-describing collection of integrated tables.
• The tables are called integrated because they store both end-user data and the relationships among the data.
• A database is called self-describing because it stores a description of itself.
• The self-describing data are called metadata, which means "data about data".
– (The form and format of metadata vary between DBMS products.)

Why a "self-describing" collection?
– To understand this definition, suppose you were asked to define a Library. Would it be reasonable to say a Library is a collection of books and periodicals?
• If yes, then if we piled up 5,000 books and periodicals in a stack in a room, would that be a Library? (No!)
– What makes a Library a true Library is the organizational structure of the books and periodicals, and the fact that a true Library has a card catalog that enables us to locate a book (or periodical).
– So, in fact, a Library should be defined as a self-describing collection of books and periodicals.

Typical Metadata Tables

User / Class Example

Query Metadata Just As You Query "Regular" Tables of Data
• Example: The following SQL (Microsoft SQL Server syntax) queries the metadata table SYSOBJECTS to determine whether a user table (Type = 'U') named CLASS exists in the database. If it does, the table is removed from the database. (Note: the syntax will be more meaningful after we study SQL; this is merely a preview.)
    IF EXISTS (SELECT * FROM SYSOBJECTS
               WHERE [Name] = 'CLASS' AND Type = 'U')
        DROP TABLE CLASS;

Database Design
Database design is the most crucial activity in database development. It involves designing the proper structure of tables, the proper relationships among tables, the appropriate data constraints, and other structural components. Poorly designed databases are very problematic, so much of this course is devoted to the principles behind sound database design.
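The SYSOBJECTS query above is specific to Microsoft SQL Server. SQLite keeps its own catalog in the `sqlite_master` table, and that catalog can be queried with ordinary SQL. The sketch below is a rough SQLite analogue, not a translation of the original statement. The `drop_table_if_exists` helper is hypothetical. For this particular task SQLite also accepts the shorthand `DROP TABLE IF EXISTS CLASS`.

```python
import sqlite3


def drop_table_if_exists(con, table_name):
    """Look up table_name in SQLite's catalog (sqlite_master) and drop it
    only if it is a table. Returns True if a table was dropped."""
    row = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    if row is None:
        return False
    # Identifiers cannot be bound as parameters, so quote the catalog name.
    quoted = '"' + row[0].replace('"', '""') + '"'
    con.execute(f"DROP TABLE {quoted}")
    return True


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CLASS (ClassNumber INTEGER PRIMARY KEY, ClassName TEXT)")

# The metadata is itself just rows in a table.
tables_before = con.execute("SELECT type, name FROM sqlite_master").fetchall()
print(tables_before)  # [('table', 'CLASS')]

first = drop_table_if_exists(con, "CLASS")   # True: the table existed
second = drop_table_if_exists(con, "CLASS")  # False: nothing left to drop
print(first, second)
```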
The Three Types of Database Design
• From Existing Data
– Analyze spreadsheets and other data tables
– Extract data from other databases
– Design using normalization principles
• New Systems Development
– Create a data model from application requirements
– Transform the data model into a database design
• Database Redesign
– Migrate databases to newer databases
– Integrate two or more databases
– Reverse engineer and design a new database using normalization principles and data model transformation

Database Design From Existing Data
• Often involves importing data from existing spreadsheets or other files with tables of data.
• The data might also come from an existing operational database, such as a CRM or ERP application, and be brought into a new database used specifically for analysis and research. The source data might also come from a data warehouse.
• Database designers must design the appropriate structure for the new database.

Databases Originating From Existing Data

A Common Issue in Creating a Database from Existing Data
• How should multiple tables and files from the existing data be handled in the new database? Should a single table be maintained as a single table?

Resolving the Design Issue from Existing Data
• Database designers follow a rigorous set of design principles, called Normalization or Normal Forms, which guide us to sound database designs.
• Normalization is discussed later.

Database Design for New Systems Development
• In the development of new information systems for businesses, government and other organizations, databases are often designed "from scratch."
• The databases are based on the database designer's specification of the user forms and the reports to be produced for the organization.
• Part of database design is eliciting and specifying User Requirements.
• A data model is designed to graphically portray the User Requirements. The data model is then transformed into a database design.
– Data models, such as Entity Relationship (ER) data models, are also studied, along with how to transform data models into database designs that can then be implemented.

Databases Designed for New Information Systems

Database Redesign
• Databases often need to be adapted to new or changing requirements (often referred to as database migration).
• We often need to integrate two (or more) databases. Examples include adapting or retiring legacy systems, and enterprise application integration, where previously separate information systems are adapted to work with each other.

The Relational Database Model
• The dominant database model is the relational database model, which is based upon tables of data (called relations); the major commercial DBMS products are based on it.
• Created by E. F. Codd in 1970
• It was based on a mathematical theory called Relational Algebra.
• This course focuses on the relational database model.
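The following toy sketch gives a feel for the mathematics behind the relational model. It implements three relational algebra operators over relations stored as sets of tuples:
- selection (sigma);
- projection (pi);
- natural join.

It is an illustration only, assuming small in-memory data. A real DBMS evaluates queries with optimizers and indexes rather than these nested loops. The sample rows are based on the Student & Course tables, and the `Relation` type and function names are hypothetical.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A relation: a heading (attribute names) plus a set of tuples."""
    attrs: tuple
    rows: frozenset


def select(r, pred):
    """Selection (sigma): keep tuples for which pred(row_as_dict) is true."""
    return Relation(r.attrs,
                    frozenset(t for t in r.rows if pred(dict(zip(r.attrs, t)))))


def project(r, attrs):
    """Projection (pi): keep only the named attributes; the set drops duplicates."""
    idx = [r.attrs.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in r.rows))


def natural_join(r, s):
    """Natural join: combine tuples that agree on every shared attribute."""
    common = [a for a in r.attrs if a in s.attrs]
    extra = [a for a in s.attrs if a not in common]
    out = set()
    for t in r.rows:
        rt = dict(zip(r.attrs, t))
        for u in s.rows:
            su = dict(zip(s.attrs, u))
            if all(rt[a] == su[a] for a in common):
                out.add(t + tuple(su[a] for a in extra))
    return Relation(r.attrs + tuple(extra), frozenset(out))


STUDENT = Relation(("StudentId", "StudentName"), frozenset({
    (100, "Jones, Mary"), (200, "Parks, Franklin"), (300, "Thomas, Martha"),
}))
ENROLLMENT = Relation(("CourseNo", "Grade", "StudentId"), frozenset({
    ("BD100", "A", 100), ("BA402", "C", 200), ("BD200", "B", 300), ("CS100", "A", 300),
}))

# "Names of students who earned an A":  pi_StudentName(sigma_Grade='A'(ENROLLMENT join STUDENT))
a_students = project(
    select(natural_join(ENROLLMENT, STUDENT), lambda row: row["Grade"] == "A"),
    ["StudentName"],
)
print(sorted(a_students.rows))  # [('Jones, Mary',), ('Thomas, Martha',)]
```

The output of every operator is again a relation, so the operators can be nested freely. SQL's SELECT ... FROM ... JOIN ... WHERE builds on this same closure property.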