* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download What is a “database”? - Regis University: Academic Web Server for
Microsoft SQL Server wikipedia , lookup
Relational algebra wikipedia , lookup
Concurrency control wikipedia , lookup
Ingres (database) wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Functional Database Model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
ContactPoint wikipedia , lookup
Clusterpoint wikipedia , lookup
File and Databases CS208 File Organization The three principal file organizations are Sequential Direct Indexed Sequential Sequential File Organization Records physically stored one after another in order determined by key field Advantages: Very efficient when many or all records in a file need to be accessed Cost of tape and drives very low Disadvantages: Major drawback is very slow access time for a particular record. Must rewrite all records following record insertion. Direct File Organization Records stored at a specific address, determined by their key field Advantages: A mathematical technique called hashing converts the key field value into a corresponding address Record can be quickly accessed by going directly to its address Disadvantages: Must be done using random access storage (disk/optical) which have higher cost than sequential Can only use one key Indexed Sequential File Organization Compromise between direct and sequential methods Advantages: Records stored sequentially Index created that records the address of each record Good compromise between previous two methods Can have multiple index tables to use multiple keys Disadvantages: Slower than direct-access What is a “database”? A “database” is just a collection of related data. Databases can exist in many forms. Examples: Sheets of paper in folders in a file cabinet A book (think of it as a collection of sentences and illustrations) Books in a collection (e.g., a library) Sets of 3"x5" cards containing notes Maps and other geographic information systems Blood samples in a medical laboratory Common Elements Sets of data and information composed of, and/or represented by: bits alphanumeric symbols lines and shapes in drawings, pictures, and maps audio and video recordings actual substances A means by which the sets of data and information are organized in order to facilitate access to individual desired sets Examples of Access Methods Phone book - Collection of several independent databases, each consisting of names and corresponding phone numbers: Blue-pages governmental listings: primary arrangement alphabetical by type of government (city, county, state, federal), secondary arrangement alphabetical by agency within type of government, tertiary arrangement alphabetical by office within agency Examples of Access Methods (continued) White-pages personal listings: arranged alphabetically by surname within surname by first names Yellow-pages listings: primary arrangement by type of business, secondary arrangement alphabetically by company within type of business, plus various special groupings (e.g., restaurants by ethnic type) Flat File Databases (DBs) Flat-file DBs are like the DBs you can construct in a single spreadsheet page All the information in the DB is in one file consisting of one array of rows and columns. For example: SSN 123-45-6789 987-65-4321 567-89-0123 Surname Doe Smith Johnson First Name(s) John X. Martin Billy Bob Telephone Number 303-555-1234 720-111-2222 303-444-5555 Flat File Advantages/Disadvantages Advantages Simple Good for few records with few fields Disadvantages Unnecessary duplication of data or data redundancy Inconsistent, incomplete or inaccurate data, lacking data integrity Changes in data are difficult to implement Separate and isolated data with limited data sharing Data Heirarchcy in Computerized Databases Character - single letter, number or special character Field - a set of related characters Record - collection of related fields File - collection of related records Database - collection of related files Types of Databases Individual - often on a PC used by one person Company or shared - usually on a mainframe and managed by a database administrator Example: Common operational databases contain information about company operations Types of Databases Distributed databases - have data stored in multiple locations, but the data is accessible through communications networks Proprietary databases - created by an organization, and stored information is offered to others for a fee Examples include Dialog Information Services and Dow Jones Interactive Publishing Database Terminology Logical data view – How humans see things Physical data view – How things are stored in a computer Database Management System (DBMS) – Allows a user to deal with data in logical terms, without having to understand the computer's physical view. DataBase Management System (DBMS) DBMS – provides storage, retrieval, analysis, sorting, and printing of information in a database. DBMS is: • A collection of program independent, interrelated data • A set of programs to access the data • Information about a particular enterprise • An environment that is both convenient and efficient to use. Current DBMS Systems Mainframe database vendors: Oracle IBM DB2 Microsoft (SQL Server) Sybase Informix Desktop: MS Access Borland Paradox Some free database systems (Unix) : Postgres MySQL, mSQL Predator DBMS Uses The processing power of a DBMS allows it to: Sort Match Link Aggregate Skip fields Calculate Arrange Database Administrator (DBA) Coordinates all the activities of the database system. Must have a good understanding of the enterprise’s information resources and needs. Database Administrator (DBA) Database administrator's duties include: Database definition Database modification Granting user authority to access the database for security and privacy Specifying integrity constraints Acting as liaison with users Monitoring DB performance and responding to changes in requirements Decides strategy for backup and recovery Types of DBMS Users End-User: Non-specialist accessing data via a query language Naïve user accessing data via a special-purpose interface Performs data retrieval and update (extend/modify) Applications Programmer: Writes programs that use the DB by embedding queries to the DB in a HLL Develops interfaces for the naïve user DBMS Organization The four principal DBMS organizations are Hierarchical Network Relational Object-Oriented Hierarchical Databases Viewed as branches of an upside-down tree Each item is subordinate to its parent item Only one parent per item Any element (node) in the database is linked only to the elements directly above it and directly below it. If parent node is deleted, all the child nodes are as well New parent node must be created before adding a new child node No direct relationships between child nodes Hierarchical Databases Limited by rigid structure Typically require custom programming Example: Original computer-based databases were designed for banking. Hierarchical databases were appropriate for such purposes, e.g.: individual accounts can be grouped by family or business; sets of accounts, grouped by branch; accounts in different branches, grouped by city; accounts in different cities, grouped by state. Hierarchical Model Department DeptNum Courses CourseNum Name Students Name Section IDNum Name Course Professors IDNum Name - Must always begin at top to search for data - One parent per child, no other relationships allowed Network Databases Permits links among all components (i.e. elements can be linked to other elements anywhere in the database, not just those directly above and below) The interconnected design allows for access via multiple pathways Can be extremely difficult to manage The World-Wide Web is a very large example of a network database. Network Model Department DeptNum Students IDNum Name Professors Courses CourseNum Name Name Section IDNum Name - Added data paths across the tree (instead of just up and down) - Reduced time required to access data, but increased overhead space requirements Course Relational Databases A relational database is a set of one or more tables that together embody information about a set of related concepts and entities. The tables are connected (related) via fields within the table that are shared by a pair of tables. Relational Model Department DeptNum … … … Professors Name … … … IDNum … … … Courses Num … … … Name … … … DeptNum … … … Course … … … Students ProfID … … … Section … … … IDNum … … … Name … … … Course … … … Based on tables of objects (the data), rather than specific paths (ways to access the data) Relational Database Rules Each row is unique (distinct) Each column name is unique within a table It is permissible to have the same name for a column in two different tables in the same database. To distinguish between them we use a qualified name: Pet.Name vs Family.Name Relational Databases In a Relational DB, the information content of a table does not depend on either The order of the rows; or The order of the columns In other words, the rows and columns of a table can be rearranged at will without affecting the table's information content Relational DB Terminology Fields (Columns, Attributes) Num CS208 MT360 CS320 Name CS Fundamentals Calculus C Programming Credits 3 4 3 Records (Rows, Tuples) Courses File (Table) = Relation Rows (Tuples) = Records Columns (Attributes) = Fields Relationships between objects are defined by common attributes Primary Keys In a Relational DB, each table Must have a primary key (unique identifier) Must have no duplicate rows A primary key is A data attribute (column), or a combination of attributes, that uniquely identifies each record in the table. A simple key consists of a single attribute A composite key consists of two or more attributes Primary Keys (continued) Primary Key Provides unique way to identify each record Can be obvious from the structure of the table. If there is no easy natural choice, you can add a column containing a unique identifier. May consist of the entire record (especially with two-column tables, which occur often in the development of RDBs) Other Types of Keys Secondary Key A column that is used to aid in the retrieval of information from a table. A secondary key is not required to have unique values in each of its rows, Foreign Key A column used to retrieve information from one table (i.e., is a secondary key) that is also the primary key in another table. Foreign keys are a major tool in Relational DBs. Relational Database Design Design Goals: Avoid redundant data Ensure that relationships among attributes are represented Facilitate checking updates for violation of database integrity constraints Creating a Student Database Student ID First Name Middle Initial Last Name Home Address School Address Street Address City State Zip Home Phone Work Phone Cell Phone Course ID Course Name Course Instructor Database Normalization Optimize the Tables by: Storing each piece of data once and only once (i.e. Eliminate redundant data). Ensuring data dependencies make sense (only storing related data within a particular table). Maintaining data integrity. First Normal Form Eliminate repeating groups in individual tables. Create a separate table for each set of related data. Identify each set of related data with a primary key. 1st Normal Student Database Student Student ID First Name Middle Initial Last Name Phone Student ID Phone Type Phone Number StudentCourse Student ID Course ID Course Name Course Instructor Address Student ID Address Type Street Address City State Zip Second Normal Form Create separate tables for sets of values that apply to multiple records i.e. Remove partial data dependencies. Relate these tables using a foreign key. 2nd Normal Student Database Student Student ID First Name Middle Initial Last Name Phone Student ID Phone Type Phone Number StudentCourse Student ID Course ID Course Name Course Instructor Address Student ID Address Type Street Address Zip ZipCode Zip City State Third Normal Form Eliminate any fields that do not depend on the key. 3rd Normal Student Database Student Student ID First Name Middle Initial Last Name StudentCourse Student ID Course ID Phone Student ID Phone Type Phone Number Course Course ID Course Name Course Instructor Address Student ID Address Type Street Address Zip ZipCode Zip City State Relational Design Summary Data is stored in records, inside of tables Primary keys uniquely identify a record Foreign keys link data in one table to the primary key in another table Designs should maintain data integrity Normalization concepts should be used Dis/Advantages of Relational Datebases Disadvantages Require more overhead Advantages Cut down on needless repetition of information Ensure more accuracy Facilitate updating and deletion of information. Design avoids errors that occur when adding/deleting information from flat files Query Languages Query languages allow non-programmers to question the DBMS Structured Query Language (SQL) – the only standard structured query language Structured Query Language (SQL) Pronounced either "S, Q, L" or "sequel" Widely used standard set of commands and syntax for doing things with Relational DBMSs Used especially for query and retrieval Includes commands for defining Relational DBs, conducting transactions, storing data, etc. Each Relational DBMS also has additional features unique to it, because SQL does not handle all the practical details involved in using a Relational DB Select The select clause directs the DBMS to choose a subset of fields from one or more tables. For example, SELECT last_name, first_name chooses all last names and first names and will place them in this order in the result set. From clause The from clause directs the query toward one or more tables. from Student (directs the query to use the student table) Used with the SELECT clause: SELECT last_name, first_name FROM Student chooses the last and first name of all rows in Student table. Note: As long as the column names within the tables used in the from clause are unique, qualified names are not required. Where clause A where clause specifies a selection criterion that we will use to limit our choice of records. A where clause follows the name of the table. Format: SELECT field1,…, fieldN FROM tablename WHERE (boolean expression) Boolean Expressions Boolean expressions used in where clauses typically involve a field compared to another field or a field compared to a value. For example: WHERE GPA > 3.50 WHERE last_name > “M” WHERE amount_owed >= amount_paid Comparison Operators = is equal to > is greater than < is less than >= is greater than or equal to <= is less than or equal to not is not equal to Simple SQL Query Textbooks Title Price Category Publisher Java Intro 109.99 Computers Prentice Hall Calculus 129.99 Math Kaufman Advanced C 115.99 Computers Tech Inc Philosophy 83.99 Lib Arts Prentice Hall SELECT * FROM Textbooks WHERE category=‘Computers’ “selection” Title Price Category Publisher Java Intro 109.99 Computers Prentice Hall Advanced C 115.99 Computers Tech Inc Simple SQL Query Textbooks Title Price Category Publisher Java Intro 109.99 Computers Prentice Hall Calculus 129.99 Math Kaufman Advanced C 115.99 Computers Tech Inc Philosophy 83.99 Lib Arts Prentice Hall SELECT Title, Price, Publisher FROM Textbooks WHERE Price < 110 “selection” and “projection” Title Price Publisher Java Intro 109.99 Prentice Hall Philosophy 83.99 Prentice Hall Joins You can use more than one table in a query. The join clause tells the DBMS that you are using two tables. You must also specify the fields on which you are linking the files, using the ON clause. Example SELECT Textbooks.Title, Price, Required FROM Textbooks, CourseTexts ON Textbooks.Title = CourseTexts.Title WHERE Category = ‘Computers’ The dot (.) notation specifies the table then the field within the table to reconcile the ambiguity with the two names. Join SQL Query Textbooks Title Price Category Publisher Java Intro 109.99 Computers Prentice Hall CourseTexts Calculus 129.99 Math Kaufman Title Course Required Adv C 115.99 Computers Tech Inc Java Intro CS434 Yes Philosophy 83.99 Lib Arts Prentice Hall Adv C CS422 No SELECT Textbooks.Title, Price, Required FROM Textbooks, CourseTexts ON Textbooks.Title = CourseTexts.Title WHERE Category = ‘Computers’ Title Price Required Java Intro 109.99 Yes Adv C 115.99 No Database Concerns Privacy/Unauthorized Access to Data Data is easier to gather and exploit using a computer, so precautions must be taken to guard the data. Accuracy/Completeness Owner of database must ensure accuracy of the data. Users must take data with grain of salt must verify