Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Microsoft SQL Server wikipedia , lookup
Concurrency control wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Relational algebra wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Clusterpoint wikipedia , lookup
Databases INTD 388 – Spring 2008 by A.Guercio and E. Walker Definitions • A database is a collection of related data • A database can be of any size and of varying complexity • A database management system (DBMS) is a collection of programs that enables users to define, create and maintain a database Database Nomenclature Data database, Data Warehouse: Repository: A A central database frequently that can provide used as anlarge, information storage Data Mart: searchable database authenticated with access facility, withAresearchers minimal analysis system according to the to of organized an institution's information. orall querying functionality. user's has that focus on It is populated of datadata from a variety Used toneeds. mirrorIt existing database specific to a particular of sources. fornon-compatible concurrent access to data. research project Data Management Repurposing: When data are archived, available data can be used for different purposes. Ex. Linear sequences captured to discover new genes are commonly Use: Many, including simulation, repurposed to support 3D statistical analysis, visualization, visualization of Of protein communication. course private data structures (i.e. not releasable) should be Data Life Cycle protected at this stage. Data Dictionary: Defines data format standards. Ex. Date of Birth: MM/DD/YY is tranformed to the standard Data of Birth: DD/MM/YY Creation & Acquisition: collection and insertion of data in the system Archiving: Making data available for future use. Issues: Space Requirements, Completeness of Data, Longevity, Security, Archival Process, … Data Life Cycle Database Architecture Centralized: Concentrates all organizational activity in one location. Disadvantages: When the central system crashes all the data are not available. Distributed: Supports a variety of hardware and software and uses physically disparate storage media, each computationally independent Disadvantages: We need to develop interfaces between databases (n DB require 2n-1 interfaces) What is a Database System? • Computerized System for the management of records. – Example: Oracle, Sybase, SQL Server, DB2… • The user of the system can perform various operations: • • • • • add a new file to the database; insert new data in an existing file; search for data; update data; remove data; remove file. Advantages of Database Systems • • • • • • • Redundancy Control Data Consistency and Integrity Data Sharing and Integration Security Improved Maintenance Concurrency (without data loss) Backup & Recovery services Disadvantages… • • • • • • Complexity Size Cost (Hardware & Software) Performance Conversion Risk of failure Example Bin Wine Producer Year Bottles Ready 2 Chardonnay Buena Vista 88 1 91 3 Chardonnay Louis Martini 89 5 90 21 Fume Blanc Robt. Mondavi 88 4 91 22 Fume Blanc Mirassou 87 2 90 72 Gamay Robt. Mondavi 85 2 90 Search SELECT WINE, BIN, PRODUCER FROM CELLAR WHERE READY=91; Result Wine Bin Producer Chardonnay 2 Buena Vista Fume Blanc 21 Robt. Mondavi Insertion INSERT INTO CELLAR VALUES (S3, ‘PINOT NOIR’, ‘SAINTSBURY’, 87, 1, 93); Update UPDATE CELLAR SET BOTTLES=4 WHERE BIN=3; Deletion DELETE FROM CELLAR WHERE BIN=2; Data Models The Relational Data Model • The relational model was introduced by E.F. Codd in 1970 • The model is based on the mathematical structure known as the relation – Informally, a relation resembles a table – More precisely, a relation is a set of n-tuples r = {t1, t2, …, tm} (i.e. a set of rows) and there is no order in the rows or in the tuples.Therefore in order to distinguish one row from another you need a unique key • Ex. of a customer record: S.S. First_Name Last_Name Address – S.S. is the unique key that distinguishes each customer record Parts of a Relational Database Schema Table (relation) Column* (attribute) name address age Row (tuple) * Values of an attribute are chosen from its domain, e.g. the domain of “age” is integers > 0 Relational Model Constraints • No two tuples are equal, thus some set of attributes must be unique, they are the key – If there are multiple keys in a schema, they are called candidate keys – One of the candidate keys may be designated as a primary key and be used to identify tuples in the relation – A foreign key is an attribute in a relation which is the primary key of another relation Ex. Customer(S.S., First_Name, Last_Name, Address) Checking Account(Account_Number, S.S., Balance) Relational Model Constraints • The entity integrity constraint states that no primary key value can be null – We don’t store descriptions of entities we can’t identify • Ex. We do not allow to insert a customer without inserting the S.S. • The referential integrity constraint is used to maintain consistency among the tuples of two relations • Ex. We do not insert the checking account of a customer for whom there is no data in the Customer table • The integrity constraints may be displayed diagrammatically by drawing an arrow from the relation with the foreign key to the relation with the primary key What is good in relational DB? • Advantage: We have data organized efficiently and in a way that saves space. – Ex. We do need to repeat all the information about the customer who has 500 accounts in our bank! – Flat files with their independent tables would not allow that. Flat File (One Table) Name Class Location Minnie Mouse English 101 BC H. Higgins Minnie Mouse Minnie Mouse Psych 240 Lucy van Pelt Accounting Hins. 209 110 Scrooge McD Goofy Psych 240 Lucy van Pelt Goofy English 101 BC Gerst. 120 Gerst. 120 Professor H. Higgins Relational Database Has Multiple Tables Schedule Class Enrollment Prof Name Class English 101 BC H. Higgins Minnie Mouse English 101 Psych 240 Lucy van Pelt Minnie Mouse Psych 240 Scrooge McD Minnie Mouse Accounting 110 Goofy Psych 240 Goofy English 101 Location Gerst. 120 Accounting Hins. 110 209 SQL A Relational DB Language • SQL (Structured Query Language) was developed at IBM as part of an experimental relational system called System R • Along with QUEL (based on the relational calculus), SQL is one of the earliest DB languages • SQL includes statements to define, manipulate and interrogate the DB • Ex: create table Student ( Name char[20], GradYear integer, ) Basic SQL Select statement • SELECT attribute • FROM table(s) • WHERE conditions • Examples – Select Class from Enrollment where Name = ‘Minnie Mouse’ – Select Class from Schedule where Location = ‘BC’ More Complex Queries • Multiple Tables joined together • More conditions in where, joined by and, or, not – Select Name, Class, Location from Schedule, Enrollment where Name = Goofy and Schedule.Class = Enrollment.Class – Select Prof, Name from Schedule, Enrollment where Prof = ‘Scrooge McD’ and Schedule.Class = Enrollment.Class More WHERE conditions SELECT staffNO, fName, lName, salary FROM Staff WHERE… …salary BETWEEN 20,000 and 30,000 …position IN {‘Manager’, ‘Supervisor’} …fName LIKE {‘Al%n’} Allen or Alan or Allison … …lName LIKE {‘_ _ _ _’} all 4-letter names …mName IS NOT NULL has a middle name Sorting the Results • List all rooms, sorted by price (most expensive first) SELECT hotelName, roomNo, roomType, price FROM Hotel, Room WHERE Hotel.hotelNo = Room.hotelNo ORDER BY price DESC; • Multiple sort keys can be specified, e.g. …ORDER BY lName, fName ASC; Databases in Bioinformatics • NCBI – – – – GenBank (sequences) PubMed (Literature) Human Genome Etc!