Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Microsoft SQL Server wikipedia , lookup
Oracle Database wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
Clusterpoint wikipedia , lookup
CSC443 Database Management Course Introduction Professor Pepper adapted from presentations given by Professor Juliana Freire & Karl Aberer & Yan Chen & Silberschatz, Korth and Sudarshan Today’s Goals Course Overview Why study databases? Why use databases? Intro to Databases Major Course Objectives Design and diagram relational databases Create Access and Oracle databases Use SQL commands Be able to design a good relational database Know how to get information out of a database to answer any question Diagramming Use Case Class Diagram Entity Relationship Diagram Algebraic Relation Model Tools Panther Unix Oracle 9.2.0.1.0 FTP Explorer – register for trial MS Access Books Database System Concepts 5th Ed Theory Cross Reference for fourth ed Oracle 9i Programming - A Primer Practical examples See course syllabus Available in Library Learning Resources Blackboard: my.adelphi.edu Web site Database System Concepts: www.db-book.com/ My office hours: Tuesday & Thursday 12:15-1:30; Wed 1212:30 Alumni 114 or Science Lab My email: [email protected] My phone: 516-747-2362 My Web: www.adelphi.edu/~pepperk Adelphi Account Setup Panther Oracle Blackboard E-mail Signin Sheet Projects / Grading Projects: 40% Access – 15 Oracle - 25 Homework assignments: 20% Midterm: 20% Final: 20%. Assignments 2% dropped for anything 1 day late. 10% dropped for anything 2 weeks late. Delivering assignments Email ftp drop box discussion board mailbox in math department E-mail me if making a change in delivery place. forward your email from Adelphi What is a Database Management System? Database Management System = DBMS A collection of files that store the data A big program written by someone else that accesses and updates those files for you Relational DBMS = RDBMS Data files are structured as relations (tables) Why Study Databases? What is behind this Web Site? http://www.ticketmaster.com/ Search on a large database Specify search conditions Many users Updates Access through a web interface Central to Modern Computer Science Database Systems: Then Database Systems: Today Field is developing quickly From Friendster.com on-line tour Other databases you may use Databases are EVERYWHERE Current Commercial Outlook A major part of the software industry: Oracle, IBM, Microsoft, Sybase also Informix (now IBM), Teradata smaller players: java-based dbms, devices, OO, … Well-known benchmarks (esp. TPC) Lots of related industries data warehouse, document management, storage, backup, reporting, business intelligence, app integration Relational products dominant and evolving adapting for extensibility (user-defined types), adding native XML support. Open Source coming on strong MySQL, PostgreSQL, BerkeleyDB Why Study Databases?? ? Need exploded Corporate: retail swipe/clickstreams, “customer relationship mgmt”, “supply chain mgmt”, “data warehouses”, etc. Scientific: digital libraries, Human Genome project, NASA Mission to Planet Earth, physical sensors, grid physics network Why study databases? Data is valuable: bank account records, tax records, student records… Protect It! - no matter what • Hurricane • Flood • Human error Why study databases? Data often structured: Example: Bank account records all follow the same structure We can exploit this regular structure To retrieve data in useful ways (that is, we can use a query language) To store data efficiently Why Study Databases Summary Central to modern computer science Databases are everywhere Commercially successful Fast moving technology Plethora of structured data that business and people need What is a database? Whiteboard Exercise Database Definition Database – a very large, integrated collection of data. (the stuff) Models a real-world enterprise Entities (e.g., teams, games) Relationships (e.g., The Forty-Niners are playing in The Superbowl) Database Management System – software that stores and manages databases (the tools) Database is better than simple file system because: Data redundancy, inconsistency and isolation Difficult to access Integrity problems Atomicity of updates (change one file and die before the other completes) Multiple user issues So a Database Has: representing information data modeling languages and systems for querying data complex queries with real semantics* over massive data sets concurrency control for data manipulation controlling concurrent access ensuring transactional semantics reliable data storage maintain data semantics even if you pull the plug • * semantics: the meaning or relationship of meanings of a sign or set of signs Why Use a Database Why use a database presentation What is in a database? Describing Data: Data Models A data model is a collection of concepts for describing data. A schema is a description of a particular collection of data, using a given data model. A relation is the data stored in a certain schema The relational model of data is the most widely used model today. Entities and relations among them Integrity constraints and business rules Perspective dependent (warehouse & sales view item differently) Database Design The process of designing the general structure of the database: Logical Design – Deciding on the database schema. Business decision – What attributes Computer Science decision – What relation schemas Physical Design – Deciding on the physical layout of the database Data Models A collection of tools for describing Data Data relationships Data semantics Data constraints Relational model Entity-Relationship data model (mainly for database design) Object-based data models (Object-oriented and Object-relational) Semistructured data model (XML) Other older models: Network model Hierarchical model The Entity-Relationship Model Models an enterprise as a collection of entities and relationships Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects • Described by a set of attributes Relationship: an association among several entities Represented diagrammatically by an entity-relationship diagram: Relational Model ER for concept map to Algebraic Relational Model Relations (tables of possible data) Instance (actual data at a given time) Schema (description of those tables, their relations) Relational Model Terminology Relational Model Look Notation: p(r) p is called the selection predicate Defined as: p(r) = {t | t r and p(t)} Where p is a formula in propositional calculus consisting of terms connected by : (and), (or), (not) Each term is one of: <attribute>op <attribute> or <constant> where op is one of: =, , >, . <. Example of selection: branch_name=“Perryridge”(account) Object-Relational Data Models Extend the relational data model by including object orientation and constructs to deal with added data types. Allow attributes of tuples to have complex types, including non-atomic values such as nested relations. Preserve relational foundations, in particular the declarative access to data, while extending modeling power. Provide upward compatibility with existing relational languages. Design Goals Design Goals: Avoid redundant data Ensure that relationships among attributes represented Ensure constraints are properly modeled: updates check for violation of database integrity constraints. Bad Design Queries What the programmer sees Some Basic SQL Commands Select – Get rows of data * - everything From – the name of the table (relation) will follow Where – Only get the stuff that matches Example: Select * from movies where theater = Loews Exercise – Write down the query to select all of your friends that live in NY State Example: University Database Conceptual schema: Students(sid: string, name: string, login: string, age: integer, gpa:real) Courses(cid: string, cname:string, credits:integer) Enrolled(sid:string, cid:string, grade:string) External Schema (View): Course_info(cid:string,enrollment: integer) Physical schema: Relations stored as unordered files. Index on first column of Students. Key to good performance View 1 View 2 View 3 Conceptual Schema Physical Schema DB Data Independence (levels of abstraction) Applications insulated from how data is structured and stored. Logical data independence: Protection from changes in logical structure of data – stablize views. Physical data independence: Protection from changes in physical structure of data. Q: Why are these particularly important for DBMS? View 1 View 2 View 3 Conceptual Schema Physical Schema DB Queries Change and get data from a database Run over data model Easy & efficient Not good for complex calculations DML and DDL Data Manipulation Language (DML) Language for accessing and manipulating the data organized by the appropriate data model DML also known as query language Two classes of languages Procedural – user specifies what data is required and how to get those data Declarative (nonprocedural) – user specifies what data is required without specifying how to get those data SQL is the most widely used query language Data Definition Language (DDL) Specification notation for defining the database schema Example: create table account ( account-number char(10), balance integer) DDL compiler generates a set of tables stored in a data dictionary Data dictionary contains metadata (i.e., data about data) Database schema Data storage and definition language • Specifies the storage structure and access methods used Integrity constraints • Domain constraints • Referential integrity (references constraint in SQL) • Assertions Authorization Queries - What does it look like? SELECT SELECT eid, E.loc, ename, COUNT DISTINCT (E.eid) title AVG(E.sal) FROM E, Proj FROMEmp Emp E P, Asgn A WHERE E.eid = A.eid GROUP E.sal BY E.loc WHERE > $50K AND P.pid = A.pid HAVING Count(*) AND E.loc <> P.loc> 5 System handles query plan generation & optimization; ensures correct execution. Count Having distinct Group(agg) Join Select Join Emp Proj Emp Emp Asgn Employees Projects Assignments Issues: view reconciliation, operator ordering, physical operator choice memory management, access path (index) use, … SQL SQL: widely used non-procedural language Example: Find the name of the customer with customer-id 192-83-7465 select customer.customer_name from customer where customer.customer_id = ‘192-83-7465’ Example: Find the balances of all accounts held by the customer with customer-id 192-83-7465 select account.balance from depositor, account where depositor.customer_id = ‘192-83-7465’ and depositor.account_number = account.account_number Application programs generally access databases through one of Language extensions to allow embedded SQL Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database For us: Oracle and Access SQL languages A Look underneath Concurrency Control Concurrent execution of user programs: key to good DBMS performance. Disk accesses frequent, pretty slow Keep the CPU working on several programs concurrently. Interleaving actions of different programs: trouble! e.g., account-transfer & print statement at same time DBMS ensures such problems don’t arise. Users/programmers can pretend they are using a singleuser system. (called “Isolation”) Thank goodness! Don’t have to program “very, very carefully”. Transactions: ACID Properties Key concept is a transaction: a sequence of database actions (reads/writes). DBMS ensures atomicity (all-or-nothing property) even if system crashes in the middle. Each transaction, executed completely, must take the DB between consistent states or must not run at all. DBMS ensures that concurrent transactions appear to run in isolation. DBMS ensures durability of committed Xacts even if system crashes. DBMS can enforce simple integrity constraints on the data. These layers must consider concurrency control and recovery Structure of a DBMS A typical DBMS has a layered architecture. The figure does not show the concurrency control and recovery components. Each database system has its own variations. Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB Overall System Structure Databases make these folks happy ... DBMS vendors, programmers $20 million industry Oracle, IBM, MS, Sybase, … End users Business, education, science, … DB application programmers Eg smart webmasters Build web services that run off DBMSs Database administrators (DBAs) Design logical/physical schemas Handle security and authorization Data availability, crash recovery Database tuning as needs evolve …must understand how a DBMS works Summary What is a database – lots of data organized into entities and schemes with a manager Why study databases? – common use, needed for programming apps Why use databases? – all the advantages over flat file systems Intro to Databases Logical layer: Query language, data models, transactions Physical layer Actual files with indexes, query processing, concurrency, recovery & logs