Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Concurrency control wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Functional Database Model wikipedia , lookup
Relational model wikipedia , lookup
CPS 216: Advanced Database Systems Shivnath Babu Fall 2006 Outline for Today • What this class is about: Data management • What we will cover in this class • Logistics Application Data Management Query Query Query Data DataBase Management System (DBMS) Example: At a Company Query 1: Is there an employee named “Nemo”? Query 2: What is “Nemo’s” salary? Query 3: How many departments are there in the company? Query 4: How many employees have Salary >= 80K? Query 5: What is the name of “Nemo’s” department? Query 6: How many employees are there in the “Accounts” department? Employee Department ID Name DeptID Salary … ID Name … 10 Nemo 12 120K … 12 IT … 20 Dory 156 79K … 34 Accounts … 40 Gill 89 76K … 89 HR … 52 Ray 34 85K … 156 Marketing … … … … … … … … … DataBase Management System (DBMS) High-level Query Q Answer DBMS Data Translates Q into best execution plan for current conditions, runs plan Example: Store that Sells Cars Make Model OwnerID ID Name Owners of Honda Accords Honda Accord 12 12 Nemo who are <= Honda Accord 156 156 Dory 23 years old Join (Cars.OwnerID = Owners.ID) Filter (Make = Honda and Model = Accord) Cars Age 22 21 Filter (Age <= 23) Owners Make Model OwnerID ID Name Age Honda Accord 12 12 Nemo 22 Toyota Camry 34 34 Ray 42 Mini Cooper 89 89 Gill 36 Honda Accord 156 156 Dory 21 … … … … … … DataBase Management System (DBMS) High-level Query Q Answer DBMS Keeps data safe and correct despite failures, concurrent updates, online processing, etc. Data Translates Q into best execution plan for current conditions, runs plan DBMS is multi-user • Example Get account balance from database; If balance > amount of withdrawal then balance = balance - amount of withdrawal; dispense cash; store new balance into database; • Homer at ATM1 withdraws $100 • Marge at ATM2 withdraws $50 • Initial balance = $400, final balance = ? – Should be $250 no matter who goes first Final balance = $300 Homer withdraws $100: Marge withdraws $50: read balance; $400 read balance; $400 if balance > amount then balance = balance - amount; write balance; $350 if balance > amount then balance = balance - amount; $300 write balance; $300 Final balance = $350 Homer withdraws $100: Marge withdraws $50: read balance; $400 read balance; $400 if balance > amount then balance = balance - amount; $300 write balance; $300 if balance > amount then balance = balance - amount; write balance; $350 Concurrency control in DBMS • Similar to concurrent programming problems – But data is not all in main-memory • Appears similar to file system concurrent access? – Approach taken by MySQL initially; now MySQL offers better alternatives • But want to control at much finer granularity • Or else one withdrawal would lock up all accounts! Recovery in DBMS • Example: balance transfer decrement the balance of account X by $100; increment the balance of account Y by $100; • Scenario 1: Power goes out after the first instruction • Scenario 2: DBMS buffers and updates data in memory (for efficiency); before they are written back to disk, power goes out • Log updates; undo/redo during recovery DataBase Management System (DBMS) High-level Query Q Answer DBMS Keeps data safe and correct despite failures, concurrent updates, online processing, etc. Data Translates Q into best execution plan for current conditions, runs plan Summary of modern DBMS features • Persistent storage of data • Logical data model; declarative queries and updates ! physical data independence • Multi-user concurrent access • Safety from system failures • Performance, performance, performance – Massive amounts of data (terabytes ~ petabytes) – High throughput (thousands ~ millions transactions per minute) – High availability (¸ 99.999% uptime) Modern DBMS Architecture Applications SQL DBMS Parser Logical query plan Query Optimizer Physical query plan Query Executor Access method API calls Storage Manager Storage system API calls File system API calls OS Disk(s) Course Outline • 50% of the class is about core DBMS concepts – Query execution, query optimization, transactions, recovery, etc. – Textbook material • 50% of the class is on “what is happening today in data management” – – – – – Data streams Web search – Google, Yahoo! XML and data integration Data mining Sensor data management Using a Traditional DBMS User/Application Query Query … Loader Result Result … Table R Table S New Approach for Data Streams User/Application Register Continuous Query (Standing Query) Input streams Result Stream Query Processor Example Continuous (Standing) Queries • Web – Amazon’s best sellers over last hour • Network Intrusion Detection – Track HTTP packets with destination address matching a prefix in given table and content matching “*\.ida” • Finance – Monitor NASDAQ stocks between $20 and $200 that have moved down more than 2% in the last 20 minutes Course Outline • 50% of the class is about core DBMS concepts – Query execution, query optimization, transactions, recovery, etc. – Textbook material • 50% of the class is on “what is happening today in data management” – – – – – Data streams Web search – Google, Yahoo! XML and data integration Data mining Sensor data management New Challenges in DBMSs High-level Query Q Answer DBMS TeraBytes PetaBytes Data <CD> <TITLE>Empire B.</TITLE> <ARTIST>Bob Dylan</ARTIST> <COUNTRY>USA</COUNTRY> <COMPANY>Columbia </COMPANY> <PRICE>10.90</PRICE> </CD> Course Logistics • Recommended reference: Database Systems: The Complete Book, by H. Garcia-Molina, J. D. Ullman, and J. Widom • Web site: http://www.cs.duke.edu/education/courses/fall06/cps216 • Grading: – Homework Assignments 15% – Project 25% – Midterm 25% – Final 35% Summary: Data Management is Important • Core aspect of most sciences and engineering today • Core need in industry • Cool mix of theory and systems • Chances are you will find something interesting even if you primary interest is elsewhere