Course Overview: CS 395T Semantic Web, Ontologies and Cloud Databases
Daniel P. Miranker
1: Introduction (Data Management & Engineering)

Objectives:
• Get to know each other
• Set expectations

Course Requirements
• Several lab homeworks (completion grade)
  – Build an ontology
  – Write SPARQL queries
  – Simple Hadoop exercise
• 2 paper presentations
  – (may overlap the term project)
• Term project

Presentation Content
• Miranker will present about 1/2 of CS386d, Database Management Systems, in about 1/2 of the material’s normal time.
• Student presentations of papers. Attendance is required.

Papers
• Miranker will provide an initial set of papers.
• The remainder of the class will be crowd-sourced.
  – Students are each required to nominate at least 3 papers.
  – The list is compiled.
  – Each paper is assigned to 3 referees (just like a conference).
  – Miranker organizes the class from the referee reports.

Presentations
• Elements of public speaking
• Structure
  – Two presentations
    • improvement will be noted
  – Draft slides due one week ahead of time
    • will be reviewed in a one-on-one meeting
  – Feedback from the class
    • Miranker not in the room

Database Systems Getting Exciting Again
• ---> They weren’t exciting for a long time.

In the Recent Past, DBMSs Witnessed:
• Commoditization
  – Database ---> database management system ---> relational database management system
  – The canonical RDBMS architecture is a mainstay.
    • (and so are the architectures for operating systems and networks)

DBMS Architecture
[Figure: Query Engine, Transaction Manager, Storage Manager]

DBMS Architecture: Storage Manager
• Exploit the memory hierarchy to compensate for slow disks.
  – working sets (from OS)
  – search algorithms
• Specifics
  – manage a heap of disk pages
  – allocation of main memory (buffer management)
  – index methods, e.g. B+ tree (access paths)
[Figure: Storage Manager; RAM organized as page buffers; indexes; data blocks]

DBMS Architecture, 2: Transaction Manager
• Manage many users sharing a database (speed)
• Cope with machine crashes
  – Everything gets written at least 3 times.
  – Every DB write is also logged to redundant disks (a.k.a. stable store).
• ACID properties
  – Atomic
  – Consistent
  – Isolated
  – Durable
[Figure: Transaction Manager, log, Storage Manager, RAM]

DBMS Architecture, 3: Query Engine
• SQL execution environment
  – parse
  – compile to logical operators
  – optimize: choose a good set of access paths and a sequence of database operators (a.k.a. a physical plan)
[Figure: Query Engine, Transaction Manager, Storage Manager]

What Changed?
• Internet
• Moore’s law
  – Computing is forever getting cheaper
    • Processing
    • Storage
    • Bandwidth
  – People are not getting cheaper

Implications
• Business models founded on
  – in the asymptote, computing and bandwidth are _ _ e e
• Economy of scale
• People costs dominate
• Data centers
• Computing as services
• Massive application of commodity components (electricity?)

Software Engineering Implications
• A DBMS is a shared resource and the place to persist all data.
  Thus, unambiguously:
• The content and programming of a DBMS
  – is at the center of all the other software development
  – must interoperate with all the other software development

Three-Tier Architecture
• Three-tier architecture
  – Pervasive? Every hardware vendor sells a preloaded rack.

XML Has Become a Standard for Data Transfer

Service-Oriented Architecture
• Internet service := (usually) a remote database query or transaction

Three-Tier Architecture
• Three-tier architecture
  – Pervasive? Every hardware vendor sells a preloaded rack.
  – What does it mean if you know about databases? You’re a king.
  – What if you are a professor of Computer Science?

Definitions (old slide 1)
• Database*: a collection of data
• Database Management System (DBMS): a software system that provides a set of services on a database
*A word on notation: underlined terms are technical terms whose definitions I expect you to know well.

Examples (old slide 2)
• Relational database management system
• Operating systems
  – What about operating systems?
  – __________
  – __________
  – __________
• Facebook

RDBMS Architecture Motivated
• Core database architecture
  – How to cope with disks
    • The only [computational] moving part
    • It’s not changing.

What About Disks?
• Why do computers have disks? (good)
  – inexpensive, large, persistent storage
    • persistent storage: data is unaltered if the power goes off
• Why do we wish they didn’t? (bad)
  – slow
    • 8-12 msec. seek time; ~0.1 of that in rotational latency
  – they break

Solid State Disk Drives! (SSD)
• Have been promised for 40 years
• Long-term impact on DBMS architecture promises to be great.
• Current impact: negligible.
• Let’s look at the real numbers: ___________

Renaissance of Database Research
• Semantic Web
• Cloud databases (NoSQL)
• Other specialized databases
  – General-purpose database: all things to all people
    • amortize the cost of the product over the largest possible market
  – Today the market is so large and the functionality so broad that
    • a focused feature set ---> a more effective product
    • market fragments are large enough to support specialized products

[Figure: Data Integration, Query Engine, Transaction Manager, Storage Manager]
Semantic Web
• Knowledge-base techniques to simplify large-scale systems
• SPARQL/linked data query
NoSQL Cloud Databases
• Non-ACID transaction models
• Fault tolerance through redundancy vs. stable store

Renaissance of Database Research
• Semantic Web
  – started with a focus on search
  – now, data interchange and data integration
• Cloud databases
• Other specialized databases
  – General-purpose database: all things to all people
    • amortize the cost of the product over the largest possible market
  – Today the market is so large and the functionality so broad that
    • a focused feature set ---> a more effective product
    • market fragments are large enough to support specialized products
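The Storage Manager slides describe RAM organized as page buffers over a heap of disk pages (buffer management), used to compensate for slow disks. The Python sketch below illustrates that idea under stated assumptions: a plain dict stands in for the disk, and pages are evicted least-recently-used (LRU) first, a policy borrowed from OS page replacement in the spirit of the "working sets (from OS)" bullet. The slides do not prescribe a replacement policy, and `BufferPool`, `get_page`, `write_page`, and `flush_all` are hypothetical names, not any particular DBMS's API.

```python
from collections import OrderedDict


class BufferPool:
    """Hypothetical buffer manager: caches disk pages in a fixed number of RAM frames.

    Assumption: `disk` is a plain dict standing in for the heap of disk pages;
    each miss is where a real storage manager would pay a disk seek.
    """

    def __init__(self, disk, capacity):
        self.disk = disk             # page_id -> page contents ("slow" storage)
        self.capacity = capacity     # number of page buffers (frames) in RAM
        self.frames = OrderedDict()  # page_id -> [contents, dirty], least recently used first
        self.hits = 0
        self.misses = 0

    def get_page(self, page_id):
        """Return a page, reading it from disk only if it is not already buffered."""
        if page_id in self.frames:
            self.hits += 1
            self.frames.move_to_end(page_id)  # now the most recently used frame
            return self.frames[page_id][0]
        self.misses += 1
        if len(self.frames) >= self.capacity:
            self._evict()
        contents = self.disk[page_id]
        self.frames[page_id] = [contents, False]
        return contents

    def write_page(self, page_id, contents):
        """Update a page in RAM; the disk write is deferred until eviction or flush."""
        self.get_page(page_id)                  # make sure the page is buffered
        self.frames[page_id] = [contents, True]  # mark dirty; position stays most recent

    def _evict(self):
        """Drop the least recently used frame, writing it back only if it is dirty."""
        victim, (contents, dirty) = self.frames.popitem(last=False)
        if dirty:
            self.disk[victim] = contents

    def flush_all(self):
        """Write every dirty buffered page back to disk."""
        for page_id, frame in self.frames.items():
            if frame[1]:
                self.disk[page_id] = frame[0]
                frame[1] = False


if __name__ == "__main__":
    disk = {i: f"page-{i}" for i in range(5)}
    pool = BufferPool(disk, capacity=2)
    for pid in [0, 1, 0, 2, 1]:
        pool.get_page(pid)
    print(pool.hits, pool.misses, list(pool.frames))  # prints: 1 4 [2, 1]
```

With two frames, the access sequence 0, 1, 0, 2, 1 yields one hit and four misses. Writes stay in RAM and reach the disk only when a dirty page is evicted or flushed, which is how buffering hides disk latency for repeated accesses to the same pages.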
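The Transaction Manager slide says every database write is also logged to stable store so the system can cope with machine crashes. One simple scheme that fits that description is redo-only (deferred-update) logging with a no-steal policy: a transaction's writes stay in RAM until commit, at which point its update records and then a commit record are appended to the log before the committed values become visible. The sketch below assumes a Python list models the log on redundant disks and that an append is durable once it returns; `RedoLogDB` and its methods are hypothetical, and the slides do not say which logging scheme they have in mind. Many production systems instead log both undo and redo information (for example, ARIES-style recovery) so that uncommitted pages may be written to disk.

```python
class RedoLogDB:
    """Hypothetical key-value store illustrating redo-only (deferred-update) logging.

    Assumptions: `log` models the log on redundant disks (stable store) and
    `disk` models data pages; both survive a crash. `buffer` and `pending`
    model RAM and are lost on a crash.
    """

    def __init__(self):
        self.log = []      # stable store: an append is assumed to be a forced write
        self.disk = {}     # data pages on disk
        self.buffer = {}   # RAM page buffers holding committed values only
        self.pending = {}  # txn -> {key: value}: uncommitted writes, RAM only

    def write(self, txn, key, value):
        # No-steal: an uncommitted write never leaves this transaction's workspace.
        self.pending.setdefault(txn, {})[key] = value

    def commit(self, txn):
        writes = self.pending.pop(txn, {})
        for key, value in writes.items():
            self.log.append(("update", txn, key, value))  # log the changes first (write-ahead)
        self.log.append(("commit", txn))                  # the commit record makes them durable
        self.buffer.update(writes)                        # only then expose them in RAM

    def read(self, key):
        # Sees committed data only; a transaction's own pending writes are not consulted.
        return self.buffer.get(key, self.disk.get(key))

    def flush(self):
        # Safe at any time: everything in `buffer` is already described by the log.
        self.disk.update(self.buffer)

    def crash(self):
        # Simulate losing main memory; `log` and `disk` survive.
        self.buffer.clear()
        self.pending.clear()

    def recover(self):
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:
            if rec[0] == "update" and rec[1] in committed:
                _, _, key, value = rec
                self.disk[key] = value  # redo in log order (= commit order); idempotent


if __name__ == "__main__":
    db = RedoLogDB()
    db.write("T1", "x", 1)
    db.commit("T1")
    db.write("T2", "y", 2)  # T2 never commits
    db.crash()
    db.recover()
    print(db.read("x"), db.read("y"))  # prints: 1 None
```

After the simulated crash, recovery replays only T1's update, because T2 never wrote a commit record. Because each commit appends its update records contiguously, log order matches commit order, so replaying the log reproduces the last committed value of every key.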