Download Notes (Wrapup)

Wrapup
Amol Deshpande
CMSC424

DBMS at a glance
- Data Models
  - Conceptual representation of the data
- Data Retrieval
  - How to ask questions of the database
  - How to answer those questions
- Data Storage
  - How/where to store data, how to access it
- Data Integrity
  - Manage crashes, concurrency
  - Manage semantic inconsistencies
- Not a fully disjoint categorization!

DBMS at a glance: Data Models
- E/R model, relational model
- Very simple and hence effective
  - Easy to make things complicated, very hard to keep them simple
- No other data model has survived for so long
- What is the future of XML?

DBMS at a glance: Data Retrieval
- How to ask questions of the database
  - Declarative languages are great
    - Hide complexity from users, can optimize things, can evolve easily
  - SQL: more or less declarative
- How to answer those questions
  - Parsing --> Optimization --> Processing
  - Operators: hashing, sorting, joins, aggregation
  - Data structures
    - Hash indexes: good for equality queries
    - Tree indexes (e.g., B+-trees): range queries and everything else
  - Optimization: complex, but a key piece of a database system

DBMS at a glance: Data Storage
- How/where to store data, how to access it
- Need to be cognizant of the memory hierarchy
  - Memory is cheap; disk is very expensive to access
  - Further, disk is cheap to access sequentially but much more expensive to access randomly
    - Many of our decisions are influenced by this
- RAID: surviving failures
- Accessing data: indexes
- What happens if a new form of storage comes along with different properties (say, holographic storage)?
  - We will need to rethink the tradeoffs, but we now know the approach

DBMS at a glance: Data Integrity
- Manage crashes, concurrency
  - Transactions, two-phase locking
  - Write-ahead logging
  - DBMSs are pretty much the last word on concurrency/recovery
    - OSs don't come close to supporting anything like that
- Manage semantic inconsistencies
  - Normalization, functional dependencies (FDs)
  - No easy tools to identify them, but we have learned how to think about them
    - Try to capture them in the E/R diagram as much as possible

Motivation: Data Overload
- We began the first lecture by discussing the data overload
  - Huge amounts of data are generated every day
  - Much faster than our ability to process it
- Increasing ability to capture more enterprise data
- Web, blogs, RSS feeds, etc.
- Multimedia
  - Flickr and cellphone cameras have led to a revolution in how people take pictures
  - Videos will be next
  - Not hard to imagine capturing every moment of your life
- Sensor/RFID data
  - Tiny sensors and RFID tags are just beginning to become ubiquitous
  - Billions of these, each generating a tiny amount of data every second, is still too much
- Biological/scientific data

Motivation: Data Overload
- Relational databases help for structured data
  - But they are increasingly not sufficient
  - Many things we want to do with data can't easily be expressed in SQL
    - E.g., with biological data, the web
  - Too much unstructured data
- Distributed data generation creates additional headaches
  - Almost impossible to collect all the data in one location
- Making sense of all this requires advances not only in data processing but also in data understanding/mining
  - Interdisciplinary efforts

Some Lessons from RDBMS
- We can still use the lessons learned from developing RDBMSs
- Data independence / abstraction is good
  - Hide details, even if initially it leads to inefficiency
- Look for structure
  - Even seemingly highly unstructured data might have structure
- Look for patterns in usage
  - Relational databases are fast because query processing is predictable
    - Unlike, say, OS workloads, which are very hard to optimize for
  - If you can identify patterns, you can probably optimize them
- Declarative languages are great
  - Say what you want, not how to get it
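As a small illustration of "say what you want, not how to get it" (and of data independence), here is a sketch using Python's built-in sqlite3 module. The `enroll` table, its rows, and the index name are hypothetical. The query states only the result wanted; adding an index is a physical change that leaves the query text untouched, and whether to use the index is left to the optimizer.

```python
import sqlite3

# In-memory database; the enroll table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enroll (student TEXT, course TEXT, grade INTEGER)")
conn.executemany(
    "INSERT INTO enroll VALUES (?, ?, ?)",
    [("alice", "CMSC424", 90), ("bob", "CMSC424", 75),
     ("carol", "CMSC420", 88), ("dave", "CMSC424", 82)],
)

# Declarative: WHAT we want (average grade per course), not HOW to compute it.
QUERY = "SELECT course, AVG(grade) FROM enroll GROUP BY course ORDER BY course"
before = conn.execute(QUERY).fetchall()

# Physical change only: add an index. The query text stays exactly the same.
conn.execute("CREATE INDEX idx_enroll_course ON enroll(course)")
after = conn.execute(QUERY).fetchall()

# Ask SQLite which plan it chose; the wording of each step varies by version.
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(row[-1])

print(after)
```

The answer is the same before and after the index exists; only the plan the system is free to pick may differ.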

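The Data Retrieval slide's rule of thumb (hash indexes for equality queries, tree indexes for everything else) can be seen in a toy sketch. Here a Python dict plays the hash index, and a sorted list searched with `bisect` stands in for a tree index. This is a simplification, not a B+-tree: the only property it shares is that keys are kept in order. The `rows` data (key, record id pairs) is hypothetical.

```python
import bisect

# Hypothetical (key, record id) pairs, e.g. salary -> row id.
rows = [(50, "r1"), (72, "r2"), (65, "r3"), (90, "r4"), (72, "r5")]

# Hash index stand-in: key -> list of record ids (expected O(1) per equality probe).
hash_index = {}
for key, rid in rows:
    hash_index.setdefault(key, []).append(rid)

# Ordered-index stand-in: entries sorted by key, like the leaf level of a B+-tree.
ordered = sorted(rows)
keys = [k for k, _ in ordered]

def hash_lookup(key):
    return hash_index.get(key, [])

def ordered_range(low, high):
    # Binary search to the first key >= low, then read entries in order up to high.
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return [rid for _, rid in ordered[lo:hi]]

def ordered_lookup(key):
    # Equality is just a degenerate range on an ordered index.
    return ordered_range(key, key)

def hash_range(low, high):
    # Hashing scatters keys, so a range query must examine every bucket.
    return sorted(rid for key, rids in hash_index.items()
                  if low <= key <= high for rid in rids)

print(hash_lookup(72), ordered_lookup(72))
print(ordered_range(60, 80))
```

Both structures answer the equality probe directly, but only the ordered one can answer a range by jumping to the start and scanning; the hash version has to touch every entry.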