Wrapup
Amol Deshpande
CMSC424
DBMS at a glance
• Data Models
  • Conceptual representation of the data
• Data Retrieval
  • How to ask questions of the database
  • How to answer those questions
• Data Storage
  • How/where to store data, how to access it
• Data Integrity
  • Manage crashes, concurrency
  • Manage semantic inconsistencies
• Not a fully disjoint categorization!
DBMS at a glance
• Data Models
  • E/R model, relational model
  • Very simple and hence effective
  • Easy to make things complicated, very hard to keep them simple
  • No other data model has survived for so long
  • What is the future of XML?
DBMS at a glance
• Data Retrieval
  • How to ask questions of the database
    • Declarative languages are great
      – Hide complexity from users, can optimize things, can evolve easily
    • SQL
      – More or less declarative
  • How to answer those questions
    • Parsing → optimization → processing
    • Operators: hashing, sorting, joins, aggregation
    • Data structures
      – Hash indexes: good for equality queries
      – Tree indexes (e.g. B+-trees): for range queries and everything else
    • Optimization: complex, but a key piece of a database system
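To make the hash-vs-tree distinction concrete, here is a small Python sketch over a hypothetical in-memory table. A dict plays the role of the hash index; a sorted list searched with the standard bisect module stands in for a tree index. That substitution is an assumption: a real B+-tree keeps the same key order across disk pages, and this is only an in-memory approximation. The names rows, equality_lookup, and range_lookup are illustrative.

```python
import bisect
from collections import defaultdict

# Hypothetical table: (id, name, salary) rows.
rows = [
    (1, "ann", 50_000),
    (2, "bob", 72_000),
    (3, "cai", 61_000),
    (4, "dee", 72_000),
    (5, "eve", 45_000),
]

# Hash index on salary: key -> list of row positions. Expected O(1) equality
# lookups, but keys are scattered, so it cannot answer range predicates directly.
hash_index = defaultdict(list)
for pos, (_, _, salary) in enumerate(rows):
    hash_index[salary].append(pos)

# Tree-index stand-in (assumption): (salary, pos) pairs kept in sorted order
# and searched with binary search, giving O(log n) lookups like a B+-tree.
sorted_index = sorted((salary, pos) for pos, (_, _, salary) in enumerate(rows))
sorted_keys = [key for key, _ in sorted_index]


def equality_lookup(salary):
    """salary = v, answered with the hash index."""
    return [rows[p] for p in hash_index.get(salary, [])]


def range_lookup(lo, hi):
    """lo <= salary <= hi, answered with the ordered (tree-like) index."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return [rows[p] for _, p in sorted_index[start:end]]


print(equality_lookup(72_000))       # bob and dee
print(range_lookup(50_000, 65_000))  # ann and cai
```

Both structures can answer salary = 72000, but only the ordered one can answer the range predicate without examining every key, because hashing destroys the ordering between neighboring keys.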
DBMS at a glance
• Data Storage
  • How/where to store data, how to access it
  • Need to be cognizant of the memory hierarchy
    • Memory is cheap to access; disk is very expensive to access
    • Further, disk is cheap to access sequentially and much more expensive to access randomly
      – Many of our decisions are influenced by this
  • RAID: surviving failures
  • Accessing data: indexes
  • What happens if a new form of storage comes along with different properties (say, holographic storage)?
    • We will need to rethink the tradeoffs, but we now know the approach
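A back-of-the-envelope cost model makes "cheap sequentially, expensive randomly" concrete. The disk parameters below are assumed round numbers for a spinning disk (10 ms average positioning time, 100 MB/s transfer rate, 4 KB pages), not measurements, and the function names are illustrative.

```python
# Back-of-the-envelope disk cost model. All parameters are ASSUMED round
# numbers for a spinning disk, not measurements.
SEEK_PLUS_ROTATION_S = 0.010   # assumed average positioning time per random access
TRANSFER_BYTES_PER_S = 100e6   # assumed sustained transfer rate (100 MB/s)
PAGE_BYTES = 4096              # assumed page size (4 KB)


def sequential_read_time(num_pages):
    """Position the head once, then stream every page back to back."""
    return SEEK_PLUS_ROTATION_S + num_pages * PAGE_BYTES / TRANSFER_BYTES_PER_S


def random_read_time(num_pages):
    """Pay the positioning cost again for every single page."""
    return num_pages * (SEEK_PLUS_ROTATION_S + PAGE_BYTES / TRANSFER_BYTES_PER_S)


n = 10_000
seq = sequential_read_time(n)
rnd = random_read_time(n)
# sequential: 0.42 s, random: 100.41 s, ratio: 239x
print(f"sequential: {seq:.2f} s, random: {rnd:.2f} s, ratio: {rnd / seq:.0f}x")
```

Under these assumptions, reading 10,000 pages randomly takes roughly 240 times as long as reading them sequentially, because every random read pays the positioning cost. This is one reason a query optimizer may prefer a full sequential scan over an unclustered index when a query touches a large fraction of a table's pages.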
DBMS at a glance
• Data Integrity
  • Manage crashes, concurrency
    • Transactions, two-phase locking (2PL)
    • Write-ahead logging
    • DBMSs are pretty much the last word on concurrency/recovery
      – OSs don’t come close to supporting anything like that
  • Manage semantic inconsistencies
    • Normalization, functional dependencies (FDs)
    • Not easy to identify tools, but we have learned how to think about them
      – Try to capture them in the E/R diagram as much as possible
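A minimal Python sketch of how write-ahead logging supports crash recovery, under loudly simplified assumptions: the dict disk and the list log are hypothetical in-memory stand-ins for stable storage; every transaction is assumed to follow strict two-phase locking, so no transaction overwrites another's uncommitted data; and there are no log sequence numbers, checkpoints, or compensation log records as in a production recovery manager such as ARIES. The class WALStore and its methods are illustrative names only.

```python
# Minimal write-ahead logging sketch with redo/undo recovery.
# ASSUMPTIONS: in-memory stand-ins for stable storage; strict 2PL, so no
# transaction overwrites another's uncommitted value; no LSNs/checkpoints/CLRs.

_MISSING = object()  # marks "key did not exist before this write"


class WALStore:
    def __init__(self):
        self.disk = {}  # hypothetical stand-in for data pages on disk
        self.log = []   # hypothetical stand-in for the stable log

    def begin(self, txn):
        self.log.append(("BEGIN", txn))

    def write(self, txn, key, new_value):
        # WAL rule: the update record, carrying both the old (undo) and the
        # new (redo) value, is appended to the log BEFORE the page changes.
        old_value = self.disk.get(key, _MISSING)
        self.log.append(("UPDATE", txn, key, old_value, new_value))
        self.disk[key] = new_value

    def commit(self, txn):
        # A transaction counts as committed once its COMMIT record is logged.
        self.log.append(("COMMIT", txn))

    def recover(self):
        """Run after a crash: redo committed updates, then undo the rest."""
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        # Redo pass, oldest to newest: reinstall committed values that may
        # not have reached disk before the crash.
        for rec in self.log:
            if rec[0] == "UPDATE" and rec[1] in committed:
                _, _, key, _, new_value = rec
                self.disk[key] = new_value
        # Undo pass, newest to oldest: roll back writes of transactions
        # that never committed.
        for rec in reversed(self.log):
            if rec[0] == "UPDATE" and rec[1] not in committed:
                _, _, key, old_value, _ = rec
                if old_value is _MISSING:
                    self.disk.pop(key, None)
                else:
                    self.disk[key] = old_value


store = WALStore()
store.begin("T1")
store.write("T1", "A", 100)
store.commit("T1")
store.begin("T2")
store.write("T2", "B", 7)  # T2 never commits
del store.disk["A"]        # simulate T1's page not yet flushed at crash time
store.recover()
print(store.disk)          # {'A': 100}
```

Because the log always holds both values before the page is touched, recovery can repair either kind of damage: a committed write that never reached disk (redo) and an uncommitted write that did (undo).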
Motivation: Data Overload
• We began the first lecture by discussing data overload
  • Huge amounts of data are generated every day
    – Much faster than our ability to process it
• Increasing ability to capture more enterprise data
  • Web, blogs, RSS feeds, etc.
  • Multimedia
    – Flickr and cellphone cameras have led a revolution in how people take pictures
    – Videos will be next
    – Not hard to imagine capturing every moment of your life
  • Sensor/RFID data
    – Tiny sensors/RFID tags are just beginning to become ubiquitous
    – Billions of these, each generating a tiny amount of data every second, still produce too much
  • Biological/scientific data
Motivation: Data Overload
• Relational databases help for structured data
  • But they are increasingly not sufficient
  • Many of the things we want to do with data can’t be expressed in SQL
    – E.g., with biological data, the web
  • Too much unstructured data
• Distributed data generation creates additional headaches
  • Almost impossible to collect all the data in one location
• Making sense of this requires advances not only in data processing, but also in data understanding/mining
  • Interdisciplinary efforts
Some Lessons from RDBMS
• But we can use the lessons learned from developing RDBMSs
• Data independence / abstraction is good
  • Hide details, even if initially it leads to inefficiency
• Look for structure
  • Even seemingly highly unstructured data might have structure
• Look for patterns in usage
  • Relational databases are fast because query processing is predictable
    – Unlike, say, OS workloads, which are very hard to optimize for
  • If you can identify patterns, you can probably optimize for them
• Declarative languages are great
  • Say what you want, not how to get it
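A minimal sketch of "say what you want, not how to get it", using Python's standard sqlite3 module on a hypothetical emp table (the table, column, and index names are invented for illustration). The same question is answered once declaratively in SQL and once by hand; the hand-written version commits to one access path and one grouping strategy, while the SQL version leaves those choices to the engine.

```python
import sqlite3

# Hypothetical toy table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("ann", "db", 50000), ("bob", "os", 72000),
     ("cai", "db", 61000), ("dee", "os", 68000)],
)

# Declarative: state WHAT we want (departments whose average salary exceeds
# 60000); SQLite chooses HOW (scan order, grouping method, index use).
QUERY = """
    SELECT dept, AVG(salary) FROM emp
    GROUP BY dept HAVING AVG(salary) > 60000
    ORDER BY dept
"""
declarative = conn.execute(QUERY).fetchall()

# Imperative equivalent: we hard-code the access path (a full scan) and the
# grouping strategy (a Python dict of running sums and counts) ourselves.
totals = {}
for _name, dept, salary in conn.execute("SELECT name, dept, salary FROM emp"):
    s, c = totals.get(dept, (0, 0))
    totals[dept] = (s + salary, c + 1)
imperative = sorted((d, s / c) for d, (s, c) in totals.items() if s / c > 60000)

# A physical-design change: the declarative query text stays exactly the same.
conn.execute("CREATE INDEX emp_dept ON emp(dept)")
after_index = conn.execute(QUERY).fetchall()

print(declarative)  # [('os', 70000.0)]
print(imperative)   # [('os', 70000.0)]
print(after_index)  # [('os', 70000.0)]
```

Adding the index changes the physical design but not the query, which is the data-independence lesson in miniature; whether SQLite actually uses emp_dept for this particular query is left to its optimizer.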