Big Data and the Database Community

Daniel Abadi
Yale University
* The Big Data phenomenon is the best thing that could have happened to the database community
* Despite other definitions related to the ‘3 Vs’, Big Data means BIG Data
* Which means we need scalable database systems
* Still two main components of Big Data:
* Performing data analysis at scale
* Performing requests on data at scale
* Database community has won the battle
* Some thought that MapReduce might replace traditional database technology as the primary means to perform analysis at scale
* Just about every MapReduce vendor has abandoned this goal
* Hadapt, Impala, Tez, and several others are in a race to see who can add the most traditional database execution technology to Hadoop fastest
* Everyone is going in the direction of cost-based optimizers, traditional database operators, and push-based query execution
* The database community is losing the battle
* NoSQL systems still have very little traditional database technology inside (despite adding SQL interfaces)
* No race to add DB technology. Why?
* Don’t blame CAP: CAP is only relevant when there’s a network partition
* We never figured out how to do ACID and active replication at scale
* Many new proposals make simplifying assumptions in order to handle scale
* It’s been 30 years. Why can’t we build a distributed database that can handle distributed transactions over actively replicated data at scale?
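The CAP point above can be illustrated with a minimal sketch. It is not taken from the talk: the `Replica` and `ReplicatedRegister` classes, the `prefer_consistency` flag, and the two-replica setup are all hypothetical, and real systems are far more involved. The sketch only shows that the consistency-versus-availability choice arises when a partition is present; with no partition, a write reaches every replica and both properties hold.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a single replicated value (hypothetical, for illustration)."""
    value: int = 0


@dataclass
class ReplicatedRegister:
    """Toy actively replicated register with two replicas (an assumption)."""
    prefer_consistency: bool
    partitioned: bool = False
    replicas: list = field(default_factory=lambda: [Replica(), Replica()])

    def write(self, replica_id: int, value: int) -> bool:
        """Return True if the write is accepted, False if it is rejected."""
        if not self.partitioned:
            # No partition: every replica is reachable, so the write is
            # applied everywhere. Consistent and available at the same time.
            for replica in self.replicas:
                replica.value = value
            return True
        if self.prefer_consistency:
            # Partition, consistency preferred: reject the write,
            # giving up availability.
            return False
        # Partition, availability preferred: accept the write on the
        # reachable replica only, giving up consistency.
        self.replicas[replica_id].value = value
        return True

    def is_consistent(self) -> bool:
        """True when all replicas hold the same value."""
        return len({replica.value for replica in self.replicas}) == 1


# Usage: the trade-off only appears once `partitioned` is set.
reg = ReplicatedRegister(prefer_consistency=False)
reg.write(0, 1)          # accepted, replicas agree
reg.partitioned = True
reg.write(0, 2)          # accepted, replicas now diverge
```

The sketch deliberately ignores the hard part the slides point at: keeping ACID guarantees across actively replicated data when there is no partition.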