Big Data and the Database Community
Daniel Abadi, Yale University

The Big Data phenomenon is the best thing that could have happened to the database community
* Despite other definitions related to the ‘3 Vs’, Big Data means BIG Data
* Which means we need scalable database systems
* Still two main components of Big Data
  * Performing data analysis at scale
  * Performing requests on data at scale

Database community has won the battle
* Some thought that MapReduce might replace traditional database technology as the primary means to perform analysis at scale
* Just about every MapReduce vendor has abandoned this goal
* Hadapt, Impala, Tez, and several others are in a race to see who can add the most traditional database execution technology to Hadoop fastest
* Everyone is going in the direction of cost-based optimizers, traditional database operators, and push-based query execution

The database community is losing the battle
* NoSQL systems still have very little traditional database technology inside (despite adding SQL interfaces)
* No race to add DB technology: why?
* Don’t blame CAP: CAP is only relevant when there’s a network partition
* We never figured out how to do ACID and active replication at scale
* Many new proposals make simplifying assumptions in order to handle scale
* It’s been 30 years: why can’t we build a distributed database that can handle distributed transactions over actively replicated data at scale?
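
As an aside on the "push-based query execution" mentioned among the Hadoop-side trends, here is a minimal Python sketch of the idea. It illustrates the general technique only and is not taken from Hadapt, Impala, or Tez. All names (`make_filter`, `make_project`, `scan`, `run_query`) and the sample columns `sku` and `qty` are hypothetical. In a push-based plan the scan drives execution and hands each row to the next operator, instead of the consumer pulling rows one at a time through an iterator.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Sink = Callable[[Row], None]  # an operator is a function that accepts pushed rows


def make_filter(predicate: Callable[[Row], bool], downstream: Sink) -> Sink:
    """Return an operator that forwards only rows satisfying the predicate."""
    def push(row: Row) -> None:
        if predicate(row):
            downstream(row)
    return push


def make_project(columns: List[str], downstream: Sink) -> Sink:
    """Return an operator that keeps only the named columns of each row."""
    def push(row: Row) -> None:
        downstream({c: row[c] for c in columns})
    return push


def scan(rows: Iterable[Row], downstream: Sink) -> None:
    """The data source drives the pipeline by pushing every row downstream."""
    for row in rows:
        downstream(row)


def run_query(rows: Iterable[Row]) -> List[Row]:
    """Hypothetical query: SELECT sku FROM rows WHERE qty > 10."""
    results: List[Row] = []
    # Build the pipeline from the sink backwards: project -> results list.
    pipeline = make_filter(lambda r: r["qty"] > 10,
                           make_project(["sku"], results.append))
    scan(rows, pipeline)
    return results
```

Calling `run_query` on a list of dictionaries returns the projected rows that pass the filter. Real engines push batches or compiled code rather than single Python dictionaries, so this shows only the control-flow direction.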