Intel “Big Data” Science and Technology Center
Michael Stonebraker

Context
• Intel held a national “beauty contest” to locate their next S & T center
• MIT won, with a “Big Data” proposal
    — 160 proposals
• $2.5M per year for 3-5 years plus 5 Intel scientists
• 20 PIs, half at MIT

Big Data Means What?
• Volume too large
    — Stupid analytics (i.e. SQL)
        • solved by commercial data warehouse products
    — Smart analytics (predictive modelling, machine learning, …)
• Velocity too big
    — Drink from a firehose
• Variety too large
    — Data integration problem
• And what does this mean to computer architecture!

Big Data Means What?
• Volume too large – smart analytics
    — Array data bases
    — Parallel algo
    — Integration of linear algebra
    — Scalable vis
• Velocity too big
    — Main memory DBs
• And what does this mean to computer architecture!
    — Many core
    — Son-of-flash
    — Xeon Phi

Array Data Bases
• Elasticity in SciDB
• Query optimizer for SciDB
• Genomics benchmark
    — Run on SciDB, SciDB + Phi, column stores, row stores, MadLib, Hadoop
• Graphs as sparse arrays
• EarthDB

Scalable Algo
• Parallelizing locality sensitive hashing
• Other algo people are going to work in other areas
    — Pick your favorite algo, parallelize and make scale
• Scalable Julia

Integration of Linear Algebra
• Hardly anybody can beat BLAS/Lapack/Scalapack
    — 10 ** 5 difference between Python and Intel-optimized C++
    — If you write operation X, chances are you will lose to Jack Dongarra by an order of magnitude
    — Don’t fight the wizard

Integration of Linear Algebra
• DBMS + Scalapack
    — Federation required
    — Resource manager required
    — Recoverable Scalapack required
• Someday
    — A common storage format
    — Would make ACID much easier, …

Visualization
• Resolution reduction
    — Using “explain”
• Choose the rendering automatically
    — Decision tree
• Smart prefetch
• Integrate with SciDB backend and Stanford visualizer front end

High Velocity
• Big pattern – little state
    — Find me a “banana” followed within 10 msec by a strawberry
    — Historically CEP
• Big state – little pattern
    — Assemble my global real-time risk
    — Main memory DBMS

High Velocity
• Lots of commonality between CEP and MM DBMS
• We are adding queues/windows to H-Store
• It’s clear we will do ACID – CEP as fast as CEP
• I predict the death of CEP

High Velocity – Other Predictions
• Death of Aries
    — Command logging much faster than data logging
• Death of disk-oriented OLTP data bases
    — H-Store with anti-caching is wildly faster than MySQL with or without MemcacheD
• Trying an emulator for “son of flash”
    — Will make MM DBMSs even more attractive

Many Core
• 1000 cores will give major heartburn to all system software
    — Traditional DBMSs will collapse
• DBMSs cannot have shared data structures
    — H-Store approach
• Move the computation
    — Hardware-supported “move”
    — New concurrency control algorithms (revival of Dora?)
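
Illustration (not from the original slides): the “big pattern, little state” query from the High Velocity slide, a “banana” followed within 10 msec by a strawberry, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how H-Store or any CEP engine implements it. The event format, the function name `followed_within`, and the choice to treat the 10 msec bound as inclusive are all hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical event format: (timestamp in milliseconds, event name).
Event = Tuple[float, str]


def followed_within(events: Iterable[Event], first: str, second: str,
                    window_ms: float) -> List[Tuple[float, float]]:
    """Return (t_first, t_second) pairs where `second` arrives within
    `window_ms` of the most recent `first`. Assumes time-ordered input."""
    matches: List[Tuple[float, float]] = []
    last_first: Optional[float] = None  # the only state the pattern keeps
    for ts, name in events:
        if name == first:
            last_first = ts  # remember the latest "banana"
        elif name == second and last_first is not None:
            if ts - last_first <= window_ms:  # inclusive bound is an assumption
                matches.append((last_first, ts))
            last_first = None  # the pending "banana" is consumed either way
    return matches


stream = [(0.0, "banana"), (4.0, "apple"), (7.5, "strawberry"),
          (20.0, "banana"), (35.0, "strawberry")]
matches = followed_within(stream, "banana", "strawberry", window_ms=10.0)
print(matches)  # [(0.0, 7.5)]
```

The detector keeps a single timestamp as state, which is the “little state” half of the slide's contrast. The “big state, little pattern” case (global real-time risk) would instead need a large, transactionally updated store, which is where the slides point to a main memory DBMS.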