Download xldb-eur - CERN Indico

Intel “Big Data” Science and Technology Center
Michael Stonebraker
Context
• Intel held a national “beauty contest” to locate their next S & T center
• MIT won, with a “Big Data” proposal
  — 160 proposals
• $2.5M per year for 3-5 years plus 5 Intel scientists
• 20 PIs, half at MIT
Big Data Means What?
• Volume too large
  — Stupid analytics (i.e. SQL)
    • solved by commercial data warehouse products
  — Smart analytics (predictive modelling, machine learning, …)
• Velocity too big
  — Drink from a firehose
• Variety too large
  — Data integration problem
• And what does this mean to computer architecture?
Big Data Means What?
• Volume too large – smart analytics
  — Array data bases
  — Parallel algorithms
  — Integration of linear algebra
  — Scalable visualization
• Velocity too big
  — Main memory DBs
• And what does this mean to computer architecture?
  — Many core
  — Son-of-flash
  — Xeon Phi
Array Data Bases
• Elasticity in SciDB
• Query optimizer for SciDB
• Genomics benchmark
  — Run on SciDB, SciDB + Phi, column stores, row stores, MADlib, Hadoop
• Graphs as sparse arrays
• EarthDB
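To make the “graphs as sparse arrays” item concrete: a directed graph can be held as a sparse two-dimensional array in which cell (i, j) is non-empty exactly when there is an edge from i to j. The sketch below is plain Python with a dictionary keyed by coordinates. It is not SciDB syntax or its API, and the names `make_sparse_array`, `neighbors` and `two_hop` are hypothetical, chosen only for illustration.

```python
# A graph stored as a sparse 2-D array: only non-empty cells are kept.
# Cell (src, dst) holds the edge weight; an absent cell means "no edge".
# Plain-Python illustration only; this is not SciDB syntax or its API.

def make_sparse_array(edge_list):
    """Build {(row, col): value} from (src, dst, weight) triples."""
    return {(src, dst): weight for src, dst, weight in edge_list}

def neighbors(array, node):
    """Out-neighbors of `node` are the non-empty cells in row `node`."""
    return sorted(dst for (src, dst) in array if src == node)

def two_hop(array):
    """Count length-2 paths: a sparse A x A product with weights ignored."""
    by_row = {}
    for (src, dst) in array:
        by_row.setdefault(src, []).append(dst)
    result = {}
    for (i, k) in array:
        for j in by_row.get(k, []):
            result[(i, j)] = result.get((i, j), 0) + 1
    return result

graph = make_sparse_array([(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 3, 1.0)])
print(neighbors(graph, 0))
print(two_hop(graph))
```

Graph operations then become array operations: a neighbor lookup is a row slice, and path counting is a sparse matrix product.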
Scalable Algo
• Parallelizing locality sensitive hashing
• Other algo people are going to work in other areas
  — Pick your favorite algo, parallelize it and make it scale
• Scalable Julia
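As a rough illustration of why locality sensitive hashing parallelizes well: each point is hashed independently, so the hashing step is a plain parallel map, and only the final grouping into buckets needs to see all signatures. The sketch below assumes the random-hyperplane (cosine) variant of LSH, which the slides do not specify, and all function names are hypothetical. It uses a thread pool only to show the structure; in CPython, real speedups for this CPU-bound step would need processes or native code.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw `num_bits` random hyperplanes (normal vectors) in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)

def build_buckets(vectors, hyperplanes, workers=4):
    """Hash every vector independently (the parallel step), then group by signature."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sigs = list(pool.map(lambda v: signature(v, hyperplanes), vectors))
    buckets = {}
    for index, sig in enumerate(sigs):
        buckets.setdefault(sig, []).append(index)
    return buckets

planes = make_hyperplanes(num_bits=8, dim=3)
points = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0]]
buckets = build_buckets(points, planes)
print(buckets)
```

Vectors pointing in the same direction land in the same bucket, so candidate near neighbors are found by bucket lookup rather than by comparing all pairs.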
Integration of Linear Algebra
• Hardly anybody can beat BLAS/LAPACK/ScaLAPACK
  — 10^5 difference between Python and Intel-optimized C++
  — If you write operation X, chances are you will lose to Jack Dongarra by an order of magnitude
  — Don’t fight the wizard
Integration of Linear Algebra
• DBMS + ScaLAPACK
  — Federation required
  — Resource manager required
  — Recoverable ScaLAPACK required
• Someday
  — A common storage format
  — Would make ACID much easier, …
Visualization
• Resolution reduction
  — Using “explain”
• Choose the rendering automatically
  — Decision tree
• Smart prefetch
• Integrate with SciDB backend and Stanford visualizer front end
High Velocity
• Big pattern – little state
  — Find me a “banana” followed within 10 msec by a strawberry
  — Historically CEP
• Big state – little pattern
  — Assemble my global real-time risk
  — Main memory DBMS
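The “big pattern, little state” case can be sketched in a few lines: to detect a banana followed within 10 msec by a strawberry, the only state that must be kept is the timestamp of the most recent banana. The code below is a minimal plain-Python illustration, not how any particular CEP engine works. The event format `(timestamp_ms, name)` and the rule that each banana matches at most one strawberry are assumptions made for the example.

```python
def banana_then_strawberry(events, window_ms=10):
    """Yield (banana_ts, strawberry_ts) pairs where a strawberry follows
    the most recent banana within `window_ms` milliseconds.

    `events` is an iterable of (timestamp_ms, name) in timestamp order.
    The only state kept is the timestamp of the latest unmatched banana.
    """
    last_banana = None
    for ts, name in events:
        if name == "banana":
            last_banana = ts
        elif name == "strawberry" and last_banana is not None:
            if ts - last_banana <= window_ms:
                yield (last_banana, ts)
            # Assumption: a banana is consumed by the first strawberry after it.
            last_banana = None

stream = [(0, "banana"), (4, "apple"), (7, "strawberry"),
          (20, "banana"), (35, "strawberry"), (40, "strawberry")]
matches = list(banana_then_strawberry(stream))
print(matches)
```

The second banana finds no match because its strawberry arrives 15 msec later, outside the window. The “big state, little pattern” case is the opposite shape: the rule is trivial, but answering it requires a large, continuously updated table.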
High Velocity
• Lots of commonality between CEP and main memory (MM) DBMSs
• We are adding queues/windows to H-Store
• It’s clear we will do ACID – CEP as fast as CEP
• I predict the death of CEP
High Velocity – Other Predictions
• Death of ARIES
  — Command logging much faster than data logging
• Death of disk-oriented OLTP data bases
  — H-Store with anti-caching is wildly faster than MySQL with or without Memcached
• Trying an emulator for “son of flash”
  — Will make MM DBMSs even more attractive
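The difference between the two logging styles can be sketched as follows. Data logging writes a before/after image for every row a transaction changes, while command logging writes only the procedure name and its arguments and relies on deterministic replay at recovery. This is a toy in-memory illustration under those assumptions, not the H-Store or ARIES implementation. The names `transfer`, `PROCEDURES` and the `run_with_*` helpers are hypothetical.

```python
def transfer(db, src, dst, amount):
    """A deterministic 'stored procedure' that touches two rows."""
    db[src] -= amount
    db[dst] += amount

PROCEDURES = {"transfer": transfer}

def run_with_data_log(db, log, name, *args):
    """Data logging: record before/after images of every row that changed."""
    before = dict(db)
    PROCEDURES[name](db, *args)
    for key in db:
        if db[key] != before.get(key):
            log.append((key, before.get(key), db[key]))

def run_with_command_log(db, log, name, *args):
    """Command logging: record only the procedure name and its arguments."""
    log.append((name, args))
    PROCEDURES[name](db, *args)

def recover_from_command_log(snapshot, log):
    """Replay the logged commands, in order, against a snapshot."""
    db = dict(snapshot)
    for name, args in log:
        PROCEDURES[name](db, *args)
    return db

snapshot = {"alice": 100, "bob": 50}
db_a, data_log = dict(snapshot), []
db_b, command_log = dict(snapshot), []
for amount in (10, 20):
    run_with_data_log(db_a, data_log, "transfer", "alice", "bob", amount)
    run_with_command_log(db_b, command_log, "transfer", "alice", "bob", amount)

print(len(data_log), len(command_log))
print(recover_from_command_log(snapshot, command_log))
```

The command log holds one small entry per transaction regardless of how many rows it touches, which is where the run-time saving comes from. The trade-off is that recovery must re-execute every transaction instead of just reapplying row images.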
Many Core
• 1000 cores will give major heartburn to all system software
  — Traditional DBMSs will collapse
• DBMSs cannot have shared data structures
  — H-Store approach
• Move the computation
  — Hardware-supported “move”
  — New concurrency control algorithms (revival of DORA?)
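One way to picture “no shared data structures” and “move the computation” together: split the data into partitions, give each partition to exactly one worker, and ship each operation to the worker that owns the key instead of letting many threads lock a common table. The sketch below is a software-only toy in Python threads, assuming a simple byte-sum routing function. It is not H-Store code, and the `Partition`, `increment` and `owner` names are hypothetical. The per-partition inbox is the only structure that more than one thread touches.

```python
import queue
import threading

class Partition:
    """One data partition owned by exactly one worker thread."""

    def __init__(self):
        self.data = {}
        self.inbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Only this thread ever touches self.data, so the data needs no locks.
        while True:
            task = self.inbox.get()
            if task is None:
                break
            func, reply = task
            reply.put(func(self.data))

    def submit(self, func):
        """Ship a function to the partition; returns a queue holding the result."""
        reply = queue.Queue(maxsize=1)
        self.inbox.put((func, reply))
        return reply

    def stop(self):
        self.inbox.put(None)
        self.thread.join()

def increment(key):
    """Build a task that bumps a counter inside the owning partition."""
    def task(data):
        data[key] = data.get(key, 0) + 1
        return data[key]
    return task

partitions = [Partition() for _ in range(4)]

def owner(key):
    """Route a key to its partition (deterministic byte-sum hash, an assumption)."""
    return partitions[sum(key.encode()) % len(partitions)]

replies = [owner(k).submit(increment(k)) for k in ["a", "b", "a", "c", "a"]]
results = [r.get() for r in replies]
for p in partitions:
    p.stop()
print(results)
```

Operations on the same key are serialized by the owning worker's queue, so the counter for "a" advances 1, 2, 3 without any lock on the data itself.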