Data Management Challenges in
the Human Brain Project
Thomas Heinis
Data-Intensive Applications & Systems Lab
Ecole Polytechnique Federale de Lausanne
Human Brain Project
• Goal: develop a platform to simulate the human brain!
• 10-year, 1 billion euro project to integrate all available knowledge/data
• Involves coordination of more than 200 interdisciplinary partners
• Led by Henry Markram
• Based on EPFL’s Blue Brain Project
[Figures: Single Neuron, 3D Model, Neocortex Model]
Human Brain Project Workflow
[Diagram: workflow stages: Literature Research, Experimental Observations, Analysis, Model Building, Medical Records/Data, Visualization, Simulation]
Data Deluge
• Medical data from hundreds of sources
• Model data alone:

                                        Present    Goal
  # of neurons                          1M         86 billion (full brain)
  Model data, coarse grained            270 GB     27 PB
  Model data, fine grained (3D mesh)    5.8 TB     0.5 EB

• Simulation trace: potentially infinite
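The jump from the present model sizes to the full-brain goal can be checked with a little arithmetic. The sketch below assumes that storage grows linearly with the number of neurons. That assumption is not stated on the slide, and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope scaling of the model sizes quoted above.
# Assumption (not from the slides): storage grows linearly with neuron count.

PRESENT_NEURONS = 1_000_000        # 1M neurons (present)
GOAL_NEURONS = 86_000_000_000      # 86 billion neurons (full brain)

# Decimal storage units, in bytes.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18


def scale_linearly(present_bytes: float) -> float:
    """Scale a present-day model size to the full-brain neuron count."""
    return present_bytes * (GOAL_NEURONS / PRESENT_NEURONS)


scale_factor = GOAL_NEURONS / PRESENT_NEURONS
coarse_goal_pb = scale_linearly(270 * GB) / PB   # coarse-grained model
fine_goal_eb = scale_linearly(5.8 * TB) / EB     # fine-grained 3D mesh model

print(f"scale factor:   {scale_factor:,.0f}x")
print(f"coarse grained: {coarse_goal_pb:.1f} PB")
print(f"fine grained:   {fine_goal_eb:.2f} EB")
```

Under this assumption the fine-grained estimate comes out at roughly 0.5 EB, in line with the slide. The coarse-grained estimate lands near 23 PB, somewhat below the quoted 27 PB, so the slide's figure presumably includes factors this simple model leaves out.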
HBP Data Management Challenges
• Querying of Petascale Data
• Data Privacy/Anonymization
• Cloud Analytics
• Quick & efficient access to raw data
• Distributed Workflow Execution
• Provenance/Reproducibility
• Data Personalization
Spatial Indexing
Applications: Model Building, Analysis
3D Spatial Range Query
[Figure: model size [gigabytes] versus simulation size [# of neurons, 1K to 1M] for 2006, 2008, and 2010; annotation: bottleneck in spatial analysis]
Overlap Problem
R-Trees: Hierarchy of Minimum Bounding Rectangles (MBR)
STEP 1: Index Construction
STEP 2: Query Execution
[Figure: range query over overlapping MBRs]
Overlap reduces performance and increases with the level of detail.
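To see why overlap hurts, consider the toy sketch below. It is not an R-tree implementation, and the `Box`, `mbr`, and `nodes_to_visit` names are hypothetical. Two leaf nodes hold elongated objects that cross each other, so their bounding boxes overlap, and a small range query has to read both nodes even though only one object actually matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned 3D box given by its min and max corners."""
    lo: tuple
    hi: tuple

    def intersects(self, other: "Box") -> bool:
        # Boxes intersect if their extents overlap on every axis.
        return all(
            self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
            for i in range(3)
        )


def mbr(boxes):
    """Minimum bounding box enclosing all the given boxes."""
    lo = tuple(min(b.lo[i] for b in boxes) for i in range(3))
    hi = tuple(max(b.hi[i] for b in boxes) for i in range(3))
    return Box(lo, hi)


def nodes_to_visit(node_mbrs, query):
    """Indices of nodes whose bounding box intersects the query."""
    return [i for i, m in enumerate(node_mbrs) if m.intersects(query)]


# Two leaf nodes with long, thin objects running in perpendicular directions.
node_a = [Box((0, 0, 0), (10, 1, 1)), Box((0, 2, 0), (10, 3, 1))]
node_b = [Box((4, 0, 0), (5, 10, 1)), Box((6, 0, 0), (7, 10, 1))]
node_mbrs = [mbr(node_a), mbr(node_b)]

query = Box((4.5, 1.2, 0), (4.8, 1.8, 1))

visited = nodes_to_visit(node_mbrs, query)
results = [b for b in node_a + node_b if b.intersects(query)]

print("nodes read:", len(visited))
print("objects returned:", len(results))
```

The query falls into the gap between the objects of the first node, so reading that node is wasted work. The finer the detail of the model, the more often bounding boxes overlap like this.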
“FLAT”, a Two-Phase Spatial Index*
Add reachability information during index construction.
1) SEEDING: Find an arbitrary object inside the query region.
2) CRAWLING: Retrieve the remaining results by traversing the neighbors.
*joint work with Farhan Tauheed, Laurynas Biveinis and Anastasia Ailamaki
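The two phases can be illustrated with a toy sketch. This is not the authors' implementation, and all function names are hypothetical. It rests on three simplifying assumptions: neighbor links are stored per object and connect objects whose boxes touch, seeding is a plain linear scan rather than an index lookup, and the objects inside the query region are connected to each other through those links.

```python
from collections import deque


def intersects(a, b):
    """True if two axis-aligned boxes, each given as (lo, hi), overlap or touch."""
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[i] <= bhi[i] and blo[i] <= ahi[i] for i in range(len(alo)))


def build_neighbors(boxes):
    """Index construction: record which objects touch each other.

    These links stand in for the reachability information (an assumption
    made for illustration only).
    """
    neighbors = {i: [] for i in range(len(boxes))}
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if intersects(boxes[i], boxes[j]):
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def range_query(boxes, neighbors, query):
    # Phase 1, seeding: find any one object inside the query region.
    seed = next((i for i, b in enumerate(boxes) if intersects(b, query)), None)
    if seed is None:
        return []

    # Phase 2, crawling: follow neighbor links, keeping objects in the region.
    found = {seed}
    frontier = deque([seed])
    while frontier:
        current = frontier.popleft()
        for n in neighbors[current]:
            if n not in found and intersects(boxes[n], query):
                found.add(n)
                frontier.append(n)
    return sorted(found)


# A chain of six unit cubes along the x axis; adjacent cubes touch.
boxes = [((i, 0, 0), (i + 1, 1, 1)) for i in range(6)]
neighbors = build_neighbors(boxes)

query = ((1.5, 0, 0), (3.5, 1, 1))
result = range_query(boxes, neighbors, query)
print(result)
```

In this sketch, once the seed is found the crawl only touches objects adjacent to results, so its cost depends on the result size rather than on how much bounding boxes overlap. If the objects inside the query region were not connected through neighbor links, this simplified crawl would miss some of them.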
Performance Evaluation
[Figure: time [seconds] versus dataset density [millions of elements per unit volume] for Hilbert R-Tree, STR R-Tree, PR-Tree, and FLAT]
Impact
Blue Brain Project:
• Part of the toolset used every day
• February 2012: first 1 million neuron model indexed
• Still 5 orders of magnitude smaller than the human brain
General Applicability:
• 3D surface mesh models
• N-Body simulation data set
[Figure: model size [gigabytes] versus simulation size [# of neurons, 1K to 1M] for 2006, 2008, 2010, and 2012 (270 GB)]
HBP Data Management Challenges
• Querying of Petascale Data
• Data Privacy/Anonymization
• Cloud Analytics
• Quick & efficient access to raw data
• Distributed Workflow Execution
• Provenance/Reproducibility
• Data Personalization
Thank you
Data Management in the
Human Brain Project
Thomas Heinis