Data Management Challenges in the Human Brain Project
Thomas Heinis
Data-Intensive Applications & Systems Lab
Ecole Polytechnique Federale de Lausanne

Human Brain Project
- Goal: develop a platform to simulate the human brain!
- 10-year, 1 billion Euro project to integrate all available knowledge/data
- Involves coordination of more than 200 interdisciplinary partners
- Led by Henry Markram
- Based on EPFL’s Blue Brain Project
[Figures: Single Neuron 3D Model; Neocortex Model]

Human Brain Project Workflow
[Diagram elements: Literature Research, Experimental Observations, Medical Records/Data, Analysis, Model Building, Simulation, Visualization]

Data Deluge
- Medical Data from hundreds of sources
- Model Data alone:

                                      Present    Goal (Full Brain)
  # of Neurons                        1M         86 Billion
  Model Data, coarse grained          270GB      27PB
  Model Data, fine grained (3D Mesh)  5.8TB      0.5EB

- Simulation Trace: potentially infinite

HBP Data Management Challenges
- Querying of Petascale Data
- Data Privacy/Anonymization
- Cloud Analytics
- Quick & efficient access to raw data
- Distributed Workflow Execution
- Provenance/Reproducibility
- Data Personalization

Spatial Indexing
- Applications: Model Building, Analysis
- 3D Spatial Range Query
- Bottleneck in Spatial Analysis
[Plot: Model Size [GigaBytes] (0 to 30) vs. Simulation Size [# of Neurons] (1K to 1M); labels 2006, 2008, 2010]

Overlap Problem
- R-Trees: Hierarchy of Minimum Bounding Rectangles (MBR)
- STEP 1: Index Construction
- STEP 2: Query Execution
- Overlap reduces performance, increases with level of detail
[Figure labels: Range Query; Overlap]

“FLAT”, A Two Phase Spatial Index*
- Add reachability information during index construction
1) SEEDING: Find any one object arbitrarily inside the query region.
2) CRAWLING: Retrieve remaining results by traversing the neighbors.
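The two phases above can be illustrated with a small sketch. This is not the FLAT implementation; it is a toy version under stated assumptions: objects are axis-aligned 3D boxes, the "reachability information" is a plain neighbor list built by linking boxes that touch or lie within a hypothetical `reach` distance, and the seed is found by a linear scan that stands in for whatever seed lookup the real index uses. All names (`Box`, `build_neighbors`, `seed`, `crawl`, `range_query`) are made up for illustration. The crawl only returns results reachable from the seed through neighbors that also intersect the query, so the sketch assumes the data is dense enough for the result set to be connected.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned 3D box given by its lower and upper corners."""
    lo: Point
    hi: Point

    def intersects(self, other: "Box") -> bool:
        # Boxes intersect if their extents overlap (or touch) on every axis.
        return all(self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
                   for i in range(3))


def build_neighbors(boxes: List[Box], reach: float = 0.0) -> Dict[int, List[int]]:
    """Index construction: record which objects are neighbors of each other.

    Assumption: two objects are neighbors if their boxes intersect after
    growing one of them by `reach` on every side. Quadratic for brevity.
    """
    neighbors: Dict[int, List[int]] = {i: [] for i in range(len(boxes))}
    for i, a in enumerate(boxes):
        grown = Box(tuple(c - reach for c in a.lo), tuple(c + reach for c in a.hi))
        for j in range(i + 1, len(boxes)):
            if grown.intersects(boxes[j]):
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def seed(boxes: List[Box], query: Box) -> Optional[int]:
    """Seeding: return any one object that intersects the query region."""
    for i, b in enumerate(boxes):
        if b.intersects(query):
            return i
    return None


def crawl(boxes: List[Box], neighbors: Dict[int, List[int]],
          query: Box, start: int) -> List[int]:
    """Crawling: breadth-first traversal of neighbors that intersect the query."""
    found = {start}
    frontier = deque([start])
    while frontier:
        current = frontier.popleft()
        for n in neighbors[current]:
            if n not in found and boxes[n].intersects(query):
                found.add(n)
                frontier.append(n)
    return sorted(found)


def range_query(boxes: List[Box], neighbors: Dict[int, List[int]],
                query: Box) -> List[int]:
    start = seed(boxes, query)
    return [] if start is None else crawl(boxes, neighbors, query, start)


# Toy data: ten unit cubes in a row along the x axis.
cubes = [Box((float(i), 0.0, 0.0), (float(i + 1), 1.0, 1.0)) for i in range(10)]
links = build_neighbors(cubes)
hits = range_query(cubes, links, Box((2.5, 0.0, 0.0), (5.5, 1.0, 1.0)))
print(hits)
```

The point of the design is that once a seed is found, the cost of the crawl depends on the number of results and their neighbors rather than on how much the bounding rectangles of an R-Tree overlap.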
*Joint work with Farhan Tauheed, Laurynas Biveinis and Anastasia Ailamaki

Performance Evaluation
[Plot: Time [seconds] (0 to 300) vs. Dataset Density [Millions of Elements per Unit Volume] (50 to 450); series: Hilbert R-Tree, STR R-Tree, PR-Tree, FLAT]

Impact
- Blue Brain Project: part of the toolset used every day
- February 2012: first 1 million neuron model indexed
- Still 5 orders of magnitude smaller than the human brain
- General applicability: 3D surface mesh models, N-Body simulation data sets
[Plot: Model Size [GigaBytes] (0 to 30) vs. Simulation Size [# of Neurons] (1K to 1M); labels 2006, 2008, 2010, 2012 (270 GB)]

HBP Data Management Challenges
- Querying of Petascale Data
- Data Privacy/Anonymization
- Cloud Analytics
- Quick & efficient access to raw data
- Distributed Workflow Execution
- Provenance/Reproducibility
- Data Personalization

Thank you
Data Management in the Human Brain Project
Thomas Heinis
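As a rough sanity check on the sizes quoted in the Data Deluge and Impact slides, the short calculation below scales the present model sizes up to a full brain. It assumes decimal units (1 GB = 10^9 bytes) and that model size grows linearly with the number of neurons; neither assumption is stated in the slides, and the variable names are made up for illustration.

```python
import math

GB, TB, PB, EB = 10**9, 10**12, 10**15, 10**18  # decimal units (assumption)

present_neurons = 1_000_000          # 1M neuron model
goal_neurons = 86_000_000_000        # 86 billion neurons (full brain)

# Gap between the present model and the full brain.
ratio = goal_neurons / present_neurons
orders_of_magnitude = math.log10(ratio)

coarse_present = 270 * GB            # coarse-grained model data today
fine_present = 5.8 * TB              # fine-grained (3D mesh) model data today

# Linear scaling (assumption) with the exact ratio and with a round 10^5.
coarse_goal_exact_pb = coarse_present * ratio / PB
coarse_goal_round_pb = coarse_present * 1e5 / PB
fine_goal_exact_eb = fine_present * ratio / EB
fine_goal_round_eb = fine_present * 1e5 / EB

print(f"neuron ratio: {ratio:.0f} (~{orders_of_magnitude:.1f} orders of magnitude)")
print(f"coarse grained: {coarse_goal_exact_pb:.1f} PB to {coarse_goal_round_pb:.1f} PB")
print(f"fine grained:   {fine_goal_exact_eb:.2f} EB to {fine_goal_round_eb:.2f} EB")
```

Under these assumptions the scaled sizes land in the same range as the 27PB and 0.5EB goals on the slide, and the neuron ratio is just under five orders of magnitude.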