Brazilian Academy of Sciences, Annual Meeting, May 2012

Databases and Global Environmental Change: Information Technology for Sustainable Development
Gilberto Câmara
INPE, Instituto Nacional de Pesquisas Espaciais

The fundamental question of our time (source: IGBP)
How is the Earth’s environment changing, and what are the consequences for human civilization?

Global Change
Where are changes taking place?
How much change is happening?
Who is being impacted by the change?

Limits for Models (source: John Barrow, after David Ruelle)
[Figure: fields of study placed by uncertainty on basic equations against complexity of the phenomenon. Labels: Social and Economic Systems, Quantum Gravity, Particle Physics, Living Systems, Chemical Reactions, Hydrological Models, Solar System Dynamics, Global Change, Meteorology. A second version of the slide adds the label "e-science".]

Collaborative e-science
Connect expertise from different fields
Make the different conceptions explicit
Territory (Geography), Money (Economy), Modelling (IT), Culture (Anthropology)

Deforestation in Amazonia
[Map legend: up to 10%, 10–20%, 20–30%, 30–40%, 40–50%, 50–60%, 60–70%, 70–80%, 80–90%, 90–100%]
Amazonia (4,000,000 km² = size of Europe)

Data (we need a lot of it)
Deforestation in Brazilian Amazonia (1988-2011) dropped from 27,000 km² to 6,200 km²

Real-time Deforestation Monitoring
Daily warnings of newly deforested large areas

How much does it take to survey Amazonia?
30 TB of data
500,000 lines of code
150 man-years of software development
200 man-years of interpreters
[Figure labels: 116-112, 116-113, 166-112]

TerraAmazon – open source software for large-scale land change monitoring
Spatial database (PostgreSQL with vectors and images)
2004-2008: 5 million polygons, 500 GB images
[Figure labels: 116-112, 116-113, 166-112, Permanent]

Welcome to the Age of Data-intensive Science!
Vantage Points and Capabilities
[Figure labels: FarSpace; L1/HEO/GEO; TDRSS & Commercial Satellites; LEO/MEO; Commercial Satellites and Manned Spacecraft; NearSpace; Aircraft/Balloon Event Tracking and Campaigns; Deployable; Airborne; Terrestrial; Forecasts & Predictions; User Community]

Weather and climate (source: WMO)
11,000 land stations (3,000 automated)
900 radiosondes, 3,000 aircraft
6,000 ships, 1,300 buoys
5 polar, 6 geostationary satellites
ARGOS Data Collection System (16,000 platforms)
650,000 messages processed daily
Argo buoy network

Data chain in Earth System Science (source: NASA)

Data-intensive Science = principles and applications of information technology for handling very large data sets

Conjectures
IT concepts are essential to global change researchers (but most of them don’t know it)
Global change challenges will motivate new research in IT (but most of us are not looking there)

Challenges for data-intensive science
Which data is out there?
How to organize big data?
How to get the data I need?
How to model big data?
How to access and use big data?
Stage 1 – A scientist’s personal database
[Diagram: User interface, Database creation, Database access, Analysis, Local database]
The good: data is close to you (or so you think)
The bad: no long-term data preservation; no data sharing

Stage 2 – A scientific lab database
[Diagram: User interface, Database creation, Database access, Analysis, Corporate database]
The good: long-term data preservation; data sharing inside the lab; reusable corporate software
The bad: substantial costs on data admin; little outside data sharing

Metview (Metview – MOPTC June 2004, ECMWF)
Field plotting

Stage 3 – A scientific lab database in the cloud
[Diagram: User interface, Database creation, Database access, Analysis, Corporate database]
The good: long-term data preservation; shared costs on data admin
The bad: rewrite software for cloud processing; outside data sharing still not solved

TerraMA2 - Natural Disasters Monitoring and Alert System
Risk analysis driven by on-line data feeds:
DCP: rain total; fixed time and irregular (alert); point data; one file per DCP
Satellite/Radar: 4 km grid; total rain 1h; total rain 24h; current (mm/h); binary file
Models: ETA 40, 20, 5 km; Ensemble 40 km; total rain 72h; 72 files; ASCII grid file

Stage 4 – Multidatabase access
[Diagram: Modelling, Data discovery, Data access, Analysis, and three data sources, each with its own Remote Analysis]
The good: long-term data preservation; shared costs on data admin; access to large external databases
The bad: rewrite software for cloud processing; finding data sources is a major problem

Data Access Hitting a Wall
Current science practice based on data download
How do you download a petabyte?
You don’t! Move the software to the archive

Scientific Data Management in the Coming Decade (Jim Gray, 2005)
Next-generation science instruments and simulations will produce peta-scale datasets. Such peta-scale datasets will be housed by science centers that provide substantial storage and processing for scientists who access the data via smart notebooks. The procedural stream-of-bytes-file-centric approach to data analysis is both too cumbersome and too serial for such large datasets. Database systems will be judged by their support of common metadata standards and by their ability to manage and access peta-scale datasets.

Virtual Observatory
If data is online, the internet is the world’s best telescope
(Scientific Data Management in the Coming Decade, Jim Gray)

Where are scientific databases going?
From tables to arrays
Relation (table), e.g. with columns nome (name), CPF (taxpayer ID), cargo (position): selection, projection, join (relational algebra)
SQL language: SELECT * FROM images WHERE date = 'today'
Scientific data: spatial queries, math operations (array algebra)
AQL language: SELECT Mean(A.B) FROM Array A

Communicating concepts is hard (image source: WMO)
vulnerability? climate change? poverty?
We’re bad at representing meaning
deforestation? degradation? disturbance?

Communicating change is very hard
When did the Aral Sea reach the tipping point?

Describing events and processes is very hard
When did the flood occur?

Conclusions
Earth System Science data management poses a major challenge for the database community
We need new techniques and architectures for handling scientific data
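
A note on the "From tables to arrays" slide: the contrast between a relational selection and an array operation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the talk's software. The `images` table layout, the sample file names, and the helper names `images_on` and `band_mean` are all hypothetical, and a plain nested list stands in for band B of an array A, since the slide does not name a specific array database.

```python
import sqlite3
from statistics import fmean

# Relational view: image metadata lives in a table and is retrieved
# by selection on a column (the SQL query on the slide).
# The schema and rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, date TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO images (date, path) VALUES (?, ?)",
    [
        ("2012-05-01", "scene_a.tif"),
        ("2012-05-02", "scene_b.tif"),
        ("2012-05-02", "scene_c.tif"),
    ],
)


def images_on(day):
    """Return the paths of all images acquired on the given date."""
    rows = conn.execute(
        "SELECT path FROM images WHERE date = ? ORDER BY id", (day,)
    )
    return [path for (path,) in rows]


# Array view: the data itself is a grid of cell values, and the query
# is a mathematical operation over cells (the Mean(A.B) query on the slide).
# This 2x2 grid is a hypothetical stand-in for band B of array A.
band_b = [
    [0.25, 0.5],
    [0.75, 1.0],
]


def band_mean(grid):
    """Mean over every cell of a 2-D grid."""
    return fmean([value for row in grid for value in row])


print(images_on("2012-05-02"))
print(band_mean(band_b))
```

The first query returns records about the data, while the second computes over the data values themselves. That difference is what the slide's move from relational algebra to array algebra points at.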