Technology and Infrastructure Support for Large Scale Information

Marcio Faerman
The Brazilian National Education and Research Network - RNP
[email protected]
www.rnp.br

Generating Large Data Collections
• Large data volumes can be generated much faster than they can be analyzed
  – Instrument Observations
    • Particle Accelerators (CERN LHC)
    • Telescopes, Satellites
    • Sensor Networks
    • Virtual Observatories
  – Large Model Simulations
    • High resolution, very complex
• Scientific Experiments
  – Medical imaging (fMRI): ~ 1 GByte per measurement (day)
  – Bio-informatics queries: 500 GByte per database
  – Satellite world imagery: ~ 5 TByte per year
  – Current particle physics: 1 PByte per year
  – LHC physics (2007): 10-30 PByte per year
  – LSST Astronomy (2012): 5 PByte per year

Challenges Managing Large Volume Data
• Scalability
  – What works for small datasets does not necessarily work for large collections
• Data Integrity
  – At terabyte scale, failures and data corruption are very likely to occur
  – Is data provenance reliable?
• Efficiency
  – Data should be accessed at a rate that keeps work feasible
  – More data means a need for more speed
• Distributed Access
  – Data can be at a remote (and possibly unknown) location
• Infrastructure Management
  – Heterogeneous
  – Distributed
  – Prone to failures
  – Very complex

Challenges – Getting to Know your Data
• Extract knowledge from raw data files
  – Data product derivation
    • Visualization
    • Relationships
    • Patterns
    • New derived quantities
  – Cross-institutional and cross-disciplinary collaborations
• What-if experiments
  – Your data with our model?
• Dataset Access
  – Multiple formats
    • Each sensor or simulation has its own storage format
  – Federated collections
  – Discovery by content

Technological Response
• Integration of compute, communication, storage and instrument resources into a powerful infrastructure
  – Information Grids
  – Very powerful infrastructure
  – Economy of scale
• Serves a broad range of customers
  – Biologists, physicists, government, industry
• Infrastructure is heterogeneous, distributed, very complex
• Middleware and data-oriented tools act as facilitators to tackle data management complexities

Open Access and Preservation Functionalities
• Federated Digital Libraries
  – Integration of distributed repositories
  – Access control: decide who can see the data
  – Organize the data in collections
  – Describe your data: metadata
• Data Grids
  – Access to efficient parallel I/O systems
  – Hierarchical Systems
    • Disk caches, tapes
    • Often distributed
  – Analysis, Data Mining
  – Visualization
  – Workflow-based systems
  – Transaction-based data ingestion
    • Data provenance, data fingerprinting
  – What-if virtual lab
• End User Oriented Portals
  – "I deal with the data in the way it makes sense to me"

Middleware and Tools
• Data Management
  – Storage Resource Broker (SRB)
  – Globus Data Management
  – L-Store
  – IBP
  – Storage Resource Manager (SRM)
• Data Representation Libraries
  – HDF5
  – NetCDF
• Portals
  – OGCE
  – JSR 168

Today's Reality
• Exceptional achievements by early adopters
• Integration between domain scientists (data users and producers) is still a challenge
  – Need much more cross-disciplinary interaction
• Emphasis on scale and performance
• Failures are still a taboo
  – Frustration factor should be addressed in partnership with users
  – Focus on failure recovery and quality of service is getting more attention

Grid Initiatives around the World
[Map labels: UNAM, OurGrid, EELA, SINAPAD, SPRACE, HEPGrid, Ringrid, CL Grid, UCRAV]

e-Infrastructure Workshop, NUDI/USP, São Paulo, 07.05.2007

Networking in Latin America
[Map labels: CUDI-MX, REACCIUN-VE, RAAP-PE, RNP-BR, REUNA-CL]

Brazilian National Research and Education Network - RNP
• In November 2005 the RNP networking infrastructure was entirely renovated. It consists of:
  – A multigigabit core connecting 10 capitals at 2.5 and 10 Gbps
  – Connections at 34 Mbps to 11 capitals
  – Connections at up to 16 Mbps to 6 capitals

Community Metropolitan Networks
• It is not enough to bring high-speed connectivity to each city; it must also reach the university campus and research lab.
• The metropolitan network is the solution
  – Infrastructure sharing to support:
    • Interconnection of the campuses of each partner institution
    • Access to the RNP national network backbone
  – This sharing substantially reduces deployment costs
  – Preferably, the infrastructure will be owned by the partners themselves (reducing operating costs)
• Pilot: the Metrobel project in the city of Belém do Pará in the Amazon region

Infra-estrutura para e-Ciência (Infrastructure for e-Science)

Metrobel – Belém Metropolitan Network

Redecomep Project (2005-7)
• Following Metrobel, the Brazilian Ministry of Science and Technology is supporting the Community Networks for Education and Research (Redecomep) Project with R$ 39.7 M (~ US$ 19.0 M) in funding through Finep (Dec 2004)
• Goals:
  – Extend the metropolitan optical network to another 26 cities with RNP points of presence
  – Promote integration within each metropolitan area
  – High-speed access to the RNP point of presence

Next steps
• Integration between network, data repositories, compute, storage resources and applications
  – Identify who needs better connectivity
  – Developing Brazilian cyberinfrastructure
  – Funding for infrastructure resources is generally uncoordinated
  – Funding agencies and partners need a broad vision of application requirements and cyberinfrastructure integration
• RNP is coordinating with scientific communities and infrastructure providers

e-Science/Infrastructure initiative in Brazil
JRU-Brazil: 22 members in EELA-2

#   STATE  INSTITUTION                     E-SCIENCE COMMUNITIES
1   SP     CCE / USP                       (e-INFRASTRUCTURE only)
2   RJ     CEFET-RJ                        e-GOVERNMENT, E-INDUSTRY
3   RJ     FCM / UERJ                      BIOMED
4   RJ     FIOCRUZ                         BIOMED, e-EDUCATION
5   SP     IAG / USP                       CLIMATE
6   RJ     IME                             BIOMED
7   SP     INCOR / USP                     BIOMED
8   SP     INPE                            CLIMATE
9   RJ     LNCC                            BIOMED
10  RJ     ON                              PHYSICS
11  BR     RNP (NREN)                      (e-INFRASTRUCTURE only)
12  SP     SPRACE / UNESP                  PHYSICS
13  PB     UFCG                            CLIMATE, EARTH-SCIENCE
14  RJ     UFF                             (e-INFRASTRUCTURE only)
15  MG     UFJF                            BIOMED
16  MS     UFMS                            BIOMED
17  RS     UFRGS                           CLIMATE
18  RJ     UFRJ (coordinator for EELA-2)   BIOMED, PHYSICS, e-EDUCATION, CLIMATE
19  RS     UFSM                            CLIMATE
20  DF     UnB                             BIOMED
21  RJ     UNILASALLE                      e-EDUCATION
22  SP     UNISANTOS                       BIOMED, E-LEARNING, e-GOVERNMENT

Developing Together
• Information infrastructure is being redefined in Brazil and Latin America
• Now is the time to have as much cross-disciplinary interaction as possible to define needs, partnerships and investments
• Please contact us

THANK YOU!
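
Backup: Data Volumes versus Link Speeds

The talk quotes yearly data volumes (1 to 30 PByte per year for particle physics) and backbone core speeds (2.5 and 10 Gbps) on separate slides. As a rough illustration of the "more data, more speed" point, the sketch below converts one into the other. It is only back-of-the-envelope arithmetic, not part of the original presentation: it assumes decimal units (1 PByte = 10^15 bytes), a 365-day year, and a link that is fully utilized with no protocol overhead. The function names are hypothetical.

```python
# Sketch: relate yearly data volumes to sustained network rates.
# Assumptions (not from the slides): decimal units (1 PByte = 10**15 bytes),
# a 365-day year, and a perfectly utilized link with no protocol overhead.

SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_DAY = 24 * 3600
BITS_PER_PBYTE = 8 * 10**15


def sustained_gbps(pbytes_per_year: float) -> float:
    """Average rate in Gbit/s needed to move the given volume within one year."""
    return pbytes_per_year * BITS_PER_PBYTE / SECONDS_PER_YEAR / 1e9


def days_to_transfer(pbytes: float, link_gbps: float) -> float:
    """Days needed to move `pbytes` over a link running flat out at `link_gbps`."""
    seconds = pbytes * BITS_PER_PBYTE / (link_gbps * 1e9)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Yearly volumes as quoted on the "Generating Large Data Collections" slide.
    volumes = [
        ("Current particle physics", 1),
        ("LSST Astronomy", 5),
        ("LHC physics, low estimate", 10),
        ("LHC physics, high estimate", 30),
    ]
    for label, pbytes in volumes:
        rate = sustained_gbps(pbytes)
        print(f"{label}: {pbytes} PByte/year needs about {rate:.2f} Gbit/s sustained")

    # Core link speeds as quoted on the RNP backbone slide.
    for link in (2.5, 10):
        days = days_to_transfer(1, link)
        print(f"1 PByte over a {link} Gbit/s link takes about {days:.1f} days")
```

Under these assumptions, 10 PByte per year corresponds to roughly 2.5 Gbit/s sustained, which is about the lower of the two core speeds quoted for the RNP backbone. Real transfers would be slower because of shared links, protocol overhead and failures.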