Experience with Fuzzy Distributed Computing (Cloud Computing) in Geoinformatics
M.N. Zhizhin
Geophysical Center and Space Research Institute, Russian Academy of Sciences (RAS)

New technologies and innovations
• Long-term preservation with metadata and lineage (Virtual Observatories)
• Parallel/distributed data storage with interactive data query and network transfer of large datasets (MapReduce)
• Relational -> Object -> XML -> Array databases (SciDB)
• HPC data processing and modeling algorithms (Grid)
• Event detection, interrelation and data mining (AlphaSearch)
• Web technologies for visualization of different data types with geolocation (Neogeography)
• Collaborative data visualization (Videowalls)
• Scalable virtualization of CPU/network/storage resources (Cloud Computing)

Multiplets of regional earthquakes
Downhole multipoint measurement at Soultz geothermal reservoir

Global Lambda Integrated Facility: Available Advanced Network Resources
GLIF is a consortium of institutions, organizations, consortia and country National Research & Education Networks who voluntarily share optical networking resources and expertise to develop the Global LambdaGrid for the advancement of scientific collaboration and discovery.
Visualization courtesy of Bob Patterson, NCSA; data compilation by Maxine Brown, UIC. www.glif.is
Source: Joe Mambretti

GLORIAD: 10 Gb Worldwide Ring
Source: Natalia Bulashova

USA-Russia Lightpath for Fast Data Transfer of Terabyte-sized Scientific Datasets
• National Center for Data Mining (NCDM) at the University of Illinois at Chicago, Geophysical Center RAS and Space Research Institute RAS have successfully moved 1.4 TB of data in 4.5 hours over a 1 Gbps lightpath between Chicago and Moscow as part of the Teraflow Network initiative
• Using NCDM’s open-source UDP-based Data Transfer protocol (UDT), we were able to transfer the MS SQL database with the SDSS astronomy catalog.
The 2.5 TB database dump was compressed to 1.4 TB, split into 60 files, transferred over a 1 Gbps lightpath, then decompressed in Moscow and loaded back into MS SQL Server
• The SkyServer portal and the SDSS database were developed by Jim Gray at MSR and Alex Szalay at JHU. A Russian-language mirror now resides at www.skyserver.ru in Moscow
• A direct lightpath link from IKI in Moscow to NGDC NOAA in Boulder has been successfully tested

Russian SkyServer mirror: www.skyserver.ru

Past Observations + Predictive Model = Reanalysis
1. Direct observations in the past: raw and processed data, e.g. from meteorological stations or satellites; 10^5 observations of the atmosphere every 6 h
2. Predictive numerical model: “knows” the physics and uses direct observations as boundary values, e.g. a Global Circulation Model with 360 lon × 180 lat × 20 levels × 100 parameters = 1.3 × 10^8 data values every 6 hours
3. Reanalysis: the accumulated output of the numerical model forecasts, each corrected for the available direct observations, over a long time period (50 years at a 6 h time step)

Why OGSA-DAI service container?
• Standard tool in the Grid community
• Supports distributed workflow (in version 3.*)
• Built-in support for asynchronous transactions
• Compatible with Web (Axis) and Grid (OMII, UNICORE, GT4)
• Looked at alternatives like OPeNDAP, WCS, …
  – documentation of our analysis is available
• Problem 1: it is very complex
  – Solution: REST wrapper
• Problem 2: supports only File, SQL and XML data types and queries
  – Solution: implement additional data sources and functions for data in multidimensional arrays

Web technologies for visualization of different data types with geolocation
• KML & GeoRSS
• Web services for CDM data sources
• OGC Web Map Services (WMS/WFS/WCS)
• MS Virtual Earth
• Google Maps

TerraServer tile server by Jim Gray in 1998
http://terraserver.microsoft.com
• Large database on the Web (3 TB)
• Operational since June 1998
• Public access to USGS topo maps and aerial images
• Low-resolution images
• No global coverage
• GPS market not ready

Core box set image pre-processing
• At the core warehouse, images are acquired for the whole box set
• To visualize them, we split them into separate samples
Figure panels: Original box sets; Processed

New ways to mashup raster data

Above the Clouds: A Berkeley View of Cloud Computing
Cloud Computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services.
• The services themselves have long been referred to as Software as a Service (SaaS).
• The datacenter hardware and software is what we call a Cloud.
When a Cloud is made available in a pay-as-you-go manner to the public, we call it a Public Cloud; the service being sold is Utility Computing:
• Amazon Web Services,
• Google AppEngine, and
• Microsoft Azure.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html

Comparison: Amazon AWS, Microsoft Azure, Google AppEngine

VM
• Amazon AWS
  – x86 32- and 64-bit architecture via Xen VM
  – Computation elasticity allows scalability, but the developer must build the machinery, or a third party must provide it
• Microsoft Azure
  – Microsoft Common Language Runtime (CLR) VM
  – Automatic load balancing
• Google AppEngine
  – Predefined application in Python
  – Persistent state stored in MegaStore
  – Automatic scaling

Storage
• Amazon AWS
  – Range of models from block store (EBS) to augmented key/blob store (SimpleDB)
  – Scaling varies from no scaling (EBS) to fully automatic (SimpleDB, S3)
  – APIs vary from standardized (EBS) to proprietary (S3)
• Microsoft Azure
  – SQL Data Services (restricted view of SQL Server)
  – Azure storage service
• Google AppEngine
  – MegaStore/BigTable

Network
• Amazon AWS
  – Declarative specification of topology
  – Security Groups
  – Availability zones
  – Elastic IP addresses provide persistent name
• Microsoft Azure
  – Automatic based on roles
• Google AppEngine
  – Fixed topology for 3-tier webapps
  – Automatic scaling

How to deploy SPIDR in Cloud?
Single instance: SPIDR webapp & web services
Diagram labels: EC2, EBS, S3; MySQL databases; database dump; file system snapshot; VM snapshot bundle; VM image

Can we support multiple SPIDRs? In different Amazon cloud regions?
Yes!
• Launch several instances of the SPIDR VM
• Configure DNS round-robin for load balancing
• Run MySQL master on the first instance, and MySQL slaves on the others
or
• Use third-party high-availability products for Amazon cloud, such as RightScale

Clouds above Grid: Cumulus Nimbus experiment in SKIF-Grid, fall 2009
• Cloud VMs managed as Grid jobs
• Condor Grid deployed in Cloud
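
Back-of-the-envelope check of the figures quoted in the lightpath and reanalysis slides above. This is a minimal sketch, not part of the original talk: the function name `effective_rate_mbps` is hypothetical, terabytes are assumed to be decimal (10^12 bytes), and a year is assumed to average 365.25 days.

```python
# Rough arithmetic check of numbers quoted in the slides.
# Assumptions (not stated in the talk): decimal terabytes, 365.25-day years.

TB = 1e12  # bytes per terabyte (decimal units assumed)


def effective_rate_mbps(bytes_moved: float, hours: float) -> float:
    """Average throughput in megabits per second over the whole transfer."""
    seconds = hours * 3600
    return bytes_moved * 8 / seconds / 1e6


# Lightpath slide: 1.4 TB moved in 4.5 hours over a 1 Gbps link.
rate = effective_rate_mbps(1.4 * TB, 4.5)
utilization = rate / 1000.0  # fraction of the nominal 1 Gbps (1000 Mbps)

# Reanalysis slide: grid points x levels x parameters per 6-hour step.
values_per_step = 360 * 180 * 20 * 100

# 50 years of output at a 6 h time step (4 steps per day).
total_steps = 50 * 365.25 * 4
total_values = values_per_step * total_steps

print(f"average transfer rate: {rate:.0f} Mbps ({utilization:.0%} of 1 Gbps)")
print(f"values per 6 h step:   {values_per_step:.3e}")
print(f"time steps in 50 yr:   {total_steps:.0f}")
print(f"values in 50 yr:       {total_values:.3e}")
```

Under these assumptions the transfer averaged roughly 690 Mbps, about 69% of the nominal link rate, and the model grid gives 1.296 × 10^8 values per step, consistent with the 1.3 × 10^8 quoted on the slide.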