The 10th IEEE International Conference on High Performance Computing and Communications
Scientific Cloud Computing: Early Definition and Experience
Lizhe Wang, Jie Tao, Marcel Kunze
Institute for Scientific Computing, Research Center Karlsruhe
Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
Alvaro Canales Castellanos, David Kramer, Wolfgang Karl
Department of Computer Science, University Karlsruhe (TH)
76128 Karlsruhe, Germany
Abstract
Compute Cloud [16], IBM’s Blue Cloud [14], scientific
Cloud projects such as Nimbus [17] and Stratus [26],
OpenNEbula [19].
There is still no widely accepted definition of Cloud
computing, although Cloud computing practice has attracted
much attention. Several reasons have led to this situation:
Cloud computing emerges as a new computing paradigm
which aims to provide reliable, customized, and QoS-guaranteed dynamic computing environments for end-users.
This paper reviews recent advances in Cloud computing,
identifies the concepts and characteristics of scientific Clouds,
and finally presents an example of a scientific Cloud for data
centers.
• Cloud computing involves researchers and engineers
from various backgrounds, e.g., Grid computing, software engineering, data storage. They work on Cloud
computing from different viewpoints.
• Technologies that enable Cloud computing, for example
Web 2.0 and SOA, are still evolving and progressing.
1. Introduction
Cloud computing emerged as a hot topic in late 2007
owing to its ability to offer flexible, dynamic IT
infrastructures, QoS-guaranteed computing environments,
and configurable software services. As reported in Google
Trends (Figure 1), Cloud computing (blue line), which is
enabled by virtualization technology (yellow line), has outpaced Grid computing [7] (red line).
• Existing computing Clouds still lack large scale deployment and usage, which would finally justify the
concept of Cloud computing.
In this paper, we try to give an early definition of “Cloud
computing” based on recent advances from academia and
industry as well as our experience. This paper also introduces a proof-of-concept computing Cloud – Cumulus,
which is deployed at our site. This paper is organized as
follows. Section 2 introduces the current projects of Cloud
computing. Section 3 defines the concept of Cloud computing. The Cumulus project, our experience with Cloud computing,
is presented in Section 4. Section 5 concludes the paper.
2. Recent advances of Cloud computing
This section discusses several projects which are currently devoted to Cloud computing.
Figure 1. Cloud computing in Google trends
2.1. Globus virtual workspace service and Nimbus
Currently numerous projects from industry and
academia have been proposed, for example, RESERVOIR project [23] - IBM and European Union joint
research initiative for Cloud computing, Amazon Elastic
978-0-7695-3352-0/08 $25.00 © 2008 IEEE
DOI 10.1109/HPCC.2008.38
A virtual workspace [10, 9] is the abstraction of an execution environment that can be made dynamically available
to authorized clients by using well-defined protocols. The abstraction captures resource quota assigned to
such execution environments during deployment (such as
CPU or memory) as well as software configuration aspects
of the environment (such as operating system installation
or provided services). The workspace service allows a
Globus Toolkit client to dynamically deploy and manage
workspaces.
The virtual workspace services consist of the following
interfaces [10, 9]:
• The Workspace Factory Service has one operation
called “create”. Create has two required parameters:
workspace metadata and a deployment request for that
metadata.
• Once created, a workspace is represented as a WSRF
resource and can be inspected and managed through
operations of the Workspace Service.
Figure 2. OpenNebula architecture
2.3. Amazon Elastic Compute Cloud
• The Group Service allows an authorized client to manage a group of workspaces as a whole.
Amazon Elastic Compute Cloud (EC2) [16] is a Web service that provides resizable compute capacity in the cloud.
It is designed to make Web-scale computing easier for developers. Amazon EC2’s simple Web service interface allows users to obtain and configure capacity with minimal
friction. It provides users with complete control of computing resources and lets them use Amazon’s proven computing environment. Amazon EC2 reduces the time required
to obtain and boot new server instances to minutes, allowing users to quickly scale capacity, both up and down, as
computing requirements change. Amazon EC2 changes the
economics of computing by allowing users to pay only for
capacity that they actually use.
With Amazon EC2 users can:
• The Status Service offers the interface through which
a client can query the usage data the service has collected about it.
Based on the Globus virtual workspace services, a cloudkit
named Nimbus [17] has been developed to build scientific
Clouds. With the Nimbus client, users can:
• browse virtual machine images inside the cloud,
• submit their own virtual machine images to the clouds,
• deploy virtual machines, and
• query virtual machine status, and finally access the virtual machine.
• create an Amazon Machine Image (AMI) containing
the applications, libraries, data and associated configuration settings, or use Amazon’s pre-configured, templated images to get up and running immediately;
2.2. OpenNEbula
OpenNEbula (formerly GridHypervisor) is a virtual infrastructure engine that enables the dynamic deployment and
re-allocation of virtual machines in a pool of physical resources. OpenNEbula extends the benefits of virtualization
platforms from a single physical resource to a pool of resources, decoupling the server not only from the physical
infrastructure but also from the physical location [19].
OpenNEbula contains one frontend and multiple backends. The frontend provides users with access interfaces
and management functions. The backends are installed on
Xen servers, where Xen hypervisors are started and virtual
machines are hosted. The frontend and the backends communicate via SSH. OpenNEbula gives users a single access point to deploy virtual machines on a locally distributed infrastructure.
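The frontend/backend split described above can be illustrated with a minimal Python sketch. The class names (Backend, Frontend), the "most free slots" placement rule, and the capacity numbers are assumptions made for illustration only; they are not OpenNEbula's actual scheduler or API, and the SSH transport is represented only by a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """A hypothetical Xen host managed by the frontend."""
    name: str
    capacity: int                      # maximum number of VMs on this host
    vms: list = field(default_factory=list)

    def free_slots(self) -> int:
        return self.capacity - len(self.vms)


class Frontend:
    """Single access point that places VMs on a pool of backends."""

    def __init__(self, backends):
        self.backends = list(backends)

    def deploy(self, vm_name: str) -> str:
        # Assumed policy: pick the backend with the most free slots.
        target = max(self.backends, key=Backend.free_slots)
        if target.free_slots() <= 0:
            raise RuntimeError("no capacity left in the pool")
        # A real frontend would contact the backend over SSH at this point.
        target.vms.append(vm_name)
        return target.name


pool = Frontend([Backend("host-a", 2), Backend("host-b", 1)])
placements = [pool.deploy(f"vm-{i}") for i in range(3)]
```

The user only talks to the Frontend object; which physical host ends up running a virtual machine is decided behind that single access point.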
• upload the AMI into Amazon Simple Storage Service
(S3). Amazon EC2 provides tools that make storing
the AMI simple. Amazon S3 provides a safe, reliable
and fast repository to store users' images;
• use Amazon EC2 Web service to configure security
and network access;
• choose the type of instance users want to run;
• start, shut down, and monitor as many instances of
the user's AMI as needed, using the Web service APIs;
• pay for the CPU time and bandwidth that users actually
consume.
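The pay-per-use idea in the last item can be made concrete with a small worked example in Python. The two prices below are hypothetical values chosen to keep the arithmetic simple; they are not Amazon's actual rates, and the function name usage_cost is likewise an assumption.

```python
from decimal import Decimal

# Hypothetical prices, chosen only to make the arithmetic easy to follow.
PRICE_PER_INSTANCE_HOUR = Decimal("0.10")
PRICE_PER_GB_TRANSFERRED = Decimal("0.15")


def usage_cost(instance_hours: int, gb_transferred: Decimal) -> Decimal:
    """Charge only for the instance hours and bandwidth actually consumed."""
    compute = PRICE_PER_INSTANCE_HOUR * instance_hours
    bandwidth = PRICE_PER_GB_TRANSFERRED * gb_transferred
    return compute + bandwidth


# Scale up to 10 instances for 3 hours, then down to 2 instances for 21 hours.
elastic_hours = 10 * 3 + 2 * 21
elastic_bill = usage_cost(elastic_hours, Decimal("4"))

# For comparison: keeping all 10 instances running for the full 24 hours.
static_bill = usage_cost(10 * 24, Decimal("4"))
```

Under these assumed prices, scaling down after the peak is billed for 72 instance hours instead of 240, which is the economic argument made in the text above.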
2.4. Discussion
center/computer center - as a pay-as-you-go subscription service. The HaaS could be flexible, scalable and
manageable to meet your needs [2].
We have studied the solutions for network configuration,
data management, and virtual machine infrastructure deployment inside the cloud.
Nimbus and the Globus virtual workspace provide three network configurations:
• SaaS: Software as a Service
Software or an application is hosted as a service and provided to customers across the Internet. This mode
eliminates the need to install and run the application
on the customer's local computer. SaaS therefore alleviates the customer's burden of software maintenance
and reduces the expense of software purchases through on-demand pricing.
• public mode picks a public IP address from a pool for
the virtual machine,
• private mode picks a private IP address from a pool for
the virtual machine, and
• DaaS: Data as a Service
Data in various formats and from various sources can be
accessed via services by users on the network. Users
can, for example, manipulate remote data as if
operating on a local disk, or access data in a semantic way
on the Internet.
• advisory mode gives a static IP address to the virtual machine.
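The three network modes can be sketched as a toy address allocator in Python. The class name AddressAllocator, the method name assign, and the example address ranges are assumptions for illustration; this is not the Nimbus or Globus virtual workspace implementation.

```python
import ipaddress
from typing import Optional


class AddressAllocator:
    """Toy allocator for the three modes; the pool ranges are made-up examples."""

    def __init__(self, public_net: str, private_net: str):
        self._pools = {
            "public": iter(ipaddress.ip_network(public_net).hosts()),
            "private": iter(ipaddress.ip_network(private_net).hosts()),
        }

    def assign(self, mode: str, static_ip: Optional[str] = None) -> str:
        if mode == "advisory":
            if static_ip is None:
                raise ValueError("advisory mode needs a static address")
            # Validate the requested address, then hand it back unchanged.
            return str(ipaddress.ip_address(static_ip))
        if mode not in self._pools:
            raise ValueError(f"unknown mode: {mode}")
        try:
            # public/private: take the next unused address from the pool.
            return str(next(self._pools[mode]))
        except StopIteration:
            raise RuntimeError(f"{mode} pool exhausted") from None


allocator = AddressAllocator("203.0.113.0/29", "10.0.0.0/30")
public_ip = allocator.assign("public")
private_ip = allocator.assign("private")
fixed_ip = allocator.assign("advisory", static_ip="192.0.2.10")
```

Note that none of the three modes hands the decision to an external service; the address either comes from a pool owned by the allocator or is fixed in advance.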
These solutions, however, do not cover all user
scenarios. For example, a data center might employ a central DHCP service, which allocates dynamic IP addresses
for all virtual machines. In addition, the Globus virtual workspace needs to contact all the backends of the local infrastructure. Sometimes a computer center might employ a local virtualization management system, like VMware Infrastructure [29], to manage local hosting resources. In our view, it would
pay off for the Globus virtual workspace
to talk to a local management system. The same scenario
occurs when the Globus Toolkit works together with a local resource scheduler such as OpenPBS [20] or Condor [12].
OpenNEbula employs NIS (Network Information System) to manage a common user system and NFS (Network
File System) for virtual machine image management. However, it has been widely recognized that NIS has a major
security flaw: it leaves the users' password file accessible to
anyone on the network. To employ OpenNEbula in
a professional environment, it is better to combine OpenNEbula with
more modern infrastructure solutions, e.g., LDAP [11] and
Oracle Cluster File System [21].
Based on the support of HaaS, SaaS, and DaaS, Cloud
computing can in turn deliver Platform as a Service (PaaS)
to users. Users can thus subscribe on demand to a computing
platform that meets their requirements for hardware configuration, software installation, and data access. Figure 3 shows
the relationship between the above services.
Figure 3. Cloud functionalities (diagram labels: PaaS, HaaS, SaaS, DaaS)
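As a minimal sketch of how a platform subscription combines the three kinds of demands, the following Python fragment models a request as a small data structure. All names (HardwareSpec, PlatformRequest, required_services) and the example values are hypothetical and only illustrate the layering of PaaS on HaaS, SaaS, and DaaS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class HardwareSpec:
    cpus: int
    memory_mb: int


@dataclass
class PlatformRequest:
    """A hypothetical PaaS subscription built from three kinds of demands."""
    hardware: HardwareSpec                                  # served by HaaS
    software: List[str] = field(default_factory=list)       # served by SaaS
    data_sources: List[str] = field(default_factory=list)   # served by DaaS

    def required_services(self) -> List[str]:
        services = ["HaaS"]            # hardware is always needed
        if self.software:
            services.append("SaaS")
        if self.data_sources:
            services.append("DaaS")
        return services


request = PlatformRequest(
    hardware=HardwareSpec(cpus=4, memory_mb=8192),
    software=["gcc", "openmpi"],
    data_sources=["climate-archive"],
)
services = request.required_services()
```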
3.2. Key features
Cloud computing distinguishes itself from other computing paradigms, such as Grid computing [7], Global computing [6], and Internet Computing [13], in the following aspects:
3. Cloud computing: definition, characterization and Enabling technologies
• User-centric interfaces
Cloud services could be accessed with user-centric interfaces, which means:
3.1. Functionalities
Computing Clouds provide users with services to access
hardware, software, and data resources. Furthermore,
configurable integrated platforms can be offered to users:
– The Cloud interfaces do not force users to change
their working habits, e.g., programming language,
compiler, operating system, and so on.
– The Cloud client that has to be installed locally is lightweight; for example, the Nimbus Cloudkit client is around 15 MB in size.
– Cloud interfaces are location independent and
can be accessed through well-established interfaces such as Web services and Internet browsers.
• HaaS: Hardware as a Service
The term Hardware as a Service was possibly coined in 2006.
As a result of rapid advances in hardware virtualization, IT automation, and usage metering and pricing,
users could buy IT hardware - or even an entire data
• On-demand service provision
Computing Clouds provide resources and services for
users on demand. Users can later customize the provided computing environments, for example through software
installation and network configuration, since users normally
hold "root" privileges.
information sharing, and, most notably, collaboration
among users. These concepts have led to the development and evolution of Web-based communities and
hosted services [4].
A Mashup is a Web application that combines data from
more than one source into a single integrated
tool [3]. SmugMug [25], a digital photo sharing website,
is an example of a Mashup: it allows the
upload of an unlimited number of photos for all account types, provides a published API which allows
programmers to create new functionality, and supports XML-based RSS and Atom feeds.
• QoS guaranteed offer
The computing environments provided by computing
Clouds can guarantee QoS for users, e.g., hardware
performance like CPU bandwidth and memory size.
• Autonomous System
The computing Cloud is an autonomous system and
is managed transparently to users. Hardware, software
and data inside Clouds can be automatically reconfigured, orchestrated and consolidated into a single platform image, which is finally presented to users.
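One way to read the on-demand and QoS-guaranteed features together is as admission control: a request is accepted only if the requested resources can be reserved. The following Python sketch illustrates this reading; the class names (Capacity, CloudSite) and the reservation policy are assumptions, not a mechanism described by any of the surveyed systems.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    cpus: int
    memory_mb: int


class CloudSite:
    """Toy admission control: accept a request only if it can be guaranteed."""

    def __init__(self, total: Capacity):
        self.free = Capacity(total.cpus, total.memory_mb)

    def provision(self, wanted: Capacity) -> bool:
        fits = (wanted.cpus <= self.free.cpus
                and wanted.memory_mb <= self.free.memory_mb)
        if fits:
            # Reserve the resources so the guarantee cannot be oversold.
            self.free.cpus -= wanted.cpus
            self.free.memory_mb -= wanted.memory_mb
        return fits

    def release(self, held: Capacity) -> None:
        # Return previously reserved resources to the free pool.
        self.free.cpus += held.cpus
        self.free.memory_mb += held.memory_mb


site = CloudSite(Capacity(cpus=8, memory_mb=16384))
first = site.provision(Capacity(cpus=6, memory_mb=8192))    # accepted
second = site.provision(Capacity(cpus=4, memory_mb=4096))   # rejected: 2 CPUs left
```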
3.3. Enabling technologies
Many enabling technologies contribute to Cloud
computing; here we identify several state-of-the-art techniques:
• Virtualization
Virtualization technologies multiplex hardware and
thus provide flexible and scalable platforms. Virtual machine techniques, such as VMware [29]
and Xen [1], offer virtualized IT infrastructures on demand. Virtual network advances, such as VPN [5],
provide users with a customized network environment
to access cloud resources.
• Serviceflow and workflow orchestration
Computing Clouds offer a complete set of service images on demand, which can be composed of services inside the Cloud. A Cloud should be able to automatically orchestrate services from different sources
and of different types to form a serviceflow or workflow for users.
• Web service and SOA
Cloud services are normally exposed as Web services,
which follow industry standards such as WSDL [28],
SOAP [24] and UDDI [18]. The organization and
orchestration of services inside Clouds can be managed in a
Service Oriented Architecture (SOA). Furthermore, a set of Cloud
services can be combined in an SOA, which makes
them available on various distributed platforms
and accessible across networks.
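Figure 4. Cumulus architecture (diagram labels: Globus Virtual Workspace Service and OpenNEbula frontend on the access point; SSH; Xen hypervisor hosts running VMs; Oracle File System; virtual network domain; physical network domain)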
• World-wide distributed storage system
A Cloud storage model should foresee:
– A network storage system, which is backed by
distributed storage providers (e.g., data centers),
offers storage capacity for users to lease. The
data storage can be migrated, merged, and
managed transparently to end users, regardless of
data format. Examples are Google File System [8] and Amazon Elastic Storage [16].
• Web 2.0 and Mashup
Web 2.0 describes the trend in the use of World Wide
Web technology and web design to enhance creativity,
References
– A distributed data system which provides data
sources that are accessed in a semantic way. Users can
locate data sources in a large distributed environment by their logical names instead of their physical locations. Virtual Data System (VDS) [27] is
a good reference.
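Locating data by logical name rather than by physical location can be sketched as a simple catalog lookup in Python. The class name ReplicaCatalog, its methods, and the example URLs are hypothetical and serve only to illustrate the idea; this is not the interface of VDS or of any storage system cited above.

```python
class ReplicaCatalog:
    """Maps a logical data name to the physical locations that hold it."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name: str, physical_url: str) -> None:
        # Several physical copies may exist for one logical name.
        self._replicas.setdefault(logical_name, []).append(physical_url)

    def locate(self, logical_name: str) -> list:
        try:
            # Return a copy so callers cannot modify the catalog by accident.
            return list(self._replicas[logical_name])
        except KeyError:
            raise LookupError(
                f"no replica registered for {logical_name!r}") from None


catalog = ReplicaCatalog()
catalog.register("survey/run-42", "https://dc1.example.org/store/a91f")
catalog.register("survey/run-42", "https://dc2.example.org/store/77c0")
locations = catalog.locate("survey/run-42")
```

Because users only ever refer to the logical name, the storage provider remains free to migrate or merge the physical copies, as the storage model above requires.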
[1] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. L. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield.
Xen and the art of virtualization. In Proceedings of
the 19th ACM Symposium on Operating Systems Principles, pages 164–177, New York, U. S. A., Oct. 2003.
4. Cumulus: a scientific Cloud for data center
[2] Here comes HaaS [URL].
http://www.roughtype.com/archives/2006/03/here_comes_haas.php,
access on June 2008.
Cumulus is an ongoing Cloud computing project at
our site. We designed Cumulus with a layered architecture (see
also Figure 4):
[3] Web 2.0 definition [URL].
http://en.wikipedia.org/wiki/mashup_(web_application_hybrid)/,
access on June 2008.
• The Globus virtual workspace service resides on the access
point of Cumulus and accepts users' requests for virtual machine operations.
[4] Web 2.0 definition [URL].
http://en.wikipedia.org/wiki/web_2/, access on
June 2008.
• OpenNEbula works as the Local Infrastructure Virtualization Manager (LIVM). The frontend of OpenNEbula runs on the Cumulus access point and receives
messages from the Globus virtual workspace service.
[5] B. Gleeson et al. A framework for IP based virtual
private networks. RFC 2764, The Internet Engineering
Task Force, Feb. 2000.
• The OpenNEbula frontend communicates with its backends
and the Xen hypervisors on the hosts via SSH for virtual
machine manipulation.
[6] G. Fedak, C. Germain, V. Néri, and F. Cappello.
Xtremweb: A generic global computing system. In
Proceedings of the 1st IEEE International Symposium
on Cluster Computing and the Grid, pages 582–587,
2001.
Virtual machines stay in a separate network domain.
Possible network solutions for virtual machines are:
• Virtual machines can start with a Xen virtual network
interface and be configured like physical machines. For example, virtual machines might be configured with dynamic IP addresses by listening to a central DHCP service in the network domain.
[7] I. Foster and C. Kesselman. The grid: blueprint for
a new computing infrastructure. Morgan Kaufmann,
1998.
• Virtual network technologies could be used for virtual
machine network space. For example, VNET [15] ties
virtual machines together efficiently and makes them
appear to users.
[8] S. Ghemawat, H. Gobioff, and S. Leung. The Google
file system. In Proceedings of the 19th ACM Symposium on Operating Systems Principles, pages 29–43,
2003.
To reach production quality, we build the local infrastructure using IBM BladeCenter as the backend and Oracle VM
server [22] as the operating system. All the hosts and virtual
machines are backed by Oracle Cluster File System [21].
Virtual machine images and templates are stored in an Oracle
File System, which can be mounted by all hosts. Application-level software is also stored in the Oracle data server.
Virtual machines can thus automatically mount software
installation packages required by users. The results so far
look promising.
[9] K. Keahey, K. Doering, and I. Foster. From sandbox to
playground: dynamic virtual environments in the grid.
In Proceedings of the 5th International Workshop on
Grid Computing, pages 34–42, 2004.
[10] K. Keahey, I. Foster, T. Freeman, and X. Zhang. Virtual workspaces: achieving quality of service and
quality of life in the grid. Scientific Programming,
13(4):265–275, 2005.
5. Conclusion
[11] V. A. Koutsonikola and A. Vakali. LDAP: Framework, practices, and trends. IEEE Internet Computing,
8(5):66–72, 2004.
This paper reviews the recent advances of Cloud computing and presents our early definition of Cloud computing, its
interfaces, and its features. We also discuss our experience
of building a scientific Cloud for a data center.
[12] M. Litzkow, M. Livny, and M. W. Mutka. Condor - a hunter of idle workstations. In Proceedings of the
8th International Conference on Distributed Computing Systems, pages 104–111, 1988.
[13] M. Milenkovic, S. H. Robinson, R. C. Knauerhase,
D. Barkai, S. Garg, V. Tewari, T. A. Anderson, and
M. Bowman. Toward internet distributed computing.
IEEE Computer, 36(5):38–46, 2003.
[14] IBM Blue Cloud project [URL].
http://www03.ibm.com/press/us/en/pressrelease/22613.wss/, access on June 2008.
[15] A. I. Sundararaj and P. A. Dinda. Towards virtual networks for virtual machine grid computing. In Proceedings of the 3rd Virtual Machine Research and
Technology Symposium, pages 177–190, 2004.
[16] Amazon Elastic Compute Cloud [URL].
http://aws.amazon.com/ec2, access on Nov. 2007.
[17] Nimbus Project [URL].
http://workspace.globus.org/clouds/nimbus.html/,
access on June 2008.
[18] OASIS UDDI Specification [URL]. http://www.oasisopen.org/committees/uddi-spec/doc/tcspecs.htm, access on June 2008.
[19] OpenNEbula Project [URL].
http://www.opennebula.org/, access on Apr. 2008.
[20] OpenPBS [URL]. http://www.pbsgridworks.com/, access on Nov. 2007.
[21] Oracle Cluster File System [URL].
http://oss.oracle.com/projects/ocfs/, access on June
2008.
[22] Oracle Virtual Machine [URL].
http://www.oracle.com/technologies/virtualization/index.html/,
access on June 2008.
[23] Reservoir Project [URL].
http://www03.ibm.com/press/us/en/pressrelease/23448.wss/,
access on June 2008.
[24] Simple Object Access Protocol (SOAP) [URL].
http://www.w3.org/tr/soap/, access on Nov. 2007.
[25] SmugMug [URL]. http://www.smugmug.com/, access
on June 2008.
[26] Stratus Project [URL]. http://www.acis.ufl.edu/vws/,
access on June 2008.
[27] Virtual Data System [URL].
http://vds.uchicago.edu/, access on Nov. 2007.
[28] Web Service Description Language (WSDL) [URL].
http://www.w3.org/tr/wsdl/, access on Nov. 2007.
[29] VMware virtualization technology [URL].
http://www.vmware.com, access on Nov. 2007.