EGI User Forum
13 April 2011
Vilnius (Lithuania)
e-Infrastructure Integration with gCube
Andrea Manzi (CERN)
Pasquale Pagano (ISTI-CNR)
www.d4science.eu
Outline
• D4Science II Ecosystem
• gCube architecture
• Interoperability approaches
  • Resource Discovery
  • Data Storage & Access
  • Data Discovery
  • Data Process
  • Security
• Applications
  • AquaMaps
  • Time Series
e-Infrastructure Integration with gCube
Vilnius, 13 April 2011
www.d4science.eu
D4Science II Ecosystem
• Heterogeneous resources
• Heterogeneous computational platforms
• Rich set of legacy applications
• Multiple administrative domains
• Evolving communities
[Diagram labels: FAO Geonetwork, FAO FIGIS, INSPIRE, AquaMaps, Hadoop, EGEE/EGI, DRIVER, GENESI-DR, D4SCIENCE INFRASTRUCTURE, Portal, Community A, Community B, Community C]
gCube architecture
[Architecture diagram. Recoverable labels: gCube run-time environment; gCube Definition and Management Services; gCube Application Services; Presentation Services; Portlets; User Services; Application Support Layer; Information Organization Services; Collection, Content, Metadata, Annotation, … Management; Ontology Management; Storage Management; Process Execution Management; VRE Management; Information Access Services; Search Framework; Personalization Service; Index Management Framework; DIR Support Framework; Information System; Security; gCube Container; gCore Framework]
Virtual Organization
A Virtual Organization (VO) specifies how a set of users can access a set of resources:
• what is shared
• who is allowed to share
• the conditions under which sharing can occur
The concept of VO is not adequate to cover some common scenarios:
• Data needs to be assessed before it is made publicly exploitable by the VO members.
• A restricted set of users has to collaborate to refine processes and implement showcases.
• Products generated through the elaboration of data or through simulation have to be validated by expert users.
Virtual Research Environment
[Diagram labels: VRE 1, VRE 2, VO]
A Virtual Research Environment (VRE) is
• a distributed and dynamically created environment
• where a subset of resources can be assigned to a subset of users via interfaces
• for a limited timeframe
• at little or no cost for the providers of the infrastructure
• integrated with cloud systems (OpenNebula)
gCube is a first example of a VRE management system
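The relationship between a VO and its VREs can be sketched as a small data model. This is a minimal illustration in Python, not gCube code: the class names, the fields and the `can_access` check are all hypothetical, chosen only to mirror the properties listed above (a subset of resources, a subset of users, a limited timeframe).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VirtualOrganization:
    """Hypothetical model of a VO: the users and resources it governs."""
    name: str
    users: frozenset
    resources: frozenset


@dataclass(frozen=True)
class VirtualResearchEnvironment:
    """Hypothetical model of a VRE carved out of a VO for a limited timeframe."""
    name: str
    vo: VirtualOrganization
    users: frozenset
    resources: frozenset
    valid_from: datetime
    valid_until: datetime

    def __post_init__(self):
        # A VRE may only assign users and resources that its VO already holds.
        if not self.users <= self.vo.users:
            raise ValueError("VRE users must be a subset of the VO users")
        if not self.resources <= self.vo.resources:
            raise ValueError("VRE resources must be a subset of the VO resources")
        if self.valid_until <= self.valid_from:
            raise ValueError("VRE timeframe must be non-empty")

    def can_access(self, user, resource, when):
        """True if the user may use the resource through this VRE at time `when`."""
        return (
            user in self.users
            and resource in self.resources
            and self.valid_from <= when < self.valid_until
        )


# Made-up users, resources and dates, for illustration only.
vo = VirtualOrganization(
    name="example-vo",
    users=frozenset({"alice", "bob", "carol"}),
    resources=frozenset({"species-db", "hadoop-cluster", "map-service"}),
)
vre = VirtualResearchEnvironment(
    name="example-vre",
    vo=vo,
    users=frozenset({"alice", "bob"}),
    resources=frozenset({"species-db", "map-service"}),
    valid_from=datetime(2011, 4, 1),
    valid_until=datetime(2011, 7, 1),
)
```

Here `carol` belongs to the VO but not to the VRE, so `vre.can_access("carol", ...)` is false even though the VO as a whole holds the resource. This is the finer granularity that the VO concept alone does not provide.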
Interoperability: Assumptions
• Very rich applications and data collections are currently maintained by a multitude of authoritative providers
• Different problems require different execution paradigms: batch, map-reduce, synchronous call, message-queue, …
• Key distributed computation technologies exist: grid (gLite and Globus), distributed resource management (Condor), clusters (Hadoop), …
• Several standards are adopted in the same domain
Interoperability: Landscape
[Diagram labels: Security, Data Process, Data Access, Data Storage, Resource Discovery, Data Discovery]
Unstructured Data: blob (binary) and textual files
Structured Data: tabular, statistical, geospatial, temporal, and textual data
Compound Data: data composed of unstructured and structured data entities
Interoperability: gCube Vision
gCube objectives:
• hide heterogeneity, i.e. abstract over differences in location, protocol, and model;
• embrace heterogeneity, i.e. allow for multiple locations, protocols, and models.
Technical goals:
• no bottlenecks: scale no less than the interfaced resources
• no outages: keep failures partial and temporary
• autonomicity: the system reacts and recovers
Hiding Heterogeneity
• Heterogeneous resources are virtually accessible in a common ecosystem of resources
  • despite their locations, technologies, and protocols
• Different communities have access to different views
  • according to the conditions under which the sharing can occur
• Each community can define its own VRE
  • for a limited timeframe and at no cost for the providers of the resources
• Several VREs can coexist
  • without interfering with each other, even when competing for the same resources
Embracing Heterogeneity
Approaches and solutions to achieve interoperability:
Blackboard-based
• asynchronous communication between components in a system
• one protocol to read/write and one language to specify messages
Wrapper/Mediator-based
• translates one interface for a component into a compatible interface
Adaptor-based
• provides a unified interface to a set of other components' interfaces and encapsulates how this set of objects interacts
gCube interoperability framework: the solution
Interoperability Approaches:
Resource Discovery
Each resource is represented by a profile (metadata) characterising:
• the interface
• the state
• the list of dependencies
• the run-time status
• the policies
• the configuration
• the pending tasks to execute
A resource profile
• is published by the resource owner
• is discovered by the resource consumers asynchronously through a common resource-independent protocol
gCube offers a distributed and scalable Information System (blackboard) to store, discover, and access resource profiles
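The publish/discover interaction described above can be sketched with an in-memory blackboard. This is a minimal illustration in Python and not the gCube Information System API: `ResourceProfile`, `Blackboard`, the field names and the example endpoints are all hypothetical, and the profile is reduced to a handful of the characteristics listed on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceProfile:
    """Hypothetical, heavily reduced resource profile (metadata about a resource)."""
    resource_id: str
    resource_type: str
    interface: str
    status: str = "ready"
    policies: dict = field(default_factory=dict)


class Blackboard:
    """In-memory stand-in for an information system holding resource profiles."""

    def __init__(self):
        self._profiles = {}

    def publish(self, profile):
        # Owners write (or overwrite) their own profile; they never talk to consumers.
        self._profiles[profile.resource_id] = profile

    def unpublish(self, resource_id):
        self._profiles.pop(resource_id, None)

    def discover(self, **criteria):
        # Consumers read with one generic query form, whatever the resource type.
        return [
            p for p in self._profiles.values()
            if all(getattr(p, key, None) == value for key, value in criteria.items())
        ]


board = Blackboard()
board.publish(ResourceProfile("idx-1", "index", "https://host-a.example/index"))
board.publish(ResourceProfile("idx-2", "index", "https://host-b.example/index", status="down"))
board.publish(ResourceProfile("st-1", "storage", "https://host-c.example/storage"))

ready_indexes = board.discover(resource_type="index", status="ready")
```

Owners and consumers share only the blackboard and the profile format, so a new resource type needs no new protocol. A real deployment would distribute and replicate the store, which this sketch does not attempt.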
Interoperability Approaches:
Content Interoperability [1/2]
gCube Open Content Management Architecture (OCMA)
• Assumptions
  • data stored in different storage back-ends
  • diverse locations, models, access types
  • few common primitives: documents, collections, repositories
• gCube makes it possible to
  • reach content that lies outside the system
  • expose content (reachable from) inside the system
  • perform coarse-grained as well as fine-grained retrieval, update, and addressing
• Runtime scalability
  • autonomic read-only state replication
  • maximize throughput, minimize response time: discovery-time load balancing (through IS)
  • reduce latencies
• Software
  • plugin-based architecture to reduce development costs (plugins over storage systems)
Interoperability Approaches:
Content Interoperability [2/2]
Content Manager Service (OCMA Service)
• Adapts the gCube document model (gDoc) to an unbounded number of back-end types
[Diagram labels: gDoc factory; gDoc Write; gDoc Read; adapts T1; …; adapts T2]
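The idea of adapting one document model to several back-end types through plugins can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the OCMA implementation: `GDoc`, `BackendPlugin`, `ContentManagerFactory` and the two toy back-ends standing in for T1 and T2 are all hypothetical names and formats.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class GDoc:
    """Hypothetical, minimal stand-in for a common document model."""
    doc_id: str
    content: bytes
    metadata: dict = field(default_factory=dict)


class BackendPlugin(ABC):
    """Contract each back-end plugin implements to adapt its native type to GDoc."""

    @abstractmethod
    def read(self, doc_id): ...

    @abstractmethod
    def write(self, doc): ...


class DictBackend(BackendPlugin):
    """Back-end type T1 (illustrative): documents kept as (content, metadata) tuples."""

    def __init__(self):
        self._store = {}

    def read(self, doc_id):
        content, metadata = self._store[doc_id]
        return GDoc(doc_id, content, dict(metadata))

    def write(self, doc):
        self._store[doc.doc_id] = (doc.content, dict(doc.metadata))


class TextRecordBackend(BackendPlugin):
    """Back-end type T2 (illustrative): documents kept as 'title|body' text records."""

    def __init__(self):
        self._records = {}

    def read(self, doc_id):
        title, body = self._records[doc_id].split("|", 1)
        return GDoc(doc_id, body.encode("utf-8"), {"title": title})

    def write(self, doc):
        title = doc.metadata.get("title", "")
        self._records[doc.doc_id] = f"{title}|{doc.content.decode('utf-8')}"


class ContentManagerFactory:
    """Maps a back-end type name to the plugin that adapts it."""

    def __init__(self):
        self._plugins = {}

    def register(self, backend_type, plugin_class):
        self._plugins[backend_type] = plugin_class

    def create(self, backend_type):
        return self._plugins[backend_type]()


factory = ContentManagerFactory()
factory.register("T1", DictBackend)
factory.register("T2", TextRecordBackend)

doc = GDoc("d1", b"half-degree cells", {"title": "AquaMaps"})
roundtrips = {}
for backend_type in ("T1", "T2"):
    manager = factory.create(backend_type)
    manager.write(doc)
    roundtrips[backend_type] = manager.read("d1")
```

Client code only ever sees `GDoc` objects and the `read`/`write` contract. Supporting a further back-end type means registering one more plugin, which is where the reduction in development cost would come from.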
Interoperability Approaches:
Data Discovery
gCube offers
• Several index types
  • Forward indexing, which supports ultra-fast lookups on tabular typed metadata;
  • XML indexing, which supports semi-structured lookups on content metadata;
  • Textual field indexing, which supports full-text and qualified lookups on (mainly) textual metadata;
  • Metadata full-text indexing, which enables full-text lookups on metadata;
  • Content full-text indexing, which enables full-text lookups on text extracted from content;
  • Geospatial/temporal indexing, which enables geospatial proximity and coverage queries to be executed over geospatial/temporal metadata;
  • Feature indexing, which enables high-dimension vector indexing for feature lookup (currently inactive).
Interoperability Approaches:
Process Execution [1/2]
gCube offers solutions to:
• Decouple the business-domain and infrastructure-specific logic from the core “execution” functionality
• Invoke a wide range of logic components: SOAP and REST Web Services, shell scripts, executable binaries, POJOs, …
• Support most of the execution paradigms: batch, map-reduce, synchronous call
• Bridge key distributed computation technologies: grid (gLite and Globus), Condor, Hadoop
• Control and monitor the execution of a processing flow
• Stage data among different storage providers
• Stream data among computation elements
Interoperability Approaches:
Process Execution [2/2]
By using adaptors that
• operate on a specific third-party language and translate it into native constructs,
• allow for the creation of complex workflows that exploit several diverse technologies deployed on different infrastructures
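The role of such adaptors can be sketched as translators that turn foreign workflow descriptions into one native step type, which a single engine then executes. This is a minimal Python illustration and not the gCube process execution engine: the `Step` construct, the two input formats and the `OPERATIONS` table are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """Native construct (hypothetical): a named unit of work over a shared context."""
    name: str
    run: Callable[[dict], dict]


def run_plan(steps, context=None):
    """Execute native steps in order, recording which ones ran."""
    context = dict(context or {})
    context.setdefault("trace", [])
    for step in steps:
        context = step.run(context)
        context["trace"].append(step.name)
    return context


# Illustrative component logic, standing in for remote or legacy components.
OPERATIONS = {
    "load": lambda ctx, args: {**ctx, "values": list(args["values"])},
    "scale": lambda ctx, args: {**ctx, "values": [v * args["factor"] for v in ctx["values"]]},
    "sum": lambda ctx, args: {**ctx, "result": sum(ctx["values"])},
}


def adapt_pipeline_text(text):
    """Adaptor for a made-up line-based language: '<name>: <op> key=value ...'."""
    steps = []
    for line in text.strip().splitlines():
        name, _, rest = line.partition(":")
        op, *pairs = rest.split()
        args = {}
        for pair in pairs:
            key, _, raw = pair.partition("=")
            # Comma-separated values become a list of ints, anything else one int.
            args[key] = [int(x) for x in raw.split(",")] if "," in raw else int(raw)
        steps.append(Step(name.strip(), lambda ctx, op=op, args=args: OPERATIONS[op](ctx, args)))
    return steps


def adapt_task_dicts(tasks):
    """Adaptor for a second made-up format: a list of {'id', 'op', 'args'} dicts."""
    return [
        Step(t["id"], lambda ctx, t=t: OPERATIONS[t["op"]](ctx, t.get("args", {})))
        for t in tasks
    ]


# One workflow assembled from steps described in two different formats.
plan = adapt_pipeline_text("read: load values=1,2,3") + adapt_task_dicts(
    [{"id": "double", "op": "scale", "args": {"factor": 2}}, {"id": "total", "op": "sum"}]
)
outcome = run_plan(plan)
```

The engine in `run_plan` knows nothing about either input format. Each adaptor carries the format-specific knowledge, so steps originating from different descriptions can be chained in the same workflow.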
Interoperability Approaches:
Security [1/2]
gCube offers solutions:
• to secure access to gCube resources for interoperable external systems (incoming security)
• to ensure interoperability of gCube security mechanisms with standards-compliant security systems (reuse)
• to facilitate secure access to external resources for gCube services (outgoing)
Interoperability Approaches:
Security [2/2]
AuthZ:
• XACML for the authZ request/response protocol and policy definition
• SAML assertions to transport user/service authN information
• Argus-based approach (EMI AuthZ framework) with a pluggable design to integrate additional PIPs
• SAML Profile for XACML 2.0, following the OASIS Authorization Interoperability Profile Specification
AuthN:
• Production-level SSL/HTTPS support
• Key- and Trust-Manager
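The division of labour between a decision point and pluggable policy information points (PIPs) can be sketched as below. This is a toy Python model of the general idea only: it uses neither XACML syntax nor the Argus API, and the class, the rule format, the `MEMBERSHIPS` table and the choice of a deny-overrides combination are assumptions made for the example.

```python
PERMIT, DENY, NOT_APPLICABLE = "Permit", "Deny", "NotApplicable"


class PolicyDecisionPoint:
    """Toy decision point: PIPs add attributes, then rules are combined deny-overrides."""

    def __init__(self):
        self._pips = []
        self._rules = []

    def add_pip(self, pip):
        # A PIP is any callable that returns extra attributes for a request.
        self._pips.append(pip)

    def add_rule(self, effect, condition):
        # A rule applies when its condition holds, and then yields its effect.
        self._rules.append((effect, condition))

    def decide(self, request):
        attributes = dict(request)
        for pip in self._pips:
            attributes.update(pip(attributes))
        effects = {effect for effect, condition in self._rules if condition(attributes)}
        if DENY in effects:
            return DENY
        if PERMIT in effects:
            return PERMIT
        return NOT_APPLICABLE


# Hypothetical attribute source: maps a subject to the VREs it belongs to.
MEMBERSHIPS = {"alice": {"vre-1"}, "bob": {"vre-2"}}


def membership_pip(attributes):
    return {"subject_vres": MEMBERSHIPS.get(attributes["subject"], set())}


pdp = PolicyDecisionPoint()
pdp.add_pip(membership_pip)
pdp.add_rule(PERMIT, lambda a: a["action"] == "read" and a["resource_vre"] in a["subject_vres"])
pdp.add_rule(DENY, lambda a: a["action"] == "delete")
```

A request such as `pdp.decide({"subject": "alice", "action": "read", "resource_vre": "vre-1"})` carries no membership information itself. The PIP supplies it before the rules run, which is what makes additional attribute sources pluggable.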
Species Distribution Maps Generation
AquaMaps is an application*
• tailored to predict global distributions of marine species, initially designed for marine mammals and subsequently generalised to marine species,
• that generates color-coded species range maps using half-degree latitude and longitude blocks
• by interfacing several databases and repository providers
* Algorithm by Kaschner et al. 2006
Species Distribution Maps Generation
AquaMaps execution is based on the gCube Ecological Niche Modelling Suite, which allows the extrapolation of known species occurrences
◦ to determine environmental envelopes (species tolerances)
◦ to predict future distributions by matching species tolerances against local environmental conditions (e.g. climate change and sea pollution)
Very large volume of input and output data: HSPEC native range 56,468,301; HSPEC suitable range 114,989,360
Very large number of computations: one multispecies map computed on 6,188 half-degree cells (over 170k) and 2,540 species requires 125 million computations (Eli E. Agbayani, FishBase Project/INCOFISH WP1, WorldFish Center)
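Matching species tolerances against the conditions of a half-degree cell can be sketched as follows. This is a simplified Python illustration, not the gCube Ecological Niche Modelling Suite: the trapezoidal response curve, the multiplication of per-parameter scores, the row/column numbering and all envelope values are assumptions made for the example. Only the half-degree cell size and the counts of 6,188 cells and 2,540 species come from the slide.

```python
import math

CELL_SIZE_DEG = 0.5
COLUMNS = int(360 / CELL_SIZE_DEG)   # 720 cells around the globe
ROWS = int(180 / CELL_SIZE_DEG)      # 360 cells pole to pole


def cell_index(lat, lon):
    """Map a coordinate to a (row, column) half-degree cell, row 0 at the north edge."""
    row = min(int(math.floor((90.0 - lat) / CELL_SIZE_DEG)), ROWS - 1)
    column = min(int(math.floor((lon + 180.0) / CELL_SIZE_DEG)), COLUMNS - 1)
    return row, column


def suitability(value, envelope):
    """Trapezoidal response: 1 inside the preferred range, 0 outside the absolute range."""
    lo, pref_lo, pref_hi, hi = envelope
    if value < lo or value > hi:
        return 0.0
    if value < pref_lo:
        return (value - lo) / (pref_lo - lo)
    if value > pref_hi:
        return (hi - value) / (hi - pref_hi)
    return 1.0


def cell_probability(conditions, envelopes):
    """Combine per-parameter suitabilities by multiplication (an assumed rule)."""
    probability = 1.0
    for parameter, envelope in envelopes.items():
        probability *= suitability(conditions[parameter], envelope)
    return probability


# Made-up envelope and cell conditions, for illustration only.
envelopes = {"temperature": (2.0, 10.0, 20.0, 28.0), "depth": (0.0, 50.0, 200.0, 1000.0)}
conditions = {"temperature": 24.0, "depth": 100.0}

total_cells = ROWS * COLUMNS   # cells in a global half-degree grid, land included
pairs = 6188 * 2540            # cell-species pairs for the map quoted on the slide
```

A global half-degree grid has 259,200 cells, and the quoted map involves 15,717,520 cell-species pairs. The slide's figure of 125 million computations is several times larger, which would be consistent with more than one computation per pair, but the slides do not say how the figure is derived.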
Time Series Management
Offers a set of tools to manage capture statistics
• Supports the complete time series (TS) lifecycle
• Supports validation, curation, and analysis
• Provides support for data reallocation
• Produces uniform data sets
Time Series and R statistical software integration
The main aims are to:
• provide a complete, fully working environment for the R language
• give users methods to automatically extract data from the time series they are working on
• give users the possibility to perform queries on the time series database
• provide a service distributed on the infrastructure. Multiple instances can be managed on the infrastructure VREs, the distribution being transparent to the users (SaaS model)
Conclusions
gCube System:
• Stable software, improved over the last 5 years (end of DILIGENT -> D4Science -> D4Science II)
gCube offers a variety of patterns, tools, and solutions
• to deliver interoperability solutions and interconnect
  • heterogeneous digital content
  • heterogeneous repository systems
  • heterogeneous computation platforms
• to decrease the cost of adoption
• to deal with several standards
Questions Time