WWW.GBIF.ORG
GLOBAL BIODIVERSITY INFORMATION FACILITY
Information architecture of the GBIF
Hannu Saarenmaa
IABIN/CHM
Cancún, Mexico, 12-14 August 2003
Global Biodiversity Information Facility

Outline
1. Data
2. Software
3. Hardware
4. Peopleware (Nodes)
5. Status of network and conclusion

1. Data

Policy, decisions, knowledge and information depend on data

GBIF is concerned with “primary biodiversity data” only
- Specimens
- Observations
- Names
- Species
- Literature
- Metadata on the above

How will the data be organised?
By having a common information model and shared data standards
[Figure labels: Institutions; Data sources; Services (Source: URL; Protocol: SOAP, DiGIR; Format: XML Schema; Description; Rights); Datasets (Format; Rights); Units/Records (specimen data, observation data); Taxonomies in ECAT and Catalogue of Life; Knowledge Bases; Species Knowledge; Objects; Checklists; Redlists; Unstructured information; Central; Distributed]

Data exchange standards are the key
- Data description in XML
  - Specimen/Observation
  - Name/Taxon
  - Providers / Collections / Persons in various roles
- Leading standards
  - DiGIR
  - Darwin Core
  - ABCD/BioCASE
  - Dublin Core
  - SOAP
  - Grid OGSA
- Standards process
  - GBIF-DADI works with TDWG
  - Discussion, documentation
- Open source
  - digir.sourceforge.net

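To make "data description in XML" concrete, here is a minimal Python sketch that wraps one specimen record in XML. The field names follow Darwin Core naming as commonly used; the namespace URI, the `record` wrapper element and all values are placeholders invented for this illustration and are not taken from the talk.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration only; not an official schema URI.
DWC_NS = "http://example.org/darwin-core-sketch"


def specimen_record(fields: dict) -> ET.Element:
    """Wrap a flat dict of field names and values in a <record> element."""
    record = ET.Element(f"{{{DWC_NS}}}record")
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DWC_NS}}}{name}")
        child.text = str(value)
    return record


# Invented example values.
record = specimen_record({
    "InstitutionCode": "XYZ",
    "CollectionCode": "Birds",
    "CatalogNumber": "123456",
    "ScientificName": "Parus major",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because every provider would emit the same element names, a consumer can read records from any source without knowing the local database layout.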
2. Software
GBIF is building a distributed network of databases using a web services approach

Web Services: Definitions
“A Web Service is a software application or component identified by a URI, whose interfaces and bindings are capable of being described by standard XML vocabularies and that supports direct interactions with other software applications or components through the exchange of information that is expressed in terms of an XML infoset via Internet-based protocols.” - Chris Ferris, Sun Microsystems, W3C

The Web Services Stack
[Figure: web services stack diagram, including DiGIR]

2.1. The DiGIR protocol
- Used for communication between data providers and users
- More light-weight and specialised than SOAP
- Enables single point of access (portal/search) to distributed information resources
  - Resource: a collection of data objects that conform to a common schema (DB records, XML documents)
  - Distributed resources conform to federation schema
- Enables search & retrieval of structured data
  - Search for data values in context (semantics)
  - Results as structured data set
- Makes location and technical characteristics of native resource transparent to the user
- The Distributed Generic Information Retrieval protocol was invented by David Vieglais (University of Kansas) and Stan Blum (California Academy of Sciences)

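The idea of a single point of access over resources that share a federation schema can be sketched in a few lines of Python. This is not the DiGIR wire protocol: the `Resource` class, the `federated_search` function and the sample records are hypothetical stand-ins, and real providers would be reached over the network rather than held in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A collection of records that all use the federation's field names."""
    name: str
    records: list = field(default_factory=list)

    def search(self, concept: str, value: str) -> list:
        # Search for a data value in the context of a named concept.
        return [r for r in self.records if r.get(concept) == value]


def federated_search(resources, concept, value):
    """Query every resource and merge the hits into one structured result set."""
    results = []
    for res in resources:
        for hit in res.search(concept, value):
            results.append({"resource": res.name, **hit})
    return results


# Invented sample data for two providers.
museum_a = Resource("museum_a", [
    {"ScientificName": "Parus major", "CatalogNumber": "1"},
])
museum_b = Resource("museum_b", [
    {"ScientificName": "Parus major", "CatalogNumber": "77"},
    {"ScientificName": "Pica pica", "CatalogNumber": "78"},
])
hits = federated_search([museum_a, museum_b], "ScientificName", "Parus major")
print(hits)
```

The caller only names a concept and a value; where each record lives and how it is stored stays hidden behind the resource interface.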
A simple DiGIR architecture
[Figure labels: Search engine, Portal, Data Providers]

GBIF DiGIR Architecture
[Figure labels. Components: User; Portal; Request Marshaller; Cache; Query Engine; Registry (UDDI) with Institutions, Providers, Services; Index with Metadata and Accounting; Name provider (synonyms, GUIDs) with Provider Services and Resources; Data provider with Provider Services, Resources and Metadata. Flows: provider query; available providers; publish availability; metadata and name query; metadata response; metadata and logs; full data query; full data response. Protocols: DiGIR, SOAP]

2.2. The registry
“You don’t get very far with web services unless you have a registry...” - Tom Gaskins, uddi.org
- Global marketplace of shared biodiversity data
  - Technically available now, awaits being populated
  - Multiple UDDI servers possible in 2004 (v3)
- Based on UDDI (Systinet WASP) and web services
  - Directory of Participants and data providers
  - Services of the providers, i.e., data sources and datasets offered
  - tModels of the standards that must be adhered to
- Open interfaces for portals and specialised search engines
  - Anybody can write their portal/search tool that uses the registry
  - Use of index is optional

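The registry's role (a directory of providers, their services, and the tModels those services must adhere to) can be illustrated with a small in-memory Python sketch. It does not use the UDDI API; the `Registry` and `Service` classes, the URLs and the provider names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    provider: str
    access_point: str  # URL of the provider's endpoint (placeholder values below)
    tmodels: set = field(default_factory=set)  # standards the service claims to follow


class Registry:
    def __init__(self):
        self.tmodels = set()
        self.services = []

    def register_tmodel(self, name):
        """Record a standard that services may declare."""
        self.tmodels.add(name)

    def register_service(self, service):
        """Accept a service only if every standard it declares is known."""
        unknown = service.tmodels - self.tmodels
        if unknown:
            raise ValueError(f"unknown tModels: {sorted(unknown)}")
        self.services.append(service)

    def find_services(self, tmodel):
        """Return all services that declare the given standard."""
        return [s for s in self.services if tmodel in s.tmodels]


registry = Registry()
registry.register_tmodel("DiGIR")
registry.register_tmodel("Darwin Core")
registry.register_service(
    Service("museum_a", "http://example.org/digir", {"DiGIR", "Darwin Core"}))
registry.register_service(
    Service("museum_b", "http://example.net/digir", {"DiGIR"}))
darwin_core_providers = [s.provider for s in registry.find_services("Darwin Core")]
print(darwin_core_providers)
```

A portal written by anybody could start from a lookup like `find_services` to discover which endpoints speak the standards it understands.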
How does the GBIF UDDI registry work?
1) GBIF Secretariat and other developers create and populate the registry with descriptions of standards (tModels)
2) Museums and other data providers install data provider packages which are automatically registered
3) GBIF Participant is notified of new provider in their domain, possible endorsement
4) A global index queries the registry and caches metadata and usage statistics, creating unique identifier for each record (and name)
5) Portals and search engines query the registry and the index to build targeted user interfaces
6) Scientists and policy users use portals to build data sets for analysis and synthesis
[Figure labels: GBIF UDDI Registry; Provider Registrations; Services Registrations]

2.3. Metadata and names index
- Closely paired with the services registry will be a global index of the available data
- Retrieves metadata of datasets/resources available in the registered providers
- Indexes on scope and coverage of datasets/resource (Dublin Core registry)
  - Taxonomic, spatial, temporal, ...
- Maintains a cache of key data in case provider goes off-line

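A minimal Python sketch of the index idea follows: harvest coverage metadata from each provider, answer scope queries from it, and fall back to the cached copy when a provider cannot be reached. The `MetadataIndex` class, the metadata keys (`taxa`, `countries`) and the use of `ConnectionError` to signal an off-line provider are assumptions made for this illustration.

```python
class MetadataIndex:
    def __init__(self):
        self._cache = {}  # provider name -> last metadata seen

    def harvest(self, name, fetch):
        """Call fetch() for fresh metadata; fall back to the cached copy on failure."""
        try:
            self._cache[name] = fetch()
        except ConnectionError:
            pass  # provider is off-line; keep the cached copy, if any
        return self._cache.get(name)

    def find(self, taxon=None, country=None):
        """Return provider names whose coverage matches all given criteria."""
        hits = []
        for name, meta in self._cache.items():
            if taxon and taxon not in meta["taxa"]:
                continue
            if country and country not in meta["countries"]:
                continue
            hits.append(name)
        return sorted(hits)


def online():
    # Invented coverage metadata for one provider.
    return {"taxa": {"Aves"}, "countries": {"FI"}}


def offline():
    raise ConnectionError("provider unreachable")


index = MetadataIndex()
index.harvest("museum_a", online)
cached = index.harvest("museum_a", offline)  # served from the cache
print(cached, index.find(taxon="Aves"))
```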
Name Service (ECAT) is a major component of the global index
[Figure labels: GBIF Portal; XML Data Access; HTML Data Access; Biodiversity Data Access; GBIF Data Nodes (Specimen Data, Observation Data, Name Lists, Unstructured Data URLs); Index Manager; Indexing of usage; Name Usage Index; Taxonomic Name Service (ECAT); Catalogue of Life]
ECAT elements have been coloured orange
“Name Lists” are lists of names for a specific purpose (e.g. Red List, regional checklist)

2.4.a. Data provider software
- Each system entails
  - Provider software
  - Communication with the DiGIR protocol
  - Data standards Darwin Core, Dublin Core
  - Installation for each provider
  - Configuration for each resource (local existing database)
  - Registration with GBIF UDDI registry
- Turn-key package for Linux and Windows
  - Based on PHP and digir.sourceforge.net code
  - Available in August 2003

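"Configuration for each resource" essentially means telling the provider software how columns of the local database correspond to the shared standard. The Python sketch below shows that idea only; it is not the configuration format of the actual provider package. The local column names are invented, and the target names follow Darwin Core naming as commonly used.

```python
# Hypothetical mapping: local column name -> shared concept name.
COLUMN_MAP = {
    "species": "ScientificName",
    "cat_no": "CatalogNumber",
    "year": "YearCollected",
}


def to_shared_schema(row: dict, column_map: dict) -> dict:
    """Rename mapped columns and drop anything the mapping does not mention."""
    return {shared: row[local]
            for local, shared in column_map.items()
            if local in row}


# A row as it might come out of a local database (invented values).
local_row = {"species": "Pica pica", "cat_no": "78", "shelf": "B12"}
shared_row = to_shared_schema(local_row, COLUMN_MAP)
print(shared_row)
```

Internal fields such as the shelf location never leave the institution because they are simply absent from the mapping.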
2.4.b. Data repository tool
- A data warehouse tool to manage and share data without a database
  - Upload and manage datasets in document format either as a) spreadsheet, b) embedded Darwin Core, or c) ABCD
  - Release dataset to public
    - Data is parsed into embedded MySQL database and becomes available as DiGIR resource
  - Revoke release
    - Data is deleted from database
- Stand-alone package or module of GBIF PTK
  - For Linux and Windows
  - Based on Python and Zope, available Q3/2003

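The release and revoke cycle can be sketched as follows. This is a simplified stand-in, not the GBIF tool: it uses Python's built-in `sqlite3` in place of the embedded MySQL database, accepts only a CSV "spreadsheet" with two assumed columns, and the `Repository` class and its method names are hypothetical.

```python
import csv
import io
import sqlite3


class Repository:
    def __init__(self):
        # In-memory SQLite stands in for the embedded database.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE records "
            "(dataset TEXT, catalog_number TEXT, scientific_name TEXT)"
        )

    def release(self, dataset, csv_text):
        """Parse the uploaded spreadsheet and load its rows into the database."""
        rows = csv.DictReader(io.StringIO(csv_text))
        self.db.executemany(
            "INSERT INTO records VALUES (?, ?, ?)",
            [(dataset, r["CatalogNumber"], r["ScientificName"]) for r in rows],
        )

    def revoke(self, dataset):
        """Withdraw a released dataset by deleting its rows."""
        self.db.execute("DELETE FROM records WHERE dataset = ?", (dataset,))

    def count(self, dataset):
        cur = self.db.execute(
            "SELECT COUNT(*) FROM records WHERE dataset = ?", (dataset,))
        return cur.fetchone()[0]


repo = Repository()
repo.release("birds", "CatalogNumber,ScientificName\n1,Parus major\n2,Pica pica\n")
released = repo.count("birds")
repo.revoke("birds")
remaining = repo.count("birds")
print(released, remaining)
```

The uploaded document stays the master copy; the database rows exist only while the dataset is released.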
2.5. Logging and accounting
- Track the usage of the network and document the data provided by the nodes.
- Why?
  - Recognise the efforts of the data providers
  - Help the users to acknowledge the sources of the data they are using
  - Report back to the Participants whether the GBIF network is really used
  - Optimise network performance and services
- How?
  - Willing data providers log their transactions
  - Central accounting service downloads logs, providing statistics of usage and a citation service on the web site and via email
  - Part of the Index

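As an illustration of what a central accounting service might compute from downloaded provider logs, here is a short Python sketch. The log entry layout (`user`, `resource`, `records_returned`) and both function names are assumptions for this example; the talk does not specify a log format.

```python
from collections import Counter


def usage_statistics(provider_logs):
    """Sum the records served per (provider, resource) across all logs."""
    stats = Counter()
    for provider, entries in provider_logs.items():
        for entry in entries:
            stats[(provider, entry["resource"])] += entry["records_returned"]
    return stats


def sources_to_cite(provider_logs, user):
    """List the providers whose data a given user has retrieved."""
    return sorted(
        provider
        for provider, entries in provider_logs.items()
        if any(e["user"] == user for e in entries)
    )


# Invented log entries from two providers.
logs = {
    "museum_a": [
        {"user": "alice", "resource": "birds", "records_returned": 10},
        {"user": "bob", "resource": "birds", "records_returned": 5},
    ],
    "museum_b": [
        {"user": "alice", "resource": "plants", "records_returned": 3},
    ],
}
stats = usage_statistics(logs)
citations = sources_to_cite(logs, "bob")
print(stats, citations)
```

The first function serves the "report back" and "recognise" goals; the second is the seed of a citation service that tells a user which sources to acknowledge.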
2.6. Portals
- Portals are gateways to distributed information resources
- You do not need your own portal in order to become a data provider
  - Just access to one that talks to a registry
- Anybody can write their specialised portal/search tool that uses the registry and the index through their open interfaces (DiGIR, SOAP)
  - The MANIS portal is available now (Java)
  - GBIF Portal Toolkit v2, which can be used to access data, is planned for Q1/2004

Two roles of portals
- Communication/coordination needs
  - Portals are integrative tools and gateways to information that go beyond single websites
  - Portals and related directory services can be used to coordinate network activities
- Data access needs
  - Much of the content on the portals can be built automatically out of contents of the central Index
  - GBIF central portal is only one of many portals and search engines making use of the central metadata registry and related index through their open interfaces
  - Participant nodes need portals to data in their domain

GBIF Portal Toolkit
- Communications portal (version 1) released at the end of 2002, and as portal toolkit (PTK) for use by nodes
  - News syndication with RSS/RDF
  - Events, calendar of calendars, projects
  - Articles, documents, images, audio and video content
  - Search within the site, across the GBIF network
  - Download area
  - Getting started service and how to become a node
  - About GBIF
  - CIRCA-based group collaboration services
  - Directory services (CIRCA-based open LDAP)
  - Suggestions and feedback from users
  - Prototype data repository
- Data access portal (version 2) Q1/2004
  - Registry
  - Access to primary biodiversity data derived from the central index
  - Accounting service of use of data
  - Links to Participant nodes and their content

[Screenshot: test version of the central GBIF communications portal]

3. Hardware
- Each Participant should have on the Internet one or both of:
  - A network of distributed data providers
  - A central data warehouse
- At least one server and an Internet connection that are stable
  - Can be hosted elsewhere if stability is a problem

4. Peopleware
How to become a GBIF data provider?
Data is provided by the nodes.

GBIF node responsibilities
[Figure labels. Levels: GBIF (Registry, Index, and Portal); Participant Node (Portal); Data Node. Numbered responsibilities shown: Network; Registry; Standards; Tools; Coordination; Consolidated Data; Identify Data Nodes; Endorse and quality assure data nodes; National Language Interfaces; Encourage participation; Manage registration of Data Nodes; Register metadata; Allow indexing]

NODES coordinate their Participant networks
- The NODES Committee
  - Comprises the managers of the Participant nodes
  - Works with the Information and Communications Technology (ICT) staff of the Secretariat to develop the network of nodes
  - Building of data network requires building of a human network
- NODES are in a key position to promote and help the inclusion of new data providers and data sets
  - Maintains global directory of people, roles, data providers
  - Shares best practices, experiences, ideas and software tools

What tools a Participant node needs
- Registry tools to endorse institutions and data providers
  - Access to the central UDDI registry
  - Local directory server or UDDI server
- Directory of people, collections, institutions and related communication tools
- Portal server for domain-specific website
  - National language support as needed
- Data warehouse to host data from the willing/unable data nodes
- Tools for quality assurance

Training
- Training programme is being shaped
- 7 regional workshops in 2003 on “Becoming a GBIF data provider”
  - Stockholm, Ottawa, Tsukuba, Lisbon, San Jose, Africa, “francophonie”
- Secretariat only works with the Participant nodes, therefore:
  - “Train the trainer” concept
  - Certification of a cadre of trainers
  - Standardised tools and materials

Helpdesk
- For all operational services
- Ticket handling, followup
- Will be geographically distributed
- For “GBIF-approved packages”

Why would I share my data?
- Identity of each record will be maintained
  - Globally unique identifier (LSID/URN)
    - Network:Provider:Namespace:Key:Version, e.g. GBIF-LSID:mysite.org:SpecimenID:123456:1
  - Comparable to authorship of names
- Usage will be logged and statistics provided
  - The efforts of the data providers will be recognised
  - Users required to acknowledge the sources of the data they are using
  - Users will be informed who is using their data (difficult without authentication)
  - Could be required for publication (cf. GenBank)
- “GBIF Public Licence”

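The five-part identifier layout from the slide (Network:Provider:Namespace:Key:Version) is easy to build and parse. The Python sketch below follows that layout only; it is not necessarily the syntax of any formal LSID specification, and the `RecordId` type and `parse_record_id` function are names introduced here for illustration.

```python
from typing import NamedTuple


class RecordId(NamedTuple):
    network: str
    provider: str
    namespace: str
    key: str
    version: str

    def __str__(self):
        # Join the five parts with colons, as in the slide's example.
        return ":".join(self)


def parse_record_id(text: str) -> RecordId:
    """Split a colon-separated identifier into its five named parts."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated parts, got {len(parts)}")
    return RecordId(*parts)


rid = parse_record_id("GBIF-LSID:mysite.org:SpecimenID:123456:1")
print(rid.provider, rid.key, rid.version)
print(str(rid))
```

Keeping the provider and version inside the identifier is what lets usage logs and citations point back to the exact record and its source.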
GBIF IPR Principles
- GBIF will seek to ensure that data in GBIF-affiliated databases is in the public domain
  - In particular data enabling linking with other data
- GBIF will seek to ensure that source of data is acknowledged by all users
  - Cf. Open Source licenses, commons
- Maintenance and control of data remain in hands of database owners
  - There will be no central data banks (except caches)
  - Database owners can block access to sensitive data
  - Countries have sovereignty over their biological resources
- It follows that GBIF services will mainly be integrative metadata services, and standards

Conclusion
GBIF as a global integrator

GBIF network status
- NODES committee set its goal to have a DiGIR network up and running by end of 2003
- Seven regional workshops and training events
- Two DiGIR provider implementations available August 2003
- UDDI registry up and running July 2003
- Global index Q4/2003
- Portal to browse and search data Q4/2003, toolkit Q1/2004
- Specialised services such as BIODI GARP service emerging

SUMMARY
- Central registry and marketplace of distributed data
  - Anyone can build their vertical portals or specialised search engines on top of that
- Participant nodes: Major role in coordination and dissemination, quality assurance
- Data nodes: Register your datasets, provide online access to database or repository
- Data remains under the control of providers
- Data standards and web services make it work