DOI Implementation Plan - ICSU World Data System

Responsible Citizenship of the Data World
Wim Hugo
Grenoble, 12 April 2017
http://dx.doi.org/10.1016/j.ajpath.2014.11.001
Credibility of Science
• Access to original and complete data sets for reproducibility
• Re-usability declines with time
• Availability declines with age
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000308#pone-0000308-g002
http://www.sciencedirect.com/science/article/pii/S0960982213014000
Linked Open Data
• Network of referenced objects in the web
• Dependent on permanent identifiers for the objects
• References vocabularies, ontologies, registries, …
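As an illustration of why permanent identifiers matter for linking, the sketch below normalises the different spellings of a DOI into a single resolver URL, so that two references to the same object compare equal. It is a minimal sketch, not part of the original talk: the function names `normalise_doi` and `doi_url` are hypothetical, and the only external fact assumed is that DOIs resolve through https://doi.org/.

```python
from urllib.parse import quote

# Spellings under which a DOI commonly appears in references.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(identifier: str) -> str:
    """Return the bare DOI, e.g. '10.1016/j.ajpath.2014.11.001' (hypothetical helper)."""
    text = identifier.strip()
    lowered = text.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            break
    # Every DOI starts with the directory indicator '10.' and has a '/' separator.
    if not text.startswith("10.") or "/" not in text:
        raise ValueError(f"not a DOI: {identifier!r}")
    return text


def doi_url(identifier: str) -> str:
    """Build one canonical resolver URL for any accepted spelling (hypothetical helper)."""
    return "https://doi.org/" + quote(normalise_doi(identifier), safe="/")


if __name__ == "__main__":
    print(doi_url("http://dx.doi.org/10.1016/j.ajpath.2014.11.001"))
```

Storing the normalised form rather than the URL as first seen means that links between publications, data sets, and people do not break when a resolver address changes.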
The Knowledge Network
ICSU-WDS Knowledge Network: the Fabric of Science
[Diagram: linked components of the knowledge network]
• Scholarly Publications (CrossRef?)
• TDRs (WDS, DSA, DataCite*)
• Samples and Events
• RDI Outputs/Online Resources
• People (ORCID)
• Coverage (Temporal, Spatial, Topic)
• Data Citations (DataCite)
• Institutions (?)
• Projects
• Use, Caveats, Lineage, Methods
• Initiatives
• Licenses (CoDATA, Creative Commons)
• Networks
• Funders (?)
* Including re3data, DataBib
Legend: Exists Now; Not Started
Generalised Scientific Data Infrastructure Use Case
“Predictable Assembly from Reliable Components”
[Diagram, built up over three slides]
• Slide 1, Generalised Scientific Data Use Case: Access/Download, Data/Services, “Bind”, “Publish”, Metadata, Analyse/Visualise, Process, “Find”, Discovery
• Slide 2 adds: Curate, Cite, Assess/Rate
• Slide 3, Mediation and Brokering, adds: Mediate
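The “Publish”, “Find”, and “Bind” labels in the diagrams above can be read as a catalogue pattern: metadata is published to a catalogue, discovery finds matching records, and binding hands back the data or service endpoint. The sketch below is one possible reading under that assumption. The `Record` and `Catalogue` classes, the identifier, and the example.org endpoint are all hypothetical and do not describe any WDS service.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical metadata record describing one data set or service."""
    identifier: str
    title: str
    keywords: frozenset[str]
    endpoint: str


class Catalogue:
    """Hypothetical in-memory metadata catalogue."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def publish(self, record: Record) -> None:
        """'Publish': register metadata so the record becomes discoverable."""
        self._records[record.identifier] = record

    def find(self, keyword: str) -> list[Record]:
        """'Find': discovery by keyword, case-insensitive, sorted by identifier."""
        wanted = keyword.lower()
        hits = [
            r for r in self._records.values()
            if wanted in {k.lower() for k in r.keywords}
        ]
        return sorted(hits, key=lambda r: r.identifier)

    def bind(self, identifier: str) -> str:
        """'Bind': return the endpoint where the data or service can be accessed."""
        return self._records[identifier].endpoint


if __name__ == "__main__":
    catalogue = Catalogue()
    catalogue.publish(Record(
        identifier="example:rainfall-2016",          # made-up identifier
        title="Daily rainfall, 2016",
        keywords=frozenset({"Rainfall", "Climate"}),
        endpoint="https://example.org/data/rainfall-2016",  # made-up URL
    ))
    for hit in catalogue.find("rainfall"):
        print(hit.title, "->", catalogue.bind(hit.identifier))
```

Because each step is a separate operation on a shared identifier, later steps such as Curate, Cite, or Mediate can in principle be added without changing the first three.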
“Responsible Citizenship of the Data World”
Content
• Best Practice
• Persistent Identifiers and Registries
• Vocabularies, Ontologies
• Global Infrastructure Services
• Federated Data Services and Implementations
• Standards and Specifications
• Licenses and Data Policy
Scope
• Applies to all
• Applies to a specific data family or format
• Applies to a specific scientific discipline or domain
Actors
• Individual Researchers, Institutions, Initiatives
• Voluntary contributors and the Public
• Systems developers, Data Centre Managers, Architects
Granularity
• Individual Data Points (UncertML, …)
• Individual Data Sets (GEO Data Management Principles, ...)
• Data Centres and Repositories (WDS, DSA, ISO, Nestor)
• Data Networks and Composite Services (WDS)
Maturity
Data Management Principles
Core Certification: Trusted Data Services
Data Records
Data Sets
<UncertML>
…
…
Quality, Accreditation, and Trust
Repositories
African GRDI Perspectives
Challenges are obvious …
Technology Footprint
It is clear that technology of the type expected in more developed
nations remains a problem: bandwidth is not only at a premium but also
expensive, and state-of-the-art equipment (both personal equipment
and data centre equipment) is unlikely to be commonplace in
Africa in the near future.
Design directives for Networked Data Centres:
• Technology: use of mobile phone technology in a non-bandwidth
intensive manner will be a very good option. Simple data discovery
is preferable to non-discovery due to technology hurdles.
• Technology: Cloud-based services
• Governance: data dissemination via satellite remains an affordable
option for large data sets.
Open Access
Despite its wide and growing acceptance, and its mandatory
implementation, in the developed world, open access remains
problematic in the developing world. One can aim to address the
misconceptions (undoubtedly a longer-term goal), but in the
meantime, discovery of and access to data that is embargoed in some
way is preferable to non-discovery.
Design directives for Networked Data Centres:
• Technology: data centres should allow multiple modes of access
(free and open, acceptance of limiting conditions, paywall).
• Policy: Licenses should allow a small number of valid restrictions.
Divergent national policies need to be accommodated by
matching them with a small number of standardised licenses.
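The policy directive above, matching divergent policies to a small number of standardised licenses, can be sketched as a lookup from a set of restrictions to a license. This is a minimal sketch, assuming the Creative Commons suite as the standard set and assuming that a policy can be reduced to the four restriction labels used here. The labels and the function `match_license` are hypothetical, not an existing API.

```python
# Restriction labels a policy may be reduced to (hypothetical vocabulary).
KNOWN_RESTRICTIONS = frozenset(
    {"attribution", "share-alike", "non-commercial", "no-derivatives"}
)

# The small, fixed set of standard licenses, keyed by the restrictions they carry.
LICENSES = {
    frozenset(): "CC0",
    frozenset({"attribution"}): "CC BY",
    frozenset({"attribution", "share-alike"}): "CC BY-SA",
    frozenset({"attribution", "non-commercial"}): "CC BY-NC",
    frozenset({"attribution", "no-derivatives"}): "CC BY-ND",
    frozenset({"attribution", "non-commercial", "share-alike"}): "CC BY-NC-SA",
    frozenset({"attribution", "non-commercial", "no-derivatives"}): "CC BY-NC-ND",
}


def match_license(restrictions):
    """Map a policy's restrictions to one standard license, or raise ValueError."""
    wanted = frozenset(r.strip().lower() for r in restrictions)
    unknown = wanted - KNOWN_RESTRICTIONS
    if unknown:
        raise ValueError(f"restrictions with no standard license: {sorted(unknown)}")
    # Every Creative Commons license that carries a restriction also requires attribution.
    if wanted:
        wanted = wanted | {"attribution"}
    license_name = LICENSES.get(wanted)
    if license_name is None:
        # For example share-alike together with no-derivatives has no CC license.
        raise ValueError(f"no standard license combines: {sorted(wanted)}")
    return license_name


if __name__ == "__main__":
    print(match_license(["Non-Commercial"]))
    print(match_license([]))
```

A policy that fails to match is reported rather than silently given a bespoke license, which keeps the number of licenses in circulation small.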
Policies and Licensing
[Figure: Growth of Creative Commons]
[Figure: Creative Commons License Use]
[Figure: Where CC Works are Published]
Funding
It is highly unlikely that funding for the establishment of data centres on a
scale comparable to the developed world will emerge. African countries may
have funds, but capacity is also a problem. At times, donors or multinational
projects fund infrastructure, but one has to accept that these are often
ineffectual, or will not be able to serve the majority of scientists.
Design directives for Networked Data Centres:
• Technology: we need to make use of free technology as far as possible:
cloud-based data storage, network data centres for meta-data that are
funded by stakeholder institutions, and low-bandwidth options for data
discovery, application, and use.
• Governance: Use the crowd - peer review, quality assurance, and some
oversight functions can be crowd-sourced. It may be beneficial for
experienced scientists, globally, to act voluntarily as governance sources
for Network Data Centres – without financial compensation. Such a
framework, and the explicit roles, responsibilities, and benefits, may
require endorsement by a suitable global institution such as ICSU.
Capacity
A large part of the problem with implementation of Global Research Data
Infrastructure (GRDI), and of data centres as a component of it, is its technology focus.
Even today, it is necessary to have significant background knowledge of aspects such
as meta-data and its mainstream standards, data formats and their standards, and the
general body of knowledge associated with GRDI to participate in and benefit from the
emerging infrastructure.
This has to change: end users need not know any more of this than they need to know
of the standards that enable Google Mail or an Android smartphone. The challenge is
with the developers of the GRDI: listen to the customer.
Design directives for Networked Data Centres:
• Technology: Data Centre use needs to be made intuitive, and shield end users from
technical complexity, standards, and specialist knowledge.
• Technology: integration with well-established services (cloud-based services, social
networks) both in terms of functionality and shared infrastructure is needed.
Scientific Activity in Southern Africa
Ships in the Night …
• First-world-funded initiatives – e.g. SciGaIA
– Unaware of one another
– Not connected to National Initiatives
– Why not a funded programme to make Zenodo and OpenAire immediately useful to developing country infrastructure providers?
• Recognition of Effort
– Often hidden in first-world networks or data centres
• Coordination and landscape assessment