Responsible Citizenship of the Data World
Wim Hugo
Grenoble, 12 April 2017
http://dx.doi.org/10.1016/j.ajpath.2014.11.001

Credibility of Science
• Access to original and complete data sets for reproducibility
• Re-usability declines with time
• Availability declines with age
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000308#pone-0000308-g002
http://www.sciencedirect.com/science/article/pii/S0960982213014000

Linked Open Data
• Network of referenced objects in the web
• Dependent on permanent identifiers for the objects
• References vocabularies, ontologies, registries, …

The Knowledge Network
ICSU-WDS Knowledge Network: the Fabric of Science
• Scholarly Publications (CrossRef?)
• TDRs (WDS, DSA, DataCite*)
• Samples and Events
• RDI Outputs/Online Resources
• People (ORCID)
• Coverage (Temporal, Spatial, Topic)
• Data Citations (DataCite)
• Institutions (?)
• Projects
• Use, Caveats, Lineage, Methods
• Initiatives
• Licenses (CoDATA, Creative Commons)
• Networks
• Funders (?)
* Including re3data, DataBib
Legend: Exists Not Now Started

The Knowledge Network
WDS

Generalised Scientific Data Infrastructure Use Case
“Predictable Assembly from Reliable Components”
[Diagram, built up over three slides. Base nodes: Access/Download, Data/Services, “Bind”, “Publish”, Metadata, Analyse/Visualise, Process, “Find”, Discovery. The second slide adds Curate, Cite, and Assess/Rate. The third slide, “Mediation and Brokering”, adds Mediate.]

“Responsible Citizenship of the Data World”
Content
• Best Practice
• Persistent Identifiers and Registries
• Vocabularies, Ontologies
• Global Infrastructure Services
• Federated Data Services and Implementations
• Standards and Specifications
• Licenses and Data Policy
Scope
• Applies to all
• Applies to a specific data family or format
• Applies to a specific scientific discipline or domain
Actors
• Individual Researchers, Institutions, Initiatives
• Voluntary contributors and the Public
• Systems developers, Data Centre Managers, Architects
Granularity
• Individual Data Points (UncertML, …)
• Individual Data Sets (GEO Data Management Principles, ...)
• Data Centres and Repositories (WDS, DSA, ISO, Nestor)
• Data Networks and Composite Services (WDS)
Maturity

Quality, Accreditation, and Trust
[Diagram labels: Data Management Principles; Core Certification: Trusted Data Services; Data Records; Data Sets; <UncertML>; …; …; Repositories]

African GRDI Perspectives
Challenges are obvious …

Technology Footprint
Technology of the type expected in more developed nations clearly remains a problem: bandwidth is not only at a premium but also expensive, and state-of-the-art equipment (both personal equipment and data centre equipment) is unlikely to be commonplace in Africa in the near future.
Design directives for Networked Data Centres:
• Technology: use of mobile phone technology in a non-bandwidth-intensive manner will be a very good option. Simple data discovery is preferable to non-discovery due to technology hurdles.
• Technology: Cloud-based services
• Governance: data dissemination via satellite remains an affordable option for large data sets.

Open Access
Despite its wide and growing acceptance and mandatory implementation in the developed world, open access remains problematic in the developing world. One can aim to address the misconceptions (undoubtedly a longer-term goal), but in the meantime, discovery of and access to data that is embargoed in some way is preferable to non-discovery.

Design directives for Networked Data Centres:
• Technology: data centres should allow multiple modes of access (free and open, acceptance of limiting conditions, paywall).
• Policy: Licenses should allow a small number of valid restrictions. Divergence of national policies needs to be accommodated by matching them with a small number of standardised licenses.

Policies and Licensing
[Figures: Growth of Creative Commons; Creative Commons License Use; Where CC Works are Published]

Funding
It is highly unlikely that funding for the establishment of data centres on a scale comparable to the developed world will emerge. African countries may have funds, but capacity is also a problem. At times, donors or multinational projects fund infrastructure, but one has to accept that these efforts are often ineffectual, or will not be able to serve the majority of scientists.

Design directives for Networked Data Centres:
• Technology: we need to make use of free technology as far as possible: cloud-based data storage, network data centres for meta-data that are funded by stakeholder institutions, and low-bandwidth options for data discovery, application, and use.
• Governance: Use the crowd: peer review, quality assurance, and some oversight functions can be crowd-sourced. It may be beneficial for experienced scientists, globally, to act voluntarily, without financial compensation, as governance sources for Network Data Centres. Such a framework, and the explicit roles, responsibilities, and benefits, may require endorsement by a suitable global institution such as ICSU.

Capacity
A large part of the problem with implementing Global Research Data Infrastructure (GRDI), and data centres as a component of it, is the focus on technology. Even today, one needs significant background knowledge of meta-data and its mainstream standards, data formats and their standards, and the general body of knowledge associated with GRDI to participate in and benefit from the emerging infrastructure. This has to change: end users need not know any more of this than they need to know of the standards that enable Google Mail or an Android smartphone. The challenge is with the developers of the GRDI: listen to the customer.

Design directives for Networked Data Centres:
• Technology: Data Centre use needs to be made intuitive, and shield end users from technical complexity, standards, and specialist knowledge.
• Technology: integration with well-established services (cloud-based services, social networks), both in terms of functionality and shared infrastructure, is needed.

Scientific Activity in Southern Africa

Ships in the Night …
• First-world-funded initiatives, e.g. SciGaIA
  – Unaware of one another
  – Not connected to National Initiatives
  – Why not a funded programme to make Zenodo and OpenAire immediately useful to developing country infrastructure providers?
• Recognition of Effort
  – Often hidden in first-world networks or data centres
• Coordination and landscape assessment