The OpenScienceLink architecture for novel services
exploiting open access data in the biomedical domain
Efstathios Karanastasis, NTUA
Vassiliki Andronikou (NTUA), Efthymios Chondrogiannis (NTUA)
George Tsatsaronis (TUD), Daniel Eisinger (TUD), Alina Petrova (TUD)
PCI2014 conference, Athens, Greece
3rd of October 2014
CIP-ICT PSP-2012-6
ICT PSP Main Theme: Open
Data and Open Access to
Scientific Information
The Scientific Literature Background
• Lack of universal well-structured repositories of scientific and
research data for experimentation and benchmarking of
pertinent research works in a given thematic area
• Fragmented, lengthy, weak and inefficient peer review
processes given the growing number of journals, magazines
and conferences
• Non-objective and extremely focused (in terms of the
aspects that they cover such as impact and popularity) tools
and metrics for assessing research work as well as
individuals, institutions and organizations which are based
on a specific snapshot of the scientific work
• Poor linking of research articles to data journals
OpenScienceLink
Efstathios Karanastasis, The OpenScienceLink Architecture, PCI2014, 3rd of October, 2014
The Scientific Literature Background
• Growing wealth of the scientific work and information
produced by researchers and scholars
– scientific/research articles
– monographs
– research datasets
• Need for more effective processes and improved tools
and techniques towards:
– reviewing scientific articles and research data
– organising and managing data journals
– bibliographic analysis
– management of scientometrics and development of new ones
– better collaboration between researchers
OpenScienceLink Objectives
Open Semantically-enabled, Social-aware
Access to Scientific Data
• Provide a holistic approach to the publication, sharing,
linking, reviewing and evaluation of research results
based on open access to scientific information
• Empower a novel eco-system for open access to scientific
information, which will provide a range of added-value
services for all stakeholders
• Main Outcomes:
– The OpenScienceLink platform
– Implementation of 5 pilots
– The Biomedical Data Journal (BMDJ)
OpenScienceLink Pilots
1. Research Dynamics-aware Open Access Data Journals
Development
2. Novel open, semantically-assisted peer review process
3. Data Mining for Biomedical and Clinical Research
Trends Detection and Analysis
4. Data Mining for Proactive Formulation of Scientific
Collaborations
5. Scientific field-aware, Productivity- and Impact-oriented
Enhanced Research Evaluation Services
Pilots Overview
Research Dynamics-aware Open Access Data Journals Development
• Data journal establishment
• Journal issue suggestion
• Dataset submission
• Dataset peer review
• Publishing
• Assessment and evaluation
• Identification of research dynamics associated with
specific datasets
Novel open, semantically-assisted peer review process
• Article-based reviewers suggestion
• Assign competent reviewers
• Review support tools (e.g. automatic retrieval of relevant
research articles)
• Review form submission
• Post-review discussion
Pilots Overview
Data Mining for Biomedical and Clinical Research Trends
Detection and Analysis
• Detect research trends
• Analyse research trends
• Essential for:
– allocation of research funding (by private sponsors and
governmental agencies)
– overall planning of research strategies
Data Mining for Proactive Formulation of Scientific
Collaborations
• Enable the networking and collaboration of researchers and
scholars working on similar or potentially collaborating scientific
fields and sharing similar research interests
• Infer relationships between researchers and research
groups, including (in several cases) non-obvious, non-declared ones
Pilots Overview
Scientific field-aware, Productivity- and Impact-oriented
Enhanced Research Evaluation Services
• Current simplified indices and impact factors evaluate only an aspect of the
scientific work
• Introduce, produce and track new metrics of research and scientific
performance, beyond conventional ones for evaluation of:
– Research work (incl. data papers)
– Researcher
– Research group or community
– Conference, Journal, Publisher
– Department, Laboratory, Institution, University, Organisation
– Country
– Research grant
OpenScienceLink Ecosystem
Integrated Platforms
• FP7 SocIoS
– A set of tools that leverage the potential of Social Networking Sites (SNSs)
– Serves as an umbrella for accessing user data scattered among various SNSs through a
common and secure interface
– Hides SNS-specific complexity
– Enables the delivery of services which exploit social graphs
• GoPubMed
– A semantic search engine for the life sciences
– Allows exploring PubMed search results with concepts from the Medical Subject Headings
(MeSH), the Gene Ontology (GO) and the Universal Protein Resource (UniProt)
– A data management model expanded with the ability to index, annotate, and semantically
search datasets
• FP7 PONTE
– A knowledge-oriented platform that supports the design and creation of clinical trial protocols
– Provides a set of semantic web enabled mechanisms and services facilitating the clinical trials
lifecycle
– Incorporates a set of advanced data mining and semantic reasoning mechanisms which are
applied on a variety of web data sources containing clinical and non-clinical information
OpenScienceLink Conceptual Architecture
OpenScienceLink Core Components
• The OpenScienceLink core components implement the main
functionality of the Platform and form the OpenScienceLink
API
• Users Management
– Responsible for handling all functionality related to the Platform
users, their profile and access rights, such as user registration,
profile editing, authentication and role-based authorisation.
• Datasets Management
– Responsible for handling all functionality related to datasets and the
corresponding metadata.
– Metadata are partially based on the Dryad Metadata Application
Profile, including extensions at the level of parameters, e.g. dataset
source type (real-world vs. synthetic), level of noise, and species,
among others
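A dataset metadata record with the extension parameters named above could look roughly like the following. This is a minimal sketch: the class name, the field names and the choice of core fields are assumptions made for illustration and are not taken from the Dryad profile or from the OpenScienceLink API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SourceType(Enum):
    """Dataset source type extension mentioned on the slide."""
    REAL_WORLD = "real-world"
    SYNTHETIC = "synthetic"


@dataclass
class DatasetMetadata:
    # Core descriptive fields (hypothetical subset, illustrative names).
    title: str
    creators: List[str]
    description: str = ""
    # Extension parameters named on the slide (names are assumptions).
    source_type: Optional[SourceType] = None
    noise_level: Optional[str] = None
    species: List[str] = field(default_factory=list)

    def missing_extensions(self) -> List[str]:
        """Return the names of extension parameters that are still unset."""
        missing = []
        if self.source_type is None:
            missing.append("source_type")
        if self.noise_level is None:
            missing.append("noise_level")
        if not self.species:
            missing.append("species")
        return missing


record = DatasetMetadata(
    title="Example ECG recordings",
    creators=["A. Researcher"],
    source_type=SourceType.SYNTHETIC,
)
```

Keeping the extensions optional lets a submission form flag incomplete records (here via `missing_extensions`) without rejecting them outright.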
Core Components Layer
• Articles Management
– Responsible for handling all functionality related to articles
• Authors Management
– This component is responsible for handling all functionality related to
authors
• Groups Management
– This component is responsible for handling all functionality related to groups
of people
Core Components Layer
• Review Data Management
– Responsible for handling all functionality related to the review process and
the corresponding data
– Covers the initiation and updating of the review process as well as the
provision of access to the reviews to the corresponding users
• For example, for a particular article or dataset, some users can see
their own review (e.g. a reviewer), some users can see all reviews
without knowing the reviewers (e.g. an author), and some users can
see all reviews and reviewers (e.g. a publisher)
– Comments and ratings are also managed by this component, always
considering each user's access rights
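The visibility rules in the example above can be sketched as a small filter. The role names follow the slide (reviewer, author, publisher); the classes and the function are hypothetical and only illustrate the rule, not the platform's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Review:
    reviewer: str
    text: str


@dataclass(frozen=True)
class ReviewView:
    # reviewer is None when the identity is hidden from the viewer.
    reviewer: Optional[str]
    text: str


def visible_reviews(reviews: List[Review], user: str, role: str) -> List[ReviewView]:
    """Return the reviews of one article or dataset as a given user may see them."""
    if role == "publisher":
        # Publishers see all reviews together with the reviewers.
        return [ReviewView(r.reviewer, r.text) for r in reviews]
    if role == "author":
        # Authors see all reviews, but not who wrote them.
        return [ReviewView(None, r.text) for r in reviews]
    if role == "reviewer":
        # Reviewers see only their own review.
        return [ReviewView(r.reviewer, r.text) for r in reviews if r.reviewer == user]
    # Any other role sees nothing (an assumption of this sketch).
    return []


reviews = [Review("alice", "Sound method."), Review("bob", "Needs more data.")]
```

For instance, `visible_reviews(reviews, "alice", "reviewer")` yields only Alice's review, while the author view contains both texts with `reviewer` set to `None`.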
Adaptors Layer
• The OpenScienceLink core components interact with the
SocIoS, GoPubMed and PONTE platforms by means of the
adaptors
• The latter undertake the required actions, mappings and
transformations in order to enable communication with the
existing platforms and ultimately the underlying data sources
for the exploitation of the existing wealth of information
Social Networks Adaptor
• Comprises a simplification layer on top of the SocIoS Services
• Undertakes the integration of the underlying SocIoS platform and
communication with the connected SNS(s)
• Receives requests from the OpenScienceLink core components for
the provision of data stemming from the connected SNSs, including the
exact type of information required and the SNS(s) involved
• Combines SocIoS services in order to provide tailored functionality
pertaining to the specific data needs of the OpenScienceLink Core
Components
• Queries the services built on top of the SocIoS platform in order to
further process the specific requests and gather the required data
• Internally performs data processing or mapping that may be required for
the seamless collaboration between the OpenScienceLink core
components and the SocIoS platform in either direction
• Offered functionality: Persons retrieval, connected persons retrieval,
media items retrieval, activities retrieval, data transformation and data
extraction
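The role of this adaptor (receive a request naming the data type and the SNSs involved, query the underlying services, map the results to a common form) can be sketched as follows. Everything here is an assumption made for illustration: the `SnsService` interface, the `InMemorySns` stand-in and the record layout are hypothetical and do not reproduce the SocIoS API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class Person:
    """Common person representation handed back to the core components."""
    sns: str
    user_id: str
    name: str


class SnsService(Protocol):
    """Hypothetical per-SNS service as it might be exposed by the platform below."""
    def find_persons(self, keyword: str) -> Iterable[dict]: ...


class InMemorySns:
    """Stand-in service used only to make the sketch runnable."""
    def __init__(self, records: Iterable[dict]) -> None:
        self._records = list(records)

    def find_persons(self, keyword: str) -> List[dict]:
        kw = keyword.lower()
        return [r for r in self._records if kw in r["name"].lower()]


class SocialNetworksAdaptor:
    def __init__(self, services: Dict[str, SnsService]) -> None:
        self._services = services

    def retrieve_persons(self, keyword: str, sns_names: List[str]) -> List[Person]:
        """Query only the requested SNSs and map raw records to Person."""
        persons: List[Person] = []
        for sns in sns_names:
            for raw in self._services[sns].find_persons(keyword):
                persons.append(Person(sns=sns, user_id=str(raw["id"]), name=raw["name"]))
        return persons


adaptor = SocialNetworksAdaptor({
    "sns_a": InMemorySns([{"id": 1, "name": "Maria Curie"}]),
    "sns_b": InMemorySns([{"id": "x9", "name": "Marie Curie"}, {"id": "x7", "name": "Niels Bohr"}]),
})
```

The core components only ever see `Person` objects, so SNS-specific record formats stay inside the adaptor.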
Content and Data Management
Adaptor
• Integrates the data management system of GoPubMed
within the OpenScienceLink Platform
• Integrates the services of the GoPubMed semantic search
engine
• Comprises a simplified layer of services on top of the
GoPubMed platform that pertain to the indexing of data,
annotation with the underlying ontology concepts,
importing of new ontologies, semantic search on the
indexed data and identification of trends in the indexed
data.
– Utilised for presenting statistics about the resulting set of
documents, such as the number of publications over time, the
top countries, cities, journals, authors and ontology terms
– It is, thus, a summary of the trends observed for the documents
that are returned via the input query
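The kind of summary described above (publications over time, top countries, top journals) can be sketched with simple counting. The document layout and the function name are assumptions for illustration; the sample records are made up and this is not the GoPubMed implementation.

```python
from collections import Counter
from typing import Dict, List


def summarise_trends(documents: List[Dict], top_n: int = 3) -> Dict[str, list]:
    """Summarise a result set: publication counts per year and top-N facets."""
    per_year = Counter(doc["year"] for doc in documents)
    return {
        # Sorted by year so the series can be plotted directly.
        "publications_per_year": sorted(per_year.items()),
        "top_countries": Counter(doc["country"] for doc in documents).most_common(top_n),
        "top_journals": Counter(doc["journal"] for doc in documents).most_common(top_n),
    }


# Made-up result set for one query.
results = [
    {"year": 2012, "country": "Greece", "journal": "Journal A"},
    {"year": 2013, "country": "Germany", "journal": "Journal A"},
    {"year": 2013, "country": "Greece", "journal": "Journal B"},
    {"year": 2013, "country": "Greece", "journal": "Journal A"},
]
summary = summarise_trends(results, top_n=1)
```

The same counting applies to any other facet of the result set, such as cities, authors or ontology terms.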
Semantically-enabled Inference
Adaptor
• Enables the integration of the PONTE platform with OpenScienceLink
• Exploits the PONTE data mining and semantic reasoning mechanisms and
services as well as the rich knowledge base of the PONTE platform
• Use of the term co-occurrence index building capability of the PONTE
platform, in order to exploit the fact that relevant terms appear together in the
literature – the more this happens, the more relevant they are considered to be
– and build a co-occurrence index for pairs and triples of terms, ranked in
each case by frequency (offering a first stage filter of information, able to
reduce the amount of information to manageable levels, without sacrificing
interesting results, for guiding research)
• Exploitation of a local knowledge base based on curated data from the web of
linked data, as well as specialized data sources (incl. KEGG, ChEBI,
DrugBank, Sider, etc.)
• Application of various ranking algorithms to the discovered data, following the
knowledge-based concept correlations capability stemming from PONTE, with
the ranking results being used either for presentation purposes (top first) or for
adjusting the level of inclusion / exclusion of terms deemed relevant.
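The co-occurrence index for pairs and triples of terms, ranked by frequency, can be sketched as below. This is a minimal illustration under the assumption that each document has already been reduced to a list of terms; the function names and the sample terms are hypothetical and this is not the PONTE implementation.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def build_cooccurrence_index(
    documents: Iterable[List[str]], sizes: Tuple[int, ...] = (2, 3)
) -> Dict[int, Counter]:
    """Count in how many documents each pair/triple of terms appears together."""
    index: Dict[int, Counter] = {n: Counter() for n in sizes}
    for terms in documents:
        # Deduplicate and sort so each combination has one canonical form.
        unique = sorted(set(terms))
        for n in sizes:
            index[n].update(combinations(unique, n))
    return index


def top_cooccurrences(index: Dict[int, Counter], n: int, min_count: int = 1, limit: int = 10):
    """Rank combinations of size n by frequency, dropping rare ones."""
    ranked = [(combo, count) for combo, count in index[n].most_common() if count >= min_count]
    return ranked[:limit]


docs = [
    ["aspirin", "stroke", "bleeding"],
    ["aspirin", "stroke"],
    ["aspirin", "bleeding", "ulcer"],
]
index = build_cooccurrence_index(docs)
```

Raising `min_count` is the "first stage filter" idea: rare combinations are dropped before any heavier ranking is applied.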
Conclusions
• The OpenScienceLink platform enables accessing and
offering of added value services (including trends detection
and analysis, development of new scientometrics, data
journals management, enhanced review processes) based
on a multitude of openly accessible data sources (from
literature and data sets to social network data), while at the
same time empowering their semantic linking and data
processing
• It further offers a wide range of opportunities for better
collaboration between researchers, scholars, and
research organisations, including their ability to formulate
added-value scientific / research networks
Future Work
• Expand the capabilities of the components and user interfaces
according to the recorded end user needs and requirements
regarding all Pilots
• Address any issues with the implemented functionality and provide
improvements based on the end user’s evaluation feedback
• Consider additional data sources for inclusion via integration with
the underlying platforms, according to the needs of
OpenScienceLink
• Investigate the integration of more SNSs, with the aim to also include
networks specifically addressed to researchers and research
communities, with the most probable first candidate being Mendeley
• Analyse the steps required (e.g., link with other domains’ ontologies,
data sources and models) for enabling the Platform to offer its
services beyond the biomedical domain, and, thus, ideally become
domain-agnostic
Thank you
► Contact
Efstathios Karanastasis
Research Engineer
+30 210 772 2132
[email protected]
National Technical University of Athens
School of Electrical and Computer Engineering
Distributed Knowledge and Media Systems Group
http://grid.ece.ntua.gr
The OpenScienceLink Platform
Log in
User registration (step 1 of 3)
User registration (step 2 of 3)
User registration (step 3 of 3)
Main menu bar
My profile
My datasets
Upload dataset
Create review call (1 of 2)
Create review call (2 of 2)
Trends
Collaborations
Evaluation (1 of 2)
Evaluation (2 of 2)
PONTE: Eligibility Criteria Model
Scope within PONTE:
► Formal representation of Eligibility (Inclusion/Exclusion)
Criteria
► Patients Model for Clinical Research purposes
(especially recruitment)
Current Status: 1st year work
► Work upon extending and adapting the eligibility criteria
model for OpenScienceLink purposes
Future work: 2nd and mainly 3rd year
► Update and Integrate I/E criteria model within
OpenScienceLink platform
► Annotate literature search results
► Improve literature search process
PONTE: Abbreviations - Introduction
► An abbreviation is a shortened form of a term or expression
(aka the expanded form)
► Abbreviations are widely used in biomedical articles and
datasets. Example:
► An abbreviation is present within a document,
e.g. “Cardiac testing for all patients at low-risk for ACS is not
sustainable”…
► But its expansion (here, Acute Coronary Syndrome) is missing
► Highly ambiguous
► Over 5 expansions per abbreviation on average
► Abbreviation expansion detection or prediction is a real
challenge
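One simple way to choose among several expansions is to compare the sentence with context words stored for each candidate. The sketch below uses plain word overlap, which is a stand-in technique chosen for illustration and not the algorithm developed in the project; the toy repository and its context words are made up.

```python
import re
from typing import Dict, List, Set, Tuple

# Toy repository: abbreviation -> expansion -> context words (illustrative only).
REPOSITORY: Dict[str, Dict[str, Set[str]]] = {
    "ACS": {
        "acute coronary syndrome": {"cardiac", "chest", "pain", "patients", "risk"},
        "American Chemical Society": {"chemistry", "journal", "society"},
    }
}


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_expansions(
    abbreviation: str,
    sentence: str,
    repository: Dict[str, Dict[str, Set[str]]] = REPOSITORY,
) -> List[Tuple[str, int]]:
    """Rank candidate expansions by the number of shared context words."""
    context = tokenize(sentence)
    candidates = repository.get(abbreviation, {})
    scored = [(expansion, len(context & words)) for expansion, words in candidates.items()]
    # Highest overlap first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


sentence = "Cardiac testing for all patients at low-risk for ACS is not sustainable"
ranking = rank_expansions("ACS", sentence)
```

For the example sentence from the slide, the cardiology expansion shares three context words (cardiac, patients, risk) and is ranked first.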
PONTE: Abbreviations - Tasks
Current Status: Work done during 1st year
► In-depth analysis of problem
► Abbreviation Expansion Detection and Prediction
System Architecture
► Description of Algorithms / Methodology
Future Work: for 2nd and 3rd year
► Repository of abbreviations with expansions along
with context
► Suggestion of most appropriate expansion for an
abbreviation