BD2K @ NIH – A Vision Through 2020
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
First and foremost, you should see this meeting as a celebration of the hard work of the past two years.
Yes, these are uncertain times, but …
There is a commitment to the BD2K program through 2020.
BD2K cannot be viewed in isolation; it must be seen as part of a broader view of data science @ NIH …
Particularly as funding increasingly comes from the ICs.
A View Which Includes:
• A vibrant research program of:
  – Fundamental developments in data science
  – Application of those fundamental developments
  – Flagship projects to which developments are applied:
    • PMI, Brain, Moonshot, ECHO
• A sustainable data ecosystem
  – Commons and the FAIR Principles adoption
  – Cross-cutting activities
• Increased workforce training
• A changing governance model
A Strategic Response can be Modeled on Three Axes:
Research
Resources
Outcomes
A Strategic Response
Research
• Fundamental
  – Machine learning
  – Data mining
  – Indexing
  – Predictive modeling …
• Applied
  – Sustainability, governance, economics of data
  – Privacy and security
  – Effective use of clouds …
Resources
• Standards
• Commons
  – APIs
  – Reference data sets
  – Workflows
  – Access & Authentication
• Workforce
Outcomes
• Evaluated pilots
• FAIR data
• Trained workforce
• Best practices
• Policies
• Effective use of clouds
• On-ramps for all ICs
The Current Situation
• NIH-Funded Data
  – Total data from NIH-funded research is currently estimated at 650 PB*
  – 20 PB of that (3%) is in NCBI/NLM, and it is expected to grow by 10 PB this year
• Dark Data
  – Only 12% of data described in published papers is in recognized archives; 88% is dark data^
• Cost
  – 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives
* In 2012 the Library of Congress was 3 PB
^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
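The proportions quoted on this slide can be sanity-checked with a few lines of arithmetic. The sketch below uses only the figures from the slide; the variable names are illustrative, and the per-year spend assumes 2007-2014 is counted as eight inclusive years, which the slide does not state.

```python
# Figures quoted on the slide (petabytes, percentages, US dollars).
TOTAL_NIH_DATA_PB = 650      # estimated total data from NIH-funded research
NCBI_NLM_PB = 20             # portion held in NCBI/NLM
EXPECTED_GROWTH_PB = 10      # expected NCBI/NLM growth this year
ARCHIVED_PCT = 12            # share of published data in recognized archives
LIBRARY_OF_CONGRESS_PB = 3   # 2012 comparison point from the footnote
ARCHIVE_SPEND_USD = 1.2e9    # extramural spend on data archives, 2007-2014


def share_pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of all NIH-funded data that sits in NCBI/NLM (the slide rounds to 3%).
ncbi_share = share_pct(NCBI_NLM_PB, TOTAL_NIH_DATA_PB)

# Data not in recognized archives ("dark data").
dark_pct = 100 - ARCHIVED_PCT

# Expected one-year growth of the NCBI/NLM holdings relative to today.
growth_rate = share_pct(EXPECTED_GROWTH_PB, NCBI_NLM_PB)

# Total NIH-funded data expressed in multiples of the 2012 Library of Congress.
loc_multiples = TOTAL_NIH_DATA_PB / LIBRARY_OF_CONGRESS_PB

# Assumption (not on the slide): 2007-2014 counted as 8 inclusive years.
YEARS = 2014 - 2007 + 1
avg_spend_per_year = ARCHIVE_SPEND_USD / YEARS

print(f"NCBI/NLM share of NIH-funded data: {ncbi_share:.1f}%")
print(f"Dark data: {dark_pct}%")
print(f"Expected NCBI/NLM growth this year: {growth_rate:.0f}%")
print(f"Total data in 2012 Library of Congress multiples: {loc_multiples:.0f}x")
print(f"Average archive spend per year: ${avg_spend_per_year / 1e6:.0f}M")
```

Under these assumptions, the 20 PB in NCBI/NLM rounds to the 3% on the slide, and the expected 10 PB of growth would be a 50% increase in those holdings within one year.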
The Commons - Status
• Commons and FAIR principles* adopted across NIH
• Development and public release of a prototype Data Discovery Index
  – DataMed
    • Feb.: v1.0
    • Nov.: v1.5
• Cloud credits being issued for work in the Commons
• FOAs for Commons Framework being issued
• Commons pilots underway
* https://www.ncbi.nlm.nih.gov/pubmed/26978244
Sustainability – Sample Other Activities
• Request for Information: Metrics to Assess Value of Biomedical Digital Repositories (NOT-OD-16-133)
  – To be discussed at Sustainability Session, Wed 1pm
• RFA to support community-based standards work was released in the fall for May 2017 award; session today 1pm
• Funding opportunity announcement: (BD2K) Enhancing the Efficiency and Effectiveness of Digital Curation for Biomedical Big Data (RFA-LM-17-001)
  – Applications due Dec 15
Sustainability – Looking Forward
• International collaboration on business models for sustainable data repositories
  – Sustainable Business Models for Data Repositories (OECD Global Science Forum)
  – Future of Life Sciences and Biomedical Databases (International Human Frontier Science Program)
• NIH long-term data repository support
  – Federal Interagency Workshop on Measuring the Impact of Data Repositories, 2017
  – Recommend mechanism(s), review criteria, implementation plan
Example Cross-cutting Activities
• International partnerships
• Count everything – Secure count query framework
• California centers regional meetings
• GA4GH – Beacon project
NLM
• Working Group Report
  – http://acd.od.nih.gov/reports/Report-NLM06112015-ACD.pdf
  – Recommendation: NLM should become the programmatic epicenter for data science at NIH …
• Patti Brennan – New NLM director
What We Hope to See in 2020
• Innovations brought about by large and complex data
• Evidence of translation, i.e., real application at the point of care
• Broad Commons adoption leading to
  – Improved sharing and reuse, and hence cost-effectiveness and reproducibility
• A balance between what is spent on data and what is gained from that data
• Policies that are supportive of the above
… for your hard work and to the NIH staff from the ADDS office and from across the ICs who have toiled to make BD2K a success