BD2K @ NIH – A Vision Through 2020
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
[email protected]

First and foremost you should see this meeting as a celebration of the hard work of the past two years

Yes, these are uncertain times, but …
There is a commitment to the BD2K program through 2020

BD2K cannot be viewed in isolation, but rather as part of a broader view of data science @ NIH …
Particularly as funding is increasingly from the ICs

A View Which Includes:
• A vibrant research program of:
  – Fundamental developments in data science
  – Application of those fundamental developments
  – Flagship projects to which developments are applied:
    • PMI, Brain, Moonshot, ECHO
• A sustainable data ecosystem
  – Commons and the FAIR Principles adoption
  – Cross-cutting activities
• Increased workforce training
• A changing governance model

A Strategic Response can be Modeled on Three Axes:
• Research
• Resources
• Outcomes

A Strategic Response
Research
• Fundamental
  • Machine learning
  • Data mining
  • Indexing
  • Predictive modeling …
• Applied
  • Sustainability, governance, economics of data
  • Privacy and security
  • Effective use of clouds …
Resources
• Standards
• Commons
  • APIs
  • Reference data sets
  • Workflows
  • Access & Authentication
• Workforce
Outcomes
• Evaluated pilots
• FAIR data
• Trained workforce
• Best practices
• Policies
• Effective use of clouds
• On-ramps for all ICs

The Current Situation
• NIH Funded Data
  – Total data from NIH-funded research currently estimated at 650 PB*
  – 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB this year
• Dark Data
  – Only 12% of data described in published papers is in recognized archives
  – 88% is dark data^
• Cost
  – 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives
* In 2012 Library of Congress was 3 PB
^ http://www.ncbi.nlm.nih.gov/pubmed/26207759

The Commons - Status
• Commons and FAIR principles* adopted across NIH
• Development and public release of a prototype Data Discovery Index – DataMed
  • Feb. v 1.0
  • Nov v 1.5
• Cloud credits being issued for work in the Commons
• FOAs for Commons Framework being issued
• Commons pilots under way
* https://www.ncbi.nlm.nih.gov/pubmed/26978244

Sustainability – Sample Other Activities
• Request for Information: Metrics to Assess Value of Biomedical Digital Repositories (NOT-OD-16-133)
  – To be discussed at Sustainability Session, Wed 1pm
• RFA to support community-based standards work was released in the fall for May 2017 award; session today 1pm
• Funding opportunity announcement: (BD2K) Enhancing the Efficiency and Effectiveness of Digital Curation for Biomedical Big Data (RFA-LM-17-001)
  – Applications due Dec 15

Sustainability – Looking Forward
• International collaboration on business models for sustainable data repositories
  – Sustainable Business Models for Data Repositories (OECD Global Science Forum)
  – Future of Life Sciences and Biomedical Databases (International Human Science Frontiers Program)
• NIH long-term data repository support
  – Federal interagency Workshop on Measuring the Impact of Data Repositories, 2017
  – Recommend mechanism(s), review criteria, implementation plan

Example Cross-cutting Activities
• International partnerships
• Count everything – Secure count query framework
• California centers regional meetings
• GA4GH – Beacon project

NLM
• Working Group Report
  – http://acd.od.nih.gov/reports/Report-NLM06112015-ACD.pdf
  – Recommendation – NLM should become the programmatic epicenter for data science at NIH …
• Patti Brennan – New NLM director

What We Hope to See in 2020
• New innovations brought about by large and complex data
• Evidence of translation, i.e. real application at the point of care
• Broad Commons adoption leading to
  – Improved sharing, reuse and hence cost effectiveness and reproducibility
• A balance between what is spent on data vs what is gained from that data
• Policies that are supportive of the above

… for your hard work and to the NIH staff from the ADDS office and from across the ICs who have toiled to make BD2K a success