UCAR Workshop Review – “Bridging Data Lifecycles: Tracking Data Use via Data Citations”
Matt Mayernik
Research Data Service Specialist
NCAR Library/Integrated Information Services (IIS)
National Center for Atmospheric Research (NCAR)
University Corporation for Atmospheric Research (UCAR)
BESSIG, April 18, 2012

Workshop
• April 5-6, at UCAR Center Green Campus
• Funded by NOAA through the UCAR JOSS program
• ~80 attendees
  – Academic librarians
  – Data management professionals
  – Software engineers
  – Scientists
• Agenda and presentations posted at http://library.ucar.edu/data_workshop/

What is a Data Citation?
• Citation to journal article
• Citation to data set
From: Patil, S. and M. Stieglitz. 2011. Hydrologic similarity among catchments under variable flow conditions. Hydrology and Earth System Sciences, 15, 989–997. doi: 10.5194/hess-15-989-2011

Interest in Data Citations
NSF GEO issued a “Dear Colleague Letter” on March 29

NCAR/UCAR Data
• Climate model output data
• Longitudinal time-series data
• Observational data from field studies
All images: copyright University Corporation for Atmospheric Research

Motivation for Data Citations
• Understand use and impact of data
  – Measurements of data use
  – Give scientists and data centers credit for producing, managing, and curating data
  – Metrics requirements as an FFRDC
• Connecting data and scholarship
• Increase transparency of data and science

Mark Parsons

Data Citation Practices
• Most data users don’t cite data
• Ex. “MODIS snow cover data” from NSIDC
From: Parsons, M. A., Duerr, R., and Minster, J.-B. 2010. Data Citation and Peer Review. Eos Transactions, AGU, 91(34): 297-298.
http://dx.doi.org/10.1029/2010EO340001

Mark Parsons
Hypothesis: ~80% of citation scenarios for 80% of ESS data

Joan Starr
EZID: long-term identifiers made easy
Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation
Primary Functions
1. Create persistent identifiers
2. Manage identifiers over time
3. Manage associated metadata over time

Joan Starr
DOIs vs. ARKs
DOIs
• Established brand in publishing
• Indexed by major A&I citation databases
• Cannot be deleted
• More costly
• Ex. http://dx.doi.org/10.5065/D6WD3XH5
ARKs
• Case sensitive
• Special feature supports granularity
• Informative
• Less costly
• Ex. http://n2t.net/ark:/b5065/d6wd3xh5
Both resolve to: http://www.ncl.ucar.edu

Bill Cook
Excerpts from existing AGU policy – Citing Data
...data cited in AGU publications must be permanently archived in a data center or centers that meet the following conditions:
• are open to scientists throughout the world.
• are committed to archiving data sets indefinitely.
• provide services at reasonable costs.
Data sets that are available only from the author, through miscellaneous public network services, or academic, government or commercial institutions not chartered specifically for archiving data, may not be cited in AGU publications.

Bill Cook
Excerpts from existing AGU policy – Preserving/Archiving Data
AGU does not expect to archive data sets subject to this policy, except on a for-fee basis and for sets of a small size. It is not AGU's intention to serve as an archive for large data sets that should be housed in data centers. AGU maintains a deposit service for supplementary material of different types in order to provide long-term access to small supporting data sets and graphics files that are published concurrently with, and are an electronic component of, some AGU journal articles.

NCAR Data Citation Initiatives
1. Technical
2. Policy/procedural
Image copyright University Corporation for Atmospheric Research

Citation Challenges
1. Diversity
2. Granularity
3. Version Control
4. Maintenance Over Time

Mike Daniels
What granularity for EOL DOIs and when are they issued?
• Given a large project with aircraft, soundings, radars, model output and satellite data, do we:
  – Assign a DOI for each data file?
  – Assign one DOI for all datasets for the project?
  – Assign separate DOIs for datasets from each major platform?
  – What about ancillary data? Do we assign DOIs or does the providing institution?
• We are thinking of assigning DOIs to the data from each major platform associated with the project (e.g. C-130, S-Pol), to outside datasets that we have “value-added”, and to data for which no DOI exists
• It may be beneficial to issue DOIs only when processed data are released, so as to prevent publications from referencing preliminary data

Gary Strand
Data QC

Nicole Kaplan
The LTER NIS 2000
K.S. Baker, B.J. Benson, D.L. Henshaw, D. Blodgett, J.H. Porter, S.G. Stafford. (2000) Evolution of a Multisite Network Information System: The LTER Information Management Paradigm. BioScience. 50(11) 963-978.
Nicole Kaplan, CSU - Long-Term Management of Ecological Data - April 2012, UCAR

Nicole Kaplan
The LTER NIS 2011

Barb Losoff
Results of CU Faculty Survey About Data Curation
• Many researchers had curation plans for their data
• Many had orphan data without curation plans
• Few departments had procedures for data preservation; some participated in discipline-based repositories supporting long-term storage
• Receptivity to a library role in data curation was more closely aligned with the researchers’ disciplinary culture or philosophy regarding data sharing and collaborative projects.
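The DOIs vs. ARKs slide gives two example URLs that resolve to the same page. The following is a minimal Python sketch, assuming only the two resolver prefixes shown on that slide (http://dx.doi.org/ and http://n2t.net/), of how an application might turn a bare DOI or ARK into a resolvable URL and compare identifiers under each scheme's case rules. It does not call EZID or any resolver service, and the function names are hypothetical. DOI names are case-insensitive; ARKs are treated as case sensitive here, following the slide.

```python
# Resolver prefixes taken from the example URLs on the DOIs vs. ARKs slide.
DOI_RESOLVER = "http://dx.doi.org/"
ARK_RESOLVER = "http://n2t.net/"


def resolver_url(identifier: str) -> str:
    """Return a resolvable URL for a bare DOI ("10.xxxx/...") or ARK ("ark:/...")."""
    if identifier.startswith("ark:/"):
        return ARK_RESOLVER + identifier
    if identifier.startswith("10."):
        return DOI_RESOLVER + identifier
    raise ValueError(f"unrecognized identifier: {identifier!r}")


def same_identifier(a: str, b: str) -> bool:
    """Decide whether two bare identifiers name the same object.

    DOI names are case-insensitive, so two DOIs are compared case-folded.
    Anything else (here, ARKs) is compared exactly, i.e. treated as case
    sensitive, as the slide describes.
    """
    if a.startswith("10.") and b.startswith("10."):
        return a.casefold() == b.casefold()
    return a == b


if __name__ == "__main__":
    print(resolver_url("10.5065/D6WD3XH5"))     # http://dx.doi.org/10.5065/D6WD3XH5
    print(resolver_url("ark:/b5065/d6wd3xh5"))  # http://n2t.net/ark:/b5065/d6wd3xh5
    print(same_identifier("10.5065/D6WD3XH5", "10.5065/d6wd3xh5"))        # True
    print(same_identifier("ark:/b5065/d6wd3xh5", "ark:/b5065/D6WD3XH5"))  # False
```

A production system would also have to accept "doi:" prefixes and full resolver URLs as input; those forms are left out of this sketch.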
Ruth Duerr

Lynn Yarmey

Ted Habermann
Citations in the Bigger Picture
Ted Habermann, NOAA/NESDIS/NGDC, NASA/ESDIS
Data preservation is communicating with the future

Ted Habermann
Metadata Types and Sharing
[Diagram labels: User, Discovery Portal, User Community, Metadata Collections; Discovery, Use / Mashup, Understanding]
More documentation is required for understanding data than for discovering or using it.

Tim Killeen

Steve Worley
Current Practices @ NCAR’s Research Data Archive
Metrics
- Usage (sample): 37% of users are from the US
- Now exporting 25+ TB monthly
- Track user activity: who accessed what and when
- Subsetting, in general, is 500+ requests/month
Bridging Data Lifecycles, April 5-6, 2012

Dan Kowal
Annual Reporting Example
- 294,337 visits (browser/user only)
- 14,658 unique visitors
- 9.27 pages/visit
- 6:45 avg. duration
Most Accessed out of 28 Data Sets:
* SPIDR NODES
Dan Kowal, Data Administrator

Leonard Sitongia
NCAR Mauna Loa Solar Observatory Pubs.

Steve Worley
Dataset Family Tree Example
• Global and Regional Atmospheric and Ocean Re-analyses: NCEP/NCAR, NARR, ERA-40, ERA-Interim, 20CR, OARCA
• NOC Surf. Flux (1973-2009)
• WASwind (1950-2009)
• Etc.
• Ocean Clouds (1900-2010)
• JMA SST (1871-2011)
• HadSLP (1871-2011)
• HadISST (1871-2011)
• NOAA OI SST (1981-2011)
• NOAA ERSST (1854-2011)
• International Comprehensive Ocean Atmosphere Data Set (ICOADS): global marine surface observations (1662-2011)

How to Get Started
• Know what you want to achieve
• Know your identifier options
• Engage stakeholders
• Start with well-bounded cases
• Plan for the long-term implications
  – How to maintain
  – How to count

Thank You
Workshop agenda and presentations: http://library.ucar.edu/data_workshop/
Email: [email protected]
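The Research Data Archive metrics slide mentions tracking who accessed what and when, and reporting figures such as requests per month. The following is a minimal Python sketch of that kind of aggregation. It assumes a hypothetical access log of (user, dataset, timestamp) records; the record schema, function names, and toy data are illustrative only and are not the RDA's actual tooling or figures.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Access(NamedTuple):
    """One hypothetical access-log record: who accessed what, and when."""
    user: str
    dataset: str
    when: datetime


def usage_by_dataset(log: Iterable[Access]) -> Dict[str, Tuple[int, int]]:
    """Map each dataset to (total accesses, number of distinct users)."""
    totals: Dict[str, int] = defaultdict(int)
    users: Dict[str, Set[str]] = defaultdict(set)
    for rec in log:
        totals[rec.dataset] += 1
        users[rec.dataset].add(rec.user)
    return {ds: (totals[ds], len(users[ds])) for ds in totals}


def requests_per_month(log: Iterable[Access]) -> Dict[Tuple[int, int], int]:
    """Count accesses per (year, month), the shape of a 'requests/month' figure."""
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for rec in log:
        counts[(rec.when.year, rec.when.month)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Toy log with made-up users and dataset IDs, for illustration only.
    log = [
        Access("alice", "ds-1", datetime(2012, 4, 5)),
        Access("bob", "ds-1", datetime(2012, 4, 6)),
        Access("alice", "ds-1", datetime(2012, 5, 1)),
        Access("alice", "ds-2", datetime(2012, 5, 2)),
    ]
    print(usage_by_dataset(log))    # {'ds-1': (3, 2), 'ds-2': (1, 1)}
    print(requests_per_month(log))  # {(2012, 4): 2, (2012, 5): 2}
```

Counting distinct users separately from total accesses mirrors the distinction between visits and unique visitors in the annual reporting example; both numbers come from the same log.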