Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data
Simon Coles
School of Chemistry, University of Southampton, U.K.
[email protected]
© S.J. Coles 2006

Funding Body Viewpoint

Supporting Small Laboratory Working Practice
“Data from experiments conducted as recently as six months ago might be suddenly deemed important, but those researchers may never find those numbers – or if they did might not know what those numbers meant”
“Lost in some research assistant’s computer, the data are often irretrievable or an undecipherable string of digits”
“To vet experiments, correct errors, or find new breakthroughs, scientists desperately need better ways to store and retrieve research data”
“Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.”
‘Lost in a Sea of Science Data’, S. Carlson, The Chronicle of Higher Education (23/06/2006)

The Information Environment
Institutional Data Sources

A Data-Rich Subject – the Crystallography Problem
[Figure: chemical structure diagrams annotated with the figures 30,000,000 and 450,000]

Data and Information Loss

Open Access as the Answer?

Separating Data from Interpretations
[Figure labels: Intellect & Interpretation; Underlying data]
[Figure: The scholarly knowledge cycle. Liz Lyon, eBankUK article. Ariadne, July 2003. Components: data creation / capture / gathering (laboratory experiments, Grids, fieldwork, surveys, media); data analysis, transformation, mining, modelling; research & e-Science workflows; deposit / self-archiving; repositories (institutional, e-prints, subject, data, learning objects); harvesting metadata; aggregator services (national, commercial); searching, harvesting, embedding; presentation services (subject, media-specific, data, commercial portals); resource discovery, linking, embedding; validation; publication; linking; data curation (databases & databanks); peer-reviewed publications (journals, conference proceedings)]

eBank-UK and the eCrystals Repository

Workflow Capture and Analysis
[Figure labels: Raw data; Derived data; Results data]

The eCrystals Data Archive
http://ecrystals.chem.soton.ac.uk

Access to the underlying data

Metadata Publication
• Using simple Dublin Core
  - Crystal structure
  - Title (systematic IUPAC name)
  - Authors
  - Affiliation
  - Creation date
• Additional chemical information through Qualified Dublin Core
  - Empirical formula
  - International Chemical Identifier (InChI)
  - Compound class & keywords
  - Specifies which ‘datasets’ are present in an entry
• DOI http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Rights & Citation http://ecrystals.chem.soton.ac.uk/rights.html
• Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/

Metadata and Data Quality Control
[Figure labels: Data manipulation toolbox; Associated metadata; Value added; Format conversion]

Harvesting & Aggregating: Google
Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k

Harvesting: OAIster

Linking and aggregating

Embedded in a science portal
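The Metadata Publication slide describes eCrystals entries exposed as simple Dublin Core, which is what harvesters such as OAIster collect. As a minimal sketch, assuming the standard OAI-PMH `oai_dc` container and the DCMI element namespace, such a record could be assembled with Python's standard library. The function name `build_oai_dc_record`, the field mapping and the example values are hypothetical; the eBank-UK application profile linked on the slide defines the real mapping, including the Qualified Dublin Core terms (empirical formula, InChI) that this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Namespaces of the OAI-PMH oai_dc container and the DCMI element set.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Serialize with the conventional prefixes instead of ns0/ns1.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_oai_dc_record(title, creators, date, identifiers, rights):
    """Build a simple Dublin Core record (hypothetical field mapping).

    title       -> dc:title   (e.g. a systematic IUPAC name)
    creators    -> dc:creator (one element per author)
    date        -> dc:date    (creation date)
    identifiers -> dc:identifier (e.g. a DOI URL)
    rights      -> dc:rights  (e.g. a rights/citation URL)
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")

    def add(tag, text):
        # Each Dublin Core field is a sibling element under the container.
        ET.SubElement(root, f"{{{DC_NS}}}{tag}").text = text

    add("title", title)
    for name in creators:
        add("creator", name)
    add("date", date)
    for ident in identifiers:
        add("identifier", ident)
    add("rights", rights)
    return root


if __name__ == "__main__":
    # Placeholder values only; not a real eCrystals entry.
    record = build_oai_dc_record(
        title="Hypothetical compound name",
        creators=["Author, A.", "Author, B."],
        date="2006-01-01",
        identifiers=["urn:example:hypothetical-entry"],
        rights="urn:example:hypothetical-rights",
    )
    print(ET.tostring(record, encoding="unicode"))
```

Repeating elements such as `dc:creator` once per value is how simple Dublin Core expresses multiple authors, which is why `creators` and `identifiers` are lists.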
The Repository for the Laboratory – R4L

Repositories Supporting Laboratory Working Practice
• eBank-UK / eCrystals concentrates on the dissemination of data compiled once a study is complete – ideal for complex studies
• There is still a need to capture data from ‘single shot’ experiments on small laboratory instruments
• To fully assure the quality and accuracy of metadata, it is essential to capture and describe data at the point when they are generated
• Solution: a repository with the potential to store data and metadata as they are generated in the laboratory
• Added bonus: a repository can manage data and provide automated report generation and data analysis tools

Laboratory Repositories and Information Management

Workflow Analysis
[Figure: workflow diagram. Metadata recorded: researcher, compound, experiment type, timestamp. Steps: sample preparation; data acquisition; deposit current dataset; analyse: refine experiment?; complete experiment deposit]

The R4L Repository
[Figure labels: Create new compound; Add experiment data and metadata; Deposit; Search / Browse]

The eCrystals Federation Model
[Figure labels: Data creation & capture in “Smart lab”; Data discovery, linking, citation; Data analysis, transformation, mining, modelling; Presentation services: portals; Search, harvest; Aggregator services; Harvest; Deposit; e-Research workflows; Institutional data repositories; Laboratory repository; Validation; Data curation & preservation: databases & databanks; Publication; Linking, citation; Publishers: peer-review journals, conference proceedings]
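The Workflow Analysis slide records the researcher, compound, experiment type and timestamp up front, deposits each dataset as soon as it is acquired, and repeats acquisition until analysis says no further refinement is needed. A hypothetical Python sketch of that loop follows; the `Experiment` class, `run_experiment`, its callback parameters and the `max_rounds` cap are illustrative assumptions, not the R4L software's actual interface. Sample preparation is assumed to happen outside the code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Experiment:
    """Hypothetical R4L-style experiment record with point-of-capture metadata."""
    researcher: str
    compound: str
    experiment_type: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    datasets: list = field(default_factory=list)
    complete: bool = False

    def deposit_dataset(self, data):
        """Deposit the current dataset immediately after acquisition."""
        if self.complete:
            raise ValueError("experiment deposit already completed")
        self.datasets.append(data)

    def complete_deposit(self):
        """Mark the experiment deposit as finished."""
        if not self.datasets:
            raise ValueError("no datasets have been deposited")
        self.complete = True


def run_experiment(exp, acquire, needs_refinement, max_rounds=5):
    """Acquire, deposit, analyse; repeat while analysis asks for refinement.

    acquire(round_no) returns a dataset; needs_refinement(data) returns True
    if another round is wanted. max_rounds is a hypothetical safety cap:
    the deposit is completed once it is reached, even if refinement is
    still requested.
    """
    for round_no in range(1, max_rounds + 1):
        data = acquire(round_no)        # data acquisition
        exp.deposit_dataset(data)       # deposit current dataset
        if not needs_refinement(data):  # analyse: refine experiment?
            break
    exp.complete_deposit()              # complete experiment deposit
    return exp
```

Depositing inside the loop, rather than once at the end, reflects the slide's point that metadata quality is best assured at the moment data are generated.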