SB/DMG/V1.0/8-00

NERC Soil Biodiversity Thematic Programme
Data Management Guidelines for Projects

The Soil Biodiversity (SB) Programme’s central database has been set up to store and maintain all definitive datasets arising from the Programme’s research, including baseline/background data from the Sourhope site, plus project datasets and associated metadata. However, during the process of data collection, processing, analysis and interchange, projects are responsible for the management of their own data. Projects are strongly encouraged to follow some basic principles of data management and quality assurance to ensure (i) that datasets are sufficiently well structured and contain enough meta-information to be fully understood for analysis and interpretation by their own and other project teams, both now and in the longer term, and (ii) that datasets can be integrated across the Programme within the final database.

The following are some general data management guidelines; the numbered points relate to those listed in the Data Management Plan pro-forma attached to the Quality Plan.

1. Data Managers. Ideally, a single person should have overall responsibility for project data storage, management, integrity, back-up, and liaison with the central database at CEH Merlewood. If a project is composed of a number of collaborating groups or sub-projects, then, in addition, a nominated person within each group should be responsible for managing its constituent datasets. Data managers should maintain the ‘definitive’ version of a dataset, log its development, and handle access requests.

2. Data Management System. Ideally, data should be structured and maintained within an on-line, centrally maintained database system, where resources and expertise allow. Whether held in a database, spreadsheet or ASCII file format, data and meta-data must be properly maintained, documented and backed up.
Systems and formats should be commonly available de facto standards such as Oracle, MS Access or Excel, with Arc/Info or ArcView for GIS data. Where specialised formats are necessary, these should be capable of being exported into a portable form. Back-up and security procedures should be implemented for all data, whether in digital or analogue form, to protect data from loss or corruption. Copies of data can be stored in fire-proof cabinets and/or taken off-site to minimise risk. A versioning system of dataset/database table back-ups should be devised where datasets are subject to change and development.

3. Meta-data. All datasets should be fully documented. Meta-data should include at least those categories defined in the SB data transfer template, which have themselves been selected from established meta-data standards as a minimum requirement. These include dataset identifier and description, position in time and space, variable descriptions, units of measurement / category descriptions (including species naming and coding systems), missing value codes, and reference to documented methods used to generate those variables. The development of a dataset should be documented and any transformations logged so that its derivation can be traced for interpretation or error-tracking.

4. Referencing System. The standard referencing system described in the SB Sampling Protocol should be used to label samples and to uniquely identify data records, so that they can be linked back to the sampling/measurement activity at Sourhope in space and time, and so that results can be linked across projects. The SB data transfer template demonstrates this principle. In particular, it is most important that the Sampling Unit ID (SUID) accompany any sample extracted or derived from that original Sampling Unit, and any analytical/experimental results generated from it.
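The SUID linking described above might be sketched as follows; the coding scheme, field names and values here are illustrative assumptions for a hypothetical project, not the Programme’s actual conventions:

```python
# Sketch: the SUID of the original Sourhope Sampling Unit travels with every
# sub-sample and every result derived from it. All identifiers below are
# invented for illustration.

from dataclasses import dataclass


@dataclass
class Result:
    suid: str       # Sampling Unit ID from the SB Sampling Protocol
    sub_code: str   # project's own code for the sub-sample, linked to the SUID
    variable: str   # measured variable
    value: float
    units: str


def sub_sample_code(suid: str, n: int) -> str:
    """Derive a project-level sub-sample code that links back to the SUID."""
    return f"{suid}/S{n:02d}"


# A result generated from sub-sample 1 of a (hypothetical) sampling unit:
result = Result(
    suid="SH-0042",
    sub_code=sub_sample_code("SH-0042", 1),
    variable="microbial biomass C",
    value=412.5,
    units="mg C kg-1 dry soil",
)
print(result.sub_code)  # SH-0042/S01
```

Because the SUID is embedded in every derived code and record, a result can always be traced back to the original sampling activity, and results from different projects on the same sampling unit can be joined on the SUID.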
Projects should devise and document their own system for unique coding of subsequent sub-samples or bulked samples, which links back to the original SUID.

5. Quality Assurance of Data.
i. Data Entry: Ensure samples are labelled clearly and data are entered correctly. Ideally, a second person should double-check labelling during sampling, and data should be ‘double-punched’ when keyed in directly.
ii. Data Validation: Perform at least some basic validation checks on the data, for example range checks (max and min constraints), internal consistency checks (e.g. count of samples taken from different locations against field forms), and format checks (e.g. invalid dates). Use a 4-figure year for dates.
iii. Error Handling: The following are some general recommendations. If the error has a known cause (e.g. instrumentation failure), then, if possible, repeat the measurement (e.g. re-analyse the sample); if the measurement is not repeatable (e.g. hourly temperature from an AWS), set the value to null or to an established missing value code. Ensure that these codes are properly documented and handled during analysis. If the value is known to be a valid outlier, for example due to extreme conditions, ‘flag’ it as such in the meta-data. If the reason for the error is unknown, and back-checking does not reveal a correction, keep the value but ‘flag’ it as suspect in the meta-data; decisions on how to handle this value will need to be made at analysis. Ensure that this meta-data is sent with the dataset on transfer to the central database.

6. Data Exchange and Transfer. An Excel data transfer template has been devised for the Programme which provides a structure for SB datasets and associated meta-data. The meta-data fields are the minimum considered necessary to allow others to understand and interpret the data. This template should be used to transfer data to the central database at CEH Merlewood.
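The basic validation and missing-value recommendations in point 5 might be sketched as follows; the field names, range limits and missing value code are illustrative assumptions, not values prescribed by the Programme:

```python
# Sketch of basic validation checks: a range check, a date format check
# enforcing a 4-figure year, and handling of a documented missing value code.
# All limits and codes below are invented for illustration.

from datetime import datetime

MISSING = -9999.0  # hypothetical missing value code, documented in the meta-data


def check_range(value: float, lo: float, hi: float) -> bool:
    """Range check: value within min/max constraints, or the missing value code."""
    if value == MISSING:
        return True
    return lo <= value <= hi


def check_date(text: str) -> bool:
    """Format check: require an ISO-style date with a 4-figure year."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return len(text.split("-")[0]) == 4


records = [
    {"date": "2000-08-22", "soil_ph": 5.6},
    {"date": "22/08/00", "soil_ph": 5.2},     # fails the date format check
    {"date": "2000-08-23", "soil_ph": 14.7},  # fails the pH range check
]

for r in records:
    ok = check_date(r["date"]) and check_range(r["soil_ph"], 3.0, 9.0)
    print(r, "OK" if ok else "SUSPECT - flag in meta-data")
```

Checks like this can be run before a dataset is packaged for transfer; values that fail should be flagged in the meta-data rather than silently corrected or deleted.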
However, where projects already hold their data in structured databases, export of database tables containing the relevant data and meta-data would be appropriate. Email attachment is the preferred method of data transfer, but where files exceed attachment size restrictions at sites, FTP or CD-ROM is appropriate. All files should be scanned by up-to-date virus-checking software before transfer; likewise, any incoming files should be scanned before opening.

7. Data Policy. The position with respect to intellectual property rights (IPR) on data, and the terms and conditions of access for NERC Thematic Programme data, are summarised in the Soil Biodiversity Data Policy (see www.nmw.ac.uk/soilbio/data_policy.htm). Each project is encouraged to devise a data policy for its own data (concordant with the Programme data policy) and procedures for sharing and release of data to third parties. These procedures may need to include licensing arrangements in order to legally protect data from misuse, such as distribution of data to other parties or exploitation without reference to originators. In turn, projects should be careful to comply with the terms and conditions attached to the use of any datasets belonging to third parties, for example Ordnance Survey (OS) data (it is worth noting that HEIs and NERC Centres/Surveys are subject to different terms and conditions for use of OS data).

M Lane and D Caffrey, 22/8/2000