SB/DMG/V1.0/8-00
NERC Soil Biodiversity Programme
Data Management Guidelines for Projects
The Soil Biodiversity (SB) Programme’s central database has been set up to store and
maintain all definitive datasets arising from the Programme’s research, including
baseline/background data from the Sourhope site, plus project datasets and associated metadata. However, during the process of data collection, processing, analysis and interchange,
projects are responsible for the management of their own data. Projects are strongly
encouraged to follow some basic principles of data management and quality assurance to
ensure (i) that datasets are sufficiently well structured, and contain enough meta-information,
to be fully understood for analysis and interpretation by their own and other project teams,
both now and in the longer term, and (ii) that datasets can be integrated across the
Programme within the final database.
The following are some general data management guidelines; the numbered points relate to
those listed in the Data Management Plan pro-forma attached to the Quality Plan:
1. Data Managers. Ideally, a single person should have overall responsibility for project
data storage, management, integrity and back-up, and for liaison with the central database at
CEH Merlewood. If a project is composed of a number of collaborating groups or sub-projects,
then, in addition, a nominated person within each group should be responsible for
managing its constituent datasets. Data managers should maintain the ‘definitive’ version
of a dataset, log its development, and handle access requests.
2. Data Management System. Ideally, data should be structured and maintained within an
on-line, centrally maintained database system, where resources and expertise allow.
Whether within a database, spreadsheet, or ASCII file format, data and meta-data must be
properly maintained, documented and backed-up. Systems and formats should be
commonly available de-facto standards such as Oracle, MS Access or Excel, with Arc/Info
or ArcView for GIS data. Where specialised formats are necessary, these should be
capable of exporting data in a portable form.
Back up and security procedures should be implemented for all data, whether in digital or
analogue form, to protect data from loss or corruption. Copies of data can be stored in
fire-proof cabinets and/or taken off-site to minimise risk. A versioning system of
dataset/database table back-ups should be devised where datasets are subject to change
and development.
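The versioning of dataset back-ups recommended above can be sketched as a simple date-stamped copy routine. This is an illustrative assumption, not part of the Programme's guidelines; the file-naming convention and directory layout are invented for the example.

```python
import shutil
from datetime import date
from pathlib import Path

def backup_dataset(dataset: Path, backup_dir: Path) -> Path:
    """Copy a dataset file to a date-stamped back-up, retaining earlier versions."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = date.today().isoformat()  # ISO date, e.g. 2000-08-22
    target = backup_dir / f"{dataset.stem}_{stamp}{dataset.suffix}"
    shutil.copy2(dataset, target)     # copy2 also preserves file timestamps
    return target
```

Because each copy carries its date in the name, earlier versions of a changing dataset are never overwritten, and the back-up directory itself can be taken off-site or to a fire-proof cabinet on removable media.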
3. Meta-data. All datasets should be fully documented. Meta-data should include at least
those categories defined in the SB data transfer template, which themselves have been
selected from established meta-data standards as a minimum requirement. These include
dataset identifier and description, position in time and space, variable descriptions, units
of measurement / category descriptions (including species naming and coding systems),
missing value codes and reference to documented methods used to generate those
variables. The development of a dataset should be documented and any transformations
logged so that its derivation can be traced for interpretation or error-tracking.
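The minimum meta-data categories listed above could be captured in a simple record structure such as the following sketch. The field names are paraphrased assumptions based on this list, not the actual fields of the SB data transfer template.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    """Minimum meta-data for a dataset (field names are illustrative)."""
    dataset_id: str
    description: str
    spatial_coverage: str     # position in space, e.g. site/plot reference
    temporal_coverage: str    # position in time, e.g. sampling period
    variables: dict = field(default_factory=dict)  # variable name -> description
    units: dict = field(default_factory=dict)      # variable name -> unit or category coding
    missing_value_code: str = "-9999"              # assumed code; must be documented
    methods_reference: str = ""                    # documented method used to generate variables
    change_log: list = field(default_factory=list)

    def log_transformation(self, note: str) -> None:
        """Record a derivation step so the dataset's history can be traced."""
        self.change_log.append(note)
```

Keeping the change log alongside the other meta-data means the derivation of the dataset travels with it, supporting the error-tracking described above.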
4. Referencing System. The standard referencing system described in the SB Sampling
Protocol should be used to label samples and to uniquely identify data records so that they
can be linked back to the sampling/measurement activity at Sourhope in space and time,
and so that results can be linked across projects. The SB data transfer template
demonstrates this principle. In particular, it is most important that the Sampling Unit ID
(SUID) accompany any sample extracted or derived from that original Sampling Unit,
and any analytical/experimental results generated from it. Projects should devise and
document their own system for unique coding of subsequent sub-samples or bulked
samples, which links back to the original SUID.
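One way to meet the requirement that sub-sample codes link back to the original SUID is a hierarchical, dot-separated identifier, sketched below. The identifier format is an illustrative assumption; the actual SUID scheme is defined in the SB Sampling Protocol, and each project must devise and document its own sub-sample coding.

```python
def subsample_id(parent_id: str, index: int) -> str:
    """Derive a sub-sample identifier that retains the parent SUID as a prefix."""
    return f"{parent_id}.{index:02d}"

def root_suid(sample_id: str) -> str:
    """Recover the original Sampling Unit ID from any derived identifier."""
    return sample_id.split(".")[0]
```

Under this scheme a second sub-sample of a hypothetical unit "SRH0042" would be coded "SRH0042.02", and the root SUID can be recovered from any depth of sub-sampling, so analytical results remain traceable to the sampling activity at Sourhope.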
5. Quality Assurance of data:
i. Data entry: Ensure samples are labelled clearly and data are entered correctly. Ideally, a
second person should double-check labelling during sampling, and data should be
‘double-punched’ when keyed in directly.
ii. Data Validation: Perform at least some basic validation checks on the data, for
example range checks (max and min constraints), internal consistency checks (e.g.
count of samples taken from different locations against field forms), format checks
(e.g. invalid dates). Use a 4-figure year for dates.
iii. Error Handling: The following are some general recommendations:
- If the error has a known cause, e.g. instrumentation failure, then, if possible, repeat the
measurement (e.g. re-analyse the sample); if it is not repeatable (e.g. hourly
temperature from an AWS), then set the value to null or to an established missing
value code. Ensure that these codes are properly documented and handled during
analysis.
- If a value is known to be a valid outlier, for example due to extreme conditions, then
‘flag’ it as such in the meta-data.
- If the reason for an error is unknown, and back-checking does not reveal a correction,
then keep the value but ‘flag’ it as suspect in the meta-data. Decisions on how to
handle this value will need to be made during analysis. Ensure that this meta-data is
sent with the dataset on transfer to the central database.
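The validation and flagging recommendations in points (ii) and (iii) can be sketched as a simple per-record check. The range limits, date format, missing-value code, and flag names below are illustrative assumptions, not values prescribed by the Programme.

```python
from datetime import datetime

MISSING = -9999  # assumed missing-value code; must be documented in the meta-data

def validate_record(value: float, date_str: str, lo: float, hi: float) -> str:
    """Return a quality flag for one record: 'ok', 'missing', or 'suspect'."""
    # Format check: dates must use a 4-figure year (ISO format assumed here).
    try:
        datetime.strptime(date_str, "%Y-%m-%d")
    except ValueError:
        return "suspect"
    if value == MISSING:
        return "missing"
    # Range check: values outside the known max/min limits are flagged, not discarded.
    if not lo <= value <= hi:
        return "suspect"
    return "ok"
```

Flagged values are retained rather than deleted, consistent with the recommendation above that suspect values be kept and their flags transferred with the dataset.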
6. Data Exchange and Transfer: An Excel data transfer template has been devised for the
Programme which provides a structure for SB datasets and associated meta-data. The
meta-data fields are the minimum considered necessary to allow others to understand and
interpret the data. This template should be used to transfer data to the central database at
CEH Merlewood. However, where projects already hold their data in structured
databases, export of database tables containing the relevant data and meta-data would be
appropriate. Email attachment is the preferred method of data transfer, but where files
exceed attachment size restrictions at sites, then FTP or CD-ROM is appropriate. All files
should be scanned by up-to-date virus checking software before transfer; likewise, any
incoming files should be scanned before opening.
7. Data policy. The position with respect to intellectual property rights (IPR) on data and
terms and conditions of access for NERC Thematic Programme data are summarised in
the Soil Biodiversity Data Policy (see www.nmw.ac.uk/soilbio/data_policy.htm). Each
project is encouraged to devise a data policy for its own data (concordant with the
Programme data policy) and procedures for sharing and release of data to third parties.
These procedures may need to include licensing arrangements in order legally to protect
data from mis-use, such as distribution of data to other parties or exploitation without
reference to originators. In turn, projects should be careful to comply with the terms and
conditions attached to the use of any datasets belonging to third parties, for example
Ordnance Survey (OS) data (it is worth noting that HEIs and NERC Centre/Surveys are
subject to different terms and conditions for use of OS data).
M Lane and D Caffrey
22/8/2000