Considering best practices in managing sensor data
Don Henshaw
H.J. Andrews Experimental Forest
Ecological Information Management (EIM) 2008
LTER Information Management Committee Meeting, July 23-25, 2013
COMMON THEMES FROM PARTICIPATING SITES
JOINT NERC ENVIRONMENTAL SENSOR NETWORK/SENSOR NIS WORKSHOP,
HUBBARD BROOK EXPERIMENTAL FOREST, NH, OCTOBER 25-27TH, 2011
Greatest Needs
• Middleware between sensor/data logger and database/applications
• Programming support
• Training workshops to disseminate knowledge & solutions
• Ways to share experiences with software and tools that are useful
• Clearinghouse for sharing code and solutions
• Knowledge Base (web page) organized by topics
  (http://wiki.esipfed.org/index.php/EnviroSensing_Cluster)
ESIP EnviroSensing Cluster:
Building a sensor network resource guide through community participation
o Online resource guide outline
  • Sensor, site, and platform selection
  • Data acquisition and transmission
  • Sensor management, tracking, documentation
  • Streaming data management middleware
  • Sensor data quality assurance/quality control (QA/QC)
  • Sensor data archiving
Sensor, site, and platform selection
o Problem statement
  • Vast array of possible sensor/hardware packages for multiple science applications
  • Communication among PIs, techs, and specialists
    o work together in considering options and planning
  • Deployment may be based on interacting factors
    o e.g., permitting, geography, access
  • Considerations:
    o seasonal weather patterns, power sources, communications options, land ownership, distance from the managing institution, available personnel/expertise, and potential expansion/future-proofing
Data acquisition and transmission
Problem statement
• Manual downloads of environmental sensor data may not be sufficient to assure data security or data integrity, or to allow direct control of devices
• Considerations:
  o need for immediate access
  o need for one- or two-way transmission methods
  o bandwidth requirements to transfer the data
  o need for line-of-sight communication or repeaters
  o hardware and network protocols
  o power consumption of the system components
  o physical and network security requirements
Sensor management, tracking, and documentation
Problem statement
• Documentation of field procedures needs to be sufficient to withstand personnel changes over time
• Noted sensor issues and problems need to be quickly communicated among field technicians, lead investigators, and data managers
• Sensor histories are typically tracked in field notebooks or field check sheets and are essential for internal review of data streams, but are often inaccessible to data handlers
• Noted field problems may provide insight into quality control issues and data behavior and should be captured in data qualifier flags
SENSOR MANAGEMENT, TRACKING, DOCUMENTATION
• Develop protocols for installation, calibration, maintenance, and removal of sensors
• Track sensor events and history
  o Record sensor events and failures, deployment information, calibration events, maintenance history, operational dates, etc.
  o Record sensor descriptions, methodology changes, sampling frequency, geo-location, photo points, etc.
• Documentation
  o Standardize field notebooks or field checklists
  o Build log files or databases for annotation of sensor events (a minimal sketch follows this list), e.g.,
    • Timestamp (or range), DataloggerID, SensorID, event category, description, and note taker of the event
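A minimal sketch of such an event log as a SQLite table; the column names and the sample row are hypothetical, modeled on the fields in the bullet above.

```python
# Hypothetical sensor-event log, modeled on: Timestamp (or range),
# DataloggerID, SensorID, event category, description, note taker.
import sqlite3

conn = sqlite3.connect("sensor_events.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS sensor_event (
        event_id      INTEGER PRIMARY KEY,
        begin_time    TEXT NOT NULL,   -- ISO 8601 timestamp
        end_time      TEXT,            -- NULL for point-in-time events
        datalogger_id TEXT NOT NULL,
        sensor_id     TEXT NOT NULL,
        category      TEXT NOT NULL,   -- e.g., 'calibration', 'failure', 'maintenance'
        description   TEXT,
        note_taker    TEXT
    )
""")
conn.execute(
    "INSERT INTO sensor_event (begin_time, datalogger_id, sensor_id, "
    "category, description, note_taker) VALUES (?, ?, ?, ?, ?, ?)",
    ("2013-07-23T10:15:00", "CR1000-03", "TEMP-12",
     "calibration", "Two-point calibration against reference thermometer", "AK"),
)
conn.commit()
```

Keeping the log in a database rather than a paper notebook makes sensor histories queryable by the data handlers who review the data streams.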
Sensor data quality assurance and quality control (QA/QC)
• Preventative QA measures in the field are desirable
• Automated QC is necessary for
  o near real-time use of data
  o efficient processing of high-volume data streams
• Manual methods are unavoidable
  o a hybrid QC system will include subsequent manual inspection and additional QC checking
• A QC system must
  o apply qualifier flags to sensor data
  o accommodate feedback to policies and procedures
  o assure that all QC workflows are documented
QUALITY ASSURANCE – PREVENTATIVE MEASURES
• Routine calibration and maintenance
  o Anticipate common repairs and replacement parts
  o Record known events that may impact measurements
• Continuous monitoring and evaluation of the sensor network
  o Early detection of problems
  o Automated alerts (see the sketch after this list); in situ web cams
• Sensor redundancy
  o Ideal: triple the sensor, triple the logger!
  o Practical: lower-cost, lower-resolution sensors, or correlated (proxy) sensors
  o Alternative: datalogger-independent sensor spot checks; portable instrument package
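Automated alerting can be as simple as a periodic check of each sensor's latest reading. A minimal watchdog sketch, assuming the latest reading per sensor is available as a (timestamp, value) pair; the thresholds, sensor IDs, and function names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_SILENCE = timedelta(hours=2)          # alert if no data for this long
PLAUSIBLE = {"TEMP-12": (-40.0, 50.0)}    # per-sensor plausible range

def check_sensor(sensor_id, last_time, last_value, now=None):
    """Return a list of alert strings for one sensor."""
    now = now or datetime.utcnow()
    alerts = []
    if now - last_time > MAX_SILENCE:
        alerts.append(f"{sensor_id}: no data since {last_time:%Y-%m-%d %H:%M}")
    lo, hi = PLAUSIBLE.get(sensor_id, (float("-inf"), float("inf")))
    if not lo <= last_value <= hi:
        alerts.append(f"{sensor_id}: value {last_value} outside [{lo}, {hi}]")
    return alerts

# Example: a reading that is both stale and implausible
for msg in check_sensor("TEMP-12",
                        datetime(2013, 7, 23, 6, 0), 73.2,
                        now=datetime(2013, 7, 23, 10, 0)):
    print("ALERT:", msg)   # in practice, hook up email/SMS notification here
```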
QUALITY CONTROL ON STREAMING DATA:
POSSIBLE QUALITY CONTROL CHECKS IN NEAR REAL-TIME
• Timestamp integrity (date/time)
  o Sequential, fixed intervals, i.e., checks for time-step or frequency variation
• Range checks
  o Sensor specifications – identify impossible values, not unlikely ones
  o Seasonal/reasonable historic values
• Internal (plausibility) checks
  o E.g., TMAX - TMIN > 0, snow depth > snow water equivalent
  o Consistency of derived values
• Variance checks
  o Sigma (standard deviation), delta/step (difference of subsequent pairs), and change-in-slope checks
  o e.g., outlier detection, indicator of sensor degradation
  o Sensitivity is specific to site and sensor type
• Persistence checks
  o Check for repeating values that may indicate sensor failure
  o E.g., freezing, sensor capacity issues
• Spatial checks
  o Use correlations with redundant or nearby sensors, e.g., check for sensor drift
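A sketch of how three of these checks (range, delta/step, and persistence) might be applied to a data stream with pandas; the single-letter flag codes and the thresholds are hypothetical and would be tuned per site and sensor type, as the slide notes.

```python
import pandas as pd

def qc_flags(values, lo, hi, max_step, max_repeats):
    """Return a Series of flag strings aligned with `values`."""
    flags = pd.Series("A", index=values.index)          # 'A' = accepted
    flags[values.isna()] = "M"                          # 'M' = missing
    flags[(values < lo) | (values > hi)] = "R"          # 'R' = range-check fail
    flags[values.diff().abs() > max_step] = "S"         # 'S' = step/delta fail
    # Persistence: length of each run of identical consecutive values
    runs = values.groupby((values != values.shift()).cumsum()).transform("size")
    flags[runs >= max_repeats] = "P"                    # 'P' = persistence fail
    return flags

ts = pd.date_range("2013-07-23", periods=8, freq="h")
temp = pd.Series([12.1, 12.3, 45.0, 12.4, 12.4, 12.4, 12.4, None], index=ts)
print(qc_flags(temp, lo=-40, hi=40, max_step=10, max_repeats=4))
```

Note that a step check flags both sides of a spike, and later checks overwrite earlier flags; a production system would more likely accumulate multiple qualifiers per value than keep a single letter.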
QUALITY CONTROL ON STREAMING DATA:
DATA QUALIFIERS (DATA FLAGS)
• Many vocabularies of data flags
• Good approach
  o Rich vocabulary of fine-grained flags for streaming data – intended to guide local review
    • site-specific flags
  o Simpler vocabulary of flags for “final” data for public consumption (see the mapping sketch below), e.g.,
    • ‘Accepted’, ‘Missing’, ‘Estimated’, ‘Suspicious’, estimate uncertainty
• Certain types of qualifiers may be better as data columns
  o Method shifts, sensor shifts
• Place key documentation as close to the data value as possible
Image from Campbell et al., BioScience, in press.
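A sketch of collapsing a fine-grained internal flag vocabulary into the simpler public one above; the internal codes here are hypothetical, while the public labels follow the slide.

```python
# Hypothetical internal codes mapped to the public vocabulary above.
INTERNAL_TO_PUBLIC = {
    "range_fail":       "Suspicious",
    "step_fail":        "Suspicious",
    "persistence_fail": "Suspicious",
    "gap_filled":       "Estimated",
    "no_data":          "Missing",
    "pass":             "Accepted",
}

def publish_flag(internal_flag):
    # Default to 'Suspicious' so an unknown internal code is never
    # silently published as 'Accepted'.
    return INTERNAL_TO_PUBLIC.get(internal_flag, "Suspicious")

print(publish_flag("step_fail"))   # -> Suspicious
```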
SENSOR DATA ARCHIVING
• Archiving strategies (a snapshot sketch follows this list)
  o create well-documented data snapshots
  o assign unique, persistent identifiers
  o maintain data and metadata versioning
  o store data in text-based formats
• Partner with community-supported archives
  o e.g., the LTER NIS, or federated archive initiatives such as DataONE
• Best practices
  o develop an archival data management plan
  o implement a sound data backup plan
  o archive raw data (but they do not need to be online)
  o make data publicly available once appropriate QA/QC procedures have been applied
  o assign a QC level to published data sets
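A minimal sketch of a text-based snapshot with a version stamp and checksum, so that a persistent identifier can later be bound to exactly this version of the data; the file names and metadata fields are hypothetical.

```python
import csv, hashlib, json
from datetime import date

rows = [("2013-07-23T10:00:00", "TEMP-12", 12.3, "Accepted")]

# Text-based, versioned snapshot file name (hypothetical convention)
snapshot = f"temp12_snapshot_v1_{date.today():%Y%m%d}.csv"
with open(snapshot, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "sensor_id", "value", "flag"])
    writer.writerows(rows)

# Record a SHA-256 checksum and version info alongside the data so the
# snapshot's integrity can be verified after archiving.
digest = hashlib.sha256(open(snapshot, "rb").read()).hexdigest()
with open(snapshot + ".meta.json", "w") as f:
    json.dump({"file": snapshot, "version": 1, "sha256": digest}, f, indent=2)
```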
QUALITY CONTROL ON STREAMING DATA:
QUALITY LEVELS
• Quality control is performed at multiple levels
• Level 0 (raw streaming data)
  o Raw data; no QC and no data qualifiers (flags) applied
  o Preservation of original data streams is essential
• Level 1 (QC applied, qualifiers added)
  o Provisional level (near real-time preparation)
    • if released, provisional data must be labeled clearly
  o Published level (delayed release)
    • QC process is complete; data are unlikely to change
• Level 2 (gap-filled or estimated data)
  o Involves interpretation – may be controversial
  o Desirable when generating summarized data, but transparency is critical – flag estimated values (a sketch follows this list)
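A sketch of Level 2 gap filling that keeps the transparency requirement: short gaps are linearly interpolated and every filled value is flagged ‘Estimated’. The 3-point gap limit is a hypothetical, site-specific choice.

```python
import pandas as pd

ts = pd.date_range("2013-07-23", periods=6, freq="h")
temp = pd.Series([12.1, None, None, 12.7, 12.9, 13.0], index=ts)

# Fill only short interior gaps; longer gaps stay missing
filled = temp.interpolate(limit=3, limit_area="inside")

flags = pd.Series("Accepted", index=ts)
flags[temp.isna() & filled.notna()] = "Estimated"   # transparency: mark fills
flags[filled.isna()] = "Missing"                    # gaps too long to fill

print(pd.DataFrame({"value": filled, "flag": flags}))
```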
Streaming data management middleware
o Definition/Purpose
  • “Middleware,” in conjunction with sensor networks, is computer software that enables communication and management of data from field sensors to a client such as a database or a website
  • The purpose of middleware includes the collection, analysis, and visualization of data
  • Middleware components are chained together into a scientific workflow (see the sketch after this list)
  • Examples:
    o Read, reformat, and export different data types or structures (input/output)
    o Automated QA/QC on data streams
    o Integration of field notes and documentation with the data
    o Archiving
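A sketch of such a chain: a source that reads raw logger output, a QC step that attaches flags, and an archiving sink. The file names, column layout, and range thresholds are all hypothetical.

```python
import csv

def read_logger(path):
    """Source: yield (timestamp, value) records from a logger CSV file."""
    with open(path) as f:
        for row in csv.DictReader(f):
            yield row["timestamp"], float(row["value"])

def qc(records, lo=-40.0, hi=50.0):
    """Transform step: attach a simple range-check flag to each record."""
    for ts, value in records:
        yield ts, value, "Accepted" if lo <= value <= hi else "Suspicious"

def archive(records, path):
    """Sink: write flagged records; a database insert would fit here too."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "value", "flag"])
        writer.writerows(records)

# Chain the steps into one workflow
archive(qc(read_logger("cr1000_raw.csv")), "cr1000_level1.csv")
```

Because each step only consumes and produces simple records, steps can be swapped or extended (e.g., adding a visualization or field-note integration step) without changing the rest of the chain.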
Streaming data management middleware
o Middleware/software – Proprietary
• Campbell Scientific LoggerNet
o functionality to set up and configure a network of loggers
o tools to program, visualize, monitor, and publish data
• Vista Engineering: Vista Data Vision (VDV)
o tools to store and organize data from various loggers
o visualization, alarming, reporting, and web publishing features
• YSI EcoNet (for YSI monitoring instrumentation)
o delivery of data from the field to the YSI web server
o visualization, reports, alarms, and email notification tools
• NexSens: iChart
o Windows-based data acquisition software package
o interfaces with popular products such as YSI, OTT, ISCO sensors
SENSOR DATA MANAGEMENT MIDDLEWARE
OPEN SOURCE ENVIRONMENTS FOR STREAMING DATA
• Matlab GCE Toolbox (proprietary/limited open source)
  o GUI, visualization, metadata-based analysis; manages QA/QC rules and qualifiers; tracks provenance
• Open Source DataTurbine Initiative
  o Streaming data engine; receives data from various sources and sends it to analysis and visualization tools, databases, etc.
• Kepler Project (open source)
  o GUI; reuse and share analytical components/workflows with other users; tracks provenance; integrates software components and data sources
SENSOR MANAGEMENT BEST PRACTICES WORKSHOP
PARTICIPANTS
Don Henshaw (AND) - organizer
Corinna Gries (NTL) - organizer
Renee Brown (SEV)
Adam Kennedy (AND)
Richard Cary (CWT)
Mary Martin (HBR)
Christine Laney (UTEP, JRN)
Jennifer Morse (NWT)
Chris Jones (DataONE)
Branko Zdravkovic (Univ of Saskatchewan)
Scotty Strachan (Univ of Nevada-Reno)
Jordan Read (USGS) - vtc
Wade Sheldon (GCE) - vtc