Data Management Needs and Challenges for Telemetry Scientists
Josh M London
Wildlife Biologist, Polar Ecosystems Program
National Marine Mammal Laboratory
NOAA NMFS Alaska Fisheries Science Center
The Tip of a Complex Iceberg
Temptation to identify biologists as the source for the raw data.
(Figure: iceberg diagram labeled Publications, Contract reports, Status/Listing Review, derived products, movement model, data quality control, synthesis, and Data Management as a narrowing bottleneck.)
Many biologists lack the skills and training for effective, scalable database design and data management practices.
Field Work and Study Design
(Figure: factors shaping the deployment of tags (location, age/sex, time): tag design/vendor, tag programming, opportunistic vs. planned deployments, hypotheses, agency needs/mandates, funding initiatives.)
Field Work & Tag Deployment
• When? Where?
• Which tag/vendor?
• Which age? Which sex? (Do we have a choice?)
• Tag programming
• Deployment length (attachment type)
Limited Tools for Managing Raw Telemetry Data
‘Raw’ data
• Delivered via Argos as CSV/text
• Processed with vendor software (behavior data)
• Typically output as CSV
• Field data about the animal (e.g., ID, species, sex, age, health)
Needs
• Explore ‘raw’ data
• Address hypotheses
• Visualize movement/use
• Synthesize with dependent data (e.g., health, age) and independent data (e.g., other animals, remotely sensed data)
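The "synthesize" need above starts with a join: every Argos location has to be tied back to the field data for the animal that carried the tag. A minimal Python sketch of that step follows. The column names (`ptt`, `date`, `lc`, `lat`, `lon`, `animal_id`, `species`, `sex`, `age`) and the sample rows are invented for illustration, and the sketch assumes each PTT (tag ID) maps to exactly one deployment. Real programs cannot rely on that, because tags are reused, so a production key would also involve deployment dates.

```python
import csv
import io

# Hypothetical column layouts and invented sample rows; real Argos and
# vendor exports differ by tag type and software version.
ARGOS_CSV = """ptt,date,lc,lat,lon
12345,2006-05-01 03:12:00,2,57.12,-170.25
12345,2006-05-01 09:47:00,B,57.30,-170.10
67890,2006-05-02 11:05:00,1,58.01,-168.90
"""

FIELD_CSV = """ptt,animal_id,species,sex,age
12345,PL2006_01,Phoca largha,F,adult
67890,PL2006_02,Phoca largha,M,subadult
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_field_data(locations, deployments):
    """Join each location to its animal's field record, keyed on PTT.

    Assumes one deployment per PTT (a simplification; tags get reused).
    """
    by_ptt = {d["ptt"]: d for d in deployments}
    joined = []
    for loc in locations:
        animal = by_ptt.get(loc["ptt"])
        if animal is None:
            # An orphan location is a data-quality problem to report, not a row to guess at.
            raise KeyError(f"no deployment record for PTT {loc['ptt']}")
        joined.append({**loc, **animal})
    return joined


rows = attach_field_data(read_rows(ARGOS_CSV), read_rows(FIELD_CSV))
for r in rows:
    print(r["animal_id"], r["sex"], r["date"], r["lat"], r["lon"])
```

Raising on an unmatched PTT, rather than silently dropping the row, keeps gaps in the field records visible.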
Biologists Not Trained in Large-Scale Data Management
Biologists
• Excel and/or Access
• ESRI ArcMap (shapefiles)
• Google Earth
• Mouse-click interaction
• Programming (Visual Basic, R, Python) is recipe-driven … not developers
Data Manager
• Postgres/PostGIS, Oracle, MySQL, SQL Server
• Normalization and efficient design
• Scripting, jobs, transactions
• Data integrity
• Automation, reproducibility
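The data manager skills listed above (normalization, transactions, data integrity) can be made concrete with a small schema. The sketch below uses Python's built-in `sqlite3` module rather than any of the databases named on the slide. The tables (`animal`, `deployment`, `location`), their columns, and the inserted values are hypothetical. Each fact is stored once, and the database itself rejects a location that points at a deployment that does not exist.

```python
import sqlite3

# Hypothetical normalized schema: each animal stored once, each tag deployment
# once, and many locations per deployment (instead of one large, flat table).
SCHEMA = """
CREATE TABLE animal (
    animal_id TEXT PRIMARY KEY,
    species   TEXT NOT NULL,
    sex       TEXT CHECK (sex IN ('F', 'M', 'U'))
);
CREATE TABLE deployment (
    deploy_id INTEGER PRIMARY KEY,
    animal_id TEXT NOT NULL REFERENCES animal(animal_id),
    ptt       INTEGER NOT NULL,
    start_utc TEXT NOT NULL
);
CREATE TABLE location (
    deploy_id INTEGER NOT NULL REFERENCES deployment(deploy_id),
    obs_utc   TEXT NOT NULL,
    lat       REAL NOT NULL CHECK (lat BETWEEN -90 AND 90),
    lon       REAL NOT NULL CHECK (lon BETWEEN -180 AND 180),
    PRIMARY KEY (deploy_id, obs_utc)
);
"""

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless this is set
con.executescript(SCHEMA)

# Invented example rows, loaded as one transaction:
# commits if every statement succeeds, rolls back otherwise.
with con:
    con.execute("INSERT INTO animal VALUES ('PL2006_01', 'Phoca largha', 'F')")
    con.execute("INSERT INTO deployment VALUES (1, 'PL2006_01', 12345, '2006-05-01')")
    con.execute("INSERT INTO location VALUES (1, '2006-05-01T03:12:00', 57.12, -170.25)")

# A location for a deployment that does not exist is refused by the database.
rejected = False
try:
    with con:
        con.execute("INSERT INTO location VALUES (99, '2006-05-02T00:00:00', 57.0, -170.0)")
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)
```

Putting the integrity rules in the schema, rather than in each analyst's spreadsheet habits, is what lets the checks survive when someone else loads the data.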
My Perspective
To address complex questions about marine mammal telemetry and animal ecology, I had to become more of a data manager… and, in the process, I’ve become less of a biologist.
Start (2006)
• Argos monthly CDs
• SatPack Access database
• Excel files (limited to 65,536 rows)
• Large, flat tables
• No central repository
Current System
• Nightly FTP push of Argos data
• Nightly data processing
• CSV read through an external Oracle table
• PL/SQL procedures
• Developed/designed with training via Google searches
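The current system loads a nightly Argos file through an external Oracle table and PL/SQL procedures. Those procedures are not shown in the source, so what follows is only a sketch of one property such a nightly job benefits from. It is written in Python, with `sqlite3` standing in for Oracle. The load runs as a single transaction, and a primary key on (PTT, observation time) makes rerunning the same file harmless. The file layout, the `argos_raw` table, and the `load_nightly` function are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for one nightly Argos file; the layout and values are invented.
NIGHTLY_CSV = """ptt,obs_utc,lc,lat,lon
12345,2006-05-01T03:12:00,2,57.12,-170.25
12345,2006-05-01T09:47:00,B,57.30,-170.10
"""


def load_nightly(con, csv_text):
    """Load one nightly file and return the number of new rows.

    Rows already present (same ptt and obs_utc) are skipped, so reruns are harmless.
    """
    rows = [
        (int(r["ptt"]), r["obs_utc"], r["lc"], float(r["lat"]), float(r["lon"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    with con:  # every row from the file commits together, or none do
        before = con.total_changes
        con.executemany(
            "INSERT OR IGNORE INTO argos_raw (ptt, obs_utc, lc, lat, lon) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
        return con.total_changes - before


con = sqlite3.connect(":memory:")
con.execute(
    """CREATE TABLE argos_raw (
           ptt INTEGER NOT NULL, obs_utc TEXT NOT NULL, lc TEXT,
           lat REAL, lon REAL,
           PRIMARY KEY (ptt, obs_utc))"""
)

first = load_nightly(con, NIGHTLY_CSV)   # new rows on the first run
second = load_nightly(con, NIGHTLY_CSV)  # rerunning the same file adds nothing
print(first, second)
```

`INSERT OR IGNORE` is SQLite-specific; in Oracle the same idempotent behavior is more commonly written as a `MERGE` statement.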
My Perspective: Current Limitations
• Data access requires a minimum level of technical skills (basic SQL, Oracle framework, Oracle APEX, R spatial tools, ArcMap)
• Single point of access/failure (me)
• Limited documentation of the design
• Design may not be optimal/appropriate
• Main objective is to provide data to analysts; not necessarily designed for providing data to the public
My Perspective: Greatest Needs – Research Program
• Data management and design consultation
• Data design & documentation portal (user-friendly metadata)
• Low-tech exploration tools
• Database and application developers (data flow and data input)
• Training opportunities
My Perspective: Greatest Needs – External to Program?
• Provide meaningful public access to data
• A clear data sharing policy with best practices
• Encourage/facilitate scientific collaboration
• Meet agency needs and requirements
• How to communicate scientific knowledge in the modern/digital age: sharing knowledge/expertise is just as important as sharing data
• Publish data once
My Perspective: Challenges / Roadblocks
• Limited funds and priorities: appropriate resources for doing the priority analysis and science are not available, let alone the resources to distribute data responsibly
• Database design/management is often in the hands of the least skilled users
• IT policies, investments, and infrastructure vary across institutions
• No standard(s) for communicating and sharing ‘raw’ animal telemetry data. What is ‘raw’ data?