Earth Cube Discussion 7/29
DSS perspective
Data Life Cycle, ACCI Report
and in many others
RDA functions are:
• Document
• Organize
• Protect
• Access
Already enables reuse/repurposing and provides ROI.
DSS Starting Points
• Large and small diverse data collections
– Diverse in: content, formats, parameter
consistency, resolutions, etc.
• Uniform metadata standard managed in a DB
framework (GCMD)
– Facilitates complete cross-RDA discovery
– Harvest content metadata for most data files
• Provides user-selectable choices => files that satisfy
constraints, plus drives “fast/efficient” data extraction
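The idea of turning user choices into a list of matching files can be sketched as follows. This is a minimal illustration only: `FileRecord`, `select_files`, and the sample catalog are hypothetical names and data, not the RDA metadata schema, which the slides do not describe.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file metadata harvested into a catalog."""
    path: str
    parameters: frozenset
    start: date
    end: date


def select_files(records, parameter, start, end):
    """Return paths of files that hold `parameter` and overlap [start, end]."""
    return [
        r.path
        for r in records
        if parameter in r.parameters and r.start <= end and r.end >= start
    ]


# Illustrative catalog entries (made up for this sketch).
CATALOG = [
    FileRecord("a/1990.grb", frozenset({"temperature", "wind"}),
               date(1990, 1, 1), date(1990, 12, 31)),
    FileRecord("a/1991.grb", frozenset({"temperature"}),
               date(1991, 1, 1), date(1991, 12, 31)),
    FileRecord("b/1991.nc", frozenset({"precipitation"}),
               date(1991, 1, 1), date(1991, 12, 31)),
]

matches = select_files(CATALOG, "temperature", date(1990, 6, 1), date(1991, 3, 1))
```

Only the files returned by the metadata query need to be opened, which is what makes the later extraction step cheap.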
DSS Starting Points
• Large and/or high impact science data categories
– TIGGE, Operational NWP Analyses, Seven Reanalyses
• Serve both “big” and “small” science
• Measure re-use for individual RDA entities
• Scalable archiving and curation (RDA-DAMS)
• Broad service environment: web, DAV, HPC
– Foundation for data-centric CI services
• Leverage CISL resources, DAV, HPC, GLADE, HPSS
DSS Starting Points
ACCI Task Force Report, pg. 24, Data Management
Guidelines
“What does ‘good’ look like?” – Develop standards
for data management policy
CISL RDA knows exactly what good looks like.
• Track record for 40+ years
• Transitioned many IT and service
implementations
– Media migration, servers, networking, metadata, etc.
DSS – what is next?
• ACCI impact (pg. 15): improved NSF-sponsored
infrastructure to accelerate scientific start-up and get greater
return on investment – less infrastructure required by
individual researchers at universities, less duplication of
effort.
• Dissolve the data format barrier
– Conversion tool library (everyone receives formats
and resolutions they want)
• Factor in multi-disciplinary metadata vocabulary translation
• Near-immediate user-selected data extraction
from TB+ datasets (research to operations)
– HPC, parallel processing and I/O, fast storage
– Better support for both “big” and “small” science
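One possible shape for a conversion tool library with vocabulary translation is a registry keyed by (source format, target format). The sketch below is an assumption, not a described design: the CSV and JSON formats stand in for real scientific formats, and `register`, `convert`, `translate`, and the `VOCABULARY` entries are illustrative names.

```python
import csv
import io
import json

# Registry of converter functions keyed by (source format, target format).
CONVERTERS = {}

# Illustrative term mapping between two metadata vocabularies (made up).
VOCABULARY = {"t2m": "air_temperature", "tp": "precipitation_amount"}


def register(src, dst):
    """Decorator that adds a converter to the registry."""
    def decorator(func):
        CONVERTERS[(src, dst)] = func
        return func
    return decorator


@register("csv", "json")
def csv_to_json(text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows)


def translate(term):
    """Map a term to the target vocabulary; unknown terms pass through."""
    return VOCABULARY.get(term, term)


def convert(payload, src, dst):
    """Dispatch to the registered converter for the requested format pair."""
    if src == dst:
        return payload
    try:
        func = CONVERTERS[(src, dst)]
    except KeyError:
        raise ValueError(f"no converter registered for {src} -> {dst}") from None
    return func(payload)


converted = convert("station,t2m\nA,281.5\n", "csv", "json")
```

A registry keeps each converter independent, so supporting a new format pair means adding one function rather than changing the dispatch code.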
DSS – what is next?
• Create API and web service protocol standard
with other climate data centers, in collaboration
with industry application developers.
• Step beyond NCAR home-grown data portals
• Easier to serve larger community
• Community-designed applications that draw on RDA
assets
• Allows for broader cross-disciplinary scalability
– Metadata DB interoperability
– Application-driven interoperability
DSS – what is next?
• Create generalized archiving tool for initial
research data dumps by individual scientists (e.g.,
raw model output or observations that may not
be suitable for long-term curation)
– Standardized storage structure
– Basic descriptive information stored in metadata
databases
– Allows others to discover/identify data without the
need for “inside knowledge” – open access
– Mitigates problem of orphaned data
– Could be implemented with cloud storage services
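A generalized archiving tool along these lines might look like the sketch below. It is a sketch under stated assumptions: the `owner/deposit-id/filename` layout, the `deposits` table, and the function names `open_catalog`, `deposit`, and `discover` are all hypothetical, and SQLite with a local directory stands in for whatever metadata database and (possibly cloud) storage would actually be used.

```python
import shutil
import sqlite3
import uuid
from pathlib import Path


def open_catalog(db_path=":memory:"):
    """Open (or create) the metadata database that describes deposits."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS deposits ("
        "id TEXT PRIMARY KEY, owner TEXT NOT NULL, title TEXT NOT NULL, "
        "description TEXT NOT NULL, stored_path TEXT NOT NULL)"
    )
    return conn


def deposit(conn, archive_root, source, owner, title, description):
    """Copy a file into <archive_root>/<owner>/<id>/ and record its metadata."""
    deposit_id = uuid.uuid4().hex
    target_dir = Path(archive_root) / owner / deposit_id
    target_dir.mkdir(parents=True)
    stored = target_dir / Path(source).name
    shutil.copy2(source, stored)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO deposits VALUES (?, ?, ?, ?, ?)",
            (deposit_id, owner, title, description, str(stored)),
        )
    return deposit_id


def discover(conn, keyword):
    """Find deposits whose title or description mentions the keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT title, stored_path FROM deposits "
        "WHERE title LIKE ? OR description LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return rows.fetchall()
```

Because every deposit gets a database row at the moment it is stored, the data stays discoverable by keyword even after its original owner has moved on, which is the orphaned-data problem the slide mentions.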