2010-2011 NEON Computer Science Clinic
Common Data Services for Ecological Data

NEON

NEON, the National Ecological Observatory Network, is an NSF-funded project to collect and manage data from across the U.S. NEON's mission is to enable understanding and forecasting of the impacts of climate change, land-use change and invasive species on continental-scale ecology by providing infrastructure and consistent methodologies to support research and education in these areas. NEON is designing and deploying observation platforms to provide such data. Those data will be collected into NEON's large SQL database at its headquarters in Boulder, CO.

[Figure: NEON organizes the U.S. into 20 ecoclimatic domains.]
[Figure: A NEON observation platform for collecting FIU data.]

Data Sources

NEON's Computational Infrastructure (CI) systems will handle a wide variety of data from different sources and in different formats:
• The Fundamental Instrument Unit (FIU) monitors physical and chemical climate properties such as CO2 and moisture.
• The Fundamental Sentinel Unit (FSU) collects specimens of local species and data on biodiversity and populations.
• The Airborne Observation Platform (AOP) analyzes changing land use, vegetation cover, and species migration.
• The Land Use Analysis Package (LUAP) compiles and assesses historical data, much of which is in NetCDF format.

Problem Statement

NEON aims to make its large, varied datastores available to the scientific community. In addition, NEON's application developers, and their applications, need to access data systematically, without worrying about the details of the model by which the data is stored.

Thus, NEON asked the HMC clinic team to design and prototype a Common Data Services (CDS) software layer. The CDS provides application developers a consistent, extensible abstraction through which to access NEON's data, whether that data resides in its database or in files. The team has also documented and tested its final, deliverable CDS system.

System Design

[Figure: system layers, from top to bottom. Community: ecological studies; environmental modeling / monitoring data products. Applications: data formatting / visualization; data gap-filling / smoothing / aggregation. Common Data Services. Data: NEON's SQL database; other data files; raw data from towers, scientists, others...]

Given a query for data, the CDS layer provides a list of NetCDF files as its primary output method. If desired, the user can instead access the "bare" data, provided as a Java object. NetCDF files generated by the CDS are created lazily: they are not generated until the user requests them via a function call. This saves time and disk access when the calling application is not interested in the files.

[Figure: database traversal to NetCDF. Example: getData("Mauna Loa, HI", "Raw CO2")]

NetCDF

NetCDF is a community-standard scientific data file format supported by a large number of existing applications and well-maintained libraries. Because the ecological community and NEON itself already use NetCDF, we chose it as our common intermediate data format. The CDS layer outputs the results of a query in NetCDF form, whether the data are stored in a flat file or in NEON's database.

Hibernate

Hibernate is a Java-based object/relational mapping library that allows developers to treat database tables as objects. Thus, Hibernate simplifies the process of reading databases: instead of constructing complex SQL queries, the CDS traverses a collection of interconnected Java objects.

[Figure: The NEON data model, accessed through Hibernate.]

In addition, we use Hibernate Spatial and the JTS Topology Suite to access and manipulate geometric objects from the database. These libraries also offer a large set of operations on arbitrary geometric objects, such as polygon intersections and area computations.

Testing

This project uses JUnit to automate the testing of software deliverables. A set of unit tests covers the data-access API call, getData, as well as getConfig, a configuration-access API call. Both edge cases and common scenarios are considered.

The majority of these tests use data from a test database to verify their correctness. To support these tests, the team has written and provided scripts to populate the test database.

Deliverables

The team is delivering to NEON:
• A Hibernate-based system that traverses NEON's database and returns Java objects encapsulating data and configurations.
• A netCDFHandler class that can write Java DataResult objects into NetCDF files. This file creation is done lazily.
• An overall CDS layer that offers NEON's application developers a consistent interface, NetCDF files, whether the data is stored in NetCDF or in NEON's database. If desired, developers can access the underlying Java objects.
• A JavaScript/PHP interface that illustrates the type of application the CDS might support. It uses historic CO2 data from Mauna Loa, Hawaii.

The provided CDS system is necessarily a prototype, because NEON and its data-handling policies are evolving during this early phase of its deployment. The clinic's CDS system offers a flexible foundation layer to which NEON and other developers will add capabilities in order to monitor, maintain, and investigate the nation's largest store of ecological data.

Acknowledgments

Team Members
Jason Garrett-Glaser '11
Keith Ingram '11 (PM)
Alejandro Lopez-Lago '11
HamsterBob Stewart '11

NEON Liaisons
Robert Tawa
DJ Spiess

Faculty Advisor
Zachary Dodds
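
Illustrative sketch: lazy file creation

The lazy file creation described under System Design can be sketched in plain Java as follows. This is a minimal illustration, not the team's netCDFHandler: the class names QueryResultSketch and LazyFileHandle and the methods getFile and isMaterialized are hypothetical, and the file written here is a plain-text placeholder rather than real NetCDF, which would require a NetCDF library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for the "bare" data returned by a query. */
final class QueryResultSketch {
    final String site;
    final String product;
    final double[] values;

    QueryResultSketch(String site, String product, double[] values) {
        this.site = site;
        this.product = product;
        this.values = values.clone(); // defensive copy of the caller's array
    }
}

/** Holds a query result and writes its file only on first request. */
final class LazyFileHandle {
    private final QueryResultSketch data;
    private Path file; // stays null until getFile() is first called

    LazyFileHandle(QueryResultSketch data) {
        this.data = data;
    }

    /** Returns the in-memory data without touching the disk. */
    QueryResultSketch getData() {
        return data;
    }

    /** Reports whether the backing file has been written yet. */
    synchronized boolean isMaterialized() {
        return file != null;
    }

    /** Writes the file on the first call and reuses it afterwards. */
    synchronized Path getFile() {
        if (file == null) {
            StringBuilder text = new StringBuilder();
            text.append(data.site).append('\n').append(data.product).append('\n');
            for (double v : data.values) {
                text.append(v).append('\n');
            }
            try {
                // Placeholder output: plain text, NOT NetCDF. A real version
                // would write the result through a NetCDF library here.
                Path created = Files.createTempFile("cds-sketch-", ".txt");
                Files.write(created, text.toString().getBytes(StandardCharsets.UTF_8));
                file = created;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return file;
    }
}
```

A caller that only needs the Java object uses getData() and never triggers a disk write. A caller that wants the file calls getFile(), which pays the cost once and returns the same path on every later call.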