Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
“curator” DB design Curator meeting, GFDL, Sep 20 Why RDBMS A lot of information: Model metadata Experiments metadata Institution/user metadata Data metadata Mostly it’s in textual form Information is internally linked tightly that can be easy to express by means of relational databases. Relational databases have well developed means for searching and extracting procedures (SQL query language and program interfaces for any language) as for local as well as for remote user. Very reliable, safety technology. 2 Curator meeting, GFDL, Sep 20 Desirable Features of Model Data Factory Relational Database storing metadata, containing description of XML as data exchange format model components and model configuration scenarios postprocessing (model output and CMOR) directives experiments variables formalized rules of Quality Control data locations task scheduler users and groups accounts for compliance with FRE working format of existing third party software good fitted for hierarchical metadata description prevalent in world, easy to exchange with others Data Portals Model Builder (FMS Runtime Environment in GFDL) checks out available model components from DB chooses model datasets from DB sets postprocessing directives checks components and configurations compatibility builds executable application and runs it write metadata about experiment into DB (model configuration, scenario, project, organization/user, postprocessing) 3 Curator meeting, GFDL, Sep 20 Desirable Features of Model Data Factory (continue) Climate Model Output Rewriter (CMOR) subsystem prepares data consistently with specific project requirements Data Publisher transfer data to Data Portal storage in accordance to settings from DB Data Portal Software Package Configuration Manager (configures Aggregation Server and Data Portal Interface) Search Catalog Engine Data Subsampling Engine Data Computation Engine Data Visualization Data Delivery Manager 4 Curator meeting, GFDL, Sep 20 Standard scenario of functioning Model Data Factory (ideal picture) Scientist builds model in FRE using available model components, datasets and forcing scenario. FRE puts metadata about built model, scenario, experiment into “curator” DB and runs experiment; Postprocessing subsystem extracts metadata about postprocessing plan from “curator” DB and executes it, and on finish puts metadata about processed experiment back into DB. Data Publisher (DP) regularly checks “curator” DB for new experiments marked as “public” and if finds any invokes CMOR. CMOR goes to “curator” DB for metadata and processes needed data following metadata instructions. DP calls QAC and then transfers data to Data Portal storage. Configuration Manager configures Aggregation Server and Data Portal Interface and puts records about new public data in “curator” DB. End of process, data is ready to go. 5 Curator meeting, GFDL, Sep 20 Common functionality schema of ‘Model Data Factory’ 6 Curator meeting, GFDL, Sep 20 Database ‘curator ’ design Database Compartments: Model Metadata Compartment contains models’ descriptions, allows to build coupled model of needed configuration Variables Compartment List of all related physical variables Workflow Compartment contains scenarios, experiments, institutions, projects and users info Postprocessing Compartment defines postprocessing plan for conducting experiment Data Portal Compartment contains info about experiments data 7 Curator meeting, GFDL, Sep 20 MySQL DB CURATOR 8 Curator meeting, GFDL, Sep 20 Model Metadata Compartment (in development) Coupled_Models Component_Medias Model_List Models Workflow Compartment Experiments Variables Compartment Variables 9 Curator meeting, GFDL, Sep 20 Data Samples from Model Compartment Coupled_Models Components_Medias Model_List Models 10 Curator meeting, GFDL, Sep 20 Variables Compartment Variables Variable_Bundles Variable_Lists Variable_List_Contents Workflow Compartment Proj_Var_Names Projects 11 Curator meeting, GFDL, Sep 20 Data Sample from Variables Compartment Proj_Var_Names Variable_Lists Variables Variable_List_Contents Variable_Bundles 12 Curator meeting, GFDL, Sep 20 Workflow Compartment (in development) Institutions GFDL_USERS Experiment_Status Experiments Projects Scenarios Realization 13 Curator meeting, GFDL, Sep 20 Data Samples from Workflow Compartment Scenarios Experiments 14 Curator meeting, GFDL, Sep 20 Postprocessing Compartment Post_Proc PP_Units Coupled_Models Projects GFDL_USERS PP_Content Average_Periods Variable_Lists Data Samples from Postprocessing Compartment PP_Units PP_Content 15 Curator meeting, GFDL, Sep 20 Data Portal Compartment Data_Files Data_Grids MissedData_Descriptors Coupled_Models Variable_Bundles Experiments Variables Curator meeting, GFDL, Sep 20 16 Data Samples from Data Portal Compartments Data_Files MissedData_Descriptors Data_Grids 17 Curator meeting, GFDL, Sep 20 “curator” DB is in use now: CM2.0 CM2.1 18 Curator meeting, GFDL, Sep 20 Future Development Bring DB terms to conventional terminology. Set up model metadata schema standards and create tables in “curator” DB following this schema. Fill these tables with real metadata extracted from models of GFDL, CCSM, MIT and from ESMF Component Database. Implement tables for observation data metadata. Implement DODS aggregated data support. Build XML bridge for XML transcoding DB input/output 19 Curator meeting, GFDL, Sep 20 END Questions? Suggestions? Objections? Thanks! 20 Curator meeting, GFDL, Sep 20