USING THE METADATA IN STATISTICAL PROCESSING CYCLE – THE PRODUCTION TOOLS PERSPECTIVE

Matjaž Jug, Pavle Kozjek, Tomaž Špeh
Statistical Office of the Republic of Slovenia

Overview
- Current statistical production cycle in SORS
- Using the metadata in Blaise applications
- The role of metadata in automatic editing system in SAS
- Metadata connected with the data in Oracle data warehouse
- Lessons learnt
- Questions

Current statistical production cycle
- Entry and micro editing (Blaise)
- Macro and statistical editing (SAS)
- Storing and analysis (Oracle)
- Dissemination (PC-Axis)
- Central metadata stores (Klasje & Metis)

Using the metadata in Blaise applications
- Generation of (high-speed) data-entry applications using Gentry (used by non-IT personnel)
- Metadata-based transformations between different data structures (EXTRA-FAT, FAT, THIN)

Gentry – tool for generation of the Blaise data-entry application
- Questionnaire structure and layout (name, blocks, tables, routing etc.)
- Field characteristics (length, data type, constants, other parameters)
[Screenshot: Gentry field definition, with callouts "Data type" and "Field characteristics"]

Gentry – example of generated application
[Screenshot: header section; data entry for table 12]

Transformations

All data for one unit (provider) in one row (EXTRA FAT): suitable for micro editing

PROVIDER    Industry  SizeClass  QuantityProduct1  ValueEURProduct1  QuantityProduct2  ValueEURProduct2
Provider 1  A         big        150               300               200               400

Metadata-based transformation in Blaise

Classification and continuous variables in the columns (FAT): suitable for analysis

PROVIDER    PRODUCT    Industry  SizeClass  Quantity  ValueEUR
Provider 1  Product 1  A         big        150       300
Provider 1  Product 2  A         big        200       400

Metadata-based transformation in SAS

Classification variables in the columns and continuous variables in the rows (THIN)

PROVIDER    PRODUCT    Industry  SizeClass  ContVariables  ContObservations
Provider 1  Product 1  A         big        Quantity       150
Provider 1  Product 1  A         big        ValueEUR       300
Provider 1  Product 2  A         big        Quantity       200
Provider 1  Product 2  A         big        ValueEUR       400

The role of metadata in automatic editing system in SAS
- General system for automated editing
- Process metadata

In order to be general, the tool must be able to:
- recognize the data which are due to be subjected to editing and/or imputation;
- recognize which editing method should be applied, and with what parameters.

Process indicators – level 1
Mode of data collection
- 1 data provided directly by reporting unit
- 2 data from administrative source
- 3 data computed from original values
- 4 imputed data – imputation of non-response
- 5 imputed data – imputation due to invalid values detected through the editing process
- 6 data missing because the unit is not eligible for the item (logical skip)

Process indicators – level 2
Data status
- 1 original value
- 2 corrected value

Process indicators – level 3
Method of data correction
- 11 correction after telephone contact
- 12 data reported at a later stage

Process indicators – level 3
Reporting methods
- 11 reporting by mail questionnaire
- 12 computer assisted telephone interview (CATI)
- 13 telephone interview without computer assistance
- 14 paper assisted personal interview (PAPI)
- 15 computer assisted personal interview (CAPI)
- 16 paper assisted self interviewing
- 17 computer assisted self interviewing
- 18 web reporting

Process indicators – level 3
Imputation methods
- 10 method of zero values
- 11 logical imputation
- 12 historical data imputation
- 13 mean values imputation
- 14 nearest neighbour imputation
- 15 hot-deck imputation
- 16 cold-deck imputation
- 17 regression imputation
- 18 method of the most frequent value
- 19 estimation of annual value based on infra-annual data
- 21 stochastic hot-deck (random donor)
- 22 regression imputation with random residuals
- 23 multiple imputation

Process indicators examples
Code format: xy.zz (x = level 1, y = level 2, zz = level 3)

11.15 means:
- 1 - data provided directly by reporting unit
- 11 - original value
- 11.15 - computer assisted personal interview (CAPI)

42.19 means:
- 4 - imputed data – imputation of non-response
- 42 - corrected value
- 42.19 - estimation of annual value based on infra-annual data

Statistical process
[Diagram labels: Blaise, Blaise, SAS, Oracle, Key responders, Other units, SAS]

Metadata connected with the data in Oracle data warehouse
On-line access to:
- Historical data
- Data from different phases (not only final data)
- Data for multiple surveys (not only data marts)
Statistical (variables & classifications) and process (time stamps, status indicators...) metadata connected with the data
...accessible for third-party tools

Conceptual star schema for SBS
[Diagram: fact table FACTS_SBS (OBS_VALUE, WEIGHT, ...; THIN table design) surrounded by the dimension tables DIM_TIME, DIM_NACE, DIM_VARIABLE, DIM_NUTS, DIM_SOURCE, DIM_ORG_FORM, DIM_INDICATOR and DIM_OBS_UNIT, which the diagram groups into CLASSIFICATION DIMENSIONS and METADATA DIMENSIONS]

[Diagram labels: Metadata server, Input tables, Classification server, Business register, Input data, Variables, Sources..., Imputation table, Imputed data, Loading, Corrected data, Uncorrected data, Parameter query, Transactional star, Editing table, Editing form, Searching, Automatic corrections, Clean data, Imputations, Classifications, Editing, Manual corrections, Control query, Extracted data, Extractions, Analysis, Analytical data, Analytical cubes, Analytical query, Results, Oracle Discoverer]

Lessons learnt
- The role of central repositories for metadata
  - Harmonisation of metadata concepts
  - Natural source of conceptual metadata
- Metadata have to be exact, complete and consistent
- Process metadata should be connected with the data
- Local metadata vs. global metadata
- A cultural change is needed
- Technical considerations
  - The possibilities for metadata exchange and system integration are good (XML, SQL)

Questions
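
Illustration: metadata-based transformations
The EXTRA FAT, FAT and THIN structures from the Transformations slide can be derived from one another once the metadata says which columns are keys, classifications and continuous variables. The sketch below shows the idea in Python, not in Blaise or SAS as used at SORS. The METADATA dictionary, its layout and both function names are assumptions made for this example; only the column names and values come from the slide.

```python
# Hypothetical metadata description (layout is an assumption for this sketch).
METADATA = {
    "key": ["PROVIDER"],
    "classification": ["Industry", "SizeClass"],
    "continuous": ["Quantity", "ValueEUR"],
    # Column-name suffix in the EXTRA FAT row -> product label in FAT/THIN.
    "products": {"Product1": "Product 1", "Product2": "Product 2"},
}


def extra_fat_to_fat(row, meta):
    """Turn one row per unit into one row per unit and product."""
    fixed = {name: row[name] for name in meta["key"] + meta["classification"]}
    fat_rows = []
    for suffix, label in meta["products"].items():
        fat_row = dict(fixed, PRODUCT=label)
        for var in meta["continuous"]:
            # e.g. "Quantity" + "Product1" -> "QuantityProduct1"
            fat_row[var] = row[var + suffix]
        fat_rows.append(fat_row)
    return fat_rows


def fat_to_thin(fat_rows, meta):
    """Turn each continuous variable of a FAT row into its own THIN row."""
    thin_rows = []
    for fat_row in fat_rows:
        fixed = {k: v for k, v in fat_row.items() if k not in meta["continuous"]}
        for var in meta["continuous"]:
            thin_rows.append(
                dict(fixed, ContVariables=var, ContObservations=fat_row[var])
            )
    return thin_rows


# The single EXTRA FAT row from the slide.
extra_fat = {
    "PROVIDER": "Provider 1", "Industry": "A", "SizeClass": "big",
    "QuantityProduct1": 150, "ValueEURProduct1": 300,
    "QuantityProduct2": 200, "ValueEURProduct2": 400,
}

fat = extra_fat_to_fat(extra_fat, METADATA)
thin = fat_to_thin(fat, METADATA)

for thin_row in thin:
    print(thin_row)
```

Because the column roles live in the metadata rather than in the code, adding a product or a continuous variable in this sketch only requires a metadata change.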
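
Illustration: decoding a process indicator
The xy.zz process indicators can be decoded mechanically from the three code lists. The sketch below is a minimal Python example, not the SAS implementation. The slides do not state which level 3 list applies to a given code, so the selection rule here is an assumption inferred from the two examples (11.15 and 42.19): imputed data (modes 4 and 5) use the imputation methods, other corrected values use the methods of data correction, and everything else uses the reporting methods. The function name decode is also hypothetical.

```python
MODE = {  # level 1: mode of data collection
    1: "data provided directly by reporting unit",
    2: "data from administrative source",
    3: "data computed from original values",
    4: "imputed data - imputation of non-response",
    5: "imputed data - imputation due to invalid values",
    6: "data missing because the unit is not eligible for the item",
}
STATUS = {1: "original value", 2: "corrected value"}  # level 2: data status
CORRECTION = {  # level 3: method of data correction
    11: "correction after telephone contact",
    12: "data reported at a later stage",
}
REPORTING = {  # level 3: reporting methods
    11: "reporting by mail questionnaire",
    12: "computer assisted telephone interview (CATI)",
    13: "telephone interview without computer assistance",
    14: "paper assisted personal interview (PAPI)",
    15: "computer assisted personal interview (CAPI)",
    16: "paper assisted self interviewing",
    17: "computer assisted self interviewing",
    18: "web reporting",
}
IMPUTATION = {  # level 3: imputation methods
    10: "method of zero values",
    11: "logical imputation",
    12: "historical data imputation",
    13: "mean values imputation",
    14: "nearest neighbour imputation",
    15: "hot-deck imputation",
    16: "cold-deck imputation",
    17: "regression imputation",
    18: "method of the most frequent value",
    19: "estimation of annual value based on infra-annual data",
    21: "stochastic hot-deck (random donor)",
    22: "regression imputation with random residuals",
    23: "multiple imputation",
}


def decode(code):
    """Split an 'xy.zz' indicator into its three levels and look each one up."""
    head, _, tail = code.partition(".")
    if len(head) != 2 or len(tail) != 2 or not (head + tail).isdecimal():
        raise ValueError(f"expected a code of the form xy.zz, got {code!r}")
    mode, status, method = int(head[0]), int(head[1]), int(tail)
    # ASSUMPTION: choice of level 3 list, inferred from the two slide examples.
    if mode in (4, 5):
        level3 = IMPUTATION
    elif status == 2:
        level3 = CORRECTION
    else:
        level3 = REPORTING
    # Unknown codes raise KeyError from the dictionary lookups.
    return {"mode": MODE[mode], "status": STATUS[status], "method": level3[method]}


print(decode("11.15"))
print(decode("42.19"))
```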