USING THE METADATA IN STATISTICAL PROCESSING CYCLE – THE PRODUCTION TOOLS PERSPECTIVE
Matjaž Jug, Pavle Kozjek, Tomaž Špeh
Statistical Office of the Republic of Slovenia
Overview
- Current statistical production cycle in SORS
- Using the metadata in Blaise applications
- The role of metadata in automatic editing system in SAS
- Metadata connected with the data in Oracle data warehouse
- Lessons learnt
- Questions
Current statistical production cycle
- Entry and micro editing (Blaise)
- Macro and statistical editing (SAS)
- Storing and analysis (Oracle)
- Dissemination (PC-Axis)
- Central metadata stores (Klasje & Metis)
Using the metadata in Blaise applications
- Generation of (high speed) data-entry applications using Gentry (used by non-IT personnel)
- Metadata-based transformations between different data structures (EXTRA-FAT, FAT, THIN)
Gentry – tool for generation of the Blaise data-entry application
- Questionnaire structure and layout (name, blocks, tables, routing etc.)
- Field characteristics (length, data type, constants, other parameters)
[Figure callouts: Data type; Field characteristics]
Gentry – example of generated application
[Figure callouts: header section; Data entry for table 12]
Transformations
All data for one unit (provider) in one row (EXTRA-FAT): suitable for micro editing

PROVIDER    Industry  SizeClass  QuantityProduct1  ValueEURProduct1  QuantityProduct2  ValueEURProduct2
Provider 1  A         big        150               300               200               400

Metadata-based transformation in Blaise
Classification and continuous variables in the columns (FAT): suitable for analysis

PROVIDER    PRODUCT    Industry  SizeClass  Quantity  ValueEUR
Provider 1  Product 1  A         big        150       300
Provider 1  Product 2  A         big        200       400

Metadata-based transformation in SAS
Classification variables in the columns and continuous variables in the rows (THIN)

PROVIDER    PRODUCT    Industry  SizeClass  ContVariables  ContObservations
Provider 1  Product 1  A         big        Quantity       150
Provider 1  Product 1  A         big        ValueEUR       300
Provider 1  Product 2  A         big        Quantity       200
Provider 1  Product 2  A         big        ValueEUR       400

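The slides perform these transformations in Blaise (EXTRA-FAT to FAT) and in SAS (FAT to THIN). As a rough illustration only, the same reshaping can be sketched in plain Python. The metadata dictionary, its keys and the function names below are hypothetical and are not taken from the SORS tools; the point is that the column roles come from metadata rather than being hard-coded in the transformation.

```python
# Hypothetical metadata describing the record layout (an assumption, not the
# actual SORS metadata model): which column identifies the unit, which columns
# are classification variables, which variables repeat per product.
METADATA = {
    "unit_key": "PROVIDER",
    "classification": ["Industry", "SizeClass"],
    "repeating_key": "PRODUCT",
    # column-name suffix in the EXTRA-FAT layout -> value in the PRODUCT column
    "repeating_values": {"Product1": "Product 1", "Product2": "Product 2"},
    "continuous": ["Quantity", "ValueEUR"],
}


def extra_fat_to_fat(row, meta):
    """Split one EXTRA-FAT row into one FAT row per product."""
    fixed = {meta["unit_key"]: row[meta["unit_key"]]}
    fixed.update({name: row[name] for name in meta["classification"]})
    fat_rows = []
    for suffix, label in meta["repeating_values"].items():
        fat = dict(fixed)
        fat[meta["repeating_key"]] = label
        for var in meta["continuous"]:
            fat[var] = row[var + suffix]  # e.g. "Quantity" + "Product1"
        fat_rows.append(fat)
    return fat_rows


def fat_to_thin(fat_rows, meta):
    """Turn each continuous variable of a FAT row into its own THIN row."""
    thin_rows = []
    for fat in fat_rows:
        keys = {k: v for k, v in fat.items() if k not in meta["continuous"]}
        for var in meta["continuous"]:
            thin = dict(keys)
            thin["ContVariables"] = var
            thin["ContObservations"] = fat[var]
            thin_rows.append(thin)
    return thin_rows


# The example record from the slides.
extra_fat_row = {
    "PROVIDER": "Provider 1", "Industry": "A", "SizeClass": "big",
    "QuantityProduct1": 150, "ValueEURProduct1": 300,
    "QuantityProduct2": 200, "ValueEURProduct2": 400,
}

fat_rows = extra_fat_to_fat(extra_fat_row, METADATA)
thin_rows = fat_to_thin(fat_rows, METADATA)

for r in thin_rows:
    print(r)
```

Adding a third product or a new continuous variable would then only require a change to the metadata, not to the transformation code.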
The role of metadata in automatic editing system in SAS
- General system for automated editing
- Process metadata
The role of metadata in automatic editing system in SAS
- In order to be general the tool must be able to:
  - recognize the data which are due to be subjected to editing and/or imputation;
  - recognize which editing method should be applied,
  - and with what parameters
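The three requirements above can be illustrated with a small metadata-driven dispatcher. This is a minimal Python sketch, not the SAS system described in the slides: the rule table, its field names and the function names are all hypothetical. Each rule states which variable is subject to editing, which method applies, and with what parameters; the two methods shown correspond to "method of zero values" and "mean values imputation" from the list of imputation methods.

```python
from statistics import mean

# Hypothetical process metadata: one rule per variable that is subject to
# editing/imputation. Variables without a rule are left untouched.
EDIT_RULES = [
    {"variable": "Quantity", "method": "zero", "params": {}},
    {"variable": "ValueEUR", "method": "mean", "params": {"decimals": 1}},
]


def impute_zero(values, **params):
    """Method of zero values: replace missing observations with 0."""
    return [0 if v is None else v for v in values]


def impute_mean(values, decimals=2):
    """Mean values imputation: replace missing observations with the mean
    of the observed ones (requires at least one observed value)."""
    observed = [v for v in values if v is not None]
    fill = round(mean(observed), decimals)
    return [fill if v is None else v for v in values]


# Registry mapping method names used in the metadata to implementations.
METHODS = {"zero": impute_zero, "mean": impute_mean}


def run_editing(data, rules):
    """Apply every rule to its variable; return a new dataset."""
    edited = dict(data)
    for rule in rules:
        method = METHODS[rule["method"]]
        edited[rule["variable"]] = method(data[rule["variable"]], **rule["params"])
    return edited


# Invented sample data; None marks a missing value.
data = {
    "Quantity": [150, None, 200],
    "ValueEUR": [300, 400, None],
    "Industry": ["A", "A", "B"],
}
edited = run_editing(data, EDIT_RULES)
print(edited)
```

Because the rules are data, a different survey can reuse the same engine with its own rule table.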
Process indicators – level 1
- Mode of data collection
    1 data provided directly by reporting unit
    2 data from administrative source
    3 data computed from original values
    4 imputed data – imputation of non-response
    5 imputed data – imputation due to invalid values detected through the editing process
    6 data missing because the unit is not eligible for the item (logical skip)
Process indicators – level 2
- Data status
    1 original value
    2 corrected value
Process indicators – level 3
- Method of data correction
    11 correction after telephone contact
    12 data reported at a later stage
Process indicators – level 3
- Reporting methods
    11 reporting by mail questionnaire
    12 computer assisted telephone interview (CATI)
    13 telephone interview without computer assistance
    14 paper assisted personal interview (PAPI)
    15 computer assisted personal interview (CAPI)
    16 paper assisted self interviewing
    17 computer assisted self interviewing
    18 web reporting
Process indicators – level 3
- Imputation methods
    10 method of zero values
    11 logical imputation
    12 historical data imputation
    13 mean values imputation
    14 nearest neighbour imputation
    15 hot-deck imputation
    16 cold-deck imputation
    17 regression imputation
    18 method of the most frequent value
    19 estimation of annual value based on infra-annual data
    21 stochastic hot-deck (random donor)
    22 regression imputation with random residuals
    23 multiple imputation
Process indicators examples - xy.zz
- 11.15 means:
    1 - data provided directly by reporting unit
    11 - original value
    11.15 - computer assisted personal interview (CAPI)
- 42.19 means:
    4 - imputed data – imputation of non-response
    42 - corrected value
    42.19 - estimation of annual value based on infra-annual data
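The xy.zz structure lends itself to a simple lookup. The sketch below decodes an indicator string into its three levels using the code lists from the slides. The function name and the rule for choosing which level 3 list applies are assumptions inferred from the two examples (imputation methods when x is 4 or 5, correction methods for other corrected values, reporting methods otherwise); the slides do not state that rule explicitly.

```python
MODE_OF_COLLECTION = {
    1: "data provided directly by reporting unit",
    2: "data from administrative source",
    3: "data computed from original values",
    4: "imputed data - imputation of non-response",
    5: "imputed data - imputation due to invalid values detected through the editing process",
    6: "data missing because the unit is not eligible for the item (logical skip)",
}
DATA_STATUS = {1: "original value", 2: "corrected value"}
CORRECTION_METHODS = {
    11: "correction after telephone contact",
    12: "data reported at a later stage",
}
REPORTING_METHODS = {
    11: "reporting by mail questionnaire",
    12: "computer assisted telephone interview (CATI)",
    13: "telephone interview without computer assistance",
    14: "paper assisted personal interview (PAPI)",
    15: "computer assisted personal interview (CAPI)",
    16: "paper assisted self interviewing",
    17: "computer assisted self interviewing",
    18: "web reporting",
}
IMPUTATION_METHODS = {
    10: "method of zero values",
    11: "logical imputation",
    12: "historical data imputation",
    13: "mean values imputation",
    14: "nearest neighbour imputation",
    15: "hot-deck imputation",
    16: "cold-deck imputation",
    17: "regression imputation",
    18: "method of the most frequent value",
    19: "estimation of annual value based on infra-annual data",
    21: "stochastic hot-deck (random donor)",
    22: "regression imputation with random residuals",
    23: "multiple imputation",
}


def decode_indicator(indicator):
    """Decode an 'xy.zz' process indicator into its three levels."""
    head, _, tail = indicator.partition(".")
    if len(head) != 2 or len(tail) != 2 or not (head + tail).isdecimal():
        raise ValueError(f"expected the form xy.zz, got {indicator!r}")
    mode, status, method = int(head[0]), int(head[1]), int(tail)

    # ASSUMPTION: which level 3 list applies is inferred from the examples.
    if mode in (4, 5):
        level3 = IMPUTATION_METHODS
    elif status == 2:
        level3 = CORRECTION_METHODS
    else:
        level3 = REPORTING_METHODS

    # Unknown codes raise KeyError rather than being silently accepted.
    return {
        "mode": MODE_OF_COLLECTION[mode],
        "status": DATA_STATUS[status],
        "method": level3[method],
    }


print(decode_indicator("11.15"))
print(decode_indicator("42.19"))
```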
Statistical process
[Figure: statistical process flow. Labels: Key responders; Other units; Blaise; SAS; Oracle]
Metadata connected with the data in Oracle data warehouse
- On-line access to:
  - Historical data
  - Data from different phases (not only final data)
  - Data for multiple surveys (not only data marts)
- Statistical (variables & classifications) and process (time stamps, status indicators...) metadata connected with the data
- ...accessible for third-party tools
Conceptual star schema for SBS
[Figure: star schema, THIN table design. Fact table: FACTS_SBS (OBS_VALUE, WEIGHT, ...). Dimension tables, grouped in the figure as classification dimensions and metadata dimensions: DIM_TIME, DIM_NACE, DIM_VARIABLE, DIM_NUTS, DIM_SOURCE, DIM_ORG_FORM, DIM_INDICATOR, DIM_OBS_UNIT]
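To make the THIN star design concrete, the following sketch builds a reduced version of it in an in-memory SQLite database from Python. SQLite is used here only as a stand-in for Oracle. The table names FACTS_SBS, DIM_TIME, DIM_NACE, DIM_VARIABLE and DIM_INDICATOR and the columns OBS_VALUE and WEIGHT come from the slide; the key and label columns, the reading of DIM_INDICATOR as the process-indicator dimension, and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the id/label columns are assumed, not from the slide.
CREATE TABLE DIM_TIME      (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE DIM_NACE      (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE DIM_VARIABLE  (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE DIM_INDICATOR (id INTEGER PRIMARY KEY, label TEXT);

-- THIN fact table: one row per observed value of one variable.
CREATE TABLE FACTS_SBS (
    time_id      INTEGER REFERENCES DIM_TIME(id),
    nace_id      INTEGER REFERENCES DIM_NACE(id),
    variable_id  INTEGER REFERENCES DIM_VARIABLE(id),
    indicator_id INTEGER REFERENCES DIM_INDICATOR(id),
    OBS_VALUE    REAL,
    WEIGHT       REAL
);
""")

# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO DIM_TIME VALUES (?, ?)", [(1, "T1")])
conn.executemany("INSERT INTO DIM_NACE VALUES (?, ?)", [(1, "A")])
conn.executemany("INSERT INTO DIM_VARIABLE VALUES (?, ?)",
                 [(1, "Quantity"), (2, "ValueEUR")])
conn.executemany("INSERT INTO DIM_INDICATOR VALUES (?, ?)",
                 [(1, "11.15"), (2, "42.19")])
conn.executemany(
    "INSERT INTO FACTS_SBS VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 150, 1.0),   # reported value
     (1, 1, 2, 2, 300, 1.0)],  # imputed value
)

# Each value is retrieved together with its process metadata.
rows = conn.execute("""
    SELECT v.label, f.OBS_VALUE, i.label
    FROM FACTS_SBS f
    JOIN DIM_VARIABLE  v ON v.id = f.variable_id
    JOIN DIM_INDICATOR i ON i.id = f.indicator_id
    ORDER BY v.label
""").fetchall()

for row in rows:
    print(row)
```

Since every fact row carries a key into the indicator dimension, any SQL-capable tool can filter or group by process status without survey-specific logic.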
[Figure: data flow in the data warehouse. Labels: Metadata server; Classification server; Business register; Input data; Variables; Sources...; Classifications; Loading; Input tables; Searching; Parameter query; Transactional star; Editing; Editing table; Editing form; Automatic corrections; Manual corrections; Control query; Corrected data; Uncorrected data; Imputations; Imputation table; Imputed data; Clean data; Extractions; Extracted data; Analysis; Analytical data; Analytical cubes; Analytical query; Results; Oracle Discoverer]
Lessons learnt
- The role of central repositories for metadata
  - Natural source of conceptual metadata
- Harmonisation of metadata concepts
  - Metadata have to be exact, complete and consistent
- Process metadata should be connected with the data
- Local metadata vs. global metadata
- A cultural change is needed
- Technical considerations
  - The possibilities for metadata exchange and system integration are good (XML, SQL)
Questions