The Edit
Anders Norberg, Statistics Sweden (SCB)
Work Session on Statistical Data Editing
Ljubljana, Slovenia, 9-11 May 2011

The environment of SELEKT
Input, throughput, output, use

Diagram labels:
Input: Respondent (u) has one or several sampled units; Sampled unit (k); Observed unit (l); Background variables (Industry, Gender, Occup.); Measurement var. (j): 1, 2 = Wage; y_jkl
Throughput:
-Coding
-Editing
-Imputation
-Estimation
Output: Sum of wages by Industry (A, B, C, D, E, F-Z); Sum of wages by Occupation (1, 2, 3, 4, Sum) and Gender (Men, Women, Sum)
Use:
-Decision making
-Information
A second build of this slide adds the label "Suspicion".

SELEKT 1.1

Diagram labels:
-Raw+edited past (cold) survey data
-Survey specific cold adapter (SAS code): Data preparation
-SAS data set
-Input (hot) survey data
-Edits
-SNOWDON-X analysis
-Table of Parameters of edits
-CLAN estimation software
-Table of Estimates
-Records to FOLLOW-UP
-PRE-SELEKT: Parameter specifications, Analysis of cold data
-SAS data set
-AUTOSELEKT: Score calculation & record flagging
-Records to IMPUTATION
-Survey specific hot adapter (SAS code): Data preparation
-Accepted records
-Process data and reports

Glossary of Terms on Statistical Data Editing (1)
“EDIT RULE SPECIFICATION
CHECK RULE SPECIFICATION
A set of check rules that should be applied in the given editing task.”

Glossary of Terms on Statistical Data Editing (2)
“CHECKING RULE
A logical condition or a restriction to the value of a data item or a data group which must be met if the data is to be considered correct. In various connections other terms are used, e.g. edit rule.”

Recommended Practices for Editing and Imputation in Cross-sectional Business Surveys
“EDIT
A logical condition or a restriction to the value of a data item or a data group which must be met if the data is to be considered correct. Also known as edit rule or checking rule.”

Example 1

    if Occupation = 'Doctor' and not (29000 < Salary < 71000)
    then Errcode_A01 = 'Flag'

Successive builds of this slide highlight, in turn:
-The test variable
-The edit group
-The acceptance region

Example 2

    if Occupation = 'Doctor' and not (29000 < Salary < 71000)
    or Occupation = 'Nurse' and not (23300 < Salary < 43800)
    then Errcode_A02 = 'Flag'

Successive builds of this slide highlight, in turn:
-The test variable
-The edit groups
-The acceptance regions

Edits

EDIT
-Edit identification
-Type of edit
-Active
-Section
-Internal error message
-External error message
-Instruction for data review
-Un-edited test variable
-Error flag

EDIT GROUP AND ACCEPTANCE REGION
-Edit identification
-Edit group
-Acceptance region

Edits

The second version of this slide keeps EDIT and EDIT GROUP AND ACCEPTANCE REGION as above, marks parts of the diagram with the numbers 1 to 5, and adds:

EDIT PRACTICAL SUPPORT
-Edit identification
-Standard edit rule
-Edited test variable
-Suspicion probability value produced by the SELEKT system

IMPACT ON STATISTICS LINK
-Edit identification
-Survey variable

FLAGGING EDITS, VARIABLES AND UNITS
-Survey variable
-Potential impact on statistics

My questions (1)
• Can most edits be described as consisting of the components
  - test variable
  - edit group
  - acceptance region?
• What types of edits cannot?

My questions (2)
If the edits can be described this way, what arguments are there for saying that
  - one edit has only one edit group and one acceptance region
  - one edit can be composed of many edit groups with one acceptance region each?

My questions (3)
Can you give me examples of
• similar modeling of edits
• metadata storage for edits
• edit script generator using a standard metadata storage for edits
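
To make the questions concrete, one possible reading of Example 2 is sketched here in Python: the edit is stored as metadata rows (edit identification, test variable, edit group, acceptance region), evaluated generically, and rendered back into the rule notation of the slides. This is only an illustration, not the SELEKT implementation. All class and function names are hypothetical, and it assumes that the test variable is Salary, that each edit group is a condition on Occupation, and that each acceptance region is the open salary interval shown in the rule.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping


@dataclass(frozen=True)
class Edit:
    """One row of a hypothetical 'edit' table."""
    edit_id: str        # edit identification, e.g. "A02"
    test_variable: str  # un-edited test variable (assumed to be Salary)


@dataclass(frozen=True)
class EditGroupRegion:
    """One row of a hypothetical 'edit group and acceptance region' table."""
    edit_id: str         # links the row to its edit
    group_variable: str  # background variable that defines the edit group
    group_value: str     # value that selects the edit group
    lower: float         # acceptance region is the open interval (lower, upper)
    upper: float


# Example 2 from the slides, stored as metadata: one edit, two edit groups.
EDITS: List[Edit] = [Edit("A02", "Salary")]
REGIONS: List[EditGroupRegion] = [
    EditGroupRegion("A02", "Occupation", "Doctor", 29000, 71000),
    EditGroupRegion("A02", "Occupation", "Nurse", 23300, 43800),
]


def flag_record(record: Mapping[str, Any],
                edits: List[Edit] = EDITS,
                regions: List[EditGroupRegion] = REGIONS) -> Dict[str, str]:
    """Return {error flag name: 'Flag'} for every edit the record fails."""
    flags: Dict[str, str] = {}
    for edit in edits:
        value = record[edit.test_variable]
        for region in regions:
            if region.edit_id != edit.edit_id:
                continue
            in_group = record.get(region.group_variable) == region.group_value
            accepted = region.lower < value < region.upper
            # A record is flagged when it belongs to an edit group but its
            # test variable falls outside that group's acceptance region.
            if in_group and not accepted:
                flags[f"Errcode_{edit.edit_id}"] = "Flag"
    return flags


def generate_rule_text(edit: Edit,
                       regions: List[EditGroupRegion] = REGIONS) -> str:
    """Render the metadata back into the rule notation used on the slides."""
    clauses = [
        f"{r.group_variable} = '{r.group_value}' "
        f"and not ({r.lower} < {edit.test_variable} < {r.upper})"
        for r in regions
        if r.edit_id == edit.edit_id
    ]
    return "if " + " or ".join(clauses) + f" then Errcode_{edit.edit_id} = 'Flag'"
```

In this sketch a single edit owns several (edit group, acceptance region) rows, which corresponds to the second alternative in My questions (2). The first alternative would instead store Doctor and Nurse as two separate edits with one row each. The function generate_rule_text hints at the kind of edit script generator asked about in My questions (3).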