METHODS OF DETECTING AND TREATING OUTLIERS USED IN
REPUBLIKA SRPSKA INSTITUTE OF STATISTICS
─ ABSTRACT ─
Darko Marinković
Republika Srpska Institute of Statistics/Senior Officer for Sampling and Data Analysis
Veljka Mlađenovića 12d, 78 000 Banja Luka, Bosnia and Herzegovina
Phone: ++38751332724; Fax: ++38751332750
E-mail: [email protected]
Aleksandra Đonlaga
Republika Srpska Institute of Statistics/Senior Officer for Services Statistics
Veljka Mlađenovića 12d, 78 000 Banja Luka, Bosnia and Herzegovina
Phone: ++38751332718; Fax: ++38751332750
E-mail: [email protected]
Non-sampling errors in surveys include all errors that can occur during data collection, data
processing, estimation and analysis, except the error that arises because a survey is
conducted using a probability sample. Given the number of possible sources of this type of
error, it is not an easy task to ensure the level of quality required by users and, at the same
time, to use available resources in the most efficient manner and stay within predefined time
and budget restrictions. To respond to that challenge, the process of producing official
statistics must include a systematic approach to the prevention, identification and treatment
of errors that occur in survey operations other than sampling. Outliers, as a potential
non-sampling error, may have a significant influence on estimates produced for the domains of
interest of the survey, and must be identified and treated in a proper manner. They may include
errors from one or more sources or, on the other hand, be the result of a true change in the
phenomenon that is the subject of the survey. A proper distinction must be made between the two
situations in order to avoid serious bias in survey estimates. That process includes not only
checking the internal consistency of data (usually implemented within the data entry/collection
solution), but also checking external consistency, by comparing collected data at unit level
with historical data from the same or related surveys and, where possible, with available
administrative sources. To combine multiple data sources with survey data, a method is needed
that is simple enough and, at the same time, can efficiently identify highly influential
observations, which are potential non-sampling errors. Also, in some situations the only
solution for treating an identified error is to recontact the unit, and this fact should be
taken into account if we want to stay within time and budget restrictions. This means that
identifying outliers and minimizing their influence on survey estimates is one of the major
challenges for statisticians.
The paper describes the main methods of identification and treatment of outliers that are
commonly used in long-term surveys conducted by the Republika Srpska Institute of Statistics.
An overview is given of the Hidiroglou-Berthelot ratio method, which is applied to detect
outliers in Structural Business Statistics and the Labour Cost Survey. A brief description of
the implementation of the method is also given.
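
To make the Hidiroglou-Berthelot ratio method named above more concrete, the following is a minimal Python sketch of its usual textbook formulation, not the Institute's actual implementation. The function name `hb_outliers` is hypothetical. The parameters `U`, `A` and `C` follow the common notation for the method, and their default values (0.5, 0.05 and 4) are illustrative assumptions; the abstract does not state which values are used in practice. The quartile convention (`method="inclusive"`) is likewise an assumption.

```python
from statistics import median, quantiles


def hb_outliers(previous, current, U=0.5, A=0.05, C=4.0):
    """Flag units with an unusual period-to-period change (HB ratio method).

    previous, current: equal-length sequences of strictly positive values
    for the same units in two periods. U, A, C are assumed defaults.
    Returns a list of booleans, True where a unit is flagged.
    """
    if len(previous) != len(current):
        raise ValueError("previous and current must have the same length")
    if any(p <= 0 or c <= 0 for p, c in zip(previous, current)):
        raise ValueError("all values must be strictly positive")

    # Step 1: ratio of the current value to the previous value for each unit.
    ratios = [c / p for p, c in zip(previous, current)]
    r_med = median(ratios)

    # Step 2: symmetric transformation, so that decreases and increases
    # relative to the median ratio are treated alike.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]

    # Step 3: size-weighted effects; U in [0, 1] controls how much weight
    # the size of the unit receives.
    effects = [s_i * max(p, c) ** U for s_i, p, c in zip(s, previous, current)]

    # Step 4: quartile-based distances; the A term keeps the interval from
    # collapsing when the effects are tightly clustered around the median.
    q1, e_med, q3 = quantiles(effects, n=4, method="inclusive")
    d_q1 = max(e_med - q1, abs(A * e_med))
    d_q3 = max(q3 - e_med, abs(A * e_med))

    # Step 5: units whose effect falls outside the interval are flagged.
    lower = e_med - C * d_q1
    upper = e_med + C * d_q3
    return [e < lower or e > upper for e in effects]


# Made-up illustration data: eight units observed in two periods.
previous = [100, 120, 80, 200, 150, 90, 110, 130]
current = [105, 118, 84, 210, 149, 93, 115, 650]
flags = hb_outliers(previous, current)
print(flags)
```

In this made-up data set only the last unit, whose value rises from 130 to 650, is flagged. A flagged unit is only a candidate for review: as the abstract stresses, it may reflect a true change rather than an error, so the flag should lead to checking or recontacting rather than automatic correction.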