Local and Global Scores in
Selective Editing
Dan Hedlin
Statistics Sweden
Local score
• Common local (item) score for item j in record k:
$\gamma_{kj} = w_k \, |\tilde{y}_{kj} - z_{kj}| / \sigma_j$
• $w_k$: design weight
• $\tilde{y}_{kj}$: predicted value
• $z_{kj}$: reported value
• $\sigma_j$: standardisation measure
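A minimal sketch of the local score in Python may help fix ideas. It assumes the deviation between predicted and reported value enters in absolute value and is divided by the standardisation measure; the function name, argument names and the numbers in the example are hypothetical, not taken from the slides.

```python
def local_score(design_weight, predicted, reported, scale):
    """Local (item) score for one item in one record.

    Assumed form: design weight times the absolute deviation of the
    reported value from the predicted value, divided by a positive
    standardisation measure for the item.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return design_weight * abs(predicted - reported) / scale


# Hypothetical record: design weight 25, predicted value 400,
# reported value 1000, standardisation measure 300.
score = local_score(25.0, 400.0, 1000.0, 300.0)
print(score)  # 50.0
```

A reported value equal to the predicted value gives a local score of zero, whatever the weight.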
Global score
• What function of the local scores to form
a global (unit) score?
• The same number of items in all records
• p items, j = 1, 2, …, p
• Let a local score be denoted by $\gamma_{kj}$
• … and a global score by $g(\gamma_k)$
Common global score functions
In the editing literature:
• Sum function: $\sum_{j=1}^{p} \gamma_{kj}$
• Euclidean score: $\sqrt{\sum_{j=1}^{p} \gamma_{kj}^2}$
• Max function: $\max_j \gamma_{kj}$
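The three global score functions listed above can be sketched in a few lines of Python. The function names and the example record are hypothetical; the local scores are assumed to be non-negative numbers held in a list, one entry per item.

```python
import math


def sum_score(local_scores):
    """Sum function: add up the local scores of a record."""
    return sum(local_scores)


def euclidean_score(local_scores):
    """Euclidean score: square root of the sum of squared local scores."""
    return math.sqrt(sum(s * s for s in local_scores))


def max_score(local_scores):
    """Max function: the largest local score in the record."""
    return max(local_scores)


# Hypothetical record with p = 3 items.
scores = [3.0, 4.0, 0.0]
print(sum_score(scores), euclidean_score(scores), max_score(scores))  # 7.0 5.0 4.0
```

For the same record the sum is the largest of the three values and the maximum the smallest, with the Euclidean score in between.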
• Farwell (2004): “Not only does the
Euclidean score perform well with a
large number of key items, it appears to
perform at least as well as the
maximum score for small numbers of
items.”
Unified by…
• Minkowski’s distance
$g(\gamma_k; \lambda) = \left( \sum_{j=1}^{p} \gamma_{kj}^{\lambda} \right)^{1/\lambda}, \quad \lambda \ge 1$
• Sum function if $\lambda = 1$
• Euclidean if $\lambda = 2$
• Maximum function if $\lambda \to \infty$
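As an illustration of the unification, the sketch below implements a single Minkowski-type global score and recovers the sum, Euclidean and maximum functions as special cases. The name `minkowski_score` and the parameter name `lam` are hypothetical, local scores are assumed non-negative, and factoring out the largest score is an implementation choice made here to avoid overflow for large parameter values.

```python
import math


def minkowski_score(local_scores, lam):
    """Global score as a Minkowski distance with parameter lam >= 1.

    lam = 1 gives the sum function, lam = 2 the Euclidean score and
    lam = math.inf the maximum function. Assumes non-negative scores.
    """
    if lam < 1:
        raise ValueError("lam must be at least 1")
    largest = max(local_scores)
    if math.isinf(lam) or largest == 0:
        return float(largest)
    # Factor out the largest score so that every ratio lies in [0, 1].
    total = sum((s / largest) ** lam for s in local_scores)
    return largest * total ** (1.0 / lam)


# Hypothetical record with p = 3 items.
scores = [3.0, 4.0, 0.0]
for lam in (1, 2, math.inf):
    print(lam, minkowski_score(scores, lam))
```

With this record the three calls return values close to 7, 5 and 4, matching the sum, Euclidean and maximum functions.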
• NB extreme choices are sum and max
• Infinite number of choices in between
• $\lambda = 20$ will suffice for maximum unless
local scores in the same record are of
similar size
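The remark about a parameter value of 20 can be checked numerically. The sketch below compares the Minkowski score with parameter 20 against the true maximum for two hypothetical records: one dominated by a single large local score and one where all local scores are equal. The function name and the records are assumptions made for illustration.

```python
def minkowski_score(local_scores, lam):
    """Minkowski-type global score for non-negative local scores."""
    return sum(s ** lam for s in local_scores) ** (1.0 / lam)


dominated = [10.0, 1.0, 1.0]   # one local score dominates
similar = [10.0, 10.0, 10.0]   # local scores of similar size

for scores in (dominated, similar):
    approx = minkowski_score(scores, 20)
    # Relative excess of the approximation over the true maximum.
    excess = approx / max(scores) - 1.0
    print(scores, round(approx, 3), f"{excess:.1%}")
```

For the dominated record the two values agree to many decimal places. For the record with three equal scores the approximation exceeds the maximum by a factor of 3 to the power 1/20, which is between 5 and 6 percent.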
Global score as a distance
• The axioms of a distance are sensible
properties such as being non-negative
• Also, the triangle inequality
$g(\gamma_k + \gamma_l) \le g(\gamma_k) + g(\gamma_l)$
• Can show that a global score function that
does not satisfy the triangle inequality
yields inconsistencies
• Hence a global score function should be a
distance
• Minkowski’s distance appears to be
adequate for practical purposes
• Minkowski’s distance does not satisfy the
triangle inequality if $\lambda < 1$
• Hence it is not a distance for $\lambda < 1$
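A small numerical check shows the failure of the triangle inequality for a parameter below 1. The two score vectors are hypothetical and chosen so that the arithmetic is easy to follow; the helper names are assumptions, not part of the slides.

```python
def minkowski_score(local_scores, lam):
    """Minkowski-type score; lam < 1 is allowed here on purpose."""
    return sum(s ** lam for s in local_scores) ** (1.0 / lam)


def satisfies_triangle(a, b, lam):
    """Check g(a + b) <= g(a) + g(b) for one pair of score vectors."""
    combined = [x + y for x, y in zip(a, b)]
    lhs = minkowski_score(combined, lam)
    rhs = minkowski_score(a, lam) + minkowski_score(b, lam)
    return lhs <= rhs + 1e-12  # small tolerance for rounding


a = [1.0, 0.0]
b = [0.0, 1.0]
print(satisfies_triangle(a, b, 2))    # True
print(satisfies_triangle(a, b, 0.5))  # False
```

With parameter 0.5 the combined vector scores (1 + 1) squared, that is 4, while each vector on its own scores 1, so the left-hand side exceeds the right-hand side of 2.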
Parametrised by $\lambda$
• Advantages: unified global score simplifies
presentation and software implementation
• Also gives structure: $\lambda$ orders the feasible
choices
…from smallest: $\lambda = 1$
…to largest: $\lambda \to \infty$
• Turning to geometry…
Sum function = City block distance
p = 3, i.e. three items
Euclidean distance
Supremum (maximum, Chebyshev’s)
distance
Imagine questionnaires with
three items
[Figure: Record k in the space spanned by the axes $\gamma_{k1}$, $\gamma_{k2}$ and $\gamma_{k3}$, with the Euclidean distance marked]
The Euclidean function, two items
[Figure: the threshold marked for two items]
A sphere in 3D
[Figure: the threshold marked in 3D]
The max function
A cube in 3D
Same threshold
The sum function
An octahedron in 3D
• With the same threshold, the sum function
will always flag at least as many records for
editing as any other choice
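The effect of the choice of global score on the amount of editing can be illustrated with a few made-up records. For non-negative local scores the sum is never smaller than the Euclidean score, which in turn is never smaller than the maximum, so with a common threshold the flagged sets are nested. The records, the threshold of 1.0 and the helper names below are hypothetical.

```python
import math


def minkowski_score(local_scores, lam):
    """Minkowski-type global score; lam = math.inf gives the maximum."""
    if math.isinf(lam):
        return max(local_scores)
    return sum(s ** lam for s in local_scores) ** (1.0 / lam)


# Hypothetical local scores for five records with p = 3 items each.
records = [
    [0.9, 0.1, 0.1],
    [0.6, 0.6, 0.6],
    [0.5, 0.4, 0.3],
    [1.2, 0.0, 0.0],
    [0.3, 0.3, 0.3],
]
threshold = 1.0


def flagged(lam):
    """Indices of records whose global score exceeds the threshold."""
    return [i for i, r in enumerate(records)
            if minkowski_score(r, lam) > threshold]


for lam in (1, 2, math.inf):
    print(lam, flagged(lam))
```

Here the sum function flags records 0 to 3, the Euclidean score flags records 1 and 3, and the maximum function flags only record 3.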
Three editing situations
1. Large errors remain in data, such as unit
errors
2. No large errors, but may be bias due to
many small errors in the same direction
3. Little bias, but may be many errors
Can show that if…
1. Situation 3
2. Variance of error is $\mathrm{Var}(\varepsilon_{kj}) = (\tilde{y}_{kj} - z_{kj})^2$
3. Local score is $\gamma_{kj} = w_k \, |\tilde{y}_{kj} - z_{kj}| / \sigma_j$
• Then the Euclidean global score will
minimise the sum of the variances of the
remaining error in estimates of the total
Summary
• Minkowski’s distance unifies many
reasonable global score functions
• Scaled by one parameter
• The sum and the maximum functions are
the two extreme choices
• The Euclidean unit score function is a good
choice under certain conditions