Privacy-preserving Anonymization of Set-valued Data
Manolis Terrovitis, Nikos Mamoulis and Panos Kalnis
VLDB 2008
outline
• Motivation
• Generalization model
• Information loss
• Anonymization techniques
• Experimental results
• Conclusions
motivation
k-anonymity: intuitively, hide each individual among k-1 others
– each set of quasi-identifier (QI) values should appear at least k times in the released microdata
microdata
id  Zipcode  Age  National.  Disease
1   13053    28   Russian    Heart Disease
2   13068    29   American   Heart Disease
3   13068    21   Japanese   Viral Infection
4   13053    23   American   Viral Infection
5   14853    50   Indian     Cancer
6   14853    55   Russian    Heart Disease
7   14850    47   American   Viral Infection
8   14850    49   American   Viral Infection
9   13053    31   American   Cancer
10  13053    37   Indian     Cancer
11  13068    36   Japanese   Cancer
12  13068    35   American   Cancer

4-anonymous data
id  Zipcode  Age  National.  Disease
1   130**    <30  ∗          Heart Disease
2   130**    <30  ∗          Heart Disease
3   130**    <30  ∗          Viral Infection
4   130**    <30  ∗          Viral Infection
5   1485*    ≥40  ∗          Cancer
6   1485*    ≥40  ∗          Heart Disease
7   1485*    ≥40  ∗          Viral Infection
8   1485*    ≥40  ∗          Viral Infection
9   130**    3∗   ∗          Cancer
10  130**    3∗   ∗          Cancer
11  130**    3∗   ∗          Cancer
12  130**    3∗   ∗          Cancer
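The k-anonymity condition can be checked mechanically. The following is a minimal sketch, assuming the released table is reduced to its three quasi-identifier columns; the name is_k_anonymous is illustrative and does not come from the slides.

```python
from collections import Counter

# Quasi-identifier columns of the 4-anonymous table, one tuple per record.
released = (
    [("130**", "<30", "*")] * 4
    + [("1485*", ">=40", "*")] * 4
    + [("130**", "3*", "*")] * 4
)

def is_k_anonymous(rows, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(rows)
    return all(c >= k for c in counts.values())

print(is_k_anonymous(released, 4))  # True
print(is_k_anonymous(released, 5))  # False
```

Each of the three groups holds exactly four records, so the table is 4-anonymous but not 5-anonymous.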
motivation (cont.)
• k^m-anonymity: for any set of m or fewer items, there should be at least k transactions that contain this set
• m reflects the adversary's point of view: the adversary is assumed to know at most m items of a transaction
• Ex: 2^2-anonymity (k=2, m=2)
id  Contents
t1  {a1,b1,b2}
t2  {a2,b1}
t3  {a2,b1,b2}
t4  {a1,a2,b2}
{a1,b1} appears only in t1 (support 1 < k=2), so the database does not satisfy 2^2-anonymity
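The 2^2-anonymity example can be verified by counting the support of every itemset with at most m items. This is a brute-force sketch for illustration only; the function name violating_itemsets is an assumption, and enumerating all subsets is only practical for small m.

```python
from collections import Counter
from itertools import combinations

def violating_itemsets(transactions, k, m):
    """Return the itemsets of size <= m that occur in at least one
    but fewer than k transactions, together with their support."""
    support = Counter()
    for t in transactions:
        items = sorted(t)
        for size in range(1, m + 1):
            for subset in combinations(items, size):
                support[subset] += 1
    return {s: c for s, c in support.items() if c < k}

db = {
    "t1": {"a1", "b1", "b2"},
    "t2": {"a2", "b1"},
    "t3": {"a2", "b1", "b2"},
    "t4": {"a1", "a2", "b2"},
}

bad = violating_itemsets(db.values(), k=2, m=2)
print(bad)
```

On this database the check reports two itemsets with support 1: {a1,b1}, found only in t1, and {a1,a2}, found only in t4.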
Generalization model
• Generalization:
"skim-milk", "choco-milk", "full-fat milk" -> "milk"
"milk", "yogurt", "cheese" -> "dairy product"
Sets of items that differ at a detailed level can become identical after generalization:
{skim-milk, bread} and {full-fat milk, bread} both become {milk, bread}
Hierarchy tree:
ALL
  A
    a1
    a2
  B
    b1
    b2
Generalization model (cont.)
Original database
id  Contents
t1  {a1,b1,b2}
t2  {a2,b1}
t3  {a2,b1,b2}
t4  {a1,a2,b2}
2^2-anonymity: {a1,b1} appears only in t1 (k=1), so the original database does not satisfy 2^2-anonymity

Generalization: {a1,a2} -> A

Transformed database
id  Contents
t1  {A,b1,b2}
t2  {A,b1}
t3  {A,b1,b2}
t4  {A,b2}
{a1,b1} -> {A,b1}: the transformed database satisfies 2^2-anonymity
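The effect of the generalization {a1,a2} -> A can be reproduced in a few lines. This is a sketch under the assumption that a cut is represented as a plain item-to-ancestor mapping; generalize and is_km_anonymous are illustrative names.

```python
from collections import Counter
from itertools import combinations

def generalize(transaction, cut):
    """Replace each item by its ancestor in the cut; other items stay unchanged."""
    return {cut.get(item, item) for item in transaction}

def is_km_anonymous(transactions, k, m):
    """True if every occurring itemset of size <= m has support >= k."""
    support = Counter()
    for t in transactions:
        for size in range(1, m + 1):
            support.update(combinations(sorted(t), size))
    return all(c >= k for c in support.values())

original = [{"a1", "b1", "b2"}, {"a2", "b1"}, {"a2", "b1", "b2"}, {"a1", "a2", "b2"}]
cut = {"a1": "A", "a2": "A"}
transformed = [generalize(t, cut) for t in original]

print(is_km_anonymous(original, k=2, m=2))     # False
print(is_km_anonymous(transformed, k=2, m=2))  # True
```

Because transactions are sets, t4 collapses from {a1,a2,b2} to {A,b2}: both a1 and a2 map to the single item A.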
Generalization model (cont.)
• Goal: find the best horizontal cut (generalization rule) of the hierarchy tree
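Information loss
u_p: the node of the item generalization hierarchy to which p is generalized
|u_p|: the number of leaves under u_p
|I|: the number of leaves in the entire hierarchy
NCP(p) = |u_p| / |I| if p is generalized, 0 otherwise
ex: a1 generalized to A
NCP(a1) = |{a1,a2}| / |{a1,a2,b1,b2}| = 2/4

Ex: cut ( {a1,a2} -> A )
id  Contents
t1  {a1,b1,b2}
t2  {a2,b1}
t3  {a2,b1,b2}
t4  {a1,a2,b2}
NCP(D) = (2*0.5 + 3*0.5 + 3*0 + 3*0) / 11 = 2.5/11 ≈ 0.227
(a1 occurs 2 times and a2 3 times; b1 and b2 occur 3 times each and are not generalized; D holds 11 item occurrences in total)

Hierarchy tree:
ALL
  A
    a1
    a2
  B
    b1
    b2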
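The NCP arithmetic for the cut {a1,a2} -> A can be reproduced as follows. This is a minimal sketch, assuming that an item left ungeneralized contributes zero loss and that NCP(D) is the occurrence-weighted average of the per-item values, as in the worked example; the names leaves_under, ncp_item and ncp_database are illustrative.

```python
from collections import Counter

# Leaves under each internal node of the example hierarchy.
leaves_under = {
    "A": {"a1", "a2"},
    "B": {"b1", "b2"},
    "ALL": {"a1", "a2", "b1", "b2"},
}
NUM_LEAVES = 4  # |I|

def ncp_item(item, cut):
    """NCP of one item: |u_p| / |I| if the cut generalizes it, else 0."""
    node = cut.get(item)
    if node is None:
        return 0.0
    return len(leaves_under[node]) / NUM_LEAVES

def ncp_database(transactions, cut):
    """Average of the per-item NCP values, weighted by item occurrences."""
    occurrences = Counter(item for t in transactions for item in t)
    total = sum(occurrences.values())
    loss = sum(count * ncp_item(item, cut) for item, count in occurrences.items())
    return loss / total

db = [{"a1", "b1", "b2"}, {"a2", "b1"}, {"a2", "b1", "b2"}, {"a1", "a2", "b2"}]
cut = {"a1": "A", "a2": "A"}

print(ncp_item("a1", cut))              # 0.5
print(round(ncp_database(db, cut), 3))  # 0.227
```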
Anonymization techniques
• Anonymization techniques: detect the cut in the generalization hierarchy that prevents any privacy breach and, at the same time, causes the minimum information loss
• Data structure: count-tree
• Optimal anonymization (OA)
• Direct anonymization (DA)
• Apriori-based anonymization (AA)
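To make the search problem concrete, the sketch below enumerates every cut of the small example hierarchy and keeps the k^m-anonymous cut with the lowest NCP. It is an exhaustive baseline written for illustration, not the OA, DA or AA algorithm from the talk; all function names are assumptions, and the enumeration grows exponentially with the size of the hierarchy.

```python
from collections import Counter
from itertools import combinations, product

children = {"ALL": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}

def leaves(node):
    """Leaves of the subtree rooted at node."""
    kids = children.get(node)
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaves(kid)]

def cuts(node):
    """Yield every cut of the subtree as a leaf -> representative mapping."""
    # Option 1: generalize the whole subtree to this node.
    yield {leaf: node for leaf in leaves(node)}
    kids = children.get(node)
    if kids:
        # Option 2: combine independently chosen cuts of the child subtrees.
        for combo in product(*(list(cuts(kid)) for kid in kids)):
            merged = {}
            for part in combo:
                merged.update(part)
            yield merged

def is_km_anonymous(transactions, k, m):
    support = Counter()
    for t in transactions:
        for size in range(1, m + 1):
            support.update(combinations(sorted(t), size))
    return all(c >= k for c in support.values())

def ncp(transactions, cut, num_leaves):
    """Occurrence-weighted information loss; ungeneralized items cost 0."""
    occ = Counter(item for t in transactions for item in t)
    loss = sum(
        count * (len(leaves(cut[item])) / num_leaves if cut[item] != item else 0.0)
        for item, count in occ.items()
    )
    return loss / sum(occ.values())

def best_cut(transactions, k, m):
    """Return (cost, cut) of the cheapest k^m-anonymous cut, or None."""
    num_leaves = len(leaves("ALL"))
    best = None
    for cut in cuts("ALL"):
        generalized = [{cut[item] for item in t} for t in transactions]
        if is_km_anonymous(generalized, k, m):
            cost = ncp(transactions, cut, num_leaves)
            if best is None or cost < best[0]:
                best = (cost, cut)
    return best

db = [{"a1", "b1", "b2"}, {"a2", "b1"}, {"a2", "b1", "b2"}, {"a1", "a2", "b2"}]
cost, cut = best_cut(db, k=2, m=2)
print(cut, round(cost, 3))
```

For the four-transaction example with k=2 and m=2, the cheapest valid cut generalizes a1 and a2 to A and leaves b1 and b2 untouched.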
Count-tree
• Accelerates the search for itemset supports
Expanded database
id  Contents
t1  {a1,b1,b2,A,B}
t2  {a2,b1,A,B}
t3  {a2,b1,b2,A,B}
t4  {a1,a2,b2,A,B}
[figure: count-tree over the expanded database; nodes are labeled with items and support counts]
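The idea of a count-tree can be sketched as a trie in which the node reached by a path of items stores the support of that itemset. This is a simplified illustration, assuming a fixed lexicographic item order and no pruning; it is not the exact structure used in the talk, and build_count_tree, support and expand are hypothetical names.

```python
from itertools import combinations

parent = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

def expand(transaction):
    """Add the ancestor of every item to the transaction."""
    return set(transaction) | {parent[item] for item in transaction}

def build_count_tree(transactions, m):
    """Trie over itemsets of size <= m; each node holds the support of its path."""
    root = {"count": 0, "children": {}}
    for t in transactions:
        root["count"] += 1
        items = sorted(t)
        for size in range(1, m + 1):
            for subset in combinations(items, size):
                node = root
                for item in subset:
                    node = node["children"].setdefault(
                        item, {"count": 0, "children": {}}
                    )
                node["count"] += 1  # only the last node of the path is counted
    return root

def support(tree, itemset):
    """Look up the support of an itemset by walking its sorted path."""
    node = tree
    for item in sorted(itemset):
        node = node["children"].get(item)
        if node is None:
            return 0
    return node["count"]

db = [{"a1", "b1", "b2"}, {"a2", "b1"}, {"a2", "b1", "b2"}, {"a1", "a2", "b2"}]
expanded = [expand(t) for t in db]
tree = build_count_tree(expanded, m=2)

print(support(tree, {"a1", "b1"}))  # 1
print(support(tree, {"A", "b1"}))   # 3
```

A lookup walks at most m nodes, so the support of a generalized itemset such as {A,b1} is available without rescanning the database.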
Optimal anonymization (OA)
Hierarchy tree:
ALL
  A
    a1
    a2
  B
    b1
    b2
  C
    c1
    c2
[figure: candidate cuts (A, B, C, AB, AC, BC) tracked with a hash table and a queue]
Direct anonymization (DA)
[figure: count-tree over the expanded database, annotated with the generalization {a1,a2} -> A]
Apriori-based anonymization (AA)
1-itemsets, 2-itemsets

Original database
id  Contents
t1  {a1,b1,b2}
t2  {a2,b1}
t3  {a2,b1,b2}
t4  {a1,a2,b2}

Expanded database
id  Contents
t1  {a1,b1,b2,A,B}
t2  {a2,b1,A,B}
t3  {a2,b1,b2,A,B}
t4  {a1,a2,b2,A,B}

{a1,a2} -> A

Expanded database reduced by {a1,a2} -> A
id  Contents
t1  {b1,b2,A,B}
t2  {b1,A,B}
t3  {b1,b2,A,B}
t4  {b2,A,B}
Experimental results
Experimental results (cont.)
                  OA  DA  AA
Time              3   2   1
Memory            2   2   1
Information loss  1   1   1
conclusions
• An intuitive method
• Can it be applied to k-anonymity?