New Approach to Quantification of Privacy on Social Network Sites
IEEE AINA 2010
Tran Hong Ngoc (VNU, Vietnam)
Isao Echizen (NII, Japan)
Kamiyama Komei (UEC, Japan)
Hiroshi Yoshiura (UEC, Japan)
Presenter: Yu-Song Syu
Social Network Sites
Growth of SNSs
Leads to an explosion in online information sharing
With SNSs
People share information with friends
Shared information includes sensitive data
Location, age, career, …
Intruders in SNSs
By compiling statistics, intruders may obtain
personal information and use it for:
Commercial purpose
Identity theft
Physical harm
…
How to get such information?
http://www.iis.sinica.edu.tw
Usually, people do not know how much private
information they reveal about themselves and others
Privacy Metric
Based on probability and entropy
Helps users know how much private information may
leak from their blog sentences
Defines the Leaked Privacy Value, Δ, as the amount
of knowledge that intruders can learn about a
“problem of interest”
Proposed System Model
Info. Retrieval techniques
based on NLP methods
Quantification of Privacy
System Model
Find the information about someone
Prefecture, age, city, university, …
Blog sentences that users post
Event
Event & Blog Set
[Figure labels: BlogSet_i, BlogSet_j]
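Event:
$\chi^{(k)} = \{\, x \mid x \in U,\ 0 \le p(x) \le 1 \,\}$
Blog Set:
$\chi_i^{(k)} = \{\, x \mid x \in \chi^{(k)},\ 0 \le p(x) \le 1 \,\}, \quad \chi_i^{(k)} \subseteq \chi^{(k)}$
Intersection:
$\tilde{\chi}^{(k)} = \{\, x \mid 0 \le p(x) \le 1,\ x \in \chi_1^{(k)},\ x \in \chi_2^{(k)},\ \ldots,\ x \in \chi_n^{(k)} \,\} = \bigcap_{i=1}^{n} \chi_i^{(k)}$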
Blog Set / Joint Blog Set
Assumed to never be empty
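A minimal sketch of how a joint blog set could be formed, assuming each blog sentence narrows the event to a set of candidate values and that the surviving probabilities are renormalized (the renormalization step is an assumption, not stated on the slides). The function name `joint_blog_set` and the probabilities are hypothetical and purely illustrative.

```python
def joint_blog_set(event, blog_sets):
    """Intersect the candidate sets implied by each blog sentence.

    event     -- dict mapping each possible value to its prior probability
    blog_sets -- iterable of sets, one per blog sentence, each a subset
                 of the event's possible values
    Returns the surviving values with renormalized probabilities
    (renormalization is an assumption made for this sketch).
    """
    candidates = set(event)
    for blog_set in blog_sets:
        candidates &= set(blog_set)
    if not candidates:
        # The slides assume the joint blog set is never empty.
        raise ValueError("joint blog set is empty")
    total = sum(event[x] for x in candidates)
    return {x: event[x] / total for x in candidates}


# Illustrative (made-up) prior over four prefectures.
event = {"Tokyo": 0.4, "Osaka": 0.3, "Kyoto": 0.2, "Nara": 0.1}
blog_sets = [{"Tokyo", "Osaka", "Kyoto"}, {"Osaka", "Kyoto", "Nara"}]
joint = joint_blog_set(event, blog_sets)
```

Here only Osaka and Kyoto survive both sentences, so the intruder's candidate set shrinks from four values to two.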
Example: Prefecture
Before Proposed Metric…
Math Background
Entropy (Uncertainty)
Event
Conditional Entropy
Joint Entropy
Possible Value
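For reference, the standard textbook definitions of the three quantities named on this slide are given below. This is a sketch using generic random variables $X$ and $Y$ and base-2 logarithms; the base and the symbols are assumptions and may differ from the original slide.

```latex
% Entropy (uncertainty) of an event X with possible values x
H(X) = -\sum_{x} p(x) \log_2 p(x)

% Conditional entropy of Y given X
H(Y \mid X) = -\sum_{x,y} p(x,y) \log_2 p(y \mid x)

% Joint entropy, and its relation to the two quantities above (chain rule)
H(X,Y) = -\sum_{x,y} p(x,y) \log_2 p(x,y) = H(X) + H(Y \mid X)
```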
Why Use Entropy?
Idea:
Difference of Uncertainty
Leaked Privacy
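A small worked example of the "difference of uncertainty" idea, assuming a uniform distribution over Japan's 47 prefectures before any post and, hypothetically, three equally likely prefectures remaining afterwards. The numbers are illustrative and are not taken from the experiments.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability values contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


before = [1 / 47] * 47   # intruder knows nothing: any prefecture is possible
after = [1 / 3] * 3      # hypothetical: blog sentences leave three candidates

leaked = entropy(before) - entropy(after)
print(f"{leaked:.2f} bits")  # prints "3.97 bits"
```

The larger the drop in uncertainty, the more the posts have told the intruder.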
Privacy Leakage Metric
Leaked Privacy Value:
The change in privacy value, obtained by subtracting
the privacy after the sentences are posted from the
privacy before they are posted
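$\Delta = H(\{\chi^{(k)}\}) - H(\{\tilde{\chi}^{(k)}\})$ (before minus after)
where, for $m$ events,
$H(\{\chi^{(k)}\}) = H(\chi^{(1)}, \chi^{(2)}, \ldots, \chi^{(m)})$
and
$H(\{\tilde{\chi}^{(k)}\}) = H(\tilde{\chi}^{(1)}, \tilde{\chi}^{(2)}, \ldots, \tilde{\chi}^{(m)})$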
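A minimal sketch of computing the leaked privacy value over several events. It assumes the events are independent, so that the joint entropy is the sum of the per-event entropies; this independence assumption is a simplification made here and is not claimed by the slides. The function name `leaked_privacy` and all distributions are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a {value: probability} distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def leaked_privacy(before, after):
    """Leaked privacy value: joint entropy before minus joint entropy after.

    before, after -- dicts mapping an event name to its distribution.
    ASSUMPTION: events are independent, so joint entropy is the sum of
    the individual entropies.
    """
    h_before = sum(entropy(d) for d in before.values())
    h_after = sum(entropy(d) for d in after.values())
    return h_before - h_after


# Illustrative (made-up) distributions for two events.
before = {
    "prefecture": {"Tokyo": 0.25, "Osaka": 0.25, "Kyoto": 0.25, "Nara": 0.25},
    "age": {"20s": 0.5, "30s": 0.5},
}
after = {
    "prefecture": {"Tokyo": 0.5, "Osaka": 0.5},
    "age": {"20s": 1.0},
}
delta = leaked_privacy(before, after)  # (2 + 1) - (1 + 0) = 2.0 bits
```

If the events are correlated, summing per-event entropies overstates the joint entropy, and the joint distribution would have to be used instead.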
Experiments
Dataset:
Statistical Survey Department, Statistics Bureau,
Ministry of Internal Affairs and Communications
Problem of Interest:
Gaining information about a victim of an
accident that happened in Japan's subway and
was discussed by SNS users
Experiments - Prefecture
Experiments - Age
[Figures not recovered; axis labels: Prefecture, Age]
Experiments – Total Leaked Privacy
Total Leaked Privacy Before & After Blogging
Conclusions
Proposed a new metric to quantify how much
private information is leaked from blog posts on
SNSs
SNS users can see whether their posts carelessly
expose private information
Based on probability and entropy, the proposal
is simpler than other approaches yet effective, as
shown in the experiments