Adam Sweeney
CS 898AB
Feb. 23, 2017

Community-Enhanced Deanonymization of Online Social Networks

Introduction
• Online Social Networks (OSNs) host vast amounts of personal information
• Many applications
  – Targeted advertising
  – Health care
  – Study of human behavior
• Information is anonymized before it is provided to customers
• Adversaries will want to de-anonymize the data

Introduction (cont.)
• Narayanan and Shmatikov demonstrated a method of de-anonymizing across social networks using ‘network alignment’
• This paper focuses on network alignment deanonymization with no additional attributes
  – But the two networks have a high overlap
• Authors propose a divide-and-conquer approach
  – Divide network into communities
  – Use two-stage mapping

Agenda
• Definitions and Attack Models
• Background
• Community Enhanced De-Anonymization
• Degree of Anonymity
• Evaluation
• Results
• Related Work

Definitions and Attack Models

Definitions
• Undirected graph, G(V, E)
• Clique or k-clique
  – Fully connected sub-graph; k specifies the size of the clique
• A community-blind algorithm is one that does not see communities, like that of Narayanan and Shmatikov

Attack Model
• Assume the adversary has access to two networks, G(V, E) and G′(V′, E′), where V ∩ V′ ≠ ∅ and E ∩ E′ ≠ ∅
  – Focus on cases where V ≈ V′ and E ≈ E′
• Goals of the attacker
  – Align the anonymized network with the ‘reference’ network
  – Re-identify anonymized users
  – Reveal private information
• Problem changes if both networks are anonymized

Background
• A community is typically regarded as a group of densely connected nodes with few connections to nodes outside the community
• Communities often overlap
  – Paper focuses on disjoint, non-overlapping communities to simplify the problem
• Degree of anonymity
  – 0 ≤ A(X) ≤ 1, where A(X) = 1 indicates complete anonymity

Community Enhanced De-Anonymization
• Any community detection method can be used to
identify communities
  – Authors used Infomap
• Communities need to be mapped (2 methods)
  – Identifying seed communities
    • Needs pre-identified seed mappings
  – Creating a network of communities
    • Community structure is treated as a high-level, coarse-grained graph (communities are nodes)

Community Enhanced De-Anonymization (cont.)
• Community mapping allows for the identification of additional seeds
  – Searching within communities provides a comparatively narrow scope

Community Enhanced De-Anonymization (cont.)
• Finding additional seeds is called “seed enrichment”
• Seed enrichment at the community level uses two distance metrics
  – A node’s degree and a node’s clustering coefficient
  – Metrics are computed and tested between each pair of nodes across mapped communities
  – Community-blind algorithm is run locally (between two mapped communities)
• Finally, apply the community-blind algorithm to the whole network using all mapped nodes as seeds

Degree of Anonymity
• Community structure may reveal information about true mappings
  – Even if a node cannot be mapped by a deanonymization algorithm
• Skipping over a lot of math
• This measure estimates an upper bound on anonymity
  – Quantifies the minimum possible damage from a deanonymization attack

Evaluation

Evaluation Overview
• Simulation-based experiments using real-world network datasets
• For each experiment
  – Prepare a copy of the original network
  – Partially alter the structure
  – Compare network alignment of community-blind against community-aware algorithms

Data Sets
• Network of co-authorships between scientists who posted to a specific archive
  – Authors are connected if they wrote a paper together
• Twitter mention network
  – Users who mutually mentioned each other
• Smaller sub-section of the same Twitter mention network

Experimental Setup
• Original network is assumed to be anonymized
  – Prepare an array of networks with different noise levels
  – Θ = 0.10 means that 10% of
edges are re-wired
• After noisy networks are created, a percentage of nodes are randomly removed from all networks
  – For example, when Θ = 0.10, an additional 5% of nodes are removed

Experimental Setup (cont.)
• Eccentricity of node-mapping algorithms set to 0.1
  – Threshold of community mapping set to 0
  – Observed that more mapped communities always returned more correctly mapped nodes
    • More false positives, but effect is limited
• Both algorithms given the same set of initial seeds
  – Mimics prior knowledge of attacker

Results

Results (cont.)

Results for Overlapping Data Sets

Related Work

Graph Anonymization
• Can be classified into four approaches
• Clustering
  – Many possible mappings from clusters to mappings, including the original mapping
• Clustering with constraints
  – Merges nodes of a cluster into a single node
  – Decides which edges to include such that equivalence class nodes have same constraints as original data

Graph Anonymization (cont.)
• Modification of graph
  – Approach used in this paper
  – Re-wiring, node removal
  – Attempts to subvert attacks based on a known structure
• Hybrid
  – Any combination of the prior three

De-Anonymization Attacks Based on Structure
• Leverage patterns of connectivity
• Active attack
  – Adversary chooses victims ahead of time
  – Create Sybils and attempt to form connections to the victims
  – Adversary can force unique structure that can be identified from anonymized graph
• Passive attack
  – Small group of attackers identifies its location in the network
  – Attempt to discover existence of edges

De-Anonymization Attacks Based on Other Attributes
• Use a victim’s public and non-sensitive data
• Users that are part of multiple social networks share different data
  – More public on one, more private on another
• Not a trivial problem to match users across networks
  – Exploit activity patterns
  – Tagging behavior
  – Item preferences
  – Communication patterns

Network Alignment
• Of interest in other fields
• A biological context
  – Map two protein interaction networks to infer the functions of unknown proteins in each species

Review
• Definitions and Attack Models
• Background
• Community Enhanced De-Anonymization
• Degree of Anonymity
• Evaluation
• Results
• Related Work

Reference
• Nilizadeh, Shirin, Apu Kapadia, and Yong-Yeol Ahn. "Community-Enhanced De-anonymization of Online Social Networks." Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security - CCS '14 (2014)

Questions
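
Supplementary sketch: the seed-enrichment slide says that a node's degree and clustering coefficient are computed and tested between each pair of nodes across two mapped communities. The Python below is a minimal illustration of that test, not the paper's implementation. The relative-difference distance (`rel_diff`), the two threshold parameters, and the function name `enrich_seeds` are assumptions introduced here; the slides do not give the exact formula or cut-off values.

```python
from itertools import combinations


def degree(adj, v):
    """Number of neighbors of v in the full graph."""
    return len(adj[v])


def clustering(adj, v):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def rel_diff(x, y):
    """Assumed distance: relative difference, 0 for equal values, at most 1."""
    m = max(x, y)
    return 0.0 if m == 0 else abs(x - y) / m


def enrich_seeds(adj_a, comm_a, adj_b, comm_b, max_deg_dist=0.1, max_cc_dist=0.1):
    """Return candidate (u, v) pairs across two mapped communities.

    adj_a / adj_b map each node to the set of its neighbors.
    comm_a / comm_b list the nodes of the two communities already mapped
    to each other. Thresholds are hypothetical tuning parameters.
    """
    pairs = []
    for u in comm_a:
        for v in comm_b:
            d_deg = rel_diff(degree(adj_a, u), degree(adj_b, v))
            d_cc = rel_diff(clustering(adj_a, u), clustering(adj_b, v))
            if d_deg <= max_deg_dist and d_cc <= max_cc_dist:
                pairs.append((u, v))
    return pairs


# Toy example: a triangle with one pendant node, under two labelings.
adj_a = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
adj_b = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
candidates = enrich_seeds(adj_a, ["a", "b", "c", "d"], adj_b, [1, 2, 3, 4])
print(candidates)
```

In the toy example, nodes with a distinctive degree and clustering coefficient (the hub and the pendant node) get a single candidate, while the two structurally equivalent triangle nodes remain ambiguous. That ambiguity is why the slides then run the community-blind algorithm locally between the mapped communities instead of relying on these metrics alone.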