Adam Sweeney
CS 898AB
Feb. 23, 2017
Community-Enhanced Deanonymization of Online Social Networks

Introduction
• Online Social Networks (OSN) host vast amounts of personal information
• Many applications
  – Targeted advertising
  – Health care
  – Study of human behavior
• Information is anonymized before being provided to customers
• Adversaries will want to de-anonymize the data

Introduction (cont.)
• Narayanan and Shmatikov demonstrated a method of de-anonymizing across social networks using ‘network alignment’
• This paper focuses on network alignment de-anonymization with no additional attributes
  – But the two networks have a high overlap
• Authors propose a divide-and-conquer approach
  – Divide network into communities
  – Use two-stage mapping

Agenda
• Definitions and Attack Models
• Background
• Community Enhanced De-Anonymization
• Degree of Anonymity
• Evaluation
• Results
• Related Work

Definitions and Attack Models

Definitions
• Undirected graph, G(V, E)
• Clique or k-clique
  – Fully connected sub-graph; k can specify the size of the clique
• A community-blind algorithm is one that does not see communities, like that of Narayanan and Shmatikov

Attack Model
• Assume adversary has access to two networks, G(V, E) and G′(V′, E′), where V ∩ V′ ≠ ∅ and E ∩ E′ ≠ ∅
  – Focus on cases where V ≈ V′ and E ≈ E′
• Goals of the attacker
  – Align the anonymized network with the ‘reference’ network
  – Re-identify anonymized users
  – Reveal private information
• Problem changes if both networks are anonymized

Background

Background
• A community is typically regarded as a group of densely connected nodes with few connections to nodes outside the community
• Communities often overlap
  – Paper focuses on disjoint, non-overlapping communities to simplify the problem
• Degree of anonymity
  – 0 ≤ A(X) ≤ 1, where A(X) = 1 indicates complete anonymity

Community Enhanced De-Anonymization

Community Enhanced De-Anonymization
• Any community mapping method can be used to identify communities
  – Authors used Infomap
• Communities need to be mapped (2 methods)
  – Identifying seed communities
    • Needs pre-identified seed mappings
  – Creating a network of communities
    • Community structure is considered a high-level, coarse-grained graph (communities are nodes)

Community Enhanced De-Anonymization (cont.)
• Community mapping allows for the identification of additional seeds
  – Searching within communities provides a comparatively narrow scope

Community Enhanced De-Anonymization (cont.)
• Finding additional seeds is called “seed enrichment”
• Seed enrichment at the community level is done by using two distance metrics
  – A node’s degree, and a node’s clustering coefficient
  – Metrics are computed and tested between each pair of nodes across mapped communities
  – Community-blind algorithm is run locally (between two mapped communities)
• Finally, apply community-blind algorithm to the whole network using all mapped nodes as seeds

Degree of Anonymity

Degree of Anonymity
• Community structure may reveal information about true mappings
  – Even if a node cannot be mapped by a de-anonymization algorithm
• Skipping over a lot of math
• This measure estimates an upper bound on anonymity
  – Quantifies the minimum possible damage from a de-anonymization attack

Evaluation

Evaluation Overview
• Simulation-based experiments using real-world network datasets
• For each experiment
  – Prepare a copy of the original network
  – Partially alter the structure
  – Compare network alignment of community-blind against community-aware algorithms

Data Sets
• Network of co-authorships between scientists who posted to a specific archive
  – Authors are connected if they wrote a paper together
• Twitter mention network
  – Users who mutually mentioned each other
• Smaller sub-section of the same Twitter mention network

Experimental Setup
• Original network is assumed to be anonymized
  – Prepare an array of networks with different noise levels
  – Θ = 0.10 means that 10% of edges are re-wired
• After noisy networks are created, a percentage of nodes are randomly removed from all networks
  – For example, when Θ = 0.10, an additional 5% of nodes are removed

Experimental Setup (cont.)
• Eccentricity of node-mapping algorithms set to 0.1
  – Threshold of community mapping set to 0
  – Observed that more mapped communities always returned more correctly mapped nodes
    • More false positives, but effect is limited
• Both algorithms given the same set of initial seeds
  – Mimics prior knowledge of attacker

Results

Results

Results (cont.)

Results for Overlapping Data Sets

Related Work

Graph Anonymization
• Can be classified into four approaches
• Clustering
  – Many possible mappings from clusters to mappings, including the original mapping
• Clustering with constraints
  – Merges nodes of a cluster into a single node
  – Decides which edges to include such that equivalence class nodes have the same constraints as the original data

Graph Anonymization (cont.)
• Modification of graph
  – Approach used in this paper
  – Re-wiring, node removal
  – Attempts to subvert attacks based on a known structure
• Hybrid
  – Any combination of the prior three

De-Anonymization Attacks Based on Structure
• Leverage patterns of connectivity
• Active attack
  – Adversary chooses victims ahead of time
  – Create Sybils and attempt to form connections to the victims
  – Adversary can force a unique structure that can be identified from the anonymized graph
• Passive attack
  – Small group of attackers identifies its location in the network
  – Attempt to discover existence of edges

De-Anonymization Attacks Based on Other Attributes
• Use a victim’s public and non-sensitive data
• Users that are part of multiple social networks share different data
  – More public on one, more private on another
• Not a trivial problem to match users across networks
  – Exploit activity patterns
  – Tagging behavior
  – Item preferences
  – Communication patterns

Network Alignment
• Of interest in other fields
• A biological context
  – Map two protein interaction networks to infer the functions of unknown proteins in each species

Review
• Definitions and Attack Models
• Background
• Community Enhanced De-Anonymization
• Degree of Anonymity
• Evaluation
• Results
• Related Work

Reference
• Nilizadeh, Shirin, Apu Kapadia, and Yong-Yeol Ahn. "Community-Enhanced De-anonymization of Online Social Networks." Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security - CCS '14 (2014)

Questions
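
Supplementary sketch: noisy network generation
The Experimental Setup slides describe building noisy copies of a network by re-wiring a fraction Θ of its edges and then randomly removing a percentage of nodes. The Python below is a minimal sketch of that idea, not the authors' code. The function name make_noisy_copy, its parameters, and the re-wiring rule (delete an edge, then add an edge between a random pair of nodes that are not currently connected) are assumptions made for illustration; the slides do not say how re-wiring was implemented.

```python
import random


def make_noisy_copy(edges, theta, node_removal_fraction, seed=0):
    """Return (nodes, edges) of a perturbed copy of an undirected graph.

    Hypothetical helper: `edges` is an iterable of (u, v) pairs with
    sortable node labels; `theta` is the fraction of edges to re-wire.
    """
    rng = random.Random(seed)
    # Normalise each undirected edge to a sorted tuple so (u, v) == (v, u).
    edge_set = {tuple(sorted(e)) for e in edges}
    nodes = sorted({n for e in edge_set for n in e})

    # Step 1: re-wire a fraction theta of the edges.
    n_rewire = int(round(theta * len(edge_set)))
    for old in rng.sample(sorted(edge_set), n_rewire):
        edge_set.remove(old)
        # Assumed rule: draw random node pairs until one is not an edge.
        # The pair just removed is always available, so this terminates.
        while True:
            new = tuple(sorted(rng.sample(nodes, 2)))
            if new not in edge_set:
                edge_set.add(new)
                break

    # Step 2: remove a random fraction of nodes and their incident edges.
    n_remove = int(round(node_removal_fraction * len(nodes)))
    removed = set(rng.sample(nodes, n_remove))
    kept_nodes = [n for n in nodes if n not in removed]
    kept_edges = {
        (u, v) for (u, v) in edge_set
        if u not in removed and v not in removed
    }
    return kept_nodes, kept_edges


# Usage: a 20-node ring, perturbed with the slide's example values
# (theta = 0.10 and 5% of nodes removed).
ring = [(i, (i + 1) % 20) for i in range(20)]
noisy_nodes, noisy_edges = make_noisy_copy(ring, theta=0.10,
                                           node_removal_fraction=0.05)
print(len(noisy_nodes), len(noisy_edges))
```

Passing the node-removal fraction as its own argument avoids guessing how it relates to Θ beyond the single example given on the slide. A fixed seed keeps the noisy copies reproducible, so a community-blind and a community-aware algorithm can be compared on identical inputs.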