UT DALLAS
Erik Jonsson School of Engineering & Computer Science
Data and Applications Security
Security and Privacy in Online Social Networks
Murat Kantarcioglu, Bhavani Thuraisingham
Thanks to Raymond Heatherly and Barbara Carminati for helping in slide preparation
July 2013
FEARLESS engineering

Outline
• Introduction to Social Networks
• Properties of Social Networks
• Social Network Analysis Basics
• Data Privacy Basics
• Privacy and Social Networks
• Access control issues for Online Social Networks

References
• Barbara Carminati, Elena Ferrari, Raymond Heatherly, Murat Kantarcioglu, Bhavani M. Thuraisingham: A semantic web based framework for social network access control. SACMAT 2009: 177-186
• Raymond Heatherly, Murat Kantarcioglu, Bhavani M. Thuraisingham: Preventing Private Information Inference Attacks on Social Networks. IEEE Trans. Knowl. Data Eng. 25(8): 1849-1862 (2013)

Social Networks
• Social networks have important implications for our daily lives.
  – Spread of information
  – Spread of disease
  – Economics
  – Marketing
• Social network analysis could be used for many activities related to information and security informatics.
  – Terrorist network analysis

Enron Social Graph*
[Figure: Enron social graph]
* http://jheer.org/enron/

Romantic Relations at “Jefferson High School”
[Figure: network of romantic relations]

Emergence of Online Social Networks
• Online social networks have become increasingly popular.
• Example: Facebook*
  – Facebook has more than 200 million active users.
  – More than 100 million users log on to Facebook at least once each day.
  – More than two-thirds of Facebook users are outside of college.
  – The fastest growing demographic is those 35 years old and older.
*http://www.facebook.com/press/info.php?statistics

Properties of Social Networks
• “Small-world” phenomenon
  – Milgram asked participants to pass a letter to one of their close contacts in order to get it to an assigned individual.
  – Most of the letters were lost (~75% of the letters).
  – The letters that reached their destination passed through only about six people.
  – This is the origin of “six degrees of separation”.
  – The mean geodesic distance $\ell$ of such graphs grows logarithmically, or even more slowly, with the network size $n$, where $d_{ij}$ is the shortest distance between nodes $i$ and $j$:

$$\ell = \frac{2}{n(n+1)} \sum_{i \ge j} d_{ij}$$

“Small-World” Example: Six Degrees of Kevin Bacon
[Figure: Six Degrees of Kevin Bacon]

Properties of Social Networks
• Degree distribution
• Clustering
• Other important properties
  – Community structure
  – Assortativity
  – Clustering patterns
  – Homophily
  – ...
• Many of these properties could be used for analyzing social networks.

Social Network Mining
• Social network data is represented as a graph
  – Individuals are represented as nodes
    • Nodes may have attributes to represent personal traits
  – Relationships are represented as edges
    • Edges may have attributes to represent relationship types
    • Edges may be directed
• Common social network mining tasks
  – Node classification
  – Link prediction

Data Privacy Basics
• How to share data without violating privacy?
• Meaning of privacy?
  – Identity disclosure
  – Sensitive attribute disclosure
• Current techniques for structured data
  – k-anonymity
  – l-diversity
  – Differential privacy
  – Secure multi-party computation
• Problem: Publishing private data while, at the same time, protecting individual privacy
• Challenges:
  – How to quantify privacy protection?
  – How to maximize the usefulness of published data?
  – How to minimize the risk of disclosure?

Sanitization and Anonymization
• Automated de-identification of private data with certain privacy guarantees
  – As opposed to the “formal determination by statisticians” requirement of HIPAA
• Two major research directions
  1. Perturbation (e.g., random noise addition)
  2. Anonymization (e.g., k-anonymization)
• Removing unique identifiers is not sufficient
• Quasi-identifier (QI)
  – Maximal set of attributes that could help identify individuals
  – Assumed to be publicly available (e.g., voter registration lists)
• As a process
  1. Remove all unique identifiers
  2. Identify QI attributes, model the adversary’s background knowledge
  3. Enforce some privacy definition (e.g., k-anonymity)

Re-identifying “anonymous” data (Sweeney ’01)
• 37 US states mandate collection of information
• She purchased the voter registration list for Cambridge, Massachusetts
  – 54,805 people
• 69% unique on postal code and birth date
• 87% unique US-wide with all three attributes (postal code, birth date, and sex)
• Solution: k-anonymity
  – Any combination of values appears at least k times
• Developed systems that guarantee k-anonymity
  – Minimize distortion of results

Privacy Preserving Distributed Data Mining
• Goal of data mining is summary results
  – Association rules
  – Classifiers
  – Clusters
• The results alone need not violate privacy
  – Contain no individually identifiable values
  – Reflect overall results, not individual organizations
• The problem is computing the results without access to the data!
• Data needed for data mining may be distributed among parties
  – Credit card fraud data
• Inability to share data due to privacy reasons
  – HIPAA
• Even partial results may need to be kept private

Graph Model (Lindamood et al. 09 & Heatherly et al. 09)
• Graph represented by a set of homogeneous vertices and a set of homogeneous edges
• Each node also has a set of Details, one of which is considered private.
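
The graph model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the representation used in the cited work: the class name `SocialGraph`, its method names, and the node identifiers `n1` and `n2` are hypothetical, and details are assumed to be simple trait-name to trait-value pairs.

```python
from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    """Undirected friendship graph whose nodes carry {trait name: trait value} details."""
    private_trait: str                            # the one detail treated as private
    details: dict = field(default_factory=dict)   # node id -> {trait name: trait value}
    links: set = field(default_factory=set)       # each link is frozenset({a, b})

    def add_node(self, node, traits):
        self.details[node] = dict(traits)

    def add_link(self, a, b):
        # frozenset makes (a, b) and (b, a) the same undirected link
        self.links.add(frozenset((a, b)))

    def friends(self, node):
        return {other for link in self.links if node in link
                for other in link if other != node}

    def public_details(self, node):
        """Details of a node with the private trait withheld."""
        return {name: value for name, value in self.details[node].items()
                if name != self.private_trait}


# Hypothetical two-node example
graph = SocialGraph(private_trait="political affiliation")
graph.add_node("n1", {"political affiliation": "liberal", "hometown": "mumbai"})
graph.add_node("n2", {"political affiliation": "conservative",
                      "group": "college republicans"})
graph.add_link("n1", "n2")
graph.add_link("n2", "n1")   # same undirected link, stored only once

print(graph.friends("n1"))
print(graph.public_details("n1"))
```

Keeping the private trait as a field of the graph, rather than of each node, reflects the assumption that the same detail is considered private for every node.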

Naïve Bayes Classification (Lindamood et al. 09 & Heatherly et al. 09)
• Classification based only on specified attributes in the node

Naïve Bayes with Links (Lindamood et al. 09 & Heatherly et al. 09)
• Rather than calculating the probability of a link from person n_x to person n_y, we calculate the probability of a link from n_x to a person with n_y’s traits

Link Weights (Lindamood et al. 09 & Heatherly et al. 09)
• Links also have associated weights
• A weight represents how ‘close’ a friendship is suspected to be, using the following formula:
[Figure: link weight formula]

Collective Inference (Lindamood et al. 09 & Heatherly et al. 09)
• Collection of techniques that use node attributes and the link structure to refine classifications.
• Uses local classifiers to establish a set of priors for each node
• Uses traditional relational classifiers as the iterative step in classification

Relational Classifiers (Lindamood et al. 09 & Heatherly et al. 09)
• Class Distribution Relational Neighbor
• Weighted-Vote Relational Neighbor
• Network-only Bayes Classifier
• Network-only Link-based Classification

Experimental Data (Lindamood et al. 09 & Heatherly et al. 09)
• 167,000 profiles from the Facebook online social network
• Restricted to public profiles in the Dallas/Fort Worth network
• Over 3 million links

General Data Properties (Lindamood et al. 09 & Heatherly et al. 09)
Property | Value
Diameter of the largest component | 16
Number of nodes | 167,390
Number of friendship links | 3,342,009
Total number of listed traits | 4,493,436
Total number of unique traits | 110,407
Number of components | 18
Probability Liberal | .45
Probability Conservative | .55

Inference Methods (Lindamood et al. 09 & Heatherly et al. 09)
• Details only: Uses a Naïve Bayes classifier to predict the attribute
• Links only: Uses only the link structure to predict the attribute
• Average: Classifies based on an average of the probabilities computed by Details and Links

Predicting Private Details (Lindamood et al. 09 & Heatherly et al. 09)
• Attempt to predict the value of the political affiliation attribute
• The three inference methods are used as the local classifier
• Relaxation labeling is used as the collective inference method

Removing Details (Lindamood et al. 09 & Heatherly et al. 09)
• Ensures that no ‘false’ information is added to the network; all details in the released graph were entered by the user
• Details that have the highest global probability of indicating political affiliation are removed from the network

Removing Links (Lindamood et al. 09 & Heatherly et al. 09)
• Ensures that the link structure of the released graph is a subset of the original graph
• For each node, removes the links to the neighbors that are most like that node

Most Liberal Traits (Lindamood et al. 09 & Heatherly et al. 09)
Trait Name | Trait Value | Weight Liberal
Group | legalize same sex marriage | 46.16066789
Group | every time i find out a cute boy is conservative a little part of me dies | 39.68599463
Group | equal rights for gays | 33.83786875
Group | the democratic party | 32.12011605
Group | not a bush fan | 31.95260895
Group | people who cannot understand people who voted for bush | 30.80812425
Group | government religion disaster | 29.98977927

Most Conservative Traits (Lindamood et al. 09 & Heatherly et al. 09)
Trait Name | Trait Value | Weight Conservative
Group | george w bush is my homeboy | 45.88831329
Group | college republicans | 40.51122488
Group | texas conservatives | 32.23171423
Group | bears for bush | 30.86484689
Group | kerry is a fairy | 28.50250433
Group | aggie republicans | 27.64720818
Group | keep facebook clean | 23.653477
Group | i voted for bush | 23.43173116
Group | protect marriage one man one woman | 21.60830487

Most Liberal Traits per Trait Name (Lindamood et al. 09 & Heatherly et al. 09)
Trait Name | Trait Value | Weight Liberal
activities | amnesty international | 4.659100601
Employer | hot topic | 2.753844959
favorite tv shows | queer as folk | 9.762900035
grad school | computer science | 1.698146579
hometown | mumbai | 3.566007713
Relationship Status | in an open relationship | 1.617950632
religious views | agnostic | 3.15756412
looking for | whatever i can get | 1.703651985

Experiments (Lindamood et al. 09 & Heatherly et al. 09)
• Conducted on 35,000 nodes that recorded political affiliation
• Tests removing 0 details and 0 links, 10 details and 0 links, 0 details and 10 links, and 10 details and 10 links
• Varied training set size from 10% of available nodes to 90%

Local Classifier Results (Lindamood et al. 09 & Heatherly et al. 09)
[Figure: local classifier results]

Collective Inference Results (Lindamood et al. 09 & Heatherly et al. 09)
[Figure: collective inference results]

Online Social Networks Access Control Issues
• Current access control systems for online social networks are either too restrictive or too loose
  – “selected friends”
    • Bebo, Facebook, and Multiply
  – “neighbors” (i.e., the set of users having musical preferences and tastes similar to mine)
    • Last.fm
  – “friends of friends”
    • Facebook, Friendster, Orkut
  – “contacts of my contacts” (2nd degree contacts), “3rd” and “4th degree contacts”
    • Xing

Challenges
I want only my family and close friends to see this picture.
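
The depth-based audiences listed above (“friends of friends”, 2nd to 4th degree contacts) amount to a shortest-path test on the friendship graph. The sketch below shows one way to express that test with breadth-first search; the function names, the adjacency-dictionary layout, and the user names are assumptions made for illustration, not any network's actual API. It also shows why depth alone cannot express the challenge above: the check knows how far apart two users are, but not whether the relationship is “family” or “close friend”.

```python
from collections import deque


def degree_of_separation(friends, source, target):
    """Number of friendship hops from source to target, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, depth = queue.popleft()
        for friend in friends.get(user, ()):
            if friend == target:
                return depth + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return None


def can_view(friends, owner, viewer, max_degree):
    """Depth-only policy: viewer must be within max_degree hops of the owner."""
    distance = degree_of_separation(friends, owner, viewer)
    return distance is not None and distance <= max_degree


# Hypothetical friendship graph: alice - bob - carol - dave
friends = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
}

print(can_view(friends, "alice", "carol", max_degree=2))  # friend of a friend
print(can_view(friends, "alice", "dave", max_degree=2))   # 3rd degree contact
```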

Requirements
• Many different online social networks with different terminology
  – Facebook vs. LinkedIn
• We need to have flexible models that can represent
  – Users’ profiles
  – Relationships among users
    • (e.g., Bob is Alice’s close friend)
  – Resources
    • (e.g., online photo albums)
  – Relationships among users and resources
    • (e.g., Bob is the owner of the photo album and Alice is tagged in this photo)
  – Actions (e.g., post a message on someone’s wall)

Overview of the Solution
• We use semantic web technologies (e.g., OWL) to represent the social network knowledge base.
• We use the Semantic Web Rule Language (SWRL) to represent various security, admin and filter policies.

Modeling User Profiles and Resources
• Existing ontologies such as FOAF could be extended to capture user profiles.
• Relationships among resources could be captured by using OWL concepts
  – PhotoAlbum rdfs:subClassOf Resource
  – PhotoAlbum consistsOf Photos

Modeling Relationships Among Users
• We model relationships among users by defining an N-ary relationship
    :Christine a :Person ;
        :has_friend _:Friendship_Relation_1 .
    _:Friendship_Relation_1 a :Friendship_Relation ;
        :Friendship_trust :HIGH ;
        :Friendship_value :Mike .
• OWL reasoners cannot be used to infer some relationships, such as Christine being a third degree friend of John.
  – Such computations need to be done separately and represented by using a new class.

Specifying Policies Using OSN Knowledge Base
• Most of the OSN information could be captured using OWL to represent a rich set of concepts
• This makes it possible to specify very flexible access control policies
  – “Photos could be accessed by friends only” automatically implies that a closeFriend can access the photos too.
  – Policies could be defined based on user-resource relationships easily.
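
The implication from “friends only” to closeFriend can be illustrated without an OWL reasoner. The sketch below is a plain-Python stand-in for the kind of hierarchy reasoning a reasoner would perform; the `PARENT` table, the `contact` relationship, the function names, and the sample users are all hypothetical, and the hierarchy is assumed to be acyclic.

```python
# Hypothetical relationship hierarchy: child relationship -> more general parent.
# Assumed acyclic; a cycle would make implies() loop forever.
PARENT = {
    "closeFriend": "friend",
    "friend": "contact",
}


def implies(relationship, required):
    """True if `relationship` equals `required` or specializes it."""
    current = relationship
    while current is not None:
        if current == required:
            return True
        current = PARENT.get(current)
    return False


def can_access(relationships, owner, viewer, required):
    """Policy check: does any relationship owner -> viewer satisfy `required`?"""
    return any(implies(rel, required)
               for rel in relationships.get((owner, viewer), ()))


# Hypothetical knowledge base: (owner, viewer) -> relationship types
relationships = {
    ("alice", "bob"): {"closeFriend"},
    ("alice", "carol"): {"contact"},
}

# Policy: "Photos could be accessed by friends only"
print(can_access(relationships, "alice", "bob", "friend"))    # closeFriend is a friend
print(can_access(relationships, "alice", "carol", "friend"))  # a contact is not
```

Writing the policy against the general relationship and letting the hierarchy cover the specializations is what keeps such policies short: no separate rule is needed for closeFriend.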

Security Policies for OSNs
• Access control policies
• Filtering policies
  – Could be specified by the user
  – Could be specified by an authorized user
• Admin policies
  – The security admin specifies who is authorized to specify filtering and access control policies
  – Example: if U1 isParentOf U2 and U2 is a child, then U1 can specify filtering policies for U2.

Security Policy Specification (using semantic web technologies)
• Semantic Web Rule Language (SWRL) is used for specifying access control, filtering and authorization policies.
• SWRL is based on OWL:
  – All rules are expressed in terms of OWL concepts (classes, properties, individuals, literals…).
• Using SWRL, the subject, object and actions are specified
• Rules can carry different authorizations that state the subject’s rights on the target object.

Knowledge Base for Authorizations and Prohibitions
• Authorizations/prohibitions need to be specified using OWL
  – A different object property for each action supported by the OSN.
  – Authorizations/prohibitions could automatically propagate based on action hierarchies
    • Assume “post” is a subproperty of “write”
    • If a user is given “post” permission, then the user will have “write” permission as well
• Admin prohibitions need to be specified slightly differently: (Supervisor, Target, Object, Privilege)

Security Rule Examples
• SWRL rule specification does depend on the authorization and OSN knowledge bases.
  – It is not possible to specify generic rules
• Examples:
[Figure: example SWRL rules]

Security Rule Enforcement
• A reference monitor evaluates the requests.
• Admin requests for access control could be evaluated by rule rewriting
  – Example: Assume Bob submits the following admin request
    [Figure: admin request]
  – Rewrite as the following rule
    [Figure: rewritten rule]

Security Rule Enforcement
• Admin requests for prohibitions could be rewritten as well.
  – Example: Bob issues the following prohibition request
    [Figure: prohibition request]
  – Rewritten version
    [Figure: rewritten prohibition rule]
• Access control requests need to consider both filter and access control policies

Framework Architecture
[Figure: framework architecture. Components: Social Network Application, Reference Monitor, Semantic Web Reasoning Engine, Policy Store, SN Knowledge Base. Labeled flows: access request, access decision, knowledge base queries, modified access request, reasoning result, policy retrieval.]

Conclusions
• Various attacks exist to
  – Identify nodes in anonymized data
  – Infer private details
• Recent attempts to strengthen social network access control aim to limit some of these attacks
• Balancing privacy, security and usability on online social networks will be an important challenge
• Directions
  – Scalability
    • We are currently implementing such a system to test its scalability.
  – Usability
    • Create techniques to automatically learn rules
    • Create simple user interfaces so that users can easily specify these rules.