Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
MEASUREMENT AND ANALYSIS OF ONLINE SOCIAL NETWORKS Professor : Dr Sheykh Esmaili 1 Presenters: Pourya Aliabadi Boshra Ardallani Paria Rakhshani INTRODUCTION The Internet has spawned different types of information sharing systems, including the Web. MySpace (over 190 million users) Orkut (over 62 million) LinkedIn (over 11 million) LiveJournal (over 5.5 million) Unlike the Web, which is largely organized around content, online social networks are organized around users. 2 INTRODUCTION(CONT.) Users join a network, publish their profile and create links to any other users with whom they associate. The resulting social network provides a basis for maintaining social relationships, for finding users with similar interests, and for locating content. Understanding of the graph structure of online social networks is necessary to evaluate current systems, to design future online social network based systems. 3 INTRODUCTION(CONT.) Recent work has proposed the use of social networks to mitigate email spam, to improve Internet search, and to defend against Sybil attacks. We obtained our data by crawling publicly accessible information on these sites 4 INTRODUCTION(CONT.) This differs from content graphs like the graph formed by Web hyperlinks, where the popular pages (authorities) and the pages with many references (hubs) are distinct. We find that online social networks contain a large, strongly connected core of high-degree nodes, surrounded by many small clusters of lowdegree nodes. Flow of information in these networks. 5 ONLINE SOCIAL NETWORKING SITES Online social networking sites. are usually run by individual corporations. Users. must register with a site, possibly under a pseudonym. Some sites allow browsing of public data without explicit. Links. The social network is composed of user accounts and links between users. Some sites (e.g. Flickr, LiveJournal) allow users to link to any other user, without consent from the link target. 6 ONLINE SOCIAL NETWORKING SITES(CONT.) Groups. Most sites enable users to create and join special interest groups. Users can post messages to groups and upload shared content to the group. Certain groups are moderated. admission to such a group and postings to a group are controlled by a user designated as the group’s moderator. Other groups are unrestricted, allowing any member to join and post messages or content. 7 IS THE SOCIAL NETWORK USED IN LOCATING CONTENT? Only Orkut is a “pure” social networking site, in the sense that the primary purpose of the site is finding and connecting to new users. Flickr, YouTube, and LiveJournal are used for sharing photographs, videos, and blogs, respectively. 8 WHY STUDY SOCIAL NETWORKS? Are already at the heart of some very popular Web sites. Play an important role in future personal and commercial online interaction. Help us understand the impact of online social networks on the future Internet. We speculate how our data might be of interest to researchers in other disciplines. 9 SHARED INTEREST AND TRUST Adjacent users in a social network tend to trust each other. A number of research systems have been proposed to exploit this trust. Adjacent users in a social network also tend to have common interests. Users browse neighboring regions of their social network because they are likely to find content that is of interest to them. 10 IMPACT ON FUTURE INTERNET Impact on future Internet Impact on other disciplines Sociologists can examine our data to test existing theories Studying the structure of online social networks may help improve the understanding of online campaigning and viral marketing. Political campaigns have realized the importance of blogs in elections. 11 HOW TO GET DATASETS? Sites reluctant to give out data Cannot enumerate user list Performed crawls of user graph Crawled using cluster of 58 machines Used APIs where available Otherwise, used HTML screen scraping 12 CHALLENGES IN CRAWLING LARGE GRAPHS Need to crawl quickly Underlying social networks changing rapidly Need to crawl completely Social networks aren’t necessarily connected, some users have no links, or small clusters Need to estimate the crawl coverage 13 HOW TO VERIFY SAMPLES Obtain a random user sample Conduct a crawl using these random users as seeds See if these random nodes connect to the original WCC (weakly connected component) 14 DATASET FROM FLICKR Used API to conduct the crawl Obtained random users by guessing usernames (########@N00) to evaluate coverage Covered 27% of user population, but remaining users have very few links 15 DATASET FROM LIVEJOURNAL Used API to conduct the crawl Obtained random users using special URL http://www.livejournal.com/random.bml Crawl covered 95% of user population 16 DATASET FROM ORKUT Used HTML screen-scraping to conduct the crawl 17 DATASET FROM YOUTUBE Used API to conduct the crawl Could not obtain random users Usernames user-specified strings Unable to estimate fraction of users covered 18 HIGH-LEVEL DATA CHARACTERISTICS Flickr LiveJournal Orkut YouTube Number of Users 1.8M 5.3M 3.1M 1.2M Avg. friends per User 12.2 17.0 106 4.3 Number of User Groups 0.1M 7.5M 8.7M 30K Avg. Group per User 4.6 21.2 106 0.25 Metrics vary by orders of magnitude However, networks share many key properties 19 19 ANALYSIS OF NETWORK STRUCTURE Characterize the structural properties of the four network and compare them Link symetry Power-law node degrees Correlation of indegree and outdegree Path lengths and diameter Link degree correlations Densely connected core Tightly clustered fringe 20 20 HOW ARE THE LINKS DISTRIBUTED? Distribution of indegree and outdegree is similar ● Underlying cause is link symmetry 21 LINK SYMMETRY symmetry YouTube Flickr LiveJournal Orkut 79.1% 62.0% 73.5% 100.0% Possibly contributed by informing users of new incoming links Unlike other complex networks, such as the Web ● Sites like cnn.com receive much links more than they give makes it harder to identify reputable sources 22 22 POWER-LAW NODE DEGREES All social networks show properties consistent with power-law networks. ● The majority of the nodes have small degree, and a few nodes have significantly higher degree 23 CORRELATION OF INDEGREE AND OUTDEGREE outdegree vs. indegree in web outdegree vs. indegree in social networks PW CNN OSN 24 PATH LENGTHS AND DIAMETER network Avg.path len diameter web 16.12 905 flickr 5.67 27 livejournal 5.88 20 orkat 4.25 9 youtube 5.10 21 all four networks have short path length from 4.25 – 5.88 25 25 LINK DEGREE CORRELATIONS Examine which users tend to connect to each other Focus on: Joint degree distribution How often nodes of different degrees connect to each other Scale free behavior A value calculated directly from the joint degree distribution of graph Assortativity A measure of the likelihood for nodes to connect to other nodes with similar degrees 26 JOINT DEGREE DISTRIBUTION AND SCALE-FREE BEHAVIOUR 27 DENSELY CONNECTED CORE comprising of between 1% and 10% of the highest degree nodes removing 10% of core nodes results in breaking up graph into millions of very small SCCs graphs below show results as nodes are removed starting with highest-degree nodes (left) and path length as graph is constructed beginning with highest-degree nodes(right) Sub logarithmic growth 28 28 CLUSTERING COEFFICIENT Clustering coefficient C is a metric of cliquishness Number of links between friends C Number of links that could exist Online social networks are tightly clustered 10,000 times more clustered than random graphs 5-50 times more clustered than random power-law graphs 29 TIGHTLY CLUSTERED FRINGE Low-degree users show high degree of clustering Social network graphs show stronger clustering 30 GROUPS Network Groups Usage Avg. Size Avg. C Flickr 103M 21% 82 0.47 LiveJournal 7M 61% 15 0.81 Orkut 9M 13% 37 0.52 YouTube 30M 8% 10 0.34 31 31 WHAT DOES THE STRUCTURE LOOK LIKE o the networks contain a densely connected core of high-degree nodes; o and that this core links small groups of strongly clustered, lowdegree nodes at the fringes of the network. octopus 32 CONCLUSIONS Structure of OSNs is significantly different from the Web Higher degree symmetry in OSNs Much higher levels of local clustering in OSNs Privacy controls make graph crawling very difficult Pure social networks different from content sharing networks 33 Thanks 34