Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Tamkang University Social Computing and Big Data Analytics (社群運算大數據分析) Time: 2016/04/18 (14:00-16:00) Place: China Airlines Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University 淡江大學 資訊管理學系 http://mail. tku.edu.tw/myday/ 2016-04-18 1 戴敏育 博士 (Min-Yuh Day, Ph.D.) 淡江大學資管系專任助理教授 中央研究院資訊科學研究所訪問學人 國立台灣大學資訊管理博士 Publications Co-Chairs, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013- ) Program Co-Chair, IEEE International Workshop on Empirical Methods for Recognizing Inference in TExt (IEEE EM-RITE 2012- ) Workshop Chair, The IEEE International Conference on Information Reuse and Integration (IEEE IRI) 2 Outline • Social Computing and Big Data Analytics • Analyzing the Social Web: Social Network Analysis 3 Business Insights with Social Analytics 4 Social Computing •Social Network Analysis •Link mining •Community Detection •Social Recommendation 5 Internet Evolution Internet of People (IoP): Social Media Internet of Things (IoT): Machine to Machine Source: Marc Jadoul (2015), The IoT: The next step in internet evolution, March 11, 2015 http://www2.alcatel-lucent.com/techzine/iot-internet-of-things-next-step-evolution/ 6 Big Data Analytics and Data Mining 7 Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications Source: http://www.amazon.com/gp/product/1466568704 8 Architecture of Big Data Analytics Big Data Sources Big Data Transformation Middleware * Internal * External * Multiple formats * Multiple locations * Multiple applications Big Data Platforms & Tools Raw Data Hadoop Transformed MapReduce Pig Data Extract Hive Transform Jaql Load Zookeeper Hbase Data Cassandra Warehouse Oozie Avro Mahout Traditional Others Format CSV, Tables Big Data Analytics Applications Queries Big Data Analytics Reports OLAP Data Mining Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications 9 Architecture of Big Data Analytics Big Data Sources * Internal * External * Multiple formats * Multiple locations * Multiple applications Big Data Transformation Big Data Platforms & Tools Data Mining Big Data Analytics Applications Middleware Raw Data Hadoop Transformed MapReduce Pig Data Extract Hive Transform Jaql Load Zookeeper Hbase Data Cassandra Warehouse Oozie Avro Mahout Traditional Others Format CSV, Tables Big Data Analytics Applications Queries Big Data Analytics Reports OLAP Data Mining Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications 10 Social Big Data Mining (Hiroshi Ishikawa, 2015) Source: http://www.amazon.com/Social-Data-Mining-Hiroshi-Ishikawa/dp/149871093X 11 Architecture for Social Big Data Mining (Hiroshi Ishikawa, 2015) Enabling Technologies • Integrated analysis model Analysts Integrated analysis • Model Construction • Explanation by Model Conceptual Layer • • • • Natural Language Processing Information Extraction Anomaly Detection Discovery of relationships among heterogeneous data • Large-scale visualization • Parallel distrusted processing Data Mining Multivariate analysis Application specific task Logical Layer Software • Construction and confirmation of individual hypothesis • Description and execution of application-specific task Social Data Hardware Physical Layer Source: Hiroshi Ishikawa (2015), Social Big Data Mining, CRC Press 12 Business Intelligence (BI) Infrastructure Source: Kenneth C. Laudon & Jane P. Laudon (2014), Management Information Systems: Managing the Digital Firm, Thirteenth Edition, Pearson. 13 Analyzing the Social Web: Social Network Analysis 14 Jennifer Golbeck (2013), Analyzing the Social Web, Morgan Kaufmann Source: http://www.amazon.com/Analyzing-Social-Web-Jennifer-Golbeck/dp/0124055311 15 Social Network Analysis (SNA) Facebook TouchGraph 16 Social Network Analysis Source: http://www.fmsasg.com/SocialNetworkAnalysis/ 17 Social Network Analysis • A social network is a social structure of people, related (directly or indirectly) to each other through a common relation or interest • Social network analysis (SNA) is the study of social networks to understand their structure and behavior Source: (c) Jaideep Srivastava, [email protected], Data Mining for Social Network Analysis 18 Social Network Analysis (SNA) Centrality Prestige 19 Degree C A D B E Source: https://www.youtube.com/watch?v=89mxOdwPfxA 20 Degree C A D B E Source: https://www.youtube.com/watch?v=89mxOdwPfxA A: 2 B: 4 C: 2 D:1 E: 1 21 Density C A D B E Source: https://www.youtube.com/watch?v=89mxOdwPfxA 22 Density Edges (Links): 5 Total Possible Edges: 10 Density: 5/10 = 0.5 C A D B E Source: https://www.youtube.com/watch?v=89mxOdwPfxA 23 Density A E I C G B D F H J Nodes (n): 10 Edges (Links): 13 Total Possible Edges: (n * (n-1)) / 2 = (10 * 9) / 2 = 45 Density: 13/45 = 0.29 24 Which Node is Most Important? A E I C G B D F H J 25 Centrality • Important or prominent actors are those that are linked or involved with other actors extensively. • A person with extensive contacts (links) or communications with many other people in the organization is considered more important than a person with relatively fewer contacts. • The links can also be called ties. A central actor is one involved in many ties. Source: Bing Liu (2011) , “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data” 26 Social Network Analysis (SNA) • Degree Centrality • Betweenness Centrality • Closeness Centrality 27 Degree Centrality 28 Social Network Analysis: Degree Centrality A E I C G B D F H J 29 Social Network Analysis: Degree Centrality Node Score A E I C G B D F H J A B C D E 2 2 5 3 3 F G H 2 4 3 I J 1 1 Standardized Score 2/10 = 0.2 2/10 = 0.2 5/10 = 0.5 3/10 = 0.3 3/10 = 0.3 2/10 = 0.2 4/10 = 0.4 3/10 = 0.3 1/10 = 0.1 1/10 = 0.1 30 Betweenness Centrality 31 Betweenness centrality: Connectivity Number of shortest paths going through the actor 32 Betweenness Centrality C B i g ik i / g jk jk Where gjk = the number of shortest paths connecting jk gjk(i) = the number that actor i is on. Normalized Betweenness Centrality C'B i CB i /[( n 1)(n 2) / 2] Number of pairs of vertices excluding the vertex itself Source: https://www.youtube.com/watch?v=RXohUeNCJiU 33 Betweenness Centrality C A D B E A: BC: 0/1 = 0 BD: 0/1 = 0 BE: 0/1 = 0 CD: 0/1 = 0 CE: 0/1 = 0 DE: 0/1 = 0 Total: 0 A: Betweenness Centrality = 0 34 Betweenness Centrality C A D B E B: AC: 0/1 = 0 AD: 1/1 = 1 AE: 1/1 = 1 CD: 1/1 = 1 CE: 1/1 = 1 DE: 1/1 = 1 Total: 5 B: Betweenness Centrality = 5 35 Betweenness Centrality C A D B E C: AB: 0/1 = 0 AD: 0/1 = 0 AE: 0/1 = 0 BD: 0/1 = 0 BE: 0/1 = 0 DE: 0/1 = 0 Total: 0 C: Betweenness Centrality = 0 36 Betweenness Centrality C A D B E A: 0 B: 5 C: 0 D: 0 E: 0 37 Which Node is Most Important? F G H B E A C D I J F H B E A C D J 38 Which Node is Most Important? F G H B E A C D F I J G H E A I D J 39 Betweenness Centrality C B i g ik i / g jk jk B E A C D 40 Betweenness Centrality C A D B E A: BC: 0/1 = 0 BD: 0/1 = 0 BE: 0/1 = 0 CD: 0/1 = 0 CE: 0/1 = 0 DE: 0/1 = 0 Total: 0 A: Betweenness Centrality = 0 41 Closeness Centrality 42 Social Network Analysis: Closeness Centrality A E I C G B D F H J CA: CB: CD: CE: CF: CG: CH: CI: CJ: 1 1 1 1 2 1 2 3 3 Total=15 C: Closeness Centrality = 15/9 = 1.67 43 Social Network Analysis: Closeness Centrality A E I C G B D F H J GA: GB: GC: GD: GE: GF: GH: GI: GJ: 2 2 1 2 1 1 1 2 2 Total=14 G: Closeness Centrality = 14/9 = 1.56 44 Social Network Analysis: Closeness Centrality A E I C G B D F H J HA: HB: HC: HD: HE: HF: HG: HI: HJ: 3 3 2 2 2 2 1 1 1 Total=17 H: Closeness Centrality = 17/9 = 1.89 45 Social Network Analysis: Closeness Centrality A E I C G B D F H J G: Closeness Centrality = 14/9 = 1.56 1 C: Closeness Centrality = 15/9 = 1.67 2 H: Closeness Centrality = 17/9 = 1.89 3 46 Social Network Analysis (SNA) Tools • NetworkX • igraph • Gephi • UCINet • Pajek 47 Gephi The Open Graph Viz Platform Source: https://gephi.org/ 48 Application of SNA Social Network Analysis of Research Collaboration in Information Reuse and Integration Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 49 Example of SNA Data Source Source: http://www.informatik.uni-trier.de/~ley/db/conf/iri/iri2010.html 50 Research Question • RQ1: What are the scientific collaboration patterns in the IRI research community? • RQ2: Who are the prominent researchers in the IRI community? Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 51 Methodology • Developed a simple web focused crawler program to download literature information about all IRI papers published between 2003 and 2010 from IEEE Xplore and DBLP. – 767 paper – 1599 distinct author • Developed a program to convert the list of coauthors into the format of a network file which can be readable by social network analysis software. • UCINet and Pajek were used in this study for the social network analysis. Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 52 Top10 prolific authors (IRI 2003-2010) 1. Stuart Harvey Rubin 2. Taghi M. Khoshgoftaar 3. Shu-Ching Chen 4. Mei-Ling Shyu 5. Mohamed E. Fayad 6. Reda Alhajj 7. Du Zhang 8. Wen-Lian Hsu 9. Jason Van Hulse 10. Min-Yuh Day Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 53 Data Analysis and Discussion • Closeness Centrality – Collaborated widely • Betweenness Centrality – Collaborated diversely • Degree Centrality – Collaborated frequently • Visualization of Social Network Analysis – Insight into the structural characteristics of research collaboration networks Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 54 Top 20 authors with the highest closeness scores Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ID 3 1 4 6 61 260 151 19 1043 1027 443 157 253 1038 959 957 956 955 943 960 Closeness 0.024675 0.022830 0.022207 0.020013 0.019700 0.018936 0.018230 0.017962 0.017962 0.017962 0.017448 0.017082 0.016731 0.016618 0.016285 0.016285 0.016285 0.016285 0.016285 0.016071 Author Shu-Ching Chen Stuart Harvey Rubin Mei-Ling Shyu Reda Alhajj Na Zhao Min Chen Gordon K. Lee Chengcui Zhang Isai Michel Lombera Michael Armella James B. Law Keqi Zhang Shahid Hamid Walter Z. Tang Chengjun Zhan Lin Luo Guo Chen Xin Huang Sneh Gulati Sheng-Tun Li Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 55 Top 20 authors with the highest betweeness scores Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ID 1 3 2 66 4 6 65 19 39 15 31 151 7 30 41 270 5 110 106 8 Betweenness 0.000752 0.000741 0.000406 0.000385 0.000376 0.000296 0.000256 0.000194 0.000185 0.000107 0.000094 0.000094 0.000085 0.000072 0.000067 0.000060 0.000043 0.000042 0.000042 0.000042 Author Stuart Harvey Rubin Shu-Ching Chen Taghi M. Khoshgoftaar Xingquan Zhu Mei-Ling Shyu Reda Alhajj Xindong Wu Chengcui Zhang Wei Dai Narayan C. Debnath Qianhui Althea Liang Gordon K. Lee Du Zhang Baowen Xu Hongji Yang Zhiwei Xu Mohamed E. Fayad Abhijit S. Pandya Sam Hsu Wen-Lian Hsu Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 56 Top 20 authors with the highest degree scores Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ID 3 1 2 6 8 10 4 17 14 16 40 15 9 25 28 24 23 5 19 18 Degree 0.035044 0.034418 0.030663 0.028786 0.028786 0.024406 0.022528 0.021277 0.017522 0.017522 0.016896 0.015645 0.015019 0.013767 0.013141 0.013141 0.013141 0.013141 0.012516 0.011890 Author Shu-Ching Chen Stuart Harvey Rubin Taghi M. Khoshgoftaar Reda Alhajj Wen-Lian Hsu Min-Yuh Day Mei-Ling Shyu Richard Tzong-Han Tsai Eduardo Santana de Almeida Roumen Kountchev Hong-Jie Dai Narayan C. Debnath Jason Van Hulse Roumiana Kountcheva Silvio Romero de Lemos Meira Vladimir Todorov Mariofanna G. Milanova Mohamed E. Fayad Chengcui Zhang Waleed W. Smari Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 57 Visualization of IRI (IEEE IRI 2003-2010) co-authorship network (global view) Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 58 Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 59 Visualization of Social Network Analysis Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 60 Visualization of Social Network Analysis Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 61 Visualization of Social Network Analysis Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011), "Social Network Analysis of Research Collaboration in Information Reuse and Integration" 62 Summary • Social Computing and Big Data Analytics • Analyzing the Social Web: Social Network Analysis 63 References • Jiawei Han and Micheline Kamber (2011), Data Mining: Concepts and Techniques, Third Edition, Elsevier • Jennifer Golbeck (2013), Analyzing the Social Web, Morgan Kaufmann • Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications • Hiroshi Ishikawa (2015), Social Big Data Mining, CRC Press 64 Q&A Tamkang University Social Computing and Big Data Analytics (社群運算大數據分析) Time: 2016/04/18 (14:00-16:00) Place: China Airlines Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University 淡江大學 資訊管理學系 http://mail. tku.edu.tw/myday/ 2016-04-18 65