Download Social Computing and Big Data Analytics

Document related concepts
no text concepts found
Transcript
Tamkang
University
Social Computing and
Big Data Analytics
(社群運算大數據分析)
Time: 2016/04/18 (14:00-16:00)
Place: China Airlines
Min-Yuh Day
戴敏育
Assistant Professor
專任助理教授
Dept. of Information Management, Tamkang University
淡江大學 資訊管理學系
http://mail. tku.edu.tw/myday/
2016-04-18
1
戴敏育 博士
(Min-Yuh Day, Ph.D.)
淡江大學資管系專任助理教授
中央研究院資訊科學研究所訪問學人
國立台灣大學資訊管理博士
Publications Co-Chairs, IEEE/ACM International Conference on
Advances in Social Networks Analysis and Mining (ASONAM 2013- )
Program Co-Chair, IEEE International Workshop on
Empirical Methods for Recognizing Inference in TExt (IEEE EM-RITE 2012- )
Workshop Chair, The IEEE International Conference on
Information Reuse and Integration (IEEE IRI)
2
Outline
• Social Computing and
Big Data Analytics
• Analyzing the Social Web:
Social Network Analysis
3
Business Insights
with
Social Analytics
4
Social Computing
•Social Network Analysis
•Link mining
•Community Detection
•Social Recommendation
5
Internet Evolution
Internet of People (IoP): Social Media
Internet of Things (IoT): Machine to Machine
Source: Marc Jadoul (2015), The IoT: The next step in internet evolution, March 11, 2015
http://www2.alcatel-lucent.com/techzine/iot-internet-of-things-next-step-evolution/
6
Big Data
Analytics
and
Data Mining
7
Stephan Kudyba (2014),
Big Data, Mining, and Analytics:
Components of Strategic Decision Making, Auerbach Publications
Source: http://www.amazon.com/gp/product/1466568704
8
Architecture of Big Data Analytics
Big Data
Sources
Big Data
Transformation
Middleware
* Internal
* External
* Multiple
formats
* Multiple
locations
* Multiple
applications
Big Data
Platforms & Tools
Raw
Data
Hadoop
Transformed MapReduce
Pig
Data
Extract
Hive
Transform
Jaql
Load
Zookeeper
Hbase
Data
Cassandra
Warehouse
Oozie
Avro
Mahout
Traditional
Others
Format
CSV, Tables
Big Data
Analytics
Applications
Queries
Big Data
Analytics
Reports
OLAP
Data
Mining
Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications
9
Architecture of Big Data Analytics
Big Data
Sources
* Internal
* External
* Multiple
formats
* Multiple
locations
* Multiple
applications
Big Data
Transformation
Big Data
Platforms & Tools
Data Mining
Big Data
Analytics
Applications
Middleware
Raw
Data
Hadoop
Transformed MapReduce
Pig
Data
Extract
Hive
Transform
Jaql
Load
Zookeeper
Hbase
Data
Cassandra
Warehouse
Oozie
Avro
Mahout
Traditional
Others
Format
CSV, Tables
Big Data
Analytics
Applications
Queries
Big Data
Analytics
Reports
OLAP
Data
Mining
Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications
10
Social Big Data Mining
(Hiroshi Ishikawa, 2015)
Source: http://www.amazon.com/Social-Data-Mining-Hiroshi-Ishikawa/dp/149871093X
11
Architecture for
Social Big Data Mining
(Hiroshi Ishikawa, 2015)
Enabling Technologies
• Integrated analysis model
Analysts
Integrated analysis
• Model Construction
• Explanation by Model
Conceptual Layer
•
•
•
•
Natural Language Processing
Information Extraction
Anomaly Detection
Discovery of relationships
among heterogeneous data
• Large-scale visualization
• Parallel distrusted processing
Data
Mining
Multivariate
analysis
Application
specific task
Logical Layer
Software
• Construction and
confirmation
of individual
hypothesis
• Description and
execution of
application-specific
task
Social Data
Hardware
Physical Layer
Source: Hiroshi Ishikawa (2015), Social Big Data Mining, CRC Press
12
Business Intelligence (BI) Infrastructure
Source: Kenneth C. Laudon & Jane P. Laudon (2014), Management Information Systems: Managing the Digital Firm, Thirteenth Edition, Pearson.
13
Analyzing the Social Web:
Social Network Analysis
14
Jennifer Golbeck (2013), Analyzing the Social Web, Morgan Kaufmann
Source: http://www.amazon.com/Analyzing-Social-Web-Jennifer-Golbeck/dp/0124055311
15
Social Network Analysis (SNA)
Facebook TouchGraph
16
Social Network Analysis
Source: http://www.fmsasg.com/SocialNetworkAnalysis/
17
Social Network Analysis
• A social network is a social structure of
people, related (directly or indirectly) to each
other through a common relation or interest
• Social network analysis (SNA) is the study of
social networks to understand their structure
and behavior
Source: (c) Jaideep Srivastava, [email protected], Data Mining for Social Network Analysis
18
Social Network Analysis (SNA)
Centrality
Prestige
19
Degree
C
A
D
B
E
Source: https://www.youtube.com/watch?v=89mxOdwPfxA
20
Degree
C
A
D
B
E
Source: https://www.youtube.com/watch?v=89mxOdwPfxA
A: 2
B: 4
C: 2
D:1
E: 1
21
Density
C
A
D
B
E
Source: https://www.youtube.com/watch?v=89mxOdwPfxA
22
Density
Edges (Links): 5
Total Possible Edges: 10
Density: 5/10 = 0.5
C
A
D
B
E
Source: https://www.youtube.com/watch?v=89mxOdwPfxA
23
Density
A
E
I
C
G
B
D
F
H
J
Nodes (n): 10
Edges (Links): 13
Total Possible Edges: (n * (n-1)) / 2 = (10 * 9) / 2 = 45
Density: 13/45 = 0.29
24
Which Node is Most Important?
A
E
I
C
G
B
D
F
H
J
25
Centrality
• Important or prominent actors are those that
are linked or involved with other actors
extensively.
• A person with extensive contacts (links) or
communications with many other people in
the organization is considered more important
than a person with relatively fewer contacts.
• The links can also be called ties.
A central actor is one involved in many ties.
Source: Bing Liu (2011) , “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data”
26
Social Network Analysis (SNA)
• Degree Centrality
• Betweenness Centrality
• Closeness Centrality
27
Degree
Centrality
28
Social Network Analysis:
Degree Centrality
A
E
I
C
G
B
D
F
H
J
29
Social Network Analysis:
Degree Centrality
Node Score
A
E
I
C
G
B
D
F
H
J
A
B
C
D
E
2
2
5
3
3
F
G
H
2
4
3
I
J
1
1
Standardized
Score
2/10 = 0.2
2/10 = 0.2
5/10 = 0.5
3/10 = 0.3
3/10 = 0.3
2/10 = 0.2
4/10 = 0.4
3/10 = 0.3
1/10 = 0.1
1/10 = 0.1
30
Betweenness
Centrality
31
Betweenness centrality:
Connectivity
Number of shortest paths
going through the actor
32
Betweenness Centrality
C B i    g ik i  / g jk
jk
Where gjk = the number of shortest paths connecting jk
gjk(i) = the number that actor i is on.
Normalized Betweenness Centrality
C'B i   CB i  /[( n 1)(n  2) / 2]
Number of pairs of vertices
excluding the vertex itself
Source: https://www.youtube.com/watch?v=RXohUeNCJiU
33
Betweenness Centrality
C
A
D
B
E
A:
BC: 0/1 = 0
BD: 0/1 = 0
BE: 0/1 = 0
CD: 0/1 = 0
CE: 0/1 = 0
DE: 0/1 = 0
Total:
0
A: Betweenness Centrality = 0
34
Betweenness Centrality
C
A
D
B
E
B:
AC: 0/1 = 0
AD: 1/1 = 1
AE: 1/1 = 1
CD: 1/1 = 1
CE: 1/1 = 1
DE: 1/1 = 1
Total:
5
B: Betweenness Centrality = 5
35
Betweenness Centrality
C
A
D
B
E
C:
AB: 0/1 = 0
AD: 0/1 = 0
AE: 0/1 = 0
BD: 0/1 = 0
BE: 0/1 = 0
DE: 0/1 = 0
Total:
0
C: Betweenness Centrality = 0
36
Betweenness Centrality
C
A
D
B
E
A: 0
B: 5
C: 0
D: 0
E: 0
37
Which Node is Most Important?
F
G
H
B
E
A
C
D
I
J
F
H
B
E
A
C
D
J
38
Which Node is Most Important?
F
G
H
B
E
A
C
D
F
I
J
G
H
E
A
I
D
J
39
Betweenness Centrality
C B i    g ik i  / g jk
jk
B
E
A
C
D
40
Betweenness Centrality
C
A
D
B
E
A:
BC: 0/1 = 0
BD: 0/1 = 0
BE: 0/1 = 0
CD: 0/1 = 0
CE: 0/1 = 0
DE: 0/1 = 0
Total:
0
A: Betweenness Centrality = 0
41
Closeness
Centrality
42
Social Network Analysis:
Closeness Centrality
A
E
I
C
G
B
D
F
H
J
CA:
CB:
CD:
CE:
CF:
CG:
CH:
CI:
CJ:
1
1
1
1
2
1
2
3
3
Total=15
C: Closeness Centrality = 15/9 = 1.67
43
Social Network Analysis:
Closeness Centrality
A
E
I
C
G
B
D
F
H
J
GA:
GB:
GC:
GD:
GE:
GF:
GH:
GI:
GJ:
2
2
1
2
1
1
1
2
2
Total=14
G: Closeness Centrality = 14/9 = 1.56
44
Social Network Analysis:
Closeness Centrality
A
E
I
C
G
B
D
F
H
J
HA:
HB:
HC:
HD:
HE:
HF:
HG:
HI:
HJ:
3
3
2
2
2
2
1
1
1
Total=17
H: Closeness Centrality = 17/9 = 1.89
45
Social Network Analysis:
Closeness Centrality
A
E
I
C
G
B
D
F
H
J
G: Closeness Centrality = 14/9 = 1.56 1
C: Closeness Centrality = 15/9 = 1.67 2
H: Closeness Centrality = 17/9 = 1.89 3
46
Social Network Analysis
(SNA) Tools
• NetworkX
• igraph
• Gephi
• UCINet
• Pajek
47
Gephi
The Open Graph Viz Platform
Source: https://gephi.org/
48
Application of SNA
Social Network Analysis
of
Research Collaboration
in
Information Reuse and Integration
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
49
Example of SNA Data Source
Source: http://www.informatik.uni-trier.de/~ley/db/conf/iri/iri2010.html
50
Research Question
• RQ1: What are the
scientific collaboration patterns
in the IRI research community?
• RQ2: Who are the
prominent researchers
in the IRI community?
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
51
Methodology
• Developed a simple web focused crawler program to
download literature information about all IRI papers
published between 2003 and 2010 from IEEE Xplore
and DBLP.
– 767 paper
– 1599 distinct author
• Developed a program to convert the list of coauthors
into the format of a network file which can be
readable by social network analysis software.
• UCINet and Pajek were used in this study for the
social network analysis.
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
52
Top10 prolific authors
(IRI 2003-2010)
1. Stuart Harvey Rubin
2. Taghi M. Khoshgoftaar
3. Shu-Ching Chen
4. Mei-Ling Shyu
5. Mohamed E. Fayad
6. Reda Alhajj
7. Du Zhang
8. Wen-Lian Hsu
9. Jason Van Hulse
10. Min-Yuh Day
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
53
Data Analysis and Discussion
• Closeness Centrality
– Collaborated widely
• Betweenness Centrality
– Collaborated diversely
• Degree Centrality
– Collaborated frequently
• Visualization of Social Network Analysis
– Insight into the structural characteristics of
research collaboration networks
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
54
Top 20 authors with the highest closeness scores
Rank
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
ID
3
1
4
6
61
260
151
19
1043
1027
443
157
253
1038
959
957
956
955
943
960
Closeness
0.024675
0.022830
0.022207
0.020013
0.019700
0.018936
0.018230
0.017962
0.017962
0.017962
0.017448
0.017082
0.016731
0.016618
0.016285
0.016285
0.016285
0.016285
0.016285
0.016071
Author
Shu-Ching Chen
Stuart Harvey Rubin
Mei-Ling Shyu
Reda Alhajj
Na Zhao
Min Chen
Gordon K. Lee
Chengcui Zhang
Isai Michel Lombera
Michael Armella
James B. Law
Keqi Zhang
Shahid Hamid
Walter Z. Tang
Chengjun Zhan
Lin Luo
Guo Chen
Xin Huang
Sneh Gulati
Sheng-Tun Li
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
55
Top 20 authors with the highest betweeness scores
Rank
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
ID
1
3
2
66
4
6
65
19
39
15
31
151
7
30
41
270
5
110
106
8
Betweenness
0.000752
0.000741
0.000406
0.000385
0.000376
0.000296
0.000256
0.000194
0.000185
0.000107
0.000094
0.000094
0.000085
0.000072
0.000067
0.000060
0.000043
0.000042
0.000042
0.000042
Author
Stuart Harvey Rubin
Shu-Ching Chen
Taghi M. Khoshgoftaar
Xingquan Zhu
Mei-Ling Shyu
Reda Alhajj
Xindong Wu
Chengcui Zhang
Wei Dai
Narayan C. Debnath
Qianhui Althea Liang
Gordon K. Lee
Du Zhang
Baowen Xu
Hongji Yang
Zhiwei Xu
Mohamed E. Fayad
Abhijit S. Pandya
Sam Hsu
Wen-Lian Hsu
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
56
Top 20 authors with the highest degree scores
Rank
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
ID
3
1
2
6
8
10
4
17
14
16
40
15
9
25
28
24
23
5
19
18
Degree
0.035044
0.034418
0.030663
0.028786
0.028786
0.024406
0.022528
0.021277
0.017522
0.017522
0.016896
0.015645
0.015019
0.013767
0.013141
0.013141
0.013141
0.013141
0.012516
0.011890
Author
Shu-Ching Chen
Stuart Harvey Rubin
Taghi M. Khoshgoftaar
Reda Alhajj
Wen-Lian Hsu
Min-Yuh Day
Mei-Ling Shyu
Richard Tzong-Han Tsai
Eduardo Santana de Almeida
Roumen Kountchev
Hong-Jie Dai
Narayan C. Debnath
Jason Van Hulse
Roumiana Kountcheva
Silvio Romero de Lemos Meira
Vladimir Todorov
Mariofanna G. Milanova
Mohamed E. Fayad
Chengcui Zhang
Waleed W. Smari
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
57
Visualization of IRI (IEEE IRI 2003-2010)
co-authorship network (global view)
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
58
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
59
Visualization of Social Network Analysis
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
60
Visualization of Social Network Analysis
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
61
Visualization of Social Network Analysis
Source: Min-Yuh Day, Sheng-Pao Shih, Weide Chang (2011),
"Social Network Analysis of Research Collaboration in Information Reuse and Integration"
62
Summary
• Social Computing and
Big Data Analytics
• Analyzing the Social Web:
Social Network Analysis
63
References
• Jiawei Han and Micheline Kamber (2011),
Data Mining: Concepts and Techniques, Third Edition, Elsevier
• Jennifer Golbeck (2013),
Analyzing the Social Web, Morgan Kaufmann
• Stephan Kudyba (2014),
Big Data, Mining, and Analytics: Components of Strategic
Decision Making, Auerbach Publications
• Hiroshi Ishikawa (2015),
Social Big Data Mining, CRC Press
64
Q&A
Tamkang
University
Social Computing and
Big Data Analytics
(社群運算大數據分析)
Time: 2016/04/18 (14:00-16:00)
Place: China Airlines
Min-Yuh Day
戴敏育
Assistant Professor
專任助理教授
Dept. of Information Management, Tamkang University
淡江大學 資訊管理學系
http://mail. tku.edu.tw/myday/
2016-04-18
65
Related documents