Download Social Networks and PageRank

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Part I: Web Structure Mining
Chapter 2: Hyperlink Based Ranking
•
•
•
•
•
Social Network Analysis
PageRank
Authorities and Hubs
Link Based Similarity Search
Enhanced Techniques for Page Ranking
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
1
Social Networks
• Directed graph with weights assigned to its edges
• Nodes represent documents and the edges – citations
from one document to other documents.
• Prestige can be associated with the number of input
edges to a node (in-degree).
• Prestige has a recursive nature. depends on the
authority (or again, the prestige) of citations
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
2
Social Networks
• adjacency matrix A
– if document cites document A(u , v)  1
– otherwise A(u , v)  0
• prestige score
p(u)  v A(v, u ) p(v)
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
3
Social Networks
• Computing prestige
T

P A P
• Eigen decomposition
P A P
T
– Eigenvector P
– Eigenvalue 
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
4
Social Networks
a
0.548
c
0.726
b
0.413
 0.548   0 0 1  0.548 

 


1.325  0.414   1 0 0   0.414 
 0.726  1 1 0   0.726 

 


0 1 1 


A   0 0 1
1 0 0 


 0 0 1


T
A  1 0 0 
1 1 0 


 P  AT P
  1.325
P  0.548 0.414 0.726
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
T
5
Social Networks
Power Iteration
• P  P0
• Loop:
QP
P  A TQ
P
• While
1
P
P
P Q  
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
6
PageRank
• “Random web surfer” keeps clicking on
hyperlinks at random with uniform probability
• Implements random walk on the web graph
• Page u links to N u web pages
• Probability of visiting page v will be 1 N u
• Amount of prestige that page v receives from
page u is 1 N u of the prestige of u
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
7
PageRank
Propagation of page rank R (u )
10
20
20
20
60
20
30
26
4
2
13
13
N v  w A(v, w)
6
12
A(v, u ) R(v)
R(u )   
Nv
v
6
6
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
8
PageRank
Calculation of page rank
 0 0.5 0.5 


A  0 0 1
1 0 0 


a
0.4
0.4
0.2
0.2
c
0.4
0.2
b
0.2
PT  0.666 0.333 0.666
Norm X
2
 x12  x22  ...  xn2
 P  AT P
 0.4   0 0 1   0.4 
  
 
 0.2    0.5 0 0   0.2 
 0.4   0.5 1 0   0.4 
  
 
Norm
X 1  x1 x 2 ...  xn
PT  2 1 2
Integers
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, 2007.
Slides for Chapter 1: Information Retrieval an Web Search
9
Related documents