Download lecture aid

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Statistics wikipedia , lookup

History of statistics wikipedia , lookup

Birthday problem wikipedia , lookup

Ars Conjectandi wikipedia , lookup

Inductive probability wikipedia , lookup

Probability interpretations wikipedia , lookup

Probability wikipedia , lookup

Transcript
Computation, Information, and Intelligence (COMS/ENGRI/INFO/COGST 172), Fall 2005
10/17/05: Lecture 22 aid — PageRank and hubs ’n’ authorities
Agenda: Further discussion of PageRank and the hubs-and-authorities algorithm.
Announcements: An organizational suggestion: if you haven’t already done so, it might be
worthwhile to acquire a three-ring binder and a three-hole punch. Then, if one took notes for each
lecture on loose-leaf paper, one could keep one’s lecture notes appropriately interleaved with each
lecture’s handouts in the binder.
I.
Reminder: the main PageRank equation Let ² be some number between 0 and 1.
score(t+1) (dj ) =
II.
²
+ (1 − ²)
n
X
d
pointing to
dj
score(t) (d)
.
outdegree(d)
Some facts about probabilities
• The probability of a non-impossible event e1 happening and then an event e2 happening is the
probability that e1 happens times the probability that e2 happens given that e1 happened.
• The probability of either (but not both) of two mutually exclusive alternative events e1 and
e2 happening is the probability of e1 happening plus the probability of e2 happening.
• The sum of the probabilities over all possible mutually exclusive alternatives for a given
probabilistic choice must be 1.
III. The “random surfer” model Upon arriving at a document, the user either chooses to
follow an existing hyperlink or to randomly jump to any document on the Web. The two cases
have probability (1 − ²) and ², respectively (note that these sum to 1), and in either case, the choice
among alternatives that then result is made uniformly at random.
We then interpret score(t) (dj ) as the probability that the surfer is at document dj at time t.
(OVER)
IV. Document set for comparison calculations We have to use a slightly different example
than that from last time’s handout because there’s a slight but annoying technical problem in
applying PageRank (even with “mini-links”/random jumps) when there are documents with zero
out-degree.
W
Y
X
V.
Z
V
PageRank results ² = 0.15.
PageRank scores, epsilon=.15
0.5
V
W
X
Y
Z
0.45
0.4
0.35
0.3
0.25
0.2
0.15
0.1
0.05
0
0
10
20
30
40
iteration
50
60
70
80
VI. Hubs-and-authorities results Since we are only interested in relative comparisons, we
have not aligned the axes of the three plots.
Authority scores
Hub scores
0.9
V
W
X
Y
Z
0.8
V
W
X
Y
Z
0.5
0.7
0.4
0.6
0.5
0.3
0.4
0.2
0.3
0.2
0.1
0.1
0
0
0
2
4
6
8
iteration
10
12
14
16
0
2
4
6
8
iteration
10
12
14
16