Computation, Information, and Intelligence (COMS/ENGRI/INFO/COGST 172), Fall 2005
10/17/05: Lecture 22 aid: PageRank and hubs ’n’ authorities

Agenda: Further discussion of PageRank and the hubs-and-authorities algorithm.

Announcements: An organizational suggestion: if you haven’t already done so, it might be worthwhile to acquire a three-ring binder and a three-hole punch. Then, if one took notes for each lecture on loose-leaf paper, one could keep one’s lecture notes appropriately interleaved with each lecture’s handouts in the binder.

I. Reminder: the main PageRank equation

Let $\epsilon$ be some number between 0 and 1, and let $n$ be the number of documents.

$$\mathrm{score}^{(t+1)}(d_j) = \frac{\epsilon}{n} + (1 - \epsilon) \sum_{d \text{ pointing to } d_j} \frac{\mathrm{score}^{(t)}(d)}{\mathrm{outdegree}(d)}$$

II. Some facts about probabilities

• The probability of a non-impossible event e1 happening and then an event e2 happening is the probability that e1 happens times the probability that e2 happens given that e1 happened.
• The probability of either (but not both) of two mutually exclusive alternative events e1 and e2 happening is the probability of e1 happening plus the probability of e2 happening.
• The sum of the probabilities over all possible mutually exclusive alternatives for a given probabilistic choice must be 1.

III. The “random surfer” model

Upon arriving at a document, the user either chooses to follow an existing hyperlink or to randomly jump to any document on the Web. The two cases have probability $(1 - \epsilon)$ and $\epsilon$, respectively (note that these sum to 1), and in either case, the choice among the alternatives that then result is made uniformly at random. We then interpret $\mathrm{score}^{(t)}(d_j)$ as the probability that the surfer is at document $d_j$ at time $t$.

IV. Document set for comparison calculations

We have to use a slightly different example than that from last time’s handout because there’s a slight but annoying technical problem in applying PageRank (even with “mini-links”/random jumps) when there are documents with zero out-degree.

[Figure: link graph over the documents V, W, X, Y, and Z.]
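The update rule in Section I and the random-surfer reading in Section III can be sketched in a few lines of Python. This is a minimal sketch, not the code behind the handout's plots: the three-document graph `links` is invented for illustration (it is not the handout's graph over V through Z), and the function name `pagerank_step` is hypothetical. Every document in the made-up graph has at least one outgoing link, matching the restriction described in Section IV.

```python
def pagerank_step(scores, links, eps):
    """Apply the main PageRank equation once to every document.

    scores: dict mapping document -> score at time t
    links:  dict mapping document -> list of documents it points to
            (assumed non-empty for every document)
    eps:    probability that the surfer makes a random jump
    """
    n = len(links)
    # Every document receives eps/n from random jumps.
    new_scores = {d: eps / n for d in links}
    for d, targets in links.items():
        # d passes on (1 - eps) of its score, split evenly over its out-links.
        share = (1 - eps) * scores[d] / len(targets)
        for dj in targets:
            new_scores[dj] += share
    return new_scores


# Hypothetical graph, invented for this sketch.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
eps = 0.15

# Start the surfer at a uniformly random document.
scores = {d: 1 / len(links) for d in links}
for _ in range(80):
    scores = pagerank_step(scores, links, eps)

total = sum(scores.values())
for d in sorted(scores):
    print(d, round(scores[d], 4))
```

Each step hands out a total of ε through random jumps and (1 − ε) of the existing scores through links, so the scores keep summing to 1, in line with the third fact in Section II.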
V. PageRank results

$\epsilon = 0.15$.

[Figure: "PageRank scores, epsilon=.15". Score (vertical axis, 0 to 0.5) versus iteration (horizontal axis, 0 to 80), with one curve for each of V, W, X, Y, and Z.]

VI. Hubs-and-authorities results

Since we are only interested in relative comparisons, we have not aligned the axes of the three plots.

[Figure: "Authority scores". Score versus iteration (0 to 16), with one curve for each of V, W, X, Y, and Z.]

[Figure: "Hub scores". Score versus iteration (0 to 16), with one curve for each of V, W, X, Y, and Z.]
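For comparison with PageRank, the kind of iteration that produces hub and authority curves like those in Section VI can be sketched as follows. The sketch assumes the usual formulation of the algorithm: a document's authority score is the sum of the hub scores of the documents pointing to it, and its hub score is the sum of the authority scores of the documents it points to. Rescaling each set of scores to sum to 1 after every round is also an assumption, since the normalization behind the handout's plots is not stated here. The graph `links` and the name `hits_step` are hypothetical, and the graph is not the handout's example.

```python
def hits_step(hubs, links):
    """One round of hub/authority updates on a link graph.

    hubs:  dict mapping document -> current hub score
    links: dict mapping document -> list of documents it points to
    Returns (authorities, new_hubs), each rescaled to sum to 1.
    """
    # Authority score: sum of the hub scores of the documents pointing here.
    auths = {d: 0.0 for d in links}
    for d, targets in links.items():
        for dj in targets:
            auths[dj] += hubs[d]
    # Hub score: sum of the authority scores of the documents pointed to.
    new_hubs = {d: sum(auths[dj] for dj in targets)
                for d, targets in links.items()}
    # Assumed normalization: rescale so only relative sizes matter.
    a_total = sum(auths.values())
    h_total = sum(new_hubs.values())
    auths = {d: s / a_total for d, s in auths.items()}
    new_hubs = {d: s / h_total for d, s in new_hubs.items()}
    return auths, new_hubs


# Hypothetical graph, invented for this sketch.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

# Start with every document having the same hub score.
hubs = {d: 1.0 for d in links}
for _ in range(16):
    auths, hubs = hits_step(hubs, links)

for d in sorted(links):
    print(d, round(auths[d], 4), round(hubs[d], 4))
```

On this made-up graph, the document with the most incoming links ends up as the strongest authority and the document with the most outgoing links ends up as the strongest hub. Only the ordering of the scores is meaningful, which is the same reason the handout does not align the axes of its plots.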