Chapter 10
Link Analysis

Data Mining Techniques So Far…
• Chapter 5 – Statistics
• Chapter 6 – Decision Trees
• Chapter 7 – Neural Networks
• Chapter 8 – Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering
• Chapter 9 – Market Basket Analysis and Association Rules

Introduction
• Airline Route Maps are useful
• Hyperlinks were revolutionary
  – Apple’s HyperCard (Bill Atkinson)
• Claim that there are no more than 6 degrees of separation between any two people on the planet
• Link Analysis is the data mining technique that addresses relationships and connections
• Link Analysis is based on Graph Theory

Introduction
• As you would expect, Link Analysis has its limitations as a DM technique also
• However, quite effective in these and similar situations
  – Identifying authoritative sources of information on the WWW by analyzing page links
  – Understanding physician referral patterns
  – Analyzing telephone call patterns

Basic Graph Theory
• Graphs are an abstraction used to represent relationships
• Graphs consist of
  – Nodes (vertices) which are the things in the graph that have relationships
  – Edges are pairs of nodes connected by a relationship
• Visualization is a key characteristic of a graph

Basic Graph Theory
• A path is an ordered sequence of nodes connected by edges
  – Flight Segments (legs) such as LA – Denver – Boston
• A weighted graph is one in which the edges have weights associated with them
  – Example: Weights support the association between two products being purchased together

Graph Theory Classic Problems
1. Finding a path in the graph that visits every edge exactly one time (Seven Bridges – edges are bridges and nodes are land)
2. Finding the shortest path that visits the nodes in the graph exactly one time (Traveling Salesman)
  – Completely connected graph with n nodes has n! (n factorial) unique paths that contain all nodes (5! = 5 * 4 * 3 * 2 * 1 = 120)

Directed vs Undirected Graphs
• Undirected graphs – edges between nodes go in both directions (A to B; B to A)
• Directed graphs – edges between nodes only go in one direction (A to B is different from B to A)
  – Ex: WWW

Google – Directed Graph Example
• Web pages = nodes
• Hyperlinks = edges
• Spiders & Web crawlers updating
• Kleinberg’s Algorithm
  – Hub – a page that links to many authorities
  – Authority – a page that is linked to by many hubs

Google – example continued
• Authority versus mere popularity
  – Rank by number of unrelated sites linking to a site yields popularity
  – Rank by number of subject-related hubs that point to them yields authority
  – Helps to overcome the situation that often arises in popularity where the real authority (e.g., Home Page) is ranked lower because of lack of popularity of links to it

Examples of Link Analysis
• Recent Int’l Data Mining Conference
  – http://www.siam.org/meetings/sdm04/
• Chapter10-Example1.pdf
• Chapter10-Example2.pdf
• Chapter10-Example3.pdf
• Megaputer (PolyAnalyst vendor) page:
  – http://www.megaputer.com/products/pa/algorithms/la.php3

End of Chapter 10
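
Supplement: Hubs and Authorities Sketch
The hub and authority definitions on the Kleinberg's Algorithm slide are circular: a good hub links to good authorities, and a good authority is linked to by good hubs. One common way to resolve this is to update both scores repeatedly until they settle. The sketch below is a minimal illustration of that idea, not the chapter's own implementation. The page names (hub1, authA, and so on), the function names hits and normalize, and the fixed iteration count are all assumptions made for the example.

```python
import math


def normalize(scores):
    """Scale scores so their squared values sum to 1 (leave all-zero input alone)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Iteratively score pages as hubs and authorities.

    links maps each page (node) to the list of pages it links to
    (directed edges). Returns two dicts: hub scores and authority scores.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        new_auth = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub score: sum of the authority scores of pages linked to.
        new_hub = {
            page: sum(new_auth[dst] for dst in links.get(page, []))
            for page in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Hypothetical toy web: three hub pages pointing at two authority pages.
links = {
    "hub1": ["authA", "authB"],
    "hub2": ["authA", "authB"],
    "hub3": ["authA"],
    "authA": [],
    "authB": [],
}
hub, auth = hits(links)
for page in sorted(auth, key=auth.get, reverse=True):
    print(page, round(auth[page], 3), round(hub[page], 3))
```

In this toy graph authA is linked to by all three hubs and authB by only two, so authA ends up with the higher authority score. Likewise hub1 and hub2 outrank hub3 as hubs because they point to both authorities. Normalizing after each pass only keeps the numbers from growing without bound; it does not change the ranking.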