Building Networks from Networks
Mining Network Data to Model User Behavior
IPAM Workshop -- October 2, 2007

Personal Introduction

Relevance to Workshop

Network Flow Data
Collaborators:
• Filippo Menczer (IU, ISI/Torino)
• Alessandro Vespignani (IU, ISI/Torino)

Network Flow Data
• What is it?
• Where do you get it?
• How do you process it?
• What can it tell you?

Credit: Morehouse University
Credit: Cisco Systems

The Internet2/Abilene network

Flows are exported in Cisco’s netflow-v5 format and anonymized before being written to disk.

Data Dimensions
• Abilene on April 14, 2005
  – About 200 terabytes of data exchanged
  – This is roughly 25,000 DVDs of information
• 600 million flow records
  – Almost 28 gigabytes on disk
  – 15 million unique hosts involved

A flow is an edge.

Weighted Bipartite Digraph

Port 80 (Web)
Port 6346 (Gnutella)
Port 25 (Mail)
Port 19101 (???)

Application Correlation
• Consider the out-strength of a client in the networks for ports p and q.
• Build a pair of vectors from the distribution of strength values.
• Examine the cosine similarity of the vectors, σ(p, q).
• When σ = 0, applications p and q are never used together.
• When σ = 1, applications p and q are always used together, and to the same extent.
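
The slides' own formulas for the strength vectors did not survive in this copy, so the following is only a minimal sketch of the cosine-similarity step under stated assumptions. It assumes each port's vector has one component per client, holding that client's out-strength (for example, bytes sent) in that port's network, with 0 for clients that never used the port. The function names and the toy data are hypothetical and are not taken from the talk.

```python
import math


def strength_vector(strengths, clients):
    """Out-strengths in a fixed client order (0.0 if the client never used the port)."""
    return [float(strengths.get(c, 0.0)) for c in clients]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; defined here as 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def application_similarity(strengths_p, strengths_q):
    """sigma(p, q) for two ports, each given as a dict {client: out-strength}."""
    # Assumption: vectors are indexed by the union of clients seen on either port.
    clients = sorted(set(strengths_p) | set(strengths_q))
    return cosine_similarity(
        strength_vector(strengths_p, clients),
        strength_vector(strengths_q, clients),
    )


# Hypothetical toy data: out-strength per anonymized client.
web = {"a": 10.0, "b": 20.0}
mail = {"c": 5.0}
p2p = {"a": 10.0, "b": 20.0}

print(application_similarity(web, mail))  # disjoint client sets
print(application_similarity(web, p2p))   # identical usage patterns
```

With disjoint client sets the similarity is 0, and with identical usage it is 1 up to floating-point rounding. Under this construction, cosine similarity ignores overall scale, so two ports whose client strengths are proportional would also score 1.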

Clustering Applications
• We now have σ(p, q) for every pair of ports.
• Convert these similarities into distances.
• If σ = 0, then d is large; if σ = 1, then d = 0.
• Now apply Ward’s hierarchical clustering algorithm.

Next Stop: Behavioral Web Data (Clicks)

Behavioral Web Data
Collaborators:
• Filippo Menczer (IU, ISI/Torino)
• Santo Fortunato (ISI/Torino)
• Alessandro Vespignani (IU, ISI/Torino)
• Alessandro Flammini (IU)

Thanks to my collaborators!

Flow Analysis
• Filippo Menczer (IU, ISI/Torino)
• Alessandro Vespignani (IU, ISI/Torino)

Click Analysis
• Filippo Menczer (IU, ISI/Torino)
• Santo Fortunato (ISI/Torino)
• Alessandro Vespignani (IU, ISI/Torino)
• Alessandro Flammini (IU)

Thank you!