Graph Mining: Introduction
Davide Mottin, Konstantina Lazaridou
Hasso Plattner Institute
Graph Mining course, Winter Semester 2016

Lecture road
§ Course Information
§ Introduction to graph mining
§ Graphs: models and basic concepts

GRAPH MINING WS 2016

Organization of the lecture
• Lecture and slides in English
• Tuesday 15.15–16.45
• Two mandatory assignments:
  ⁃ [Individual] One presentation of a paper of your choice from a list of papers on topics covered in the lectures. Two slots: 13/12 (first part) and 07/02 (second part)
  ⁃ [Individual] One small graph analytics project, to be completed before the end of the course
• Examination (in English!):
  ⁃ Oral exam (in the first three weeks after the lecture period)
• Grading scheme:
  ⁃ 20%: Presentation
  ⁃ 10%: Project
  ⁃ 70%: Exam
• Lectures will be recorded and made available online (tele-task)
• There is no official textbook for the course
• Registration is required for this lecture; notify the Studienreferat (student office) and myself: [email protected]

About the lecturers
Davide Mottin
Postdoctoral Researcher @ Knowledge Discovery and Data Mining
PhD in 2015, University of Trento
Research interests: Graph Mining, Data Mining, Graph databases, Preference models, Query paradigms

Konstantina Lazaridou
PhD Candidate @ Information Systems Research – Web Science
MSc in 2015, University of Ioannina
Research interests: Graph Mining, Social Network Analysis, Web Data Mining, Opinion and Sentiment Analysis, Data Stream Mining

Course Web site
Lecture material (slides, papers, books, tutorials, assignments, …) is available online:
https://hpi.de/en/mueller/teaching/aktuelle-vorlesung/ws1617/graph-mining.html
§ The slides are also available on the intranet!

Objectives
§ Understanding
• Where graphs are, why they are important, and what the new applications are
• The main challenges from a data mining perspective
§ Learn
• How to efficiently query and store a graph using graph mining techniques
• How to analyze networks to understand the properties and the behaviors of individuals
• How to think from a research perspective (novelty, clarity, …)
• How to solve practical problems
§ Work on real-scale data and existing tools

Prerequisites
§ Basic computer science and programming.
§ Data mining knowledge is a plus but is not strictly required.
§ Basic probability theory and linear algebra are beneficial, although a short recap of the main concepts will be given at the beginning of the lectures that require them.

Schedule (tentative)
18.10 Introduction to graph mining
25.10 Social network analysis – Diffusion
01.11 Graph querying: exact, approximate, and reachability
08.11 Frequent subgraph mining
15.11 Graph indexing
17.11 HPI-Kolloquium – Invited speaker: Prof. Danai Koutra
22.11 Node classification
29.11 Some practical graph mining frameworks – Project assignment
06.12 Link prediction
13.12 Student paper presentations [first part]
20.12 Christmas break
27.12 Christmas break
03.01 Non-overlapping communities
10.01 Overlapping communities
17.01 Anomaly detection
24.01 Graph summarization – Report handover
31.01 Summary of algorithms for different graph models
07.02 Student paper presentations [second part]

Course Material - 1
There is no official book for the course. However, the slides are based on material from these books:
§ Aggarwal, C.C. and Wang, H. eds., 2010. Managing and mining graph data (Vol. 40). New York: Springer.
§ Chakrabarti, D. and Faloutsos, C., 2012. Graph mining: laws, tools, and case studies. Synthesis Lectures on Data Mining and Knowledge Discovery, 7(1), pp. 1-207.
§ Easley, D. and Kleinberg, J., 2010. Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press.

Course Material - 2
Some material is inspired by, imported from, and adapted from several existing courses:
§ Graph Mining and Exploration at Scale (Prof. Danai Koutra)
• http://web.eecs.umich.edu/~dkoutra/courses/F15_598/
§ Social and Information Network Analysis (Prof. Jure Leskovec)
• http://web.stanford.edu/class/cs224w/
§ Online Social Networks and Media (Prof. Evaggelia Pitoura, Prof. Panayotis Tsaparas)
• http://www.cs.uoi.gr/~tsap/teaching/cs-l14/
§ Data Mining meets Graph Mining (Prof. Leman Akoglu)
• http://www3.cs.stonybrook.edu/~leman/courses/14CSE590/index.htm

How to send emails
✗
To: [email protected]
Subject: Problem – Help
Text:
Dear Dr. Davide Mottin,
I'm a student in the third year, attending the course, number of shoes, quantity of food eaten yesterday
The slides are not clear. I don't understand the things there.
Yours sincerely,
BigBug92

✔
To: [email protected]
Subject: [Graph Mining] Subgraph isomorphism
Text:
Hi Davide,
the subgraph isomorphism concept is not entirely clear to me. Why is the function bijective?
Thanks,
[FirstName-LastName]

Some rules of thumb
§ I'm available for any kind of concern
§ Use the mailing list: https://lists.hpi.uni-potsdam.de/listinfo/graphmining-ws1617
§ Email me directly only for very important concerns
§ Be quick and precise in your emails
§ Ask me questions in the course, or right before/after the lecture. If the question requires more time, ask for a meeting with me:
• Preferably cluster your questions and come in groups rather than alone, so I can answer many questions at the same time
§ If you think the course load/organization is unfair, please let me know before the end of the semester. After that there will be NO possibility for discussion.

Feedback
§ The course is taught for the first time:
• Any feedback is appreciated
• Any comments on the slides and their clarity as well
• There might be some mistakes here and there (but we will do our best)
• Ask questions if you don't understand something. Better a question in class than a doubt during the exam!

(There's) no such thing as a stupid question

Content of the course
Background concepts: probability theory/statistics, basic linear algebra, basic graph concepts (morphisms, degrees, matrix representation, ...)
§ First part
• Social network analysis:
  ⁃ Diffusion
  ⁃ Power laws
  ⁃ Influence propagation
§ Second part
• Graph querying and indexing:
  ⁃ Exact and approximate queries
  ⁃ Reachability queries
  ⁃ Frequent subgraph mining
  ⁃ Graph indexing
• Node classification and node similarity
• Link prediction
• Communities and anomalies:
  ⁃ Overlapping/non-overlapping communities
  ⁃ Anomaly detection
• Graph summarization
• Summary of algorithms for different models (graph streams, evolving graphs, probabilistic graphs, colored graphs)
• Graph mining frameworks

About the presentations
§ Each presentation lasts 15 minutes in total:
• 10 minutes presentation
• 5 minutes questions
§ The class will be divided into two halves:
• One half will present on December 13, on papers about the first part of the course
• The other half will present on February 7, on papers about the second part of the course
§ Every person presents one paper
§ First come, first served:
• If two people ask to present the same paper, the second has to change their choice
Paper list for the first part of the course: https://goo.gl/YMR0wD

Questions?

The web
August 2016
≥ 50 billion pages
At least 4.73 billion pages indexed by search engines
Source: http://www.worldwidewebsize.com/

Social graphs
Facebook: 1.5 billion users, 450 billion relationships, 600 million groups, 10.5 USD per user
Twitter: 313 million users, 500 million tweets/day, on average 208 followers per user
They are complex: groups, links, preferences, attributes

Knowledge graphs
20 million entities
100 million relationships
2500 types of relationships
Other knowledge graphs:
• YAGO
• DBpedia
• DBLP
• PubMed
• LinkedMDB
• …
They connect entities such as persons, organizations, countries, and objects through semantic relationships (e.g., owns a company)

Biological networks
Protein-protein interaction networks
• Nodes: proteins
• Edges: physical interactions
Metabolic networks
• Nodes: metabolites and enzymes
• Edges: chemical reactions

What else?
Source: http://screenrant.com/game-thrones-protagonist-tyrion-math/
Source: http://phys.org/news/2016-02-math-reveals-unseen-worlds-star.html
Anything that involves relationships (implicit or explicit) can be modeled as a graph!

Graphs are everywhere
Social networks, recommendation graphs, road networks, knowledge graphs
Complex, ubiquitous, large, valuable

Why Graphs? Why now?
§ Describe complex data with a simple structure
• Nature, social, concepts, roads, circuits…
§ Same representation for many disciplines
• Computer science, biology, physics, economics, ...
§ Availability of (BIG) data
• Large networks are now available and require complex algorithms
• Networks evolve over time (e.g., new users/friends on Facebook)
§ Usefulness
• Analysis discovers non-trivial patterns and allows simple, smooth exploration
• Graphs reveal user behaviors
• Graphs are valuable (Facebook, Twitter, Amazon... all of them are based on graphs!!!)

Graph mining
"Graph mining is the process of discovering, retrieving and analyzing non-trivial patterns in graph-shaped data"

What can we do with graph mining?
§ Compressing graphs without losing information
§ Finding complex structures fast
§ Recognizing communities and social patterns
§ Studying the propagation of viruses
§ Predicting whether two people will become friends
§ Understanding which nodes are important
§ Showing how the network will evolve
§ Helping the visualization of complex structures
§ Finding roles, predicting positive and negative influence
§ …

What is involved in graph mining?
§ Basic graph algorithms (shortest paths, BFS, DFS, isomorphisms, traversals, random walks…)
§ Storage and indexing
§ Smart representations for compactness
§ Modeling of problems as graphs
§ Distance metrics and similarity measures
§ Exact, approximate, and heuristic algorithms
§ Evolving structures
§ Interactivity and online updates
§ Complexity (many of the problems, e.g., subgraph isomorphism, are NP-hard, so no polynomial-time algorithms are known for them)

Practical applications of graph mining
§ Finding substructures
§ Community detection
§ Influence propagation
§ Link prediction
§ Graph evolution
§ Detecting frauds
§ Visualization. Several visualization tools:
• General: Gephi, Graphviz, …
• Biological: Cytoscape, Network Workbench
• Social: EgoNet, NodeXL, ...
• Relational: Tulip

Lost in the graph?
Hopefully not after this course ;)

Current: Query languages
SPARQL
```
# Names and mailboxes of all FOAF persons
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
  ?person a foaf:Person .
  ?person foaf:name ?name .
  ?person foaf:mbox ?email .
}
```
GREMLIN
```
// Top 10 movies with more than 10 ratings, ordered by average rating
g.V().hasLabel('movie').as('a','b').
  where(inE('rated').count().is(gt(10))).
  select('a','b').
    by('name').
    by(inE('rated').values('stars').mean()).
  order().
    by(select('b'),decr).
  limit(10)
```
CYPHER
```
// {value} is a query parameter (written $value in newer Cypher versions)
MATCH (node1:Label1)-->(node2:Label2)
WHERE node1.propertyA = {value}
RETURN node2.propertyA, node2.propertyB
```
Query languages ARE:
• Expressive
• Powerful
• Scalable
• Compact
but
• Not user-friendly
• Not interactive

Network or graphs?
§ Network refers to real systems
• Web, social, biological, …
• Terminology: network, node, link/relationship
§ Graph is an abstract mathematical model of a network
• Web graph, social graph
• Terminology: graph, vertex/node, edge
BUT we often use both terms interchangeably

Graphs
§ $G = (V, E)$, with vertices $V$ and edges $E \subseteq V \times V$
• Undirected graphs: co-authorship, roads, biological networks
• Directed graphs: follows, …
§ $G = (V, E, l)$, with labeling function $l: V \cup E \to \Sigma$
• Labeled (or colored) graphs: knowledge graphs, …
§ $G = (V, E, p)$, with probability function $p$
• Probabilistic graphs: causal graphs
[Figure: example graphs with vertex labels a, b, c and probabilities such as 0.1, 0.5, 0.9]

Graph databases (set of graphs)
[Figure: three small labeled graphs G1, G2, G3 with vertex labels a, b, c, d]
$D = \{G_1, G_2, \dots, G_n\}$, where $G_i = (V_i, E_i, l_i)$ and $l_i: E_i \cup V_i \to \Sigma$
A set of small labeled graphs: chemical compounds, business models, 3D objects

An example?
Give me an example of a network you know.
What are the nodes? What are the edges? What shape does it have?

Important Terminology
§ Degree of a node:
• Number of "neighbors" of a node
• In directed graphs:
  ⁃ In-degree: number of inbound links
  ⁃ Out-degree: number of outgoing links
[Figure: a node v with one inbound and two outgoing links; degree of v: 3, in-degree: 1, out-degree: 2]
§ Adjacent node:
• A node u is adjacent to a node v if there is an edge between u and v, i.e., $(u, v) \in E$
§ Path:
• Sequence of adjacent, non-repeating nodes in a graph
• Length of a path = number of edges
§ Diameter of a graph:
• Length of the longest shortest path, i.e., the maximum shortest-path distance over all pairs of nodes

Graph representation
[Figure: directed graph on nodes 1 to 6]
Adjacency list: 1=>{2}, 2=>{4}, 3=>{1,2,4,6}, 4=>{}, 5=>{3}, 6=>{}
Adjacency matrix:
A =
0 1 0 0 0 0
0 0 0 1 0 0
1 1 0 1 0 1
0 0 0 0 0 0
0 0 1 0 0 0
0 0 0 0 0 0
where $a_{ij} = 1$ if $(i, j) \in E$, and $a_{ij} = 0$ otherwise
What are the advantages/disadvantages of one representation over the other?
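The three graph models on the "Graphs" slide differ only in what is attached to vertices and edges. A minimal sketch, assuming that the probability function assigns each edge its probability of existing (the slide does not fix the domain of $p$); the class `ProbabilisticGraph` and its methods are hypothetical names for illustration. With every probability left at 1.0 and no labels, it degenerates to a plain $G = (V, E)$.

```python
from dataclasses import dataclass, field


@dataclass
class ProbabilisticGraph:
    """Hypothetical directed graph G = (V, E, l, p) kept in plain dictionaries.

    Assumptions of this sketch: labels plays the role of the labeling function
    l: V ∪ E -> Sigma, and prob maps every edge (u, v) to the probability that
    the edge exists. The edge set E is the key set of prob.
    """
    vertices: set = field(default_factory=set)
    labels: dict = field(default_factory=dict)  # l: V ∪ E -> Sigma
    prob: dict = field(default_factory=dict)    # p: E -> [0, 1] (assumed)

    def add_vertex(self, v, label):
        self.vertices.add(v)
        self.labels[v] = label

    def add_edge(self, u, v, label=None, p=1.0):
        # E ⊆ V x V: both endpoints must already be vertices.
        if u not in self.vertices or v not in self.vertices:
            raise ValueError("both endpoints must be vertices of the graph")
        if not 0.0 <= p <= 1.0:
            raise ValueError("an edge probability must lie in [0, 1]")
        self.prob[(u, v)] = p
        if label is not None:
            self.labels[(u, v)] = label

    def expected_out_degree(self, v):
        # Linearity of expectation: the expected out-degree is the sum of the
        # probabilities of v's outgoing edges (no independence assumption needed).
        return sum(p for (src, _), p in self.prob.items() if src == v)


pg = ProbabilisticGraph()
for vertex, label in [(1, "a"), (2, "b"), (3, "c")]:
    pg.add_vertex(vertex, label)
pg.add_edge(1, 2, label="knows", p=0.5)
pg.add_edge(1, 3, p=0.25)
print(pg.expected_out_degree(1))  # 0.75
```

Keeping labels for vertices and edges in one dictionary mirrors the single function $l$ defined on $V \cup E$; a graph database $D$ would then simply be a list of such objects.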
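To make the two representations on the "Graph representation" slide concrete, the sketch below converts the example adjacency list into the adjacency matrix and reads the degrees from the matrix: the out-degree of a vertex is its row sum and the in-degree is its column sum. The entry 5=>{3} is read off row 5 of the slide's matrix; the function names are illustrative only.

```python
# Adjacency list of the directed example graph (vertices 1..6), consistent
# with the adjacency matrix on the "Graph representation" slide.
adj_list = {1: {2}, 2: {4}, 3: {1, 2, 4, 6}, 4: set(), 5: {3}, 6: set()}


def to_matrix(adj, n):
    """Build the n x n adjacency matrix: a[i-1][j-1] = 1 iff (i, j) is in E."""
    a = [[0] * n for _ in range(n)]
    for u, neighbors in adj.items():
        for v in neighbors:
            a[u - 1][v - 1] = 1
    return a


def out_degree(a, i):
    # Outgoing links of vertex i = number of 1s in row i.
    return sum(a[i - 1])


def in_degree(a, i):
    # Inbound links of vertex i = number of 1s in column i.
    return sum(row[i - 1] for row in a)


A = to_matrix(adj_list, 6)
for row in A:
    print(" ".join(map(str, row)))
print(out_degree(A, 3), in_degree(A, 3))  # 4 1
```

This also hints at an answer to the slide's question: the matrix always uses $|V|^2$ cells but checks whether $(i, j) \in E$ in constant time, while the list uses space proportional to $|V| + |E|$ and enumerates out-neighbors directly, yet computing an in-degree requires scanning every list.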
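The diameter defined on the "Important Terminology" slide can be computed for an unweighted, connected graph by running one breadth-first search (BFS) per node, since BFS visits nodes in order of hop distance. A minimal sketch, assuming an undirected graph stored as a dictionary of neighbor sets; the example graph is hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first visit = shortest path in an unweighted graph
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Length of the longest shortest path; raises if the graph is disconnected."""
    best = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) < len(adj):
            raise ValueError("disconnected graph: some pairs have no path")
        best = max(best, max(dist.values()))
    return best


# Hypothetical undirected graph: path a - b - c plus the edges b - d and c - d
ug = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b", "d"}, "d": {"b", "c"}}
print(diameter(ug))  # 2
```

Each BFS costs $O(|V| + |E|)$, so the exact diameter costs $O(|V| \cdot (|V| + |E|))$, which is one reason why large graphs often rely on approximations.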

Static vs Evolving graph
§ Static graph: a single adjacency matrix A
§ Dynamic, temporal graph: one adjacency matrix per time step $t_1, \dots, t_n$, stacked into a 3D matrix (tensor)

Graph Isomorphism
[Figure: graphs G1 and G2 with a mapping f between their vertices]
Given two graphs $G_1 = \langle V_1, E_1, l_1 \rangle$ and $G_2 = \langle V_2, E_2, l_2 \rangle$, $G_1$ is isomorphic to $G_2$ iff there exists a bijective function $f: V_1 \to V_2$ such that:
1. For each $v_1 \in V_1$, $l_1(v_1) = l_2(f(v_1))$
2. For all $v_1, u_1 \in V_1$, $(v_1, u_1) \in E_1$ iff $(f(v_1), f(u_1)) \in E_2$

Subgraph Isomorphism
[Figure: pattern Q, graph G, and a subgraph G' of G]
A graph $Q = \langle V_Q, E_Q, l_Q \rangle$ is subgraph isomorphic to a graph $G = \langle V, E, l \rangle$ if there exists a subgraph $G' \sqsubseteq G$ that is isomorphic to $Q$.

Frequent Subgraph Mining
[Figure: labeled graph G with vertex labels a, a, c, c, a, b, c, b]
Problem: find all subgraphs of G that appear at least $\sigma$ times.
Suppose $\sigma = 2$; the frequent subgraphs are (only edge labels):
• a, b, c
• a-a, a-c, b-c, c-c
• a-c-a, …
Exponential number of patterns!!!

Questions?
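The "3D matrix (tensor)" of the "Static vs Evolving graph" slide can be sketched as a list of per-step adjacency matrices, indexed as A[t][i][j]. The snapshots and the helper `new_edges` below are hypothetical and only show one way to query change over time.

```python
def snapshot_tensor(edges_per_step, n):
    """Stack one n x n adjacency matrix per time step: A[t][i][j] = 1 iff (i, j) exists at step t."""
    tensor = []
    for edges in edges_per_step:
        a = [[0] * n for _ in range(n)]
        for i, j in edges:
            a[i][j] = 1
        tensor.append(a)
    return tensor


def new_edges(tensor, t):
    """Edges present at step t that were absent at step t - 1."""
    prev, cur = tensor[t - 1], tensor[t]
    n = len(cur)
    return [(i, j) for i in range(n) for j in range(n) if cur[i][j] and not prev[i][j]]


# Hypothetical directed graph on vertices 0, 1, 2 observed at three time steps
T = snapshot_tensor([[(0, 1)], [(0, 1), (1, 2)], [(1, 2), (2, 0)]], n=3)
print(new_edges(T, 1))  # [(1, 2)]
print(new_edges(T, 2))  # [(2, 0)]
```

A dense tensor needs (number of steps) × $|V|^2$ cells, so for large evolving graphs a sparse edge set per time step is usually the more practical choice.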
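The two conditions of the "Graph Isomorphism" slide can be checked literally by trying every bijection $f: V_1 \to V_2$. A minimal brute-force sketch, assuming each graph is a hypothetical `(labels, edges)` pair with vertex labels only; it runs in factorial time and is meant to illustrate the definition, not to be used on real graphs.

```python
from itertools import permutations


def is_isomorphic(g1, g2):
    """Brute-force test of labeled graph isomorphism.

    Each graph is a (labels, edges) pair: labels maps every vertex to its
    label, edges is a set of directed (u, v) pairs (store both orientations
    for an undirected edge). Trying all |V|! bijections only suits tiny graphs.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    if len(labels1) != len(labels2) or len(edges1) != len(edges2):
        return False
    v1 = list(labels1)
    for image in permutations(labels2):
        f = dict(zip(v1, image))  # candidate bijection f: V1 -> V2
        # Condition 1: f preserves vertex labels.
        if any(labels1[v] != labels2[f[v]] for v in v1):
            continue
        # Condition 2: f is injective and |E1| = |E2|, so mapping every edge of
        # E1 into E2 already gives (v, u) in E1 iff (f(v), f(u)) in E2.
        if all((f[v], f[u]) in edges2 for v, u in edges1):
            return True
    return False


G1 = ({1: "a", 2: "b", 3: "c"}, {(1, 2), (2, 3)})
G2 = ({"x": "b", "y": "a", "z": "c"}, {("y", "x"), ("x", "z")})
print(is_isomorphic(G1, G2))  # True, via f = {1: "y", 2: "x", 3: "z"}
```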
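The "Subgraph Isomorphism" definition can likewise be tested by brute force. This sketch assumes that the subgraph $G'$ need not be induced, so $G$ may contain extra edges among the matched vertices. It also illustrates the question from the email example: $f$ is a bijection from $V_Q$ onto the vertices of $G'$, but only an injection into the full vertex set $V$. The `(labels, edges)` representation and the function name are hypothetical.

```python
from itertools import permutations


def is_subgraph_isomorphic(q, g):
    """Brute-force test: does g contain a (not necessarily induced) subgraph isomorphic to q?

    Graphs are (labels, edges) pairs: labels maps vertex -> label and edges is
    a set of directed (u, v) pairs. Every injective map V_Q -> V is tried.
    """
    labels_q, edges_q = q
    labels_g, edges_g = g
    vq = list(labels_q)
    # permutations(V, k) yields every injective assignment of k distinct vertices of g.
    for image in permutations(labels_g, len(vq)):
        f = dict(zip(vq, image))
        labels_ok = all(labels_q[v] == labels_g[f[v]] for v in vq)
        edges_ok = all((f[u], f[v]) in edges_g for u, v in edges_q)
        if labels_ok and edges_ok:
            return True
    return False


host = ({1: "a", 2: "b", 3: "a"}, {(3, 2), (2, 1)})
pattern = ({"x": "a", "y": "b"}, {("x", "y")})
print(is_subgraph_isomorphic(pattern, host))  # True, via x -> 3, y -> 2
```

The decision problem is NP-complete in general, which is why practical matchers prune this search heavily instead of enumerating all injections.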
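The first two levels of the "Frequent Subgraph Mining" example (single vertices such as a, b, c and single edges such as a-c) can be found by counting labels and label pairs. A minimal sketch on a hypothetical undirected graph (the slide's own graph is only in the figure), assuming a pattern's frequency is its raw number of occurrences.

```python
from collections import Counter


def frequent_small_patterns(labels, edges, sigma):
    """Frequent single-vertex and single-edge patterns of an undirected labeled graph.

    labels maps vertex -> label; edges is a set of (u, v) pairs, each undirected
    edge stored once. Frequency = raw number of occurrences (an assumption).
    """
    vertex_counts = Counter(labels.values())
    # An undirected edge pattern is identified by its sorted label pair, e.g. ("a", "c").
    edge_counts = Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)
    frequent_vertices = {lab for lab, c in vertex_counts.items() if c >= sigma}
    frequent_edges = {pair for pair, c in edge_counts.items() if c >= sigma}
    return frequent_vertices, frequent_edges


# Hypothetical labeled graph (not the one drawn on the slide)
vertex_labels = {1: "a", 2: "a", 3: "c", 4: "b", 5: "c"}
undirected_edges = {(1, 3), (2, 3), (4, 5), (3, 5)}
print(frequent_small_patterns(vertex_labels, undirected_edges, sigma=2))
```

Larger patterns such as a-c-a come from extending frequent patterns one edge at a time, which is where the exponential number of candidates appears. Note that raw occurrence counts are not anti-monotone (a star with center a and three b leaves has one a but three a-b edges), so single-graph miners typically adopt other support measures.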