Graph Mining: Introduction
Davide Mottin, Konstantina Lazaridou
Hasso Plattner Institute
Graph Mining course Winter Semester 2016
Lecture road
Course Information
Introduction to graph mining
Graphs: models and basic concepts
GRAPH MINING WS 2016
Organization of the lecture
• Lecture and slides in English
• Tuesday 15.15 – 16.45
• Two mandatory assignments:
  • [Individual] One presentation of a paper of choice among a list of papers about topics covered in the lectures. Two slots: 13/12 (first part) – 07/02 (second part)
  • [Individual] One small project of graph analytics to be completed before the end of the course
• Examination (in English!):
  • Oral exam (in the first three weeks after the lecture period)
• Grading scheme:
  • 20%: Presentation
  • 10%: Project
  • 70%: Exam
• Lectures will be recorded and online (tele-task)
• There is no official textbook for the course
• Registration is required for this lecture, notify the Studienreferat and myself: [email protected]
About the lecturers
Davide Mottin
Postdoctoral Researcher @ Knowledge Discovery and Data Mining
PhD in 2015, University of Trento
Research Interests: Graph Mining, Data Mining, Graph databases, Preference models, Query paradigms
Konstantina Lazaridou
PhD Candidate @ Information Systems Research - Web Science
MSc in 2015, University of Ioannina
Research Interests: Graph Mining, Social Network Analysis, Web data Mining, Opinion and Sentiment Analysis, Data Stream Mining
Course Web site
Lecture material (slides, papers, books, tutorials, assignments, …) available online
https://hpi.de/en/mueller/teaching/aktuelle-vorlesung/ws1617/graph-mining.html
§ The slides are also available in the intranet!
Objectives
§ Understanding
  • Where graphs are, why they are important, and what the new applications are
  • The main challenges from a data mining perspective
§ Learn
  • How to efficiently query and store a graph using graph mining techniques
  • Analyze networks to understand the properties and the behaviors of individuals
  • Think in a research perspective (novelty, clarity, …)
  • Solve practical problems
§ Work on real scale data and existing tools
Prerequisites
§ Basic computer science and programming.
§ Data-mining knowledge is a plus but is not strictly required.
§ Basic probability theory and linear algebra are beneficial, although a small recap of the main concepts will be done at the beginning of the required lectures.
Schedule (tentative)
18.10  Introduction to graph mining
25.10  Social network analysis - Diffusion
01.11  Graph Querying: exact, approximate, and reachability
08.11  Frequent subgraph mining
15.11  Graph indexing
17.11  HPI-Kolloquium – Invited speaker: prof. Danai Koutra
22.11  Node classification
29.11  Some practical graph mining framework; Project assignment
06.12  Link prediction
13.12  Student paper presentation [first part]
20.12  Christmas break
27.12  Christmas break
03.01  Non overlapping communities
10.01  Overlapping communities
17.01  Anomaly detection
24.01  Graph summarization; Report handover
31.01  Summary of algorithms for different graph models
07.02  Student paper presentation [second part]
Course Material - 1
There is no official book in the course. However, the slides are based on materials from these books:
§ Aggarwal, C.C. and Wang, H. eds., 2010. Managing and mining graph data (Vol. 40). New York: Springer.
§ Chakrabarti, D. and Faloutsos, C., 2012. Graph mining: laws, tools, and case studies. Synthesis Lectures on Data Mining and Knowledge Discovery, 7(1), pp. 1-207.
§ Easley, D. and Kleinberg, J., 2010. Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press.
Course Material - 2
Some material is inspired, imported and modified from several existing courses.
§ Graph Mining and Exploration at Scale (prof. Danai Koutra)
  • http://web.eecs.umich.edu/~dkoutra/courses/F15_598/
§ Social and Information Network Analysis (prof. Jure Leskovec)
  • http://web.stanford.edu/class/cs224w/
§ Online Social Networks and Media (prof. Evaggelia Pitoura, prof. Panayotis Tsaparas)
  • http://www.cs.uoi.gr/~tsap/teaching/cs-l14/
§ Data Mining meets Graph Mining (prof. Leman Akoglu)
  • http://www3.cs.stonybrook.edu/~leman/courses/14CSE590/index.htm
How to send emails
✗ Bad example
To: [email protected]
Subject: Problem – Help
Text:
Dear Dr. Davide Mottin,
I’m a student at the third year, attending the course, number of shoes, quantity of food eaten yesterday
The slides are not clear. I don’t understand the things theres.
Your sincerely,
BigBug92

✔ Good example
To: [email protected]
Subject: [GraphMining] Subgraph isomorphism
Text:
Hi Davide,
the subgraph isomorphism concept is not entirely clear to me. Why is the function bijective?
Thanks,
[FirstName-LastName]
Some rules of thumb
§ I’m available for any kind of concern
§ Use the mailing list: https://lists.hpi.unipotsdam.de/listinfo/graphmining-ws1617
§ Rarely send email to me directly, unless it is a very important concern
§ Be quick and precise in the emails
§ Ask me questions in the course, or right after/before the lecture. If the question requires more time, ask for a meeting with me:
  • Better if you cluster and come in a group instead of alone, so I can answer many questions at the same time
§ If you think the course load/organization is unfair please let me know before the end of the semester. After that there will be NO possibility for discussion.
Feedback
§ The course is taught for the first time:
  • Any feedback is appreciated
  • Any comments on slides and clarity as well
  • There might be some mistakes here and there (but we will do our best)
  • Ask questions if you don’t understand something. Better a question in class than a doubt during the exam!
(There's) no such thing as a stupid question
Content of the course
Background concepts: probability theory/statistics, basic linear algebra, basic graph concepts (morphisms, degrees, matrix representation, ...)
Topics (split into a first part and a second part):
§ Social network analysis:
  • Diffusion
  • Power laws
  • Influence propagation
§ Graph querying and indexing:
  • Exact and approximate queries
  • Reachability queries
  • Frequent subgraph mining
  • Graph indexing
§ Node classification and node similarity
§ Link prediction
§ Communities and anomalies
  • Overlapping/Non overlapping communities
  • Anomaly detection
§ Graph summarization
§ Summary of algorithms for different models (graph streams, evolving graphs, probabilistic graphs, colored graphs)
§ Graph mining frameworks
About the presentations
§ The presentation will be 15 mins in total
  • 10 minutes presentation
  • 5 minutes questions
§ The group will be divided into two halves:
  • One half will present on December 13 papers regarding the first part of the course
  • The other half will present on February 7 regarding the second part of the course
§ Every person presents one paper
§ First come first served
  • if two people ask to present the same paper, the second has to change the choice
Paper list for the first part of the course: https://goo.gl/YMR0wD
Questions?
Lecture road
Course Information
Introduction to graph mining
Graphs: models and basic concepts
The web
August 2016
>= 50 billion pages
At least 4.73 billion pages indexed by search engines
Source: http://www.worldwidewebsize.com/
Social graphs
Facebook
1.5 Bln users
450 Bln relationships
600 Mln groups
10.5 USD per user
Twitter
313 Mln users
500 Mln Tweets/day
Avg 208 followers/user
They are complex: Groups, links, preferences, attributes
Knowledge graphs
20 Mln entities
100 Mln relationships
2500 types of relationships
Other knowledge graphs:
• YAGO
• DBpedia
• DBLP
• PubMed
• LinkedMDB
• …
Connect entities such as persons, organizations, countries, objects through semantic relationships (e.g. owns a company)
Biological networks
Protein-protein interaction networks
Nodes: Proteins
Edges: Physical interactions
Metabolic networks
Nodes: Metabolites and enzymes
Edges: Chemical reactions
What else?
Source:http://screenrant.com/game-thrones-protagonist-tyrion-math/
Source: http://phys.org/news/2016-02-math-reveals-unseen-worlds-star.html
Anything that involves relationships (implicit or explicit) can be modeled as a graph!
Graphs are everywhere
• Social Networks
• Recommendation Graphs
• Road Networks
• Knowledge Graphs
Graphs are: Complex, Ubiquitous, Large, Valuable
Why Graphs? Why now?
§ Describe complex data with a simple structure
  • Nature, social, concepts, roads, circuits …
§ Same representation for many disciplines
  • Computer science, biology, physics, economics, ...
§ Availability of (BIG) data
  • Large networks are now available and require complex algorithms
  • Networks are evolving over time (e.g., new users/friends in Facebook)
§ Usefulness
  • Analysis will discover non trivial patterns, and allow simple smooth explorations
  • They reveal user behaviors
  • They are valuable (Facebook, Twitter, Amazon ... All of them based on graphs!!!)
“Graph mining is the process of discovering,
retrieving and analyzing non trivial patterns
in graph shaped data”
What can we do with graph mining?
§ Compressing graphs without losing information
§ Finding complex structures fast
§ Recognizing communities and social patterns
§ Studying the propagation of viruses
§ Predicting if two people will become friends
§ Understanding what the important nodes are
§ Showing how the network will evolve
§ Helping the visualization of complex structures
§ Finding roles, positive and negative influence prediction
§ …
What is involved in graph mining?
§ Basic graph algorithms (shortest paths, BFS, DFS, isomorphisms, traversals, random walks …)
§ Storage and indexing
§ Smart representations for compactness
§ Modeling of problems as graphs
§ Distance metrics and similarity measures
§ Exact, approximate, and heuristic algorithms
§ Evolving structures
§ Interactivity and online updates
§ Complexity (most of the problems are not known to be solvable in polynomial time)
Practical applications of graph mining
• Finding substructures
• Community detection
• Influence propagation
• Link prediction
• Graph evolution
• Detecting frauds
Visualization
Several visualization tools:
• General: Gephi, GraphViz, …
• Biological: Cytoscape, Network Workbench
• Social: EgoNet, NodeXL, ...
• Relational: Tulip
Lost in the graph?
Hopefully not after this course ;)
Current: Query languages
SPARQL
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?email
    WHERE
    {
      ?person a foaf:Person .
      ?person foaf:name ?name .
      ?person foaf:mbox ?email .
    }
GREMLIN
    g.V().hasLabel('movie').as('a','b').
      where(inE('rated').count().is(gt(10))).
      select('a','b').
        by('name').
        by(inE('rated').values('stars').mean()).
      order().
        by(select('b'),decr).
      limit(10)
CYPHER
    MATCH (node1:Label1)-->(node2:Label2)
    WHERE node1.propertyA = {value}
    RETURN node2.propertyA, node2.propertyB
Query languages ARE:
• Expressive
• Powerful
• Scalable
• Compact
but
• Not user friendly
• Not interactive
Lecture road
Course Information
Introduction to graph mining
Graphs: models and basic concepts
Network or graphs?
§ Network refers to real systems
  • Web, Social, Biological, …
  • Terminology: Network, node, link/relationship
§ Graph is an abstract mathematical model of a network
  • Web graph, Social graph
  • Terminology: Graph, vertex/node, edge
BUT we often use both without distinction
Graphs
$G = (V, E)$: vertices $V$, edges $E \subseteq V \times V$
$G = (V, E, l)$: labeling function $l: V \cup E \to \Sigma$
$G = (V, E, p)$: probability function $p$
[Figure: example graphs with node labels a, b, c and edge probabilities]
• Undirected graphs
  • Co-authorship, Roads, Biological
• Directed graphs
  • Follows, …
• Labeled (or colored) graphs
  • Knowledge graphs, …
• Probabilistic graphs
  • Causal graphs
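The graph models on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `Graph`, its fields, and the toy data are hypothetical, and the range [0, 1] for the probability function is an assumption, since the slide only names the function.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """G = (V, E), optionally with a labeling l and edge probabilities p."""
    vertices: set
    edges: set                                    # set of (u, v) pairs, E ⊆ V × V
    directed: bool = True
    labels: dict = field(default_factory=dict)    # l: V ∪ E -> Σ
    probs: dict = field(default_factory=dict)     # p: E -> [0, 1] (assumed range)

    def has_edge(self, u, v):
        if (u, v) in self.edges:
            return True
        # In an undirected graph (u, v) and (v, u) denote the same edge.
        return not self.directed and (v, u) in self.edges


# Hypothetical toy instances of the four models.
follows = Graph({"ann", "bob"}, {("ann", "bob")}, directed=True)
coauthors = Graph({"ann", "bob"}, {("ann", "bob")}, directed=False)
knowledge = Graph({1, 2}, {(1, 2)},
                  labels={1: "person", 2: "company", (1, 2): "owns"})
uncertain = Graph({1, 2}, {(1, 2)}, probs={(1, 2): 0.9})
```

The only behavioral difference between the directed and undirected case here is the edge test; labels and probabilities are plain lookups keyed by vertex or edge.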
Graph databases (set of graphs)
[Figure: example labeled graphs G1, G2, G3, …]
$D = \{G_1, G_2, \ldots, G_n\}$, $G_i = (V_i, E_i, l_i)$, $l_i: E_i \cup V_i \to \Sigma$
Set of small labeled graphs
Chemical compounds, Business models, 3D objects
An example?
Give me an example of network you know.
What are the nodes?
What are the edges?
What shape?
Important Terminology
§ Degree of a node:
  • Number of “neighbors” of a node
  • In directed graphs
    ⁃ In-degree: number of inbound links
    ⁃ Out-degree: number of outgoing links
  • Example: Degree of v: 3, In-degree: 1, Out-degree: 2
§ Adjacent node:
  • A node u is adjacent to a node v if there is an edge between u and v, i.e. $(u, v) \in E$
§ Path:
  • Sequence of adjacent, non-repeating nodes in a graph
  • Length of a path = number of edges
§ Diameter of a graph:
  • Size of the longest shortest path
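The terms above can be computed directly on a small graph. The sketch below uses a hypothetical directed graph chosen so that node v has in-degree 1 and out-degree 2, as in the slide's example; the function names are illustrative, and the diameter is taken over reachable pairs only, which is an assumption for directed graphs.

```python
from collections import deque

# Hypothetical directed graph as an adjacency list: node -> set of out-neighbors.
graph = {"a": {"v"}, "v": {"b", "c"}, "b": set(), "c": set()}


def out_degree(g, v):
    return len(g[v])


def in_degree(g, v):
    return sum(1 for u in g if v in g[u])


def bfs_distances(g, source):
    """Shortest-path length (number of edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in g[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(g):
    """Longest shortest path over all pairs connected by a directed path."""
    return max(max(bfs_distances(g, s).values()) for s in g)
```

Running one breadth-first search per node costs O(|V| · (|V| + |E|)), which is fine for small graphs but is one reason diameter is usually approximated on large networks.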
Graph representation
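[Figure: directed graph on nodes 1 to 6]
Adjacency matrix
$$A = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}, \qquad
a_{ij} = \begin{cases} 1 & (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$$
Adjacency list
1 => {2}
2 => {4}
3 => {1, 2, 4, 6}
What are the advantages/disadvantages of one or another representation?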
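One way to approach the question is to build both representations for the same graph and compare them. The sketch below assumes the 36 matrix entries on the slide are read row by row (which agrees with the three adjacency-list rows shown); the helper name `matrix_to_list` is hypothetical.

```python
# Adjacency matrix from the slide; row i, column j is 1 iff (i+1, j+1) is an edge.
A = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def matrix_to_list(matrix):
    """Build node -> set of out-neighbors, using 1-based node ids."""
    return {
        i + 1: {j + 1 for j, bit in enumerate(row) if bit}
        for i, row in enumerate(matrix)
    }


adj_list = matrix_to_list(A)

# Edge test: O(1) in the matrix, O(1) on average with hash sets in the list.
# Neighbor scan: O(|V|) per node in the matrix, O(degree) in the list.
# Space: |V|^2 cells in the matrix versus |V| + |E| entries in the list.
matrix_cells = len(A) * len(A)
list_entries = len(adj_list) + sum(len(nbrs) for nbrs in adj_list.values())
```

For sparse graphs, which most real networks are, the list is far more compact; the matrix pays off when the graph is dense or when linear-algebra operations are needed.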
Static vs Evolving graph
Static graph: adjacency matrix A
Dynamic, temporal graph: 3D matrix (tensor), a stack of adjacency matrices over the time steps t1 … tn
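A temporal graph stored as a 3D matrix can be sketched as a list of adjacency matrices, one per time step. The snapshots and the helper names `edge_lifetime` and `added_edges` below are hypothetical and only illustrate the idea.

```python
# Hypothetical snapshots of a 3-node graph at times t1, t2, t3.
snapshots = [
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],   # t1: edge 1 -> 2
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],   # t2: edge 2 -> 3 appears
    [[0, 0, 0], [0, 0, 1], [1, 0, 0]],   # t3: 1 -> 2 disappears, 3 -> 1 appears
]


def edge_lifetime(tensor, i, j):
    """Time steps (0-based) at which edge (i, j) is present; i, j are 0-based."""
    return [t for t, matrix in enumerate(tensor) if matrix[i][j] == 1]


def added_edges(tensor, t):
    """Edges present at step t but absent at step t - 1 (requires t >= 1)."""
    n = len(tensor[t])
    return [(i, j) for i in range(n) for j in range(n)
            if tensor[t][i][j] and not tensor[t - 1][i][j]]
```

Indexing by time first makes per-snapshot analysis easy; tracking a single edge over time requires one lookup per snapshot.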
Graph Isomorphism
Given two graphs $G_1 = \langle V_1, E_1, l_1 \rangle$ and $G_2 = \langle V_2, E_2, l_2 \rangle$, $G_1$ is isomorphic to $G_2$ iff there exists a bijective function $f: V_1 \to V_2$ s.t.:
1. For each $v_1 \in V_1$, $l_1(v_1) = l_2(f(v_1))$
2. $(v_1, u_1) \in E_1$ iff $(f(v_1), f(u_1)) \in E_2$
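The definition can be checked literally by trying every bijection between the two vertex sets. The sketch below is a brute-force illustration, not a practical algorithm: the function name and argument layout are hypothetical, edges are treated as directed pairs, and only vertex labels are compared, as in condition 1.

```python
from itertools import permutations


def is_isomorphic(v1, e1, l1, v2, e2, l2):
    """Brute-force check of both conditions over all bijections f: V1 -> V2.

    v*: list of vertices, e*: set of directed (u, v) pairs, l*: vertex -> label.
    Runs in O(|V|!) time, so it is only meant for tiny graphs.
    """
    if len(v1) != len(v2) or len(e1) != len(e2):
        return False
    for image in permutations(v2):
        f = dict(zip(v1, image))
        labels_match = all(l1[v] == l2[f[v]] for v in v1)
        # With |E1| == |E2| and f bijective, mapping every edge of E1 into E2
        # is equivalent to the "iff" in condition 2.
        edges_match = all((f[u], f[v]) in e2 for (u, v) in e1)
        if labels_match and edges_match:
            return True
    return False
```

The requirement that f be bijective is what rules out mapping two distinct vertices of one graph onto the same vertex of the other.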
Subgraph Isomorphism
A graph $Q = \langle V_Q, E_Q, l_Q \rangle$ is subgraph isomorphic to a graph $G = \langle V, E, l \rangle$ if there exists a subgraph $G' \sqsubseteq G$ isomorphic to $Q$
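A brute-force reading of this definition searches for an injective mapping of the query vertices into G. The sketch below assumes a non-induced subgraph (G' may omit edges of G between the matched vertices), directed edges, and vertex labels only; the function name `find_embedding` is hypothetical.

```python
from itertools import permutations


def find_embedding(vq, eq, lq, v, e, l):
    """Return one injective mapping f: V_Q -> V that preserves vertex labels and
    maps every edge of Q onto an edge of G, or None if no such mapping exists."""
    # permutations(v, k) enumerates ordered selections of k distinct vertices,
    # so every candidate mapping is injective by construction.
    for image in permutations(v, len(vq)):
        f = dict(zip(vq, image))
        if all(lq[x] == l[f[x]] for x in vq) and \
           all((f[a], f[b]) in e for (a, b) in eq):
            return f
    return None
```

The search space grows as |V|! / (|V| - |V_Q|)!, which is why practical matchers prune candidates by label and degree before exploring mappings.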
Frequent Subgraph Mining
[Figure: labeled graph G with node labels a, b, c]
Problem
Find all subgraphs of G that appear at least $\sigma$ times
Suppose $\sigma = 2$, the frequent subgraphs are (only edge labels)
• a, b, c
• a-a, a-c, b-c, c-c
• a-c-a …
Exponential number of patterns!!!
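The first level of this search, single-edge patterns, can be sketched as a simple count. The graph below is hypothetical (it is not the one drawn on the slide), edges are treated as undirected, and support is assumed to be the number of edge occurrences; larger patterns would additionally need candidate generation and subgraph isomorphism tests.

```python
from collections import Counter


def frequent_edge_patterns(edges, labels, sigma):
    """Count undirected edges by the sorted pair of endpoint labels and keep
    the patterns that occur at least sigma times."""
    support = Counter(
        tuple(sorted((labels[u], labels[v]))) for u, v in edges
    )
    return {pattern: count for pattern, count in support.items() if count >= sigma}


# Hypothetical labeled graph: vertex id -> label, plus an undirected edge list.
labels = {1: "a", 2: "a", 3: "c", 4: "c", 5: "b"}
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]

frequent = frequent_edge_patterns(edges, labels, sigma=2)
```

Only patterns built from frequent edges can themselves be frequent, so this first pass already prunes much of the exponential pattern space.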
Questions?