Waikato Machine Learning Group Talk on Graph-RAT

Graph-RAT Overview
By Daniel McEnnis

What is Graph-RAT
• Relational Analysis Toolkit
• Database abstraction layer
• Evaluation platform
• Robustly evaluates all different ways of performing recommendation

Kinds of Analysis
• Recommendation Systems
• Data Mining
• Relational Machine Learning
• MIR document retrieval

Talk Outline
• Base Components
• Queries
• Algorithms
• Schedulers
• Graph-RAT Language
• Conclusion and Examples

Base Components
• Graphs
• Actors
• Links
• Properties
[Figure: example graph of actors A to E with attached properties such as Name (John), Age (22), Hobbies (Hiking, Biking), and Library ([Vector])]

Properties
• Variables of Graph-RAT
• Can be arbitrary Java types
• Can be attached to anything
• Unique ID string for each object
• Accessed only as sets, not as objects

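The component model on the last two slides can be sketched in a few lines of Java. This is a minimal illustration only: the class names `SketchProperty`, `SketchActor`, and `SketchLink` are hypothetical and are not Graph-RAT's actual classes. It shows properties keyed by a unique ID string, holding values of any Java type, and exposed only as a collection.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names for illustration only; not the Graph-RAT API.

/** A property: a unique ID string plus a collection of values of any Java type. */
final class SketchProperty {
    final String id;
    private final List<Object> values = new ArrayList<>();

    SketchProperty(String id) { this.id = id; }

    void add(Object value) { values.add(value); }

    /** Values are exposed only as a collection, never as a single object. */
    List<Object> getValues() { return new ArrayList<>(values); }
}

/** An actor: a mode (its type), an ID, and properties attached by ID. */
final class SketchActor {
    final String mode;
    final String id;
    private final Map<String, SketchProperty> properties = new LinkedHashMap<>();

    SketchActor(String mode, String id) { this.mode = mode; this.id = id; }

    void addProperty(SketchProperty p) { properties.put(p.id, p); }

    SketchProperty getProperty(String propertyId) { return properties.get(propertyId); }
}

/** A link: a named relation between a source and a destination actor. */
final class SketchLink {
    final String relation;
    final SketchActor source;
    final SketchActor destination;

    SketchLink(String relation, SketchActor source, SketchActor destination) {
        this.relation = relation;
        this.source = source;
        this.destination = destination;
    }
}
```

With these classes, an actor such as the one in the Base Components figure would carry a "Hobbies" property holding "Hiking" and "Biking" and an "Age" property holding 22.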
Data View
• Hyper-graph structure defined by the set of actors and links in a graph
• Accessible from the enclosing graph
• Can be cyclic
[Figure: example graph with actors A, B, C, D, and E]

Metadata View
• Not constructed by default
• Implicit graph described by modes and the relations between them
• Needed for relational machine learning
[Figure: metadata graph showing User and Friend]

Query Language
• Constructs sets retrieved from a graph
• Functional structure
• Similar to SQL
• 4 types
  • Graph Queries
  • Actor Queries
  • Link Queries
  • Property Queries

Query Structure
• Cascading queries in a LISP-style syntax
• Each child query is of a different type
• Restrictions can be added at runtime

Query Examples
LinkByActor(
    false,
    ActorByMode(false, "Target", ".*"),
    ActorByMode(false, "Source", ".*"),
    SetOperation.XOR)

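The cascading, functional structure of such a query can be illustrated with a small Java sketch: inner queries each return a set, and the outer step combines the child sets with a set operation. Everything here is an assumption made for illustration (the `QuerySketch` class, its method names, and the reading of XOR as symmetric difference); it is not the Graph-RAT query API.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch; names and semantics are assumptions, not the Graph-RAT API.
final class QuerySketch {
    enum SetOp { AND, OR, XOR }

    /** Inner query: select the actor IDs that match a regular expression. */
    static Set<String> actorByMode(Set<String> actorIds, String idPattern) {
        Pattern pattern = Pattern.compile(idPattern);
        Set<String> out = new TreeSet<>();
        for (String id : actorIds) {
            if (pattern.matcher(id).matches()) {
                out.add(id);
            }
        }
        return out;
    }

    /** Outer step: combine two child result sets with a set operation. */
    static Set<String> combine(Set<String> a, Set<String> b, SetOp op) {
        Set<String> out = new TreeSet<>(a);
        switch (op) {
            case AND:
                out.retainAll(b);            // intersection
                break;
            case OR:
                out.addAll(b);               // union
                break;
            case XOR: {
                out.addAll(b);               // union ...
                Set<String> both = new TreeSet<>(a);
                both.retainAll(b);
                out.removeAll(both);         // ... minus the intersection
                break;
            }
        }
        return out;
    }
}
```

Because each query is just a function from sets to a set, further restrictions can be layered on by wrapping another call around an existing one.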
Query Comparisons
• Similar to the Jena interface
• Construction is similar to the JUNG system
• Implements all SQL queries that do not require temporary tables

0.4.3 Query
• Uses graph primitives instead of Queries
• Algorithms use hard-coded GraphByID

Algorithms
• Functions that execute over a given graph
• Metadata is a part of the algorithm
• Excepting output algorithms, no side effects are permitted.
    execute(Graph graph)
• Properties utilized or created are declared up front.
    IODescriptor getInput()
    IODescriptor getOutput()

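The contract on this slide (declare inputs and outputs up front, then execute over a graph) might look roughly like the following Java sketch. The types `IoSketch`, `GraphSketch`, `AlgorithmSketch`, and `LinkCountSketch` are hypothetical stand-ins, not Graph-RAT's own `Graph` or `IODescriptor` classes, and returning a list of descriptors is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the contract on this slide; not the Graph-RAT API.

/** Declares one relation/property pair that an algorithm reads or writes. */
final class IoSketch {
    final String relation;
    final String property;

    IoSketch(String relation, String property) {
        this.relation = relation;
        this.property = property;
    }
}

/** Stand-in graph: link counts per relation plus named result properties. */
final class GraphSketch {
    final Map<String, Integer> linkCounts = new HashMap<>();
    final Map<String, Object> properties = new HashMap<>();
}

interface AlgorithmSketch {
    List<IoSketch> getInput();    // what the algorithm reads, declared up front
    List<IoSketch> getOutput();   // what the algorithm creates, declared up front
    void execute(GraphSketch graph);
}

/** Example algorithm: stores the number of links of one relation as a property. */
final class LinkCountSketch implements AlgorithmSketch {
    private final String relation;
    private final String destination;

    LinkCountSketch(String relation, String destination) {
        this.relation = relation;
        this.destination = destination;
    }

    public List<IoSketch> getInput() {
        return Collections.singletonList(new IoSketch(relation, null));
    }

    public List<IoSketch> getOutput() {
        return Collections.singletonList(new IoSketch(relation, destination));
    }

    public void execute(GraphSketch graph) {
        // Writes only the property it declared in getOutput().
        graph.properties.put(destination, graph.linkCounts.getOrDefault(relation, 0));
    }
}
```

Declaring inputs and outputs separately from `execute` lets a scheduler check that the properties an algorithm needs exist before it runs.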
Propositional Algorithms
• Take an aggregator function as a parameter
• Cover all ways of shifting data
  • Aggregate By Link
  • Aggregate By Link Property
  • Aggregate On Graph
  • Graph To Actor
  • Link To Graph
  • Graph To Graph

Aggregator Functions
• Map one or more elements to an equal or smaller number of elements
• Examples
  • Statistical Moments
  • Arithmetic Operations
  • Null Aggregation
  • Concatenation

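As a rough illustration of the aggregator idea (one or more elements in, an equal or smaller number out), here is a Java sketch with one function per example category. The class `AggregatorSketch` and its method names are hypothetical, and treating "null aggregation" as a pass-through is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the Graph-RAT aggregator classes.
final class AggregatorSketch {

    /** Statistical moment: many values collapse to their mean. */
    static List<Double> mean(List<Double> values) {
        if (values.isEmpty()) {
            return Collections.emptyList();
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return Collections.singletonList(sum / values.size());
    }

    /** Arithmetic operation: many values collapse to their sum. */
    static List<Double> sum(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return Collections.singletonList(sum);
    }

    /** Null aggregation (assumed meaning): elements pass through unchanged. */
    static <T> List<T> passThrough(List<T> values) {
        return new ArrayList<>(values);
    }

    /** Concatenation: several vectors are joined into one longer vector. */
    static double[] concatenate(List<double[]> vectors) {
        int length = 0;
        for (double[] v : vectors) {
            length += v.length;
        }
        double[] out = new double[length];
        int offset = 0;
        for (double[] v : vectors) {
            System.arraycopy(v, 0, out, offset, v.length);
            offset += v.length;
        }
        return out;
    }
}
```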
Social Network Analysis Algorithms
• Prestige Algorithms
  • Degree
  • Betweenness
  • Closeness
  • PageRank
  • HITS
• Graph Triples

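The simplest of these prestige measures can be sketched directly. The example below assumes degree prestige is taken to be in-degree (the number of links pointing at an actor); the class `DegreeSketch` and the two-element array encoding of a link are hypothetical and unrelated to Graph-RAT's implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; assumes degree prestige means in-degree.
final class DegreeSketch {

    /**
     * Each link is a two-element array {source, destination}.
     * Returns the number of incoming links for every actor that has at least one.
     */
    static Map<String, Integer> inDegree(List<String[]> links) {
        Map<String, Integer> degree = new TreeMap<>();
        for (String[] link : links) {
            degree.merge(link[1], 1, Integer::sum);
        }
        return degree;
    }
}
```

Actors with no incoming links are simply absent from the result, so a caller that needs a zero for them has to supply it.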
Classification Algorithms
• Machine Learning Primitives
• Uses Weka
• Separate algorithms for training and classifying

Clustering Algorithms
• Several graph-based algorithms
  • Weak Component Clustering
  • Strong Component Clustering
  • Edge Betweenness Clustering
  • Girvan-Newman Edge Betweenness
• Also has primitives calling Weka on vector data

Similarity Algorithms
• Comparisons between modes
• Types of Similarity
  • Similarity By Link
  • Similarity By Property
  • Graph Similarity
• Distance Functions
  • All Weka distance functions
  • KLDistance
  • Exponential Distance

Collaborative Filtering Algorithms
• Traditional recommendation algorithms
• Item to Item
• User to User
• Associative Mining

Array-Based Algorithms
• Transform To Array
• Principal Component Analysis

Evaluation
• All forms of evaluating results
  • Set Based (precision and recall)
  • Weighted Set (Correlations)
  • Ordered Lists (Kendall Tau, Half Life)
• Cross-Validation algorithms
  • By Actor
  • By Link
  • By Graph

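The set-based measures named above have standard definitions: precision is the fraction of recommended items that are relevant, and recall is the fraction of relevant items that were recommended. A minimal Java sketch follows; the class `SetEvalSketch` is hypothetical, and returning 0.0 for an empty set is a convention chosen here, not something the slides specify.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of set-based evaluation; not Graph-RAT's evaluation classes.
final class SetEvalSketch {

    /** Number of items that are both recommended and relevant. */
    private static int hits(Set<String> recommended, Set<String> relevant) {
        Set<String> both = new HashSet<>(recommended);
        both.retainAll(relevant);
        return both.size();
    }

    /** Fraction of recommended items that are relevant (0.0 if nothing was recommended). */
    static double precision(Set<String> recommended, Set<String> relevant) {
        if (recommended.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, relevant) / recommended.size();
    }

    /** Fraction of relevant items that were recommended (0.0 if nothing is relevant). */
    static double recall(Set<String> recommended, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, relevant) / relevant.size();
    }
}
```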
Data Acquisition
• Components for acquiring source data
• File Reader Types
  • Reading different file formats
• Web Crawling Types
  • LiveJournal or LastFM
• Connection Types
  • Links different sets together

Web Crawler
• Custom multi-threaded web crawler
• Dynamic parsers
• Properties are passed between crawls and parser executions
• Stop and filter conditions are parameterized

Existing Parsers
• Base HTML parsing
• XML Parsing (SAX)
  • LiveJournal FOAF
  • LastFM REST services
  • Graph-RAT documents
  • Yahoo search queries

Comparisons
• SQL
• LINQ
• Matlab
• Other graph packages
• Prolog?

Embedded Use
• Dynamic Loading
• AbstractFactory abstract superclass
• Example: retrieving links to YouTube videos from GData

Graph-RAT Language
• Base Graph-RAT:
  • Data Acquisition components executed
  • For each algorithm entry:
    • Graph Query selects a set of graphs
    • Algorithm is executed over each graph
• Cross-Validation Graph-RAT
  • Mode, relation, or graph chosen in advance
  • Data Acquisition components run once
  • Algorithm entries rerun for each fold
• Statistical Graph-RAT
  • List of cross-validation schedulers
  • Statistical metrics of which performed better

User To User Collaborative Filtering Example
• Aggregate By Link (Artist->User)
• Similarity By Link (User->User)
• Aggregate By Link (User->User)
• Property to Link (User->Artist)

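One possible reading of these four steps is: build each user's artist profile, compute user-to-user similarity, aggregate neighbours' profiles weighted by that similarity, and turn the resulting scores into user-to-artist recommendations. The Java sketch below follows that reading under stated assumptions: cosine similarity is chosen for illustration, profiles are plain maps from artist to weight, and the `UserCfSketch` class is hypothetical rather than Graph-RAT code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of user-to-user collaborative filtering; not Graph-RAT code.
final class UserCfSketch {

    /** Cosine similarity between two sparse profiles (artist -> weight). */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /**
     * Scores artists the target user has no weight for.
     * profiles: user -> (artist -> weight), i.e. the output of step 1.
     */
    static Map<String, Double> recommend(String user, Map<String, Map<String, Double>> profiles) {
        Map<String, Double> mine = profiles.get(user);
        Map<String, Double> scores = new TreeMap<>();
        for (Map.Entry<String, Map<String, Double>> other : profiles.entrySet()) {
            if (other.getKey().equals(user)) {
                continue;
            }
            double similarity = cosine(mine, other.getValue());   // step 2
            if (similarity <= 0.0) {
                continue;
            }
            for (Map.Entry<String, Double> e : other.getValue().entrySet()) {
                if (!mine.containsKey(e.getKey())) {
                    // step 3: similarity-weighted aggregation over neighbours
                    scores.merge(e.getKey(), similarity * e.getValue(), Double::sum);
                }
            }
        }
        return scores;   // step 4: each entry would become a User->Artist link
    }
}
```

In the toolkit itself each step is a separate algorithm entry run by the scheduler; collapsing them into one method here is purely for brevity.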
Setup Example
<Scheduler class="BasicScheduler">
  <Graph>
    <MemGraph/>
  </Graph>
  …
</Scheduler>

Data Acquisition
<DataAcquisition>
  <Class>Crawl LastFM</Class>
  <Name>Crawl LastFM</Name>
  <MemGraph/>
  <Property>
    <Name>Proxy</Name>
    <Value>proxy.waikato.ac.nz</Value>
  </Property>
  …
</DataAcquisition>

Query Entry
<Algorithm>
  <Query>
    <GraphByID>
      <Pattern>.*</Pattern>
    </GraphByID>
  </Query>
</Algorithm>

Algorithm Entry
<Algorithm>
  <Query>…</Query>
  <Class>GraphTriples</Class>
  <Name>Graph Triples</Name>
  <Property>
    <Name>Relation</Name>
    <Value>Friends</Value>
  </Property>
  <Property>
    <Name>Destination</Name>
    <Value>TriplesVector</Value>
  </Property>
  …
</Algorithm>

Future Work
• Stabilization: 0.5.1 to beta
• Statistical testing on result sets
• Upgrading the GUI
• Memory performance upgrades
• Octave Integration

Questions?
• http://graph-rat.sourceforge.net
• Stable (beta) release is 0.4.3