Graph-RAT Overview
By Daniel McEnnis

What is Graph-RAT?
• Relational Analysis Toolkit
• Database abstraction layer
• Evaluation platform
• Robustly evaluates all the different ways of performing recommendation

Kinds of Analysis
• Recommendation systems
• Data mining
• Relational machine learning
• MIR document retrieval

Talk Outline
• Base Components
• Queries
• Algorithms
• Schedulers
• Graph-RAT Language
• Conclusion and Examples

Base Components
• Graphs
• Actors
• Links
• Properties
(Figure: actors A, B, C, D with example properties: Name "John E", Age 22, Hobbies {Hiking, Biking}, Library [Vector])

Properties
• Variables of Graph-RAT
• Can be arbitrary Java types
• Can be attached to anything
• Unique ID string for each object
• Accessed only as sets, not as individual objects

Data View
• Hypergraph structure defined by the set of actors and links in a graph
• Accessible from the enclosing graph
• Can be cyclic
(Figure: example cyclic graph over actors A, B, C, D, E)

Metadata View
• Not constructed by default
• Implicit graph described by modes and the relations between them
• Needed for relational machine learning
(Figure: example metadata graph with the User mode and the Friend relation)

Query Language
• Constructs sets retrieved from a graph
• Functional structure
• Similar to SQL
• 4 types:
    • Graph queries
    • Actor queries
    • Link queries
    • Property queries

Query Structure
• Cascading queries in a LISP-style syntax
• Each child query is of a different type
• Restrictions can be added at runtime

Query Examples
LinkByActor(false,
    ActorByMode(false, "Target", ".*"),
    ActorByMode(false, "Source", ".*"),
    SetOperation.XOR)

Query Comparisons
• Similar to the Jena interface
• Construction is similar to the JUNG framework
• Implements all SQL queries that do not require temporary tables

0.4.3 Query
• Uses graph primitives instead of queries
• Algorithms use a hard-coded GraphByID

Algorithms
• Functions that execute over a given graph
• Metadata is part of the algorithm
• Except for output algorithms, no side effects are permitted
• Entry point: execute(Graph graph)
• Properties utilized or created are declared up front:
    • IODescriptor getInput()
    • IODescriptor getOutput()

Propositional Algorithms
• Take an aggregator function as a parameter
• Cover all the ways of shifting data:
    • Aggregate By Link
    • Aggregate By Link Property
    • Aggregate On Graph
    • Graph To Actor
    • Link To Graph
    • Graph To Graph

Aggregator Functions
• Map one or more elements to an equal or smaller number of elements
• Examples:
    • Statistical moments
    • Arithmetic operations
    • Null aggregation
    • Concatenation

Social Network Analysis Algorithms
• Prestige algorithms:
    • Degree
    • Betweenness
    • Closeness
    • PageRank
    • HITS
• Graph Triples

Classification Algorithms
• Machine learning primitives
• Use Weka
• Separate algorithms for training and classifying

Clustering Algorithms
• Several graph-based algorithms:
    • Weak Component Clustering
    • Strong Component Clustering
    • Edge Betweenness Clustering
    • Newman-Girvan Edge Betweenness
• Also has primitives that call Weka on vector data

Similarity Algorithms
• Comparisons between modes
• Types of similarity:
    • Similarity By Link
    • Similarity By Property
    • Graph Similarity
• Distance functions:
    • All Weka distance functions
    • KLDistance
    • Exponential Distance

Collaborative Filtering Algorithms
• Traditional recommendation algorithms:
    • Item to Item
    • User to User
    • Associative Mining

Array-Based Algorithms
• Transform To Array
• Principal Component Analysis

Evaluation
• All forms of evaluating results:
    • Set-based (precision and recall)
    • Weighted set (correlations)
    • Ordered lists (Kendall tau, half-life)
• Cross-validation algorithms:
    • By actor
    • By link
    • By graph

Data Acquisition
• Components for acquiring source data:
    • File Reader types: read different file formats
    • Web Crawling types: LiveJournal or LastFM
    • Connection types: link different sets together

Web Crawler
• Custom multi-threaded web crawler
• Dynamic parsers
• Properties passed between both crawls and parser execution
• Stop and filter conditions are parameterized

Existing Parsers
• Base HTML parsing
• XML parsing (SAX)
• LiveJournal FOAF
• LastFM REST services
• Graph-RAT documents
• Yahoo search queries

Comparisons
• SQL
• LINQ
• Matlab
• Other graph packages
• Prolog?

Embedded Use
• Dynamic loading
• AbstractFactory abstract superclass
• Example: retrieving links to YouTube videos from GData

Graph-RAT Language
• Base Graph-RAT:
    • Data Acquisition components are executed
    • For each algorithm entry, a graph query selects a set of graphs and the algorithm is executed over each graph
• Cross-Validation Graph-RAT:
    • Mode, relation, or graph chosen in advance
    • Data Acquisition components run once
    • Algorithm entries rerun for each fold
• Statistical Graph-RAT:
    • List of cross-validation schedulers
    • Statistical metrics showing which scheduler performed better

User To User Collaborative Filtering Example
• Aggregate By Link (Artist->User)
• Similarity By Link (User->User)
• Aggregate By Link (User->User)
• Property to Link (User->Artist)

Setup Example
<Scheduler class="BasicScheduler">
    <Graph>
        <MemGraph/>
    </Graph>
    …
</Scheduler>

Data Acquisition
<DataAcquisition>
    <Class>Crawl LastFM</Class>
    <Name>Crawl LastFM</Name>
    <MemGraph/>
    <Property>
        <Name>Proxy</Name>
        <Value>proxy.waikato.ac.nz</Value>
    </Property>
    …
</DataAcquisition>

Query Entry
<Algorithm>
    <Query>
        <GraphByID>
            <Pattern>.*</Pattern>
        </GraphByID>
    </Query>
</Algorithm>

Algorithm Entry
<Algorithm>
    <Query>…</Query>
    <Class>GraphTriples</Class>
    <Name>Graph Triples</Name>
    <Property>
        <Name>Relation</Name>
        <Value>Friends</Value>
    </Property>
    <Property>
        <Name>Destination</Name>
        <Value>TriplesVector</Value>
    </Property>
    …
</Algorithm>

Future Work
• Stabilization: 0.5.1 to beta
• Statistical testing on result sets
• Upgrading the GUI
• Memory performance upgrades
• Octave integration

Questions?
http://graph-rat.sourceforge.net
Stable (beta) release is 0.4.3
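The Query Structure and Query Examples slides describe cascading queries in which each child query is of a different type. The Java sketch below illustrates that nesting with hypothetical stand-in types named after the slides (Actor, Link, ActorByMode, LinkByActor); it is not Graph-RAT's actual API. It assumes an actor query filters by mode and ID regular expressions, and that the link query keeps links whose source is in the first child's result and whose destination is in the second's. The negation flag and SetOperation.XOR from the slide example are not modeled, since the slides do not define their semantics. Records require Java 16 or later.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical stand-in types named after the slides; NOT Graph-RAT's real API.
record Actor(String mode, String id) {}

record Link(String relation, Actor source, Actor destination) {}

// Every actor query returns a set of actors drawn from the graph.
interface ActorQuery {
    Set<Actor> execute(List<Actor> actors);
}

// Leaf query: actors whose mode and ID both fully match the given regular expressions.
record ActorByMode(String modePattern, String idPattern) implements ActorQuery {
    @Override
    public Set<Actor> execute(List<Actor> actors) {
        Pattern mode = Pattern.compile(modePattern);
        Pattern id = Pattern.compile(idPattern);
        Set<Actor> result = new LinkedHashSet<>();
        for (Actor actor : actors) {
            if (mode.matcher(actor.mode()).matches() && id.matcher(actor.id()).matches()) {
                result.add(actor);
            }
        }
        return result;
    }
}

// Parent link query with two child actor queries (assumed AND semantics): keeps
// links whose source is in the first child's set and destination is in the second's.
record LinkByActor(ActorQuery sourceQuery, ActorQuery destinationQuery) {
    Set<Link> execute(List<Actor> actors, List<Link> links) {
        Set<Actor> sources = sourceQuery.execute(actors);
        Set<Actor> destinations = destinationQuery.execute(actors);
        Set<Link> result = new LinkedHashSet<>();
        for (Link link : links) {
            if (sources.contains(link.source()) && destinations.contains(link.destination())) {
                result.add(link);
            }
        }
        return result;
    }
}

class QueryDemo {
    public static void main(String[] args) {
        Actor alice = new Actor("User", "alice");
        Actor bob = new Actor("User", "bob");
        Actor radiohead = new Actor("Artist", "Radiohead");
        List<Actor> actors = List.of(alice, bob, radiohead);
        List<Link> links = List.of(
                new Link("ListensTo", alice, radiohead),
                new Link("Friend", alice, bob));
        // Cascade: a link query whose children are two actor queries.
        Set<Link> userToArtist = new LinkByActor(
                new ActorByMode("User", ".*"),
                new ActorByMode("Artist", ".*")).execute(actors, links);
        System.out.println(userToArtist.size()); // prints 1
    }
}
```

In this sketch each query only returns a set, so a restriction can be tightened by swapping one child query (for example a narrower ID pattern) without changing the parent.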
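The Propositional Algorithms and Aggregator Functions slides describe algorithms that take an aggregator function as a parameter and use it to move data across links. The Java sketch below shows one plausible reading of Aggregate By Link: every link carries the source actor's numeric property to the destination, and the aggregator then reduces the values collected at each destination. The class, its nested Link record, and the MEAN and SUM aggregators are hypothetical illustrations, not Graph-RAT classes. The aggregator type maps a list to a list so that, as the slide specifies, it can return an equal or smaller number of elements. Requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of an "Aggregate By Link" step; not Graph-RAT's real classes.
final class AggregateByLink {
    private AggregateByLink() {}

    // A directed link from a source actor ID to a destination actor ID.
    record Link(String source, String destination) {}

    // An aggregator maps one or more values to an equal or smaller number of values.
    interface Aggregator extends Function<List<Double>, List<Double>> {}

    // Mean (first statistical moment): n values -> 1 value.
    static final Aggregator MEAN = values -> List.of(
            values.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN));

    // Sum (arithmetic operation): n values -> 1 value.
    static final Aggregator SUM = values -> List.of(
            values.stream().mapToDouble(Double::doubleValue).sum());

    // Carries each source's property value across its links to the destination,
    // then applies the aggregator to the values collected at each destination.
    static Map<String, List<Double>> aggregate(Map<String, Double> sourceProperty,
                                               List<Link> links,
                                               Aggregator aggregator) {
        Map<String, List<Double>> collected = new LinkedHashMap<>();
        for (Link link : links) {
            Double value = sourceProperty.get(link.source());
            if (value != null) { // sources without the property contribute nothing
                collected.computeIfAbsent(link.destination(), k -> new ArrayList<>()).add(value);
            }
        }
        Map<String, List<Double>> result = new LinkedHashMap<>();
        collected.forEach((destination, values) -> result.put(destination, aggregator.apply(values)));
        return result;
    }
}
```

For example, with users alice (22) and bob (30) both linked to Radiohead, aggregating with MEAN gives Radiohead the value [26.0], while SUM gives [52.0]. Passing the aggregator in as a parameter keeps the link-traversal logic identical across all aggregation choices.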
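The Evaluation slide lists set-based evaluation with precision and recall. The Java sketch below computes both for a recommended set against a ground-truth set of relevant items; the class name and the choice to return 0.0 when a denominator set is empty are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Set-based evaluation: compares a recommended set against a ground-truth set.
final class SetEvaluation {
    private SetEvaluation() {}

    // Precision = |recommended ∩ relevant| / |recommended| (0.0 if nothing was recommended).
    static double precision(Set<String> recommended, Set<String> relevant) {
        if (recommended.isEmpty()) return 0.0;
        return (double) intersectionSize(recommended, relevant) / recommended.size();
    }

    // Recall = |recommended ∩ relevant| / |relevant| (0.0 if nothing is relevant).
    static double recall(Set<String> recommended, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        return (double) intersectionSize(recommended, relevant) / relevant.size();
    }

    // Size of the intersection, computed on a copy so neither input is modified.
    private static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }
}
```

With recommended items {A, B, C, D} and relevant items {B, D, E}, two items overlap, so precision is 2/4 = 0.5 and recall is 2/3.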
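The User To User Collaborative Filtering Example chains four steps: collect each user's artists, link similar users, aggregate the neighbors' artists, and turn the result into User->Artist links. The Java sketch below collapses that pipeline into plain maps using a common textbook variant: cosine similarity between users' artist vectors, followed by a similarity-weighted sum over the artists the target user has not yet listened to. This choice of similarity and aggregation is an assumption, not necessarily what Graph-RAT's Similarity By Link and Aggregate By Link compute, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal user-to-user collaborative filtering sketch (cosine similarity,
// similarity-weighted sum). Not Graph-RAT's implementation.
final class UserToUserCF {
    private UserToUserCF() {}

    // Cosine similarity between two sparse artist -> weight vectors.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Euclidean length of a sparse vector.
    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) sum += x * x;
        return Math.sqrt(sum);
    }

    // Scores artists the target has not listened to by summing every other
    // user's weights, each scaled by that user's similarity to the target,
    // and returns the top n artists, highest score first.
    static List<String> recommend(Map<String, Map<String, Double>> userArtists,
                                  String target, int n) {
        Map<String, Double> targetVector = userArtists.getOrDefault(target, Map.of());
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> user : userArtists.entrySet()) {
            if (user.getKey().equals(target)) continue;
            double sim = cosine(targetVector, user.getValue());
            if (sim <= 0.0) continue; // unrelated users contribute nothing
            for (Map.Entry<String, Double> artist : user.getValue().entrySet()) {
                if (!targetVector.containsKey(artist.getKey())) {
                    scores.merge(artist.getKey(), sim * artist.getValue(), Double::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return ranked.subList(0, Math.min(n, ranked.size()));
    }
}
```

Returning a ranked list of artists corresponds to the final Property to Link (User->Artist) step, where the top-scoring artists would become recommendation links for the target user.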