Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
(def functor BeeSpace v3) Core BSv3 Features • Personalized Collections – All functions operate on virtual collections. • Gene Analysis Functions – Gene Annotation, Summarization, etc. • Topic Exploration – Evolve/extract/expand/map/compare topics. Challenges New problems: • Modifying all functions to operate on virtual collections. • Build Intelligent Gene Retrieval functionality. • DB-supported apps. • Optimize & parallelize implementations. • Mod Indexing strategies. Constraint: 5 month timeline Old problems: • Better tokenization needed. • Teaming and code sharing. • Upgrade of Lemur & Indri. • Structured Queries. • Multiple languages; diverse skill sets. Big Hill, Little Time Accomplishments • Gene-focus tokenization scheme implemented. • Intelligent Gene Retrieval function near completion. • Optimized & parallelized (EM) Theme Clustering. • Developed DB infrastructure for application support: multiple DBs, tables, DAO access. • Developed Common Library (4K lines C++) for sharing across applications. Accomplishments cont… • Upgraded Lemur/Indri and normalized indexing. • Developed (6) Collection operations and Boolean Query support. Development Life Cycle Project Goals Documentation & Req’s Tracking Requirements Definition Deployment & Backup Conceptual Models Deployment Analysis Sys Dev Life Cycle (Waterfall) Testing Design Logical/Physical Models Scaffolding & Testing Software Implementation Software Libraries & Applications Logical View Application Layer BeeSpace Navigator Query & Analysis Layer Search Engine Gene Analysis Topic Exploration Clustering Common Software Library Data & Knowledge Layer XML Full Text Indexes RDBMS Data Processing Layer Tokenization Named Entity Recognition Classifiers Indexers Mining & Analysis of EAR Graphs • Need to be able to quickly analyze and mine knowledge nets and EAR graphs with ad-hoc operations. User-driven exploration via 4GL query language. • Many data models already fit within an Entity/Attribute/Relationship (EAR) model. Very flexible design. • 4GL approach to operations: increases target audience and allows for query optimization: select src, sum(wt) from edge group by trg order by src; Sample EAR Data Model Entity Topic: 11 Collection: fly Weight: 0.10 Joint: 0.11 Relation MI: 0.023 Class: gene Year: 2001 Term: amfor Prior: 0.07 Doc: pmid:123 Length: 133 DFreq: 4 Attribute Class: behavior Year: 1995 Prior: 0.12 Term: forage Doc: pmid:444 Length: 401 DFreq: 83 Weight: 0.10 Joint: 0.03 Topic: 23 MI: 0.012 Collection: bee Approach • Adopt a layered system (stack) approach: Data layer, core software layer, interpreter, GUI. Possibility for administration client. • Data Layer: currently implemented on top of RDBMS. Achieves flexibility, outreach/reuse, and is often cluster-compatible. • Core Software Layer: C++ STL implementation with functional programming paradigm. • Best-of-Worlds Effect: combines salient features of relational modeling with functional power and adaptive-object modeling methodology. Motivation #1 (define (deriv exp var) (cond ((constant? exp) 0) ((variable? exp) (if (same-variable? exp var) 1 0) ((sum? exp) (make-sum (deriv (addend exp) var) (deriv augend exp) var))) ….) Ref: “Structure and Interpretation of Computer Programs”, Abelson et al. Motivation #2 • C++ STL: • class myFunctor : std::unary_functor<int, double> • { double operator()(int x) { return 2.0*x; } }; • std::for_each(list.begin(), list.end(), myFunctor); • std::set_intersection(a.begin(), a.end(), b.begin(), b.end, result.begin()); Applications • • • • • • • Concept switching. Theme extraction. Theme expansion, shrinking, morphing. Path finding; net flow analysis. Support for propagation nets, belief nets. Clustering, clique finding, etc. *** Not just standard CS/statistical algorithms, but utilizing semantic information and userdirectives. Modeling a Concept Space • Alternative definitions: F { f : CS CS} Case 1: powerset: implies F is monoid CS : 2C F CS CS wrt funct comp (*) k j k j Z f f cos et ( g ) : { f g | f F } Case 2: finite vector space: F is R^n (choose generalized velocities) F L(CS , CS ) Case 3: random vectors: F can be modeled w/ functs over rvs. F ~ f ( x1 , xn ; ) Theme =>point; Region => set-of-subset/compact set/distribution => need to capture variances ~ N(mu, sigma) Other ideas… • EM ~ separation: w ind. d | c => matrix factorization. Golub discusses Jacobi iterations (parallelizable) • Cons inference markov net w/ Gibbs over max. cliques. • What are min# of operators needed for mining of EAR… p( w , d ) p(wi | ck ) diag ( p(ck ) p(d j | ck ) p( w , w ) p(wi | ck ) diag ( p(ck ) p(w j | ck ) i i i 1.. n j 1.. m j j i , j 1.. n p(ci ) : g (ci ) / g (ci ) t t