BeeSpace Informatics Research: From Information Access to Knowledge Discovery
ChengXiang Zhai
Nov. 7, 2007

BeeSpace Technology: From V3 to V4
[Architecture diagram. Labels: Query; Docs; Genes; Function; Search & Navigation; Function Analysis; Literature; Question; Answers; ER Graph Mining; Inference Engine; Entities; Relations; Knowledge Base; Expert Knowledge]

New Functions in V4
• Massive Entity/Relation Extraction
• Graph Indexing and Mining
• Integration of Expert Knowledge & Reasoning
• Personalization & Info/Knowledge Sharing
• “Plug and Play” (PnP)

Massive Entity Recognition
• Class 1: Small Variation (Dictionary/Ontology)
  – Organism, Anatomy, Biological Process, Pathway, Protein Family
• Class 2: Medium Variation
  – Gene, cis Regulatory Element
• Class 3: Large Variation
  – Phenotype, Behavior

Massive Relation Extraction
• Expression Location
  – the expression of a gene in some location (tissues, body parts)
• Homology/Orthology
  – one gene is homologous to another gene
• Biological process
  – one gene has some role in a biological process
• Genetic/Physical/Regulatory Interaction
  – one gene interacts with another gene in a certain fashion (3 types of relations)
  – a simple case: Protein-Protein Interaction (PPI)

Entity Relation Graph Mining
• The extracted entities and relations form a weighted graph
• Need to develop techniques to mine the graph for knowledge
  – Store graphs
  – Index graphs
  – Mining algorithms (neighbor finding, path finding, entity comparison, outlier detection, frequent subgraphs, …)
  – Mining language

Integration of Expert Knowledge
• How can we combine expert knowledge with knowledge extracted from literature?
• Possible strategies:
  – Interactive mining (human knowledge is used to guide the next step of mining)
  – Trainable programs (focused miners, each targeting a certain kind of knowledge)
  – Inference-based integration

Inference-Based Discovery
• Encode all kinds of knowledge in the same knowledge representation language
• Perform logic inferences
• Example
  – Regulate(GeneA, GeneB, ContextC). [Literature mining]
  – SeqSimilar(GeneA, GeneA’). [Sequence mining]
  – Regulate(X,Y,C) ← Regulate(Z,Y,C) & SeqSimilar(X,Z) [Human knowledge]
  – ? Regulate(GeneA’, GeneB, ContextC)

Personalization & Workflow Management
• Different users have different tasks → personalization
  – Tracking a user’s history and learning a user’s preferences
  – Exploiting the preferences to customize/optimize the support
  – Allowing a user to define/build special function modules
• Workflow management

Information/Knowledge Sharing
• Different users may perform similar tasks → information/knowledge sharing
  – Capturing user intentions
  – Recommending information/knowledge
  – How do we solve the problem of privacy?
• Massive collaborations?
  – Each user contributes a small amount of knowledge
  – All the knowledge can be combined to infer new knowledge

Plug and Play
• Users’ tasks vary significantly
• Need flexible combinations of basic modules
• Need to move toward a “discovery workbench”
  – How do we design basic modules?
  – How do we support synthesis of information and knowledge?

BeeSpace V4
[Architecture diagram. Labels: User; Vertical Search Services; Search & Navigation; PnP Function Analyzers; Text Mining; Literature; User Customized Knowledge Base; ER Graph Mining; Inference Engine; Entities; Relations; Knowledge Base; Expert Knowledge. Names annotated on the diagram: Xin Xu, Peixiang, Yue, Bio]

Discussion
• Task Model?
• PnP Modules?
• Massive Collaboration?
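
Illustration for the Inference-Based Discovery example: a minimal Python sketch, not part of the original slides, of how the three kinds of knowledge might be combined. It reads the human-knowledge rule as "Regulate(X,Y,C) holds if Regulate(Z,Y,C) and SeqSimilar(X,Z)" and applies it repeatedly until no new facts appear. The function name `infer_regulation`, the tuple encoding of facts, and the treatment of SeqSimilar as symmetric are all assumptions made for this sketch; the slides do not specify a representation language or an inference engine.

```python
# Illustrative sketch only. The function name, the tuple encoding, and the
# symmetry of SeqSimilar are assumptions, not taken from the slides.

def infer_regulation(regulates, seq_similar):
    """Apply Regulate(X,Y,C) <- Regulate(Z,Y,C) & SeqSimilar(X,Z) to a fixpoint.

    regulates:   set of (gene, target, context) facts
    seq_similar: set of (gene, gene) facts
    Returns only the newly inferred Regulate facts.
    """
    # Assumption: sequence similarity is symmetric, so store both directions.
    similar = set(seq_similar) | {(b, a) for a, b in seq_similar}
    known = set(regulates)
    changed = True
    while changed:
        changed = False
        for (z, y, c) in list(known):
            for (x, z2) in similar:
                # Rule body matched: Z regulates Y in C, and X is similar to Z.
                if z2 == z and (x, y, c) not in known:
                    known.add((x, y, c))
                    changed = True
    return known - set(regulates)


# Facts from the slide example.
literature_facts = {("GeneA", "GeneB", "ContextC")}  # literature mining
sequence_facts = {("GeneA", "GeneA'")}               # sequence mining

inferred = infer_regulation(literature_facts, sequence_facts)
print(inferred)
```

With the slide's two facts, the only new fact derived is the queried one, Regulate(GeneA’, GeneB, ContextC). A real system would more likely use a logic programming or rule engine than hand-written loops; the loop form is used here only to make the inference step visible.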