BeeSpace Informatics Research: From Information Access to Knowledge Discovery
ChengXiang Zhai
Nov. 14, 2007

BeeSpace Technology: From V3 to V4
[Diagram components: V3 (Query; Search & Navigation; Function Analysis; Literature; Docs; Genes; Question → Answers) shown beside V4, which adds ER Graph Mining, an Inference Engine, a Knowledge Base of Entities and Relations, and Expert Knowledge.]

New Functions in V4
• Massive Entity/Relation Extraction
• Graph Indexing and Mining
• Integration of Expert Knowledge & Reasoning
• Personalization & Info/Knowledge Sharing
• “Plug and Play” (PnP)

Massive Entity Recognition
• Class1: Small Variation (Dictionary/Ontology)
  – Organism, Anatomy, Biological Process, Pathway, Protein Family
• Class2: Medium Variation
  – Gene, cis-Regulatory Element
• Class3: Large Variation
  – Phenotype, Behavior

Massive Relation Extraction
• Expression Location
  – the expression of a gene in some location (tissues, body parts)
• Homology/Orthology
  – one gene is homologous to another gene
• Biological Process
  – one gene has some role in a biological process
• Genetic/Physical/Regulatory Interaction
  – one gene interacts with another gene in a certain fashion (3 types of relations)
  – a simple case: Protein-Protein Interaction (PPI)

Entity Relation Graph Mining
• The extracted entities and relations form a weighted graph
• Need to develop techniques to mine the graph for knowledge
  – Store graphs
  – Index graphs
  – Mining algorithms (neighbor finding, path finding, entity comparison, outlier detection, frequent subgraphs, …)
  – Mining language

Integration of Expert Knowledge
• How can we combine expert knowledge with knowledge extracted from literature?
• Possible strategies:
  – Interactive mining (human knowledge is used to guide the next step of mining)
  – Trainable programs (a focused miner targeting a certain kind of knowledge)
  – Inference-based integration

Inference-Based Discovery
• Encode all kinds of knowledge in the same knowledge representation language
• Perform logic inferences
• Example
  – Regulate(GeneA, GeneB, ContextC) [Literature mining]
  – SeqSimilar(GeneA, GeneA’) [Sequence mining]
  – Regulate(X, Y, C) ← Regulate(Z, Y, C) & SeqSimilar(X, Z) [Human knowledge]
  – ⇒ Regulate(GeneA’, GeneB, ContextC)
  – ADD: InPathway(GeneB, P1)
  – InPathway(X, P) ← Regulate(X, Y, C) & InPathway(Y, P) [Human knowledge]
  – ⇒ InvolvedInPathway(GeneA’, P1)

Personalization & Workflow Management
• Different users have different tasks → personalization
  – Tracking a user’s history and learning a user’s preferences
  – Exploiting the preferences to customize/optimize the support
  – Allowing a user to define/build special function modules
• Workflow management

Information/Knowledge Sharing
• Different users may perform similar tasks → information/knowledge sharing
  – Capturing user intentions
  – Recommending information/knowledge
  – How do we solve the problem of privacy?
• Massive collaborations?
  – Each user contributes a small amount of knowledge
  – All the knowledge can be combined to infer new knowledge

Plug and Play
• Users’ tasks vary significantly
• Need flexible combinations of basic modules
• Need to move toward a “discovery workbench”
  – How do we design basic modules?
  – How do we support synthesis of information and knowledge?

BeeSpace V4
[Diagram components: Users; Vertical Search Services (Search & Navigation); PnP Function Analyzers; Text Mining over the Literature; User Customized Knowledge Base; ER Graph Mining; Inference Engine; Knowledge Base of Entities and Relations; Expert Knowledge.]

Discussion
• Task Model?
• PnP Modules?
• Massive Collaboration?
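The Inference-Based Discovery slide's example can be replayed mechanically with naive forward chaining: apply every rule to the current facts until nothing new appears. The sketch below is a minimal illustration under stated assumptions, not the BeeSpace inference engine. Facts are plain tuples; the two human-knowledge rules are hand-coded as Python functions; and SeqSimilar is assumed to be symmetric, because the slide states SeqSimilar(GeneA, GeneA’) but the derivation uses it with the arguments swapped. The names `forward_chain`, `rule_regulate_by_similarity`, and `rule_pathway_membership` are hypothetical.

```python
# Ground facts from the slide, encoded as (predicate, arg1, arg2, ...) tuples.
FACTS = {
    ("Regulate", "GeneA", "GeneB", "ContextC"),  # literature mining
    ("SeqSimilar", "GeneA", "GeneA'"),           # sequence mining
    ("InPathway", "GeneB", "P1"),                # the "ADD" step
}

def seq_similar(facts, x, z):
    # Assumption: sequence similarity is symmetric.
    return ("SeqSimilar", x, z) in facts or ("SeqSimilar", z, x) in facts

def rule_regulate_by_similarity(facts):
    # Regulate(X,Y,C) <- Regulate(Z,Y,C) & SeqSimilar(X,Z)
    genes = {a for f in facts if f[0] == "SeqSimilar" for a in f[1:]}
    new = set()
    for f in facts:
        if f[0] != "Regulate":
            continue
        _, z, y, c = f
        for x in genes:
            if x != z and seq_similar(facts, x, z):
                new.add(("Regulate", x, y, c))
    return new

def rule_pathway_membership(facts):
    # InPathway(X,P) <- Regulate(X,Y,C) & InPathway(Y,P)
    new = set()
    for r in facts:
        if r[0] != "Regulate":
            continue
        _, x, y, _context = r
        for p in facts:
            if p[0] == "InPathway" and p[1] == y:
                new.add(("InPathway", x, p[2]))
    return new

def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new fact is derived (a fixpoint)."""
    facts = set(facts)
    while True:
        derived = set().union(*(rule(facts) for rule in rules)) - facts
        if not derived:
            return facts
        facts |= derived

closure = forward_chain(FACTS, [rule_regulate_by_similarity, rule_pathway_membership])
for fact in sorted(closure - FACTS):
    print(fact)
```

The derived facts include Regulate(GeneA’, GeneB, ContextC) and, one round later, InPathway(GeneA’, P1), which is the slide's conclusion written with the rule's own head predicate (the slide labels it InvolvedInPathway). The rules also yield InPathway(GeneA, P1), since GeneA itself regulates GeneB. A real engine would add variable unification and uncertainty handling, which the Research Themes slide lists as open issues.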
BeeSpace V4: System Architecture
[Diagram components: Users; User Interface/Workflow Manager; Special Search; Search & Navigation; Topic Modeling; Inference Engine; Modeling & Personalization; PnP Function Analyzers; Machine Learning; NLP; Information Extraction; ER Graph Mining; Literature; Knowledge Base (Entities, Relations, Hypotheses, Expert Knowledge); …; NCBI; Genome Databases.]

BeeSpace V4: System Architecture
[Same diagram, annotated with the team members working on each component: Yuanhua, Moushumi, Yue, Xin, Xu, Peixiang.]

Modules
• Navigation & Search (Improve V3) [Yuanhua]
• Information Extraction [Yue]
• ER Graph Mining [Peixiang]
• Specialized Search [Xu]
• Function Analyzers [Xin]
• User Modeling, Personalization, Workflow [Yuanhua]
• Inference Engine [Yue]

Informatics Research Themes
• Specialized Search
  – Hypothesis search
• Information Extraction
  – Entities, relations
• Graph Mining
  – Indexing, query language, mining algorithms
• Function analyzers
  – Gene set annotator
• Personalization
  – User model
• Inference engine
  – Knowledge representation language, uncertainty

Example of Interactive Graph Mining
[Figure: an ER graph with Behavior nodes B1, B2, B3, B4 and Gene nodes A1, A1’, A2, A3, A4, A4’, A5; edges are labeled isa, co-occur (Co-occur-fly, Co-occur-bee, Co-occur-mos), orth (Orth-mos), and Reg.]
1. X = NeighborOf(B4, Behavior, {co-occur, isa}) → {B1, B2, B3}
2. Y = NeighborOf(X, Gene, {co-occur, orth}) → {A1, A1’, A2, A3}
3. Y = Y + {A5, A6} → {A1, A1’, A2, A3, A5, A6}
4. Z = NeighborOf(Y, Gene, {reg}) → {A4, A4’}
5. X = PathBetween({A4, A4’}, B4, {co-occur, reg, isa})

PnP Function Analyzers
• Basic objects
  – GeneSet, DocSet, SentSet, TermSet
• Basic operators
  – Gene summarizer
  – GeneSet annotator
  – …
[Diagram components: operator types Splitter, Filter/Attractor, Converter, …; object types EntitySet, GeneSet, BehaviorSet, …, Doc/SentSet, ModelOrg, ….]
GeneSearch: GeneSet → Doc/SentSet
DocSplitter: Doc/SentSet → {Set1, …, Setk}
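The NeighborOf and PathBetween steps of the interactive graph-mining example can be sketched over a typed, edge-labeled graph. This is a minimal sketch under several assumptions: the graph is undirected, edge weights are ignored, a relation name matches any label that starts with it (so `co-occur` matches `co-occur-fly` and `orth` matches `orth-mos`), and PathBetween returns one shortest path by hop count. The edges in `EDGES` are hypothetical; they reuse the slide's node names but are invented so the five steps can run, and they are not the edges of the slide's figure. `neighbor_of`, `path_between`, and `label_allowed` are illustrative names.

```python
from collections import deque

# Hypothetical toy ER graph: node types plus undirected labeled edges (invented).
NODE_TYPE = {
    "B1": "Behavior", "B2": "Behavior", "B3": "Behavior", "B4": "Behavior",
    "A1": "Gene", "A1'": "Gene", "A2": "Gene", "A3": "Gene",
    "A4": "Gene", "A4'": "Gene", "A5": "Gene", "A6": "Gene",
}
EDGES = [
    ("B4", "B1", "isa"), ("B4", "B2", "isa"), ("B4", "B3", "co-occur-bee"),
    ("B1", "A1", "co-occur-fly"), ("B1", "A1'", "co-occur-bee"),
    ("A1", "A1'", "orth-mos"), ("B2", "A2", "co-occur-mos"),
    ("B3", "A3", "co-occur-fly"), ("A1'", "A4'", "reg"),
    ("A5", "A4", "reg"), ("A4", "A4'", "orth"),
]

# Adjacency lists: node -> [(neighbor, edge label), ...], both directions.
ADJ = {}
for u, v, label in EDGES:
    ADJ.setdefault(u, []).append((v, label))
    ADJ.setdefault(v, []).append((u, label))

def label_allowed(label, relations):
    # Assumed prefix convention: "co-occur" covers "co-occur-fly", etc.
    return any(label == r or label.startswith(r + "-") for r in relations)

def neighbor_of(nodes, node_type, relations):
    """Nodes of `node_type` adjacent to any node in `nodes` via an allowed relation."""
    if isinstance(nodes, str):
        nodes = {nodes}
    return {
        v
        for u in nodes
        for v, label in ADJ.get(u, [])
        if NODE_TYPE[v] == node_type and label_allowed(label, relations)
    }

def path_between(sources, target, relations):
    """Multi-source BFS: a shortest path from any source to target, or None."""
    queue = deque(sorted(sources))
    parent = {s: None for s in sources}
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:  # walk parent links back to a source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v, label in ADJ.get(u, []):
            if v not in parent and label_allowed(label, relations):
                parent[v] = u
                queue.append(v)
    return None

# The five interactive steps from the slide, run on the toy graph.
x = neighbor_of("B4", "Behavior", {"co-occur", "isa"})
y = neighbor_of(x, "Gene", {"co-occur", "orth"})
y = y | {"A5", "A6"}  # step 3: the user adds genes by hand
z = neighbor_of(y, "Gene", {"reg"})
path = path_between(z, "B4", {"co-occur", "reg", "isa"})
print(x, y, z, path)
```

Step 3 is where the interaction happens: the user edits the working set between two automated queries. Note that the `orth` edge between A4 and A4’ is skipped in step 5 because `orth` is not among the allowed relations, so the only path found runs A4’ → A1’ → B1 → B4.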
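The PnP Function Analyzers slide describes typed basic objects and operators with signatures such as GeneSearch: GeneSet → Doc/SentSet and DocSplitter: Doc/SentSet → {Set1, …, Setk}. One possible reading of “plug and play” is that operators compose only when one operator's output type matches the next operator's input type. The sketch below illustrates that reading; `Operator`, `plug`, `make_splitter`, and the four-document `CORPUS` are hypothetical, and GeneSearch is reduced to exact token matching for brevity.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class GeneSet:
    genes: frozenset

@dataclass(frozen=True)
class DocSet:
    docs: tuple  # (doc_id, text) pairs

# Hypothetical toy corpus; IDs and texts are invented for illustration.
CORPUS = (
    ("d1", "GeneA is expressed in the brain"),
    ("d2", "GeneB regulates GeneA"),
    ("d3", "GeneC is linked to foraging"),
    ("d4", "no gene is mentioned here"),
)

@dataclass(frozen=True)
class Operator:
    name: str
    in_type: type
    out_type: type
    fn: Callable

def gene_search(gs: GeneSet) -> DocSet:
    # GeneSearch: GeneSet -> DocSet (exact token match, a simplification).
    return DocSet(tuple(d for d in CORPUS if gs.genes & set(d[1].split())))

def make_splitter(genes):
    """DocSplitter: DocSet -> (Set1, ..., Setk), one set per gene; a doc may appear in several."""
    def split(ds: DocSet) -> tuple:
        return tuple(
            DocSet(tuple(d for d in ds.docs if g in d[1].split()))
            for g in sorted(genes)
        )
    return split

def plug(*ops):
    """Chain operators, rejecting the chain if adjacent types do not match."""
    for a, b in zip(ops, ops[1:]):
        if a.out_type is not b.in_type:
            raise TypeError(
                f"{a.name} outputs {a.out_type.__name__}, "
                f"but {b.name} expects {b.in_type.__name__}"
            )
    def run(x):
        for op in ops:
            x = op.fn(x)
        return x
    return run

query = GeneSet(frozenset({"GeneA", "GeneB"}))
ops = (
    Operator("GeneSearch", GeneSet, DocSet, gene_search),
    Operator("DocSplitter", DocSet, tuple, make_splitter(query.genes)),
)
sets = plug(*ops)(query)
print([[doc_id for doc_id, _ in s.docs] for s in sets])
```

Checking types when the chain is built, rather than when it runs, means an invalid combination such as DocSplitter followed by GeneSearch is rejected before any document is touched. Other operator kinds named on the slide (Filter/Attractor, Converter) would slot in as further `Operator` values with their own signatures.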