Automated Explanation of Gene-Gene Relationships
Wacek Kuśnierczyk

The Motivation
• In various biological studies, researchers often come up with a list of (possibly related) genes
• If the relations between these genes are unknown or hypothetical, they have to be confirmed experimentally, through a database search, or both
• Manual browsing or searching is a very tedious task, and any interpretation of the results requires expert knowledge
slide 2

The Goal
To automate the search in order to assist a biologist in forming explanations of actual and hypothetical relationships between sets of genes, using
• various types and sources of data,
• various similarity assessment tools, and
• background (domain) knowledge
slide 3

The Field
The most important participating disciplines:
• Biology
• Bioinformatics
• Computer Science
slide 4

The Biologist’s Problem
Given a collection of genes, how can we explain the relationships between them, using the available data and knowledge?
– How does gene g1 regulate (activate, inhibit) gene g2?
– What is the functional similarity of gene g3 to gene g4?
– What is the metabolic (signalling) pathway common to genes g5 and g6 in the context of disease d1?
slide 5

The Bioinformatician’s Problem
Given a collection of (biological) objects, which of their properties can we compare and how, and where can we find their values?
– Where do we find the gene sequence (protein structure) data?
– How do we assess the similarity between two gene sequences (protein structures)?
– Where do we find suitable tools, how do we use them, and how do we interpret the results?
slide 6

The Computer Scientist’s Problem
Given a collection of distributed data and tools to link them, how do we build an explanatory path between objects from a query?
A search problem:
– separate, partially overlapping graphs
– coloured nodes
– coloured, weighted, dynamic edges
slide 7

Simplified Search Space
Graph with homogeneous vertices and edges
Task: find (shortest) paths
slide 8

More Realistic Search Space
Graph with qualitatively different vertices and qualitatively different edges, weighted with qualitatively different weights
Task: find (plausible) paths
slide 9

Even More Realistic Search Space
Each node is connected to a multitude of other nodes; the resulting combinatorial explosion makes an exhaustive search infeasible
Task: find heuristics (generic and specific) to guide the search
slide 10

A Trivial Example
Step: Input query
[Diagram. Layer: Query. Nodes: GAS, CCK.]
slide 11

A Trivial Example
Step: Initial mapping
[Diagram. Layers: Background, Query. Nodes: GAS, CCK, Peptide Hormone.]
slide 12

A Trivial Example
Step: Activation spreading
[Diagram. Layers: Background, Query. Nodes: GAS, CCK, Peptide Hormone, Hormone, Receptor. Edge labels: Is a, Acts on.]
slide 13

A Trivial Example
Step: Plausible inheritance (inference)
[Diagram. Layers: Background, Query. Nodes: GAS, CCK, Peptide Hormone, Hormone, Receptor. Edge labels: Is a, Acts on (two edges).]
slide 14

A Trivial Example
Step: Activation spreading
[Diagram. Layers: Background, Query. Nodes: GAS, CCK, Peptide Hormone, Hormone, Receptor, Extracellular Receptor. Edge labels: Is a, Acts on.]
slide 15

A Trivial Example
Step: Data retrieval and mapping
[Diagram. Layers: Background, Query, Data. Nodes: GAS, CCK, Peptide Hormone, Hormone, Receptor, Extracellular Receptor, GAS-R, CCK A-R. Edge labels: Is a, Acts on.]
slide 16

A Trivial Example
Step: Induction
[Diagram. Layers: Background, Query, Data. Nodes: GAS, CCK, Peptide Hormone, Hormone, Receptor, Extracellular Receptor, GAS-R, CCK A-R. Edge labels: Is a, Acts on.]
slide 17

A Trivial Example
Step: Activation spreading
[Diagram. Layers: Background, Query, Data. Nodes: GAS, CCK, Peptide Hormone, Hormone, Receptor, Extracellular Receptor, Receptor Family, GAS-R, CCK A-R. Edge labels: Is a, Acts on, Belongs to.]
slide 18

A Trivial Example
Step: Plausible inheritance
[Diagram. Layers: Background, Query, Data. Nodes: GAS, CCK, Peptide Hormone, Hormone, Receptor, Extracellular Receptor, Receptor Family, GAS-R, CCK A-R. Edge labels: Is a, Acts on, Belongs to.]
slide 19

A Trivial Example
Steps: Data retrieval and mapping; formulation of an explanation
[Diagram. Layers: Background, Query, Data. Nodes: GAS, CCK, Peptide Hormone, Hormone, Receptor, Extracellular Receptor, Receptor Family, GAS-R, CCK A-R, CCK-R. Edge labels: Is a, Acts on, Belongs to.]
slide 20

Explanation Schema
[Diagram. Elements: Query, Model, Data, Explanation. Operations: parse, map, retrieve, match, chain.]
slide 21

System Architecture
[Diagram. Components: HG/U, QI, EI, GDK, CR, CB, DB, T, and four DI boxes.]
Legend:
CB: case base
CR: core reasoner
DB: databases
EI: explanation interface
GDK: general domain knowledge
HG/U: hypothesis generator, user
QI: query interface
T: tools
slide 22

Related Work
• Basic research in gastric cancer
• Genomic & proteomic data warehouse
• Syntactic & semantic database integration
• Natural language understanding
• Knowledge representation & modelling
• Knowledge-intensive reasoning and learning
slide 23

Concerns
• Is it reasonable? (What do biologists say?)
• Is it possible? (What do bioinformaticians say?)
• Is it feasible? (What do computer scientists say?)
• Isn’t it too ambitious (for a PhD study)?
slide 24

Disclaimer
An in silico solution is actually a hypothesis that requires physical (experimental) confirmation.
slide 25

Acknowledgments
Agnar Aamodt, IDI.IME (AI, ML, CBR)
Astrid Lægreid, IKM.DMF (biology, bioinformatics)
Arne Sandvik, IKM.DMF (medicine)
Frode Sørmo, IDI.IME (Creek)
slide 26
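
Supplementary sketch
The task on slides 8 and 9 (finding shortest or plausible paths in a graph with typed, weighted edges) can be illustrated with a small uniform-cost search. This is only a sketch under stated assumptions, not the system described in the talk: the node names are borrowed from the trivial example, but the edge endpoints, the "maps to" relation, the numeric costs, and the function names build_graph and most_plausible_path are all hypothetical. A lower cost is taken to mean a more plausible link, and every edge can be traversed in both directions.

```python
import heapq
from collections import defaultdict

# Hypothetical toy graph: (source, relation, target, cost).
# Endpoints and costs are illustrative assumptions, not data from the talk.
EDGES = [
    ("GAS", "maps to", "Peptide Hormone", 2.0),
    ("CCK", "maps to", "Peptide Hormone", 2.0),
    ("GAS", "acts on", "GAS-R", 1.0),
    ("CCK", "acts on", "CCK A-R", 1.0),
    ("GAS-R", "belongs to", "CCK-R", 0.5),
    ("CCK A-R", "belongs to", "CCK-R", 0.5),
]


def build_graph(edges):
    """Build an adjacency list in which each edge can be walked both ways."""
    graph = defaultdict(list)
    for src, rel, dst, cost in edges:
        graph[src].append((dst, rel, cost))
        graph[dst].append((src, rel + " (inverse)", cost))
    return graph


def most_plausible_path(graph, start, goal):
    """Uniform-cost (Dijkstra) search.

    Returns (total_cost, steps), where steps is a list of
    (node, relation, next_node) triples, or None if goal is unreachable.
    """
    frontier = [(0.0, start, [])]
    settled = set()
    while frontier:
        cost, node, steps = heapq.heappop(frontier)
        if node == goal:
            return cost, steps
        if node in settled:
            continue
        settled.add(node)
        for nxt, rel, edge_cost in graph[node]:
            if nxt not in settled:
                new_steps = steps + [(node, rel, nxt)]
                heapq.heappush(frontier, (cost + edge_cost, nxt, new_steps))
    return None


if __name__ == "__main__":
    result = most_plausible_path(build_graph(EDGES), "GAS", "CCK")
    if result is not None:
        total, steps = result
        for src, rel, dst in steps:
            print(f"{src} --{rel}--> {dst}")
        print("total cost:", total)
```

With these made-up costs, the chain through the receptors is cheaper than the chain through the shared hormone class, so it is the one returned as the explanation. An exhaustive search of this kind does not scale to the situation on slide 10; there the priority would have to come from heuristics rather than from accumulated cost alone.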