Automated Explanation of Gene-Gene Relationships
Wacek Kuśnierczyk
The Motivation
• In various biological studies, researchers often come up with a list of (possibly related) genes
• If the relations between these genes are unknown or hypothetical, they have to be confirmed experimentally, through a database search, or both
• Manual browsing or searching is a very tedious task, and any interpretation of the results requires expert knowledge
slide 2
The Goal
To automate the search in order to
– assist a biologist in forming explanations of actual and hypothetical relationships between sets of genes
– using
• various types and sources of data,
• various similarity assessment tools, and
• background (domain) knowledge
slide 3
The Field
The most important participating disciplines
• Biology
• Bioinformatics
• Computer Science
slide 4
The Biologist’s Problem
Given a collection of genes, how can we explain the relationships between them using the available data and knowledge?
– How does gene g1 regulate (activate, inhibit) gene g2?
– What is the functional similarity of gene g3 to gene g4?
– What is the metabolic (signalling) pathway common to genes g5 and g6 in the context of disease d1?
slide 5
The Bioinformatician’s Problem
Given a collection of (biological) objects, which of their properties can we compare and how, and where can we find their values?
– Where do we find the gene sequence (protein structure) data?
– How do we assess the similarity between two gene sequences (protein structures)?
– Where do we find the suitable tools, how do we use them, and how do we interpret the results?
slide 6
The Computer Scientist’s Problem
Given a collection of distributed data and tools to link them, how do we build an explanatory path between objects from a query?
A search problem:
– separate, partially overlapping graphs
– coloured nodes
– coloured, weighted, dynamic edges
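Such a search space can be held in one adjacency index built from several source graphs. The following is a minimal Python sketch, not the project's implementation: the names `Node`, `Edge`, `SearchSpace` and the `origin` field are hypothetical, node and edge "colours" are modelled as type labels, and "dynamic" edges are approximated only by letting the caller choose which relation types are visible at query time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    colour: str  # node type, e.g. "gene" or "concept" (illustrative labels)


@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node
    colour: str    # relation type, e.g. "acts on", "is a"
    weight: float  # strength or plausibility of the relation
    origin: str    # hypothetical: which source graph supplied this edge


class SearchSpace:
    """Union of several partially overlapping graphs."""

    def __init__(self):
        self._out = defaultdict(list)  # Node -> outgoing edges

    def add_graph(self, edges):
        # Graphs overlap wherever they share a Node, so merging them
        # amounts to adding their edges to one adjacency index.
        for e in edges:
            self._out[e.source].append(e)

    def neighbours(self, node, colours=None):
        # Assumption: "dynamic" edges are modelled only by filtering
        # which relation types are visible for the current query.
        return [e for e in self._out[node]
                if colours is None or e.colour in colours]
```

In use, a database graph and a knowledge-base graph that both mention GAS become one search space in which `neighbours(gas)` returns edges from both sources.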
slide 7
Simplified Search Space
Graph with homogeneous vertices and edges
Task: find (shortest) paths
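When all vertices and edges are of one kind, a breadth-first search finds a shortest path between two query genes. A minimal sketch, assuming the graph is a plain Python adjacency dict; the gene names in the usage line are placeholders.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search: in an unweighted graph, the first path
    that reaches `goal` has the fewest edges."""
    previous = {start: None}  # node -> predecessor on the path found
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # goal not reachable
```

For example, `shortest_path({"g1": ["g2", "g3"], "g2": ["g4"], "g3": ["g4"]}, "g1", "g4")` returns `["g1", "g2", "g4"]`.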
slide 8
More Realistic Search Space
Graph with qualitatively different vertices and qualitatively different edges, weighted with qualitatively different weights
Task: find (plausible) paths
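One common way to combine qualitatively different weights into a single notion of plausibility is to map each relation type to a score in (0, 1], multiply the scores along a path, and look for the path with the highest product. Because maximising such a product is the same as minimising the sum of negative logarithms, Dijkstra's algorithm applies. This is a sketch of that general technique, not the method used in this work; the relation names and values in `RELATION_PLAUSIBILITY` are invented for illustration.

```python
import heapq
import math

# Hypothetical plausibility per relation type; a real system would derive
# these from the data sources and similarity tools, not a fixed table.
RELATION_PLAUSIBILITY = {"is a": 0.95, "acts on": 0.8, "similar sequence": 0.6}


def most_plausible_path(edges, start, goal):
    """edges: dict node -> list of (neighbour, relation, strength in (0, 1]).
    Returns (plausibility, [(from, relation, to), ...]) or (0.0, None)."""
    best = {start: 0.0}   # node -> lowest cost (sum of -log p) found so far
    previous = {}         # node -> (predecessor, relation used)
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node]:
            continue      # stale heap entry
        if node == goal:
            steps = []
            while node != start:
                prev, relation = previous[node]
                steps.append((prev, relation, node))
                node = prev
            return math.exp(-cost), steps[::-1]
        for nxt, relation, strength in edges.get(node, ()):
            p = RELATION_PLAUSIBILITY.get(relation, 0.0) * strength
            if p <= 0.0:
                continue  # unknown relation types are not traversed
            new_cost = cost - math.log(p)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                previous[nxt] = (node, relation)
                heapq.heappush(heap, (new_cost, nxt))
    return 0.0, None
```

With these invented values, a two-step path through "is a" (0.95) and "acts on" (0.8) has plausibility 0.76 and is preferred over a direct "similar sequence" link (0.6).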
slide 9
Even More Realistic Search Space
Each node is connected to a multitude of other nodes; the resulting combinatorial explosion makes an exhaustive search infeasible
Task: find heuristics to guide the search (generic and specific)
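Beam search is one generic heuristic that curbs this explosion: at each depth it keeps only the few most promising partial paths. The sketch below assumes a caller-supplied `neighbours` function and a heuristic `score` function (both hypothetical); domain-specific heuristics would enter through `score`. Unlike an exhaustive search, beam search can miss a path that exists.

```python
import heapq


def beam_search(neighbours, start, goal, score, beam_width=3, max_depth=5):
    """Expand only the `beam_width` highest-scoring partial paths per depth.
    neighbours(node) -> iterable of adjacent nodes;
    score(path) -> number, higher meaning more promising.
    Returns the first path found that reaches `goal`, or None."""
    if start == goal:
        return [start]
    beam = [[start]]
    for _ in range(max_depth):
        candidates = []
        for path in beam:
            for nxt in neighbours(path[-1]):
                if nxt in path:
                    continue  # avoid cycles
                new_path = path + [nxt]
                if nxt == goal:
                    return new_path
                candidates.append(new_path)
        if not candidates:
            return None
        # Prune: keep only the most promising partial paths.
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return None
```

With `beam_width=1`, a heuristic that favours the right branch finds the goal after expanding very few nodes, while a misleading heuristic can prune the only connecting path and return `None`.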
slide 10
A Trivial Example
[Diagram. Step: input query. Query: GAS, CCK]
slide 11
A Trivial Example
[Diagram. Step: initial mapping. Query: GAS, CCK. Background: Peptide Hormone]
slide 12
A Trivial Example
[Diagram. Step: activation spreading. Query: GAS, CCK. Background: Peptide Hormone, Hormone, Receptor. Relations: is a, acts on]
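Activation spreading can be sketched as follows: the query genes start fully activated and pass a decaying share of their activation along links, so that nearby background concepts become candidates for an explanation. This is a minimal Python sketch under stated assumptions: the decay factor, threshold, link weights, and the link directions in the usage example (read off the diagram) are illustrative, not values from the system.

```python
def spread_activation(links, sources, decay=0.5, threshold=0.1):
    """links: dict node -> list of (neighbour, weight in [0, 1]).
    Query nodes start at activation 1.0; each step passes
    activation * decay * weight to neighbours. A node propagates again
    only if its activation increased and is at least `threshold`."""
    activation = {s: 1.0 for s in sources}
    frontier = list(sources)
    while frontier:
        next_frontier = []
        for node in frontier:
            for nxt, weight in links.get(node, ()):
                a = activation[node] * decay * weight
                if a >= threshold and a > activation.get(nxt, 0.0):
                    activation[nxt] = a
                    next_frontier.append(nxt)
        frontier = next_frontier
    return activation
```

With links GAS and CCK to Peptide Hormone, Peptide Hormone to Hormone, and Hormone to Receptor (all weight 1.0), the defaults give Peptide Hormone 0.5, Hormone 0.25 and Receptor 0.125; raising `threshold` to 0.2 stops the spread before Receptor.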
slide 13
A Trivial Example
[Diagram. Step: plausible inheritance (inference). Query: GAS, CCK. Background: Peptide Hormone, Hormone, Receptor. Relations: is a, acts on]
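Plausible inheritance lets a concept take over a relation from a more general concept it is linked to by "is a", as a default rather than a certainty. A minimal sketch, assuming single inheritance, an acyclic "is a" hierarchy, and a hypothetical confidence discount per inheritance step:

```python
def plausible_inheritance(is_a, relations, concept, relation, discount=0.9):
    """is_a: dict concept -> parent concept (assumed acyclic);
    relations: dict (concept, relation) -> value.
    Returns (value, confidence, concept the value was found on),
    or (None, 0.0, None) if no ancestor states the relation."""
    confidence = 1.0
    current = concept
    while current is not None:
        if (current, relation) in relations:
            # A locally stated value overrides anything more general.
            return relations[(current, relation)], confidence, current
        current = is_a.get(current)
        confidence *= discount  # inherited values are only plausible
    return None, 0.0, None
```

For instance, if Peptide Hormone "is a" Hormone and Hormone "acts on" Receptor, the query for what Peptide Hormone acts on returns `("Receptor", 0.9, "Hormone")`.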
slide 14
A Trivial Example
[Diagram. Step: activation spreading. Query: GAS, CCK. Background: Peptide Hormone, Hormone, Receptor, Extracellular Receptor. Relations: is a, acts on]
slide 15
A Trivial Example
[Diagram. Step: data retrieval and mapping. Query: GAS, CCK. Background: Peptide Hormone, Hormone, Receptor, Extracellular Receptor. Data: GAS-R, CCK A-R. Relations: is a, acts on]
slide 16
A Trivial Example
[Diagram. Step: induction. Query: GAS, CCK. Background: Peptide Hormone, Hormone, Receptor, Extracellular Receptor. Data: GAS-R, CCK A-R. Relations: is a, acts on]
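The slide does not say how induction is performed. One simple possibility, sketched here purely as an assumption, is to generalise instance-level facts to their classes when enough distinct instances support the same class-level pattern. The function name and the class assignments in the usage line are hypothetical.

```python
from collections import defaultdict


def induce_generalisations(facts, class_of, min_support=2):
    """facts: iterable of (subject, relation, object) instance facts;
    class_of: dict instance -> class.
    Returns {(class, relation, class): number of supporting facts}
    for patterns backed by at least `min_support` distinct pairs."""
    support = defaultdict(set)
    for s, rel, o in facts:
        if s in class_of and o in class_of:
            support[(class_of[s], rel, class_of[o])].add((s, o))
    return {rule: len(pairs) for rule, pairs in support.items()
            if len(pairs) >= min_support}
```

Assuming GAS and CCK are classed as Peptide Hormone and GAS-R and CCK A-R as Extracellular Receptor, the two facts "GAS acts on GAS-R" and "CCK acts on CCK A-R" yield the candidate generalisation ("Peptide Hormone", "acts on", "Extracellular Receptor") with support 2.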
slide 17
A Trivial Example
[Diagram. Step: activation spreading. Query: GAS, CCK. Background: Peptide Hormone, Hormone, Receptor, Extracellular Receptor, Receptor Family. Data: GAS-R, CCK A-R. Relations: is a, acts on, belongs to]
slide 18
A Trivial Example
[Diagram. Step: plausible inheritance. Query: GAS, CCK. Background: Peptide Hormone, Hormone, Receptor, Extracellular Receptor, Receptor Family. Data: GAS-R, CCK A-R. Relations: is a, acts on, belongs to]
slide 19
A Trivial Example
[Diagram. Steps: data retrieval and mapping; formulation of an explanation. Query: GAS, CCK. Background: Peptide Hormone, Hormone, Receptor, Extracellular Receptor, Receptor Family. Data: GAS-R, CCK A-R, CCK-R. Relations: is a, acts on, belongs to]
slide 20
Explanation Schema
[Diagram. Stages: Query, Model, Data, Explanation. Operations: parse, map, retrieve, match, chain]
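The schema can be read as a pipeline from a query to an explanation chain. The sketch below collapses it into a few lines of Python under strong simplifying assumptions: the query syntax, the `INVERSE` relation names and the fact triples are hypothetical, mapping and retrieval are replaced by a fixed list of (subject, relation, object) triples, and matching and chaining are done by a breadth-first search for the shortest chain linking the two query genes.

```python
from collections import defaultdict, deque

# Hypothetical wording for following a relation backwards.
INVERSE = {"acts on": "is acted on by", "belongs to": "has member", "is a": "has subtype"}


def parse(query):
    # Assumed query syntax: gene symbols separated by commas.
    return [g.strip() for g in query.split(",") if g.strip()]


def explain(query, facts):
    """facts: (subject, relation, object) triples standing in for the
    mapped model and retrieved data. Returns a chain of statements
    linking the two query genes, or None if they are not connected."""
    genes = parse(query)
    if len(genes) != 2:
        raise ValueError("this sketch explains pairs of genes only")
    start, goal = genes
    # Match: index facts in both directions so a chain may use a relation backwards.
    adjacent = defaultdict(list)
    for s, rel, o in facts:
        adjacent[s].append((o, rel))
        adjacent[o].append((s, INVERSE.get(rel, f"inverse of {rel}")))
    # Chain: breadth-first search for the shortest connecting chain.
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt, rel in adjacent[node]:
            if nxt not in previous:
                previous[nxt] = (node, rel)
                queue.append(nxt)
    if goal not in previous:
        return None
    steps, node = [], goal
    while previous[node] is not None:
        prev, rel = previous[node]
        steps.append(f"{prev} {rel} {node}")
        node = prev
    return "; ".join(reversed(steps))
```

Given the assumed facts "GAS acts on GAS-R", "CCK acts on CCK A-R", "GAS-R belongs to CCK-R" and "CCK A-R belongs to CCK-R", `explain("GAS, CCK", facts)` returns "GAS acts on GAS-R; GAS-R belongs to CCK-R; CCK-R has member CCK A-R; CCK A-R is acted on by CCK".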
slide 21
System Architecture
[Diagram. Components: HG/U, QI, EI, GDK, CR, CB, DB, T, connected through DI links]
CB: case base
CR: core reasoner
DB: databases
EI: explanation interface
GDK: general domain knowledge
HG/U: hypothesis generator, user
QI: query interface
T: tools
slide 22
Related Work
Basic research in gastric cancer
Genomic & proteomic data warehouse
Syntactic & semantic database integration
Natural language understanding
Knowledge representation & modelling
Knowledge intensive reasoning and learning
slide 23
Concerns
Is it reasonable?
(what do biologists say)
Is it possible?
(what do bioinformaticians say)
Is it feasible?
(what do computer scientists say)
Isn’t it too ambitious (for a PhD study)?
slide 24
Disclaimer
An in silico solution is actually a hypothesis that
requires physical (experimental) confirmation.
slide 25
Acknowledgments
Agnar Aamodt, IDI.IME (AI, ML, CBR)
Astrid Lægreid, IKM.DMF (biology, bioinformatics)
Arne Sandvik, IKM.DMF (medicine)
Frode Sørmo, IDI.IME (Creek)
slide 26