Abduction, Induction,
and the Robot Scientist
Ross D. King
Department of Computer Science
University of Wales, Aberystwyth
The Concept of a Robot Scientist
We have developed the first computer system that is capable
of originating its own experiments, physically doing them,
interpreting the results, and then repeating the cycle*.
[Figure: the Robot Scientist cycle. Diagram labels: Background Knowledge, Analysis, Hypothesis Formation, Consistent Hypotheses, Experiment selection, Experiment, Robot, Results, Interpretation, Final Theory.]
*King et al. (2004) Nature, 427, 247-252.
Motivation: Technological

In many areas of science our ability to generate data
is outstripping our ability to analyse the data.

One scientific area where this is true is in Systems
Biology, where data is now being generated on an
industrial scale.

The analysis of scientific data needs to become as
industrialised as its generation.
Motivation: Philosophical

What is Science?

The question whether it is possible to automate the
scientific discovery process seems to me central to
understanding science.

There is a strong philosophical position which holds
that we do not fully understand a phenomenon unless
we can make a machine which reproduces it.
The Philosophical Problems

A number of classical philosophical issues arose in
the Robot Scientist project:
– the relation between abstract and physical objects,
– correspondence semantics and the verification
principle,
– the nature of Universals,
– the problem of induction and its relation to
abduction.
– Etc.

Many of the philosophical positions we have physically
implemented in the Robot Scientist originate with
Carnap and the Logical Empiricism school.
Ontologies and the Relation
between Abstract and Physical
Objects
Ontologies
An ontology is “a concise and unambiguous
description of what principal entities are relevant to an
application domain and the relationship between
them”*.
*Schulze-Kremer, S., 2001, Computer and Information Sci. 6(21)
Dualism

The most fundamental ontological division in our
design of the Robot Scientist is between <abstract>
and <physical> objects.

We argue for this ontological division because it
makes explicit the separation between models and
reality.

All the objects which the Robot Scientist deals with
computationally are <abstract>, and all the objects it
deals with physically are <physical>.
SUMO

Use of this dualism allows us also to be consistent
with the SUMO upper ontology, and its associated
ontologies. In SUMO the most fundamental
ontological division is between <abstract> and
<physical> objects.

Although SUMO has many faults, it is currently the
most widely used top ontology, and no clearly better
alternative exists.
Overall View of the Universe
Physical Objects

By definition, <physical> objects follow the laws of
physics, e.g. yeast cells can interact with chemical
compounds in their growth media and thereby grow,
robot arms can move 96-well plates, etc.

The key <physical> object is the Computer. It controls
the movement of all the <physical> objects.

Our new fully automated Robot Scientist has a very
large amount of laboratory automation hardware
designed to execute yeast growth experiments.
Hardware

We have a new fully automated robotic system from
Caliper Life Sciences, costing £450,000. It is in the
final stages of commissioning.

It is designed to fully automate yeast growth
experiments.

It has a -20 °C freezer, 3 incubators, 2 readers, 2 liquid
handlers, 3 robotic arms, a washer, etc.

It is capable of initiating ~1,000 new experiments and
>200,000 observations per day in a continuous cycle.
Sketch of New Robotic Hardware
The New Robot “Adam”
During Commissioning
Abstract Objects

Just as the key <physical> component is the
computer’s hardware, the key <abstract> component
is the computer’s software.

We argue that the software/hardware identity is the
key to bridging the <physical>/<abstract> dichotomy
both in the Robot Scientist and elsewhere.
Turing Machines as Hardware

To me, the key to understanding the power of a
computer is that it implements, in a <physical>
device, an <abstract> logical program.

What distinguished Turing from the other great
logicians of his time was that he proposed a model of
computation that was explicitly both physical and
abstract.
Denoting Rules

We need to explicitly link objects in the <abstract>
world to those in the <physical> world.

This is done using <denoting rules>.

Such rules are sometimes termed “rules of
designation”, or reference rules.
Overall View of the Universe
The Correspondence View of Truth
and the
Verification Principle
“What is True?”

The Robot Scientist implements a correspondence
view of truth. Truth is correspondence with reality.

Within the Robot Scientist, <abstract> propositions
are consistently labelled as “true” or “false”.

As the Robot Scientist has <physical> effectors it can
verify the truth or falsehood of these propositions by
specific <physical> tests.
Denotation Example

To illustrate the role of denotation rules we describe
the <denoting rules> for the yeast strains kept in the
Robot Scientist's deep-freeze.
– <abstract> stored_yeast_strain(Yeast_strain_id)

“Yeast_strain_id” is the name of the class of all
names of yeast strains.

The example proposition
stored_yeast_strain(ypr060c) states that the yeast
strain named “ypr060c” is stored in the Robot
Scientist.
The <denoting rule> relates this <abstract>
proposition to a <physical> state.

Denotation Example 2

The <physical> denotation of
stored_yeast_strain(ypr060c) is that: in the <physical>
deep-freeze of the Robot Scientist there is a sample of
the <physical> yeast strain named “ypr060c” (identified
by a <physical> bar-code reader).

The Robot Scientist can verify the truth or falsehood of
this proposition by physically comparing the yeast
strains it has in its deep-freeze labelled as “ypr060c”
with a sample of defined reference strains from the UK
National Collection of Yeast Cultures or other similar
centres.
Truth
Truth Relations
The Nature of Universals
Induction and Universals

I argue that for a number of the <abstract>
ontological objects used by the Robot Scientist, their
truth values cannot be physically verified in finite
time.

I argue that these <abstract> objects are
“<Universals>”.

To reason about these <abstract> objects from their
corresponding denoted <physical> objects requires
an explicit induction.
An Example of Universals

An example proposition such as
yeast_strain(ypr060c) refers to the set of all
examples of this strain named “ypr060c”.

This is a <Universal> and denotes all examples of
this yeast strain in the past/present/future <physical>
Universe.

To reason about yeast_strain(ypr060c), from
examples, such as
deep_freeze_well_content(000000000001_0_0, ypr060c),
requires an explicit induction.
The denotation of
deep_freeze_well_content(000000000001_0_0, ypr060c)
is a specific <physical> sample of the strain named
“ypr060c”, not the <Universal>.

Universals
Stationarity



For the inductive inferences of the Robot Scientist to
be valid we need to assume stationarity between and
within experiments.
A central role of <metadata> is to monitor this
stationarity.
The Robot Scientist, in the absence of <metadata>
evidence to the contrary, assumes that:
– All the samples of a given strain are identical.
– Yeast strain samples only differ in known ways.
– All the samples of a given chemical compound
are identical.
– Experimental conditions only vary in the measured
ways.
– Etc.
Observational and
Theoretical Terms

The relationship between the various types of term in
the Robot Scientist experiments illuminates another
area of interest in the philosophy of science: the
relationship between observed and theoretical terms.

The main type of observation that the Robot Scientist
is designed to perform is optical density (OD)
measurement.

These observations are represented using predicates
of the form:
– <abstract> od_observation(Od_reader_id, Growth_plate_id,
Well_id, Time_stamp, Od_observation_id)
– <abstract> od_observation_result(Od_observation_id, Od_value).
Data and Metadata

There is a useful distinction between experimental
<data> and <metadata>. Metadata is data used to
describe data, especially to allow a scientific
experiment to be repeated.

In addition to OD readings, the Robot Scientist also
measures many other experimental variables: the
inoculation time of wells, the temperature of the
incubators (which hold the 96-well plates), the
humidity of the incubators, the O2 levels in the
incubators, etc.
Calculated Terms

From the OD observations of a 96-well plate, the
Robot Scientist makes calculations concerning the
growth of the particular knockout strains on the plate.

These may be qualitative (growth v non-growth),
such as those in the original Robot Scientist work, or
quantitative as in more recent Robot Scientist work
(growth rate, maximum growth yield, etc.).
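
As a rough illustration of such calculated terms, the sketch below derives a qualitative growth call and two quantitative summaries from the OD time series of one well. The function names, the 0.2 OD threshold, and the log-slope estimate of growth rate are assumptions made for this example; the slides do not state how the Robot Scientist computes these values.

```python
import math
from typing import Sequence


def grew(od_values: Sequence[float], threshold: float = 0.2) -> bool:
    """Qualitative call: did OD rise by at least `threshold` (an assumed cut-off)?"""
    return max(od_values) - od_values[0] >= threshold


def max_growth_yield(od_values: Sequence[float]) -> float:
    """Highest OD reached during the run."""
    return max(od_values)


def max_growth_rate(times_h: Sequence[float], od_values: Sequence[float]) -> float:
    """Steepest slope of ln(OD) between consecutive readings, per hour.

    Assumes strictly increasing times and strictly positive OD values.
    """
    readings = list(zip(times_h, od_values))
    slopes = [
        (math.log(od_b) - math.log(od_a)) / (t_b - t_a)
        for (t_a, od_a), (t_b, od_b) in zip(readings, readings[1:])
    ]
    return max(slopes)


# Hypothetical readings for one well: OD doubles every hour.
times_h = [0.0, 1.0, 2.0, 3.0]
od_values = [0.1, 0.2, 0.4, 0.8]

print(grew(od_values), max_growth_yield(od_values), max_growth_rate(times_h, od_values))
```

For readings that double every hour the estimated rate equals ln 2 per hour, which gives a quick sanity check on the slope calculation.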
Some Example Growth Curves
Theoretical Terms

It is possible, at least in principle, to work with
theories that deal exclusively with <observed terms>
and <calculable terms>.

However, the history of science demonstrates that it
is often more illuminating, and effective, to include
<theoretical terms> - objects that are not directly
observable in the experiment or calculable from the
observables.

Example <theoretical terms> in the Robot Scientist's
model are genes, enzymes, the mapping of genes to
enzymes, metabolic networks, paths in metabolic
networks, etc.
Correspondence Rules

To relate <theoretical terms> to <observed terms>
and <calculable terms> we require <correspondence
rules> (Carnap 1974).

The most important correspondence rule is the one
that relates the predicate
observed_growth(Experiment) to the <theoretical
term> path in the model of metabolism.

This correspondence is the key concept in the model:
the idea that paths in metabolic pathways from
growth metabolites to a set of essential metabolites
can be related to growth of a cell.
Phenylalanine, Tyrosine, and Tryptophan Pathways for S. cerevisiae
[Figure: pathway diagram running from metabolite import from the growth medium to the end products TYROSINE (C00082), PHENYLALANINE (C00079), and TRYPTOPHAN (C00078). Metabolites carry compound IDs (e.g. C00251 Chorismate); reaction steps are labelled with yeast ORF names (e.g. YPR060C, YDR127W, YGL026C).]
Observed / Theoretical
Abduction and Induction
Hypothesis Formation
and Abduction 1

The formation of hypotheses has traditionally been
the hardest part of science to envisage automating.
Indeed, many philosophers of science have openly
expressed views that hypothesis formation could only
be truly accomplished by humans.

Hypothesis formation has traditionally been closely
associated with the “problem of induction”.

We argue that most hypothesis formation in modern
biology is abductive rather than inductive (Reiser et
al., 2002).
Hypothesis Formation
and Abduction 2

What are hypothesised in the Robot Scientist, and in
most of molecular biology, are factual relationships
between objects, e.g. the gene ypr060c codes for the
enzyme chorismate mutase, gene ypr060c exists at
location 675628-674858 (C) on chromosome 16, etc.

N.B. these relationships are ground. Induction is still
required by the robot, but only to reason about
Universals.

This emphasis on abduction is very different from the
general account of the role of induction in science,
which appears heavily physics-centred and based on
universal laws, e.g. conservation of energy.
Model of Metabolism

The model of metabolism used by the Robot Scientist is that of
“metabolic graphs” (Reiser et al., 2002; Bryant et al., 2002).

Each vertex corresponds to a set of compounds that are
available to the cell.

The cell has a unique start vertex corresponding to the nutrients
available to the cell in the growth medium.

An edge corresponds to a reaction and the destination of an
edge is the set of available compounds plus the reaction's
products.

A pathway corresponds to a monotonically increasing set of
compounds available to the cell.
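
The graph described above can be sketched in a few lines. This is a minimal illustration, not the Robot Scientist's implementation: the function name `find_pathway` is hypothetical, and the reactions are the toy facts from the "Abduction Code 1" slide. Each vertex is a set of available compounds, firing a reaction whose reactants are available moves to a larger vertex, and the returned pathway is the resulting growing sequence of sets.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (reactants, products)


def find_pathway(
    medium: FrozenSet[str],
    reactions: Sequence[Reaction],
    essential: FrozenSet[str],
) -> Optional[List[FrozenSet[str]]]:
    """Return the vertices on a path from the start vertex to a vertex that
    contains every essential metabolite, or None if no such vertex is reachable."""
    vertex = frozenset(medium)          # unique start vertex: the growth medium
    pathway = [vertex]
    progressed = True
    while not essential <= vertex and progressed:
        progressed = False
        for reactants, products in reactions:
            # An edge exists when the reactants are available; it is only
            # worth following when it adds at least one new compound.
            if reactants <= vertex and not products <= vertex:
                vertex = vertex | products
                pathway.append(vertex)
                progressed = True
                break
    return pathway if essential <= vertex else None


# Toy reactions a->b, a->c, b->d, c->d, with medium {a} and essentials {c, d}.
toy_reactions = [
    (frozenset({"a"}), frozenset({"b"})),
    (frozenset({"a"}), frozenset({"c"})),
    (frozenset({"b"}), frozenset({"d"})),
    (frozenset({"c"}), frozenset({"d"})),
]
pathway = find_pathway(frozenset({"a"}), toy_reactions, frozenset({"c", "d"}))
print(pathway)
```

Because available compounds are never consumed in this model, a reaction that can fire stays able to fire, so following edges greedily is enough to decide reachability.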
Phenylalanine, Tyrosine, and Tryptophan Pathways for S. cerevisiae
[Figure: the same pathway diagram as under this title earlier in the talk: metabolites with compound IDs, reaction steps labelled with yeast ORF names, ending in TYROSINE, PHENYLALANINE, and TRYPTOPHAN.]
Abduction Code 1
% computes if the model predicts growth or not
theoretical_growth(Experiment) ←
    growth_medium(Experiment, {Growth_medium}) ∧
    essential_metabolites({Essential_metabolites}) ∧
    path({Growth_medium}, {Essential_metabolites})
% path(Starting_point, End_point)
path({X}, {Y}) ← edge({X}, {Y})
path({X}, {Z}) ← edge({X}, {Y}) ∧ path({Y}, {Z})
% an edge adds the products of a reaction whose reactants are available
edge({X}, {Y}) ← reaction({A}, {B}) ∧ subset({A}, {X}) ∧ union({X}, {B}, {Y})
reaction({Reactants}, {Products}) ←
    reaction(Enzyme, {Reactants}, {Products}) ∧
    ¬ reaction_removed(Enzyme)
% a reaction is removed when the gene encoding its enzyme is absent
reaction_removed(Enzyme) ←
    encodes(Gene, Enzyme) ∧
    ¬ gene(Gene)
% The abducible
encodes(Gene, Enzyme)
% growth_medium(Experiment, {Metabolites})
growth_medium(experiment1, {a})
% essential_metabolites({Metabolites})
essential_metabolites({c, d})
% reaction_details(Enzyme, {Reactants}, {Products})
reaction(e1, {a}, {b})
reaction(e2, {a}, {c})
reaction(e3, {b}, {d})
reaction(e4, {c}, {d})
gene(g1)
gene(g2)
¬ gene(g3)    % example gene knocked out
Extension: Missing Arcs/Nodes
[Figure: network diagram with labels M1-M6 and E1-E7.]
Extension to a Genome Scale
Model of Yeast Metabolism

We have extended our model of aromatic amino acid
metabolism to cover most of what is known about
yeast metabolism.

Includes 1,166 ORFs (940 known, 226 inferred)

Growth is predicted if there is a path from the growth medium to the defined end-points.

83% accuracy (based on 914 strain/medium
predictions)

Challenging for a purely logical approach.
This Model is Incomplete

It is not possible to find a path from the inputs (growth
medium) to all the end-point metabolites using only
reactions encoded by known genes.

This suggests automated strategies for determining
the identity of the missing genes - new biological
knowledge.

One strategy, based on the EC enzyme class of the
missing reactions, is to identify genes that code for
this EC class in other organisms, then find
homologous genes in yeast.
Automated Model Completion
[Figure: model-completion loop. Diagram labels: Model of Metabolism, Hypothesis Formation, Bioinformatics Database, Reaction ?, Gene Identification, Experiment Formation, Experiment.]
Testing Hypotheses 1

A key philosophical step in the Robot Scientist's cycle of
experimentation is the process of deciding on the truth or
falsehood of hypotheses.

The abductive hypothesis generation stage generates a set of
models, each of which has a different abduced
encodes(Gene_id, Enzyme_id) proposition.

These propositions allow, for each model, the deduction of
whether or not the model predicts growth for a particular
experiment, e.g. whether the proposition
theoretical_growth(experiment_1) is provable or not for the
metabolites used in the experiment named “experiment_1”.
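
A small sketch of this step, under stated assumptions: the reactions, growth medium, essential metabolites, and knocked-out gene g3 are the toy facts from the "Abduction Code 1" slide, while the Python names (`available_compounds`, `theoretical_growth`, `predictions`) and the choice to abduce one encodes(g3, Enzyme) hypothesis per enzyme are illustrative only. Each hypothesis yields its own model, and the growth prediction is deduced separately for each.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# enzyme -> (reactants, products), copied from the toy reaction facts
REACTIONS: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "e1": (frozenset({"a"}), frozenset({"b"})),
    "e2": (frozenset({"a"}), frozenset({"c"})),
    "e3": (frozenset({"b"}), frozenset({"d"})),
    "e4": (frozenset({"c"}), frozenset({"d"})),
}
GROWTH_MEDIUM = frozenset({"a"})
ESSENTIAL = frozenset({"c", "d"})


def available_compounds(medium: Iterable[str], removed_enzymes: Set[str]) -> Set[str]:
    """Every compound reachable from the medium using the remaining reactions."""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for enzyme, (reactants, products) in REACTIONS.items():
            if enzyme in removed_enzymes:
                continue
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return available


def theoretical_growth(encodes: Dict[str, str], knocked_out: Iterable[str]) -> bool:
    """Does the model built from this encodes mapping predict growth?"""
    removed = {encodes[gene] for gene in knocked_out if gene in encodes}
    return ESSENTIAL <= available_compounds(GROWTH_MEDIUM, removed)


# One model per abduced hypothesis encodes(g3, Enzyme); g3 is the knocked-out gene.
predictions = {
    enzyme: theoretical_growth({"g3": enzyme}, knocked_out=["g3"])
    for enzyme in REACTIONS
}
print(predictions)
```

In this toy case only the hypothesis encodes(g3, e2) predicts no growth, because e2 is the sole route to metabolite c, so a single growth experiment on the g3 knockout separates it from the other three.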
Testing Hypotheses 2

These deductions are monitored by a meta-logical program
which determines the truth or falsehood of the
theoretical_growth proposition in the various models.

This leads to the key idea of the Robot Scientist: we can use the
<physical> Robot Scientist's effectors to actually execute the
<physical> experiment and determine whether growth occurs or
not.

In the <physical> experiment, growth is determined by
observation of the plates used in the experiment and denotation
rules of the form described above. This procedure results in
determination of the truth or falsehood of the proposition
<abstract> observed_growth(experiment_1).
Testing Hypotheses 3

This results in a set of theoretical_growth(experiment_1)
propositions with different truth values, each one associated
with a particular abduced hypothesis, and a single
observed_growth(experiment_1) proposition with an empirically
determined truth value.

In the cases where the truth values of
theoretical_growth(experiment_1) and
observed_growth(experiment_1) are different, we have the
classical philosophy of science case of a conflict between theory
and observation.

We can then either take the simple approach of eliminating from
consideration all the abduced hypotheses which result in
incorrect predictions about observations, or preferably, we can
take a probabilistic approach and decrease appropriately the
probability of these hypotheses.
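
Both options can be sketched briefly. The names `eliminate` and `reweight`, the hypothesis labels, and the 10% observation error rate are assumptions for illustration; the slides do not specify how the probabilities are decreased, and the Bayesian update used here is one simple choice among several.

```python
from typing import Dict, List


def eliminate(predictions: Dict[str, bool], observed: bool) -> List[str]:
    """Simple approach: keep only hypotheses whose prediction matches the observation."""
    return [h for h, predicted in predictions.items() if predicted == observed]


def reweight(
    priors: Dict[str, float],
    predictions: Dict[str, bool],
    observed: bool,
    error_rate: float = 0.1,  # assumed chance that an observation is wrong
) -> Dict[str, float]:
    """Probabilistic approach: a Bayesian update that down-weights, rather than
    removes, hypotheses whose prediction disagrees with the observation."""
    unnormalised = {
        h: priors[h] * ((1.0 - error_rate) if predictions[h] == observed else error_rate)
        for h in priors
    }
    total = sum(unnormalised.values())
    return {h: weight / total for h, weight in unnormalised.items()}


# Hypothetical example: four abduced hypotheses, equal priors, no growth observed.
predictions = {"h1": True, "h2": False, "h3": True, "h4": True}
priors = {h: 0.25 for h in predictions}
survivors = eliminate(predictions, observed=False)
posterior = reweight(priors, predictions, observed=False)
print(survivors, posterior)
```

The probabilistic variant tolerates noisy observations: a hypothesis contradicted by one experiment keeps a small probability and can recover if later experiments support it.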
Modelling Growth
Applications of Philosophy
Generic ontology of experiments
e-Science
• Controlled vocabulary for scientific experiments.
• Formalized computational representation of scientific experiments.
• Unified standard for representation, annotation, storage, and access to experimental results.
• Automated reasoning over experimental data and conclusions.
Ontology of science (formalization of scientific methods, technologies, infrastructure of science)
EXPO: Ontology of scientific experiments (concepts: 220; language: OWL)
[Diagram labels: Scientific Experiment, Experiment goal, Experiment design, Experiment object, Experiment results, Experiment action, Classification of experiments, Experiment Method.]
Soldatova et al. (2006) Royal Society Interface
The Position of EXPO
[Figure: ontology layers labelled Upper level, Generic level, and Domain level. Ontologies and labels shown: SUMO, EXPO, Bibliographic Data Ontology, BiblioReference, Mes.Unit, SubjectOfExp., ObjectOfExp., Domain Model, Plant ontology, Measurement ontology, PSI, MO, FuGO, MSI, ChEBI.]
EXPO’s top classes
EXPO development
EXPO v.1
Tool: Hozo Ontology Editor
Concepts: 220
Language: OWL
http://sourceforge.net/projects/expo
The need for a Robot Scientist
ontology (EXPO-RS)

The robot requires a detailed and formalized
description of: domains, background knowledge,
experiment methods, technologies, hypothesis
formation and experiment design rules, etc.

Integrity of data and metadata.

Open access to the RS experimental data and
metadata for the scientific community.
Soldatova et al. (2006) Bioinformatics
EXPO-RS

Formalization of the entities involved in Robot Scientist
experiments.

A controlled vocabulary for all the participants of the
project.

Identification of metadata essential for the
experiment's description and repeatability.

Coordination of the planning of experiments, their
execution, access to the results, technical support of
the robot, etc.

Modelling a database for the storage of experiment
data and the tracking of experiment execution.
EXPO-RS: Metadata
EXPO-RS: equipment
EXPO-RS: equipment functionality
EXPO-RS for the DB
Conclusions

The Robot Scientist concept represents the logical next step
in scientific automation.

A major motivation for the development of the Robot Scientist
was to help illuminate our understanding of science.

I argue that the Robot Scientist helps to clarify such issues in
the philosophy of science as: the problem of induction and its
relation to abduction, the relation between abstract and
physical objects, correspondence semantics, the verification
principle, the nature of Universals, and the relation between
observed and theoretical terms.
Acknowledgements

Ken Whelan          Aberystwyth
Amanda Clare        Aberystwyth
Larisa Soldatova    Aberystwyth
Mike Young          Aberystwyth
Jem Rowland         Aberystwyth
Andrew Sparkes      Aberystwyth
Wayne Aubrey        Aberystwyth
Emma Byrne          Aberystwyth
Philip Reiser       Aberystwyth
Ffion Jones         Aberystwyth
Ugis Sarkans        Aberystwyth (EBI)
Douglas Kell        Manchester (Aberystwyth)
Steve Oliver        Manchester
Stephen Muggleton   Imperial College (York)
Chris Bryant        Robert Gordons (York)
David Page          Wisconsin

BBSRC, EPSRC
Caliper Life Sciences, PharmDM