SRI Artificial Intelligence Center
Politics and Pragmatism
in Scientific Ontology Construction
Mike Travers
Inconsistency Robustness 2011
Overview
• Introduction
• Two kinds of knowledge infrastructure
• Ontological controversies: some examples
• The nature of actual scientific representation
• Representational pragmatism
• Technical directions
• Conclusions
My background
• SSS
Artificial Intelligence
Media Science
Human Interface
Constructionism
Visual Programming
Knowledge Representation
Agent-based systems
Programming Languages
Scientific Software
Philosophy of Science
Narrative Theory
(at startups, large companies, open-source projects, and now SRI)
Scientific KM
Collaboration
Decision Support
Publishing
Standards
Sociology
Cognitive Science
Synopsis
• Knowledge representation inevitably involves
inconsistency, controversy, hence politics;
• Scientific representation does too, but it has
worked-out practices for dealing with it;
• KR should work more like science rather than
the other way around;
• Representational Pragmatism: a conceptual
framework to make it happen
What’s a knowledge infrastructure?
• A system of
– Technologies,
– Institutions,
– Standards,
– and Practices
• that serve to support knowledge
– Collection
– Storage
– Curation
– Sharing
– Validation
– …
Knowledge Infrastructure #1: Science
• The scientific community
• An elaborate web of
– People (scientists and others)
– Institutions (labs, journals, funding agencies, instrument
makers…)
– Practices (publishing criteria, protocols, conferences)
• Works pretty well! The gold standard for
knowledge in fact.
• But there are issues of scaling, quality, inertia,
siloing, epistemological closure…
Knowledge Infrastructure #2:
The Semantic Web
• Set of technical standards for sharing
formalized knowledge
• Aspires to be a universal framework for
knowledge
• A grand vision of global-scale knowledge
representation
• And it is tremendously important and needed.
(Figure labels: Provenance; Reasoning; Classification; Relations & Properties; Naming)
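To make the layers concrete, here is a toy, in-memory triple store. It is a minimal sketch under stated assumptions, not RDF tooling: facts are plain (subject, predicate, object) tuples, the `ex:` names are hypothetical, and the "reasoning" layer is only two RDFS-style entailment rules (transitive `rdfs:subClassOf`, and `rdf:type` inherited through it) applied until nothing new appears.

```python
from itertools import product

# Predicate names borrowed from the RDF/RDFS vocabulary; "ex:" names are hypothetical.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Naming + relations: each fact is a (subject, predicate, object) triple.
facts = {
    ("ex:trpA_protein", TYPE, "ex:Enzyme"),
    ("ex:Enzyme", SUBCLASS, "ex:Protein"),
    ("ex:Protein", SUBCLASS, "ex:Macromolecule"),
}


def infer(triples):
    """Reasoning over the classification: apply two RDFS-style rules to a fixpoint.

    (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    (x type B),       (B subClassOf C) => (x type C)
    """
    closed = set(triples)
    while True:
        new = set()
        for (a, p1, b), (c, p2, d) in product(closed, repeat=2):
            # Chain only through a subClassOf link whose subject is b.
            if b == c and p2 == SUBCLASS and p1 in (SUBCLASS, TYPE):
                new.add((a, p1, d))
        if new <= closed:
            return closed
        closed |= new


# Prints the asserted type (Enzyme) plus the two inferred ones (Macromolecule, Protein).
for s, p, o in sorted(infer(facts)):
    if p == TYPE:
        print(s, p, o)
```

Real semantic web stacks use many more entailment rules and dedicated reasoners; the point here is only how classification plus a few rules yields facts nobody asserted.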
These two are becoming one…
Bioscience is by far the largest application area for semantic
web technology
Some non-robust properties of the
semantic web
• Too inexpressive
(Can’t represent default reasoning or n-way
predicates)
• Too complex
(Prevents widespread acceptance)
• Too logic-based
(Emphasizes wrong things)
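The "n-way predicates" complaint comes from triples relating exactly two things. The usual workaround, often called reification or the n-ary relation pattern, mints a fresh node to stand for the whole fact and hangs each argument off it as a separate binary triple. The sketch below assumes a hypothetical "Repression" relation with made-up role names and `ex:` identifiers; it shows the extra indirection a modeler pays for a single three-argument fact.

```python
from itertools import count

# A triple relates only two things, so an n-ary fact is "reified":
# a fresh node stands for the fact, and each argument hangs off it
# as a separate binary triple. Relation and role names are hypothetical.
_fresh = count(1)


def reify(relation, **roles):
    """Encode one n-ary fact as a list of binary (s, p, o) triples."""
    node = f"_:{relation}{next(_fresh)}"  # blank-node-style identifier
    triples = [(node, "rdf:type", relation)]
    triples += [(node, role, value) for role, value in roles.items()]
    return triples


def roles_of(triples, node):
    """Reassemble the n-ary fact by collecting every role triple about its node."""
    return {p: o for s, p, o in triples if s == node and p != "rdf:type"}


fact = reify("Repression",
             regulator="ex:TrpR", target="ex:trpA", condition="ex:HighTryptophan")
node = fact[0][0]
print(node, roles_of(fact, node))
```

Every query about the fact now has to join through the intermediate node, which is part of why such encodings feel heavyweight in practice.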
Convergence and Controversy
• Ontologies are supposed to define a common
understanding of a domain
• But “common” is easier said than done
• In practice:
– Many different constituencies
– With different ideas about what’s important
– Many side-factors complicate things (implementation
cost, personal status, existing non-rigorous usages…)
– Compromise is necessary but rarely produces elegant
results
Example: psychiatric illness
• What constitutes a mental illness?
– Not at all obvious that categories correspond to real
phenomena
– Huge changes over time
– Currently defined by DSM-IV through a highly
politicized process
– History of PTSD (Scott, 1990)
• “combat fatigue” or cowardice
• In and out of the DSM
• Finally recognized as PTSD, partly in response to the
Vietnam War
Psychiatric illness (2)
• Homosexuality
– Formerly a pathology, now not, through a highly politicized
process
• Attention Deficit Disorder
– Cluster of symptoms, not clear what the boundaries should be
– Opinions often determined by theories of child-rearing or
institutional aspects of school.
• Insurers and economics are important actors in debate
• Summary:
– these disorders are socially constructed categories
– over a definite but unclear underlying reality.
Example: category fudging
• In Pathway Tools, SRI’s bioinformatics
knowledge base
• This is a widely used system for curating
genomes and metabolic pathways
• Underlying frame system
• Web-based interface
Example: Gene/Protein conflation
• Genes and Proteins are different things
• But biologists tend to want to use the same
name for a gene and its product
• Tension between formal ontology and actual
scientific usage
• Equivalently, an argument between the
computer scientists who build the system and
the biologists who use it and curate it
(Figure labels: Gene (DNA) trpA; Gene product (Protein); Search for “trpA”)
Moral of this somewhat trivial example
• There are tensions (inconsistencies) between
formal representation and actual usage
• And, software makers end up having to cope
with these tensions in design decisions
• Usually in a kludgy way!
• E.g., papering over the conflict in the user
interface layer
• Would be nice to have a better theory of how
to do this.
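One way the "paper over it in the interface" move can look: the model keeps genes and proteins as distinct objects, while the search layer matches on the shared name biologists actually use and returns both. This is a hypothetical sketch; the `Frame` class and the frame IDs are invented for illustration and are not the Pathway Tools API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for a knowledge-base frame (not the Pathway Tools API)."""
    frame_id: str
    kind: str  # "Gene" or "Protein": kept distinct in the formal model
    name: str  # the common name biologists use for both


KB = [
    Frame("G-TRPA", "Gene", "trpA"),
    Frame("P-TRPA", "Protein", "trpA"),
    Frame("G-TRPB", "Gene", "trpB"),
]


def search(term):
    """Interface-level search: match on the shared name, case-insensitively,
    and group every matching frame by kind so gene and product appear together."""
    hits = {}
    for f in KB:
        if f.name.lower() == term.lower():
            hits.setdefault(f.kind, []).append(f.frame_id)
    return hits


print(search("trpA"))
```

The ontology stays clean, but the conflation now lives in ad hoc interface code, which is the kludge the slide describes.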
Example: how do we classify
mitochondria?
• Organelles (part of cell)
• But descended from
separate endosymbiotic
organisms
• With their own DNA
• (Generally but not
universally accepted
theory)
There are consequences
• “If we accept that mitochondria are
bacteria, then the record books have to be
rewritten. The first bacterial genome
sequence was completed not by American
arriviste Craig Venter …in 1995, but
instead by … Fred Sanger, who completed
the human mitochondrial genome
sequence in 1981!”
Expressivity in Description Logics
• Description Logics (DL) are the basis for
semantic web ontology.
• Selected largely for computational tractability
• But DLs make it hard to do simple things such as
representing defaults
– All cats have hair
– Except for this one!
• Expressivity has been
traded away
• A compromise and perhaps
not the right one
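The cat example contrasts two readings of a general rule. The sketch below is not a Description Logic reasoner; it only mimics the difference under simple assumptions (a hypothetical `CLASS_DEFAULTS` table and instance facts). Read monotonically, "all cats have hair" turns the hairless cat into a contradiction; read as a default, the specific fact simply overrides the class-level value.

```python
# Hypothetical knowledge: a class-level rule plus per-instance facts.
CLASS_DEFAULTS = {"Cat": {"has_hair": True}}
INSTANCE_FACTS = {
    "tom": {"is_a": "Cat"},
    "sphinx_cat": {"is_a": "Cat", "has_hair": False},  # "Except for this one!"
}


def monotonic_conflicts(prop):
    """Treat the class value as a universal axiom: any explicit instance
    value that disagrees with it is reported as an inconsistency."""
    out = []
    for name, facts in INSTANCE_FACTS.items():
        axiom = CLASS_DEFAULTS[facts["is_a"]].get(prop)
        if prop in facts and axiom is not None and facts[prop] != axiom:
            out.append(name)
    return out


def default_lookup(name, prop):
    """Treat the class value as a default: the specific fact wins if present,
    otherwise fall back to the class."""
    facts = INSTANCE_FACTS[name]
    if prop in facts:
        return facts[prop]
    return CLASS_DEFAULTS[facts["is_a"]].get(prop)


print(monotonic_conflicts("has_hair"))
print(default_lookup("tom", "has_hair"), default_lookup("sphinx_cat", "has_hair"))
```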
Bruno Latour
• French philosopher and
sociologist of science
• Roundly reviled for
perceived anti-realism
• Started with
anthropological studies
of science in labs and
fields
• Ends up with a distinctive
view of representation
and even metaphysics
Latour for dummies
• Science is a social construction
(but not an arbitrary one)
• Network-based: a network consists of humans and
non-human actors (lab animals, instruments, funding
institutions…)
• Agonistic – trials of strength between networks
• Understand how science works by tracing the flow of
inscriptions, abstractions, and power through these
networks
• An enriched realism that provides a rich account of the
relation between phenomena and representation
Dual face of science
Settled science:
“That’s the way it is”
Objective
Black-boxed
Politically Established
Natural
Science under construction:
Unsettled
Contentious
Searching for allies (people, funding, t
Building networks of alliance
Social
• Science in the making:
– E.g., Watson and Crick’s work on the structure of
DNA
• Speculations (A three-strand model was proposed)
• Contending theories
• Eventually a winner emerges
• Science made
– Now that the structure of DNA is known,
• it’s a “black box”
• we can make instruments that measure it
• we can build representations of its sequence
Under construction
Black boxed
Where the representation meets the
road
• Science is: “the transformation of rats and
mice into paper”
• Situated representations
– From phenomena
– Lab notebook
– Tables in articles
– Laws of nature
Concrete, situated
Abstract, objective
Jeff Shrager, “Diary of an Insane Cell Mechanic”
Intercalation of representations and
the phenomenon
Analogizing to KR
Knowledge Representation:
Realist
Objective
Settled
Factual
Established
Abstract
Graph structures
Knowledge Construction:
Situated representations
Unsettled
Bottom-up
User interfaces
Ad-hoc structures
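One way to read the contrast computationally: a knowledge-construction store keeps rival claims side by side, each with its source and a status, and only yields a single "fact" once a claim has been black-boxed. The sketch below is a hypothetical illustration, not an existing system; the class names are invented, the status labels "under construction" and "black-boxed" come from the slides, and "superseded" is an added assumption. The data follow the DNA-structure example above.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    subject: str
    value: str
    source: str
    status: str = "under construction"  # later "black-boxed" or "superseded"


@dataclass
class KnowledgeBase:
    claims: list = field(default_factory=list)

    def assert_claim(self, subject, value, source):
        self.claims.append(Claim(subject, value, source))

    def view(self, subject):
        """All claims about a subject, rivals included, rather than one forced answer."""
        return [c for c in self.claims if c.subject == subject]

    def settle(self, subject, value):
        """Black-box the winning claim; keep the losers, marked as superseded."""
        for c in self.view(subject):
            c.status = "black-boxed" if c.value == value else "superseded"

    def fact(self, subject):
        """Return a settled value only if exactly one claim is black-boxed."""
        settled = [c.value for c in self.view(subject) if c.status == "black-boxed"]
        return settled[0] if len(settled) == 1 else None


kb = KnowledgeBase()
kb.assert_claim("DNA structure", "three strands", "three-strand proposal")
kb.assert_claim("DNA structure", "double helix", "Watson & Crick")
print(kb.fact("DNA structure"))  # None: still contested
kb.settle("DNA structure", "double helix")
print(kb.fact("DNA structure"))  # double helix
```

Keeping superseded claims, instead of deleting them, preserves the argument history that a settled-only ontology throws away.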
A new view of the relation between
world and representation
• Latour refocuses epistemology
– Less on the truth of representations,
– More on their connection to the world via
networks of actants.
• Should be a natural fit for computationalists
– Who also make systems of symbols with causal
connections to the world and each other
Realism vs Conceptualism
• Realism: a movement in philosophy of KR
• Led mostly by Barry Smith, SUNY Buffalo
(e.g., “Beyond Concepts: Ontology as Reality Representation”, 2004)
• The problem: nobody knows what makes a good
ontology
• His solution: Aristotelian universals
– Bad ontologies are…those whose general terms lack
the relation to corresponding universals in reality, and
thereby also to corresponding instances. Good
ontologies are reality representations...
Realism is extremely annoying
• Both vacuous and wrong
• Vacuous: because it presupposes we know what is
real beforehand
• Wrong: because it doesn’t correspond to actual
scientific knowledge representation
• Examples of failure:
– Higgs bosons – we don’t know if they are real
– Genes – were hypothesized before their
“implementation” was known; when were they real?
– Software for synthetic chemistry – mixes real and not-yet-real molecular structures
Afferent: software for drug discovery
chemists
But Realism is Winning
• Basis of BFO (Basic Formal Ontology)
• Which is used by OBO Foundry and other
bio-ontology efforts
• Nobody wants to be against “realism”… so
they picked a good name
Realism only deals with half of science
• May work for ready-made science,
• But is hopeless for science-in-the-making
• Where we don’t know what’s real
• And which is where the action is
Representational Pragmatism
• Needed: a term with good connotations to compete
with “realism”.
• Connects to a philosophical tradition (James, Peirce,
Dewey, Rorty)
– “It is astonishing how many philosophical disputes collapse
into insignificance the moment that you subject them to
this simple test of tracing a concrete consequence” -James
• Bottom-up rather than top-down; opposed to
premature ontologizing; Latourian
• Support the divergent representational practices of
actual science
• Help science towards convergence, objectivity, and
realism, rather than demanding it upfront.
Some encouraging developments
• Linked data vs semantic web
– A somewhat more bottom-up, pragmatic approach to universal knowledge
infrastructure
• Freebase, DBpedia, and similar efforts
• Open Science movement
– Open Access Journals (PLoS, etc)
– Open Data (standards)
– Open Notebook (practices)
BioBike: a platform
for symbolic biocomputing
• A web-based, programmable tool for advanced
biocomputing
– Knowledge-based
– Programmable
– Social
• The inspiration for many of the ideas here
• Joint work with Jeff Shrager (Stanford), Jeff
Elhai (VCU), and others
Reworked to be more social
(Figure labels: Biocomputation; Bio-blog; menu; Knowledge/data analysis; Integration with services; Commentary)
Prototype-based KR
• How the mind categorizes (Rosch, Lakoff)
• A perennial minority theme in computation:
– 60s: Sutherland, Sketchpad
– 70s: Early frame-based KR systems
– 80s: Ungar and Smith, SELF programming language
– 90s: Ken Haase, Framer
– Now: JavaScript
• A structured way to manage inconsistency
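Prototype-based systems such as SELF (parent slots) and JavaScript (`Object.create`) look up a missing property by delegating to a parent object, so each object stores only where it differs from its prototype. The Python sketch below imitates that lookup; the `Proto` class and the slot names are hypothetical, and the example encodes the endosymbiotic view of mitochondria purely for illustration.

```python
class Proto:
    """An object with local slots plus a parent it delegates to.
    Lookups walk the parent chain, so a child stores only its differences."""

    def __init__(self, parent=None, **slots):
        self.parent = parent
        self.slots = dict(slots)

    def get(self, name):
        obj = self
        while obj is not None:
            if name in obj.slots:
                return obj.slots[name]
            obj = obj.parent
        raise KeyError(name)

    def clone(self, **overrides):
        """Create a new object that delegates to this one, overriding some slots."""
        return Proto(self, **overrides)

    def differences(self):
        """Slots this object sets locally: its exceptions to the prototype."""
        return dict(self.slots)


# Hypothetical slots: a mitochondrion as "a bacterium, except not free-living".
bacterium = Proto(has_membrane=True, has_own_dna=True, free_living=True)
mitochondrion = bacterium.clone(free_living=False)

print(mitochondrion.get("has_own_dna"))  # True, inherited from the prototype
print(mitochondrion.differences())       # {'free_living': False}
```

The exception is recorded locally instead of contradicting a class-wide axiom, which is what makes delegation a structured way to tolerate inconsistency.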
Biology is prototype-based
• Every feature of a biological class started out
as an exception to a general case!
• a.k.a. mutation
• Classes are Aristotelian
• Prototypes are Darwinian
The Problems
• Ontologies are plagued with inconsistencies (or
compromise) because they are inevitably the product of
different interests.
• Ontologies generally only try to capture the settled science
• Realism is vacuous, question-begging; if we knew at the
start what was real we wouldn't need to do science
• Knowledge construction is social, tentative, situated, multi-viewpoint, and only objective at its endpoints.
The Solutions
• Tools that support how science is actually done, at web scale and with
greater visibility and traceability
• A pragmatic view of scientific representation
– that lets scientists work bottom-up from their results
– that foregrounds the concrete relations between representation and reality
(circulating reference)
– that connects science in progress with settled science, supporting and preserving
controversy, unsettledness, and argument structure
• More simply: integrate data and knowledge and the processes that connect
them.
• Open Science: institutions, standards, practices.
• A representational infrastructure that supports prototypes, default
reasoning, and exceptions.
Thank you!