Computing point-of-view
By Hugo Liu
Thesis Proposal for the degree of Doctor of Philosophy
at the Massachusetts Institute of Technology
November 2005
Professor Pattie Maes
Associate Professor of Media Arts and Sciences
Massachusetts Institute of Technology
Professor William J. Mitchell
Head, Program in Media Arts and Sciences
Alexander W. Dreyfoos, Jr. (1954) Professor
Professor of Architecture and Media Arts and Sciences
Massachusetts Institute of Technology
Professor Larifari Aufhebung
King of Candy Land
Computing point-of-view
Hugo Liu
Media Arts and Sciences, MIT
Abstract
A point-of-view affords individuals the ability to judge and react broadly to
people, things, and everyday happenstance. Your same sense-of-beauty is
versatile enough to judge almost anything you put before it, be it a
painting, a sunset, or a novel's ending. Yet point-of-view is ineffable and
quite slippery to articulate formally through words—just as light has no
resting mass, perhaps it could be said that viewpoint cannot be measured in
stasis. Drawing from semiotic and epistemological theories, this proposal
narrates a computational theory for representing, acquiring, and tinkering
with point-of-view. I define viewpoint as a self's collected situations within
latent semantic spaces such as culture, taste, identity, and aesthetics. The
topology of these spaces is acquired through linguistic ethnography of
online cultural corpora, and an individual's locations within these spaces are
inferred through psychoanalytic machine readings of egocentric texts. Once
acquired, viewpoints can gain embodiment as viewpoint artifacts, which
allow the exploration of someone else through interactivity and play. The
proposal will illustrate the theory by discussing interactive viewpoint
artifacts built for five viewpoint realms: aesthetics, attitudes, cultural
taste, taste-for-food, and humor. I describe core enabling technologies such
as common sense reasoning and textual affect sensing, and propose a
framework to evaluate the judiciousness of point-of-view representations
and the value of viewpoint artifacts in affording people new ways for
organizing, shaping, and searching human narrative content.
Key Words
Point-of-view Models
User Modelling
Common Sense
Textual Affect Sensing
Aesthetics
Culture
Introduction
Since the late 1950s, every few years, some researcher in Artificial
Intelligence has exclaimed eureka, claiming to have almost engineered
a human intelligence, or some basal capability of a person. But in
2005, four years after the computer HAL should have played tricks
with man in space, Artificial Intelligence still feels the same distance
from this ever-present mirage of human-level intelligence.
It seems that several bad paradigms stalled progress
on representing and computing people. First, overly grandiose claims
were made about formal logic and purely symbolic representation,
nicknamed Good Old-Fashioned AI by its detractors. Logic, with its
immaculate and universal calculus, treats minds like Rube Goldberg
machines, and idealizes the thought process the way that Descartes did.
Logic failed because thought is far too flexible, rich, and
opportunistic to be contained by a mathematically rigid,
symbolically sparse, and non-opportunistic representation like
first-order predicate calculus. Second, much ado was made about purely
connectionist representations like artificial neural networks. The
idea was that a properly wired ‘baby machine’ could be deployed in
the world and re-derive human mental capability by applying only
first principles. Like logic, this approach sat at an extreme of the
representational spectrum: it was representationally agnostic.
The approach has yet to demonstrate compelling emergent
intelligence. Around (??), Marvin Minsky wrote a nice piece
reporting the stalemate. He suggested that the common error was
to assume that ‘neat’ representations could capture ‘scruffy’ and diverse
human intelligence. He proposed that the intelligence modeling
enterprise should instead be focused on combining ‘multiple
representations’ (SOM). In advocating the overthrow of Cartesian
hegemony, Minsky may have unknowingly inspired Gilles
Deleuze and Felix Guattari’s defining work of our time, “A Thousand
Plateaus: Capitalism and Schizophrenia” ( ), which also advocates the
overthrow of Modernism’s immaculate linear account of life and
thought.
While some illusions have been overcome, Artificial Intelligence
needed during the boom of expert systems, and needs again now in the
boom of knowledge-based approaches, to sort out the importance of
microscopic knowledge, given as expert rules or as “facts about the
world,” whatever that may mean. The shadow of Descartes haunts
‘facts’ as much as logic, for even if facts are received cum grano salis
and their truth conditions are hedged, they still purport to be evoked by
people engaged in thinking. As a matter of reflexivity, much of our
Open Mind Common Sense work at this lab is as vulnerable as Cyc ( )
to the stamp-collecting syndrome. Cyc’s 3 million assertions and
Open Mind’s 800,000 sentence-based “facts” do not further them in a
‘horse-race’ toward human-level knowledge. So long as
representation is purely symbolic, as facts are, abilities granted to
children, like dexterously manipulating a ball ( ), or granted to adults,
like skill with people, might take billions of sentences, if not more, to
describe judiciously. The warning to heed is that human intelligence is
not about possessing rote knowledge. Having knowledge around does
not ensure that it can be applied judiciously and opportunistically to
form coherent thoughts and reactions.
Motivated by a search for a coherent yet flexible representation
and emulation of human intelligence, we identify point-of-view as a
crucial metaphor for conceptualizing human intelligence. Consider a
layperson’s dissection of the “point-of-view” concept: two
participants in an argument are debating the merits of an artwork
and find that they disagree; one says to the other, “but from my
point-of-view, I see things differently.” Here point-of-view evokes
an image of the two debaters standing at opposite ends of an
opinion-space. In the middle is a large blob representing the true
meaning of the artwork. The claim “from my point-of-view, I see
things differently” reifies as one debater reporting that she can see a
different side of the true meaning of the artwork than can the other
debater, while allowing that she herself cannot grasp the whole
meaning. So, having point-of-view relieves the anxiety of having
true thoughts. Instead, it privileges coherency and integrity over
truth itself, for standing at the same vantage point, a debater will
tend to report all sightings of meaning blobs with the same
idiosyncratic tendencies, always seeing a certain side to things.
A point-of-view is easy. Every person is always operating under
one or more points-of-view, whether or not they are reflexive about it,
because cognitive economy dictates that our knowledge and
memories are always consolidated and systematized, with at least
patchwork consistency. In Metaphors We Live By, George Lakoff and
Mark Johnson [] report that language itself is organized and unified
by culturally-specific metaphorical frameworks, which then shape
the thoughts of cultural participants in the way that Lacan [] and
Whorf [] had presaged. For example, time is money, as in “I spent
my day on you, I can’t believe I invested so much time in you, and
you weren’t worth it.”
The grandeur of point-of-view’s economy is easily
demonstrated. Look at this artwork, do you find it beautiful? Read
this book ending, is it beautiful? Is this sunset beautiful? Or this
government? Most likely, your sense-of-beauty viewpoint prepared
you to judge all of these things, or at least attempt judgment.
Point-of-view affords the immediacy of judgment over any person, thing, idea,
or situation placed within its realm. There is no need to move, to be
agile, for judgment often happens like the natural reflex of a knee
jerking when struck with a mallet. Whereas a facts-oriented view
of thought requires conceptual knowledge, every person has
abundant judgmental knowledge by virtue of possessing points-of-view
like sense-of-beauty, sense-of-humor, sense-of-cultural-identity, a palate
for food, and a personality. It is not necessary to
store each judgment as a fact, for point-of-view’s lucidity readily
produces judgments as it reacts to whatever fodder is put before it.
Economical, flexible, and broad in applicability, point-of-view is
a powerful framework and mover for human judgmental thought,
arguably exceeding conceptual and logical thought in breadth and
utility. If point-of-view could be successfully modeled, acquired, and
animated computationally for a few important human realms such
as aesthetics, identity, and opinions, in toto, the computational
system would be emulating a significant basal capability of human
thinking.
To be clear, a computational model of an individual’s point-of-view
would constitute a stereotype of that person that is not as agile,
and that might make the same judgment if asked ten times in a row.
But I will argue in this proposal that this would still be an extremely
useful stereotype. What if every person could access a
computational stereotype representing 80% of their mentor’s
judgmental capability, and could bounce random things off their ‘virtual
mentors’ without resource bounds? There would be real
consequences for education if students could ‘tinker,’ à la
constructionist learning ( ), with the stereotyped opinions and
perspectives of mentors, computationally producing ‘just-in-time’
and ‘just-in-context’ reactions to the student’s actions.
The goal of the research proposed here is to design, build, and
validate systems for 1) modeling an individual’s point-of-view
within various realms, such as aesthetics, attitudes, and identity; for
2) automatically acquiring an individual’s point-of-view model
through machine readings of egocentric (self-effacing,
self-describing) texts and 3) organizing the model into coherency; and for
4) animating point-of-view placed inside interactive artifacts such as
virtual mentors by causing the artifact to judge and react to a very
broad range of things placed before it, ‘just-in-time’ and
‘just-in-context’.
I plan to address these four steps as follows. 1) To develop
representations of viewpoints across the realms of concern, I will
draw heavily from well-established semiotic and epistemological
theories of said realms from the psychology and literary theory
literatures. For example, Carl Jung’s Modes of Perception (Think,
Intuit, Sense, and Feel) [] form the dimensions of my proposed
aesthetic viewpoint space, as I pose aesthetics as the perceptual
manner and priority with which an individual approaches some
topic—a realist sees a sunset, but a romantic might prefer to feel the
sunset. The realist is thus located at the position, 100% Sense, 20%
Think, 20% Intuit, 20% Feel, for example. 2) To automatically
acquire an individual’s point-of-view, I propose to apply natural
language processing tools such as my widely used MontyLingua
package ( ), in conjunction with my common sense reasoning
package ConceptNet ( ), and my textual affect sensing system known
as Emotus Ponens ( ). In particular, I anticipate that reading emotion
out of text will be vital to modeling viewpoint because human
judgment often reifies in narratives through emotional appraisal or
mannerisms around a topic’s discussion. 3) To make point-of-view
models somewhat coherent, I will apply analogy-based reasoning ( ).
For example, knowing that a person loves trees, we might infer by analogical
extension that they also love rocks (note that this differs from
the layperson’s sense of the word ‘analogy’); however, pitfalls must
be avoided. For example, a dog lover may hate cats, even though
dogs and cats are both pets. 4) Finally, to animate point-of-view, I
will follow Brad Rhodes’ methodology of Just-in-Time Information
Retrieval (JITIR) [], which prescribes that interface agents (in my
case, a virtual mentor reacting to things that you are writing or doing
using its viewpoint) continuously mine present user context and
utterances, searching for opportunities to retrieve and present
relevant information (in my case, a viewpoint-produced judgment
about whatever the user is doing) on the chance that it can lend
insight, inspire, or teach the user.
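To make the first step concrete, here is a minimal sketch of a viewpoint as a location in a space whose axes are Jung's four modes of perception. Only the realist's coordinates come from the text above. The romantic's coordinates, the sunset description's profile, and the use of cosine similarity as an affinity measure are assumptions made purely for illustration, not the representation this proposal commits to.

```python
import math

# Axes of the aesthetic viewpoint space: Jung's four modes of perception.
MODES = ("sense", "think", "intuit", "feel")


def cosine(a, b):
    """Cosine similarity between two locations in the viewpoint space."""
    dot = sum(a[m] * b[m] for m in MODES)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in MODES))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in MODES))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# The realist's location is the example given in the text.
realist = {"sense": 1.0, "think": 0.2, "intuit": 0.2, "feel": 0.2}

# Hypothetical: a romantic who privileges feeling over sensing.
romantic = {"sense": 0.2, "think": 0.2, "intuit": 0.2, "feel": 1.0}

# Hypothetical perceptual profile of a feeling-laden description of a sunset.
sunset_description = {"sense": 0.3, "think": 0.0, "intuit": 0.2, "feel": 0.9}


def affinity(viewpoint, item):
    """Assumed affinity measure: how closely an item's perceptual
    profile aligns with a viewpoint's location."""
    return cosine(viewpoint, item)


if __name__ == "__main__":
    for name, viewpoint in (("realist", realist), ("romantic", romantic)):
        print(name, round(affinity(viewpoint, sunset_description), 3))
```

Under these assumed numbers the romantic aligns more closely with the feeling-laden description than the realist does, which mirrors the sunset example. A real system would have to infer both the viewpoint locations and the item profiles from text.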
While the acquired models will not be absolutely complete or
always correspond to true viewpoint, and while none of the
produced reactions will be as spontaneous or as flexible as those of
the actual person, I believe that even a first-order approximation of
model acquisition and animation can produce incisive models of
individual perspective that, upon animation, will afford novel and
effective ways to search, gain insight into, be inspired by, and
connect with someone else and their collected narrative content. I
have italicized three words in the previous sentence because these
words constitute the tripartite agenda of our Ambient Intelligence
Group. I believe that our group has the most to gain from such a
thesis as the methodological conclusions of this research would
directly inform much of the impact we seek for our technologies to
have on people.
Finally, this thesis is as diverse and as simple as I believe Media
Laboratory research should be: diverse in the methods and theories
it draws from, but simple in that it attacks a basic problem of
relevance to people, so basic that its goal could be explained to
anyone on the street. This thesis draws from Sociology, Literary
Theory, and Psychology for its computational framing of
point-of-view, from Computational Linguistics and Artificial Intelligence for
reasoning about text, and from Interaction Design for designing
point-of-view artifacts. I have developed, but neither assembled nor
integrated, some implementations for this thesis, and already it forms
the basis for an AAAI workshop on computational aesthetics which I
will co-chair upon the proposed completion of this thesis. To do
justice to an idea as complex and with as long a history as
‘point-of-view,’ it will be important to clothe the thesis in all of the relevant
literatures and to spend as much time on a computational theory of
point-of-view as on technical details of implementation. Otherwise,
this work would lose a golden opportunity to be absorbed by an AI
community that is interested in how machines can appraise beauty
and emotion, and by a humanities and cultural studies community
that would be very interested in the computation of its long-standing
theories of identity and aesthetics, long thought incomputable. The
rest of this proposal will reflect my emphasis on the importance of
grounding this thesis in the literatures, and on the importance of
distilling reusable methodology and a robust theoretical framework.
I will, of course, motivate all theory with many implemented
demonstrations and task-based evaluations.