A Dissertation Report on
AVATAR SIMULATED BOT: WITH SEMANTIC
MEMORY AND COGNITIVE APPROACH
SAI - SIMULATED ARTIFICIAL INTELLIGENCE BOT
PARUL INSTITUTE OF ENGINEERING AND
TECHNOLOGY (MCA)
VADODARA
(2011–2012)
By:
095250693053  Anitha K. Swamy
095250693058  Patel Nikisha A.
ACKNOWLEDGEMENT
The 5th Semester Dissertation is a golden opportunity for learning, doing research work, understanding and implementing new technologies, and self-development. We consider ourselves very lucky and honoured to have had so many wonderful people lead us through the completion of this dissertation.
Mr. V. N. Acharya, HOD, MCA Dept., and our dissertation guide, Falguni Ranadive, monitored our progress and arranged all facilities to make the dissertation easier. She was always involved in the entire process, shared her knowledge, and encouraged us to think. Thank you, dear Madam; we take this moment to acknowledge her contribution gratefully.
Last but not least, there were many others who shared valuable information that helped in the successful completion of this dissertation. We would like to thank all of them on behalf of both team members.
TABLE OF CONTENTS
GLOSSARY OF IMPORTANT TERMS AND
ABBREVIATIONS
AVATAR
SAI
ARTIFICIAL INTELLIGENCE
EMBODIED AGENTS
AIML: ARTIFICIAL INTELLIGENCE MARKUP LANGUAGE
AIML, or Artificial Intelligence Markup Language, is an XML dialect for creating natural
language software agents.
1.) LIST OF FIGURES AND CHARTS
The following figures show the concept of backpropagation.
A simple agent program maps every possible percept sequence to a possible action the agent can perform, or to a coefficient, feedback element, function or constant that affects eventual actions.
The agent function is an abstract concept, as it could incorporate various principles of decision making, such as calculation of the utility of individual options, deduction over logic rules, fuzzy logic, etc.
The agent program, instead, maps every possible percept to an action.
Artificial embodied agents are often described schematically as above.
Simple embodied agents
Simple reflex agents act only on the basis of the current
percept, ignoring the rest of the percept history. The
agent function is based on the condition-action rule: if
condition then action. This agent function only succeeds
when the environment is fully observable.
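As an illustration, here is a minimal simple reflex agent sketched in Python; the percept names and condition-action rules are hypothetical, chosen only to show the if-condition-then-action mapping, not taken from the dissertation's implementation.

# Simple reflex agent: maps only the current percept to an action.
# The rule table below is an illustrative assumption, not SAI's actual rules.
def simple_reflex_agent(percept):
    rules = {
        "user_greets": "greet_back",
        "user_asks_name": "say_name",
        "user_silent": "prompt_user",
    }
    # Percept history is ignored; only the current percept is consulted.
    return rules.get(percept, "default_response")

print(simple_reflex_agent("user_greets"))  # -> greet_back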
Self-Learning Embodied agents
Learning has the advantage that it allows agents to initially operate in unknown environments and to become more competent than their initial knowledge alone might allow. The most important distinction is between the "learning element", which is responsible for making improvements, and the "performance element", which is responsible for selecting external actions.
The learning element uses feedback from the "critic" on
how the agent is doing and determines how the
performance element should be modified to do better in
the future. The performance element is what we have
previously considered to be the entire agent: it takes in
percepts and decides on actions.
The last component of the learning agent is the "problem
generator". It is responsible for suggesting actions that
will lead to new and informative experiences.
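A minimal Python skeleton may clarify how these four roles fit together; the class, rule store, and method names are illustrative assumptions, not the dissertation's code.

# Learning agent: performance element, critic, learning element, problem generator.
class LearningAgent:
    def __init__(self):
        self.rules = {}  # knowledge used by the performance element

    def performance_element(self, percept):
        # The part previously considered "the entire agent": percept -> action.
        return self.rules.get(percept, "default_response")

    def critic(self, feedback):
        # Scores how well the agent is doing (here, simply external feedback).
        return feedback

    def learning_element(self, percept, action, score):
        # Uses the critic's score to modify the performance element.
        if score > 0:
            self.rules[percept] = action
        else:
            self.rules.pop(percept, None)

    def problem_generator(self):
        # Suggests actions leading to new and informative experiences.
        return "explore_new_topic"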
2.) MAIN TEXT
2.1 ABSTRACT:
This dissertation introduces a unified data acquisition, processing and synthesis framework for the creation of biomechanically correct human avatars.
2.2 KEY TERMS :
Avatar, Simulated Artificial Intelligence bot, artificial intelligence, bot, chatbot assistant, virtual assistant, semantic memory, embodied conversational agent.
2.3 INTRODUCTION:
One of the most significant roles played by
technology is connecting people and mediating their
communication with one another. Remote conversations were once unthinkable but are now routinely conducted with devices ranging from two-way personal computers to videoconference systems.
Building technology that mediates conversation
presents a number of challenging research and design
questions. Apart from the fundamental issue of what
exactly gets mediated, two of the more crucial questions
are how the person being mediated interacts with the
mediating layer and how the receiving person
experiences the mediation (see Figure 1). This introduction provides a framework for mediated conversation by means of automated avatars.
Much of the effort in constructing natural-language interfaces has been devoted to creating simulated bots that make the user feel that the bot understands the meaning of words. Since Weizenbaum's famous "Eliza" program, chatterbots have attempted to discover keywords and sustain dialog by asking pre-prepared questions, without understanding the subject of conversation or the meaning of individual words. This is quite evident from the Loebner Prize chatterbot competition, the popularity of bots based on the AIML language, and the general lack of progress in text understanding and natural language dialogue systems.
Cheating obviously has its limitations, and it is doubtful that good natural language interfaces may be built this way. The alternative approach used by humans requires various types of memory systems to facilitate concept recognition, building episodic relations among concepts, and storing basic information about the world: descriptions of objects, concepts, relations and possible actions in the associative semantic memory. Although the properties of semantic memory may be partially captured by semantic networks, so far this has been demonstrated only in narrow domains, and it is not easy to see how to create a large-scale semantic network that could be used in an unrestricted dialog with a chatterbot.
In this paper cognitive inspirations are drawn upon to make a first step towards the creation of avatars equipped with semantic memory that will be able to use language in an intelligent way. This requires the ability to ask questions relevant to the subject of discourse, questions that constrain and narrow down possible ambiguities. Very ambitious projects that use sophisticated frame-based knowledge representation have been pursued for decades and can potentially become useful in natural language processing, although this has yet to be demonstrated. However, the complexity of knowledge-based reasoning in large systems makes them unsuitable for real-time tasks, such as quick analysis of large amounts of text found on web pages, or simultaneous interactions with many users.
An alternative strategy followed here is to start from
the simplest knowledge representation for semantic
memory and to find applications where such
representation is sufficient. Drawing on its semantic memory, an avatar may formulate and answer many questions that would require an exponentially large number of templates in AIML or other such languages.
Endowing avatars with linguistic abilities involves two major tasks: building a semantic memory model, and providing all necessary means for natural communication.
This paper describes our attempts to create a Humanized Interface based on a 3D human face model, with speech synthesis and recognition, which is used to interact with Web pages and local programs, making the interaction much more natural than typing. These actions are based primarily on the information in its semantic memory. Building such a memory is not a simple task and requires the development of automatic and manual data collection and retrieval algorithms, using various tools for the analysis of natural language sources.
Avatar: SAI - SIMULATED ARTIFICIAL INTELLIGENCE BOT
In computing, an avatar is the graphical representation of the user or the user's alter ego or character. It may take either a three-dimensional form, as in games or virtual worlds, or a two-dimensional form as an icon in Internet forums and other online communities. It can also refer to a text construct found on early systems such as MUDs. It is an object representing the user. The term "avatar" can also refer to the personality connected with the screen name, or handle, of an Internet user.
An avatar chatbot is a fictional character not controlled by the person who created it; this usually means a character controlled by the computer through artificial intelligence. In artificial intelligence, an embodied agent, also sometimes referred to as an interface agent, is an intelligent agent that interacts with the environment through a physical body within that environment. Agents that are represented graphically with a body, for example a human or a cartoon animal, are also called embodied agents. Embodied conversational agents are embodied agents (usually with a graphical front-end as opposed to a robotic body) that are capable of engaging in conversation with one another and with humans, employing the same verbal and nonverbal means that humans do (such as gesture, facial expression, and so forth).
One of the trends of recent years has been the humanizing of digital channels: giving a face to things which are not human. This has led to the creation of avatars (also known as bots or chatter-bots), artificial intelligences with which users can "converse". The success of such bots varies greatly; there are few which respond in a convincingly human way, and it is no great mystery why they are commonly referred to as "bots", often resulting in a stilted, mechanical interaction where straying off a recognized path can lead to poor responses.
However, this has not stopped their spread across the commercial world, with several high-profile companies adopting them as part of their customer services. The idea of an avatar such as SAI Bot, an artificial intelligence able to respond in an intelligent manner to your questions, is indeed an exciting one. However, do these bots really manage it? Or are they just human-faced avatars disguising a search engine beneath? An intelligent virtual assistant may basically consist of a dialog system, an avatar, and an expert system that provides specific expertise to the user; the scope of the expert system depends on the purpose of the assistant. Servers and other maintenance systems that keep the automated assistant online may also be regarded as components of it.
History
An avatar is a user’s visual embodiment in a virtual
environment. The term, borrowed from Hindu mythology
where it is the name for the temporary body a god
inhabits while visiting earth, was first used in its modern
sense by Chip Morningstar who along with Randall
Farmer created the first multi-user graphical online world
Habitat in 1985 (Damer 1998). Habitat was a recreational environment where people could gather in a virtual town to chat, trade virtual props, play games and solve quests. Users could move their avatars around the graphical environment using cursor keys and could communicate with other online users by typing short messages that would appear above their avatar. Habitat
borrowed many ideas from the existing text-based MUD
environments, but the visual dimension added a new
twist to the interactions and attracted a new audience
(Morningstar and Farmer 1990). Avatar-based systems
since Habitat have been many and varied, the
applications ranging from casual chat and games to
military training simulations and online classrooms.
In electronic media such as chat bots, this usually
means a character controlled by the computer
through artificial intelligence.
Web 2.0 Avatars, powered by Digital Conversations,
provide a level of immersion not found in these bots.
Why? Because Digital Conversations are scripted just like
any good book. And like books they are designed to
guide a user, through high quality dialogue and
interactions, to an outcome. Along with this, the ability
to understand user interactions through DecisionMetrics
means that these Web 2.0 Avatars can be adapted to
emergent demands as they appear. The dialogue can be
improved and built up as and when needed. High quality
dialogue, clear and concise options for a user to choose, and a humanized avatar all combine to create an immersive experience, with the psychological appeal of interacting with a character or object.
The key to immersion and believability is high quality
dialogue, and it is high quality dialogue that Digital
Conversations has been created for.
FUNDAMENTAL CONCEPTS
ARTIFICIAL INTELLIGENCE:
Artificial intelligence (AI) is the intelligence of
machines and the branch of computer science that aims
to create it. Traditionally it is defined as the field of "the
study and design of intelligent agents" where an
intelligent agent is a system that perceives its
environment and takes actions that maximize its chances
of success. John McCarthy, who coined the term in
1956, defines it as "the science and engineering of
making intelligent machines."
Classification and statistical learning methods employed in the functioning of the avatar
Neural networks
Main articles: Neural network and Connectionism
A neural network is an interconnected group of
nodes, akin to the vast network of neurons in the human
brain.
The study of artificial neural networks began in the decade before the field of AI research was founded, in the work of Walter Pitts and Warren McCulloch. Other important early researchers were Frank Rosenblatt, who invented the perceptron, and Paul Werbos, who developed the backpropagation algorithm.
The main categories of networks are acyclic
or feedforward neural networks (where the signal passes
in only one direction) and recurrent neural
networks (which allow feedback). Among the most
popular feedforward networks are perceptrons, multilayer perceptrons and radial basis networks. Among
recurrent networks, the most famous is the Hopfield net,
a form of attractor network, which was first described
by John Hopfield in 1982. Neural networks can be applied to the problem of intelligent control (for robotics) or learning, using such techniques as Hebbian learning and competitive learning. Hierarchical temporal memory is an approach that models some of the structural and algorithmic properties of the neocortex.
Intelligent agent paradigm
An intelligent agent is a system that perceives its environment and takes actions which maximize its chances of success. The simplest intelligent agents are programs that respond to questions about real-life entities; such agents are symbolic and logical, while others, such as neural networks, may use newer approaches. The paradigm also gives researchers and users the ability to communicate with such embodied agents.
Artificial Neural Networks (ANNs) are an approach that follows a different path from traditional computing methods to solve problems. Since conventional computers use an algorithmic approach, if the specific steps that the computer needs to follow are not known, the computer cannot solve the problem. That means traditional computing methods can only solve problems that we already understand and know how to solve. ANNs, however, are in some ways much more powerful, because they can solve problems that we do not exactly know how to solve. That is why, of late, their usage has been spreading over a wide range of areas including virus detection, robot control, intrusion detection systems, pattern recognition (image, fingerprint, noise) and so on.
ANNs have the ability to adapt, learn, generalize, cluster and organize data. There are many ANN architectures, including the Perceptron, Adaline, Madaline, Kohonen networks, Backpropagation networks and many others. The Backpropagation ANN is probably the most commonly used, as it is very simple to implement and effective. In this work, we deal with Backpropagation ANNs.
Backpropagation is used here because we use AIML to create a pattern-recognizing chatbot. After a long duration of consistent input of data by users, this will lead to unsupervised learning: the output weights of the backpropagation network are fed back as inputs to the neural net's layers. We also make use of genetic algorithm techniques.
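To make the mechanism concrete, here is a minimal backpropagation sketch in Python (NumPy only). The two-layer architecture, random toy data and learning rate are illustrative assumptions; the feedback of output weights into the network's inputs and the genetic-algorithm stage described above are not shown.

import numpy as np

# Toy data: 8 training patterns with 4 features each, binary targets.
rng = np.random.default_rng(0)
X = rng.random((8, 4))
y = rng.integers(0, 2, (8, 1)).astype(float)

W1 = rng.normal(0, 0.5, (4, 6))  # input -> hidden weights
W2 = rng.normal(0, 0.5, (6, 1))  # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(1000):
    # Forward pass.
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # Backward pass: propagate the output error back toward the input layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates.
    W2 -= 0.5 * h.T @ d_out
    W1 -= 0.5 * X.T @ d_h

print("final training error:", float(np.mean((out - y) ** 2)))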
Digital Conversation
A Digital Conversation is a scripted dialogue (in other words, dialogue written by a human, just like the script of a movie) which takes place between a person and a computer via any digital medium, from web browsers and PDAs to mobile phones and interactive television.
2.4 LITERATURE SURVEY
2.5 MAJOR THESES AND HYPOTHESES PRESENTED
Reference from : Towards Avatars with Artificial Minds: Role of
Semantic Memory
The first step towards creating avatars with human-like artificial minds is to give them human-like memory structures with access to general knowledge about the world. This type of
knowledge is stored in semantic memory. Although many approaches to modeling of semantic memories have been proposed, they are not very useful in real-life applications because they lack knowledge comparable to the common sense that humans have, and they cannot be implemented in a computationally efficient way. The most drastic simplification of semantic memory, leading to the simplest knowledge representation that is sufficient for many applications, is based on Concept Description Vectors (CDVs), which store, for each concept, the information of whether a given property is applicable to this concept or not. Unfortunately, even such simple information
about real objects or concepts is not available. Experiments with automatic creation of concept
description vectors from various sources, including ontologies, dictionaries, encyclopedias and
unstructured text sources are described. A Haptek-based talking head that has access to this memory has been created as an example of a humanized interface (HIT) that can interact with web pages and exchange information in a natural way. A few examples of applications of an
avatar with semantic memory are given, including the twenty questions game and automatic
creation of word puzzles.
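The CDV idea and its use in the twenty questions game can be sketched in a few lines of Python; the concepts, properties and the even-split scoring below are toy assumptions based only on the paper's description.

# Each concept maps to a binary vector: does the property apply or not?
PROPERTIES = ["has_fur", "can_fly", "lays_eggs", "is_pet"]
CDV = {
    "dog":    [1, 0, 0, 1],
    "eagle":  [0, 1, 1, 0],
    "canary": [0, 1, 1, 1],
    "cat":    [1, 0, 0, 1],
}

def best_question(candidates):
    # Pick the property splitting the remaining candidates most evenly,
    # i.e. the most informative yes/no question to ask next.
    def imbalance(i):
        yes = sum(CDV[c][i] for c in candidates)
        return abs(2 * yes - len(candidates))  # 0 means a perfect 50/50 split
    return min(range(len(PROPERTIES)), key=imbalance)

candidates = list(CDV)
i = best_question(candidates)
print("Ask about:", PROPERTIES[i])
# An answer of "yes" then filters the candidate concepts:
candidates = [c for c in candidates if CDV[c][i] == 1]
print("Remaining:", candidates)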
Reference: Avatar Augmented Online Conversation (MapChat application), for thesis and hypotheses
Major thesis (ABSTRACT)
This thesis is concerned with both of these questions and proposes a theoretical framework of mediated
conversation by means of automated avatars. This new approach relies on a model of face-to-face
conversation, and derives an architecture for implementing these features through automation. First the
thesis describes the process of face-to-face conversation and what nonverbal behaviors contribute to its
success. It then presents a theoretical framework that explains how a text message can be automatically
analyzed in terms of its communicative function based on discourse context, and how behaviors, shown
to support those same functions in face-to-face conversation, can then be automatically performed by a
graphical avatar in synchrony with the message delivery. An architecture, Spark, built on this framework
demonstrates the approach in an actual system design that introduces the concept of a message
transformation pipeline, abstracting function from behavior, and the concept of an avatar agent,
responsible for coordinated delivery and continuous maintenance of the communication channel. A
derived application, MapChat, is an online collaboration system where users represented by avatars in a
shared virtual environment can chat and manipulate an interactive map while their avatars generate
face-to-face behaviors. A study evaluating the strength of the approach compares groups collaborating
on a route-planning task using MapChat with and without the animated avatars. The results show that
while task outcome was equally good for both groups, the group using these avatars felt that the task
was significantly less difficult, and the feeling of efficiency and consensus were significantly stronger. An
analysis of the conversation transcripts shows a significant improvement of the overall conversational
process and significantly fewer messages spent on channel maintenance in the avatar groups. The
avatars also significantly improved the users’ perception of each others’ effort. Finally, MapChat with
avatars was found to be significantly more personal, enjoyable, and easier to use. The ramifications of
these findings with respect to mediating conversation are discussed.
Contributions and Organization of Thesis
First the background about human face-to-face conversation is reviewed in
chapter 2 along with a review of computer mediated communication and
computational models of conversation. Then the theoretical framework
describing the augmentation of online conversation based on a model of
face-to-face conversation is introduced in chapter 3. This model lists the
essential processes that need to be supported and how nonverbal behavior
could be generated to fill that role. The theory is taken to a practical level
through the engineering of an online conversation system architecture
called Spark described in chapter 4 and then an actual implementation of
this architecture in the form of a general infrastructure and a programming
interface is presented in chapter 5. A working application for online
collaborative route planning is demonstrated in chapter 6 and the
implementation and approach evaluated in chapter 7. Possible follow-up
studies, application considerations and interesting issues are discussed in
Chapter 8. Finally future work and conclusions in chapters 9 and 10 place
the approach in a broader perspective, reflecting on general limitations as
well as on the kinds of augmented communication this work makes
possible.
Results and analysis
This thesis on MapChat application makes contributions to several different fields of study:
To the field of computer mediated communication, the thesis presents a theory of how textual real-time
communication can be augmented by carefully simulating visual face-to-face behavior in animated
avatars. The thesis demonstrates the theory in an implemented architecture and
evaluates it in a controlled study. To the field of human modeling and simulation, the thesis presents a
set of behaviors that are essential to the modeling of conversation. It is shown
how these behaviors can be automatically generated from an analysis of the text to be spoken and the
discourse context. To the field of HCI, the thesis presents a novel approach to augmenting an
online communication interface through real-time discourse processing and automated avatar control.
For avatar-based systems, it provides an alternative to manual control and performance control of
avatars. For any communication system it introduces the idea of a communication proxy in
the form of a personal conversation agent remotely representing
participants. To the field of systems engineering, the thesis presents a powerful way to
represent, transmit and transform messages in an online real-time messaging application.
Design
While the first study addresses the question how seeing the animated
avatar bodies affected the communication, follow-up studies could ask
other but related questions. Here three different study designs are
suggested, all using the task and outcome measures described above.
Study I
Hypothesis: “Groups using the new face-to-face paradigm do
better on a judgment task than those groups that use the state-of-the-art in online collaboration.”
Goal: Compare the face-to-face avatar paradigm with a
shared-workspace paradigm currently representing the state-of-the-art in online collaboration. This comparison has the
potential to demonstrate the power of a new paradigm and
uses a system most people are familiar with as a reference.
Method: Two sets of groups solve the judgment task, one
using a popular collaboration system such as NetMeeting that
integrates a text chat with a shared whiteboard and another
using a Spark based system. In NetMeeting the inventions to
be ordered would be images on the whiteboard. In the 3D
avatar environment, they would be objects on the table in
front of the avatars.
Study II
Hypothesis: “Groups that use avatars modeling conversational
behavior do better on a judgment task than groups
that use minimally behaving avatars.”
Goal: To show the impact of modeling appropriate behavior
by demonstrating that the effects found with the new
animated avatars so far are not due to their mere presence but
to their carefully crafted behavior.
Method: Two sets of groups solve the judgment task, both
using a Spark based system, but for one group all behaviors
are turned off except for lip movement when speaking and
random idle movement.
Study III
Hypothesis: “Groups that use the animated avatars and
groups that interact face-to-face show improvement in
judgment task performance over groups that use text-only
chat. Groups interacting face-to-face show the greatest
improvement.”
Goal: To show that avatars that animate typical face-to-face
behavior actually move the performance of online
collaboration closer to that of actual face-to-face interaction.
Method: Three sets of groups solve the judgment task, one
face-to-face, one using the Spark based system with the
avatars visible and one with no avatars visible.
The follow-up studies should be between-subject studies to get around a
possible learning effect. A between-subject study would also alleviate
problems associated with scheduling groups of subjects for return visits
and the possibly shifting group membership.
Implementation
For the study that has been conducted, the implemented system and
animation was deemed “good enough” by experts to represent the
theoretical model. As mentioned in the MapChat technical evaluation,
there were still a few issues, especially with time lag. Furthermore, the
animations themselves felt a little “stick-figure-like” because Pantomime
is currently only capable of rendering joint rotations of stiff segments with
no natural deformation of the body.
All of these technical issues are under constant improvement. Lag times
improve as computers get faster, and several parts of the MapChat
implementation are being fixed and optimized as a result of running the
user study. Using Pantomime to control a skinned character animation
rendering engine instead of Open Inventor has been successfully tested, so
future animations in Pantomime may see drastic improvements.
2.6 DATA AND ANALYSIS
2.7 RESULTS AND DISCUSSIONS
Overview
The goal of the work presented in this dissertation is to augment online
conversation by employing avatars that model face-to-face behavior.
The goal can be divided into three parts:
o understand what a person means to communicate (input),
o define a set of processes crucial for successful interaction and the set of behaviors that support them (model), and
o coordinate those behaviors in a real-time performance (output).
Future work can expand on each of these parts.
9.2 Input and interpretation
Speech
While text will most likely continue to be the most popular messaging
and chat medium, voice-over-IP technology is providing increasingly
higher quality voice conferencing for applications ranging from shared
whiteboards to games. There are certainly situations where voice is the
best option, such as when hands are not free to type. It is therefore
important to consider what it would take to augment a speech stream using
the approach presented here.
Discussion
The long-term goal of our project is to create an avatar
that will be able to use natural language in a meaningful
way. An avatar based on the Haptek head has been
created, equipped with semantic memory, and used as an
interface to interactive web pages and software programs
implementing word games. Many technologies have to be
combined to achieve this. Such avatars may be used as humanized interfaces for natural communication with chatterbots that may be placed in virtual environments.
Several novel ideas have been developed and presented
here to achieve this:
1) Knowledge representation schemes should be optimized
for different tasks. While general semantic memories
have been around for some time [9, 10], they have not led to large-scale NLP applications. For some applications the much simpler representation afforded by concept description vectors is sufficient and computationally much more efficient.
2) CDV representation facilitates some applications,
such as generation of word puzzles, while graph representation
of general semantic memory cannot be easily used
to identify a subset of features that should be used in a
question. Binary CDV representation may be systematically
extended towards frame-based knowledge representation,
but it would be better to use multiple knowledge
representation schemes optimized for different applications.
Although reduction of all relations stored in semantic memory to the binary CDV matrix is a drastic simplification, some applications benefit from the ability to quickly evaluate the information content in concept/feature subspaces. Careful analysis is needed to find
the simplest representations that facilitate efficient analysis
for different applications. Computational problems
due to a very large number of keywords and concepts
with the number of relations growing into millions are
not serious if sparse semantic matrix representation is
used.
3) An attempt to create semantic memory in an automatic
way, using definitions and assertions from Wordnet
and ConceptNet dictionaries, Sumo/Milo ontologies
and other information sources has been made. Although
analysis of that information was helpful, creating a full description even for simple concepts, such as finding all
properties of animals, proved to be difficult. Only a small
number of relevant features have been found, despite the
large sizes of databases analyzed.
Thus we have identified an important challenge for lexicographers:
creation of a full description of basic concepts.
Information found in dictionaries is especially brief
and without extensive prior knowledge it would not be
possible to learn much from them. The quality of the
data retrieval (search) depends strongly on the quality
of the data itself. Despite using machine-readable dictionaries with verification based on dictionary glosses, spurious information may still appear, for example strange relations between keywords and concepts which do not appear in the real world. This happens in our semantic memory
in about 20% of all entries and is much more common
if context vectors are generated by parsing general texts.
In most cases it is still possible to retrieve sensible data
from such semantic memory. One way to reduce this effect
is to parse texts using phrases and concepts rather
than single words. The quality of semantic memory increases
gradually as new dictionaries and other linguistic
resources are added. It is also designed to be fine-tuned
2.8 CONCLUSION AND FUTURE DIRECTIONS
Conclusions
This paper describes the use of personalisation and a
Personal Assistant application in a platform which aims
to support an open active mobile multimedia
environment, with a range of applications and services
specifically geared to young people. An important
aspect of the platform is the provision of context-aware
services, including location awareness. The importance
of the latter has been recognised in several studies.
The effectiveness of any system of personalisation
depends to a large extent on the information captured in
the user profile. To this end the Personal Assistant plays
a key role in maintaining the user profile.
To improve the interface with the user, exchangeable
avatars are included, which can be controlled by the
user. This provides a further level of personalisation
and, it is hoped, will make the task of interaction more
enjoyable.
In conclusion, the Youngster approach combines
innovative technology and service developments in a
platform, which has been developed for young people.
Future directions
An avatar based on the Haptek head has been equipped with semantic memory and used as an interface to chatterbots, interactive web pages and software programs implementing word games. This avatar may be used as a humanized interface for natural communication with
text-based web pages, or placed in virtual environments
in cyberspace. Semantic memory stored in a relational database is used efficiently in many applications after reduction to a sparse CDV matrix. A chatterbot used with an avatar equipped with semantic memory may ask intelligent questions, knowing the properties of objects mentioned in the dialog. Most chatterbots try to change the topic of conversation when they get lost in the conversation.
USE OF AVATARS
One particular device that is being used to improve the
interface with the user is the use of avatars. Ideally an
avatar may be viewed as an interface to a system with a
virtual representation of personal presence and personal
characteristics. In this way it provides a user-friendly
environment and helps the user to interact in an easier
way with the system.
Avatars can provide more than just an interface to a
system. More advanced avatars can be adaptive to
behave differently in different situations and hence are
better able to attract and hold the user's attention. In
some cases they could even take on the characteristics
of a pet, or more appropriately in this case, a
tamagotchi-like entity.
There are a number of instances of the use of avatars in
applications with varying degrees of success. The most
well known (and least liked!) example is the Microsoft
Office Assistant that pops up when you least expect it to
offer you help in writing a letter. Similar examples
include Prody Parrot and Bonzi Buddy, both of which
interact with the user in a variety of ways (including
telling jokes, playing games, reading email and
providing reminders). More advanced avatars such as
Haptek's Virtual Friend are now beginning to appear
although the resources required for this are beyond
those currently available on mobile phones and PDAs.
In the case of youngsters the use of avatars is
potentially very appealing. However, once again the
importance of personalisation comes into this.
Providing a standard avatar for all users is unlikely to
be effective. Instead in the Youngster platform a more
flexible mechanism is provided by which the
youngsters can select their own avatars and insert them
at appropriate points in the interface. They may even
develop their own, exchange them with friends, or
possibly even buy new avatars to use in the system.
Thus instead of a paper clip, a parrot or even a Homer Simpson caricature, they may include their own realisation of Britney Spears or the like.
This notion of exchangeable avatars not only provides
the opportunity for personalisation by the user but also
can be used to handle avatars on different devices.
Within the Youngster project we have focused on
producing avatars for both Java phones and PDAs.
Because of the limitations of Java phones, the avatars are more restricted than on PDAs, making it all the more important to allow youngsters to create their own.
The idea of having multiple profiles for different modes
of operation or identities of the user can be coupled
with the use of different avatars for different profiles
thereby adding to the interest level for youngsters.
3) REFERENCES
http://en.wikipedia.org/wiki/Avatar_(computing)
http://en.wikipedia.org/wiki/Embodied_agent
http://en.wikipedia.org/wiki/Automated_online_assistant
http://en.wikipedia.org/wiki/Artificial_intelligence
http://en.wikipedia.org/wiki/Natural_language_processing
http://en.wikipedia.org/wiki/Digital_conversation
http://www.existor.com/
http://alice.pandorabots.com/
http://www.pandorabots.com/botmaster/en/home
http://www.alicebot.org/documentation/ptags.html
http://www.chatbots.org/
http://www.chatbots.org/research/
http://www.chatbots.org/journals/
http://www.chatbots.org/papers/
http://www.chatterbotcollection.com/category_contents.php?id_cat=20
http://www.chatterbotcollection.com/item_display.php?id=6
http://www.alicebot.org/documentation/
http://www.alicebot.org/TR/2001/WD-aiml/
http://alicebot.blogspot.com/2009/03/basics-ii-simple-wildcard-reductions.html
http://www.pandorabots.com/botmaster/en/aiml-converter-intro.html
http://pandorabots.com/pandora/pics/spellbinder/desc.html
http://www-05.ibm.com/de/media/downloads/beyond-advertising.pdf
https://www.virtualeternity.com/
http://intellitar.com/
http://www.intellitar.com/Arwyn.php
http://en.wikipedia.org/wiki/ELIZA
http://www.evolver.com/
See www.loebner.net/Prizef/loebner-prize.html
www.haptek.com
See www.microsoft.com/speech/default.mspx
See www.a-i.com
See www.20q.net
R. Wallace, The Elements of AIML Style, ALICE
A. I. Foundation (2003), see www.alicebot.org
Towards Avatars with Artificial Minds: Role of Semantic Memory. Włodzisław Duch, Julian Szymański, and Tomasz Sarnatowicz, School of Computer Engineering, Nanyang Technological University, Singapore.
J. Weizenbaum, Computer Power and Human Reason:
From Judgment to Calculation. W. H. Freeman
& Co. New York, NY, 1976.
Image-Based Avatar Reconstruction by Maria-Cruz Villa-Uriol, Alba Pérez, Falko Kuester and Nader Bagherzadeh, Visualization and Interactive Systems Group / Robotics and Automation Laboratory, School of Engineering, University of California.
Avatar Augmented Online Conversation by Hannes Högni Vilhjálmsson
Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, on
May 2, 2003 in Partial Fulfillment of the Requirements of the Degree of Doctor of Philosophy at
the Massachusetts Institute of Technology
Allen, J. (1995). Natural Language Understanding. Redwood City, CA,
The Benjamin/Cummings Publishing Company, Inc.
Andre, E., T. Rist, et al. (1998). "Integrating reactive and scripted
behaviors in a life-like presentation agent." Proceedings of AGENTS'98:
261-268.
M.-K. Hu, “Visual pattern recognition by moment invariants,” IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179-187, 1962.
J. P. Barreto, K. Daniilidis, N. Kelshikar, R. Molana, and X. Zabulis. EasyCal, http://www.cis.upenn.edu/~sequence/research/downloads/EasyCal/, January 2004.
D. Friedman, A. Steed, and M. Slater. Spatial social behavior in Second Life. In Intelligent Virtual Agents.
Smith, Roger D. 1999. Simulation: The Engine Behind the Virtual World. eMatter 1, Simulation 2000 series (12). Also available at http://www.modelbenders.com/papers/sim2000/SimulationEngine.PDF.
A. M. Collins and M. R. Quillian, Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior 8, 240-247, 1969.
Hubal, R. C., Frank, G. A., and Guinn, C. I. AVATALK Virtual Humans for Training with Computer Generated Forces.
The aim of this study has been to describe what the avatar is, how it structures our play and
our participation with a fictional world, and how avatar-based singleplayer computer games
are different from other kinds of singleplayer games; the avatar exploits the concretising
realism of the computer as a simulating machine, and situates us in a gameworld via
prosthetic and fictional embodiment.
4) APPENDICES
http://www.intellitar.com/
Intellitar™ is the developer of the Intelligent Avatar Platform™ (IAP) and creator of Virtual Eternity™. Founded in
2008, Intellitar is delivering the first intelligent and interactive technology platform for building and creating life-like
avatars or an "Intellitar". The IAP allows a user to create and train his or her Intellitar to accurately reflect the
personality, voice, look, knowledge, and life experiences of its creator. These Intellitars can improve and expand the
online experience for businesses and individual users in a more realistic and life-like manner.
Photo fitting of core avatar face
Core face model for avatar
Wireframe view before adding skin and with features
Wireframe view after adjustment of features
Flat shading mode view after adjustment of features
View with LoRes skin
Incorporating the model's features, such as eyes, teeth, skin, etc.
Model with MedRes skin: the complete 3D face
Generating a distinctive face for the avatar by morphing according to basic races; here the Asian race has been given more priority for the avatar's look
Rendering of the avatar's face from the core avatar face to give a realistic look; a genetic algorithm is employed as a means of morphing, exploiting the genetic algorithm's randomness technique
Basic shape morphing of the avatar's features
Basic colour morphing for the avatar
Model with closed-smile morphing
Model with eye blinking and open-smile morphing
Recitation of the phoneme "aah"
The avatar's core 3D face model with added features, importing skin and hair for realistic avatar creation, ready to be used with channelized phonetic recitation capability and morphing of basic facial features, able to converge with the speech-enabled avatar
ABSTRACT
Our goal involves a baseline AI engine and the necessary tools to train and modify the artificial intelligence to reflect a more personal, uniquely created AI brain for the user. The more content a user provides to their AI brain, the more dynamic the responses generated by the AI become, and the more they begin to reflect the personality of the user. The graphical web user interface used for the avatar supports the creation of an emotional speech database. Stimuli to elicit emotions can be provided by the interface, for example by reading a set of emotional sentences, which facilitates a real experience of the emotions. The sentences can also be personalized, so as to help the reader better immerse into emotional states. This procedure reduces the effort of building a prototypical personalized emotion recognizer to just a few minutes.
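The following Python sketch illustrates the idea of fitting a personalized emotion recognizer from a few labelled recordings; the feature extraction is a placeholder and the nearest-centroid classifier is our assumption, not SAI's actual recognizer.

import numpy as np

def extract_features(recording):
    # Placeholder: a real system would compute pitch, energy, MFCCs, etc.
    return np.asarray(recording, dtype=float)

# Recordings collected by prompting the user with labelled emotional sentences.
database = {
    "happy": [extract_features(v) for v in ([1.0, 0.9], [0.9, 1.0])],
    "sad":   [extract_features(v) for v in ([0.1, 0.2], [0.2, 0.1])],
}

# Fit one centroid per emotion; this takes seconds, in line with the
# "few minutes" claim for building a prototypical recognizer.
centroids = {label: np.mean(vecs, axis=0) for label, vecs in database.items()}

def recognize(recording):
    f = extract_features(recording)
    return min(centroids, key=lambda lbl: np.linalg.norm(f - centroids[lbl]))

print(recognize([0.95, 0.95]))  # -> happy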
Framework
Autonomous Facial Expression Synthesis by FaceGenModeller and VirtualEternity’s Speech and
Expression Technology
SAI Bot is based on the classical Evie avatar bot framework, owned by the Existor portal. Its processing pipeline is:
Sensory Input → Processing by Brain → Actuator Response
The model translates into the following:
Sensory Input: Text, Speech, Music, Video, Still Images, Touch.
Processing: Recognizing Input and synthesizing appropriate expression.
Actuator Response: Showing selected expression on simulated face model.
o The sensory input is any kind of stimulus provided to the Avatar bot by text.
o The brain is where the input is processed and a response is synthesized.
o The actuator is responsible for manifestation of the response.
3.2.2 Brain
The function of SAI’s brain is to
o Recognize and match input with patterns stored in the memory.
o Synthesize a response based on the match found.
o Send signals to the actuators to manifest the selected expression.
The technology behind the brain is described in depth in section 3.3.2.
Training and Storage:
- The brain is trained on and stores a large number of input patterns.
- Input patterns will differ corresponding to different stimuli; for example, text input patterns will be language elements.
- All such patterns are then mapped to responses in the form of output responses, which basically works on the unsupervised learning element of the neural net employing backpropagation.
- Expressions are quantified and stored in the robot's memory.
- Training is thus provided in terms of pattern creation for various stimuli, and expression creation and storage for responses.
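A minimal Python sketch of this pattern-to-response mapping follows; the stored patterns and expression labels are illustrative assumptions, with the A.I.M.L. matching and neural stages omitted.

# The brain: match input against stored patterns, return text + expression.
memory = {
    "HELLO": ("Hi there!", "smile"),
    "HOW ARE YOU": ("I'm fine, thank you.", "smile"),
    "BYE": ("Goodbye!", "wave"),
}

def brain(stimulus):
    key = stimulus.strip().upper()
    text, expression = memory.get(key, ("I do not understand.", "neutral"))
    # The expression label is the signal sent to the facial actuators.
    return text, expression

print(brain("hello"))  # -> ("Hi there!", "smile")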
3.2.3 Response
Quantification of a Facial Expression
The face is a structure made of bones, muscles and skin.
A graphical simulation uses the following replacements:
Bones: A wireframe model to define facial structure
Muscles: Contraction and Expansion motion provided to the wireframe
Skin: Texture map on top of the wireframe
Any expression is created by a particular orientation of bones and manipulation of
skin as driven by muscles.
Thus, the actuators are the facial muscles.
Hence a facial expression can be quantified by storing the amount of contraction
or expansion of each of the muscles of the face.
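For illustration, a stored expression can be represented in Python as a vector of per-muscle contraction values; the muscle names and values below are hypothetical.

# A facial expression quantified as muscle contractions in [0, 1].
SMILE = {
    "zygomaticus_left":  0.8,
    "zygomaticus_right": 0.8,
    "orbicularis_oculi": 0.3,
    "frontalis":         0.0,
}
NEUTRAL = {m: 0.0 for m in SMILE}

def blend(expr_a, expr_b, t):
    # Interpolate between two stored expressions for smooth morphing.
    muscles = set(expr_a) | set(expr_b)
    return {m: (1 - t) * expr_a.get(m, 0.0) + t * expr_b.get(m, 0.0)
            for m in muscles}

print(blend(NEUTRAL, SMILE, 0.5))  # halfway into a smile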
Philosophy and Goals
o Create conversationally interactive and environment sensitive animatronic
characters
o Progress towards the next generation of animatronic character technology
Inputs: Visual, auditory, and proprioceptive sensory inputs
Outputs: Vocalizations, facial expressions, communicative cues
Hardware
o Processors:
Operating Systems
o Windows NT
Software
o SolidWorks digital design software via FaceGenModeller 3.5
o VirtualEternity commercial portal's technology, used to encode a non-linear interactive narrative
o Sphinx: open-source voice recognition software, together with the VirtualEternity portal's inbuilt MyEvoice speech-to-text synthesis technology
o Visual perception system via FaceGenModeller and CharacterBuilder
Features incorporated in the avatar
Avatar
There are no limits. Artificial Intelligence is communication. Natural language is universal.
Face
As with an instant messenger, purely textual conversation can be very effective, but the avatar also has speech, and the effect is more powerful.
Avatar SAI Bot can provide information in a natural and pleasing way.
The avatar is created from a portfolio of images according to the user's requirements.
Character
SAI, with dynamic, AI-controlled reactions and emotions, adds another layer of engagement.
Voice
The voices we use with our avatars are licensed from our partners, such as MyEvoice, a partner of the VirtualEternity portal; the speech recognition technology used is licensed similarly.
o Movable eyelids and eyebrows, and speaking phonemes
o Visual perception concept
o The software enables the creation of a complete environment and stage designed for an interactive show.
Learning
AI
The unique, universal contextual machine learning techniques are key. Memory of what happened when, where and why enables predictions of what should come next.
From past conversations, whether with the public or specialist trainers, the semantic memory predicts the thing most appropriate for the Bot to say. In the same way it predicts the timing, reactions and emotions for an avatar to display, and a virtual typing style to present.
With enough learning, these techniques enable lifelike, entertaining interactions on any subject, in any language.
And all that is needed for the learning to work is interaction itself, setting up a positive feedback loop.
Scripted AI
SAI Bot, when used in commercial applications, calls for more than lifelike probabilities. It calls for certainty in the delivery and gathering of information, and the completion of processes. So we can write the script.
Any software can provide a branching tree of possibilities, loops, asides, searches and more. Any software could
provide fixed outputs along the way. Our outputs - what Evie says - are highly dynamic, reflective of the users'
needs and their language style.
The real difference, however, is understanding. People can say things in nearly infinite numbers of ways, yet we
humans understand seemingly without effort. Machines have usually failed at this task - for example, picking up on
a few words and all too often getting the sense wrong.
Emotional AI
You will see Evie expressing emotions as you type, and as she responds. Our AI has learned how to do this from
user feedback over several years. Words describing reactions to input, and emotions felt as she replies, including
intensity, are converted to dynamic, subtly changing motion in the avatar.
3.3.2 Text Based Stimulus using A.I.M.L.
3.3.2.1 Introduction to A.I.M.L.
In its current implementation, SAI uses a modified version of Artificial Intelligence Markup Language (A.I.M.L.) for text input recognition, processing and response synthesis.
A.I.M.L. lies at the center of the robot’s brain. All intelligence is quantified, stored
and processed using A.I.M.L. as the common format for intelligence data
exchange.
Here is a block diagram of the functioning of A.I.M.L.
Block Diagram of functioning of A.I.M.L.
The responder acts as an interface between the human and the A.I.M.L. engine, whose function is to interpret the data from the responder and generate a response which can be sent back to the human via the responder.
SAI uses a modified version of A.I.M.L. which has been enhanced to include
expression data along with text response. The expression data is parsed by the 3D
simulation and appropriate expression is manifested.
The A.I.M.L. program and the facial simulation program are thus connected by a
pipe.
Mapping to the classical Evie (Existor) bot framework
Elements of AIML
AIML contains several elements. The most important of these are described in further detail below.
Categories
Categories in AIML are the fundamental unit of knowledge. A category consists of at least two further
elements: the pattern and template elements. Here is a simple category:
<category>
<pattern>WHAT IS YOUR NAME</pattern>
<template>My name is John.</template>
</category>
When this category is loaded, an AIML bot will respond to the input "What is your name" with the
response "My name is John."
Patterns
A pattern is a string of characters intended to match one or more user inputs. A literal pattern like
What Is your name
will match only one input, ignoring case: "what is your name". But patterns may also contain wildcards,
which match one or more words. A pattern like
What is your *
will match an infinite number of inputs, including "what is your name", "what is your shoe size", "what is
your purpose in life", etc.
The AIML pattern syntax is a very simple pattern language, substantially less complex than regular
expressions and as such not even of level 3 in the Chomsky hierarchy. To compensate for the
simple pattern matching capabilities, AIML interpreters can provide preprocessing functions to expand
abbreviations, remove misspellings, etc.
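A minimal Python sketch of this style of matching, with case-folding preprocessing and a trailing "*" wildcard, is shown below; real AIML interpreters use a more elaborate graph-based matcher, so this is only an illustration.

def matches(pattern, user_input):
    # Preprocessing: fold case and split into words.
    p = pattern.upper().split()
    w = user_input.upper().split()
    if p and p[-1] == "*":
        # The wildcard matches one or more trailing words.
        return len(w) >= len(p) and w[:len(p) - 1] == p[:-1]
    return w == p

print(matches("WHAT IS YOUR NAME", "what is your name"))    # True
print(matches("WHAT IS YOUR *", "what is your shoe size"))  # True
print(matches("WHAT IS YOUR *", "who are you"))             # False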
Template
A template specifies the response to a matched pattern. A template may be as simple as some literal text, or it may use variables such as <bot name="name"/> or <get name="user-age"/>, which substitute stored information (if known) into the sentence.
Template elements include basic text formatting, conditional response (if-then/else), and random
responses.
Templates may also redirect to other patterns, using an element called srai. This can be used to
implement synonymy, as in this example (where CDATA is used to avoid the need for XML escaping):
<category>
<pattern>WHAT IS YOUR NAME</pattern>
<template><![CDATA[My name is <bot name="name"/>.]]></template>
</category>
<category>
<pattern>WHAT ARE YOU CALLED</pattern>
<template>
<srai>what is your name</srai>
</template>
</category>
A hello world example in A.I.M.L.
<category>
<pattern>How are you today</pattern>
<template>
I'm fine, Thank you.
</template>
</category>
<category> tag encloses all nodes. There can be many <category> tags in an
A.I.M.L. file.
<pattern> tag indicates what the user is expected to type.
<template> is the corresponding response.
Extension of A.I.M.L for Expressions
A new custom tag <xp> is used to encode expression data into an A.I.M.L. file.
Example:
<category>
<pattern>How are you today</pattern>
<template>
I'm fine, Thank you. <xp>smile</xp>
</template>
</category>
The A.I.M.L. engine code was also modified to parse out the expression data and send it to the facial simulation program.
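A sketch of this parsing step in Python follows; the <xp> tag format matches the example above, while the regular expressions are our assumption about how the split could be done.

import re

def parse_response(template_text):
    # Separate the spoken text from the <xp> expression data.
    expressions = re.findall(r"<xp>\s*(\w+)\s*</xp>", template_text)
    text = re.sub(r"<xp>.*?</xp>", "", template_text).strip()
    return text, expressions

text, xp = parse_response("I'm fine, Thank you. <xp>smile</xp>")
print(text)  # I'm fine, Thank you.
print(xp)    # ['smile']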
Thus an extensive A.I.M.L. set with corresponding facial expressions is created as
the knowledge of the robot.
The A.I.M.L. based pattern matching and response is the robot’s intelligence.
The output is a facial expression corresponding to the text response, both of which are driven by the initial text input.
A.I.M.L.-related information can be found at the A.L.I.C.E. AI Foundation website.
Creators’ Thoughts
Our goal is to simulate life, or provide the illusion of life. To that end, we are doing just that by trying to simulate a human (or anthropomorphic) entertainment experience through an Avatar: SAI (Simulated Artificial Intelligence Bot).
Text-to-Speech Processing
Overview of a typical TTS system
A text-to-speech system (or "engine") is composed of two parts:[3] a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations),[4] which is then imposed on the output speech.
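The two front-end stages can be sketched in Python as follows; the tiny number expander and phoneme dictionary are toy assumptions standing in for real text-normalization rules and a pronunciation lexicon.

# Stage 1: text normalization - expand digits into written-out words.
NUMBERS = {"2": "two", "3": "three"}
# Stage 2: grapheme-to-phoneme conversion via dictionary lookup.
PHONEMES = {"i": "AY", "have": "HH AE V", "two": "T UW", "cats": "K AE T S"}

def normalize(raw):
    return [NUMBERS.get(tok, tok.lower()) for tok in raw.split()]

def to_phonemes(words):
    return [PHONEMES.get(w, w.upper()) for w in words]

words = normalize("I have 2 cats")
print(words)               # ['i', 'have', 'two', 'cats']
print(to_phonemes(words))  # ['AY', 'HH AE V', 'T UW', 'K AE T S']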