Download PhoNET-A Voice Based Web Technology - freesols

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
PhoNET-A Voice Based Web Technology
Abstract
Voice based web access is a rapidly developing
technology. PhoNET is a solution for these and many
other problems faced by the netizens. The basic idea is
that using an ordinary phone to browse the web and the
primary motivations are: to provide a widely available
means for creating new interactive voice applications;
addressing needs for mobility; and addressing issues
inaccessibility. Basis of the idea are the age old IVR
systems used to serve information for the dialers through
a pre programmed process. Phonet is a very long journey
from the IVRs; it involves the most complex technologies
of the century Like Speech Recognition (SR), Text to
speech (TTS) conversion and artificial intelligence (AI).
This enables a user to be connected to internet as long as
he has access to a phone. PhoNET uses the traditional
HTML content so the web site need not be rewritten or
redesigned. We present a detailed analysis in the most
possible simplest way of how the technologies like SR,
TTS and AI are integrated to develop a intelligent
Platform (phoNET) to achieve voice based web access
which involves Document processing and Document
Rendering. In Document Processing we describe two
approaches, telephone browsing and transcoding,
focusing mostly on the former since that work is more
mature. In Document Rendering we present the major
problem i.e., the relevance of cognitive thought to text
rendering along with its most suitable solution. In the end
we examine the challenges and further developments
involved in practical application of the proposed
technology-The phoNET.
1. Introduction
Today’s telecom business has seen recent growth, especially
in bandwidth infrastructure for long distance (LD) and data.
The industry is currently experiencing strong growth in the
wireless segment as mobile devices prove to be very popular
with both consumers and business. An evolving market
segment is “Internet anywhere,” and many companies are
trying approaches to present viable products for this market.
One approach is Internet access over wireless devices such as
cell phones with a screen. However, this method has inherent
limitations such as small screen size, lack of a keyboard, the
need for a special device (web-enabled phone), the need to
rewrite and maintain a special website, and severe bandwidth
constraints using wireless data transfer protocols.
Another approach that is becoming popular is voice-based
limited Internet access, which overcomes all of the limitations
of the wireless data devices but one; they still limit access to
the few sites that are re-engineered for voice. They typically
deliver content such as news, weather, horoscopes, and stock
quotes, etc. over the phone. These companies are called
“Voice Portals.” Voice portals were the first web
applications that tried to integrate websites with voice
which gave birth to the enterprise based PBX systems.
Other solutions, such as Personal Digital Assistants,
phones with display screens and other Internet appliances,
are available, but have limitations. Users must have
special hardware with intelligence built in, and often must
view the Web through small, difficult-to-read screens.
Such devices are often expensive, as well.
Our solution, which presents a third option, gives users all of
the benefits of the voice portals, yet has complete access to
the entire Internet without limitation. With our Voice Internet
technology phoNET, anyone can surf, search, send and
receive email, and conduct e-commerce transactions, etc.
using their voice from anywhere using any phone, with the
more freedom of movement than a standard Internet browser
which requires a PC and an Internet connection.
PhoNET technology is faster and cheaper than existing
alternatives. Today, only the largest of companies are making
their Web sites telephone-accessible because existing
technology requires a manual, costly and time-consuming rewrite of each page. With the voice internet technologyphoNET, existing Web pages are used, allowing users to
leverage their Web investment. The software dynamically
converts existing pages into audio format, significantly
lowering the up-front investment a business must make to
allow users to hear and interact with their Web site by phone.
2.Motivation
The primary method of access today continues to be the
computer, which has certain advantages as well as some
limitations. Computers offer a visual Internet experience that
is usually rich in content. Some basic computer skills and
knowledge are needed to access the Internet. But, computerbased access is proving insufficient for the professional on the
move. When in the car or away from the office or computer,
accessing the Web is difficult, if not impossible. And, an
increasing number of people prefer an interface that allows
them to hear and speak rather than see and click or type.
The computer-based Internet experience also does not meet
the needs of another segment of the population – the visually
impaired. Neither visual displays of information, nor
1
keyboard-based interactions naturally meet their needs, and
this segment is often unable to benefit from all that the
Information Age has to offer.
Some existing Internet users have also identified problems
with the visual Internet experience. Pages are increasingly full
of graphics, advertisement banners, etc., which move, flash,
and blink as they vie for attention. Some find this
“information overload” annoying, and lament the delays it
creates by severely taxing the available bandwidth.
The "Digital Divide"
While computers and their use are on the rise, they’re not
ubiquitous yet. A large segment of the population still doesn’t
have access in the United and other parts of the world. Thus,
Internet is limited to only a small fraction of the world
population; the majority is left out from the Internet. This gap
between those who can effectively use new information from
the Internet, and those who cannot is known as The digital
divide. Bridging this digital divide is the key to ensure that
most people in the world has the capability to access the
Internet. Making computers ubiquitous is not a very attractive
and feasible solution, at least in the near future, because of
various barriers. One key barrier is cost, although the price of
a computer has come down significantly in recent years.
Insufficient visual Internet Infrastructure is another barrier in
many countries and it will take a while to build such
infrastructure. Other consumers have a basic distaste for
complex technology, which prevents them from accessing
Web-based information via a computer. A more natural, less
cumbersome way to interface with the net would provide
them an opportunity to experience the Internet as well, thus
bridging the Digital Divide.
As the need for alternative access to the Internet becomes
more evident, several technology companies are pursuing
solutions. Their products include “smart” cell phones with
visual displays, intelligence built into the handset, and voiceactivated Web sites. These products address different aspects
of the problems outlined above.
While these alternative technologies are in the pipeline, few
are ready for market. But the very existence of a race to
market by many companies is evidence of a large potential
market.
.
3. The Challenge
To integrate existing technologies, or develop new
technologies, to make simple, affordable, alternative
Internet access possible.
As the need for an alternative access method to the Internet
has become evident, progress continues to be made by
technologists to provide such solutions. One key area of focus
has been voice-based technology, which would allow a very
natural interface for most people, and address the limitations
described earlier. A voice interface provides an alternative to
the visually based interface. A device such as the telephone
provides a readily accessible alternative to the computer.
Several technologies existing today are keys to the solution,
but the problem lies in successfully integrating these
technologies into useful applications of greater value than
their individual components. These technologies include:

Voice Extensible Markup Language (VXML)
which is an extension of HTML, the normal
language in which Web pages are created. This
technology adds voice capability to a Web page.
The page can then be displayed, as usual, over a
computer, but it can also be presented in audio
format with voice navigation.

Speech Language Application Tags (SALT)
specification
for
supporting
multimodal
communication from PCs, cell phones, PDAs and
other handheld devices. For example, input can be
voice (such as asking for directions) and output can
be data (a map pops up). SALT is a lightweight set
of extensions to existing markup languages,
allowing developers to embed speech enhancements
in existing HTML, xHTML and XML pages. As
with VoiceXML, applications will be portable thanks to the separation from the underlying
hardware and platform.
The "Language Divide"
Today more than eighty percent of website contents are
written in English language. People in China, Japan and other
countries in Asia, countries in Europe and Latin America
speak a language other than English as their native language.
These people are left out of a significant portion of the World
Wide Web. For example, Japanese or a Chinese can not
understand the content of CNN or New York Times. This gap
of not having access to major part of the Internet because of
language barrier is called "The Language Divide".
Bridging this Language divide is the key to ensure that
most people in the world has the capability to access the
major part of the Internet. The demand for machine
translation is growing phenomenally as more people each day
embracing the Internet. A service that translates the accessed
information into the desired language would clearly add value
to these users.
2

Speech recognition (SR), which allows computers,
through the use of software, to recognize spoken
language, eliminating the need for the computer
keyboard as an interface. The vocabulary
recognized in products using this technology tends
to be limited.

Text-to-speech (TTS), which allows text to be
converted automatically to synthetic speech. It
allows communications between computers and
humans through a “natural” interface, such as
speech.

Telephone integration is the key to interface with
computers from a remote location. A protocol is
needed to communicate with the computer from a
telephone using voice. This also includes
multimedia integration (e.g. with .wav files).

Intelligent software agents are needed to automate
communication between a telephone and a
computer, a computer and a Web site, to interpret
the contents of a web page, to extract key
information that makes sense in audio, to efficiently
navigate through web pages, and to manage access
to the Internet.

Language processing allows translation to other
languages, understanding and interpreting of
structured sentences. Natural language processing
allows us to understand and interpret human
languages.
The first technology listed – VoiceXML – is a very elegant
solution that leverages technology specifically developed for
audio Internet access. However, it requires that Web sites be
customized, or VXML-enabled. This means rewriting the web
pages in VXML. According to analysts, today there are more
than a billion web pages. Assuming that it takes one hour and
costs about $100 to rewrite one page, the cost to voice-enable
all sites would be about $100B. Clearly, it will take several
years before the majority of popular pages are VXMLenabled. Today, only a very small portion of the total Web
pages is voice-enabled using VXML.
The second technology listed – SALT – is another elegant
technology that allows developers to embed speech
enhancements in existing HTML, DHTML and XML pages.
However, like VoiceXML, it requires that websites be
rewritten or enhanced with SALT and hence it will also take
many years before majority of popular pages are SALTenabled.
The paper presents a solution that successfully integrates the
other technologies listed into a useful, audio-based approach
for accessing the Internet today. It is independent of the
timeline, interest and willingness of content providers to
update their pages to be VXML or SALT-enabled.
Another approach is to provide Internet access over wireless
devices such as palm pilot or a cell phone with a screen.
However, this method has inherent limitations such as the
small size of the screen and the need for a special phone. Also
there is need to rewrite the website in WML. Today’s wireless
Internet industry is facing many challenges due to limitation
of bandwidth and small screen. The cost of cell phone based
Internet access is very high and users do not want to pay high
service fee. Also our eyes and fingers are not changing but the
devices are getting smaller and smaller. Thus, existing visual
based access is going to be even more difficult in future.
4. The phoNET Solution
An audio Internet Technology that allows users to listen
to email, buy on-line or surf and hear any Web site, using
a simple and natural interface - an ordinary telephone.
No computer is needed.
Subscribers dial a toll-free number, and start accessing the
Internet using voice commands. Speech recognition
technology in the company’s system allows users to give
simple commands, such as "go to Yahoo" or "read my email"
to get to the Net-based information they want, when they want
it, whether they’re out on an appointment, stuck in traffic,
sitting in an airport, or cooking dinner. They’ll be able to
quickly locate information, such as late-breaking news, traffic
reports, directions, or anything else they’re interested in on the
World Wide Web. Our product phoNET has the capability to
automatically down load web contents, filter out graphics,
banners and images. It then renders extracted texts into
concise, meaningful and very suitable in audio format texts
before using TTS to convert into speech. phoNET also
converts the rendered texts into other languages in real time. It
can also be easily integrated with any back end application
such as CRM/SCM, ERP etc. Thus, phoNET completely
3
eliminates the need to rewrite any content in VXML, SALT
or WML. So we strongly believe
that our automation based approach
will be very successful. Using textto-speech
technology,
an
"intelligent agent" will read the
requested information out loud via a
computerized voice, and process
the user’s voice commands.
speaking person can ask to surf an English website in Chinese
5. Technology Overview.
The idea of
listening to the Internet may at
first sound a bit like watching the
radio. How does a visual medium
rich in icons, text, and images
translate itself into an audible
format that is meaningful and
pleasing to the ear? The answer
lies in an innovative
integration of three distinct
technologies that render visualcontent into short, precise,
easily navigable, and meaningful text that can be
converted to audio.The technologies and steps employed
to accomplish this feat are:
Document Processing
1. Speech recognition
2. Text-to-speech translation, and
Document Rendering
3. Artificial Intelligence
The phoNET platform acts as an “Intelligent Agent”
(IA) located between the user and the Internet (Figure 1). The
IA automates the process of rendering information from the
Internet to the user in a meaningful, precise, easily navigable
and pleasant to listen to audio format. Rendering is achieved
by using Page Highlights (a method to find and speak the key
contents on a page), finding right as well as only relevant
contents on a linked page, assembling right contents from a
linked page, and providing easy navigation. These key steps
are done using the information available in the visual web
page itself and proper algorithms that use information such as
text contents, color, font size, links, paragraph, and amount of
text. Artificial Intelligence techniques are used in this
automated rendering process. This is similar to how the
human brain renders from a visual page; selecting the
information of interest and then reading ii.
The IA includes a language
translation engine that dynamically translates web contents
from one language into another in real time. Thus, a Chinese
- the Intelligent Agent would access the English website,
extract the content of the website and translate it on the fly in
Chinese and read it back to the user in Chinese.
The platform incorporates the highest quality speech
recognition and text to speech engines from third party
suppliers.
ThePhoNET architecture is shown in Figure 2. The process
starts with a telephone call placed by a user. The user is
prompted for a logon pass phrase. The users pass phrase
4
establishes the first connection to the Web site associated with
that phrase and loads the first Web page. The HTML is
parsed, as described later, separating text from other media
types, isolating URL from HTML anchors and isolating the
associated anchor titles (including ALT fields) for grammar
generation. Grammar generation computes combinations of
the words in titles to produce a wide range of alternative ways
to say subsets of the title phrase. In this process simple
function words (i.e., "and," "or," "the," etc.) are not allowed
to occur in isolation where they would be meaningless.
Browser control commands are mixed in to control typical
browser operations like "go back "and "go home" (similar to
the typical browser button commands). Further automatic
expansion of the language models for link titles can include
thesaurus substitution of words, etc. soon to be implemented
.The current processing allows the user to say any keyword
phrase from the title with deletions allowed to activate the
associated link. Grammar processing continues with
compilation and optimization into a finite-state network where
redundancies have been eliminated. The word vocabulary
associated with the grammar is further processed by a TextTo-Speech (TTS) pronunciation module that generates
phonetic transcriptions for each word of the grammar. Since
the TTS engine uses pronunciation rules it is not limited to
dictionary words. The grammar and vocabulary are then
loaded into the speech recognizer. This process typically takes
about a second. At the same time, the Web document is
described to the user, as discussed later.
The
user may then speak a navigation or browser command phrase
to control browsing. Each user navigation command takes the
user to a new Web page. If the command is ambiguous the
dialog manager collects the possible interpretations into a
description list and asks the user to choose one. Recognizing
that a collection of Web pages is inherently a finite-state
network, it is easy to see that this mechanism provides the
basis for a finite-state dialog and Web application control
system.
6. Document Processing
Document analysis is performed in the
HTML parser, grammar generator, and Hyper Voice
processor modules. The typical HTML Web page is first
parsed into a list of elements based mostly on the HTML tags
structure. Some elements are aggregations (tables, for
instance) but the element list is not a full parse tree, which we
found was not needed and in some cases actually complicates
processing. Images, tables, forms and most text structure
elements like paragraphs are recognized and processed
according to their recognized type. Much of the effort in
building a robust HTML processor is dealing with malformed
HTML expressions such as unclosed tag scope, overlapping
tag scopes, etc. Unfortunately space does not allow for fully
addressing this issue here. Commercial browsers currently
handle these issues in differing ways. Briefly, the handling of
HTML errors by Phone Browser mostly follows the style of
Netscape Communicator. Images often have ALT attribute
tags that are used to derive the voice navigation commands
for these items. The location of each image is announced
along with any associated caption. This feature can be
disabled on a site-by-site basis when the user does not want to
hear about images. Tables are first classified according to
purpose, either layout or content. Most tables are actually
used for page layout which can be recognized by the variety
and types of data contained in the table cells. Data tables are
processed by a parser according to one of a set of table model
formats that Phone Browser recognizes. This provides
primarily a simple way of reading the table contents row by
row, which is often not very satisfying. Alternatively a
transcoder can be used to reconstruct the table in sentential
format. An example of this is a Phone Browser stock quote
service where the transcoder extracts data from the Website
table and builds a new Web page containing sentences
describing the company name, ticker symbol, last trade price,
change, percent change and volume rather than simply
reading the numbers. The user can also ask for related news
reports. Forms with pop-up menus and radio buttons are
handled by creating voice command grammars from the
choices described in the HTML. Each menu or button choice
is spoken to the user who can repeat that phrase or a phrase
subset to activate that choice. Open dialog forms (e.g. search
engines) present a larger challenge. Since there is nothing to
define a grammar, the implication is a full language model.
While large vocabulary dictation speech systems are
available, most require speaker training to achieve sufficiently
5
high accuracy for most applications. Phone Browser is
intended to be immediately usable without training so
dictation is not yet supported. This also implies that creating
arbitrary text for messaging is also not yet supported. One
additional type of form input is an extension to HTML. A
GSL (Grammar Specification Language) or JSGF (Java
Speech Grammar Format) specification can be inserted into
an HTML anchor using an attribute tag (currently LSPSGSL).
Using this method an application can specify an elaborate
input grammar allowing many possible sentences to address
the associated hyperlink and construct a GET type form
response where the QUERY_STRING element is constructed
by inserting the speech recognition text results. Grammar
specifications written this way may represent many thousands
of possible sentence inputs giving the end user great speaking
flexibility.
7. Document-Rendering
Rendering-Definition
Information technology uses this term ‘Rendering’ refers to
how information is presented according to the medium, for
example, graphically displayed on a screen, audibly read
using a recording device, or printed on a piece of paper.
In the context of voice/audio Internet, Web content rendering
entails the translation of information originally intended for
visual presentation into a format more suitable to audio.
Conceptually this is quite a straightforward process but
tactically, it poses some daunting challenges in executing this
translation. What are those challenges and why are they so
difficult to overcome? These questions are explored in the
next section of this paper.
The Rendering Problem
Computers possess certain superhuman attributes, which far
outstrip that of mortal man—most notable are their
computational capabilities. The common business spreadsheet
is a testament to this fact. Other seemingly more mundane
tasks, however, present quite a conundrum for even the most
sophisticated of processors. Designing a high-speed special
purpose computer capable of defeating a grandmaster at chess
took the computing industry over 50 years to perfect.
Employing strategic thinking is not a computer’s forte. That is
because in all the logic embodied in their digitized ones and
zeroes, there is no inherent cognitive thought. This one
powerful achievement of the brain along with our ability to
feel and express emotion separates the human mind from its
computerized equivalent—the centralized processing unit
(CPU).
The relevance of cognitive thought to text rendering may
not be immediately obvious but it is one of the major
challenges faced when attempting to present information
designed for one medium and rendering it to another. This is
because there are no hard and fast objective rules to follow.
Computers are very good at following instructions when they
can be reduced to very objective decision points. They are not
so good when value judgments are involved. A human being
can readily distinguish a cat from a dog, or a relevant news
link on a Web page from a link for an advertisement. For a
computer this simple exercise is significantly more
challenging than applying the Taylor expansion formula to a
set of polynomials—something a computer can do quite
handily.
Solving the problem
To solve the rendering problem, some intelligent techniques
must be applied. The relevant data must be selected,
navigated to its conclusion, and reassembled for presentation
by a different medium. All of this must be done for all web
pages, dynamically, in real-time and in an automated fashion.
We have used an Intelligent Agent (IA) that uses various
intelligence techniques including “artificial intelligence”.
Using Visual Clues
Understanding the process that our brains go through in
making qualitative choices is key to developing an artificially
intelligent solution. In the example of Web page navigation
we know that our brains do not attempt to read and interpret
an entire page of data rather they take their cues from the
visual clues implemented by the Web designer. These clues
include such things as placement of text, use of color, size of
font, and density of content.
From these clues a list of potential areas of interest can be
developed and presented as a list of candidates.
Upon selecting an item of interest it is
common to have to navigate to another Web page to read all
the data of interest (just like in the newspaper example). To
do so we click on a Web link. When following a page link the
problem of continuity of thought is encountered because
almost assuredly the newly linked page contains data in
addition to the thread of information we are attempting to
follow. In order to maintain continuity with the item from the
previous page a contextual correlation must be made. Once
again, this cognitive process poses a formidable challenge for
the computer and requires application of Intelligent Agent (or
artificial intelligence) principles to solve.
Simplifying for speech
The first step involves dynamically removing all the
programming constructs and coding tags that comprise the
instruction to a Web browser on how to visually render the
data. HTML, CHTML, XML, and other languages are
6
typically used for this purpose. Because the data is now being
translated or rendered to a different medium, these tags no
longer serve any purpose.
It is doubtful that every single data item on a page will be
read. Just like reading a newspaper, we read only items of
interest and generally skip advertisements completely. Thus,
we need to automatically render important information on a
page and then when a topic is selected, only the relevant
information from the linked page corresponding to the
selected topic needs to be presented. . Rendering is achieved
by using Page Highlights (using a method to find and speak
the key contents on a page), finding right as well as only
relevant contents on a linked page, assembling right contents
from a linked page, and providing easy navigation.
A column of text information can be
converted to audio that can be heard with ease. The rate of
hearing i.e. content delivery can be controlled to suit user’s
needs. The selection of a website, Page Highlight, speed of
hearing etc can be all done by Voice Commands. This results
Voice Internet i.e. basically talking and listening to the
Internet.
The same column of text can be displayed on a small screen
that can be viewed at ease as a small screen can easily display
a column; but not a whole page. The contents are then
automatically scrolled using various speeds and hence can
easily be viewed and absorbed at ease. This is what results a
MicroBrowser or “true” wireless Internet that does not need
any re-write of the website and presents contents at ease in a
meaningful way.
Finding and Assembling Relevant Information
To find relevant information, the Intelligent Agent (IA) uses
various deterministic and non-deterministic algorithms that
use contextual and non-contextual matches, semantic analysis,
and learning. This is again very similar to how we do use our
eyes and brain to find the relevant contents. To ensure realtime performance, algorithms are simplified as needed yet
producing very satisfactory results. Once relevant contents are
determined, they are assembled in appropriate order that
makes sense when listen to in audio or viewed on a small
screen.
A content rich page with a small number of links makes
rendering and navigation easy since there are only a few
choices, and one can quickly select a particular topic or
section. If the site is rich in content, links and
images/graphics, the problem is more difficult but good
solution still exists by carefully selecting a built-in feature
called “Page Highlights”. The most difficult case is when a
page is very rich in images/graphics and links. In such cases,
the main information is located several levels down from the
home page and so navigation becomes more difficult as one
has to go through multiple levels. Using multi-level Page
Highlights and customized Highlights, the content can still be
rendered well. But in this case, it is not as easy to navigate as
the other two cases. Usually most of the Internet contents fall
under the first and second categories.
Rendering to a new medium
The two key media for rendering into are Audio using any
phone and Visual using a cell phone screen or PDA. There is
a good synergy between these two modes from rendering
standpoint. Both need small amount of meaningful
information at a time that can be heard or viewed at ease with
easy navigation. This is achieved by using Page Highlights
mentioned above and finding relevant contents, column at a
time like we do when we read a news paper or website.
8. Applications
This technology can find applications for Service
Providers, Businesses and governments in the following
areas:
For Service Providers 



Surf and browse the Web ·
Email (send, receive, compose, copy, forward,
reply, delete and more) ·
Search the Web ·
Voice Portal Features such as News, Weather,
Stock Quotes, Horoscopes and more.
For Businesses








Airline reservations and tracking
Package tracker
Reservations
eCommerce
Customer service
Alert service
Order Confirmation
CRM Applications
For Governments



All key benefits for Businesses Plus
Easy Accessibility to all Government contents
More
efficient
communication
between
Government and citizens, Government and
7

businesses,
and
between
Government
departments.
Meeting Government obligation to bridge the
Digital and Language Divides (with automatic
translation of Internet content from English to
any other language and vice-versa.), and
providing Internet to elderly, visually impaired
and blind people in a very simple and cost
effective way.
6. http://www.lhs.com/
7. http://trqce.wisc.edu/world/web
8. http://www.dcp.ucla.edu/
9. Future Work
Implementation and results of this new proposed technology
depends on the developments in the field of speech
recognition and artificial intelligence. The major challenge of
this technology is its complexity which comes with a high
price tag. But it is not far more than the complexity and cost
involved in voice enabling a web site using the present
technologies. We believe that Voice/speech based interface
options will become an important part of the overall
solution to access the Internet content. And an automated
approach to Voice-Enable or create Voice Portals would
be most practical and more common way than rewriting
web contents in different languages and maintaining
multiple version of the web sites.
10. Conclusion
we considered the possibility of accessing web through an
ordinary phone . We presented a new technology which
provides a true audio Internet experience. Using an
ordinary telephone and simple voice commands, users will
be able to surf and hear the entire Internet for the
information they desire. A computer is not needed. Any
web page will be accessible , but not limited to sites
written with Wireless Application Protocol, and pages that
are specially written in Voice Extensible Mark-up
Language (VXML). We presented a detailed analysis of
how the technologies like SR, TTS and AI are integrated
to develop a intelligent Platform (phoNET) to achieve
voice based web access. We presented the major problems
involved in Document processing and document rendering
along with solution.
References
1.
2.
4.
5.
http://www.w3.org/Voice/
http://www.voicexml.org/
Internet speech Inc.
Avaya Labs
8