May 2015
ISSN 1932-8214
Editor, William Meisel
Google adds App Indexing
Lets Google index apps like websites and search within them
In a talk at the Mobile Voice Conference in April, “Grow with the Future of Search and Apps,” Sunil Vemuri, Product Manager, Google, noted that an increasing number of searches are being done within apps on mobile devices rather than through web browsers, emphasizing that “mobile is not desktop”; Google is supporting this trend with an “app indexing API” that allows independent apps with search functionality to be directly searched from within Google searches (including voice search) without first launching the independent app.

Vemuri explained to Speech Strategy News: “App Indexing lets Google index apps just like websites. Deep links to your Android app appear in Google Search results, letting users get to your native mobile experience quickly, landing exactly on the right content within your app. And, if users don't have your app installed, Google will surface an install button, directing them to the Play Store, where they can install your app.”

Further information on the app indexing API is at https://developers.google.com/app-indexing.
Continued on page 20
David Nahamoo of IBM discusses “Cognitive Computing”
Automating and augmenting human intelligence to deal with “big data”
David Nahamoo, IBM Fellow, Conversational Systems, Watson Group, IBM, gave a talk on “Cognitive Computing: Automating and Augmenting Human Intelligence” at the Mobile Voice Conference in April, organized by AVIOS and Bill Meisel. Nahamoo claimed, despite the growth of data on the Web, that we are at an inflection point of the growth of “big data,” with data from sensors/devices and social media leading the charge. He noted that the exponential growth of Moore’s Law, which has changed our lives significantly, has physical limits, while the growth of data doesn’t. He said that “data is the
Continued on page 21
SmartAction launches “Intelligent Voice Automation” for customer service
Conducts a context-sensitive dialog rather than using a pre-structured menu
Last month’s Editor’s Notes suggested that companies should design a “Customer Involvement System” (CIS), not an Interactive Voice Response (IVR) system, given the lack of flexibility and long series of menus associated with classical IVR systems. SmartAction in April suggested something along the same lines, an “Intelligent Voice Automation” (IVA) system that “provides the highest level of completion with artificial intelligence call automation.”

In a statement, the company said that a traditional hosted IVR system generally lacks an underlying intelligence that directs the conversation and cannot link deeply with a company’s database. This limits the breadth of applications that can be implemented without annoying customers with cumbersome menus or becoming a programming challenge.
Continued on page 21
Recent blogs at TheSoftwareSociety.com (on the human-computer connection):
The role of the Top 1% in reducing income inequality
The US stock market: Good as gold?
Speech Strategy News
May 2015
2
Table of Contents

Google adds App Indexing .... 1
  Lets Google index apps like websites and search within them
David Nahamoo of IBM discusses “Cognitive Computing” .... 1
  Automating and augmenting human intelligence to deal with “big data”
SmartAction launches “Intelligent Voice Automation” for customer service .... 1
  Conducts a context-sensitive dialog rather than using a pre-structured menu
Editor’s Notes .... 4
  Machine intelligence requires human intelligence to be effective
  Bill Meisel, Publisher & Editor
Mobile Voice Conference shows rapid progress in “the intelligent connection” .... 5
  Always available natural language connection to computing resources and information
Amazon Web Services announces Amazon Machine Learning .... 6
  Make predictions based on a database with no machine learning experience
NICE Systems announces analytics to improve the IVR experience .... 7
  Speech analytics applied to optional customer feedback after a call
Auraya Systems releases “universal speaker recognition” system .... 8
  Single license for all functionality
CallFinder speech analytics to be available to customers of EPIC Connections .... 8
  Targeted at small- and medium-sized businesses
Indian bank to use Nuance voice authentication .... 9
  Authentication doesn’t require specific passwords
GTL addresses inmate identification on calls with biometric voiceprints .... 9
  Detects in real time changes in speaker during a call
NewsHedge launches financial news service .... 10
  Runs as an app in a desktop browser using text-to-speech for audible alerts
Translate Your World launches new version .... 10
  Translates voice or text into 34 languages in real time
  Applicable to teleconferences, education, and consultations with customers in stores
x.ai creates a virtual assistant for scheduling meetings .... 11
  Assistant analyzes email communications using NLP
Ericsson launches closed captioning service in US .... 12
  Live subtitling for broadcasters and operators
Sensory CEO discusses how “deep learning” relates to speech recognition .... 12
Fujitsu ties written material to a spoken presentation .... 13
YouMail provides an answer for spam calls on mobile phones .... 13
  App identifies spam calls and gives them a “number disconnected” message
Amazon introduces shopping app for Apple Watch .... 13
  Voice search and 1-click purchasing
M*Modal launches Clinical Documentation Improvement platform .... 14
  Physician-friendly CDI software and services deliver improvements in medical documentation
  Integrates Nuance healthcare speech recognition and NLP
Winscribe releases Quick Speech Recognition for healthcare professionals .... 15
  Immediate speech recognition shows results to person dictating
Nuance has new clinical documentation tools for mobile devices and wearables .... 16
  Joins Samsung at HIMSS to preview new dictation capabilities on Samsung Gear S Watch
MedMaster Mobility interprets physician dictation into mobile devices .... 17
Roku streaming TV player adds voice search for content .... 17
  250,000 movies and TV episodes available for streaming
NIST machine learning challenge for language recognition .... 17
  Based on the i-vector paradigm
Fujitsu introduces a communications tool for the hearing impaired .... 18
  Speech converted to text in real time in meetings or classrooms
New NissanConnect Services program set to launch on 2016 Nissan Maxima .... 18
  8.0-inch color display with multi-touch control and speech recognition
Brainasoft offers personal assistant for controlling a Windows PC .... 19
  Speak or type text to do tasks such as play music, open programs, or dictate text
IBM tests Numenta’s “brain algorithms” .... 20
  Machine intelligence based on “principles of the neocortex”
VERBATIM-VR improves speech recognition by letting users report errors .... 20
  Tunes speech-to-text for individual companies

News briefs .... 22
  Speech recognition, image recognition, and machine learning top Google CEO’s list of more important projects .... 22
  Gates notes Microsoft’s 40th anniversary .... 22
  Orion adds IVR capabilities to its public sector workforce management software .... 22
  Mphasis to use Artificial Solutions natural language technology in customer support .... 23
  Convergys Analytics and Nexidia announce partnership .... 23
  Cable operator selects Fonolo call-backs to improve the customer experience .... 23
  Nuance voice biometrics chosen by SK Telecom .... 23
  I PRINT N MAIL analyzes responses from direct mail with speech-recognition tools .... 23
  Nuance, MEDITECH, and IMO collaborate to automate patient problem lists and support regulatory reporting .... 24
  Lyft expands its offerings to include Nuance’s Dragon Medical Practice Edition 2 for otolaryngologists .... 24
  Acusis service enters patient encounter summaries into an Electronic Health Record .... 24
  Accusonus speech enhancement available for Cadence DSPs .... 25
  OKI microphone technology picks up sound in specific areas, using two microphone arrays .... 25
  VXi Bluetooth headset includes noise-cancelling and voice prompts .... 25
  Intel shows prototype smartphone with reduced-size RealSense technology .... 26
  CPqD biometric authentication and Brazilian Portuguese speech recognition available on a new IBM chip .... 26
  New Tensilica Fusion DSP from Cadence Design Systems features low energy use .... 26
  Microsoft’s browser update said to include Cortana .... 26
  Microsoft’s Skype Translator test preview adds new languages and other options .... 26
  Getty Images and Microsoft partner to add images to products like Bing and Cortana .... 27
  Cortana to recommend movies .... 27
  Amazon Echo can now be used to control WeMo and Hue home devices .... 27
  Amazon Echo adds podcasts .... 27
  Siri’s synthetic voice gets some improvements .... 27
  Dictionary.com app supports Apple Watch with speech recognition to display definitions .... 27
  SoundHound + LiveLyrics offers new Apple Watch app .... 28
  Apple moves to Siri back-end built on open-source Apache Mesos platform .... 28
  IBM teams with Apple and others on AI health program using Watson .... 28
  Digital Alert Systems adds enhanced multilingual alerting for Emergency Alert Systems .... 28
  Peterbilt introducing next generation SmartNav infotainment system .... 29
  Infobip adds inbound and outbound voice communications to its mobile services cloud .... 29
  New Ford Galaxy includes SYNC 2 with voice control .... 29
  CogniToys toy dinosaur can answer questions and more .... 29
  ChatGrape launches search engine for specific apps and documents .... 29
  A5 Technologies uses speech recognition to teach English to Japanese speakers .... 30
  Geppetto Avatars developing AI-based platform with avatars .... 30
  Tencent develops smartphone operating system .... 30
  Interactive Intelligence launches cloud services in Australia and New Zealand .... 30
  NSF grant supports Alelo research in teaching language and cross-cultural communication with avatars and robots .... 30

Statistics and Surveys .... 31
  Mobile advertising revenue will top $60 billion globally in 2019 .... 31
  Speech Analytics market reviewed .... 31
  44% of US adults live in mobile-phone-only households .... 31
  Voice search use rising .... 31
  Google will take 55% of search ad dollars globally in 2015 .... 31
  Mobile ad spend to top $100 billion worldwide in 2016, 51% of digital ad market .... 32
  Facebook accounts for three-quarters of global social network ad spend .... 32
  Robotics sales flourish .... 32
  US ad spending in 2015 .... 32
  “Augmented reality” predicted to be four times bigger than “virtual reality” by 2020 .... 32
  Web self-service surpasses phone in customer service channel preference .... 32
  Microphone market to reach $1.81 million by 2020 .... 33
  60% of consumers self-install smart home devices, but majority would prefer professional assistance .... 33
  Artificial Intelligence for enterprise applications to reach $11.1 billion in market value by 2024 .... 33
  “Cognitive computing” market projected to grow at 38% CAGR to 2019 .... 33

Financial Notes .... 34
  Blinkx acquires All Media Network .... 34
  Adacel acquires CSC’s NexSim ATC simulator business .... 34
  PeerTV acquires an interest in Speech Modules that gives it some exclusive rights outside Israel .... 34
  SensorSuite raises capital for its wireless monitoring and energy saving solutions for large buildings .... 35

People .... 35
  Interactions adds to management team, including former AT&T research personnel .... 35
  David Stone joins Inference as VP Sales, APAC .... 35
  Fonolo adds John Gengarella to its Advisory Board .... 36
  Attensity appoints Cary Fulbright as Chief Strategy Officer .... 36

For Further Information on Companies Mentioned in this Issue .... 36

Blog (with a chance to comment!) .... 43
  The Software Society (www.thesoftwaresociety.com)
Editor’s Notes
Machine intelligence requires human intelligence to be effective
Bill Meisel, Publisher & Editor
“Machine learning” is being delivered as a Web service, with Amazon being the latest entry (p. 6). Such services seem to imply that a company can just ship data to the services and get predictive algorithms with no expertise and minimal involvement. While there may be some applications where this is the case, it is likely to be the exception.

The key issue is how the data is described, the variables used as inputs to the predictive algorithm. Often, just using raw data is ineffective, and human understanding of the data is required to summarize it in terms of the “features” most relevant to making an accurate prediction. The objective is to describe the data with fewer variables, but variables that sufficiently describe the input conditions. To give an example familiar to most readers, speech recognition algorithms don’t use raw digitized speech data, but reduce it to a description of the frequency spectrum (cepstral coefficients) calculated for each 10- or 20-millisecond slice of speech. We understand that it is the frequency spectrum and its change over time that distinguishes phonemes, the elements of speech.

Why not just use the raw data? Even a large amount of data can be sparse from a statistical point of view if the dimensionality is high, if the data is described in terms of too many input variables. The reason is related to the problem of exponential growth in the number of data cases labeled with an associated outcome necessary to maintain a given density of data points (and thus statistical validity) as the number of variables grows.

Consider, for example, that we have a million labeled examples. If they are described by only one variable, then there are 500,000 samples in each half of the variable’s range (to use a resolution much less than is usually the case). In this case, we have plenty of examples to calculate the difference in outcome when in the first half of the variable’s range versus the second half. If we have two variables and divide each in half, there are four such ½ by ½ “boxes,” and we have only 250,000 samples in each box (assuming they are evenly distributed). Carry this forward to 20 variables, and there are 2^20 (1,048,576) boxes, and less than one data point (one labeled example) available for each box, hardly enough to give us confidence in the statistical significance of the result (and, of course, we’d like more resolution in each variable than half its range). This illustrates what has been called the “curse of dimensionality”; it has been recognized as a real problem in empirical analysis since Richard Bellman, the inventor of “dynamic programming,” coined the term in the 1960s.

Where human intelligence comes in is defining variables that summarize key aspects of the input data. In some cases, statistical methods can help. For example, “principal components” analysis can determine which of the original variables are linearly correlated and can provide a description of the data with a lowered dimensionality with some loss of resolution. The successful “i-vector” approach (p. 17) uses this idea. And some methods of machine learning use a layered approach (deep neural networks, for example) that could be interpreted as attempting to create such summarizing variables in early layers. But we shouldn’t have blind faith in such interpretations; I suspect neural networks would have a hard time simulating the calculation of cepstral coefficients, for example. At the Mobile Voice Conference, one speaker said that, in one deep neural network trained on speech, the early layers seemed to be simulating a Fast Fourier Transform; my reaction would be, why not just begin with an FFT rather than an imperfect simulation?

The example I used of a 20-dimensional space being very big is admittedly a bit misleading. I assumed an even distribution of data throughout the space, which is not usually the case. Data tends to “cluster” in certain regions and be sparse or even absent in others. Knowing this, one can use “cluster analysis” (“unsupervised learning”) to find a number of different areas
where the data is similar. These clusters
correspond to similar cases, and may be classes
or subclasses of what we label. The advantage is
that unsupervised learning, by definition,
doesn’t require data where the outcome we are
seeking is known, so much more data may be
available to find clusters than to find outcomes
directly. In fact, a number of talks at the Mobile
Voice Conference in April (following article)
described training neural nets without labeling
outcomes—clustering—and then using those on
a smaller amount of labeled data to identify
outcomes, with successful results. Some might
argue that this approach is not an application of
human intelligence, but just another statistical
method. Perhaps the border between machine
and human intelligence is not a high wall.
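The cluster-then-label idea described above can be sketched in a few lines of Python. This is only an illustration: the one-dimensional data, seeds, and class names are invented, and a real system would cluster high-dimensional features rather than scalars.

```python
import random

# Cluster plentiful unlabeled data first (k-means in one dimension),
# then use just two labeled examples to name the clusters.
def kmeans_1d(points, k, iters=20, seed=0):
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def nearest(centers, p):
    return min(range(len(centers)), key=lambda i: abs(p - centers[i]))

# Lots of unlabeled data in two natural clumps (around 0 and around 10)...
rng = random.Random(1)
unlabeled = [rng.gauss(0, 0.5) for _ in range(50)] + \
            [rng.gauss(10, 0.5) for _ in range(50)]
centers = kmeans_1d(unlabeled, k=2)

# ...and only two labeled examples, enough to name each cluster.
names = {nearest(centers, 0.2): "class A", nearest(centers, 9.8): "class B"}
print(names[nearest(centers, 10.3)])  # classified via the clump near 10
```

The unlabeled points do all the work of finding the structure; the two labeled points merely attach names to it, which is the economy the conference talks described.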
As in the case of speech recognition, a deep
understanding of the data and its meaning may
be required to come up with effective
summarizing features or effective methods of
using limited labeled data. Human intelligence is
a more important part of the process than terms
like “machine intelligence” or “artificial
intelligence” suggest.
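The box-counting argument in this note is easy to verify numerically. The short Python sketch below (illustrative sample size, uniformly distributed random data) counts how many of the 2^d half-range “boxes” a fixed number of samples actually reaches as the number of variables d grows:

```python
import random

def occupied_boxes(n_samples, n_dims, seed=0):
    """Split each variable's range in half and count distinct boxes hit."""
    rng = random.Random(seed)
    boxes = set()
    for _ in range(n_samples):
        # A box is identified by which half of each variable the sample falls in.
        boxes.add(tuple(rng.random() < 0.5 for _ in range(n_dims)))
    return len(boxes)

for d in (1, 2, 10, 20):
    total = 2 ** d
    hit = occupied_boxes(100_000, d)
    print(f"{d:2d} variables: {hit:,} of {total:,} boxes occupied "
          f"(~{100_000 / total:.1f} samples per box)")
```

At 20 variables the 100,000 samples can cover at most a tenth of the 1,048,576 boxes, so most boxes contain no data at all, even before asking for finer resolution than half of each variable’s range.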
Mobile Voice Conference shows rapid progress in “the intelligent connection”
Always available natural language connection to computing resources and information
The fifth annual Mobile Voice Conference was
held April 20-21 in San Jose, California,
emphasizing the theme summarized in the
graphic from the conference web site (above).
The conference was created by the Applied
Voice Input Output Society (AVIOS) with Bill
Meisel, the writer of this newsletter and the
Executive Director of AVIOS, creating the
program.
The key technology that transforms speech
recognition into speech understanding is Natural
Language Processing (NLP). Many of the papers
reflected the growing maturity of this
technology. The technology can be applied to
text as well as voice, and presentations such as
that by David Nahamoo of IBM took this
broader view, speaking of “cognitive
computing” (p. 1).
A major theme was personal/virtual assistants:
conversational agents, virtual assistants that
ranged from general apps that go beyond web
search and try to deal with anything you request
to specialized assistants for delivering customer
service or employee efficiency. Rob Chambers
of Microsoft gave a keynote address,
“Relationships with Personal Assistants, from
the Assistants’ Point of View,” discussing how
Cortana and similar personal assistants must be
designed with an understanding of how the user
will behave. Sunil Vemuri of Google indicated a
major expansion of the company’s search
functionality to searching within apps (p. 1).
In a talk, Bill Meisel asserted that a general
assistant like Google’s voice search, Apple’s
Siri, or Microsoft’s Cortana will evolve into a
universal interface that works on anything with a
microphone and internet connection, unifying
apps and devices with a personal assistant that
adapts to an individual’s specific needs. He
suggested that, to the degree this interface
replaces web search, every company will be
expected to have a company app, just as today
they need a web site. That company app will
require a similar natural language interface for
full compatibility.
A number of talks at the conference described
such specialized virtual assistants. For example,
Raj Tumuluri of Openstream, a conference
sponsor, discussed “An in-store virtual assistant
for retail workers and the platform used to build
it.”
Other talks emphasized multimodal interfaces,
combining voice and the graphical user interface
of the device. Jeff Rogers of Sensory Inc.,
another conference sponsor, spoke on
“Combining Voice and Vision for an Improved
Sensory Experience.” Todd Mozer of Sensory
revealed in a panel discussion that Sensory’s
embedded speech recognition solutions will
soon go beyond “wake-up” words and voice
control to support more flexible interactions on
the device.
Speech-to-text technology has matured,
although new approaches such as deep neural
networks—discussed in a number of talks at the
conference—provide the prospect of further
advances. Natural language processing is less
mature, and several speakers and panelists at the
conference discussed what we can do now and
how we can do better. For example, Phil Gray of
Interactions Corporation, a conference
sponsor, spoke on “Challenges associated with
creating natural language interfaces.”
Voice is a two-way interaction, and text-to-speech technology was also a feature of many
talks. One development is increased flexibility
in customizing voices, as discussed by Dan
Bagley of Cepstral, a conference sponsor, in his
presentation entitled, “Enhanced flexibility in
text-to-speech: Customization and personality.”
Talking to “things” also appeared in a number
of talks. For example, Yoryos Yeracaris of
Interactions Corporation, spoke about “The
Interface of Things: delivering real-world
natural language understanding solutions.”
Bill Scholz, AVIOS president, summarized,
“Presentations at this year’s conference moved
well beyond a focus on speech recognition and
synthesis, into the challenging world of
conversation and understanding, exploiting
natural language and dialog management
technologies. Applications have moved beyond
merely recognizing words and sentences into
understanding the meaning and intent of the
speaker, even seeking further clarification
through conversational interchange.”
This summary only touches on the many
subjects covered at the event. The core theme is
that the trends emphasized by the conference are
proceeding quickly, driven by demand and
intense activity by many companies. Many of
the presentations at the conference will be
available in PDF form at the AVIOS web site in
May.
Amazon Web Services announces Amazon Machine Learning
Make predictions based on a database with no machine learning experience
Amazon Web Services (AWS) announced Amazon Machine Learning, a fully managed cloud service. The service lets a developer use historical data to build and deploy predictive models. The company claims that no machine learning experience is required.

AWS indicated that models can be used for applications such as detecting problematic transactions, preventing customer churn, and improving customer support. The technology is based on the same machine learning technology used by developers within Amazon to generate more than 50 billion predictions a week, the company said. Jeff Bilger, Senior Manager, Amazon Machine Learning, said that the technology powers the product recommendations customers receive on Amazon.com, is what makes Amazon Echo able to respond to your voice, and is what allows Amazon to unload an entire truck full of products and make them available for purchase in as little as 30 minutes. Kara Hurst, Director of Amazon Global Sustainability, said Amazon Machine Learning is used to analyze customer feedback on packaging and create predictions to identify products that are suited for the company’s “Frustration Free” and “eCommerce ready packaging” standards.
Amazon Machine Learning’s APIs and
wizards guide developers through the process of
creating and tuning machine learning models.
These models can be deployed and scaled to
support billions of “predictions” (a predicted
outcome given specific values of input variables
used by the model). Amazon Machine Learning
is integrated with Amazon Simple Storage
Service (Amazon S3), Amazon Redshift (data
warehouse), and Amazon Relational Database
Service (Amazon RDS), allowing customers to
work with the data they've already stored in the
AWS Cloud.
A developer creates a predictive model using
a database, the “training” step. The model
summarizes the statistical conclusions implicit
in the data. Once created, AWS also hosts the
model and lets you use it to make predictions
one at a time (in real time). Pricing is based on
the number of such transactions.
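The train-then-host-then-predict workflow described above can be mimicked in miniature in plain Python. This is a concept sketch only, using a nearest-centroid model over invented churn data; it is not Amazon Machine Learning’s actual API:

```python
# Miniature version of the workflow the article describes: "train" a model
# from labeled historical records, then serve one-at-a-time predictions.

def train(records):
    """Summarize labeled (features, label) pairs as per-class feature means."""
    sums, counts = {}, {}
    for features, label in records:
        counts[label] = counts.get(label, 0) + 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def predict(model, features):
    """Return the class whose mean feature vector is closest (nearest centroid)."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: dist(model[label]))

# Hypothetical churn data: (monthly_logins, support_tickets) -> outcome.
history = [((30, 0), "stays"), ((25, 1), "stays"),
           ((2, 5), "churns"), ((1, 4), "churns")]
model = train(history)
print(predict(model, (28, 1)))  # a frequent user -> prints "stays"
```

The `train` step corresponds to building the model from a database; each `predict` call corresponds to one of the real-time transactions that AWS prices individually.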
With Amazon Machine Learning, developers
can use the AWS Management Console or APIs
to quickly create models and generate
predictions from them with high throughput
without worrying about provisioning hardware,
distributing and scaling the computational load,
managing dependencies, or monitoring and
troubleshooting the infrastructure. There is no
setup cost, and developers pay as they go so they
can start small, getting into a beta test with a low
investment.
Because high-quality data is critical to
building accurate models, Amazon Machine
Learning allows developers to visualize the
statistical properties of the datasets that will be
used to train the model to find patterns in the
data. This saves time by allowing developers to
understand data distributions and identify
missing or invalid values prior to model training.
Amazon Machine Learning then automatically
transforms the training data and optimizes the
machine learning algorithms so that developers
don’t need a deep understanding of machine
learning algorithms or tuning parameters to
create the best possible model.
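The kind of pre-training data check described above, spotting missing or invalid values before fitting anything, can be sketched simply; the field names and rows here are invented for illustration:

```python
# Scan a dataset for missing or invalid values before model training.
rows = [
    {"logins": 30, "tickets": 0},
    {"logins": None, "tickets": 2},   # missing value
    {"logins": 12, "tickets": -1},    # invalid: negative count
]

def audit(rows):
    """Return (row index, field, problem) for every suspect value."""
    problems = []
    for i, row in enumerate(rows):
        for field, value in row.items():
            if value is None:
                problems.append((i, field, "missing"))
            elif value < 0:
                problems.append((i, field, "negative"))
    return problems

print(audit(rows))  # [(1, 'logins', 'missing'), (2, 'tickets', 'negative')]
```

Catching these rows before training is exactly the time-saver the article attributes to the service’s dataset visualizations.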
In an example Amazon provided, a single
Amazon developer using the Amazon Machine
Learning technology was able in 20 minutes to
solve a problem that had previously taken two
developers 45 days to solve. (None of these
developers had prior experience in machine
learning.) Both models achieved the same
accuracy of 92%.
A customer, Space Ape Games, a mobile and
tablet gaming startup, has used the service. Toby
Moore, CTO and co-founder, said that the
service has been used to predict the types of
content, such as live events and tournaments,
that customers enjoy the most and let the game
adapt to their play styles. “We've been very
impressed with Amazon Machine Learning so
far, and plan to deploy Amazon Machine
Learning across multiple departments in our
organization to help us build and deploy
predictive models for our current and future
games,” he added.
The move by Amazon follows IBM’s recent
launch of hosted Watson Analytics and
Microsoft’s hosted Azure Machine Learning.
Google’s machine learning offering, Prediction
API, was launched in 2012. Russian Internet
search engine company Yandex offers Yandex
Data Factory (YDF), based on the machine
learning that it has developed internally for its
own services, used for search, music
recommendations, and speech and image
recognition (SSN, January 2015, p. 10).
NICE Systems announces analytics to improve the IVR experience
Speech analytics applied to optional customer feedback after a call
NICE Systems announced the launch of IVR Journey Analytics, a solution designed to reduce customer effort and improve their experience with automated Interactive Voice Response (IVR) systems. The cloud-based IVR Journey Analytics solution is the third addition to NICE’s Customer Engagement Analytics platform, which helps organizations sequence and visualize the customer journey to understand why customers are contacting them, to predict their next move, and to personalize the customer engagement.
According to NICE’s 2013 Global Customer
Survey, 73% of consumers use IVR. But at least
half the time, they do not succeed in resolving
their issue—one-third of those callers simply
hang up, and the other two-thirds bypass the
system or try to contact a live agent.
The system gathers insights from the customer
journey prior to, during, and after the IVR
interaction. It can determine certain patterns of
behavior and then use this information to
optimize the IVR experience by whittling down
the menu options to provide only the options
relevant to the particular journey. It also
provides visual mapping so that any IVR service
bottlenecks can be easily pinpointed and
resolved.
Organizations can also use NICE’s feedback
solution for the IVR channel to solicit real-time
customer feedback immediately following either
an interaction that was contained in the IVR or
an interaction that was handled by a contact
center agent. Using speech analytics, they can
better understand the customer experience and
improve their systems to improve it. This could
include customer service recovery, employee
coaching, or process changes, depending on the
customer feedback received.
“A weak link anywhere in the customer
journey can shatter the entire experience, and the
IVR is typically the first step in a service call,”
said Miki Migdal, President of the NICE
Enterprise Product Group.
Auraya Systems releases “universal speaker recognition” system
Single license for all functionality
Auraya Systems’ ArmorVox Speaker Identity System is a voice biometrics authentication and verification software system, designed for use by voice systems integrators and developers. The company released ArmorVox 2015, saying it is the world’s first “Universal Speaker Recognition” system. It supports both text-dependent and text-independent speaker recognition, as well as gender detection. ArmorVox 2015 also has a modified front-end with better noise suppression and improved mobile phone and cross-channel performance.

Clive Summerfield, Auraya Founder and CEO, said, “By fusing technologies into a single software license, partners can implement active and passive voice biometric applications, fraud detection and tracking solutions, and gender detection all using a single software license.”

ArmorVox can be configured for cloud, customer premises equipment, or hosted authentication services. ArmorVox is available for Microsoft Windows and Linux operating systems.

In February, cloud-based voice automation firm Inference Solutions (p. 35) said it had integrated a voice biometrics solution developed by Auraya Systems into its software. The ArmorVox system has been integrated with the Inference Studio platform, enabling the latter’s users to confirm caller identity with voice biometrics rather than through extensive security questioning.
CallFinder speech analytics to be available to customers of EPIC Connections
Targeted at small- and medium-sized businesses
CallFinder, a provider of cloud-based call recording and speech analytics solutions (SSN, April 2015, p. 10), announced a strategic alliance with EPIC Connections, a global provider of contact center consulting and outsourcing services. The company also partnered with Aizan Technologies, a cloud-based voice solutions provider in Canada. Organizations using speech analytics can review the unstructured data contained in voice conversations with their customers. They can automatically categorize and analyze the calls to identify business patterns and trends.

EPIC Connections

EPIC clients can now use CallFinder to extract business intelligence contained in phone conversations with customers to improve the
Speech Strategy News
May 2015
overall customer experience. “CallFinder’s solution is a potential fit for our clients in the SMB space, and for contact centers that operate under 100 seats,” says Jim Grace, Director of Corporate Development at EPIC Connections.

Aizan Technologies

Aizan Technologies now provides Canadian companies with access to CallMiner speech analytics as a hosted service, while retaining data within Canada and without having to purchase and manage premise-based equipment. With CallMiner analytics integrated into Aizan’s cloud platform, customers now have access to speech analytics functionality alongside call routing, IVR, and recording capabilities. Aizan offers a full suite of CallMiner functionality, ranging from automatically created scorecards to highly customized professional services.

“We are very excited to partner with Aizan Technologies,” said Terry Leahy, CEO at CallMiner. “They have the ability to deliver a wide range of services to fine-tune and manage analytics to deliver maximum return on investment for their customers. Their level of service and support is unparalleled in the carrier space.”
Indian bank to use Nuance voice authentication
Authentication doesn’t require specific passwords
ICICI Bank in India is deploying voice-recognition technology for biometric authentication, using speaker authentication technology from Nuance Communications. Customers will be able to call and transfer funds to registered recipients or pay bills without having to enter card numbers or key in PIN codes, reports the Times of India.

Customers enroll in “just 10 seconds” when they call the bank. There are also plans to use voice data when customers use the service to identify if callers are agitated, in a hurry, or irritated, which could be used to avoid upselling or other activities on the call that might add to the irritation.

According to the article, the system will not require specific passwords for authentication. It will, however, require at least 35 seconds of voice before it can authenticate.
GTL addresses inmate identification on calls with biometric voiceprints
Detects changes in speaker during a call in real time
Global Tel*Link (GTL), a provider of correctional technology solutions, announced the release of Voice IQ, a new feature for its inmate telephone platforms that solves issues of inmate identification on calls. Voice IQ uses voice biometrics to track and verify the identity of inmates and prevent fraud. Over 1.1 million inmates utilize GTL’s inmate telephone services, and calls are controlled to avoid direction of criminal activities from within prison.

Voice IQ builds a voiceprint profile for each inmate and enrolls that print in its repository for comparison in future calls. During a call, Voice IQ continuously compares portions of the audio to the recorded voiceprint to verify the inmate’s identity. A specific icon is displayed with a time stamp in the monitoring system indicating when any speaker change activity was detected.

GTL also released VisMobile Add-On, an addition to its VisMobile video visitation application for Android smartphone and tablet users. This release, an addition to the VisMobile app that allows users to register for and schedule visits, gives users whose loved ones are incarcerated in facilities that allow Internet video visitation the opportunity to conduct video visits from their Android devices. Internet visitation with smartphones and tablets is said to incorporate all of the same safety and security features as GTL’s other Internet and on-site video visitation technologies.
NewsHedge launches financial news service
Runs as an app in a desktop browser using text-to-speech for audible alerts
NewsHedge introduced an application that runs in a desktop Web browser called NewsHedge Squawk. It uses text-to-speech to alert users of financial news that might move a particular stock or the market in general. NewsHedge Squawk uses direct access exchange feeds to process 8,000+ assets in real time, tick-by-tick. The company charges a fee of $49 per month for the service.

NewsHedge features the audio alerts in Squawk, although there is also a visual representation (see image). Kevin Evenhouse, founder and CEO, said, “NewsHedge Squawk not only delivers market information that’s notable and relevant in real time, but it does so audibly—the method of receiving information that human beings react to fastest. We’re not giving traders one more thing to look at. We’re giving them something to listen to. We combine our proprietary smart-detection algorithms with text-to-speech technology to literally tell you what’s notable and market moving.”

Drew Dormann, co-founder and CTO, sees the HTML5 front end as a critical part of the product’s reliability and speed. “We’ve combined AJAX with compressed page updates across the entire experience to squeeze seconds from every breaking announcement,” he said. “Our back-end is entirely Modern C++ through the critical path, making everything blazing fast.”
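The “detect something notable, then speak it” pattern the company describes can be sketched minimally. The threshold, wording, and function names below are invented for illustration; a real deployment would hand the resulting string to a text-to-speech engine:

```python
# Toy sketch of the detect-then-speak pattern: flag a notable tick-to-tick
# move and build the sentence a TTS engine would read aloud.
def notable_move(prev_price, price, pct_threshold=2.0):
    """Return the percent move if it exceeds the threshold, else None."""
    pct = (price - prev_price) / prev_price * 100.0
    return pct if abs(pct) >= pct_threshold else None

def alert_text(symbol, prev_price, price):
    """Build the spoken alert for a notable move, or None for quiet ticks."""
    pct = notable_move(prev_price, price)
    if pct is None:
        return None
    direction = "up" if pct > 0 else "down"
    return f"{symbol} is {direction} {abs(pct):.1f} percent"

print(alert_text("ACME", 100.0, 103.5))   # string handed to text-to-speech
```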
Translate Your World launches new version
Translates voice or text into 34 languages in real time
Translate Your World, Inc. (TYWI) launched version 2.0 of its translation software, with no less an objective than to “change the way the world communicates,” according to owner Sue Reager. The software is said to be capable of translating, transcribing into text, and speaking the results in 34 languages in real time. Reager said, “You can use it to have people who speak different languages all have a business meeting together. The barriers of language and education have been lifted.” Chester Anderson, the company’s vice president of business development, added, “I firmly believe that technology can make both our lives and collectively the world a better place to live and work in.” The company said that the software already has preorders lined up from large companies.

The software makes use of capabilities outside its application. TYWI has built-in auto-translation and can connect to other translation software. One of the advantages of TYWI is that the TYWI software provides personal dictionaries so that the user has increased control of the results of automated translation.

Speech recognition is accomplished with the capability built into most devices, including the speech recognition built into Microsoft Windows or use of Nuance’s Dragon NaturallySpeaking. TYWI software works harmoniously with speech recognition by translating the text results, keeping track of an audience and their preferences. The service can then deliver to each audience member’s device what each wants to hear as translated voice or read as subtitles. TYWI translated voices are
delivered by the online text-to-speech company
ReadSpeaker.
The software supports “parrots” for situations
where several people are speaking, with
translation required. A “parrot” is a person who
listens and repeats the speaker’s words in the
original language into a headset or
microphone. Parrots speak clearly and have
trained TYWI software for their voices. The
audience does not hear the parrot; only the TYWI software “hears” it, and TYWI translates what the parrot says. The parrot can be an
individual on a company’s staff or a TYWI pro.
TYWI enables simultaneous interpretation on
the web. A company can use its choice of
interpreter or a pro from TYWI. The interpreter
can be located almost anywhere in the
world. Your audience chooses their output: a)
to listen to the interpreter's voice or b) to read
subtitles of what the interpreter says, created by
the interpreter’s voice speaking into the
software. The interpreter’s voice can be
automatically translated as subtitles in other
languages.
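The personal-dictionary idea described above can be sketched as an override layer on top of a generic translation engine. The tiny translation table and term choices below are invented for illustration and are not TYWI's software:

```python
# Sketch of a personal dictionary overriding generic machine translation:
# the user pins preferred renderings for specific terms. The GENERIC_MT
# table is a stand-in for a real translation engine.
GENERIC_MT = {"meeting": "reunión", "schedule": "horario", "deck": "cubierta"}

def translate(words, personal_dict=None):
    """Translate word by word; the user's own dictionary wins."""
    personal_dict = personal_dict or {}
    out = []
    for w in words:
        out.append(personal_dict.get(w, GENERIC_MT.get(w, w)))
    return out

# "deck" here means slides, so the user pins a better rendering
print(translate(["meeting", "deck"], {"deck": "presentación"}))
```

The point of the design is that the generic engine's choice ("cubierta", a ship's deck) would be wrong for this user's domain, and the personal entry corrects it everywhere.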
Fujitsu ties written material to a spoken presentation
Applicable to teleconferences, education, and consultations with customers in stores
In an Internet audio/video presentation or meeting, the person speaking may be discussing material available to those participating outside the presentation. Fujitsu Laboratories announced that it has developed technology that, using speech recognition technology on the speaker’s voice, detects in real time the applicable area in presentation or remote-conference materials. The company expects the technology to be used where information is explained, such as teleconferences, electronic education, and consultations with customers in stores. The technology supports business communications that are often based on supporting materials, such as pamphlets used for product explanations, meetings that follow an agenda, or talks that use slides that are shared with participants.

Displaying a section of meeting materials, product pamphlets, and other presentation materials while that section is being discussed by the presenter is effective in promoting understanding, Fujitsu notes. To be effective, it is necessary to identify at a glance the place being explained within the materials. Fujitsu has developed technology that compares spoken words against the content of the presentation materials. The technology uses characteristics of the presentation’s sequence based on statistical calculations to filter candidate sections of the presentation materials, in order to accurately identify the correct section in real time, based on only a few spoken words. When tested in a prototype system designed to automatically highlight the correct place in presentation materials, the technology was found to detect the correct section with 97% accuracy.

Fujitsu shared some of the technical challenges in developing the technology. They noted that a challenge in speech recognition is that many short words have similar pronunciation, which increases the likelihood of errors in recognition. Fujitsu addressed this problem by combining these short words with the words located in their immediate proximity and storing them in a speech-recognition dictionary as single words. This reduced recognition errors by roughly 60% compared to previous technologies, according to the company.

By statistically calculating the relationship between the sequence of a spoken presentation and the materials’ structural information, including layout, paragraphing, and location of explanations, it became clear that when the content being discussed exceeds a certain “distance” from a point in the materials, the frequency that the spoken presentation transitions to that place drops precipitously. Using this sequential characteristic and the frequency of words contained in a given part of the spoken presentation, this technology is able to filter the candidate supporting material for the next part of the presentation, and can accurately
infer a correspondence with the spoken
presentation, even with only a few spoken words
being recognized.
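The dictionary trick Fujitsu describes, merging easily confused short words with an adjacent word into a single dictionary entry so the recognizer matches longer, more distinctive units, can be sketched as follows. The length cutoff and joining rule are illustrative guesses, not Fujitsu's published method:

```python
# Sketch of merging short words with a neighbor into compound dictionary
# entries, so the recognizer matches longer, more distinctive units.
def build_entries(words, short_len=3):
    """Merge each short word with its following neighbor."""
    entries, i = [], 0
    while i < len(words):
        if len(words[i]) <= short_len and i + 1 < len(words):
            entries.append(words[i] + " " + words[i + 1])
            i += 2
        else:
            entries.append(words[i])
            i += 1
    return entries

print(build_entries(["see", "figure", "two", "for", "details"]))
```

The compound entries ("see figure", "two for") are acoustically longer and therefore harder to confuse than the isolated short words.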
Fujitsu aims to have a practical implementation of this technology in a remote communications-support system within 2015. In addition, when combined with other company technology, this technology has a broad range of potential applications to help businesses run more efficiently, such as giving support to operators in call centers by providing information related to frequently asked questions or providing information-desk support or educational support.
x.ai creates a virtual assistant for scheduling meetings
Assistant analyzes email communications using NLP
x.ai is creating a Personal Assistant that can schedule meetings. In the U.S. alone, the company said there are 87 million knowledge workers who spend nearly five hours per week scheduling meetings.

x.ai launched two personal assistants, twins “Amy” and “Andrew.” Designed to interact and perceive needs like a personal assistant, these virtual assistants eliminate the tedious email ping-pong that accompanies arranging a meeting. Users simply copy their virtual assistant on an email with up to four individuals they wish to schedule a meeting with; the assistant then takes over and coordinates the schedules using natural language processing. The assistant identifies the best time and place for everyone and confirms the meeting on all parties’ calendars by sending out an invite.

x.ai indicated that the software passes each email through natural language processing and supervised learning engines that understand the context of the information before it is enriched and stored in a database. Based on these inputs—plus relevant context such as user scheduling preferences—the assistant determines the appropriate course of action and crafts a response using a set of dynamic email responses to set up a meeting based on a mutually agreeable time and place.

MongoDB offers a database of the same name. The company announced that x.ai uses the MongoDB database in its virtual personal assistant.
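One piece of the pipeline described above, recognizing proposed times in an email and checking them against availability, can be sketched as below. The regex and the free-slot model are drastic simplifications for illustration, not x.ai's engine:

```python
# Toy sketch: pull candidate times out of a scheduling email and intersect
# them with a calendar's free slots.
import re

def proposed_times(email_text):
    """Find expressions like 'Tuesday at 3pm' in the message body."""
    pattern = r"(Monday|Tuesday|Wednesday|Thursday|Friday) at (\d{1,2}(?:am|pm))"
    return [f"{day} {hour}" for day, hour in re.findall(pattern, email_text)]

def pick_slot(email_text, free_slots):
    """Return the first proposed time that is actually free, else None."""
    for t in proposed_times(email_text):
        if t in free_slots:
            return t
    return None   # the assistant would then propose alternatives

email = "Could we meet Tuesday at 3pm or Thursday at 10am?"
print(pick_slot(email, {"Thursday 10am", "Friday 2pm"}))
```

A production assistant would of course handle time zones, ranges, relative dates ("next week"), and multi-party constraint solving, which is where the supervised learning engines come in.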
Ericsson launches closed captioning service in US
Live subtitling for broadcasters and operators using speech recognition
Ericsson’s closed captioning business is the largest in Europe. The company announced in April the United States launch of a closed captioning service that displays text on a television, video screen, or other visual display to indicate what is being spoken. The company has established a broadcast and media services hub based in Atlanta, Georgia, to provide closed captioning and video description services to both domestic and international clients.

Ericsson’s closed captioning services will be delivered using the company’s enterprise-level software platform, developed in-house using speech-recognition technology from an undisclosed source. The service allows multiple captioners to prepare and deliver real-time services for clients while maximizing reuse of the caption data after it has been broadcast. For example, the caption data can be used in content discovery or archive search. The platform is currently being used to deliver both live and offline captioning services for major broadcast clients, including the BBC and Sky. Ericsson also plans to roll out video description services for vision-impaired audiences in the U.S. over the coming months.

In August 2014, the Federal Communications Commission (FCC) outlined new regulations intended to increase the quality of captioning services, and to provide smoother and more accurate closed-captioned communications across a wider reach of programming. Key milestones were set by the FCC relating to the broadcast of English and Spanish-language programs in the United States:
§ As of January 15, 2015, video distributors and broadcasters are obliged to meet specific guidelines relating to accuracy, synchronicity, completeness and placement of captioning for online video content;
§ By January 1, 2016, clips lifted straight from a program and posted onto the Internet, known as straight-lift clips, must be captioned; and
§ By January 1, 2017, montages of video clips must be captioned.
YouMail provides an answer for spam calls on mobile phones
App identifies spam calls and gives them a “number disconnected” message
It’s no surprise to readers that spam calls on mobile phones are a problem, but YouMail, which provides cloud-based telecom services for consumers and small businesses (SSN, April 2015, p. 42), has quantified the problem. The company announced the results of a public survey showing that despite the Do Not Call Registry, 74% of U.S. mobile phone users wrestle with spam calls each month. (The government admits that it can pursue only a small fraction of the complaints it receives of companies ignoring do-not-call instructions.) Data from the nearly 5 billion calls that YouMail has answered for its users shows that 10-15% of all missed telephone calls are considered spam by their recipients and that the average person receives 25-30 spam calls per month. More data from the survey is available at http://blog.youmail.com/post/116391615477/the-youmail-spam-calling-survey.

But YouMail has gone beyond identifying the size of the problem. They have what they think is a cure. The company has released Smart Blocking, a feature that automatically identifies spammers and fools them into thinking they’ve reached a disconnected number.

To use Smart Blocking, YouMail users download the YouMail app for iPhone or Android to replace their standard wireless carrier voicemail. With Smart Blocking, users can just ignore calls when they’re not certain who is calling, and YouMail takes care of the rest. Smart Blocking leverages the app’s huge volume of incoming calls each day to rapidly determine that an incoming number is a spammer placing unwanted calls.

YouMail’s technology identifies spam callers by dynamically analyzing traffic patterns of calls made to YouMail users, as well as feedback from YouMail users. In this way, almost immediately after a new spam number is used to make a call, YouMail users will appear to have disconnected numbers to those callers. In a sense, the app is using crowd-sourcing to make it unnecessary for each user to identify a call as spam. Alex Quilici, CEO of YouMail, noted that, since many spammers maintain shared lists of disconnected numbers, this can rapidly and significantly reduce the volume of spam calls to any YouMail user.

Users can also actively block any individual number, allowing full control over who can leave a message and who will reach what appears to be a disconnected number. In addition, YouMail provides a variety of other features including smart greetings, automatic replies, and access to messages across different devices.
Amazon introduces shopping app for Apple watch
Voice search and 1-click purchasing
Amazon has introduced a shopping app that will be available on the Apple Watch in Canada, China, France, Germany, Japan, U.S., and U.K. The shopping app addresses the small form factor through voice search and quick tap features including 1-Click purchase and saving to a Wish List. The Amazon shopping app for Apple Watch is a companion to the Amazon mobile shopping app for iPhone.
Paul Cousineau, Director of Mobile Shopping,
explained, “There are times when it might not be
convenient to get your phone out of your pocket.
So we worked to distill the best parts of the
Amazon shopping experience into fast and
simple access points from your wrist.”
The Amazon app for Apple Watch includes
the following features:
§ Search the Amazon Catalog: The Amazon
shopping app allows customers with an
Apple Watch to search the Amazon catalog
and find “glanceable” product information
such as product name, price, shipping
information, product images, and star
ratings.
§ 1-Click Purchase: With the 1-Click purchase feature on millions of eligible items, customers can conveniently go from search to purchase in seconds, making it even easier to order familiar items.
§ Add to Wish List: Customers can quickly and easily add any item to their Wish List.
§ Save a Shopping Idea: Make a note by simply saying it and save it for later review.
§ Get More Information from iPhone: If Amazon customers want additional search results or more product information while shopping, they can simply use a “Handoff” feature and open the search or product detail page in the Amazon shopping app on their iPhone.
M*Modal launches Clinical Documentation Improvement platform
Physician-friendly CDI software and services deliver improvements in medical documentation
M*Modal, a provider of clinical documentation and “Speech Understanding” solutions (SSN, April 2014, p. 31), announced a full suite of Clinical Documentation Improvement (CDI) software. M*Modal offers a general-purpose data aggregation, analytics, and physician engagement platform configured for CDI, which they say offers a cost-effective way for hospitals to enhance clinical report quality, productivity, and patient outcomes.

The company also announced enhancements to its Fluency Direct front-end speech software, where physicians immediately see the results of the speech recognition. Improvements in system customization, platform integration, speech understanding, and software management are intended to help physicians and healthcare professionals more quickly and easily create high-quality clinical documentation in Electronic Health Record (EHR) systems.

In addition, the company announced it is utilizing its real-time Computer-Assisted Physician Documentation (CAPD) capability to deliver educational content from Precyse University.

Clinical Documentation Improvement

M*Modal’s cloud-based CDI solutions are built into transcription, front-end speech, and back-end CDI specialist (CDIS) processes to improve the quality of the clinical note using existing systems and workflows. This platform integrates documentation improvement directly with the report creation process, identifying deficiencies, gaps, and improvement opportunities at the time of documentation. M*Modal’s solutions enable central management of reporting requirements for higher case coverage and better efficiency.

M*Modal helps healthcare organizations solve their CDI challenges in three ways:
§ Assess: Using natural language understanding technology to convert transcription data into information that is shareable, sortable and searchable, hospitals gain access to large volumes of patient data trapped in the narrative. This allows them to identify documentation improvement opportunities and target actions related to physician education, certain conditions, etc.
§ Engage: M*Modal solutions embed CDI into the document creation workflow with automated, real-time feedback delivered to clinicians as they dictate or type into the EMR. The context-dependent feedback is less disruptive to physicians. M*Modal’s solutions also educate physicians to document for better clinical care, compliance
and coding, including the upcoming ICD-10 standard.
§ Collaborate: M*Modal offers a back-end CDI system workflow management and clinical intelligence solution to boost efficiency, automation and data centralization.

M*Modal’s CDI solutions are available now, with flexible deployment models to run in different architectures, including Citrix-based desktop and application virtualization environments.
Front-end speech recognition

Fluency Direct is cloud-based software that allows healthcare providers to verbally create and edit patient narratives directly in EHR templates. The solution leverages M*Modal’s speech understanding and natural language understanding technologies to combine accurate speech recognition with embedded clinical documentation improvement (CDI) capabilities. M*Modal’s Fluency Direct software is interoperable with over 80 leading EHRs.

Fluency Direct’s Computer Assisted Physician Documentation (CAPD) is now integrated with M*Modal’s CDI platform, providing real-time messaging and alerting to improve patient record accuracy. The amount of time necessary to personalize and train the system has been greatly reduced. There is also further enhanced deployment flexibility and scalability to allow expanded support for application streaming, virtual desktops, and thin clients.

M*Modal and Precyse

M*Modal is delivering educational content from Precyse University using its Computer-Assisted Physician Documentation (CAPD). This method engages and educates physicians as they document patient care in any electronic health record (EHR) system when using M*Modal Fluency Direct.

Leveraging M*Modal Natural Language Understanding (NLU) technology, the CAPD capability automatically identifies common documentation deficiencies and delivers in-line feedback to physicians as they dictate or type the note, asking for required clarifications when appropriate to support documentation best practices. Integration with Precyse University adds context-sensitive, clinically-relevant and physician-specific educational content on conditions to ensure adequate and compliant documentation.
Specific deployment announced
M*Modal also announced a specific deployment. It is delivering transcription services, front-end speech recognition, and integrated Clinical Documentation Improvement (CDI) workflow management to Kindred Healthcare’s Hospital Division. Kindred Healthcare, headquartered in Louisville, Kentucky, is the largest diversified provider of post-acute care services in the United States. Its Hospital Division serves patients at 97 transitional care hospitals.
Winscribe releases Quick Speech Recognition for healthcare professionals
Immediate speech recognition shows results to person dictating
Winscribe released Winscribe Quick Speech Recognition (QSR), a “front-end” speech recognition solution as part of its medical documentation management software solutions for healthcare professionals. A front-end solution presents the results of the speech-to-text transcription immediately to the doctor dictating a report, making it immediately available for review and correction. This is in contrast to back-end solutions, where the result of the speech recognition is presented to a transcriptionist for review and editing. Immediate review while the case is fresh should lead to more accurate reporting. Doctors who simply dictate a text report sometimes prefer the back-end solution as saving them time, but a growing requirement for the report to be entered into a structured Electronic Medical Record (EMR) makes a free-form report less viable.

Winscribe QSR is claimed to make EMR data entry faster and easier. It is designed to enable physicians and other healthcare professionals to
quickly create documentation, craft emails, enter data into Health Information Systems (HIS), and communicate with co-workers and patients more efficiently. QSR joins Winscribe’s suite of speech productivity solutions, which include enterprise-level medical documentation management, digital dictation, speech recognition workflow management, transcription, and mobile speech technology software solutions.
With the ever-increasing implementation of
EMR systems into hospitals, Accountable Care
Organizations (ACOs), clinics, private practices,
and insurance providers, these systems continue
to garner attention and speculation regarding
their usability, the negative effects on physician
productivity, and the loss of time available for
patients. Speech recognition, on the other hand,
has gained recent attention as a proven method
for improving EMR usability, reducing
documentation costs and boosting the
productivity levels of physicians and other
medical staff.
Winscribe QSR offers real-time, front-end
speech recognition technology that the company
claims has a low edit rate, enabling clinicians to
quickly perform data entry and generate other
documentation with confidence. Winscribe said the product has an intuitive interface and is easy to use, requiring only a few minutes of training.
Physicians simply dictate, review, and insert the
recognized text, and then they are ready to move
to the next field or task.
Winscribe QSR supports general and medical-specific vocabularies, which further enhance the accuracy of recognized text. Winscribe QSR’s ‘snippet’ functionality also makes it simple to create macros and templates that providers can initiate with a unique voice command to insert commonly used phrases, such as discharge instructions, risk and benefit statements, and normal findings. Winscribe QSR has a centralized management console that can ‘learn’ and manage new words, phrases and user profiles, based on pre-existing group knowledge.
Winscribe QSR basically serves as a keyboard
replacement that is adaptable and works with
existing applications and any information
systems that allow typed text entry, including
Microsoft Office applications, Web browsers,
and Health Information Systems. In addition,
Winscribe QSR works with virtually any EMR.
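The ‘snippet’ mechanism described above amounts to expanding a unique voice command into stored boilerplate text. This sketch uses invented commands and phrases; Winscribe's actual snippet format is not public:

```python
# Sketch of voice-command snippet expansion: a unique dictated command is
# replaced by a commonly used block of text. Commands and phrases are
# invented for illustration.
SNIPPETS = {
    "insert normal findings": "Heart: regular rate and rhythm. Lungs: clear.",
    "insert discharge instructions": "Rest, fluids, follow up in one week.",
}

def expand(dictated_text):
    """Replace any snippet command in the dictation with its stored phrase."""
    for command, phrase in SNIPPETS.items():
        dictated_text = dictated_text.replace(command, phrase)
    return dictated_text

print(expand("Exam today. insert normal findings"))
```

The commands must be phrases that would not occur in ordinary dictation, which is why the product stresses that each snippet is triggered by a unique voice command.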
Nuance has new clinical documentation tools for mobile devices and wearables
Joins Samsung at HIMSS to preview new dictation capabilities on Samsung Gear S Watch
Nuance Communications announced its newest innovations for bringing clinical documentation to smart devices, smart watches, and the Internet of Things at the 2015 Healthcare Information and Management Systems Society (HIMSS) Annual Conference in April.

Nuance announced PowerMic Mobile, available in May 2015. The mobile app can turn any iOS or Android mobile phone into a secure dictation device that allows physicians to dictate, edit, and navigate within the Electronic Health Record simply by speaking.

In addition, Nuance has teamed with Samsung to develop a use case for PowerMic Mobile that will allow physicians to dictate directly into an EHR using Dragon Medical 360 and the Samsung Gear S smart watch. Nuance demonstrated Florence, its intelligent virtual assistant, on a Samsung Gear S. Florence provides a series of voice-driven clinical workflows that allow physicians to record vital signs, interact with patient alerts, document telephone encounters, and place medication, lab, and radiology orders.

Jonathon Dreyer, director of cloud and mobile solutions, Nuance Communications, said, “2015 has turned into the Year for the Internet of Things and the phenomenon is becoming firmly entrenched in healthcare.” He noted that a new class of documentation tools “bridge conveniences found in clinicians’ personal lives to the healthcare environment.”

Nuance also demonstrated its cloud-based medical speech recognition with Metrix Health’s Glass wearable. The combination enables surgeons to document operative notes during care, helping to communicate these critically important notes immediately rather than later from memory.
Speech Strategy News
May 2015
17
MedMaster Mobility interprets physician dictation into mobile devices
Integrates Nuance healthcare speech recognition and NLP
Master Mobile Products designs, develops, and deploys mobile healthcare applications optimized for the Apple iPad and iPad mini. The company announced MedMaster Mobility, which allows physicians to create structured data from dictation for Electronic Medical Record (EMR) systems using an iPad. It is said to be independent of the EMR system, automatically generating, for example, standard diagnosis codes in ICD9, SNOMED, and RXNORM formats from unstructured dictation. The solution eases physician adoption of EMR.

Practitioners can create structured data using MedMaster’s fully integrated Medical Speech Recognition and Clinical Language Understanding (CLU), based on technology from Nuance Communications’ healthcare solutions. MedMaster claims the “only successful mobile implementation of Nuance’s CLU engine” to date.

The company’s basic MedMaster application includes full read-write mobile access to patient chart data, scheduling, creation of medical issues, SOAP notes, medications, vitals, messages, etc. With MedMaster Mobility, physicians can practice medicine 24/7 from an exam room, hospital room, or family room using multiple and diverse EMR providers.
Roku streaming TV player adds voice search for content
250,000 movies and TV episodes available for streaming
Roku provides a streaming platform for delivering TV entertainment. Roku streaming players and the Roku Streaming Stick are made by Roku and sold through retailers in the US, Canada, the UK, and Ireland. Roku also licenses a reference design and operating system to TV manufacturers to create co-branded Roku TV models.

In April, the company released new ways for consumers to find and discover streaming entertainment. A new Roku 3 streaming player adds voice search and streams faster than the previous release. Roku Search lets consumers search for movies, TV shows, actors, and directors, and receive all available results listed by price from top streaming channels. There are currently 17 channels, with 250,000 movies and TV episodes available for streaming. Roku Founder and CEO Anthony Wood said, "Now with a fast and fun way to search by voice, we've made the Roku 3—the best streaming player on the market—even better."

"Roku Feed" is a new feature that allows consumers to follow entertainment and get automatic updates on pricing and availability. Roku is launching the feature with a focus on "Movies Coming Soon," providing information on when a box office hit is available for streaming, which services offer the movie, and how much it costs.
NIST machine learning challenge for language recognition
Based on the i-vector paradigm
The National Institute of Standards and Technology (NIST) will coordinate a special "i-vector" challenge in 2015 based on data used in previous NIST Language Recognition Evaluations (LREs) and certain other sources. The challenge is intended to foster interest in this field from the broader machine learning community. It will be based on the i-vector paradigm widely used by state-of-the-art speaker and language recognition systems (a tutorial on the subject is available from Howard Lei of the International Computer Science Institute, ICSI) and will largely follow the approach taken in the recent NIST-coordinated Speaker Recognition i-Vector Challenge. By providing i-vectors directly, and not audio data, the evaluation is intended to be readily accessible to participants from outside the audio processing field.

This challenge focuses on the development of new methods for using i-vectors for language identification in the context of conversational telephone or narrowband broadcast speech. It is designed to foster research progress, including goals of:
§ Exploring new ideas in machine learning for use in language recognition,
§ Making the language recognition field accessible to more participants from the machine learning community, and
§ Improving the performance of language recognition technology.

The evaluation plan is available at https://ivectorchallenge.nist.gov. Challenge data will be made available on May 15.
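An i-vector reduces a whole utterance to one fixed-length vector, so language identification can be treated as ordinary vector classification. The toy sketch below uses cosine scoring against per-language mean vectors, a common baseline with i-vectors; the three-dimensional vectors and labels are invented stand-ins (real i-vectors are typically several hundred dimensions), not challenge data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def language_means(train):
    """Average the training i-vectors for each language label."""
    sums, counts = {}, {}
    for lang, vec in train:
        acc = sums.setdefault(lang, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[lang] = counts.get(lang, 0) + 1
    return {lang: [x / counts[lang] for x in acc] for lang, acc in sums.items()}

def identify(ivec, means):
    """Score a test i-vector against each language mean; highest cosine wins."""
    return max(means, key=lambda lang: cosine(ivec, means[lang]))

# Toy 3-dimensional "i-vectors" with language labels.
train = [
    ("eng", [0.9, 0.1, 0.0]), ("eng", [0.8, 0.2, 0.1]),
    ("spa", [0.1, 0.9, 0.2]), ("spa", [0.0, 0.8, 0.3]),
]
means = language_means(train)
print(identify([0.7, 0.2, 0.1], means))  # → eng
```

Because participants receive the i-vectors directly, nothing audio-specific remains in the task; challenge entrants would replace the cosine scorer with stronger classifiers.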
Fujitsu introduces a communications tool for the hearing impaired
Speech converted to text in real time in meetings or classrooms
Fujitsu Limited and Fujitsu Social Science Laboratory Limited announced that, starting in mid-May, they will begin sales of LiveTalk, a communications tool for people with hearing disabilities, to companies and schools in Japan.

LiveTalk is software designed for situations in which multiple people share information, such as meetings or classroom settings. It recognizes a speaker's speech using handheld and headset mics, immediately converts it into text, and displays it on multiple PC screens. The software uses AmiVoice SP2 speech-recognition software from Advanced Media, Inc., and was developed with a 2013 grant from the Japanese Ministry of Internal Affairs and Communications.

All participants, including people with hearing disabilities, can see the shared information in real time. LiveTalk also enables two-way communication, with built-in functions for PCs allowing text input and "stamp" tools (to insert emoticons and preregistered, frequently used, fixed phrases).

Even if multiple people speak at once, the text conversion is processed in parallel and displayed simultaneously, making it possible to accurately grasp the flow of a conversation. If there are any mistakes in the conversion of speech into text, they can be corrected on the PC. Text is transmitted in real time to all PCs connected to a given wireless LAN router environment. The software can also be used on tablet computers.

Fujitsu said in a statement that, by promoting smooth communication between hearing-impaired people and people who can hear well, this software can be expected to broaden employment and educational opportunities.
New NissanConnect Services program set to launch on 2016 Nissan Maxima
8.0-inch color display with multi-touch control and speech recognition
The new 2016 Nissan Maxima features the new NissanConnect Services powered by SiriusXM. The connected services program features vehicle security, monitoring, and remote services. Every 2016 Maxima also includes standard NissanConnect infotainment features, providing access to Online Search with Google, SiriusXM Traffic, and SiriusXM Travel Link (fuel prices, weather, movie listings, stock info, sports). The new telematics program is Nissan's first in-vehicle launch of SiriusXM's connected vehicle services.

The services are accessed through a redesigned NissanConnect system that includes updated graphics on an 8.0-inch color display with multi-touch control and Nissan Voice Recognition. Every Maxima also includes a 7.0-inch Advanced Drive Assist Display, two front USB connection ports for iPod and other compatible devices, streaming audio via Bluetooth, and a hands-free text messaging assistant.

Three levels of services will be available on the new Maxima Platinum model when it goes on sale in the US in summer 2015. The base package includes emergency services and maintenance alerts. The mid-tier package adds remote control services, including Remote Start and Remote Door Lock/Unlock, as well as monitoring alerts, including Valet Alert and Curfew Alert. Alerts include vehicle speed, curfew (with available notification to the driver 20 minutes before the curfew alert), valet alerts (if the vehicle is more than two miles from drop-off), and geographical boundaries if set. The full package adds a suite of concierge services, including Assisted Search, Connected Search, and Journey Planner.

NissanConnect links users to Cloud Services three ways: (1) beamed in through radio and satellite, (2) brought in through a smartphone, and (3) built in with a cellular-network embedded telematics control unit (TCU). Nissan notes that the multiple connectivity options help in the case of an emergency, including connecting to a live person for assistance.

Remote Access provides access to the automobile through a compatible computer or smartphone. Services include remote door lock and unlock, remote engine start, and remote horn and flashing lights to help find the Maxima in a garage or parking lot.
Brainasoft offers personal assistant for controlling a Windows PC
Speak or type text to do tasks such as play music, open programs, or dictate text
Brainasoft offers Braina (derived from "Brain Artificial"), personal assistant software for Windows PCs. Braina is designed to let you control your PC using what the company calls "natural language" commands, although keywords are required for many commands. You can either type commands or speak to the assistant. A Braina for Android app supplements the software on the PC by letting you interact with your computer over a WiFi network. Recently, the company had only four employees, so the software obviously makes extensive use of the functionality built into the Windows OS and Android OS, as well as external services.

The company says that Braina allows you to easily dictate (speech-to-text), update social network status, play songs and videos, search the web, open programs and websites, find information, and more. More specifically, commands include the following functionality:
§ Play Songs - For example, just say "Play All You Need is Love" or "Play Neil Diamond" and Braina will play it for you from anywhere in your computer or even the web.
§ Dictate to any Software or Website - Use a speech-to-text feature in third-party programs like Microsoft Word using Dictation mode.
§ Play Videos - For example, say "Play video Godfather."
§ Calculator - Do calculations by speaking, e.g., "45 plus 20 minus 10."
§ Dictionary and Thesaurus - E.g., "Define encephalon," or "What is intelligence?"
§ Open and Close any Programs - E.g., "Open notepad," "Close notepad."
§ Open and Search Files and Folders - E.g., "Open file studynotes.txt," "Search folder authentication."
§ Control a PowerPoint Presentation - Say "next slide" or "previous slide."
§ News and Weather Information - E.g., "Weather in London," "Show news about Cortana."
§ Search Information on the Internet - E.g., "Find information on Thalassemia disease," "Search Dodgers score on Google," "Search for Albert Einstein on Wikipedia," "Search images of cute puppies."
§ Set Alarms - E.g., "Set alarm at 7:30 am."
§ Remotely Shutdown Computer.
§ Notes - Braina can remember notes for you, e.g., "Note I have given 550 dollars to John."
IBM tests Numenta’s “brain algorithms”
Machine intelligence based on "principles of the neocortex"
Numenta is a company founded by Jeff Hawkins, founder of the company Palm, a developer of "Personal Digital Assistants" (PDAs, early versions of handheld computers), which was purchased by HP in 2010. The company's web site summarizes its goals as follows:

"Numenta has developed a cohesive theory, core software technology, and numerous applications all based on principles of the neocortex. This technology lays the groundwork for the new era of machine intelligence. Our innovative work delivers breakthrough capabilities and demonstrates that a computing approach based on biological learning principles will make possible a new generation of capabilities not possible with today's programmed computers."

According to MIT Technology Review, IBM has established a research group of about 100, called the Cortical Learning Center, to work on Numenta's learning algorithms at its Almaden research lab in San Jose, California. The group is working on designs for computers that would implement Hawkins's ideas in hardware. The approach is to stack multiple silicon wafers on top of one another, with physical connections running between them to mimic the networks described by Numenta's algorithms. The IBM group is reportedly also working on using Numenta's algorithms to analyze satellite imagery of crops and to spot early warning signs of mechanical failures in data from pumps or other machinery.

The IBM Research web site indicates that Winfried Wilcke, Sr. Mgr., Nanoscale Science & Technology and Distinguished Research Staff, is leading the effort. At a conference in February, Wilcke claimed Numenta's software was closer to biological reality than other machine learning software. He said Numenta had struck a balance between taking cues from biology and making software that is practical.
VERBATIM-VR improves speech recognition by letting users report errors
Tunes speech-to-text for individual companies
VERBATIM-VR Ltd. has announced software supporting speech recognition dictation products that is said to allow easy tuning of company- or industry-specific vocabularies for all users within a company. The key idea is that the speech recognition software can be tuned to the company's specific business and specific products by what might be considered crowdsourcing: employees report speech recognition errors, and software updates all employees' speech recognition to reduce similar errors. (See image.)

The company declined to indicate what speech recognition technology it supports at this time. The company claims current speech recognition applied to verticals such as tax law, radiology, and banking/customer service has 7-10% error rates, which it states it can substantially reduce. A request for the source of these error rates referenced a talk by a Microsoft researcher in 2012. There is apparently more that will be announced later.

Verbatim-VR operation
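VERBATIM-VR has not published how its updates propagate, but one plausible mechanism for a company-wide tuning loop is a shared substitution table: once enough employees report the same misrecognition, a correction rule is promoted and applied to everyone's transcripts as a post-processing pass. The sketch below illustrates that assumed mechanism only; the class, threshold, and examples are invented, not the company's design.

```python
from collections import Counter

class SharedCorrections:
    """Crowd-sourced post-processor: an error report becomes a company-wide
    substitution rule once enough employees report the same misrecognition."""

    def __init__(self, threshold=3):
        self.reports = Counter()   # (heard, intended) -> report count
        self.rules = {}            # misrecognized phrase -> correction
        self.threshold = threshold

    def report(self, heard, intended):
        key = (heard.lower(), intended.lower())
        self.reports[key] += 1
        if self.reports[key] >= self.threshold:
            self.rules[key[0]] = key[1]

    def apply(self, transcript):
        # Crude lowercase substitution; a real system would match word
        # boundaries and preserve case.
        text = transcript.lower()
        for wrong, right in self.rules.items():
            text = text.replace(wrong, right)
        return text

shared = SharedCorrections(threshold=2)
shared.report("wave rule", "waiver rule")   # employee 1 reports an error
shared.report("wave rule", "waiver rule")   # employee 2 -> rule promoted
print(shared.apply("Apply the wave rule to this filing"))
# → apply the waiver rule to this filing
```

A production system would more likely adapt the recognizer's vocabulary and language model directly, but the crowdsourced-threshold idea is the same.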
Google (cont.)
Continued from page 1
Google search has evolved into a personal
assistant that goes beyond providing a list of
web sites to trying to provide a direct answer, in
part with “Knowledge Graphs.” Vemuri
explained, “We’ve built APIs that are easy to
integrate with and that allow you to capitalize on
all of Google’s work in natural language
recognition, as well as our semantic
understanding of people, places, and things—
through the Knowledge Graph.”
IBM’s Nahamoo (cont.)
Nahamoo contrasted “analytic systems” such
as web search versus the deeper analytics of
cognitive systems. The first can be defeated by
the “static” of big data, whereby cognitive
systems such as IBM’s Watson have the
capability of filtering the static.
He indicated that the technology behind
Watson is a “massively parallel probabilistic
evidence-based architecture.” The software
generates and scores many hypotheses using a
combination of natural language processing,
information retrieval, machine learning, and
reasoning algorithms. These gather, evaluate,
weigh and balance different types of evidence to
deliver the answer with the best support it can
find.
Nahamoo also indicated, during a Q&A
session, that the speech recognition now in the
Watson cloud benefits from continual
improvement over the five years that the
company collaborated with Nuance (an
agreement that has expired, allowing IBM to sell
the technology directly). He said the current
version in Watson is currently considered a beta
version.
The slides of Nahamoo’s presentation will be
available at http://avios.org/?page_id=2386.
Continued from page 1
next natural resource.”
Nahamoo said “cognitive systems” learn and
interact naturally with people to amplify what
either humans or machines could do on their
own. They help us solve problems by
penetrating the complexity of Big Data.
He claimed that this trend will change
traditional Information Technology (IT),
allowing it to deal with unstructured as well as
structured data, with natural language
supplementing machine language.
He asked what if:
§ The time to discover new sources of energy
went from 2 years to 3 months?
§ Every patient had a full-time, dedicated staff
of medical specialists?
§ Every citizen in need had a full-time,
dedicated, expert case worker?
§ Every soldier doubled their ability to sense,
reason, plan and act?
§ Legal documents could be reviewed for
consistency and accuracy with precedent?
§ Every US worker had an expert assistant
dedicated to their success on the job?
§ Every student had a personal, full-time,
world class tutor?
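The generate-and-score approach Nahamoo described for Watson can be illustrated with a toy evidence combiner: each candidate answer carries evidence scores from several sources, and a weighted combination picks the best-supported one. The weights, scores, and answers below are invented for illustration and are not IBM's actual model.

```python
def best_hypothesis(hypotheses, weights):
    """Each hypothesis carries per-source evidence scores in [0, 1];
    combine them as a weighted sum and return the best-supported answer."""
    def support(h):
        return sum(weights[src] * score for src, score in h["evidence"].items())
    ranked = sorted(hypotheses, key=support, reverse=True)
    return ranked[0]["answer"], round(support(ranked[0]), 3)

# Hypothetical weights for three evidence sources.
weights = {"nlp": 0.4, "retrieval": 0.35, "reasoning": 0.25}
hypotheses = [
    {"answer": "Toronto", "evidence": {"nlp": 0.6, "retrieval": 0.3, "reasoning": 0.2}},
    {"answer": "Chicago", "evidence": {"nlp": 0.7, "retrieval": 0.8, "reasoning": 0.6}},
]
print(best_hypothesis(hypotheses, weights))  # → ('Chicago', 0.71)
```

The real system learns how to weigh evidence rather than using fixed weights, and scores far more hypotheses in parallel, but the shape of the computation is the same.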
SmartAction (cont.)
Continued from page 1
SmartAction claims its Intelligent Voice
Automation (IVA) accurately recognizes speech,
understands callers’ meaning and intent, and
remembers the evolving context of each
conversation. IVA dynamically responds with
personalized, context-relevant, accurate answers,
making it more likely a customer will complete
their transaction without requiring an agent. And
when IVA can’t complete a transaction, it
captures and provides all relevant call
information to live agents, making the call flow
more efficient and satisfying to customers.
SmartAction indicated that their artificial
intelligence call automation automatically
incorporates generalized improvements from
other customers. This and general R&D
upgrades are reflected in improved performance
over time.
The AI technology that powers IVA was
developed by Adaptive A.I. Inc., SmartAction’s
parent company. The company doesn’t indicate
the specific source of its speech recognition,
saying on its web site that the system “uses a
speech recognition technology with the most
advanced open, natural language, speech
recognition system available.” The web site
indicates that the system “evaluates multiple
hypotheses from the speech recognition engine
and selects the most likely interpretation based
on context.”
News briefs
Speech recognition, image recognition, and machine learning top Google CEO’s list of more
important projects
At a conference in April, Google chairman Eric Schmidt said that there are three projects that
rank above all others in importance: speech recognition, image recognition, and machine learning.
Sensory CEO discusses how “deep learning” relates to privacy
In a blog entry, Todd Mozer, CEO, Sensory, discussed how big data and privacy relate to “deep
learning” (deep neural nets). Mozer notes that a lot of the Big Data is personal information used as
the data source for Deep Learning. Basically, Deep Learning is neural nets learning from your
personal data, stats, and usage information, he indicated. This is why when you sign a EULA (end-user license agreement), you typically give up the rights to your data, whether it's usage data, voice data, image data, personal demographic info, or other data supplied through the "free" software or
service. One reason the data is collected is to improve the speech recognition; another is because the
speech recognition is sufficiently complex to require cloud-based processing. (Mozer noted that
“Sensory will change this second point with our upcoming TrulyNatural release!”)
When data is retained to improve speech recognition or natural language understanding using
Deep Learning, it runs the same risk as any data held by a company and supposedly protected,
Mozer notes. And given that many large companies have been hacked, protection is difficult to
guarantee.
Mozer indicated that Sensory will also attack the first point through the company's "embedded" approach to deep-neural-net-based speech recognition, which it will soon be bringing to market. Sensory uses Deep Learning approaches to train its nets with data collected from EULA-consenting and often paid subjects. The company then takes the recognizer built from that research and runs it on its OEM customers' devices, and because of that never has to collect personal data.
Gates notes Microsoft’s 40th anniversary
Microsoft was founded on April 4, 1975, 40 years ago. In a letter to Microsoft employees, Bill
Gates, among other things, said he believed “computing will evolve faster in the next 10 years than
it ever has before…We are nearing the point where computers and robots will be able to see, move,
and interact naturally, unlocking many new applications and empowering people even more.” Gates
said he was impressed by the vision and talent he sees in product reviews he participates in. He
wrote, “The result is evident in products like Cortana, Skype Translator, and HoloLens—and those
are just a few of the many innovations that are on the way.”
Orion adds IVR capabilities to its public sector workforce management software
Orion Communications, a provider of public sector workforce management software and
services, announced the addition of IVR technology to its web-based AgencyWeb software. By
combining the AgencyWeb workforce management solution with a robust IVR platform, public
sector agencies are able to extend many of the system’s capabilities to field personnel or those with
no computer access. Daily scheduling, court event management, disaster planning, and day-to-day
workforce management tasks are automatically delivered over a high-performance IVR platform.
AgencyWeb IVR is built using Voice over IP (VoIP) technologies and supports VoiceXML,
CCXML, SIP, TTS, and speech recognition. It is available in either on-premise or cloud
deployments.
Mphasis to use Artificial Solutions natural language technology in customer support
Artificial Solutions, which provides natural language interaction solutions (SSN, September
2014, p. 11), announced a partnership with Mphasis, an IT services provider. Mphasis will integrate
Artificial Solutions’ digital agents, natural language analytics, and other components from the
company’s Teneo platform into its Customer Experience Management (CEM) solutions to enable its
clients in the banking and insurance industries to deliver a better customer experience.
"Our technology will support Mphasis with the natural language capabilities required for its
Customer Experience Management solutions," said Lawrence Flynn, CEO at Artificial Solutions.
“The Teneo platform will transform how customers communicate with organizations in the banking
and insurance sector.”
The Mphasis CEM solution offers an integrated, enterprise-wide approach to managing customer
experience by leveraging continuous analysis of customers' behaviors and processes. It focuses on
identifying, anticipating, and satisfying customers’ needs across all touch points.
Convergys Analytics and Nexidia announce partnership
Convergys announced a strategic partnership with Nexidia. The partnership will combine
Convergys’ customer experience analytics expertise with Nexidia’s speech analytics technology.
Convergys Analytics will use the technology to offer its clients the ability to gain access to the
often-untapped unstructured customer feedback found in spoken conversations between contact
center agents and customers. Convergys’ analysts will leverage the data to uncover, explore, and
recommend corrective action to resolve clients’ underlying business issues impacting the customer
experience.
Cable operator selects Fonolo call-backs to improve the customer experience
Fonolo, which replaces call center hold time with a callback, announced that Suddenlink, the
seventh largest cable operator in the US, has selected its call-back solution. By adding Fonolo's In-Call Rescue solution to its call center, Suddenlink's customers can now choose to "press 1 to receive
a call-back” instead of waiting on hold, without losing their place in line. When their turn arrives,
the customer’s phone will ring and a live agent will be on the line.
Nuance voice biometrics chosen by SK Telecom
Nuance Communications announced that SK Telecom, a mobile service provider in South Korea
with over 28 million subscribers, has deployed Nuance’s VocalPassword voice biometric solution to
provide authentication for its customers. When a customer dials into the contact center, they are
asked to speak the predefined phrase: “At SK Telecom, my voice is my password,” to be
authenticated into their account and then speak with a representative about their inquiry or request.
"When it comes to authentication, we've seen that PINs, passwords, and security questions are
leaving accounts vulnerable and can no longer be considered a safe and secure method,” said Robert
Weideman, executive vice president and general manager, Enterprise Division, Nuance.
I PRINT N MAIL analyzes responses from direct mail with speech-recognition tools
I PRINT N MAIL, a direct mail marketing firm, introduced the NextGen Direct Mail program.
The service includes lead tracking, recording, analytics, and speech recognition to help marketers
assess the campaign’s performance and improve conversion rates based on calls generated by direct
mail.
Under NextGen, after the mailers are sent, each inbound call from prospects is tracked and recorded, and marketers can access the information in real time. NextGen has speech recognition technology that can follow a phone conversation between a sales rep and the prospect. On a call-by-call basis, it then assigns a score on how well the rep does. The program also rates how ready the
prospects are, based on their tone of voice and the questions they ask.
The company claimed that, with the tool, marketers can quickly discover which territories work
best, what products have the biggest potential, what the most frequent questions are, and which
promotions are most intriguing. Big score gaps between the sales rep and the prospect (for example,
a "hot" prospect vs. an underwhelming sales rep) can be flagged so a supervisor can immediately
attempt to "save" the lead and convert the prospect into a customer. An irate customer call can also
be flagged so a manager can join in and help defuse the situation.
Nuance, MEDITECH, and IMO collaborate to automate patient problem lists and support
regulatory reporting
Nuance Communications announced a collaboration with Intelligent Medical Objects (IMO)
and MEDITECH to take rich clinical narratives from physician notes, extract key facts as structured
data, and automatically populate patient problem lists and allergy lists using physician-friendly
terminologies to preserve the clinical intent of physician documentation in the Electronic Health
Record (EHR). This approach enables healthcare providers to translate unstructured narrative text
into clinically-accurate, discrete data in the EHR to support patient problem lists for “Meaningful
Use” (the federal requirement for support of EHR technology) as well as requirements for ICD-10
compliance and quality reporting.
This process overcomes common challenges physicians have creating problem lists in the EHR,
providing efficiencies and relief through simply converting narrative created during physician
documentation into real-time, actionable information for medical decision making:
§ Physicians can use Nuance Dragon Medical 360 real-time speech recognition or a mix of voice
recognition and structured physician documentation sections to create a clinically-rich narrative
at the point of care, which reflects the patient's full condition.
§ Nuance's Clinical Language Understanding (CLU) engine analyzes the physician free text
narrative and extracts key patient information, such as patient problems and allergies in real time
as discrete, structured data.
§ IMO's Intelligent Connect platform automatically maps this to IMO's terminology libraries,
and to standardized code sets, thus preserving clinical intent while supporting clinical
documentation standards for interoperability and continuity of care.
§ The full patient story is then available in MEDITECH's EHR, including structured data
unlocked from previously unstructured patient information, and this can now be analyzed
and used for a variety of reporting and compliance measures.
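The extract-then-map steps above can be sketched as a toy pipeline: find known clinical phrases in free-text narrative, then map each to a standardized code. The phrase list and codes below are invented examples standing in for Nuance's CLU output and IMO's terminology mapping; they are not the actual Nuance or IMO interfaces, and real terminology libraries are vastly larger.

```python
# Hypothetical mini-library: clinical phrase -> (code system, code).
TERM_CODES = {
    "type 2 diabetes": ("ICD-10", "E11.9"),
    "penicillin allergy": ("SNOMED CT", "91936005"),
    "hypertension": ("ICD-10", "I10"),
}

def extract_problems(narrative):
    """Return (phrase, code_system, code) for each recognized phrase
    in the free-text note, as discrete structured data."""
    text = narrative.lower()
    return [(term, system, code)
            for term, (system, code) in TERM_CODES.items() if term in text]

note = "Patient with type 2 diabetes and hypertension; penicillin allergy noted."
for term, system, code in extract_problems(note):
    print(f"{term} -> {system} {code}")
```

Real clinical language understanding must also handle negation ("denies chest pain"), abbreviations, and synonyms, which is exactly the value the CLU engine and terminology libraries add over naive substring matching.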
Lyft expands its offerings to include Nuance’s Dragon Medical Practice Edition 2 for
otolaryngologists
Lyft, a healthcare software firm (not to be confused with the taxi service), has announced the
availability of Dragon Medical Practice Edition 2 for otolaryngologists (ear, nose, and throat
specialists), bringing advanced speech-recognition technology to another important niche in the
healthcare community. Otolaryngologists can now experience the full benefits of a speech-enabled
practice backed by Lyft, which offers value-added support services that ensure successful integration
with leading Electronic Health Record (EHR) software. Lyft offers otolaryngologists support and
services for installation, configuration, and training to maximize the benefits of integrating Dragon
Medical Practice Edition 2 into their practices.
Acusis service enters patient encounter summaries into an Electronic Health Record
Acusis provides human clinical documentation specialists to enter patient encounter summaries
directly into a client’s preferred EHR system, working over the Internet. The physician records a
summary of the patient encounter by using their preferred dictation method, including the
AcuMobile Smart Phone application or any other handheld device. The captured audio is processed
through the Acusis Speech Recognition and Formatting engines. Using the Acusis Workflow,
Acusis completes the standard dictation and editing process. The Acusis Clinical Documentation
Specialists access the local EHR to identify or create the appropriate patient encounter. They then
enter this information into the applicable fields.
Accusonus speech enhancement available for Cadence DSPs
Cadence Design Systems and Accusonus announced that the Accusonus Focus-MDR and
Focus-DNR speech enhancement software has been ported to and optimized for the Cadence
Tensilica HiFi Audio/Voice digital signal processors (DSPs). The Accusonus software provides an
integrated solution for reverberation and noise suppression, addressing challenging indoor acoustic
environments. The Focus-MDR and Focus-DNR speech enhancement products offer a single- or
multi-microphone approach to suppression of reverberation, as well as direct and ambient noise.
OKI microphone technology picks up sound in specific areas, using two microphone arrays
OKI announced that it has developed a technology called Area Sound Enhancement System that
makes it possible to pick up sounds in a target area by positioning several directional microphones.
The technology lets users hear speakers’ voices in a specified area, even in acoustically cluttered
environments in which several people are talking at the same time, like conference rooms or offices.
In environments in which speakers’ voices overlap with other voices and ambient noise,
difficulties in hearing the speakers’ voices may interfere with smooth communication. Using
directional microphones such as shotgun microphones and microphone arrays can pick up sounds in
specific directions, but these microphones pick up not only sounds in a target area but noises in the
same direction as well because the directionality extends linearly.
The Area Sound Enhancement System developed by OKI uses two microphone arrays and crosses the arrays' directional beams in the target area from different directions. A component common to both arrays' beams is estimated to be the target sound, and the other components are suppressed as noise. This makes it possible to pick up only sounds in the target area and ensures clear speakers' voices for video conferencing and other remote communication in noisy
environments. The technology also allows the speakers to talk while shifting their positions and
walking about, as long as they remain within the area covered by the microphone arrays.
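OKI has not published the exact algorithm, but the description, keeping the component common to both arrays' beams and suppressing the rest, suggests something like an element-wise minimum across the two beams' magnitude spectra. The sketch below illustrates that assumed idea on toy spectra; it is not OKI's implementation.

```python
def area_enhance(beam_a, beam_b):
    """Toy version of crossing two directional beams: for each frequency
    bin, keep the magnitude both beams agree on (the common component,
    i.e., sound from the area where the beams intersect) and treat the
    excess in either beam as out-of-area noise."""
    return [min(a, b) for a, b in zip(beam_a, beam_b)]

# Magnitude spectra (arbitrary units) from two arrays aimed so their
# beams intersect in the target area. The target speaker appears in
# both beams; each beam also picks up a different out-of-area talker.
beam_a = [0.9, 0.2, 0.8, 0.1]   # target plus noise along array A's beam
beam_b = [0.8, 0.7, 0.9, 0.1]   # target plus noise along array B's beam
print(area_enhance(beam_a, beam_b))  # → [0.8, 0.2, 0.8, 0.1]
```

Note how the bin where only beam B hears energy (0.7 vs. 0.2) is suppressed to the common level, while bins the beams agree on pass through.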
VXi Bluetooth headset includes noise-cancelling and voice prompts
VXi Corporation introduced the BlueParrott Reveal, its first extendable-boom Bluetooth headset designed for mobile professionals (see image). Despite its small form factor, the microphone includes VXi's noise-canceling technology.

Noise-canceling microphones work by proximity, and just don't work as effectively when they're far from the user's mouth. The device's extendable microphone boom allows moving the microphone closer to the user's mouth. With the Reveal, when ambient noise is low to moderate, users can leave the boom in its retracted position and still experience some noise canceling. When life gets loud, they simply slide Reveal's boom out. This puts Reveal's microphone about an inch closer to the user's mouth, which VXi claims gives a 90% reduction in ambient noise.

VXI noise-cancelling Bluetooth headset
Intel shows prototype smartphone with reduced-size RealSense technology
At the opening day of the Intel Developer Forum in China on April 8, the company showed a
new prototype for a six-inch smartphone that integrates Intel RealSense technology, which includes
speech recognition and 3D scanning. The RealSense camera and technology have been reduced to
50% of their former size.
CPqD biometric authentication and Brazilian Portuguese speech recognition available on a
new IBM chip
The Smart Authentication solution from CPqD, a Brazilian research institute, uses face and
voice recognition for user authentication in applications such as banking and e-commerce. The
solution is running on the Power8 processor recently launched by IBM. It is available for testing in
the IBM Client Center in Sao Paulo. The software can be used in multiple communication channels
(Internet, phone, and mobile phone) and combines biometric technologies, text-to-speech, and
speech recognition from CPqD for Portuguese spoken in Brazil.
New Tensilica Fusion DSP from Cadence Design Systems features low energy use
Cadence Design Systems announced the new Cadence Tensilica Fusion digital signal processor (DSP) based on its Xtensa Customizable Processor. This scalable DSP is designed for
applications requiring merged controller plus DSP computation, ultra-low energy and a small
footprint. It can be designed into systems on chip (SoCs) for wearable activity monitoring, indoor
navigation, context-aware sensor fusion, secure local wireless connectivity, face trigger, voice
trigger, and speech recognition, the company indicated.
The Tensilica Fusion DSP uses 25% less energy, based on running a power diagnostic derived
from the Sensory Truly Handsfree always-on algorithm, when compared to the current low-power
Cadence Tensilica HiFi Mini DSP. Bernard Brafman, vice president of business development at
Sensory, said, “By taking full advantage of the architectural features in the new Fusion DSP, we
were able to optimize our software to achieve further power reduction for functions such as trigger
to search, user-defined triggers, and speaker verification and identification.”
Microsoft’s browser update said to include Cortana
Early views of Microsoft’s new Windows 10 build offer a glimpse of Microsoft’s “Project
Spartan” browser. Project Spartan will supposedly replace Internet Explorer in steps. It comes
with the Cortana digital assistant. By putting Cortana in a browser, the speech-interactive assistant
becomes available on any platform using the browser.
Microsoft’s Skype Translator test preview adds new languages and other options
Microsoft is adding new features to its Skype Translator real-time language translation service,
as part of the second phase of its current test preview. The new service translates conversations both
ways in near real-time. The service will display an on-screen transcript of the call, and also
ultimately will translate instant-message chats in more than 45 languages. The service is currently
available only on devices running Windows 8.1 or the Windows 10 Technical Preview.
In the second test phase, initiated April 8, Microsoft added support for Chinese (Mandarin) and
Italian, adding to English and Spanish, which were launched in December.
According to a blog post announcing the second test phase of Skype Translator, Microsoft also is
adding new features to the preview, including:
§ The ability to mute audio for customers who prefer to read translation vs. hearing it spoken;
§ The option for partial translations, which reduces delay time between when someone finishes
speaking and when a translation starts;
§ The ability to add speech recognition warnings, so that customers are prompted when the translator is having a hard time understanding the speaker, in which case suggested ways to resolve the issue will be offered; and
§ The addition of a text-to-speech option, allowing users to switch between text-to-speech and speech-to-speech translation.
Getty Images and Microsoft partner to add images to products like Bing and Cortana
Getty Images and Microsoft announced a new partnership to develop image-rich, compelling
products and services for Microsoft products like Bing and Cortana using Getty Images’ imagery. In
the coming years, the two companies’ technology teams will partner to provide real-time access to
Getty Images imagery and associated metadata to enhance the Microsoft user experience.
Cortana to recommend movies
Microsoft’s Cortana on devices running Windows Phone 8.1 will soon be able to recommend
movies based on user interests. Windows Phone owners can activate the free film concierge service
by turning the feature on in Cortana’s Notebook settings. According to Microsoft, after a few days
Cortana will begin to make recommendations. If users are interested in a recommended movie,
clicking on it will bring up details such as synopsis, reviews, cast list, and trailers, and even allow
users to purchase tickets.
Amazon Echo can now be used to control WeMo and Hue home devices
Home automation devices, WeMo switches and Philips Hue connected LED light bulbs, now
work with Amazon Echo. You can now use Echo to switch on the lamp before getting out of bed,
turn on the fan or heater while reading in your favorite chair, or dim the lights from the couch to
watch a movie, according to Amazon.
One simply connects WeMo and Hue devices to a home Wi-Fi and names them in their
respective app. Then the user says, “Alexa, discover my appliances.” After Echo's confirmation, you
can control your devices by voice, e.g.,
§ “Alexa, turn on the hallway light”
§ “Alexa, turn on the coffee maker”
§ “Alexa, dim the living room lights to 20%”
§ “Alexa, turn on the electric blanket”
§ “Alexa, turn on the outdoor decorations.”
Amazon Echo adds podcasts
Amazon’s Echo has now added support for accessing podcasts. A user can listen to new episodes
of popular podcasts by saying “Alexa, play the podcast [name] on TuneIn.”
Siri’s synthetic voice gets some improvements
Apple’s new iOS 8.3 update includes some improvements in how Siri speaks. A brief YouTube
video illustrates the difference. The new update also includes a speech recognition setting for New
Zealand English as distinct from Australian English.
Dictionary.com app supports Apple Watch with speech recognition to display definitions
Dictionary.com, an IAC company, announced new functionality in the Dictionary.com app to
provide customized support for the Apple Watch. Dictionary.com services include access to millions
of English definitions, synonyms, pronunciations and example sentences, which can now be easily
accessed directly from a user’s wrist. Michele Turner, CEO of Dictionary.com, said, “We’ve added
functionality to our app to take advantage of the immediacy of the watch, letting people speak or tap
to get a definition or synonym, or quickly glance to learn our Word of the Day.”
One touch navigates between definitions and synonyms. The user can also keep track of recent
word searches and view favorites synced from an iPhone. App activity can be launched on the iPhone for more information on the word the user was last viewing from their watch.
SoundHound + LiveLyrics offers new Apple Watch app
SoundHound, which provides sound recognition and search technologies, announced its
flagship music product, SoundHound + LiveLyrics, is one of the first third-party apps to deploy on the Apple Watch. The SoundHound app allows Apple Watch users to capture, collect and enjoy the
music they hear. By tapping one’s wrist, users can bring lyrics to their fingertips. Users can view
collected songs by scrolling on the song history list, synched with the user’s iPhone and iPad.
Apple moves to Siri back-end built on open-source Apache Mesos platform
Mesos is an open-source distributed systems kernel from the Apache Software Foundation.
Mesos is built using the same principles as the Linux kernel, only at a different level of abstraction.
The Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka,
Elastic Search) with APIs for resource management and scheduling across entire datacenter and
cloud environments.
Apple is now on its third-generation system for handling Siri queries, moving to the Mesos
platform, according to the Mesosphere blog. Apple reportedly made the announcement at the Bay
Area Mesos meeting in April. During a presentation, Apple engineers said that the switch to Mesos would reduce latency, aid scalability, and make it easier to deploy new services as Siri’s
capabilities are expanded. Mesos is also used by other large technology companies, including
Twitter and eBay.
IBM teams with Apple and others on AI health program using Watson
IBM announced alliances with Apple, Medtronic, and Johnson & Johnson to put artificial
intelligence to work drawing potentially life-saving insights from the booming amount of health data
generated on personal devices. IBM is collaborating with the companies to use its Watson artificial
intelligence system to give users insights and advice from personal health information gathered from
fitness trackers, smartphones, implants, or other devices. IBM wants to create a platform for sharing
that information.
“All this data can be overwhelming for providers and patients alike, but it also presents an
unprecedented opportunity to transform the ways in which we manage our health,” IBM senior vice
president John Kelly said in a news release. IBM also said it is acquiring a pair of healthcare
technology companies and establishing an IBM health unit.
Digital Alert Systems adds enhanced multilingual alerting for Emergency Alert Systems
Digital Alert Systems, a division of Monroe Electronics, introduced its DASDEC
OmniLingual Alert Module software, which gives the company’s DASDEC emergency messaging
platform enhanced multilingual alerting capabilities for Emergency Alert Systems (EAS), as well as
text-to-speech (TTS) in a wide variety of languages. The OmniLingual Alert Module provides
television and radio broadcasters with the option to transmit EAS alerts in multiple languages, or to
automatically add non-English alerts as post-alert audio to serve audiences with limited English
proficiency. There is support for Spanish, Portuguese, French, German, Italian, Polish, and
Lithuanian, among other languages. The module also provides a dedicated translation software
package that incorporates both text and TTS translation.
Digital Alert Systems indicated that there are more than 25 million people with limited English
proficiency in the U.S., and almost 61 million Americans do not speak English at home. Persons
with limited English ability account for 25% or more of the total population in seven U.S. cities.
Peterbilt introducing next generation SmartNav infotainment system
Peterbilt Motors Company announced the next generation of its in-dash SmartNav
infotainment system. The system features an expanded array of virtual gauges, auto-activated safety
cameras, improved hands-free calling, and the capability to provide real-time traffic and fuel price
information. Operators engage SmartNav through a touch-sensitive, full-color seven-inch display.
The company indicated that the speech recognition capabilities for hands-free calling had been
improved. The system includes Bluetooth connectivity and pairing with Bluetooth-enabled devices,
as well as controls for using certain devices.
“One of the key improvements to the new SmartNav system is its ability to be customized with
approved applications developed by Peterbilt, PACCAR or third parties,” says Scott Newhouse,
Peterbilt Chief Engineer. “For instance, future functionality could include integration with reefer
trailers or truck bodies to provide operational data, like reefer temperature. SmartNav’s new flexible
architecture allows it to be updated quickly with additional features and capabilities.”
Infobip adds inbound and outbound voice communications to its mobile services cloud
Infobip, a provider of mobile services to IT companies, start-ups, and app developers, announced
the launch of Infobip Voice, an enterprise-grade suite of cloud-based voice applications. Infobip
Voice lets enterprises and developers quickly and easily add voice capabilities to their existing
software through an API-powered cloud platform. With support for all major mobile services,
including SMS messaging, push notifications, and direct carrier billing, Infobip provides developers
with a wide range of tools. Silvio Kutic, founder and CEO at Infobip, explained, “As telecoms
services are increasingly becoming IP-based, introducing voice to our existing mobile services cloud
was the next logical step.”
Key features of Infobip Voice include support for both voice and SMS messaging over the same
DID (Direct Inward Dialling) number, inbound and outbound calls, text-to-speech capabilities for
voice messaging or voice-enabled two-factor authentication, and a web interface for service
management and analysis.
New Ford Galaxy includes SYNC 2 with voice control
The new Ford Galaxy includes the Ford SYNC 2 with Voice Control, which enables drivers to
operate phone, entertainment, climate and navigation systems using conversational language and
also features Emergency Assistance.
CogniToys toy dinosaur can answer questions and more
Powered by IBM Watson’s cognitive technology and its own speech-recognition, Elemental
Path’s CogniToys dinosaur can answer thousands of questions—and even tell jokes. A press of its
belly gets the conversation started. The first dinos will be available through Kickstarter in November
for $99.99 each and are expected to head to retail in 2016. The toy is designed to let parents monitor
their children’s progress and moderate content.
Elemental Path’s co-founder Donald Coolidge said that the company doesn’t see itself as a toy
company, however. “We’re more a technology company,” he said. The company plans to improve its platform and work with companies to build educational products, like the dinos, which have learning modules that become increasingly challenging.
ChatGrape launches search engine for specific apps and documents
The ChatGrape service uses natural language and close integration with the other applications whose information it accesses. It is designed to help teams communicate more efficiently, and in particular to reduce what co-founder and COO Leo Fasbender calls the “look-up factor” when linking to and referencing external information, such as a document, calendar entry, or code repository update, from within the app’s chat box.
ChatGrape has built what is essentially a search engine that indexes various tools and services
that you connect the app to. These include Box, DropBox, Google Drive, GitHub, BitBucket, Jira
and others. Typing # directly into a chat triggers ChatGrape’s smart autocomplete — now called the
‘Grape Browser’ — enabling you to quickly look up and link to/reference files or data from any
supported third-party service. The startup also offers an API, making it possible for companies to
integrate their own internal data into the chat app.
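The #-triggered lookup described above can be sketched as a simple query against a cross-service index. This is a hypothetical illustration, not ChatGrape's API: the function name, the index structure, and the substring-match rule are all assumptions.

```python
def grape_autocomplete(chat_text, index, max_results=5):
    """Hypothetical sketch of a '#'-triggered reference lookup.

    `index` stands in for a search index built over connected services
    (Box, Dropbox, Google Drive, GitHub, ...); each entry records a
    title and the service it came from. The real product's matching
    and ranking logic is not public.
    """
    if "#" not in chat_text:
        return []
    # Take the text after the last '#' as the query.
    query = chat_text.rsplit("#", 1)[1].strip().lower()
    hits = [item for item in index if query in item["title"].lower()]
    return hits[:max_results]

# Toy index mimicking a few connected services.
index = [
    {"title": "Q2 roadmap.docx", "service": "Google Drive"},
    {"title": "roadmap-review calendar entry", "service": "Calendar"},
    {"title": "payments-service repo", "service": "GitHub"},
]
print(grape_autocomplete("let's sync on the #roadmap", index))
```

Typing `#roadmap` in the chat box would surface the two roadmap-related items, which the user can then link inline without leaving the conversation.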
A5 Technologies uses speech recognition to teach English to Japanese speakers
A5 Technologies’ flagship product, A5 Pro, is an English-language training application for Japanese business professionals in the 20-45 age bracket who need better English to advance
their careers. It will be launched later this year as a consumer app and A5 is in discussion with
strategic partners in Japan to promote and market the product there. A speech recognition algorithm
allows learners to practice speaking in private, with instant feedback on their pronunciation.
Geppetto Avatars developing AI-based platform with avatars
Geppetto Avatars is an early-stage company with an AI-based platform that provides “human-like” interaction (avatars/agents) and mimics human sensors for the purpose of providing user
education, support, assessment, and surveillance. The company indicated it is working on algorithms
that will translate conversations, emotions, images, videos, audio, drawing, motion, location, and
taps into meaningful and valuable insights.
Tencent develops smartphone operating system
Chinese Internet service portal Tencent Holdings released an operating system for smartphones
and smartwatches as it tries to attract more of the 557 million Chinese accessing the Internet through
mobile devices. The software, called TOS+, includes voice recognition and payment systems. The
company will work with partners to integrate the software into devices including smart glasses.
Tencent is leveraging its ownership of China’s two most-popular instant messaging applications—
WeChat and QQ—to boost its efforts against Google’s Android. Asia’s largest Internet company,
Alibaba, also developed its own operating system, called YunOS.
Interactive Intelligence launches cloud services in Australia and New Zealand
Interactive Intelligence Group has launched its PureCloud Collaborate and PureCloud
Communicate cloud services for customers in Australia and New Zealand. First announced for U.S.
customers in March on AWS US East Region, these collaboration and communications cloud
services are now available from Amazon’s data center in the AWS Asia Pacific (Sydney) Region.
The services are the first to be offered from the company’s new multitenant, enterprise-grade PureCloud platform, which is based on a modern, distributed cloud architecture: a unified, single-platform cloud solution running applications for multiple use cases (collaboration, communications, and, next up, customer engagement). Functionality includes IP PBX capabilities, including auto-attendant, call recording, speech recognition, and unified messaging.
NSF grant supports Alelo research in teaching language and cross-cultural communication
with avatars and robots
With an award from the National Science Foundation (NSF), Alelo (a Hawaiian word that
means “language” or “tongue”) developed a range of products to teach cross-cultural communication
using virtual worlds. Alelo’s tools have already been applied in military training and English-as-a-second-language programs and are now being applied to a wide range of learning applications. The
company said that virtual role-play—where learners engage in simulated encounters with artificially
intelligent agents that behave and respond in a culturally accurate manner—has been shown to be
effective at teaching cross-cultural communication.
“You learn by playing a role in a simulation of some real-life situation,” said W. Lewis Johnson,
a former professor at the University of Southern California and co-founder of Alelo. “You practice
communication with some artificial intelligent interactive characters that will listen and respond to
you, depending on what you say and do. It helps develop fluency, but it also helps to develop
confidence.”
To develop a version to teach English as a second language (ESL) to those in the United States,
the researchers interviewed immigrants to determine what cultural issues they found most
problematic. The instruction they created—available in multiple languages—explains how to handle
different situations that one might face as a new immigrant in the United States and provides tips on
culturally appropriate behavior.
Statistics and Surveys
Mobile advertising revenue will top $60 billion globally in 2019
According to 451 Research, global mobile advertising revenue was $18.8 billion in 2014 (using exchange rates at the time). It is forecast to rise to $28.7 billion in 2015, reaching $61.4 billion in 2019.
Speech Analytics market reviewed
Grand View Research issued a market report, Global Speech Analytics Market analysis Size
and Segment Forecasts To 2020. Introduction of advanced technological tools enables organizations
to take action on unstructured data acquired from customer interactions, thereby enhancing customer
experiences, and gaining a competitive advantage. The report indicated that the technology works on
three approaches: direct phrase recognition, phonetic recognition, and Large Vocabulary Continuous
Speech Recognition (LVCSR). If you want to know more, you’ll have to pay a stiff price.
44% of US adults live in mobile-phone-only households
According to data released by GfK MRI in April, 44% of US adults lived in households with a mobile phone but no landline phone in 2014, compared to 26% in 2013. The percentage in 2014 rose to 64% for Millennials, while 45% of Gen Xers and 32% of baby boomers reported mobile-only usage.
Hispanics were also heavily mobile-phone-only, at 60%.
Voice search use rising
In publicity for their MindMeld API product, Expect Labs (SSN, January 2015, p. 24) cites the
following statistics:
§ Major search engines are seeing as much as 10% of their traffic coming from voice.
§ Over the past year, Google voice search use more than doubled.
§ In a survey of U.S. smartphone users, 55% of teens and 41% of adults use voice search every
day.
§ Search experts predict that, within five years, over half of all global search traffic will be driven
by voice.
The source of these figures was not provided.
Google will take 55% of search ad dollars globally in 2015
Google will take 55% of search ad dollars globally in 2015 and Baidu, the next largest player,
will take 8.8%, according to eMarketer. Digital ad spending worldwide will reach $170.85 billion
in 2015, according to new estimates from eMarketer. This year, search ads will account for $81.59
billion worldwide, an increase of 16.2% over 2014. By 2019, search ad spending will reach $130.58
billion globally, still growing at nearly 10% year over year.
Mobile ad spend to top $100 billion worldwide in 2016, 51% of digital ad market
The global mobile advertising market will surpass $100 billion in spending and account for more
than 50% of all digital ad expenditure for the first time in 2016, according to eMarketer. The US
and China will drive growth in the short term, accounting for nearly 62% of mobile ad spending
worldwide next year. The mobile ad market will grow to $196 billion in 2019, or 70.1% of digital ad
expenditure.
Facebook accounts for three-quarters of global social network ad spend
The global social network market continued to show strong growth in 2014, according to
Strategy Analytics’ Global Social Network Forecast. According to the report, globally, social
networks surpassed 2 billion users for the first time in 2014, of which Facebook accounted for 68%.
North America had the highest ratio of social network users to its population (64%) in 2014,
followed by Western Europe at 55%, but China accounts for almost 25% of global social network
users with 495 million users in 2014.
Ad spend on social networks grew 41% globally in 2014 totaling over $15.3 billion, accounting
for 11% of global digital ad spend. Facebook accounted for three-quarters of global social network
ad spend in 2014, while Twitter accounted for 8%. In 2015, ad spend on social networks is expected
to grow by 29%, totaling $24.2 billion.
Robotics sales flourish
In 2014, global robotics sales exceeded 200,000 units, surpassing the previous year’s 180,000
units, according to the International Federation of Robotics. The trend is expected to continue:
Global robotics spending should grow from US$15 billion in 2010 to US$67 billion in 2025, the
Boston Consulting Group reports.
US ad spending in 2015
According to eMarketer, ad spending in 2015 in the US will break down as follows:
(Billions)             Mobile    Desktop    Total
Search ad spending     12.85     12.82      25.66
Display ad spending    14.67     12.38      27.05
“Augmented reality” predicted to be four times bigger than “virtual reality” by 2020
Digi-Capital has released a report on augmented reality (AR), where glasses add a virtual overlay to the real world (e.g., Google Glass), and virtual reality (VR), where the wearable immerses you in a virtual world. VR and AR headsets both provide stereo 3D high-definition
video and audio. Digi-Capital admits the difficulty of forecasting a market that is in such early
stages, but nevertheless forecasts that AR/VR could hit $150 billion in revenue by 2020, with AR taking the lion’s share at around $120 billion and VR at $30 billion.
Web self-service surpasses phone in customer service channel preference
According to a new report from Forrester Research, consumers now rely on self-service more
than they rely on phone calls, with Web self-service use rising from 67% in 2012 to 76% in 2014.
The phone, on the other hand, has remained stagnant at 73% during the same span of time.
Presumably, these numbers reflect the fact that a user can use both channels at different times.
Contact centers are facing challenges as they transition into a new, self-service-driven business
model. When it comes to delivering support via chat and social media, for example, companies
aren’t deploying a proportional amount of resources to meet demand, Kate Leggett, Forrester analyst
and report coauthor, says. Two-thirds of contact centers offer chat support, and more than half
provide service through social media, but those percentages should be higher, she explains. When
these services are available, customers respond favorably—only 10% of chat users and 25% of
Twitter users are dissatisfied with the support functionality of these channels.
Microphone market to reach $1.81 billion by 2020
According to new market research by MarketsandMarkets, the total microphone market will be
worth $1.81 billion by 2020 at an estimated CAGR of 7.5%. The major drivers for the microphone market, according to the research report, are the increasing demand for consumer electronics devices, the low cost and compact size of MEMS microphones, and the increased number of microphones per device, among others. There are restraints in the market, such as difficult packaging and integration.
The increasing demand from emerging economies is one of the key opportunities for the microphone
market.
60% of consumers self-install smart home devices, but majority would prefer professional
assistance
Newly released Parks Associates research reports 84% of U.S. broadband households set up
their entertainment and computing devices on their own, while 60% of U.S. broadband households
set up their smart home devices on their own. The report finds that despite this independent
onboarding process, consumers’ need for tech support persists and will increase as they bring more
connected devices into the home.
“Consumers’ home networks are rapidly expanding through the adoption of complex connected
devices,” said Patrice Samuels, Research Analyst, Parks Associates. “For example, 27% of U.S.
broadband households owned a connected health device by the end of 2014. As consumers embrace
new categories of devices, support needs will increase dramatically. Support providers must invest
in new tools and solutions that minimize the burden on support resources.”
Tech support is a key factor in tying together the Internet of Things, according to Parks
Associates. Approximately 60% of U.S. broadband households have concerns over device security and data security when using connected devices.
Artificial Intelligence for enterprise applications to reach $11.1 billion in market value by
2024
According to a new report from Tractica, the market for enterprise AI systems will increase
from $202.5 million in 2015 to $11.1 billion by 2024. The market intelligence firm forecasts that
enterprise AI deployments will also drive significant investments in professional services such as
installation, training, customization, integration, and maintenance, along with additional spending on
IT hardware and services including computing power, graphics processor units (GPUs), networking
products, storage, and cloud computing. The technologies covered by the new report include cognitive
computing, deep learning, machine learning, predictive APIs, natural language processing, image
recognition, and speech recognition.
“While artificial intelligence has been just beyond the horizon for decades, a new era is
dawning,” says principal analyst Bruce Daley. “Systems modeled on the human brain such as deep
learning are being applied to tasks as varied as medical diagnostic systems, credit scoring, program
trading, fraud detection, product recommendations, image classification, speech recognition,
language translation, and self-driving vehicles. The results are starting to speak for themselves.”
“Cognitive computing” market projected to grow at 38% CAGR to 2019
MarketsandMarkets issued a market study on “cognitive computing,” described as systems that
“work on the principle of the neocortex, a part of human brain that helps humans in effective
decision making on the basis of contextual and behavioral analysis.” More specifically, technologies
include natural language processing (NLP), machine learning, and automated reasoning; and the
deployment model includes on-premises and cloud. In 2014, natural language processing (NLP)
accounted for the largest market share followed by machine learning, the company said. The
cognitive computing market is expected to grow from $2.5 billion in 2014 to $12.6 billion by 2019,
representing a Compound Annual Growth Rate of 38.0% from 2014 to 2019.
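Compound Annual Growth Rate figures like these can be checked directly from the endpoint values; a short sketch (the function name is our own):

```python
def cagr(start, end, years):
    # Compound Annual Growth Rate: (end/start)**(1/years) - 1
    return (end / start) ** (1.0 / years) - 1.0

# MarketsandMarkets figures: $2.5 billion in 2014 to $12.6 billion in 2019.
rate = cagr(2.5, 12.6, 5)
print(f"{rate:.1%}")  # -> 38.2%, consistent with the reported 38.0% CAGR
```

The same check applies to the microphone-market item above: growing at 7.5% annually compounds to roughly a 44% increase over five years.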
Financial Notes
Blinkx acquires All Media Network
blinkx pioneered Internet Video Search using its COncept Recognition Engine (CORE), which
leverages speech recognition and text and image analysis to understand the meaning and context of
video content to generate improved search relevancy for consumers and a brand-safe environment
for advertisers. On April 16, blinkx announced the completion of its acquisition of All Media
Network. Through All Media, blinkx gains access to a number of premium consumer properties,
including Sidereel.com, Allmusic.com, Allmovie.com and Celebified.com. blinkx expects the acquisition of All Media, an all-cash transaction funded in late March 2015, to be earnings accretive within the first full year post-acquisition. All Media represents a step forward in the company's strategy to
build a complete digital advertising platform for brand-safe, cross-screen advertising at scale.
Adacel acquires CSC’s NexSim ATC simulator business
On April 15, Adacel and Computer Sciences Corporation (CSC) announced they have signed
and closed a definitive agreement for the sale of the CSC NexSim ATC simulator line and associated
support services. Terms of the agreement were not released, but effective immediately Adacel has
acquired full rights to the NexSim product line and source code from CSC. Adacel is a developer of
advanced simulation and training solutions, speech recognition applications, and operational air
traffic management systems.
PeerTV acquires an interest in Speech Modules that gives it some exclusive rights outside
Israel
PeerTV on April 16 revealed a deal to cooperate with Speech Modules Holdings, which is
listed on the Tel Aviv Stock Exchange, in a deal that will see the two companies issue each other
new shares valued at GBP200,000. PeerTV will have exclusive rights in all countries outside of
Israel to market software and hardware products and services based on Speech Modules’ technology.
PeerTV is a vendor of end-to-end technologies and solutions for the OTT (Over-The-Top) TV market. The company combines standard cable and satellite TV features, such as live TV and Video-on-Demand, with new features, such as Internet browsing, social networking, gaming, and voice
communication.
The company uses Speech Modules speech recognition technology in a remote control (SSN,
February 2015, p. 14). PeerTV said that development of the advanced remote control units to be
used in home entertainment systems, including TV, audio, and games is proceeding as planned, and
PeerTV expects to start presenting that new product to customers later in Q2 2015.
PeerTV will issue Speech Modules shares equivalent to a 15.4% stake in the company, while
Speech Modules will issue PeerTV 7.3 million shares at NIS0.16 per share. PeerTV will therefore
have a 15.66% stake in Speech Modules and be able to appoint a director to Speech Modules’ board.
Speech Modules can sell the shares issued to it by PeerTV, although there will be restrictions in
place to prevent more than a quarter of the shares being sold in the space of one month.
Speech Strategy News
May 2015
35
SensorSuite raises capital for its wireless monitoring and energy saving solutions for large
buildings
SensorSuite Inc. provides wireless monitoring and energy saving solutions for multi-residential,
commercial, and industrial buildings. The company announced the successful closing of funding
from Extreme Venture Partners, BDC Capital, and private angels.
SensorSuite’s system overcomes the traditional installation barrier to retrofitting existing
buildings with energy-saving control systems: the rewiring. When compared with wired solutions,
SensorSuite delivers significantly shorter payback periods, especially in retrofits, and in some cases
is the only viable solution for electrically heated buildings. “Commercial and industrial buildings are
emerging as one of the largest opportunities for real-time analytics and intelligent control
applications,” said Robert Platek, Founder and CEO of SensorSuite. “SensorSuite’s innovations
enable property managers to save time while conserving energy.”
People
Interactions adds to management team, including former AT&T research personnel
Interactions announced the addition of five new members to its executive team, following the
2014 acquisition of the AT&T Watson speech recognition and natural language interpretation
technology and research program. Best known for its industry-leading customer care virtual assistant
solutions, Interactions is expanding its core solutions by developing technology to enable the
“Interface of Things,” a new generation of speech, touch, and text-enabled interfaces.
As a result of this recent growth, the following individuals have joined Interactions’ executive
team:
§ Jay Wilpon will serve as Interactions’ Senior Vice President of Natural Language Research.
With more than 150 published papers and patents in speech and natural language research to his
name, Wilpon is one of the world’s pioneers and a chief evangelist for speech and natural
language technologies and services.
§ David Thomson joins Interactions as the Vice President of Speech Research, where he manages
Interactions’ research and development teams.
§ Ben Stern is joining Interactions as Vice President of Software Systems R&D. With more than
25 years of experience advancing speech recognition and language understanding technologies,
including the development and delivery of the AT&T Watson engine, Stern is tasked with the
continued development of the Watson technology and professional services, including
packaging, performance tuning, features and enhancements.
§ Mahesh Nair has been promoted to Vice President of Engineering, bringing nearly 20 years of
experience designing and developing massively scalable systems in the Internet of
Things/Machine to Machine, Interactive Voice Response, Business Process Management,
Business-to-Business and financial domains.
§ Jane Price has been elevated to Vice President of Marketing. In that role, Price oversees
product and go-to-market strategy, and Interactions’ corporate brand and communications.
David Stone joins Inference as VP Sales, APAC
David Stone has joined Inference as VP Sales, APAC. Stone’s most recent role was with
Telstra as Group Manager for Contact Centre Solutions Sales. He will be based in Sydney, but will
be available throughout the Asia Pacific region.
Fonolo adds John Gengarella to its Advisory Board
Fonolo (p. 23) announced the addition of John Gengarella to its advisory board. Gengarella was
formerly CEO of Voxify, an enterprise SaaS solution for speech self-service. Following Voxify’s
merger with 24/7, he led the acquisition of Tellme from Microsoft and served as the Chief Revenue
Officer of 24/7. Gengarella previously held senior positions at Siebel Systems and Oracle.
Attensity appoints Cary Fulbright as Chief Strategy Officer
Attensity has appointed Cary Fulbright as Chief Strategy Officer. Attensity uses semantic
technologies and context-based discovery to analyze more than 150 million data sources. Fulbright
joins from Salesforce, where he served as Chief Strategy Officer. In his new role, Fulbright will
oversee a number of initiatives, including the formalizing of the company’s planning processes.
The firm has also promoted James Purchase from Vice President of Product Management to
Vice President of Business Development, and Nick Arnett has been promoted to Director of
Product Management.
For Further Information on Companies Mentioned in this Issue
Company | Location | Business | Contact info
24/7 Inc. ([24]7) | Campbell, CA | Customer service solutions | www.247-inc.com
A5 Technologies | Dublin, Ireland | Language learning software | www.a5pro.com
Accusonus | Kastritsi, Greece | Speech and music processing tools | www.accusonus.com
Acusis | Pittsburgh, PA | Medical transcription services | (412)209-1300; www.acusis.com
Adacel Inc. | Orlando, FL | Defense and air traffic control solutions | (407)581 1560; www.adacel.com
Adaptive A.I. Inc | El Segundo, CA | AI technologies used in customer service and other applications | www.adaptiveai.com
Advanced Media Inc. | Tokyo, Japan | Asian speech technology | +81 3 59581031; www.advancedmedia.co.jp
Aizan Technologies | Richmond Hill, ON, Canada | Hosted voice solutions | (905)882-5563; www.aizan.com
Alelo Inc. | Los Angeles, CA | Language training courses | (310)945-5985; www.alelo.com
Alibaba | China | E-Commerce | www.alibaba.com
All Media Network | San Francisco, CA | Digital content | http://allmedianetwork.com
Amazon | Seattle, WA | Product sales on the Web (and more) | www.amazon.com
Amazon Web Services | Seattle, WA | Technology infrastructure platform in the cloud | http://aws.amazon.com
Company | Location | Business | Contact info
Apache Mesos | — | Distributed Systems Kernel | http://mesos.apache.org
Apache Software Foundation | — | Open-source software, including NLP | www.apache.org; http://opennlp.apache.org
Apple | Cupertino, CA | Personal computers, music players, wireless phones | www.apple.com
Applied Voice Input Output Society (AVIOS) | San Jose, CA | Non-profit organization supporting quality speech application development | (408)323-1783; www.avios.com
Artificial Solutions | Stockholm, Sweden | Virtual assistant development tools | +46 8 663 54 50; www.artificialsolutions.com
AT&T | San Antonio, TX | Telecommunications services | www.att.com; www.wireless.att.com
AT&T Labs | Florham Park, NJ | Speech research | (973)360-8127; www.research.att.com
Attensity | Saarbrücken, Germany | Semantic annotation | +49 681 857 670; www.attensity.com
Auraya Systems | Canberra, Australia | Speaker authentication | +61 2 6201 5253; www.auraya.net; www.armorvox.com
Baidu, Inc. | Beijing, China | Web search in Chinese | http://ir.baidu.com
BDC Capital | Wakefield, MA | Financing | (781)928-1100; www.bdcnewengland.com
Blinkx | San Francisco, CA, and London, UK | Audio-Video Search | (415)655-1450; +44 20 8906 6857; www.blinkx.com; www.blinkx.tv
Boston Consulting Group | New York, NY | Business consulting | www.bcg.com
Brainasoft | — | Personal assistant software for Windows PC | www.brainasoft.com
Cadence Design Systems, Inc. | Berkshire, UK | Chip design | +44.1344.360333; www.cadence.com
CallFinder | Burlington, VT | Call Recording and Speech Analytics | (800)639-1700; www.mycallfinder.com
Cepstral, LLC | Pittsburgh, PA | TTS engine | (412)432-0400; www.cepstral.com
ChatGrape | — | Communications application | https://chatgrape.com
CogniToys (Elemental Path) | — | Interactive toys | www.elementalpath.com
Computer Sciences Corporation (CSC) | El Segundo, California | Healthcare information systems | (310)615-0311; www.csc.com
Company | Location | Business | Contact info
Convergys Corporation | Cincinnati, OH | Customer care and employee-benefit solutions | (513)723-7153; www.convergys.com
CPqD | Brazil | Research institute | www.cpqd.com.br/en_us
Digi-Capital | San Francisco, CA | Consulting and market analysis | http://www.digi-capital.com/
Digital Alert Systems (division of Monroe Electronics) | Lyndonville, NY | Emergency Alert Systems | (585)765-1155; www.digitalalertsystems.com
eBay | San Jose, CA | Internet auctions | www.ebay.com
eMarketer | New York, NY | Market research | (212)763-6010; www.emarketer.com
EPIC Connections | Omaha, NE | Contact center consulting and outsourcing services | (402)884-4700; http://epicconnections.com
Ericsson | Stockholm, Sweden | Telecommunications products | www.ericsson.com
Expect Labs | San Francisco, CA | Natural language interpretation and MindMeld API | www.expectlabs.com
Extreme Venture Partners | Toronto, ON, Canada | Venture capital firm | www.extremevp.com
Facebook | Palo Alto, CA | Social web service | www.facebook.com
Federal Communications Commission (FCC) | Washington, DC | US government agency | www.fcc.gov
Fonolo | Toronto, ON, Canada | IVR navigation service | (416)366-2500; www.fonolo.com
Forrester Research | Cambridge, MA | Market research | (617)613-6000; www.forrester.com
Fujitsu | Tokyo, Japan | Information and communication technology (ICT)-based business solutions | +81-3-6252-2220; www.fujitsu.com
Geppetto Avatars | Berkeley, CA, or Milwaukee, WI | AI-based platform providing human-like avatars | www.geppettoavatars.com
Getty Images | Chicago, IL | Image licensing | (312)344 4500; www.gettyimages.com
GfK MRI | New York, NY | Market research | (212)884-9200; www.gfkmri.com
Global Tel*Link (GTL) | Reston, VA | Correctional technology | www.gtl.net
Company | Location | Business | Contact info
Google | Mountain View, CA, and Cambridge, MA | Voice and directory search | (650)253-0000; www.google.com; www.google.com/mobile; www.grandcentral.com
Grand View Research | San Francisco, CA | Market research | www.grandviewresearch.com
HP (Hewlett-Packard) | Palo Alto, CA | Computer and software products and consulting | www.hp.com
I PRINT N MAIL | San Francisco, CA | Direct mail marketing | http://nextgen.iprintnmail.com
IAC | New York, NY | Web sites such as Dictionary.com | http://iac.com
IBM | Somers, NY | Information systems | (877)426-3774; www.ibm.com
ICICI Bank | India | Bank | www.icicibank.com
Inference Communications | Victoria, Australia | Speech recognition and voice automation applications | +61 1300 191 431; www.inferencecommunications.com
Infobip | London, UK | Mobile messaging and payments services | www.infobip.com
Intel Corporation | Santa Clara, CA | Semiconductors | www.intel.com
Intelligent Medical Objects (IMO) | Northbrook, IL | Clinical interface terminology databases | (847) 272-1242; www.imoonline.com
Interactions Corporation | Franklin, MA | Virtual agent services for call centers and speech recognition technology | (317)810-2800; www.interactions.net
Interactive Intelligence Group Inc. | Indianapolis, IN | Unified Communications and IVR | (317)872-3000; www.ININ.com
International Computer Science Institute (ICSI) | Berkeley, CA | Research institute | www.icsi.berkeley.edu
International Federation of Robotics | Frankfurt, Germany | Industry organization | www.ifr.org
Johnson & Johnson | New Brunswick, NJ | Healthcare products | (732)524-0400; www.jnj.com
Lyft | Spokane, WA | IT support | (509)789-5750; www.lyftsolutions.com
Company | Location | Business | Contact info
M*Modal | Franklin, TN | Speech recognition technology for healthcare transcription | (800)233-3030; www.mmodal.com
MarketsandMarkets | Dallas, TX | Market research | (888)600-6441; www.marketsandmarkets.com
Master Mobile Products | Apple Valley, CA | Medical dictation | www.medmastermobile.com
Meditech (Medical Information Technology, Inc.) | Westwood, MA | Integrated software solutions for healthcare organizations | (781)821-3000; www.meditech.com
Medtronic | Dublin, Ireland | Medical technology | www.medtronic.com
Microsoft | Redmond, WA | Various applications, products, and services | (206)454-2030; www.microsoft.com
Ministry of Internal Affairs and Communications | Japan | Government organization | www.soumu.go.jp/english/index
MongoDB | New York, NY | Database | www.mongodb.org
Mphasis (an HP Company) | India | IT services | www.mphasis.com
National Institute of Standards and Technology (NIST) | Gaithersburg, MD | Research awards and testing | www.nist.gov
National Science Foundation (NSF) | Washington, D.C. | Research organization | www.nsf.gov
NewsHedge | Chicago, IL | Financial news alert service | (312)532-9833; www.newshedge.com
Nexidia | Atlanta, GA | Audio content search | (404)495-7220; www.nexidia.com; www.nexidiatv.com
NICE Systems | Ra'anana, Israel | Multimedia analytics | +972 9 775-3777; www.nice.com
Nissan USA | Franklin, TN | Vehicle manufacturer | www.nissanusa.com
Nuance Communications | Burlington, MA | Speech technology, applications, and services | (617)428-4444; www.nuance.com
Numenta | Redwood City, CA | Machine intelligence | (650)369-8282; http://numenta.com
OKI Electric Industry Co. | Tokyo, Japan | Text-to-speech | +81 3 3580 8950; www.oki.com
Openstream, Inc. | Edison, NJ | Mobile Internet infrastructure platform and applications | (732)507-7030; www.openstream.com
Oracle Corp. | Redwood Shores, CA | Business software and hardware systems | (650)506-7000; www.oracle.com
Company | Location | Business | Contact info
Orion Communications | Dallas, TX | Workforce management software | www.orioncom.com
Parks Associates | Dallas, TX | Market research | (972)490-1113; www.parksassociates.com
PeerTV | Petach-Tikva, Israel | TV solutions | +972 9 740 7315; www.peertv.com
Peterbilt Motors Company | Denton, TX | Infotainment system | (940)591-4000; www.peterbilt.com
Philips Hue | — | LED light bulbs | www2.meethue.com/en-XX
Precyse | Wayne, PA | Medical transcription solutions using speech recognition | (610)688-2464; www.precyse.com
Precyse University | Wayne, PA | Professional education arm of Precyse | www.precyse.com/precyseuniversity
ReadSpeaker | Uppsala, Sweden | Voice-on-the-web services | +46 18 60 44 94; www.readspeaker.com
Roku Inc. | Saratoga, CA | Streaming Media Player | www.roku.com
Salesforce | San Francisco, CA | CRM and sales support software | (415)901-7000; www.salesforce.com
Samsung Electronics | Seoul, South Korea | Wireless telephones and TVs | www.samsung.com
SensorSuite | Mississauga, ON, Canada | Wireless monitoring and energy saving solutions | www.sensorsuite.com
Sensory, Inc. | Santa Clara, CA | Embedded speech recognition and speaker ID | (408)625-3300; www.sensory.com
Siebel Systems | San Mateo, CA | Customer relationship management software | (650)295-5000; www.siebel.com
SiriusXM | New York, NY | Satellite radio | www.siriusxm.com
SK Telecom | Seoul, South Korea | Mobile service provider | www.sktelecom.com
SmartAction | El Segundo, CA | Customer service automation | (310)776-9200; www.smartaction.com
SoundHound | San Jose, CA | Music identification and search | (408)441-3200; www.soundhound.com
Space Ape Games | London, UK | Mobile and tablet games | www.spaceapegames.com
Speech Modules | Nes Ziona, Israel | Speech recognition technology | +972 73 222 5555; www.speechmodules.com
Strategy Analytics | Newton, MA | Market reports | 617 614-0700; www.strategyanalytics.net
Company | Location | Business | Contact info
Suddenlink | — | Cable operator | www.suddenlink.com
TMA Associates | Tarzana, CA | Consulting, market studies, newsletters, and conferences in business implications of speech and telephone technology | (818)708-0962; www.tmaa.com
Tractica | Boulder, CO | Market intelligence focused on human interaction with technology | (303)248-3000; www.tractica.com
Translate Your World | Atlanta, GA | Translation services | www.translateyourworld.com
Twitter | — | Social text service | www.twitter.com
VERBATIM-VR | Hadera, Israel | Speech-to-text vocabulary software | www.VERBATIM-VR.com
VXI Corporation | Dover, NH | Microphones and accessories | (603)742-2888; www.vxicorp.com
WeMo (Belkin) | Playa Vista, CA | Home automation | www.belkin.com/us/Products/home-automation/c/wemo-homeautomation
WinScribe | Chicago, IL | Dictation solutions | (866)494-6727; www.winscribe.com
x.ai | New York, NY | AI-based virtual assistant | https://x.ai
Yandex | Moscow, Russia | Search and cloud speech recognition | www.yandex.com
YouMail, Inc. | Aliso Viejo, CA | Voicemail-to-text service | (800)374-0013; www.youmail.com
Blog (with a chance to comment!)
The Software Society (www.thesoftwaresociety.com)
THE HUMAN-COMPUTER CONNECTION
The role of the Top 1% in reducing income inequality
The US stock market: Good as gold?
Amazon Echo: Why you'll want one
Is the US stock market the new gold standard?
Apple’s next big thing isn’t a thing
Can Artificial Intelligence create a new non-technical job category?
Will mobile apps redefine the Web?
Artificial Intelligence: Hype or the next big thing?
Expanding brainpower—The next phase of economic growth?
The Productivity Paradox: Efficiency without Jobs?
Don’t underestimate Microsoft
Is speech recognition on mobile phones a big deal, or just a gimmick?
Chain reactions in technology
I wish to subscribe to Speech Strategy News, payable in US$ on a US bank—
Individual, PDF, 6 monthly issues: $215
Corporate, PDF, 6 monthly issues: $750*
Individual, PDF, 12 monthly issues: $425
Corporate, PDF, 12 monthly issues: $1,495*
* Corporate subscriptions: Unlimited users within a corporation for PDF version with Web access through corporate
password. Individual subscriptions cannot be shared (neither passwords nor electronic copies).
Please invoice me. Or go to www.tmaa.com/subscribetossn
Please send information on your consulting.
Name:
Company:
Address:
City, State:
ZIP/Postal code:
Country:
Phone:
Email (required for email alerts or a Web subscription):
Payment: Check enclosed, payable to TMA Associates (in U.S. $ on a U.S. bank); or Invoice me; or Charge my Visa, MasterCard, or American Express.
Card #:
Expiration date:
Signature: _______________________________________________
Copyright TMA Associates 2014; All rights reserved. TMA Associates, P.O. Box 570308, Tarzana, CA 91357-
0308 USA. Tel: (818) 708-0962.
Speech Strategy News is published twelve times per year by TMA Associates, Editor: William S. Meisel. Trademarks mentioned in this publication
are the property of the companies mentioned; they are used editorially. The material herein is based on data from sources believed to be reliable,
but is not guaranteed as to accuracy and does not purport to be complete. From time to time, the author or TMA Associates may have consulting
assignments, advisory positions, own stock, or have other business relations with organizations in speech recognition and associated areas,
including companies discussed in this newsletter. Speech Strategy News is a trademark of TMA Associates.