May 2015   ISSN 1932-8214   Editor, William Meisel

Google adds App Indexing
Lets Google index apps like websites and search within them

In a talk at the Mobile Voice Conference in April, "Grow with the Future of Search and Apps," Sunil Vemuri, Product Manager, Google, noted that an increasing number of searches are being done within apps on mobile devices rather than through web browsers, emphasizing that "mobile is not desktop." Google is supporting this trend with an "app indexing API" that allows independent apps with search functionality to be directly searched from within Google searches (including voice search) without first launching the independent app.

Vemuri explained to Speech Strategy News: "App Indexing lets Google index apps just like websites. Deep links to your Android app appear in Google Search results, letting users get to your native mobile experience quickly, landing exactly on the right content within your app. And, if users don't have your app installed, Google will surface an install button, directing them to the Play Store, where they can install your app." Further information on the app indexing API is at https://developers.google.com/app-indexing. Continued on page 20

David Nahamoo of IBM discusses "Cognitive Computing"
Automating and augmenting human intelligence to deal with "big data"

David Nahamoo, IBM Fellow, Conversational Systems, Watson Group, IBM, gave a talk on "Cognitive Computing: Automating and Augmenting Human Intelligence" at the Mobile Voice Conference in April, organized by AVIOS and Bill Meisel. Nahamoo claimed, despite the growth of data on the Web, that we are at an inflection point of the growth of "big data," with data from sensors/devices and social media leading the charge. He noted that the exponential growth of Moore's Law, which has changed our lives significantly, has physical limits, while the growth of data doesn't. He said that "data is the … Continued on page 21

SmartAction launches "Intelligent Voice Automation" for customer service
Conducts a context-sensitive dialog rather than using a pre-structured menu

Last month's Editor's Notes suggested that companies should design a "Customer Involvement System" (CIS), not an Interactive Voice Response (IVR) system, given the lack of flexibility and long series of menus associated with classical IVR systems. SmartAction in April suggested something along the same lines, an "Intelligent Voice Automation" (IVA) system that "provides the highest level of completion with artificial intelligence call automation."

In a statement, the company said that a traditional hosted IVR system generally lacks an underlying intelligence that directs the conversation and cannot link deeply with a company's database. This limits the breadth of applications that can be implemented without annoying customers with cumbersome menus or becoming a programming challenge. Continued on page 21

Recent blogs at TheSoftwareSociety.com (on the human-computer connection):
The role of the Top 1% in reducing income inequality
The US stock market: Good as gold?
Speech Strategy News, May 2015

Table of Contents

Google adds App Indexing ..... 1
  Lets Google index apps like websites and search within them
David Nahamoo of IBM discusses "Cognitive Computing" ..... 1
  Automating and augmenting human intelligence to deal with "big data"
SmartAction launches "Intelligent Voice Automation" for customer service ..... 1
  Conducts a context-sensitive dialog rather than using a pre-structured menu
Editor's Notes ..... 4
  Machine intelligence requires human intelligence to be effective
  Bill Meisel, Publisher & Editor
Mobile Voice Conference shows rapid progress in "the intelligent connection" ..... 5
  Always available natural language connection to computing resources and information
Amazon Web Services announces Amazon Machine Learning ..... 6
  Make predictions based on a database with no machine learning experience
NICE Systems announces analytics to improve the IVR experience ..... 7
  Speech analytics applied to optional customer feedback after a call
Auraya Systems releases "universal speaker recognition" system ..... 8
  Single license for all functionality
CallFinder speech analytics to be available to customers of EPIC Connections ..... 8
  Targeted at small- and medium-sized businesses
Indian bank to use Nuance voice authentication ..... 9
  Authentication doesn't require specific passwords
GTL addresses inmate identification on calls with biometric voiceprints ..... 9
  Detects speaker changes during a call in real time
NewsHedge launches financial news service ..... 10
  Runs as an app in a desktop browser using text-to-speech for audible alerts
Translate Your World launches new version ..... 10
  Translates voice or text into 34 languages in real time
  Applicable to teleconferences, education, and consultations with customers in stores
Ericsson launches closed captioning service in US ..... 12
  Live subtitling for broadcasters and operators
Sensory CEO discusses how "deep learning" relates to speech recognition and privacy ..... 12
x.ai creates a virtual assistant for scheduling meetings ..... 12
  Assistant analyzes email communications using NLP
Fujitsu ties written material to a spoken presentation ..... 13
Amazon introduces shopping app for Apple Watch ..... 13
  Voice search and 1-click purchasing
YouMail provides an answer for spam calls on mobile phones ..... 13
  App identifies spam calls and gives them a "number disconnected" message
M*Modal launches Clinical Documentation Improvement platform ..... 14
  Physician-friendly CDI software and services deliver improvements in medical documentation
Winscribe releases Quick Speech Recognition for healthcare professionals ..... 15
  Immediate speech recognition shows results to person dictating
Nuance has new clinical documentation tools for mobile devices and wearables ..... 16
  Joins Samsung at HIMSS to preview new dictation capabilities on Samsung Gear S Watch
MedMaster Mobility interprets physician dictation into mobile devices ..... 17
  Integrates Nuance healthcare speech recognition and NLP
Roku streaming TV player adds voice search for content ..... 17
  250,000 movies and TV episodes available for streaming
NIST machine learning challenge for language recognition ..... 17
  Based on the i-vector paradigm
Fujitsu introduces a communications tool for the hearing impaired ..... 18
  Speech converted to text in real time in meetings or classrooms
New NissanConnect Services program set to launch on 2016 Nissan Maxima ..... 18
  8.0-inch color display with multi-touch control and speech recognition
Brainasoft offers personal assistant for controlling a Windows PC ..... 19
  Speak or type text to do tasks such as play music, open programs, or dictate text
IBM tests Numenta's "brain algorithms" ..... 20
  Machine intelligence based on "principles of the neocortex"
VERBATIM-VR improves speech recognition by letting users report errors ..... 20
  Tunes speech-to-text for individual companies

News briefs ..... 22
  Speech recognition, image recognition, and machine learning top Google CEO's list of more important projects ..... 22
  Gates notes Microsoft's 40th anniversary ..... 22
  Orion adds IVR capabilities to its public sector workforce management software ..... 22
  Mphasis to use Artificial Solutions natural language technology in customer support ..... 23
  Convergys Analytics and Nexidia announce partnership ..... 23
  Cable operator selects Fonolo call-backs to improve the customer experience ..... 23
  Nuance voice biometrics chosen by SK Telecom ..... 23
  I PRINT N MAIL analyzes responses from direct mail with speech-recognition tools ..... 23
  Nuance, MEDITECH, and IMO collaborate to automate patient problem lists and support regulatory reporting ..... 24
  Lyft expands its offerings to include Nuance's Dragon Medical Practice Edition 2 for otolaryngologists ..... 24
  Acusis service enters patient encounter summaries into an Electronic Health Record ..... 24
  Accusonus speech enhancement available for Cadence DSPs ..... 25
  OKI microphone technology picks up sound in specific areas, using two microphone arrays ..... 25
  VXi Bluetooth headset includes noise-cancelling and voice prompts ..... 25
  Intel shows prototype smartphone with reduced-size RealSense technology ..... 26
  CPqD biometric authentication and Brazilian Portuguese speech recognition available on a new IBM chip ..... 26
  New Tensilica Fusion DSP from Cadence Design Systems features low energy use ..... 26
  Microsoft's browser update said to include Cortana ..... 26
  Microsoft's Skype Translator test preview adds new languages and other options ..... 26
  Getty Images and Microsoft partner to add images to products like Bing and Cortana ..... 27
  Cortana to recommend movies ..... 27
  Amazon Echo can now be used to control WeMo and Hue home devices ..... 27
  Amazon Echo adds podcasts ..... 27
  Siri's synthetic voice gets some improvements ..... 27
  Dictionary.com app supports Apple Watch with speech recognition to display definitions ..... 27
  SoundHound + LiveLyrics offers new Apple Watch app ..... 28
  Apple moves to Siri back-end built on open-source Apache Mesos platform ..... 28
  IBM teams with Apple and others on AI health program using Watson ..... 28
  Digital Alert Systems adds enhanced multilingual alerting for Emergency Alert Systems ..... 28
  Peterbilt introducing next generation SmartNav infotainment system ..... 29
  Infobip adds inbound and outbound voice communications to its mobile services cloud ..... 29
  New Ford Galaxy includes SYNC 2 with voice control ..... 29
  CogniToys toy dinosaur can answer questions and more ..... 29
  ChatGrape launches search engine for specific apps and documents ..... 29
  A5 Technologies uses speech recognition to teach English to Japanese speakers ..... 30
  Geppetto Avatars developing AI-based platform with avatars ..... 30
  Tencent develops smartphone operating system ..... 30
  Interactive Intelligence launches cloud services in Australia and New Zealand ..... 30
  NSF grant supports Alelo research in teaching language and cross-cultural communication with avatars and robots ..... 30

Statistics and Surveys ..... 31
  Mobile advertising revenue will top $60 billion globally in 2019 ..... 31
  Speech Analytics market reviewed ..... 31
  44% of US adults live in mobile-phone-only households ..... 31
  Voice search use rising ..... 31
  Google will take 55% of search ad dollars globally in 2015 ..... 31
  Mobile ad spend to top $100 billion worldwide in 2016, 51% of digital ad market ..... 32
  Facebook accounts for three-quarters of global social network ad spend ..... 32
  Robotics sales flourish ..... 32
  US ad spending in 2015 ..... 32
  "Augmented reality" predicted to be four times bigger than "virtual reality" by 2020 ..... 32
  Web self-service surpasses phone in customer service channel preference ..... 32
  Microphone market to reach $1.81 million by 2020 ..... 33
  60% of consumers self-install smart home devices, but majority would prefer professional assistance ..... 33
  Artificial Intelligence for enterprise applications to reach $11.1 billion in market value by 2024 ..... 33
  "Cognitive computing" market projected to grow at 38% CAGR to 2019 ..... 33

Financial Notes ..... 34
  Blinkx acquires All Media Network ..... 34
  Adacel acquires CSC's NexSim ATC simulator business ..... 34
  PeerTV acquires an interest in Speech Modules that gives it some exclusive rights outside Israel ..... 34
  SensorSuite raises capital for its wireless monitoring and energy saving solutions for large buildings ..... 35

People ..... 35
  Interactions adds to management team, including former AT&T research personnel ..... 35
  David Stone joins Inference as VP Sales, APAC ..... 35
  Fonolo adds John Gengarella to its Advisory Board ..... 36
  Attensity appoints Cary Fulbright as Chief Strategy Officer ..... 36

For Further Information on Companies Mentioned in this Issue ..... 36

Blog (with a chance to comment!) ..... 43
  The Software Society (www.thesoftwaresociety.com)

Editor's Notes
Machine intelligence requires human intelligence to be effective
Bill Meisel, Publisher & Editor

"Machine learning" is being delivered as a Web service, with Amazon being the latest entry (p. 6). Such services seem to imply that a company can just ship data to the services and get predictive algorithms with no expertise and minimal involvement. While there may be some applications where this is the case, it is likely to be the exception.

The key issue is how the data is described, the variables used as inputs to the predictive algorithm. Often, just using raw data is ineffective, and human understanding of the data is required to summarize it in terms of the "features" most relevant to making an accurate prediction. The objective is to describe the data with fewer variables, but variables that sufficiently describe the input conditions. To give an example familiar to most readers, speech recognition algorithms don't use raw digitized speech data, but reduce it to a description of the frequency spectrum (cepstral coefficients) calculated for each 10- or 20-millisecond slice of speech. We understand that it is the frequency spectrum and its change over time that distinguishes phonemes, the elements of speech.

Why not just use the raw data? Even a large amount of data can be sparse from a statistical point of view if the dimensionality is high, that is, if the data is described in terms of too many input variables. The reason is related to the problem of exponential growth in the number of data cases labeled with an associated outcome necessary to maintain a given density of data points (and thus statistical validity) as the number of variables grows.

Consider, for example, that we have a million labeled examples. If they are described by only one variable, then there are 500,000 samples in each half of the variable's range (to use a resolution much less than is usually the case). In this case, we have plenty of examples to calculate the difference in outcome when in the first half of the variable's range versus the second half. If we have two variables and divide each in half, there are four such ½ by ½ "boxes," and we have only 250,000 samples in each box (assuming they are evenly distributed). Carry this forward to 20 variables, and there are 2^20 (1,048,576) boxes, and less than one data point (one labeled example) available for each box, hardly enough to give us confidence in the statistical significance of the result (and, of course, we'd like more resolution in each variable than half its range). This illustrates what has been called the "curse of dimensionality"; it has been recognized as a real problem in empirical analysis since Richard Bellman, the inventor of "dynamic programming," coined the term in the 1960s.

Where human intelligence comes in is defining variables that summarize key aspects of the input data. In some cases, statistical methods can help. For example, "principal components" analysis can determine which of the original variables are linearly correlated and can provide a description of the data with a lowered dimensionality with some loss of resolution. The successful "i-vector" approach (p. 17) uses this idea. And some methods of machine learning use a layered approach (deep neural networks, for example) that could be interpreted as attempting to create such summarizing variables in early layers. But we shouldn't have blind faith in such interpretations; I suspect neural networks would have a hard time simulating the calculation of cepstral coefficients, for example. At the Mobile Voice Conference, one speaker said that, in one deep neural network trained on speech, the early layers seemed to be simulating a Fast Fourier Transform; my reaction would be, why not just begin with an FFT rather than an imperfect simulation?

The example I used of a 20-dimensional space being very big is admittedly a bit misleading. I assumed an even distribution of data throughout the space, which is not usually the case. Data tends to "cluster" in certain regions and be sparse or even absent in others. Knowing this, one can use "cluster analysis" ("unsupervised learning") to find a number of different areas where the data is similar. These clusters correspond to similar cases, and may be classes or subclasses of what we label. The advantage is that unsupervised learning, by definition, doesn't require data where the outcome we are seeking is known, so much more data may be available to find clusters than to find outcomes directly. In fact, a number of talks at the Mobile Voice Conference in April (following article) described training neural nets without labeling outcomes (clustering) and then using those nets on a smaller amount of labeled data to identify outcomes, with successful results.

Some might argue that this approach is not an application of human intelligence, but just another statistical method. Perhaps the border between machine and human intelligence is not a high wall. As in the case of speech recognition, a deep understanding of the data and its meaning may be required to come up with effective summarizing features or effective methods of using limited labeled data. Human intelligence is a more important part of the process than terms like "machine intelligence" or "artificial intelligence" suggest.

Mobile Voice Conference shows rapid progress in "the intelligent connection"
Always available natural language connection to computing resources and information

The fifth annual Mobile Voice Conference was held April 20-21 in San Jose, California, emphasizing the theme summarized in the graphic from the conference web site. The conference was created by the Applied Voice Input Output Society (AVIOS), with Bill Meisel, the writer of this newsletter and the Executive Director of AVIOS, creating the program.

The key technology that transforms speech recognition into speech understanding is Natural Language Processing (NLP).
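A side note on the box-counting arithmetic in the Editor's Notes above: it is easy to verify with a short script. This is a minimal sketch (Python is used here purely for illustration), assuming, as the column does, that the million examples are spread evenly over the cells.

```python
# Checking the "curse of dimensionality" arithmetic from the Editor's
# Notes: a million labeled examples, with each input variable
# quantized into just two halves of its range.
def samples_per_box(n_samples, n_variables):
    """Average labeled examples per cell when the input space is cut
    into 2**n_variables half-range cells (even distribution assumed)."""
    return n_samples / (2 ** n_variables)

for d in (1, 2, 20):
    print(d, "variables:", 2 ** d, "cells,", samples_per_box(1_000_000, d), "examples per cell")
# 1 variable:           2 cells, 500,000 per cell
# 2 variables:          4 cells, 250,000 per cell
# 20 variables: 1,048,576 cells, under one example per cell
```

The same million examples that look abundant in one dimension thin out to less than one labeled example per cell at 20 dimensions, which is the column's point.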
Many of the papers reflected the growing maturity of this technology. The technology can be applied to text as well as voice, and presentations such as that by David Nahamoo of IBM took this broader view, speaking of "cognitive computing" (p. 1).

A major theme was personal/virtual assistants: conversational agents ranging from general apps that go beyond web search and try to deal with anything you request, to specialized assistants for delivering customer service or employee efficiency. Rob Chambers of Microsoft gave a keynote address, "Relationships with Personal Assistants, from the Assistants' Point of View," discussing how Cortana and similar personal assistants must be designed with an understanding of how the user will behave. Sunil Vemuri of Google indicated a major expansion of the company's search functionality to searching within apps (p. 1).

In a talk, Bill Meisel asserted that a general assistant like Google's voice search, Apple's Siri, or Microsoft's Cortana will evolve into a universal interface that works on anything with a microphone and internet connection, unifying apps and devices with a personal assistant that adapts to an individual's specific needs. He suggested that, to the degree this interface replaces web search, every company will be expected to have a company app, just as today they need a web site. That company app will require a similar natural language interface for full compatibility.

A number of talks at the conference described such specialized virtual assistants. For example, Raj Tumuluri of Openstream, a conference sponsor, discussed "An in-store virtual assistant for retail workers and the platform used to build it." Other talks emphasized multimodal interfaces, combining voice and the graphical user interface of the device.
Jeff Rogers of Sensory Inc., another conference sponsor, spoke on "Combining Voice and Vision for an Improved Sensory Experience." Todd Mozer of Sensory revealed in a panel discussion that Sensory's embedded speech recognition solutions will soon go beyond "wake-up" words and voice control to support more flexible interactions on the device.

Speech-to-text technology has matured, although new approaches such as deep neural networks, discussed in a number of talks at the conference, provide the prospect of further advances. Natural language processing is less mature, and several speakers and panelists at the conference discussed what we can do now and how we can do better. For example, Phil Gray of Interactions Corporation, a conference sponsor, spoke on "Challenges associated with creating natural language interfaces."

Voice is a two-way interaction, and text-to-speech technology was also a feature of many talks. One development is increased flexibility in customizing voices, as discussed by Dan Bagley of Cepstral, a conference sponsor, in his presentation entitled "Enhanced flexibility in text-to-speech: Customization and personality."

Talking to "things" also appeared in a number of talks. For example, Yoryos Yeracaris of Interactions Corporation spoke about "The Interface of Things: delivering real-world natural language understanding solutions."

Bill Scholz, AVIOS president, summarized, "Presentations at this year's conference moved well beyond a focus on speech recognition and synthesis, into the challenging world of conversation and understanding, exploiting natural language and dialog management technologies. Applications have moved beyond merely recognizing words and sentences into understanding the meaning and intent of the speaker, even seeking further clarification through conversational interchange."

This summary only touches on the many subjects covered at the event.
The core theme is that the trends emphasized by the conference are proceeding quickly, driven by demand and intense activity by many companies. Many of the presentations at the conference will be available in PDF form at the AVIOS web site in May.

Amazon Web Services announces Amazon Machine Learning
Make predictions based on a database with no machine learning experience

Amazon Web Services (AWS) announced Amazon Machine Learning, a fully managed cloud service. The service lets a developer use historical data to build and deploy predictive models. The company claims that no machine learning experience is required.

AWS indicated that models can be used for applications such as detecting problematic transactions, preventing customer churn, and improving customer support. The technology is based on the same machine learning technology used by developers within Amazon to generate more than 50 billion predictions a week, the company said.

Jeff Bilger, Senior Manager, Amazon Machine Learning, said that the technology powers the product recommendations customers receive on Amazon.com, is what makes Amazon Echo able to respond to your voice, and is what allows Amazon to unload an entire truck full of products and make them available for purchase in as little as 30 minutes. Kara Hurst, Director of Amazon Global Sustainability, said Amazon Machine Learning is used to analyze customer feedback on packaging and create predictions to identify products that are suited for the company's "Frustration Free" and "eCommerce ready packaging" standards.

Amazon Machine Learning's APIs and wizards guide developers through the process of creating and tuning machine learning models. These models can be deployed and scaled to support billions of "predictions" (a predicted outcome given specific values of input variables used by the model).
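The article does not detail Amazon's actual APIs, but the train-then-predict workflow it describes can be sketched generically. In this hypothetical Python sketch, a toy nearest-centroid classifier stands in for Amazon's unpublished model internals, and the transaction data, feature layout, and labels are all invented for illustration:

```python
# Hypothetical sketch of a train-then-predict workflow: a batch
# "training" step summarizes historical labeled data, after which the
# resulting model answers one prediction at a time. The classifier
# here (nearest centroid) is a stand-in, not Amazon's algorithm.
from statistics import mean

def train(rows, labels):
    """'Training' step: compute one centroid per label from history."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids

def predict(centroids, row):
    """Real-time step: one prediction for one new input row."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lbl: dist(centroids[lbl], row))

# Invented historical transactions: [amount, overseas flag], labeled
# "fraud" if they turned out to be problematic.
X = [[120, 0], [15, 0], [980, 1], [22, 0], [1500, 1]]
y = ["ok", "ok", "fraud", "ok", "fraud"]
model = train(X, y)
print(predict(model, [1100, 1]))  # → fraud
```

The shape of the workflow is the point: a one-time training pass over stored data, then single real-time predictions, which matches the per-prediction pricing described below.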
Amazon Machine Learning is integrated with Amazon Simple Storage Service (Amazon S3), Amazon Redshift (data warehouse), and Amazon Relational Database Service (Amazon RDS), allowing customers to work with the data they've already stored in the AWS Cloud. A developer creates a predictive model using a database, the "training" step. The model summarizes the statistical conclusions implicit in the data. Once created, AWS also hosts the model and lets you use it to make predictions one at a time (in real time). Pricing is based on the number of such transactions.

With Amazon Machine Learning, developers can use the AWS Management Console or APIs to quickly create models and generate predictions from them with high throughput, without worrying about provisioning hardware, distributing and scaling the computational load, managing dependencies, or monitoring and troubleshooting the infrastructure. There is no setup cost, and developers pay as they go, so they can start small, getting into a beta test with a low investment.

Because high-quality data is critical to building accurate models, Amazon Machine Learning allows developers to visualize the statistical properties of the datasets that will be used to train the model to find patterns in the data. This saves time by allowing developers to understand data distributions and identify missing or invalid values prior to model training. Amazon Machine Learning then automatically transforms the training data and optimizes the machine learning algorithms so that developers don't need a deep understanding of machine learning algorithms or tuning parameters to create the best possible model.

In an example Amazon provided, a single Amazon developer using the Amazon Machine Learning technology was able in 20 minutes to solve a problem that had previously taken two developers 45 days to solve. (None of these developers had prior experience in machine learning.) Both models achieved the same accuracy of 92%.
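The pre-training data check described above, profiling a dataset for missing or invalid values, can be illustrated with a small sketch. Amazon's console presents this kind of summary visually; the plain-Python version below, with invented column names and rows, only shows the idea:

```python
# Hypothetical sketch of profiling a dataset before model training:
# count missing values and report the observed range per column, the
# kind of summary used to spot gaps and invalid values early.
def profile(rows, columns):
    """Return per-column counts of missing (None) values plus min/max."""
    report = {}
    for i, name in enumerate(columns):
        values = [r[i] for r in rows]
        present = [v for v in values if v is not None]
        report[name] = {
            "missing": len(values) - len(present),
            "min": min(present),
            "max": max(present),
        }
    return report

rows = [[34, 19.9], [51, None], [29, 402.5], [None, 18.0]]
print(profile(rows, ["age", "purchase_amount"]))
```

A surprising min/max (say, a negative age) or a high missing count is exactly the kind of problem worth fixing before the training step.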
A customer, Space Ape Games, a mobile and tablet gaming startup, has used the service. Toby Moore, CTO and co-founder, said that the service has been used to predict the types of content, such as live events and tournaments, that customers enjoy the most and let the game adapt to their play styles. "We've been very impressed with Amazon Machine Learning so far, and plan to deploy Amazon Machine Learning across multiple departments in our organization to help us build and deploy predictive models for our current and future games," he added.

The move by Amazon follows IBM's recent launch of hosted Watson Analytics and Microsoft's hosted Azure Machine Learning. Google's machine learning offering, Prediction API, was launched in 2012. Russian Internet search engine company Yandex offers Yandex Data Factory (YDF), based on the machine learning that it has developed internally for its own services, used for search, music recommendations, and speech and image recognition (SSN, January 2015, p. 10).

NICE Systems announces analytics to improve the IVR experience
Speech analytics applied to optional customer feedback after a call

NICE Systems announced the launch of IVR Journey Analytics, a solution designed to reduce customer effort and improve their experience with automated Interactive Voice Response (IVR) systems. The cloud-based IVR Journey Analytics solution is the third addition to NICE's Customer Engagement Analytics platform, which helps organizations sequence and visualize the customer journey to understand why customers are contacting them, to predict their next move, and to personalize the customer engagement.

According to NICE's 2013 Global Customer Survey, 73% of consumers use IVR. But at least half the time, they do not succeed in resolving their issue: one-third of those callers simply hang up, and the other two-thirds bypass the system or try to contact a live agent.
The system gathers insights from the customer journey prior to, during, and after the IVR interaction. It can determine certain patterns of behavior and then use this information to optimize the IVR experience by whittling down the menu options to provide only the options relevant to the particular journey. It also provides visual mapping so that any IVR service bottlenecks can be easily pinpointed and resolved.

Organizations can also use NICE's feedback solution for the IVR channel to solicit real-time customer feedback immediately following either an interaction that was contained in the IVR or an interaction that was handled by a contact center agent. Using speech analytics, they can better understand the customer experience and improve their systems. This could include customer service recovery, employee coaching, or process changes, depending on the customer feedback received.

"A weak link anywhere in the customer journey can shatter the entire experience, and the IVR is typically the first step in a service call," said Miki Migdal, President of the NICE Enterprise Product Group.

Auraya Systems releases "universal speaker recognition" system
Single license for all functionality

Auraya Systems' ArmorVox Speaker Identity System is a voice biometrics authentication and verification software system, designed for use by voice systems integrators and developers. The company released ArmorVox 2015, saying it is the world's first "Universal Speaker Recognition" system. It supports both text-dependent and text-independent speaker recognition, as well as gender detection. ArmorVox 2015 also has a modified front-end with better noise suppression and improved mobile phone and cross-channel performance.

Clive Summerfield, Auraya Founder and CEO, said, "By fusing technologies into a single software license, partners can implement active and passive voice biometric applications, fraud detection and tracking solutions, and gender detection all using a single software license." ArmorVox can be configured for cloud, customer premises equipment, or hosted authentication services. ArmorVox is available for Microsoft Windows and Linux operating systems.

In February, cloud-based voice automation firm Inference Solutions (p. 35) said it had integrated a voice biometrics solution developed by Auraya Systems into its software. The ArmorVox system has been integrated with the Inference Studio platform, enabling the latter's users to confirm caller identity with voice biometrics rather than through extensive security questioning.

CallFinder speech analytics to be available to customers of EPIC Connections
Targeted at small- and medium-sized businesses

CallFinder, a provider of cloud-based call recording and speech analytics solutions (SSN, April 2015, p. 10), announced a strategic alliance with EPIC Connections, a global provider of contact center consulting and outsourcing services. The company also partnered with Aizan Technologies, a cloud-based voice solutions provider in Canada.

EPIC Connections

EPIC clients can now use CallFinder to extract business intelligence contained in phone conversations with customers to improve the overall customer experience. Organizations using speech analytics can review the unstructured data contained in voice conversations with their customers. They can automatically categorize and analyze the calls to identify business patterns and trends. "CallFinder's solution is a potential fit for our clients in the SMB space, and for contact centers that operate under 100 seats," says Jim Grace, Director of Corporate Development at EPIC Connections.

Aizan Technologies

Aizan Technologies now provides Canadian companies with access to CallMiner speech analytics as a hosted service, while retaining data within Canada and without having to purchase and manage premise-based equipment. With CallMiner analytics integrated into Aizan's cloud platform, customers now have access to speech analytics functionality alongside call routing, IVR, and recording capabilities. Aizan offers a full suite of CallMiner functionality, ranging from automatically created scorecards to highly customized professional services.

"We are very excited to partner with Aizan Technologies," said Terry Leahy, CEO at CallMiner. "They have the ability to deliver a wide range of services to fine-tune and manage analytics to deliver maximum return on investment for their customers. Their level of service and support is unparalleled in the carrier space."

Indian bank to use Nuance voice authentication
Authentication doesn't require specific passwords

ICICI Bank in India is deploying voice-recognition technology for biometric authentication, using speaker authentication technology from Nuance Communications. Customers will be able to call and transfer funds to registered recipients or pay bills without having to enter card numbers or key in PIN codes, reports the Times of India. Customers enroll in "just 10 seconds" when they call the bank.

There are also plans to use voice data when customers use the service to identify if callers are agitated, in a hurry, or irritated, which could be used to avoid upselling or other activities on the call that might add to the irritation.

According to the article, the system will not require specific passwords for authentication. It will, however, require at least 35 seconds of voice before it can authenticate.

GTL addresses inmate identification on calls with biometric voiceprints
Detects speaker changes during a call in real time

Global Tel*Link (GTL), a provider of correctional technology solutions, announced the release of Voice IQ, a new feature for its inmate telephone platforms that solves issues of inmate identification on calls. Voice IQ uses voice biometrics to track and verify the identity of inmates and prevent fraud. Over 1.1 million inmates utilize GTL's inmate telephone services, and calls are controlled to avoid direction of criminal activities from within prison.

Voice IQ builds a voiceprint profile for each inmate and enrolls that print in its repository for comparison in future calls. During a call, Voice IQ continuously compares portions of the call to the recorded voiceprint to verify the inmate's identity. A specific icon is displayed with a time stamp in the monitoring system indicating when any speaker change activity was detected.

GTL also released VisMobile Add-On, an addition to its VisMobile video visitation application for Android smartphone and tablet users. This release, an addition to the VisMobile app that allows users to register for and schedule visits, gives users whose loved ones are incarcerated in facilities that allow Internet video visitation the opportunity to conduct video visits from their Android devices. Internet visitation with smartphones and tablets is said to incorporate all of the same safety and security features as GTL's other Internet and on-site video visitation technologies.

NewsHedge launches financial news service
Runs as an app in a desktop browser using text-to-speech for audible alerts

NewsHedge introduced an application that runs in a desktop Web browser called NewsHedge Squawk. It uses text-to-speech to alert users of financial news that might move a particular stock or the market in general. NewsHedge Squawk uses direct-access exchange feeds to process 8,000+ assets in real time, tick-by-tick.

A company spokesman touted the product's reliability and speed. "We've combined AJAX with compressed page updates across the entire experience to squeeze seconds from every breaking announcement," he said. "Our back-end is entirely Modern C++ through the critical path, making everything blazing fast."
The company charges a fee of $49 per month for the service. NewsHedge features the audio alerts in Squawk, although there is also a visual representation (see image). Kevin Evenhouse, founder and CEO, said, “NewsHedge Squawk not only delivers market information that’s notable and relevant in real time, but it does so audibly—the method of receiving information that human beings react to fastest. We’re not giving traders one more thing to look at. We’re giving them something to listen to. We combine our proprietary smart-detection algorithms with text-to-speech technology to literally tell you what’s notable and market moving.” Drew Dormann, co-founder and CTO, sees the NewsHedge Squawk HTML5 front end as a critical part of the Translate Your World launches new version Translates voice or text into 34 languages in real time Translate Your World, Inc. (TYWI) The software makes use of capabilities outside launched its version 2.0 software of its its application. TYWI has built-in autotranslation software, with no less an objective translation and can connect to other translation than to “change the way the world software. One of the advantages of TYWI is communicates,” according to owner Sue Reager. that the TYWI software provides personal The software is said to be capable of translating, dictionaries so that the user has increased transcribing into text, and speaking the results in control of the results of automated translation. 34 languages in real time. Reager said, “You can Speech recognition is accomplished with the use it to have people who speak different capability built into most devices, including the languages all have a business meeting together. speech recognition built into Microsoft The barriers of language and education have Windows or use of Nuance’s Dragon been lifted.” Chester Anderson, the company’s NaturallySpeaking. 
TYWI software works vice president of business development, added, harmoniously with speech recognition by “I firmly believe that technology can make both translating the text results, keeping track of an our lives and collectively the world a better audience and their preferences. The service can place to live and work in.” The company said then deliver to each audience member’s device that the software already has preorders lined up what each wants to hear as translated voice or from large companies. read as subtitles. TYWI translated voices are Speech Strategy News May 2015 11 delivered by the online text-to-speech company ReadSpeaker. The software supports “parrots” for situations where several people are speaking, with translation required. A “parrot” is a person who listens and repeats the speaker’s words in the original language into a headset or microphone. Parrots speak clearly and have trained TYWI software for their voices. The audience does not hear the parrot, only the TYWI software “hears.” TYWI will translate what the parrot says. The parrot can be an individual on a company’s staff or a TYWI pro. TYWI enables simultaneous interpretation on the web. A company can use its choice of interpreter or a pro from TYWI. The interpreter can be located almost anywhere in the world. Your audience chooses their output: a) to listen to the interpreter's voice or b) to read subtitles of what the interpreter says, created by the interpreter’s voice speaking into the software. The interpreter’s voice can be automatically translated as subtitles in other languages. Fujitsu ties written material to a spoken presentation Applicable to teleconferences, education, and consultations with customers in stores In an Internet audio/video presentation or only a few spoken words. 
When tested in a meeting, the person speaking may be discussing prototype system designed to automatically material available to those participating outside highlight the correct place in presentation the presentation. Fujitsu Laboratories materials, the technology was found to detect the announced that it has developed technology that, correct section with 97% accuracy. using speech recognition technology on the Fujitsu shared some of the technical speaker’s voice, detects in real time the challenges in developing the technology. They applicable area in presentation or remote- noted that a challenge in speech recognition is conference materials. The company expects the that many short words have similar technology to be used where information is pronunciation, which increases the likelihood of explained, such as teleconferences, electronic errors in recognition. Fujitsu addressed this education, and consultations with customers in problem by combining these short words with stores. The technology supports business the words located in their immediate proximity communications that are often based on and storing them in a speech-recognition supporting materials, such as pamphlets used for dictionary as single words. This reduced product explanations, meetings that follow an recognition errors by roughly 60% compared to agenda, or talks that use slides that are shared previous technologies, according to the with participants. company. Displaying a section of meeting materials, By statistically calculating the relationship product pamphlets, and other presentation between the sequence of a spoken presentation materials while that section is being discussed and the materials’ structural information, by the presenter is effective in promoting including layout, paragraphing, and location of understanding, Fujitsu notes. 
To be effective, it explanations, it became clear that when the is necessary to identify at a glance the place content being discussed exceeds a certain being explained within the materials. Fujitsu has “distance” from a point in the materials, the developed technology that compares spoken frequency that the spoken presentation words against the content of the presentation transitions to that place drops precipitously. materials. The technology uses characteristics of Using this sequential characteristic and the the presentation’s sequence based on statistical frequency of words contained in a given part of calculations to filter candidate sections of the the spoken presentation, this technology is able presentation materials, in order to accurately to filter the candidate supporting material for the identify the correct section in real time, based on next part of the presentation, and can accurately Speech Strategy News May 2015 12 infer a correspondence with the spoken presentation, even with only a few spoken words being recognized. Fujitsu aims to have a practical implementation of this technology in a remote communications-support system within 2015. In addition, when combined with other company technology, this technology has a broad range of potential applications to help businesses run more efficiently, such as giving support to operators in call centers by providing information related to frequently asked questions or providing information-desk support or educational support. x.ai creates a virtual assistant for scheduling meetings Assistant analyzes email communications using NLP x.ai is creating a Personal Assistant that can for everyone and confirms the information into schedule meetings. In the U.S. alone, the all parties’ calendars by sending out an invite. 
company said there are 87 million knowledge x.ai indicated that the software passes each workers who spend nearly five hours per week email through natural language processing and scheduling meetings. supervised learning engines that understand the x.ai launched two personal assistants, twins context of the information before it is enriched “Amy” and “Andrew.” Designed to interact and and stored in a database. Based on these perceive needs like a personal assistant, these inputs—plus relevant context such as user virtual assistants eliminate the tedious email scheduling preferences—the assistant ping-pong that accompanies arranging a determines the appropriate course of action and meeting. Users simply copy their virtual crafts a response using a set of dynamic email assistant on email with up to four individuals responses to set up a meeting based on a they wish to schedule a meeting, then the mutually agreeable time and place. assistant takes over and coordinates the MongoDB offers a database of the same schedules using natural language processing. name. The company announced that x.ai uses The assistant identifies the best time and place the MongoDB database in its virtual personal assistant. Ericsson launches closed captioning service in US Live subtitling for broadcasters and operators using speech recognition Ericsson’s closed captioning business is the the caption data after it has been broadcast. For largest in Europe. The company announced in example, the caption data can be used in content April the United States launch of a closed discovery or archive search. The platform is captioning service that displays text on a currently being used to deliver both live and television, video screen, or other visual display offline captioning services for major broadcast to indicate what is being spoken. The company clients, including the BBC and Sky. 
Ericsson has established a broadcast and media services also plans to roll out video description services hub based in Atlanta, Georgia, to provide closed for vision-impaired audiences in the U.S. over captioning and video description services to both the coming months. domestic and international clients. In August 2014, the Federal Ericsson’s closed captioning services will be Communications Commission (FCC) outlined delivered using the company’s enterprise-level new regulations intended to increase the quality software platform, developed in-house using of captioning services, and to provide smoother speech-recognition technology from an and more accurate closed-captioned undisclosed source. The service allows multiple communications across a wider-reach of captioners to prepare and deliver real-time programming. Key milestones were set by the services for clients while maximizing reuse of Speech Strategy News May 2015 FCC relating to the broadcast of English and Spanish-language programs in the United States: § As of January 15, 2015, video distributors and broadcasters are obliged to meet specific guidelines relating to accuracy, synchronicity, completeness and placement of captioning for online video content; § § 13 By January 1, 2016, clips lifted straight from a program and posted onto the Internet, known as straight-lift clips, must be captioned; and By January 1 2017, montages of video clips must be captioned. YouMail provides an answer for spam calls on mobile phones App identifies spam calls and gives them a “number disconnected” message It’s no surprise to readers that spam calls on voicemail. With Smart Blocking, users can just mobile phones are a problem, but YouMail, ignore calls when they’re not certain who is which provides cloud-based telecom services for calling, and YouMail takes care of the rest. consumers and small businesses (SSN, April Smart Blocking leverages the app’s huge 2015, p. 42), has quantified the problem. 
The volume of incoming calls each day to rapidly company announced the results of a public determine that an incoming number is a survey showing that despite the Do Not Call spammer placing unwanted calls. Registry, 74% of U.S. mobile phone users YouMail’s technology identifies spam callers wrestle with spam calls each month. (The by dynamically analyzing traffic patterns of calls government admits that they can pursue only a made to YouMail users, as well as feedback small fraction of the complaints they receive of from YouMail users. In this way, almost companies ignoring do-not-call instructions.) immediately after a new spam number is used to Data from the nearly 5 billion calls that YouMail make a call, YouMail users will appear to have has answered for its users shows that 10-15% of disconnected numbers to those callers. In a all missed telephone calls are considered spam sense, the app is using crowd-sourcing to make by their recipients and that the average person it unnecessary for each user to identify a call as receives an average of 25-30 spam calls/month. spam. Alex Quilici, CEO of YouMail, noted More data from the survey is available at that, since many spammers maintain shared lists http://blog.youmail.com/post/116391615477/the of disconnected numbers, this can rapidly and -youmail-spam-calling-survey. significantly reduce the volume of spam calls to But YouMail has gone beyond identifying the any YouMail user. size of the problem. They have what they think Users can also actively block any individual is a cure. The company has released Smart number, allowing full control over who can Blocking, a feature that automatically identifies leave a message and who will reach what spammers and fools them into thinking they’ve appears to be a disconnected number. In reached a disconnected number. 
addition, YouMail provides a variety of other To use Smart Blocking, YouMail users features including smart greetings, automatic download the YouMail app for iPhone or replies, and access to messages across different Android to replace their standard wireless carrier devices. Amazon introduces shopping app for Apple watch Voice search and 1-click purchasing Amazon has introduced a shopping app that features including 1-Click purchase and saving will be available on the Apple Watch in Canada, to a Wish List. The Amazon shopping app for China, France, Germany, Japan, U.S., and U.K. Apple Watch is a companion to the Amazon The shopping app addresses the small form mobile shopping app for iPhone. factor through voice search and quick tap Speech Strategy News May 2015 Paul Cousineau, Director of Mobile Shopping, explained, “There are times when it might not be convenient to get your phone out of your pocket. So we worked to distill the best parts of the Amazon shopping experience into fast and simple access points from your wrist.” The Amazon app for Apple Watch includes the following features: § Search the Amazon Catalog: The Amazon shopping app allows customers with an Apple Watch to search the Amazon catalog and find “glanceable” product information such as product name, price, shipping information, product images, and star ratings. § § § § 14 1-Click Purchase: With the 1-Click purchase feature on millions of eligible items, customers can conveniently go from search to purchase in seconds, making it even easier to order familiar items. Add to Wish List: Customers can quickly and easily add any item to their Wish List. Save a Shopping Idea: Make a note by simply saying it and save it for later review. Get More Information from iPhone: If Amazon customers want additional search results or more product information while shopping, they can simply use a “Handoff” feature and open the search or product detail page in the Amazon shopping app on their iPhone. 
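The crowd-sourced blocking that YouMail describes earlier on this page, flagging numbers that call unusually many users and folding in explicit user reports, can be sketched as a simple scoring rule. The class below is an illustration only: the thresholds, the callback heuristic, and all names are hypothetical, not YouMail's published logic.

```python
from collections import defaultdict

# Hypothetical thresholds; YouMail has not published its actual rules.
MIN_TARGETS = 50           # distinct users a number must call before we judge it
MAX_CALLBACK_RATIO = 0.02  # legitimate callers tend to get called back; spammers rarely do

class SpamScorer:
    """Toy crowd-sourced spam detector in the spirit of Smart Blocking."""
    def __init__(self):
        self.calls = defaultdict(set)      # caller -> distinct recipients reached
        self.reports = defaultdict(int)    # caller -> explicit user spam reports
        self.callbacks = defaultdict(int)  # caller -> times a recipient called back

    def record_call(self, caller, recipient):
        self.calls[caller].add(recipient)

    def record_report(self, caller):
        self.reports[caller] += 1

    def record_callback(self, caller):
        self.callbacks[caller] += 1

    def is_spammer(self, caller):
        targets = len(self.calls[caller])
        if self.reports[caller] >= 3:   # explicit crowd feedback wins outright
            return True
        if targets < MIN_TARGETS:       # not enough traffic to judge yet
            return False
        return self.callbacks[caller] / targets <= MAX_CALLBACK_RATIO

    def answer(self, caller):
        # Spammers hear a "number disconnected" intercept; everyone else reaches voicemail.
        return "disconnected-tone" if self.is_spammer(caller) else "voicemail"
```

The key property mirrors the article: once the shared traffic pattern crosses the threshold, every user benefits at once, with no per-user action required.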
M*Modal launches Clinical Documentation Improvement platform
Physician-friendly CDI software and services deliver improvements in medical documentation

M*Modal, a provider of clinical documentation and "Speech Understanding" solutions (SSN, April 2014, p. 31), announced a full suite of Clinical Documentation Improvement (CDI) software. M*Modal offers a general-purpose data aggregation, analytics, and physician engagement platform configured for CDI, which they say offers a cost-effective way for hospitals to enhance clinical report quality, productivity, and patient outcomes.

The company also announced enhancements to its Fluency Direct front-end speech software, where physicians immediately see the results of the speech recognition. Improvements in system customization, platform integration, speech understanding, and software management are intended to help physicians and healthcare professionals more quickly and easily create high-quality clinical documentation in Electronic Health Record (EHR) systems. In addition, the company announced it is utilizing its real-time Computer-Assisted Physician Documentation (CAPD) capability to deliver educational content from Precyse University.

Clinical Documentation Improvement
M*Modal's cloud-based CDI solutions are built into transcription, front-end speech, and back-end CDI specialist (CDIS) processes to improve the quality of the clinical note using existing systems and workflows. This platform integrates documentation improvement directly with the report creation process, identifying deficiencies, gaps, and improvement opportunities at the time of documentation. M*Modal's solutions enable central management of reporting requirements for higher case coverage and better efficiency. M*Modal helps healthcare organizations solve their CDI challenges in three ways:
§ Assess: Using natural language understanding technology to convert transcription data into information that is shareable, sortable, and searchable, hospitals gain access to large volumes of patient data trapped in the narrative. This allows them to identify documentation improvement opportunities and target actions related to physician education, certain conditions, etc.
§ Engage: M*Modal solutions embed CDI into the document creation workflow with automated, real-time feedback delivered to clinicians as they dictate or type into the EMR. The context-dependent feedback is less disruptive to physicians. M*Modal's solutions also educate physicians to document for better clinical care, compliance, and coding, including the upcoming ICD-10 standard.
§ Collaborate: M*Modal offers a back-end CDI system workflow management and clinical intelligence solution to boost efficiency, automation, and data centralization.

M*Modal's CDI solutions are available now, with flexible deployment models to run in different architectures, including Citrix-based desktop and application virtualization environments.

Front-end speech recognition
Fluency Direct is cloud-based software that allows healthcare providers to verbally create and edit patient narratives directly in EHR templates. The solution leverages M*Modal's speech understanding and natural language understanding technologies to combine accurate speech recognition with embedded clinical documentation improvement (CDI) capabilities. M*Modal's Fluency Direct software is interoperable with over 80 leading EHRs. Fluency Direct's Computer Assisted Physician Documentation (CAPD) is now integrated with M*Modal's CDI platform, providing real-time messaging and alerting to improve patient record accuracy. The amount of time necessary to personalize and train the system has been greatly reduced. There is also further enhanced deployment flexibility and scalability to allow expanded support for application streaming, virtual desktops, and thin clients.

M*Modal and Precyse
M*Modal is delivering educational content from Precyse University using its Computer-Assisted Physician Documentation (CAPD). This method engages and educates physicians as they document patient care in any electronic health record (EHR) system when using M*Modal Fluency Direct. Leveraging M*Modal Natural Language Understanding (NLU) technology, the CAPD capability automatically identifies common documentation deficiencies and delivers in-line feedback to physicians as they dictate or type the note, asking for required clarifications when appropriate to support documentation best practices. Integration with Precyse University adds context-sensitive, clinically relevant, and physician-specific educational content on conditions to ensure adequate and compliant documentation.

Specific deployment announced
M*Modal also announced a specific deployment. It is delivering transcription services, front-end speech recognition, and integrated Clinical Documentation Improvement (CDI) workflow management to Kindred Healthcare's Hospital Division. Kindred Healthcare, headquartered in Louisville, Kentucky, is the largest diversified provider of post-acute care services in the United States. Its Hospital Division serves patients at 97 transitional care hospitals.

Winscribe releases Quick Speech Recognition for healthcare professionals
Immediate speech recognition shows results to person dictating

Winscribe released Winscribe Quick Speech Recognition (QSR), a "front-end" speech recognition solution, as part of its medical documentation management software solutions for healthcare professionals. A front-end solution presents the results of the speech-to-text transcription immediately to the doctor dictating a report, making it immediately available for review and correction. This is in contrast to back-end solutions, where the result of the speech recognition is presented to a transcriptionist for review and editing. Immediate review while the case is fresh should lead to more accurate reporting. Doctors who simply dictate a text report sometimes prefer the back-end solution as saving them time, but a growing requirement for the report to be entered into a structured Electronic Medical Record (EMR) makes a free-form report less viable.

Winscribe QSR is claimed to make EMR data entry faster and easier. It is designed to enable physicians and other healthcare professionals to quickly create documentation, craft emails, enter data into Health Information Systems (HIS), and communicate with co-workers and patients more efficiently. QSR joins Winscribe's suite of speech productivity solutions, which includes enterprise-level medical documentation management, digital dictation, speech recognition workflow management, transcription, and mobile speech technology software solutions.

With the ever-increasing implementation of EMR systems in hospitals, Accountable Care Organizations (ACOs), clinics, private practices, and insurance providers, these systems continue to garner attention and speculation regarding their usability, the negative effects on physician productivity, and the loss of time available for patients. Speech recognition, on the other hand, has gained recent attention as a proven method for improving EMR usability, reducing documentation costs, and boosting the productivity levels of physicians and other medical staff.

Winscribe QSR offers real-time, front-end speech recognition technology that the company claims has a low edit rate, enabling clinicians to quickly perform data entry and generate other documentation with confidence. Winscribe said the product has an intuitive interface and is easy to use, requiring only a few minutes of training. Physicians simply dictate, review, and insert the recognized text, and then they are ready to move to the next field or task.

Winscribe QSR supports general and medical-specific vocabularies, which further enhance the accuracy of recognized text. Winscribe QSR's 'snippet' functionality also makes it simple to create macros and templates that providers can initiate with a unique voice command to insert commonly used phrases, such as discharge instructions, risk and benefit statements, and normal findings. Winscribe QSR has a centralized management console that can 'learn' and manage new words, phrases, and user profiles based on pre-existing group knowledge.

Winscribe QSR basically serves as a keyboard replacement that is adaptable and works with existing applications and any information systems that allow typed text entry, including Microsoft Office applications, Web browsers, and Health Information Systems. In addition, Winscribe QSR works with virtually any EMR.

Nuance has new clinical documentation tools for mobile devices and wearables
Joins Samsung at HIMSS to preview new dictation capabilities on Samsung Gear S Watch

Nuance Communications announced its newest innovations for bringing clinical documentation to smart devices, smart watches, and the Internet of Things at the 2015 Health Information Management Systems Society Annual Conference in April.

Nuance announced PowerMic Mobile, available in May 2015. The mobile app can turn any iOS or Android mobile phone into a secure dictation device that allows physicians to dictate, edit, and navigate within the Electronic Health Record simply by speaking.

In addition, Nuance has teamed with Samsung to develop a use case for PowerMic Mobile that will allow physicians to dictate directly into an EHR using Dragon Medical 360 and the Samsung Gear S smart watch. Nuance demonstrated Florence, its intelligent virtual assistant, on a Samsung Gear S. Florence provides a series of voice-driven clinical workflows that allow physicians to record vital signs, interact with patient alerts, document telephone encounters, and place medication, lab, and radiology orders.

Jonathon Dreyer, director of cloud and mobile solutions, Nuance Communications, said, "2015 has turned into the Year for the Internet of Things and the phenomenon is becoming firmly entrenched in healthcare." He noted that a new class of documentation tools "bridge conveniences found in clinicians' personal lives to the healthcare environment."

Nuance also showed its cloud-based medical speech recognition used with Metrix Health's Glass wearable. The combination enables surgeons to document operative notes during care, helping to communicate these critically important notes immediately rather than later from memory.

MedMaster Mobility interprets physician dictation into mobile devices
Integrates Nuance healthcare speech recognition and NLP

Master Mobile Products designs, develops, and deploys mobile healthcare applications optimized for the Apple iPad and iPad mini. The company announced MedMaster Mobility, which allows physicians to create structured data from dictation for Electronic Medical Record (EMR) systems using an iPad. It is said to be independent of the EMR system, automatically generating, for example, standard diagnosis codes in ICD9, SNOMED, and RXNORM formats from unstructured dictation. The solution eases physician adoption of EMR. Practitioners can create structured data using MedMaster's fully integrated Medical Speech Recognition and Clinical Language Understanding (CLU), based on technology from Nuance Communications' healthcare solutions. MedMaster claims the "only successful mobile implementation of Nuance's CLU engine" to date.

The company's basic MedMaster application includes full read-write mobile access to patient chart data, scheduling, creation of medical issues, SOAP notes, medications, vitals, messages, etc. With MedMaster Mobility, physicians can practice medicine 24/7 from an exam room, hospital room, or family room using multiple and diverse EMR providers.

Roku streaming TV player adds voice search for content
250,000 movies and TV episodes available for streaming

Roku provides a streaming platform for delivering TV entertainment. Roku streaming players and the Roku Streaming Stick are made by Roku and sold through retailers in the US, Canada, the UK, and Ireland. Roku also licenses a reference design and operating system to TV manufacturers to create co-branded Roku TV models.

In April, the company released new ways for consumers to find and discover streaming entertainment. A new Roku 3 streaming player has voice search and is faster than the previous release. Roku Search lets consumers search for movies, TV shows, actors, and directors, and receive all available results listed by price from top streaming channels. There are currently 17 channels, with 250,000 movies and TV episodes available for streaming. Roku Founder and CEO Anthony Wood said, "Now with a fast and fun way to search by voice, we've made the Roku 3—the best streaming player on the market—even better."

"Roku Feed" is a new feature that allows consumers to follow entertainment and get automatic updates on pricing and availability. Roku is launching the feature with a focus on "Movies Coming Soon," providing information on when a box office hit is available for streaming, which services offer the movie, and how much it costs.

NIST machine learning challenge for language recognition
Based on the i-vector paradigm

The National Institute of Standards and Technology (NIST) will coordinate a special "i-vector" challenge in 2015, based on data used in previous NIST Language Recognition Evaluations (LREs) and certain other sources. The challenge is intended to foster interest in this field from the broader machine learning community. It will be based on the i-vector paradigm widely used by state-of-the-art speaker and language recognition systems (a tutorial on the subject is available from Howard Lei of the International Computer Science Institute, ICSI) and will largely follow the approach taken in the recent NIST-coordinated Speaker Recognition i-Vector Challenge. By providing i-vectors directly, and not utilizing audio data, the evaluation is intended to be readily accessible to participants from outside the audio processing field.

This challenge focuses on the development of new methods for using i-vectors for language identification in the context of conversational telephone or narrowband broadcast speech. It is designed to foster research progress, including goals of:
§ Exploring new ideas in machine learning for use in language recognition,
§ Making the language recognition field accessible to more participants from the machine learning community, and
§ Improving the performance of language recognition technology.

The evaluation plan is available at https://ivectorchallenge.nist.gov. Challenge data will be made available on May 15.
Fujitsu introduces a communications tool for the hearing impaired
Speech converted to text in real time in meetings or classrooms

Fujitsu Limited and Fujitsu Social Science Laboratory Limited announced that, starting in mid-May, they will begin sales of LiveTalk, a communications tool for people with hearing disabilities, to companies and schools in Japan. LiveTalk is software designed for situations in which multiple people share information, such as meetings or classroom settings. It recognizes a speaker's speech using handheld and headset mics, immediately converts it into text, and displays it on multiple PC screens. The software uses AmiVoice SP2 speech-recognition software from Advanced Media, Inc., and was developed with a 2013 grant from the Japanese Ministry of Internal Affairs and Communications.

All participants, including people with hearing disabilities, can see the shared information in real time. LiveTalk also enables two-way communication, with built-in functions for PCs allowing text input and "stamp" tools (to insert emoticons and preregistered, frequently used, fixed phrases).

Even if multiple people speak at once, the text conversion is processed in parallel and displayed simultaneously, making it possible to accurately grasp the flow of a conversation. If there are any mistakes in the conversion of speech into text, they can be corrected on the PC. Text is transmitted in real time to all PCs connected to a given wireless LAN router environment. The software can also be used on tablet computers. Fujitsu said in a statement that, by promoting smooth communication between hearing-impaired people and those who can hear well, this software can be expected to broaden employment and educational opportunities.

New NissanConnect Services program set to launch on 2016 Nissan Maxima
8.0-inch color display with multi-touch control and speech recognition

The new 2016 Nissan Maxima features the new NissanConnect Services powered by SiriusXM. The connected services program features vehicle security, monitoring, and remote services. Every 2016 Maxima also includes standard NissanConnect infotainment features, providing access to Online Search with Google, SiriusXM Traffic, and SiriusXM Travel Link (fuel prices, weather, movie listings, stock info, sports). The new telematics program is Nissan's first in-vehicle launch of SiriusXM's connected vehicle services.

The services are accessed through a redesigned NissanConnect system that includes updated graphics on an 8.0-inch color display with multi-touch control and Nissan Voice Recognition. Every Maxima also includes a 7.0-inch Advanced Drive Assist Display, two front USB connection ports for iPod and other compatible devices, streaming audio via Bluetooth, and a hands-free text messaging assistant.

Three levels of services will be available on the new Maxima Platinum model when it goes on sale in the US in summer 2015. The base package includes emergency services and maintenance alerts. The mid-tier package adds remote control services including Remote Start and Remote Door Lock/Unlock, as well as monitoring alerts including Valet Alert and Curfew Alert. Alerts include vehicle speed, curfew (with available notification to the driver 20 minutes before the curfew alert), valet alerts (if the vehicle is more than two miles from drop-off), and geographical boundaries if set. The full package adds to the suite of concierge services, including Assisted Search, Connected Search, and Journey Planner.
NissanConnect links users to Cloud Services three ways: (1) beamed in through radio and satellite, (2) brought in through smartphone, and (3) built in with a cellular-network embedded telematics control unit (TCU). Nissan notes that the multiple connectivity options help in the case of an emergency, including connecting to a live person for assistance. Remote Access provides access to the automobile through a compatible computer or smartphone. Services include remote door lock and unlock, remote engine start, and remote horn and flashing lights to help find the Maxima in a garage or parking lot.

Brainasoft offers personal assistant for controlling a Windows PC
Speak or type text to do tasks such as play music, open programs, or dictate text

Brainasoft offers Braina (derived from "Brain Artificial"), personal assistant software for Windows PCs. Braina is designed to let you control your PC using what the company calls "natural language" commands, although keywords are required for many commands. You can either type commands or speak to the assistant. A Braina for Android app supplements the software on the PC by letting you interact with your computer over a WiFi network. Recently, the company had only four employees, so the software obviously makes extensive use of the functionality built into the Windows OS and Android OS, as well as external services.

The company says that Braina allows you to easily dictate (speech-to-text), update social network status, play songs and videos, search the web, open programs and websites, find information, and more. More specifically, commands include the following functionality:
§ Play Songs - For example, just say, "Play All You Need is Love" or "Play Neil Diamond" and Braina will play it for you from anywhere in your computer or even the web.
§ Dictate to any Software or Website - Use a speech-to-text feature in third-party programs like Microsoft Word using Dictation mode.
§ Play Videos - For example, say "Play video Godfather."
§ Calculator - Do calculations by speaking, e.g., "45 plus 20 minus 10."
§ Dictionary and Thesaurus - E.g., "Define encephalon," or "What is intelligence?"
§ Open and Close any Programs - E.g., "Open notepad," "Close notepad."
§ Open and Search Files and Folders - E.g., "Open file studynotes.txt," "Search folder authentication."
§ Control a PowerPoint Presentation - Say "next slide" or "previous slide."
§ News and Weather Information - E.g., "Weather in London," "Show news about Cortana."
§ Search Information on the Internet - E.g., "Find information on Thalassemia disease," "Search Dodgers score on Google," "Search for Albert Einstein on Wikipedia," "Search images of cute puppies."
§ Set Alarms - E.g., "Set alarm at 7:30 am."
§ Remotely Shutdown Computer.
§ Notes - Braina can remember notes for you, e.g., "Note I have given 550 dollars to John."

IBM tests Numenta's "brain algorithms"
Machine intelligence based on "principles of the neocortex"

Numenta is a company founded by Jeff Hawkins, founder of the company Palm, a developer of "Personal Digital Assistants" (PDAs, early versions of handheld computers), which was purchased by HP in 2010. The company's web site summarizes its goals as follows: "Numenta has developed a cohesive theory, core software technology, and numerous applications all based on principles of the neocortex. This technology lays the groundwork for the new era of machine intelligence. Our innovative work delivers breakthrough capabilities and demonstrates that a computing approach based on biological learning principles will make possible a new generation of capabilities not possible with today's programmed computers."

According to MIT Technology Review, IBM has established a research group to work on Numenta's learning algorithms at its Almaden research lab in San Jose, California, with a group of about 100 called the Cortical Learning Center. The group is working on designs for computers that would implement Hawkins's ideas in hardware. The approach is to stack multiple silicon wafers on top of one another, with physical connections running between them to mimic the networks described by Numenta's algorithms. The IBM group is reportedly also working on using Numenta's algorithms to analyze satellite imagery of crops and to spot early warning signs of mechanical failures in data from pumps or other machinery.

The IBM Research web site indicates that Winfried Wilcke, Sr. Mgr, Nanoscale Science & Technology and Distinguished Research Staff, is leading the effort. At a conference in February, Wilcke claimed Numenta's software was closer to biological reality than other machine learning software. He said Numenta had struck a balance between taking cues from biology and making software that is practical.

VERBATIM-VR improves speech recognition by letting users report errors
Tunes speech-to-text for individual companies

VERBATIM-VR Ltd. has announced software, supporting speech recognition dictation products, that is said to allow easy tuning of company- or industry-specific vocabularies for all users within a company. The key idea is that the speech recognition software can be tuned to the company's specific business and specific products by what might be considered crowdsourcing: employees report speech recognition errors, and software updates all employees' speech recognition to reduce similar errors. (See image.) The company declined to indicate what speech recognition technology they support at this time.

The company claims current speech recognition applied to verticals such as tax law, radiology, and banking/customer service has 7-10% error rates, which they state they can substantially reduce. A request for the source of these error rates referenced a talk by a Microsoft researcher in 2012. There is apparently more that will be announced later.

(Image: Verbatim-VR operation)

Google (cont.)
Continued from page 1

Google search has evolved into a personal assistant that goes beyond providing a list of web sites to trying to provide a direct answer, in part with "Knowledge Graphs." Vemuri explained, "We've built APIs that are easy to integrate with and that allow you to capitalize on all of Google's work in natural language recognition, as well as our semantic understanding of people, places, and things—through the Knowledge Graph."

IBM's Nahamoo (cont.)

Nahamoo contrasted "analytic systems" such as web search with the deeper analytics of cognitive systems. The first can be defeated by the "static" of big data, whereas cognitive systems such as IBM's Watson have the capability of filtering out that static. He indicated that the technology behind Watson is a "massively parallel probabilistic evidence-based architecture." The software generates and scores many hypotheses using a combination of natural language processing, information retrieval, machine learning, and reasoning algorithms. These gather, evaluate, weigh, and balance different types of evidence to deliver the answer with the best support it can find.

Nahamoo also indicated, during a Q&A session, that the speech recognition now in the Watson cloud benefits from continual improvement over the five years that the company collaborated with Nuance (an agreement that has expired, allowing IBM to sell the technology directly). He said the current version in Watson is currently considered a beta version. The slides of Nahamoo's presentation will be available at http://avios.org/?page_id=2386.
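The generate-and-score approach Nahamoo describes — many candidate hypotheses, each supported by evidence that is weighed and balanced before the best-supported answer is delivered — can be illustrated with a toy sketch. This is not IBM's implementation: the feature names and weights below are invented for illustration, and Watson's actual pipeline learns its evidence weights from training data across hundreds of scorers.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    answer: str
    evidence: dict  # evidence-feature name -> score in [0, 1]

def support(hyp, weights):
    # Weigh and balance the evidence features for one hypothesis.
    return sum(weights.get(name, 0.0) * score
               for name, score in hyp.evidence.items())

def best_answer(hypotheses, weights):
    # Rank every candidate and deliver the best-supported answer.
    return max(hypotheses, key=lambda h: support(h, weights))

# Invented example: two candidate answers scored by three evidence sources.
weights = {"passage_match": 0.5, "type_match": 0.3, "popularity": 0.2}
candidates = [
    Hypothesis("Toronto", {"passage_match": 0.4, "type_match": 0.2,
                           "popularity": 0.9}),
    Hypothesis("Chicago", {"passage_match": 0.8, "type_match": 0.9,
                           "popularity": 0.6}),
]
print(best_answer(candidates, weights).answer)  # prints "Chicago"
```

The point of the architecture is that no single analytic has to be right: a hypothesis with strong textual and type evidence can outrank a merely popular one, because many weak signals are combined probabilistically rather than trusted individually.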
IBM's Nahamoo (cont.)
Continued from page 1

"...next natural resource." Nahamoo said "cognitive systems" learn and interact naturally with people to amplify what either humans or machines could do on their own. They help us solve problems by penetrating the complexity of Big Data. He claimed that this trend will change traditional Information Technology (IT), allowing it to deal with unstructured as well as structured data, with natural language supplementing machine language. He asked what if:
§ The time to discover new sources of energy went from 2 years to 3 months?
§ Every patient had a full-time, dedicated staff of medical specialists?
§ Every citizen in need had a full-time, dedicated, expert case worker?
§ Every soldier doubled their ability to sense, reason, plan, and act?
§ Legal documents could be reviewed for consistency and accuracy with precedent?
§ Every US worker had an expert assistant dedicated to their success on the job?
§ Every student had a personal, full-time, world-class tutor?

SmartAction (cont.)
Continued from page 1

SmartAction claims its Intelligent Voice Automation (IVA) accurately recognizes speech, understands callers' meaning and intent, and remembers the evolving context of each conversation. IVA dynamically responds with personalized, context-relevant, accurate answers, making it more likely a customer will complete their transaction without requiring an agent. And when IVA can't complete a transaction, it captures and provides all relevant call information to live agents, making the call flow more efficient and satisfying to customers.

SmartAction indicated that its artificial intelligence call automation automatically incorporates generalized improvements from other customers. This and general R&D upgrades are reflected in improved performance over time. The AI technology that powers IVA was developed by Adaptive A.I. Inc., SmartAction's parent company.
The company doesn't indicate the specific source of its speech recognition, saying on its web site that the system "uses a speech recognition technology with the most advanced open, natural language, speech recognition system available." The web site indicates that the system "evaluates multiple hypotheses from the speech recognition engine and selects the most likely interpretation based on context."

News briefs

Speech recognition, image recognition, and machine learning top Google chairman's list of most important projects
At a conference in April, Google chairman Eric Schmidt said that there are three projects that rank above all others in importance: speech recognition, image recognition, and machine learning.

Sensory CEO discusses how "deep learning" relates to privacy
In a blog entry, Todd Mozer, CEO, Sensory, discussed how big data and privacy relate to "deep learning" (deep neural nets). Mozer notes that a lot of the Big Data is personal information used as the data source for Deep Learning. Basically, Deep Learning is neural nets learning from your personal data, stats, and usage information, he indicated. This is why when you sign a EULA (end user license agreement), you typically give up the rights to your data, whether it's usage data, voice data, image data, personal demographic info, or other data supplied through the "free" software or service. One reason the data is collected is to improve the speech recognition; another is because the speech recognition is sufficiently complex to require cloud-based processing. (Mozer noted that "Sensory will change this second point with our upcoming TrulyNatural release!") When data is retained to improve speech recognition or natural language understanding using Deep Learning, it runs the same risk as any data held by a company and supposedly protected, Mozer notes. And given that many large companies have been hacked, protection is difficult to guarantee.
Mozer indicated that Sensory will also attack the first point with the company's "embedded" approach to deep-neural-net-based speech recognition, which it will soon be bringing to market. Sensory uses Deep Learning approaches to train its nets with data collected from EULA-consenting and often paid subjects. The company then takes the recognizer built from that research and runs it on its OEM customers' devices, and because of that, never has to collect personal data.

Gates notes Microsoft's 40th anniversary
Microsoft was founded on April 4, 1975, 40 years ago. In a letter to Microsoft employees, Bill Gates, among other things, said he believed "computing will evolve faster in the next 10 years than it ever has before…We are nearing the point where computers and robots will be able to see, move, and interact naturally, unlocking many new applications and empowering people even more." Gates said he was impressed by the vision and talent he sees in product reviews he participates in. He wrote, "The result is evident in products like Cortana, Skype Translator, and HoloLens—and those are just a few of the many innovations that are on the way."

Orion adds IVR capabilities to its public sector workforce management software
Orion Communications, a provider of public sector workforce management software and services, announced the addition of IVR technology to its web-based AgencyWeb software. By combining the AgencyWeb workforce management solution with a robust IVR platform, public sector agencies are able to extend many of the system's capabilities to field personnel or those with no computer access. Daily scheduling, court event management, disaster planning, and day-to-day workforce management tasks are automatically delivered over a high-performance IVR platform. AgencyWeb IVR is built using Voice over IP (VoIP) technologies and supports VoiceXML, CCXML, SIP, TTS, and speech recognition. It is available in either on-premise or cloud deployments.
Mphasis to use Artificial Solutions natural language technology in customer support
Artificial Solutions, which provides natural language interaction solutions (SSN, September 2014, p. 11), announced a partnership with Mphasis, an IT services provider. Mphasis will integrate Artificial Solutions' digital agents, natural language analytics, and other components from the company's Teneo platform into its Customer Experience Management (CEM) solutions to enable its clients in the banking and insurance industries to deliver a better customer experience. "Our technology will support Mphasis with the natural language capabilities required for its Customer Experience Management solutions," said Lawrence Flynn, CEO at Artificial Solutions. "The Teneo platform will transform how customers communicate with organizations in the banking and insurance sector." The Mphasis CEM solution offers an integrated enterprise-wide approach to managing customer experience by leveraging continuous analysis of customers' behaviors and processes. It focuses on identifying, anticipating, and satisfying customers' needs across all touch points.

Convergys Analytics and Nexidia announce partnership
Convergys announced a strategic partnership with Nexidia. The partnership will combine Convergys' customer experience analytics expertise with Nexidia's speech analytics technology. Convergys Analytics will use the technology to offer its clients the ability to gain access to the often-untapped unstructured customer feedback found in spoken conversations between contact center agents and customers. Convergys' analysts will leverage the data to uncover, explore, and recommend corrective action to resolve clients' underlying business issues impacting the customer experience.
Cable operator selects Fonolo call-backs to improve the customer experience
Fonolo, which replaces call center hold time with a call-back, announced that Suddenlink, the seventh largest cable operator in the US, has selected its call-back solution. By adding Fonolo's InCall Rescue solution to its call center, Suddenlink's customers can now choose to "press 1 to receive a call-back" instead of waiting on hold, without losing their place in line. When their turn arrives, the customer's phone will ring and a live agent will be on the line.

Nuance voice biometrics chosen by SK Telecom
Nuance Communications announced that SK Telecom, a mobile service provider in South Korea with over 28 million subscribers, has deployed Nuance's VocalPassword voice biometric solution to provide authentication for its customers. When a customer dials into the contact center, they are asked to speak the predefined phrase, "At SK Telecom, my voice is my password," to be authenticated into their account, and can then speak with a representative about their inquiry or request. "When it comes to authentication, we've seen that PINs, passwords, and security questions are leaving accounts vulnerable and can no longer be considered a safe and secure method," said Robert Weideman, executive vice president and general manager, Enterprise Division, Nuance.

I PRINT N MAIL analyzes responses from direct mail with speech-recognition tools
I PRINT N MAIL, a direct mail marketing firm, introduced the NextGen Direct Mail program. The service includes lead tracking, recording, analytics, and speech recognition to help marketers assess a campaign's performance and improve conversion rates based on calls generated by direct mail. Under NextGen, after the mailers are sent, each inbound call from prospects is tracked and recorded, and marketers can access the information in real time. NextGen has speech recognition technology that can follow a phone conversation between a sales rep and the prospect.
On a call-by-call basis, it then assigns a score on how well the rep does. The program also rates how ready the prospects are, based on their tone of voice and the questions they ask. The company claimed that, with the tool, marketers can quickly discover which territories work best, what products have the biggest potential, what the most frequent questions are, and which promotions are most intriguing. Big score gaps between the sales rep and the prospect (for example, a "hot" prospect vs. an underwhelming sales rep) can be flagged so a supervisor can immediately attempt to "save" the lead and convert the prospect into a customer. An irate customer call can also be flagged so a manager can join in and help defuse the situation.

Nuance, MEDITECH, and IMO collaborate to automate patient problem lists and support regulatory reporting
Nuance Communications announced a collaboration with Intelligent Medical Objects (IMO) and MEDITECH to take rich clinical narratives from physician notes, extract key facts as structured data, and automatically populate patient problem lists and allergy lists using physician-friendly terminologies to preserve the clinical intent of physician documentation in the Electronic Health Record (EHR). This approach enables healthcare providers to translate unstructured narrative text into clinically accurate, discrete data in the EHR to support patient problem lists for "Meaningful Use" (the federal requirement for support of EHR technology) as well as requirements for ICD-10 compliance and quality reporting.
This process overcomes common challenges physicians have creating problem lists in the EHR, providing efficiencies and relief by simply converting narrative created during physician documentation into real-time, actionable information for medical decision making:
§ Physicians can use Nuance Dragon Medical 360 real-time speech recognition or a mix of voice recognition and structured physician documentation sections to create a clinically rich narrative at the point of care, which reflects the patient's full condition.
§ Nuance's Clinical Language Understanding (CLU) engine analyzes the physician's free-text narrative and extracts key patient information, such as patient problems and allergies, in real time as discrete, structured data.
§ IMO's Intelligent Connect platform automatically maps this to IMO's terminology libraries and to standardized code sets, thus preserving clinical intent while supporting clinical documentation standards for interoperability and continuity of care.
§ The full patient story is then available in MEDITECH's EHR, including structured data unlocked from previously unstructured patient information, which can now be analyzed and used for a variety of reporting and compliance measures.

Lyft expands its offerings to include Nuance's Dragon Medical Practice Edition 2 for otolaryngologists
Lyft, a healthcare software firm (not to be confused with the taxi service), has announced the availability of Dragon Medical Practice Edition 2 for otolaryngologists (ear, nose, and throat specialists), bringing advanced speech-recognition technology to another important niche in the healthcare community. Otolaryngologists can now experience the full benefits of a speech-enabled practice backed by Lyft, which offers value-added support services that ensure successful integration with leading Electronic Health Record (EHR) software.
Lyft offers otolaryngologists support and services for installation, configuration, and training to maximize the benefits of integrating Dragon Medical Practice Edition 2 into their practices.

Acusis service enters patient encounter summaries into an Electronic Health Record
Acusis provides human clinical documentation specialists to enter patient encounter summaries directly into a client's preferred EHR system, working over the Internet. The physician records a summary of the patient encounter using their preferred dictation method, including the AcuMobile smartphone application or any other handheld device. The captured audio is processed through the Acusis Speech Recognition and Formatting engines. Using the Acusis Workflow, Acusis completes the standard dictation and editing process. The Acusis Clinical Documentation Specialists access the local EHR to identify or create the appropriate patient encounter. They then enter this information into the applicable fields.

Accusonus speech enhancement available for Cadence DSPs
Cadence Design Systems and Accusonus announced that the Accusonus Focus-MDR and Focus-DNR speech enhancement software has been ported to and optimized for the Cadence Tensilica HiFi Audio/Voice digital signal processors (DSPs). The Accusonus software provides an integrated solution for reverberation and noise suppression, addressing challenging indoor acoustic environments. The Focus-MDR and Focus-DNR speech enhancement products offer a single- or multi-microphone approach to suppression of reverberation, as well as direct and ambient noise.

OKI microphone technology picks up sound in specific areas, using two microphone arrays
OKI announced that it has developed a technology called Area Sound Enhancement System that makes it possible to pick up sounds in a target area by positioning several directional microphones.
The technology lets users hear speakers' voices in a specified area, even in acoustically cluttered environments in which several people are talking at the same time, like conference rooms or offices. In environments in which speakers' voices overlap with other voices and ambient noise, difficulties in hearing the speakers' voices may interfere with smooth communication. Using directional microphones such as shotgun microphones and microphone arrays can pick up sounds in specific directions, but these microphones pick up not only sounds in a target area but noises in the same direction as well, because the directionality extends linearly.

The Area Sound Enhancement System developed by OKI uses two microphone arrays and crosses the directionality of the arrays from different directions in a target area. A component common to the directionalities of both arrays is estimated to be a target sound, and other components are then suppressed as noise. This makes it possible to pick up only sounds in the target area and ensures clear speakers' voices for video conferencing and other remote communication in noisy environments. The technology also allows the speakers to talk while shifting their positions and walking about, as long as they remain within the area covered by the microphone arrays.

VXi Bluetooth headset includes noise-cancelling and voice prompts
VXi Corporation introduced the BlueParrott Reveal, their first extendable-boom Bluetooth headset designed for mobile professionals (see image). Despite its small form factor, the microphone includes VXi's noise-canceling technology. Noise-canceling microphones work by proximity, and just don't work as effectively when they're far from the user's mouth. The device's extendable microphone boom allows moving the microphone closer to the user's mouth. With the Reveal, when ambient noise is low to moderate, users can leave the boom in its retracted position and still experience some noise canceling. When life gets loud, they simply slide the Reveal's boom out. This puts the Reveal's microphone about an inch closer to the user's mouth, which VXi claims gives a 90% reduction in ambient noise.

(Image: VXi noise-cancelling Bluetooth headset)

Intel shows prototype smartphone with reduced-size RealSense technology
On the opening day of the Intel Developer Forum in China on April 8, the company showed a new prototype for a six-inch smartphone that integrates Intel RealSense technology, which includes speech recognition and 3D scanning. The RealSense camera and technology have been reduced to 50% of their former size.

CPqD biometric authentication and Brazilian Portuguese speech recognition available on a new IBM chip
The Smart Authentication solution from CPqD, a Brazilian research institute, uses face and voice recognition for user authentication in applications such as banking and e-commerce. The solution is running on the Power8 processor recently launched by IBM. It is available for testing in the IBM Client Center in Sao Paulo. The software can be used in multiple communication channels (Internet, phone, and mobile phone) and combines biometric technologies, text-to-speech, and speech recognition from CPqD for Portuguese spoken in Brazil.

New Tensilica Fusion DSP from Cadence Design Systems features low energy use
Cadence Design Systems announced the new Cadence Tensilica Fusion digital signal processor (DSP), based on its Xtensa Customizable Processor. This scalable DSP is designed for applications requiring merged controller-plus-DSP computation, ultra-low energy, and a small footprint. It can be designed into systems on chip (SoCs) for wearable activity monitoring, indoor navigation, context-aware sensor fusion, secure local wireless connectivity, face trigger, voice trigger, and speech recognition, the company indicated.
The Tensilica Fusion DSP uses 25% less energy, based on running a power diagnostic derived from the Sensory Truly Handsfree always-on algorithm, when compared to the current low-power Cadence Tensilica HiFi Mini DSP. Bernard Brafman, vice president of business development at Sensory, said, "By taking full advantage of the architectural features in the new Fusion DSP, we were able to optimize our software to achieve further power reduction for functions such as trigger to search, user-defined triggers, and speaker verification and identification."

Microsoft's browser update said to include Cortana
Early views of Microsoft's new Windows 10 build offer a glimpse of Microsoft's "Project Spartan" browser. Project Spartan will supposedly replace Internet Explorer in steps. It comes with the Cortana digital assistant. By putting Cortana in a browser, the speech-interactive assistant becomes available on any platform using the browser.

Microsoft's Skype Translator test preview adds new languages and other options
Microsoft is adding new features to its Skype Translator real-time language translation service as part of the second phase of its current test preview. The service translates conversations both ways in near real time. The service will display an on-screen transcript of the call, and also ultimately will translate instant-message chats in more than 45 languages. The service is currently available only on devices running Windows 8.1 or the Windows 10 Technical Preview. In the second test phase, initiated April 8, Microsoft added support for Chinese (Mandarin) and Italian, adding to English and Spanish, which were launched in December. According to a blog post announcing the second test phase of Skype Translator, Microsoft also is adding new features to the preview, including:
§ The ability to mute audio for customers who prefer to read translation vs. hearing it spoken;
§ The option for partial translations, which reduces delay time between when someone finishes speaking and when a translation starts;
§ The ability to add speech recognition warnings, so that customers are prompted when the translator is having a hard time understanding the speaker, in which case suggested ways to resolve the issue will be offered; and
§ The addition of a text-to-speech option, allowing users to switch between text-to-speech and speech-to-speech translation.

Getty Images and Microsoft partner to add images to products like Bing and Cortana
Getty Images and Microsoft announced a new partnership to develop image-rich, compelling products and services for Microsoft products like Bing and Cortana using Getty Images' imagery. In the coming years, the two companies' technology teams will partner to provide real-time access to Getty Images imagery and associated metadata to enhance the Microsoft user experience.

Cortana to recommend movies
Microsoft's Cortana on devices running Windows Phone 8.1 will soon be able to recommend movies based on user interests. Windows Phone owners can activate the free film concierge service by turning the feature on in Cortana's Notebook settings. According to Microsoft, after a few days Cortana will begin to make recommendations. If users are interested in a recommended movie, clicking on it will bring up details such as synopsis, reviews, cast list, and trailers, and even allow users to purchase tickets.

Amazon Echo can now be used to control WeMo and Hue home devices
Home automation devices, WeMo switches and Philips Hue connected LED light bulbs, now work with Amazon Echo. You can now use Echo to switch on the lamp before getting out of bed, turn on the fan or heater while reading in your favorite chair, or dim the lights from the couch to watch a movie, according to Amazon.
One simply connects WeMo and Hue devices to a home Wi-Fi network and names them in their respective apps. Then the user says, "Alexa, discover my appliances." After Echo's confirmation, the user can control the devices by voice, e.g.:
§ "Alexa, turn on the hallway light"
§ "Alexa, turn on the coffee maker"
§ "Alexa, dim the living room lights to 20%"
§ "Alexa, turn on the electric blanket"
§ "Alexa, turn on the outdoor decorations."

Amazon Echo adds podcasts

Amazon's Echo has added support for accessing podcasts. A user can listen to new episodes of popular podcasts by saying "Alexa, play the podcast [name] on TuneIn."

Siri's synthetic voice gets some improvements

Apple's new iOS 8.3 update includes some improvements in how Siri speaks. A brief YouTube video illustrates the difference. The update also includes a speech recognition setting for New Zealand English as distinct from Australian English.

Dictionary.com app supports Apple Watch with speech recognition to display definitions

Dictionary.com, an IAC company, announced new functionality in the Dictionary.com app to provide customized support for the Apple Watch. Dictionary.com services include access to millions of English definitions, synonyms, pronunciations, and example sentences, which can now be easily accessed directly from a user's wrist. Michele Turner, CEO of Dictionary.com, said, "We've added functionality to our app to take advantage of the immediacy of the watch, letting people speak or tap to get a definition or synonym, or quickly glance to learn our Word of the Day." One touch navigates between definitions and synonyms. The user can also keep track of recent word searches and view favorites synced from an iPhone. App activity can be launched in the iPhone for more information on the word the user was last viewing from the watch.
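The Echo home-control flow described above, naming devices during discovery and then routing spoken commands like "Alexa, turn on the hallway light" to the right one, can be sketched as a simple command router. This is an illustrative sketch only, not Amazon's implementation; the class, method, and pattern names are invented for the example.

```python
import re

# Hypothetical sketch of the device-name matching step: after "discovery",
# the assistant holds a registry of user-named devices and routes parsed
# utterances to the matching device. (Not Amazon's actual Alexa code.)
class DeviceRegistry:
    def __init__(self):
        self.devices = {}  # device name -> state dict

    def discover(self, names):
        # Register each user-assigned device name with a default state.
        for name in names:
            self.devices[name] = {"on": False, "level": 100}

    def handle(self, utterance):
        # "turn on/off the <device>"
        m = re.match(r"turn (on|off) the (.+)", utterance)
        if m and m.group(2) in self.devices:
            state, name = m.groups()
            self.devices[name]["on"] = (state == "on")
            return f"{name} is now {state}"
        # "dim the <device> to <N>%"
        m = re.match(r"dim the (.+) to (\d+)%", utterance)
        if m and m.group(1) in self.devices:
            name, level = m.group(1), int(m.group(2))
            self.devices[name]["level"] = level
            return f"{name} dimmed to {level}%"
        return "Sorry, I didn't understand."

registry = DeviceRegistry()
registry.discover(["hallway light", "coffee maker", "living room lights"])
print(registry.handle("turn on the hallway light"))       # hallway light is now on
print(registry.handle("dim the living room lights to 20%"))  # living room lights dimmed to 20%
```

The real system adds wake-word detection and speech recognition in front of this step; the registry lookup shown here corresponds to the "discover my appliances" naming stage.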
SoundHound + LiveLyrics offers new Apple Watch app

SoundHound, which provides sound recognition and search technologies, announced that its flagship music product, SoundHound + LiveLyrics, is one of the first third-party apps to deploy on the Apple Watch. The SoundHound app allows Apple Watch users to capture, collect, and enjoy the music they hear. By tapping one's wrist, users can bring lyrics to their fingertips. Users can view collected songs by scrolling the song history list, synced with the user's iPhone and iPad.

Apple moves to Siri back-end built on open-source Apache Mesos platform

Mesos is an open-source distributed systems kernel from the Apache Software Foundation. Mesos is built using the same principles as the Linux kernel, only at a different level of abstraction. The Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenter and cloud environments. Apple is now on its third-generation system for handling Siri queries, moving to the Mesos platform, according to the Mesosphere blog. Apple reportedly made the announcement at the Bay Area Mesos meeting in April. During a presentation, Apple engineers said that the switch to Mesos would reduce latency, improve scalability, and make it easier to deploy new services as Siri's capabilities are expanded. Mesos is also used by other large technology companies, including Twitter and eBay.

IBM teams with Apple and others on AI health program using Watson

IBM announced alliances with Apple, Medtronic, and Johnson & Johnson to put artificial intelligence to work drawing potentially life-saving insights from the booming amount of health data generated on personal devices. IBM is collaborating with the companies to use its Watson artificial intelligence system to give users insights and advice from personal health information gathered from fitness trackers, smartphones, implants, or other devices.
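The Mesos resource-management model mentioned above, in which the kernel offers slices of cluster resources and a framework's scheduler places tasks onto offers that fit, can be illustrated with a simplified matching loop. This is a toy sketch of the two-level scheduling idea, not the actual Mesos API; the task names and resource shapes are invented for the example.

```python
# Simplified sketch of Mesos-style two-level scheduling: the master offers
# each framework (cpus, mem) slices of cluster resources, and the framework
# greedily accepts offers that fit its pending tasks.
# (Illustrative only; the real Mesos offer API differs.)
def match_offers(offers, tasks):
    """Greedily place tasks onto resource offers; returns (task, host) pairs."""
    placements = []
    for task in tasks:
        for offer in offers:
            if offer["cpus"] >= task["cpus"] and offer["mem"] >= task["mem"]:
                # Accept part of the offer and deduct the consumed resources.
                offer["cpus"] -= task["cpus"]
                offer["mem"] -= task["mem"]
                placements.append((task["name"], offer["host"]))
                break
    return placements

offers = [{"host": "node1", "cpus": 4, "mem": 8192},
          {"host": "node2", "cpus": 2, "mem": 4096}]
tasks = [{"name": "query-worker", "cpus": 2, "mem": 4096},
         {"name": "index-builder", "cpus": 2, "mem": 4096}]
print(match_offers(offers, tasks))
```

Declined or unused offer fractions return to the pool in the real system; the sketch only shows the fit test that decides whether a task lands on a host.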
IBM wants to create a platform for sharing that information. "All this data can be overwhelming for providers and patients alike, but it also presents an unprecedented opportunity to transform the ways in which we manage our health," IBM senior vice president John Kelly said in a news release. IBM also said it is acquiring a pair of healthcare technology companies and establishing an IBM health unit.

Digital Alert Systems adds enhanced multilingual alerting for Emergency Alert Systems

Digital Alert Systems, a division of Monroe Electronics, introduced its DASDEC OmniLingual Alert Module software, which gives the company's DASDEC emergency messaging platform enhanced multilingual alerting capabilities for Emergency Alert Systems (EAS), as well as text-to-speech (TTS) in a wide variety of languages. The OmniLingual Alert Module provides television and radio broadcasters with the option to transmit EAS alerts in multiple languages, or to automatically add non-English alerts as post-alert audio to serve audiences with limited English proficiency. There is support for Spanish, Portuguese, French, German, Italian, Polish, and Lithuanian, among other languages. The module also provides a dedicated translation software package that incorporates both text and TTS translation. Digital Alert Systems indicated that there are more than 25 million people with limited English proficiency in the U.S., and almost 61 million Americans do not speak English at home. Persons with limited English ability account for 25% or more of the total population in seven U.S. cities.

Peterbilt introducing next-generation SmartNav infotainment system

Peterbilt Motors Company announced the next generation of its in-dash SmartNav infotainment system. The system features an expanded array of virtual gauges, auto-activated safety cameras, improved hands-free calling, and the capability to provide real-time traffic and fuel price information.
Operators engage SmartNav through a touch-sensitive, full-color, seven-inch display. The company indicated that the speech recognition capabilities for hands-free calling had been improved. The system includes Bluetooth connectivity and pairing with Bluetooth-enabled devices, as well as controls for using certain devices. "One of the key improvements to the new SmartNav system is its ability to be customized with approved applications developed by Peterbilt, PACCAR, or third parties," says Scott Newhouse, Peterbilt Chief Engineer. "For instance, future functionality could include integration with reefer trailers or truck bodies to provide operational data, like reefer temperature. SmartNav's new flexible architecture allows it to be updated quickly with additional features and capabilities."

Infobip adds inbound and outbound voice communications to its mobile services cloud

Infobip, a provider of mobile services to IT companies, start-ups, and app developers, announced the launch of Infobip Voice, an enterprise-grade suite of cloud-based voice applications. Infobip Voice lets enterprises and developers quickly and easily add voice capabilities to their existing software through an API-powered cloud platform. With support for all major mobile services, including SMS messaging, push notifications, and direct carrier billing, Infobip provides developers with a wide range of tools. Silvio Kutic, founder and CEO at Infobip, explained, "As telecoms services are increasingly becoming IP-based, introducing voice to our existing mobile services cloud was the next logical step." Key features of Infobip Voice include support for both voice and SMS messaging over the same DID (Direct Inward Dialling) number, inbound and outbound calls, text-to-speech capabilities for voice messaging or voice-enabled two-factor authentication, and a web interface for service management and analysis.
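The voice-enabled two-factor authentication use case mentioned above can be sketched generically: generate a one-time PIN, have a TTS voice call read it out, and verify what the user enters. The payload fields and flow below are hypothetical placeholders for illustration, not Infobip's actual API.

```python
import json
import random

# Hedged sketch of TTS-based voice two-factor authentication.
# The field names and flow are invented placeholders, not Infobip's API.
def start_voice_2fa(phone_number):
    """Generate a one-time PIN and a TTS voice-call payload that reads it."""
    pin = f"{random.randint(0, 999999):06d}"
    payload = {
        "to": phone_number,
        # Spacing the digits makes the synthesized voice read them one by one.
        "text": f"Your verification code is {' '.join(pin)}",
        "language": "en",
    }
    # In a real integration this payload would be POSTed over HTTPS to the
    # provider's voice-call endpoint; here we just return the JSON body.
    return pin, json.dumps(payload)

def verify(expected_pin, user_input):
    """Check the PIN the user typed back against the one that was spoken."""
    return expected_pin == user_input.strip()

pin, request_body = start_voice_2fa("+15551234567")
assert verify(pin, pin)
```

A delivery receipt or call-status callback would normally gate the verification step; the sketch omits that to keep the PIN round-trip visible.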
New Ford Galaxy includes SYNC 2 with voice control

The new Ford Galaxy includes Ford SYNC 2 with Voice Control, which enables drivers to operate phone, entertainment, climate, and navigation systems using conversational language, and also features Emergency Assistance.

CogniToys toy dinosaur can answer questions and more

Powered by IBM Watson's cognitive technology and its own speech recognition, Elemental Path's CogniToys dinosaur can answer thousands of questions, and even tell jokes. A press of its belly gets the conversation started. The first dinos will be available through Kickstarter in November for $99.99 each and are expected to head to retail in 2016. The toy is designed to let parents monitor their children's progress and moderate content. Elemental Path co-founder Donald Coolidge said that the company doesn't see itself as a toy company, however. "We're more a technology company," he said; the company plans to improve its platform and work with other companies to build educational products, like the dinos, which have learning modules that become increasingly challenging.

ChatGrape launches search engine for specific apps and documents

The ChatGrape service is designed to help teams communicate more efficiently, and in particular to reduce what co-founder and COO Leo Fasbender calls the "look-up factor" when linking to and referencing external information, such as a document, calendar entry, or code repository update, from within the app's chat box. The service uses natural language and close integration with the other applications whose information it accesses. ChatGrape has built what is essentially a search engine that indexes the various tools and services that you connect the app to.
These include Box, Dropbox, Google Drive, GitHub, Bitbucket, Jira, and others. Typing # directly into a chat triggers ChatGrape's smart autocomplete, now called the "Grape Browser," enabling you to quickly look up and link to or reference files or data from any supported third-party service. The startup also offers an API, making it possible for companies to integrate their own internal data into the chat app.

A5 Technologies uses speech recognition to teach English to Japanese speakers

A5 Technologies' flagship product, A5 Pro, is an English-language training application for Japanese business professionals in the 20-45 age bracket who need better English to advance their careers. It will be launched later this year as a consumer app, and A5 is in discussions with strategic partners in Japan to promote and market the product there. A speech recognition algorithm allows learners to practice speaking in private, with instant feedback on their pronunciation.

Geppetto Avatars developing AI-based platform with avatars

Geppetto Avatars is an early-stage company with an AI-based platform that provides "humanlike" interaction (avatars/agents) and mimics human sensors for the purpose of providing user education, support, assessment, and surveillance. The company indicated it is working on algorithms that will translate conversations, emotions, images, videos, audio, drawings, motion, location, and taps into meaningful and valuable insights.

Tencent develops smartphone operating system

Chinese Internet service portal Tencent Holdings released an operating system for smartphones and smartwatches as it tries to attract more of the 557 million Chinese accessing the Internet through mobile devices. The software, called TOS+, includes voice recognition and payment systems. The company will work with partners to integrate the software into devices including smart glasses.
Tencent is leveraging its ownership of China's two most popular instant messaging applications, WeChat and QQ, to boost its efforts against Google's Android. Asia's largest Internet company, Alibaba, has also developed its own operating system, called YunOS.

Interactive Intelligence launches cloud services in Australia and New Zealand

Interactive Intelligence Group has launched its PureCloud Collaborate and PureCloud Communicate cloud services for customers in Australia and New Zealand. First announced for U.S. customers in March on the AWS US East Region, these collaboration and communications cloud services are now available from Amazon's data center in the AWS Asia Pacific (Sydney) Region. The services are the first to be offered from the company's new multitenant, enterprise-grade PureCloud platform, a unified, single-platform cloud solution based on a modern, distributed cloud architecture that runs applications for multiple use cases: collaboration, communications, and, next up, customer engagement. Functionality includes IP PBX capabilities such as auto-attendant, call recording, speech recognition, and unified messaging.

NSF grant supports Alelo research in teaching language and cross-cultural communication with avatars and robots

With an award from the National Science Foundation (NSF), Alelo (a Hawaiian word that means "language" or "tongue") developed a range of products to teach cross-cultural communication using virtual worlds. Alelo's tools have already been applied in military training and English-as-a-second-language programs and are now being applied to a wide range of learning applications. The company said that virtual role-play, where learners engage in simulated encounters with artificially intelligent agents that behave and respond in a culturally accurate manner, has been shown to be effective at teaching cross-cultural communication. "You learn by playing a role in a simulation of some real-life situation," said W.
Lewis Johnson, a former professor at the University of Southern California and co-founder of Alelo. "You practice communication with artificially intelligent interactive characters that will listen and respond to you, depending on what you say and do. It helps develop fluency, but it also helps to develop confidence." To develop a version to teach English as a second language (ESL) to those in the United States, the researchers interviewed immigrants to determine what cultural issues they found most problematic. The instruction they created, available in multiple languages, explains how to handle different situations that one might face as a new immigrant in the United States and provides tips on culturally appropriate behavior.

Statistics and Surveys

Mobile advertising revenue will top $60 billion globally in 2019

According to 451 Research, global mobile advertising revenue was $18.8 billion in 2014 (using exchange rates at the time). It is forecast to rise to $28.7 billion in 2015, reaching $61.4 billion in 2019.

Speech analytics market reviewed

Grand View Research issued a market report, Global Speech Analytics Market Analysis Size and Segment Forecasts To 2020. The introduction of advanced technological tools enables organizations to act on unstructured data acquired from customer interactions, thereby enhancing customer experiences and gaining a competitive advantage. The report indicated that the technology works on three approaches: direct phrase recognition, phonetic recognition, and Large Vocabulary Continuous Speech Recognition (LVCSR). If you want to know more, you'll have to pay a stiff price.

44% of US adults live in mobile-phone-only households

According to data released by GfK MRI in April, 44% of US adults lived in households with a mobile phone but no landline in 2014, compared to 26% in 2013. The percentage in 2014 rose to 64% for Millennials, while 45% of Gen Xers and 32% of baby boomers reported mobile-only usage.
Hispanics were also heavily mobile-phone-only, at 60%.

Voice search use rising

In publicity for its MindMeld API product, Expect Labs (SSN, January 2015, p. 24) cites the following statistics:
§ Major search engines are seeing as much as 10% of their traffic coming from voice.
§ Over the past year, Google voice search use more than doubled.
§ In a survey of U.S. smartphone users, 55% of teens and 41% of adults use voice search every day.
§ Search experts predict that, within five years, over half of all global search traffic will be driven by voice.
The source of these figures was not provided.

Google will take 55% of search ad dollars globally in 2015

Google will take 55% of search ad dollars globally in 2015, and Baidu, the next-largest player, will take 8.8%, according to eMarketer. Digital ad spending worldwide will reach $170.85 billion in 2015, according to new estimates from eMarketer. This year, search ads will account for $81.59 billion worldwide, an increase of 16.2% over 2014. By 2019, search ad spending will reach $130.58 billion globally, still growing at nearly 10% year over year.

Mobile ad spend to top $100 billion worldwide in 2016, 51% of digital ad market

The global mobile advertising market will surpass $100 billion in spending and account for more than 50% of all digital ad expenditure for the first time in 2016, according to eMarketer. The US and China will drive growth in the short term, accounting for nearly 62% of mobile ad spending worldwide next year. The mobile ad market will grow to $196 billion in 2019, 70.1% of digital ad expenditure.

Facebook accounts for three-quarters of global social network ad spend

The global social network market continued to show strong growth in 2014, according to Strategy Analytics' Global Social Network Forecast. According to the report, social networks globally surpassed 2 billion users for the first time in 2014, of which Facebook accounted for 68%.
North America had the highest ratio of social network users to population (64%) in 2014, followed by Western Europe at 55%, but China accounts for almost 25% of global social network users, with 495 million users in 2014. Ad spend on social networks grew 41% globally in 2014, totaling over $15.3 billion and accounting for 11% of global digital ad spend. Facebook accounted for three-quarters of global social network ad spend in 2014, while Twitter accounted for 8%. In 2015, ad spend on social networks is expected to grow by 29%, totaling $24.2 billion.

Robotics sales flourish

In 2014, global robotics sales exceeded 200,000 units, surpassing the previous year's 180,000 units, according to the International Federation of Robotics. The trend is expected to continue: global robotics spending should grow from US$15 billion in 2010 to US$67 billion in 2025, the Boston Consulting Group reports.

US ad spending in 2015

According to eMarketer, ad spending in 2015 in the US will break down as follows (in billions):

          Search ads   Display ads
Mobile      $12.85       $14.67
Desktop     $12.82       $12.38
Total       $25.66       $27.05

"Augmented reality" predicted to be four times bigger than "virtual reality" by 2020

Digi-Capital has released a report on augmented reality (AR), where glasses add a virtual overlay on the real world (e.g., Google Glass), and virtual reality (VR), where the wearable immerses you in a virtual world. VR and AR headsets both provide stereo 3D high-definition video and audio. Digi-Capital admits the difficulty of forecasting a market in such early stages, but nevertheless forecasts that AR/VR could hit $150 billion in revenue by 2020, with AR taking the lion's share at around $120 billion and VR at $30 billion.

Web self-service surpasses phone in customer service channel preference

According to a new report from Forrester Research, consumers now rely on self-service more than they rely on phone calls, with Web self-service use rising from 67% in 2012 to 76% in 2014.
The phone, on the other hand, has remained stagnant at 73% during the same span of time. Presumably, these numbers reflect the fact that a user can use both channels at different times. Contact centers are facing challenges as they transition into a new, self-service-driven business model. When it comes to delivering support via chat and social media, for example, companies aren't deploying a proportional amount of resources to meet demand, says Kate Leggett, Forrester analyst and report coauthor. Two-thirds of contact centers offer chat support, and more than half provide service through social media, but those percentages should be higher, she explains. When these services are available, customers respond favorably: only 10% of chat users and 25% of Twitter users are dissatisfied with the support functionality of these channels.

Microphone market to reach $1.81 billion by 2020

According to new market research by MarketsandMarkets, the total microphone market will be worth $1.81 billion by 2020, at an estimated CAGR of 7.5%. The major drivers for the microphone market, according to the research report, are the increasing demand for consumer electronics devices, the low cost and compact size of MEMS microphones, and an increased number of microphones per device, among others. There are restraints in the market, such as difficult packaging and integration. Increasing demand from emerging economies is one of the key opportunities for the microphone market.

60% of consumers self-install smart home devices, but majority would prefer professional assistance

Newly released Parks Associates research reports that 84% of U.S. broadband households set up their entertainment and computing devices on their own, while 60% of U.S. broadband households set up their smart home devices on their own.
The report finds that despite this independent onboarding process, consumers' need for tech support persists and will increase as they bring more connected devices into the home. "Consumers' home networks are rapidly expanding through the adoption of complex connected devices," said Patrice Samuels, Research Analyst, Parks Associates. "For example, 27% of U.S. broadband households owned a connected health device by the end of 2014. As consumers embrace new categories of devices, support needs will increase dramatically. Support providers must invest in new tools and solutions that minimize the burden on support resources." Tech support is a key factor in tying together the Internet of Things, according to Parks Associates. Approximately 60% of U.S. broadband households have concerns over device security and data security when using connected devices.

Artificial intelligence for enterprise applications to reach $11.1 billion in market value by 2024

According to a new report from Tractica, the market for enterprise AI systems will increase from $202.5 million in 2015 to $11.1 billion by 2024. The market intelligence firm forecasts that enterprise AI deployments will also drive significant investments in professional services such as installation, training, customization, integration, and maintenance, along with additional spending on IT hardware and services, including computing power, graphics processing units (GPUs), networking products, storage, and cloud computing. The technologies covered by the new report include cognitive computing, deep learning, machine learning, predictive APIs, natural language processing, image recognition, and speech recognition. "While artificial intelligence has been just beyond the horizon for decades, a new era is dawning," says principal analyst Bruce Daley.
"Systems modeled on the human brain, such as deep learning, are being applied to tasks as varied as medical diagnostic systems, credit scoring, program trading, fraud detection, product recommendations, image classification, speech recognition, language translation, and self-driving vehicles. The results are starting to speak for themselves."

"Cognitive computing" market projected to grow at 38% CAGR to 2019

MarketsandMarkets issued a market study on "cognitive computing," described as systems that "work on the principle of the neocortex, a part of human brain that helps humans in effective decision making on the basis of contextual and behavioral analysis." More specifically, the technologies include natural language processing (NLP), machine learning, and automated reasoning; the deployment models include on-premises and cloud. In 2014, natural language processing accounted for the largest market share, followed by machine learning, the company said. The cognitive computing market is expected to grow from $2.5 billion in 2014 to $12.6 billion by 2019, representing a compound annual growth rate of 38.0% from 2014 to 2019.

Financial Notes

Blinkx acquires All Media Network

blinkx pioneered Internet video search using its COncept Recognition Engine (CORE), which leverages speech recognition and text and image analysis to understand the meaning and context of video content, generating improved search relevancy for consumers and a brand-safe environment for advertisers. On April 16, blinkx announced the completion of its acquisition of All Media Network. Through All Media, blinkx gains access to a number of premium consumer properties, including Sidereel.com, Allmusic.com, Allmovie.com, and Celebified.com. blinkx expects the acquisition of All Media, an all-cash transaction that was funded in late March 2015, to be earnings accretive within the first full year post-acquisition.
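The CAGR figures quoted in the market items above can be sanity-checked with the standard compound-growth formula, end = start * (1 + rate) ^ years. For example, MarketsandMarkets' $2.5 billion growing at 38.0% annually for five years should land near the $12.6 billion figure for 2019:

```python
# Standard compound-annual-growth-rate arithmetic, applied to the
# figures quoted above (values in billions of dollars).
def project(start_value, rate, years):
    """Project a value forward at a constant annual growth rate."""
    return start_value * (1 + rate) ** years

def cagr(start_value, end_value, years):
    """Back out the implied constant annual growth rate."""
    return (end_value / start_value) ** (1 / years) - 1

# MarketsandMarkets: $2.5B (2014) at 38.0% CAGR over 5 years -> 2019 value
print(round(project(2.5, 0.38, 5), 1))   # ~12.5, consistent with the quoted $12.6B

# Tractica: $202.5M (2015) to $11.1B (2024) -> implied CAGR over 9 years
print(round(cagr(0.2025, 11.1, 9) * 100, 1))  # implied annual growth rate, in percent
```

Small gaps between the projected and quoted values (12.5 vs. 12.6) reflect rounding in the published figures rather than an arithmetic error.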
All Media represents a step forward in the company's strategy to build a complete digital advertising platform for brand-safe, cross-screen advertising at scale.

Adacel acquires CSC's NexSim ATC simulator business

On April 15, Adacel and Computer Sciences Corporation (CSC) announced they have signed and closed a definitive agreement for the sale of the CSC NexSim ATC simulator line and associated support services. Terms of the agreement were not released, but effective immediately Adacel has acquired full rights to the NexSim product line and source code from CSC. Adacel is a developer of advanced simulation and training solutions, speech recognition applications, and operational air traffic management systems.

PeerTV acquires an interest in Speech Modules that gives it some exclusive rights outside Israel

PeerTV on April 16 revealed a deal to cooperate with Speech Modules Holdings, which is listed on the Tel Aviv Stock Exchange, in a transaction that will see the two companies issue each other new shares valued at GBP200,000. PeerTV will have exclusive rights in all countries outside of Israel to market software and hardware products and services based on Speech Modules' technology. PeerTV is a vendor of end-to-end technologies and solutions for the OTT (Over-The-Top) TV market. The company combines standard cable and satellite TV features, such as live TV and Video-on-Demand, with new features, such as Internet browsing, social networking, gaming, and voice communication. The company uses Speech Modules' speech recognition technology in a remote control (SSN, February 2015, p. 14). PeerTV said that development of the advanced remote control units to be used in home entertainment systems, including TV, audio, and games, is proceeding as planned, and PeerTV expects to start presenting the new product to customers later in Q2 2015.
PeerTV will issue Speech Modules shares equivalent to a 15.4% stake in the company, while Speech Modules will issue PeerTV 7.3 million shares at NIS0.16 per share. PeerTV will therefore have a 15.66% stake in Speech Modules and be able to appoint a director to Speech Modules' board. Speech Modules can sell the shares issued to it by PeerTV, although restrictions will be in place to prevent more than a quarter of the shares being sold in the space of one month.

SensorSuite raises capital for its wireless monitoring and energy-saving solutions for large buildings

SensorSuite Inc. provides wireless monitoring and energy-saving solutions for multi-residential, commercial, and industrial buildings. The company announced the successful closing of funding from Extreme Venture Partners, BDC Capital, and private angels. SensorSuite's system overcomes the traditional installation barrier to retrofitting existing buildings with energy-saving control systems: the rewiring. Compared with wired solutions, SensorSuite delivers significantly shorter payback periods, especially in retrofits, and in some cases is the only viable solution for electrically heated buildings. "Commercial and industrial buildings are emerging as one of the largest opportunities for real-time analytics and intelligent control applications," said Robert Platek, Founder and CEO of SensorSuite. "SensorSuite's innovations enable property managers to save time while conserving energy."

People

Interactions adds to management team, including former AT&T research personnel

Interactions announced the addition of five new members to its executive team, following the 2014 acquisition of the AT&T Watson speech recognition and natural language interpretation technology and research program.
Best known for its industry-leading customer care virtual assistant solutions, Interactions is expanding its core solutions by developing technology to enable the "Interface of Things," a new generation of speech-, touch-, and text-enabled interfaces. As a result of this recent growth, the following individuals have joined Interactions' executive team:
§ Jay Wilpon will serve as Interactions' Senior Vice President of Natural Language Research. With more than 150 published papers and patents in speech and natural language research to his name, Wilpon is one of the world's pioneers and a chief evangelist for speech and natural language technologies and services.
§ David Thomson joins Interactions as Vice President of Speech Research, where he manages Interactions' research and development teams.
§ Ben Stern is joining Interactions as Vice President of Software Systems R&D. With more than 25 years of experience advancing speech recognition and language understanding technologies, including the development and delivery of the AT&T Watson engine, Stern is tasked with the continued development of the Watson technology and professional services, including packaging, performance tuning, features, and enhancements.
§ Mahesh Nair has been promoted to Vice President of Engineering, bringing nearly 20 years of experience designing and developing massively scalable systems in the Internet of Things/machine-to-machine, interactive voice response, business process management, business-to-business, and financial domains.
§ Jane Price has been elevated to the position of Vice President of Marketing, overseeing product and go-to-market strategy as well as Interactions' corporate brand and communications.

David Stone joins Inference as VP Sales, APAC

David Stone has joined Inference as VP Sales, APAC. Stone's most recent role was with Telstra as Group Manager for Contact Centre Solutions Sales.
He will be based in Sydney, but will be available throughout the Asia Pacific region.
Speech Strategy News May 2015 36
Fonolo adds John Gengarella to its Advisory Board
Fonolo (p. 23) announced the addition of John Gengarella to its advisory board. Gengarella was formerly CEO of Voxify, an enterprise SaaS solution for speech self-service. Following Voxify's merger with 24/7, he led the acquisition of Tellme from Microsoft and served as the Chief Revenue Officer of 24/7. Gengarella previously held senior positions at Siebel Systems and Oracle.
Attensity appoints Cary Fulbright as Chief Strategy Officer
Attensity has appointed Cary Fulbright as Chief Strategy Officer. Attensity uses semantic technologies and context-based discovery to analyze more than 150 million data sources. Fulbright joins from Salesforce, where he served as Chief Strategy Officer. In his new role, Fulbright will oversee a number of initiatives, including formalizing the company's planning processes. The firm has also promoted James Purchase from Vice President of Product Management to Vice President of Business Development, and Nick Arnett has been promoted to Director of Product Management.
For Further Information on Companies Mentioned in this Issue
Company | Location | Business | Contact info
24/7 Inc. ([24]7) | Campbell, CA | Customer service solutions | www.247-inc.com
A5 Technologies | Dublin, Ireland | Language learning software | www.a5pro.com
Accusonus | Kastritsi, Greece | Speech and music processing tools | www.accusonus.com
Acusis | Pittsburgh, PA | Medical transcription services | (412)209-1300; www.acusis.com
Adacel Inc. | Orlando, FL | Defense and air traffic control solutions | (407)581-1560; www.adacel.com
Adaptive A.I. Inc. | El Segundo, CA | AI technologies used in customer service and other applications | www.adaptiveai.com
Advanced Media Inc. | Tokyo, Japan | Asian speech technology | +81 3 5958 1031; www.advancedmedia.co.jp
Aizan Technologies | Richmond Hill, ON, Canada | Hosted voice solutions | (905)882-5563; www.aizan.com
Alelo Inc. | Los Angeles, CA | Language training courses | (310)945-5985; www.alelo.com
Alibaba | China | E-commerce | www.alibaba.com
All Media Network | San Francisco, CA | Digital content | http://allmedianetwork.com
Amazon | Seattle, WA | Product sales on the Web (and more) | www.amazon.com
Amazon Web Services | Seattle, WA | Technology infrastructure platform in the cloud | http://aws.amazon.com
Apache Mesos | - | Distributed systems kernel | http://mesos.apache.org
Apache Software Foundation | - | Open-source software, including NLP | www.apache.org; http://opennlp.apache.org
Apple | Cupertino, CA | Personal computers, music players, wireless phones | www.apple.com
Applied Voice Input Output Society (AVIOS) | San Jose, CA | Non-profit organization supporting quality speech application development | (408)323-1783; www.avios.com
Artificial Solutions | Stockholm, Sweden | Virtual assistant development tools | +46 8 663 54 50; www.artificialsolutions.com
AT&T | San Antonio, TX | Telecommunications services | www.att.com; www.wireless.att.com
AT&T Labs | Florham Park, NJ | Speech research | (973)360-8127; www.research.att.com
Attensity | Saarbrücken, Germany | Semantic annotation | +49 681 857 670; www.attensity.com
Auraya Systems | Canberra, Australia | Speaker authentication | +61 2 6201 5253; www.auraya.net; www.armorvox.com
Baidu, Inc. | Beijing, China | Web search in Chinese | http://ir.baidu.com
BDC Capital | Wakefield, MA | Financing | (781)928-1100; www.bdcnewengland.com
Blinkx | San Francisco, CA, and London, UK | Audio-video search | (415)655-1450; +44 20 8906 6857; www.blinkx.com; www.blinkx.tv
Boston Consulting Group | New York, NY | Business consulting | www.bcg.com
Brainasoft | - | Personal assistant software for Windows PCs | www.brainasoft.com
Cadence Design Systems, Inc. | Berkshire, UK | Chip design | +44 1344 360333; www.cadence.com
CallFinder | Burlington, VT | Call recording and speech analytics | (800)639-1700; www.mycallfinder.com
Cepstral, LLC | Pittsburgh, PA | TTS engine | (412)432-0400; www.cepstral.com
ChatGrape | - | Communications application | https://chatgrape.com
CogniToys (Elemental Path) | - | Interactive toys | www.elementalpath.com
Computer Sciences Corporation (CSC) | El Segundo, CA | Healthcare information systems | (310)615-0311; www.csc.com
Convergys Corporation | Cincinnati, OH | Customer care and employee-benefit solutions | (513)723-7153; www.convergys.com
CPqD | Brazil | Research institute | www.cpqd.com.br/en_us
Digi-Capital | San Francisco, CA | Consulting and market analysis | http://www.digi-capital.com
Digital Alert Systems (division of Monroe Electronics) | Lyndonville, NY | Emergency alert systems | (585)765-1155; www.digitalalertsystems.com
eBay | San Jose, CA | Internet auctions | www.ebay.com
eMarketer | New York, NY | Market research | (212)763-6010; www.emarketer.com
EPIC Connections | Omaha, NE | Contact center consulting and outsourcing services | (402)884-4700; http://epicconnections.com
Ericsson | Stockholm, Sweden | Telecommunications products | www.ericsson.com
Expect Labs | San Francisco, CA | Natural language interpretation and MindMeld API | www.expectlabs.com
Extreme Venture Partners | Toronto, ON, Canada | Venture capital firm | www.extremevp.com
Facebook | Palo Alto, CA | Social web service | www.facebook.com
Federal Communications Commission (FCC) | Washington, DC | US government agency | www.fcc.gov
Fonolo | Toronto, ON, Canada | IVR navigation service | (416)366-2500; www.fonolo.com
Forrester Research | Cambridge, MA | Market research | (617)613-6000; www.forrester.com
Fujitsu | Tokyo, Japan | Information and communication technology (ICT)-based business solutions | +81-3-6252-2220; www.fujitsu.com
Geppetto Avatars | Berkeley, CA, and Milwaukee, WI | AI-based platform for providing human-like avatars | www.geppettoavatars.com
Getty Images | Chicago, IL | Image licensing | (312)344-4500; www.gettyimages.com
GfK MRI | New York, NY | Market research | (212)884-9200; www.gfkmri.com
Global Tel*Link (GTL) | Reston, VA | Correctional technology | www.gtl.net
Google | Mountain View, CA, and Cambridge, MA | Voice and directory search | (650)253-0000; www.google.com; www.google.com/mobile; www.grandcentral.com
Grand View Research | San Francisco, CA | Market research | www.grandviewresearch.com
HP (Hewlett-Packard) | Palo Alto, CA | Computer and software products and consulting | www.hp.com
I PRINT N MAIL | San Francisco, CA | Direct mail marketing | http://nextgen.iprintnmail.com
IAC | New York, NY | Web sites such as Dictionary.com | http://iac.com
IBM | Somers, NY | Information systems | (877)426-3774; www.ibm.com
ICICI Bank | India | Bank | www.icicibank.com
Inference Communications | Victoria, Australia | Speech recognition and voice automation applications | +61 1300 191 431; www.inferencecommunications.com
Infobip | London, UK | Mobile messaging and payments services | www.infobip.com
Intel Corporation | Santa Clara, CA | Semiconductors | www.intel.com
Intelligent Medical Objects (IMO) | Northbrook, IL | Clinical interface terminology databases | (847)272-1242; www.imoonline.com
Interactions Corporation | Franklin, MA | Virtual agent services for call centers and speech recognition technology | (317)810-2800; www.interactions.net
Interactive Intelligence Group Inc. | Indianapolis, IN | Unified communications and IVR | (317)872-3000; www.ININ.com
International Computer Science Institute (ICSI) | Berkeley, CA | Research institute | www.icsi.berkeley.edu
International Federation of Robotics | Frankfurt, Germany | Industry organization | www.ifr.org
Johnson & Johnson | New Brunswick, NJ | Healthcare products | (732)524-0400; www.jnj.com
Lyft | Spokane, WA | IT support | (509)789-5750; www.lyftsolutions.com
M*Modal | Franklin, TN | Speech recognition technology for healthcare transcription | (800)233-3030; www.mmodal.com
MarketsandMarkets | Dallas, TX | Market research | (888)600-6441; www.marketsandmarkets.com
Master Mobile Products | Apple Valley, CA | Medical dictation | www.medmastermobile.com
Meditech (Medical Information Technology, Inc.) | Westwood, MA | Integrated software solutions for healthcare organizations | (781)821-3000; www.meditech.com
Medtronic | Dublin, Ireland | Medical technology | www.medtronic.com
Microsoft | Redmond, WA | Various applications, products, and services | (206)454-2030; www.microsoft.com
Ministry of Internal Affairs and Communications | Japan | Government organization | www.soumu.go.jp/english/index
MongoDB | New York, NY | Database | www.mongodb.org
Mphasis (an HP Company) | India | IT services | www.mphasis.com
National Institute of Standards and Technology (NIST) | Gaithersburg, MD | Research awards and testing | www.nist.gov
National Science Foundation (NSF) | Washington, D.C. | Research organization | www.nsf.gov
NewsHedge | Chicago, IL | Financial news alert service | (312)532-9833; www.newshedge.com
Nexidia | Atlanta, GA | Audio content search | (404)495-7220; www.nexidia.com; www.nexidiatv.com
NICE Systems | Ra'anana, Israel | Multimedia analytics | +972 9 775-3777; www.nice.com
Nissan USA | Franklin, TN | Vehicle manufacturer | www.nissanusa.com
Nuance Communications | Burlington, MA | Speech technology, applications, and services | (617)428-4444; www.nuance.com
Numenta | Redwood City, CA | Machine intelligence | (650)369-8282; http://numenta.com
OKI Electric Industry Co. | Tokyo, Japan | Text-to-speech | +81 3 3580 8950; www.oki.com
Openstream, Inc. | Edison, NJ | Mobile Internet infrastructure platform and applications | (732)507-7030; www.openstream.com
Oracle Corp. | Redwood Shores, CA | Business software and hardware systems | (650)506-7000; www.oracle.com
Orion Communications | Dallas, TX | Workforce management software | www.orioncom.com
Parks Associates | Dallas, TX | Market research | (972)490-1113; www.parksassociates.com
PeerTV | Petach-Tikva, Israel | TV solutions | +972 9 740 7315; www.peertv.com
Peterbilt Motors Company | Denton, TX | Infotainment systems | (940)591-4000; www.peterbilt.com
Philips Hue | - | LED light bulbs | www2.meethue.com/en-XX
Precyse | Wayne, PA | Medical transcription solutions using speech recognition | (610)688-2464; www.precyse.com
Precyse University | Wayne, PA | Professional education arm of Precyse | www.precyse.com/precyseuniversity
ReadSpeaker | Uppsala, Sweden | Voice-on-the-web services | +46 18 60 44 94; www.readspeaker.com
Roku Inc. | Saratoga, CA | Streaming media player | www.roku.com
Salesforce | San Francisco, CA | CRM and sales support software | (415)901-7000; www.salesforce.com
Samsung Electronics | Seoul, South Korea | Wireless telephones and TVs | www.samsung.com
SensorSuite | Mississauga, ON, Canada | Wireless monitoring and energy-saving solutions | www.sensorsuite.com
Sensory, Inc. | Santa Clara, CA | Embedded speech recognition and speaker ID | (408)625-3300; www.sensory.com
Siebel Systems | San Mateo, CA | Customer relationship management software | (650)295-5000; www.siebel.com
SiriusXM | New York, NY | Satellite radio | www.siriusxm.com
SK Telecom | Seoul, South Korea | Mobile service provider | www.sktelecom.com
SmartAction | El Segundo, CA | Customer service automation | (310)776-9200; www.smartaction.com
SoundHound | San Jose, CA | Music identification and search | (408)441-3200; www.soundhound.com
Space Ape Games | London, UK | Mobile and tablet games | www.spaceapegames.com
Speech Modules | Nes Ziona, Israel | Speech recognition technology | +972 73 222 5555; www.speechmodules.com
Strategy Analytics | Newton, MA | Market reports | (617)614-0700; www.strategyanalytics.net
Suddenlink | - | Cable operator | www.suddenlink.com
TMA Associates | Tarzana, CA | Consulting, market studies, newsletters, and conferences on the business implications of speech and telephone technology | (818)708-0962; www.tmaa.com
Tractica | Boulder, CO | Market intelligence focused on human interaction with technology | (303)248-3000; www.tractica.com
Translate Your World | Atlanta, GA | Translation services | www.translateyourworld.com
Twitter | - | Social text service | www.twitter.com
VERBATIM-VR | Hadera, Israel | Speech-to-text vocabulary software | www.VERBATIM-VR.com
VXI Corporation | Dover, NH | Microphones and accessories | (603)742-2888; www.vxicorp.com
WeMo (Belkin) | Playa Vista, CA | Home automation | www.belkin.com/us/Products/home-automation/c/wemo-home-automation
WinScribe | Chicago, IL | Dictation solutions | (866)494-6727; www.winscribe.com
x.ai | New York, NY | AI-based virtual assistant | https://x.ai
Yandex | Moscow, Russia | Search and cloud speech recognition | www.yandex.com
YouMail, Inc. | Aliso Viejo, CA | Voicemail-to-text service | (800)374-0013; www.youmail.com
Blog (with a chance to comment!)
The Software Society (www.thesoftwaresociety.com)
THE HUMAN-COMPUTER CONNECTION
The role of the Top 1% in reducing income inequality
The US stock market: Good as gold?
Amazon Echo: Why you'll want one
Is the US stock market the new gold standard?
Apple's next big thing isn't a thing
Can Artificial Intelligence create a new non-technical job category?
Will mobile apps redefine the Web?
Artificial Intelligence: Hype or the next big thing?
Expanding brainpower: The next phase of economic growth?
The Productivity Paradox: Efficiency without Jobs?
Don't underestimate Microsoft
Is speech recognition on mobile phones a big deal, or just a gimmick?
Chain reactions in technology
I wish to subscribe to Speech Strategy News for one year (12 issues), payable in US$ on a US bank:
Individual* | PDF | 6 monthly issues | $215
Corporate* | PDF | 6 monthly issues | $750*
Individual* | PDF | 12 monthly issues | $425
Corporate* | PDF | 12 monthly issues | $1,495*
* Corporate subscriptions: Unlimited users within a corporation for the PDF version, with Web access through a corporate password. Individual subscriptions cannot be shared (neither passwords nor electronic copies). Please invoice me.
Or go to www.tmaa.com/subscribetossn
Please send information on your consulting.
Name: / Company: / Address: / City, State / ZIP/Postal code / Country
Email (required for email alerts or a Web subscription): / Phone:
Check enclosed, payable to TMA Associates (in U.S. $ on a U.S. bank). Invoice me. Charge my: Visa / MasterCard / American Express
Card # / Expiration date: / Signature: _______________________________________________
Copyright TMA Associates 2014; All rights reserved. TMA Associates, P.O. Box 570308, Tarzana, CA 91357-0308 USA. Tel: (818)708-0962.
Speech Strategy News is published twelve times per year by TMA Associates, Editor: William S. Meisel. Trademarks mentioned in this publication are the property of the companies mentioned; they are used editorially. The material herein is based on data from sources believed to be reliable, but is not guaranteed as to accuracy and does not purport to be complete. From time to time, the author or TMA Associates may have consulting assignments, advisory positions, own stock, or have other business relations with organizations in speech recognition and associated areas, including companies discussed in this newsletter. Speech Strategy News is a trademark of TMA Associates.