Making voice assistants better

Voice assistants are growing in popularity and are even being adopted in industry. To work reliably, they need to »hear« voice commands clearly and must be trained to understand what is being said. A platform »made in Germany« is needed in order to retain ultimate control over these training models and the data the assistant collects. The Fraunhofer SPEAKER project is pursuing precisely that objective.

 

Virtual voice assistants such as Alexa and Siri are becoming increasingly popular: in 2018, one in six Germans used a »smart speaker«, meaning that the number of users of these speakers equipped with digital voice assistants tripled compared to the previous year.1 However, despite the technology’s growing popularity, surveys reveal that many consumers have concerns regarding data protection.2 In addition, users of the various assistants feel that their virtual helpers often fail to understand them correctly.3 At Fraunhofer IIS, we are addressing both of these problems with our natural language user interface (NLUI) projects. These interfaces allow humans and machines to interact using natural language – and we take two key steps to ensure that the machine understands and processes voice commands correctly.

Step one: The voice assistant needs to have good hearing

For voice commands to be processed correctly, they must reach the »artificial ear« loud and clear. This isn’t always as trivial as it sounds – indeed, many virtual assistants are housed inside a smart speaker that is also used to listen to music, for example. Likewise, the room in which voice assistants are used may produce echoes, the environment may be too loud or simply too large, or the user may be standing too far away. It was to solve these problems and others that we developed the Fraunhofer upHear Voice Quality Enhancement (VQE) technology, which ensures optimum processing of the voice signal for the smart speaker – for example, by suppressing acoustic echoes in the microphone signal. The device can therefore be voice-operated while simultaneously playing back music or announcements. Background noise is removed to allow operation even while the user is far away. This approach ensures that the »keyword spotter« receives a clear voice signal, significantly improving recognition performance.
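To make the idea of suppressing acoustic echoes in the microphone signal more concrete, here is a minimal sketch of a textbook technique, a normalized least-mean-squares (NLMS) adaptive filter. It is not the upHear VQE implementation, whose internals are not described here; the function name `nlms_echo_cancel`, the toy echo path `room`, and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `far_end` from `mic`.

    far_end: samples sent to the loudspeaker (music, announcements)
    mic:     samples picked up by the microphone (echo plus any speech)
    Returns the cleaned signal and the learned echo-path weights.
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent loudspeaker samples, newest first
    cleaned = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # what remains after removing the estimate
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm   # NLMS: step size normalized by input power
        weights = [w + step * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights


def power(signal):
    """Mean squared amplitude of a list of samples."""
    return sum(s * s for s in signal) / len(signal)


# Toy scenario (all values hypothetical): the speaker plays noise-like
# "music" and the room returns a short echo of it to the microphone.
rng = random.Random(0)
music = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
room = [0.6, 0.3, -0.2, 0.1]  # assumed echo path (impulse response)
mic = [
    sum(room[k] * music[n - k] for k in range(len(room)) if n - k >= 0)
    for n in range(len(music))
]

cleaned, weights = nlms_echo_cancel(music, mic)

# Echo reduction over the last 1000 samples, after the filter has adapted.
erle_db = 10 * math.log10(power(mic[-1000:]) / (power(cleaned[-1000:]) + 1e-30))
print(f"echo reduced by about {erle_db:.0f} dB")
```

The toy scenario contains only the echo, so the filter can adapt undisturbed. A practical system also has to cope with the user speaking at the same time as the playback, with background noise, and with much longer room echoes, which is where a dedicated voice quality enhancement stage goes well beyond this sketch.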

 

Step two: The voice assistant must be »trained«

Voice assistants »thrive« on human-machine interaction: they allow people to communicate with devices via voice commands and thus to access products and services in natural language.

For the system to understand human beings, reliable models must first be trained so that the machine can learn what voice commands mean. Until now, no solutions have been available that meet European data protection standards, because the market for voice assistants has so far been dominated by companies from the USA and Asia. There is, however, huge demand from German business and industry for solutions of this kind. Particularly in relation to data sovereignty, there is a need for improved protection and secure exchange of personal data. This is possible with a voice assistant solution made in Germany, as it will implement European data security standards. At the same time, a new level of quality is emerging in human-machine communication that far exceeds the semantic capabilities of current systems and is therefore much more user-friendly.

Fraunhofer SPEAKER project

In the Fraunhofer SPEAKER project, we have joined forces with the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) to bring together experts in natural language understanding, artificial intelligence, and software engineering in a cross-institute collaboration. The goal of this major research and development project, which is supported by the Federal Ministry for Economic Affairs and Energy (BMWi), is to develop a voice assistant that is »made in Germany«. The SPEAKER platform is intended to provide open, transparent, and secure voice assistant applications. To that end, it is designed to make leading technologies for audio preprocessing, speech recognition, natural language understanding, question answering, dialog management, and speech synthesis, all based on artificial intelligence and machine learning, simple and straightforward to use. These key modules are used to develop industrial voice assistant applications, which can in turn be made available as finished applications to other market participants via the platform. The SPEAKER project received an award in the BMWi innovation competition »Artificial intelligence as a driver for economically relevant ecosystems«, making it one of 16 outstanding concepts that prevailed in a field of more than 130 submissions.
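The chain of modules named above (audio preprocessing, speech recognition, natural language understanding, dialog management) can be pictured as a sequence of stages that each enrich a shared record of the user's turn. The following is only an illustrative sketch and does not reflect the actual SPEAKER platform or its interfaces: every class, function, and example phrase is hypothetical, the recognizer is a hard-coded placeholder, and question answering and speech synthesis are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Turn:
    """Data handed from one module to the next during a single user turn."""
    audio: List[float]
    text: str = ""
    intent: str = ""
    slots: Dict[str, str] = field(default_factory=dict)
    reply: str = ""
    trace: List[str] = field(default_factory=list)


Stage = Callable[[Turn], Turn]


def preprocess(turn: Turn) -> Turn:
    # Stand-in for audio preprocessing: remove the constant (DC) offset.
    mean = sum(turn.audio) / len(turn.audio) if turn.audio else 0.0
    turn.audio = [sample - mean for sample in turn.audio]
    return turn


def recognize(turn: Turn) -> Turn:
    # Placeholder only: a real recognizer would decode turn.audio.
    turn.text = "switch on the conveyor belt"
    return turn


def understand(turn: Turn) -> Turn:
    # Toy rule-based understanding: map the words to an intent and a device.
    words = turn.text.lower().split()
    if "switch" in words and "on" in words and "the" in words:
        turn.intent = "switch_on"
        turn.slots["device"] = " ".join(words[words.index("the") + 1:])
    else:
        turn.intent = "unknown"
    return turn


def manage_dialog(turn: Turn) -> Turn:
    # Decide what the assistant should answer for the recognized intent.
    if turn.intent == "switch_on":
        turn.reply = f"Switching on the {turn.slots['device']}."
    else:
        turn.reply = "Sorry, I did not understand that."
    return turn


def run_pipeline(turn: Turn, stages: List[Stage]) -> Turn:
    # Run the stages in order and record which ones were applied.
    for stage in stages:
        turn = stage(turn)
        turn.trace.append(stage.__name__)
    return turn


result = run_pipeline(
    Turn(audio=[0.1, 0.3, 0.2]),
    [preprocess, recognize, understand, manage_dialog],
)
print(result.reply)
```

Keeping each module behind the same small interface is one way to let individual stages be swapped or reused across applications, which is the kind of modularity the platform description suggests.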

April 1, 2020, marks the official start of the implementation phase of the SPEAKER project. Once the platform’s development is complete, it will be transferred to an operating company and offered at a similar cost to established platforms.


1 Donath, T. (2019): »Smart Speaker & Voice Control.« In: Trendmonitor Deutschland. Available at: https://trendmonitor-deutschland.de/smart-speaker-voice-control/.

2 Bodenhöfer, X. (2018): »Digitale Sprachassistenten als intelligente Helfer im Alltag« [Digital voice assistants as smart helpers in everyday life]. In: Research articles by eresult GmbH.

3 Adobe 2019 Voice Report.