Fraunhofer upHear® Voice Quality Enhancement

Enhanced speech recognition in smart assistant devices

Fraunhofer upHear Voice Quality Enhancement is a microphone processing technology that works independently of any particular smart assistant ecosystem. The software is designed to facilitate voice-controlled human-machine interactions using microphones built into mobile phones and smart assistant devices such as smart speakers or smart soundbars. It removes interfering sounds captured by the device’s microphones, extracts the user’s voice and cancels acoustic echoes that would otherwise make it impossible for the human-machine interface (HMI) to understand the user’s request. This allows the smart assistant to understand far-field voice commands and enables barge-in.

Challenge

With the rapid advancements in machine learning over the last few years, voice-controlled human-machine interfaces (HMIs) are becoming more widespread, with applications in several areas including mobile phones, smart assistant devices and cars. Voice-controlled HMI systems typically consist of the following processing units:

  • a keyword spotter to wake up the system
  • an Automatic Speech Recognizer (ASR) module to convert speech into text
  • a Natural Language Understanding Interface (NLUI) to enable natural conversations with the machine
  • a Natural Language Generation (NLG) module to generate meaningful responses for the user
  • a Text-To-Speech (TTS) module to create synthesized speech from text

The input of any voice-controlled HMI is the audio stream captured by the microphones built into the device. In acoustic environments in which voice-controlled HMI systems are typically used, the quality of the captured voice may be insufficient to guarantee adequate performance of the keyword spotter and ASR processing units.

Our solution

Fraunhofer upHear Voice Quality Enhancement is a fully integrated and flexible solution for a wide range of mobile and smart assistant devices, as well as conferencing solutions. The technology combines advanced source localization and beamforming techniques with echo and noise reduction algorithms, thus providing outstanding voice quality even under unfavorable acoustic conditions. Advanced multichannel acoustic echo cancellation allows for barge-in functionality in an always-listening operation of the voice-controlled HMI.
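To illustrate the idea behind echo cancellation and barge-in, the following minimal Python sketch uses a single-channel normalized least mean squares (NLMS) adaptive filter, a common textbook approach. It is not the upHear implementation, which is multichannel and is not described in detail here. The function names, filter length, step size and simulated echo path are all assumptions chosen for illustration.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic.

    far_end: samples sent to the loudspeaker.
    mic: samples captured by the microphone (echo plus any near-end sound).
    Returns the residual signal and the final filter weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # what remains after removing the echo
        norm = sum(h * h for h in history) + eps
        # NLMS update: step size normalized by the far-end signal energy
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


def simulate_echo(far_end, echo_path):
    """Hypothetical room model: far_end convolved with a short echo path."""
    return [
        sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n >= k)
        for n in range(len(far_end))
    ]


# Assumed test signal: white noise playback and a made-up 3-tap echo path.
rng = random.Random(0)
playback = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
captured = simulate_echo(playback, [0.5, -0.3, 0.1])
residual, weights = nlms_echo_canceller(playback, captured)
print([round(w, 3) for w in weights[:3]])
```

Once the filter has adapted, the residual contains mostly what the loudspeaker did not produce. In a real device this is the signal in which a user speaking over playback remains audible to the keyword spotter.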

Even though the technology supports single-microphone use cases, we recommend the use of microphone arrays to further improve the user experience in challenging conditions, especially for far-field applications.

Contact us for information on device-specific tuning by our sound engineers and for advice on microphone placement.

Product features

Fraunhofer upHear Voice Quality Enhancement improves voice quality by an optimized integration of the following functionalities:

  • Multichannel Acoustic Echo Cancellation (MC-AEC) attenuates echoes originating from the device’s loudspeakers.
  • Direction of Arrival (DOA) estimates the direction of the active talker.
  • Beamforming exploits the spatial diversity offered by an array of microphones to achieve improved directional sound acquisition and extracts the user’s voice even in far-field conditions.
  • Noise Reduction (NR), Dereverberation and Automatic Gain Control (AGC) further enhance the quality of the captured voice.
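
As a rough illustration of how a direction estimate and a microphone array can work together, the following Python sketch implements a basic delay-and-sum beamformer for a linear array. This is a generic textbook technique, not the upHear algorithm. The array geometry, sample rate, angle convention (degrees from broadside, positive towards increasing microphone position) and all function names are assumptions made for this example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def steering_delays(mic_positions_m, angle_deg, sample_rate_hz):
    """Arrival delay in whole samples at each microphone for a far-field source.

    mic_positions_m: microphone coordinates along a linear array, in metres.
    angle_deg: assumed source direction, measured from broadside.
    Delays are relative to the microphone the wavefront reaches first.
    """
    sin_theta = math.sin(math.radians(angle_deg))
    arrival = [
        -x * sin_theta / SPEED_OF_SOUND_M_S * sample_rate_hz
        for x in mic_positions_m
    ]
    earliest = min(arrival)
    return [round(t - earliest) for t in arrival]


def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay, then average the channels."""
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    count = len(channels)
    return [
        sum(ch[d + i] for ch, d in zip(channels, delays)) / count
        for i in range(usable)
    ]


# Hypothetical 4-microphone linear array with 4 cm spacing, sampled at 16 kHz.
mics = [0.0, 0.04, 0.08, 0.12]
delays = steering_delays(mics, angle_deg=30.0, sample_rate_hz=16000)

# Simulate a click arriving from 30 degrees: each microphone hears it
# at sample 10 plus its own arrival delay.
channels = [[1.0 if i == 10 + d else 0.0 for i in range(32)] for d in delays]

steered = delay_and_sum(channels, delays)          # click adds coherently
unsteered = delay_and_sum(channels, [0, 0, 0, 0])  # click is smeared over time
print(max(steered), max(unsteered))
```

Steering towards the talker makes the click add up coherently (peak 1.0), while summing without alignment spreads it out (peak 0.25). Sound from other directions is attenuated in the same way, which is the spatial selectivity a microphone array provides.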

Product requirements

Fraunhofer upHear Voice Quality Enhancement can be adapted to the unique housing of the device. It supports a wide range of microphone and loudspeaker configurations, which gives freedom in the product design and ensures optimal performance regardless of whether mono, stereo, surround or immersive sound is being played back. Commonly used array geometries such as linear or circular microphone placements are natively supported.

The number of microphones and their arrangement needed for multichannel speech enhancement depend on the application scenario and the product design. Typical configurations use 2 or 4 microphones, or up to 8 for highest-quality operation. Configurations shown in the following graphic are only examples.

Availability

Fraunhofer upHear Voice Quality Enhancement (VQE) is available for licensing. The software library can be provided for:

  • Desktop platforms (Windows, Mac, Linux)
  • Mobile Apps (iOS, Android)
  • Embedded Systems (e.g., ARM Cortex)

Fraunhofer IIS provides extensive technical support to licensees of the upHear VQE software.

If you are interested in licensing software from us, please fill out the request form below.

More information

Request licensing information: upHear Voice Quality Enhancement

To request a price quote or an evaluation license, please fill in and submit the form.
