Enhanced speech recognition in smart assistant devices
Fraunhofer upHear Voice Quality Enhancement is a microphone processing technology that works independently of any particular smart assistant ecosystem. The software is designed to facilitate voice-controlled human-machine interaction using the microphones built into mobile phones and smart assistant devices such as smart speakers or smart soundbars. It removes interfering sounds captured by the device’s microphones, extracts the user’s voice and cancels the acoustic echoes that would otherwise make it impossible for the human-machine interface (HMI) to understand the user’s request. This allows the smart assistant to understand far-field voice commands and enables barge-in, that is, the user can interrupt the device while it is playing back audio.
Challenge
With the rapid advances in machine learning over the last few years, voice-controlled human-machine interfaces (HMIs) are becoming more widespread, with applications in several areas including mobile phones, smart assistant devices and cars. Voice-controlled HMI systems typically consist of the following processing units:
- a keyword spotter to wake up the system
- an Automatic Speech Recognizer (ASR) module to convert speech into text
- a Natural Language Understanding Interface (NLUI) to enable natural conversations with the machine
- a Natural Language Generation (NLG) module to generate meaningful responses to the user
- a Text-To-Speech (TTS) module to create synthesized speech from text
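The chain of processing units listed above can be sketched as a simple pipeline. This is only an illustration of the data flow, not an actual product interface: the class name `VoiceHmiPipeline`, its fields and the toy stand-in stages are all hypothetical, and real stages would operate on audio streams rather than ASCII bytes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceHmiPipeline:
    """Hypothetical container for the five processing units (names are assumptions)."""
    spot_keyword: Callable[[bytes], bool]   # keyword spotter: wake the system?
    recognize: Callable[[bytes], str]       # ASR: speech to text
    understand: Callable[[str], dict]       # NLUI: text to structured request
    generate: Callable[[dict], str]         # NLG: structured request to reply text
    synthesize: Callable[[str], bytes]      # TTS: reply text to audio

    def handle(self, audio: bytes) -> Optional[bytes]:
        # Nothing downstream runs unless the keyword spotter fires,
        # so poor input audio can stall the whole chain at the first stage.
        if not self.spot_keyword(audio):
            return None
        text = self.recognize(audio)
        request = self.understand(text)
        reply = self.generate(request)
        return self.synthesize(reply)


def toy_pipeline() -> VoiceHmiPipeline:
    """Toy stand-ins that treat ASCII bytes as 'audio', for illustration only."""
    wake_word = "hey "
    return VoiceHmiPipeline(
        spot_keyword=lambda audio: audio.startswith(wake_word.encode("ascii")),
        recognize=lambda audio: audio.decode("ascii"),
        understand=lambda text: {"request": text[len(wake_word):]},
        generate=lambda request: "ok: " + request["request"],
        synthesize=lambda reply: reply.encode("ascii"),
    )
```

With the toy stages, `toy_pipeline().handle(b"hey lights on")` returns `b"ok: lights on"`, while input without the wake word returns `None`. The sketch makes visible why the first two stages depend directly on the quality of the captured audio.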
The input of any voice-controlled HMI is the audio stream captured by the microphones built into the device. In acoustic environments in which voice-controlled HMI systems are typically used, the quality of the captured voice may be insufficient to guarantee adequate performance of the keyword spotter and ASR processing units.
Our solution
Fraunhofer upHear Voice Quality Enhancement is a fully integrated and flexible solution for a wide range of mobile and smart assistant devices, as well as conferencing solutions. The technology combines advanced source localization and beamforming techniques with echo and noise reduction algorithms to deliver high voice quality even under unfavorable acoustic conditions. Advanced multichannel acoustic echo cancellation enables barge-in functionality while the voice-controlled HMI operates in always-listening mode.
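To give a rough idea of what acoustic echo cancellation does, the following minimal sketch uses a textbook single-channel normalized least-mean-squares (NLMS) adaptive filter. It is not the upHear algorithm, which is multichannel and not described here. The function names, the filter length, the step size and the three-tap echo path are all assumptions chosen for illustration.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Textbook NLMS: subtract an adaptive estimate of the loudspeaker echo.

    far_end: samples sent to the loudspeaker (the echo reference)
    mic:     samples captured by the microphone (echo plus any near-end voice)
    Returns the residual signal after the estimated echo is removed.
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # Normalizing by the reference energy keeps the step size stable.
        step = mu * error / (sum(h * h for h in history) + eps)
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def simulate_echo(far_end, path):
    """Convolve the far-end signal with a short, made-up echo path."""
    return [
        sum(path[k] * far_end[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(len(far_end))
    ]


def demo_echo_reduction(length=2000, seed=0):
    """Return residual-to-echo energy ratio over the last 200 samples."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    mic = simulate_echo(far_end, [0.6, 0.3, -0.2])  # hypothetical echo path
    residual = nlms_echo_cancel(far_end, mic)
    echo_energy = sum(v * v for v in mic[-200:])
    residual_energy = sum(v * v for v in residual[-200:])
    return residual_energy / echo_energy
```

In this idealized setting (no near-end speech, no noise, a fixed echo path) the ratio returned by `demo_echo_reduction()` becomes very small once the filter has converged. Barge-in is harder in practice because the user's voice overlaps the echo while the filter is adapting.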
Even though the technology supports single-microphone use cases, we recommend the use of microphone arrays to further improve the user experience in challenging conditions, especially for far-field applications.
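One reason microphone arrays help can be shown with a delay-and-sum beamformer, the simplest textbook beamforming scheme. The sketch below is an illustration under strong assumptions, not the method used in the product: the talker's delay at each microphone is known and is a whole number of samples, and the noise at each microphone is independent. All function names and parameter values are hypothetical.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Advance each channel by its known integer delay, then average.

    channels: one list of samples per microphone
    delays:   samples by which the talker arrives late at each microphone
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


def demo_beamformer(num_mics=4, length=1000, seed=1):
    """Return (noise power at one microphone, noise power after beamforming)."""
    rng = random.Random(seed)
    source = [math.sin(0.05 * n) for n in range(length)]
    delays = list(range(num_mics))  # assumption: mic i hears the talker i samples late
    channels = []
    for d in delays:
        delayed = [0.0] * d + source[: length - d]
        # Independent Gaussian noise per microphone (idealized).
        channels.append([v + rng.gauss(0.0, 0.5) for v in delayed])
    output = delay_and_sum(channels, delays)

    def noise_power(signal):
        return sum((a - b) ** 2 for a, b in zip(signal, source)) / len(signal)

    return noise_power(channels[0]), noise_power(output)
```

Because the aligned voice adds up coherently while the independent noise averages out, the noise power after combining four microphones in this idealized demo is roughly a quarter of that at a single microphone. Real rooms add reverberation and correlated noise, which is where more advanced beamforming and careful microphone placement come in.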
Contact us for information on device-specific tuning by our sound engineers and for advice on microphone placement.