Fraunhofer upHear® Voice Quality Enhancement is designed to facilitate voice-controlled human-machine interactions using microphones built into mobile phones and smart home devices such as smart speakers.
With the rapid advances in machine learning over the last few years, voice-controlled human-machine interfaces (HMIs) are becoming more widespread, with applications in several areas, including mobile phones, smart home devices and cars. Voice-controlled HMI systems typically consist of the following processing units:
- a keyword spotter to wake up the system
- an Automatic Speech Recognizer (ASR) module to convert speech into text
- a Natural Language Understanding Interface (NLUI) to enable natural conversations with the machine
- a Natural Language Generation (NLG) module to generate meaningful feedback for the user
- a Text-To-Speech (TTS) module to create synthesized speech from text
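As a rough illustration of how these five units chain together, the following Python sketch wires them up as plain callables. All names here (`VoiceHmiPipeline`, `make_toy_pipeline`, the wake phrase "hello device") are hypothetical, and the toy stages simply treat the "audio" as UTF-8 text; it is a sketch of the data flow, not of any actual product API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceHmiPipeline:
    """Hypothetical wiring of the five processing units as plain callables."""
    spot_keyword: Callable[[bytes], bool]   # keyword spotter
    recognize: Callable[[bytes], str]       # ASR: audio -> text
    understand: Callable[[str], dict]       # NLUI: text -> intent
    generate: Callable[[dict], str]         # NLG: intent -> reply text
    synthesize: Callable[[str], bytes]      # TTS: reply text -> audio

    def handle(self, audio: bytes) -> Optional[bytes]:
        # The system stays idle until the keyword spotter fires.
        if not self.spot_keyword(audio):
            return None
        text = self.recognize(audio)
        intent = self.understand(text)
        reply = self.generate(intent)
        return self.synthesize(reply)


def make_toy_pipeline() -> VoiceHmiPipeline:
    """Stand-in stages that treat the 'audio' as UTF-8 text (illustration only)."""
    wake = "hello device"  # assumed wake phrase
    return VoiceHmiPipeline(
        spot_keyword=lambda audio: audio.lower().startswith(wake.encode("utf-8")),
        recognize=lambda audio: audio.decode("utf-8"),
        understand=lambda text: {"request": text[len(wake):].strip(" ,")},
        generate=lambda intent: f"OK: {intent['request']}",
        synthesize=lambda reply: reply.encode("utf-8"),
    )


if __name__ == "__main__":
    pipeline = make_toy_pipeline()
    print(pipeline.handle(b"Hello device, lights on"))
    print(pipeline.handle(b"lights on"))
```

Because the first two stages consume the captured audio directly, any degradation of that audio propagates through every later stage.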
The input to any voice-controlled HMI is the audio stream captured by the microphones built into the device. In particular, the performance of the keyword spotter and the ASR module is directly affected by the quality of the captured voice.
Fraunhofer upHear Voice Quality Enhancement removes interfering sounds captured by the device’s microphones, extracts the user’s voice and cancels out acoustic echoes that would otherwise make it impossible for the HMI to understand the user’s request.
Fraunhofer upHear Voice Quality Enhancement is a fully integrated and flexible solution combining advanced multichannel source localization and beamforming techniques with echo and noise reduction algorithms. It provides outstanding audio quality even under unfavorable acoustic conditions. Advanced acoustic echo cancellation enables barge-in functionality when the voice-controlled HMI operates in always-listening mode.
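To give an idea of what acoustic echo cancellation involves, here is a minimal sketch of a textbook normalized LMS (NLMS) adaptive filter in plain Python. It is not the algorithm used in the product, which is not described here; the function names, the filter length, the step size and the three-tap "room" response are all assumptions chosen for the demonstration. The filter learns how the loudspeaker signal (`far_end`) leaks into the microphone and subtracts its estimate of that echo.

```python
import random


def nlms_echo_cancel(mic, far_end, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for d, x in zip(mic, far_end):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # what remains after removing the estimated echo
        norm = eps + sum(h * h for h in history)
        # NLMS update: step along the input vector, scaled by its energy.
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


def demo_residual_ratio(n=2000, seed=0):
    """Residual-to-echo energy ratio over the last quarter of a synthetic signal."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    echo_path = [0.6, 0.3, -0.1]  # made-up loudspeaker-to-microphone response
    mic = [
        sum(g * far_end[i - k] for k, g in enumerate(echo_path) if i - k >= 0)
        for i in range(n)
    ]
    out = nlms_echo_cancel(mic, far_end)
    tail = n // 4
    return sum(e * e for e in out[-tail:]) / sum(d * d for d in mic[-tail:])


if __name__ == "__main__":
    print(f"residual/echo energy after convergence: {demo_residual_ratio():.2e}")
```

The demo contains only echo and no user speech. For barge-in, a practical canceller also has to keep its filter from diverging while the user talks over the playback, which this sketch does not attempt.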
Even though the technology supports single-microphone use cases, we recommend the use of microphone arrays to further improve the user experience in challenging conditions, especially for far-field applications.
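One way to see why an array helps is the classic delay-and-sum beamformer, sketched below in plain Python. This is a textbook technique used purely for illustration, not a description of the product's beamforming; the function names, the integer per-microphone delays and the noise level are assumptions. Aligning the channels on the talker and averaging them keeps the voice intact while uncorrelated noise partially cancels, so with four microphones the noise energy drops to roughly a quarter.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Advance each channel by its known integer delay (in samples) and average."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]


def demo_noise_ratio(num_mics=4, n=4000, seed=0):
    """Noise energy after beamforming relative to a single microphone."""
    rng = random.Random(seed)
    delays = [2 * m for m in range(num_mics)]  # assumed arrival delays per mic
    voice = [math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
    channels = []
    for d in delays:
        delayed = [0.0] * d + voice[: n - d]  # voice reaches this mic d samples late
        channels.append([s + rng.gauss(0.0, 0.5) for s in delayed])
    out = delay_and_sum(channels, delays)
    m = len(out)
    err_single = sum((channels[0][i] - voice[i]) ** 2 for i in range(m))
    err_beam = sum((out[i] - voice[i]) ** 2 for i in range(m))
    return err_beam / err_single


if __name__ == "__main__":
    print(f"noise energy vs. single mic: {demo_noise_ratio():.2f}")
```

In a real device the delays are not known in advance and have to be estimated, which is where source localization comes in.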