A revolution in TV sound


For the first time in decades, a revolution in TV sound is about to enter our homes. With the exception of the introduction of 5.1 surround sound, TV has previously offered little in the way of audio innovations. Now, sound on TV and the Internet is about to become individual and capable of filling whole rooms. With the new audio codec MPEG-H Audio, TV and Internet audiences will be able to adjust their own sound mix and bring 3D cinema sound to their own homes.


We’ve all experienced audio problems on the TV. You might be watching a whodunit, and as the detective starts questioning the suspect, suspense-filled music kicks in and muffles the dialogue. Or maybe it’s a sports show and the commentator’s voice drowns out the stadium atmosphere that we’d sometimes rather hear. Broadcasting companies around the world regularly have to deal with complaints about these and similar issues. Numerous specialists at the TV channels try to give audiences the most balanced sound mix possible. But audiences are so heterogeneous that a single mix cannot make everyone happy. What has been lacking up to now is a way of individually adjusting the sound.

Also lacking is the ability to bring the room-filling 3D sound of movie theaters into our homes. These days, it is very rare for a blockbuster movie to appear on the big screen without natural sound coming from every direction. To play back the immersive surround sound, movie theaters install speakers not only around the audience, but also in the ceiling so that the sound literally fills the room and emanates from all sides as it would in a natural setting. Acoustically, this puts audiences at the heart of the action and makes the movie experience more immediate. However, this all stops when the same movies are shown on TV or the Internet, as there is still no way of efficiently bringing 3D sound to screens at home. Few of us would entertain the idea of installing a 3D speaker setup in the living room. You would need at least seven speakers for that – which puts a major obstacle in the way of mass adoption.

It is also hard to imagine the latest virtual reality worlds and devices without 3D sound. If 360° videos are to create the perfect alternative reality, the sound can’t simply be delivered as stereo over headphones, as is currently the case. Users need 3D sound here, perfectly aligned with the wraparound video so that it makes the virtual reality seem natural and realistic. As with TV and streaming, one of the challenges here is to find an efficient way of delivering the immersive sound.


MPEG-H Audio allows users to adjust the audio mix themselves.
© Fraunhofer IIS / viaframe

MPEG-H Audio: the next generation of audio coding

Audio and media technology experts have spent years developing technical solutions for these challenges. Our scientists played a major role in developing and standardizing the ISO/MPEG audio codec MPEG-H Audio. This descendant of the mp3 format was specifically developed to respond to the needs of broadcasters and streaming providers that want to offer their audience and customers a solution for the problems discussed above.

Customizable audio mix

In future, MPEG-H Audio will enable viewers to adjust the audio mix themselves. The degree to which this is possible will be decided by the TV station or the streaming provider. The possibilities are almost endless. For instance, the technology can let you turn up the dialogue so you can hear it better over the background noises and music. It can also let you choose between different commentators during sporting events, fill your living room with the sound of soccer fans singing in the stadium as your favorite team plays, or listen in on the pit radio in your favorite driver’s car during a race. And you will be able to do all this via your remote control.
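To make the idea of a provider-limited adjustment more concrete, here is a minimal Python sketch. It assumes that each element of the mix carries a gain range set by the broadcaster and that the viewer's request is clamped to that range before mixing. The names (`AdjustableElement`, `mix`), the dB limits, and the sample values are hypothetical illustrations and are not taken from the MPEG-H Audio specification.

```python
from dataclasses import dataclass


@dataclass
class AdjustableElement:
    """One element of the mix, e.g. dialogue (hypothetical structure)."""
    name: str
    samples: list[float]
    min_gain_db: float  # lowest gain the broadcaster allows (assumed value)
    max_gain_db: float  # highest gain the broadcaster allows (assumed value)

    def allowed_gain_db(self, requested_db: float) -> float:
        # Clamp the viewer's request to the broadcaster's range.
        return max(self.min_gain_db, min(self.max_gain_db, requested_db))


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def mix(elements, requested_db):
    """Sum all elements after applying each clamped user gain."""
    length = min(len(e.samples) for e in elements)
    out = [0.0] * length
    for e in elements:
        gain = db_to_linear(e.allowed_gain_db(requested_db.get(e.name, 0.0)))
        for i in range(length):
            out[i] += gain * e.samples[i]
    return out


dialogue = AdjustableElement("dialogue", [0.1, 0.2, 0.1], -6.0, 9.0)
music = AdjustableElement("music", [0.3, 0.3, 0.3], 0.0, 0.0)  # fixed level

# The viewer asks for +20 dB on dialogue; only +9 dB is allowed here.
result = mix([dialogue, music], {"dialogue": 20.0})
```

In this sketch the music element has a range of 0 dB to 0 dB, so the viewer cannot change it at all, while the dialogue boost is capped at the provider's upper limit.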



MPEG-H Audio can also deliver 3D sound efficiently. It supports channels, objects, and ambisonics audio. Channels are the conventional method for delivering sound: two channels for stereo, six channels for surround sound, and ten channels for 3D sound. MPEG-H Audio can also transmit audio objects. Examples of objects are interactive elements or specific 3D sound components (such as a helicopter flying over the audience). Objects have an advantage over channel delivery in that they can be manipulated individually and adapted to the specific playback situation. Before playing back the objects, the decoder and renderer recalculate the sound so that it fits the available speaker setup every time. This achieves a better 3D effect than if the sound were delivered via channels alone. Lastly, MPEG-H Audio also supports ambisonics audio. Rather than delivering sound via channels or objects, this technique uses a mathematical description of the sound field recorded using a special microphone setup. Ambisonic recordings are popular among producers of virtual reality content because compact miking is enough to produce acceptable quality, and the sound can be easily played back through headphones.
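The idea that an object is recalculated for whatever speakers are available can be illustrated with a small Python sketch. It uses simple pairwise constant-power panning on a horizontal ring of speakers, which is a deliberately simplified stand-in and not the renderer defined for MPEG-H Audio. The function name `render_gains`, the speaker angles, and the object position are assumptions made for illustration; positive azimuths are taken to mean "to the listener's left".

```python
import math


def render_gains(object_azimuth_deg, speaker_azimuths_deg):
    """Return one gain per speaker for a single audio object.

    The object is panned between the two neighbouring speakers that
    enclose its direction, using a constant-power (sin/cos) law.
    """
    n = len(speaker_azimuths_deg)
    if n == 1:
        return [1.0]
    # Sort speakers by angle but remember their original positions.
    order = sorted(range(n), key=lambda i: speaker_azimuths_deg[i] % 360.0)
    angles = [speaker_azimuths_deg[i] % 360.0 for i in order]
    target = object_azimuth_deg % 360.0
    gains = [0.0] * n
    for k in range(n):
        a = angles[k]
        b = angles[(k + 1) % n]
        span = (b - a) % 360.0 or 360.0   # arc from speaker a to speaker b
        offset = (target - a) % 360.0     # object position within that arc
        if offset <= span:
            frac = offset / span
            gains[order[k]] = math.cos(frac * math.pi / 2)
            gains[order[(k + 1) % n]] = math.sin(frac * math.pi / 2)
            return gains
    return gains


# Hypothetical layouts: stereo (L, R) and five-speaker surround (L, R, C, Ls, Rs).
stereo = [30.0, -30.0]
surround = [30.0, -30.0, 0.0, 110.0, -110.0]

helicopter_azimuth = 15.0  # the same object, slightly left of center
stereo_gains = render_gains(helicopter_azimuth, stereo)
surround_gains = render_gains(helicopter_azimuth, surround)
```

The same object position yields different gain sets for the two layouts: on stereo it is weighted toward the left speaker, while on the five-speaker layout it is shared between the center and left speakers and the others stay silent.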

3D sound at home

MPEG-H Audio thus makes it possible to deliver 3D sound flexibly and extremely efficiently, and at the kind of data rates that are largely standard for surround sound today. This means that 3D soundscapes are no longer confined to movie theaters, but can also unfold in your own living room.

Our audio team has presented a reference design for a 3D soundbar that, once placed below the television, allows users to play back room-filling sound. It removes the need to buy numerous speakers and install complex cabling – one of these soundbars is all you need to bring immersive sound into your own home.

MPEG-H Audio is not limited to use with TVs. The codec was designed so that the playback can be dynamically adjusted to the individual end device. You can play content on a smartphone, a tablet, a TV with built-in speakers, a soundbar, or a full-scale home cinema system, and MPEG-H Audio will optimize the sound quality for each playback situation.

MPEG-H Audio therefore does everything a modern audio codec should be capable of. It can be used immediately within closed systems provided by streaming services. Before it can be used with TV, however, the codec needs to be integrated into application standards such as ATSC or DVB. Our audio team was also active in this area and has ensured that MPEG-H Audio was integrated into the ATSC 3.0 standard and into the DVB standard. Now, whenever a country introduces new TV systems (e.g., for playing ultra-high-definition (UHD) 4K video), MPEG-H Audio will be available for delivering the audio.

UHDTV with MPEG-H Audio in South Korea

A current example of this is the new UHDTV system in South Korea that will use MPEG-H Audio for audio delivery. The system, which is based on the ATSC 3.0 standard, will initially be launched in Seoul and the surrounding region. The plan is to expand it to the sports venues in time for the 2018 Winter Olympics and to introduce it across South Korea by 2021. MPEG-H Audio is the first new-generation audio codec that will be used in a terrestrial 4K system. As is so often the case, South Korea is at the forefront of this technological development.

South Korean companies are also among the first providers to have started developing and selling transmitters and receivers that support MPEG-H Audio. For instance, Kai Media, DS Broadcast, and Pixtree announced and launched the first TV encoders. Broadcasters need the encoders so that they can encode their programs and thus prepare them for transmission. A German company has also begun offering professional equipment: the MPEG-H Audio Monitoring and Authoring Unit by Jünger Audio from Berlin makes it possible to mix immersive, interactive sound (even during live events) and prepare it for broadcast. Finally, to coincide with the introduction of MPEG-H Audio, the South Korean market will see the launch of plug-ins and software tools that will enable tonmeisters and sound designers to mix MPEG-H Audio sound in their preferred working environment. On the receiver side, leading manufacturers of consumer electronics are launching TVs equipped with MPEG-H Audio. Everything is therefore in place for the introduction of interactive 3D sound in South Korea: the codec will provide immersive sound at low data rates, TV broadcasters have access to the equipment they need, and consumers can buy TVs that can play the new programming.

Initial tests in Germany

German TV broadcasters are also interested in the new possibilities presented by interactivity, 3D sound, and MPEG-H Audio. Our tonmeisters helped public-service broadcaster ZDF to record and mix 3D sound for the Wolfskinder (Wolf Children) episode of the documentary show Terra X. This unusual production tested out new technologies for recording and broadcasting sound and images. The show was shot in 4K resolution, and 360° videos were produced to run online in parallel to the broadcast. 3D sound is particularly important for the 360° videos.

It will doubtless take a few more years before a terrestrial 4K TV system is introduced in Germany. When it does arrive, MPEG-H Audio will be ready, because it is included in the DVB specification and can thus be used in all DVB-based systems. Of course, this does not only apply to Germany: countries all over the world will introduce ultra-high-definition TV in the future, and MPEG-H Audio will always be an option for achieving the perfect sound.
