MPEG‑I Immersive Audio

Enabling Convincing XR Sound

Extended Reality (XR) – Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR) – has made huge visual strides. High‑resolution displays, precise tracking, and real‑time rendering create convincing virtual worlds. But immersion breaks instantly when audio doesn’t behave as expected: a choir that doesn’t bloom in a cathedral, a drum you can’t hear around a corner, or your own voice that feels disconnected from the room.

 

Standardized Rendering for Natural Immersive Audio

Convincing immersion requires sound that reacts naturally to the listener’s position, orientation, and movement. It must also reflect the environment’s geometry, materials, and acoustic properties.

MPEG‑I Immersive Audio is a standardized ISO/MPEG renderer designed for high‑quality VR and AR audio. It delivers audio experiences that feel natural and convincing, complementing the visuals with realistic spatial sound. Whether you’re walking through a virtual concert hall or following a live sports event in VR, MPEG‑I ensures acoustics that remain coherent, immersive, and true to the environment.

 

From 3DoF to Full Spatial Freedom

MPEG-I VR Scene - Six Degrees of Freedom
© Fraunhofer IIS

Previous solutions were limited to an acoustic experience from a single observation point, allowing only head rotations in three degrees of freedom (3DoF). MPEG‑I changes this by supporting full six degrees of freedom (6DoF). VR users experience acoustics that react seamlessly to every change in position: moving through rooms, changing distance to sound sources, or walking behind objects all produce realistic acoustic shifts. In AR, virtual sound sources blend naturally into real spaces – for example, placing virtual musicians around your living room and enjoying a customized private concert.

Creating highly plausible 6DoF audio means simulating real acoustic physics:

  • Sound bending around obstacles (diffraction and occlusion)
  • Propagation through open and reverberant spaces
  • Doppler effects from moving sources
  • Realistic radiation patterns and spatial extent
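
Two of the effects above can be illustrated with elementary acoustics. The following Python sketch shows the classic Doppler formula and the inverse-distance (1/r) attenuation law; it is a simplified illustration of the underlying physics, not MPEG-I's actual renderer algorithms:

```python
C_SOUND = 343.0  # approximate speed of sound in air at 20 °C, in m/s

def doppler_frequency(f_source, v_source=0.0, v_listener=0.0, c=C_SOUND):
    """Observed frequency for motion along the source-listener axis.
    Velocities are positive when source and listener approach each other."""
    return f_source * (c + v_listener) / (c - v_source)

def distance_gain(distance, ref_distance=1.0):
    """Inverse-distance (1/r) attenuation, clamped at a reference distance."""
    return ref_distance / max(distance, ref_distance)

# A 1 kHz source approaching at 34.3 m/s is heard noticeably sharp:
print(doppler_frequency(1000.0, v_source=34.3))  # ~1111.1 Hz

# Doubling the distance halves the linear gain (about -6 dB):
print(distance_gain(2.0))  # 0.5
```

A real renderer applies the Doppler shift as a time-varying resampling of the source signal and combines distance attenuation with air absorption and reverberation, but the listener-dependent quantities driving those stages follow the same relations.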

From ocean waves stretching across a beach to traffic noise weaving through a dense cityscape: MPEG-I Immersive Audio supports these complex acoustic behaviors within a standardized, real-time-capable rendering framework, ensuring they can be reproduced efficiently and consistently across different end-user devices.

Virtual basketball hall
© Fraunhofer IIS

The demonstration below is a simple example of MPEG-I Immersive Audio technology. Please use headphones for the best experience – and enjoy, for example, the enveloping splash of the fountain rendered with spatial extent.

From Metadata to Sound

With the MPEG-I Immersive Audio standard, truly immersive VR and AR sound can be created, delivered, and experienced consistently across devices. This makes it possible to distribute immersive VR and AR content as a next-generation service over existing delivery channels, bringing high-quality spatial audio experiences to a broad audience.

MPEG-I structures immersive audio creation into three components:

  • Authoring & Encoding, where content creators define sound sources, environments, geometry, and materials.
  • Transport & Storage, using the efficient MHAS (MPEG-H Audio Stream) format for streaming, broadcast, or file-based delivery.
  • Decoding & Rendering, where listener tracking and detailed acoustic metadata are combined to recreate convincing, physics-based sound scenes in real time.
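
To make the third stage concrete, here is a minimal, hypothetical sketch of a per-frame rendering update in Python. The scene dictionary and function names are illustrative assumptions for this sketch – they are not the actual MPEG-I bitstream syntax or renderer API – but they show the essential idea: tracked listener pose plus authored source metadata drive the output gains every frame.

```python
import math

# Illustrative scene metadata as a creator might author it; the field
# names are assumptions for this sketch, not MPEG-I bitstream syntax.
scene = {
    "sources": [
        {"id": "fountain", "position": (2.0, 0.0, 0.0), "gain": 1.0},
    ],
}

def render_source(listener_pos, listener_yaw, source):
    """One per-frame update: inverse-distance attenuation plus a simple
    constant-power stereo pan driven by the tracked listener pose."""
    dx = source["position"][0] - listener_pos[0]
    dy = source["position"][1] - listener_pos[1]
    dist = max(math.hypot(dx, dy), 0.1)      # clamp to avoid blow-up at 0
    gain = source["gain"] / dist             # 1/r distance law
    az = math.atan2(dy, dx) - listener_yaw   # azimuth in the listener frame
    pan = math.sin(az)                       # +1 = hard left, -1 = hard right
    theta = (pan + 1.0) * math.pi / 4.0      # constant-power pan angle
    return gain * math.sin(theta), gain * math.cos(theta)  # (left, right)

# Walking toward the fountain (2 m -> 1 m) doubles the linear gain:
far = render_source((0.0, 0.0, 0.0), 0.0, scene["sources"][0])
near = render_source((1.0, 0.0, 0.0), 0.0, scene["sources"][0])
```

A production renderer replaces the stereo pan with binaural rendering over HRTFs and adds occlusion, reverberation, and Doppler processing, but the control flow – decode metadata once, then update per frame from the listener pose – is the same.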

Extended Reality Communications: MPEG-I and IVAS

IVAS has been standardized in 3GPP as the codec for immersive spatial calling: real-time communication with support for spatial metadata such as audio object positions and orientations, and for listener head-tracking. MPEG-I Immersive Audio complements this by providing the XR-side framework for sound that responds naturally to both listener and audio object movement, factoring in the effects of distance and environmental acoustic cues. This makes it possible to create convincing 6DoF audio experiences in walkable virtual environments. Together, the two technologies link immersive voice communication and immersive media rendering, using complementary standards to support both live calls and interactive XR experiences. This combination helps establish a consistent immersive-audio ecosystem for future communication services, entertainment, and XR applications.

MPEG-I and IVAS VR Scene
© Fraunhofer IIS

Unified Immersive Audio with MPEG-I and MPEG-H

MPEG-I and MPEG-H VR Scene
© Fraunhofer IIS

By standardizing the 6DoF metadata bitstream format, its transport, decoding, and rendering, MPEG-I ensures that immersive content remains interoperable, future‑proof, and easy to consume across platforms. MPEG-I Immersive Audio can be seamlessly combined with MPEG-H Audio for efficient content compression, supporting the rendering of channels, objects, and Higher-Order Ambisonics (HOA). Moreover, local audio – for example, the user’s own voice – can be fed into the renderer to support low-latency real-time conversations in virtual or augmented environments. This allows providers to offer users truly exciting and engaging experiences, whether in entertainment, documentaries, education, or sports.

More Information

Summary of MPEG-I Immersive Audio Verification Test Report

MPEG-I Immersive Audio - The Technology of the New Standard for Virtual / Augmented Reality Audio

MPEG-I Immersive Audio - The Upcoming New Audio Standard for Virtual / Augmented Reality

Quality Testing for AR and VR in MPEG-I Immersive Audio

MPEG-I Immersive Audio — Reference Model for the Virtual/Augmented Reality Audio Standard