Three-dimensional audio standards are plentiful these days, with Dolby Atmos and DTS:X already carving up the market between them. MPEG-H is a third standard vying for attention, with some unique features of its own. It’s not a new player in the market: MPEG-H has been in development since 2014 and has appeared in products as far back as 2017. It is arguably the smallest of the three standards, but broader adoption could soon be upon us.
If you’re looking for supported products, Sennheiser’s AMBEO soundbar is one of the first-generation products sporting MPEG-H capabilities. In the future, more soundbars, automotive speaker systems, and gaming headsets could all support the technology. Today though, we’re here to take a closer look at the MPEG-H audio standard: what it does, how it works, and what it means for the future of 3D audio.
MPEG-H in a nutshell
MPEG-H is an audio coding standard developed by the ISO/IEC Moving Picture Experts Group (MPEG) and Fraunhofer IIS. The standard was published in 2015 and was adopted for South Korea’s 4K television broadcasting in 2016, as part of the ATSC 3.0 standard. In 2017, Fraunhofer IIS unveiled a program to identify compatible consumer products. In 2018, MPEG-H became the 3GPP’s standard audio codec for video streaming over mobile networks. In early 2019, Sony announced its 360 Reality Audio music service based on MPEG-H.
At its core, the standard supports a huge number of audio channels which can be mixed together to simulate a vast 3D space. There’s support for anywhere from eight to 64 loudspeaker channels and 128 codec core channels, which is more than you’d ever need in a home speaker setup. These channels can be traditional audio channels, audio objects complete with 3D location metadata, or an ambisonic “full sphere” surround sound format. While the primary target is surround sound systems, the standard also supports binaural audio rendering for headphones.
MPEG-H is designed with flexibility in mind. It supports everything from broadcast and commercial applications through to consumer-level audio streaming, surround sound products, headsets, and even virtual reality.
How MPEG-H handles 3D audio
The key to next-generation 3D audio is the use of audio objects. Audio isn’t stored in a specific speaker channel. Instead, each sound is stored individually with metadata that includes an X, Y, and Z location in 3D space and a gain value, among other things. The mixing software and hardware adjust the gain and position of an object during rendering, including a height component, producing a highly realistic 3D representation that’s essentially speaker agnostic.
In other words, object-based encoding doesn’t pre-mix audio for a specific setup like 5.1. Instead, audio objects are mixed together and rendered right before the sound comes out of your speakers. Dolby Atmos and DTS:X take the same object approach to 3D sound. This works across devices too: you may start listening on your phone, which requires a stereo mix, before returning home to finish listening on a surround sound setup.
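To make the idea concrete, here is a minimal Python sketch of an audio object being rendered to two different speaker layouts. It is not the MPEG-H renderer. The `AudioObject` class, the `render_gains` function, the speaker coordinates, and the simple distance-based panning rule are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A sound plus the metadata a renderer needs (hypothetical fields)."""
    name: str
    x: float  # left (-1) to right (+1)
    y: float  # back (-1) to front (+1)
    z: float  # ear level (0) to overhead (+1)
    gain: float = 1.0


def render_gains(obj, speakers):
    """Return one gain per speaker; speakers closer to the object get more signal."""
    weights = []
    for position in speakers.values():
        distance = math.dist((obj.x, obj.y, obj.z), position)
        weights.append(1.0 / (distance + 1e-3))  # small offset avoids dividing by zero
    # Normalise so the squared gains always sum to obj.gain ** 2 (constant power).
    norm = math.sqrt(sum(w * w for w in weights))
    return {name: obj.gain * w / norm for name, w in zip(speakers, weights)}


# Two illustrative layouts, given as (x, y, z) speaker positions.
STEREO = {"L": (-1.0, 1.0, 0.0), "R": (1.0, 1.0, 0.0)}
SURROUND_WITH_HEIGHT = {
    "L": (-1.0, 1.0, 0.0), "R": (1.0, 1.0, 0.0), "C": (0.0, 1.0, 0.0),
    "Ls": (-1.0, -1.0, 0.0), "Rs": (1.0, -1.0, 0.0),
    "TopL": (-1.0, 0.0, 1.0), "TopR": (1.0, 0.0, 1.0),
}

helicopter = AudioObject("helicopter", x=0.5, y=0.0, z=1.0, gain=0.8)

# The same object and metadata, rendered for whichever layout is connected.
for layout in (STEREO, SURROUND_WITH_HEIGHT):
    print(render_gains(helicopter, layout))
```

The stored object never changes. Only the final rendering step knows about the speakers, which is why the same content can follow you from a phone to a home cinema.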
MPEG-H is also flexible enough to cater for end-user sound customization. For TV, film, and music, the user probably won’t adjust the location of audio objects, but the situation might be different for live sports, accessibility channels, and adjustments for the hard of hearing. For example, a user can adjust the volume of the crowd during a football game, or move the sound of the commentators over to the left or right. This is all based on the same 3D mixing system but can take user input to tweak the experience just so.
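As a rough illustration of that kind of user control, the Python sketch below applies listener preferences to a small scene before rendering. The `SceneObject` fields, the `apply_user_prefs` helper, and the idea of clamping requests to per-object gain limits are assumptions made for this example, not the actual MPEG-H metadata format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SceneObject:
    """One adjustable element of a broadcast mix (hypothetical fields)."""
    name: str
    gain_db: float = 0.0
    azimuth_deg: float = 0.0    # 0 = front, positive = towards the left
    min_gain_db: float = -12.0  # assumed limits on what the listener may request
    max_gain_db: float = 6.0


def apply_user_prefs(scene, prefs):
    """Return a new scene with the listener's requests applied within the limits."""
    adjusted = []
    for obj in scene:
        wanted = prefs.get(obj.name, {})
        gain = wanted.get("gain_db", obj.gain_db)
        gain = max(obj.min_gain_db, min(obj.max_gain_db, gain))
        azimuth = wanted.get("azimuth_deg", obj.azimuth_deg)
        adjusted.append(replace(obj, gain_db=gain, azimuth_deg=azimuth))
    return adjusted


football = [SceneObject("crowd"), SceneObject("commentary")]

# Turn the crowd right down and shift the commentators to the left.
prefs = {
    "crowd": {"gain_db": -20.0},
    "commentary": {"azimuth_deg": 30.0, "gain_db": 3.0},
}

for obj in apply_user_prefs(football, prefs):
    print(obj.name, obj.gain_db, obj.azimuth_deg)
```

In this sketch the request to drop the crowd by 20 dB is held at the assumed -12 dB floor, so the listener can personalize the mix without breaking it.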
In addition to audio objects, MPEG-H supports ambisonic audio. Ambisonics is particularly popular in the augmented and virtual reality markets. Here, audio isn’t treated as objects or channels but as a full 360-degree sphere of sound arranged around a central focal point. Because objects don’t move freely in space, ambisonics is considered “pre-mixed,” similar to the old 5.1 and 7.1 formats. As a result, formats like Ambisonic B-format use as few as four channels to produce a full 360-degree audio effect. However, unlike old surround sound formats, ambisonics is speaker agnostic and can be decoded to work on any speaker configuration.
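The four-channel idea can be sketched in a few lines of Python. The example below uses the traditional first-order B-format equations (channels W, X, Y, and Z) to encode a single sound, then decodes it for a speaker at an arbitrary angle using a basic "virtual cardioid microphone" decode. The function names are made up for this example, and MPEG-H itself supports higher-order ambisonics with more sophisticated decoding, so treat this as an illustration of the principle only.

```python
import math


def encode_b_format(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Azimuth is measured anticlockwise from straight ahead (positive = left).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)                  # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)   # front-back
    y = sample * math.sin(az) * math.cos(el)   # left-right
    z = sample * math.sin(el)                  # up-down
    return w, x, y, z


def decode_to_speaker(b_format, speaker_azimuth_deg):
    """Horizontal-only decode: point a virtual cardioid mic at the speaker."""
    w, x, y, _ = b_format
    az = math.radians(speaker_azimuth_deg)
    return 0.5 * (math.sqrt(2) * w + x * math.cos(az) + y * math.sin(az))


# A sound directly to the listener's left, at ear level.
encoded = encode_b_format(1.0, azimuth_deg=90.0, elevation_deg=0.0)

# The same four channels feed any speaker layout; only the angles change.
for name, angle in {"left": 90.0, "front": 0.0, "right": -90.0}.items():
    print(name, round(decode_to_speaker(encoded, angle), 3))
```

The encoder never needs to know how many speakers exist. A speaker facing the source receives the full signal and one directly opposite receives none, whatever layout is chosen at playback.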
MPEG-H supports all of the key methods for producing immersive surround sound audio. It does this while supporting a wider range of speaker configurations than ever before and still working with existing surround sound setups and even headphones.
How will MPEG-H change the game?
As we’ve just explored, this 3D audio standard builds on the traditional 5.1 and 7.1 surround sound formula by adding height in the Z axis to sound in the X and Y axes. This isn’t brand new given the competing Atmos and DTS:X standards, but it still offers tangible improvements to immersion over old-school surround sound setups. Think of helicopters and planes flying realistically overhead or the sound of a tower crumbling all around you.
Not only that, but the flexible and adjustable nature of audio objects supports a wider range of speaker configurations than ever before. That’s also a boon for those looking for more flexible and upgradable routes into a pricy home cinema setup. It also allows for a customized audio experience in your own home.
Furthermore, these same effects are supported in the headphone space thanks to binaural audio support. This is clearly a big deal in the gaming market, where object-based sound and ambisonics are very popular rendering techniques and immersion is a top priority. It also feeds directly into the virtual reality market, where content that envelops you in realistic-sounding environments is just as key as the visual component. Binaural rendering is particularly powerful when combined with head-tracking capabilities.
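Head tracking is easiest to picture with an ambisonic sound field, where the four first-order channels (W for the omnidirectional part, then X, Y, and Z for front-back, left-right, and up-down) can be rotated before they are rendered to headphones. The Python sketch below counter-rotates the field when the listener turns their head, so sounds stay fixed in the room. The function name and sign conventions are assumptions for this example, and a real binaural renderer would follow this step with head-related filtering.

```python
import math


def rotate_for_head_yaw(b_format, head_yaw_deg):
    """Counter-rotate a first-order sound field for a head turn.

    head_yaw_deg is positive when the listener turns to the left.
    Only X and Y change; W (omni) and Z (height) are unaffected by yaw.
    """
    w, x, y, z = b_format
    yaw = math.radians(head_yaw_deg)
    x_rot = x * math.cos(yaw) + y * math.sin(yaw)
    y_rot = y * math.cos(yaw) - x * math.sin(yaw)
    return w, x_rot, y_rot, z


# A sound straight ahead of the listener: all directional energy is in X.
front_source = (1 / math.sqrt(2), 1.0, 0.0, 0.0)

# The listener turns 90 degrees to the left, so the sound should now
# arrive from their right (negative Y in this convention).
turned_left = rotate_for_head_yaw(front_source, 90.0)
print([round(v, 3) for v in turned_left])
```

Because the rotation is just a little arithmetic per sample, it can be updated every time the head tracker reports a new angle.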
MPEG-H is certainly a powerful standard for 3D audio. We’ll hopefully see a growing portfolio of products across a variety of price points appear in the coming months and years.