What Does “3D Audio” Mean?

True stereophonic sound is like nothing you have heard before. Music in three dimensions with a real-life sense of depth and direction totally unlike ordinary recorded music.

Advert for RCA Victor Stereo Sound on Records

It seems silly, now, to call stereo — especially stereo on a record — “music in three dimensions” or to claim it has “real-life” anything when we compare it to what we are capable of today. Ambisonics, Ambiophonics, Atmos, Auro 3D: they all make stereo seem rather flat by comparison. We can actually do “immersive” and “3D” audio now, or so modern adverts claim…

However, how do we reconcile the difference between the “3D” offered by VR experiences with the “3D” of theatrical experiences?

We Don’t Hear in 3D… Except We Do

The irony is that one could make a very pedantic argument that even mono is “3D,” and be technically correct. Conversely, one could make the argument from linear independence that a 5.1 system actually exists in 4D (and this isn’t that crazy of an argument from a DSP engineer’s point of view). So what even is “3D”?

Usually, when we talk about “three dimensions” we think of the usual three spatial dimensions: length, width, height. However, when we hear a sound, we don’t hear it as 6′ ahead, 3′ to the left, and 1.5′ overhead. When we close our eyes, we quickly realize that we don’t hear distance at all; we interpret it from certain environmental cues, but we can’t actually tell exact distances the way we can by sight.

Instead, we are able to point at (or wave in the general direction of) a sound. We can also hear the loudness of a sound, which is a perceptual representation of its amplitude. So we don’t hear in the traditional “3D”; we have this directional thing going on.

Space, the Final Frontier

In geometry we learn about Cartesian coordinates. This coordinate system is often how we tend to think about space: measuring it out along three axes, perhaps using ourselves as the center or else using some landmark, depending on what the context is. For visual representations of space it works well, because it connects with how we tend to think.

Cartesian plot example.

Less emphasized outside of more rigorous math studies is an alternative way of describing spatial relationships: the polar coordinate system (strictly speaking, its three-dimensional form, spherical coordinates). The polar coordinate system describes locations in terms of azimuth (horizontal rotation), elevation (vertical rotation), and distance. As you may recall, you can freely convert between Cartesian and polar coordinates.

Polar plot example

Converting between the two gives us ways to describe the same space in different ways. However, it is important to note that the conversion isn’t purely mathematical or theoretical. Rather, it is a perspective shift: we change our fundamental associations between points in space and the language we use to describe them.
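For anyone who wants to see the conversion spelled out, here is a minimal Python sketch. The axis convention is an assumption made for illustration: x points straight ahead, y points left, and z points up, with azimuth measured counter-clockwise from straight ahead and elevation measured upward from the horizontal plane. Other conventions exist, and the function names are hypothetical.

```python
import math


def cartesian_to_polar(x, y, z):
    """Convert (x, y, z) to (azimuth, elevation, distance); angles in radians.

    Assumed convention: x = ahead, y = left, z = up.
    """
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)                    # rotation in the horizontal plane
    elevation = math.atan2(z, math.hypot(x, y))   # rotation up from the horizontal plane
    return azimuth, elevation, distance


def polar_to_cartesian(azimuth, elevation, distance):
    """Convert (azimuth, elevation, distance) back to (x, y, z)."""
    horizontal = distance * math.cos(elevation)   # length of the projection onto the floor
    x = horizontal * math.cos(azimuth)
    y = horizontal * math.sin(azimuth)
    z = distance * math.sin(elevation)
    return x, y, z


# The earlier example: 6 feet ahead, 3 feet to the left, 1.5 feet overhead.
azimuth, elevation, distance = cartesian_to_polar(6.0, 3.0, 1.5)
print(math.degrees(azimuth), math.degrees(elevation), distance)
```

Under these assumptions, the point from earlier comes out at roughly 27 degrees to the left, 13 degrees up, and a little under 7 feet away. It is the same point, described in the language of pointing rather than measuring.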

Hearing in Circles

This idea of being able to point in the direction of a sound maps very well to the polar coordinate concepts of azimuth and elevation. However, we still have the issue of distance: we don’t hear distance. Well, we can borrow a concept from mathematics: a vector. Vectors are typically described as having a “magnitude,” which, if you’ve taken any acoustics classes, you may remember as a concept related to the amplitude of a sound.

So, if we think of sound as a vector, which has an azimuth, elevation, and amplitude, then we can easily represent how we hear sound from some direction. Using the azimuth and elevation we can quantify that sense of pointing (or arm waving) in the direction of a sound, and the length of our vector can represent the magnitude of the sound — which describes its amplitude.
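As a rough illustration of this idea, the sketch below models a heard sound as a direction plus an amplitude. The class name, the angle convention (azimuth counter-clockwise from straight ahead, elevation upward from the horizontal plane), and the decibel helper are all assumptions made for the example, not part of any particular audio format.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundVector:
    """A sound described the way we hear it: a direction and a strength."""

    azimuth: float    # radians, counter-clockwise from straight ahead (assumed convention)
    elevation: float  # radians, upward from the horizontal plane (assumed convention)
    amplitude: float  # linear amplitude, used as the length of the vector

    def components(self):
        """Return the (x, y, z) vector whose length equals the amplitude."""
        horizontal = self.amplitude * math.cos(self.elevation)
        return (
            horizontal * math.cos(self.azimuth),
            horizontal * math.sin(self.azimuth),
            self.amplitude * math.sin(self.elevation),
        )

    def level_db(self, reference=1.0):
        """Amplitude in decibels relative to `reference`.

        This is a signal level, not perceived loudness, which depends on the listener.
        """
        return 20.0 * math.log10(self.amplitude / reference)


# A sound directly to the left, at ear height, at half of full-scale amplitude.
sound = SoundVector(azimuth=math.radians(90), elevation=0.0, amplitude=0.5)
print(sound.components(), sound.level_db())
```

Note that the length of the vector here says nothing about how far away the source is. It only encodes how strong the sound is, which matches the argument that we hear direction and loudness rather than distance.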

So, we do hear in “3D,” just not the three dimensions that we typically think of. The three dimensions of hearing are the two directions of angular rotation, and the loudness (derived from the magnitude/amplitude). Now, you might be wondering if something like the loudness or magnitude counts as a “dimension.” From a mathematical perspective, the answer is that it is not a proper “spatial dimension;” however, we can still consider it a dimension (just not a “spatial” one) as it does contribute to our perception of spatial localization.

But What is 3D Sound?

The short and cynical answer is “marketing hype.” If you doubt this, scroll back up and read the RCA quote again. This type of language has been around for quite some time.

More generally, we’ve seen a divergence in approaches for systems beyond stereo. “3D audio” and “immersive audio” have become catchall terms for nearly anything that includes height or uses binaural encoding (such as a horizontal-only ambisonic soundfield). This is, in some ways, unfortunate for reasons mentioned above: it is unhelpful and imprecise to group something like ambisonics or encoded object-based audio in a 6DoF experience into the same category as a theatrical format like Atmos. Eventually (hopefully), the industry will develop more precise terminology along the theatrical/XR divide for immersive sound.

Perhaps, though, this imprecision of terminology can also serve as a reminder that understandings of “space” are fluid and contextual. The same idea can be described in a number of different ways, and we are free to use the system that makes the most sense at the time for the task.