Outlines an algorithm for representing complex sound sources in 6DoF audio experiences, including the representation of statistical directivity information. Describes two recordings made as proofs-of-concept.

Paper accepted to the International Computer Music Conference 2020 (postponed to 2021).

Read the full paper here. The following is a “paraphrased” version of the paper, in a less rigorous presentation that focuses more on the process and my personal thoughts than on the math and research behind the project.


Introduction

For computational efficiency, sound sources in interactive experiences are frequently treated as idealized point sources that emit sound spherically across the entire frequency spectrum. However, many sound sources exhibit frequency-dependent directional characteristics, and larger sound sources diverge from point-source behavior as the listening position gets closer. For many use-cases this added detail is not worth the added computational cost; for audio-focused immersive experiences, however, the added detail can be welcome or even necessary to maintain the sense of immersion.

Recording Complex Sources

To use this method of encoding complex sources, certain considerations need to be made during recording. The microphones need to be calibrated so that, for any input audio level, the signal passing through the analog chain and into the converters is consistent across all the microphones. Microphone placement needs to cover the sound source well enough that there are no perceptual gaps in the instrument and all the areas of interest are captured.
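The paper does not prescribe a particular calibration procedure beyond level-matching, but a minimal sketch of the idea, assuming each microphone records the same pink-noise reference from the same distance, might look like the following. The `gain_trims_db` helper is hypothetical and only illustrates computing per-channel gain offsets relative to a reference channel:

```python
# Illustrative sketch (not from the paper): derive per-microphone gain trims
# from a shared pink-noise reference so every channel reads the same level.
import numpy as np

def gain_trims_db(pink_noise_takes, reference_channel=0):
    """pink_noise_takes: one 1-D array per microphone, each capturing the
    same pink-noise source at the same distance."""
    rms = np.array([np.sqrt(np.mean(np.asarray(x, dtype=np.float64) ** 2))
                    for x in pink_noise_takes])
    levels_db = 20.0 * np.log10(rms)
    # A positive trim means boosting that channel to match the reference.
    return levels_db[reference_channel] - levels_db
```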

For the piano recording, 13 microphones were used, with one microphone discarded during post-production. The microphones were placed at a consistent distance from the soundboard of the piano and calibrated using a pink-noise generator.

All of the microphones used were high-quality omnidirectional condenser microphones. The lid of the piano was removed so that reflections from it would not interfere with the recording, and the microphones were placed high enough above the soundboard to avoid interference from reflections off the inside of the instrument’s body.

For the harp, the mic array consisted of five microphones. During post-production, it became clear that a higher microphone density would have been preferable. The two rings of microphones pictured were intended for an experimental technique for collecting statistical directivity data; however, the small size of the room made those recordings unusable for that purpose.

As with the piano recordings, the harp recordings made use of high-quality omnidirectional microphones that were calibrated using a pink-noise generator.

Post-Production

The audio recordings were edited in a standard digital audio workstation. Care was taken not to disturb the microphone calibration, so no dynamic range processing or equalization was used. The audio recordings were hosted in audio middleware integrated into the Unity game development engine, and virtual models of the harp and piano were created.

The reproduction algorithm uses a modification of the DBAP (distance-based amplitude panning) algorithm. The individual sampled points can be thought of as additional emitters that form a gestalt whole; the listener’s position relative to the emitters determines the impression of the instrument.
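The project uses a modified DBAP, and the modification itself is described in the full paper; purely for orientation, the standard (unmodified) DBAP weighting can be sketched as below. This is an illustrative Python sketch, not the project’s implementation, and the `rolloff_db` and `r_s` parameter choices are assumptions:

```python
import numpy as np

def dbap_gains(listener_pos, emitter_positions, rolloff_db=6.0, r_s=0.1):
    """Standard DBAP weights. listener_pos: (3,), emitter_positions: (N, 3).
    rolloff_db is the attenuation per doubling of distance; r_s is a small
    spatial-blur term that keeps gains finite when the listener reaches an emitter."""
    a = rolloff_db / (20.0 * np.log10(2.0))          # rolloff exponent
    d = np.linalg.norm(emitter_positions - listener_pos, axis=1)
    d = np.sqrt(d ** 2 + r_s ** 2)                   # blurred distances
    k = 1.0 / np.sqrt(np.sum(d ** (-2.0 * a)))       # power normalization
    return k * d ** (-a)                             # per-emitter gains, sum of squares == 1
```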

For the piano, the placement of the emitters was adjusted slightly away from the recorded positions. These adjustments allowed for a better representation of the instrument’s detail.

The 13th microphone had been placed on the underside of the piano. However, during implementation and testing, it was decided that the additional information from that part of the instrument was not needed.

For the harp, the virtual emitter positions are relatively close to where the real microphones were placed.

While a third microphone along the arch would have been preferable for completeness, the larger issue was coverage towards the center of the instrument. This area was conservatively left clear to accommodate the player’s needed range of motion; however, more comprehensive coverage would have benefited the spatial image at closer listening positions.

VR Experience

Three virtual reality experiences were created using the recorded materials. Two of the experiences centered on typical representations of the instruments in VR; the third was an experimental perspective using a modification of the algorithm.

These experiences were tested on an HTC Vive; however, the harp experience was rebuilt for Oculus, and the required processing was lightweight enough to run on the stand-alone Oculus Quest (the piano experiences were not tested on the Quest).

Piano in Room

The first experience follows a traditional representation of the piano. The listener is in a room with the playing piano and is able to move around within the virtual space. As the listener moves closer to the piano, the perceptual “size” of the piano increases appropriately, and if the listener puts their head close to the soundboard, the effect closely simulates the sense of having one’s head inside the piano.

Harp in Room

The harp experience places the listener in a virtual room with the harp. As with the piano, the harp retains its perceptual size as the listener changes their distance from the instrument. At close distances, resonances from the column, soundboard, and strings are discernible. As the harpist plays, individual notes can generally be localized; however, (as previously noted) the localization and detail could be better with increased coverage of the strings.

Piano Soundboard

Finally, an experimental application of the algorithm was implemented, based on an audio recording I made a few years ago during my masters. In that recording, I constructed a soundfield that gave the impression of the listener sitting in the middle of the piano’s soundboard.

Following this, I recreated the idea of that recording with six degrees of freedom instead of three. In the VR experience the listener is “shrunk down” so that they can walk around on the piano’s soundboard. The algorithm dynamically constructs a soundfield around the listener that approximates the sounds and resonances they would hear if they were standing at that specific point on the piano’s soundboard. However, the image breaks down as the user moves towards the boundaries of the emitter “mesh.”
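To make the dynamic construction concrete, here is a hedged sketch of the per-frame weighting. It reuses the illustrative dbap_gains() helper from the earlier sketch and assumes each emitter supplies one audio block per frame; the actual project runs inside Unity and audio middleware rather than Python:

```python
# Illustrative only: re-weight the emitter channels every frame from the
# listener's current position on the soundboard.
import numpy as np

def render_block(listener_pos, emitter_positions, emitter_blocks):
    """emitter_blocks: (N, block_size) array, one audio block per emitter.
    Returns a mono downmix weighted by the listener's current position."""
    g = dbap_gains(np.asarray(listener_pos), np.asarray(emitter_positions))
    return g @ emitter_blocks  # re-evaluated as the listener moves
```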

Final Thoughts

I believe that VR technologies represent an interesting opportunity to create new kinds of musical experiences for audiences. Not only can we create immersive experiences that simulate reality to varying degrees, we can also create new spaces and sound objects that are artistically convincing and intentional, yet unrealizable in the physical world.


VR Experiences made using: