360 Audio in Practice

To date, much of the discussion of 360 content has focused on the visual side of things and the entirely new hardware and software required for this new medium of 360 video. In contrast, surround sound has been in use for many decades in cinema, live performances and recordings of experimental and popular music, theater, gaming, and art installations. So rather than requiring the invention of entirely new technologies, we can instead adapt existing techniques for the specific demands of Virtual Reality (VR), namely:

  1. that sounds should appear from around, above and below the listener
  2. that (headtracked) headphones must be used instead of loudspeakers
  3. and that the system is portable, practical and reasonably simple to use

While a great many surround sound recording techniques have been developed over the years, for practicality's sake these have often tended to disregard the vertical position of sounds (installing loudspeakers in the ceiling and floor is often challenging!). So we end up with surround systems such as the long-defunct Quadraphonics (four loudspeakers arranged in a square), 5.1, 7.1, etc.

5.1 Surround Sound

The motivation for these developments, particularly in cinema surround sound, often came from practical requirements: clearer dialogue, a bigger dynamic range encompassing loud explosions and quiet whispers without hiss or distortion, and increased fidelity. These developments are nicely summarized in the following documentary by Filmmaker IQ on the History of Sound at the Movies.

Of course, as John Hess points out, sound is more than simply a technical solution; it is “half the picture”, and the ability to position sounds all around the listener is important from an artistic standpoint too. For this reason cinema surround sound continues to develop, with the biggest recent change being the incorporation of overhead loudspeakers in popular new systems such as Auro-3D and Dolby Atmos. Alfonso Cuarón’s Gravity from 2013 is an excellent example of a film which really took advantage of the capabilities of these new systems in many different ways, as discussed in this great interview with the film’s sound designer Glenn Freemantle and re-recording mixer Skip Lievsay.

This ability to put the listener inside the sound scene is important for cinema, but it is an absolute necessity for VR. However, the types of microphone techniques that work for cinema may not be the most appropriate for VR, particularly if we want to change from a loudspeaker system to headphones. So-called dummy-head microphones have existed for many years, and when listened to on headphones these can do a pretty good job of simulating normal hearing in a way that is very different from normal stereo sound. When we use headphones, the sounds we hear usually seem to be positioned inside our heads, which, when you think about it, is very unnatural. In normal hearing, sounds are externalized, and this is captured to a certain extent by these binaural microphones, which contain two capsules positioned on either side of a dummy head. As we can see from the picture below, these microphones also try to replicate the folds and shape of the pinnae; however, this is no easy task as, like fingerprints, the particular shape of our ears is unique to us. As a consequence, reproducing sounds directly in front or behind is particularly challenging with binaural techniques, as this type of perception depends largely on the specific way sounds are filtered by our own unique set of ears (for more information on how spatial hearing and binaural sound works, see this page on my website).


Here are some examples of binaural sound recorded using a dummy-head microphone (these should be listened to on headphones).

First off, let’s say hello.

Now let’s move around the microphone.

And here are some examples of the many binaural recordings available online, namely a thunderstorm (source: Freesound.org) and a market (source: Wikicommons.org).

So although not perfect, binaural will definitely be involved in the reproduction of 360 audio; however, this type of binaural microphone is perhaps not ideal for these kinds of recordings. You may have noticed in the previous examples that when you move your head, all the sounds move too, which again is highly unnatural. In real life a static sound stays at the same point in space as we rotate or move our head, and this needs to happen in virtual reality too. Actually tracking the position and rotation of the head is pretty simple, but manipulating the audio so that sounds stay in place as our head and attached headphones move is more challenging. Early 360 presentations (such as Beck’s 360 recording of David Bowie’s Sound & Vision, for example) attempted to solve this problem using dummy-head microphones with multiple sets of ears, with somewhat monstrous-looking results!


In effect, these microphones capture multiple, concurrent binaural recordings from different perspectives. On playback, the head-tracking system cross-fades between these different recordings as the listener’s head moves, so that sounds hold their position rather than rotating with the listener. While this can work, this solution is not without its problems. Firstly, these microphones are very idiosyncratic, non-standardized and quite bulky. More importantly, sounds which are located at positions in between the different angles optimally captured by the microphone may not be reproduced correctly, and smoothly rotating the sound to compensate for head movement is also difficult to achieve.
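The cross-fading step can be sketched in a few lines of code. This is purely an illustrative sketch, not the algorithm of any particular system: the function name, the linear fade, and the four-angle example are all my own assumptions.

```python
def crossfade_gains(head_yaw_deg, recording_angles_deg):
    """Return one gain per binaural recording, so that the two recordings
    whose capture angles bracket the current head yaw are cross-faded
    linearly. recording_angles_deg must be sorted and cover the circle,
    e.g. [0, 90, 180, 270] for a four-eared dummy head.
    Illustrative sketch only; real systems may use equal-power fades."""
    yaw = head_yaw_deg % 360.0
    angles = list(recording_angles_deg)
    n = len(angles)
    gains = [0.0] * n
    for i in range(n):
        a0 = angles[i]
        a1 = angles[(i + 1) % n]
        span = (a1 - a0) % 360.0          # arc between adjacent recordings
        offset = (yaw - a0) % 360.0       # how far yaw sits into this arc
        if span == 0 or offset < span:
            t = offset / span if span else 0.0
            gains[i] = 1.0 - t            # fade out the recording behind us
            gains[(i + 1) % n] = t        # fade in the one ahead
            break
    return gains
```

For example, a head yaw of 45° between recordings captured at 0° and 90° would mix the two equally; the remaining recordings stay silent.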

For these reasons, these charmingly freaky looking microphones are increasingly being replaced with a different approach based on the audio format known as Ambisonics. First developed in the 1970s by Michael Gerzon and Peter Fellgett, among others, Ambisonics has been used by experimental composers and sound designers for many years, but without much in the way of widespread, commercial use. However, this is now changing, as the microphone technique associated with this approach is very well suited to VR. The so-called Soundfield microphone can capture a three-dimensional soundfield using one compact, standardized arrangement of four microphone capsules, which can later be decoded to different arrangements of loudspeakers, or indeed to binaural. In addition, the entire recorded soundfield can be smoothly rotated prior to this decoding, which makes head-tracking much easier to achieve. Finally, as this technique has been around for over four decades it is well understood, and lots of existing ambisonic hardware and software is available, often for free.

So how does it actually work? To put it simply, in Ambisonics a 3D soundfield is described using four channels of audio labelled W, X, Y & Z, collectively referred to as a B-format signal. These four channels correspond to the overall non-directional sound pressure [W], and the front-to-back [X], side-to-side [Y], and up-down [Z] directional information.
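In code, encoding a mono source into first-order B-format reduces to a few trigonometric gains. The sketch below uses the traditional (FuMa) convention, with W attenuated by 1/√2; the function name and angle conventions are my own assumptions, not something defined in this post.

```python
import numpy as np

def encode_b_format(signal, azimuth_deg, elevation_deg):
    """Encode a mono signal into first-order B-format (W, X, Y, Z).
    Traditional FuMa convention: azimuth 0 = front (positive to the left),
    elevation positive upwards. Returns an array of shape (4, n_samples)."""
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    w = signal * (1.0 / np.sqrt(2.0))     # non-directional pressure
    x = signal * np.cos(az) * np.cos(el)  # front-to-back
    y = signal * np.sin(az) * np.cos(el)  # side-to-side
    z = signal * np.sin(el)               # up-down
    return np.stack([w, x, y, z])
```

A source panned straight ahead therefore appears only in W and X; a source directly overhead appears only in W and Z.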


These four signals can be captured directly using an omnidirectional microphone and three bi-directional, or figure-of-8, microphones. However, it is not really possible to mount four microphones at the same point in space, so instead the microphone actually contains four cardioid or sub-cardioid capsules mounted on the surface of a tetrahedron (soundfield mics are sometimes referred to as tetrahedral mics for this reason). The raw microphone recording (known as A-format) is then converted, in hardware or, more usually these days, in software, into B-format before further processing. Existing mono or stereo recordings can also be positioned in space and encoded into B-format using a hardware or software panner.
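At its core, the A-format to B-format conversion is a simple sum-and-difference matrix over the four capsule signals. The sketch below assumes one common capsule ordering and omits the per-capsule filtering and calibration that a real converter also applies.

```python
import numpy as np

# Assumed capsule order: LFU, RFD, LBD, RBU
# (left-front-up, right-front-down, left-back-down, right-back-up).
# Each B-format channel is a weighted sum/difference of the capsules.
A_TO_B = 0.5 * np.array([
    [1.0,  1.0,  1.0,  1.0],   # W: all capsules in phase
    [1.0,  1.0, -1.0, -1.0],   # X: front minus back
    [1.0, -1.0,  1.0, -1.0],   # Y: left minus right
    [1.0, -1.0, -1.0,  1.0],   # Z: up minus down
])

def a_to_b(a_format):
    """Convert A-format capsule signals, shape (4, n_samples),
    into B-format (W, X, Y, Z). Matrixing step only; real converters
    also equalize each capsule."""
    return A_TO_B @ a_format
```

As a sanity check, an identical signal on all four capsules (i.e. a perfectly diffuse, non-directional input) produces energy only in W, with X, Y and Z cancelling to zero.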

It is important to note that these signals do not feed the loudspeakers directly but instead function as a description of a soundfield at a particular point in space. This means that a B-format ambisonics signal can be smoothly rotated around any axis (for head-tracking) and then decoded for different configurations of loudspeakers as needed (although in practice it works best with regular and symmetrical loudspeaker arrays). For VR, the individual loudspeaker signals are instead encoded in real-time into binaural signals, which are then mixed together to produce the final headphone mix. This virtual loudspeaker approach has been the focus of considerable research in recent years and is a very efficient and effective way of implementing head-tracked 360 audio.
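Rotation and a basic first-order decode can both be sketched in a few lines. This illustrative version (the function names and the square loudspeaker layout are my own assumptions) rotates the soundfield about the vertical axis, as a head-tracker would, and then produces one feed per virtual loudspeaker.

```python
import numpy as np

def rotate_z(b, angle_deg):
    """Rotate a B-format signal (4, n_samples) about the vertical axis.
    To compensate for head-tracking, rotate the scene opposite to the
    listener's head yaw. W and Z are unaffected by a yaw rotation."""
    a = np.radians(angle_deg)
    w, x, y, z = b
    xr = x * np.cos(a) - y * np.sin(a)
    yr = x * np.sin(a) + y * np.cos(a)
    return np.stack([w, xr, yr, z])

def decode_basic(b, speaker_azimuths_deg):
    """Basic first-order decode to horizontal (virtual) loudspeaker feeds.
    Each feed combines W with the directional channels weighted by the
    loudspeaker's direction; for VR each feed would then be binauralized."""
    w, x, y, _ = b
    feeds = []
    for az in np.radians(np.asarray(speaker_azimuths_deg, dtype=float)):
        feeds.append(0.5 * (np.sqrt(2.0) * w + x * np.cos(az) + y * np.sin(az)))
    return np.stack(feeds)
```

With this decoder, a source panned straight ahead comes entirely from the front loudspeaker of a square array and not at all from the rear one; rotating the scene by 90° moves it cleanly to the side loudspeaker.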

Describing a soundfield in this way using a bare minimum of four channels is certainly efficient, but there’s plenty of research showing that this efficiency comes at a cost. Errors or blurriness in the perceived position of sounds can occur, and this is particularly true when the limitations of binaural reproduction are also taken into account. However, this is perhaps much less of an issue for VR, as in this type of presentation we can potentially see the source of a particular sound as well as hear it.

All of this means that soundfield mics and “virtual reality microphones” are set to become synonymous, although the rate of development remains slow compared to 360 video, and indeed much of the 360 content currently available does not actually contain matching 360 audio. A number of microphones based on the traditional design are currently available, however, such as the original Soundfield range now produced by TSL Products, Core Sound’s TetraMic, or the new Ambeo microphone from Sennheiser. Portable recorders containing multiple microphone capsules, such as the Zoom H2n, can also be modified or processed to produce a B-format signal (although horizontal only), and there are also a few more elaborate systems such as MH Acoustics’ Eigenmike. We have been investigating the precise capabilities of these different microphones as part of the Trinity 360 project and the Spatial Audio over Virtual and Irregular Arrays research group led by Prof. Francis Boland. Over the summer we will be publishing the results of a series of experiments (shown below) which assessed a number of these microphones in terms of their directional accuracy and overall tonal quality and fidelity. We are also planning on recording the concert performance on April 8th using a number of these microphones, so we can see how well they function in an actual location recording, and when combined with matching 360 video. We’ll publish the footage on this blog once we have it, and then you can judge the results yourself!



Spectral Music & the Commencements Bell

With thanks to Michael Murray, Noel McCann and Tony Dalton.

Spectral music is based around the idea that “music is ultimately sound evolving in time”. Often this involves the computer-aided analysis of a particular sound, a Bb on a clarinet for example, followed by the metaphorical re-synthesis of this sound using an orchestra. In the picture shown above, the image on the left is a spectrogram of a single strike of the commencements bell in the Campanile in Trinity College. On the right is an early draft of an orchestral chord (just synthesized using samples and MIDI for now), which attempts to instrumentally recreate this unique timbre. In an earlier post I mentioned call-and-response as one of the most fundamental forms of spatial music. So, as long as the temperamental gods of Irish weather are on our side, a call-and-response between this bell and the orchestra will open the third and final movement of this new piece, entitled From Within, From Without.

While the bell’s spectrum informed the harmonic language of all three movements of this work, the rhythmic structure of the orchestral movement was inspired by the synthesis technique of granulation, and some other techniques developed by composers such as Earle Brown [1926-2002] and Henry Brant [1913-2008]. Brant was a prolific composer of spatial music, writing 76 spatial works (and 57 non-spatial works) over the course of his long career. For Brant, the only way to really exploit space was to ensure that each musician, at each distinct location in space, performed material that was as differentiated as possible from every other part. So, although the entrance of each musician or group of musicians might be cued, they would then proceed independently at their own speed, rhythm, and in their own key. This is very similar in lots of ways to the concept of blocks of music developed by Earle Brown, and later Henry Vega, in which the start and end point of each block are tightly synchronized, but the individual musical lines inside each block are left entirely unsynchronized. This technique results in an interestingly complex texture and neatly avoids the issue of maintaining synchronization between spatially distributed musicians. The opening of Brant’s 1954 composition Millennium II illustrates this approach as ten trombones and ten trumpets, positioned along the side walls of the hall, enter one-by-one, each playing different melodies, in different keys.


Fig. 1 Stage Layout for Henry Brant’s Millennium II (1954) [Harley, 1997]

While Brant’s approach is very effective at highlighting the spatial distribution of the instruments, it does inevitably result in very dissonant harmonies and textures. With this new work I wanted to explore this type of approach, but with melodic lines that are rhythmically independent yet much closer in terms of harmonic language. As such, the independent lines overlap in ways that are sometimes consonant, and sometimes dissonant. In some respects this results in a texture reminiscent of granulation, the electronic processing technique in which many fragments (or grains) of the original sound are layered over each other, particularly when long grain durations are used. This is particularly prominent in the middle section of From Within, From Without, as five trumpets and two trombones, all of which are distributed around the audience, play very similar melodies that are deliberately desynchronized, resulting in a complex texture that shifts between consonance and dissonance and produces some interesting spatial effects.
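For readers unfamiliar with granulation, a minimal sketch of the electronic version of the technique is below. The parameter names, envelope choice, and defaults are illustrative assumptions on my part, not values drawn from the piece.

```python
import numpy as np

def granulate(signal, sr, grain_dur=0.25, n_grains=200, spread=2.0, seed=0):
    """Naive granulation sketch: overlay many enveloped grains taken
    from random positions in the input, scattered across an output
    buffer of roughly `spread` seconds. grain_dur is in seconds.
    Long grain durations give the smeared, overlapping texture
    described in the text."""
    rng = np.random.default_rng(seed)
    glen = int(grain_dur * sr)
    env = np.hanning(glen)                        # smooth grain envelope
    out = np.zeros(int(spread * sr) + glen)
    for _ in range(n_grains):
        src = rng.integers(0, max(1, len(signal) - glen))
        dst = rng.integers(0, len(out) - glen)
        out[dst:dst + glen] += signal[src:src + glen] * env
    return out / np.max(np.abs(out))              # normalize to +/-1
```

Each grain is a short, enveloped copy of the source layered at a random time offset; with hundreds of overlapping grains the individual fragments fuse into a single shifting texture, much like the desynchronized brass lines described above.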

If you’d like to know more about the specific details (“let’s talk quartertones!”) of this movement, and indeed the other two movements of this work, I’ll be giving a free, public talk on the project as part of the Music at Trinity Series in the Long Room Hub, March 21st, at 6.15 pm.

Also, if you’d like to come to the performance on April 8th, tickets are now available from this link (this is a free event, but tickets are required to reserve a seat).

[Harley, 1997] Harley, M. A., “An American in Space: Henry Brant’s ‘Spatial Music’”, American Music, Vol. 15(1), pp. 70-92, 1997.