To date, much of the discussion of 360 content has focused on the visual side of things and the entirely new hardware and software required for this new medium of 360 video. In contrast, surround sound has been in use for many decades in cinema, live performances and recordings of experimental and popular music, theater, gaming, and art installations. So rather than requiring the invention of entirely new technologies, we can instead adapt existing techniques to the specific demands of Virtual Reality (VR), namely:
- that sounds should appear from around, above and below the listener
- that (headtracked) headphones must be used instead of loudspeakers
- and that the system is portable, practical and reasonably simple to use
While a great many surround sound recording techniques have been developed over the years, for practicality's sake these have often tended to disregard the vertical position of sounds (installing loudspeakers in the ceiling and floor is often challenging!). So we end up with horizontal-only surround systems such as the long-defunct Quadraphonics (four loudspeakers arranged in a square), 5.1, 7.1, and so on.
The motivation for these developments, particularly in cinema surround sound, was often practical: clearer dialogue, a bigger dynamic range spanning loud explosions and quiet whispers without hiss or distortion, and increased fidelity. These developments are nicely summarized in the following documentary by Filmmaker IQ on the History of Sound at the Movies.
Of course, as John Hess points out, sound is more than simply a technical solution; it is “half the picture”, and the ability to position sounds all around the listener is important from an artistic standpoint too. For this reason cinema surround sound continues to develop, with the biggest recent change being the incorporation of overhead loudspeakers in popular new systems such as Auro 3D and Dolby Atmos. Alfonso Cuarón’s Gravity from 2013 is an excellent example of a film which really took advantage of the capabilities of these new systems in many different ways, as discussed in this great interview with the film’s sound designer Glenn Freemantle and re-recording mixer Skip Lievsay.
This ability to put the listener inside the sound scene is important for cinema, but it is an absolute necessity for VR. However, the microphone techniques that work for cinema may not be the most appropriate for VR, particularly if we want to change from a loudspeaker system to headphones. So-called dummy-head microphones have existed for many years, and when listened to on headphones these can do a pretty good job of simulating normal hearing in a way that is very different from ordinary stereo sound. When we use headphones, the sounds we hear usually seem to be positioned inside our heads, which when you think about it is very unnatural. In normal hearing, sounds are externalized, and this is captured to a certain extent by these binaural microphones, which contain two capsules positioned on either side of a dummy head. As we can see from the picture below, these microphones also try to replicate the folds and shape of the ear pinnae. This is no easy task, however, as, like fingerprints, the particular shape of our ears is unique to us. As a consequence, reproducing sounds directly in front of or behind the listener is particularly challenging with binaural techniques, as this type of perception depends largely on the specific way sounds are filtered by our own unique set of ears (for more information on how spatial hearing and binaural sound work, see this page on my website).
Here are some examples of binaural sound recorded using a dummy-head microphone (these should be listened to on headphones).
First off, let’s say hello.
Now let’s move around the microphone.
So although not perfect, binaural will definitely be involved in the reproduction of 360 audio; however, this type of binaural microphone is perhaps not ideal for these types of recordings. You may have noticed in the previous examples that when you move your head, all the sounds move too, which again is highly unnatural. In real life a static sound stays at the same point in space as we rotate or move our head, and this needs to happen in virtual reality too. Actually tracking the position and rotation of the head is pretty simple, but manipulating the audio so that sounds stay in the same place as our head and attached headphones move is more challenging. Early 360 presentations (such as Beck’s 360 recording of David Bowie’s Sound & Vision, for example) attempted to solve this problem using dummy-head microphones with multiple sets of ears, with somewhat monstrous-looking results!
In effect, these microphones capture multiple, concurrent binaural recordings from different perspectives. On playback, the head-tracking system cross-fades between these different recordings as the listener’s head moves, so that sounds hold their position rather than rotating with the listener. While this can work, this solution is not without its problems. Firstly, these microphones are very idiosyncratic, non-standardized and quite bulky. More importantly, sounds which are located at positions in between the different angles optimally captured by the microphone may not be reproduced correctly, and smoothly rotating the sound to compensate for head movement is also difficult to achieve.
For these reasons, these charmingly freaky-looking microphones are increasingly being replaced with a different approach based on the audio format known as Ambisonics. First developed in the 1970s by Michael Gerzon, Peter Fellgett, and others, Ambisonics has been used by experimental composers and sound designers for many years, but without much in the way of widespread, commercial use. However, this is now changing, as the microphone technique associated with this approach is very well suited to VR. The so-called Soundfield microphone can capture a three-dimensional soundfield using one compact, standardized arrangement of four microphone capsules, which can later be decoded to different arrangements of loudspeakers, or indeed to binaural. In addition, the entire recorded soundfield can be smoothly rotated prior to this decoding, which makes head-tracking much easier to achieve. Finally, as this technique has been around for over four decades, it is well understood, and lots of existing Ambisonic hardware and software is available, often for free.
So how does it actually work? To put it simply, in Ambisonics a 3D soundfield is described using four channels of audio labelled W, X, Y & Z, collectively referred to as a B-format signal. These four channels of audio correspond to the overall non-directional sound pressure level [W], and the front-to-back [X], side-to-side [Y], and up-to-down [Z] directional information.
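To make this concrete, here is a minimal sketch of how a mono signal can be panned (encoded) into these four B-format channels. It assumes the traditional first-order convention described above, with W attenuated by 3 dB and the azimuth measured counter-clockwise from straight ahead; the function name and example are mine, not from any particular library.

```python
import numpy as np

def encode_b_format(mono, azimuth, elevation):
    """Pan a mono signal into first-order B-format (W, X, Y, Z).

    azimuth: radians, counter-clockwise from front; elevation: radians upward.
    Uses the traditional convention where W carries a -3 dB scaling.
    """
    w = mono * (1.0 / np.sqrt(2.0))                 # omnidirectional pressure
    x = mono * np.cos(azimuth) * np.cos(elevation)  # front-to-back
    y = mono * np.sin(azimuth) * np.cos(elevation)  # side-to-side
    z = mono * np.sin(elevation)                    # up-to-down
    return np.stack([w, x, y, z])

# Example: half a second of a 440 Hz tone panned 90 degrees to the left
sr = 48000
t = np.arange(sr // 2) / sr
tone = np.sin(2 * np.pi * 440 * t)
b = encode_b_format(tone, azimuth=np.pi / 2, elevation=0.0)
```

Note how a source directly to the side produces no X (front-back) component at all; the channels really are just the directional projections of the source signal.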
These four signals can be captured directly using one omnidirectional microphone and three bi-directional (figure-of-8) microphones. However, it is not really possible to mount four microphones at the same point in space, so instead the microphone actually contains four cardioid or sub-cardioid capsules mounted on the surfaces of a tetrahedron (soundfield mics are sometimes referred to as tetrahedral mics for this reason). The raw microphone recording (known as A-format) is then converted, in hardware or, more usually these days, in software, into B-format before further processing. Existing mono or stereo recordings can also be positioned in space and encoded into B-format using a hardware or software panner.
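The A-format to B-format conversion itself is, at its core, just sums and differences of the four capsule signals. The sketch below shows the basic matrix, assuming one common capsule arrangement (front-left-up, front-right-down, back-left-down, back-right-up); real converters also apply correction filters to compensate for the physical spacing of the capsules, which this simple version ignores, and your microphone's documentation is the authority on its actual capsule ordering.

```python
import numpy as np

def a_to_b_format(flu, frd, bld, bru):
    """Convert raw A-format capsule signals to first-order B-format.

    Assumed capsule layout: FLU = front-left-up, FRD = front-right-down,
    BLD = back-left-down, BRU = back-right-up. Spacing-compensation
    filters used by real converters are omitted here.
    """
    w = 0.5 * (flu + frd + bld + bru)  # pressure (omni): all capsules sum
    x = 0.5 * (flu + frd - bld - bru)  # front minus back
    y = 0.5 * (flu - frd + bld - bru)  # left minus right
    z = 0.5 * (flu - frd - bld + bru)  # up minus down
    return w, x, y, z
```

A useful sanity check on the matrix: a perfectly diffuse, identical signal at all four capsules should produce pressure (W) only, with the three directional channels cancelling to zero.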
It is important to note that these signals do not feed the loudspeakers directly but instead function as a description of a soundfield at a particular point in space. This means that a B-format Ambisonics signal can be smoothly rotated around any axis (for head-tracking) and then decoded for different configurations of loudspeakers as needed (although in practice it works best with regular and symmetrical loudspeaker arrays). For VR, the individual loudspeaker signals are instead encoded in real-time into binaural signals, which are then mixed together to produce the final headphone mix. This virtual loudspeaker approach has been the focus of considerable research in recent years and is a very efficient and effective way of implementing head-tracked 360 audio.
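Both of these operations are simple at first order. Rotation about the vertical axis (the yaw compensation needed for head-tracking) only mixes the X and Y channels, and a basic decode for a single virtual loudspeaker amounts to pointing a "virtual microphone" in that direction. The sketch below shows both, using the same traditional B-format convention as before; a real VR renderer would decode to a whole array of virtual loudspeakers and convolve each feed with matching HRTFs, a step omitted here.

```python
import numpy as np

def rotate_yaw(w, x, y, z, angle):
    """Rotate a B-format soundfield about the vertical (z) axis.

    For head-tracking, rotate the scene opposite to the listener's yaw
    so that sources hold their position in space. W and Z are unaffected.
    """
    x_r = x * np.cos(angle) - y * np.sin(angle)
    y_r = x * np.sin(angle) + y * np.cos(angle)
    return w, x_r, y_r, z

def decode_to_speaker(w, x, y, z, azimuth, elevation=0.0):
    """First-order decode for one virtual loudspeaker at the given direction,
    i.e. a virtual microphone aimed that way. A full renderer repeats this
    for every speaker in the array, then binauralizes each feed with HRTFs."""
    return 0.5 * (np.sqrt(2.0) * w
                  + x * np.cos(azimuth) * np.cos(elevation)
                  + y * np.sin(azimuth) * np.cos(elevation)
                  + z * np.sin(elevation))
```

Encoding a source at the front, rotating the soundfield 90 degrees, and then decoding with a virtual loudspeaker on the left recovers the source signal, which is exactly the behaviour head-tracking relies on.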
Describing a soundfield in this way using a bare minimum of four channels is certainly efficient, but there’s plenty of research to show that this efficiency comes at a cost. Errors or blurriness in the position of sounds can occur, and this is particularly true when the limitations of binaural are also taken into account. However, this is perhaps much less of an issue for VR, as in this type of presentation we can potentially also see the source of a particular sound as well as hear it.
All of this means that soundfield mics and “virtual reality microphones” are set to become synonymous, although the rate of development remains slow compared to 360 video, and indeed much of the 360 content currently available does not actually contain matching 360 audio. A number of microphones based around the traditional design are currently available, however, such as the original Soundfield line now produced by TSL Products, Core Sound’s Tetramic, and the new Ambeo microphone from Sennheiser. Portable recorders containing multiple microphone capsules, such as the Zoom H2n, can also be modified or processed to produce a B-format signal (although horizontal-only), and there are also a few more elaborate systems such as MH Acoustics’ Eigenmike. We have been investigating the precise capabilities of these different microphones as part of the Trinity 360 project and the Spatial Audio over Virtual and Irregular Arrays research group led by Prof. Francis Boland. Over the summer we will be publishing the results of a series of experiments (shown below) which assessed a number of these microphones in terms of their directional accuracy and overall tone quality and fidelity. We are also planning on recording the concert performance on April 8th using a number of these microphones, so we can see how well they function in an actual location recording, and when combined with matching 360 video. We’ll publish the footage on this blog once we have it, and then you can judge the results yourself!