Comparing Ambisonic Microphones

In previous posts on this blog I informally compared a number of different Ambisonic microphones which can be used to capture spatial audio for 360 video and VR. However, my colleagues (Seán Dooney, Marcin Gorzel, Hugh O’Dwyer, Luke Ferguson, and Francis M. Boland from the Spatial Audio Research Group, Trinity College, and Google, Dublin) and I have recently completed a more formal study comparing a number of these microphones using both listening tests and an objective analysis. The research was published in two papers, presented at the Audio Engineering Society (AES) Conference on Sound Field Control, Guildford, UK, in July 2016, and at the 142nd AES Convention in Berlin in May 2017, both of which are now available from the AES digital library (Part 1 here, & Part 2 here). While the papers themselves contain detailed technical information on both parts of this study, in this blog post I’ll present the major findings and share the original recordings so you can draw your own conclusions.

The original Soundfield microphone was first developed in the 1970s and since then a number of variations of this design have been released by different manufacturers. More recently, as Ambisonics has become the de facto standard format for spatial audio for VR and 360 video, many more Ambisonic microphones have emerged, and so a formal comparison of these different microphones is quite timely.

A number of previous studies on spatial audio have found a significant correlation between a listener’s overall preference and two specific parameters, namely localization accuracy (i.e. how directionally accurate the spatial recording is) and the overall sound quality. Our study therefore employed subjective listening tests to evaluate and compare the overall quality and timbre of these microphones, alongside an objective analysis of the localization accuracy.

In the first paper we examined the following microphones:

  1. DPA 4006 omni-directional condenser microphone used as a monophonic reference
  2. Soundfield MKV system with four sub-cardioid capsules and a rack-mounted control unit
  3. Core Sound TetraMic with four cardioid electret-condenser capsules
  4. Zoom H2n containing two stereo microphone pairs which can be converted into a horizontal only Ambisonic signal
  5. MH Acoustics Eigenmike spherical microphone array with thirty-two omni-directional electret capsules

[Figure: the five microphones under test]

For this initial listening test, samples of speech and music were played back over a single loudspeaker and recorded in turn by each of the five microphones. For this comparison, we wanted to examine the fundamental audio quality of these microphones, so only the monophonic, omni-directional W channel of these spatial recordings was used. The listening test participants were asked to compare each of the mic recordings to a reference captured using a high quality DPA 4006 microphone with an extremely flat frequency response. This approach is known as a Multiple Stimuli with Hidden Reference and Anchor (MUSHRA) test (described in ITU–R BS.1534-1) and is typically used to evaluate intermediate levels of audio quality by comparing multiple stimuli both to a reference recording and to each other. Participants were asked to compare each recording to the reference in terms of Audio Quality (for both music and speech); Low, Mid and High Frequency Timbre (music); and Artifacts/Distortion (speech) using the continuous quality scale shown below.

[Figure: listening test user interface with the continuous quality scale]

30 participants (26 male & 4 female) performed this test, which lasted 17.5 minutes on average. The MUSHRA test procedure typically excludes subjects who give the hidden reference a score of less than 90 for more than 15% of the test items; however, as the differences in quality were often quite subtle in this case, this specification was relaxed slightly, with 11 participants excluded from the final results. Our analysis found statistically significant results for all 6 questions, strongly suggesting differences in the subjective performance of the different microphones.
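
As an aside, this post-screening rule is simple enough to sketch in a few lines of Python; the array layout and scores below are purely illustrative, not our actual data:

```python
import numpy as np

def screen_subjects(ref_scores, threshold=90, max_fraction=0.15):
    """Standard MUSHRA post-screening: exclude any subject who rates the
    hidden reference below `threshold` on more than `max_fraction` of the
    test items. ref_scores has one row per subject, one column per item."""
    ref_scores = np.asarray(ref_scores)
    fail_rate = np.mean(ref_scores < threshold, axis=1)  # fraction of failed items per subject
    return fail_rate <= max_fraction                     # True for subjects to keep

# Example: 3 subjects x 6 items (made-up scores)
scores = [[95, 92, 90, 97, 91, 94],    # never fails -> kept
          [85, 80, 99, 90, 70, 95],    # fails 3/6 items -> excluded
          [100, 98, 96, 95, 93, 97]]   # kept
print(screen_subjects(scores))         # [ True False  True]
```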

A more detailed discussion of these results can be found in the original paper; however, two findings were particularly notable. Firstly, the results indicate that despite being an older model from the Soundfield range, the Soundfield MKV performed the best overall in terms of audio quality.

[Figure: audio quality results for speech and music]

The Eigenmike and TetraMic produced the worst results in terms of artifacts and noise, and this could well have contributed to the lower scores for these two microphones in terms of overall Audio Quality, particularly for speech. The high scores for the TetraMic for Audio Quality and High, Mid, and Low Frequency Timbre for the music examples (in which the presence of noise would be much less noticeable) support this finding. The relatively poor performance of the TetraMic in this category is perhaps explained by its low-level, unbalanced microphone-level output, which required significantly more preamplifier gain than the other microphones. The low score for the Eigenmike for this question can perhaps be explained by the more elaborate design of this microphone and the much larger number of individual mic capsules (32 instead of just 4).

[Figure: artifacts/distortion results]


In the second paper, we included an additional, newly released microphone, namely the Sennheiser Ambeo. For this listening test we wanted to replicate a more typical usage of these microphones, so the spatial recordings were presented using a head-tracked binaural rendering over headphones, which is standard for 360 videos presented using a Head Mounted Display (HMD).

[Figure: the microphones under test in the second experiment]

The listening test stimuli were taken from a live performance by an acoustic quartet (flute, guitar, saxophone and cello) of an original spatial music composition entitled A Round, Around. This piece consists of tonal, melodic canons, rotations and other spatial effects created by passing musical material between consecutive instruments, as can be seen in the score excerpt below. Many, many thanks to Kate Ellis, Nick Roth, and Lina Andonovska for their fine performance.

[Figure: score excerpt from A Round, Around]

The piece was recorded using multiple monophonic spot microphones (AKG 414s) and room microphones (AKG C314s and Sennheiser TLMs) arranged in a circle of 1m radius at 90 degree intervals, as shown below. In order to ensure consistent reproduction and microphone position, four excerpts from the piece were then reproduced using a number of loudspeakers, and Ambisonic recordings were made with each of the microphones in turn. A MOTU 8m audio interface was used for the TetraMic and Ambeo, while the other microphones used their own, proprietary interfaces. All of these recordings can be downloaded from the link at the bottom of this post, along with some additional recordings we couldn’t include in the test itself.

[Figure: the microphone array used for the recordings]

To investigate the subjective timbral quality of each microphone, a modified MUSHRA test was again implemented. However, as no reference recording was available for this particular experiment, all test stimuli were presented to the listener at the same time, without a reference.

Due to the relative difficulty of the task, and the lack of a suitable reference, all participants undertook training prior to taking the test. The benefits of using trained, experienced listeners over naive or untrained listeners have been well documented, with trained listeners tending to give more discriminating and consistent preference ratings. Training for this experiment was conducted using Harman’s “How to Listen” listener training software, and each subject was required to reach a skill level of 7 or higher in two specific programs focusing on the high frequency timbre (Brightness/Dullness) and low frequency timbre (Fullness/Thinness) of various stereo music samples.

The four tests each used a different excerpt from the piece, recorded using each of the five Ambisonic microphones. Subjects were asked to rate the five recordings in terms of high frequency timbre (Bright/Dull) in tests 1 & 2, and low frequency timbre (Full/Thin) in tests 3 & 4. The same continuous quality scale as in the training program was used, namely from -5 (Dull) to 5 (Bright) in tests 1 & 2, and from -5 (Thin) to 5 (Full) in tests 3 & 4, with 0 (Ideal) as the initial, default value. The test system and user interface were implemented in Max MSP and presented using a laptop, an RME Babyface audio interface, Sennheiser HD650 open back headphones, and an Oculus Rift DK2 for head-tracking. The binaural decoding was based around a cube of virtual loudspeakers and a mode-matched decoder, implemented using the ambiX decoder plugin and the KEMAR HRTF set from the SADIE database.
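
To give a flavour of how this type of renderer fits together, here is a rough Python sketch of the general virtual loudspeaker approach (not our exact Max MSP implementation): a first-order ambiX signal is decoded to eight virtual loudspeakers at the corners of a cube using a mode-matching (pseudo-inverse) decoder, and each loudspeaker feed is convolved with the HRIR pair for that direction. The HRIR array is an assumed input (e.g. loaded from the SADIE KEMAR set), and head-tracking would be handled by rotating the B-format signal before decoding.

```python
import numpy as np
from scipy.signal import fftconvolve

def sn3d_gains(az, el):
    """First-order ambiX (ACN/SN3D) gains for a direction in radians."""
    return np.array([1.0,                         # W (ACN 0)
                     np.sin(az) * np.cos(el),     # Y (ACN 1)
                     np.sin(el),                  # Z (ACN 2)
                     np.cos(az) * np.cos(el)])    # X (ACN 3)

# Virtual loudspeakers at the 8 corners of a cube:
# azimuths +/-45 and +/-135 degrees, elevations +/-35.26 degrees
az = np.deg2rad([45, -45, 135, -135] * 2)
el = np.deg2rad([35.26] * 4 + [-35.26] * 4)

# Mode-matching decoder: pseudo-inverse of the (4 x 8) re-encoding matrix
Y = np.stack([sn3d_gains(a, e) for a, e in zip(az, el)], axis=1)
D = np.linalg.pinv(Y)    # (8 x 4) decoding matrix

def binaural_render(bformat, hrirs):
    """bformat: (4, n_samples) ambiX signal; hrirs: (8, 2, hrir_len) array
    of left/right HRIRs for the 8 cube directions (assumed input)."""
    feeds = D @ bformat                          # 8 virtual loudspeaker feeds
    out = np.zeros((2, bformat.shape[1] + hrirs.shape[2] - 1))
    for feed, hrir in zip(feeds, hrirs):
        out[0] += fftconvolve(feed, hrir[0])     # left ear
        out[1] += fftconvolve(feed, hrir[1])     # right ear
    return out
```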

[Figure: the test interface implemented in Max MSP]

21 participants (20 male & 1 female) performed the test, with ages ranging from 21 to 50 years old and an average age of 33. The participants included experienced listeners (12 subjects), comprising professional audio engineers and academics with prior experience of similar tests, and semi-experienced listeners (9 subjects), comprising post-graduate students in the Music & Media Technology programme in Trinity College Dublin. The average time taken for both the training and the test was 53 minutes, and a short break was always taken following the initial training and before the test.

[Figure: high and low frequency timbre ratings]

For Question 1 (high frequency timbre), no statistically significant difference was found between the Ambeo and H2n, and a small, but still statistically significant, difference was found between the TetraMic and Soundfield MKV. Significant differences were found between all other pairs of microphones, with the Soundfield MKV rated as closest to ideal in terms of high frequency timbre. The Eigenmike was rated as dull sounding compared to the other microphones, while the Ambeo and H2n were both rated as brighter than ideal. This result for the Ambeo can perhaps be explained by the presence of the “Ambisonics Correction Filter” incorporated by default within v1.03 of Sennheiser’s A-to-B-format conversion plugin, which applied a significant boost above 10 kHz to all four channels. It should be noted that Sennheiser have since released an updated version of this plugin (v1.1.2) which significantly reduces this high frequency boost. While this new version arrived too late to be included in our formal test, the download link below includes both the original recording used in the test, and a new version processed with the latest version of the plugin.

For Question 2 (low frequency timbre), no statistically significant difference was found between the Soundfield MKV and TetraMic. A small, but still statistically significant, difference was found between the Eigenmike and TetraMic, with significant differences found between all other pairs of microphones. The Soundfield MKV was again rated as being closest to ideal, but could not be statistically distinguished from the TetraMic. The Eigenmike was rated as fuller sounding compared to the other microphones, while in contrast the Zoom H2n was rated as thinner sounding than the other mics.


In addition to our listening tests, we also performed an objective analysis of localization accuracy for each of the five microphones. For these experiments, the five microphones were placed in turn at the center of a spherical array of sixteen Equator D5 coaxial loudspeakers, which were mounted on a dampened metal frame and the floor, in four vertically separated rings. Two laser pointers were mounted on the frame to mark the exact center point of the array and ensure the same position and orientation for each microphone.

[Figure: the spherical loudspeaker array used for the directional analysis]

To generate the test material, pink noise bursts were played consecutively from each loudspeaker and recordings were made with each microphone. In addition, a monophonic recording made with the DPA 4006 was synthetically encoded into Ambisonics using the ambiX encoder and the azimuth and elevation angles of the loudspeakers in the array, and this served as a reference for the directional analysis.
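
First-order encoding is just a set of direction-dependent gains, so this kind of synthetic reference can be generated in a few lines. A minimal sketch using the ambiX (ACN/SN3D) convention, with made-up angles standing in for the measured loudspeaker positions:

```python
import numpy as np

def encode_first_order(mono, az_deg, el_deg):
    """Encode a mono signal into first-order ambiX (ACN/SN3D) B-format
    for a source at the given azimuth and elevation in degrees."""
    az, el = np.deg2rad(az_deg), np.deg2rad(el_deg)
    gains = np.array([1.0,                        # W
                      np.sin(az) * np.cos(el),    # Y
                      np.sin(el),                 # Z
                      np.cos(az) * np.cos(el)])   # X
    return gains[:, None] * mono[None, :]         # shape (4, n_samples)

# e.g. a noise burst "played" from a loudspeaker at azimuth 45, elevation 35 degrees
burst = np.random.randn(48000)                    # stand-in for the DPA 4006 recording
reference = encode_first_order(burst, 45.0, 35.0)
```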

As well as the five microphones mentioned earlier, this second experiment also assessed the Ambeo both with and without the Ambisonics Correction Filter mentioned earlier. In addition, the Zoom H2n was assessed using both the native Ambisonics recording mode (horizontal only), released by Zoom as a firmware update in 2016, and the conversion of the standard 4-channel recording mode to B-format using a freeware conversion plugin developed by my colleague Brian Fallon at Trinity College Dublin (available as a free download, here).

The published paper contains precise details of our directional analysis method, but fundamentally our approach is based around an intensity vector analysis of different frequency bands. For each of these frequency bands (which are derived from the Bark scale), the mode of the angle estimates was taken as the estimated angle to the source. This can be seen in the histogram of azimuth angle data for a loudspeaker at 45 degrees shown below, where the angle estimates are spread across the range -180 to 180 degrees but the primary mode correlates strongly with the true source angle.

[Figure: histogram of azimuth angle estimates for a loudspeaker at 45 degrees]
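
As a rough illustration of the idea (the paper has the exact implementation details), the sketch below band-filters a horizontal B-format signal, forms a time-averaged intensity vector per band from the W, X and Y components, and takes the mode of the per-band azimuth estimates. The filter order and the 10-degree histogram bins are assumptions, as is the usual axis convention of X pointing forward and Y to the left.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def estimate_azimuth(w, x, y, fs, bands):
    """Estimate source azimuth (degrees) from first-order B-format via an
    intensity vector analysis: one estimate per frequency band, then the mode.
    `bands` is a list of (low, high) edges in Hz, e.g. Bark-scale bands."""
    estimates = []
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        wb, xb, yb = (sosfilt(sos, s) for s in (w, x, y))
        ix, iy = np.mean(wb * xb), np.mean(wb * yb)     # time-averaged intensity
        estimates.append(np.degrees(np.arctan2(iy, ix)))
    # mode of the banded estimates, using 10-degree histogram bins
    hist, edges = np.histogram(estimates, bins=36, range=(-180, 180))
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1])
```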

Using this method, an estimate for the azimuth and elevation angles of each of the 16 loudspeakers was obtained from the different microphone recordings. All of these estimates were then compared to the actual loudspeaker locations and an absolute offset error determined. These results are summarized below, although it should be noted that as the Zoom H2n does not include a vertical component, no elevation results are included for this microphone.

The minimal errors for the synthetically encoded reference signal clearly demonstrate the effectiveness of this analysis method. A paired t-test revealed statistically significant differences between microphones, and the results indicate that the Eigenmike was the most accurate mic in terms of both azimuth and elevation, on par with the Ambeo (both versions) and better than all other microphones.

Both versions of the Ambeo were on par with the Eigenmike and Soundfield MKV (for azimuth), better than the Soundfield MKV (for elevation), and better than the TetraMic and H2n (both versions). Interestingly, our study revealed no significant difference in directional accuracy for the Ambeo with and without the optional Ambisonics Correction Filter.

The results for the Soundfield MKV and TetraMic were largely comparable, and the Zoom H2n performed worse overall, although this is not unexpected as this mic does not include a vertical component and all loudspeakers were offset from horizontal to some degree. Finally, no statistically significant differences were found between the H2n native Ambisonic recording mode, and our 4-channel conversion plugin.

[Figure: absolute directional error for each microphone]
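
The statistical comparison itself is straightforward; as a sketch, a paired t-test over the per-loudspeaker absolute errors of two microphones might look like the following (the error values here are invented for illustration, not our measured data):

```python
import numpy as np
from scipy import stats

# Hypothetical absolute azimuth errors (degrees) for two mics,
# paired by loudspeaker across the 16-speaker array
err_mic_a = np.array([2, 3, 1, 4, 2, 5, 3, 2, 4, 1, 3, 2, 5, 3, 2, 4], float)
err_mic_b = np.array([6, 8, 5, 9, 7, 6, 8, 5, 7, 9, 6, 8, 7, 5, 9, 6], float)

t, p = stats.ttest_rel(err_mic_a, err_mic_b)   # paired t-test across speakers
print(f"t = {t:.2f}, p = {p:.4f}")
```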


The results of these experiments largely match the findings of Part 1 of this study, with the Soundfield MKV once again producing the best results in terms of overall timbral quality, and with comparable results to the other microphones in terms of directionality with the exception of the Eigenmike and Ambeo.

The Ambeo performed very well in terms of directionality, on par with the Eigenmike and Soundfield (azimuth), and better than all other mics. In terms of timbral quality the Ambeo was rated as less ideal and brighter than other mics, however, as noted earlier this result was strongly influenced by the now updated Correction Filter applied during the A-to-B-format conversion process. Notably the latest version of this conversion plugin released by Sennheiser has drastically reduced the extent of this high frequency boost, and this new version is also included in the download link below.

The TetraMic was slightly less accurate than the Eigenmike, Ambeo and Soundfield MKV in terms of directional accuracy, which may be explained by the slightly greater inter-capsule spacing of this microphone. While the use of speech signals and studio recordings in Part 1 of this study revealed some issues with noise with the TetraMic, due to its relatively low output level compared to other microphones such as the Ambeo, this was much less apparent with this music recording. This suggests that given appropriate pre-amplifiers with sufficient gain, good results can be achieved with the TetraMic in terms of timbral quality.

The more elaborate design of the Eigenmike produced the best results overall in terms of localization accuracy, but with decreased performance in terms of timbral quality. As with our earlier experiments, this suggests a certain trade-off between timbral quality and directional accuracy, particularly when the number of individual mic capsules is significantly increased.

The Zoom H2n performed worse overall compared to all other microphones; however, its performance is still very reasonable given its extremely low cost, and the fact that it was not originally designed to produce B-format recordings.


All of the recordings used in the second listening test can be downloaded from the link below. The ZIP file contains the 4 excerpts from A Round, Around recorded with the Soundfield MKV, Ambeo, TetraMic, H2n and Eigenmike. In addition, an alternative version of the Ambeo (converted using the newly updated filter) and a 3rd order Ambisonics version of the Eigenmike recording are also included in the additional recordings folder. All the recordings are 48kHz/24bit, and the Furse-Malham weighting (with the W channel attenuated by 3dB) and channel order (WXYZ) were used throughout.

ComparingAmbiMics-Samples.zip

The ZIP file also contains a sample Reaper session which can be used to audition the files, using the ambiX decoder plugin and the KEMAR HRTF set from the SADIE database.

A New Camera Rig, Distance Processing Experiments, and “Vortex Cannons”

The Music & Media Technologies programme in Trinity College Dublin recently celebrated its 20th birthday with a concert in the Samuel Beckett Theatre. So, we decided to use this opportunity to try out our new camera system and microphone, namely a GoPro Omni and a Sennheiser Ambeo.

[Images: concert poster and the camera rig]

The concert featured numerous performances by the Crash Ensemble and composers such as Miriam Ingram, Enda Bates, Natasa Paulberg, Neil O’Connor, Maura McDonnell, and Conor Walsh/Mark Hennessy. However, a couple of performances in particular seemed very suitable for a 360 video, particularly The Sense Ensemble / Study #2 by George Higgs for string quartet, vortex cannons, silent signing singer, and percussion.

George is currently pursuing a Ph.D. at Trinity College Dublin entitled ‘An Approach to Music Composition for the Deaf’ and here’s his programme note for the piece:

Music involves much more than hearing. All of our senses – arguably nine in number – are in fact collaborating in our musical experience as a kind of ‘sense ensemble’. This composition is the second in an experimental research series exploring approaches to music composition for deaf audiences; or more generally music that appeals to the multiple senses responsible for our appreciation of music. The performance features smoke ring cannons, two signing percussionists and string quartet. Many thanks to Neimhin Robinson (smoking signer), Dr Dermot Furlong (nonsmoking supervisor), Jessica Kennedy (choreographic consultant), and the Irish Research Council.

While the content of this performance was very well suited to the medium (those smoke cannons in particular), the conditions were highly challenging in terms of the video shoot and so this was a good test of the limitations of the GoPro Omni system.

The most notable feature of this rig is undoubtedly the synchronisation of the six GoPros, which so far has been very stable and trouble free. Once the firmware is updated, all 6 cameras can be controlled from one master camera, which can also be paired with the standard GoPro remote control. If you purchase the rig-only Omni, it should be noted that the power pack and stitching software need to be purchased separately; however, we were just about able to snake our 6 USB power cables up along the tripod and into the cameras without too much difficulty.


[Photo: the GoPro Omni camera rig]

To achieve this synchronisation, the cameras attach to the central brain of the rig; however, the extra space needed for this means the cameras are not positioned as close together as physically possible. As a consequence, the rig and stitching software do struggle when moving objects or people get too close to the camera, as can be seen in the above video when George and Neimhin approach the smoke machines to fill up the vortex cannons.

Stitching is implemented using the dedicated GoPro Omni Importer app, and for simpler shots in which nothing is moving too close to the rig this does a pretty good job. In general, however, at least some touching up of the stitch is required using a combination of Autopano Video Pro and Autopano Giga. Visible stitch lines on static objects or people are relatively easy to correct and simply require some adjustments with the masking tool in Autopano Giga. This tool allows Markers to be added to the reference panorama image so that stitch lines avoid certain areas and noticeable artefacts are removed (or at least reduced).

For static objects this can usually be achieved relatively easily; however, it is definitely worth considering how you orientate the camera rig with this in mind. By default the Omni is mounted on one of the corners of the cube, but it could be worth adding an attachment to the tripod so the rig is mounted flat, depending on the particular setup for the shoot (we may have had better results here using that orientation).

The particularly challenging aspect of the stitch of George’s piece was the movement of the two performers, and this required further processing using the timeline in Autopano Video. The process is similar: we again use the masking tool in Autopano Giga to selectively maintain specific areas of the reference panorama, but now use the timeline in Autopano Video to move between different sets of markers as the person or object moves across a stitch line. This can be pretty effective provided the objects are not too close, as the following tutorial from CV North America demonstrates. However, if the action is happening within a few metres of the rig, then stitching artefacts may be unavoidable, or at least extremely time consuming to eliminate entirely (as can clearly be seen at times in the video of George’s piece).

The particular lighting needed for this piece also presented some challenges for the stitch. In order to light the smoke rings, two fairly powerful spot lights were directed over the audience and directly down onto the stage (and therefore also the camera rig), which resulted in exaggerated brightness and shine on the performers’ faces (and those white coats too!).

In contrast to the above, the stitching for the second video from this concert was much more straightforward. For this piece by Miriam Ingram, the musicians were all at a safe distance from the camera so only a few small touch ups were required, again just using the masking tool in Autopano Giga to select specific areas within the reference panorama.


The audio for both of these videos was recorded using a Sennheiser Ambeo microphone and Zoom H6 recorder, mounted just in front of and below the camera rig. We will be publishing some specific analysis of this microphone in the second part of our Comparing Ambisonic Microphones study early in 2017; however, more informally, I’ve been very impressed by this microphone. The build quality feels very good, it outputs a high signal level that performs well with average quality mic preamps such as those in the Zoom recorders, and the accompanying conversion plugin is very straightforward.

Although marketed as a “VR mic”, this is actually an almost identical design to the original Soundfield microphone which has been in use for many decades. The photo below on the left shows the capsule arrangement within the Ambeo, which follows the same tetrahedral layout of four capsules as the original Soundfield microphone (shown on the right).

[Photos: the Ambeo capsule array (left) and the original Soundfield capsule array (right)]
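
For the curious, the core of the A-format to B-format conversion for any tetrahedral microphone of this kind is a simple sum-and-difference matrix. The sketch below assumes the usual capsule naming, and omits the frequency-dependent correction filters that real converters (such as Sennheiser’s plugin) also apply to compensate for the finite capsule spacing:

```python
import numpy as np

# Capsule order: front-left-up (FLU), front-right-down (FRD),
# back-left-down (BLD), back-right-up (BRU)
A2B = np.array([[1,  1,  1,  1],    # W: omnidirectional sum
                [1,  1, -1, -1],    # X: front minus back
                [1, -1,  1, -1],    # Y: left minus right
                [1, -1, -1,  1]])   # Z: up minus down

def a_to_b(a_format):
    """a_format: (4, n_samples) capsule signals in FLU/FRD/BLD/BRU order."""
    return 0.5 * (A2B @ a_format)   # (4, n_samples) WXYZ B-format
```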

As is often the case in 360 audio mixes, the question of distance is an important factor to consider. For acoustic performances, recordings such as this can often sound very distant, generally due to the placement of the microphone alongside the camera rig, at a distance beyond what would be typical for an audio-only recording. This can result in a lack of directionality in the recording and require the addition of spot microphones. However, for this shoot we had the opposite problem due to the close miking and amplification of the musicians through the venue PA. As the mic and camera rig were positioned in the front row of seating, the PA loudspeakers were positioned very wide relative to the mic, resulting in an overly close sound compared to the visual distance. This was particularly noticeable for the string quartet in George’s piece, which initially sounded much too wide and close.

To correct this issue, some simple yet surprisingly effective distance processing was applied to the Ambeo recording, namely the addition of some early reflections to the W channel. This was very much a quick experiment using just the early reflections component of a commercial reverb plugin; however, as the results were pretty good it made it into the final mix. As a demonstration, the audio samples below contain static (at 30 deg and -90 deg respectively) binaural mixes of an excerpt of the piece. Each sample begins with two bars of unprocessed material, then two bars with the additional reflections, and so on for another 4 bars.
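
Conceptually the processing is very simple: a handful of delayed, attenuated copies of the signal are summed into the omnidirectional W channel. Here is a minimal sketch; the delay times and gains are purely illustrative and not those of the reverb preset used in the actual mix:

```python
import numpy as np

def add_early_reflections(w, fs, taps=((0.012, 0.5), (0.021, 0.35), (0.033, 0.25))):
    """Add a few discrete early reflections to the omni W channel to push
    the source perceptually further away. `taps` is a sequence of
    (delay_in_seconds, gain) pairs (illustrative values only)."""
    out = np.copy(w)
    for delay, gain in taps:
        n = int(round(delay * fs))
        out[n:] += gain * w[:-n]    # delayed, attenuated copy
    return out
```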

This type of distance processing, of both ambisonic recordings and encoded mono sources such as spot microphones, will be the focus of my research in the coming year, as there are still many unknowns in this whole area. For example, just how important is it that the pattern and timing of these early reflections match the physical dimensions and acoustics of the space? Alternatively, can better results be achieved using some general method, perhaps such as the distance panpot suggested back in 1992 by Michael Gerzon [1]?

There is also some evidence that the optimal number and distribution of early reflections for distance processing without excessive timbral coloration is dependent on the nature of the source signal, which suggests that a one-size-fits-all solution may not be the best approach. Let’s just say this is definitely a topic we’ll be returning to in 2017, and now that we have our hardware and workflow sorted, expect a lot more 360 videos over the coming year.

[1] M. A. Gerzon: The Design of Distance Panpots, Proc. 92nd AES Convention, Preprint No. 3308, Vienna, 1992.


Zoom H2n Conversion Plugin

My colleague Brian Fallon has recently created a First-Order Ambisonic Encoder Reaper plugin for the Zoom H2n portable recorder, which you can download from the link below:

H2n-FOA-Encoder-Package.zip

[Image: the H2n FOA encoder plugin interface]

While the H2n is far from the best Ambisonic microphone available, it is certainly one of the most affordable and produces surprisingly usable results given its cost (although due to the geometry of the H2n’s microphone capsules, it is horizontal only). Zoom released a firmware update for the H2n earlier this year which allows horizontal-only Ambi-X audio to be recorded directly on the recorder. However, it can sometimes be useful to record in the original 4-channel mode (so you have access to the original stereo tracks) and convert to Ambisonics later. In addition, if you made 4-channel recordings with the H2n prior to the release of this firmware update, then this plugin can also be used to convert these into Ambisonics.

Brian’s plugin is for the DAW Reaper and can be used to convert these H2n 4-channel recordings into horizontal B-format Ambisonics; it also allows you to choose various output channel orders and normalization schemes (Furse-Malham, Ambi-X, etc.). The package includes a sample Reaper project and a manual with details on the recording and plugin setup.
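
The exact conversion matrix is documented in the plugin manual, but the underlying idea can be sketched as follows: a roughly cardioid capsule at azimuth θ picks up approximately 0.5·W + 0.5·(cosθ·X + sinθ·Y), so with the capsule azimuths known, the horizontal B-format components can be recovered by inverting that relationship. The capsule angles and ideal cardioid patterns below are illustrative assumptions, not the H2n’s actual geometry:

```python
import numpy as np

# Assumed capsule azimuths in degrees (illustrative only)
cap_az = np.deg2rad([45, -45, 135, -135])

# Each ideal cardioid capsule: s_i = 0.5*W + 0.5*(cos(az_i)*X + sin(az_i)*Y)
A = np.stack([0.5 * np.ones(4),          # W coefficients
              0.5 * np.cos(cap_az),      # X coefficients
              0.5 * np.sin(cap_az)],     # Y coefficients
             axis=1)                     # (4 capsules x 3 components)
A_inv = np.linalg.pinv(A)                # (3 x 4) conversion matrix

def h2n_to_bformat(capsules):
    """capsules: (4, n_samples) recording in the assumed capsule order.
    Returns (3, n_samples): W, X, Y (horizontal only, no Z)."""
    return A_inv @ capsules
```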

Note that if you own the older H2 recorder, which has a slightly different microphone arrangement, Daniel Courville’s VST and AU plugins can be used for conversion to B-format in a similar fashion.


Soundscapes, VR, & 360 Video


Over the past few months we’ve been busy presenting our work at the AES and ISSTA conferences, through masterclasses, workshops, and public demonstrations such as Dublin Science Gallery’s event, ‘Probe: Research Uncovered‘ at Trinity College Dublin last month. In our research, we are continuing to evaluate different recording techniques and microphones, and we have also recently acquired a new 360 camera system (a GoPro Omni to be exact) with a much simpler and faster capture and video stitching process compared to our experimental rig (although monoscopic only, of course). We’ll have more information on that camera system in the coming weeks.

As a composer, one of the things I find most fascinating about VR and 360 video is its relationship to soundscape composition. Composers have been making music from field recordings for many decades: from the electroacoustic nature photographs of Luc Ferrari, to the acoustic ecology of The World Soundscape Project (WSP) and composers such as Murray Schafer, Barry Truax, and Hildegard Westerkamp, to the music and documentary work of Chris Watson, to give just a few examples (the Ableton Blog has a nice article on the art of field recording here).

Of course, in the world of VR and 360 video the soundscape serves an important functional role as a means to increase the sense of immersion and presence in the reproduced environment. In addition, the location of sounds can be used to direct visual attention towards notable elements in the scene. It has been said of cinema that “sound is half the picture” but this is perhaps even more true in VR!

The combination of these two areas is therefore deeply interesting to me, both in terms of how we might create music soundtracks for 360 videos and VR games from the natural recorded soundscape and sound design, and in terms of how we might use 360 video for the presentation of soundscape compositions.

Although it may seem somewhat counter-intuitive, this ability to control, and perhaps also remove, the visual component can be used to focus attention on the audible soundscape in a potentially interesting way. While loudspeakers or headphones can provide an effective sense of envelopment within an audio scene, there is inevitably a conflict between the visual perception of the loudspeakers and/or reproduction environment and the recorded soundscape. In the context of 360 video, the composer has, in contrast, complete control over both the visual and audible scene, which opens up some interesting creative possibilities.

This type of environmental composition, which makes use of both 360 video and spatial soundscapes, is the next focus of this project, and we should have new work online in the coming months. However, in the meantime I’d like to recommend an award-winning VR experience which has inspired my work in this area. Notes on Blindness is a documentary film based on the audio diaries of John Hull and his emotive descriptions of the sensory and psychological experience of losing sight. The accompanying VR presentation utilizes spatial audio and sparse, dimly lit 3D animations to represent this experience of blindness in a highly evocative manner. Released for the Samsung platform earlier this year, the VR experience is now available as a free app for iOS or Android and is highly recommended.


Spatial Audio & 360 Video

The first 360 video from the concert is now online, and in a nice piece of timing, YouTube have just released a new Android app that can play back 360 videos with spatial audio. So, if you happen to own a high-spec Android smartphone like a Samsung Galaxy or Nexus (with Android v4.2 or higher), you can watch this video using a VR headset like Cardboard, with matching spatial audio on headphones. Desktop browsers like Chrome, Firefox, and Opera (but not Safari), and the YouTube iOS app, will only play back a fixed stereo soundtrack for now, but this feature will presumably be added to these platforms in the near future.

This recording is of the first movement of From Within, From Without, as performed by Pedro López López, Trinity Orchestra, and Cue Saxophone Quartet in the Exam Hall, Trinity College Dublin, April 8th, 2016. You can read more about the composition of this piece in an earlier post on this blog, and much gratitude to François Pitié for all his work on the video edit.

Apart from YouTube’s Android app, 360 video players that support matching 360 audio are thin on the ground, at least for now. Samsung’s Gear VR platform supports spatial audio in a similar manner to YouTube, although only if you have a Samsung smartphone and VR headset. Facebook’s 360 video platform does not support 360 audio right now; however, the recent release of a free Spatial Audio Workstation for Facebook 360 suggests that this won’t be the case for long. The Workstation was developed by Two Big Ears and includes audio plugins for various DAWs, a 360 video player which can be synchronised to a DAW for audio playback, and various other authoring tools for 360 audio and video (although only for OSX at the moment).

The mono video stitch was created using VideoStitch Studio 2, which worked OK but struggled a little with the non-standard camera configuration. My colleague François Pitié is currently investigating alternative stitching techniques which may produce better results.

The spatial audio mix is a combination of a main ambisonic microphone and additional spot microphones, mixed into a four channel ambisonic audio file (B-format, 1st order, ACN channel ordering, SN3D normalization), as per YouTube’s 360 audio specifications. As you can see in the video, we had three different ambisonic microphones to choose from: an MH Acoustics Eigenmike, a Core Sound TetraMic, and a Zoom H2n. We used the TetraMic in the end, as this produced the best tonal quality with reasonably good spatial accuracy.

As might be expected given the distance of the microphones from the instruments, and the highly reverberant acoustic, all of the microphones produced quite spatially diffuse results, and the spot microphones were most certainly needed to really pull instruments into position. The Eigenmike was seriously considered, as this microphone did produce the best results in terms of directionality (which is unsurprising given its more complicated design). However, the tonal quality of the Eigenmike was noticeably inferior to the TetraMic, and as the spot mics could be used to add back some of this missing directionality, this proved to be the deciding factor in the end.

The Zoom H2n was in certain respects a backup to the other two microphones, as this inexpensive portable recorder cannot really compete with dedicated microphones such as the TetraMic. However, despite its low cost it does work surprisingly well as a horizontal-only ambisonic mic, and it was in fact used to capture the ambient sound in the opening part of the above video (our TetraMic picked up too much wind noise on that occasion, so the right type of wind shield for this mic is strongly recommended for exterior recordings). While we used our own software to convert the raw four channel recording from the Zoom into a B-format, 1st order ambisonic audio file (this will be released as a plugin in the coming months), there is now a firmware update for the recorder that allows this format to be recorded directly from the device. This means you can record audio with the H2n, add it to your 360 video, and upload it to YouTube without any further processing beyond adding some metadata (more on this below). So, although far from perfect (e.g. no vertical component in the recording), this is definitely the cheapest and easiest way to record spatial audio for 360 video.

The raw four channel TetraMic recording had to first be calibrated and then converted into a B-format ambisonic signal using the provided VVMic for TetraMic VST plugin. However, it should be noted that this B-format signal, like that of most ambisonic microphones (apart from the Zoom H2n), uses the traditional Furse-Malham channel order and normalization. So, it must be converted into ACN channel ordering and SN3D normalization using another plugin, such as the ambiX converter or Bruce Wiggins‘ plugin, as shown in the screen shot below.

[Screenshot: converting the TetraMic B-format recording to ACN/SN3D]
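
At first order this conversion is fortunately trivial: the channels are reordered from WXYZ to ACN order (W, Y, Z, X), and the Furse-Malham -3 dB weighting on W is undone; X, Y and Z share the same normalization in both conventions. A quick sketch:

```python
import numpy as np

def fuma_to_ambix(fuma):
    """Convert first-order FuMa (WXYZ, W at -3 dB) to ambiX
    (ACN channel order, SN3D normalization)."""
    w, x, y, z = fuma                    # fuma: (4, n_samples) array
    return np.stack([w * np.sqrt(2.0),   # W: undo the -3 dB FuMa weighting
                     y, z, x])           # reorder to ACN: W, Y, Z, X
```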

In addition to the ambisonic microphones positioned at the camera, we also used a number of spot mics (AKG 414s), in particular for the brass and percussion on the sides and back of the hall. These were mixed into the main mic recording using an Ambi-X plugin to encode each mono recording into a four channel ambisonic audio signal and position it spatially, as shown below.

[Screenshot: encoding and positioning a mono spot mic with the Ambi-X plugin]

As these spot microphones were positioned closer to the instruments than the main microphone, and were directly mixed into this recording in this way, they provide much of the perceived directionality in the final mix, with the main microphone providing a more diffuse room impression. This traditional close mics + room mic approach was needed in this particular case, as it was a live performance with an audience and musicians distributed around the hall.

However, this is quite different from how spatial audio recordings (such as for a 5.1 recording) are usually created. These types of recordings tend to emphasize the main microphone recording, with minimal or even no use of spot microphones. To do this, however, we have to be able to place our main microphone arrangement quite close to the musicians (above the conductor’s head, for example) so that we can capture a good balance of the direct signal (which gives us a sense of direction) and the diffuse room reverberation (which gives us a sense of spaciousness and envelopment). Often this is achieved by splitting the main microphone configuration into two arrangements, one positioned quite close to the musicians, and another further away. However, this is much harder to do using a single microphone such as the TetraMic, particularly when an audience is present and the musicians are distributed all around the room. This is one of the things we will be exploring in our next recording, which will be of a quartet (including such fine musicians as Kate Ellis, Nick Roth, and Lina Andonovska), and without an audience. This will allow us to position the musicians much closer to the central recording position and so capture a better sense of directionality using the main microphone, with less use of spot microphones.

Google have released a template project for the DAW Reaper which demonstrates how to do all of the above, and also includes a binaural decoder and rotation plugin that simulates the decoding performed by the Android app. For installation instructions and download links for the decoder presets and Reaper Template, see this link. This can also be implemented using the same plugins in Adobe Premiere, as shown here. Bruce Wiggins has a nice analysis of the Head Related Transfer Functions used to convert the ambisonic mix to a binaural mix for headphones on his blog which you can read here.

Finally, once you’ve created a four channel ambisonic audio file for your 360 video, you then need to add some metadata so YouTube knows that this is a spatial audio signal, as shown here. My colleague François Pitié describes how to do this on his blog, as well as how to use the command line converter FFmpeg to combine the final 360 video and audio files. That article also demonstrates how to use FFmpeg to prepare a 360 video file for direct playback from an Android phone using Google’s Jump Inspector app.


After the Concert

So, we have all the footage transferred from the (many) SD cards and backed up; we have about 660 GB of audio and video material in total, so plenty to work with!

Here is a little timelapse video of François Pitié and Sean Dooney putting the final touches to our 360 camera rig before the concert.

We’ve just started working on stitching together and synchronizing our video footage from both the 360 camera rig and the three 3D pairs (16 GoPros in total). Although others have reported occasional problems with the GoPro wifi remote and the odd corrupted file, we didn’t encounter any problems in that regard. We did find that the cameras would sometimes go immediately out of sync when started with the remote; however, this was always very obvious and could be easily fixed by stopping and starting recording again. Of course, the GoPro wifi remote only ensures synchronization within a few frames, and much more accurate time alignment is required to create the full 360 stitch. This synchronization is often done using the audio track captured by each individual GoPro, and while we also have this option, we did try a different method that looks promising. As well as the standard clapper board, we also used a camera flash to create a visual reference for the alignment measurements, and while much work needs to be done, this definitely looks like a viable approach which may potentially allow for more accurate alignment than audio alone.
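
For the audio-based approach, the offset between two cameras can be estimated by cross-correlating their audio tracks around a sharp transient such as the clapper board. A minimal sketch, assuming the two tracks are already loaded as numpy arrays at a common sample rate:

```python
import numpy as np
from scipy.signal import correlate

def estimate_offset(ref, other, fs):
    """Estimate the time offset (seconds) of `other` relative to `ref`
    by locating the peak of their cross-correlation."""
    corr = correlate(other, ref, mode='full')
    lag = np.argmax(np.abs(corr)) - (len(ref) - 1)
    return lag / fs   # positive => `other` lags `ref`
```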

We’ll be working on stitching the 360 video over the coming months, so we’ll have more technical details on that in the near future. In terms of the audio, we recorded the concert using three different main microphones, namely a MH Acoustics Eigenmike, a Core Sound TetraMic, and a Zoom H2n, as well as a number of mono spot microphones. One of the aspects of this recording I’ll be looking at in the coming months is the production process for combining these B-format recordings and spot microphones, however, we’ll also be comparing the relative performance of these three microphones, particularly when combined with the matching 360 video.

In the meantime, here are some photos of the concert (with thanks to François Pitié).

[Concert photos]

You can also hear the third movement of From Within, From Without at the Spatial Music Collective‘s programme at the Ideopreneurial Entrephonics II festival, this coming Saturday (April 23rd) at the Freemasons’ Hall in Dublin. Also featured on the programme will be works by Jonathan Nangle, Brian Bridges, Massimo Davi, and others, and there are lots of other interesting performances and events happening over the course of the festival too.

You can hear a short excerpt of this soundscape composition, entitled Of Town and Gown, below, although of course this has been reduced to stereo from the original eight-loudspeaker mix.

Here’s the programme note:

Upon entering the campus of Trinity College Dublin it is striking how much the character of the ambient sound changes, as the din of traffic and pedestrians which dominates outside the college walls recedes into the background as you emerge from the narrow passageway of the front gate. This sense of liminality and of passing through a threshold into an entirely different space is the focus of this tape piece entitled Of Town and Gown. This soundscape composition is constructed from field recordings made around the outskirts of the campus, which are then manipulated, processed and combined with a sound from the heart of the university, namely the commons bell located in the campanile. In this way the piece explores the relationship between the university and the rest of the city (between town and gown) through the blending and relating of these different sounds from both inside and outside the college walls.

Finally, I would like to say thanks to the great many people who helped make the concert happen, in particular the wonderful performers of Trinity Orchestra, Cue Saxophone Quartet, Pedro López López, and Miriam Ingram, and also:

The Provost Patrick Prendergast, The ADAPT Centre, Science Foundation Ireland, Christina Reynolds, Brian Cass, Valerie Francis, Francis Boland, Naomi Harte, Dermot Furlong, Jenny Kirkwood and all the staff and students of the Music & Media Technologies programme, François Pitié, John Squires, Conor Nolan, Sean O’Callaghan, Hugh O’Dwyer, Luke Ferguson, Sean Dooney, Stephen Roddy, Aifric Dennison, Bill Coleman, Albert Baker, Jonathan Nangle, Stephen O’Brien, the Spatial Music Collective, Richard Duckworth and all the staff of the Music Department, Sarah Dunne, John Balfe, Sara Doherty, Tom Merriman, Michael Murray, Noel McCann, Tony Dalton, Paul Bolger, Liam Reid, & Ciaran O’Rourke.


From Within, From Without

So, due to the vagaries of Irish weather, tomorrow’s performance of From Within, From Without will now take place in the wonderful location of the Exam Hall, in Front Square. This means that there are now some additional tickets available (which can be booked here), and a small number of tickets should also be available on the door.

[Photo: the Exam Hall interior, Trinity College Dublin]

The concert kicks off at 7pm sharp, beginning with the musicians of Trinity Orchestra, Cue Saxophone Quartet, and Pedro López López.

Our 360 camera rig is ready to go, and we’re looking forward to seeing you there!

———————

From Within, From Without

Enda Bates

I. From Without, From Within

Trinity Orchestra / Cue Saxophone Quartet / Pedro López López

II. The Silent Sister

Miriam Ingram / Eight channel electronics

III. Of Town and Gown

Eight channel electronics

7pm, Friday, April 8th, 2016. The Exam Hall, Front Square, Trinity College Dublin.