Sea Swell at the Zêzere Arts Festival

In 2019 I had the great fortune to attend the Zêzere Arts Festival in Portugal, where the festival choir performed my composition Sea Swell twice, in two quite incredible venues. After many Covid-related delays, I’m happy to say that the 360 video of the performance is now finally finished and can be seen below on YouTube.

Sea Swell was my first choral composition, which I wrote many years ago in 2007 while working on my PhD in Trinity College Dublin. The piece was commissioned by the Spatial Music Collective and New Dublin Voices choir, and was first performed by New Dublin Voices in Trinity Chapel, in January 2008.

The initial inspiration for this work came from some experiments with granulated white noise. When a very large grain duration was employed, the granulation algorithm produced a rolling noise texture which rose and fell in a somewhat irregular fashion and was highly reminiscent of the sound of breaking waves. This noise texture was originally intended to accompany the choir; in practice, however, this proved to be unnecessary, as the choir could easily produce hissing noise textures which achieved the same effect.
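
For anyone curious about the technique, here is a rough, hypothetical sketch of the idea in Python (this is not the original patch, and the grain durations, overlaps, and levels are all invented values):

import numpy as np
sr = 44100
rng = np.random.default_rng()
out = np.zeros(sr * 30)                            # 30 seconds of output
pos = 0
while pos < len(out) - 4 * sr:
    grain_dur = rng.uniform(2.0, 4.0)              # very large grain durations, in seconds
    n = int(grain_dur * sr)
    grain = rng.uniform(-1.0, 1.0, n)              # a grain of white noise
    env = np.hanning(n) * rng.uniform(0.2, 1.0)    # smooth rise and fall with a random peak level
    out[pos:pos + n] += grain * env
    pos += int(n * rng.uniform(0.3, 0.6))          # irregular overlap between successive grains
out /= np.max(np.abs(out))                         # normalise; write out with soundfile or similar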

The choir is divided into four SATB groups positioned symmetrically around the audience and facing a central conductor. The piece utilizes many of Henry Brant’s ideas, albeit with a much less atonal harmonic language. In particular, spatial separation is used to clarify and distinguish individual lines within the extremely dense, sixteen-part harmony. If this chord were produced from a single spatial location, it would collapse into a dense and largely static timbre. However, this problem is significantly reduced when the individual voices are spatially distributed. Certain individual voices are further highlighted through offset rhythmic pulses and sustained notes.

As exact rhythmic coordination is difficult to achieve when the musicians are spatially separated, a slow, regular tempo is used throughout. The beat is further delineated through the use of alliteration in the text, as the unavoidable sibilant sounds mark the rhythmic pulse. These sibilant sounds are also used in isolation at the beginning of the piece, to create a wash of noise-like timbres which mimic the sounds of crashing waves suggested by the title. In the opening section, each singer produces a continuous unpitched “sssssssssss” sound which follows a given dynamic pattern. The angled lines in the score indicate the dynamics, which are divided into three levels: below the bar line indicates fading to or from silence, above the bar line indicates maximum loudness, and on the bar line indicates a medium level. The note durations indicate the timing of the dynamic changes and do not indicate pitch. Instead, a continuous “ssssss” sound is produced for each entire phrase.

Brant used the term spill to describe the effect of overlapping material produced by spatially distributed groups of musicians. However, Brant was also aware of the perceptual limitations of this effect, and he noted that precise spatial trajectories became harder to determine as the complexity and density of the spatial scene increased. The opening of this piece will contain a significant degree of spill due to the common tonality of each part and will thus be perceived as a very large and diffuse texture. Large numbers of these sibilant phrases are overlapped in an irregular fashion to deliberately disguise the individual voices and create a very large, spatially complex timbre of rolling noise-like waves. The score of this opening section can be seen below.

The lyrics consist of some fragments of poetry by Pablo Neruda as well as some original lines of my own.

The piece was performed twice by the Zêzere Arts Festival choir, conducted by the wonderful Aoife Hiney, in two quite spectacular venues, namely the Convento de Cristo, Tomar (pictured above), and the Batalha Monastery. Both locations were very well suited to spatial performances, and so the concerts also featured some other well-known works of spatial music, most notably Tallis’ Spem in Alium (which was the subject of some previous posts). For the final 360 video I therefore decided to combine both performances in a single video, with a transition from the Convento de Cristo to the Batalha Monastery occurring approximately halfway through the piece.

The 360 video was filmed using the same GoPro Omni camera used in previous videos in this series (and discussed in more detail in previous posts). For the audio, a Sennheiser Ambeo 1st order Ambisonics microphone, positioned just below the 360 camera, was used to record both concerts. Unfortunately the recording from the first concert proved unusable due to excessive wind noise (note to self, don’t forget to bring your windshield to the actual concert!). Fortunately, however, Gerard Lardner, a singer in the choir and a fellow spatial audio enthusiast, had a Zoom H3-VR microphone and recorder (with windshield), and so this recording was used instead for the first portion of the video.

The opening section of largely high-frequency sibilant sounds proved quite revealing in terms of some of the limitations of Ambisonics, and particularly standard 1st Order Ambisonic (FOA) microphones. Although FOA mics are a very efficient and compact approach to spatial audio recording, there are some trade-offs, most significantly in this case in terms of the directional accuracy of high frequencies. In the venues themselves, the direction of the sibilance produced by each choir was quite clear, and in the opening section this produced a clear sense of a wash of sound spreading around the space. In the FOA recordings however (with either microphone), the directional accuracy was far less pronounced in this section, although later sections with more traditional vocal sounds fared far better.

Although it was not practically possible at the time, this issue could have been addressed to some extent using alternative microphone arrangements. A number of Higher Order Ambisonic (HOA) microphones have become available in recent years, such as the Coresound Octomic (2nd order), Zylia ZM-1 (3rd order), or the MH Acoustics Eigenmike (4th order). Although these require more channels of audio to be recorded, and HOA in general is not supported by all 360 video platforms, increasing the order has been shown to produce noticeable improvements in directional accuracy. Alternatively, entirely different, non-Ambisonic microphone arrays can also be used, and then later encoded into Ambisonics before being attached to the 360 video. An example of this approach, using 8 cardioid microphones in an octagonal arrangement known as the Equal Segment Microphone Array (ESMA), was discussed in a previous post. Although this approach does offer some advantages and is preferred by some producers, it is not always practical, particularly when the space available to set up the microphones is limited.

For this particular recording therefore, FOA microphone recordings were required, and these provided reasonable directionality for the majority of the piece, apart from this opening section. However, some small improvements were possible using some post-production techniques and converting the FOA recordings to 3rd order in a couple of different ways. This was achieved using the wonderful Harpex plugin, which uses some sophisticated signal processing to process spatial audio signals of different types. Here, Harpex was first used to upmix the entire FOA mix to 3rd Order Ambisonics (TOA), which resulted in a small but noticeable improvement in directional accuracy throughout the recording. In addition, for the opening section discussed above, Harpex was also used to synthesize four highly directional shotgun microphones pointed in the direction of the four choirs. These signals were then re-encoded into TOA and blended back into the overall mix, in an attempt to improve the directional accuracy of the sibilant material in this opening section.

Overall this resulted in some small improvements, however, as YouTube still only supports FOA, this new version with the TOA audio mix can only be played locally using VLC player or the excellent Vive Cinema app. In theory Facebook also supports 360 video with TOA audio, however, in practice this is not always so simple. For example, it seems as if 360 videos are supported on personal Facebook accounts, but not always on pages. However, if recent reports about Google’s efforts to develop an open, royalty-free media format are true, then perhaps support for HOA may finally become available on YouTube.
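
As an aside, the simplest version of the ‘virtual microphone’ idea mentioned above can be sketched directly from the B-format signals. The hypothetical Python fragment below steers plain first-order virtual cardioids towards four assumed choir directions from an ambiX (ACN/SN3D) FOA file; this is nothing like the parametric processing Harpex actually performs, and the file name and angles are placeholders rather than the actual session.

import numpy as np
import soundfile as sf
foa, sr = sf.read("concert_foa_ambix.wav")     # shape (samples, 4): W, Y, Z, X in ambiX order
w, y, z, x = foa.T
for az_deg in (0, 90, 180, 270):               # assumed choir azimuths, 0 degrees = front
    az = np.radians(az_deg)
    sig = 0.5 * w + 0.5 * (np.cos(az) * x + np.sin(az) * y)   # first-order virtual cardioid
    sf.write(f"virtual_cardioid_{az_deg}.wav", sig, sr)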

I would like to express my heartfelt thanks to all the singers of the Zêzere Arts Festival choir, and in particular to Aoife Hiney and Brian MacKay for the invitation to the festival. If you are a singer or instrumentalist, whether young or old, amateur or professional, then I simply cannot recommend the festival enough. Its combination of masterclasses, courses, workshops, collaborations with composers and artists, and of course performances is quite unique and offers an incredible career development experience in a truly wonderful location. More information on next year’s festival and how to apply can be found on their website, here.

A New Immersive Audio Loudspeaker System

The M.Phil. in Music & Media Technologies (MMT) programme in Trinity College Dublin has recently constructed a new immersive audio loudspeaker system. The 7.1.4 array is based on 11 Equator D5 loudspeakers and is compatible with formats such as DTS-X, Dolby Atmos, and Higher Order Ambisonics.

Here’s a 360 photo of the system, which was constructed by X3 Acoustics.


To date, the system has been used for a number of research projects including a study on Ambisonic Decoder Listening Test Methodologies which I presented at the 152nd Audio Engineering Society (AES) Convention earlier this year.

The array is also used by students taking my Spatial Audio module in MMT, which you can read more about here.

This Spatial Audio module is now also offered as a stand-alone micro-credential: a short, accredited learning experience that facilitates flexible and innovative professional development and lifelong learning. More details on that can be found here.

Spem in Alium in 360 Video

The most recent 360 video in the Trinity360 series has now been released and can be seen below. This is the first of two performances of spatial choral music recorded as part of the Trinity360 series, and features a performance by the award-winning choir New Dublin Voices of Thomas Tallis’ landmark composition, Spem in Alium, recorded in the Beckett Theatre, Trinity College Dublin on May 19th, 2019. The performance is also the second in a series of videos presented in partnership with SoundField™ Microphones, whom I would like to thank for their sponsorship of the production.

There is also a short behind-the-scenes video about the performance, produced by Mark Linnane, which can be seen on the Soundfield website, here.

A previous post on this blog discussed the historical background to this fantastic example of 16th century spatial music, so this article will concentrate on the technical production of the 360 audio and video, along with a brief discussion of an alternative spatial layout of the eight choirs, compared to the traditional sequential layout presented above.

Venues such as churches and cathedrals which are typically used for performances of spatial choral music present a number of technical and logistical challenges for 360 video and audio. Due to the large size of such venues, it can be difficult to position all of the singers  close enough to the camera and microphones, and in addition, they do not typically allow for extensive control over lighting. The highly reverberant nature of such venues also presents some challenges, particularly when singers are widely distributed around the space. For these reasons, it was decided to instead record the performance in the more controlled environment of the Beckett Theatre in Trinity College Dublin, which allowed for the relatively tight spacing of the singers around the camera, and spot lighting of each choir using the in-house rig.

The heavy drapes hung in the theatre all around the choir significantly deadened the acoustic of the space, which helped ensure that the maximum directionality was captured in the audio recording. While this worked very well in that regard, it did result in a drier recording than would be expected for a choral piece, and so additional reverberation was added in post to create a more natural sounding acoustic environment. This was implemented using the excellent MS5 3D surround reverb plugin developed by Blue Ripple Sound and released as part of their O3A Reverb library. Although this plugin can be used to generate highly detailed early reflections, as well as late arriving, diffuse reverberation, in this instance the addition of early reflections was not particularly beneficial, and so only diffuse reverberation was added to the overall mix.


For the audio recording, a Soundfield ST450 MkII Ambisonic microphone was positioned centrally and just below the GoPro Omni 360 camera. In addition, eight Rode NT5 cardioid microphones were also used, arranged in an octagon with a 45º subtended angle for each microphone pair. This microphone arrangement is known as the Equal Segment Microphone Array (ESMA), and was developed by Hyunkook Lee from previous work by Michael Williams. There are a number of different arrangements of the ESMA; in this instance, in order to maximize directional accuracy in the horizontal plane, the configuration containing 8 cardioid microphones in a regular octagon was used. Two different microphone spacings have been proposed for this particular octagonal arrangement: 82cm according to Williams, and 55cm according to Lee. In this instance, the wider spacing of 82cm was used in order to maximize the difference between this microphone arrangement and the more traditional 1st order Ambisonic (FOA) microphone, particularly in terms of the degree of spaciousness captured by near-coincident arrays such as ESMA when compared to entirely coincident FOA microphones. While still less commonly used for 360 recordings than FOA (mostly due to the larger physical footprint and greater numbers of mic stands and cabling required), ESMA and other near-coincident or spaced microphone techniques such as ORTF-3D are certainly a viable alternative to FOA.

It is worth remembering that although Ambisonics remains the primary format for the final delivery and playback of spatial audio for 360 video (due to the ease with which this format can be rotated for head-tracking), this does not necessarily imply or require the use of Ambisonic microphones. Other arrangements can also be used by encoding each microphone signal into Ambisonics at the appropriate azimuth and elevation angle, and combining these to generate a FOA or TOA mix to attach to the 360 video.
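
As a rough illustration of that encoding step, here is a hypothetical Python sketch that encodes eight mono ESMA recordings into a first-order ambiX (ACN/SN3D) mix at 45° intervals in the horizontal plane. The file names are placeholders, the files are assumed to share the same length and sample rate, and a real session would typically do this with encoder plugins rather than a script.

import numpy as np
import soundfile as sf
mix = None
for i, az_deg in enumerate(range(0, 360, 45)):             # 8 microphones at 45 degree intervals
    mono, sr = sf.read(f"esma_mic_{i + 1}.wav")            # placeholder file names, mono recordings
    az = np.radians(az_deg)
    gains = np.array([1.0, np.sin(az), 0.0, np.cos(az)])   # W, Y, Z, X gains (SN3D, horizontal only)
    foa = mono[:, None] * gains[None, :]                   # (samples, 4) encoded signal
    mix = foa if mix is None else mix + foa
sf.write("esma_encoded_ambix.wav", mix, sr)                # first-order mix to attach to the video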

In general, a traditional FOA microphone is often practically advantageous in certain contexts, due to its relative simplicity, small physical footprint, and the small number of audio channels involved. However, these advantages must be balanced against some potential drawbacks, such as a lack of spaciousness and envelopment (which is typical of all coincident microphone techniques).

In contrast, near-coincident or spaced microphone arrays typically capture a more spacious and enveloping soundfield, due to the capture of timing differences between the individual microphones. While this is generally highly beneficial when capturing ambiences, and stereo versions of such techniques are often preferred for classical music recordings, the large footprint of such arrays can be highly challenging to set up on location, particularly for field recordings. Also, for 360 video this can result in a larger number of microphones (and stands and cabling) being visible in the footage, requiring additional work in post-production to remove them from the image. For this particular production, the theatre location and the high contrast spot lighting meant that neither of these issues was particularly problematic, and so we were able to record the choir using both types of techniques.

For this production, it was originally intended to use the additional ESMA recording solely for some experiments with a 6 Degrees of Freedom (DoF) version of this piece. The greater spacing of the microphones in this array is potentially useful in this regard, as each microphone pair could potentially be used to provide a closer perspective on the specific choir (8 in total) positioned directly in front of that segment of the array (stay tuned for more on these 6 DoF experiments in a later post). However, when creating the audio mix for the video, it was found that a roughly equal combination of both the FOA and ESMA recordings produced the best result in terms of tonal balance, directional accuracy, and spaciousness, which just goes to show that, as always with audio, the “best” approach is context dependent.

To create the final FOA mix for YouTube, each of the individual cardioid mic recordings from the ESMA was encoded into Ambisonics at the appropriate angles using the 3rd order ambiX encoder plugin developed by Matthias Kronlachner (the first four channels of which were used for the FOA mix). This was then combined with the FOA signal captured by the Soundfield ST450, and in addition, some corrective EQ was applied to both tracks using the freeware multiEQ plugin from the IEM Plugin suite. Finally, the entire FOA mix was routed through the Blue Ripple Sound MS5 surround reverb plugin, along with just a hint of compression using the Omnicompressor plugin, again from the IEM suite.

Although many 360 video platforms (including YouTube) and 360 video players currently only support FOA, a TOA mix of the piece was also created. For the ESMA signals, the TOA signals were already available from the ambiX encoders, and just required the routing of 16 audio channels (rather than 4) to the main mix output. In addition, the Harpex plugin was used to upmix the original FOA ST450 recording to TOA. While the effectiveness of Harpex for this type of upmixing is highly dependent on the precise nature of the audio content, this produced excellent results here. While online 360 video platforms that support TOA are still fairly limited (currently just Facebook), some 360 video players (such as the Vive Cinema app) do support local playback of 360 video files with TOA audio. As has been reported in many previous studies, there was a noticeable improvement in directional accuracy and spatial clarity of the different singers when moving from FOA to TOA.

360 video was once again captured using the GoPro Omni camera system and stitched using the Autopano Video and Autopano Giga applications. As has been discussed previously on this blog, action cameras such as GoPros can struggle in low light conditions, or when the scene contains highly contrasting dark and light areas (something which was particularly evident here). Reducing the ISO limit and exposure compensation settings in the Omni rig certainly helped to limit the amount of image noise and graininess in the footage. However, further noise reduction was needed in certain parts of the image. While Adobe Premiere does include some noise reduction plugins, their performance leaves a lot to be desired, and so for this production we decided to purchase a different noise reduction plugin from Neat Video (as recommended by members of the GoPro Omni Facebook Group). Although somewhat complex to set up, the Neat Video plugin produced significantly better results than the native plugins in Premiere, and is highly recommended, particularly given its relative affordability. Apart from some noise reduction, other standard processing such as sharpening, colour correction, and titles were all applied using the standard tools in the latest version of Premiere, which now contains extensive support for 360 video and Ambisonics (although FOA only).

In the video presented above, the eight choirs are arranged in a traditional, sequential layout, which means that the consecutive entry of each choir at the beginning proceeds in a clockwise direction around the room. As discussed in a previous post, there is some evidence to suggest that the first performance of the piece occurred in Nonsuch Palace, which contained an octagonal banqueting hall with four first-floor balconies. However, this particular layout, which was replicated here in the Beckett Theatre using four elevated risers, suggests some other possible arrangements of the eight choirs. So, as well as the traditional sequential layout, we also recorded the piece using an arrangement with choirs 1-4 at floor level at North, East, South and West directions, and choirs 5-8 elevated and at NorthWest, NorthEast, SouthEast, and SouthWest. In particular, we wanted to see if this could support the creation of a sort of spiral effect, as the consecutive entries of the choirs complete a full circle at floor level, before then completing a circle at the elevated position. This alternative layout of the choirs can be seen below, and you can compare the two different configurations for yourself.

Stay tuned for more updates on this piece and some experiments with 6 DoF.

Many thanks to New Dublin Voices and their director Bernie Sherlock, Michael Canney and all of the crew at the Beckett Theatre, and Sebastian Csadi for their assistance during the shoot.

I would like to acknowledge the support of SoundField™ Microphones in providing equipment for this project.

Spatial Choral Music and 360 Video


Over the coming months, I’ll be working on a number of 360 videos of performances of spatial choral music. The first video was recorded last Sunday, May 19th in the Beckett Theatre, Trinity College Dublin, and is the second in a series of videos presented in partnership with Rode Microphones, who provided a Soundfield ST450 MkII Ambisonic microphone to record the piece.

For this video, I’m delighted to be working once again with the fantastic, award-winning choir New Dublin Voices, who have performed a number of my own choral compositions in the past.

Spatial music is often closely associated with the development of electroacoustic music in the twentieth century, yet the use of space as a musical parameter is in fact much older. The call and response pattern could perhaps be considered as one of the most fundamental forms of spatial music and has been found in many different cultures and musical traditions throughout history. In 16th century Europe, this antiphonal style was developed further by composers of the Venetian school like Adrian Willaert and Andrea Gabrieli, before spreading across Europe leading to works such as Orazio Benevoli’s Festal Mass for 53 individual parts, and Thomas Tallis’ Spem in Alium.

The latter work in particular is still regularly performed and I’ve wanted to produce a 360 video of this piece for quite some time. So I was delighted to finally have the opportunity to do so last weekend with New Dublin Voices.

Spem in Alium (which translates from Latin as “Hope in any other”) is believed to have been composed by Tallis c. 1570; however, the exact origins of the piece are shrouded in mystery. The piece contains 40 individual parts, arranged in eight 5-voice choirs (soprano, alto, tenor, baritone and bass), and the number 40 is symbolically important, with the first, combined entry of all 40 voices occurring at bar 40. One theory behind Tallis’ motivation for composing the piece is that it was intended to celebrate the 40th birthday of Queen Elizabeth I. However, given that Tallis was Catholic, it has also been suggested that the piece was intended as a protest on behalf of the forty generations of English Catholics slandered by the Protestant Reformation [1]. Another explanation, suggested by Philip Legge in his preface to the score, is that the composition was inspired by a performance of Alessandro Striggio’s mass for 40 voices in London in 1567 [2]. There is some evidence that following this performance, Tallis was encouraged to compose a piece for similar forces, possibly by Henry FitzAlan, the 19th Earl of Arundel, and his music-loving son-in-law Thomas Howard, 4th Duke of Norfolk. Interestingly, a copy of the score was held in the music library of FitzAlan’s country residence, Nonsuch Palace, which also contained an octagonal banqueting hall with four first-floor balconies [2]. If this was the site of the first performance of the piece, it is possible that the work was intended to be not only sung in the round, but perhaps also with four of the eight choirs in elevated positions on the surrounding balconies.

Spem in Alium is particularly impressive in its sophisticated use of spatial effects, including an opening rotation of material around consecutive choirs, antiphonal exchanges between various choir groupings, and the dramatic collective entry of all eight choirs at specific moments (such as at bar 40 for example). This can be seen in the large-scale form of the piece, which was graphically depicted by W. G. Whittaker in an essay on Tallis and is shown below [3].

[Figure: W. G. Whittaker’s graphical depiction of the large-scale form of Spem in Alium]

This work has received renewed attention in recent years through Janet Cardiff’s 40 Part Motet installation, in which recordings of the individual voices of Tallis’ composition are reproduced over 40 loudspeakers.


For our 360 recording, we decided to explore two different spatial layouts using elevated risers for four of the eight choirs.  The first, more traditional layout places the choirs sequentially in a circle, while the second uses a spiral layout, with choirs 1-4 at North, East, South and West, and choirs 5-8 at NorthWest, NorthEast, SouthEast, and SouthWest.

We explored a number of possible venues for the recording, including various churches and cathedrals with surrounding balconies. However, these locations did not allow for much control over the acoustics and lighting, and in particular would have resulted in the positioning of the singers quite far away from the 360 camera and microphones. So instead we decided to do the recording in the Beckett Theatre in Trinity College Dublin, effectively creating a black-box space with spot lighting of each of the choirs. The drapes surrounding the choir also significantly deadened the acoustic of the space, which helped ensure that the maximum directionality was captured in the audio recording (although we are planning on adding reverberation in post-production to create a more natural sounding acoustic environment for the final video).

Finally, we are also planning on conducting some experiments with six Degrees of Freedom (DoF) VR. 360 video typically only allows the viewer to rotate their head (3 DoF), whereas true VR also allows the viewer to move around the room (6 DoF).


Recently there have been some interesting developments in processing conventional 360 video to allow for some degree of listener movement, not just head rotations.

However, the precise way in which we record matching spatial audio to support 6 DoF is also an interesting research question. While I will be conducting a number of specific experiments to explore different microphone techniques for this purpose in the coming months, for this shoot I decided to supplement the main Soundfield microphone with an octagonal arrangement of 8 cardioid microphones. Over the coming months I’ll write more on whether this arrangement can be utilized in a 6 DoF VR application to create a sense of audibly moving closer to individual choirs. The first priority however is the main 360 video, so stay tuned for further updates!

Many thanks to New Dublin Voices and their director Bernie Sherlock, Michael Canney and all of the crew at the Beckett Theatre, and Monica Ryan and Sebastian Csadi for their assistance over the weekend.

I would like to acknowledge the support of SoundField™ Microphones in providing equipment for this project.

[1] K. Davis, Motive and Spatialization in Thomas Tallis’ Spem in Alium, online http://kevindavismusic.com/wp-content/uploads/2014/11/Motive-and-Spatialization-in-Thomas-Tallis-Spem-in-Alium.pdf

[2]  T. Tallis, Spem in Alium Nunquam Habui: a Motet for 40 Voices, edited by Philip Legge, Choral Public Domain Library. http://www.cpdl.org (2004)

[3] W. G. Whittaker, An Adventure, in Collected Essays, 86–89. Freeport, N.Y.: Books for Libraries Press (1970).

Broken, Unbroken

I’m delighted to say that the most recent 360 video in the Trinity360 series has now been released and can be seen below. The piece, titled Broken, Unbroken, was performed by Nick Roth and the Cue Sax Quartet on Tuesday June 26th 2018 in various locations around Trinity College Dublin. Thanks again to SoundField™ Microphones for their sponsorship of the production.

In addition, here’s a short behind-the-scenes video about the production, which can be seen on the Soundfield website, here.

The logistics of acoustic spatial music performances are often quite challenging to deal with. While specific architectural features such as balconies can be an interesting creative opportunity, this site-specificity can also be problematic for repeat performances in other venues which may not have those specific features. More generally, devising a suitable layout for the spatially distributed performers and an audience, many of whom may be seated in a far from ideal position, is often far from straightforward. 360 video and matching spatial audio represent a significant opportunity in this regard, as in many ways this is the ideal medium to capture such performances. Eliminating a live audience allows for a far greater range of potential venues, and as a result, the musicians can be placed as close to the camera as desired. In some respects, a 360 video recording of a work of spatial music can be considered as a performance for an audience of one, thereby potentially overcoming many of the logistical challenges associated with this type of music.

360 video therefore allows for much more freedom in terms of the types of venues you can use, and also the way in which you position the musicians. As always however, we must carefully consider the relationship between the acoustic environment and the types of spatial effects attempted in the composition, and investigating this issue was one of the principal goals of Broken, Unbroken. It contains four sections, each of which explores different spatial effects, and the piece was performed in its entirety in four quite different locations (which were discussed in more detail in a previous post). This allowed for a comparison of the resulting recordings, and the choice of a location for each section that best suited the particular spatial effects implemented in that section.

The recording was captured using a single Soundfield ST450 MkII Ambisonic microphone, mounted just below the GoPro Omni 360 camera rig, as shown below. For the two exterior locations, a Rycote Windjammer and windshield were also used, although for once the Irish weather was nice and sunny with little to no wind.

[Photo: Soundfield ST450 microphone mounted below the GoPro Omni camera rig]

For the opening section of Broken, Unbroken I wanted to experiment with large distances and mobile performers, which was really only possible in the wide expanse of Front Square. The main idea here was for the music to gradually emerge from the background ambiance of the location, as the performers move through the square and approach the camera. The music consisted of a short phrase which the musicians could improvise around and play from memory. Much like in the middle section of From Without, From Within, the musicians are not synchronized in any way, until they finally all converge around the recording position. One of the surprising aspects of this section was how well the direction of each musician was maintained in the first order Ambisonic (FOA) recording, even at large distances (c. 120m). In contrast, FOA recordings made in indoor locations with significant reverberation can suffer from a distinct lack of directionality when the instruments are placed far away from the microphone.

The title of the piece, Broken, Unbroken, is a reference to the cori spezzati (meaning “split” or “broken” choirs) technique used by composers such as Adrian Willaert and Andrea Gabrieli in 16th century Venice. Here, the title refers both to this breaking of the individual musicians into distinct locations, and also to the deliberate bringing together (or unbreaking) of the quintet in the second and third sections of the piece. Henry Brant used the term ‘spill’ to describe this effect of spatially separated instruments joining together to create an immersive, unified whole. The easiest way to achieve this is by simply having the instruments play in exact unison, or at least closely related harmonic intervals. One of the things I wanted to investigate in this piece was the extent to which this effect is influenced by the nature of the acoustic environment, and in particular the amount of reverberation. Overall, it appears that a strong sense of spill can be achieved in any location, and with any distribution of instruments, once they are all playing in strict unison. In exterior locations, however, this effect is much less pronounced once the parts deviate from unison to any degree, whereas in more reverberant locations, such as the Anatomy Theatre in section 3, the effect still holds to an extent once the individual lines retain a close harmonic relationship.

A number of composers have attempted to create spatial trajectories by passing material between instruments, perhaps most famously the sustained chord passed between the brass instruments of three spatially distributed orchestras in Karlheinz Stockhausen’s Gruppen für drei Orchester (1955-57). This technique, which is in many ways an acoustic emulation of amplitude panning, is rather fragile however, as it relies on very precisely matched pitch and timbre, and a high degree of synchronization between the overlapping crescendo and decrescendo dynamic envelopes in each instrumental group. A slightly different version of this technique was therefore implemented in the fourth and final section of Broken, Unbroken, using overlapping staccato note sequences which are embedded in a melodic canonic structure. In this way, circular trajectories are created through both the passing of these staccato sequences between consecutive instruments, and also the accenting of specific pitches in the melodic canon.

This was arguably the most effective section of the piece overall, particularly in the relatively dry acoustic of the Freeman Library, in which the musicians had clear sight-lines between each other and could maintain synchronization relatively easily. Once again however, in more reverberant locations such as the Anatomy Theatre, the overall reduction in directionality captured in the recording negatively impacted the clarity of the spatial trajectories to a significant degree. This effect also worked quite well in the vertical arrangement of musicians in the Beckett Theatre balconies, although maintaining synchronization was naturally more difficult, even with a conductor.

Overall, the comparison of the recordings from each location indicates that the effectiveness of these different spatial effects is strongly influenced by the amount of reverb, and the distance of the performers from the microphone. While there has always been a relationship between the musical content, and the acoustic environment used for a performance, this is a particularly important consideration for this new medium of 360 video. Composers and sound engineers should therefore consider their choice of venue carefully when filming spatial music performances, and ensure that this choice supports the spatial effects implemented in the music. 

More details on this piece will be presented in a paper entitled ‘Recording & Composing Site-Specific Spatial Music for 360 Video’, which will be presented at the 146th Convention of the Audio Engineering Society, in Dublin, in March 2019.


Nick Roth is a prominent contemporary musician in Ireland and co-founder of the Diatribe record label. Upcoming projects include the Space I Installation for the European Space Agency (ESA), and numerous performances as detailed here.

Cue Saxophone Quartet is a talented and ambitious chamber group, composing, arranging, and performing music in contemporary, classical, jazz, and popular genres. More information about upcoming performances can be found here.


I would like to acknowledge the support of the Trinity College Visual and Performing Arts Fund for this performance, and also the support of SoundField™ Microphones in providing equipment for this project.

In addition I would like to thank Gillian Marron and Padraig Carmody from the Dept. of Geography, Siobhan Ward and the Steering Group for Old Anatomy from the School of Medicine, Michael Canney from the School of Creative Arts, and Austin Sheedy and Rua Barron from DU Players, for facilitating access to these different venues around the university. 

Wilde Flock

“We are all in the gutter, but some of us are looking at the stars.”

The great Irish playwright, poet and writer Oscar Wilde passed away 118 years ago today, so to mark the anniversary here’s a little 360 video called Wilde Flock.

The footage was captured earlier this year at the Oscar Wilde Memorial Sculpture, created by the sculptor Danny Osborne and erected in Merrion Square, Dublin in 1997. Wilde was actually born nearby in No. 1 Merrion Square, which can be seen in the video on the opposite side of the road from the statue. During my time as a post-graduate student in Trinity College, I walked home by this statue every day, and it is probably one of my favourite monuments in the city. I always thought the pose of the statue captured Wilde’s wit and flamboyance very well, especially when combined with the typically witty quotations displayed on the two pillars at either side.

For the piece I wanted to use these quotations, but rather than a simple voice-over narration I decided to process the recordings into a flock of sound that flies towards and eventually surrounds the listener. This was created using granulation, a technique that has been used by many electroacoustic composers (Curtis Roads and Barry Truax for example) and something I have used extensively in my own music. Fundamentally, granulation involves the division of an audio file into many short segments or grains, which can then be recombined in different ways. The technique can also be used to time-stretch or pitch-shift an audio file, and a similar technique (using the PaulStretch application) was used to create those recordings of pop songs slowed down 1,000 times that you can find on YouTube (I’m not much of a fan of Justin Bieber, but when you slow it down by 800%… that I like!).

Granulation is an extremely useful technique for spatial music composition, as the individual grains of sound can be collapsed into a single point, or alternatively spread out in space in different ways. One technique I’ve used in the past is to combine granulation with a flocking algorithm, which replicates the complex emergent behavior found in nature using a collection of individual agents following a simple set of rules, such as the following (a rough code sketch of these rules appears after the list):

  • steer away from nearby flockmates
  • steer toward the average heading of the flock
  • steer toward the center of the flock
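
To make these rules a little more concrete, here is a minimal, hypothetical 2-D sketch in Python (not the actual flocking patch used for this piece; the weights, radii, and agent count are arbitrary illustrative values). In the piece itself, each agent would then carry a grain of sound whose spatial position follows that agent.

import numpy as np
rng = np.random.default_rng(1)
pos = rng.uniform(-10.0, 10.0, (50, 2))                    # 50 agents scattered over a 2-D plane
vel = rng.uniform(-1.0, 1.0, (50, 2))
def step(pos, vel, dt=0.1):
    new_vel = vel.copy()
    for i in range(len(pos)):
        offsets = pos - pos[i]
        dists = np.linalg.norm(offsets, axis=1)
        near = (dists > 0) & (dists < 3.0)                 # flockmates within an arbitrary radius
        if near.any():
            new_vel[i] -= 0.05 * offsets[near].mean(axis=0)          # steer away from nearby flockmates
            new_vel[i] += 0.05 * (vel[near].mean(axis=0) - vel[i])   # steer toward the average heading
        new_vel[i] += 0.01 * (pos.mean(axis=0) - pos[i])             # steer toward the center of the flock
    return pos + new_vel * dt, new_vel
for _ in range(200):                                       # run the flock for 200 steps
    pos, vel = step(pos, vel)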

It’s remarkable how effectively computer simulations of this algorithm resemble the complexity and beauty of nature, such as the murmuration of starlings shown in the clip below.

Here’s an example of the Boids flocking algorithm (this particular implementation was developed by Janne Karhu).

For this piece, the granulated audio was spatialized using a slightly different spatialization algorithm, implemented using the wonderful Sound Particles application, developed by Nuno Fonseca.

The original audio consisted of myself reading various quotations of Wilde, along with some bird sounds, and a synthesized combination of the two. For the spatialization in Sound Particles, the flock starts off in the distance and gradually approaches the listener before transitioning to a non-spatialized, unprocessed monophonic recording of one or two quotes. This was implemented using a new feature on YouTube, which now supports both a spatial First Order Ambisonics track, and also a separate, standard stereo track which is headlocked, meaning it is not spatialized and doesn’t respond to head rotations.

While I’ve always been somewhat resistant to the use of headlocked audio in this medium, it is undoubtedly useful at times. In particular, the ability to create a mono audio track, which on headphones will be heard by the listener inside their head, can make for a nice contrast with the externalized, spatial audio we typically use for 360 videos and VR.

Technically, the process of adding this audio to a 360 video is fairly straightforward and requires a six channel audio file with the Ambisonics audio in the first four channels as before, and the headlocked stereo track in channels five and six (more detailed instructions can be found here). Just remember that you need to add the metadata tags using the latest version of the Google Injector tool, which can be found here.

Update

I’ve had a few requests for the specific ffmpeg commands used to add the spatial audio and head-locked stereo audio to the video file for uploading to YouTube, so here they are.

I used Adobe Premiere to edit the video file, the four-channel Ambisonics audio, and the stereo, headlocked audio. Then I exported the video by itself, without audio, before then exporting the cut four-channel Ambisonics Audio by itself, and then finally the stereo, head-locked audio by itself (both as full resolution .wav audio).

I then rendered the two audio files together into a single six-channel audio file using Reaper. The channel order should be FOA-AmbiX in channels 1-4, and the stereo, headlocked audio in channels 5-6.
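
If you would rather do this interleaving step outside a DAW, something along the following lines should also work (a hypothetical sketch using Python and the soundfile library; the file names are placeholders and both files must share the same length and sample rate):

import numpy as np
import soundfile as sf
foa, sr = sf.read("ambix_foa.wav")                 # (samples, 4) FOA-AmbiX export
stereo, sr2 = sf.read("headlocked_stereo.wav")     # (samples, 2) head-locked stereo export
assert sr == sr2 and len(foa) == len(stereo)
six = np.concatenate([foa, stereo], axis=1)        # channels 1-4 ambiX, channels 5-6 head-locked
sf.write("my360audio-6ch.wav", six, sr)            # six-channel file for the ffmpeg steps below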

Then I used ffmpeg to convert my exported mp4 video file to a MOV so we can attach the full quality .wav audio files, as follows:

ffmpeg -i my360video-noaudio.mp4 -vcodec copy -f mov my360video-noaudio.mov

Then I attached the six-channel audio file to the .MOV video, as follows:

ffmpeg -i my360video-noaudio.mov -i my360audio-6ch.wav -channel_layout 6.0 -c:v copy -c:a copy my360video.mov

Alternatively (with thanks to Angelo Farina for pointing this out), you can implement both steps using a single command, as follows:

ffmpeg -i my360video-noaudio.mp4 -i my360audio-6ch.wav -channel_layout 6.0 -c:v copy -c:a copy my360video.mov

Finally, making sure you have the latest version of Google’s spatial metadata injector, select your video file, tick the boxes and click Inject Metadata. As you can see in the screenshot below, the latest version of the tool will recognise that your file contains both spatial audio and head-locked stereo in 6 channels.

[Screenshot: the Google spatial metadata injector detecting spatial audio plus head-locked stereo]

Once that’s done, the tool will create a new version of the video file with the metadata added which you can then upload to YouTube. Remember to wait a little while for the spatial audio to be processed as this may take a number of hours to fully complete.

Spatial Saxophone Quintet

[Photo: the Freeman Library in the Museum Building]

The next piece to be composed for the Trinity360 project will be a spatial saxophone quintet, performed by esteemed saxophonist and composer Nick Roth and the Cue Sax Quartet on Tuesday June 26th (or June 29th in case of bad weather). Nick Roth is a prominent contemporary musician in Ireland and co-founder of the Diatribe record label, and was part of the quartet that performed A Round, Around. The Cue Quartet also performed From Without, From Within alongside Trinity Orchestra, which was the first 360 video presented as part of this project.

I’m also very happy to announce that this will be the first in a series of videos presented in partnership with Rode Microphones, who are providing a Soundfield ST450 MkII Ambisonic microphone to record the piece.

The quintet will be performing the piece in four different locations around Trinity College, each of which supports a slightly different spatial arrangement of the five musicians. The first location, pictured above, is the Freeman Library in the Museum Building, which has been the library of the Geography department in the university since the 1930s. This beautiful room contains a balcony which extends across three sides, and so is ideal for a spatial layout with musicians at different elevations.

[Photo: the old anatomy theatre in the School of Medicine]

The other indoor location that will be used is the old anatomy theatre (pictured above) in the School of Medicine. The very steep tiered seating in this lovely room will support a semi-circular arrangement of the quintet at different elevations from the recording position at the lecture podium.

[Photo: the exterior balconies of the Samuel Beckett Theatre]

In addition to these two indoor locations, the piece will hopefully also be performed in two outdoor locations, namely Front Square (which can be seen in the opening shot of From Without, From Within) and the exterior balconies of the Samuel Beckett Theatre (pictured above). These two outdoor performances will be open to the public and will take place on Tuesday June 26th at the Beckett Theatre at 12.20pm, and in Front Square at 2.20pm. Currently the long range weather forecast is pretty good for that date, however, if the Irish weather acts up we will postpone until the 29th (stay tuned to this blog for confirmation of that closer to the date).

The exterior balconies of the Beckett Theatre are particularly interesting as they will allow for multiple different elevations and a vertical distribution of the material. For the Front Square performance, the musicians will simply be placed around the camera, apart from the opening section which will explore greater distances in more detail. Specifically, the musicians will gradually approach the camera from each end of the campus while continuously playing, forming a spatial call-and-response that initially blends into the background ambiance of the city before gradually becoming more prominent as the players approach the camera.

While the entire work will be performed in each location (albeit with different spatial arrangements of players), the accompanying 360 video will most likely be an edit of these performances, with different sections taken from the specific location that best suits the spatial distribution of material. In this way, the video documentation of the performances will not consist of a continuous performance from one location, but will instead feature multiple locations and spatial distributions while still involving a single piece of music.

The precise way in which these multiple locations and performances are edited into one continuous 360 video will be interesting to explore, so stay tuned for more updates on that process over the coming month.

I would like to acknowledge the support of the Trinity College Visual and Performing Arts Fund for this performance, and also the support of SoundField™ Microphones in providing equipment for this project.

In addition I would like to thank Gillian Marron and Padraig Carmody from the Dept. of Geography, Siobhan Ward and the Steering Group for Old Anatomy from the School of Medicine, Michael Canney from the School of Creative Arts, and Austin Sheedy and Rua Barron from DU Players, for facilitating access to these different venues around the university. 

A Round, Around, & 360 Video Titles in Adobe Premiere

The second work composed specifically for the Trinity360 project has now been released on YouTube, and spatial audio should now be supported on most browsers and mobile devices (including iOS). The piece consists of an acoustic quartet (guitar, cello, flute and saxophone) arranged symmetrically around the central recording position in the debating chamber of Trinity College Dublin. Many, many thanks to Kate Ellis, Nick Roth, and Lina Andonovska for their fine performances.


This performance actually took place in June of last year but as the recordings were used as part of our Ambisonic Microphone study (discussed in the last blog post here), we decided to complete that research before releasing the video of the performance.

The compositional aesthetic here follows a more traditional contrapuntal approach in the form of a modified round, a form of strict canon in which each part performs the same melody but starting at different times. The title of A Round, Around reflects this approach and the spatial arrangement of players, and the use of rotations and other spatial effects created by passing musical material between consecutive instruments, as can be seen in the score excerpt below.

[Score excerpt from A Round, Around]

The recording presented here is based around a CoreSound TetraMic, and multiple monophonic spot (AKG 414s) and room microphones (AKG C314s, and Sennheiser TLMs) arranged in a circle of 1m radius and at 90 degree intervals, as shown below.

[Diagram: microphone array layout for the recording]

As always with this type of production, it is critically important that additional processing is applied to the instrument spot mics so that the perceived distance of the audio broadly matches the visible distance of the performers in the video. In general, this has been one of the biggest differences I’ve encountered when preparing a mix for a 360 video when compared to a traditional audio-only mix. This is particularly important if you are not mixing directly to picture or solely viewing the video on a desktop display, as in those scenarios, it is all too easy to underestimate the auditory distance required when viewing the content on a VR headset.

The precise way in which we synthetically alter the perceived distance of these close, spot-mic recordings remains an under-explored topic, particularly in the context of 360 video and VR, and this is an area which we are currently investigating. For this video, the instrument spot mic recordings were processed using a method [1] suggested by Peter Craven and one of the primary inventors of Ambisonics, Michael Gerzon. This method uses a fixed pattern of early reflections as the primary means of altering the perceived distance, which makes it highly efficient and also allows for a certain amount of customization of the distance processing for different types of sounds. Here, the spot mics were processed using an implementation of Gerzon’s Distance Panpot created by one of our students in the MMT programme in Trinity College, Eoghan Tyrrel, and you can read more about his work here. While the results are pretty good for certain instruments (the guitar for example), other instruments are still perhaps perceived as sounding a little too close. This suggests that some form of source-specific distance processing might be worth investigating, particularly in terms of customizing the degree of amplitude attenuation and the number and pattern of early reflections for different types of instruments and sounds. This is an area of research we will be looking at in some detail over the coming years, particularly in the context of Augmented Reality (AR) applications.
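
To give a rough flavour of the general idea (emphatically not Gerzon’s actual distance panpot design, nor Eoghan’s implementation), a very simple distance effect can be sketched as distance-dependent attenuation of the direct sound plus a small fixed pattern of early reflections; all of the delay times, gains, and file names below are invented for the example.

import numpy as np
import soundfile as sf
def push_back(dry, sr, distance_m, ref_m=1.0):
    out = np.zeros(len(dry) + sr)                          # leave room for the latest reflection
    out[:len(dry)] += (ref_m / distance_m) * dry           # 1/r attenuation of the direct sound
    for delay_ms, gain in [(17, 0.35), (23, 0.3), (31, 0.25), (43, 0.2)]:
        d = int(sr * (delay_ms + 2.0 * distance_m) / 1000.0)   # reflections arrive later for more distant sources
        out[d:d + len(dry)] += gain * (ref_m / distance_m) ** 0.5 * dry
    return out
dry, sr = sf.read("guitar_spot_mic.wav")                   # placeholder mono spot-mic recording
sf.write("guitar_pushed_back.wav", push_back(dry, sr, distance_m=2.5), sr)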

The video was captured with our original experimental camera system based around 12 GoPros, and stitched using the Autopano Video application which is also used with the GoPro Omni camera system. As is pretty clear from the video, the GoPros performed reasonably well for exterior shots, but far less so inside the venue, and the overall picture quality for the performance itself is relatively poor. This is perhaps to be expected for an action camera with low dynamic range like the GoPro, but once again it really emphasizes the importance of lighting for 360 video shoots with these types of camera rigs. The production company Visualise have recently written an excellent blog post on different lighting strategies for 360 video shoots which is well worth a read.

Adobe Premiere Pro CC was used for the video edit itself and Premiere now includes lots of support for 360 video monitoring, editing, and processing. We can now preview both Ambisonics audio and 360 video from within Premiere, and on export we can also add the metadata tags needed to tell YouTube that this is a 360 video with spatial audio (for more information on 360 video support in Premiere, see this article).

In addition, Adobe have recently acquired the Mettle SkyBox 360/VR tools which add a number of very useful features to Premiere when working with 360 video.  While these plugins will eventually be incorporated directly into Premiere, for now anyone with a subscription to Adobe Creative Cloud can get the plugins for free by emailing Adobe (full details here).

The SkyBox suite includes a number of plugins specifically designed to work with 360 video, whether monoscopic or stereoscopic. This includes some practical post effects such as Blur, Denoise, Sharpen, and Glow, as well as more creative effects such as Colour Gradients and Fractal Noise which can be applied directly to 360 footage without any distortion along the seams. There are also plugins to enable the rotation of the 360 footage, and a number of transitions designed specifically for 360 content.

However, on a practical level, perhaps one of the most useful plugins in this suite is the SkyBox plugin which can be used to warp standard text titles created in Premiere so that they appear correctly when played back in a 360 video platform such as YouTube. Mettle have a number of tutorial videos which go through all of these different plugins, however, here’s one which outlines the relatively simple workflow for creating 360 video text titles in Premiere.

So, if you are working with 360 video in Adobe Premiere, I strongly recommend you pick up a copy of these plugins.

[1] M. A. Gerzon: The Design of Distance Panpots, Proc. 92nd AES Convention, Preprint No. 3308, Vienna, 1992.

Comparing Ambisonic Microphones

In previous posts on this blog I informally compared a number of different Ambisonic microphones which can be used to capture spatial audio for 360 video and VR. However, my colleagues (Seán Dooney, Marcin Gorzel, Hugh O’Dwyer, Luke Ferguson, and Francis M. Boland from the Spatial Audio Research Group, Trinity College, and Google, Dublin) and I have recently completed a more formal study comparing a number of these microphones using both listening tests and an objective analysis. The research was published in two papers presented at the Audio Engineering Society (AES) Conference on Sound Field Control, Guildford, UK in July 2016, and the 142nd AES Convention in Berlin in May 2017, and both are now available from the AES digital library (Part 1 here & Part 2 here). While the papers themselves contain detailed technical information on both parts of this study, in this blog post I’ll present the major findings and share the original recordings so you can draw your own conclusions.

The original Soundfield microphone was first developed in the 1970s and since then a number of variations of this design have been released by different manufacturers. More recently, as Ambisonics has become the de facto standard format for spatial audio for VR and 360 video, many more Ambisonic microphones have emerged, and so a formal comparison of these different microphones is quite timely.

A number of previous studies on spatial audio have found a significant correlation between a listener’s overall preference and two specific parameters, namely localization accuracy (i.e. how directionally accurate is the spatial recording), and the overall sound quality. Our study therefore employed subjective listening tests to evaluate and compare the overall quality and timbre of these microphones, alongside an objective analysis of the localization accuracy.

In the first paper we examined the following microphones:

  1. DPA 4006 omni-directional condenser microphone, used as a monophonic reference
  2. Soundfield MKV system with rack-mounted control unit and four sub-cardioid capsules
  3. Core Sound TetraMic with four cardioid electret-condenser capsules
  4. Zoom H2n containing two stereo microphone pairs which can be converted into a horizontal only Ambisonic signal
  5. MH Acoustics Eigenmike spherical microphone array with thirty-two omni-directional electret capsules


For this initial listening test, samples of speech and music were played back over a single loudspeaker and recorded in turn by each of the five microphones. For this comparison, we wanted to examine the fundamental audio quality of these microphones, so only the monophonic, omni-directional W channel of these spatial recordings was used. The listening test participants were asked to compare each of the mic recordings to a reference captured using a high quality DPA 4006 microphone with an extremely flat frequency response. This approach is known as a Multiple Stimuli with Hidden Reference and Anchor (MUSHRA) test (described in ITU-R BS.1534-1) and is typically used to evaluate intermediate levels of audio quality, both through a comparison of each stimulus to a reference recording and through comparisons between the stimuli themselves. Participants were asked to compare each recording to the reference in terms of Audio Quality (for both music and speech), Low, Mid and High Frequency Timbre (music), and Artifacts/Distortion (speech) using the continuous quality scale shown below.

[Image: listening test user interface showing the continuous quality scale]

30 participants (26 male & 4 female) performed this test, which lasted 17.5 minutes on average. The MUSHRA procedure typically excludes subjects who give the hidden reference a score of less than 90 for more than 15% of the test items; however, as the differences in quality were often quite subtle in this case, this criterion was relaxed slightly, with 11 participants excluded from the final results. Our analysis found statistically significant results for all 6 questions, strongly suggesting differences in the subjective performance of the different microphones.
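As an aside, this post-screening rule is simple to apply in practice. The sketch below (in Python, with purely hypothetical scores) implements the “below 90 in more than 15% of items” criterion; it is only an illustration of the rule, not the actual analysis code used in the study.

```python
import numpy as np

def exclude_listeners(hidden_ref_scores, threshold=90, max_fail_rate=0.15):
    """Return indices of listeners to exclude under the BS.1534 rule.

    hidden_ref_scores: array of shape (n_listeners, n_trials) holding each
    listener's score for the hidden reference in every trial.
    """
    scores = np.asarray(hidden_ref_scores)
    fail_rate = np.mean(scores < threshold, axis=1)  # fraction of trials failed
    return np.where(fail_rate > max_fail_rate)[0]

# Hypothetical example: 3 listeners, 10 trials each
scores = np.array([
    [95, 92, 100, 98, 91, 94, 97, 96, 93, 99],   # reliable listener
    [85, 70, 95, 92, 60, 88, 90, 75, 94, 91],    # frequently misses the reference
    [100, 100, 98, 89, 97, 95, 96, 99, 94, 92],  # one slip, still within 15%
])
print(exclude_listeners(scores))  # -> [1]
```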

A more detailed discussion of these results can be found in the original paper; however, two findings were particularly notable. Firstly, the results indicate that, despite it being an older model from the Soundfield range, the Soundfield MKV performed the best overall in terms of audio quality.

[Plot: mean listening test scores for each microphone]

The Eigenmike and TetraMic produced the worst results in terms of artifacts and noise, and this could well have contributed to the lower scores for these two microphones in terms of overall Audio Quality, particularly for speech. The high scores for the TetraMic for Audio Quality and High, Mid, and Low Frequency Timbre with the music examples (in which the presence of noise would be much less noticeable) support this interpretation. The relatively poor performance of the TetraMic in terms of artifacts and noise is perhaps explained by its low-level, unbalanced, microphone-level output signal, which required significantly more preamplifier gain than the other microphones. The low score for the Eigenmike for this question can perhaps be explained by the more elaborate design of this microphone and the much larger number of individual mic capsules (32 instead of just 4).

[Plot: listening test results for Artifacts/Distortion]


In the second paper, we included an additional, newly released microphone, namely the Sennheiser Ambeo. For this listening test we wanted to replicate a more typical usage of these microphones, so the spatial recordings were presented using a head-tracked binaural rendering over headphones, which is standard for 360 videos presented using a Head Mounted Display (HMD).

[Image: the microphones used in the second listening test]

The listening test stimuli were taken from a live performance by an acoustic quartet (flute, guitar, saxophone and cello) of an original spatial music composition entitled A Round, Around. This piece consists of tonal, melodic canons, rotations and other spatial effects created by passing musical material between consecutive instruments, as can be seen in the score excerpt below. Many thanks to the funding and supporting agencies Science Foundation Ireland, Sennheiser, and Google, and to the musicians Kate Ellis, Nick Roth, and Lina Andonovska for their fine performance.

[Image: score excerpt from A Round, Around]

The piece was recorded using multiple monophonic spot microphones (AKG 414s) and room microphones (AKG C314s, and Sennheiser TLMs). In order to ensure consistent reproduction and microphone position, four excerpts from the piece were then reproduced using a number of loudspeakers arranged in a circle of 1m radius at 90 degree intervals, as shown below, and Ambisonic recordings were made with each of the microphones in turn. A MOTU 8m audio interface was used for the TetraMic and Ambeo, while the other microphones used their own proprietary interfaces. All of these recordings can be downloaded from the link at the bottom of this post, along with some additional recordings we couldn’t include in the test itself.

[Image: loudspeaker and microphone arrangement]

To investigate the subjective timbral quality of each microphone, a modified MUSHRA test was again implemented. However, as no reference recording was available for this particular experiment, all test stimuli were presented to the listener at the same time without a reference.

Due to the relative difficulty of the task, and the lack of a suitable reference, all participants undertook training prior to taking the test. The benefits of using trained, experienced listeners over naive or untrained listeners have been well documented, with trained listeners tending to give more discriminating and consistent preference ratings. Training for this experiment was conducted using Harman’s “How to Listen” listener training software, and each subject was required to reach a skill level of 7 or higher in two specific programs focusing on the high frequency timbre (Brightness/Dullness) and low frequency timbre (Fullness/Thinness) of various stereo music samples.

The four tests each used a different excerpt from the piece, recorded with each of the five Ambisonic microphones. Subjects were asked to rate the five recordings in terms of high frequency timbre (Bright/Dull) in tests 1 & 2, and low frequency timbre (Full/Thin) in tests 3 & 4. The same continuous quality scale as in the training program was used, namely from -5 (Dull) to 5 (Bright) in tests 1 & 2, and from -5 (Thin) to 5 (Full) in tests 3 & 4, with 0 (Ideal) as the initial, default value. The test system and user interface were implemented in Max MSP and presented using a laptop, an RME Babyface audio interface, Sennheiser HD650 open back headphones, and an Oculus Rift DK2 for head-tracking. The binaural decoding was based around a cube of virtual loudspeakers and a mode-matched decoder, implemented using the ambiX decoder plugin and the KEMAR HRTF set from the SADIE database.

[Image: the test interface implemented in Max MSP]
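For readers curious about how this kind of virtual-loudspeaker binaural rendering works, here is a minimal sketch. It assumes first-order ambiX (ACN/SN3D) input, uses a simple pseudoinverse decoder as a stand-in for the actual mode-matched design, and expects the caller to supply HRIRs measured at the cube directions (the real test used the ambiX decoder plugin and the SADIE KEMAR set). Head tracking would correspond to rotating the B-format signal before this decoding step.

```python
import numpy as np
from scipy.signal import fftconvolve

def sh_first_order(az, el):
    """First-order spherical harmonics, ambiX convention (ACN order W, Y, Z, X; SN3D)."""
    return np.array([
        1.0,                       # W
        np.sin(az) * np.cos(el),   # Y
        np.sin(el),                # Z
        np.cos(az) * np.cos(el),   # X
    ])

# A cube of 8 virtual loudspeakers (azimuth, elevation in degrees)
cube_dirs = [(az, el) for az in (45, 135, -135, -45) for el in (35.26, -35.26)]
Y = np.stack([sh_first_order(np.deg2rad(az), np.deg2rad(el)) for az, el in cube_dirs])  # (8, 4)

# Decoder matrix: pseudoinverse of the encoding matrix (a simple stand-in for
# the mode-matched decoder used in the actual test)
D = np.linalg.pinv(Y.T)  # (8, 4)

def binaural_render(bformat, hrirs_left, hrirs_right):
    """Render a first-order ambiX signal (4, n) to binaural using HRIRs (8, m)
    measured at the cube directions (e.g. taken from the SADIE KEMAR set)."""
    feeds = D @ bformat  # (8, n) virtual loudspeaker feeds
    left = sum(fftconvolve(feeds[i], hrirs_left[i]) for i in range(len(cube_dirs)))
    right = sum(fftconvolve(feeds[i], hrirs_right[i]) for i in range(len(cube_dirs)))
    return np.stack([left, right])
```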

21 participants (20 male & 1 female) performed the test, with ages ranging from 21 to 50 years old and an average age of 33. The participants included experienced listeners (12 subjects), comprising professional audio engineers and academics with prior experience of similar tests, and semi-experienced listeners (9 subjects), comprising postgraduate students in the Music & Media Technology programme in Trinity College Dublin. The average time taken for both the training and the test was 53 minutes, and a short break was always taken after the initial training and before the test.

[Plot: timbre ratings for each microphone]

For Question 1 (high frequency timbre), no statistically significant difference was found between the Ambeo and the H2n, and a small but still statistically significant difference was found between the TetraMic and the Soundfield MKV. Significant differences were found between all other pairs of microphones, with the Soundfield MKV rated as closest to ideal in terms of high frequency timbre. The Eigenmike was rated as dull sounding compared to the other microphones, while the Ambeo and H2n were both rated as brighter than ideal. This result for the Ambeo can perhaps be explained by the presence of the “Ambisonics Correction Filter” incorporated by default within v1.03 of Sennheiser’s A-to-B-format conversion plugin, which applied a significant boost above 10 kHz to all four channels. It should be noted that Sennheiser have since released an updated version of this plugin (v1.1.2) which significantly reduces this high frequency boost. While this new version arrived too late to be included in our formal test, the download link below includes both the original recording used in the test, and a new version processed with the latest version of the plugin.

For Question 2 (low frequency timbre), no statistically significant difference was found between the Soundfield MKV and the TetraMic. A small but still statistically significant difference was found between the Eigenmike and the TetraMic, with significant differences found between all other pairs of microphones. The Soundfield MKV was again rated as closest to ideal, but could not be statistically distinguished from the TetraMic. The Eigenmike was rated as fuller sounding than the other microphones, while in contrast the Zoom H2n was rated as thinner sounding than the other mics.


In addition to our listening tests, we also performed an objective analysis of localization accuracy for each of the five microphones. For these experiments, the five microphones were placed in turn at the center of a spherical array of sixteen Equator D5 coaxial loudspeakers, mounted in four vertically separated rings on a dampened metal frame and on the floor. Two laser pointers were mounted on the frame to mark the exact center point of the array and ensure the same position and orientation for each microphone.

[Image: the spherical loudspeaker array used for the directional analysis]

To generate the test material, pink noise bursts were played consecutively from each loudspeaker and recordings were made with each microphone. In addition, a monophonic recording made with the DPA 4006 was synthetically encoded into Ambisonics using the ambiX encoder and the azimuth and elevation angles of the loudspeakers in the array, and this served as a reference for the directional analysis.
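The synthetic encoding itself is a simple operation. The actual reference was created with the ambiX encoder plugin, but here is a minimal sketch of the equivalent first-order encoding in the ambiX (ACN/SN3D) convention; the example angles in the comment are hypothetical.

```python
import numpy as np

def encode_first_order(mono, az_deg, el_deg):
    """Encode a mono signal at the given azimuth/elevation into first-order
    ambiX (ACN channel order W, Y, Z, X with SN3D normalisation)."""
    az, el = np.deg2rad(az_deg), np.deg2rad(el_deg)
    gains = np.array([
        1.0,                        # W (omnidirectional)
        np.sin(az) * np.cos(el),    # Y
        np.sin(el),                 # Z
        np.cos(az) * np.cos(el),    # X
    ])
    return gains[:, None] * np.asarray(mono)[None, :]   # shape (4, n_samples)

# e.g. encode the DPA reference at the position of one loudspeaker (hypothetical angles)
# bformat = encode_first_order(dpa_recording, az_deg=45.0, el_deg=20.0)
```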

As well as the five microphones mentioned earlier, this second experiment also assessed the Ambeo both with, and without, the Ambisonics Correction Filter mentioned earlier. In addition, the Zoom H2n was assessed using both the native Ambisonics recording mode (horizontal only), released by Zoom as a firmware update in 2016, and the conversion of the standard 4-channel recording mode to B-format using a freeware conversion plugin developed by my colleague Brian Fallon at Trinity College Dublin (available as a free download here).

The published paper contains precise details of our directional analysis method, but fundamentally our approach is based on an intensity vector analysis in different frequency bands. For each of these frequency bands (which are derived from the Bark scale), the mode of the angle estimates was taken as the estimated angle to the source. This can be seen in the histogram of azimuth angle data for a loudspeaker at 45 degrees shown below, where the angle estimates are spread across the range -180 to 180 degrees but the primary mode correlates strongly with the true source angle.

[Plot: histogram of azimuth angle estimates for a loudspeaker at 45 degrees]
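The core idea of the analysis can be sketched quite compactly. The published method works on Bark-scale bands and also estimates elevation; the sketch below shows only the essential azimuth case, using an STFT and a simple active-intensity estimate per time-frequency bin, and is not the exact analysis from the paper.

```python
import numpy as np
from scipy.signal import stft

def estimate_azimuth(w, x, y, fs, nfft=1024):
    """Estimate source azimuth from a horizontal B-format recording using a
    simple active-intensity analysis: compute an azimuth estimate for each
    time-frequency bin, then take the mode of the resulting histogram."""
    _, _, W = stft(w, fs, nperseg=nfft)
    _, _, X = stft(x, fs, nperseg=nfft)
    _, _, Y = stft(y, fs, nperseg=nfft)
    # Active intensity components (proportional to Re{P* V})
    ix = np.real(np.conj(W) * X)
    iy = np.real(np.conj(W) * Y)
    az = np.degrees(np.arctan2(iy, ix)).ravel()             # estimate per bin
    hist, edges = np.histogram(az, bins=np.arange(-180, 181, 5))
    mode_bin = np.argmax(hist)
    return 0.5 * (edges[mode_bin] + edges[mode_bin + 1])    # centre of the modal bin
```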

Using this method, an estimate of the azimuth and elevation angles of each of the 16 loudspeakers was obtained from the different microphone recordings. All of these estimates were then compared to the actual loudspeaker locations and an absolute offset error was determined. These results are summarized below, although it should be noted that as the Zoom H2n does not include a vertical component, no elevation results are included for this microphone.
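Since the results below report azimuth and elevation separately, one straightforward way to compute such an offset error is the absolute, wrap-around difference between the estimated and true angles. The sketch below (with purely hypothetical values) illustrates this; it is my own illustration rather than the exact error metric from the paper.

```python
def wrapped_abs_error(estimate_deg, true_deg):
    """Absolute difference between two angles in degrees, wrapped to [0, 180]."""
    diff = (estimate_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)

# e.g. an azimuth estimate of -175 deg for a loudspeaker at 178 deg is only 7 deg off
print(wrapped_abs_error(-175, 178))  # -> 7.0
```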

The minimal errors for the synthetically encoded reference signal clearly demonstrate the effectiveness of this analysis method. A paired t-test revealed statistically significant differences between microphones, and the results indicate that the Eigenmike was the most accurate mic in terms of both azimuth and elevation, on par with the Ambeo (both versions) and better than all other microphones.

Both versions of the Ambeo were on par with the Eigenmike and Soundfield MKV (for azimuth), better than the Soundfield MKV (for elevation), and better than the TetraMic and H2n (both versions). Interestingly, our study revealed no significant difference in directional accuracy for the Ambeo with and without the optional Ambisonics Correction Filter.

The results for the Soundfield MKV and TetraMic were largely comparable, and the Zoom H2n performed worse overall, although this is not unexpected as this mic does not include a vertical component and all loudspeakers were offset from horizontal to some degree. Finally, no statistically significant differences were found between the H2n native Ambisonic recording mode, and our 4-channel conversion plugin.

[Plot: absolute directional error for each microphone]


The results of these experiments largely match the findings of Part 1 of this study, with the Soundfield MKV once again producing the best results in terms of overall timbral quality, and with comparable results to the other microphones in terms of directionality with the exception of the Eigenmike and Ambeo.

The Ambeo performed very well in terms of directionality, on par with the Eigenmike and Soundfield (azimuth), and better than all other mics. In terms of timbral quality the Ambeo was rated as less ideal and brighter than other mics; however, as noted earlier, this result was strongly influenced by the now-updated Correction Filter applied during the A-to-B-format conversion process. Notably the latest version of this conversion plugin released by Sennheiser has drastically reduced the extent of this high frequency boost, and this new version is also included in the download link below.

The TetraMic was slightly less accurate than the Eigenmike, Ambeo and Soundfield MKV in terms of directional accuracy, which may be explained by the slightly greater inter-capsule spacing of this microphone. While the use of speech signals and studio recordings in Part 1 of this study revealed some issues with noise with the TetraMic, due to its relatively low level signal output compared to other microphones such as the Ambeo, this was much less apparent with this music recording. This suggests that, given appropriate pre-amplifiers with sufficient gain, good results can be achieved with the TetraMic in terms of timbral quality.

The more elaborate design of the Eigenmike produced the best results overall in terms of localization accuracy, but with decreased performance in terms of timbral quality. As with our earlier experiments, this suggests a certain trade-off between timbral quality and directional accuracy, particularly when the number of individual mic capsules is significantly increased.

The Zoom H2n performed worse overall compared to all other microphones; however, its performance is still very reasonable given its extremely low cost, and the fact that it was not originally designed to produce B-format recordings.


All of the recordings used in the second listening test can be downloaded from the link below. The ZIP file contains the 4 excerpts from A Round, Around recorded with the Soundfield MKV, Ambeo, TetraMic, H2n and Eigenmike. In addition, an alternative version of the Ambeo recording (converted using the newly updated filter) and an Eigenmike recording (3rd order Ambisonics) are also included in the additional recordings folder. All the recordings are 48kHz/24bit, and the Furse-Malham weighting (with the W channel attenuated by 3dB) and channel order (WXYZ) were used throughout.

ComparingAmbiMics-Samples.zip

The ZIP file also contains a sample Reaper session which can be used to audition the files, using the ambiX decoder plugin and the KEMAR HRTF set from the SADIE database.
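Since the files use the Furse-Malham convention while many current tools built around the ambiX convention expect ACN channel ordering and SN3D normalisation, a conversion step may be needed if you want to use the recordings in other workflows. For first order this amounts to re-ordering the channels and restoring the W-channel gain, as in this minimal sketch (an illustration of the standard conversion, not something included in the download).

```python
import numpy as np

def fuma_to_ambix(b):
    """Convert a first-order B-format signal from Furse-Malham (channel order
    W, X, Y, Z, with W attenuated by 3 dB) to ambiX (ACN order W, Y, Z, X with
    SN3D normalisation and W at full gain).

    b: array of shape (4, n_samples) in FuMa order.
    """
    w, x, y, z = b
    return np.stack([w * np.sqrt(2.0), y, z, x])
```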


This research was supported by Science Foundation Ireland.

A New Camera Rig, Distance Processing Experiments, and “Vortex Cannons”

The Music & Media Technologies programme in Trinity College Dublin recently celebrated its 20th birthday with a concert in the Samuel Beckett Theatre. So, we decided to use this opportunity to try out our new camera system and microphone, namely a GoPro Omni and a Sennheiser Ambeo.

[Images: concert poster and the GoPro Omni camera rig]

The concert featured numerous performances by the Crash Ensemble and composers such as Miriam Ingram, Enda Bates, Natasa Paulberg, Neil O’Connor, Maura McDonnell, and Conor Walsh/Mark Hennessy. However, a couple of performances seemed particularly well suited to 360 video, most notably The Sense Ensemble / Study #2 by George Higgs for string quartet, vortex cannons, silent signing singer, and percussion.

George is currently pursuing a Ph.D. at Trinity College Dublin entitled ‘An Approach to Music Composition for the Deaf’ and here’s his programme note for the piece:

Music involves much more than hearing. All of our senses – arguably nine in number – are in fact collaborating in our musical experience as a kind of ‘sense ensemble’. This composition is the second in an experimental research series exploring approaches to music composition for deaf audiences; or more generally music that appeals to the multiple senses responsible for our appreciation of music. The performance features smoke ring cannons, two signing percussionists and string quartet. Many thanks to Neimhin Robinson (smoking signer), Dr Dermot Furlong (nonsmoking supervisor), Jessica Kennedy (choreographic consultant), and the Irish Research Council.

While the content of this performance was very well suited to the medium (those smoke cannons in particular), the conditions were highly challenging in terms of the video shoot and so this was a good test of the limitations of the GoPro Omni system.

The most notable feature of this rig is undoubtedly the synchronisation of the six GoPros, which so far has been very stable and trouble free. Once the firmware is updated, all 6 cameras can be controlled from one master camera, which can also be paired with the standard GoPro remote control. If you purchase the rig-only Omni, it should be noted that the power pack and stitching software need to be purchased separately; however, we were just about able to snake our 6 USB power cables up along the tripod and into the cameras without too much difficulty.

 

[Image: the GoPro Omni rig]

To achieve this synchronisation, the cameras attach to the central brain of the rig; however, the extra space needed for this means that the cameras are not positioned as close together as would be physically possible. As a consequence, the rig and stitching software do struggle when moving objects or people get too close to the camera, as can be seen in the above video when George and Neimhin approach the smoke machines to fill up the vortex cannons.

Stitching is implemented using a specific GoPro Omni Importer App, and for simpler shots in which nothing is moving too close to the rig, this does a pretty good job. However, in general at least some touching up of the stitch is required using a combination of Autopano Video Pro and Autopano Giga. Visible stitch lines on static objects or people are relatively easy to correct and simply require some adjustments with the masking tool in Autopano Giga. This tool allows markers to be added to the reference panorama image so that stitch lines avoid certain areas and noticeable artefacts are removed (or at least reduced).

For static objects this can usually be achieved relatively easily; however, it is definitely worth considering how you orientate the camera rig with this in mind. By default the Omni is mounted on one of the corners of the cube, but it could be worth adding an attachment to the tripod so the rig is mounted flat, depending on the particular setup for the shoot (we may have had better results here using that orientation).

The particularly challenging aspect of the stitch for George’s piece was the movement of the two performers, and this required further processing using the timeline in Autopano Video. The process is similar, as again we use the masking tool in Autopano Giga to selectively maintain specific areas of the reference panorama. Now, however, we use the timeline in Autopano Video to move between different sets of markers as the person or object moves across a stitch line. This can be pretty effective provided the objects are not too close, as the following tutorial from CV North America demonstrates. However, if the action is happening within a few metres of the rig, then stitching artefacts may be unavoidable, or at least extremely time consuming to eliminate entirely (as can clearly be seen at times in the video of George’s piece).

The particular lighting needed for this piece also presented some challenges for the stitch. In order to light the smoke rings, two fairly powerful spot lights were directed over the audience and directly down onto the stage (and therefore also the camera rig), which resulted in exaggerated brightness and shine on the performers’ faces (and those white coats too!).

In contrast to the above, the stitching for the second video from this concert was much more straightforward. For this piece by Miriam Ingram, the musicians were all at a safe distance from the camera so only a few small touch ups were required, again just using the masking tool in Autopano Giga to select specific areas within the reference panorama.

 

The audio for both of these videos was recorded using a Sennheiser Ambeo microphone and a Zoom H6 recorder, mounted just in front of and below the camera rig. We will be publishing some specific analysis of this microphone in the second part of our Comparing Ambisonic Microphones study early in 2017; however, more informally I’ve been very impressed by this microphone. The build quality feels very good, it outputs a high signal level that performs well with average quality mic preamps such as those in the Zoom recorders, and the accompanying conversion plugin is very straightforward.

Although marketed as a “VR mic”, this is actually an almost identical design to the original Soundfield microphone, which has been in use for many decades. The photo below on the left shows the capsule arrangement within the Ambeo, which follows the same tetrahedral layout of four microphone capsules as the original Soundfield microphone (shown on the right).

[Images: capsule arrangement of the Sennheiser Ambeo (left) and the original Soundfield microphone (right)]

As is often the case in 360 audio mixes, the question of distance is an important factor to consider. For acoustic performances, recordings such as this can often sound very distant, generally due to the placement of the microphone alongside the camera rig at a distance beyond what would be typical for an audio-only recording. This can result in a lack of directionality in the recording and require the addition of extra spot microphones. However, for this shoot we had the opposite problem due to the close miking and amplification of the musicians through the venue PA. As the mic and camera rig were positioned in the front row of seating, the PA loudspeakers were positioned very wide relative to the mic, resulting in an overly close sound compared to the visual distance. This was particularly noticeable for the string quartet in George’s piece, which initially sounded much too wide and close.

To correct this issue, some simple yet surprisingly effective distance processing was applied to the Ambeo recording, namely the addition of some early reflections to the W channel. This was very much a quick experiment using just the early reflections component of a commercial reverb plugin; however, as the results were pretty good it made it into the final mix. As a demonstration, the audio samples below contain static (at 30 deg and -90 deg respectively) binaural mixes of an excerpt of the piece. Each sample begins with two bars of unprocessed material, then two bars with the additional reflections, and so on for another 4 bars.
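The actual processing came from the early reflection section of a commercial reverb plugin, but the general idea can be sketched as convolving the W channel with a short, sparse early-reflection impulse response and mixing the result back in. In the sketch below the delay and gain values are purely illustrative and not those used in the mix.

```python
import numpy as np
from scipy.signal import fftconvolve

def add_early_reflections(bformat, fs, delays_ms=(11.0, 17.0, 23.0, 31.0),
                          gains=(0.35, 0.3, 0.25, 0.2), mix=0.5):
    """Add a sparse set of early reflections to the W channel of a first-order
    B-format recording. All delay/gain/mix values here are illustrative
    placeholders, not those used in the actual mix.

    bformat: array of shape (4, n_samples), W channel first.
    """
    n_taps = int(fs * max(delays_ms) / 1000.0) + 1
    ir = np.zeros(n_taps)
    for d_ms, g in zip(delays_ms, gains):
        ir[int(fs * d_ms / 1000.0)] += g                    # one tap per reflection
    reflections = fftconvolve(bformat[0], ir)[:bformat.shape[1]]
    out = bformat.astype(float).copy()
    out[0] += mix * reflections                             # blend back into W only
    return out
```

Because the reflections are added only to the omnidirectional W channel they carry no strong directional cues of their own, which helps push the perceived distance back without obviously relocating the source.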

This type of distance processing of both Ambisonic recordings and encoded mono sources such as spot microphones will be the focus of my research in the coming year, as there are still many unknowns in this whole area. For example, just how important is it that the pattern and timing of these early reflections match the physical dimensions and acoustics of the space? Alternatively, can better results be achieved using some general method, perhaps such as the distance panpot suggested back in 1992 by Peter Craven and Michael Gerzon [1]?

There is also some evidence that the optimal number and distribution of early reflections for distance processing without excessive timbral coloration depends on the nature of the source signal, which suggests that a one-size-fits-all solution may not be the best approach. Let’s just say this is definitely a topic we’ll be returning to in 2017, and now that we have our hardware and workflow sorted, expect a lot more 360 videos over the coming year.

 
[1] M. A. Gerzon: The Design of Distance Panpots, Proc. 92nd AES Convention, Preprint No. 3308, Vienna, 1992.