In previous posts on this blog I informally compared a number of different Ambisonic microphones which can be used to capture spatial audio for 360 video and VR. However, my colleagues and I (Seán Dooney, Marcin Gorzel, Hugh O’Dwyer, Luke Ferguson, and Francis M. Boland from the Spatial Audio Research Group, Trinity College, and Google, Dublin) have recently completed a more formal study comparing a number of these microphones using both listening tests and an objective analysis. The research was published in two papers, presented at the Audio Engineering Society (AES) Conference on Sound Field Control in Guildford, UK, in July 2016 and at the 142nd AES Convention in Berlin in May 2017, which are now available from the AES digital library (Part 1 here, & Part 2 here). While the papers themselves contain detailed technical information on both parts of this study, in this blog post I’ll present the major findings and share the original recordings so you can draw your own conclusions.
The original Soundfield microphone was first developed in the 1970s and since then a number of variations of this design have been released by different manufacturers. More recently, as Ambisonics has become the de facto standard format for spatial audio for VR and 360 video, many more Ambisonic microphones have emerged, and so a formal comparison of these different microphones is quite timely.
A number of previous studies on spatial audio have found a significant correlation between a listener’s overall preference and two specific parameters, namely localization accuracy (i.e. how directionally accurate is the spatial recording), and the overall sound quality. Our study therefore employed subjective listening tests to evaluate and compare the overall quality and timbre of these microphones, alongside an objective analysis of the localization accuracy.
In the first paper we examined the following microphones:
- DPA 4006 omni-directional condenser microphone, used as a monophonic reference
- Soundfield MKV system with rack-mounted control unit with four sub-cardioid capsules
- Core Sound TetraMic with four cardioid electret-condenser capsules
- Zoom H2n containing two stereo microphone pairs which can be converted into a horizontal only Ambisonic signal
- MH Acoustics Eigenmike spherical microphone array with thirty-two omni-directional electret capsules
For this initial listening test, samples of speech and music were played back over a single loudspeaker and recorded in turn by each of the five microphones. For this comparison, we wanted to examine the fundamental audio quality of these microphones, so only the monophonic, omni-directional W channel of these spatial recordings was used. The listening test participants were asked to compare each of the mic recordings to a reference captured using a high quality DPA 4006 microphone with an extremely flat frequency response. This approach is known as a Multiple Stimuli with Hidden Reference and Anchor (MUSHRA) test (described in ITU–R BS.1534-1) and is typically used to evaluate intermediate levels of audio quality in multiple stimuli both through a comparison to a reference recording, and between each stimulus. Participants were asked to compare each recording to the reference in terms of Audio Quality (for both music and speech), Low, Mid and High Frequency Timbre (music), and Artifacts/Distortion (speech) using the continuous quality scale shown below.
Thirty participants (26 male & 4 female) performed this test, which lasted 17.5 minutes on average. The MUSHRA test procedure typically excludes subjects who give the hidden reference a score of less than 90 for more than 15% of the test items. However, as the differences in quality were often quite subtle in this case, this specification was relaxed slightly, with 11 participants excluded from the final results. Our analysis found statistically significant results for all 6 questions, which strongly suggest differences in the subjective performance of the different microphones.
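The standard post-screening rule described above is easy to express in code. The following is only a minimal sketch of that rule as stated (a hidden-reference score below 90 on more than 15% of items), not the relaxed criterion we actually applied; the function name, the subject labels, and all of the ratings are hypothetical and purely for illustration.

```python
def should_exclude(reference_scores, threshold=90.0, max_fraction=0.15):
    """Return True if the hidden reference was rated below `threshold`
    on more than `max_fraction` of the test items."""
    if not reference_scores:
        raise ValueError("no scores given")
    low = sum(1 for score in reference_scores if score < threshold)
    return low / len(reference_scores) > max_fraction


# Hypothetical hidden-reference ratings (0-100), one list per subject.
ratings = {
    "subject_a": [100, 95, 92, 100, 98, 91, 100, 96, 94, 100],
    "subject_b": [100, 70, 85, 100, 60, 95, 100, 88, 92, 100],
}
excluded = [name for name, scores in ratings.items() if should_exclude(scores)]
print(excluded)
```

Keeping the threshold and fraction as parameters makes it straightforward to relax the rule when, as here, the differences between stimuli are subtle.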
A more detailed discussion of these results can be found in the original paper; however, two findings were particularly notable. Firstly, the results indicate that despite this being an older model from the Soundfield range, the Soundfield MKV performed the best overall in terms of audio quality.
Secondly, the Eigenmike and TetraMic produced the worst results in terms of artifacts and noise, and this could well have contributed to the lower scores for these two microphones in terms of the overall Audio Quality, particularly for speech. The high scores for the TetraMic for Audio Quality and High, Mid, and Low Frequency Timbre for music examples (in which the presence of noise would be much less noticeable) support this finding. The relatively poor performance of the TetraMic in this category is perhaps explained by its low-level, unbalanced, microphone-level signal output, which required significantly more preamplifier gain than the other microphones. The low score for the Eigenmike for this question can perhaps be explained by the more elaborate design of this microphone and the much larger number of individual mic capsules (32 instead of just 4).
In the second paper, we included an additional, newly released microphone, namely the Sennheiser Ambeo. For this listening test we wanted to replicate a more typical usage of these microphones, so the spatial recordings were presented using a head-tracked binaural rendering over headphones, which is standard for 360 videos presented using a Head Mounted Display (HMD).
The listening test stimuli were taken from a live performance by an acoustic quartet (flute, guitar, saxophone and cello) of an original spatial music composition entitled A Round, Around. This piece consists of tonal, melodic canons, rotations and other spatial effects created by passing musical material between consecutive instruments, as can be seen in the score excerpt below. Many thanks to funding and supporting agencies Science Foundation Ireland, Sennheiser, and Google, and the musicians Kate Ellis, Nick Roth, and Lina Andonovska for their fine performance.
The piece was recorded using multiple monophonic spot (AKG 414s) and room microphones (AKG C314s, and Sennheiser TLMs) arranged in a circle of 1m radius and at 90 degree intervals, as shown below. In order to ensure consistent reproduction and microphone position, four excerpts from the piece were then reproduced using a number of loudspeakers and Ambisonic recordings made with each of the microphones in turn. A MOTU 8m audio interface was used for the TetraMic and Ambeo, while the other microphones used their own, proprietary interfaces. All of these recordings can be downloaded from the link at the bottom of this post, along with some additional recordings we couldn’t include in the test itself.
To investigate the subjective timbral quality of each microphone, a modified MUSHRA test was again implemented. However, as no reference recording was available for this particular experiment, all test stimuli were presented to the listener at the same time without a reference.
Due to the relative difficulty of the task, and the lack of a suitable reference, all participants undertook training prior to taking the test. The benefits of using trained, experienced listeners over naive or untrained listeners have been well documented, with trained listeners tending to give more discriminating and consistent preference ratings. Training for this experiment was conducted using Harman’s “How to Listen” listener training software and each subject was required to reach a skill level of 7 or higher in two specific programs focusing on the high frequency timbre (Brightness/Dullness), and low frequency timbre (Fullness/Thinness) of various stereo music samples.
The four tests each used a different excerpt from the piece, which was recorded using each of the five Ambisonic microphones. Subjects were asked to rate the five recordings in terms of high frequency timbre (Bright/Dull) in tests 1 & 2, and low frequency timbre (Full/Thin) in tests 3 & 4. The same continuous quality scale as in the training program was used, namely from -5 (Dull) to 5 (Bright) in tests 1 & 2, and from -5 (Thin) to 5 (Full) in tests 3 & 4, with 0 (Ideal) as the initial, default value. The test system and user interface were implemented in Max MSP and presented using a laptop, RME Babyface audio interface, Sennheiser HD650 open back headphones, and an Oculus Rift DK2 for head-tracking. The binaural decoding was based around a cube of virtual loudspeakers and a mode-matched decoder, implemented using the ambiX decoder plugin and the KEMAR HRTF set from the SADIE database.
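For readers unfamiliar with how head-tracking interacts with an Ambisonic recording, the basic idea is that the sound field is rotated in the opposite direction to the listener's head before binaural decoding. Our test system did this in Max MSP; the Python below is only an illustrative sketch of a yaw-only rotation for first-order B-format (WXYZ). The function names and the sign convention (positive yaw means turning to the left) are my own assumptions for this example.

```python
import math


def rotate_yaw(frame, angle_deg):
    """Rotate one first-order B-format frame (W, X, Y, Z) about the
    vertical axis. Positive angles move sources counter-clockwise."""
    w, x, y, z = frame
    a = math.radians(angle_deg)
    return (
        w,                                  # omni component is unchanged
        x * math.cos(a) - y * math.sin(a),
        x * math.sin(a) + y * math.cos(a),
        z,                                  # height is unchanged by yaw
    )


def compensate_head_yaw(frame, head_yaw_deg):
    """Counter-rotate the scene so sources stay fixed in the room."""
    return rotate_yaw(frame, -head_yaw_deg)


# A source hard left (azimuth 90 degrees): X = 0, Y = 1.
left_source = (1 / math.sqrt(2), 0.0, 1.0, 0.0)

# If the listener turns 90 degrees to the left, the source should now
# appear directly in front of them (X close to 1, Y close to 0).
w, x, y, z = compensate_head_yaw(left_source, 90.0)
print(round(x, 6), round(y, 6))
```

A full head-tracker would also handle pitch and roll, which mix the Z channel with X and Y; yaw alone is shown here to keep the example short.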
21 participants (20 male & 1 female) performed the test, with ages ranging from 21 to 50 years old, and an average age of 33. The participants included experienced listeners (12 subjects), comprising professional audio engineers and academics with prior experience of similar tests, and semi-experienced listeners (9 subjects), comprising post-graduate students in the Music & Media Technology programme in Trinity College Dublin. The average time taken for both the training and the test was 53 minutes, and a short break was always taken following the initial training and before the test.
For Question 1, high frequency timbre, no statistically significant difference was found between the Ambeo and H2n, and a small, but still statistically significant difference was found between the TetraMic and Soundfield MKV. Significant differences were found between all other pairs of microphones, with the Soundfield MKV rated as closest to ideal in terms of high frequency timbre. The Eigenmike was rated as dull sounding compared to the other microphones, while the Ambeo and H2n were both rated as brighter than ideal. This result for the Ambeo can perhaps be explained by the presence of the “Ambisonics Correction Filter” incorporated by default within v1.03 of Sennheiser’s A-to-B-format conversion plugin, which applied a significant boost above 10 kHz to all four channels. It should be noted that Sennheiser have since released an updated version of this plugin (v1.1.2) which significantly reduces this high frequency boost. While this new version arrived too late to be included in our formal test, the download link below includes both the original recording used in the test, and a new version processed with the latest version of the plugin.
For Question 2, low frequency timbre, no statistically significant difference was found between the Soundfield MKV and TetraMic. A small, but still statistically significant difference was found between the Eigenmike and TetraMic, with significant differences found between all other pairs of microphones. The Soundfield MKV was again rated as being closest to ideal, but could not be statistically distinguished from the TetraMic. The Eigenmike was rated as full compared to the other microphones, while in contrast the Zoom H2n was rated as thinner sounding than other mics.
In addition to our listening tests, we also performed an objective analysis of localization accuracy for each of the five microphones. For these experiments, the five microphones were placed in turn at the center of a spherical array of sixteen Equator D5 coaxial loudspeakers, which were mounted on a dampened metal frame and the floor, in four vertically separated rings. Two laser pointers were mounted on the frame to mark the exact center point of the array and ensure the same position and orientation for each microphone.
To generate the test material, pink noise bursts were played consecutively from each loudspeaker and recordings made with each microphone. In addition, a monophonic recording made with the DPA 4006 was synthetically encoded into Ambisonics using the ambiX encoder and the azimuth and elevation angles of the loudspeakers in the array, and this served as a reference for the directional analysis.
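To give a sense of what synthetic encoding involves, the sketch below pans a mono signal to a given azimuth and elevation using the textbook first-order B-format equations. It is not the ambiX encoder's implementation; for consistency with the recordings shared at the end of this post, I have assumed the Furse-Malham convention (W attenuated by 3 dB, channel order WXYZ), with azimuth measured counter-clockwise from the front. The function name and the example signal are hypothetical.

```python
import math


def encode_bformat(samples, azimuth_deg, elevation_deg):
    """Encode mono samples into first-order B-format frames (W, X, Y, Z)
    using Furse-Malham weighting (W scaled by 1/sqrt(2))."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gain_w = 1.0 / math.sqrt(2.0)          # -3 dB on the omni channel
    gain_x = math.cos(az) * math.cos(el)   # front-back
    gain_y = math.sin(az) * math.cos(el)   # left-right
    gain_z = math.sin(el)                  # up-down
    return [(gain_w * s, gain_x * s, gain_y * s, gain_z * s) for s in samples]


# Two samples of a mono signal panned hard left on the horizontal plane.
frames = encode_bformat([1.0, 0.5], azimuth_deg=90.0, elevation_deg=0.0)
print(frames[0])
```

Because the encoding gains are exact, a directional analysis of such a signal should recover the loudspeaker angles almost perfectly, which is what makes it useful as a reference.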
As well as the five microphones from the listening test, this second experiment also assessed the Ambeo both with and without the Ambisonics Correction Filter mentioned earlier. In addition, the Zoom H2n was assessed using both the native Ambisonics recording mode (horizontal only), released by Zoom as a firmware update in 2016, and also the conversion of the standard 4-channel recording mode to B-format using a freeware conversion plugin developed by my colleague Brian Fallon at Trinity College Dublin (available as a free download, here).
The published paper contains precise details on our directional analysis method, but fundamentally our approach is based around an intensity vector analysis of different frequency bands. For each of these frequency bands (which are derived from the Bark scale), the mode of the angle estimates was taken as the estimated angle to the source. This can be seen in the histogram of azimuth angle data for a loudspeaker at 45 degrees shown below, where the angle estimates are spread across the range -180 to 180 degrees but the primary mode correlates strongly with the true source angle.
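The general idea of taking the mode of many intensity-vector angle estimates can be sketched in a few lines. Please treat the following as a heavily simplified illustration rather than the method from the paper: it works on the broadband signal in the time domain instead of on Bark-derived frequency bands, it uses white Gaussian noise rather than pink noise, and the frame length, bin width, and function name are all assumptions I have chosen for the example.

```python
import math
import random
from collections import Counter


def estimate_azimuth(w, x, y, frame_len=256, bin_width_deg=5):
    """Estimate source azimuth as the mode of per-frame intensity-vector
    angles. Each frame votes for the histogram bin nearest its angle."""
    votes = Counter()
    for start in range(0, len(w) - frame_len + 1, frame_len):
        frame = range(start, start + frame_len)
        ix = sum(w[i] * x[i] for i in frame)   # intensity along X
        iy = sum(w[i] * y[i] for i in frame)   # intensity along Y
        if ix == 0.0 and iy == 0.0:
            continue                            # silent frame, no direction
        angle = math.degrees(math.atan2(iy, ix))
        centre = round(angle / bin_width_deg) * bin_width_deg
        if centre <= -180:
            centre += 360                       # -180 and 180 are the same bin
        votes[centre] += 1
    if not votes:
        raise ValueError("signal too short or silent")
    return votes.most_common(1)[0][0]


# Synthetic test signal: noise encoded at 45 degrees azimuth.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
az = math.radians(45.0)
w = [s / math.sqrt(2.0) for s in noise]
x = [s * math.cos(az) for s in noise]
y = [s * math.sin(az) for s in noise]
estimate = estimate_azimuth(w, x, y)
print(estimate)
```

With a real microphone recording the per-frame angles scatter across the full range, and using the mode rather than the mean keeps the estimate from being dragged around by reflections and noise.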
Using this method, an estimate for the azimuth and elevation angles of each of the 16 loudspeakers was obtained from the different microphone recordings. All of these estimates were then compared to the actual loudspeaker locations and an absolute offset error determined. These results are summarized below, although it should be noted that as the Zoom H2n does not include a vertical component, no elevation results are included for this microphone.
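One practical detail when computing an absolute offset error for azimuth is that angles wrap around at ±180 degrees, so an estimate of -175 degrees for a loudspeaker at 175 degrees is only 10 degrees out, not 350. A minimal sketch of one way to handle this is shown below; the function names and the example angles are hypothetical and are not taken from our data.

```python
def abs_angle_error(estimate_deg, true_deg):
    """Absolute angular offset in degrees, taking wrap-around into account."""
    diff = (estimate_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


def mean_abs_error(estimates_deg, truths_deg):
    """Mean absolute offset over a set of loudspeaker positions."""
    if len(estimates_deg) != len(truths_deg) or not truths_deg:
        raise ValueError("need two non-empty lists of equal length")
    errors = [abs_angle_error(e, t) for e, t in zip(estimates_deg, truths_deg)]
    return sum(errors) / len(errors)


# Hypothetical azimuth estimates for four loudspeakers.
true_azimuths = [45.0, 135.0, -135.0, 175.0]
estimated = [50.0, 130.0, -140.0, -175.0]
print(mean_abs_error(estimated, true_azimuths))
```

The same function works unchanged for elevation, where wrap-around never occurs because elevation is limited to the range -90 to 90 degrees.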
The minimal errors for the synthetically encoded reference signal clearly demonstrate the effectiveness of this analysis method. A paired t-test revealed statistically significant differences between microphones, and the results indicate that the Eigenmike was the most accurate mic in terms of both azimuth and elevation, on par with the Ambeo (both versions) and better than all other microphones.
Both versions of the Ambeo were on par with the Eigenmike and Soundfield MKV (for azimuth), better than the Soundfield MKV (for elevation), and better than the TetraMic and H2n (both versions). Interestingly, our study revealed no significant difference in directional accuracy for the Ambeo with and without the optional Ambisonics Correction Filter.
The results for the Soundfield MKV and TetraMic were largely comparable, and the Zoom H2n performed worse overall, although this is not unexpected as this mic does not include a vertical component and all loudspeakers were offset from horizontal to some degree. Finally, no statistically significant differences were found between the H2n native Ambisonic recording mode, and our 4-channel conversion plugin.
The results of these experiments largely match the findings of Part 1 of this study, with the Soundfield MKV once again producing the best results in terms of overall timbral quality, and with comparable results to the other microphones in terms of directionality with the exception of the Eigenmike and Ambeo.
The Ambeo performed very well in terms of directionality, on par with the Eigenmike and Soundfield (azimuth), and better than all other mics. In terms of timbral quality the Ambeo was rated as less ideal and brighter than other mics; however, as noted earlier, this result was strongly influenced by the now updated Correction Filter applied during the A-to-B-format conversion process. Notably, the latest version of this conversion plugin released by Sennheiser has drastically reduced the extent of this high frequency boost, and this new version is also included in the download link below.
The TetraMic was slightly less accurate than the Eigenmike, Ambeo and Soundfield MKV in terms of directional accuracy, which may be explained by the slightly greater inter-capsule spacing of this microphone. While the use of speech signals and studio recordings in Part 1 of this study revealed some issues with noise with the TetraMic due to its relatively low-level signal output compared to other microphones such as the Ambeo, this was much less apparent with this music recording. This suggests that given appropriate pre-amplifiers with sufficient gain, good results can be achieved with the TetraMic in terms of timbral quality.
The more elaborate design of the Eigenmike produced the best results overall in terms of localization accuracy, but with decreased performance in terms of timbral quality. As with our earlier experiments, this suggests a certain trade-off between timbral quality and directional accuracy, particularly when the number of individual mic capsules is significantly increased.
The Zoom H2n performed worse overall compared to all other microphones. However, its performance is still very reasonable given its extremely low cost, and the fact that it was not originally designed to produce B-format recordings.
All of the recordings used in the second listening test can be downloaded from the link below. The ZIP file contains the 4 excerpts from A Round, Around recorded with the Soundfield MKV, Ambeo, TetraMic, H2n and Eigenmike. In addition, alternative versions of the Ambeo (converted using the newly updated filter) and Eigenmike (3rd order Ambisonics) recordings are included in the additional recordings folder. All the recordings are 48 kHz/24-bit, and the Furse-Malham weighting (with the W channel attenuated by 3 dB) and channel order (WXYZ) were used throughout.
The ZIP file also contains a sample Reaper session which can be used to audition the files, using the ambiX decoder plugin and the KEMAR HRTF set from the SADIE database.
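If you want to use these first-order recordings with tools that expect the ambiX convention (ACN channel order with SN3D normalization) rather than Furse-Malham, the conversion is just a channel reorder and a gain change on W. The sketch below shows this for a single frame of four samples; it applies to first-order material only, and the function name is my own.

```python
import math


def fuma_to_ambix(frame):
    """Convert one first-order Furse-Malham frame (W, X, Y, Z) to
    ambiX (ACN order W, Y, Z, X with SN3D normalization)."""
    w, x, y, z = frame
    # Undo the 3 dB attenuation on W; X, Y and Z keep their gains at
    # first order and are only reordered.
    return (w * math.sqrt(2.0), y, z, x)


# One hypothetical frame of B-format audio.
converted = fuma_to_ambix((1 / math.sqrt(2), 0.1, 0.2, 0.3))
print(converted)
```

Higher-order material, such as the 3rd order Eigenmike recordings, needs per-channel gains for the additional components as well, so this shortcut should not be applied to those files.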
This research was supported by Science Foundation Ireland.