The L3DAS22 Challenge aims to encourage and foster research on machine learning for 3D audio signal processing.
3D audio has been gaining increasing interest in the machine learning community in recent years. Its range of applications is remarkably wide, extending from virtual and in-person conferencing to autonomous driving, surveillance and many more. In these contexts, a fundamental procedure is to properly identify the nature of the events present in a soundscape and their spatial positions and, where needed, to remove unwanted noise that can interfere with the useful signal. To this end, the L3DAS22 Challenge presents two tasks: 3D Speech Enhancement and 3D Sound Event Localization and Detection, both relying on first-order Ambisonics recordings in reverberant office environments.
Each task involves two separate tracks, 1-mic and 2-mic, whose recordings are acquired by a single first-order Ambisonics microphone and by an array of two such microphones, respectively. The use of two Ambisonics microphones is one of the main novelties of the L3DAS22 Challenge. We expect higher accuracy and reconstruction quality when taking advantage of the dual spatial perspective of the two microphones. Moreover, we are very interested in identifying other possible advantages of this configuration over standard Ambisonics formats.
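To make the idea of a dual spatial perspective more concrete, here is a minimal Python sketch of one possible advantage, offered purely as an illustration: a single microphone yields only a direction of arrival, while two microphones at known, separated positions allow the two bearings to be intersected to estimate where the source actually is. The function name `triangulate`, the 2D (horizontal-plane) geometry, the microphone coordinates and the azimuth convention (radians, counter-clockwise from the +x axis) are all assumptions made for this example and are not part of the challenge specification.

```python
import math


def triangulate(p1, az1, p2, az2):
    """Intersect two horizontal bearing lines (hypothetical helper).

    p1, p2: assumed (x, y) microphone positions in metres.
    az1, az2: azimuths in radians, counter-clockwise from the +x axis.
    Returns the (x, y) point where the two bearings cross.
    """
    d1 = (math.cos(az1), math.sin(az1))  # unit direction seen from mic 1
    d2 = (math.cos(az2), math.sin(az2))  # unit direction seen from mic 2
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1 with Cramer's rule.
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product of d1 and d2
    if abs(denom) < 1e-12:
        raise ValueError("bearings are parallel; the source position is ambiguous")
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (bx * d2[1] - by * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


# Usage with made-up geometry: a source at (2, 3) seen from two mics 1 m apart.
mic1, mic2 = (0.0, 0.0), (1.0, 0.0)
az_a = math.atan2(3.0, 2.0)  # bearing from mic1 towards the source
az_b = math.atan2(3.0, 1.0)  # bearing from mic2 towards the source
print(triangulate(mic1, az_a, mic2, az_b))  # prints a point close to (2.0, 3.0)
```

When the microphones are close together relative to the source distance, the two bearings are nearly parallel, so small angle errors can shift the intersection considerably; a practical system would need to account for this.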
The tasks we propose are:
1. 3D Speech Enhancement
2. 3D Sound Event Localization and Detection
The L3DAS22 dataset contains multiple-source and multiple-perspective B-format Ambisonics audio recordings. We sampled the acoustic field of a large office room by placing two first-order Ambisonics microphones in the center of the room and moving a speaker reproducing the analytic signal across 252 fixed spatial positions.
We aimed to create plausible and varied 3D scenarios that reflect possible real-life situations in which sounds and disparate types of background noise coexist in the same reverberant 3D environment.
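For readers new to B-format, the following minimal Python/NumPy sketch shows how a mono source arriving from a given direction maps onto the four first-order Ambisonics channels, and how that direction can be read back from the time-averaged pseudo-intensity vector. It assumes the traditional FuMa-style convention (channel order W, X, Y, Z, with W scaled by 1/√2) and an ideal free-field plane wave; the L3DAS22 files may use a different channel ordering or normalization, and real recordings in a reverberant office will not recover the direction this cleanly. The function names `encode_foa` and `estimate_doa`, and the 8-channel stacking used for the 2-mic case, are hypothetical choices for illustration only.

```python
import numpy as np


def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal as a plane wave into first-order B-format.

    Assumes FuMa-style channels W, X, Y, Z with W attenuated by 1/sqrt(2).
    Angles are in radians. Returns an array of shape (4, n_samples).
    """
    s = np.asarray(signal, dtype=float)
    w = s / np.sqrt(2.0)                          # omnidirectional component
    x = s * np.cos(azimuth) * np.cos(elevation)   # front-back figure-of-eight
    y = s * np.sin(azimuth) * np.cos(elevation)   # left-right figure-of-eight
    z = s * np.sin(elevation)                     # up-down figure-of-eight
    return np.stack([w, x, y, z])


def estimate_doa(bformat):
    """Estimate (azimuth, elevation) in radians from the mean pseudo-intensity vector."""
    w, x, y, z = bformat
    ix, iy, iz = np.mean(w * x), np.mean(w * y), np.mean(w * z)
    azimuth = np.arctan2(iy, ix)
    elevation = np.arctan2(iz, np.hypot(ix, iy))
    return azimuth, elevation


# Usage: 1 s of white noise at an assumed 48 kHz sampling rate.
rng = np.random.default_rng(0)
source = rng.standard_normal(48000)
mic_a = encode_foa(source, np.deg2rad(30), np.deg2rad(10))
az, el = estimate_doa(mic_a)
print(np.rad2deg(az), np.rad2deg(el))  # prints values close to 30 and 10

# Hypothetical 2-mic input: the second mic sees the source from another angle,
# and the two 4-channel signals are stacked into one 8-channel array.
mic_b = encode_foa(source, np.deg2rad(40), np.deg2rad(10))
two_mic = np.concatenate([mic_a, mic_b], axis=0)  # shape (8, 48000)
```

Concatenating along the channel axis is only one simple way to expose both perspectives to a model; how best to exploit the second microphone is exactly the kind of question the 2-mic track is meant to explore.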