2023 IEEE ICASSP Grand Challenge 

L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality

Signal Processing Grand Challenge at IEEE ICASSP 2023

Scope of the Challenge

The L3DAS23 Challenge aims at encouraging and fostering research on machine learning for 3D audio signal processing.

3D audio applications in virtual environments have attracted growing interest in the machine learning community in recent years. The field is very broad, ranging from virtual and real conferencing to game development, music production, augmented reality and immersive technologies.

This challenge, which extends the two tasks of the L3DAS22 Grand Challenge presented at ICASSP 2022, relies on first-order Ambisonics recordings in reverberant simulated environments, with special attention to possible augmented reality applications. To this end, L3DAS23 presents two tasks: 3D Speech Enhancement and 3D Sound Event Localization and Detection.

Each task is accompanied by a dataset containing audio recordings and pictures showing the frontal view from the microphone, which may be used to extract visual cues that can enhance the models' performance. Each task therefore involves two separate tracks, audio-only and audio-visual, and each track provides two subtracks: 1-mic and 2-mic recordings.
We expect higher accuracy/reconstruction quality in the case of the audio-visual track, especially when taking advantage of the dual spatial perspective of the two microphones.

Schedule

▪ Nov 28, 2022 - Registration Opening
▪ Dec 15, 2022 - Release of the Training and Development Sets and Documentation
▪ Jan 15, 2023 - Release of the Support Code
▪ Jan 15, 2023 - Release of the Baseline Models
▪ Feb 05, 2023 – Release of the Evaluation Test Set
▪ Feb 10, 2023 – Registration Closing
▪ Feb 19, 2023 (extended from Feb 15, 2023) – Deadline for Submitting Results
▪ Feb 20, 2023, 3:00 a.m. (AoE) – Notification of Top Ranked Teams
▪ Feb 20, 2023 – Deadline for 2-page Paper Submission (Top 5 Ranked Teams Only)
▪ Mar 7, 2023 – Grand Challenge Paper Acceptance Notification
▪ Mar 14, 2023 – Camera-Ready Grand Challenge Papers Deadline

Tasks

The tasks we propose are:

 3D Speech Enhancement in Simulated Reverberant Environments

The objective of this task is the enhancement of speech signals immersed in the spatial sound field of a reverberant simulated environment. The models are expected to extract the monophonic voice signal from a 3D mixture containing various background noises. The evaluation metric for this task is a combination of short-time objective intelligibility (STOI) and word error rate (WER).

 More details
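As an illustration of how a STOI/WER combination can be computed, here is a minimal Python sketch. The exact formula is not stated on this page: the sketch assumes the average of STOI and (1 − WER), with both terms clipped to [0, 1], so treat `combined_se_metric` as a hypothetical helper and not as the official scoring code. The STOI value is taken as an input because computing it requires a dedicated signal-processing implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def combined_se_metric(stoi: float, wer: float) -> float:
    """ASSUMED combination: mean of STOI and (1 - WER), both clipped to [0, 1]."""
    stoi = min(max(stoi, 0.0), 1.0)
    wer = min(max(wer, 0.0), 1.0)
    return (stoi + (1.0 - wer)) / 2.0


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(combined_se_metric(stoi=0.9, wer=wer))
```

In this example the hypothesis drops one of six reference words, so the WER is 1/6 and, under the assumed formula, the combined score with a STOI of 0.9 is about 0.867. Higher is better for both terms once WER is inverted, which is why the two can be averaged.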

  3D Sound Event Localization and Detection in Simulated Reverberant Environments

The aim of this task is to detect the temporal activity of a known set of sound event classes and to localize the active events in space. The models must predict a list of the active sound events and their respective locations at regular intervals of 100 milliseconds. Performance on this task is evaluated with a location-sensitive detection error, which combines the localization and detection error metrics.

 More details
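To make the idea of location-sensitive detection concrete, the sketch below counts a prediction as correct only if it has the same class as a ground-truth event in the same 100 ms frame and lies within a distance threshold of it. The data layout (a dict mapping frame index to a list of `(label, x, y, z)` tuples), the greedy nearest-match rule, the F-score summary and the `max_dist` value are all assumptions made for illustration; the official metric definition may differ.

```python
import math


def location_sensitive_scores(truth, pred, max_dist=1.0):
    """Frame-wise precision, recall and F-score with a spatial threshold.

    truth, pred: dict mapping a frame index (one frame = 100 ms) to a list
    of (label, x, y, z) tuples. max_dist is a hypothetical threshold.
    """
    tp = fp = fn = 0
    for frame in set(truth) | set(pred):
        unmatched = list(truth.get(frame, []))
        for label, *pos in pred.get(frame, []):
            # Look for the closest unmatched ground-truth event of the
            # same class that lies within the distance threshold.
            best = None
            for k, (t_label, *t_pos) in enumerate(unmatched):
                if t_label != label:
                    continue
                d = math.dist(pos, t_pos)
                if d <= max_dist and (best is None or d < best[1]):
                    best = (k, d)
            if best is None:
                fp += 1  # wrong class, too far away, or nothing active
            else:
                tp += 1
                unmatched.pop(best[0])
        fn += len(unmatched)  # ground-truth events nobody predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    truth = {0: [("speech", 1.0, 0.0, 0.0)],
             1: [("speech", 1.0, 0.0, 0.0), ("alarm", 0.0, 2.0, 0.0)]}
    pred = {0: [("speech", 1.2, 0.0, 0.0)],
            1: [("speech", 3.0, 0.0, 0.0), ("alarm", 0.0, 2.1, 0.0)]}
    print(location_sensitive_scores(truth, pred))
```

In the example, the second "speech" prediction has the right class but is 2 m away from the true source, so it counts as both a false positive and a missed event. This is the sense in which detection and localization errors are joined into one score.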

Dataset

Each of the above two tasks is supported by an appropriate dataset. The L3DAS23 datasets contain multiple-source and multiple-perspective B-format Ambisonics audio recordings. We sampled the acoustic field of multiple simulated environments, placing two first-order Ambisonics microphones at random points in the rooms and capturing up to 737 room impulse responses in each one. The datasets also contain multiple RGB pictures showing the frontal view from the main microphone.
We aimed to create plausible and varied 3D scenarios that reflect real-life situations in which target sounds and disparate types of background noise coexist in the same 3D reverberant environment.

 More details on the dataset
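For readers unfamiliar with the B-format mentioned above, the following sketch shows what the four channels of a first-order Ambisonics signal represent by encoding a mono signal arriving from a given direction. It is not how the L3DAS23 data were produced (the datasets rely on room impulse responses captured in simulated environments), and the channel order W, X, Y, Z with a 1/√2 gain on W is an assumed convention that this page does not specify. The function name is hypothetical.

```python
import math


def encode_first_order_b_format(signal, azimuth_deg, elevation_deg):
    """Encode a mono signal as an anechoic plane wave in first-order B-format.

    Returns four channels [W, X, Y, Z]. The W gain of 1/sqrt(2) and the
    channel order follow one traditional convention (an assumption here).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gains = (
        1.0 / math.sqrt(2.0),          # W: omnidirectional component
        math.cos(az) * math.cos(el),   # X: front-back figure-of-eight
        math.sin(az) * math.cos(el),   # Y: left-right figure-of-eight
        math.sin(el),                  # Z: up-down figure-of-eight
    )
    return [[g * s for s in signal] for g in gains]


if __name__ == "__main__":
    mono = [0.0, 0.5, 1.0, 0.5]
    w, x, y, z = encode_first_order_b_format(mono, azimuth_deg=90.0, elevation_deg=0.0)
    print(len(w), len(x), len(y), len(z))
```

A source directly to the side (azimuth 90°, elevation 0°) ends up almost entirely in the W and Y channels, which is the kind of directional cue a model can exploit. With two such microphones, a 2-mic input would carry eight channels in total, assuming the two four-channel recordings are simply stacked.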

Baselines

As baseline methods we propose architectures similar to those used as baselines for L3DAS22, specifically adapted for each track. For both tasks, we used only the signals coming from one Ambisonics microphone (mic A), leaving room for experimentation with the dual-mic configuration.

 More details on the baselines

Benefits for Challenge Winners

  • The top 5 ranked teams can submit a 2-page paper according to the ICASSP guidelines.

Additional Info

  • Registration
    Participants are required to register for the challenge by filling in this form (registration is now closed).
  • Previous Challenges: L3DAS22 and L3DAS21
    This challenge extends the two tasks of the L3DAS22 Grand Challenge presented at ICASSP 2022. While L3DAS22 used a single environment, the new edition includes a multitude of realistic simulated environments. We also adapted the code so that experiments in the multi-modal scenario run more smoothly. For additional information, refer to the L3DAS22 challenge website, the official GitHub repository and the official ICASSP 2022 paper describing the L3DAS22 dataset.
    An additional earlier edition, called L3DAS21, is described in detail on the respective page of the website.

Organizers

Christian Marinoni, Sapienza University of Rome, Italy
Riccardo Fosco Gramaccioni, Sapienza University of Rome, Italy
Changan Chen, UT Austin, TX, USA
Danilo Comminiello, Sapienza University of Rome, Italy

Challenge Partners

Optimize your work to save as much energy as possible.