L3DAS22 - Submission

Submission platform

Results and papers can be submitted via the submission site (Microsoft CMT), where the following steps must be completed:

  1. Log in with your CMT account. If you are new to the CMT platform, you need to register with your email.
  2. In the Author Console, click on "Create New Submission".
  3. Insert the TITLE including your team name and the task, e.g., TeamName_Task1.
  4. Add the email addresses of your team members.
  5. In the first stage, it is sufficient to upload a PDF file containing a short description of the proposed method and any additional information regarding the submission. The full paper can be uploaded after the notification of the challenge results.
  6. After creating the submission, return to the Author Console to submit your enhanced evaluation test set.
  7. Click on Upload Supplementary Material and upload the zip file with the chosen title (e.g., TeamName_Task1.zip) containing the results. Save to complete your submission.
  8. You can edit the submission before the challenge deadline (Jan 10, 11:59 pm AoE Time).

Optionally, you can inform us at l3das@uniroma1.it about your submission.

How to format the results for the submission

All participants must submit only the results obtained on the blind test set, which will be released on Jan 5, 2022. The submission must be a zip archive (up to 5 files, max 700MB each) enclosing two separate folders for the challenge tasks, named task1 and task2 (note: if your team competes in only one of the two tasks, your zip archive needs to contain only the task1 or task2 folder). Each task folder should contain the results obtained for each data point as individual files, which must have the same names as the predictor files of the blind test set. Besides the naming, the format and content differ between the two tasks.
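The folder layout above can be assembled and zipped in a few lines. This is a minimal sketch: the `results` directory and the placeholder file name are hypothetical, not names prescribed by the challenge.

```python
import shutil
from pathlib import Path

# Hypothetical working directory holding the task folder(s).
root = Path("results")
(root / "task1").mkdir(parents=True, exist_ok=True)

# Stand-in for a real prediction file (one per blind-test data point,
# named exactly like the corresponding predictor file):
(root / "task1" / "example_prediction.npy").touch()

# Create TeamName_Task1.zip with the task folder(s) at the archive root.
shutil.make_archive("TeamName_Task1", "zip", root_dir=root)
```

Using `root_dir=root` keeps `task1/` (and `task2/`, if present) at the top level of the archive rather than nesting them under an extra parent folder.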

Task 1

For this task, the models must predict monaural sound waveforms containing the enhanced speech signals extracted from the multichannel noisy mixtures. Each submitted file for this task should be a one-dimensional NumPy array (.npy file) enclosing the floating-point samples of the predicted speech waveform, with a sampling rate of 16 kHz.

Task 2

For this task, the models are expected to predict the spatial coordinates and class of the sound events active in a multichannel audio mixture. This information must be generated for each frame of a discrete temporal grid with non-overlapping 100-millisecond frames.
Each submitted file for this task should be a CSV table listing, for every time frame, the class and spatial coordinates of each predicted sound event. Only time frames with active sounds should be included (do not include time frames with no predicted sounds). Please use the following format for each row of the table:

[time frame number (starting from 0)] [class] [x] [y] [z]

where class should be a string containing the sound class name (using the same naming as the original dataset), and x, y, and z should be floats describing the corresponding spatial coordinates.
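A row in this format can be written with Python's `csv` module. This is a sketch: the class names and coordinates below are illustrative, and real class strings must match the dataset's naming.

```python
import csv

# Hypothetical predictions as (time frame, class, x, y, z) tuples.
# Two events can share a frame; frames with no events are simply omitted.
events = [
    (0, "Speech", 1.2, -0.5, 0.3),
    (0, "Alarm", -2.0, 1.1, 0.0),
    (3, "Speech", 1.3, -0.4, 0.3),
]

with open("example_prediction.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in events:
        writer.writerow(row)
```

The output file would then be named after the corresponding blind-test predictor file and placed in the task2 folder.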

We provide functions for testing the validity of the submission files as part of our API, as detailed in our GitHub repository. Moreover, below you can find an example submission folder.

Example submission folder [ZIP]