Results and papers can be submitted via the submission site (Microsoft CMT), where the following steps must be followed:
In addition to papers related to the L3DAS22 Challenge, authors are encouraged to submit to this special session papers on the broader topic of machine learning for 3D audio signal processing. In that case, there is no need to upload any supplementary material.
Optionally, you can inform us at firstname.lastname@example.org about your submission.
All participants must submit only the results obtained for the blind test set, which will be released on Dec 15, 2021. The submission must be a zip archive (maximum size 350 MB) containing two separate folders, one per challenge task, named task1 and task2 (note: if your team competes in only one of the two tasks, the zip archive needs to contain only the corresponding task1 or task2 folder). Each task folder should contain the results obtained for each data point as individual files, which must have the same names as the predictor files of the blind test set. Apart from the naming, the format and content differ between the two tasks.
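As an illustration of the archive layout described above, the following is a minimal sketch that zips whichever task folders exist and checks the size limit. The function name `build_submission` and the `results_dir` layout are hypothetical, and reading "350 MB" as 350 × 10^6 bytes is an assumption (the more conservative interpretation).

```python
import zipfile
from pathlib import Path

# Assumption: "350 MB" is read as 350 * 10**6 bytes (the stricter interpretation).
MAX_BYTES = 350 * 10**6


def build_submission(results_dir, zip_path, tasks=("task1", "task2")):
    """Zip the task folders found under results_dir and enforce the size limit.

    Hypothetical helper: results_dir is expected to contain task1/ and/or task2/.
    Returns the list of task folders that were included.
    """
    results_dir = Path(results_dir)
    zip_path = Path(zip_path)
    included = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for task in tasks:
            task_dir = results_dir / task
            if not task_dir.is_dir():
                continue  # a team may compete in only one of the two tasks
            included.append(task)
            for f in sorted(task_dir.rglob("*")):
                if f.is_file():
                    # Paths are stored relative to results_dir, so the archive
                    # root holds the task1/ and task2/ folders directly.
                    zf.write(f, f.relative_to(results_dir).as_posix())
    if not included:
        raise ValueError("no task1 or task2 folder found in results_dir")
    size = zip_path.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"archive is {size} bytes, above the 350 MB limit")
    return included
```

A call such as `build_submission("results", "submission.zip")` would then produce an archive whose top level contains only the task folders.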
For Task 1, the models must predict monaural sound waveforms containing the enhanced speech signals extracted from the multichannel noisy mixtures. Each submitted file for this task should be a one-dimensional NumPy array (.npy file) containing the floating-point samples of the predicted speech waveform, with a sampling rate of 16 kHz.
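A prediction file of this kind could be written roughly as follows. This is only a sketch: the helper name `save_task1_prediction` is hypothetical, and the choice of `float32` is an assumption, since the text only requires floating-point samples.

```python
import numpy as np


def save_task1_prediction(waveform, out_path):
    """Save a predicted speech waveform as a one-dimensional .npy file.

    Hypothetical helper. float32 is an assumption; the instructions only
    ask for floating-point samples at 16 kHz.
    """
    wave = np.asarray(waveform, dtype=np.float32)
    # Drop singleton batch/channel axes, e.g. shape (1, N) becomes (N,).
    wave = np.squeeze(wave)
    if wave.ndim != 1:
        raise ValueError(f"expected a mono waveform, got shape {wave.shape}")
    np.save(out_path, wave)
    return wave
```

The output file name should match the name of the corresponding predictor file in the blind test set, as required above.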
For Task 2, the models are expected to predict the spatial coordinates and class of the sound events active in a multichannel audio mixture. This information must be generated for each frame of a discrete temporal grid with non-overlapping 100-millisecond frames.
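To make the temporal grid concrete, the sketch below maps an event's onset and offset (in seconds) to the indices of the 100 ms frames it touches. The function name `active_frames` is hypothetical, and treating a frame as active whenever the event overlaps any part of it is an assumption; the official evaluation code may use a different rule.

```python
FRAME_MS = 100  # frame length of the non-overlapping temporal grid


def active_frames(onset_s, offset_s):
    """Return the indices of the 100 ms frames overlapped by [onset_s, offset_s).

    Assumption: a frame counts as active if the event overlaps any part of it.
    Times are converted to integer milliseconds to avoid floating-point
    rounding problems at frame boundaries.
    """
    onset_ms = round(onset_s * 1000)
    offset_ms = round(offset_s * 1000)
    if offset_ms <= onset_ms:
        return []
    first = onset_ms // FRAME_MS
    last = (offset_ms - 1) // FRAME_MS  # the offset itself is exclusive
    return list(range(first, last + 1))
```

Under this rule, an event lasting from 0.0 s to 0.25 s occupies frames 0, 1 and 2.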
Each submitted file for this task should be a CSV table listing, for every time frame, the class and spatial coordinates of each predicted sound event. Only time frames with active sounds should be included (do not include time frames for which no sounds are predicted). Please use the following format for each row of the table:
[time frame number (starting from 0)] [class] [x] [y] [z]
where class should be a string containing the sound class name (named as in the original dataset) and x, y and z should be floats giving the corresponding spatial coordinates.
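The row format above could be produced with a sketch like the following. The helper name `write_task2_csv` is hypothetical, and two details are assumptions not stated in the text: fields are separated by commas, and no header row is written. The validation functions provided by the organizers remain the authoritative check.

```python
import csv


def write_task2_csv(events, out_path):
    """Write predicted events as rows of: frame, class, x, y, z.

    Hypothetical helper. `events` is an iterable of
    (frame, class_name, x, y, z) tuples. Assumptions: comma delimiter,
    no header row. Frames without predicted events are simply absent.
    """
    rows = sorted(events, key=lambda e: e[0])  # order rows by frame number
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for frame, class_name, x, y, z in rows:
            if frame < 0:
                raise ValueError("frame numbers start from 0")
            writer.writerow([int(frame), str(class_name),
                             float(x), float(y), float(z)])
    return len(rows)
```

Several events active in the same frame are written as separate rows sharing the same frame number.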
We provide functions for testing the validity of the submission files as part of our API, as detailed in our GitHub repository [link not available yet]. An example submission folder is also provided below.
Example submission folder [ZIP]