Task 1 - IEEE MLSP Data Challenge 2021

Sound event classes

To generate the spatial sound scenes, the measured room impulse responses (IRs) are convolved with clean sound samples belonging to distinct sound classes. The noise sound events for Task 1 are drawn from the well-known FSD50K dataset. In particular, we selected 12 transient classes, representative of the noise sounds that can be heard in an office: computer keyboard, drawer open/close, cupboard open/close, finger snapping, keys jangling, knock, laughter, scissors, telephone, writing, chink and clink, and printer; and 4 continuous noise classes: alarm, crackle, mechanical fan, and microwave oven.
Furthermore, we extracted clean speech signals (without background noise) from LibriSpeech, keeping only sound files up to 10 seconds long.
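
The scene generation described above can be illustrated with a minimal sketch. It is not the generation code used for the dataset: the function names spatialize and mix_scene are hypothetical, the signals and impulse responses are random placeholders, and no gain or onset-time handling is included. It only shows the core idea of convolving each dry (clean) source with a multichannel room IR and summing the results into one scene.

```python
import numpy as np

def spatialize(dry, rir):
    """Convolve a mono signal (n_samples,) with a multichannel IR (n_channels, ir_len).

    Returns an array of shape (n_channels, n_samples + ir_len - 1).
    """
    return np.stack([np.convolve(dry, h) for h in rir])

def mix_scene(sources, length):
    """Sum spatialized sources into one scene, truncated or zero-padded to `length` samples."""
    scene = np.zeros((sources[0].shape[0], length))
    for src in sources:
        n = min(length, src.shape[1])
        scene[:, :n] += src[:, :n]
    return scene

# Placeholder data (assumption): random noise stands in for real recordings and measured IRs.
rng = np.random.default_rng(0)
sample_rate = 16000
speech = rng.standard_normal(sample_rate)        # 1 s of "speech"
noise = rng.standard_normal(sample_rate // 2)    # 0.5 s of "office noise"
rir_speech = 0.1 * rng.standard_normal((8, 256)) # 8-channel IR at the speech position
rir_noise = 0.1 * rng.standard_normal((8, 256))  # 8-channel IR at the noise position

scene = mix_scene(
    [spatialize(speech, rir_speech), spatialize(noise, rir_noise)],
    length=sample_rate,
)
print(scene.shape)
```

Each source gets its own IR because each source sits at a different position in the room; the summed result is the multichannel mixture.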

Dataset specs

The main characteristics of the L3DAS21 Task 1 section are:

more than 30000 virtual 3D audio environments with a duration of up to 10 seconds
16 kHz, 16-bit AmbiX WAV files
clean voice sounds from LibriSpeech
up to 3 overlapping non-speech background noises
252 RIR positions collected in an office-like environment

The predictor data of this section are released as 8-channel, 16 kHz, 16-bit WAV files, each consisting of 2 sets of first-order Ambisonics recordings (4 channels each). The channel order is [WA, YA, ZA, XA, WB, YB, ZB, XB], where A/B identifies the microphone and W, Y, Z, X are the B-format Ambisonics channels.
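
As an illustration of the channel layout, the following sketch splits an 8-channel array into the two microphones and labels each Ambisonics channel. The helper split_microphones is hypothetical, and the array layout (n_samples, 8) is an assumption matching what common WAV readers such as scipy.io.wavfile.read return for multichannel files; a synthetic array is used here instead of a real file.

```python
import numpy as np

# Channel order as stated in the dataset description.
CHANNEL_ORDER = ["WA", "YA", "ZA", "XA", "WB", "YB", "ZB", "XB"]

def split_microphones(audio):
    """Split an (n_samples, 8) array into two dicts keyed by Ambisonics channel (W, Y, Z, X)."""
    if audio.ndim != 2 or audio.shape[1] != len(CHANNEL_ORDER):
        raise ValueError("expected an array of shape (n_samples, 8)")
    mic_a, mic_b = {}, {}
    for index, name in enumerate(CHANNEL_ORDER):
        component, mic = name[0], name[1]
        target = mic_a if mic == "A" else mic_b
        target[component] = audio[:, index]
    return mic_a, mic_b

# Synthetic stand-in for a loaded file: channel i holds the constant value i.
audio = np.tile(np.arange(8), (5, 1))
mic_a, mic_b = split_microphones(audio)
print(sorted(mic_a), sorted(mic_b))
```

Keeping the two microphones in separate structures makes it easy to feed a model either a single first-order Ambisonics signal (4 channels) or both (8 channels).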
