This is the official page of "BAF: An Audio Fingerprinting Dataset For Broadcast Monitoring", published at ISMIR 2022.
The Broadcast Audio Fingerprinting (BAF) dataset is an open, annotated dataset, available upon request, for the task of music monitoring in broadcast. As references, it contains 2,000 tracks from Epidemic Sound's private catalogue, totalling 74 hours. As queries, it contains over 57 hours of TV broadcast audio from 23 countries and 203 channels, distributed as 3,425 one-minute audio excerpts.
It has been annotated by six annotators in total, and each query has been cross-annotated by three of them. The resulting high inter-annotator agreement percentages validate the annotation methodology and support the reliability of the annotations.
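To illustrate what an agreement percentage can look like when three annotators label each query, the sketch below computes two common measures on toy data: the share of items where all three labels coincide, and the mean pairwise agreement. The label values and function names are hypothetical, and this is not the authors' script; the agreement measure actually used for BAF is the one described in the paper.

```python
from itertools import combinations
from typing import Sequence


def full_agreement(labels: Sequence[Sequence[str]]) -> float:
    """Percentage of items on which every annotator gave the same label."""
    agreed = sum(1 for item in labels if len(set(item)) == 1)
    return 100.0 * agreed / len(labels)


def pairwise_agreement(labels: Sequence[Sequence[str]]) -> float:
    """Mean percentage of annotator pairs that agree, averaged over items."""
    per_item = []
    for item in labels:
        pairs = list(combinations(item, 2))
        per_item.append(sum(a == b for a, b in pairs) / len(pairs))
    return 100.0 * sum(per_item) / len(per_item)


# Hypothetical toy labels: one row per query, one column per annotator.
toy = [
    ("music", "music", "music"),
    ("music", "music", "no-music"),
    ("no-music", "no-music", "no-music"),
    ("music", "no-music", "music"),
]
print(full_agreement(toy))      # 50.0 (two of four rows are unanimous)
print(pairwise_agreement(toy))  # about 66.7
```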
The dataset is available for conducting non-commercial research related to audio analysis. It shall not be used for music generation or music synthesis. It is available upon request on Zenodo alongside an extended description of the dataset contents, motivation, license, ownership of the data, and the dataset datasheet.
Configuration files are located at baf-dataset/configs.
- Audfprint code repository: https://github.com/dpwe/audfprint
- Panako / Olaf code repository: https://github.com/JorenSix/panako (at its 2.1 version release)
- NeuralFP code repository: https://github.com/guillemcortes/neural-audio-fp (forked from https://github.com/mimbres/neural-audio-fp)
- PeakFP code can be found in this repository, in the baf-dataset/peakfp directory
baf-dataset/
├── compute_statistics.py --> Script to generate metrics
├── configs --> Parameter configurations used
│   ├── audfprint.cfg
│   ├── …
│   └── panako.cfg
└── peakfp --> Fingerprinting baseline
    ├── README.md
    ├── constants.py
    ├── …
    └── utils.py
The authors recommend the use of virtual environments.
Requirements:
- Python 3.6+
- Create a virtual environment and install the requirements:
git clone https://github.com/guillemcortes/baf-dataset.git
cd baf-dataset
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
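After running the commands above, a quick way to confirm that the interpreter meets the stated Python 3.6+ requirement and that the virtual environment is active is the check below. It is a minimal sketch using only the standard library; the function name `check_environment` is hypothetical and not part of this repository.

```python
import sys


def check_environment(min_version=(3, 6)):
    """Return (version_ok, in_venv) for the running interpreter."""
    version_ok = sys.version_info[:2] >= min_version
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the interpreter it was created from.
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    return version_ok, in_venv


version_ok, in_venv = check_environment()
print("Python >= 3.6:", version_ok)
print("Virtual environment active:", in_venv)
```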
BAF has a dedicated dataloader in mirdata that can help when working with the dataset. See the mirdata documentation for details.
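A minimal sketch of how the mirdata loader might be used follows. It assumes that mirdata is installed (`pip install mirdata`), that the loader is registered under the name `"baf"`, and that `data_home` points at a local copy of the files obtained from Zenodo. The helper name `load_baf` and the path are hypothetical; check the mirdata documentation for the authoritative API.

```python
def load_baf(data_home):
    """Initialize the mirdata loader on a local copy of BAF.

    Assumptions: mirdata is installed, the loader name is "baf", and
    data_home contains the dataset files (available upon request only).
    """
    import mirdata  # imported lazily so the sketch loads without mirdata

    dataset = mirdata.initialize("baf", data_home=data_home)
    dataset.validate()  # compares local files against the loader's index
    return dataset


# Usage (hypothetical path):
# dataset = load_baf("/path/to/baf")
# print(len(dataset.track_ids))
# track = dataset.choice_track()
# print(track.track_id)
```

Since the dataset is distributed upon request, the sketch does not attempt to download it and only validates files that are already on disk.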
- The code in this repository is licensed under Apache 2.0
- Dataset license is detailed in Zenodo
Please cite the following publication when using the dataset:
Guillem Cortès, Alex Ciurana, Emilio Molina, Marius Miron, Owen Meyers, Joren Six, & Xavier Serra. (2022). BAF: An audio fingerprinting dataset for broadcast monitoring. Proceedings of the 23rd International Society for Music Information Retrieval Conference, pp. 908–916. 4-8 December 2022, Bengaluru, India.
BibTeX version:
@inproceedings{cortes2022BAF,
author = {Guillem Cortès and
Alex Ciurana and
Emilio Molina and
Marius Miron and
Owen Meyers and
Joren Six and
Xavier Serra},
title = {{BAF: An audio fingerprinting dataset for broadcast monitoring}},
booktitle = {{Proceedings of the 23rd International Society for Music Information Retrieval Conference}},
year = 2022,
pages = {908--916},
publisher = {ISMIR},
address = {Bengaluru, India},
month = dec,
venue = {Bengaluru, India},
doi = {10.5281/zenodo.7316812},
url = {https://doi.org/10.5281/zenodo.7372162}
}
This research is part of NextCore – New generation of music monitoring technology (RTC2019-007248-7), funded by the Spanish Ministerio de Ciencia e Innovación and the Agencia Estatal de Investigación. It has also received support from the Industrial Doctorates Plan of the Secretaria d’Universitats i Recerca, Departament d’Empresa i Coneixement de la Generalitat de Catalunya, grant agreement No. DI46-2020.
Attribution
Document icon created by iconmas - Flaticon