This is the official repository of "BAF: An Audio Fingerprinting Dataset For Broadcast Monitoring", published at ISMIR 2022.

4min video presentation:

Dataset

The Broadcast Audio Fingerprinting (BAF) dataset is an open, annotated dataset, available upon request, for the task of music monitoring in broadcast. As references, it contains 2,000 tracks from Epidemic Sound's private catalogue, representing 74 hours of audio. As queries, it contains over 57 hours of TV broadcast audio from 23 countries and 203 channels, distributed as 3,425 one-minute audio excerpts.
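As a quick sanity check, the figures above can be related to each other with a few lines of arithmetic. This is only a sketch: the constant names are made up for illustration, and the mean reference length is derived from the rounded 74-hour figure, so it is approximate.

```python
# Figures quoted in the dataset description (names are illustrative).
NUM_QUERIES = 3425        # one-minute query excerpts
QUERY_MINUTES = 1         # nominal length of each excerpt, in minutes
NUM_REFERENCES = 2000     # reference tracks
REFERENCE_HOURS = 74      # total reference audio, rounded

# Total query audio in hours: 3,425 minutes / 60.
query_hours = NUM_QUERIES * QUERY_MINUTES / 60

# Approximate mean reference track length in minutes.
mean_reference_minutes = REFERENCE_HOURS * 60 / NUM_REFERENCES

print(f"Query audio: {query_hours:.2f} h")
print(f"Mean reference length: {mean_reference_minutes:.2f} min")
```

The query total comes out just above 57 hours, which matches the "over 57 hours" stated in the description.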

The dataset was annotated by six annotators in total, and each query was cross-annotated by three of them. The resulting high inter-annotator agreement percentages validate the annotation methodology and support the reliability of the annotations.
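To make the idea of an agreement percentage concrete, here is a minimal sketch of pairwise percentage agreement for items labelled by three annotators. It is not necessarily the measure used by the authors. The function name, the label values, and the toy data are all hypothetical and do not come from the dataset.

```python
from itertools import combinations


def pairwise_agreement(labels_per_item):
    """Fraction of annotator pairs that agree, pooled over all items.

    `labels_per_item` is an iterable of tuples, one tuple per item,
    holding the label given by each annotator of that item.
    """
    agreeing = 0
    total = 0
    for labels in labels_per_item:
        for a, b in combinations(labels, 2):
            agreeing += int(a == b)
            total += 1
    return agreeing / total if total else 0.0


# Hypothetical toy labels: three annotators per query excerpt.
toy_labels = [
    ("music", "music", "music"),
    ("music", "music", "no_music"),
    ("no_music", "no_music", "no_music"),
    ("music", "no_music", "no_music"),
]

agreement = pairwise_agreement(toy_labels)
print(f"Pairwise agreement: {agreement:.1%}")
```

With three annotators per item there are three pairs per item, so a unanimous item contributes three agreeing pairs and a two-against-one item contributes one.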

Downloading the data

The dataset is available for conducting non-commercial research related to audio analysis. It shall not be used for music generation or music synthesis. It is available upon request on Zenodo alongside an extended description of the dataset contents, motivation, license, ownership of the data, and the dataset datasheet.

Algorithms

Configuration files are located at baf-dataset/configs.

Audfprint code repository: https://github.com/dpwe/audfprint
Panako / Olaf code repository: https://github.com/JorenSix/panako (at its 2.1 version release)
NeuralFP code repository: https://github.com/guillemcortes/neural-audio-fp (forked from https://github.com/mimbres/neural-audio-fp)
PeakFP code can be found in this repository, in the baf-dataset/peakfp directory

Code

baf-dataset/
├── compute_statistics.py --> Script to generate metrics
├── configs --> Parameter configurations used
│   ├── audfprint.cfg
│   ├── …
│   └── panako.cfg
└── peakfp --> Fingerprinting baseline
    ├── README.md
    ├── constants.py
    ├── …
    └── utils.py

Installation

The authors recommend the use of virtual environments.

Requirements:

Python 3.6+
Create a virtual environment and install the requirements:

git clone https://github.com/guillemcortes/baf-dataset.git
cd baf-dataset
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Usage

BAF has a dedicated dataloader in mirdata that helps when working with the dataset. See the mirdata documentation for details.
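The following is a minimal sketch of how the mirdata loader might be used. It assumes that mirdata is installed (`pip install mirdata`), that the loader is registered under the name `baf`, and that the data requested from Zenodo has already been unpacked locally. The helper names `load_baf` and `summarize` and the example path are hypothetical.

```python
def load_baf(data_home):
    """Initialise the BAF loader for data stored under `data_home`.

    mirdata is imported lazily so this module can be imported
    even when mirdata is not installed.
    """
    import mirdata  # third-party dependency

    # Assumption: the loader name is "baf"; check the mirdata docs.
    return mirdata.initialize("baf", data_home=data_home)


def summarize(dataset):
    """Return a small summary of any mirdata-style dataset object."""
    track_ids = list(dataset.track_ids)
    return {"num_tracks": len(track_ids), "first_ids": track_ids[:3]}


# Example usage (the path is a placeholder for your local copy):
#
#   dataset = load_baf("/path/to/baf")
#   dataset.validate()          # checks local files against the index
#   print(summarize(dataset))
```

Because the dataset is distributed upon request, the sketch does not call any download routine and only reads from a local folder.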

License

The code in this repository is licensed under Apache 2.0
Dataset license is detailed in Zenodo

Citation

Please cite the following publication when using the dataset:

Guillem Cortès, Alex Ciurana, Emilio Molina, Marius Miron, Owen Meyers, Joren Six, & Xavier Serra. (2022). BAF: An audio fingerprinting dataset for broadcast monitoring. Proceedings of the 23rd International Society for Music Information Retrieval Conference, pp. 908–916. 4-8 December 2022, Bengaluru, India.

Bibtex version:

@inproceedings{cortes2022BAF,
  author       = {Guillem Cortès and
                  Alex Ciurana and
                  Emilio Molina and
                  Marius Miron and
                  Owen Meyers and
                  Joren Six and
                  Xavier Serra},
  title        = {{BAF: An audio fingerprinting dataset for broadcast monitoring}},
  booktitle    = {{Proceedings of the 23rd International Society for Music Information Retrieval Conference}},
  year         = 2022,
  pages        = {908-916},
  publisher    = {ISMIR},
  address      = {Bengaluru, India},
  month        = dec,
  venue        = {Bengaluru, India},
  doi          = {10.5281/zenodo.7316812},
  url          = {https://doi.org/10.5281/zenodo.7372162}
}

Acknowledgements

This research is part of NextCore – New generation of music monitoring technology (RTC2019-007248-7), funded by the Spanish Ministerio de Ciencia e Innovación and the Agencia Estatal de Investigación. It has also received support from the Industrial Doctorates plan of the Secretaria d’universitats i Recerca, Departament d’Empresa i Coneixement de la Generalitat de Catalunya, grant agreement No. DI46-2020.

Attribution

Document icon created by iconmas - Flaticon

Database icon created by Bharat Icons - Flaticon

Youtube Logo from freepnglogos.com
