Adding loader for IDMT-SMT-AUDIO-EFFECTS (#595)
* idmt_smt_audio_effects dataset script and index
* idmt_smt_audio_effects loader
* idmt_smt_audio_effects tests and resources
* idmt_smt_audio_effects dataset added to docs
* black formatter
* fixing type error in mypy test
* loader docstring
* pytest fix: folder delete in dataloader
* remove download func, adding unpacking dirs
* fixed resources path
* formatting
* formatting
* added tests
* added tests
* fixing test
* fix dependencies in setup.py
* modified dataset tests and added custom track to test_loaders.py
* changed docstrings and exception handling
* docstrings
* GitHub Actions migration (#596)
* ADD formatting workflow
* FIX variables in formatting workflow
* ADD python linting workflow and environment
* ADD CI workflow and environment
* Remove CircleCI
* UPDATE new_loader.md PR template
* ADD readthedocs
* UPDATE readme badges
* FIX numpy asarray bug
* CHANGE arg name due to librosa update
* REMOVE tox.ini
* UPDATE dependencies for test
* ADD dependencies
* dependencies..
* dependencies..
* dependencies..
* dependencies..
* fix dependencies versions
* ADD all smart_open protocols install for CI
* MOVE smart_open[all] to pip install
* INSTALL types to pass mypy
* install types-pyaml for python linting test
* Change to work with music21 v9.*
* TEST Environment CI with no restrictions. python3.10 test passing in local
* FIX ikala test to pass linux tests
* FIX normpath for windows tests
* BLACK
* Change assert tolerance for floats in test_ikala. The motivation of this change is that Linux and macOS return different floats: macOS returns 260.946404518887 while Linux returns 260.94640451888694, so we adjust the tolerance of the test.
* Remove windows CI test
* Assert modification forgot in the last commit
* Specifying packages versions on environment yml
* Fix h5py version for python3.7
* CI test dependencies fixed at the versions of last passing test
* CI test dependencies without lowerbound
* CI test dependencies that should work
* sort dependencies by alphabetical order
* Update setup dependencies
* Update test-lint dependencies
* Update contributing docs to match new testing pipeline
* Remove comment from test_ikala. This comment was showing the assertion test done before PR #596
* Set dependencies packages versions for docs
* Remove comment
* jams get_duration handling
* Trigger tests after CircleCI removing

---------

Co-authored-by: Magdalena Fuentes <[email protected]>

* Update badges url (#598)
* Update badges url
* Trigger doc build again
* metadata exception, whitespaces in table.rst
* fixing table.rst
* fixing mirdata.rst and adding references to quick_reference.rst
* increasing test coverage
* adding corrupted xml file for testing
* modified metadata logic for xml files
* removed general exception
* removing FileNotFoundError, changing dirs for _ and moving Cached Properties to Attributes
* revert to FileNotFoundError and test

---------

Co-authored-by: Magdalena Fuentes <[email protected]>
Co-authored-by: Genís Plaja-Roglans <[email protected]>
Co-authored-by: Guillem Cortès <[email protected]>
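The float discrepancy behind the test_ikala tolerance change can be illustrated with a small sketch. It uses only the two values quoted in the commit message. The use of `math.isclose` and the tolerance `1e-9` are assumptions for illustration and may differ from what the actual test does.

```python
import math

# The two values quoted in the commit message.
MACOS_VALUE = 260.946404518887
LINUX_VALUE = 260.94640451888694

# Exact equality fails because the two results differ in the last digits.
exactly_equal = MACOS_VALUE == LINUX_VALUE

# A tolerance-based comparison treats both as the same result.
# rel_tol=1e-9 is an illustrative choice, not the value used in the test suite.
close_enough = math.isclose(MACOS_VALUE, LINUX_VALUE, rel_tol=1e-9)

print(exactly_equal, close_enough)
```

Comparing with a tolerance keeps the test meaningful while ignoring platform-dependent rounding in the last bits.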
1 parent ef72b2c, commit afbc0c3
Showing 11 changed files with 334,968 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,285 @@ | ||
"""IDMT-SMT-Audio-Effects Dataset Loader | ||
.. admonition:: Dataset Info | ||
:class: dropdown | ||
IDMT-SMT-Audio-Effects is a large database for automatic detection of audio effects in recordings of electric guitar and bass and | ||
related signal processing. The overall duration of the audio material is approx. 30 hours. | ||
The dataset consists of 55044 WAV files (44.1 kHz, 16-bit, mono) with single recorded notes: | ||
20592 monophonic bass notes | ||
20592 monophonic guitar notes | ||
13860 polyphonic guitar sounds | ||
Overall, 11 different audio effects are incorporated: | ||
feedback delay, slapback delay, reverb, chorus, flanger, phaser, tremolo, vibrato, | ||
distortion, overdrive, no effect (unprocessed notes/sounds) | ||
2 different electric guitars and 2 different electric bass guitars, each with two different pick-up settings and | ||
up to three different plucking styles (finger plucked - hard, finger plucked - soft, picked) were used for recording. | ||
The notes cover the common pitch range of a 4-string bass guitar from E1 (41.2 Hz) to G3 (196.0 Hz) or the common | ||
pitch range of a 6-string electric guitar from E2 (82.4 Hz) to E5 (659.3 Hz). | ||
Effects processing was performed using a digital audio workstation and a variety of mostly freely available effect | ||
plugins. | ||
To organize the database, lists in XML format are used, which record all relevant information and are provided with | ||
the database as well as a summary of the used effect plugins and parameter settings. | ||
In addition, most of this information is also encoded in the first part of the file name of the audio files using | ||
a simple alpha-numeric encoding scheme. The second part of the file name contains unique identification numbers. | ||
This provides an option for fast and flexible structuring of the data for various purposes. | ||
DOI | ||
10.5281/zenodo.7544032 | ||
""" | ||
import os | ||
import librosa | ||
import numpy as np | ||
import xml.etree.ElementTree as ET | ||
|
||
from deprecated.sphinx import deprecated | ||
from typing import BinaryIO, Tuple, Optional | ||
from mirdata import download_utils, jams_utils, core, io | ||
from smart_open import open | ||
|
||
BIBTEX = """ | ||
@dataset{stein_michael_2023_7544032, | ||
author = {Stein, Michael}, | ||
title = {IDMT-SMT-Audio-Effects Dataset}, | ||
month = jan, | ||
year = 2023, | ||
publisher = {Zenodo}, | ||
version = {1.0.0}, | ||
doi = {10.5281/zenodo.7544032}, | ||
url = {https://doi.org/10.5281/zenodo.7544032} | ||
} | ||
""" | ||
|
||
INDEXES = { | ||
"default": "1.0", | ||
"test": "1.0", | ||
"1.0": core.Index(filename="idmt_smt_audio_effects_index.json"), | ||
} | ||
|
||
REMOTES = { | ||
"full_dataset": download_utils.RemoteFileMetadata( | ||
filename="IDMT-SMT-AUDIO-EFFECTS.zip", | ||
url="https://zenodo.org/record/7544032/files/IDMT-SMT-AUDIO-EFFECTS.zip?download=1", | ||
checksum="91e845a1b347352993ebd5ba948d5a7c", # the md5 checksum | ||
destination_dir=".", # relative path for where to unzip the data, or None | ||
unpack_directories=[""], | ||
), | ||
} | ||
|
||
DOWNLOAD_INFO = """ | ||
This loader will create the following folders in the dataset data_home path: | ||
> idmt_smt_audio_effects/ | ||
> Bass monophon/ | ||
> Bass monophon2/ | ||
> Gitarre monophon/ | ||
> Gitarre monophon2/ | ||
> Gitarre polyphon/ | ||
> Gitarre polyphon2/ | ||
""" | ||
|
||
LICENSE_INFO = """ | ||
Creative Commons BY-NC-ND 4.0. | ||
https://creativecommons.org/licenses/by-nc-nd/4.0/ | ||
""" | ||
|
||
|
||
class Track(core.Track): | ||
"""IDMT-SMT-Audio-Effects track class. | ||
Args: | ||
track_id (str): track id of the track. | ||
data_home (str): Local path where the dataset is stored. | ||
dataset_name (str): Name of the dataset. | ||
index (Dict): Index dictionary. | ||
metadata (Dict): Metadata dictionary. | ||
Attributes: | ||
audio_path (str): path to audio file. | ||
instrument (str): instrument used to record the track. | ||
midi_nr (int): midi number of the note. | ||
fx_group (int): effect group number. | ||
fx_type (int): effect type number. | ||
fx_setting (int): effect setting number. | ||
""" | ||
|
||
def __init__(self, track_id, data_home, dataset_name, index, metadata): | ||
super().__init__( | ||
track_id, | ||
data_home, | ||
dataset_name, | ||
index, | ||
metadata, | ||
) | ||
""" | ||
Args: | ||
track_id (str): track id of the track | ||
data_home (str): Local path where the dataset is stored. If `None`, looks for the data in the default directory, `~/mir_datasets/idmt_smt_audio_effects` | ||
dataset_name (str): Name of the dataset. | ||
index (Dict): Index dictionary. | ||
metadata (Dict): Metadata dictionary. | ||
""" | ||
self.audio_path = self.get_path("audio") | ||
|
||
@property | ||
def instrument(self): | ||
return self._track_metadata["instrument"] | ||
|
||
@property | ||
def midi_nr(self): | ||
return self._track_metadata["midi_nr"] | ||
|
||
@property | ||
def fx_group(self): | ||
return self._track_metadata["fx_group"] | ||
|
||
@property | ||
def fx_type(self): | ||
return self._track_metadata["fx_type"] | ||
|
||
@property | ||
def fx_setting(self): | ||
return self._track_metadata["fx_setting"] | ||
|
||
@property | ||
def audio(self) -> Optional[Tuple[np.ndarray, float]]: | ||
"""The track's audio | ||
Returns: | ||
* np.ndarray - audio signal | ||
* float - sample rate | ||
""" | ||
try: | ||
return load_audio(self.audio_path) | ||
except FileNotFoundError: | ||
raise FileNotFoundError( | ||
f"Audio file {self.audio_path} not found. Did you run .download?" | ||
) | ||
|
||
def to_jams(self): | ||
"""Get the track's data in jams format | ||
Returns: | ||
jams.JAMS: the track's data in jams format | ||
""" | ||
return jams_utils.jams_converter( | ||
audio_path=self.audio_path, | ||
metadata=self._track_metadata, | ||
) | ||
|
||
|
||
# no decorator here because of https://github.com/librosa/librosa/issues/1267 | ||
def load_audio(fhandle: BinaryIO) -> Tuple[np.ndarray, float]: | ||
"""Load an IDMT-SMT-Audio-Effects track. | ||
Args: | ||
fhandle (Union[str, BinaryIO]): Path to audio file or file-like object. | ||
Returns: | ||
* np.ndarray - the mono audio signal | ||
* float - The sample rate of the audio file | ||
""" | ||
return librosa.load(fhandle, sr=44100, mono=True) | ||
|
||
|
||
@core.docstring_inherit(core.Dataset) | ||
class Dataset(core.Dataset): | ||
"""The IDMT-SMT-Audio-Effects dataset. | ||
Args: | ||
data_home (str): Directory where the dataset is located or will be downloaded. | ||
version (str): Dataset version. Default is "default". | ||
Attributes: | ||
name (str): Name of the dataset. | ||
track_class (Type[core.Track]): Track type. | ||
bibtex (str): BibTeX citation. | ||
indexes (Dict[str, core.Index]): Available versions. | ||
remotes (Dict[str, download_utils.RemoteFileMetadata]): Data to be downloaded. | ||
download_info (str): Instructions for downloading the dataset. | ||
license_info (str): Dataset license. | ||
""" | ||
|
||
def __init__(self, data_home=None, version="default"): | ||
super().__init__( | ||
data_home, | ||
version, | ||
name="idmt_smt_audio_effects", | ||
track_class=Track, | ||
bibtex=BIBTEX, | ||
indexes=INDEXES, | ||
remotes=REMOTES, | ||
download_info=DOWNLOAD_INFO, | ||
license_info=LICENSE_INFO, | ||
) | ||
|
||
@core.cached_property | ||
def _metadata(self): | ||
"""Return a dictionary containing metadata information parsed from XML files. | ||
Returns: | ||
dict: A dictionary containing metadata information parsed from XML files. | ||
Raises: | ||
FileNotFoundError: If no XML metadata files are found in data_home. | ||
ValueError: If there's an error parsing an XML file. | ||
""" | ||
# Placeholder entry listing the expected type of each metadata field. | ||
metadata = { | ||
"fileID": { | ||
"list_id": str, | ||
"instrument": str, | ||
"midi_nr": int, | ||
"fx_group": int, | ||
"fx_type": int, | ||
"fx_setting": int, | ||
} | ||
} | ||
|
||
xml_files_count = 0 | ||
|
||
for root, _, files in os.walk(self.data_home): | ||
for file in files: | ||
if file.endswith(".xml"): | ||
xml_files_count += 1 | ||
xml_path = os.path.join(root, file) | ||
try: | ||
with open(xml_path, "r") as fhandle: | ||
tree = ET.parse(fhandle) | ||
|
||
except ET.ParseError: | ||
raise ValueError( | ||
f"Error parsing XML file {xml_path}. The file may be corrupted or not available; make sure you have all files." | ||
) | ||
|
||
root_xml = tree.getroot() | ||
listID = root_xml.find("listinformation/listID").text | ||
|
||
for audiofile in root_xml.findall("audiofile"): | ||
name = audiofile.find("fileID").text | ||
instrument = audiofile.find("instrument").text | ||
midinr = audiofile.find("midinr").text | ||
fxgroup = audiofile.find("fxgroup").text | ||
fxtype = audiofile.find("fxtype").text | ||
fxsetting = audiofile.find("fxsetting").text | ||
|
||
metadata[name] = { | ||
"list_id": listID, | ||
"instrument": instrument, | ||
"midi_nr": int(midinr), | ||
"fx_group": int(fxgroup), | ||
"fx_type": int(fxtype), | ||
"fx_setting": int(fxsetting), | ||
} | ||
|
||
if xml_files_count == 0: | ||
raise FileNotFoundError( | ||
f"No XML files found in {self.data_home}. Did you run .download?" | ||
) | ||
return metadata |
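The XML handling in `_metadata` can be tried in isolation with the standard library. The following is a minimal sketch, not the loader itself: the element names follow the lookups in the code above, while the root tag `audiolist`, the sample values, and the helper name `parse_list` are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML list. Element names mirror the lookups in `_metadata`;
# the root tag and all values are made up for this example.
SAMPLE_XML = """
<audiolist>
  <listinformation>
    <listID>ExampleList</listID>
  </listinformation>
  <audiofile>
    <fileID>example-0001</fileID>
    <instrument>guitar</instrument>
    <midinr>40</midinr>
    <fxgroup>1</fxgroup>
    <fxtype>11</fxtype>
    <fxsetting>1</fxsetting>
  </audiofile>
</audiolist>
"""


def parse_list(xml_text):
    """Parse one XML list into a {fileID: metadata} dictionary."""
    try:
        root = ET.fromstring(xml_text.strip())
    except ET.ParseError as err:
        # Like the loader, report malformed XML as a ValueError.
        raise ValueError("Error parsing XML list") from err

    list_id = root.find("listinformation/listID").text
    metadata = {}
    for audiofile in root.findall("audiofile"):
        metadata[audiofile.find("fileID").text] = {
            "list_id": list_id,
            "instrument": audiofile.find("instrument").text,
            "midi_nr": int(audiofile.find("midinr").text),
            "fx_group": int(audiofile.find("fxgroup").text),
            "fx_type": int(audiofile.find("fxtype").text),
            "fx_setting": int(audiofile.find("fxsetting").text),
        }
    return metadata


parsed = parse_list(SAMPLE_XML)
print(parsed["example-0001"])
```

Passing truncated input such as `"<audiolist><unclosed>"` to `parse_list` raises `ValueError`, which is the same behavior the loader aims for with corrupted metadata files.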
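The pitch ranges quoted in the module docstring can be related to the `midi_nr` attribute through the standard equal-temperament conversion. This is a sketch under the assumption that `midi_nr` follows the usual MIDI convention (note 69 = A4 = 440 Hz), which the source does not state explicitly. `midi_to_hz` is a hypothetical helper and not part of the loader.

```python
def midi_to_hz(midi_nr, a4_hz=440.0):
    """Convert a MIDI note number to frequency in Hz (assumes A4 = note 69)."""
    return a4_hz * 2.0 ** ((midi_nr - 69) / 12.0)


# Note numbers for the range limits named in the docstring,
# assuming the common convention in which C4 is note 60.
E1, G3, E2, E5 = 28, 55, 40, 76

bass_range = (round(midi_to_hz(E1), 1), round(midi_to_hz(G3), 1))
guitar_range = (round(midi_to_hz(E2), 1), round(midi_to_hz(E5), 1))

print(bass_range)    # (41.2, 196.0)
print(guitar_range)  # (82.4, 659.3)
```

Under that assumption the computed values match the frequencies given in the dataset description for the bass and guitar ranges.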