Adding loader for IDMT-SMT-AUDIO-EFFECTS (#595)

* idmt_smt_audio_effects dataset script and index
* idmt_smt_audio_effects loader
* idmt_smt_audio_effects tests and resources
* idmt_smt_audio_effects dataset added to docs
* black formatter
* fixing type error in mypy test
* loader docstring
* pytest fix: folder delete in dataloader
* remove download func, adding unpacking dirs
* fixed resources path
* formatting
* formatting
* added tests
* added tests
* fixing test
* fix dependencies in setup.py
* modified dataset tests and added custom track to test_loaders.py
* changed docstrings and exception handling
* docstrings
* GitHub Actions migration (#596)
* ADD formatting workflow
* FIX variables in formatting workflow
* ADD python linting workflow and environment
* ADD CI workflow and environment
* Remove CircleCI
* UPDATE new_loader.md PR template
* ADD readthedocs
* UPDATE readme badges
* FIX numpy asarray bug
* CHANGE arg name due to librosa update
* REMOVE tox.ini
* UPDATE dependencies for test
* ADD dependencies
* dependencies..
* dependencies..
* dependencies..
* dependencies..
* fix dependencies versions
* ADD all smart_open protocols install for CI
* MOVE smart_open[all] to pip install
* INSTALL types to pass mypy
* install types-pyaml for python linting test
* Change to work with music21 v9.*
* TEST Environment CI with no restrictions. python3.10 test passing locally
* FIX ikala test to pass linux tests
* FIX normpath for windows tests
* BLACK
* Change assert tolerance for floats in test_ikala
  The motivation for this change is that Linux and macOS return different floats. macOS returns 260.946404518887 while Linux returns 260.94640451888694, so we adjust the tolerance of the test.
* Remove windows CI test
* Assert modification forgotten in the last commit
* Specifying package versions in environment yml
* Fix h5py version for python3.7
* CI test dependencies fixed at the versions of last passing test
* CI test dependencies without lowerbound
* CI test dependencies that should work
* sort dependencies by alphabetical order
* Update setup dependencies
* Update test-lint dependencies
* Update contributing docs to match new testing pipeline
* Remove comment from test_ikala
  This comment was showing the assertion test done before PR #596.
* Set dependencies packages versions for docs
* Remove comment
* jams get_duration handling
* Trigger tests after CircleCI removal

---------

Co-authored-by: Magdalena Fuentes <[email protected]>

* Update badges url (#598)
* Update badges url
* Trigger doc build again
* metadata exception, whitespaces in table.rst
* fixing table.rst
* fixing mirdata.rst and adding references to quick_reference.rst
* increasing test coverage
* adding corrupted xml file for testing
* modified metadata logic for xml files
* removed general exception
* removing FileNotFoundError, changing dirs for _ and moving Cached Properties to Attributes
* revert to FileNotFoundError and test

---------

Co-authored-by: Magdalena Fuentes <[email protected]>
Co-authored-by: Genís Plaja-Roglans <[email protected]>
Co-authored-by: Guillem Cortès <[email protected]>
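
The float-tolerance change for test_ikala described in the commit message can be illustrated with a minimal sketch. It assumes a relative-tolerance comparison through Python's `math.isclose`; the actual assertion used in the test suite is not shown on this page, and the names `MACOS_VALUE`, `LINUX_VALUE`, and `values_match` are hypothetical.

```python
import math

# Values quoted in the commit message for the same computation on two platforms.
MACOS_VALUE = 260.946404518887
LINUX_VALUE = 260.94640451888694


def values_match(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Compare two floats with a relative tolerance instead of exact equality.

    The default rel_tol is an assumption chosen for illustration only.
    """
    return math.isclose(a, b, rel_tol=rel_tol)


if __name__ == "__main__":
    # The two platform results agree within the tolerance.
    print(values_match(MACOS_VALUE, LINUX_VALUE))
```

Comparing with a tolerance keeps the test meaningful while ignoring differences in the last few digits that come from platform-specific floating-point behavior.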
mir-dataset-loaders · Nov 2, 2023 · afbc0c3
1 parent ef72b2c
Showing 11 changed files with 334,968 additions and 1 deletion.
diff --git a/docs/source/mirdata.rst b/docs/source/mirdata.rst
@@ -219,6 +219,14 @@ haydn_op20
    :inherited-members:
 
 
+idmt_smt_audio_effects
+^^^^^^^^^^^^^^^^^^^^^^
+
+.. automodule:: mirdata.datasets.idmt_smt_audio_effects
+   :members:
+   :inherited-members:
+
+
 ikala
 ^^^^^
 

diff --git a/docs/source/quick_reference.rst b/docs/source/quick_reference.rst
@@ -79,6 +79,13 @@ Musical pitch contours, typically encoded as time series indicating the musical
 The time series typically have evenly spaced timestamps, each with a corresponding pitch value
 which may be encoded in a number of formats/granularities, including midi note numbers and Hertz.
 
+.. _fx:
+
+Effect
+^^^^^^
+Effect applied to a track. It may refer to the effect applied to a single stroke or an entire track. 
+It can include the effect name, the effect type, the effect parameters, and the effect settings.
+
 .. _genre:
 
 Genre
@@ -219,7 +226,7 @@ strings, sometimes with associated weights/confidences.
 .. _tonic:
 
 Tonic
-^^^^^^^^^^^
+^^^^^
 The absolute tonic of a track. It may refer to the tonic a single stroke, or the tonal center of
 a track.
 

diff --git a/docs/source/table.rst b/docs/source/table.rst
@@ -281,6 +281,16 @@
      - .. image:: https://licensebuttons.net/l/by-nc-sa/4.0/80x15.png
           :target: https://creativecommons.org/licenses/by-nc-sa/4.0
 
+   * - IDMT-SMT-Audio-Effects
+     - - audio: ✅
+       - annotations: ✅
+     - - instrument :ref:`instruments`
+       - midi nr :ref:`notes`
+       - metadata :ref:`fx`
+     - 55044
+     - .. image:: https://licensebuttons.net/l/by-nc-nd/4.0/80x15.png
+          :target: https://creativecommons.org/licenses/by-nc-nd/4.0/
+
    * - IRMAS
      - - audio: ✅
        - annotations: ✅

diff --git a/mirdata/datasets/idmt_smt_audio_effects.py b/mirdata/datasets/idmt_smt_audio_effects.py
@@ -0,0 +1,285 @@
+"""IDMT-SMT-Audio-Effects Dataset Loader
+
+.. admonition:: Dataset Info
+    :class: dropdown
+
+    IDMT-SMT-Audio-Effects is a large database for automatic detection of audio effects in recordings of electric guitar and bass and
+    related signal processing. The overall duration of the audio material is approx. 30 hours.
+
+    The dataset consists of 55044 WAV files (44.1 kHz, 16-bit, mono) with single recorded notes:
+
+    20592 monophonic bass notes
+    20592 monophonic guitar notes
+    13860 polyphonic guitar sounds
+    Overall, 11 different audio effects are incorporated:
+    feedback delay, slapback delay, reverb, chorus, flanger, phaser, tremolo, vibrato, 
+    distortion, overdrive, no effect (unprocessed notes/sounds)
+
+    2 different electric guitars and 2 different electric bass guitars, each with two different pick-up settings and
+    up to three different plucking styles (finger plucked - hard, finger plucked - soft, picked) were used for recording.
+    The notes cover the common pitch range of a 4-string bass guitar from E1 (41.2 Hz) to G3 (196.0 Hz) or the common
+    pitch range of a 6-string electric guitar from E2 (82.4 Hz) to E5 (659.3 Hz).
+    Effects processing was performed using a digital audio workstation and a variety of mostly freely available effect
+    plugins.
+
+    To organize the database, lists in XML format are used, which record all relevant information and are provided with
+    the database as well as a summary of the used effect plugins and parameter settings.
+
+    In addition, most of this information is also encoded in the first part of the file name of the audio files using 
+    a simple alpha-numeric encoding scheme. The second part of the file name contains unique identification numbers. 
+    This provides an option for fast and flexible structuring of the data for various purposes.
+
+    DOI
+    10.5281/zenodo.7544032
+"""
+import os
+import librosa
+import numpy as np
+import xml.etree.ElementTree as ET
+
+from deprecated.sphinx import deprecated
+from typing import BinaryIO, Tuple, Optional
+from mirdata import download_utils, jams_utils, core, io
+from smart_open import open
+
+BIBTEX = """
+@dataset{stein_michael_2023_7544032,
+  author       = {Stein, Michael},
+  title        = {IDMT-SMT-Audio-Effects Dataset},
+  month        = jan,
+  year         = 2023,
+  publisher    = {Zenodo},
+  version      = {1.0.0},
+  doi          = {10.5281/zenodo.7544032},
+  url          = {https://doi.org/10.5281/zenodo.7544032}
+}
+"""
+
+INDEXES = {
+    "default": "1.0",
+    "test": "1.0",
+    "1.0": core.Index(filename="idmt_smt_audio_effects_index.json"),
+}
+
+REMOTES = {
+    "full_dataset": download_utils.RemoteFileMetadata(
+        filename="IDMT-SMT-AUDIO-EFFECTS.zip",
+        url="https://zenodo.org/record/7544032/files/IDMT-SMT-AUDIO-EFFECTS.zip?download=1",
+        checksum="91e845a1b347352993ebd5ba948d5a7c",  # the md5 checksum
+        destination_dir=".",  # relative path for where to unzip the data, or None
+        unpack_directories=[""],
+    ),
+}
+
+DOWNLOAD_INFO = """
+        This loader will create the following folders in the dataset data_home path:
+            > idmt_smt_audio_effects/
+                > Bass monophon/
+                > Bass monophon2/
+                > Gitarre monophon/
+                > Gitarre monophon2/
+                > Gitarre polyphon/
+                > Gitarre polyphon2/
+"""
+
+LICENSE_INFO = """
+Creative Commons BY-NC-ND 4.0.
+https://creativecommons.org/licenses/by-nc-nd/4.0/
+"""
+
+
+class Track(core.Track):
+    """IDMT-SMT-Audio-Effects track class.
+
+    Args:
+        track_id (str): track id of the track.
+        data_home (str): Local path where the dataset is stored.
+        dataset_name (str): Name of the dataset.
+        index (Dict): Index dictionary.
+        metadata (Dict): Metadata dictionary.
+
+    Attributes:
+        audio_path (str): path to audio file.
+        instrument (str): instrument used to record the track.
+        midi_nr (int): midi number of the note.
+        fx_group (int): effect group number.
+        fx_type (int): effect type number.
+        fx_setting (int): effect setting number.
+    """
+
+    def __init__(self, track_id, data_home, dataset_name, index, metadata):
+        super().__init__(
+            track_id,
+            data_home,
+            dataset_name,
+            index,
+            metadata,
+        )
+        """
+        Args:
+            track_id (str): track id of the track
+            data_home (str): Local path where the dataset is stored. If `None`, looks for the data in the default directory, `~/mir_datasets/idmt_smt_audio_effects`
+            dataset_name (str): Name of the dataset.
+            index (Dict): Index dictionary.
+            metadata (Dict): Metadata dictionary.
+        """
+        self.audio_path = self.get_path("audio")
+
+    @property
+    def instrument(self):
+        return self._track_metadata["instrument"]
+
+    @property
+    def midi_nr(self):
+        return self._track_metadata["midi_nr"]
+
+    @property
+    def fx_group(self):
+        return self._track_metadata["fx_group"]
+
+    @property
+    def fx_type(self):
+        return self._track_metadata["fx_type"]
+
+    @property
+    def fx_setting(self):
+        return self._track_metadata["fx_setting"]
+
+    @property
+    def audio(self) -> Optional[Tuple[np.ndarray, float]]:
+        """The track's audio
+
+        Returns:
+            * np.ndarray - audio signal
+            * float - sample rate
+        """
+        try:
+            return load_audio(self.audio_path)
+        except FileNotFoundError:
+            raise FileNotFoundError(
+                f"Audio file {self.audio_path} not found. Did you run .download?"
+            )
+
+    def to_jams(self):
+        """Get the track's data in jams format
+
+        Returns:
+            jams.JAMS: the track's data in jams format
+
+        """
+        return jams_utils.jams_converter(
+            audio_path=self.audio_path,
+            metadata=self._track_metadata,
+        )
+
+
+# no decorator here because of https://github.com/librosa/librosa/issues/1267
+def load_audio(fhandle: BinaryIO) -> Tuple[np.ndarray, float]:
+    """Load an IDMT-SMT-Audio-Effects track.
+
+    Args:
+        fhandle (Union[str, BinaryIO]): Path to audio file or file-like object.
+
+    Returns:
+        * np.ndarray - the mono audio signal
+        * float - The sample rate of the audio file
+    """
+    return librosa.load(fhandle, sr=44100, mono=True)
+
+
+@core.docstring_inherit(core.Dataset)
+class Dataset(core.Dataset):
+    """The IDMT-SMT-Audio-Effects dataset.
+
+    Args:
+        data_home (str): Directory where the dataset is located or will be downloaded.
+        version (str): Dataset version. Default is "default".
+
+    Attributes:
+        name (str): Name of the dataset.
+        track_class (Type[core.Track]): Track type.
+        bibtex (str): BibTeX citation.
+        indexes (Dict[str, core.Index]): Available versions.
+        remotes (Dict[str, download_utils.RemoteFileMetadata]): Data to be downloaded.
+        download_info (str): Instructions for downloading the dataset.
+        license_info (str): Dataset license.
+    """
+
+    def __init__(self, data_home=None, version="default"):
+        super().__init__(
+            data_home,
+            version,
+            name="idmt_smt_audio_effects",
+            track_class=Track,
+            bibtex=BIBTEX,
+            indexes=INDEXES,
+            remotes=REMOTES,
+            download_info=DOWNLOAD_INFO,
+            license_info=LICENSE_INFO,
+        )
+
+    @core.cached_property
+    def _metadata(self):
+        """Return a dictionary containing metadata information parsed from XML files.
+
+        Returns:
+            dict: A dictionary containing metadata information parsed from XML files.
+
+        Raises:
+            FileNotFoundError: If metadata file not found.
+            ValueError: If there's an error parsing the XML file.
+            Exception: For unexpected errors during processing.
+        """
+        metadata = dict()
+        metadata = {
+            "fileID": {
+                "list_id": str,
+                "instrument": str,
+                "midi_nr": int,
+                "fx_group": int,
+                "fx_type": int,
+                "fx_setting": int,
+            }
+        }
+
+        xml_files_count = 0
+
+        for root, _, files in os.walk(self.data_home):
+            for file in files:
+                if file.endswith(".xml"):
+                    xml_files_count += 1
+                    xml_path = os.path.join(root, file)
+                    try:
+                        with open(xml_path, "r") as fhandle:
+                            tree = ET.parse(fhandle)
+
+                    except ET.ParseError:
+                        raise ValueError(
+                            f"Error parsing XML file {xml_path}. The file may be corrupted or not available; make sure you have all files."
+                        )
+
+                    root_xml = tree.getroot()
+                    listID = root_xml.find("listinformation/listID").text
+
+                    for audiofile in root_xml.findall("audiofile"):
+                        name = audiofile.find("fileID").text
+                        instrument = audiofile.find("instrument").text
+                        midinr = audiofile.find("midinr").text
+                        fxgroup = audiofile.find("fxgroup").text
+                        fxtype = audiofile.find("fxtype").text
+                        fxsetting = audiofile.find("fxsetting").text
+
+                        metadata[name] = {
+                            "list_id": listID,
+                            "instrument": instrument,
+                            "midi_nr": int(midinr),
+                            "fx_group": int(fxgroup),
+                            "fx_type": int(fxtype),
+                            "fx_setting": int(fxsetting),
+                        }
+
+        if xml_files_count == 0:
+            raise FileNotFoundError(
+                f"No XML files found in {self.data_home}. Did you run .download?"
+            )
+        return metadata
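
The per-file parsing step inside `_metadata` can be tried in isolation with a minimal sketch using only the standard library. The element names (`listinformation/listID`, `audiofile`, `fileID`, `instrument`, `midinr`, `fxgroup`, `fxtype`, `fxsetting`) are taken from the loader above; the root element name, the sample values, and the helper name `parse_list_xml` are hypothetical and only serve as an illustration, not as the dataset's real contents.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical sample: the root tag and all values are made up for illustration.
SAMPLE_XML = """<audiofilelist>
  <listinformation>
    <listID>example_list</listID>
  </listinformation>
  <audiofile>
    <fileID>example-0001</fileID>
    <instrument>EGUI</instrument>
    <midinr>40</midinr>
    <fxgroup>1</fxgroup>
    <fxtype>11</fxtype>
    <fxsetting>1</fxsetting>
  </audiofile>
</audiofilelist>
"""


def parse_list_xml(xml_text: str) -> Dict[str, dict]:
    """Parse one list file into a mapping from fileID to its metadata."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Mirror the loader: surface malformed XML as a ValueError.
        raise ValueError("Error parsing XML content") from exc

    list_id = root.find("listinformation/listID").text
    metadata = {}
    for audiofile in root.findall("audiofile"):
        name = audiofile.find("fileID").text
        metadata[name] = {
            "list_id": list_id,
            "instrument": audiofile.find("instrument").text,
            # Numeric fields are stored as text in the XML and cast to int.
            "midi_nr": int(audiofile.find("midinr").text),
            "fx_group": int(audiofile.find("fxgroup").text),
            "fx_type": int(audiofile.find("fxtype").text),
            "fx_setting": int(audiofile.find("fxsetting").text),
        }
    return metadata


if __name__ == "__main__":
    print(parse_list_xml(SAMPLE_XML))
```

Keying the result by `fileID` matches how the `Track` properties look up `instrument`, `midi_nr`, and the `fx_*` fields through `_track_metadata`.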