Urban sounds classification with Convolutional Neural Networks

The objective of this project is to implement CNN models that recognize sound events from the UrbanSound8K dataset. The work is divided into the following notebooks:

  1. Data analysis (with a brief review of related papers)
  2. Pre-processing and feature evaluation
  3. CNN model with MFCC
  4. CNN model with Log-MEL Spectrograms
  5. Data augmentation
  6. Data augmentation pre-processing
  7. CNN model with augmented data (Log-MEL Spectrograms)

Notebooks

  1. Data analysis: a brief review of previous work with the UrbanSound8K dataset (scientific papers), dataset exploration, distribution analysis and listening.

  2. Pre-processing: an introduction to the different audio features we can use when working with digital audio, the pre-processing pipeline, STFT, MFCC and Log-MEL spectrograms, feature extraction and data normalization (a minimal extraction sketch appears after this list).

  3. CNN model with MFCC features: data preparation, CNN model definition (with a detailed explanation) using Keras with the TensorFlow back-end, solution of a multi-class classification problem, model evaluation and testing, and Recall, Precision and F1 analysis (see the model sketch after this list).

  4. CNN model with Log-MEL spectrograms: a performance comparison using the same CNN architecture with MEL spectrograms, and the same training and evaluation procedure as in notebook #3.

  5. Data augmentation: creation of augmented data from the original UrbanSound8K sounds, using common audio effects such as pitch shifting, time stretching and added noise with LibROSA (see the augmentation sketch after this list).

  6. Augmented pre-processing: audio feature extraction from the newly generated data.

  7. CNN model with augmented data: the same CNN architecture and an almost identical training procedure applied to the augmented data, followed by model evaluation and testing to compare against the previous results.
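The following sketches illustrate these steps in miniature. They are not the notebooks' exact code; file names, feature sizes and layer widths are illustrative assumptions.

For the pre-processing of notebook #2, a minimal feature-extraction sketch, assuming LibROSA is installed:

import librosa
import numpy as np

# Hypothetical example slice; any UrbanSound8K file works the same way.
path = "UrbanSound8K/audio/fold1/101415-3-0-2.wav"
y, sr = librosa.load(path, sr=22050)  # decode and resample to a fixed rate

# Log-MEL spectrogram: a mel filterbank over the STFT power, then dB scaling.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
log_mel = librosa.power_to_db(mel)

# MFCCs: a compact, decorrelated summary of the mel spectrum.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)

# Simple standardization before feeding a CNN.
log_mel = (log_mel - log_mel.mean()) / (log_mel.std() + 1e-8)

For the models of notebooks #3 and #4, a small Keras CNN of the general kind described above; the input shape assumes fixed-size 40x174 single-channel feature patches:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(40, 174, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Dropout(0.3),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),  # one output per UrbanSound8K class
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # multi-class classification
              metrics=["accuracy"])

And for the augmentation of notebook #5, the three effects mentioned above, with illustrative amounts:

import numpy as np
import librosa

y, sr = librosa.load("UrbanSound8K/audio/fold1/101415-3-0-2.wav")  # hypothetical file

pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # up two semitones
stretched = librosa.effects.time_stretch(y, rate=1.1)       # 10% faster
noisy = y + 0.005 * np.random.randn(len(y))                 # additive Gaussian noise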

Getting the dataset

Download a copy of the UrbanSound8K dataset from the UrbanSound8K home page.

Uncompress the dataset into the project root; you should end up with a directory named "UrbanSound8K" (or a symbolic link to it) there.

Install required libraries

Make sure that TensorFlow, Keras, LibROSA, IPython, NumPy, pandas, Matplotlib and scikit-learn are installed in your environment.

Note that we are using TensorFlow as the Keras back-end; you must set this in your ~/.keras/keras.json file. For example:

{
    "image_dim_ordering": "tf",
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
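A quick sanity check (a hypothetical snippet, not part of the notebooks) that Keras picked up the TensorFlow back-end and data format:

import keras.backend as K

print(K.backend())            # expect "tensorflow"
print(K.image_data_format())  # expect "channels_last"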

The UrbanSound8K dataset

The UrbanSound8K dataset is a compilation of urban sound recordings classified into 10 categories, following the paper "A Dataset and Taxonomy for Urban Sound Research", which proposes a taxonomical categorization of environmental sound types.

The UrbanSound8K dataset contains 8732 labeled sound slices of varying duration, up to 4 seconds each. The class labels are:

  1. Air Conditioner
  2. Car Horn
  3. Children Playing
  4. Dog Bark
  5. Drilling
  6. Engine Idling
  7. Gun Shot
  8. Jackhammer
  9. Siren
  10. Street Music

Note that the dataset comes already organized into 10 validation folds. If we want to compare our results with other published work, we should stick to this fold schema (a split sketch follows the metadata description below).

Dataset metadata

The included metadata file ("UrbanSound8K/metadata/metadata.csv") provides all the required information about each audio file:

  • slice_file_name: The name of the audio file.
  • fsID: The Freesound ID of the recording from which this excerpt (slice) is taken.
  • start: The start time of the slice in the original Freesound recording.
  • end: The end time of the slice in the original Freesound recording.
  • salience: A (subjective) salience rating of the sound. 1 = foreground, 2 = background.
  • fold: The fold number (1-10) to which this file has been allocated.
  • classID: A numeric identifier of the sound class.
  • class: The class label name.
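A minimal sketch, assuming pandas is installed, of loading this metadata and building the leave-one-fold-out split described above (the CSV path mirrors the one given in this README):

import pandas as pd

meta = pd.read_csv("UrbanSound8K/metadata/metadata.csv")
print(meta["class"].value_counts())  # class distribution over the 8732 slices

# Leave-one-fold-out evaluation: train on nine folds, test on the held-out one.
for test_fold in range(1, 11):
    train = meta[meta["fold"] != test_fold]
    test = meta[meta["fold"] == test_fold]
    # ... extract features for each split, then train and evaluate the model

Training once per fold and averaging the ten test scores is what makes results comparable with the published baselines.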

References

1. Data analysis

2. Data pre-processing

4. Model optimization

5. Related papers

  • Environmental sound classification with convolutional neural networks, Karol J. Piczak

  • Dilated convolution neural network with LeakyReLU for environmental sound classification, Xiaohu Zhang, Yuexian Zou, Wei Shi

  • Deep Convolutional Neural Network with Mixup for Environmental Sound Classification, Zhichao Zhang, Shugong Xu, Shan Cao, Shunqing Zhang

  • End-to-End Environmental Sound Classification Using a 1D Convolutional Neural Network, Sajjad Abdoli, Patrick Cardinal, Alessandro Lameiras Koerich

  • An Ensemble Stacked Convolutional Neural Network Model for Environmental Event Sound Recognition, Shaobo Li, Yong Yao, Jie Hu, Guokai Liu, Xuemei Yao, Jianjun Hu

  • Classifying environmental sounds using image recognition networks, Venkatesh Boddapati, Andrej Petef, Jim Rasmusson, Lars Lundberg

  • Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion, Yu Su, Ke Zhang, Jingyu Wang, Kurosh Madani

Comments, suggestions and corrections are always welcome.
