
V. Smart Noise Detector: Software

Anni K. edited this page Aug 27, 2024 · 8 revisions

This page provides information about our smart noise detector software.


Tools & Libraries

Beyond the basic core functionality, the software relies on several additional libraries to provide key features:

WiFi (ESP32) and HTTPClient: These libraries are used for network communication, transmitting the noise level and predictions over Wi-Fi. The WiFi library handles the connection to a wireless network, while HTTPClient manages the HTTP requests sent to the TinyAIoT backend.

ESP-IDF I2S Driver Library: The ESP-IDF I2S driver library is used for audio capture. It configures the I2S peripheral, handles DMA transfers, and manages the low-level details of audio streaming. It offers more configuration options than the Arduino-ESP32 I2S library and allows the MEMS microphone to be connected via the GPIO pins.

Edge Impulse Inference SDK: The machine learning part of the project uses a custom Edge Impulse SDK library, DuoNoise_inferencing, which enables real-time machine learning inference on microcontrollers. The library was generated on the Edge Impulse platform and contains our trained noise recognition model. It also provides functions for feature extraction and model prediction.

File Structure

The software's source code is split into multiple files based on their purpose.

| File | Description |
|---|---|
| config.h | The file in which the network and other device settings are made. |
| debug.h | Contains a macro that includes or excludes debug code, depending on whether debug mode is enabled. |
| General_EI_dB_Wlan.ino | The main Arduino sketch file that coordinates the different components of the program. |
| microphone.h, microphone.cpp | Microphone-related functions, such as processing audio data, making predictions, and handling I2S communication. |
| wifi_manager.h, wifi_manager.cpp | Wi-Fi-related operations, such as connecting to a network and sending data to a server over Wi-Fi. |
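The conditional debug macro in debug.h can be pictured roughly as follows. This is a minimal sketch, not the project's actual file: the macro name DEBUG_PRINTF and the constant kDebugEnabled are hypothetical, and std::printf stands in for Serial.printf so the example also compiles off the device.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: config.h defines CONFIG_DEBUG_MODE. The fallback below only
// exists so that this sketch compiles on its own.
#ifndef CONFIG_DEBUG_MODE
#define CONFIG_DEBUG_MODE 1
#endif

// Hypothetical macro name; the real debug.h may use a different one.
// On the ESP32, std::printf would typically be replaced by Serial.printf.
#if CONFIG_DEBUG_MODE
// Debug mode on: forward the format string and arguments to printf.
#define DEBUG_PRINTF(...) std::printf(__VA_ARGS__)
#else
// Debug mode off: expand to an empty statement, so neither the call nor its
// format string ends up in the compiled firmware.
#define DEBUG_PRINTF(...) do { } while (0)
#endif

// Lets ordinary (non-preprocessor) code query the setting as well.
constexpr bool kDebugEnabled = (CONFIG_DEBUG_MODE != 0);
```

A call such as `DEBUG_PRINTF("dBFS: %.1f\n", level);` then costs nothing when debug mode is disabled. Note that in that case the arguments are not evaluated either, so they should not contain side effects the program relies on.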

Configurations

The device-specific configurations must be made in the file config.h.

| Name | Description |
|---|---|
| CONFIG_DEBUG_MODE | Integer that controls debug output. When set to 0, no debug messages are printed. Any non-zero value enables debug messages in the serial monitor. |
| WIFI_SSID | String containing the name (SSID) of the Wi-Fi network the device should connect to, exactly as it appears in the list of available networks. |
| WIFI_PASSWORD | String containing the password of the specified Wi-Fi network. |
| API_URL | String containing the URL of the backend server to which data is sent via HTTP POST requests. |
| AUTH_BEARER | String containing the secret API token used to authenticate the device when sending data to the backend. |
| PROJECT_ID | String containing a unique identifier of the project this device belongs to. The backend uses it to categorize and manage data. |
| SENSOR_ID | String containing a unique identifier of this specific device, allowing the backend to distinguish between data from multiple devices. |
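A filled-in config.h might look like the sketch below. All values are placeholders, and declaring the settings as #define macros is an assumption; keep whatever declaration style the real file uses and only replace the values.

```cpp
// config.h: sketch with placeholder values. Declaring the settings as
// #define macros is an assumption; adapt to the declarations in the real file.
#ifndef CONFIG_H
#define CONFIG_H

// 0 = no debug output; any non-zero value = debug messages on the serial monitor
#define CONFIG_DEBUG_MODE 0

// Wi-Fi credentials (SSID exactly as shown in the list of available networks)
#define WIFI_SSID     "your-network-name"
#define WIFI_PASSWORD "your-network-password"

// Backend connection: endpoint for the HTTP POST requests and the API token
#define API_URL       "https://your-backend.example/your-endpoint"
#define AUTH_BEARER   "your-api-token"

// Identifiers the backend uses to group and separate incoming data
#define PROJECT_ID    "your-project-id"
#define SENSOR_ID     "your-sensor-id"

#endif  // CONFIG_H
```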

Uploading the Code

This chapter explains how to compile the code and upload it onto the XIAO ESP32S3 microcontroller. First, install the Arduino IDE and add the ESP32 board package URL under "Additional boards manager URLs" in the Preferences. In the "Boards Manager" tab, search for "esp32" by Espressif Systems and change the installed version to 2.0.17. This is necessary because newer versions do not provide the <driver/i2s.h> header required for the external I2S microphone.

Download the General_EI_dB_Wlan folder and the ei-duonoise-arduino-1.0.2.zip file. Place the General_EI_dB_Wlan folder where you keep your other Arduino sketches. To import the custom machine learning library, go to "Sketch", then under "Include Library" select "Add .ZIP Library" and choose the downloaded ei-duonoise-arduino-1.0.2.zip.

After connecting the microcontroller to the computer, make sure PSRAM is enabled in the "Tools" menu. Then upload the code to the microcontroller with the Upload button (the right-pointing arrow) at the top of the IDE. If a "Ping timeout" message appears and crashes the IDE, switching to an older IDE version has worked around the problem for some users.

Inference Cycle

The software continuously classifies ambient noise using a pre-trained machine learning model and sends the results to a backend server over Wi-Fi. The system operates in a loop that handles data acquisition, inference, and network communication.

The process is divided into phases that are executed sequentially: initialization, followed by continuous data collection, inference, and result transmission. The cycle repeats indefinitely; no power-saving modes or deep sleep are implemented.

```mermaid
flowchart LR
start([Start]) --> connect_wifi[Connect to Wi-Fi]
connect_wifi --> check_wifi{Wi-Fi Connected?}
check_wifi -- "No" --> retry_wifi[Retry Wi-Fi Connection] --> check_wifi
check_wifi -- "Yes" --> init_microphone[Initialize Microphone]
init_microphone --> check_microphone{Microphone Initialized?}
check_microphone -- "No" --> error_init[Debug: Error, Microphone Initialization Failed]
check_microphone -- "Yes" --> debug_print[Debug: Print Inference Settings]
debug_print --> run_inference[Run Inference]
run_inference --> check_inference{Inference Successful?}
check_inference -- "No" --> error_inference[Debug: Error, Inference Failed] --> run_inference
check_inference -- "Yes" --> send_results[Send dBFS and Prediction over Wi-Fi] --> run_inference
```
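The transmission step sends the noise level in dBFS (decibels relative to full scale). To illustrate what that number means, here is a sketch of the common RMS-based definition, assuming signed 16-bit samples with a full-scale value of 32768. The function name rmsDbfs is hypothetical, and the actual computation in microphone.cpp (sample width, block size, RMS versus peak) may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Level of a block of signed 16-bit PCM samples in dBFS, computed from the
// RMS value relative to full scale (32768). With this definition a
// full-scale square wave reads 0 dBFS and a full-scale sine about -3 dBFS.
// Returns -infinity for an empty or completely silent block.
double rmsDbfs(const std::int16_t* samples, std::size_t count) {
    assert(samples != nullptr || count == 0);
    if (count == 0) {
        return -std::numeric_limits<double>::infinity();
    }
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const double x = samples[i] / 32768.0;  // normalize to [-1.0, 1.0)
        sumSquares += x * x;
    }
    const double rms = std::sqrt(sumSquares / static_cast<double>(count));
    if (rms == 0.0) {
        return -std::numeric_limits<double>::infinity();
    }
    return 20.0 * std::log10(rms);
}
```

Because the value is relative to the converter's full scale, values are at most about 0 and quieter sounds give more negative numbers; it is not a calibrated sound pressure level (dB SPL).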

Tiny Machine Learning

The model definition, the training process, and the data used can be reviewed in the Edge Impulse project.

Edge Impulse is a platform for creating small machine learning models. After uploading the data and specifying the desired layers, the model can easily be trained and converted into a compressed version that runs on the microcontroller, which meets the requirements of our project.

Given our time constraints, creating our own dataset was infeasible, so we used publicly available data from Kaggle. We first started with a dataset containing the two classes "screaming" and "not screaming". However, the latter contains samples of loud music, people talking, or clapping, which do not count as "no noise" for our purposes, so this dataset did not match our requirements. For the actual training, we used the ESC-50 dataset, which contains 40 labeled five-second recordings for each of its 50 classes (2,000 recordings in total). The classes can be grouped into five categories: animals, water sounds, human sounds, interior sounds, and urban sounds.

A small model cannot reliably distinguish that many classes, and since we only want the probability that noise is present, we grouped all classes into either noise or no noise. Animal sounds such as chirping birds or barking dogs, as well as weather sounds such as rain and thunder, were labeled as no noise, while human-made sounds such as screaming or clapping were labeled as noise. Although sounds such as door knocking or a toilet flushing will probably never be recorded by our sensor, we kept all classes and labeled them as well. With this relabeling, 24 classes were assigned to noise and 26 classes to no noise.

We trained the first versions of the model on this data, but during testing, performance was poor because the model constantly predicted noise. This likely happened because our microphone always picks up quiet background noise, while the training data contains passages with no amplitude at all. To reduce this effect, we added new samples recorded with our own microphone: we split our recordings into multiple overlapping five-second intervals, yielding 26 minutes (320 samples) of no-noise audio recorded with our hardware. In total, the dataset consists of 3 hours and 12 minutes of audio recordings.
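Splitting a long recording into overlapping five-second samples can be sketched as follows. The page does not state the overlap that was used, so the hop size in the example (half a window) and the 16 kHz sample rate are illustrative assumptions, and the function name overlappingWindows is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns [start, end) sample indices of every complete window of
// `windowLen` samples, advancing by `hop` samples each time. Overlap occurs
// whenever hop < windowLen. A trailing remainder shorter than one window
// is dropped.
std::vector<std::pair<std::size_t, std::size_t>> overlappingWindows(
        std::size_t totalLen, std::size_t windowLen, std::size_t hop) {
    assert(windowLen > 0 && hop > 0);
    std::vector<std::pair<std::size_t, std::size_t>> windows;
    for (std::size_t start = 0; start + windowLen <= totalLen; start += hop) {
        windows.emplace_back(start, start + windowLen);
    }
    return windows;
}
```

With a hop of half a window, a recording of length L yields floor((L - 5 s) / 2.5 s) + 1 samples, for example 23 five-second samples from one minute of audio instead of 12 without overlap. Overlap increases the number of samples, but neighboring samples share audio and are therefore not independent.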

Edge Impulse follows a two-step approach when working with the samples. First, features are extracted from the recording; these features are then used as the input of the neural network. We represent each five-second recording by 1152 features, obtained by applying 32 filters to 0.5-second frames with a 0.125-second stride.
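The feature count follows from the frame parameters. Assuming frames are counted as (window length - frame length) / stride, a five-second window gives (5 - 0.5) / 0.125 = 36 frames, and 36 frames × 32 filters = 1152 features. A convention that adds 1 (also counting the frame that ends exactly at the window boundary) would give 37 frames and 1184 features, so the formula below is chosen because it reproduces the 1152 stated above. The function names are hypothetical.

```cpp
#include <cassert>

// Number of frames in one window, assuming the count
// (window - frame) / stride without a trailing "+1".
// Durations are in milliseconds so the integer arithmetic is exact;
// integer division of non-negative values rounds down.
int numFrames(int windowMs, int frameMs, int strideMs) {
    assert(strideMs > 0 && frameMs > 0 && frameMs <= windowMs);
    return (windowMs - frameMs) / strideMs;
}

// One feature value per filter and frame.
int numFeatures(int windowMs, int frameMs, int strideMs, int numFilters) {
    return numFrames(windowMs, frameMs, strideMs) * numFilters;
}
```

With the settings above, `numFeatures(5000, 500, 125, 32)` evaluates to 1152. Lengthening the recording window, as suggested at the end of this page, increases the frame count and thus the number of input features proportionally.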

Due to the limited resources available, we use a very small neural network, described in the following and depicted below. It starts with a reshape layer that arranges the input into 32 columns, which correspond to the filters used for feature extraction. This is followed by a 1D convolution with 36 feature maps, a kernel size of 3, and two output layers; then a second convolution of the same kind with 72 feature maps; and finally a third convolution with 36 feature maps, a 1x1 kernel, and three output layers, which uses a dropout rate of 25% during training. The feature maps are flattened and fed into two dense layers with 32 and 8 neurons, respectively. Of the different structures we tried, this one achieved the highest validation accuracy, 78%. This is far from perfect, but it is an improvement over random guessing, which would achieve an accuracy of 50%. On the ESP32S3, running inference on a five-second recording with this architecture takes 7 milliseconds.

[Figure: architecture of the neural network]

There are several ways to improve the noise detection model. First, a dedicated dataset recorded with the microphone that will actually be used, and containing the sounds expected at the deployment site, would make the incoming data much more similar to the training data, which would likely improve the model's performance. Second, larger models generally perform better, as state-of-the-art classification results show. We did not aim to use as many resources as possible, so a slightly larger model could further improve noise detection. Third, our model runs inference on a fixed recording length of five seconds. With a custom dataset, this length could be increased to cover longer recordings, resulting in more input features for the model.
