Merge pull request #35 from chaosparrot/recording-improvements
Recording improvements
chaosparrot authored Mar 14, 2023
2 parents ae010fd + 03f85d6 commit 51152fb
Showing 19 changed files with 1,727 additions and 260 deletions.
25 changes: 16 additions & 9 deletions docs/RECORDING.md
@@ -5,26 +5,33 @@ In order to train a model, you need to record sounds first. You can do this by r

![Installing packages](media/settings-record.png)

This script will record sounds in separate files of 30 milliseconds each and save them in your recordings folder ( data/recordings is the default place, which can be changed in the data/code/config.py file using the examples in lib/default_config.py ).
This script will record your microphone and save the detected areas inside of an SRT file. It will record in overlapping segments of 30 milliseconds.
You have to be sure to record as little noise as possible. For example, if you are recording a bell sound, it is imperative that you only record that sound.
If you accidentally recorded a different sound, you can always delete the specific file from the recordings directory.
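
As a rough illustration, an entry in the generated SRT file might look something like the snippet below; the label text and timings here are hypothetical and will depend on your recording.

```
1
00:00:01,200 --> 00:00:01,430
bell
```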


![Installing packages](media/settings-record-progress.png)

In order to make sure you only record the sound you want to record, you can alter the power setting at the start. I usually choose a value between 1000 and 2000.
You can also trim out everything below a specific frequency value. Note that the intensity, power and frequency values I am using aren't actual SI units like dB or Hz, just some rough calculations which will go up when the loudness or frequency goes up.
During the recording, you can also pause the recording using SPACE or quit it using ESC.
If you feel a sneeze coming up, or a car passes by, you can press these keys to make sure you don't have to remove data.
If you did accidentally record a different sound, you can always press BACKSPACE or - to remove some data from the recording.

During the recording, you can also pause the recording using SPACE or quit it using ESC. If you feel a sneeze coming up, or a car passes by, you can press these keys to make sure you don't have to prune away a lot of files.
You can look at the 'Recorded' part during the recording session to see how much of your sound has been detected.
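
The exact calculation lives in the recording code, but as a minimal sketch of the idea, a rough non-SI "power" value for a 30 millisecond frame could be computed and checked like this (the 16-bit audio, the 16 kHz sample rate and the exact threshold are assumptions for illustration):

```python
import numpy as np

POWER_THRESHOLD = 1500   # assumed value in the suggested 1000-2000 range
FRAME_MS = 30            # the recorder works with 30 millisecond segments
SAMPLE_RATE = 16000      # assumed sample rate for this sketch

def frame_power(frame: np.ndarray) -> float:
    """Rough, non-SI loudness estimate: mean absolute amplitude of one frame."""
    return float(np.abs(frame.astype(np.int64)).mean())

def is_loud_enough(frame: np.ndarray) -> bool:
    return frame_power(frame) >= POWER_THRESHOLD

# A silent 30 ms frame of 16-bit audio stays below the threshold
samples_per_frame = int(SAMPLE_RATE * FRAME_MS / 1000)
silence = np.zeros(samples_per_frame, dtype=np.int16)
print(is_loud_enough(silence))  # False
```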

### Amount of data needed

I found that you need around 30 seconds of recorded sound, roughly 1000 samples, to get a working recognition of a specific sound. Depending on the sound it would take between a minute and two minutes to record the samples ( there are fewer samples to pick from with short sounds like clicks, whereas longer sounds like vowels give more samples ).
You will start getting diminishing returns past two and a half minutes of recorded sound ( 5000 samples ), but the returns are still there. At the time of this writing, I used 15000 samples for the Hollow Knight demo.
The 'Data quantity' part shown during the recording session tells you whether we think you have enough data for a model.
The minimum required is about 16 seconds, 41 seconds is a good amount, and anything above 1 minute 22 seconds is considered excellent.
You will start getting diminishing returns after that, but the returns are still there. I used about 4 minutes per sound for the Hollow Knight demo.
You can try any amount and see if they recognize well.
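
As a small illustration of how those thresholds relate, the tiers described above could be expressed like this (the tier names are just for illustration):

```python
def data_quantity_rating(recorded_seconds: float) -> str:
    """Map seconds of detected sound to the rough tiers described above."""
    if recorded_seconds < 16:
        return "not enough"
    if recorded_seconds < 41:
        return "enough"
    if recorded_seconds <= 82:  # 1 minute 22 seconds
        return "good"
    return "excellent"

print(data_quantity_rating(30))      # enough
print(data_quantity_rating(4 * 60))  # excellent, roughly what the Hollow Knight demo used per sound
```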

From this version onward, full recordings of the recording session will also be saved in the source directory inside the folder of the sound you are recording. This might come in handy when we start adding more sophisticated models in the future.
If you want the model to do well, you should aim to have about the same amount of recordings for every sound you record.

### Checking the quality of the detection

If you want to see if the detection was alright, you can either open up the SRT file inside the segments folder of your recorded sound and compare it to the source file, or use the comparison.wav file inside of the segments folder.
If you place both the source file and the comparison.wav file inside a program like Audacity, you can see the spots where it detected a sound.
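
If you prefer checking this programmatically instead of in Audacity, a minimal sketch like the one below could list the regions where comparison.wav contains audio, assuming the file is 16-bit mono and near-silent outside of the detected spots; the path and the silence threshold are made up for the example.

```python
import wave
import numpy as np

def detected_regions(path: str, frame_ms: int = 30, threshold: int = 200):
    """Yield (start_seconds, end_seconds) for stretches of the file that are not silent."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        samples = np.frombuffer(wav_file.readframes(wav_file.getnframes()), dtype=np.int16)
    frame_size = int(rate * frame_ms / 1000)
    start = None
    for offset in range(0, len(samples), frame_size):
        frame = samples[offset:offset + frame_size].astype(np.int32)
        loud = np.abs(frame).mean() > threshold
        timestamp = offset / rate
        if loud and start is None:
            start = timestamp
        elif not loud and start is not None:
            yield (start, timestamp)
            start = None
    if start is not None:
        yield (start, len(samples) / rate)

# Hypothetical path; point this at the comparison.wav inside your own segments folder
for region in detected_regions("data/recordings/bell/segments/comparison.wav"):
    print(region)
```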

You can use these source files to resegment the recordings you have made as well, by using the [V] menu at the start and then navigating to [S]. This will reuse the available source files to read out the wav data and persist it inside the data/output folder.
![Audacity comparing detection](media/settings-compare-detection.png)

### Background noise

Binary file added docs/media/settings-compare-detection.png
Binary file modified docs/media/settings-record-progress.png
Binary file modified docs/media/settings-record.png
78 changes: 39 additions & 39 deletions lib/audio_dataset.py
@@ -5,6 +5,7 @@
import numpy as np
import random
import math
from lib.wav import load_wav_data_from_srt

class AudioDataset(Dataset):

@@ -15,61 +16,60 @@ def __init__(self, grouped_data_directories, settings):
self.augmented_samples = []
self.length = 0
self.training = False
rebuild_cache = False

for index, label in enumerate( grouped_data_directories ):
directories = grouped_data_directories[ label ]

listed_files = []
listed_files = {}
for directory in directories:
for file in os.listdir( directory ):
if( file.endswith(".wav") ):
listed_files.append( os.path.join(directory, file) )
segments_directory = os.path.join(directory, "segments")
source_directory = os.path.join(directory, "source")
if not (os.path.exists(segments_directory) and os.path.exists(source_directory)):
continue

source_files = os.listdir(source_directory)
srt_files = [x for x in os.listdir(segments_directory) if x.endswith(".srt")]
for source_file in source_files:
shared_key = source_file.replace(".wav", "")

possible_srt_files = [x for x in srt_files if x.startswith(shared_key)]
if len(possible_srt_files) == 0:
continue

# Find the highest version of the segmentation for this source file
srt_file = possible_srt_files[0]
for possible_srt_file in possible_srt_files:
current_version = int( srt_file.replace(".srt", "").replace(shared_key + ".v", "") )
version = int( possible_srt_file.replace(".srt", "").replace(shared_key + ".v", "") )
if version > current_version:
srt_file = possible_srt_file

listed_files[os.path.join(source_directory, source_file)] = os.path.join(segments_directory, srt_file)
listed_files_size = len( listed_files )

print( f"Loading in {label}: {listed_files_size} files" )

for file_index, full_filename in enumerate( listed_files ):
print( str( math.floor(((file_index + 1 ) / listed_files_size ) * 100)) + "%", end="\r" )

# When the input length changes due to a different input type being used, we need to rebuild the cache from scratch
if (index == 0 and file_index == 0):
rebuild_cache = len(self.feature_engineering_cached(full_filename, False)) != len(self.feature_engineering_augmented(full_filename))

self.samples.append([full_filename, index, torch.tensor(self.feature_engineering_cached(full_filename, rebuild_cache)).float()])
self.augmented_samples.append(None)
print( f"Loading in {label}" )
listed_source_files = listed_files.keys()
for file_index, full_filename in enumerate( listed_source_files ):
all_samples = load_wav_data_from_srt(listed_files[full_filename], full_filename, self.settings['FEATURE_ENGINEERING_TYPE'], False)
augmented_samples = load_wav_data_from_srt(listed_files[full_filename], full_filename, self.settings['FEATURE_ENGINEERING_TYPE'], False, True)

for sample in all_samples:
self.samples.append([full_filename, index, torch.tensor(sample).float()])
for augmented_sample in augmented_samples:
self.augmented_samples.append([full_filename, index, torch.tensor(augmented_sample).float()])

def set_training(self, training):
self.training = training

def feature_engineering_cached(self, filename, rebuild_cache=False):
# Only build a filesystem cache of feature engineering results if we are dealing with non-raw wave form
if (self.settings['FEATURE_ENGINEERING_TYPE'] != 1):
cache_dir = os.path.join(os.path.dirname(filename), "cache")
os.makedirs(cache_dir, exist_ok=True)
cached_filename = os.path.join(cache_dir, os.path.basename(filename) + "_fe")
if (os.path.isfile(cached_filename) == False or rebuild_cache == True):
data_row = training_feature_engineering(filename, self.settings)
np.savetxt( cached_filename, data_row )
else:
cached_filename = filename

return np.loadtxt( cached_filename, dtype='float' )

def feature_engineering_augmented(self, filename):
return augmented_feature_engineering(filename, self.settings)

def __len__(self):
return len( self.samples )

def __getitem__(self, idx):
# During training, there is a 10% probability that you get an augmented sample
if (self.training and random.uniform(0, 1) >= 0.9 ):
if (self.augmented_samples[idx] is None):
self.augmented_samples[idx] = [self.samples[idx][0], self.samples[idx][1], torch.tensor(self.feature_engineering_augmented(self.samples[idx][0])).float()]
return self.augmented_samples[idx][2], self.augmented_samples[idx][1]
else:
return self.samples[idx][2], self.samples[idx][1]

if (idx in self.augmented_samples):
return self.augmented_samples[idx][2], self.augmented_samples[idx][1]
return self.samples[idx][2], self.samples[idx][1]

def get_labels(self):
return self.paths
6 changes: 5 additions & 1 deletion lib/default_config.py
@@ -67,4 +67,8 @@
if( SPEECHREC_ENABLED == True ):
SPEECHREC_ENABLED = dragonfly_spec is not None


BACKGROUND_LABEL = "silence"

# Detection strategies
CURRENT_VERSION = 1
CURRENT_DETECTION_STRATEGY = "auto_dBFS_mend_dBFS_30ms_secondary_dBFS_reject_cont_45ms_repair"
5 changes: 4 additions & 1 deletion lib/key_poller.py
@@ -30,7 +30,10 @@ def __exit__(self, type, value, traceback):
def poll(self):
if( IS_WINDOWS == True ):
if( msvcrt.kbhit() ):
return msvcrt.getch().decode()
ch = msvcrt.getch()
# Extended keys (arrows, function keys, etc.) send a prefix byte first; read again to get the actual key code
if ch == b'\xe0' or ch == b'\000':
ch = msvcrt.getch()
return ch.decode()
else:
dr,dw,de = select.select([sys.stdin], [], [], 0)
if not dr == []:
3 changes: 2 additions & 1 deletion lib/learn_data.py
@@ -22,6 +22,7 @@
from sklearn.neural_network import *
from lib.combine_models import define_settings, get_current_default_settings
from lib.audio_model import AudioModel
from lib.wav import load_wav_files_with_srts

def learn_data():
dir_path = os.path.join( os.path.dirname( os.path.dirname( os.path.realpath(__file__)) ), DATASET_FOLDER)
@@ -205,7 +206,7 @@ def load_data( dir_path, max_files, input_type ):
for str_label, directories in grouped_data_directories.items():
# Add a label used for classifying the sounds
id_label = get_label_for_directory( "".join( directories ) )
cat_dataset_x, cat_dataset_labels, featureEngineeringTime = load_wav_files( directories, str_label, id_label, 0, max_files, input_type )
cat_dataset_x, cat_dataset_labels, featureEngineeringTime = load_wav_files_with_srts( directories, str_label, id_label, 0, max_files, input_type )
totalFeatureEngineeringTime += featureEngineeringTime
dataset_x.extend( cat_dataset_x )
dataset_labels.extend( cat_dataset_labels )
1 change: 0 additions & 1 deletion lib/machinelearning.py
@@ -114,7 +114,6 @@ def augmented_feature_engineering( wavFile, settings ):
print( "OLD MFCC TYPE IS NOT SUPPORTED FOR TRAINING PYTORCH" )
return data_row


def get_label_for_directory( setdir ):
return float( int(hashlib.sha256( setdir.encode('utf-8')).hexdigest(), 16) % 10**8 )

78 changes: 78 additions & 0 deletions lib/migrate_data.py
@@ -0,0 +1,78 @@
from config.config import *
import os
from lib.stream_processing import process_wav_file
from lib.print_status import create_progress_bar, clear_previous_lines, get_current_status, reset_previous_lines
from .typing import DetectionState
import time

def check_migration():
version_detected = CURRENT_VERSION
recording_dirs = os.listdir(RECORDINGS_FOLDER)
for file in recording_dirs:
if os.path.isdir(os.path.join(RECORDINGS_FOLDER, file)):
segments_folder = os.path.join(RECORDINGS_FOLDER, file, "segments")
if not os.path.exists(segments_folder):
version_detected = 0
break
else:
source_files = os.listdir(os.path.join(RECORDINGS_FOLDER, file, "source"))
for source_file in source_files:
srt_file = source_file.replace(".wav", ".v" + str(CURRENT_VERSION) + ".srt")
if not os.path.exists(os.path.join(segments_folder, srt_file)):
version_detected = 0
break

if version_detected < CURRENT_VERSION:
print("----------------------------")
print("!! Improvement to segmentation found !!")
print("This can help improve the data gathering from your recordings which make newer models better")
print("Resegmenting your data may take a while")
migrate_data()

def migrate_data():
print("----------------------------")
recording_dirs = os.listdir(RECORDINGS_FOLDER)
for label in recording_dirs:
source_dir = os.path.join(RECORDINGS_FOLDER, label, "source")
if os.path.isdir(source_dir):
segments_dir = os.path.join(RECORDINGS_FOLDER, label, "segments")
if not os.path.exists(segments_dir):
os.makedirs(segments_dir)
wav_files = [x for x in os.listdir(source_dir) if os.path.isfile(os.path.join(source_dir, x)) and x.endswith(".wav")]
if len(wav_files) == 0:
continue
print( "Resegmenting " + label + "..." )
progress = 0
progress_chunk = 1 / len( wav_files )
skipped_amount = 0
for index, wav_file in enumerate(wav_files):
wav_file_location = os.path.join(source_dir, wav_file)
srt_file_location = os.path.join(segments_dir, wav_file.replace(".wav", ".v" + str(CURRENT_VERSION) + ".srt"))
output_file_location = os.path.join(segments_dir, wav_file.replace(".wav", "_detection.wav"))

# Only resegment if the new version does not exist already
if not os.path.exists(srt_file_location):
process_wav_file(wav_file_location, srt_file_location, output_file_location, [label], \
lambda internal_progress, state: print_migration_progress(progress + (internal_progress * progress_chunk), state) )
else:
skipped_amount += 1
progress = index / len( wav_files ) + progress_chunk

if progress == 1 and skipped_amount < len(wav_files):
clear_previous_lines(1)

clear_previous_lines(1)
print( label + " resegmented!" if skipped_amount < len(wav_files) else label + " already properly segmented!" )

time.sleep(1)
print("Finished migrating data!")
print("----------------------------")

def print_migration_progress(progress, state: DetectionState):
status_lines = get_current_status(state)
line_count = 1 + len(status_lines) if progress > 0 or state.state == "processing" else 0
reset_previous_lines(line_count) if progress < 1 else clear_previous_lines(line_count)
print( create_progress_bar(progress) )
if progress != 1:
for line in status_lines:
print( line )
