Merge pull request #35 from chaosparrot/recording-improvements
Recording improvements
chaosparrot authored Mar 14, 2023
2 parents ae010fd + 03f85d6 commit 51152fb
Showing 19 changed files with 1,727 additions and 260 deletions.
25 changes: 16 additions & 9 deletions docs/RECORDING.md
@@ -5,26 +5,33 @@ In order to train a model, you need to record sounds first. You can do this by r

![Installing packages](media/settings-record.png)

This script will record sounds in separate files of 30 milliseconds each and save them in your recordings folder ( data/recordings is the default place, which can be changed in the data/code/config.py file using the examples in lib/default_config.py ).
This script will record your microphone and save the detected areas inside of an SRT file. It will record in overlapping segments of 30 milliseconds.
You have to be sure to record as little noise as possible. For example, if you are recording a bell sound, it is imperative that you only record that sound.
If you accidentally recorded a different sound, you can always delete the specific file from the recordings directory.
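
As a rough illustration, an entry in the generated SRT file might look something like the snippet below; the label text and timings here are hypothetical and will depend on your recording.

```
1
00:00:01,200 --> 00:00:01,430
bell
```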


![Installing packages](media/settings-record-progress.png)

In order to make sure you only record the sound you want to record, you can alter the power setting at the start. I usually choose a value between 1000 and 2000.
You can also trim out everything below a specific frequency value. Note that the intensity, power and frequency values I am using aren't actual SI units like dB or Hz, just some rough calculations which will go up when the loudness or frequency goes up.
During the recording, you can also pause the recording using SPACE or quit it using ESC.
If you feel a sneeze coming up, or a car passes by, you can press these keys to make sure you don't have to remove data.
If you did accidentally record a different sound, you can always press BACKSPACE or - to remove some data from the recording.

During the recording, you can also pause the recording using SPACE or quit it using ESC. If you feel a sneeze coming up, or a car passes by, you can press these keys to make sure you don't have to prune away a lot of files.
You can look at the 'Recorded' part during the recording session to see how much of your sound has been detected.
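
The exact calculation lives in the recording code, but as a minimal sketch of the idea, a rough non-SI "power" value for a 30 millisecond frame could be computed and checked like this (the 16-bit audio, the 16 kHz sample rate and the exact threshold are assumptions for illustration):

```python
import numpy as np

POWER_THRESHOLD = 1500   # assumed value in the suggested 1000-2000 range
FRAME_MS = 30            # the recorder works with 30 millisecond segments
SAMPLE_RATE = 16000      # assumed sample rate for this sketch

def frame_power(frame: np.ndarray) -> float:
    """Rough, non-SI loudness estimate: mean absolute amplitude of one frame."""
    return float(np.abs(frame.astype(np.int64)).mean())

def is_loud_enough(frame: np.ndarray) -> bool:
    return frame_power(frame) >= POWER_THRESHOLD

# A silent 30 ms frame of 16-bit audio stays below the threshold
samples_per_frame = int(SAMPLE_RATE * FRAME_MS / 1000)
silence = np.zeros(samples_per_frame, dtype=np.int16)
print(is_loud_enough(silence))  # False
```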

### Amount of data needed

I found that you need around 30 seconds of recorded sound, roughly 1000 samples, to get a working recognition of a specific sound. Depending on the sound it would take between a minute and two minutes to record the samples ( there are fewer samples to pick from with short sounds like clicks, whereas longer sounds like vowels give more samples ).
You will start getting diminishing returns past two and a half minutes of recorded sound ( 5000 samples ), but the returns are still there. At the time of this writing, I used 15000 samples for the Hollow Knight demo.
The 'Data quantity' part shown during the recording session tells you whether we think you have enough data for a model.
The minimum required is about 16 seconds, 41 seconds is a good amount, and anything above 1 minute 22 seconds is considered excellent.
You will start getting diminishing returns after that, but the returns are still there. I used about 4 minutes per sound for the Hollow Knight demo.
You can try any amount and see if they recognize well.
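
As a small illustration of how those thresholds relate, the tiers described above could be expressed like this (the tier names are just for illustration):

```python
def data_quantity_rating(recorded_seconds: float) -> str:
    """Map seconds of detected sound to the rough tiers described above."""
    if recorded_seconds < 16:
        return "not enough"
    if recorded_seconds < 41:
        return "enough"
    if recorded_seconds <= 82:  # 1 minute 22 seconds
        return "good"
    return "excellent"

print(data_quantity_rating(30))      # enough
print(data_quantity_rating(4 * 60))  # excellent, roughly what the Hollow Knight demo used per sound
```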

From this version onward, full recordings of the recording session will also be saved in the source directory inside the folder of the sound you are recording. This might come in handy when we start adding more sophisticated models in the future.
If you want the model to do well, you should aim to have about the same amount of recordings for every sound you record.

### Checking the quality of the detection

If you want to see if the detection was alright, you can either open up the SRT file inside the segments folder of your recorded sound and compare it to the source file, or use the comparison.wav file inside of the segments folder.
If you place both the source file and the comparison.wav file inside a program like Audacity, you can see the spots where it detected a sound.
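
If you prefer checking this programmatically instead of in Audacity, a minimal sketch like the one below could list the regions where comparison.wav contains audio, assuming the file is 16-bit mono and near-silent outside of the detected spots; the path and the silence threshold are made up for the example.

```python
import wave
import numpy as np

def detected_regions(path: str, frame_ms: int = 30, threshold: int = 200):
    """Yield (start_seconds, end_seconds) for stretches of the file that are not silent."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        samples = np.frombuffer(wav_file.readframes(wav_file.getnframes()), dtype=np.int16)
    frame_size = int(rate * frame_ms / 1000)
    start = None
    for offset in range(0, len(samples), frame_size):
        frame = samples[offset:offset + frame_size].astype(np.int32)
        loud = np.abs(frame).mean() > threshold
        timestamp = offset / rate
        if loud and start is None:
            start = timestamp
        elif not loud and start is not None:
            yield (start, timestamp)
            start = None
    if start is not None:
        yield (start, len(samples) / rate)

# Hypothetical path; point this at the comparison.wav inside your own segments folder
for region in detected_regions("data/recordings/bell/segments/comparison.wav"):
    print(region)
```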

You can use these source files to resegment the recordings you have made as well, by using the [V] menu at the start and then navigating to [S]. This will reuse the available source files to read out the wav data and persist it inside the data/output folder.
![Audacity comparing detection](media/settings-compare-detection.png)

### Background noise

Binary file added docs/media/settings-compare-detection.png
Binary file modified docs/media/settings-record-progress.png
Binary file modified docs/media/settings-record.png
78 changes: 39 additions & 39 deletions lib/audio_dataset.py
@@ -5,6 +5,7 @@
import numpy as np
import random
import math
from lib.wav import load_wav_data_from_srt

class AudioDataset(Dataset):

@@ -15,61 +16,60 @@ def __init__(self, grouped_data_directories, settings):
self.augmented_samples = []
self.length = 0
self.training = False
rebuild_cache = False

for index, label in enumerate( grouped_data_directories ):
directories = grouped_data_directories[ label ]

listed_files = []
listed_files = {}
for directory in directories:
for file in os.listdir( directory ):
if( file.endswith(".wav") ):
listed_files.append( os.path.join(directory, file) )
segments_directory = os.path.join(directory, "segments")
source_directory = os.path.join(directory, "source")
if not (os.path.exists(segments_directory) and os.path.exists(source_directory)):
continue

source_files = os.listdir(source_directory)
srt_files = [x for x in os.listdir(segments_directory) if x.endswith(".srt")]
for source_file in source_files:
shared_key = source_file.replace(".wav", "")

possible_srt_files = [x for x in srt_files if x.startswith(shared_key)]
if len(possible_srt_files) == 0:
continue

# Find the highest version of the segmentation for this source file
srt_file = possible_srt_files[0]
for possible_srt_file in possible_srt_files:
current_version = int( srt_file.replace(".srt", "").replace(shared_key + ".v", "") )
version = int( possible_srt_file.replace(".srt", "").replace(shared_key + ".v", "") )
if version > current_version:
srt_file = possible_srt_file

listed_files[os.path.join(source_directory, source_file)] = os.path.join(segments_directory, srt_file)
listed_files_size = len( listed_files )

print( f"Loading in {label}: {listed_files_size} files" )

for file_index, full_filename in enumerate( listed_files ):
print( str( math.floor(((file_index + 1 ) / listed_files_size ) * 100)) + "%", end="\r" )

# When the input length changes due to a different input type being used, we need to rebuild the cache from scratch
if (index == 0 and file_index == 0):
rebuild_cache = len(self.feature_engineering_cached(full_filename, False)) != len(self.feature_engineering_augmented(full_filename))

self.samples.append([full_filename, index, torch.tensor(self.feature_engineering_cached(full_filename, rebuild_cache)).float()])
self.augmented_samples.append(None)
print( f"Loading in {label}" )
listed_source_files = listed_files.keys()
for file_index, full_filename in enumerate( listed_source_files ):
all_samples = load_wav_data_from_srt(listed_files[full_filename], full_filename, self.settings['FEATURE_ENGINEERING_TYPE'], False)
augmented_samples = load_wav_data_from_srt(listed_files[full_filename], full_filename, self.settings['FEATURE_ENGINEERING_TYPE'], False, True)

for sample in all_samples:
self.samples.append([full_filename, index, torch.tensor(sample).float()])
for augmented_sample in augmented_samples:
self.augmented_samples.append([full_filename, index, torch.tensor(augmented_sample).float()])

def set_training(self, training):
self.training = training

def feature_engineering_cached(self, filename, rebuild_cache=False):
# Only build a filesystem cache of feature engineering results if we are dealing with non-raw wave form
if (self.settings['FEATURE_ENGINEERING_TYPE'] != 1):
cache_dir = os.path.join(os.path.dirname(filename), "cache")
os.makedirs(cache_dir, exist_ok=True)
cached_filename = os.path.join(cache_dir, os.path.basename(filename) + "_fe")
if (os.path.isfile(cached_filename) == False or rebuild_cache == True):
data_row = training_feature_engineering(filename, self.settings)
np.savetxt( cached_filename, data_row )
else:
cached_filename = filename

return np.loadtxt( cached_filename, dtype='float' )

def feature_engineering_augmented(self, filename):
return augmented_feature_engineering(filename, self.settings)

def __len__(self):
return len( self.samples )

def __getitem__(self, idx):
# During training, there is a 10% probability that you get an augmented sample
if (self.training and random.uniform(0, 1) >= 0.9 ):
if (self.augmented_samples[idx] is None):
self.augmented_samples[idx] = [self.samples[idx][0], self.samples[idx][1], torch.tensor(self.feature_engineering_augmented(self.samples[idx][0])).float()]
return self.augmented_samples[idx][2], self.augmented_samples[idx][1]
else:
return self.samples[idx][2], self.samples[idx][1]

if (idx in self.augmented_samples):
return self.augmented_samples[idx][2], self.augmented_samples[idx][1]
return self.samples[idx][2], self.samples[idx][1]

def get_labels(self):
return self.paths
6 changes: 5 additions & 1 deletion lib/default_config.py
@@ -67,4 +67,8 @@
if( SPEECHREC_ENABLED == True ):
SPEECHREC_ENABLED = dragonfly_spec is not None


BACKGROUND_LABEL = "silence"

# Detection strategies
CURRENT_VERSION = 1
CURRENT_DETECTION_STRATEGY = "auto_dBFS_mend_dBFS_30ms_secondary_dBFS_reject_cont_45ms_repair"
5 changes: 4 additions & 1 deletion lib/key_poller.py
@@ -30,7 +30,10 @@ def __exit__(self, type, value, traceback):
def poll(self):
if( IS_WINDOWS == True ):
if( msvcrt.kbhit() ):
return msvcrt.getch().decode()
ch = msvcrt.getch()
# Extended keys (arrows, function keys, etc.) send a prefix byte first; read again to get the actual key code
if ch == b'\xe0' or ch == b'\000':
ch = msvcrt.getch()
return ch.decode()
else:
dr,dw,de = select.select([sys.stdin], [], [], 0)
if not dr == []:
3 changes: 2 additions & 1 deletion lib/learn_data.py
@@ -22,6 +22,7 @@
from sklearn.neural_network import *
from lib.combine_models import define_settings, get_current_default_settings
from lib.audio_model import AudioModel
from lib.wav import load_wav_files_with_srts

def learn_data():
dir_path = os.path.join( os.path.dirname( os.path.dirname( os.path.realpath(__file__)) ), DATASET_FOLDER)
@@ -205,7 +206,7 @@ def load_data( dir_path, max_files, input_type ):
for str_label, directories in grouped_data_directories.items():
# Add a label used for classifying the sounds
id_label = get_label_for_directory( "".join( directories ) )
cat_dataset_x, cat_dataset_labels, featureEngineeringTime = load_wav_files( directories, str_label, id_label, 0, max_files, input_type )
cat_dataset_x, cat_dataset_labels, featureEngineeringTime = load_wav_files_with_srts( directories, str_label, id_label, 0, max_files, input_type )
totalFeatureEngineeringTime += featureEngineeringTime
dataset_x.extend( cat_dataset_x )
dataset_labels.extend( cat_dataset_labels )
1 change: 0 additions & 1 deletion lib/machinelearning.py
@@ -114,7 +114,6 @@ def augmented_feature_engineering( wavFile, settings ):
print( "OLD MFCC TYPE IS NOT SUPPORTED FOR TRAINING PYTORCH" )
return data_row


def get_label_for_directory( setdir ):
return float( int(hashlib.sha256( setdir.encode('utf-8')).hexdigest(), 16) % 10**8 )

78 changes: 78 additions & 0 deletions lib/migrate_data.py
@@ -0,0 +1,78 @@
from config.config import *
import os
from lib.stream_processing import process_wav_file
from lib.print_status import create_progress_bar, clear_previous_lines, get_current_status, reset_previous_lines
from .typing import DetectionState
import time

def check_migration():
version_detected = CURRENT_VERSION
recording_dirs = os.listdir(RECORDINGS_FOLDER)
for file in recording_dirs:
if os.path.isdir(os.path.join(RECORDINGS_FOLDER, file)):
segments_folder = os.path.join(RECORDINGS_FOLDER, file, "segments")
if not os.path.exists(segments_folder):
version_detected = 0
break
else:
source_files = os.listdir(os.path.join(RECORDINGS_FOLDER, file, "source"))
for source_file in source_files:
srt_file = source_file.replace(".wav", ".v" + str(CURRENT_VERSION) + ".srt")
if not os.path.exists(os.path.join(segments_folder, srt_file)):
version_detected = 0
break

if version_detected < CURRENT_VERSION:
print("----------------------------")
print("!! Improvement to segmentation found !!")
print("This can help improve the data gathering from your recordings which make newer models better")
print("Resegmenting your data may take a while")
migrate_data()

def migrate_data():
print("----------------------------")
recording_dirs = os.listdir(RECORDINGS_FOLDER)
for label in recording_dirs:
source_dir = os.path.join(RECORDINGS_FOLDER, label, "source")
if os.path.isdir(source_dir):
segments_dir = os.path.join(RECORDINGS_FOLDER, label, "segments")
if not os.path.exists(segments_dir):
os.makedirs(segments_dir)
wav_files = [x for x in os.listdir(source_dir) if os.path.isfile(os.path.join(source_dir, x)) and x.endswith(".wav")]
if len(wav_files) == 0:
continue
print( "Resegmenting " + label + "..." )
progress = 0
progress_chunk = 1 / len( wav_files )
skipped_amount = 0
for index, wav_file in enumerate(wav_files):
wav_file_location = os.path.join(source_dir, wav_file)
srt_file_location = os.path.join(segments_dir, wav_file.replace(".wav", ".v" + str(CURRENT_VERSION) + ".srt"))
output_file_location = os.path.join(segments_dir, wav_file.replace(".wav", "_detection.wav"))

# Only resegment if the new version does not exist already
if not os.path.exists(srt_file_location):
process_wav_file(wav_file_location, srt_file_location, output_file_location, [label], \
lambda internal_progress, state: print_migration_progress(progress + (internal_progress * progress_chunk), state) )
else:
skipped_amount += 1
progress = index / len( wav_files ) + progress_chunk

if progress == 1 and skipped_amount < len(wav_files):
clear_previous_lines(1)

clear_previous_lines(1)
print( label + " resegmented!" if skipped_amount < len(wav_files) else label + " already properly segmented!" )

time.sleep(1)
print("Finished migrating data!")
print("----------------------------")

def print_migration_progress(progress, state: DetectionState):
status_lines = get_current_status(state)
line_count = 1 + len(status_lines) if progress > 0 or state.state == "processing" else 0
reset_previous_lines(line_count) if progress < 1 else clear_previous_lines(line_count)
print( create_progress_bar(progress) )
if progress != 1:
for line in status_lines:
print( line )
