Add PreprocessingPipeline #3438

chrishalcrow · 2024-09-25T10:13:57Z

A proposal to add a PreprocessingPipeline class, which contains ordered preprocessing steps and their kwargs in a dictionary.

You can apply the class to a recording, or use the helper function create_preprocessed to make a preprocessed recording:

preprocessor_dict = {'bandpass_filter': {'freq_max': 3000}, 'common_reference': {}}

# apply using
from spikeinterface.preprocessing import PreprocessingPipeline
pipeline = PreprocessingPipeline(preprocessor_dict)
preprocessed_recording = pipeline.apply_to(recording)

# or
from spikeinterface.preprocessing import create_preprocessed
preprocessed_recording = create_preprocessed(recording, preprocessor_dict)
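
To make the idea concrete, here is a minimal, self-contained sketch of an ordered-dictionary pipeline. It is not the implementation in this PR: `ToyPipeline`, `STEP_REGISTRY`, `scale` and `shift` are hypothetical names, and plain lists of floats stand in for recordings. It only illustrates the two properties discussed here: steps run in dictionary order, and `apply_to` returns a new object instead of modifying its input.

```python
from typing import Any, Callable, Dict, List


# Toy stand-ins for preprocessing functions (hypothetical, not spikeinterface functions).
def scale(samples: List[float], factor: float = 1.0) -> List[float]:
    return [x * factor for x in samples]


def shift(samples: List[float], amount: float = 0.0) -> List[float]:
    return [x + amount for x in samples]


# Maps step names to the functions that implement them (assumed design).
STEP_REGISTRY: Dict[str, Callable[..., List[float]]] = {"scale": scale, "shift": shift}


class ToyPipeline:
    """Holds ordered steps and their kwargs, and applies them in order."""

    def __init__(self, steps: Dict[str, Dict[str, Any]]):
        unknown = [name for name in steps if name not in STEP_REGISTRY]
        if unknown:
            raise ValueError(f"Unknown preprocessing steps: {unknown}")
        # Copy the kwargs so later changes to the input dict do not leak in.
        self.steps = {name: dict(kwargs) for name, kwargs in steps.items()}

    def apply_to(self, samples: List[float]) -> List[float]:
        # Work on a copy: the input is never modified in place.
        result = list(samples)
        for name, kwargs in self.steps.items():
            result = STEP_REGISTRY[name](result, **kwargs)
        return result

    def __repr__(self) -> str:
        chain = " -> ".join(self.steps) or "(no steps)"
        return f"ToyPipeline({chain})"


raw = [1.0, 2.0]
pipeline = ToyPipeline({"scale": {"factor": 2.0}, "shift": {"amount": 1.0}})
processed = pipeline.apply_to(raw)
```

Because Python dicts preserve insertion order, the order of keys in the dictionary is the order in which the steps are applied.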

Also adds a function which takes in a recording.json provenance file and makes a preprocessor_dict:

from spikeinterface.preprocessing import get_preprocessing_dict_from_json
my_dict = get_preprocessing_dict_from_json('/path/to/recording.json')
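
As a rough illustration of how such a function could work, the sketch below walks a nested provenance dictionary and collects each step's kwargs. The layout is an assumption made for this example (each level has a `class` and `kwargs`, with the parent recording stored under `kwargs["recording"]`), and `steps_from_provenance` is a hypothetical helper, not the function added in this PR.

```python
import json


def steps_from_provenance(node):
    """Collect {step_name: kwargs}, ordered from first-applied to last-applied.

    Assumes each level looks like {"class": ..., "kwargs": {"recording": parent, ...}}.
    """
    steps = []
    while isinstance(node, dict) and "kwargs" in node:
        kwargs = dict(node["kwargs"])
        parent = kwargs.pop("recording", None)
        if parent is None:
            # No parent recording: this is the raw recording, not a preprocessing step.
            break
        steps.append((node["class"], kwargs))
        node = parent
    # The outermost step was applied last, so reverse to get application order.
    steps.reverse()
    # Note: a dict cannot hold the same step name twice; a repeated step would overwrite.
    return dict(steps)


# Made-up provenance content, for illustration only.
provenance_text = """
{
  "class": "CommonReference",
  "kwargs": {
    "operator": "median",
    "recording": {
      "class": "BandpassFilter",
      "kwargs": {
        "freq_max": 3000,
        "recording": {"class": "RawRecording", "kwargs": {"file_path": "/some/path"}}
      }
    }
  }
}
"""

preprocessing_dict = steps_from_provenance(json.loads(provenance_text))
```

The raw recording (and its path) is dropped, so only the steps and their parameters remain.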

This allows for some cool things:

1. Users can pass a single dictionary to construct a preprocessed recording (as above). This completes the “dictionary workflow”, since you can already use dicts in sorting, in run_sorter_jobs, and in postprocessing via compute.
2. Users can easily visualise their preprocessing pipeline using the repr, including an HTML repr in Jupyter notebooks. (I made a hideous one, but we can aim for something like the sklearn pipeline repr; see https://scikit-learn.org/stable/auto_examples/miscellaneous/plot_pipeline_display.html)
3. It increases portability between labs, since you can reconstruct the preprocessing steps from the recording.json file without the original recording (and without worrying about paths).
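
For the repr idea, Jupyter renders an object using its `_repr_html_` method when one is defined. A minimal sketch of what a text and HTML repr could look like follows; `PipelineDisplay` is a hypothetical class for illustration, not the repr implemented in this PR.

```python
from html import escape


def format_kwargs(kwargs):
    # Render kwargs as "key=value" pairs, e.g. "freq_max=3000".
    return ", ".join(f"{key}={value!r}" for key, value in kwargs.items())


class PipelineDisplay:
    """Hypothetical holder of ordered steps, with text and HTML reprs."""

    def __init__(self, steps):
        self.steps = dict(steps)

    def __repr__(self):
        return "\n".join(
            f"{index}. {name}({format_kwargs(kwargs)})"
            for index, (name, kwargs) in enumerate(self.steps.items(), start=1)
        )

    def _repr_html_(self):
        # Jupyter calls this method to display the object as HTML.
        items = "".join(
            f"<li><code>{escape(name)}({escape(format_kwargs(kwargs))})</code></li>"
            for name, kwargs in self.steps.items()
        )
        return f"<ol>{items}</ol>"


display_example = PipelineDisplay({"bandpass_filter": {"freq_max": 3000}, "common_reference": {}})
text_repr = repr(display_example)
html_repr = display_example._repr_html_()
```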

Note that 3. only works for preprocessing steps that are in some sense “global”, i.e. can be applied to any recording. This isn’t true of all preprocessing steps: e.g. interpolate_bad_channels needs the bad_channel_ids, which are recording-dependent. However, many of these functions can be modified to apply more globally: e.g. if bad_channel_ids is None, interpolate_bad_channels could detect bad channels, then interpolate them. This would be applicable to any recording, so is “global”.
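
The "detect if None" pattern could look roughly like the toy sketch below. Everything here is a stand-in: recordings are dicts of channel id to samples, the parameter is called `bad_channel_ids`, detection is a crude spread threshold, and "interpolation" is just the average of the good channels. None of this reflects how spikeinterface actually detects or interpolates bad channels.

```python
from statistics import median, pstdev


def detect_bad_channels_toy(channels, threshold=5.0):
    """Flag channels whose spread is far above the typical spread (toy criterion)."""
    spreads = {channel_id: pstdev(trace) for channel_id, trace in channels.items()}
    typical = median(spreads.values())
    return [channel_id for channel_id, spread in spreads.items() if spread > threshold * typical]


def interpolate_bad_channels_toy(channels, bad_channel_ids=None, threshold=5.0):
    """Replace bad channels with the sample-wise average of the good ones.

    If bad_channel_ids is None, the bad channels are detected from the data,
    which makes the step applicable to any recording.
    """
    if bad_channel_ids is None:
        bad_channel_ids = detect_bad_channels_toy(channels, threshold)
    bad = set(bad_channel_ids)
    good_traces = [trace for channel_id, trace in channels.items() if channel_id not in bad]
    if not good_traces:
        raise ValueError("No good channels left to interpolate from")
    average = [sum(samples) / len(good_traces) for samples in zip(*good_traces)]
    # Build a new dict so the input is not modified in place.
    return {
        channel_id: (list(average) if channel_id in bad else list(trace))
        for channel_id, trace in channels.items()
    }


toy_channels = {
    "a": [0.0, 1.0, 0.0, 1.0],
    "b": [0.0, 1.0, 0.0, 1.0],
    "c": [0.0, 100.0, 0.0, 100.0],  # much noisier than the others
}
auto_fixed = interpolate_bad_channels_toy(toy_channels)
manual_fixed = interpolate_bad_channels_toy(toy_channels, bad_channel_ids=["a"])
```

With no ids given, the same call works on any recording; with explicit ids, the step stays tied to the recording those ids came from.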

No rush on this and I’m not 100% set on it being implemented. Important to get the names right. I read this: https://melevir.medium.com/python-functions-naming-tips-376f12549f9. I think it’s important that create_preprocessed doesn’t sound in-place, after the number of problems with set_probe. Hence I’m against something like apply_preprocessing(recording), and would rather have make, create, construct, produce or something in the function name. I also like the idea (from the article) that you don’t need to include e.g. recording in the name if recording is a required argument. Hence I like something like my_pipeline.apply_to(recording) rather than something like my_pipeline.apply_pipeline_to_recording(recording).

To do:

Tests
Add "allowed preprocessing steps" for get_preprocessing_dict_from_json

chrishalcrow added 2 commits September 25, 2024 11:06

add PreprocessingPipeline

d7bb297

Merge branch 'main' into preprocessing-pipeline

d0e74f7

chrishalcrow added the enhancement (New feature or request) and preprocessing (Related to preprocessing module) labels Sep 25, 2024

alejoe91 modified the milestone: 0.101.2 Oct 1, 2024
