Preprocessing
Our preprocessing pipeline is designed to be as general as possible and
allows for custom implementations, each defined as a subclass of the
Preprocessor class and passed as a command-line argument. For our
tasks, we have defined a default preprocessing pipeline for both
classification and regression tasks. The snippet below shows the class
structure of the default classification preprocessor. The private methods
of this class apply the feature generation steps. The abstract
Preprocessor has two functions that need to be implemented: __init__()
(which initializes the preprocessor and configures the settings) and
apply(data, vars) (which returns the preprocessed data: a dictionary of
features and labels for each of the train, validation, and test splits).
@gin.configurable("base_classification_preprocessor")
class DefaultClassificationPreprocessor(Preprocessor):
    def __init__(self, generate_features: bool = True, scaling: bool = True, use_static_features: bool = True):
        """
        Args:
            generate_features: Generate features for dynamic data.
            scaling: Scaling of dynamic and static data.
            use_static_features: Use static features.
        """

    def apply(self, data, vars):
        """
        Args:
            data: Train, validation and test data dictionary. Further divided in static, dynamic, and outcome.
            vars: Variables for static, dynamic, outcome.
        Returns:
            Preprocessed data.
        """
        ...
        return data

    def _process_static(self, data, vars):
        ...
        return data

    def _process_dynamic(self, data, vars):
        ...
        return data

    def _dynamic_feature_generation(self, data, dynamic_vars):
        ...
        return data
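
To make the subclassing pattern described above more concrete, here is a minimal, self-contained sketch of what a custom preprocessor could look like. It is an illustration under stated assumptions, not the project's implementation: the Preprocessor class below is a hypothetical stand-in for the library's abstract base class, StandardizingPreprocessor is an invented name, the gin decorator is omitted, and the data layout (plain lists under "train", "val", and "test" keys) is assumed purely for the example.

```python
from abc import ABC, abstractmethod
from statistics import mean, pstdev


class Preprocessor(ABC):
    """Hypothetical stand-in for the library's abstract Preprocessor."""

    @abstractmethod
    def __init__(self):
        ...

    @abstractmethod
    def apply(self, data, vars):
        ...


class StandardizingPreprocessor(Preprocessor):
    """Illustrative custom preprocessor that standardizes dynamic variables."""

    def __init__(self, scaling: bool = True):
        # Store the settings; the real base class may expect more options.
        self.scaling = scaling

    def apply(self, data, vars):
        if not self.scaling:
            return data
        for name in vars["dynamic"]:
            # Fit the statistics on the training split only, so that no
            # information from validation or test leaks into the scaling.
            train_values = data["train"]["dynamic"][name]
            mu = mean(train_values)
            sigma = pstdev(train_values) or 1.0  # guard against zero variance
            for split in ("train", "val", "test"):
                values = data[split]["dynamic"][name]
                data[split]["dynamic"][name] = [(v - mu) / sigma for v in values]
        return data


# Toy data in the assumed layout: one dynamic variable per split.
toy_data = {
    "train": {"dynamic": {"hr": [60.0, 80.0, 100.0]}, "outcome": [0, 1, 1]},
    "val": {"dynamic": {"hr": [80.0]}, "outcome": [0]},
    "test": {"dynamic": {"hr": [120.0]}, "outcome": [1]},
}
toy_vars = {"dynamic": ["hr"]}

processed = StandardizingPreprocessor().apply(toy_data, toy_vars)
```

The sketch keeps the two required methods and returns the same dictionary shape it received, which mirrors how apply is described above. Fitting the scaling statistics on the training split alone is a design choice of this example rather than something the source states.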