Scientific framework for representation in sequential data
This package aims to simplify the workflow of evaluating machine learning models, with a primary focus on sequential data. It helps with:
- labeling data,
- splitting data,
- feature extraction,
- feature reduction (i.e. selection or transformation),
- running the pipeline,
- evaluation of results.
It also allows you to visualize each step.
The framework is designed for easy customization and extension of its functionality.
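As an illustration of the kind of extension this allows, the sketch below defines a hypothetical `LagFeatures` transformer (not part of seqrep) and chains it with scaling and a PCA-based feature-reduction step in a scikit-learn `Pipeline`. It assumes only the standard scikit-learn `fit`/`transform` interface; whether seqrep's own components expect anything beyond that is not covered here.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


class LagFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical feature extractor (NOT part of seqrep): appends the
    values of the previous `n_lags` rows as extra columns."""

    def __init__(self, n_lags=2):
        self.n_lags = n_lags

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the training data.
        return self

    def transform(self, X):
        X = pd.DataFrame(X)
        # Shift every column down by 1..n_lags rows so each row also sees its past.
        lagged = [X.shift(lag) for lag in range(1, self.n_lags + 1)]
        features = pd.concat([X, *lagged], axis=1)
        # The first rows have no history; filling the gaps with 0 is an assumed, simple choice.
        return features.fillna(0.0).to_numpy()


pipe = Pipeline([
    ("lags", LagFeatures(n_lags=2)),    # feature extraction
    ("scale", MinMaxScaler()),          # scaling
    ("reduce", PCA(n_components=2)),    # feature reduction by transformation
])

series = pd.DataFrame({"close": np.linspace(1.0, 10.0, 20)})
reduced = pipe.fit_transform(series)
print(reduced.shape)  # (20, 2)
```

Filling the missing history with 0 keeps the rows aligned with their labels but introduces artificial values; dropping the first `n_lags` rows (together with their labels) is a cleaner alternative when the surrounding code allows it.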
Install the package directly from GitHub:

```shell
python -m pip install git+https://github.com/MIR-MU/seqrep
```
See the README in the seqrep folder.
Using the package is simple. After importing the required classes, there are three steps:
- Create the pipeline you want to evaluate;
- Create a PipelineEvaluator that defines how the pipeline should be evaluated;
- Run the evaluation.
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

from seqrep.feature_engineering import PreviousValuesExtractor, TimeFeaturesExtractor
from seqrep.labeling import NextColorLabeler
from seqrep.splitting import TrainTestSplitter
from seqrep.scaling import UniversalScaler
from seqrep.evaluation import ClassificationEvaluator
from seqrep.pipeline_evaluation import PipelineEvaluator

# Step 1: create the pipeline to evaluate (feature extraction + scaling)
pipe = Pipeline([('fext_prev', PreviousValuesExtractor()),
                 ('fext_time', TimeFeaturesExtractor()),
                 ('scale_u', UniversalScaler(scaler=MinMaxScaler())),
                 ])

# Step 2: define how to evaluate it (labeling, splitting, model, metrics)
pipe_eval = PipelineEvaluator(labeler=NextColorLabeler(),
                              splitter=TrainTestSplitter(),
                              pipeline=pipe,
                              model=SVC(),
                              evaluator=ClassificationEvaluator(),
                              )

# Step 3: run the evaluation
# `data` is not defined in this snippet; load your sequential data first
# (see the examples folder for the expected format).
result = pipe_eval.run(data=data)
```
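Judging by the components passed to `PipelineEvaluator`, an evaluation run involves labeling, splitting, fitting and scoring. The following plain pandas/scikit-learn sketch illustrates that flow on a toy series. It is not seqrep's implementation: the random-walk `close` series, the "next value is higher" labeling rule, the 70/30 chronological split and accuracy as the metric are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Toy price-like series (assumed data, not from seqrep).
rng = np.random.default_rng(0)
data = pd.DataFrame({"close": 100 + rng.normal(size=300).cumsum()})

# Feature extraction: current value and its change since the previous step.
features = pd.DataFrame({
    "close": data["close"],
    "change": data["close"].diff().fillna(0.0),
})

# Labeling: 1 if the next value is higher than the current one, else 0.
# The last row has no "next" value, so it is dropped.
labels = (data["close"].shift(-1) > data["close"]).astype(int)
features, labels = features.iloc[:-1], labels.iloc[:-1]

# Splitting: chronological, without shuffling, so the test part lies
# strictly after the training part.
split = int(len(features) * 0.7)
X_train, X_test = features.iloc[:split], features.iloc[split:]
y_train, y_test = labels.iloc[:split], labels.iloc[split:]

# Pipeline + model: fitted on the training part only to avoid look-ahead leakage.
pipe = Pipeline([("scale", MinMaxScaler()), ("model", SVC())])
pipe.fit(X_train, y_train)

# Evaluation on the held-out, later part of the series.
accuracy = accuracy_score(y_test, pipe.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Splitting chronologically rather than randomly matters for sequential data: a shuffled split would let the model train on observations that come after the ones it is tested on.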
See the examples folder for more details.
This package is open source and licensed under the MIT license. Feel free to use it!
Many thanks to my supervisor, Michal Stefanik, for the huge support! I am also grateful to all members of the MIR-MU group. Finally, thanks go to the Faculty of Informatics of Masaryk University for supporting this project as a dean's project.