2016.04.14 Scikit learn pipelines and analysis workflows

stevejbrown edited this page May 15, 2016 · 1 revision

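Today Steve went over how to use pipelines from the scikit-learn library to keep machine learning code clean and organized. A pipeline chains the preprocessing steps and the final estimator into a single object, so the whole sequence can be fit and applied with one call.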
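As a rough illustration of the idea (not the code shown at the meeting), here is a minimal pipeline sketch. The iris dataset, the scaler, the logistic regression classifier, and the step names `"scale"` and `"clf"` are all assumptions picked for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Example data (an assumption; any feature matrix X and labels y would do).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each step is a (name, estimator) pair; the last step is the model.
pipe = Pipeline([
    ("scale", StandardScaler()),                 # fit on training data only
    ("clf", LogisticRegression(max_iter=1000)),  # final estimator
])

# fit() runs fit_transform on each preprocessing step, then fits the model.
pipe.fit(X_train, y_train)

# score() applies the already-fitted scaler to X_test before predicting.
accuracy = pipe.score(X_test, y_test)
print(accuracy)
```

Because the scaler lives inside the pipeline, it is fit only on the training split, which avoids leaking test-set statistics into preprocessing.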

We also talked about workflows in general. Angad suggested keeping all user-defined functions in a separate "utils" file that is imported into the main analysis Jupyter notebook (refreshing it in the notebook with reload(), which is importlib.reload() in Python 3, after making changes in the utils file).
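A small sketch of that reload workflow follows. The module name `analysis_utils` and its `scale` function are hypothetical, and the file is written to a temporary directory only so the example runs on its own; in practice the utils file would sit next to the notebook and be edited by hand.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical utils module, created in a temp directory for this demo.
workdir = Path(tempfile.mkdtemp())
utils_path = workdir / "analysis_utils.py"
utils_path.write_text("def scale(x):\n    return x * 2\n")

# Skip bytecode caching so the quick edit below is always picked up.
sys.dont_write_bytecode = True
sys.path.insert(0, str(workdir))

import analysis_utils

before = analysis_utils.scale(10)

# Simulate editing the utils file in an editor.
utils_path.write_text("def scale(x):\n    return x * 100\n")

# Without reload, the notebook would keep using the old definition.
importlib.reload(analysis_utils)
after = analysis_utils.scale(10)

print(before, after)
```

Note that names pulled in with `from analysis_utils import scale` are not refreshed by a reload; calling functions through the module, as above, avoids that. IPython's `autoreload` extension (`%load_ext autoreload`, then `%autoreload 2`) is another option that reloads modules automatically before each cell runs.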

Since the meeting, Steve ran across the Cookiecutter Data Science project, which has a lot of good ideas for how to arrange a data science project, particularly with regard to directory structure.