improve "setup" readme
Nikronic committed Sep 11, 2023
1 parent 9640339 commit 3f47d31
Showing 1 changed file with 27 additions and 22 deletions.
49 changes: 27 additions & 22 deletions tutorials/setup.md
@@ -1,47 +1,52 @@
## Setup
The notes below explain why some of the requirements are needed. You can install them manually if you want to be sure about the environment, but it is much easier to create a conda environment from the provided `env.yml` file, which contains all necessary dependencies.

### `mamba` and `conda-forge`
It is fast and, for some reason, does a much better job of resolving dependencies where
`conda` can't. `conda install -c conda-forge mamba`
The notes below explain why some of the requirements are needed. You can install them manually if you want to be sure about the environment.

### Install Core packages
Note that you can use `conda` instead of `mamba` in all following commands,
but why would you want to do it? `mamba` is better!

Note that you can use `conda` or `mamba` for all of the following commands, but we stick to `pip` and `venv`.

#### Data
1. xmltodict: `mamba install -c conda-forge xmltodict`: Helps with parsing data extracted

1. xmltodict: `pip install xmltodict`: Helps with parsing data extracted
from PDFs or other media into a common data format.
2. pandas: `mamba install pandas`: main tool for exploratory data analysis
3. numpy: `mamba install -c conda-forge numpy` (it should be installed already): array manipulation
2. pandas: `pip install pandas`: main tool for exploratory data analysis
3. numpy: `pip install numpy` (it should be installed already): array manipulation
here and there
4. PyPDF2: `mamba install -c conda-forge pypdf2`: organizing PDF files, particularly extracting data
4. PyPDF2: `pip install pypdf2`: organizing PDF files, particularly extracting data
5. pikepdf: `pip install pikepdf`: for manipulating and repairing PDFs. I particularly use this for reading Adobe-protected PDFs that force you to use Adobe Reader (because of `XFA` forms - these damned proprietary formats :\)
6. python-dateutil: `mamba install -c conda-forge python-dateutil`: Note that the official docs only mention
`pip`, but still install the `conda-forge` one.
6. python-dateutil: `pip install python-dateutil`

#### Modeling
7. sklearn: `mamba install -c conda-forge scikit-learn`: For preprocessing for modeling (not EDA, which is done by pandas) and also for the modeling itself. (I highly appreciate tree-based models)
8. flaml: `mamba install flaml -c conda-forge`: The main AutoML modeling tool. It installs the CUDA toolkit, xgboost, and lightgbm, but not other libraries you may want, such as catboost, transformers, etc. As I almost always rely on tree-based models as the baseline, only tree-based dependencies are installed.
9. catboost: `mamba install catboost -c conda-forge`: A tree-based dependency for `flaml`.

1. sklearn: `pip install scikit-learn`: For preprocessing for modeling (not EDA, which is done by pandas) and also for the modeling itself. (I highly appreciate tree-based models)
2. flaml: `pip install flaml`: The main AutoML modeling tool. It installs the CUDA toolkit, xgboost, and lightgbm, but not other libraries you may want, such as catboost, transformers, etc. As I almost always rely on tree-based models as the baseline, only tree-based dependencies are installed.
3. catboost: `pip install catboost`: A tree-based dependency for `flaml`.
4. xgboost: `pip install xgboost`: A tree-based dependency for `flaml`.
5. lightgbm: `pip install lightgbm`: A dependency for `flaml`.

#### Serving
10. fastapi: `pip install fastapi`: For creating APIs very fast.
11. gunicorn: `pip install gunicorn`: Debug level server
12. sqlalchemy: `pip install sqlalchemy`: For databases used via API
13. uvicorn (standard): `pip install uvicorn[standard]`: For serving in production. If you want to use `uvicorn` for debugging, it is better to use `pip install uvicorn`, as that variant is pure Python and easier to read.

1. fastapi: `pip install fastapi`: For creating APIs very fast.
2. gunicorn: `pip install gunicorn`: Debug level server
3. sqlalchemy: `pip install sqlalchemy`: For databases used via API
4. uvicorn (standard): `pip install uvicorn[standard]`: For serving in production. If you want to use `uvicorn` for debugging, it is better to use `pip install uvicorn`, as that variant is pure Python and easier to read.

### Install Helpers Packages

Currently, all of them are needed for the code to work, but they may be made optional in the future.

1. mlflow: `pip install mlflow`: For tracking experiments, codes, models, etc.
2. enlighten: `mamba install -c conda-forge enlighten`: Provides a progress bar in the log that can be redirected from one standard stream to another (here, from the console to a file for an mlflow artifact).
3. dvc: `mamba install -c conda-forge dvc==2.10.2`: Data version control, which is a must-have in ML. Note that even though one can use both the CLI and the Python SDK, and can also install `dvc` system-wide, because I use it in integration with `mlflow`, I prefer to have `dvc` in this virtual environment rather than at the OS package level. Also, make sure to install `2.10.2` until [this bug](https://github.com/iterative/dvc/issues/7927) is fixed. You can of course install the latest version, as this bug can be easily worked around by hardcoding.
4. matplotlib: `mamba install -c conda-forge matplotlib`: for vis.
2. dvc: `pip install dvc`: Data version control, which is a must-have in ML. Note that even though one can use both the CLI and the Python SDK, and can also install `dvc` system-wide, because I use it in integration with `mlflow`, I prefer to have `dvc` in this virtual environment rather than at the OS package level.
3. enlighten: `pip install enlighten`: Provides a progress bar in the log that can be redirected from one standard stream to another (here, from the console to a file for an mlflow artifact).
4. matplotlib: `pip install matplotlib`: for visualizations.
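
Since all of the helper packages are currently required, it can be handy to check the active environment before running anything. The following is only a minimal sketch, not part of the repository: the function name `missing_packages` and the constant `HELPER_PACKAGES` are hypothetical, and the distribution names are assumed to match the `pip install` names listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the helper list above (assumed to match PyPI names).
HELPER_PACKAGES = ["mlflow", "dvc", "enlighten", "matplotlib"]


def missing_packages(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(HELPER_PACKAGES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All helper packages are installed.")
```

Run it inside the activated `venv`; anything it reports as missing can be installed with the corresponding `pip install` command from the list.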

### Docs

The only mandatory package here is `sphinx`. The others can be skipped; in that case, you need to remove the corresponding line in `docs/src/conf.py`.

1. sphinx: `pip install sphinx`: Building the docs.
2. sphinx-rtd-theme: `pip install sphinx-rtd-theme`: Just a theme.
3. sphinx-autodoc-typehints: `pip install sphinx-autodoc-typehints`: For automatically documenting type hints
4. sphinx-copybutton: `pip install sphinx-copybutton`: Copy button for the source code.
5. furo: `pip install furo`: Sphinx Furo theme
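
As an alternative to deleting lines by hand, a Sphinx `conf.py` could enable the optional packages only when they are installed. The excerpt below is a hypothetical sketch and not the project's actual `docs/src/conf.py`: the helper `installed`, the `OPTIONAL_EXTENSIONS` list, the use of `sphinx.ext.autodoc`, and the theme preference order are all assumptions made for illustration.

```python
# Hypothetical excerpt for a Sphinx conf.py; the real docs/src/conf.py may differ.
from importlib.util import find_spec


def installed(candidates):
    """Keep only the top-level module names that can be imported here."""
    return [name for name in candidates if find_spec(name) is not None]


# Optional extensions from the list above, referenced by their module names.
OPTIONAL_EXTENSIONS = ["sphinx_autodoc_typehints", "sphinx_copybutton"]

# sphinx.ext.autodoc ships with Sphinx itself (assumed to be used by the project).
extensions = ["sphinx.ext.autodoc"] + installed(OPTIONAL_EXTENSIONS)

# Prefer furo, then sphinx_rtd_theme, then Sphinx's built-in default theme.
available_themes = installed(["furo", "sphinx_rtd_theme"])
html_theme = available_themes[0] if available_themes else "alabaster"
```

With this approach, skipping an optional package changes the resulting configuration automatically instead of requiring an edit to the file.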
