The aim of this project is to train a model at scale on the Kaggle New York City Taxi Fare dataset and use it to predict the fare of a new taxi journey. I will host the model behind an API on Google Cloud Run and call it from a lightweight Streamlit website deployed on Heroku. The code for the front-end is stored in a separate repo here.
The website can be found here.
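The README does not document the API's routes or parameters. The following is a minimal sketch of how the front-end could build its request. It assumes a hypothetical `/predict` GET endpoint that takes the Kaggle dataset's feature columns as query parameters. The base URL and parameter names are placeholders, not the deployed service.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the real Cloud Run service URL is not given in this README.
BASE_URL = "https://taxifare-api.example.com"


def build_predict_url(pickup_datetime, pickup_lon, pickup_lat,
                      dropoff_lon, dropoff_lat, passenger_count,
                      base_url=BASE_URL):
    """Build a GET URL for a hypothetical /predict endpoint.

    Parameter names mirror the Kaggle dataset columns (an assumption about the API).
    """
    params = {
        "pickup_datetime": pickup_datetime,
        "pickup_longitude": pickup_lon,
        "pickup_latitude": pickup_lat,
        "dropoff_longitude": dropoff_lon,
        "dropoff_latitude": dropoff_lat,
        "passenger_count": passenger_count,
    }
    # urlencode percent-encodes values, e.g. the space and colons in the timestamp.
    return f"{base_url}/predict?{urlencode(params)}"
```

A Streamlit page would then fetch this URL (for example with `requests.get(url).json()`) and display the returned fare. The response format is likewise an assumption.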
Tech stack
- Language - Python
- Tools - GCP, MLflow, Streamlit, Heroku
- Libraries - pandas, NumPy, scikit-learn
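Trip distance is a common feature for this dataset. Below is a minimal NumPy sketch of the haversine (great-circle) distance. It assumes the Kaggle column names `pickup_longitude`, `pickup_latitude`, `dropoff_longitude` and `dropoff_latitude`. It illustrates the technique and is not necessarily the feature set used in this project.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in decimal degrees.

    Works element-wise on scalars, NumPy arrays or pandas Series.
    """
    # Convert all four coordinates from degrees to radians.
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula: a is the squared half-chord length between the points.
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
```

On a DataFrame `df` loaded from the dataset, `df["distance_km"] = haversine_km(df["pickup_longitude"], df["pickup_latitude"], df["dropoff_longitude"], df["dropoff_latitude"])` adds the feature in one vectorised call.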
To Add
Initial setup
Create a virtualenv and install the project's dependencies:
sudo apt-get install virtualenv python3-pip python3-dev
deactivate; virtualenv ~/venv ; source ~/venv/bin/activate ;\
pip install pip -U; pip install -r requirements.txt
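You can confirm from Python that the new environment is active with a small check. This is a sketch and not part of the project. It compares the interpreter's prefixes:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and virtualenv >= 20 point sys.prefix at the environment while
    # sys.base_prefix still points at the base interpreter; legacy virtualenv
    # (< 20) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix
```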
Run the unit tests:
make clean install test
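The Makefile is not shown here, so the test runner that `make test` calls (for example unittest or pytest) is an assumption. Below is a minimal sketch of a unit test it could pick up. It uses a hypothetical `is_valid_trip` cleaning rule with hypothetical thresholds:

```python
import unittest


def is_valid_trip(fare_amount: float, passenger_count: int) -> bool:
    """Hypothetical cleaning rule: keep positive fares and 1 to 8 passengers."""
    return fare_amount > 0 and 1 <= passenger_count <= 8


class TestIsValidTrip(unittest.TestCase):
    def test_accepts_normal_trip(self):
        self.assertTrue(is_valid_trip(12.5, 2))

    def test_rejects_non_positive_fare(self):
        self.assertFalse(is_valid_trip(0.0, 1))
        self.assertFalse(is_valid_trip(-3.0, 1))

    def test_rejects_bad_passenger_count(self):
        self.assertFalse(is_valid_trip(10.0, 0))
        self.assertFalse(is_valid_trip(10.0, 9))
```

Saved under a hypothetical `tests/test_clean.py`, it runs with `python -m unittest discover tests`. pytest also collects `unittest.TestCase` classes.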
Check whether TaxiFareModel exists in gitlab.com/{group}. If it does not, add it:
- Create a new project at gitlab.com/{group}/TaxiFareModel
- Then populate it:
## e.g. if group is "{group}" and project_name is "TaxiFareModel"
git remote add origin git@github.com:{group}/TaxiFareModel.git
git push -u origin master
git push -u origin --tags
Functional test with a script:
cd
mkdir tmp
cd tmp
TaxiFareModel-run
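The `cd; mkdir tmp; cd tmp` steps run the script from outside the repository. That way the test exercises the installed `TaxiFareModel-run` command rather than files in the working tree. Here is a Python sketch of the same check (the helper name is hypothetical):

```python
import subprocess
import tempfile


def run_in_empty_dir(command):
    """Run `command` (a list of arguments) from a fresh temporary directory.

    Returns the process exit code. Running outside the repository checks the
    installed package and its console script, not the local source tree.
    """
    with tempfile.TemporaryDirectory() as tmp:
        completed = subprocess.run(command, cwd=tmp)
        return completed.returncode
```

`run_in_empty_dir(["TaxiFareModel-run"])` should return `0` once the package is installed. If the command is not on `PATH`, `subprocess.run` raises `FileNotFoundError`.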
Install
Go to https://github.com/{group}/TaxiFareModel to see the project, manage issues, set up your SSH public key, etc.
Create a Python 3 virtualenv and activate it:
sudo apt-get install virtualenv python3-pip python3-dev
deactivate; virtualenv -p python3 ~/venv ; source ~/venv/bin/activate
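The standard library's `venv` module can create an equivalent Python 3 environment without installing `virtualenv`. Here is a minimal sketch (the helper name is hypothetical):

```python
import venv
from pathlib import Path


def create_venv(path, with_pip=True):
    """Create a Python 3 virtual environment using the standard-library venv module.

    with_pip=True bootstraps pip into the environment via ensurepip.
    """
    env_dir = Path(path).expanduser()  # allow paths such as "~/venv"
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

Calling `create_venv("~/venv")` and then running `source ~/venv/bin/activate` matches the shell step above. On the command line, `python3 -m venv ~/venv` does the same. On Debian/Ubuntu, `with_pip=True` may require the `python3-venv` package.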
Clone the project and install it:
git clone git@github.com:{group}/TaxiFareModel.git
cd TaxiFareModel
pip install -r requirements.txt
make clean install test # install and test
Functional test with a script:
cd
mkdir tmp
cd tmp
TaxiFareModel-run
To Add