## Projector Course Work: Disinformation Detection Service
A service for detecting disinformation. This repository documents how to deploy it with several serving technologies (FastAPI, Seldon, KServe), how to test and benchmark it, and how to train, optimize, and monitor the underlying model.

Built as coursework for the Projector course [Machine Learning in Production](https://prjctr.com/course/machine-learning-in-production).

## Table of Contents
- [Projector Course Work: Disinformation Detection Service](#projector-course-work-disinformation-detection-service)
- [Table of Contents](#table-of-contents)
- [Prerequisites](#prerequisites)
- [Minio setup](#minio-setup)
- [Data](#data)
- [DVC](#dvc)
- [Label studio](#label-studio)
- [Model training](#model-training)
- [Model optimization](#model-optimization)
- [Streamlit](#streamlit)
- [Model serving](#model-serving)
- [Fast API](#fast-api)
- [Seldon](#seldon)
- [Kserve](#kserve)
- [Tests](#tests)
- [Benchmarks](#benchmarks)
- [File formats](#file-formats)
- [Load testing](#load-testing)
- [POD autoscaling](#pod-autoscaling)
- [Kafka](#kafka)
- [Data drift detection](#data-drift-detection)


## Prerequisites
This guide assumes basic knowledge of the following technologies:
- Docker
- GitHub Actions
- Kubernetes

## Minio setup
Mac/Local
```
brew install minio/stable/minio
minio server --console-address :9001 ~/minio # ~/minio is the persistent local storage; the console runs on custom port 9001
```

Docker (port 9000 serves the S3 API used by DVC, port 9002 the web console):

```
docker run \
-p 9000:9000 \
-p 9002:9002 \
--name minio \
-v ~/minio:/data \
-e "MINIO_ROOT_USER=ROOTNAME" \
-e "MINIO_ROOT_PASSWORD=CHANGEME123" \
quay.io/minio/minio server /data --console-address ":9002"
```

Kubernetes

```
kubectl create -f deployment/minio.yml
```

## Data

### DVC

Install DVC:
```
brew install dvc
```

Initialize DVC in the repository and commit the generated config:
```
dvc init --subdir
git status
git commit -m "init DVC"
```

Place the dataset at `data/data.csv`, track it with DVC, and commit the generated DVC metadata:
```
dvc add ./data/data.csv
git add data/.gitignore data/data.csv.dvc
git commit -m "create data"
```

Add the MinIO bucket as the default DVC remote and push the remote config (make sure the `ml-data` bucket already exists):

```
dvc remote add -d minio s3://ml-data
dvc remote modify minio endpointurl $AWS_ENDPOINT # e.g. http://10.0.0.6:9000
git add .dvc/config
git commit -m "configure remote"
git push
```

Upload the data (DVC reads the MinIO credentials from the standard AWS environment variables):
```
export AWS_ACCESS_KEY_ID='...'
export AWS_SECRET_ACCESS_KEY='...'
dvc push
```

### Label studio

Run Label Studio locally with Docker:
```
docker pull heartexlabs/label-studio:latest
docker run -it -p 8080:8080 -v $(pwd)/mydata:/label-studio/data heartexlabs/label-studio:latest
```

![Labeling in Label Studio](assets/labeling.png)

## Model training

Build the training image:
```
docker build -t model-training . -f job/Dockerfile
```

Run the training job:
```
docker run -it model-training
```

## Model optimization

Run pruning:

```
python -m src.model.pruning
```
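
The README does not describe what `src.model.pruning` does. As a rough illustration only, unstructured magnitude pruning (one common technique; the project may well use another) zeroes the fraction of weights with the smallest absolute values. The helper `magnitude_prune` and the plain-list weights are hypothetical:

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the `sparsity` fraction of weights with the smallest absolute value.

    Hypothetical helper for illustration; real pruning operates on model tensors.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by ascending magnitude; the first n_prune get zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    print(magnitude_prune([0.5, -0.01, 0.2, -0.8, 0.03, 0.1], sparsity=0.5))
```

In PyTorch, `torch.nn.utils.prune.l1_unstructured` applies the same magnitude criterion directly to a module's parameter.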


Run distillation:
```
python -m src.model.distilation
```
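
The README also does not describe `src.model.distilation`. A standard knowledge-distillation objective (Hinton et al.) trains a smaller student model to match the teacher's temperature-softened output distribution. The sketch computes that loss for a single example in pure Python. The function names and the default temperature are hypothetical:

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    print(distillation_loss([0.2, 1.1, -0.4], [0.5, 2.0, -1.0]))
```

During training, this term is usually mixed with the ordinary cross-entropy on the hard labels, for example `alpha * kd + (1 - alpha) * ce`. Here `alpha` is a tunable weight.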

## Streamlit

Run locally:
```
streamlit run src/serving/streamlit.py
```

![Streamlit UI](assets/streamlit.png)

Deploy k8s:
```
kubectl create -f deployment/app-ui.yml
kubectl port-forward --address 0.0.0.0 svc/app-ui 8080:8080
```

## Model serving

### Fast API

Example request in Postman:

![FastAPI request in Postman](assets/fastapi.png)

Deploy k8s:
```
kubectl create -f deployment/app-fasttext.yml
kubectl port-forward --address 0.0.0.0 svc/app-fasttext 8090:8090
```

### Seldon

Installation (the Ambassador operator as ingress, then the Seldon Core operator via Helm):

```
kubectl apply -f https://github.com/datawire/ambassador-operator/releases/latest/download/ambassador-operator-crds.yaml
kubectl apply -n ambassador -f https://github.com/datawire/ambassador-operator/releases/latest/download/ambassador-operator-kind.yaml
kubectl wait --timeout=180s -n ambassador --for=condition=deployed ambassadorinstallations/ambassador

kubectl create namespace seldon-system
helm install seldon-core seldon-core-operator --version 1.15.1 --repo https://storage.googleapis.com/seldon-charts --set usageMetrics.enabled=true --set ambassador.enabled=true --namespace seldon-system
```

Deploy k8s:
```
kubectl create -f deployment/seldon-custom.yaml
```
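
The `deployment/seldon-custom.yaml` manifest is not shown here. If its image is built with Seldon Core's Python language wrapper (an assumption), the wrapper loads a model class and calls its `predict(self, X, features_names=None)` method for each request. The sketch shows a minimal class of that shape. The class name `DisinfoClassifier`, the `[score, label]` output, and the constant placeholder scorer are all hypothetical:

```python
class DisinfoClassifier:
    """Hypothetical model class in the shape Seldon Core's Python wrapper expects.

    The wrapper creates one instance and calls predict(X, features_names) per request.
    """

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        # Placeholder scorer; a real image would load the trained model here instead.
        self.scorer = lambda text: 0.0

    def predict(self, X, features_names=None):
        # X is assumed to be a batch of texts; return one [score, label] pair per input.
        results = []
        for text in X:
            score = float(self.scorer(str(text)))
            results.append([score, int(score >= self.threshold)])
        return results


if __name__ == "__main__":
    print(DisinfoClassifier().predict(["Example headline to check"]))
```

Replacing `self.scorer` with the trained model keeps the serving contract unchanged, so the class can be unit-tested without a cluster.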

### Kserve

Deploy k8s:

```
kubectl create -f deployment/kserve.yaml
kubectl get inferenceservice custom-model
```
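
Once `kubectl get inferenceservice custom-model` reports the service as ready, it can be queried. Two assumptions apply here. First, the custom model speaks KServe's V1 inference protocol (`POST /v1/models/<name>:predict` with an `instances` list, answered with a `predictions` list). Second, it is registered under the name `custom-model`; a custom predictor sets its own name, so this is not guaranteed. The base URL is a placeholder for wherever the service is reachable. Going through the ingress gateway usually also requires a `Host` header with the InferenceService hostname. The helper name is hypothetical:

```python
import json
import urllib.request


def build_v1_predict_request(base_url: str, model_name: str,
                             instances: list) -> urllib.request.Request:
    """Build (without sending) a KServe V1 protocol prediction request."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


if __name__ == "__main__":
    req = build_v1_predict_request("http://localhost:8080", "custom-model",
                                   ["Example headline to check"])
    print(req.full_url)
    # Sending requires a reachable endpoint, e.g. via kubectl port-forward:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["predictions"])
```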

## Tests

Run the tests:
```
pytest app/tests/
```

## Benchmarks

### File formats

![File format benchmark results](assets/format_benchmark.png)

JSON format demonstrates faster write times but slower read times compared to other formats.

PARQUET format showcases the fastest write times and relatively fast read times, with a smaller file size after write compared to CSV and JSON.

ORC format exhibits moderate write times and the smallest file size after write among the tested formats, with efficient read times.
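
The benchmark script behind the chart is not included in this README. The sketch shows the general measurement approach using only the Python standard library, which limits it to CSV and JSON; Parquet and ORC need a columnar library such as `pyarrow`. It times in-memory serialization, so it ignores disk I/O and compression, and its numbers are not comparable to the chart. The function names and sample data are hypothetical:

```python
import csv
import io
import json
import time


def _to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def _from_csv(text):
    """Parse CSV text back into dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


# Format name -> (write function, read function); both work on in-memory text.
FORMATS = {
    "csv": (_to_csv, _from_csv),
    "json": (json.dumps, json.loads),
}


def benchmark_formats(rows, repeats=5):
    """Return average write/read seconds, serialized size, and row count per format."""
    results = {}
    for name, (write, read) in FORMATS.items():
        write_total = read_total = 0.0
        for _ in range(repeats):
            start = time.perf_counter()
            text = write(rows)
            write_total += time.perf_counter() - start
            start = time.perf_counter()
            loaded = read(text)
            read_total += time.perf_counter() - start
        results[name] = {
            "write_s": write_total / repeats,
            "read_s": read_total / repeats,
            "size_bytes": len(text.encode("utf-8")),
            "rows": len(loaded),
        }
    return results


if __name__ == "__main__":
    sample = [{"id": i, "text": f"claim number {i}", "label": i % 2} for i in range(10_000)]
    for fmt, stats in benchmark_formats(sample).items():
        print(fmt, stats)
```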

### Load testing

![Locust load test results](assets/locust.png)

Run a 10-minute load test with 50 users, spawned at 10 users per second:
```
locust -f benchmarks/load_test.py --host=http://localhost:9933 --users 50 --spawn-rate 10 --autostart --run-time 600s
```

## POD autoscaling

Install the metrics server. The patch adds `--kubelet-insecure-tls`, which skips verification of kubelet certificates and is meant for local or test clusters only:

```
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl patch -n kube-system deployment metrics-server --type=json -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
```

Deploy the autoscaling setup from its config:

```
kubectl create -f deployment/app-fastapi-scaling.yml
```
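
With the metrics server running, the Horizontal Pod Autoscaler applies three rules:

- It computes a target replica count as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`.
- It skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default).
- It clamps the result to the configured minimum and maximum.

The sketch below simplifies that rule. Its default bounds are hypothetical, not the values in `deployment/app-fastapi-scaling.yml`. The real controller also accounts for missing metrics, unready pods, and stabilization windows:

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA scaling rule for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance of the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (minReplicas / maxReplicas in the HPA spec).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(2, current_metric=90, target_metric=50))
```

For example, 2 replicas averaging 90% CPU against a 50% target give ceil(2 * 1.8) = 4 replicas.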

## Kafka

Install Kafka:
```
mc admin service restart myminio
mc event add myminio/input arn:minio:sqs::1:kafka -p --event put --suffix .json
kubectl create -f deployment/kafka-infra.yml
```
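
The `mc event add` line sends `put` events for `.json` objects in the `input` bucket to the Kafka target. A consumer then receives one JSON message per event. MinIO formats these messages like AWS S3 event notifications: a `Records` list with `s3.bucket.name` and a URL-encoded `s3.object.key`. The sketch extracts the uploaded objects from one message value, regardless of which Kafka client library reads it. The function `created_objects` is hypothetical:

```python
import json
from urllib.parse import unquote_plus


def created_objects(message_value: bytes) -> list[tuple[str, str]]:
    """Return (bucket, key) for each object-created record in one notification message."""
    event = json.loads(message_value)
    objects = []
    for record in event.get("Records", []):
        # MinIO event names look like "s3:ObjectCreated:Put"; AWS omits the "s3:" prefix.
        if "ObjectCreated:" not in record.get("eventName", ""):
            continue
        s3 = record["s3"]
        # Object keys in S3-style notifications are URL-encoded.
        key = unquote_plus(s3["object"]["key"])
        objects.append((s3["bucket"]["name"], key))
    return objects


if __name__ == "__main__":
    sample = json.dumps({"Records": [{"eventName": "s3:ObjectCreated:Put",
                                      "s3": {"bucket": {"name": "input"},
                                             "object": {"key": "batch+1.json"}}}]})
    print(created_objects(sample.encode("utf-8")))
```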

## Data drift detection
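
The README does not document how `src.monitoring.drift` decides that data has drifted. One common approach for a numeric signal, such as prediction scores or text lengths, is the two-sample Kolmogorov-Smirnov statistic. It measures the largest gap between the empirical CDFs of a reference window and recent data. The sketch below implements it in pure Python. The function names and the fixed threshold are hypothetical; in practice you would compare a p-value, for example from `scipy.stats.ks_2samp`:

```python
from bisect import bisect_right


def ks_statistic(reference: list[float], current: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    # Both ECDFs are step functions, so the maximum gap occurs at an observed value.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def drifted(reference: list[float], current: list[float], threshold: float = 0.2) -> bool:
    """Flag drift when the KS statistic exceeds a (hypothetical) fixed threshold."""
    return ks_statistic(reference, current) > threshold


if __name__ == "__main__":
    print(ks_statistic([0.1, 0.2, 0.3, 0.4], [0.3, 0.5, 0.6, 0.7]))
```

The project's own drift check is run with: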

```
python -m src.monitoring.drift
```
