Merge branch 'master' into feature/l11t1-load-test
yuriihavrylko authored Feb 14, 2024
2 parents 705ed3c + 44d05bf commit 64e6f3b
Showing 12 changed files with 209 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .dvc/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
/config.local
/tmp
/cache
5 changes: 5 additions & 0 deletions .dvc/config
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
[core]
remote = minio
['remote "minio"']
url = s3://ml-data
endpointurl = http://10.0.0.6:9000
3 changes: 3 additions & 0 deletions .dvcignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
1 change: 1 addition & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -33,5 +33,6 @@ RUN chmod 777 /.config

CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE


FROM builder AS app-kserve
ENTRYPOINT ["python", "app/src/serving/kserve.py"]
88 changes: 88 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -22,6 +22,7 @@ DH Images:
Works on push to master/feature*
![Alt text](assets/actions.png)


### Streamlit

Run:
@@ -37,13 +38,21 @@ kubectl create -f deployment/app-ui.yml
kubectl port-forward --address 0.0.0.0 svc/app-ui 8080:8080
```

Deploy k8s:
```
kubectl create -f deployment/app-ui.yml
kubectl port-forward --address 0.0.0.0 svc/app-ui 8080:8080
```


### Fast API

Postman

![Alt text](assets/fastapi.png)



Deploy k8s:
```
kubectl create -f deployment/app-fasttext.yml
@@ -78,10 +87,89 @@ kubectl create -f deployment/kserve.yaml
kubectl get inferenceservice custom-model
```


### Load testing

![Alt text](assets/locust.png)

```
locust -f benchmarks/load_test.py --host=http://localhost:9933 --users 50 --spawn-rate 10 --autostart --run-time 600s
```

### DVC
Install DVC
```
brew install dvc
```
Init in repo
```
dvc init --subdir
git status
git commit -m "init DVC"
```
Move the data file, add it to DVC, and commit the DVC data config
```
dvc add ./data/data.csv
git add data/.gitignore data/data.csv.dvc
git commit -m "create data"
```
Add remote data storage and push the DVC remote config
(ensure the bucket already exists)
```
dvc remote add -d minio s3://ml-data
dvc remote modify minio endpointurl http://10.0.0.6:9000

git add .dvc/config
git commit -m "configure remote"
git push
```
Upload data
```
export AWS_ACCESS_KEY_ID='...'
export AWS_SECRET_ACCESS_KEY='...'
dvc push
```


### Label Studio

```
docker pull heartexlabs/label-studio:latest
docker run -it -p 8080:8080 -v `pwd`/mydata:/label-studio/data heartexlabs/label-studio:latest
```

![Alt text](assets/labeling.png)


### Minio setup
Mac/Local
```
brew install minio/stable/minio
minio server --console-address :9001 ~/minio # path to persistent local storage + run on custom port
```

Docker

```
docker run \
-p 9002:9002 \
--name minio \
-v ~/minio:/data \
-e "MINIO_ROOT_USER=ROOTNAME" \
-e "MINIO_ROOT_PASSWORD=CHANGEME123" \
quay.io/minio/minio server /data --console-address ":9002"
```

Kubernetes

```
kubectl create -f deployment/minio.yml
```
1 change: 1 addition & 0 deletions app/requirements-dev.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,4 @@ datasets==2.16.1
wandb==0.16.1
httpx==0.23.0
locust==2.20.1
ipykernel==6.28.0
Binary file added assets/labeling.png
1 change: 1 addition & 0 deletions data/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
/data.csv
5 changes: 5 additions & 0 deletions data/data.csv.dvc
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
outs:
- md5: 7ec83b215d1790bedaf458a1690370e3
size: 25144581
hash: md5
path: data.csv
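The `.dvc` pointer file above identifies the tracked data purely by its MD5 digest and size; a short sketch of how such a digest is computed over a file (DVC's actual implementation differs in details):

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    # Stream the file in 1 MiB chunks so large data files
    # (like the ~25 MB data.csv above) fit in constant memory
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

When the digest changes, `dvc status` reports the file as modified and `dvc push` uploads the new version to the remote.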
38 changes: 38 additions & 0 deletions deployment/minio.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: minio-deployment
spec:
selector:
matchLabels:
app: minio
strategy:
type: Recreate
template:
metadata:
labels:
# Label is used as selector in the service.
app: minio
spec:
volumes:
- name: storage
persistentVolumeClaim:
claimName: minio-pv-claim
containers:
- name: minio
image: quay.io/minio/minio:latest
args:
- server
- /storage
env:
# Minio access key and secret key
- name: MINIO_ACCESS_KEY
value: "minio"
- name: MINIO_SECRET_KEY
value: "minio123"
ports:
- containerPort: 9003
hostPort: 9003
volumeMounts:
- name: storage
mountPath: "/storage"
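The Deployment above mounts a PersistentVolumeClaim named `minio-pv-claim` that is not part of this commit; a minimal matching claim might look like the following (access mode and storage size are assumptions):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pv-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```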
1 change: 1 addition & 0 deletions experiments/train.ipynb

Large diffs are not rendered by default.

63 changes: 63 additions & 0 deletions modelcard.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
---
language: en
tags:
- bert
license: apache-2.0
datasets:
- GonzaloA/fake_news
---

# BERT fake news classification model

An English-language model based on the uncased version of BERT, fine-tuned for binary classification.


### How to use

You can use this model directly by loading the fine-tuned weights and tokenizer:

```python
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained(PATH, local_files_only=True)
bert_model = BertForSequenceClassification.from_pretrained(PATH, local_files_only=True)

# run inference

```
Or with the Transformers `pipeline`:

```python
from transformers import pipeline

text_classification_pipeline = pipeline(
"text-classification",
model=PATH,
tokenizer=PATH,
return_all_scores=True
)
```
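With `return_all_scores=True`, the pipeline returns a list of label/score dicts for each input; a small helper (names are illustrative, not from the repository) can pick the winning label:

```python
def top_label(scores):
    # scores: list of {"label": ..., "score": ...} dicts for one input
    return max(scores, key=lambda s: s["score"])["label"]


# Example output shape for one input
example = [{"label": "FAKE", "score": 0.98}, {"label": "REAL", "score": 0.02}]
print(top_label(example))  # FAKE
```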


## Training data

The model was fine-tuned from [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the GonzaloA/fake_news dataset, which consists of ~25,000 news articles labeled as fake or real.
For training, 10,000 samples were randomly selected and split in an 80:20 ratio.
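The exact sampling code is not part of the card; under the stated assumption of 10k samples split 80:20, a minimal sketch:

```python
import random


def split_80_20(samples, seed=42):
    # Shuffle a copy (leaving the input untouched) and cut at 80%
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


train, val = split_80_20(list(range(10_000)))  # 8,000 train / 2,000 validation
```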

## Training procedure

### Preprocessing

The texts are tokenized using BERT tokenizer.

### Training

The model was trained on two NVIDIA T4 GPUs.

## Evaluation results


| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.074000 | 0.027787 | 0.986500 |
| 2 | 0.032600 | 0.010920 | 0.995000 |
| 3 | 0.010100 | 0.002739 | 0.999500 |
