1 parent 57dc6c7, commit df1a489
Showing 10 changed files with 650 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,271 @@ | ||
# DLIO Unet3D Loading Tests | ||
|
||
## Prerequisites | ||
|
||
### Build DLIO docker container image | ||
|
||
```bash | ||
# Replace `jiaxun` with your own Docker registry. | ||
git clone https://github.com/argonne-lcf/dlio_benchmark.git | ||
cd dlio_benchmark/ | ||
docker build -t jiaxun/dlio:v1.0.0 . | ||
docker image push jiaxun/dlio:v1.0.0 | ||
``` | ||
|
||
### Create a new node pool | ||
|
||
For an existing GKE cluster, use the following command to create a new node pool. Make sure the cluster has the [Workload Identity feature enabled](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable). | ||
|
||
> In this early-stage test, the managed GCS FUSE CSI driver feature is disabled and the driver is installed manually. | ||
```bash | ||
# Replace the cluster name and zone. | ||
gcloud container node-pools create large-pool \ | ||
--cluster gcsfuse-csi-test-cluster \ | ||
--ephemeral-storage-local-ssd count=16 \ | ||
--machine-type n2-standard-96 \ | ||
--zone us-central1-a \ | ||
--num-nodes 3 | ||
``` | ||
|
||
### Set up GCS bucket | ||
|
||
Create a GCS bucket with `Location type` set to `Region`, and select the region where your cluster runs. Follow the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#authentication) to configure access. This example uses the default Kubernetes service account in the default Kubernetes namespace. | ||
|
||
### Install Helm | ||
|
||
The example uses Helm charts to manage the applications. Follow the [Helm documentation](https://helm.sh/docs/intro/install/#from-script) to install Helm. | ||
|
||
## DLIO Unet3D Datasets Loading | ||
|
||
Run the following commands to generate the Unet3D datasets with DLIO and upload them to the bucket. To use your own registry and bucket, add `--set image=<your-registry>/dlio:v1.0.0` and `--set bucketName=<your-bucket-name>` to each command. | ||
|
||
```bash | ||
cd ./examples/dlio | ||
|
||
helm install dlio-unet3d-100kb-500k-data-loader data-loader \ | ||
--set bucketName=gke-dlio-unet3d-100kb-500k \ | ||
--set dlio.numFilesTrain=500000 \ | ||
--set dlio.recordLength=102400 | ||
|
||
helm install dlio-unet3d-500kb-1m-data-loader data-loader \ | ||
--set bucketName=gke-dlio-unet3d-500kb-1m \ | ||
--set dlio.numFilesTrain=1000000 \ | ||
--set dlio.recordLength=512000 | ||
|
||
helm install dlio-unet3d-3mb-100k-data-loader data-loader \ | ||
--set bucketName=gke-dlio-unet3d-3mb-100k \ | ||
--set dlio.numFilesTrain=100000 \ | ||
--set dlio.recordLength=3145728 | ||
|
||
helm install dlio-unet3d-150mb-5k-data-loader data-loader \ | ||
--set bucketName=gke-dlio-unet3d-150mb-5k \ | ||
--set dlio.numFilesTrain=5000 \ | ||
--set dlio.recordLength=157286400 | ||
|
||
# Clean up | ||
helm uninstall \ | ||
dlio-unet3d-100kb-500k-data-loader \ | ||
dlio-unet3d-500kb-1m-data-loader \ | ||
dlio-unet3d-3mb-100k-data-loader \ | ||
dlio-unet3d-150mb-5k-data-loader | ||
``` | ||
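As a rough sizing aid, the total bytes in each dataset can be estimated as `numFilesTrain * recordLength`. The sketch below assumes one sample per generated file, which this README does not state; the helper names `dataset_bytes` and `dataset_gib` are hypothetical.

```bash
#!/usr/bin/env bash
# Estimate each dataset's total size as numFilesTrain * recordLength.
# Assumption: one sample per generated file.
dataset_bytes() {
  echo $(( $1 * $2 ))
}

# Same estimate in whole GiB, rounded down.
dataset_gib() {
  echo $(( $1 * $2 / 1073741824 ))
}

# Columns: dataset name, numFilesTrain, recordLength (values from the commands above).
while read -r name files length; do
  printf '%s: %s bytes (~%s GiB)\n' \
    "$name" "$(dataset_bytes "$files" "$length")" "$(dataset_gib "$files" "$length")"
done <<'EOF'
100kb-500k 500000 102400
500kb-1m 1000000 512000
3mb-100k 100000 3145728
150mb-5k 5000 157286400
EOF
```

Comparing these estimates with the local SSD capacity of the node pool can help confirm that a dataset fits on a node before it is generated.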
|
||
## DLIO Unet3D Loading Tests | ||
|
||
Change to the `./examples/dlio` directory and run the following commands to start the loading tests. Each `helm install` command deploys a Pod that runs the test and uploads its logs to the bucket. To use your own registry and bucket, add `--set image=<your-registry>/dlio:v1.0.0` and `--set bucketName=<your-bucket-name>` to each command. | ||
|
||
### dlio-unet3d-100kb-500k dlio.batchSize=800 | ||
|
||
```bash | ||
helm install dlio-unet3d-100kb-500k-800-local-ssd unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-100kb-500k \ | ||
--set dlio.numFilesTrain=500000 \ | ||
--set dlio.recordLength=102400 \ | ||
--set dlio.batchSize=800 \ | ||
--set scenario=local-ssd | ||
|
||
helm install dlio-unet3d-100kb-500k-800-gcsfuse-file-cache unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-100kb-500k \ | ||
--set dlio.numFilesTrain=500000 \ | ||
--set dlio.recordLength=102400 \ | ||
--set dlio.batchSize=800 \ | ||
--set scenario=gcsfuse-file-cache | ||
|
||
helm install dlio-unet3d-100kb-500k-800-gcsfuse-no-file-cache unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-100kb-500k \ | ||
--set dlio.numFilesTrain=500000 \ | ||
--set dlio.recordLength=102400 \ | ||
--set dlio.batchSize=800 \ | ||
--set scenario=gcsfuse-no-file-cache | ||
|
||
# Clean up | ||
helm uninstall \ | ||
dlio-unet3d-100kb-500k-800-local-ssd \ | ||
dlio-unet3d-100kb-500k-800-gcsfuse-file-cache \ | ||
dlio-unet3d-100kb-500k-800-gcsfuse-no-file-cache | ||
``` | ||
|
||
### dlio-unet3d-100kb-500k dlio.batchSize=128 | ||
|
||
```bash | ||
helm install dlio-unet3d-100kb-500k-128-local-ssd unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-100kb-500k \ | ||
--set dlio.numFilesTrain=500000 \ | ||
--set dlio.recordLength=102400 \ | ||
--set dlio.batchSize=128 \ | ||
--set scenario=local-ssd | ||
|
||
helm install dlio-unet3d-100kb-500k-128-gcsfuse-file-cache unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-100kb-500k \ | ||
--set dlio.numFilesTrain=500000 \ | ||
--set dlio.recordLength=102400 \ | ||
--set dlio.batchSize=128 \ | ||
--set scenario=gcsfuse-file-cache | ||
|
||
helm install dlio-unet3d-100kb-500k-128-gcsfuse-no-file-cache unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-100kb-500k \ | ||
--set dlio.numFilesTrain=500000 \ | ||
--set dlio.recordLength=102400 \ | ||
--set dlio.batchSize=128 \ | ||
--set scenario=gcsfuse-no-file-cache | ||
|
||
# Clean up | ||
helm uninstall \ | ||
dlio-unet3d-100kb-500k-128-local-ssd \ | ||
dlio-unet3d-100kb-500k-128-gcsfuse-file-cache \ | ||
dlio-unet3d-100kb-500k-128-gcsfuse-no-file-cache | ||
``` | ||
|
||
### dlio-unet3d-500kb-1m dlio.batchSize=800 | ||
|
||
```bash | ||
helm install dlio-unet3d-500kb-1m-800-local-ssd unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-500kb-1m \ | ||
--set dlio.numFilesTrain=1000000 \ | ||
--set dlio.recordLength=512000 \ | ||
--set dlio.batchSize=800 \ | ||
--set scenario=local-ssd | ||
|
||
helm install dlio-unet3d-500kb-1m-800-gcsfuse-file-cache unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-500kb-1m \ | ||
--set dlio.numFilesTrain=1000000 \ | ||
--set dlio.recordLength=512000 \ | ||
--set dlio.batchSize=800 \ | ||
--set scenario=gcsfuse-file-cache | ||
|
||
helm install dlio-unet3d-500kb-1m-800-gcsfuse-no-file-cache unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-500kb-1m \ | ||
--set dlio.numFilesTrain=1000000 \ | ||
--set dlio.recordLength=512000 \ | ||
--set dlio.batchSize=800 \ | ||
--set scenario=gcsfuse-no-file-cache | ||
|
||
# Clean up | ||
helm uninstall \ | ||
dlio-unet3d-500kb-1m-800-local-ssd \ | ||
dlio-unet3d-500kb-1m-800-gcsfuse-file-cache \ | ||
dlio-unet3d-500kb-1m-800-gcsfuse-no-file-cache | ||
``` | ||
|
||
### dlio-unet3d-500kb-1m dlio.batchSize=128 | ||
|
||
```bash | ||
helm install dlio-unet3d-500kb-1m-128-local-ssd unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-500kb-1m \ | ||
--set dlio.numFilesTrain=1000000 \ | ||
--set dlio.recordLength=512000 \ | ||
--set dlio.batchSize=128 \ | ||
--set scenario=local-ssd | ||
|
||
helm install dlio-unet3d-500kb-1m-128-gcsfuse-file-cache unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-500kb-1m \ | ||
--set dlio.numFilesTrain=1000000 \ | ||
--set dlio.recordLength=512000 \ | ||
--set dlio.batchSize=128 \ | ||
--set scenario=gcsfuse-file-cache | ||
|
||
helm install dlio-unet3d-500kb-1m-128-gcsfuse-no-file-cache unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-500kb-1m \ | ||
--set dlio.numFilesTrain=1000000 \ | ||
--set dlio.recordLength=512000 \ | ||
--set dlio.batchSize=128 \ | ||
--set scenario=gcsfuse-no-file-cache | ||
|
||
# Clean up | ||
helm uninstall \ | ||
dlio-unet3d-500kb-1m-128-local-ssd \ | ||
dlio-unet3d-500kb-1m-128-gcsfuse-file-cache \ | ||
dlio-unet3d-500kb-1m-128-gcsfuse-no-file-cache | ||
``` | ||
|
||
### dlio-unet3d-3mb-100k dlio.batchSize=200 | ||
|
||
```bash | ||
helm install dlio-unet3d-3mb-100k-200-local-ssd unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-3mb-100k \ | ||
--set dlio.numFilesTrain=100000 \ | ||
--set dlio.recordLength=3145728 \ | ||
--set dlio.batchSize=200 \ | ||
--set scenario=local-ssd | ||
|
||
helm install dlio-unet3d-3mb-100k-200-gcsfuse-file-cache unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-3mb-100k \ | ||
--set dlio.numFilesTrain=100000 \ | ||
--set dlio.recordLength=3145728 \ | ||
--set dlio.batchSize=200 \ | ||
--set scenario=gcsfuse-file-cache | ||
|
||
helm install dlio-unet3d-3mb-100k-200-gcsfuse-no-file-cache unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-3mb-100k \ | ||
--set dlio.numFilesTrain=100000 \ | ||
--set dlio.recordLength=3145728 \ | ||
--set dlio.batchSize=200 \ | ||
--set scenario=gcsfuse-no-file-cache | ||
|
||
# Clean up | ||
helm uninstall \ | ||
dlio-unet3d-3mb-100k-200-local-ssd \ | ||
dlio-unet3d-3mb-100k-200-gcsfuse-file-cache \ | ||
dlio-unet3d-3mb-100k-200-gcsfuse-no-file-cache | ||
``` | ||
|
||
### dlio-unet3d-150mb-5k dlio.batchSize=4 | ||
|
||
```bash | ||
helm install dlio-unet3d-150mb-5k-4-local-ssd unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-150mb-5k \ | ||
--set dlio.numFilesTrain=5000 \ | ||
--set dlio.recordLength=157286400 \ | ||
--set dlio.batchSize=4 \ | ||
--set scenario=local-ssd | ||
|
||
helm install dlio-unet3d-150mb-5k-4-gcsfuse-file-cache unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-150mb-5k \ | ||
--set dlio.numFilesTrain=5000 \ | ||
--set dlio.recordLength=157286400 \ | ||
--set dlio.batchSize=4 \ | ||
--set scenario=gcsfuse-file-cache | ||
|
||
helm install dlio-unet3d-150mb-5k-4-gcsfuse-no-file-cache unet3d-loading-test \ | ||
--set bucketName=gke-dlio-unet3d-150mb-5k \ | ||
--set dlio.numFilesTrain=5000 \ | ||
--set dlio.recordLength=157286400 \ | ||
--set dlio.batchSize=4 \ | ||
--set scenario=gcsfuse-no-file-cache | ||
|
||
# Clean up | ||
helm uninstall \ | ||
dlio-unet3d-150mb-5k-4-local-ssd \ | ||
dlio-unet3d-150mb-5k-4-gcsfuse-file-cache \ | ||
dlio-unet3d-150mb-5k-4-gcsfuse-no-file-cache | ||
``` | ||
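The test groups above repeat one pattern: three scenarios per dataset and batch size. A minimal sketch that prints the equivalent commands, assuming the release and bucket naming used in this README; the function name `print_loading_tests` is hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
# Print the three `helm install` commands for one dataset and batch size.
# Nothing is executed here; pipe the output to `bash` to run the commands.
# Arguments: dataset suffix, numFilesTrain, recordLength, batchSize.
print_loading_tests() {
  local dataset="$1" num_files="$2" record_length="$3" batch_size="$4"
  local scenario
  for scenario in local-ssd gcsfuse-file-cache gcsfuse-no-file-cache; do
    printf 'helm install dlio-unet3d-%s-%s-%s unet3d-loading-test' \
      "$dataset" "$batch_size" "$scenario"
    printf ' --set bucketName=gke-dlio-unet3d-%s' "$dataset"
    printf ' --set dlio.numFilesTrain=%s' "$num_files"
    printf ' --set dlio.recordLength=%s' "$record_length"
    printf ' --set dlio.batchSize=%s' "$batch_size"
    printf ' --set scenario=%s\n' "$scenario"
  done
}

# Example: the 100 KB dataset with batch size 800.
print_loading_tests 100kb-500k 500000 102400 800
```

Printing the commands first makes it easy to review them, or to append `--set image=...` overrides, before anything is deployed.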
|
||
## Parsing the test results | ||
|
||
Run the following Python script to parse the logs. The results are saved to `./examples/dlio/output.csv`. | ||
|
||
```bash | ||
cd ./examples/dlio | ||
python ./parse_logs.py | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# Patterns to ignore when building packages. | ||
# This supports shell glob matching, relative path matching, and | ||
# negation (prefixed with !). Only one pattern per line. | ||
.DS_Store | ||
# Common VCS dirs | ||
.git/ | ||
.gitignore | ||
.bzr/ | ||
.bzrignore | ||
.hg/ | ||
.hgignore | ||
.svn/ | ||
# Common backup files | ||
*.swp | ||
*.bak | ||
*.tmp | ||
*.orig | ||
*~ | ||
# Various IDEs | ||
.project | ||
.idea/ | ||
*.tmproj | ||
.vscode/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
apiVersion: v2 | ||
name: data-loader | ||
description: A Helm chart for DLIO data loading to GCS buckets | ||
type: application | ||
version: 0.1.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
apiVersion: v1 | ||
kind: Pod | ||
metadata: | ||
  name: dlio-data-loader-{{ .Values.dlio.numFilesTrain }}-{{ .Values.dlio.recordLength }} | ||
  annotations: | ||
    gke-gcsfuse/volumes: "true" | ||
    gke-gcsfuse/cpu-limit: "0" | ||
    gke-gcsfuse/memory-limit: "0" | ||
    gke-gcsfuse/ephemeral-storage-limit: "0" | ||
spec: | ||
  restartPolicy: Never | ||
  nodeSelector: | ||
    cloud.google.com/gke-ephemeral-storage-local-ssd: "true" | ||
  containers: | ||
  - name: dlio-data-loader | ||
    image: {{ .Values.image }} | ||
    resources: | ||
      limits: | ||
        cpu: "100" | ||
        memory: 400Gi | ||
      requests: | ||
        cpu: "30" | ||
        memory: 300Gi | ||
    command: | ||
      - "/bin/sh" | ||
      - "-c" | ||
      - | | ||
        echo "Installing gsutil..." | ||
        apt-get install -y apt-transport-https ca-certificates gnupg curl | ||
        curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg | ||
        echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list | ||
        apt-get update && apt-get install -y google-cloud-cli | ||
        echo "Generating data for file number: {{ .Values.dlio.numFilesTrain }}, file size: {{ .Values.dlio.recordLength }}..." | ||
        mpirun -np 20 dlio_benchmark workload=unet3d \ | ||
          ++workload.workflow.generate_data=True \ | ||
          ++workload.workflow.train=False \ | ||
          ++workload.dataset.data_folder=/data \ | ||
          ++workload.dataset.num_files_train={{ .Values.dlio.numFilesTrain }} \ | ||
          ++workload.dataset.record_length={{ .Values.dlio.recordLength }} \ | ||
          ++workload.dataset.record_length_stdev=0 \ | ||
          ++workload.dataset.record_length_resize=0 | ||
        gsutil -m cp -R /data/train gs://{{ .Values.bucketName }} | ||
        mkdir -p /bucket/valid | ||
    volumeMounts: | ||
    - name: local-dir | ||
      mountPath: /data | ||
    - name: gcs-fuse-csi-ephemeral | ||
      mountPath: /bucket | ||
  volumes: | ||
  - name: local-dir | ||
    emptyDir: {} | ||
  - name: gcs-fuse-csi-ephemeral | ||
    csi: | ||
      driver: gcsfuse.csi.storage.gke.io | ||
      volumeAttributes: | ||
        bucketName: {{ .Values.bucketName }} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Default values for data-loader. | ||
# This is a YAML-formatted file. | ||
# Declare variables to be passed into your templates. | ||
|
||
image: jiaxun/dlio:v1.0.0 | ||
bucketName: gke-dlio-unet3d-100kb-500k | ||
|
||
dlio: | ||
  numFilesTrain: 500000 | ||
  recordLength: 102400 |
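These defaults can also be overridden with a values file instead of repeated `--set` flags. The sketch below is an illustration only: the file name `my-values.yaml` and its location are hypothetical, and the placeholders must be replaced with a real registry and bucket.

```bash
#!/usr/bin/env bash
# Write the overrides to a file. The keys mirror the chart's values.yaml;
# the file name and directory are hypothetical choices.
values_file="${TMPDIR:-/tmp}/my-values.yaml"
cat > "$values_file" <<'EOF'
image: <your-registry>/dlio:v1.0.0
bucketName: <your-bucket-name>
dlio:
  numFilesTrain: 500000
  recordLength: 102400
EOF

# The install command is printed rather than executed, because running it
# needs a configured cluster. `-f` is Helm's flag for a values file.
echo "helm install dlio-unet3d-100kb-500k-data-loader data-loader -f $values_file"
```

Keeping one values file per dataset avoids retyping `numFilesTrain` and `recordLength` for every release.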