-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design a vineyard csi driver for workloads based on kubernetes volumes. #1533
Conversation
✅ Deploy Preview for v6d ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
ea420b5
to
6958569
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any simple readme/docs/or even internal docs that can tell how to getting started with the csi-driver?
I would like to take a try.
k8s/Dockerfile
Outdated
# Use distroless as minimal base image to package the manager binary | ||
# Refer to https://github.com/GoogleContainerTools/distroless for more details | ||
FROM gcr.io/distroless/static:nonroot | ||
FROM ${BASE_IMAGE} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need other base images?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the image gcr.io/distroless/static:nonroot
doesn't have the mount
command and the csi driver image need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you should use gcr.io/distroless/base:debug
which has busybox installed and there would be a mount
command available.
NEVER use your build environments (e.g., golang image) as the base to release your artifacts.
from sklearn.metrics import mean_squared_log_error, mean_absolute_error | ||
|
||
def path_to_vineyard_name(path): | ||
return path.replace("/", "_") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for such hacks as name can have '/' since #1550.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great patch!
2e1c8bc
to
a85597a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some first glance experiences:
- The deployment of vineyardd and csi-driver should be separated, i.e., we first have vineyardd, the we want a volume view for those objects, we ask the vineyard-operator to launch/deploy csi agents on nodes (specify the namespace and instance name of
Vineyardd
CRD that the csi driver will manage). - the owner-reference of csi driver deployment should be the corresponding vineyard cluster
- the owner-reference of those pv/pvc should be the csi-driver deployment
About the PUT/GET api, I suggest something like
- vineyard.read('/path...managed...by...csi driver')
- vineyard.write('/path..........', object)
where the corresponding vineyard IPC socket will be resolved from the abstract file inside the volume.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some minor comments.
@@ -0,0 +1,271 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the pkg/
directory to outside pkg/csi/
.
return s[0], s[1], nil | ||
} | ||
} | ||
return "", "", fmt.Errorf("Invalid endpoint: %v", ep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid using fmt.Errorf
, using github.com/pkg/errors.Errorf
instead to track the traceback of errors for debugging in the future.
k8s/cmd/main.go
Outdated
@@ -27,6 +27,7 @@ import ( | |||
gosdklog "github.com/v6d-io/v6d/go/vineyard/pkg/common/log" | |||
"github.com/v6d-io/v6d/k8s/cmd/commands/client" | |||
"github.com/v6d-io/v6d/k8s/cmd/commands/create" | |||
"github.com/v6d-io/v6d/k8s/cmd/commands/csidriver" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer csi
over csidriver
.
roleRef: | ||
kind: ClusterRole | ||
name: vineyard-csi-nodes | ||
apiGroup: rbac.authorization.k8s.io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a global service account vineyard
(or maybe vineyard-operator
) to cover all permissions we require and use it anywhere when a service account is needed?
for simplifying the management.
Thanks for the advice! It really makes sense to me. |
769a4fc
to
2894f2c
Compare
|
||
### Write time | ||
|
||
| data_multiplier | without vineyard | with vineyard | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data_multipiler
-> data scale
, and use data size (e.g., xxx Mi
) in this table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
import vineyard | ||
|
||
def test_model(): | ||
os.system('echo 3 > /proc/sys/vm/drop_caches') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want drop page cache, you first need a sync
command, then echo to ....
|
||
def test_model(): | ||
os.system('echo 3 > /proc/sys/vm/drop_caches') | ||
enable_vineyard = os.environ.get('ENABLE_VINEYARD', False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ENABLE_VINEYARD
-> WITH_VINEYARD
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
k8s/Dockerfile
Outdated
# Use distroless as minimal base image to package the manager binary | ||
# Refer to https://github.com/GoogleContainerTools/distroless for more details | ||
FROM gcr.io/distroless/static:nonroot | ||
FROM ${BASE_IMAGE} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you should use gcr.io/distroless/base:debug
which has busybox installed and there would be a mount
command available.
NEVER use your build environments (e.g., golang image) as the base to release your artifacts.
k8s/Makefile
Outdated
-t $(VINEYARD_CSI_IMAGE); \ | ||
else \ | ||
docker build -f k8s/Dockerfile . \ | ||
--build-arg BASE_IMAGE=golang:1.19-buster \ | ||
--build-arg BASE_IMAGE=gcr.io/distroless/base:debug \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can directly replace the FROM
statement in the dockerfile to avoid such a layer of complexity.
@@ -38,6 +38,9 @@ spec: | |||
type: string | |||
type: object | |||
type: array | |||
enableDebugLog: | |||
default: false | |||
type: boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you let user set the log-level (as we previously discussed)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, it looks like users only can set verbose or non-verbose. Do you think EnableVerbose
is more meaningful?
k8s/cmd/commands/csi/csi.go
Outdated
|
||
var csiExample = util.Examples(` | ||
# start the csidriver with the specific endpoint and node id | ||
vineyardctl csidriver --endpoint=unix:///csi/csi.sock --nodeid=csinode1`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What the nodeid
means?
k8s/csitest.Dockerfile
Outdated
|
||
USER root | ||
|
||
CMD ["go", "test", "-run", "./..."] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need a separate docker file for go test
?
|
||
Based on the above results, we can find that the read time of vineyard is nearly a constant, which is not affected by the data scale. The reason is that the data is stored in the shared memory of vineyard cluster, so it's actually a pointer copy operation. | ||
|
||
As a result, we can find that with vineyard, the argo workflow duration of the pipeline is reduced by 10%~20% and the actual execution time of the pipeline is reduced by about 30%. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put the readme to the tutorial directory , same as what you have done for another "e xample" under the k8s/
directory:
https://github.com/v6d-io/v6d/tree/main/docs/tutorials/kubernetes
examples. * Add a new CRD CSIDriver to manage the deployment and status of the vineyard csi driver. * Add the CSIDriver benchmark test and relevant doc. * Introduce the kubernetes csi test to enhancing the robustness of the vineyard CSI Driver. * Generate the doc for CSIDriver CRD. Signed-off-by: Ye Cao <[email protected]>
* Update the golang-1.19 to gcr.io/distroless/base:debug as csidriver's base image. * Improve the csidriver example readme. Signed-off-by: Ye Cao <[email protected]>
05991dd
to
61f124d
Compare
* Add the deploy/delete csidriver API in the vineyardctl. * Update the CRD API doc and vineyardctl doc. * Update the helm chart. * Update the base image from gcr.io/distroless/static:nonroot to gcr.io/distroless/base:debug. Signed-off-by: Ye Cao <[email protected]>
61f124d
to
e6fd19c
Compare
The new commit mainly contains 3 parts:
@sighingnow Could you please take another look? Thanks. |
docs/tutorials/kubernetes.rst
Outdated
@@ -9,6 +9,7 @@ Vineyard on Kubernetes | |||
./kubernetes/using-vineyard-operator.rst | |||
./kubernetes/ml-pipeline-mars-pytorch.rst | |||
./kubernetes/data-sharing-with-vineyard-on-kubernetes.rst | |||
./kubernetes/speed-up-pipelines-based-on-volumes-with-vineyard-csi-driver.rst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
speed-up-pipelines-based-on-volumes-with-vineyard-csi-driver
Efficient data sharing in Kubeflow with Vineyard CSI driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
k8s/Makefile
Outdated
@@ -223,9 +224,11 @@ vendor: | |||
docker-build: | |||
cd .. && \ | |||
if docker build --help | grep -q load; then \ | |||
docker build --load -f k8s/Dockerfile . -t $(IMG); \ | |||
docker build --load -f k8s/Dockerfile . \ | |||
-t $(IMG); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep this part untouched. No need to add a newline here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Signed-off-by: Ye Cao <[email protected]>
904092a
to
044c84a
Compare
What do these changes do?
Related issue number
Fixes parts of #1528