-
Notifications
You must be signed in to change notification settings - Fork 671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates to deployment guides #3994
Updates to deployment guides #3994
Conversation
Signed-off-by: davidmirror-ops <[email protected]>
This currently assumes that you have nginx ingress. We'll be updating these | ||
in the near future to use the ALB ingress controller instead. | ||
This section assumes that you're using the NGINX Ingress controller. Instructions and annotations for the ALB controller | ||
are covered in the `Flyte The Hard Way <https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/docs/06-intro-to-ingress.md#setting-up-amazons-load-balancer-alb-ingress-controller>`__ tutorial. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI these were the final annotations that made things work correctly for me
{
"alb.ingress.kubernetes.io/certificate-arn": helm.values.userSettings.certificateArn
"alb.ingress.kubernetes.io/group.name": "flyte"
"alb.ingress.kubernetes.io/listen-ports": '[{"HTTPS":443}]'
"alb.ingress.kubernetes.io/scheme": "internet-facing"
"alb.ingress.kubernetes.io/ssl-redirect": "443"
"alb.ingress.kubernetes.io/target-type": "ip"
"kubernetes.io/ingress.class": "alb"
"alb.ingress.kubernetes.io/inbound-cidrs": "xxx.xxx.xxx.xxx,xxx.xxx.xxx.xxx"
}```
may interact with other parts of your stack. | ||
|
||
In addition to searching and posting on the `Flyte Slack community <https://flyte-org.slack.com/archives/C01P3B761A6>`__, | ||
Considering the above, we recommend checking out the `"Flyte The Hard Way" <https://github.com/davidmirror-ops/flyte-the-hard-way/tree/main#flyte-the-hard-way>`__ set of community-maintained tutorials that can guide you through the process of preparing the infrastructure and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I would avoid pointing to an external guide, give also that the guide you are referring to is mostly referring to the single binary deployment, which might be confusing. I think all documentation should be in one place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. A second iteration of updates to these guides should be pointed to extend the reach and make it more actionable. Also expanding the tutorial in the https://github.com/unionai-oss/deploy-flyte repo to cover flyte-core
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we forked your guide, @davidmirror-ops , and moved it to flyteorg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eapolinario I was thinking of something similar. Moving forward, the idea is that TF is the preferred approach but a manual guide is always a good resource
To feed the credentials to FlyteAdmin, you must retrieve them from your new data plane cluster and upload them to admin (for example, within Lyft, `Confidant <https://github.com/lyft/confidant>`__ is used). | ||
.. prompt:: bash $ | ||
|
||
kubectl create token <service-account-name> -n <namespace> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expand on what service-account-name should be here.
If I understand correctly this is a separate service account, created in the data plane, which will be assume by the flyteadmin service account in the control plane to generate all the required resources in the data plane, right?
|
||
kubectl create token <service-account-name> -n <namespace> | ||
|
||
To feed the credentials to FlyteAdmin, you must retrieve them from your new data plane cluster and upload them to ``FlyteAmin``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not use the word "upload" which might be misleading. I'd say that you need update your control plane configuration with the new secrets
cluster_2_cacert: {{ cluster 2 cacert here }} | ||
cluster_3_token: {{ cluster 3 token here }} | ||
cluster_3_cacert: {{ cluster 3 cacert here }} | ||
cluster_1_token: "cluster-1-token-here" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we assuming that we have 3 dataplanes or 2? Since in the first case we only need 2 entries. I'd also specify how to change the names of the clusters
Signed-off-by: davidmirror-ops <[email protected]>
rsts/deployment/deployment/index.rst
Outdated
@@ -108,7 +85,7 @@ There are three different paths for deploying a Flyte cluster: | |||
This option is appropriate if all your compute can `fit on one EKS cluster <https://docs.aws.amazon.com/eks/latest/userguide/service-quotas.html>`__ . | |||
As of this writing, a single Flyte cluster can handle more than 13,000 nodes. | |||
|
|||
Whatever path you choose, note that ``FlytePropeller`` itself can be sharded as well, though typically it's not required. | |||
Regardless of using single or multiple Kubernetes clusters for Flyte, note that ``FlytePropeller`` -tha main data plane component- can be sharded as well, if scale demands require it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: tha -> the
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
:language: yaml | ||
:lines: 127-135 | ||
|
||
.. group-tab:: ``flyte-binary`` on EKS using ALB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this also the one used for flyte-core using ALB? I'd add that too
@@ -49,29 +49,6 @@ deployment comes with a containerized `Minio <https://min.io/>`__, which offers | |||
- **GCP**: `GCS <https://cloud.google.com/storage/>`__ | |||
- **Azure**: `Azure Blob Storage <https://azure.microsoft.com/en-us/products/storage/blobs>`__ | |||
|
|||
|
|||
Cluster Configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we removing this section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just not sure how it fits here. Additionally is not clear what specific K8s resources it refers to (besides namespaces for projects, which is only an example). So this relationship between Flyte objects and K8s resources is worth documenting, but I'm not sure it fits in this sort of pre-requisites section.
rsts/deployment/deployment/index.rst
Outdated
@@ -108,7 +85,7 @@ There are three different paths for deploying a Flyte cluster: | |||
This option is appropriate if all your compute can `fit on one EKS cluster <https://docs.aws.amazon.com/eks/latest/userguide/service-quotas.html>`__ . | |||
As of this writing, a single Flyte cluster can handle more than 13,000 nodes. | |||
|
|||
Whatever path you choose, note that ``FlytePropeller`` itself can be sharded as well, though typically it's not required. | |||
Regardless of using single or multiple Kubernetes clusters for Flyte, note that ``FlytePropeller`` -the main data plane component- can be sharded as well, if scale demands require it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear what the word "shared" means here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a reference to the sharding
mechanism
|
||
.. tabbed:: AWS | ||
|
||
1. An IAM Policy that defines the permissions needed for Flyte. A minimum set of permissions include: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You actually need 2 sets of policies, one for the control plane and one for the data plane
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you defining different permissions for control and data plane? The set of perms indicated here work for both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally they should be different, but initially they may be the same
.. code-block:: json | ||
|
||
|
||
2. At least three IAM Roles configured: one for the controlplane components, another for the dataplane |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
controlplane -> control plane
namespace: flyte | ||
type: Opaque | ||
stringData: | ||
dataplane_1_token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlM0WlhfMm1Yb1U4Z1V4R0t6STZDdkhGTVVvVDBZcDAxbjdVbDc1Y1VxR28ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJmbHl0ZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXRhcGxhbmUxLXRva2VuIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImZseXRlYWRtaW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJkNTdhNjMwZi00ZTZmLTQzNTgtYjQwOS00M2UyMTlhYjg4NTEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6Zmx5dGU6Zmx5dGVhZG1pbiJ9.Fbn5qJjWP1wyJ08PgZXnrrUdKEhLRYqUzG9Vff1maFO3yBKkv_EBuYc2hjGeW5_ORCrT9qKcFAd3AE_tM3P8AQ-dRoA6K-RcJ2qinxabWmk9RYbtKFr1zujswU6dm-iB7JkjY7yYyBRewbw_m4QRacgG8K11c8bYZ9SZoV86EqGmsNdeCPuv5GiPBiJ0p3hgta4kZ1knCNf8qLBUQVZ-9G5vabYM0lyD6dvGOqlOs1bMzgLeijvpQN471dTLmIZ71anOG2gkuJW_AusnWDF_0rJ3yfISf3dRkhXkLswyq-awgtKbz6ZYjPaJ1eA8dNvSlbDoNrMXOGNlx7p7KhOY-w |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to put an actual secret here
cluster_3_cacert: {{ cluster 3 cacert here }} | ||
|
||
Create cluster credentials secret in the control plane cluster. | ||
stringData: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to use directly encoded data here since the new lines are error prone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
So you mean also modifying the previous command to avoid the -D
and also using a data
section, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
|
||
Create cluster credentials secret in the control plane cluster. | ||
stringData: | ||
dataplane_1_token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlM0WlhfMm1Yb1U4Z1V4R0t6STZDdkhGTVVvVDBZcDAxbjdVbDc1Y1VxR28ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJmbHl0ZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXRhcGxhbmUxLXRva2VuIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImZseXRlYWRtaW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJkNTdhNjMwZi00ZTZmLTQzNTgtYjQwOS00M2UyMTlhYjg4NTEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6Zmx5dGU6Zmx5dGVhZG1pbiJ9.Fbn5qJjWP1wyJ08PgZXnrrUdKEhLRYqUzG9Vff1maFO3yBKkv_EBuYc2hjGeW5_ORCrT9qKcFAd3AE_tM3P8AQ-dRoA6K-RcJ2qinxabWmk9RYbtKFr1zujswU6dm-iB7JkjY7yYyBRewbw_m4QRacgG8K11c8bYZ9SZoV86EqGmsNdeCPuv5GiPBiJ0p3hgta4kZ1knCNf8qLBUQVZ-9G5vabYM0lyD6dvGOqlOs1bMzgLeijvpQN471dTLmIZ71anOG2gkuJW_AusnWDF_0rJ3yfISf3dRkhXkLswyq-awgtKbz6ZYjPaJ1eA8dNvSlbDoNrMXOGNlx7p7KhOY-w |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for an actual token
stringData: | ||
dataplane_1_token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlM0WlhfMm1Yb1U4Z1V4R0t6STZDdkhGTVVvVDBZcDAxbjdVbDc1Y1VxR28ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJmbHl0ZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXRhcGxhbmUxLXRva2VuIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImZseXRlYWRtaW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJkNTdhNjMwZi00ZTZmLTQzNTgtYjQwOS00M2UyMTlhYjg4NTEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6Zmx5dGU6Zmx5dGVhZG1pbiJ9.Fbn5qJjWP1wyJ08PgZXnrrUdKEhLRYqUzG9Vff1maFO3yBKkv_EBuYc2hjGeW5_ORCrT9qKcFAd3AE_tM3P8AQ-dRoA6K-RcJ2qinxabWmk9RYbtKFr1zujswU6dm-iB7JkjY7yYyBRewbw_m4QRacgG8K11c8bYZ9SZoV86EqGmsNdeCPuv5GiPBiJ0p3hgta4kZ1knCNf8qLBUQVZ-9G5vabYM0lyD6dvGOqlOs1bMzgLeijvpQN471dTLmIZ71anOG2gkuJW_AusnWDF_0rJ3yfISf3dRkhXkLswyq-awgtKbz6ZYjPaJ1eA8dNvSlbDoNrMXOGNlx7p7KhOY-w | ||
dataplane_1_cacert: | | ||
-----BEGIN CERTIFICATE----- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for an actual cert
team1: | ||
- id: cluster_1 | ||
label1: | ||
- id: dataplane_1 | ||
weight: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure to explain here what these weight values are
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
@gdabisias Thanks for your careful reviews! |
|
||
.. code-block:: yaml | ||
|
||
configmap: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you also need the configuration for datacatalog; in my case I had to also add the endpoint for that.
Note that this should be moved to the default charts and not left to the user
|
||
"oidc.eks.<AWS-REGION-CODE>.amazonaws.com/id/<DATAPLANE1-OIDC-PROVIDER>:sub": [ | ||
"system:serviceaccount:flyte:flytepropeller", | ||
"system:serviceaccount:*:default" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd explain here that ideally we would add only the namespaces needed, i.e. project-env, so that users can be more specific if they want
Signed-off-by: davidmirror-ops <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, left just a couple of comments.
@@ -132,7 +132,7 @@ ingress: | |||
nginx.ingress.kubernetes.io/app-root: /console | |||
grpcAnnotations: | |||
nginx.ingress.kubernetes.io/backend-protocol: GRPC | |||
host: development.uniondemo.run | |||
host: development.uniondemo.run # change for the URL you'll use to connect to Flyte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this an inexistent url? Maybe rename this to '<your.flyte.url>' ?
@@ -41,7 +41,7 @@ Requirements | |||
- Install `docker <https://docs.docker.com/engine/install/>`__ or any other OCI-compatible tool, like Podman or LXD. | |||
- Install `flytectl <https://github.com/flyteorg/flytectl>`__, the official CLI for Flyte. | |||
|
|||
While Flyte can run any OCI-compatible task image, using the default Kubernetes container runtime (cri-o), the Flyte | |||
While Flyte can run any OCI-compatible task image using the default Kubernetes container runtime (cri-o), the Flyte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the default k8s container runtime containerd?
may interact with other parts of your stack. | ||
|
||
In addition to searching and posting on the `Flyte Slack community <https://flyte-org.slack.com/archives/C01P3B761A6>`__, | ||
Considering the above, we recommend checking out the `"Flyte The Hard Way" <https://github.com/davidmirror-ops/flyte-the-hard-way/tree/main#flyte-the-hard-way>`__ set of community-maintained tutorials that can guide you through the process of preparing the infrastructure and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we forked your guide, @davidmirror-ops , and moved it to flyteorg?
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Looks all good :) |
Tracking issue
Closes #3973
Describe your changes
Check all the applicable boxes
Screenshots
Note to reviewers