Merge branch 'master' of github.com:flyteorg/flyte into fix-databricks-plugin
pingsutw committed Apr 12, 2024
2 parents 109e5d8 + c7d1463 commit cfc6b06
Showing 58 changed files with 11,022 additions and 660 deletions.
2 changes: 1 addition & 1 deletion charts/flyte-core/README.md
@@ -273,7 +273,7 @@ helm install gateway bitnami/contour -n flyte
| flytescheduler.tolerations | list | `[]` | tolerations for Flytescheduler deployment |
| secrets.adminOauthClientCredentials.clientId | string | `"flytepropeller"` | |
| secrets.adminOauthClientCredentials.clientSecret | string | `"foobar"` | |
| secrets.adminOauthClientCredentials.enabled | bool | `true` | If enabled is true, helm will create and manage `flyte-secret-auth` and populate it with `clientSecret`. If enabled is false, it's up to the user to create `flyte-secret-auth` as described in https://docs.flyte.org/en/latest/deployment/cluster_config/auth_setup.html#oauth2-authorization-server |
| secrets.adminOauthClientCredentials.enabled | bool | `true` | |
| sparkoperator | object | `{"enabled":false,"plugin_config":{"plugins":{"spark":{"spark-config-default":[{"spark.hadoop.fs.s3a.aws.credentials.provider":"com.amazonaws.auth.DefaultAWSCredentialsProviderChain"},{"spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version":"2"},{"spark.kubernetes.allocation.batch.size":"50"},{"spark.hadoop.fs.s3a.acl.default":"BucketOwnerFullControl"},{"spark.hadoop.fs.s3n.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem"},{"spark.hadoop.fs.AbstractFileSystem.s3n.impl":"org.apache.hadoop.fs.s3a.S3A"},{"spark.hadoop.fs.s3.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem"},{"spark.hadoop.fs.AbstractFileSystem.s3.impl":"org.apache.hadoop.fs.s3a.S3A"},{"spark.hadoop.fs.s3a.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem"},{"spark.hadoop.fs.AbstractFileSystem.s3a.impl":"org.apache.hadoop.fs.s3a.S3A"},{"spark.hadoop.fs.s3a.multipart.threshold":"536870912"},{"spark.blacklist.enabled":"true"},{"spark.blacklist.timeout":"5m"},{"spark.task.maxfailures":"8"}]}}}}` | Optional: Spark Plugin using the Spark Operator |
| sparkoperator.enabled | bool | `false` | - enable or disable Sparkoperator deployment installation |
| sparkoperator.plugin_config | object | `{"plugins":{"spark":{"spark-config-default":[{"spark.hadoop.fs.s3a.aws.credentials.provider":"com.amazonaws.auth.DefaultAWSCredentialsProviderChain"},{"spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version":"2"},{"spark.kubernetes.allocation.batch.size":"50"},{"spark.hadoop.fs.s3a.acl.default":"BucketOwnerFullControl"},{"spark.hadoop.fs.s3n.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem"},{"spark.hadoop.fs.AbstractFileSystem.s3n.impl":"org.apache.hadoop.fs.s3a.S3A"},{"spark.hadoop.fs.s3.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem"},{"spark.hadoop.fs.AbstractFileSystem.s3.impl":"org.apache.hadoop.fs.s3a.S3A"},{"spark.hadoop.fs.s3a.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem"},{"spark.hadoop.fs.AbstractFileSystem.s3a.impl":"org.apache.hadoop.fs.s3a.S3A"},{"spark.hadoop.fs.s3a.multipart.threshold":"536870912"},{"spark.blacklist.enabled":"true"},{"spark.blacklist.timeout":"5m"},{"spark.task.maxfailures":"8"}]}}}` | Spark plugin configuration |
2 changes: 1 addition & 1 deletion charts/flyte-core/templates/common/secret-auth.yaml
@@ -1,4 +1,4 @@
{{- if .Values.secrets.adminOauthClientCredentials.enabled }}
{{- if and (.Values.secrets.adminOauthClientCredentials.enabled) (not (empty .Values.secrets.adminOauthClientCredentials.clientSecret)) }}
apiVersion: v1
kind: Secret
metadata:
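
The new condition can be read as a small truth table. The sketch below mirrors it in Python, assuming that Sprig's `empty` treats both `nil` and the empty string as empty; the function name `renders_secret` is hypothetical and is not part of the chart.

```python
def renders_secret(enabled, client_secret):
    """Mirror of: and (enabled) (not (empty clientSecret))."""
    # Python truthiness stands in for Sprig's `empty` here:
    # None and "" both count as "empty" (an assumption about the template helper).
    return bool(enabled) and bool(client_secret)


if __name__ == "__main__":
    cases = [(True, "foobar"), (True, None), (True, ""), (False, "foobar")]
    for enabled, secret in cases:
        print(enabled, repr(secret), "->", renders_secret(enabled, secret))
```

Only the first case renders the `Secret`; with `enabled: true` and no `clientSecret`, the chart leaves creation of `flyte-secret-auth` to the user.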
@@ -298,9 +298,12 @@ deployRedoc: false

secrets:
adminOauthClientCredentials:
# -- If enabled is true, helm will create and manage `flyte-secret-auth` and populate it with `clientSecret`.
# If enabled is false, it's up to the user to create `flyte-secret-auth` as described in
# If enabled is true, and `clientSecret` is specified, helm will create and mount `flyte-secret-auth`.
# If enabled is true, and `clientSecret` is null, it's up to the user to create `flyte-secret-auth` as described in
# https://docs.flyte.org/en/latest/deployment/cluster_config/auth_setup.html#oauth2-authorization-server
# and helm will mount `flyte-secret-auth`.
# If enabled is false, auth is not turned on.
# Note: Unsupported combination: enabled.false and clientSecret.someValue
enabled: true
clientSecret: "<>" # put the secret for the confidential client flytepropeller defined in the IDP
clientId: "flytepropeller" #use this client id and secret in the flytectl config with ClientSecret option
7 changes: 5 additions & 2 deletions charts/flyte-core/values.yaml
@@ -430,9 +430,12 @@ deployRedoc: false

secrets:
adminOauthClientCredentials:
# -- If enabled is true, helm will create and manage `flyte-secret-auth` and populate it with `clientSecret`.
# If enabled is false, it's up to the user to create `flyte-secret-auth` as described in
# If enabled is true, and `clientSecret` is specified, helm will create and mount `flyte-secret-auth`.
# If enabled is true, and `clientSecret` is null, it's up to the user to create `flyte-secret-auth` as described in
# https://docs.flyte.org/en/latest/deployment/cluster_config/auth_setup.html#oauth2-authorization-server
# and helm will mount `flyte-secret-auth`.
# If enabled is false, auth is not turned on.
# Note: Unsupported combination: enabled.false and clientSecret.someValue
enabled: true
clientSecret: foobar
clientId: flytepropeller
8 changes: 4 additions & 4 deletions charts/flyte/README.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion charts/flyte/values.yaml
@@ -532,7 +532,7 @@ flyte:
container: container
sidecar: sidecar
container_array: k8s-array
bigquery_query_job_task: agent-service
sensor: agent-service


# -- Kubernetes specific Flyte configuration
6 changes: 3 additions & 3 deletions deployment/sandbox/flyte_helm_generated.yaml
@@ -634,9 +634,9 @@ data:
tasks:
task-plugins:
default-for-task-types:
bigquery_query_job_task: agent-service
container: container
container_array: k8s-array
sensor: agent-service
sidecar: sidecar
enabled-plugins:
- container
@@ -7173,7 +7173,7 @@ spec:
template:
metadata:
annotations:
configChecksum: "4fd54a75274d84bbb9a90cc421f7aece12c202911984a436a9ec5fe52e942eb"
configChecksum: "673119651fe870e114e1b95cfbc27a6e5c2418215569ab9d0b9451385c32a51"
labels:
app.kubernetes.io/name: flytepropeller
app.kubernetes.io/instance: flyte
@@ -7247,7 +7247,7 @@ spec:
app.kubernetes.io/name: flyte-pod-webhook
app.kubernetes.io/version: v1.11.1-b1
annotations:
configChecksum: "4fd54a75274d84bbb9a90cc421f7aece12c202911984a436a9ec5fe52e942eb"
configChecksum: "673119651fe870e114e1b95cfbc27a6e5c2418215569ab9d0b9451385c32a51"
spec:
securityContext:
fsGroup: 65534
4 changes: 2 additions & 2 deletions docker/sandbox-bundled/manifests/complete-agent.yaml
@@ -816,7 +816,7 @@ type: Opaque
---
apiVersion: v1
data:
haSharedSecret: SDRTOVJwQzU0WURYTG1NbQ==
haSharedSecret: WlVScnNIb3I2RFM4UFhrcA==
proxyPassword: ""
proxyUsername: ""
kind: Secret
@@ -1412,7 +1412,7 @@ spec:
metadata:
annotations:
checksum/config: 8f50e768255a87f078ba8b9879a0c174c3e045ffb46ac8723d2eedbe293c8d81
checksum/secret: 1d977a1daf6338c6d55444d6c0565a40353efd71d0a8bef422cfc6387b20a39f
checksum/secret: a041f8b1e9c41f465e4f113957cc10f1b48b2e259a5d193657571ae597305e2c
labels:
app: docker-registry
release: flyte-sandbox
4 changes: 2 additions & 2 deletions docker/sandbox-bundled/manifests/complete.yaml
@@ -796,7 +796,7 @@ type: Opaque
---
apiVersion: v1
data:
haSharedSecret: MGs1QlJSY2VKM3I0cEQ2bw==
haSharedSecret: VU5MNDc1MDZUU05OWmZOYw==
proxyPassword: ""
proxyUsername: ""
kind: Secret
@@ -1360,7 +1360,7 @@ spec:
metadata:
annotations:
checksum/config: 8f50e768255a87f078ba8b9879a0c174c3e045ffb46ac8723d2eedbe293c8d81
checksum/secret: d2a40d222d6f4b81e6186400d7fc9818c90e07068ccc2569cfdb212ad7782e98
checksum/secret: 0c9fcdc5ba4f5091dbd31e0a907c4748391313df162b5e1d3ace3084b62cdd40
labels:
app: docker-registry
release: flyte-sandbox
4 changes: 2 additions & 2 deletions docker/sandbox-bundled/manifests/dev.yaml
@@ -499,7 +499,7 @@ metadata:
---
apiVersion: v1
data:
haSharedSecret: SVFrS2JhOWVndXFEYlE3WA==
haSharedSecret: RXhwTzhZT25HZzJjdUllSQ==
proxyPassword: ""
proxyUsername: ""
kind: Secret
@@ -934,7 +934,7 @@ spec:
metadata:
annotations:
checksum/config: 8f50e768255a87f078ba8b9879a0c174c3e045ffb46ac8723d2eedbe293c8d81
checksum/secret: b5ff29721af068e75a80eff30c7402def61a64a87c73e8e716d5d06cf05c4bd8
checksum/secret: 6f8a6d8c2b4e54840abf28822833192923adeb062f926c962e8e0785b96877d5
labels:
app: docker-registry
release: flyte-sandbox
25 changes: 25 additions & 0 deletions docs/community/troubleshoot.rst
@@ -32,6 +32,31 @@ Depending on the contents of the logs or the `Events`, you can try different thi
Debugging common execution errors
----------------------------------

``Error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This error appears when Docker is not running with the native Docker engine on a Linux machine. Most likely you are running Docker via Docker Desktop.

- If you are using Docker Desktop on macOS, run:

.. prompt:: bash $

sudo ln -s ~/Library/Containers/com.docker.docker/Data/docker.raw.sock /var/run/docker.sock

- If you are using Docker Desktop on Linux, run:

.. prompt:: bash $

sudo ln -s "$HOME/.docker/desktop/docker.sock" /var/run/docker.sock

- If you are using another tool to run Docker, you need to make sure that ``/var/run/docker.sock`` is linked to the correct socket file.

For example, if you are using Rancher Desktop on Linux, run:

.. prompt:: bash $

sudo ln -s "$HOME/.rd/docker.sock" /var/run/docker.sock
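
Before creating a symlink, it can help to check what currently sits at ``/var/run/docker.sock``. The following is a minimal Python sketch, not part of Flyte or Docker; the helper name ``docker_socket_status`` is hypothetical. It distinguishes a missing path, a dangling symlink, and a real socket:

```python
import os
import stat


def docker_socket_status(path="/var/run/docker.sock"):
    """Classify what currently sits at the Docker socket path."""
    if not os.path.lexists(path):
        # Nothing there at all, not even a broken link.
        return "missing"
    if os.path.islink(path) and not os.path.exists(path):
        # A symlink exists but its target does not.
        return "dangling symlink"
    mode = os.stat(path).st_mode  # os.stat follows symlinks
    return "socket" if stat.S_ISSOCK(mode) else "not a socket"


if __name__ == "__main__":
    print(docker_socket_status())
```

A result of ``missing`` or ``dangling symlink`` suggests that one of the ``ln -s`` commands above (with the socket path of your Docker tool) is still needed.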

``message: '0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.'``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

12 changes: 12 additions & 0 deletions docs/deployment/agents/chatgpt.rst
@@ -38,6 +38,12 @@ Specify agent configuration
agent-service:
supportedTaskTypes:
- chatgpt
# Configuring the timeout is optional.
# Tasks like using ChatGPT with a large model might require a longer time,
# so we have the option to adjust the timeout setting here.
defaultAgent:
timeouts:
ExecuteTaskSync: 10s
.. group-tab:: Flyte core

@@ -66,6 +72,12 @@ Specify agent configuration
agent-service:
supportedTaskTypes:
- chatgpt
# Configuring the timeout is optional.
# Tasks like using ChatGPT with a large model might require a longer time,
# so we have the option to adjust the timeout setting here.
defaultAgent:
timeouts:
ExecuteTaskSync: 10s
Add the OpenAI API token
-------------------------------
11 changes: 8 additions & 3 deletions docs/deployment/configuration/auth_setup.rst
@@ -346,8 +346,12 @@ Apply OIDC Configuration
secrets:
adminOauthClientCredentials:
# -- If enabled is true, helm will create and manage `flyte-secret-auth` and populate it with `clientSecret`.
# If enabled is false, it's up to the user to create `flyte-secret-auth`
# If enabled is true, and `clientSecret` is specified, helm will create and mount `flyte-secret-auth`.
# If enabled is true, and `clientSecret` is null, it's up to the user to create `flyte-secret-auth` as described in
# https://docs.flyte.org/en/latest/deployment/cluster_config/auth_setup.html#oauth2-authorization-server
# and helm will mount `flyte-secret-auth`.
# If enabled is false, auth is not turned on.
# Note: Unsupported combination: enabled.false and clientSecret.someValue
enabled: true
# Use the non-encoded version of the random password
clientSecret: "<your-random-password>"
@@ -677,7 +681,8 @@ Alternatively, you can instruct Helm not to create and manage the secret for ``f
secrets:
adminOauthClientCredentials:
enabled: false #set to false
enabled: true # enable mounting the flyte-secret-auth secret to the flytepropeller.
clientSecret: null # disable Helm from creating the flyte-secret-auth secret.
# Replace with the client_id provided by your IdP for flytepropeller.
clientId: <client_id>
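
With ``clientSecret: null``, the ``flyte-secret-auth`` secret has to be created by hand. As a rough sketch under stated assumptions, the Python snippet below builds such a manifest as JSON, which ``kubectl apply -f -`` accepts. The data key ``client_secret`` and the ``flyte`` namespace are assumptions to verify against your deployment, and ``build_auth_secret`` is a hypothetical helper name.

```python
import base64
import json


def build_auth_secret(client_secret, namespace="flyte"):
    """Return a Kubernetes Secret manifest for flyte-secret-auth as a dict."""
    # Kubernetes expects values under `data` to be base64-encoded.
    encoded = base64.b64encode(client_secret.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "flyte-secret-auth", "namespace": namespace},
        "type": "Opaque",
        # Assumption: the key name expected by flytepropeller is `client_secret`.
        "data": {"client_secret": encoded},
    }


if __name__ == "__main__":
    print(json.dumps(build_auth_secret("<your-random-password>"), indent=2))
```

Generating the manifest outside Helm keeps the secret value out of the ``values`` file, which is the point of this configuration.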
56 changes: 48 additions & 8 deletions docs/deployment/configuration/monitoring.rst
@@ -85,15 +85,55 @@ Use Published Dashboards to Monitor Flyte Deployment

Flyte Backend is written in Golang and exposes stats using Prometheus. The stats are labeled with workflow, task, project & domain, wherever appropriate.

The dashboards are divided into two types:
Both ``flyteadmin`` and ``flytepropeller`` are instrumented to expose metrics. To visualize these metrics, Flyte provides three Grafana dashboards, each with a different focus:

- **User-facing dashboards**: Dashboards that can be used to triage/investigate/observe performance and characteristics of workflows and tasks.
The user-facing dashboard is published under Grafana marketplace ID `13980 <https://grafana.com/grafana/dashboards/13980>`__.
The user-facing dashboard is published under ID `13980 <https://grafana.com/grafana/dashboards/13980>`__ in the Grafana marketplace.

- **System Dashboards**: Dashboards that are useful for the system maintainer to maintain their Flyte deployments. These are further divided into:
- DataPlane/FlytePropeller dashboards published @ `13979 <https://grafana.com/grafana/dashboards/13979>`__
- ControlPlane/Flyteadmin dashboards published @ `13981 <https://grafana.com/grafana/dashboards/13981>`__
- **System Dashboards**: Dashboards that are useful for the system maintainer to investigate the status and performance of their Flyte deployments. These are further divided into:
- `DataPlane/FlytePropeller <https://grafana.com/grafana/dashboards/13979>`__: execution engine status and performance.
- `ControlPlane/Flyteadmin <https://grafana.com/grafana/dashboards/13981>`__: API-level monitoring.

The corresponding JSON files for each dashboard are also located at ``deployment/stats/prometheus``.

.. note::

The dashboards are basic dashboards and do not include all the metrics exposed by Flyte.
Feel free to use the scripts provided `here <https://github.com/flyteorg/flyte/tree/master/stats>`__ to improve and -hopefully- contribute the improved dashboards.

How to use the dashboards
~~~~~~~~~~~~~~~~~~~~~~~~~

1. We recommend installing and configuring the Prometheus operator as described in `their docs <https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md>`__.
This is especially true if you plan to use the Service Monitors provided by the `flyte-core <https://github.com/flyteorg/flyte/blob/master/charts/flyte-core/templates/propeller/service-monitor.yaml>`__ Helm chart.

2. Enable the Prometheus instance to use Service Monitors in the namespace where Flyte is running, configuring the following keys in the ``prometheus`` resource:

.. code-block:: yaml
spec:
serviceMonitorSelector: {}
serviceMonitorNamespaceSelector: {}
.. note::

The above example configuration lets Prometheus use any ``ServiceMonitor`` in any namespace in the cluster. Adjust the configuration to reduce the scope if needed.

3. Once you have installed and configured the Prometheus operator, enable the Service Monitors in the Helm chart by configuring the following keys in your ``values`` file:

.. code-block:: yaml
flyteadmin:
serviceMonitor:
enabled: true
flytepropeller:
serviceMonitor:
enabled: true
.. note::

By default, the ``ServiceMonitor`` is configured with a ``scrapeTimeout`` of 30s and an ``interval`` of 60s. You can customize these values if needed.

With the above configuration in place you should be able to import the dashboards in your Grafana instance.

The above mentioned are basic dashboards and do no include all the metrics exposed by Flyte.
Please help us improve the dashboards by contributing to them 🙏.
Refer to the build scripts `here <https://github.com/flyteorg/flyte/tree/master/stats>`__.
2 changes: 1 addition & 1 deletion docs/deployment/plugins/k8s/index.rst
@@ -83,7 +83,7 @@ Select the integration you need and follow the steps to install the correspondin

.. code-block:: bash
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm repo add spark-operator https://kubeflow.github.io/spark-operator
To install the Spark operator, run the following command:

10 changes: 6 additions & 4 deletions docs/flyte_agents/developing_agents.md
@@ -184,10 +184,7 @@ kubectl set image deployment/flyteagent flyteagent=ghcr.io/flyteorg/flyteagent:l
kubectl rollout restart deployment flytepropeller -n flyte
```

### 5.


### Canary deployment
### 5. Canary deployment

Agents can be deployed independently in separate environments. Decoupling agents from the
production environment ensures that if any specific agent encounters an error or issue, it will not impact the overall production system.
@@ -210,7 +207,12 @@ you can route particular task requests to designated agent services by adjusting
endpoint: "dns:///flyteagent.flyte.svc.cluster.local:8000"
insecure: true
timeouts:
# CreateTask, GetTask and DeleteTask are for async agents.
# ExecuteTaskSync is for sync agents.
CreateTask: 5s
GetTask: 5s
DeleteTask: 5s
ExecuteTaskSync: 10s
defaultTimeout: 10s
agents:
custom_agent:
@@ -57,7 +57,7 @@ tasks:
- sidecar
- K8S-ARRAY
default-for-task-types:
- bigquery_query_job_task: agent-service
- sensor: agent-service
- container: container
- container_array: K8S-ARRAY
```
@@ -69,7 +69,12 @@ plugins:
endpoint: "localhost:8000" # your grpc agent server port
insecure: true
timeouts:
GetTask: 10s
# CreateTask, GetTask and DeleteTask are for async agents.
# ExecuteTaskSync is for sync agents.
CreateTask: 5s
GetTask: 5s
DeleteTask: 5s
ExecuteTaskSync: 10s
defaultTimeout: 10s
```
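
The interplay between per-call `timeouts` and `defaultTimeout` can be sketched as a simple lookup with a fallback. The Python below is illustrative only: it assumes that a call without its own entry falls back to `defaultTimeout`, and the helper names `parse_duration` and `timeout_for` are hypothetical, not part of the Flyte plugin code.

```python
import re

# Seconds per supported unit suffix; only the simple forms used above are handled.
_UNIT_SECONDS = {"s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text):
    """Parse a simple duration such as '5s' or '2m' into seconds."""
    match = re.fullmatch(r"(\d+)(s|m|h)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def timeout_for(agent_config, rpc_name):
    """Return the timeout in seconds for one RPC, falling back to the default."""
    timeouts = agent_config.get("timeouts", {})
    return parse_duration(timeouts.get(rpc_name, agent_config["defaultTimeout"]))


config = {
    "timeouts": {
        "CreateTask": "5s",       # async agents
        "GetTask": "5s",          # async agents
        "DeleteTask": "5s",       # async agents
        "ExecuteTaskSync": "10s", # sync agents
    },
    "defaultTimeout": "10s",
}

if __name__ == "__main__":
    for name in ("CreateTask", "ExecuteTaskSync", "SomeOtherCall"):
        print(name, timeout_for(config, name))
```

Here `SomeOtherCall` is a made-up name used only to exercise the fallback path.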
@@ -23,17 +23,17 @@ conda activate flyte-example

Next, initialize your Flyte project. The [flytekit-python-template GitHub repository](https://github.com/flyteorg/flytekit-python-template) contains Flyte project templates with sample code that you can run as is or modify to suit your needs.

In this example, we will initialize the [basic-example-imagespec project template](https://github.com/flyteorg/flytekit-python-template/tree/main/basic-example-imagespec).
In this example, we will initialize the [basic-template-imagespec project template](https://github.com/flyteorg/flytekit-python-template/tree/main/basic-template-imagespec).

```{prompt} bash $
pyflyte init my_project
```

:::{note}

To initialize a Flyte project with a different template, use the `--template` parameter:
If you need to use a Dockerfile for your project, you can initialize the Dockerfile template:

`pyflyte init --template hello-world hello-world`
`pyflyte init --template basic-template-dockerfile my_project`
:::

### 3. Install additional requirements
@@ -1,3 +1,4 @@
(flyte_project_components)=
# Flyte project components

A Flyte project is a directory containing task and workflow code, internal Python source code, configuration files, and other artifacts required to package up your code so that it can be run on a Flyte cluster.
@@ -26,13 +27,13 @@ You can specify pip-installable Python dependencies in your project by adding th
`requirements.txt` file.

```{note}
We recommend using [pip-compile](https://pip-tools.readthedocs.io/en/latest/) to
We recommend using [pip-compile](https://pip-tools.readthedocs.io/en/stable/) to
manage your project's Python requirements.
```

````{dropdown} See requirements.txt
```{rli} https://raw.githubusercontent.com/flyteorg/flytekit-python-template/main/simple-example/%7B%7Bcookiecutter.project_name%7D%7D/requirements.txt
```{rli} https://raw.githubusercontent.com/flyteorg/flytekit-python-template/main/basic-template-imagespec/%7B%7Bcookiecutter.project_name%7D%7D/requirements.txt
:caption: requirements.txt
```