
Apply suggestions from code review
Co-authored-by: Daria Bezkorovaina <[email protected]>
Oleh Borysevych and dariabezkorovaina authored Mar 3, 2024
1 parent a851199 commit 7b04897
Showing 2 changed files with 15 additions and 11 deletions.
@@ -1,5 +1,5 @@
Prompt:
How can I orchestrate Apache Beam pipelines with KubeFlow?
How can I orchestrate Apache Beam pipelines with Kubeflow?

Response:
You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference.
@@ -16,6 +16,7 @@ To run an Apache Beam pipeline as a part of a Kubeflow Pipeline (KFP), you need
3. Compile and run the KFP pipeline by submitting it to a KFP client endpoint.

Here is an example of a KFP pipeline orchestrating an Apache Beam preprocessing pipeline:

```
kfp
├── pipeline.py
@@ -30,9 +31,10 @@ Here is an example of a KFP pipeline orchestrating an Apache Beam preprocessing
```

Apache Beam pipelines can be used as KFP components consisting of a YAML specification `component.yaml` and a Python source file `preprocess.py`. The YAML file defines the input and output arguments of the component, while the Python file contains the Apache Beam pipeline code.
Since KFP provides input and output arguments as command-line arguments, an `ArgumentParser` is required. The component logic and requirements are containerized, and the container image is pushed to a container registry.

As a final step, the KFP pipeline is compiled into a JSON file and submitted to a KFP client endpoint using `pipeline.py`.
Since KFP provides the input and output arguments as command-line arguments, you need to use the `ArgumentParser` class. You containerize the logic and prerequisites of the pipeline components, and then push the resulting container image to a container registry.
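For illustration, here is a minimal sketch of what the `preprocess.py` entry point could look like; the flag names and the placeholder transform are assumptions and would need to match the inputs and outputs declared in `component.yaml`:

```python
import argparse

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_args():
    """Parse the component arguments that KFP passes as command-line flags."""
    # Flag names are illustrative; they must match the component.yaml specification.
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-path", required=True, help="Location of the raw input data.")
    parser.add_argument("--output-path", required=True, help="Location to write the preprocessed data.")
    return parser.parse_known_args()


def run():
    known_args, beam_args = parse_args()
    # Any remaining flags are forwarded to Apache Beam as pipeline options.
    options = PipelineOptions(beam_args)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText(known_args.input_path)
            | "Preprocess" >> beam.Map(str.lower)  # placeholder preprocessing step
            | "Write" >> beam.io.WriteToText(known_args.output_path)
        )


if __name__ == "__main__":
    run()
```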

In the final stage, the `Compiler().compile()` call transforms the KFP pipeline into a JSON file, and the `pipeline.py` script submits it to a KFP client endpoint.

```python
Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")
```
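To sketch the surrounding `pipeline.py`, the snippet below loads the component, wires it into a pipeline, compiles it, and submits the resulting JSON to a KFP client endpoint. It assumes the KFP v2 SDK; the component path, parameter names, and endpoint URL are hypothetical:

```python
import kfp
from kfp import components, dsl
from kfp.compiler import Compiler

# Load the Beam preprocessing component from its YAML specification.
preprocess_op = components.load_component_from_file("components/preprocess/component.yaml")


@dsl.pipeline(name="beam-preprocessing-pipeline")
def pipeline(input_path: str, output_path: str):
    # Keyword arguments must match the input names declared in component.yaml.
    preprocess_op(input_path=input_path, output_path=output_path)


if __name__ == "__main__":
    Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")
    # Submit the compiled pipeline to the KFP client endpoint.
    client = kfp.Client(host="http://<kfp-endpoint>")
    client.create_run_from_pipeline_package(
        "pipeline.json",
        arguments={
            "input_path": "gs://<bucket>/raw",
            "output_path": "gs://<bucket>/preprocessed",
        },
    )
```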
18 changes: 10 additions & 8 deletions learning/prompts/documentation-lookup/37_ai_tfx_orchestration.md
@@ -1,19 +1,21 @@
Prompt:
How can I orchestrate Apache Beam pipelines with TFX?


Response:
You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference.
However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include [KubeFlow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).

TFX is an end-to-end platform for deploying production ML pipelines. TFX pipelines are a sequence of components that implement an ML pipeline. TFX pipeline DAG needs an orchestrator of it's own. Natively supported orchestrators include [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/), [Airflow](https://airflow.apache.org/), and Apache Beam itself.
However, a comprehensive ML workflow also involves additional steps like data exploration, feature engineering, and model training. Additionally, ML workflows require metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include [Kubeflow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).

TFX is an end-to-end platform designed for deploying production ML pipelines. TFX pipelines consist of a sequence of components that together form an ML pipeline. Each TFX pipeline DAG requires its own orchestrator. The natively supported orchestrators include [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/), [Apache Airflow](https://airflow.apache.org/), and Apache Beam itself.

In order to run an Apache Beam pipeline as part of a TFX pipeline you need to:
To integrate an Apache Beam pipeline into a TFX pipeline, you need to follow these steps:
1. Create TFX components by specifying interfaces and containerizing the component implementation.
2. Create TFX pipeline by connecting components, and defining how inputs and outputs are passed between the components.
3. Compile and run TFX pipeline in the target environment.
2. Create a TFX pipeline by connecting these components and specifying how inputs and outputs are exchanged between them.
3. Compile and execute the TFX pipeline in the target environment.
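
As a rough sketch of steps 1 and 2, a pipeline definition could wire a couple of standard components together. The component choice, names, and paths below are illustrative, assuming the `tfx` Python package:

```python
from typing import List

from tfx import v1 as tfx


def create_pipeline(
    pipeline_name: str,
    pipeline_root: str,
    data_root: str,
    metadata_path: str,
    beam_pipeline_args: List[str],
) -> tfx.dsl.Pipeline:
    # Ingest raw data, then compute statistics over it; the output of the
    # first component feeds the input of the second.
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)
    statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])

    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen],
        metadata_connection_config=tfx.orchestration.metadata.sqlite_metadata_connection_config(metadata_path),
        beam_pipeline_args=beam_pipeline_args,
    )
```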

TFX has a rich set of [standard components](https://www.tensorflow.org/tfx/guide#tfx_standard_components) for building ML pipelines. These components can be used as building blocks for creating custom components. TFX relies heavily on Apache Beam libraries to implement data-parallel pipelines in these components. You need to run components created with these libraries with one of the supported Apache Beam runners.
TFX provides a rich set of [standard components](https://www.tensorflow.org/tfx/guide#tfx_standard_components) for constructing ML pipelines. These components can be used as building blocks for creating custom components. TFX heavily relies on Apache Beam libraries for implementing data-parallel pipelines within these components. You must execute components created using these libraries with one of the supported Apache Beam runners.

To launch the pipeline, two configurations must be provided - orchestrator for the TFX pipeline and the pipeline options to run Apache Beam pipelines.
To launch the pipeline, you must provide two configurations: the orchestrator for the TFX pipeline and the pipeline options necessary for running Apache Beam pipelines.
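
Continuing the hypothetical `create_pipeline` sketch above, the local orchestrator could run the DAG while the Beam options are passed through `beam_pipeline_args`; the runner and paths shown here are placeholders:

```python
from tfx import v1 as tfx

# Options forwarded to the Apache Beam runner used inside the TFX components.
beam_pipeline_args = [
    "--runner=DirectRunner",   # swap for DataflowRunner, FlinkRunner, etc.
    "--direct_num_workers=0",  # 0 lets the direct runner use all available cores
]

# The orchestrator runs the TFX pipeline DAG; LocalDagRunner is the simplest
# choice, and a Kubeflow Pipelines or Airflow runner can be used instead.
tfx.orchestration.LocalDagRunner().run(
    create_pipeline(
        pipeline_name="beam-tfx-pipeline",
        pipeline_root="pipelines/beam-tfx-pipeline",
        data_root="data",
        metadata_path="metadata/metadata.db",
        beam_pipeline_args=beam_pipeline_args,
    )
)
```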

The full TFX example code can again be found [here](The full TFX example code can again be found in the GitHub repository.).
You can find the complete TFX example code in the [Apache Beam GitHub repository](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/ml-orchestration/tfx).
