Prompt:
How can I orchestrate Apache Beam pipelines with TFX?

Response:
You can use Apache Beam for various tasks within your machine learning project, including data validation, preprocessing, model validation, and model deployment and inference.
However, a comprehensive ML workflow also involves steps such as data exploration, feature engineering, and model training, and it requires metadata and artifact tracking for reproducibility and auditability. Popular tools for building and managing ML workflows include [Kubeflow](https://www.kubeflow.org/), [TFX](https://www.tensorflow.org/tfx), and [Apache Airflow](https://airflow.apache.org/).

TFX is an end-to-end platform for deploying production ML pipelines. A TFX pipeline is a sequence of components, each implementing one step of the ML workflow, connected into a directed acyclic graph (DAG). To execute this DAG, TFX relies on an orchestrator. The natively supported orchestrators include [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/), [Apache Airflow](https://airflow.apache.org/), and Apache Beam itself.
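
To illustrate what an orchestrator does with the pipeline DAG, here is a minimal, stdlib-only sketch. It runs each component only after all of its upstream components have finished. This is not the TFX API. The dependency map and the `run_sequentially` helper are hypothetical, and the component names only mirror TFX standard components.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each component lists the upstream components
# whose outputs it consumes. The names mirror TFX standard components, but
# this wiring is an illustrative assumption, not a real TFX pipeline.
PIPELINE_DAG = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen"},
    "Evaluator": {"ExampleGen", "Trainer"},
    "Pusher": {"Trainer", "Evaluator"},
}


def run_sequentially(dag, execute):
    """Toy orchestrator: run each component only after all of its upstream components."""
    order = []
    # static_order() yields nodes so that every predecessor comes first,
    # and raises graphlib.CycleError if the graph is not a DAG.
    for name in TopologicalSorter(dag).static_order():
        execute(name)
        order.append(name)
    return order


# Usage: "execute" only logs here; a real orchestrator would launch the component.
executed = run_sequentially(PIPELINE_DAG, lambda name: print(f"running {name}"))
```

A sequential loop is the simplest choice. `TopologicalSorter` also provides `get_ready()` and `done()`, which an orchestrator could use to run independent components in parallel.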

To integrate an Apache Beam pipeline into a TFX pipeline, follow these steps:
1. Create TFX components by specifying their interfaces and containerizing the component implementation.
2. Create a TFX pipeline by connecting these components and specifying how inputs and outputs are passed between them.
3. Compile and run the TFX pipeline in the target environment.
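
The three steps above can be modeled in plain Python. This is a hypothetical, stdlib-only sketch, not the TFX API:

- Each component declares named inputs and outputs plus a container image.
- Inputs reference upstream outputs as `"Producer.output"` channels.
- A "compile" step resolves those channels into dependency edges and emits a JSON spec that an orchestrator could execute.

`ComponentSpec`, `compile_pipeline`, and the image names are all illustrative assumptions.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ComponentSpec:
    """Hypothetical component interface (step 1): named I/O plus a container image."""
    name: str
    image: str  # hypothetical container image holding the component implementation
    inputs: dict = field(default_factory=dict)  # input name -> "Producer.output_name" channel
    outputs: tuple = ()  # names of the artifacts this component produces


def compile_pipeline(pipeline_name, components):
    """Steps 2-3: resolve input channels into DAG edges and emit a JSON pipeline spec."""
    produced = {f"{c.name}.{out}" for c in components for out in c.outputs}
    tasks = []
    for c in components:
        for channel in c.inputs.values():
            if channel not in produced:
                raise ValueError(f"{c.name} consumes unknown channel {channel!r}")
        # A component depends on every component whose output it consumes.
        upstream = sorted({ch.rsplit(".", 1)[0] for ch in c.inputs.values()})
        tasks.append({
            "name": c.name,
            "image": c.image,
            "inputs": c.inputs,
            "outputs": list(c.outputs),
            "depends_on": upstream,
        })
    return json.dumps({"pipeline": pipeline_name, "tasks": tasks}, indent=2)


# Usage with hypothetical image names.
components = [
    ComponentSpec("ExampleGen", "example-gen:dev", outputs=("examples",)),
    ComponentSpec("StatisticsGen", "statistics-gen:dev",
                  inputs={"examples": "ExampleGen.examples"}, outputs=("statistics",)),
    ComponentSpec("SchemaGen", "schema-gen:dev",
                  inputs={"statistics": "StatisticsGen.statistics"}, outputs=("schema",)),
]
spec = compile_pipeline("beam-tfx-sketch", components)
print(spec)
```

Wiring components through named channels, rather than calling them directly, is what lets the same pipeline definition be compiled once and handed to different orchestrators.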

TFX provides a rich set of [standard components](https://www.tensorflow.org/tfx/guide#tfx_standard_components) for building ML pipelines. You can use these components as building blocks for creating custom components. TFX relies heavily on Apache Beam libraries to implement data-parallel processing within these components, so components built with these libraries must run on one of the supported Apache Beam runners.

To launch the pipeline, you must provide two configurations: the orchestrator that runs the TFX pipeline and the pipeline options for running the Apache Beam pipelines.
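
The two configurations can be sketched as plain data. The flags below are standard Apache Beam pipeline options for the DirectRunner and the DataflowRunner. In TFX code, such a list is typically passed as the `beam_pipeline_args` argument of the pipeline definition, while the orchestrator is chosen by the DAG runner you invoke. The sketch uses only the standard library, so it does not run anything. `make_beam_pipeline_args`, `LaunchConfig`, and the project, region, and bucket values are hypothetical. The orchestrator names are plain string labels.

```python
from dataclasses import dataclass


def make_beam_pipeline_args(runner, project=None, region=None, temp_location=None):
    """Build Apache Beam pipeline options as command-line style flags."""
    if runner == "DirectRunner":
        return [
            "--runner=DirectRunner",
            "--direct_running_mode=multi_processing",
            # 0 asks the DirectRunner to size the worker pool from the available CPUs.
            "--direct_num_workers=0",
        ]
    if runner == "DataflowRunner":
        required = {"project": project, "region": region, "temp_location": temp_location}
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError(f"DataflowRunner requires: {', '.join(missing)}")
        return [
            "--runner=DataflowRunner",
            f"--project={project}",
            f"--region={region}",
            f"--temp_location={temp_location}",
        ]
    raise ValueError(f"unsupported runner: {runner!r}")


@dataclass(frozen=True)
class LaunchConfig:
    """The two configurations: which orchestrator runs the DAG, and the Beam options."""
    orchestrator: str  # a label only, e.g. "LocalDagRunner"; nothing is imported here
    beam_pipeline_args: tuple


# Usage with hypothetical project, region, and bucket values.
local = LaunchConfig("LocalDagRunner", tuple(make_beam_pipeline_args("DirectRunner")))
cloud = LaunchConfig(
    "KubeflowDagRunner",
    tuple(make_beam_pipeline_args("DataflowRunner", "my-project", "us-central1", "gs://my-bucket/tmp")),
)
print(local)
print(cloud)
```

Keeping the two settings separate means you can, for example, develop locally with the DirectRunner and later run the same pipeline under a different orchestrator with the DataflowRunner by changing only the configuration.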

You can find the complete TFX example code in the [Apache Beam GitHub repository](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/ml-orchestration/tfx).