Commit 15a7d46 · docs: change_for_airflow_provider_demo_of_integration_folder (#3467)
TanZiYen authored Oct 12, 2023 · 1 parent 72a49d8
Showing 12 changed files with 157 additions and 12 deletions.

docs/en/integration/deploy_integration/airflow_provider_demo.md (new file, 145 additions)

# Airflow
We provide the [Airflow OpenMLDB Provider](https://github.com/4paradigm/OpenMLDB/tree/main/extensions/airflow-provider-openmldb), which makes it easy to integrate OpenMLDB into Airflow DAGs.

This case uses Airflow to schedule the training and deployment workflow of the TalkingData demo.

## TalkingData DAG

To implement this workflow in Airflow, a DAG (Directed Acyclic Graph) file needs to be written. Here we use the example DAG file [example_openmldb_complex.py](https://github.com/4paradigm/OpenMLDB/blob/main/extensions/airflow-provider-openmldb/openmldb_provider/example_dags/example_openmldb_complex.py).

![airflow dag](images/airflow_dag.png)

The diagram above illustrates the workflow of the DAG. It begins by creating a table, followed by offline data loading, feature extraction, and model training. If the trained model performs well (AUC >= 99.0), the workflow proceeds to execute the deploy SQL and bring the model online for serving; otherwise, a failure report is generated.

In the following demonstration, you can import this DAG directly and run it in Airflow. The DAG's branching structure is sketched below.
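
As a rough structural sketch of that branch, assuming Airflow 2.4+ and using plain Empty/Python operators as illustrative stand-ins (the real DAG drives OpenMLDB through the provider's operators and SQL, and the task names and `train_and_eval` helper here are hypothetical):

```
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator


def train_and_eval(ti):
    auc = 99.2  # placeholder: the real DAG trains an xgboost model and computes AUC
    ti.xcom_push(key="auc", value=auc)


def choose_branch(ti):
    # Deploy only if the trained model is good enough, as described above.
    auc = ti.xcom_pull(task_ids="train", key="auc")
    return "deploy" if auc >= 99.0 else "fail_report"


with DAG("openmldb_workflow_sketch", start_date=datetime(2022, 8, 25), schedule=None):
    create_table = EmptyOperator(task_id="create_table")          # provider SQL operator in the real DAG
    load_offline = EmptyOperator(task_id="load_offline_data")
    feature_extraction = EmptyOperator(task_id="feature_extraction")
    train = PythonOperator(task_id="train", python_callable=train_and_eval)
    branch = BranchPythonOperator(task_id="branch", python_callable=choose_branch)
    deploy = EmptyOperator(task_id="deploy")                      # deploy SQL + model serving
    fail_report = EmptyOperator(task_id="fail_report")

    create_table >> load_offline >> feature_extraction >> train >> branch >> [deploy, fail_report]
```

The `BranchPythonOperator` returns the id of the task to run next, which is how the AUC threshold chooses between deployment and the failure report.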


## Demonstration

We import the above DAG to perform feature computation and deployment for the TalkingData demo, then perform real-time inference using the TalkingData demo's predict server.

### Preparation

#### Download DAG

Along with the DAG file, the training script is also required. For convenience, we provide a [code package](https://openmldb.ai/download/airflow_demo/airflow_demo_files.tar.gz) for direct download. If you prefer to use the latest version, obtain the files from [github example_dags](https://github.com/4paradigm/OpenMLDB/tree/main/extensions/airflow-provider-openmldb/openmldb_provider/example_dags).

```
wget https://openmldb.ai/download/airflow_demo/airflow_demo_files.tar.gz
tar zxf airflow_demo_files.tar.gz
ls airflow_demo_files
```
#### Start Docker Image

For smooth functioning, we recommend starting OpenMLDB using the docker image and installing Airflow inside the container.

Since the Airflow web UI requires an external port for login, the container's port must be exposed. Also map the files downloaded in the previous step into the container, here to `/work/airflow_demo_files`; a later step copies them into `/work/airflow/dags`, the directory from which Airflow loads DAGs.

```
docker run -p 8080:8080 -v `pwd`/airflow_demo_files:/work/airflow_demo_files -it 4pdosc/openmldb:0.8.0 bash
```

#### Download and Install Airflow and Airflow OpenMLDB Provider
In the docker container, execute:
```
pip3 install airflow-provider-openmldb
```
Since `airflow-provider-openmldb` depends on Airflow, Airflow will be installed along with it.
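
To confirm that the provider and its Airflow dependency landed correctly, a generic sanity check (not part of the original walkthrough) is:
```
pip3 show airflow-provider-openmldb
airflow version
```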

#### Source Data and DAG Preparation
The DAG loads data from `/tmp/train_sample.csv`, so copy the sample data file to the `/tmp` directory. The Airflow DAG file and the training script used by the DAG must also be copied into the Airflow DAG directory.

```
cp /work/airflow_demo_files/train_sample.csv /tmp/
mkdir -p /work/airflow/dags
cp /work/airflow_demo_files/example_openmldb_complex.py /work/airflow_demo_files/xgboost_train_sample.py /work/airflow/dags
```

### Step 1: Start OpenMLDB and Airflow
The commands below start the OpenMLDB cluster, the predict server used to test the deployment, and Airflow standalone.
```
/work/init.sh
python3 /work/airflow_demo_files/predict_server.py --no-init > predict.log 2>&1 &
export AIRFLOW_HOME=/work/airflow
cd $AIRFLOW_HOME
airflow standalone
```
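
The `/work/init.sh` script brings up the OpenMLDB cluster. If you want to verify that the OpenMLDB API server is reachable, you can run the following from a second terminal (since `airflow standalone` holds the foreground); this assumes the demo's default API server port 9080 and OpenMLDB's `GET /dbs` endpoint for listing databases:
```
curl http://127.0.0.1:9080/dbs
```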

Airflow standalone prints the login username and password, as shown below.

![airflow login](images/airflow_login.png)

Log in to the Airflow web UI at `http://localhost:8080` with that username and password.

```{caution}
`airflow standalone` runs in the foreground, so exiting it shuts Airflow down. You can exit Airflow after the DAG run completes and before [Step 3: Test](#step-3-test), or move the Airflow process to the background, as sketched below.
```
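
One ordinary shell pattern for backgrounding it (not something this guide mandates) is:
```
export AIRFLOW_HOME=/work/airflow
cd $AIRFLOW_HOME
nohup airflow standalone > airflow.log 2>&1 &
```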

### Step 2: Running DAG

In the Airflow web UI, click the DAG `example_openmldb_complex`, then open the `Code` tab to view the DAG's details, as shown below.

![dag home](images/dag_home.png)

In this code, you will notice the use of `openmldb_conn_id`, as depicted in the following figure. The DAG does not use the OpenMLDB address directly; it refers to a connection, so you need to create a new connection with that exact name. A sketch of how such a connection is resolved follows the figure.

![dag code](images/dag_code.png)
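
For intuition, any Airflow task can resolve such a connection by name through the standard hooks API. This is a generic Airflow sketch, not code taken from the provider:
```
from airflow.hooks.base import BaseHook

# Resolve the connection created in the next step by its id.
conn = BaseHook.get_connection("openmldb_conn_id")
print(conn.host, conn.port)  # should point at the OpenMLDB API server
```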

#### Create Connection
Click **Connections** in the **Admin** menu.
![connection](images/connection.png)

Add the connection.
![add connection](images/add_connection.png)

The Airflow OpenMLDB Provider connects to the OpenMLDB API server. Therefore, enter the address of the OpenMLDB API server in this configuration, not the ZooKeeper address.

![connection settings](images/connection_settings.png)

The completed connection is shown in the figure below.
![display](images/connection_display.png)

#### Running DAG
Run the DAG to complete one round of model training, SQL deployment, and model deployment. A successful run yields results similar to the figure below.
![dag run](images/dag_run.png)

### Step 3: Test

If Airflow is currently running in the foreground within the container, you can exit it now; the upcoming tests do not depend on Airflow.

#### Online Data Import
The SQL and the model were deployed by the Airflow DAG, but there is no data in the online storage yet, so an online data import is needed.

```
curl -X POST http://127.0.0.1:9080/dbs/example_db -d'{"mode":"online", "sql":"load data infile \"file:///tmp/train_sample.csv\" into table example_table options(mode=\"append\");"}'
```

The import runs asynchronously, but since the data volume is small it completes quickly. You can monitor the import job with the `SHOW JOBS` command.
```
curl -X POST http://127.0.0.1:9080/dbs/example_db -d'{"mode":"online", "sql":"show jobs"}'
```

#### Prediction
Execute the prediction script to make one prediction; it uses the newly deployed SQL and model.
```
python3 /work/airflow_demo_files/predict.py
```
The result is shown below.
![result](images/airflow_test_result.png)
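
For reference, a deployment can also be invoked directly through the OpenMLDB API server's deployment endpoint, which accepts requests of the shape below. The deployment name `demo` and the input row are placeholders; check the deploy SQL in the DAG for the actual name and schema:
```
curl http://127.0.0.1:9080/dbs/example_db/deployments/demo -X POST -d'{"input": [[...]]}'
```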


### Non-Interactive Testing

Check whether the DAG has been loaded successfully:
```
airflow dags list | grep openmldb
```
Add the required connection:
```
airflow connections add openmldb_conn_id --conn-uri http://127.0.0.1:9080
airflow connections list --conn-id openmldb_conn_id
```
Test the DAG:
```
airflow dags test example_openmldb_complex 2022-08-25
```
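
You can also exercise a single task rather than the whole DAG. `airflow tasks test` is standard Airflow CLI; the task ids are defined in the DAG file, so list them first and fill in the `<task_id>` placeholder:
```
airflow tasks list example_openmldb_complex
airflow tasks test example_openmldb_complex <task_id> 2022-08-25
```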
(The remaining 10 changed files are the binary images referenced above and cannot be displayed.)
docs/zh/integration/deploy_integration/airflow_provider_demo.md (12 additions, 12 deletions)

````diff
@@ -17,9 +17,9 @@ The DAG flow is shown in the figure above: first create the table, then perform offline data import and feature…
 
 We import the above DAG to complete the feature computation and deployment in the TalkingData demo, and use the TalkingData demo's predict server for real-time inference testing after deployment.
 
-### 0 Preparation
+### Preparation
 
-#### 0.1 Download the DAG
+#### Download the DAG
 
 Besides the DAG file, the training script is also needed, so we provide a [download package](https://openmldb.ai/download/airflow_demo/airflow_demo_files.tar.gz) for direct download. To use the latest version, obtain it from [github example_dags](https://github.com/4paradigm/OpenMLDB/tree/main/extensions/airflow-provider-openmldb/openmldb_provider/example_dags).
 
@@ -28,7 +28,7 @@ wget https://openmldb.ai/download/airflow_demo/airflow_demo_files.tar.gz
 tar zxf airflow_demo_files.tar.gz
 ls airflow_demo_files
 ```
-#### 0.2 Start the Image
+#### Start the Image
 
 We recommend starting OpenMLDB directly with the docker image, and installing and starting Airflow inside docker.
 
@@ -38,22 +38,22 @@ ls airflow_demo_files
 docker run -p 8080:8080 -v `pwd`/airflow_demo_files:/work/airflow_demo_files -it 4pdosc/openmldb:0.8.3 bash
 ```
 
-#### 0.3 Download and Install Airflow and the Airflow OpenMLDB Provider
+#### Download and Install Airflow and the Airflow OpenMLDB Provider
 In the docker container, execute:
 ```
 pip3 install airflow-provider-openmldb
 ```
 Since airflow-provider-openmldb depends on airflow, the two are downloaded together.
 
-#### 0.4 Prepare the Source Data and DAG
+#### Prepare the Source Data and DAG
 Since the file used for data import in the DAG is `/tmp/train_sample.csv`, we need to copy the sample data file to the tmp directory. The Airflow DAG file and the training script used in the DAG also need to be copied into the airflow directory.
 ```
 cp /work/airflow_demo_files/train_sample.csv /tmp/
 mkdir -p /work/airflow/dags
 cp /work/airflow_demo_files/example_openmldb_complex.py /work/airflow_demo_files/xgboost_train_sample.py /work/airflow/dags
 ```
 
-### 1 Start OpenMLDB and Airflow
+### Step 1: Start OpenMLDB and Airflow
 The following commands start the OpenMLDB cluster, the predict server for deployment testing, and Airflow standalone.
 ```
 /work/init.sh
@@ -73,7 +73,7 @@ The Airflow standalone output will show the login username and password, as sho…
 `airflow standalone` is a foreground program; exiting it exits airflow. You can exit airflow after the dag run completes and then do [Step 3: Test](#3-测试), or put the airflow process in the background.
 ```
 
-### 2 Run the DAG
+### Step 2: Run the DAG
 In the Airflow web UI, click the DAG example_openmldb_complex; you can click `Code` to view the DAG's details, as shown below.
 
 ![dag home](images/dag_home.png)
@@ -82,7 +82,7 @@ The Airflow standalone output will show the login username and password, as sho…
 
 ![dag code](images/dag_code.png)
 
-#### 2.1 Create a Connection
+#### Create a Connection
 Click connection in the admin page.
 ![connection](images/connection.png)
 
@@ -96,15 +96,15 @@ The Airflow OpenMLDB Provider connects to the OpenMLDB Api Server, so this conf…
 The completed connection is shown in the figure below.
 ![display](images/connection_display.png)
 
-#### 2.2 Run the DAG
+#### Run the DAG
 Running the dag completes one round of model training, sql deployment, and model deployment. A successful run looks similar to the figure below.
 ![dag run](images/dag_run.png)
 
-### 3 Test
+### Step 3: Test
 
 If Airflow is running in the foreground in the container, it can be exited now; the following tests do not depend on airflow.
 
-#### 3.1 Online Import
+#### Online Import
 The Airflow DAG deployed the SQL and the model, but there is no data in the online storage yet, so we need to perform an online data import.
 ```
 curl -X POST http://127.0.0.1:9080/dbs/example_db -d'{"mode":"online", "sql":"load data infile \"file:///tmp/train_sample.csv\" into table example_table options(mode=\"append\");"}'
@@ -115,7 +115,7 @@ curl -X POST http://127.0.0.1:9080/dbs/example_db -d'{"mode":"online", "sql":"lo…
 curl -X POST http://127.0.0.1:9080/dbs/example_db -d'{"mode":"online", "sql":"show jobs"}'
 ```
 
-#### 3.2 Predict
+#### Predict
 Run the prediction script to perform one prediction; it will use the newly deployed sql and model.
 ```
 python3 /work/airflow_demo_files/predict.py
````
