Paddle Serving supports deployment with Baidu Kunlun chips. Currently, it supports deployment on ARM CPU servers with Baidu Kunlun chips (such as Phytium FT-2000+/64) and on Intel CPU servers with Baidu Kunlun chips. We will extend deployment support to more heterogeneous hardware servers in the future.
We recommend deploying the service with Docker. In an XPU environment, refer to the Docker image document to install the XPU image, and then complete compilation, installation, and deployment.
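A rough sketch of that Docker workflow is shown below; the image name is a placeholder, so substitute the XPU image tag given in the Docker image document:

```shell
# Placeholder image name -- use the XPU image tag from the Docker image document.
XPU_IMAGE=<xpu-image-from-docker-image-document>

docker pull ${XPU_IMAGE}
# --privileged may be needed so the container can access the XPU device.
docker run --rm -it --privileged=true -v $PWD:/Serving ${XPU_IMAGE} /bin/bash
```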
Refer to the compilation document to set up the compilation environment. The following is based on the Phytium FT-2000+/64 platform.
- Compile the Serving Server
```shell
cd Serving
mkdir -p server-build-arm && cd server-build-arm

cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
    -DPYTHON_EXECUTABLE=/usr/bin/python \
    -DWITH_PYTHON=ON \
    -DWITH_LITE=ON \
    -DWITH_XPU=ON \
    -DSERVER=ON ..
make -j10
```
You can run `make install` to produce the target in the `./output` directory. To do so, add `-DCMAKE_INSTALL_PREFIX=./output` to the CMake command shown above to specify the install path. Please specify `-DWITH_MKL=ON` when compiling on an Intel CPU platform with AVX2 support.
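For example, a server build that installs into `./output` could combine the flags already shown above like this (a sketch; adjust the Python paths to your environment, and add `-DWITH_MKL=ON` only on Intel CPUs with AVX2 support):

```shell
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
    -DPYTHON_EXECUTABLE=/usr/bin/python \
    -DWITH_PYTHON=ON \
    -DWITH_LITE=ON \
    -DWITH_XPU=ON \
    -DSERVER=ON \
    -DCMAKE_INSTALL_PREFIX=./output ..
make -j10
make install   # the built artifacts end up under ./output
```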
- Compile the Serving Client
```shell
mkdir -p client-build-arm && cd client-build-arm

cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
    -DPYTHON_EXECUTABLE=/usr/bin/python \
    -DWITH_PYTHON=ON \
    -DWITH_LITE=ON \
    -DWITH_XPU=ON \
    -DCLIENT=ON ..
make -j10
```
- Compile the App
```shell
cd Serving
mkdir -p app-build-arm && cd app-build-arm

cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
    -DPYTHON_EXECUTABLE=/usr/bin/python \
    -DWITH_PYTHON=ON \
    -DWITH_LITE=ON \
    -DWITH_XPU=ON \
    -DAPP=ON ..
make -j10
```
After the compilation steps above, the whl packages are generated under `python/dist/` in the corresponding build directories. For example, after the Serving Server compilation step, the whl package is produced under `server-build-arm/python/dist/`, and you can run `pip install -U python/dist/*.whl` to install it.
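For example, after all three builds the wheels could be installed as follows (the exact wheel file names depend on the version and platform, hence the wildcards):

```shell
python3 -m pip install -U server-build-arm/python/dist/*.whl
python3 -m pip install -U client-build-arm/python/dist/*.whl
python3 -m pip install -U app-build-arm/python/dist/*.whl
```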
To deploy the serving service on an ARM server with Baidu Kunlun XPU chips and use the acceleration capability of Paddle-Lite, please specify the following parameters during deployment.
param | description | note |
---|---|---|
use_lite | use the Paddle-Lite engine | enables the inference capability of Paddle-Lite |
use_xpu | use the Baidu Kunlun XPU for inference | must be used together with the use_lite option |
ir_optim | enable graph optimization | refer to Paddle-Lite |
Download the sample model:

```shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
There are mainly three deployment methods:
- deploy on the CPU server with a Baidu Kunlun XPU, using the acceleration capability of Paddle-Lite and the XPU;
- deploy on the CPU server standalone with Paddle-Lite;
- deploy on the CPU server standalone without Paddle-Lite.

The first two deployment methods are recommended.
Start the RPC service, deployed on a CPU server with Baidu Kunlun chips and accelerated with Paddle-Lite and the Kunlun XPU:

```shell
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
```
Start the RPC service, deployed on a CPU server and accelerated with Paddle-Lite:

```shell
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
```
Start the RPC service, deployed on a CPU server without Paddle-Lite:

```shell
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292
```
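For reference only: if the installed `paddle_serving_server` build also provides the built-in web service mode (the `--name` flag), an HTTP endpoint can be exposed in a similar way. This is a hedged sketch based on the generic fit_a_line example, not a command taken from this document:

```shell
# Assumes the --name flag (web service mode) is available in this build.
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --name uci --use_lite --use_xpu --ir_optim

# Send a prediction request over HTTP.
curl -H "Content-Type:application/json" -X POST \
    -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' \
    http://127.0.0.1:9292/uci/prediction
```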
Client prediction (RPC):

```python
from paddle_serving_client import Client
import numpy as np

client = Client()
# Load the client-side configuration generated when the model was saved.
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# One sample of the 13 normalized UCI housing features.
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1, 13, 1)}, fetch=["price"])
print(fetch_map)
```
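For reference, `fetch_map` is a Python dict keyed by the fetch variable names; with the standard uci_housing example it typically contains a `price` entry holding a numpy array with the predicted house price for the single input sample.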
Some examples are provided below, and other models can be deployed by modifying these examples accordingly.
sample name | sample links |
---|---|
fit_a_line | fit_a_line_xpu |
resnet | resnet_v2_50_xpu |
Note: for the list of supported models, refer to the document. Adaptation differs across models, and there may be some unsupported cases. If you run into any problem, please submit a GitHub issue and we will follow up promptly.