Paddle Serving Using Baidu Kunlun Chips

(English|简体中文)

Paddle Serving supports deployment on Baidu Kunlun chips. Currently, it supports deployment on ARM CPU servers with Baidu Kunlun chips (such as Phytium FT-2000+/64) and on Intel CPU servers with Baidu Kunlun chips. Deployment capability on other heterogeneous hardware servers will be improved in the future.

Install docker images

We recommend deploying the service with Docker. In an XPU environment, refer to the Docker image document to install the XPU image, and then complete the subsequent compilation, installation, and deployment steps.
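As an illustration only, the typical flow after choosing an image from the Docker image document looks like the sketch below; the image name and tag are placeholders, and additional options (for example, mounting the XPU device into the container) may be required depending on your environment.

# Pull the XPU image listed in the Docker image document (name and tag are placeholders)
docker pull <xpu_image_name>:<tag>
# Start a container and mount the Serving source code into it for compilation
docker run -it --name paddle_serving_xpu -v $PWD:/Serving <xpu_image_name>:<tag> /bin/bash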

Compilation and installation

Refer to the compilation document to set up the compilation environment. The following steps are based on the Phytium FT-2000+/64 platform.

Compilation

  • Compile the Serving Server
cd Serving
mkdir -p server-build-arm && cd server-build-arm

cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
    -DPYTHON_EXECUTABLE=/usr/bin/python \
    -DWITH_PYTHON=ON \
    -DWITH_LITE=ON \
    -DWITH_XPU=ON \
    -DSERVER=ON ..
make -j10

You can run make install to produce the targets in the ./output directory; add -DCMAKE_INSTALL_PREFIX=./output to the CMake command shown above to specify that output path. Please specify -DWITH_MKL=ON on Intel CPU platforms with AVX2 support. A combined example is sketched below.
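For example, a hedged sketch of the server build with both options from the note above (an install prefix plus MKL enabled on an Intel CPU with AVX2 support) might look like this:

cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
    -DPYTHON_EXECUTABLE=/usr/bin/python \
    -DWITH_PYTHON=ON \
    -DWITH_LITE=ON \
    -DWITH_XPU=ON \
    -DWITH_MKL=ON \
    -DCMAKE_INSTALL_PREFIX=./output \
    -DSERVER=ON ..
make -j10
make install    # installs the build targets into ./output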

  • Compile the Serving Client
mkdir -p client-build-arm && cd client-build-arm

cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
    -DPYTHON_EXECUTABLE=/usr/bin/python \
    -DWITH_PYTHON=ON \
    -DWITH_LITE=ON \
    -DWITH_XPU=ON \
    -DCLIENT=ON ..

make -j10
  • Compile the App
cd Serving 
mkdir -p app-build-arm && cd app-build-arm

cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
    -DPYTHON_EXECUTABLE=/usr/bin/python \
    -DWITH_PYTHON=ON \
    -DWITH_LITE=ON \
    -DWITH_XPU=ON \
    -DAPP=ON ..

make -j10

Install the wheel package

After the compilation steps above, the whl package is generated in python/dist/ under the corresponding build directory. For example, after the server compilation step, the whl package is produced under the server-build-arm/python/dist directory, and you can run pip install -U python/dist/*.whl to install it.
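For example, assuming the three build directories used above and running from the Serving source directory, the generated packages could be installed as follows (the exact wheel file names depend on the Paddle Serving version being built):

# Server package
pip install -U server-build-arm/python/dist/*.whl
# Client package
pip install -U client-build-arm/python/dist/*.whl
# App package
pip install -U app-build-arm/python/dist/*.whl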

Request parameters description

In order to deploy the serving service on an ARM server with Baidu Kunlun XPU chips and use the acceleration capability of Paddle-Lite, please specify the following parameters during deployment.

param | description | notes
--- | --- | ---
use_lite | use the Paddle-Lite engine | enables the inference capability of Paddle-Lite
use_xpu | use Baidu Kunlun XPU for inference | must be used together with the use_lite option
ir_optim | enable graph optimization | refer to Paddle-Lite

Deployment examples

Download the model

wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz

Start RPC service

There are three main deployment methods:

  • deploy on a CPU server with Baidu Kunlun XPU, using the acceleration capability of Paddle-Lite and the XPU;
  • deploy on a CPU server alone, with Paddle-Lite;
  • deploy on a CPU server alone, without Paddle-Lite.

The first two deployment methods are recommended.

Start the RPC service, deployed on a CPU server with Baidu Kunlun chips and accelerated with Paddle-Lite and the Baidu Kunlun XPU.

python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim

Start the RPC service, deployed on a CPU server and accelerated with Paddle-Lite.

python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim

Start the RPC service, deployed on a CPU server without Paddle-Lite.

python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292

After the service is started, a prediction request can be sent from the client:

from paddle_serving_client import Client
import numpy as np

client = Client()
# Load the client configuration generated together with the model
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
# Connect to the serving endpoint started above
client.connect(["127.0.0.1:9292"])
# One normalized sample with the 13 UCI housing features
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1, 13, 1)}, fetch=["price"])
print(fetch_map)

Others

Model example and explanation

Some examples are provided below; other models can be adapted with reference to these examples.

sample name | sample link
--- | ---
fit_a_line | fit_a_line_xpu
resnet | resnet_v2_50_xpu

Note: for the list of supported models, refer to the doc. Adaptation differs between models, and some cases may not be supported yet. If you run into any problem, please submit a GitHub issue and we will follow up promptly.

Kunlun chip related reference materials