Paddle Multiple Language API/SDK #849
Closed · 10 tasks · 6 comments

reyoung commented Dec 13, 2016

Paddle is currently a standalone application, so users cannot customize the training process conveniently. The current Paddle API only supports model inference.

We are considering rewriting the current API to make Paddle a standard Python library that can easily be ported to other programming languages.

There are several agreements and todos for this feature.

Use a standard C99 API instead of SWIG.

SWIG is excellent for Python bindings, but it does not seem to work smoothly for other languages, such as Julia and Go. Making Paddle easy to integrate into other systems is an essential requirement for the Paddle API.

Only expose GradientMachine.

GradientMachine is an abstraction of a neural network that can perform forward/backward passes on multiple local devices (CPU cores, GPU cards). In a cluster environment, we should provide the same abstraction with some additional configuration, such as node count.

GradientMachine will always behave as a single-threaded program. We won't provide APIs for sending data from one GPU to another, using many CPUs, and so on. We think such APIs are too low-level and do not need to be exposed.

There are a few rules for the GradientMachine API:

  • Expose GradientMachine in as much detail as possible.
  • The ParameterUpdater is exposed in the C-API, but it is not intended for end users.

Wrap the C-API in a standard Python library.

Python is widely used in the neural network domain. We will write a standard Python library as the first language binding.

However, the Python library can be considered a demo only; contributions of other language bindings are welcome.
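
As a rough illustration only (not a committed design), the Python library could sit on top of the C99 API through ctypes. In the sketch below, the library name libpaddle_capi.so and the paddle_gm_* symbols are assumptions for illustration, not existing Paddle functions:

import ctypes

# Assumed shared library built from the C99 API; the name is a placeholder.
_lib = ctypes.CDLL('libpaddle_capi.so')

class GradientMachine(object):
    """Thin Python wrapper around a hypothetical C-API handle."""

    def __init__(self, config_bytes):
        # paddle_gm_create is an assumed symbol that parses a serialized
        # network config and returns an opaque handle.
        _lib.paddle_gm_create.restype = ctypes.c_void_p
        self._handle = _lib.paddle_gm_create(config_bytes, len(config_bytes))

    def forward(self, in_args, out_args):
        # Errors come back as integer codes rather than exceptions, in line
        # with the error-handling discussion in the comments below.
        code = _lib.paddle_gm_forward(self._handle, in_args, out_args)
        if code != 0:
            raise RuntimeError('forward failed with error code %d' % code)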


Possible Python API Demos

Here is a possible Python usage under the current design. It is still in flux.

import paddle

@paddle.network(
    input_types = {
        'img': dense_vector(784),
        'label': integer_value(10)
    }
)
def mnist_network(img, label):
    hidden1 = fc_layer(input=img, size=200)
    hidden2 = fc_layer(input=hidden1, size=200)
    inference = fc_layer(input=hidden2, size=10, act=SoftmaxActivation())
    cost = classification_cost(input=inference, label=label)
    return cost


@mnist_network.train_data(files = ['dataset1.txt', 'dataset2.txt'])
@mnist_network.test_data(files=['dataset_test.txt'])
def provider(filename):
    with open(filename) as f:
        for each_sample in readFromFile(f):
            yield each_sample

if __name__ == '__main__':  #main function.
    network = mnist_network()
    #trainer = network.createClusterTrainer("node0, node1")
    trainer = network.createLocalTrainer("gpu0, gpu1").withSGDOptimizer(learning_rate=0.001, batch_size=200)

    for _ in xrange(100):
        trainer.trainOnePass()
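
As a purely illustrative aside, one way such a paddle.network decorator could be put together is sketched below; data_layer and NetworkConfig are placeholders (not the actual implementation), and the train_data/test_data registration shown above is omitted:

def network(input_types):
    """Hypothetical decorator: record the declared input types and turn the
    decorated function into a factory that builds the layer graph on demand."""
    def decorator(build_fn):
        def make_network():
            # Create one data layer per declared input and pass them to the
            # user's function by name.
            inputs = {name: data_layer(name=name, type=t)
                      for name, t in input_types.items()}
            cost = build_fn(**inputs)
            return NetworkConfig(inputs=inputs, cost=cost)
        make_network.input_types = input_types
        return make_network
    return decorator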

Tasks

Step 1. Single Machine Development.

To implement this feature, several tasks should be done.

  • Remove all global variables in Paddle. Most of them are command line flags.

  • Find a way not to core-dump when LOG(FATAL) or a CHECK fails.

    • Merely not exiting the program is not enough; we should also recover the process.
    • @hohdiy @jacquesqiao
  • Expose a C-API for:

    • Paddle Matrix/SparseMatrix/Vector with unit tests.
      • These are used to feed data to GradientMachine, so only get/set methods need to be exposed. The calculation methods are not urgent for now.
    • Paddle Parameter/Argument with unit tests.
      • These are used to feed data, get parameters, etc.
    • Optimizers, parameter updaters with unit tests.
      • Optimizers such as Adam and SGD.
      • Whether parameter updaters should be exposed from C++ or reimplemented in the other languages needs to be discussed.
    • Expose GradientMachines with unit tests.
  • Python library [can proceed in parallel with the C-API exposure]

    • Python Matrix/SparseMatrix/Vector with unit tests.
      • Exchange data with NumPy (see the sketch after this list).
    • Parameter/Arguments Python API with unit tests.
    • Optimizers, Parameter Updaters in Python.
    • GradientMachines in Python.
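
For the NumPy exchange item above, a rough sketch of the intended direction is shown below; Matrix and its methods (create, copy_from, get_data) are placeholders for whatever the real Python wrapper ends up exposing, not the actual API:

import numpy as np

def matrix_from_numpy(arr):
    """Copy a 2-D float32 NumPy array into a (placeholder) Paddle Matrix."""
    assert arr.ndim == 2 and arr.dtype == np.float32
    m = Matrix.create(height=arr.shape[0], width=arr.shape[1])
    m.copy_from(arr.ravel())
    return m

def numpy_from_matrix(m):
    """Copy a (placeholder) Paddle Matrix back into a 2-D NumPy array."""
    flat = np.asarray(m.get_data(), dtype=np.float32)
    return flat.reshape(m.height, m.width)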

Step 2. Cluster Development.

TBD


jacquesqiao commented Dec 19, 2016

Regarding the API wrapping, there is a very important question: how to handle error conditions.
Currently we use LOG(FATAL) or CHECK to crash the program. As a library, however, we need a dedicated way to report errors, and every externally exposed API needs a unified error-handling mechanism.

The mainstream approach to this problem is to return an error status (bool/int) together with an error message (string). The wrapping language then needs to actively check the result of each API call and extract the error message.

Two questions need to be settled:

  • 1. Whether to use exceptions:
    One option is to throw exceptions internally, catch them where the API is called, and handle them there.
    The other is to pass the execution status up layer by layer.

  • 2. We need a status data structure for exchanging the result of an API call, because it involves allocating and releasing resources. Usage would look something like the following:

      status = session.NewStatus()
      try:
        session.api(a, b, status)
        if session.GetCode(status) != 0:
          raise RuntimeError(session.GetMessage(status))
      finally:
        session.DeleteStatus(status)


reyoung commented Dec 19, 2016

1. I strongly advise against using exceptions unless there is a special reason, because:

  • Many other languages cannot handle exceptions thrown across the API boundary, and Paddle will most likely only expose a C-API anyway.
  • Hardly anyone can write exception-safe C++ code correctly.
    • With raw pointers and other non-RAII C++ code, it is very easy to leak memory.
    • Throwing an exception inside a constructor leaves a half-constructed object behind.
    • etc.

2. Wouldn't it be enough to simply return an error struct as the return value?

typedef struct tagError {
    const char* msg;
    int32_t code;
} Error;


Error api_forward(inArg1, inArg2, &outArg);
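
For reference, a Python binding could consume such a struct via ctypes roughly as follows; the api_forward symbol and its arguments are placeholders for illustration:

import ctypes

class Error(ctypes.Structure):
    # Mirrors the proposed C struct: a message pointer plus a numeric code.
    _fields_ = [('msg', ctypes.c_char_p),
                ('code', ctypes.c_int32)]

def check(err):
    """Turn a non-zero error code returned by a C-API call into an exception."""
    if err.code != 0:
        raise RuntimeError(err.msg.decode('utf-8') if err.msg else 'unknown error')

# Hypothetical usage, assuming the C function returns an Error by value:
#   lib.api_forward.restype = Error
#   check(lib.api_forward(in_arg1, in_arg2, ctypes.byref(out_arg)))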

jacquesqiao commented:

I also think exceptions are not a good fit, so this point is settled. The status handling will look more or less like this.


jacquesqiao commented Dec 20, 2016

For the Possible Python API Demos above,

network.createLocalTrainer("gpu0, gpu1").withSGDOptimizer(learning_rate=0.001, batch_size=200)

is mainly realized in Python as a wrapper over the C-API; for example, there should be something like

  def withSGDOptimizer(network, learning_rate):
        gradientMachine = GradientMachine(network)
        optimizer = SGDOptimizer(learning_rate)
        ...

What about the parameter server? What is the best way to wrap it?


reyoung commented Dec 21, 2016

Developing Roadmap #959


reyoung commented Jul 20, 2017

Use PyBind11 for refactoring

reyoung closed this as completed on Jul 20, 2017.