Paddle Multiple Language API/SDK #849
Closed · 10 tasks · 6 comments

reyoung commented Dec 13, 2016

Paddle is currently a standalone application, so users cannot customize the training process conveniently. The current Paddle API only supports model inference.

We are considering rewriting the current API to make Paddle a standard Python library that can easily be ported to other programming languages.

There are several agreements and todos for this feature.

Use a standard C99 API instead of SWIG.

SWIG is excellent for Python bindings, but it does not seem to work smoothly for other languages, such as Julia and Go. Making Paddle easy to integrate into other systems is an essential requirement for the Paddle API.

Only expose GradientMachine.

GradientMachine is an abstraction of a neural network that can perform forward/backward passes on multiple local devices (CPU cores, GPU cards). In a cluster environment, we should provide the same abstraction with some additional configuration, such as node count.

GradientMachine will always behave as a single-threaded program. We won't provide APIs for sending data from one GPU to another, using many CPUs, and so on. We think such APIs are too low-level and do not need to be exposed.

There are a few rules for the GradientMachine API:

  • Expose GradientMachine in as much detail as possible.
  • The ParameterUpdater is exposed in the C-API, but it is not intended for end users.

Wrap the C-API in a standard Python library.

Python is widely used in the neural network domain. We will write a standard Python library as the first language binding.

However, the Python library can be considered a demo only; contributions of other language bindings are welcome.
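
As a rough illustration only (not a committed design), the Python library could sit on top of the C99 API through ctypes. In the sketch below, the library name libpaddle_capi.so and the paddle_gm_* symbols are assumptions for illustration, not existing Paddle functions:

import ctypes

# Assumed shared library built from the C99 API; the name is a placeholder.
_lib = ctypes.CDLL('libpaddle_capi.so')

class GradientMachine(object):
    """Thin Python wrapper around a hypothetical C-API handle."""

    def __init__(self, config_bytes):
        # paddle_gm_create is an assumed symbol that parses a serialized
        # network config and returns an opaque handle.
        _lib.paddle_gm_create.restype = ctypes.c_void_p
        self._handle = _lib.paddle_gm_create(config_bytes, len(config_bytes))

    def forward(self, in_args, out_args):
        # Errors come back as integer codes rather than exceptions, in line
        # with the error-handling discussion in the comments below.
        code = _lib.paddle_gm_forward(self._handle, in_args, out_args)
        if code != 0:
            raise RuntimeError('forward failed with error code %d' % code)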


Possible Python API Demos

Here is a possible Python usage under the current design. It is still in flux.

import paddle

@paddle.network(
    input_types = {
        'img': dense_vector(784),
        'label': integer_value(10)
    }
)
def mnist_network(img, label):
    hidden1 = fc_layer(input=img, size=200)
    hidden2 = fc_layer(input=hidden1, size=200)
    inference = fc_layer(input=hidden2, size=10, act=SoftmaxActivation())
    cost = classification_cost(input=inference, label=label)
    return cost


@mnist_network.train_data(files = ['dataset1.txt', 'dataset2.txt'])
@mnist_network.test_data(files=['dataset_test.txt'])
def provider(filename):
    with open(filename) as f:
        for each_sample in readFromFile(f):
            yield each_sample

if __name__ == '__main__':  #main function.
    network = mnist_network()
    #trainer = network.createClusterTrainer("node0, node1")
    trainer = network.createLocalTrainer("gpu0, gpu1").withSGDOptimizer(learning_rate=0.001, batch_size=200)

    for _ in xrange(100):
        trainer.trainOnePass()
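
As a purely illustrative aside, one way such a paddle.network decorator could be put together is sketched below; data_layer and NetworkConfig are placeholders (not the actual implementation), and the train_data/test_data registration shown above is omitted:

def network(input_types):
    """Hypothetical decorator: record the declared input types and turn the
    decorated function into a factory that builds the layer graph on demand."""
    def decorator(build_fn):
        def make_network():
            # Create one data layer per declared input and pass them to the
            # user's function by name.
            inputs = {name: data_layer(name=name, type=t)
                      for name, t in input_types.items()}
            cost = build_fn(**inputs)
            return NetworkConfig(inputs=inputs, cost=cost)
        make_network.input_types = input_types
        return make_network
    return decorator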

Tasks

Step 1. Single Machine Development.

To implement this feature, several tasks should be done.

  • Remove all global variables in Paddle. Most of them are command line flags.

  • Find a way not to core-dump when LOG(FATAL) or a CHECK fails.

    • Merely not exiting the program is not enough; we should also recover the process.
    • @hohdiy @jacquesqiao
  • Expose a C-API for:

    • Paddle Matrix/SparseMatrix/Vector with unit tests.
      • These are used to feed data to GradientMachine, so only get/set methods need to be exposed. The calculation methods are not urgent for now.
    • Paddle Parameter/Argument with unit tests.
      • These are used to feed data, get parameters, etc.
    • Optimizers, parameter updaters with unit tests.
      • Optimizers such as Adam and SGD.
      • Whether parameter updaters should be exposed from C++ or reimplemented in the other languages needs to be discussed.
    • Expose GradientMachines with unit tests.
  • Python library [can proceed in parallel with the C-API exposure]

    • Python Matrix/SparseMatrix/Vector with unit tests.
      • Exchange data with NumPy (see the sketch after this list).
    • Parameter/Arguments Python API with unit tests.
    • Optimizers, Parameter Updaters in Python.
    • GradientMachines in Python.
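
For the NumPy exchange item above, a rough sketch of the intended direction is shown below; Matrix and its methods (create, copy_from, get_data) are placeholders for whatever the real Python wrapper ends up exposing, not the actual API:

import numpy as np

def matrix_from_numpy(arr):
    """Copy a 2-D float32 NumPy array into a (placeholder) Paddle Matrix."""
    assert arr.ndim == 2 and arr.dtype == np.float32
    m = Matrix.create(height=arr.shape[0], width=arr.shape[1])
    m.copy_from(arr.ravel())
    return m

def numpy_from_matrix(m):
    """Copy a (placeholder) Paddle Matrix back into a 2-D NumPy array."""
    flat = np.asarray(m.get_data(), dtype=np.float32)
    return flat.reshape(m.height, m.width)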

Step 2. Cluster Development.

TBD


jacquesqiao commented Dec 19, 2016

Regarding the API wrapping, there is a very important question: how to handle error conditions.
Currently we use LOG(FATAL) or CHECK to crash the program. As a library, however, we need a dedicated way to report errors, and every externally exposed API needs a unified error-handling mechanism.

The mainstream approach to this problem is to return an error status (bool/int) together with an error message (string). The wrapping language then needs to actively check the result of each API call and extract the error message.

Two questions need to be settled:

  • 1. Whether to use exceptions:
    One option is to throw exceptions internally, catch them where the API is called, and handle them there.
    The other is to pass the execution status up layer by layer.

  • 2. We need a status data structure for exchanging the result of an API call, because it involves allocating and releasing resources. Usage would look something like the following:

      status = session.NewStatus()
      try:
        session.api(a, b, status)
        if session.GetCode(status) != 0:
          raise RuntimeError(session.GetMessage(status))
      finally:
        session.DeleteStatus(status)


reyoung commented Dec 19, 2016

1. I strongly advise against using exceptions unless there is a special reason, because:

  • Many other languages cannot handle exceptions thrown across the API boundary, and Paddle will most likely only expose a C-API anyway.
  • Hardly anyone can write exception-safe C++ code correctly.
    • With raw pointers and other non-RAII C++ code, it is very easy to leak memory.
    • Throwing an exception inside a constructor leaves a half-constructed object behind.
    • etc.

2. Wouldn't it be enough to simply return an error struct as the return value?

typedef struct tagError {
    const char* msg;
    int32_t code;
} Error;


Error api_forward(inArg1, inArg2, &outArg);
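
For reference, a Python binding could consume such a struct via ctypes roughly as follows; the api_forward symbol and its arguments are placeholders for illustration:

import ctypes

class Error(ctypes.Structure):
    # Mirrors the proposed C struct: a message pointer plus a numeric code.
    _fields_ = [('msg', ctypes.c_char_p),
                ('code', ctypes.c_int32)]

def check(err):
    """Turn a non-zero error code returned by a C-API call into an exception."""
    if err.code != 0:
        raise RuntimeError(err.msg.decode('utf-8') if err.msg else 'unknown error')

# Hypothetical usage, assuming the C function returns an Error by value:
#   lib.api_forward.restype = Error
#   check(lib.api_forward(in_arg1, in_arg2, ctypes.byref(out_arg)))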

jacquesqiao commented:

I also think exceptions are not a good fit, so this point is settled. The status handling will look more or less like this.


jacquesqiao commented Dec 20, 2016

For the Possible Python API Demos above,

network.createLocalTrainer("gpu0, gpu1").withSGDOptimizer(learning_rate=0.001, batch_size=200)

is mainly realized in Python as a wrapper over the C-API; for example, there should be something like

  def withSGDOptimizer(network, learning_rate):
        gradientMachine = GradientMachine(network)
        optimizer = SGDOptimizer(learning_rate)
        ...

What about the parameter server? What is the best way to wrap it?


reyoung commented Dec 21, 2016

Developing Roadmap #959


reyoung commented Jul 20, 2017

Use PyBind11 for refactoring

reyoung closed this as completed on Jul 20, 2017.