WIP: Support basic data parallel #366
base: develop
Conversation
Force-pushed from cf33bbd to 2d4a9f4
Codecov Report

```
@@            Coverage Diff             @@
##           develop     #366      +/-   ##
===========================================
- Coverage    87.77%   87.57%   -0.20%
===========================================
  Files           33       34       +1
  Lines         1505     1513       +8
===========================================
+ Hits          1321     1325       +4
- Misses         121      125       +4
  Partials        63       63
```
```go
// 1. Scatter the input to the given devices,
// 2. Replicate (deep clone) the model on each device,
// 3. Evaluate each module with its input on its device,
// 4. Gather the outputs of each replica into a single output tensor, located on the `outputDevice`.
```
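To make the flow concrete, here is a minimal, self-contained sketch of the scatter → parallel-apply → gather pattern in Go, using plain `float64` slices as stand-ins for tensors and goroutines as stand-ins for per-device execution. The names `scatter` and `model` are illustrative only, not this PR's API, and the "model" here is just a stateless doubling function.

```go
package main

import (
	"fmt"
	"sync"
)

// scatter splits the input batch into one chunk per device.
func scatter(batch []float64, nDevices int) [][]float64 {
	chunks := make([][]float64, nDevices)
	size := (len(batch) + nDevices - 1) / nDevices
	for i := 0; i < nDevices; i++ {
		lo, hi := i*size, (i+1)*size
		if hi > len(batch) {
			hi = len(batch)
		}
		chunks[i] = batch[lo:hi]
	}
	return chunks
}

// model is a stand-in for a module replica; here it just doubles its input.
func model(chunk []float64) []float64 {
	out := make([]float64, len(chunk))
	for i, v := range chunk {
		out[i] = v * 2
	}
	return out
}

func main() {
	batch := []float64{1, 2, 3, 4, 5, 6, 7, 8}
	const nDevices = 4

	// 1. Scatter the input across "devices".
	chunks := scatter(batch, nDevices)

	// 2.-3. Evaluate each replica on its chunk in parallel.
	outputs := make([][]float64, nDevices)
	var wg sync.WaitGroup
	for i, chunk := range chunks {
		wg.Add(1)
		go func(i int, chunk []float64) {
			defer wg.Done()
			outputs[i] = model(chunk)
		}(i, chunk)
	}
	wg.Wait()

	// 4. Gather the per-replica outputs back into one result.
	var gathered []float64
	for _, out := range outputs {
		gathered = append(gathered, out...)
	}
	fmt.Println(gathered) // [2 4 6 8 10 12 14 16]
}
```

Note that a real implementation must also deep-clone the model's parameters onto each device (step 2 above); the sketch elides this by sharing a stateless `model`.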
There are two approaches to data parallelism for multi-GPU training:
- Single-Process Multi-GPU
- Per-Process-per-GPU

PyTorch's DistributedDataParallel has shown that Per-Process-per-GPU is more efficient. As PyTorch's own warning puts it:

> Single-Process Multi-GPU is not the recommended mode for DDP. In this mode, each DDP instance operates on multiple devices and creates multiple module replicas within one process. The overhead of scatter/gather and GIL contention in every forward pass can slow down training. Please consider using one DDP instance per device or per module replica by explicitly setting `device_ids` or `CUDA_VISIBLE_DEVICES`.
So the scatter → parallel-apply → gather flow is not the suggested approach. Instead, we launch one training process per device; each process does its own data loading, forward, backward, all-reduce, and parameter update (a sketch of this pattern follows).
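Here is a minimal sketch of that per-process pattern. It assumes a launcher that sets `RANK` and `WORLD_SIZE` environment variables (as PyTorch's distributed launcher does); `allReduce`, `loadShard`, `forwardBackward`, and `update` are hypothetical stubs standing in for a real collective-communication library and training framework, not this project's API.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// allReduce is a hypothetical stand-in for a real collective call
// (e.g. an NCCL all-reduce that averages gradients across processes).
func allReduce(grads []float64) []float64 { return grads }

// loadShard, forwardBackward, and update are placeholder stubs for the
// data-loading, autograd, and optimizer steps of a real framework.
func loadShard(step, rank, worldSize int) []float64 {
	return []float64{float64(step*worldSize + rank)}
}

func forwardBackward(shard []float64) []float64 {
	grads := make([]float64, len(shard))
	for i, v := range shard {
		grads[i] = 0.1 * v // pretend gradient
	}
	return grads
}

func update(grads []float64) {}

func main() {
	// The launcher starts one copy of this binary per device and tells
	// each copy who it is via environment variables.
	rank, _ := strconv.Atoi(os.Getenv("RANK"))
	worldSize, _ := strconv.Atoi(os.Getenv("WORLD_SIZE"))
	if worldSize == 0 {
		worldSize = 1
	}

	for step := 0; step < 3; step++ {
		shard := loadShard(step, rank, worldSize) // 1. per-process data loading
		grads := forwardBackward(shard)           // 2. independent forward/backward
		update(allReduce(grads))                  // 3. all-reduce, then local update
	}
	fmt.Printf("rank %d of %d finished\n", rank, worldSize)
}
```

Because each process owns exactly one device and one replica, there is no per-step scatter/gather and no within-process contention; the only cross-process traffic is the gradient all-reduce.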
Force-pushed from cb02186 to 784a3b7
Force-pushed from 1b2b2fa to 3007a5a