[Hackathon 5th No.40] Add Chinese documentation for the new Paddle ASGD API #6412
Thank you for contributing to the PaddlePaddle documentation. The docs preview is building; once Docs-New finishes, you can preview it at: http://preview-pr-6412.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/zh/api/index_cn.html
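The mapping document has to let users convert their code at low cost. Do the three extra parameters not match the paper?
| params | parameters | Specifies the parameters the optimizer should optimize; only the parameter name differs |
| lr | learning_rate | Learning rate, used when computing parameter updates. The defaults differ: PyTorch defaults to `0.0001`, Paddle defaults to `0.001`; Paddle must be set to match PyTorch |
| lambd | - | Decay term; overlaps in function with weight_decay; no conversion available for now |
| alpha | - | Power for the eta update; no conversion available for now |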
Is this parameter not needed?
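No, it is not needed.
| lr | learning_rate | Learning rate, used when computing parameter updates. The defaults differ: PyTorch defaults to `0.0001`, Paddle defaults to `0.001`; Paddle must be set to match PyTorch |
| lambd | - | Decay term; overlaps in function with weight_decay; no conversion available for now |
| alpha | - | Power for the eta update; no conversion available for now |
| t0 | - | Point at which averaging starts; no conversion available for now |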
Is this parameter not needed?
Same as above.
| ------------- | ------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| params | parameters | Specifies the parameters the optimizer should optimize; only the parameter name differs |
| lr | learning_rate | Learning rate, used when computing parameter updates. The defaults differ: PyTorch defaults to `0.0001`, Paddle defaults to `0.001`; Paddle must be set to match PyTorch |
| lambd | - | Decay term; overlaps in function with weight_decay; no conversion available for now |
If this does the same thing as weight_decay, is there a way to reproduce it with an alternative implementation?
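This one does not need converting. It is used in the torch implementation, but the Paddle implementation has no such parameter.
| foreach | - | Whether to use the optimizer's foreach implementation. Paddle has no such parameter; it generally has little effect on training results and can simply be removed |
| maximize | - | Maximize the objective with respect to the parameters instead of minimizing it. Paddle has no such parameter; no conversion available for now |
| differentiable| - | Whether autograd should run through the optimizer step during training. Paddle has no such parameter; it generally has little effect on training results and can simply be removed |
| - | batch_num | Number of iterations needed to complete one epoch. PyTorch has no such parameter; in Paddle it must be set according to the sample data |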
Could you spell out exactly how this should be set?
Done
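For readers wondering how `batch_num` might be derived from the sample data, here is a minimal sketch. It assumes that "iterations needed to complete one epoch" means the number of mini-batches per epoch; the helper name `batches_per_epoch` and the `drop_last` flag are hypothetical and are not taken from the PR.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Return how many mini-batches make up one pass over the data.

    Hypothetical helper: assumes batch_num equals the number of
    mini-batches per epoch, as the table row above describes it.
    """
    if num_samples < 0 or batch_size <= 0:
        raise ValueError("num_samples must be >= 0 and batch_size must be > 0")
    if drop_last:
        # An incomplete final batch is discarded.
        return num_samples // batch_size
    # An incomplete final batch still counts as one iteration.
    return math.ceil(num_samples / batch_size)


# Example: 1000 samples with a batch size of 32.
batch_num = batches_per_epoch(1000, 32)
print(batch_num)
```

The resulting value would then be passed as the `batch_num` argument when constructing the Paddle optimizer. If a data loader with a known length is already in use, its length is usually the same number.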
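The three extra parameters currently appear to be redundant.
@zhwesky2010 Could you check whether there are any other problems?
You believe the torch implementation is flawed, so as things stand the four parameters alpha, lambd, t0, and maximize cannot be converted, correct? In other words, if a user sets any one of these four, Paddle cannot reproduce the same behavior?
That's right. A key part of this optimizer is the historical gradient information, which has to take part in the parameter update. In the current torch implementation the historical gradient information is stored in ax, but ax plays no role in the parameter update. Someone has already reported this problem (see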
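To make the claim about `ax` concrete, the following is a simplified scalar sketch of an ASGD-style step in which a running average `ax` is maintained but never read when the parameter is updated. It is a paraphrase written for illustration, assuming the update has this general shape; it is not the PyTorch source, and the names `ASGDState`, `asgd_step`, and `run` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ASGDState:
    param: float
    ax: float = 0.0    # running average of the parameter
    eta: float = 0.01  # current step size (starts at lr)
    mu: float = 1.0    # averaging coefficient
    step: int = 0


def asgd_step(s, grad, lr=0.01, lambd=1e-4, alpha=0.75, t0=1e6):
    """One illustrative update; assumed shape, not the library's code."""
    s.step += 1
    s.param *= 1 - lambd * s.eta  # decay term
    s.param -= s.eta * grad       # gradient step: ax is not read here
    # Averaging: ax is written, but nothing above depends on it.
    if s.mu != 1:
        s.ax += (s.param - s.ax) * s.mu
    else:
        s.ax = s.param
    s.eta = lr / (1 + lambd * lr * s.step) ** alpha
    s.mu = 1 / max(1, s.step - t0)
    return s


def run(num_steps, overwrite_ax=None):
    """Minimize param**2; optionally clobber ax after every step."""
    s = ASGDState(param=1.0)
    for _ in range(num_steps):
        asgd_step(s, grad=2 * s.param, t0=0)
        if overwrite_ax is not None:
            s.ax = overwrite_ax
    return s.param


# Under this sketch, the parameter trajectory is identical
# no matter what value ax holds.
print(run(50) == run(50, overwrite_ax=123.0))
```

In this sketch, overwriting `ax` with an arbitrary value leaves the final parameter unchanged, which is the behavior the comment describes.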
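@zhwesky2010 Please take another look when you have a moment ⸂⸂⸜(രᴗര )⸝⸃⸃
@WintersMontagne10335 So when converting, do you recommend simply deleting these parameters? Does that have a large effect on the final result? Or should torch.optim.ASGD be converted to paddle.optimizer.SGD instead? Conversion here is judged purely by the results.
If the target is ASGD, I recommend simply deleting them.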
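The "delete them directly" suggestion can be sketched as a keyword-argument mapping that follows the table rows quoted in this thread. The function name `torch_asgd_kwargs_to_paddle` and the two lookup tables are hypothetical, and, as the thread itself notes, dropping these arguments does not make the two optimizers numerically equivalent.

```python
# PyTorch argument name -> Paddle argument name (per the quoted tables).
RENAMED = {
    "params": "parameters",
    "lr": "learning_rate",
    "weight_decay": "weight_decay",
}

# PyTorch-only arguments that the thread suggests deleting.
DROPPED = {"lambd", "alpha", "t0", "foreach", "maximize", "differentiable"}


def torch_asgd_kwargs_to_paddle(torch_kwargs: dict, batch_num: int) -> dict:
    """Translate torch.optim.ASGD keyword arguments to Paddle-style ones.

    Hypothetical helper for illustration. Because the defaults differ
    between the two frameworks, callers should pass lr and weight_decay
    explicitly rather than rely on either side's default.
    """
    paddle_kwargs = {}
    for name, value in torch_kwargs.items():
        if name in RENAMED:
            paddle_kwargs[RENAMED[name]] = value
        elif name in DROPPED:
            continue  # no Paddle counterpart; removed
        else:
            raise KeyError(f"unexpected ASGD argument: {name}")
    # Paddle-only argument: iterations needed to complete one epoch.
    paddle_kwargs["batch_num"] = batch_num
    return paddle_kwargs


converted = torch_asgd_kwargs_to_paddle(
    {"params": ["w", "b"], "lr": 0.01, "lambd": 1e-4, "t0": 1e6},
    batch_num=10,
)
print(converted)
```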
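@zhwesky2010 Please take another look.
As things stand, this API still cannot be converted, and the underlying formulas are not quite the same either. If torch later fixes the bug mentioned above, would we be able to convert it?
I will keep following this. If anything changes, I will update the mapping accordingly.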
Note: PyTorch's ASGD is problematic.
PyTorch supports more parameters than Paddle, as listed below:
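So for now the two implementations differ considerably: converting `torch.optim.ASGD` directly to `paddle.optimizer.ASGD` will certainly not produce matching results. Please state the reasons clearly in the mapping document: the problem in torch, why the implementations differ, and that results with Paddle's ASGD will not match but will not necessarily affect final convergence, or that users can try other optimizers themselves. Users need to know this is a pitfall and that alignment is hard to achieve.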
Then treat it as "missing functionality" for now, and adjust it later if torch is updated.
I have revised it. Does this look OK?
@zhwesky2010 Is there anything else that needs to be added or corrected?
@@ -0,0 +1,83 @@
## [ Missing functionality ]torch.optim.ASGD
If you have written out a parameter mapping, then keep it under "torch has more parameters" instead.
Done
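| t0 | - | Point at which averaging starts; can simply be removed |
| weight_decay | weight_decay | Weight decay. The defaults differ: PyTorch defaults to `0`, Paddle defaults to `None`; Paddle must be set to match PyTorch |
| foreach | - | Whether to use the optimizer's foreach implementation. Paddle has no such parameter; it generally has little effect on training results and can simply be removed |
| maximize | - | Maximize the objective with respect to the parameters instead of minimizing it. Paddle has no such parameter; can simply be removed |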
Done
LGTM for api docs
@zhwesky2010 Fixed. Is there anything else that needs correcting?
LGTM
Add Chinese documentation for the new Paddle ASGD API