[AssertionError] How do I track down an error that suddenly appears after training for a long time? #40
Answered by Equim-chan
Pragmatism0220 asked this question in Q&A
Error message:

      File "/home/user/app/mortal-v3/mortal/train.py", line 413, in <module>
        main()
      File "/home/user/app/mortal-v3/mortal/train.py", line 391, in main
        train()
      File "/home/user/app/mortal-v3/mortal/train.py", line 372, in train
        train_epoch()
      File "/home/user/app/mortal-v3/mortal/train.py", line 208, in train_epoch
        assert masks[range(batch_size), actions].all()
    AssertionError
    TRAIN: 33%|#########3 | 133/400 [01:02<02:05, 2.13batch/s]

I ran into this after a little over a day of training. I suspect it is caused by some piece of dirty data? But from reading the source code, the dataset seems to have already been converted into features by this point. Is there any way to locate the offending file?

    try:
        assert masks[range(batch_size), actions].all()
    except AssertionError as e:
        logging.error(str(e))
        with open('error.log', 'a') as f_err:
            f_err.write('error at %s.\n' % str(datetime.now().strftime('%Y-%m-%d %H:%M:%S')))
    else:
        q_target_mc = gamma ** steps_to_done * kyoku_rewards
        # ... omitted, up to the lines below
        steps += 1
        idx += 1
Answered by Equim-chan, Feb 26, 2023
Answer selected by Pragmatism0220
Use `validate_logs`; that is exactly what it is for.
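For anyone who also wants the training loop itself to say which samples tripped the check, here is a rough sketch. It is not part of Mortal: `find_illegal_samples`, `describe_failures`, and the `sources` list are hypothetical names, and `sources` assumes you have changed your data loading to carry a file name alongside each sample, which the code in the question does not do. The indexing is written so it should work on plain nested lists as well as on tensors, but treat it as an illustration rather than a drop-in patch.

```python
def find_illegal_samples(masks, actions):
    """Return the batch indices whose chosen action is not allowed by its mask.

    masks:   batch of per-sample boolean rows, masks[i][a] is truthy if action a is legal
    actions: batch of chosen action indices, one per sample
    """
    bad = []
    for i, action in enumerate(actions):
        # Same condition as `masks[range(batch_size), actions].all()`,
        # but checked per sample so the failing index is known.
        if not bool(masks[i][int(action)]):
            bad.append(i)
    return bad


def describe_failures(masks, actions, sources):
    """Build one readable line per failing sample.

    sources is a HYPOTHETICAL list of per-sample origins (e.g. log file names);
    it only exists if your own data loader provides it.
    """
    return [
        f"sample {i}: action {int(actions[i])} is masked out (source: {sources[i]})"
        for i in find_illegal_samples(masks, actions)
    ]


# Tiny made-up batch: the second sample picks action 2, which its mask forbids.
demo_masks = [[True, False, True], [False, True, False]]
demo_actions = [0, 2]
demo_sources = ["game_a.json.gz", "game_b.json.gz"]  # placeholder names

for line in describe_failures(demo_masks, demo_actions, demo_sources):
    print(line)
```

A bare `assert` raises an `AssertionError` with an empty message, so `logging.error(str(e))` in the question logs nothing useful; logging lines like the ones above instead at least records which sample and action were involved.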