Figure out where the NaNs in autoencoder training are coming from: #4

robogast · 2021-11-02T11:33:45Z

Source:
2021-11-01/12-14-29/0/lightning_logs/version_8317429

robogast · 2022-06-10T12:24:35Z

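TL;DR: the instability probably originates in the VQ layer, but I'm still unsure what exactly goes wrong.
Increasing the commitment loss weight and making sure the cdist compute_mode is a non-mm one (i.e. not the matrix-multiplication path; in torch.cdist that is compute_mode='donot_use_mm_for_euclid_dist') seem to at least mitigate the issue.
Forcing 32-bit with torch.cuda.amp.autocast(enabled=False) doesn't solve the issue.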
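
A possible reason the cdist compute mode matters, offered as an assumption rather than a confirmed diagnosis: the matrix-multiplication path relies on the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2x·y, which loses precision when two vectors are large relative to their separation. The sketch below emulates float32 rounding in plain Python (the `f32` helper and the input values are made up for illustration, not taken from the training run) and compares the direct and expanded forms for two nearby points.

```python
import struct


def f32(value: float) -> float:
    """Round a Python float to the nearest float32 (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Two nearby 1-D "vectors", both exactly representable in float32.
# Illustrative values only; they are not taken from the training run.
x = f32(1000.0)
y = f32(1000.0 + 21 / 16384)

# Direct form, (x - y)^2: subtract first, then square.
direct = f32(f32(x - y) ** 2)

# Expanded form, x^2 + y^2 - 2xy: the identity a matmul-based path relies on.
# Every intermediate result is rounded to float32, as it would be on device.
xx, yy, xy = f32(x * x), f32(y * y), f32(x * y)
expanded = f32(f32(xx + yy) - f32(2.0 * xy))

# For these inputs the expanded form comes out negative, although a squared
# distance can never be negative mathematically.
print(f"direct   = {direct!r}")
print(f"expanded = {expanded!r}")
```

With these inputs the direct form stays positive while the expanded form goes negative. Whether this kind of cancellation is what actually produces the NaNs in the VQ layer is not established here; it would also fit the observation that disabling autocast alone doesn't help, since the cancellation already occurs in 32-bit.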

robogast changed the title ~~Figure out where the NaNs in training are coming from:~~ Figure out where the NaNs in autoencoder training are coming from: Apr 5, 2022
