Why is CrossEntropyLoss zero? #692

aizhweiwei · 2024-08-06T02:09:38Z

❓ The question

System/Peak GPU Memory (MB)=6,784

2024-08-06 09:59:26.181 intern-studio-160750:0 olmo.train:908 INFO [step=1/739328,epoch=0]
optim/total_grad_norm=231.7
train/CrossEntropyLoss=12.18
train/Perplexity=195,153
throughput/total_tokens=1,048,576
throughput/total_training_Gflops=5,103,640
throughput/total_training_log_Gflops=15.45
System/Peak GPU Memory (MB)=46,911
2024-08-06 10:00:05.520 intern-studio-160750:0 olmo.train:908 INFO [step=2/739328,epoch=0]
optim/total_grad_norm=0.0002
train/CrossEntropyLoss=1.7872662283480167e-06
train/Perplexity=1.000
throughput/total_tokens=2,097,152
throughput/total_training_Gflops=10,207,281
throughput/total_training_log_Gflops=16.14
throughput/device/tokens_per_second=26,668
throughput/device/batches_per_second=0.0254
System/Peak GPU Memory (MB)=53,695
2024-08-06 10:00:44.815 intern-studio-160750:0 olmo.train:908 INFO [step=3/739328,epoch=0]
optim/total_grad_norm=7.725906669975302e-08
train/CrossEntropyLoss=0.0
train/Perplexity=1.0000
throughput/total_tokens=3,145,728
throughput/total_training_Gflops=15,310,922
throughput/total_training_log_Gflops=16.54
throughput/device/tokens_per_second=26,676
throughput/device/batches_per_second=0.0254
2024-08-06 10:01:24.324 intern-studio-160750:0 olmo.train:908 INFO [step=4/739328,epoch=0]
optim/total_grad_norm=2.965892065276421e-08
train/CrossEntropyLoss=0.0
train/Perplexity=1.0000
throughput/total_tokens=4,194,304
throughput/total_training_Gflops=20,414,563
throughput/total_training_log_Gflops=16.83
throughput/device/tokens_per_second=26,630
throughput/device/batches_per_second=0.0254
2024-08-06 10:02:03.863 intern-studio-160750:0 olmo.train:908 INFO [step=5/739328,epoch=0]
optim/total_grad_norm=1.9301344522659747e-08
train/CrossEntropyLoss=0.0
train/Perplexity=1.0000
throughput/total_tokens=5,242,880
throughput/total_training_Gflops=25,518,204
throughput/total_training_log_Gflops=17.05
throughput/device/tokens_per_second=26,603
throughput/device/batches_per_second=0.0254

aizhweiwei · 2024-08-06T02:09:47Z

torchrun --nproc_per_node=1 scripts/train.py configs/official/OLMo-0.4B.yaml --save_overwrite

2015aroras · 2024-08-06T16:36:16Z

It's hard to say without seeing the config. My guess would be that you're training on a single batch/instance, which the model can learn almost immediately.
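
To illustrate the single-batch guess, here is a minimal sketch in plain Python. It is not OLMo code: the function names, the 8-token vocabulary, and the deliberately large learning rate are all hypothetical choices for the demo. It trains a toy softmax classifier on one fixed instance and records the cross-entropy loss and the perplexity, which is `exp(loss)`. The loss collapses to nearly zero after a single update and the perplexity goes to 1, the same pattern as in the logs above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_on_single_example(vocab_size=8, target=3, lr=20.0, steps=5):
    """Repeatedly fit one (hypothetical) training instance.

    The "model" is just a vector of free logits, a toy stand-in for a
    real network. Returns a list of (loss, perplexity) pairs, one per
    step, each measured before that step's update.
    """
    logits = [0.0] * vocab_size
    history = []
    for _ in range(steps):
        probs = softmax(logits)
        loss = -math.log(probs[target])          # cross-entropy for one token
        history.append((loss, math.exp(loss)))   # perplexity = exp(loss)
        # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(target)
        for i in range(vocab_size):
            grad = probs[i] - (1.0 if i == target else 0.0)
            logits[i] -= lr * grad
    return history


if __name__ == "__main__":
    for step, (loss, ppl) in enumerate(train_on_single_example(), start=1):
        print(f"step={step} loss={loss:.3e} perplexity={ppl:.6f}")
```

The first recorded loss is `ln(8)` because the toy model starts out uniform. Every later loss is close to zero because the same instance is seen again and again. If real training shows this pattern, a reasonable first check is whether the data section of the config points at more than a handful of instances.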

aizhweiwei added the type/question label Aug 6, 2024
