It's hard to say without seeing the config. My guess would be that you're training on a single batch/instance, which the model can learn almost immediately.
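One way to test that guess is to fingerprint the first few batches coming out of your data loader and count how many distinct ones you see. This is only a minimal sketch: `batch_fingerprint` and `count_unique_batches` are hypothetical helpers, not part of OLMo, and it assumes each batch can be turned into nested lists of integer token IDs (for example by calling `.tolist()` on the tensor of token IDs).

```python
import hashlib
from typing import Iterable, Sequence


def batch_fingerprint(batch: Sequence[Sequence[int]]) -> str:
    """Hash the token IDs of one batch; identical batches give identical digests."""
    h = hashlib.sha256()
    for row in batch:
        # Include the row length so that row boundaries affect the digest.
        h.update(len(row).to_bytes(8, "little"))
        for tok in row:
            h.update(int(tok).to_bytes(8, "little", signed=True))
    return h.hexdigest()


def count_unique_batches(
    batches: Iterable[Sequence[Sequence[int]]], limit: int = 10
) -> int:
    """Return the number of distinct batches among the first `limit` batches."""
    seen = set()
    for i, batch in enumerate(batches):
        if i >= limit:
            break
        seen.add(batch_fingerprint(batch))
    return len(seen)


if __name__ == "__main__":
    # Toy data standing in for real batches (an assumption for illustration).
    repeated = [[[1, 2, 3], [4, 5, 6]] for _ in range(5)]
    varied = [[[i, i + 1, i + 2]] for i in range(5)]
    print("repeated:", count_unique_batches(repeated))
    print("varied:", count_unique_batches(varied))
```

If the count stays at 1 for the real loader, the model is seeing the same data every step, which would be consistent with the loss dropping to 0.0 by step 3. It is also worth printing a few rows of one batch to make sure they are not all the same token (for example, all padding), since that would be just as easy to fit.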
❓ The question
2024-08-06 09:59:26.181 intern-studio-160750:0 olmo.train:908 INFO [step=1/739328,epoch=0]
optim/total_grad_norm=231.7
train/CrossEntropyLoss=12.18
train/Perplexity=195,153
throughput/total_tokens=1,048,576
throughput/total_training_Gflops=5,103,640
throughput/total_training_log_Gflops=15.45
System/Peak GPU Memory (MB)=46,911
2024-08-06 10:00:05.520 intern-studio-160750:0 olmo.train:908 INFO [step=2/739328,epoch=0]
optim/total_grad_norm=0.0002
train/CrossEntropyLoss=1.7872662283480167e-06
train/Perplexity=1.000
throughput/total_tokens=2,097,152
throughput/total_training_Gflops=10,207,281
throughput/total_training_log_Gflops=16.14
throughput/device/tokens_per_second=26,668
throughput/device/batches_per_second=0.0254
System/Peak GPU Memory (MB)=53,695
2024-08-06 10:00:44.815 intern-studio-160750:0 olmo.train:908 INFO [step=3/739328,epoch=0]
optim/total_grad_norm=7.725906669975302e-08
train/CrossEntropyLoss=0.0
train/Perplexity=1.0000
throughput/total_tokens=3,145,728
throughput/total_training_Gflops=15,310,922
throughput/total_training_log_Gflops=16.54
throughput/device/tokens_per_second=26,676
throughput/device/batches_per_second=0.0254
2024-08-06 10:01:24.324 intern-studio-160750:0 olmo.train:908 INFO [step=4/739328,epoch=0]
optim/total_grad_norm=2.965892065276421e-08
train/CrossEntropyLoss=0.0
train/Perplexity=1.0000
throughput/total_tokens=4,194,304
throughput/total_training_Gflops=20,414,563
throughput/total_training_log_Gflops=16.83
throughput/device/tokens_per_second=26,630
throughput/device/batches_per_second=0.0254
2024-08-06 10:02:03.863 intern-studio-160750:0 olmo.train:908 INFO [step=5/739328,epoch=0]
optim/total_grad_norm=1.9301344522659747e-08
train/CrossEntropyLoss=0.0
train/Perplexity=1.0000
throughput/total_tokens=5,242,880
throughput/total_training_Gflops=25,518,204
throughput/total_training_log_Gflops=17.05
throughput/device/tokens_per_second=26,603
throughput/device/batches_per_second=0.0254