GPT-2 (124M) reproduction time discrepancy #75
Replies: 1 comment
-
My understanding is that the model used in the "build-nanogpt" repo is the one Andrej built during his GPT-2 reproduction video on YouTube. He did indeed train that for about 1.7 hours (which could be reduced by compiling the model and skipping generation and HellaSwag evaluation during training), and that model "beat" the OpenAI GPT-2 124M checkpoint on HellaSwag after those ~2 hours of training. The model in nanoGPT is a more refined version and was used as the template for the llm.c implementation. It was trained on OpenWebText and apparently for much longer. My intuition, though, is that the nanoGPT model will perform significantly better, especially considering the model from the build-nanogpt repo was put together more ad hoc for the YouTube video.
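On the "compile the model and skip generation / HellaSwag evaluation" point, here is a minimal, self-contained sketch of where those two speed-ups enter a PyTorch 2.x training loop. It is not the repo's actual training script: the toy model, random batches, and the `RUN_EVALS` flag are illustrative assumptions; only the `torch.compile` call and the eval guard are the pattern being described.

```python
import time
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-in for the GPT model in the repo (hypothetical, just to keep this runnable).
vocab_size = 50304
model = nn.Sequential(nn.Embedding(vocab_size, 128), nn.Linear(128, vocab_size)).to(device)
model = torch.compile(model)  # PyTorch 2.x: graph capture + kernel fusion, noticeably faster steps on GPU

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)
RUN_EVALS = False  # True would restore the periodic sampling / HellaSwag-style eval passes

t0 = time.time()
for step in range(20):
    # Fake token batch and targets in place of the real data loader (illustrative only).
    x = torch.randint(0, vocab_size, (4, 32), device=device)
    y = torch.randint(0, vocab_size, (4, 32), device=device)
    optimizer.zero_grad(set_to_none=True)
    logits = model(x)                                    # (4, 32, vocab_size)
    loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
    loss.backward()
    optimizer.step()

    # In the video's script, text generation and HellaSwag evaluation run every N steps;
    # guarding those blocks behind a flag is one of the easy wall-clock savings mentioned above.
    if RUN_EVALS and step % 10 == 0:
        pass  # e.g. run_hellaswag_eval(model); sample_from_model(model)  -- hypothetical helpers
print(f"20 steps in {time.time() - t0:.1f}s")
```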
-
Hi there,
I noticed a small discrepancy between the README files of two repositories by @karpathy, and I'm hoping to get some clarification.
The README of the karpathy/build-nanogpt repository suggests the GPT-2 (124M) model can be trained in about an hour, whereas the README of the karpathy/nanoGPT repository indicates that training takes approximately four days. These statements seem to be at odds with each other, particularly regarding the training time.
Could someone shed some light on the difference in these training times? Is it due to different datasets, model configurations, or perhaps something else?
Thanks in advance for any clarification!