Weight decay for token embeddings #68
Unanswered
MasterSkepticista asked this question in Q&A
Replies: 0 comments
Hi,
Why did we not exclude `embedding` weights from weight decay?
build-nanogpt/train_gpt2.py, lines 185 to 186 in 6104ab1
Re: karpathy/minGPT#24 (comment)
Decaying the embedding weights may also explain divergence in longer runs (?)
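
To make the question concrete, here is a minimal sketch of the kind of grouping I have in mind. It is not code from build-nanogpt: `split_decay_groups` and `embedding_keys` are hypothetical names, and I am assuming the embedding modules are called `wte` and `wpe` as in GPT-2 style models. It only sorts parameter names, so it runs without PyTorch.

```python
# Hypothetical helper, not taken from train_gpt2.py.
# Assumption: embedding modules are named "wte" (token) and "wpe" (position).

def split_decay_groups(named_ndims, embedding_keys=("wte", "wpe")):
    """Split parameter names into (decay, no_decay) lists.

    named_ndims: iterable of (parameter_name, number_of_dimensions).
    Tensors with 2+ dimensions are decayed unless they belong to an embedding;
    biases, LayerNorm weights, and embeddings are not decayed.
    """
    decay, no_decay = [], []
    for name, ndim in named_ndims:
        # Match on whole path components so "wte" does not match unrelated names.
        is_embedding = any(key in name.split(".") for key in embedding_keys)
        if ndim >= 2 and not is_embedding:
            decay.append(name)
        else:
            no_decay.append(name)
    return decay, no_decay


# Illustrative parameter names and dimensionalities (made up for this example).
params = [
    ("transformer.wte.weight", 2),
    ("transformer.wpe.weight", 2),
    ("transformer.h.0.attn.c_attn.weight", 2),
    ("transformer.h.0.attn.c_attn.bias", 1),
    ("transformer.h.0.ln_1.weight", 1),
]
decay, no_decay = split_decay_groups(params)
```

With PyTorch, the two lists would map onto two optimizer parameter groups, one with a nonzero `weight_decay` and one with `weight_decay` set to 0.0. If the token embedding is tied to the output head, excluding it would also exempt the output projection from decay, which may or may not be desirable.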