Gemma: WTE scaling for Adapter and LoRA #1193

Merged

Conversation

Andrei-Aksionov (Collaborator)

Hi there 👋

Apparently I forgot one more thing: WTE scaling in the Adapter and LoRA variants of the Gemma model.
That explains why the loss started at ~12 when I fine-tuned the 2b model (in contrast to ~2 after the fix) and why the tests failed.
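
For context, here is a minimal sketch of the kind of embedding scaling Gemma expects. The class and attribute names (`scale_embeddings`, `n_embd`) are illustrative assumptions and not necessarily the exact fields used in this repo:

```python
import torch
import torch.nn as nn

class TinyGemmaLikeModel(nn.Module):
    """Illustrative model showing WTE (word token embedding) scaling."""

    def __init__(self, vocab_size: int, n_embd: int, scale_embeddings: bool = True):
        super().__init__()
        self.n_embd = n_embd
        self.scale_embeddings = scale_embeddings
        self.wte = nn.Embedding(vocab_size, n_embd)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        x = self.wte(idx)
        if self.scale_embeddings:
            # Gemma multiplies token embeddings by sqrt(n_embd).
            # Omitting this step in the Adapter/LoRA variants inflates the
            # initial fine-tuning loss (e.g. ~12 instead of ~2).
            x = x * (self.n_embd ** 0.5)
        return x  # the rest of the transformer blocks would follow here
```

The point of the fix is simply that the Adapter and LoRA model variants need to apply the same scaling as the base Gemma model.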

@carmocca carmocca merged commit c7ae866 into Lightning-AI:main Mar 26, 2024
8 checks passed
@Andrei-Aksionov Andrei-Aksionov deleted the wte_scaling_for_adapter_and_lora branch March 26, 2024 15:36
rasbt pushed a commit that referenced this pull request Apr 3, 2024