Gemma: WTE scaling for Adapter and LoRA #1193

Merged

Conversation

Andrei-Aksionov (Collaborator)

Hi there 👋

Apparently I forgot one more thing: WTE scaling in the Adapter and LoRA variants of the Gemma model.
That explains why the loss started at ~12 when I fine-tuned the 2b model (in contrast to ~2 after the fix) and why the tests failed.
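
For context, here is a minimal sketch of the kind of embedding scaling Gemma expects. The class and attribute names (`scale_embeddings`, `n_embd`) are illustrative assumptions and not necessarily the exact fields used in this repo:

```python
import torch
import torch.nn as nn

class TinyGemmaLikeModel(nn.Module):
    """Illustrative model showing WTE (word token embedding) scaling."""

    def __init__(self, vocab_size: int, n_embd: int, scale_embeddings: bool = True):
        super().__init__()
        self.n_embd = n_embd
        self.scale_embeddings = scale_embeddings
        self.wte = nn.Embedding(vocab_size, n_embd)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        x = self.wte(idx)
        if self.scale_embeddings:
            # Gemma multiplies token embeddings by sqrt(n_embd).
            # Omitting this step in the Adapter/LoRA variants inflates the
            # initial fine-tuning loss (e.g. ~12 instead of ~2).
            x = x * (self.n_embd ** 0.5)
        return x  # the rest of the transformer blocks would follow here
```

The point of the fix is simply that the Adapter and LoRA model variants need to apply the same scaling as the base Gemma model.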

@carmocca carmocca merged commit c7ae866 into Lightning-AI:main Mar 26, 2024
8 checks passed
@Andrei-Aksionov Andrei-Aksionov deleted the wte_scaling_for_adapter_and_lora branch March 26, 2024 15:36
rasbt pushed a commit that referenced this pull request Apr 3, 2024