adapter-transformers v3.2.0
Based on transformers v4.26.1
New
New model integrations
- Add BEiT integration (@jannik-brinkmann via #428, #439)
- Add GPT-J integration (@ChiragBSavani via #426)
- Add CLIP integration (@calpt via #483)
- Add ALBERT integration (@lenglaender via #488)
- Add BertGeneration integration (@hSterz via #480)
Misc
- Add support for adapter configuration strings (@calpt via #465, #486)
  This makes it easy to specify adapter configurations as strings. To create a Pfeiffer adapter with a reduction factor of 16, you can now use `pfeiffer[reduction_factor=16]`. This is especially handy for experiments with different hyperparameters or for the example scripts (see the sketch after this list). Learn more
- Add support for `Stack`, `Parallel` & `BatchSplit` composition to prefix tuning (@calpt via #476)
  In previous `adapter-transformers` versions, you could combine multiple bottleneck adapters by stacking them or running them in parallel. This is now also possible for prefix-tuning adapters: add multiple prefixes to the same model to combine the functionality of several adapters (`Stack`) or to perform several tasks simultaneously (`Parallel`, `BatchSplit`), as shown in the second sketch after this list. Learn more
- Enable parallel sequence generation with adapters (@calpt via #436)
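A minimal sketch of using the new configuration strings with the standard `add_adapter`/`train_adapter` API; the model checkpoint and adapter name are placeholders chosen for illustration:

```python
from transformers import AutoAdapterModel

# Placeholder checkpoint and adapter name, for illustration only
model = AutoAdapterModel.from_pretrained("bert-base-uncased")

# Configuration string: a Pfeiffer bottleneck adapter with reduction factor 16
model.add_adapter("my_task", config="pfeiffer[reduction_factor=16]")

# Freeze the base model and train only the new adapter
model.train_adapter("my_task")
```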
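And a rough sketch of combining multiple prefix-tuning adapters via composition blocks; the adapter names and checkpoint are again illustrative:

```python
from transformers import AutoAdapterModel
import transformers.adapters.composition as ac

model = AutoAdapterModel.from_pretrained("bert-base-uncased")

# Two independent prefix-tuning adapters
model.add_adapter("prefix_a", config="prefix_tuning")
model.add_adapter("prefix_b", config="prefix_tuning")

# Stack both prefixes on top of each other ...
model.active_adapters = ac.Stack("prefix_a", "prefix_b")
# ... or run them side by side, e.g. for several tasks at once
# model.active_adapters = ac.Parallel("prefix_a", "prefix_b")
```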
Changed
- Removal of the `MultiLingAdapterArguments` class. Use the `AdapterArguments` class and the `setup_adapter_training` method instead (see the sketch after this list). Learn more.
- Upgrade of the underlying transformers version to 4.26.1 (@calpt via #455, @hSterz via #503)
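A hedged sketch of the replacement path, assuming the training utilities exported from `transformers.adapters` in this release; the adapter name is a placeholder:

```python
from transformers import AutoAdapterModel, HfArgumentParser
from transformers.adapters import AdapterArguments, setup_adapter_training

model = AutoAdapterModel.from_pretrained("bert-base-uncased")

# Parse adapter-related CLI flags, e.g. --train_adapter --adapter_config pfeiffer
parser = HfArgumentParser(AdapterArguments)
(adapter_args,) = parser.parse_args_into_dataclasses()

# Adds and activates an adapter on the model according to the parsed arguments
setup_adapter_training(model, adapter_args, "my_task")
```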
Fixed
- Fixes for GLUE & dependency parsing example script (@calpt via #430, #454)
- Fix access to shared parameters of Compacter (e.g. during sequence generation) (@calpt via #440)
- Fix reference to adapter configs in `T5EncoderModel` (@calpt via #437)
- Fix DeBERTa prefix tuning with relative attention enabled (@calpt via #451)
- Fix gating for prefix tuning layers (@calpt via #471)
- Fix input to T5 adapter layers (@calpt via #479)
- Fix AdapterTrainer hyperparameter tuning (@dtuit via #482)
- Move loading best adapter to AdapterTrainer class (@MaBeHen via #487)
- Make HuggingFace Hub Mixin work with newer utilities (@Helw150 via #473)
- Only compute fusion reg loss if fusion layer is trained (@calpt via #505)