Post-merge file & documentation fixes
calpt committed Mar 23, 2022
1 parent fb2beba commit d8bcf37
Showing 18 changed files with 80 additions and 158 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -60,7 +60,7 @@ To get started with adapters, refer to these locations:
- **[Colab notebook tutorials](https://github.com/Adapter-Hub/adapter-transformers/tree/master/notebooks)**, a series of notebooks providing an introduction to all the main concepts of (adapter-)transformers and AdapterHub
- **https://docs.adapterhub.ml**, our documentation on training and using adapters with _adapter-transformers_
- **https://adapterhub.ml** to explore available pre-trained adapter modules and share your own adapters
- **[Examples folder](https://github.com/Adapter-Hub/adapter-transformers/tree/master/examples)** of this repository containing HuggingFace's example training scripts, many adapted for training adapters
- **[Examples folder](https://github.com/Adapter-Hub/adapter-transformers/tree/master/examples/pytorch)** of this repository containing HuggingFace's example training scripts, many adapted for training adapters

## Implemented Methods

14 changes: 7 additions & 7 deletions adapter_docs/adapter_composition.md
@@ -14,7 +14,7 @@ model.active_adapters = "adapter_name"

Note that we also could have used `model.set_active_adapters("adapter_name")` which does the same.

```eval_rst
```{eval-rst}
.. important::
``active_adapters`` defines which of the available adapters are used in each forward and backward pass through the model. This means:
@@ -39,7 +39,7 @@ They are presented in more detail in the following.

## `Stack`

```eval_rst
```{eval-rst}
.. figure:: img/stacking_adapters.png
:height: 300
:align: center
@@ -71,7 +71,7 @@ For backwards compatibility, you can still do this, although it is recommended t

## `Fuse`

```eval_rst
```{eval-rst}
.. figure:: img/Fusion.png
:height: 300
:align: center
@@ -98,7 +98,7 @@ model.add_adapter_fusion(["d", "e", "f"])
model.active_adapters = ac.Fuse("d", "e", "f")
```

```eval_rst
```{eval-rst}
.. important::
Fusing adapters with the ``Fuse`` block only works successfully if an adapter fusion layer combining all of the adapters listed in the ``Fuse`` has been added to the model.
This can be done either using ``add_adapter_fusion()`` or ``load_adapter_fusion()``.
@@ -111,7 +111,7 @@ For backwards compatibility, you can still do this, although it is recommended t

## `Split`

```eval_rst
```{eval-rst}
.. figure:: img/splitting_adapters.png
:height: 300
:align: center
@@ -159,7 +159,7 @@ model.active_adapters = ac.BatchSplit("i", "k", "l", batch_sizes=[2, 1, 2])

## `Parallel`

```eval_rst
```{eval-rst}
.. figure:: img/parallel.png
:height: 300
:align: center
@@ -206,7 +206,7 @@ model.active_adapters = ac.Stack("a", ac.Split("b", "c", split_index=60))

However, combinations of adapter composition blocks cannot be arbitrarily deep. All currently supported possibilities are visualized in the figure below.

```eval_rst
```{eval-rst}
.. figure:: img/adapter_blocks_nesting.png
:height: 300
:align: center
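
For context on the composition blocks referenced in this file's hunks, here is a minimal sketch of how the documented `ac.Stack` usage fits together; the base model and adapter names are illustrative assumptions, not part of this commit:

```python
# Illustrative sketch based on adapter_composition.md above; not part of this commit.
import transformers.adapters.composition as ac
from transformers import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")  # assumed base model
for name in ["a", "b", "c"]:
    model.add_adapter(name)

# Stack: the output of adapter "a" feeds into "b", which feeds into "c".
model.active_adapters = ac.Stack("a", "b", "c")
```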
6 changes: 6 additions & 0 deletions adapter_docs/classes/adapter_config.rst
@@ -28,6 +28,12 @@ Single (bottleneck) adapters
.. autoclass:: transformers.ParallelConfig
:members:

.. autoclass:: transformers.CompacterConfig
:members:

.. autoclass:: transformers.CompacterPlusPlusConfig
:members:

Prefix Tuning
~~~~~~~~~~~~~~~~~~~~~~~

3 changes: 0 additions & 3 deletions adapter_docs/conf.py
@@ -6,8 +6,6 @@
import os
import sys

from recommonmark.transform import AutoStructify


# -- Path setup --------------------------------------------------------------

@@ -90,5 +88,4 @@

def setup(app):
app.add_config_value("recommonmark_config", {"enable_eval_rst": True}, True)
app.add_transform(AutoStructify)
app.add_css_file("custom.css")
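
The dropped `AutoStructify` import, together with the switch from `eval_rst` to `{eval-rst}` fenced blocks throughout the docs, points to a move from recommonmark to a MyST-style Markdown parser. A hedged sketch of what the corresponding Sphinx configuration might look like; the `myst_parser` extension and its setup are assumptions, not shown in this diff:

```python
# Hypothetical Sphinx conf.py fragment after moving from recommonmark to MyST.
# The extension name below is an assumption based on the {eval-rst} fences
# introduced in this commit; it is not part of the diff itself.
extensions = [
    "myst_parser",         # parses Markdown and understands {eval-rst} fences
    "sphinx.ext.autodoc",  # needed for the .. autoclass:: directives in the docs
]

def setup(app):
    # No AutoStructify transform is required anymore; MyST handles eval-rst natively.
    app.add_css_file("custom.css")
```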
4 changes: 2 additions & 2 deletions adapter_docs/contributing.md
@@ -1,6 +1,6 @@
# Contributing to AdapterHub

```eval_rst
```{eval-rst}
.. note::
This document describes how to contribute adapters via the AdapterHub `Hub repository <https://github.com/adapter-hub/hub>`_. See `Integration with HuggingFace's Model Hub <huggingface_hub.html>`_ for uploading adapters via the HuggingFace Model Hub.
```
@@ -49,7 +49,7 @@ Let's go through the upload process step by step:
```
`adapter-hub-cli` will search for available adapters in the path you specify and interactively lead you through the packing process.

```eval_rst
```{eval-rst}
.. note::
The configuration of the adapter is specified by an identifier string in the YAML file. This string should refer to an adapter architecture available in the Hub. If you use a new or custom architecture, make sure to also `add an entry for your architecture <#add-a-new-adapter-architecture>`_ to the repo.
```
4 changes: 2 additions & 2 deletions adapter_docs/huggingface_hub.md
@@ -1,6 +1,6 @@
# Integration with HuggingFace's Model Hub

```eval_rst
```{eval-rst}
.. figure:: img/hfhub.svg
:align: center
:alt: HuggingFace Hub logo.
@@ -53,7 +53,7 @@ For more options and information, e.g. for managing models via the CLI and Git,
This will create a repository `my-awesome-adapter` under your username, generate a default adapter card as `README.md` and upload the adapter named `awesome_adapter` together with the adapter card to the new repository.
`adapterhub_tag` and `datasets_tag` provide additional information for categorization.

```eval_rst
```{eval-rst}
.. important::
All adapters uploaded to HuggingFace's Model Hub are automatically also listed on AdapterHub.ml. Thus, for better categorization, either ``adapterhub_tag`` or ``datasets_tag`` is required when uploading a new adapter to the Model Hub.

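
A hedged sketch of the upload call described in this hunk; the repository and adapter names come from the docs above, while the model setup and tag values are illustrative assumptions:

```python
# Illustrative only; assumes a logged-in Hugging Face account and adapter-transformers.
from transformers import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")
model.add_adapter("awesome_adapter")
# ... train the adapter ...

model.push_adapter_to_hub(
    "my-awesome-adapter",                        # new Model Hub repository
    "awesome_adapter",                           # adapter to upload
    adapterhub_tag="sentiment/rotten_tomatoes",  # example categorization tags
    datasets_tag="rotten_tomatoes",
)
```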
2 changes: 1 addition & 1 deletion adapter_docs/installation.md
@@ -3,7 +3,7 @@
Our *adapter-transformers* package is a drop-in replacement for Huggingface's *transformers* library.
It currently supports Python 3.6+ and PyTorch 1.3.1+. You will have to [install PyTorch](https://pytorch.org/get-started/locally/) first.

```eval_rst
```{eval-rst}
.. important::
``adapter-transformers`` is a direct fork of ``transformers``.
This means our package includes all the awesome features of HuggingFace's original package plus the adapter implementation.
2 changes: 1 addition & 1 deletion adapter_docs/loading.md
@@ -117,7 +117,7 @@ The identifier string used to find a matching adapter follows a format consistin

An example of a full identifier following this format might look like `qa/squad1.1@example-org`.

```eval_rst
```{eval-rst}
.. important::
   In many cases, you don't have to give the full string identifier with all three components to successfully load an adapter from the Hub. You can drop the ``<username>`` if you don't care about the uploader of the adapter. Also, if the resulting identifier is still unique, you can drop the ``<task>`` or the ``<subtask>``. So, ``qa/squad1.1``, ``squad1.1`` or ``squad1.1@example-org`` all may be valid identifiers.
```
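
For reference, a hedged sketch of resolving such an identifier with `load_adapter()`; the base model choice and the actual Hub availability of the example identifier are assumptions:

```python
# Illustrative sketch; "qa/squad1.1" is the docs' example identifier above.
from transformers import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")
adapter_name = model.load_adapter("qa/squad1.1")  # resolves <task>/<subtask> on the Hub
model.set_active_adapters(adapter_name)
```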
2 changes: 1 addition & 1 deletion adapter_docs/model_overview.md
@@ -3,7 +3,7 @@
This page gives an overview of the Transformer models currently supported by `adapter-transformers`.
The table below further shows which model architectures support which adaptation methods and which features of `adapter-transformers`.

```eval_rst
```{eval-rst}
.. note::
Each supported model architecture X typically provides a class ``XAdapterModel`` for usage with ``AutoAdapterModel``.
Additionally, it is possible to use adapters with the model classes already shipped with HuggingFace Transformers.
8 changes: 4 additions & 4 deletions adapter_docs/overview.md
@@ -35,7 +35,7 @@ config = ... # config class deriving from AdapterConfigBase
model.add_adapter("name", config=config)
```

```eval_rst
```{eval-rst}
.. important::
In literature, different terms are used to refer to efficient fine-tuning methods.
The term "adapter" is usually only applied to bottleneck adapter modules.
@@ -67,7 +67,7 @@ $$
A visualization of further configuration options related to the adapter structure is given in the figure below. For more details, refer to the documentation of [`AdapterConfig`](transformers.AdapterConfig).


```eval_rst
```{eval-rst}
.. figure:: img/architecture.png
:width: 350
:align: center
@@ -120,7 +120,7 @@ model.add_adapter("lang_adapter", config=config)
_Papers:_
- [MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer](https://arxiv.org/pdf/2005.00052.pdf) (Pfeiffer et al., 2020)

```eval_rst
```{eval-rst}
.. note::
V1.x of adapter-transformers made a distinction between task adapters (without invertible adapters) and language adapters (with invertible adapters) with the help of the ``AdapterType`` enumeration.
This distinction was dropped with v2.x.
@@ -171,7 +171,7 @@ for a PHM layer by specifying `use_phm=True` in the config.
The PHM layer has the following additional properties: `phm_dim`, `shared_phm_rule`, `factorized_phm_rule`, `learn_phm`,
`factorized_phm_W`, `shared_W_phm`, `phm_c_init`, `phm_init_range`, `hypercomplex_nonlinearity`

For more information check out the [AdapterConfig](classes/adapter_config.html#transformers.AdapterConfig) class.
For more information check out the [`AdapterConfig`](transformers.AdapterConfig) class.

To add a Compacter to your model you can use the predefined configs:
```python
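
Since the Compacter config snippet itself is collapsed in this hunk, here is a hedged sketch of how the predefined config classes added to `adapter_config.rst` above might be used; the base model name and adapter names are assumptions:

```python
# Illustrative sketch using the config classes documented in this commit.
from transformers import AutoAdapterModel, CompacterConfig, CompacterPlusPlusConfig

model = AutoAdapterModel.from_pretrained("bert-base-uncased")
model.add_adapter("compacter_adapter", config=CompacterConfig())  # PHM-based bottleneck
model.add_adapter("compacterpp_adapter", config=CompacterPlusPlusConfig())
```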
6 changes: 3 additions & 3 deletions adapter_docs/prediction_heads.md
@@ -3,7 +3,7 @@
This section gives an overview of how different prediction heads can be used together with adapter modules and how pre-trained adapters can be distributed side-by-side with matching prediction heads in AdapterHub.
We will take a look at the `AdapterModel` classes (e.g. `BertAdapterModel`) introduced by adapter-transformers, which provide **flexible** support for prediction heads, as well as models with **static** heads provided out-of-the-box by HuggingFace Transformers (e.g. `BertForSequenceClassification`).

```eval_rst
```{eval-rst}
.. tip::
We recommend to use the `AdapterModel classes <#adaptermodel-classes>`_ whenever possible.
They have been created specifically for working with adapters and provide more flexibility.
@@ -37,7 +37,7 @@ Since we gave the task adapter the same name as our head, we can easily identify
The call to `set_active_adapters()` in the second line tells our model to use the adapter - head configuration we specified by default in a forward pass.
At this point, we can start to [train our setup](training.md).

```eval_rst
```{eval-rst}
.. note::
The ``set_active_adapters()`` will search for an adapter and a prediction head with the given name to be activated.
Alternatively, prediction heads can also be activated explicitly (i.e. without adapter modules).
@@ -87,7 +87,7 @@ In case the classes match, our prediction head weights will be automatically loa

## Automatic conversion

```eval_rst
```{eval-rst}
.. important::
Although the two prediction head implementations serve the same use case, their weights are *not* directly compatible, i.e. you cannot load a head created with ``AutoAdapterModel`` into a model of type ``AutoModelForSequenceClassification``.
There is however an automatic conversion to model classes with flexible heads.
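
A hedged sketch of the flexible-head workflow this hunk refers to; the head type and label count are illustrative assumptions:

```python
# Illustrative sketch of an AdapterModel class with a matching adapter and head.
from transformers import BertAdapterModel

model = BertAdapterModel.from_pretrained("bert-base-uncased")
model.add_adapter("rotten_tomatoes")
model.add_classification_head("rotten_tomatoes", num_labels=2)
model.set_active_adapters("rotten_tomatoes")  # activates adapter and same-named head
```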
2 changes: 1 addition & 1 deletion adapter_docs/quickstart.md
@@ -6,7 +6,7 @@ Currently, *adapter-transformers* adds adapter components to the PyTorch impleme
For working with adapters, a couple of methods for creation (`add_adapter()`), loading (`load_adapter()`),
storing (`save_adapter()`) and deletion (`delete_adapter()`) are added to the model classes. In the following, we will briefly go through some examples.

```eval_rst
```{eval-rst}
.. note::
This document focuses on the adapter-related functionalities added by *adapter-transformers*.
For a more general overview of the *transformers* library, visit
10 changes: 5 additions & 5 deletions adapter_docs/training.md
@@ -47,7 +47,7 @@ if task_name not in model.config.adapters:
model.train_adapter(task_name)
```

```eval_rst
```{eval-rst}
.. important::
The most crucial step when training an adapter module is to freeze all weights in the model except for those of the
adapter. In the previous snippet, this is achieved by calling the ``train_adapter()`` method which disables training
@@ -90,12 +90,12 @@ python run_glue.py \

The important flag here is `--train_adapter` which switches from fine-tuning the full model to training an adapter module for the given GLUE task.

```eval_rst
```{eval-rst}
.. tip::
Adapter weights are usually initialized randomly. That is why we require a higher learning rate. We have found that a default adapter learning rate of ``1e-4`` works well for most settings.
```

```eval_rst
```{eval-rst}
.. tip::
    Depending on your data set size, you might also need to train longer than usual. To avoid overfitting, you can evaluate the adapters after each epoch on the development set and only save the best model.
```
@@ -129,7 +129,7 @@ python run_mlm.py \
We provide an example for training _AdapterFusion_ ([Pfeiffer et al., 2020](https://arxiv.org/pdf/2005.00247)) on the GLUE dataset: [run_fusion_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/adapterfusion/run_fusion_glue.py).
You can adapt this script to train AdapterFusion with different pre-trained adapters on your own dataset.

```eval_rst
```{eval-rst}
.. important::
AdapterFusion on a target task is trained in a second training stage, after independently training adapters on individual tasks.
When setting up a fusion architecture on your model, make sure to load the pre-trained adapter modules to be fused using ``model.load_adapter()`` before adding a fusion layer.
@@ -180,7 +180,7 @@ trainer = AdapterTrainer(
data_collator=data_collator,
)
```
```eval_rst
```{eval-rst}
.. tip::
When you migrate from the previous versions, which use the Trainer class for adapter training and fully fine-tuning, note that the
specialized AdapterTrainer class does not have the parameters `do_save_full_model`, `do_save_adapters` and `do_save_adapter_fusion`.
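
To make the freezing behaviour described in this hunk concrete, a hedged sketch; the model and adapter names are illustrative, and the parameter check is only a sanity check, not part of the docs:

```python
# Illustrative sketch: train_adapter() freezes everything except the adapter weights.
from transformers import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")
model.add_adapter("sst-2")
model.train_adapter("sst-2")  # freezes base model weights, activates the adapter for training

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable parameter tensors, e.g. {trainable[0]}")
```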
2 changes: 1 addition & 1 deletion adapter_docs/v2_transition.md
@@ -106,7 +106,7 @@ model.active_adapters = "awesome_adapter"
model(**input_data)
```

```eval_rst
```{eval-rst}
.. note::
Version 2.0.0 temporarily removed the ``adapter_names`` parameter entirely.
Due to user feedback regarding limitations of the ``active_adapters`` property in multi-threaded contexts,
80 changes: 0 additions & 80 deletions examples/README.md

This file was deleted.
