Metrics implementation (#27)
* Test for actions. (main unchanged)

* Code formatting.

* Jury metrics added.

* Test for actions. (main unchanged)

* Code formatting.

* Jury metrics added.

* JuryMetric class added and registered.

* Metric class implemented, supported through Jury.
- DataAdapter reworked to require a "split" parameter on call to ensure `label_list` contains validation samples (to be evaluated). Usage is seamless for the user (nothing additional required); only test cases changed.
- Test cases changed accordingly.

* QA Adapter: appended answers changed to a list from a str.

* jury added to requirements.txt (>=2.0.0).

* Update requirements.txt

* MetadataHandler introduced to save metadata from samples required for metric computation.

* (Planning phase: no tests implemented yet.) MetadataHandler component implemented.
- Test cases changed accordingly.
- Associated components changed accordingly.

* Small refactors for metadata handlers.

* Updates from main branch (conflicts resolved).
- Metadata handlers are introduced to handle metadata for metric evaluation (for tasks that require additional metadata for the final prediction output).
- Bug fix: `qa_id` for FilteredInstance is changed to type str (from int) on question_answering_processor.py.
- New requirement: jury>2.1.0.

* Code formatting.

* MetadataHandlerForPosTagging added for pos-tag example.
- Seqeval class is removed from metrics.py.
- Docstring added for MetadataHandler.

* Code formatting.

* README.md updated.
- Unused imports removed.

* Updates from main.
- Name changed to metric handler.

* Required changes for metric handler.

* Test for MetricHandler (default implementation) added.

* Metric handler test fixtures.

* trapper.__init__ updated.
- README.md updated.

* Question answering notebook updated.

* Unused class removed.

* Updates from the reviews.
- README.md updated.

* setup.py updated.

* setup.py corrected.

* Updates from review.

* README.md update.

* README.md update.

* Updates from review.

* Updates from review.

* Updates from reviews.
- Unused import removed in question_answering_adapter.py.
- Docstring added to question_answering_handler.py.
- Parts regarding metric handler updated in README.md.
- version updated to "0.0.5" from "0.0.4".

* Updated structure of metric handlers.
- MetricHandler is divided into MetricInputHandler and MetricOutputHandler, for handling metric inputs and manipulating the resulting output of metric computation, respectively.
- README.md updates.
- Docstrings updated.
- Test cases updated according to the changes.

* Version set back to 0.0.4 (from 0.0.5).

* Requested changes from review.

* Add `Why You Should Use Trapper` section to the README.md

* Update the `Why You Should Use Trapper` section

* Minor update

Co-authored-by: cemilcengiz <[email protected]>
devrimcavusoglu and cemilcengiz authored Nov 9, 2021
1 parent a45ce70 commit 13a2829
Showing 47 changed files with 744 additions and 244 deletions.
131 changes: 104 additions & 27 deletions README.md
@@ -13,7 +13,8 @@
</p>

Trapper is an NLP library that aims to make it easier to train transformer based
models on downstream tasks. It wraps the HuggingFace's `transformers` library to
models on downstream tasks. It
wraps [huggingface/transformers](http://github.com/huggingface/transformers) to
provide the transformer model implementations and training mechanisms. It defines
abstractions with base classes for common tasks encountered while using transformer
models. Additionally, it provides a dependency-injection mechanism and allows
@@ -24,13 +25,34 @@ changing the existing code. These features foster code reuse, less boiler-plate
code, as well as repeatable and better documented training experiments which is
crucial in machine learning.

## Why You Should Use Trapper

- You have been a `transformers` user for quite some time now. However, you started
to feel that some computation steps could be standardized through new
abstractions. You wish to reuse the scripts you write for data processing,
post-processing, etc. with different models/tokenizers easily. You would like to
separate the code from the experiment details, mix and match components through
configuration files while keeping your codebase clean and free of duplication.


- You are an `AllenNLP` user who is really happy with the dependency-injection
system, well-defined abstractions and smooth workflow. However, you would like to
use the latest transformer models without having to wait for the core developers
to integrate them. Moreover, the `transformers` community is scaling up rapidly,
and you would like to join the party while still enjoying an `AllenNLP` touch.


- You are an NLP researcher / practitioner, and you would like to give a shot to a
library aiming to support state-of-the-art models along with datasets, metrics and
more in unified APIs.

## Key Features

### Compatibility with HuggingFace Transformers

**trapper extends transformers!**

We implement the trapper components by trying to use the available components of the
While implementing the components of trapper, we try to reuse the classes from the
transformers library as much as we can. For example, trapper uses the models and
the trainer as they are in transformers. This makes it easy to use the models
trained with trapper on other projects or libraries that depend on transformers
@@ -42,46 +64,60 @@ pipeline (e.g. for training).
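
For example, a checkpoint produced by a trapper training run can be loaded back with the stock `transformers` factories; a minimal sketch, where the checkpoint path is a hypothetical placeholder:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Hypothetical directory written by a trapper training run.
checkpoint_dir = "outputs/pos_tagging/checkpoint-best"

# Since trapper uses the transformers models and trainer as they are,
# the saved checkpoint can be consumed without importing trapper at all.
model = AutoModelForTokenClassification.from_pretrained(checkpoint_dir)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
```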

### Dependency Injection and Training Based on Configuration Files

We use `allennlp`'s registry mechanism to provide dependency injection and enable
reading the experiment details from training configuration files in `json`
We use the registry mechanism of [AllenNLP](http://github.com/allenai/allennlp) to
provide dependency injection and enable reading the experiment details from training
configuration files in `json`
or `jsonnet` format. You can look at the
[allennlp guide on dependency injection](https://guide.allennlp.org/using-config-files)
[AllenNLP guide on dependency injection](https://guide.allennlp.org/using-config-files)
to learn more about how the registry system and dependency injection works as well
as how to write configuration files. In addition, we strongly recommend reading the
remaining parts of the [allennlp guide](https://guide.allennlp.org/)
remaining parts of the [AllenNLP guide](https://guide.allennlp.org/)
to learn more about its design philosophy, the importance of abstractions, etc.
(especially Part 2: Abstraction, Design and Testing). As a warning, please note that
we do not use allennlp's abstractions and base classes in general, which means you
can not mix and match the trapper's and allennlp's components. Instead, we just use
we do not use AllenNLP's abstractions and base classes in general, which means you
can not mix and match the trapper's and AllenNLP's components. Instead, we just use
the class registry and dependency injection mechanisms and only adapt its very
limited set of components, first by wrapping and registering them as trapper
components. For example, we use the optimizers from allennlp since we can
components. For example, we use the optimizers from AllenNLP since we can
conveniently do so without hindering our full compatibility with transformers.

### Full Integration with HuggingFace datasets

In trapper, we officially use the format of the datasets from the HuggingFace's
`datasets` library and provide full integration with it. You can directly use all
datasets published in [datasets hub](https://huggingface.co/datasets) without doing
any extra work. You can write the dataset name and extra loading arguments (if there
are any) in your training config file, and trapper will automatically download the
dataset and pass it to the trainer. If you have a local or private dataset, you can
still use it after converting it to the HuggingFace `datasets` format by writing a
dataset loading script as explained
### Full Integration with HuggingFace Datasets

In trapper, we officially use the format of the datasets
from [datasets](http://github.com/huggingface/datasets) and provide full integration
with it. You can directly use all datasets published
in [datasets hub](https://huggingface.co/datasets) without doing any extra work. You
can write the dataset name and extra loading arguments (if there are any) in your
training config file, and trapper will automatically download the dataset and pass
it to the trainer. If you have a local or private dataset, you can still use it
after converting it to the HuggingFace `datasets` format by writing a dataset
loading script as explained
[here](https://huggingface.co/docs/datasets/dataset_script.html).
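
For instance, the underlying dataset access is just the regular `datasets` API; a short sketch of what trapper effectively does when the config file names a dataset (the local script path below is a placeholder):

```python
import datasets

# Any dataset from the hub can be named in the config; this is the
# equivalent call that downloads and caches it.
squad = datasets.load_dataset("squad")
print(squad["train"][0]["question"])

# A local or private dataset works the same way once it has a loading script.
# my_data = datasets.load_dataset("path/to/my_dataset_loading_script.py")
```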

### Support for Metrics through Jury

Trapper supports the common NLP metrics through
[jury](https://github.com/obss/jury). Jury is an NLP library dedicated to providing
metric implementations by adopting and extending the `datasets` library. For metric
computation during training, you can use jury-style metric
instantiation/configuration in your trapper configuration file to compute metrics
on the fly on the eval dataset with a specified `eval_steps` value. If your desired
metric is not yet available in jury or datasets, you can still create your own by
extending `trapper.Metric` and utilizing either `jury.Metric` or `datasets.Metric`
to handle a larger set of cases on predictions.
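
For reference, the jury API itself looks roughly like the following standalone sketch (metric names and input shapes should be checked against the jury documentation for your installed version):

```python
from jury import Jury

# A minimal, standalone sketch of jury usage, independent of trapper's trainer.
scorer = Jury(metrics=["bleu", "rouge"])
predictions = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]
scores = scorer(predictions=predictions, references=references)
print(scores)
```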

### Abstractions and Base Classes

Following allennlp, we implement our own registrable base classes to abstract away
Following AllenNLP, we implement our own registrable base classes to abstract away
the common operations for data processing and model training.

* Data reading and preprocessing base classes including

- The classes to be used directly: `DatasetReader`, `DatasetLoader`
and `DataCollator`.

- The classes that you may need to extend: `LabelMapper`,`DataProcessor`,
and `DataAdapter`.
- The classes that you may need to extend: `LabelMapper`, `DataProcessor`, and
  `DataAdapter`.

- `TokenizerWrapper` classes utilizing `AutoTokenizer` from transformers are
used as factories to instantiate wrapped tokenizers into which task-specific
@@ -92,8 +128,15 @@ the common operations for data processing and model training.
are used as factories to instantiate the actual task-specific models from the
configuration files dynamically.

* Optimizers from AllenNLP: Implemented as children of the base `Optimizer` class.

* Optimizers from allennlp: Implemented as children of the base `Optimizer` class.
* Metric computation is supported through `jury`. In order to make the metrics
  flexible enough to work with the trainer through a common interface, we
  introduced metric handlers. You may need to extend these classes accordingly (a
  minimal sketch is given right after this list):
    * For converting predictions and references to a suitable form for a
      particular metric or metric set: `MetricInputHandler`.
    * For manipulating the resulting score object containing the metric
      results: `MetricOutputHandler`.
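
A minimal sketch of a custom input handler is shown below; the import path, the registered key, and the method signatures are assumptions made for illustration (check the trapper source for the exact interfaces):

```python
from trapper.metrics import MetricInputHandler  # assumed import path

# Hypothetical handler; "my-task" and the signatures below are illustrative.
@MetricInputHandler.register("my-task")
class MetricInputHandlerForMyTask(MetricInputHandler):
    def _extract_metadata(self, instance) -> None:
        # Store per-instance information (e.g. the context string) that is
        # needed later to turn raw model outputs into metric inputs.
        ...

    def __call__(self, predictions, references):
        # Convert raw model outputs and label ids into the form the
        # configured metric expects.
        return predictions, references
```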

## Usage

@@ -192,7 +235,40 @@ already implemented one that matches your need.
your TokenizerWrapper subclass. Otherwise, you can directly use TokenizerWrapper.


5) **transformers.Pipeline**:
5) **MetricInputHandler**:
   This class is mainly responsible for the preprocessing applied to predictions and
   labels (references), transforming them into a format suitable to be fed into the
   metrics for computation. For example, while using BLEU in a language generation
   task, the predictions and labels need to be converted to a string or a list of
   strings. However, for an extractive question answering task in which the
   predictions are returned as start and end indices pointing to the answer within
   the context, additional information (e.g. the context in that case) may be
   needed; directly returning the start and end indices does not help, and an
   additional operation is required to convert the predictions into the actual
   answers extracted from the context. You can perform this kind of operation
   through `MetricInputHandler`: storing additional information, converting
   predictions and labels to a suitable format, and manipulating the resulting
   score. Furthermore, helper classes (e.g. `TokenizerWrapper`, `LabelMapper`) can
   also be implemented in child classes for the required tasks. This class provides
   the following main functionality:
   * `_extract_metadata()`: This method allows the user to extract metadata from
     dataset instances to be used later for preprocessing predictions and labels in
     the `preprocess()` method.
   * `__call__()`: This method converts predictions and labels into a suitable form
     for metric computation. The default behavior returns predictions and labels
     without manipulation, only applying `argmax()` to the predictions to convert
     the model outputs into prediction inputs for the metrics.

6) **MetricOutputHandler**:
   This class is intended to support manipulating the score object returned by the
   metric computation phase. Jury returns a well-constructed dictionary output for
   all metrics; however, to shorten dictionary items, manipulate the information
   within the output, or add additional information to the score dictionary, this
   class can be extended as desired.


7) **transformers.Pipeline**:
The pipeline mechanism from the transformers library has not been fully
integrated yet. For now, you should check the transformers library to find a
pipeline that is suitable for your needs and does the same pre-processing. If you could
@@ -312,15 +388,16 @@ thanks to configuration file based experiments.
### Training a POS Tagging Model on CONLL2003

Since the transformers library lacks direct support for POS tagging, we added an
[example project](./examples/pos_tagging) that trains a transformer model on `CONLL2003` POS tagging dataset
and perform inference using it. It is a
[example project](./examples/pos_tagging) that trains a transformer model
on the `CONLL2003` POS tagging dataset and performs inference using it. It is a
self-contained project including its own requirements file, therefore you can copy
the folder into another directory to use as a template for your own project. Please
follow its `README.md` to get started.

### Training a Question Answering Model on SQuAD Dataset

You can use the notebook in the [Example QA Project](./examples/question_answering) `examples/question_answering/question_answering.ipynb`
You can use the notebook in
the [Example QA Project](./examples/question_answering) `examples/question_answering/question_answering.ipynb`
to follow the steps while training a transformer model on SQuAD v1.

## Installation
2 changes: 2 additions & 0 deletions examples/pos_tagging/README.md
@@ -223,6 +223,7 @@ the root of the example project (i.e. the `pos_tagging` directory), you can run

```console
cd examples/pos_tagging
export PYTHONPATH=$PYTHONPATH:$PWD
python -m scripts.run_tests
```

@@ -233,6 +234,7 @@ HuggingFace's datasets library using the following command.

```console
cd examples/pos_tagging
export PYTHONPATH=$PYTHONPATH:$PWD
python -m scripts.cache_hf_datasets_fixtures
```

4 changes: 2 additions & 2 deletions examples/pos_tagging/experiments/roberta/experiment.jsonnet
@@ -21,8 +21,8 @@ local save_steps = 292;
},
"data_collator": {"type": "default"},
"model_wrapper": {"type": "token_classification", "num_labels": 47},
"compute_metrics": {"type": "seqeval",
"return_entity_level_metrics": false},
"compute_metrics": {"metric_params": "seqeval"},
"metric_handler": {"type": "pos-tagging"},
"label_mapper": {"type": "conll2003_pos_tagging_example"},
"args": {
"type": "default",
3 changes: 2 additions & 1 deletion examples/pos_tagging/scripts/cache_hf_datasets_fixtures.py
@@ -2,7 +2,8 @@
Caches the tests dataset to HuggingFace's `datasets` library's cache so that the
interpreter can find it when we try to load it through the `datasets` library.
"""
from examples.pos_tagging.src import POS_TAGGING_FIXTURES_ROOT
from src import POS_TAGGING_FIXTURES_ROOT

from trapper.common.testing_utils.hf_datasets_caching import (
renew_hf_datasets_fixtures_cache,
)
2 changes: 1 addition & 1 deletion examples/pos_tagging/scripts/run_tests.py
@@ -2,6 +2,6 @@

if __name__ == "__main__":
sts_tests = shell(
"pytest --cov trapper --cov-report term-missing --cov-report xml -vvv tests"
"pytest --cov src --cov-report term-missing --cov-report xml -vvv tests"
)
validate_and_exit(tests=sts_tests)
4 changes: 2 additions & 2 deletions examples/pos_tagging/src/__init__.py
@@ -1,7 +1,7 @@
from pathlib import Path

from examples.pos_tagging.src import data
from examples.pos_tagging.src.pipeline import ExamplePosTaggingPipeline
from src import data
from src.pipeline import ExamplePosTaggingPipeline

POS_TAGGING_PROJECT_ROOT = Path(__file__).parent.parent.resolve()
POS_TAGGING_TESTS_ROOT = POS_TAGGING_PROJECT_ROOT / "tests"
16 changes: 4 additions & 12 deletions examples/pos_tagging/src/data/__init__.py
@@ -1,12 +1,4 @@
from examples.pos_tagging.src.data.data_adapter import (
ExampleDataAdapterForPosTagging,
)
from examples.pos_tagging.src.data.data_processor import (
ExampleConll2003PosTaggingDataProcessor,
)
from examples.pos_tagging.src.data.label_mapper import (
ExampleLabelMapperForPosTagging,
)
from examples.pos_tagging.src.data.tokenizer_wrapper import (
ExamplePosTaggingTokenizerWrapper,
)
from src.data.data_adapter import ExampleDataAdapterForPosTagging
from src.data.data_processor import ExampleConll2003PosTaggingDataProcessor
from src.data.label_mapper import ExampleLabelMapperForPosTagging
from src.data.tokenizer_wrapper import ExamplePosTaggingTokenizerWrapper
11 changes: 5 additions & 6 deletions examples/pos_tagging/src/pipeline.py
@@ -19,6 +19,11 @@
from typing import List, Optional, Union

import numpy as np

# needed for registering the data-related classes
# noinspection PyUnresolvedReferences
# pylint: disable=unused-import
import src.data
import torch
from tokenizers.pre_tokenizers import Whitespace
from transformers import (
@@ -33,13 +38,7 @@
TokenClassificationArgumentHandler,
)

# needed for registering the data-related classes
# noinspection PyUnresolvedReferences
# pylint: disable=unused-import
import examples.pos_tagging.src.data
from trapper import PROJECT_ROOT
from trapper.data import LabelMapper
from trapper.pipelines.pipeline import create_pipeline_from_checkpoint


class ExamplePosTaggingPipeline(TokenClassificationPipeline):
3 changes: 1 addition & 2 deletions examples/pos_tagging/tests/conftest.py
@@ -1,6 +1,5 @@
import pytest

from examples.pos_tagging.src import POS_TAGGING_FIXTURES_ROOT
from src import POS_TAGGING_FIXTURES_ROOT

# noinspection PyUnresolvedReferences
# pylint: disable=unused-import
12 changes: 3 additions & 9 deletions examples/pos_tagging/tests/test_data_adapter.py
@@ -1,14 +1,8 @@
import pytest
from src.data.data_adapter import ExampleDataAdapterForPosTagging
from src.data.data_processor import ExampleConll2003PosTaggingDataProcessor
from src.data.tokenizer_wrapper import ExamplePosTaggingTokenizerWrapper

from examples.pos_tagging.src.data.data_adapter import (
ExampleDataAdapterForPosTagging,
)
from examples.pos_tagging.src.data.data_processor import (
ExampleConll2003PosTaggingDataProcessor,
)
from examples.pos_tagging.src.data.tokenizer_wrapper import (
ExamplePosTaggingTokenizerWrapper,
)
from trapper.common.constants import IGNORED_LABEL_ID
from trapper.data import InputBatch

9 changes: 2 additions & 7 deletions examples/pos_tagging/tests/test_data_processor.py
@@ -1,11 +1,6 @@
import pytest

from examples.pos_tagging.src.data.data_processor import (
ExampleConll2003PosTaggingDataProcessor,
)
from examples.pos_tagging.src.data.tokenizer_wrapper import (
ExamplePosTaggingTokenizerWrapper,
)
from src.data.data_processor import ExampleConll2003PosTaggingDataProcessor
from src.data.tokenizer_wrapper import ExamplePosTaggingTokenizerWrapper


@pytest.fixture(scope="module")
10 changes: 6 additions & 4 deletions examples/pos_tagging/tests/test_trainer.py
@@ -1,11 +1,12 @@
import datasets
import pytest
from transformers import DistilBertForTokenClassification, DistilBertTokenizerFast

# needed for registering the data-related classes
# noinspection PyUnresolvedReferences
# pylint: disable=unused-import
import examples.pos_tagging.src
import src
from transformers import DistilBertForTokenClassification, DistilBertTokenizerFast

from trapper.common import Params
from trapper.data.data_collator import DataCollator
from trapper.training import TransformerTrainer, TransformerTrainingArguments
@@ -35,8 +36,8 @@ def trainer_params(temp_output_dir, temp_result_dir, get_hf_datasets_fixture_pat
},
"data_collator": {},
"model_wrapper": {"type": "token_classification", "num_labels": 47},
"compute_metrics": {"type": "seqeval",
"return_entity_level_metrics": False},
"compute_metrics": {"metric_params": "seqeval"},
"metric_input_handler": {"type": "token-classification"},
"label_mapper": {"type": "conll2003_pos_tagging_example"},
"args": {
"type": "default",
@@ -58,6 +59,7 @@
"save_total_limit": 1,
"metric_for_best_model": "eval_loss",
"greater_is_better": False,
"seed": 100
},
"optimizer": {
"type": "huggingface_adamw",
8 changes: 8 additions & 0 deletions examples/question_answering/experiment.jsonnet
Expand Up @@ -26,6 +26,14 @@ local result_dir = std.extVar("OUTPUT_PATH");
"model_wrapper": {
"type": "question_answering"
},
"metric_input_handler": {
"type": "question-answering"
},
"compute_metrics": {
"metric_params": [
"squad"
]
},
"args": {
"type": "default",
"output_dir": checkpoint_dir,