
Optimizations #276

Merged
merged 44 commits into main from decoder-optimizations on Sep 8, 2023
Changes from 1 commit
Commits (44)
6885074
Add `CodeLlamaTokenizer`
xenova Sep 2, 2023
0ea55e9
Add `codellama` for testing
xenova Sep 2, 2023
1365fe0
Update default quantization settings
xenova Sep 2, 2023
ee754fa
Refactor `PretrainedModel`
xenova Sep 3, 2023
87ffc38
Remove unnecessary error message
xenova Sep 3, 2023
ddc9306
Update llama-code-tokenizer test
xenova Sep 3, 2023
1f36ebc
Add support for `GPTNeoX` models
xenova Sep 5, 2023
7f993d4
Fix `GPTNeoXPreTrainedModel` config
xenova Sep 5, 2023
4b45d9a
Add support for `GPTJ` models
xenova Sep 5, 2023
f060ead
Add support for `WavLM` models
xenova Sep 5, 2023
9444127
Update list of supported models
xenova Sep 5, 2023
2cda7e4
Add support for XLM models
xenova Sep 5, 2023
aa1b309
Add support for `ResNet` models
xenova Sep 5, 2023
087d173
Add support for `BeiT` models
xenova Sep 5, 2023
70196a6
Fix casing of `BeitModel`
xenova Sep 5, 2023
92078ab
Merge branch 'main' into decoder-optimizations
xenova Sep 5, 2023
674b868
Remove duplicate code
xenova Sep 5, 2023
f371586
Update variable name
xenova Sep 5, 2023
291eda5
Remove `ts-ignore`
xenova Sep 5, 2023
864271c
Remove unnecessary duplication
xenova Sep 5, 2023
7b12892
Update demo model sizes
xenova Sep 5, 2023
184cf4a
[demo] Update default summarization parameters
xenova Sep 5, 2023
3ce20c2
Update default quantization parameters for new models
xenova Sep 5, 2023
7170160
Remove duplication in mapping
xenova Sep 6, 2023
cff84ee
Update list of supported marian models
xenova Sep 6, 2023
18df52f
Add support for `CamemBERT` models
xenova Sep 6, 2023
f38cf9e
Add support for `MBart` models
xenova Sep 6, 2023
fc1426f
Add support for `OPT` models
xenova Sep 6, 2023
0e1fb97
Add `MBartTokenizer` and `MBart50Tokenizer`
xenova Sep 6, 2023
3ce23de
Add example of multilingual translation with MBart models
xenova Sep 6, 2023
f2fce14
Add `CamembertTokenizer`
xenova Sep 6, 2023
baa5869
Add support for `HerBERT` models
xenova Sep 7, 2023
3612822
Add support for `XLMTokenizer`
xenova Sep 7, 2023
a351359
Fix `fuse_unk` config
xenova Sep 7, 2023
fc1c176
Do not remove duplicate keys for `Unigram` models
xenova Sep 7, 2023
4c56699
Update HerBERT supported model text
xenova Sep 7, 2023
5080f7a
Update generate_tests.py
xenova Sep 7, 2023
a140648
Update list of supported models
xenova Sep 8, 2023
cdb4814
Use enum object instead of classes for model types
xenova Sep 8, 2023
fd238ee
Add link to issue
xenova Sep 8, 2023
68544a3
Update dependencies for unit tests
xenova Sep 8, 2023
fbe52aa
Add `sentencepiece` as a testing requirement
xenova Sep 8, 2023
2a4f44d
Add `protobuf` to test dependency
xenova Sep 8, 2023
05ed5ab
Remove duplicated models to test
xenova Sep 8, 2023
Update generate_tests.py
xenova committed Sep 7, 2023
commit 5080f7a0e27d47f67d6e5c9e77c4505cf67307a2
28 changes: 25 additions & 3 deletions tests/generate_tests.py
```diff
@@ -22,11 +22,23 @@
     ],
 }
 
 MODELS_TO_IGNORE = [
     # TODO: remove when https://github.com/huggingface/tokenizers/issues/251 is fixed
     'xlm',
+
+    # TODO: remove when https://github.com/huggingface/transformers/issues/26018 is fixed
+    'marian',
 ]
+
+TOKENIZERS_TO_IGNORE = [
+    # TODO: remove when https://github.com/huggingface/transformers/pull/25478 is merged
+    'facebook/m2m100_418M',
+]
+
+MAX_TESTS = {
+    'marian': 10,
+}
 
 TOKENIZER_TEST_DATA = {
     "shared": [
         "hello world",
```
```diff
@@ -96,6 +108,11 @@ def generate_tokenizer_tests():
         list(ADDITIONAL_TOKENIZERS_TO_TEST.items())
 
     for model_type, tokenizer_names in tokenizers_to_test:
+        if model_type in MODELS_TO_IGNORE:
+            continue
+        if model_type in MAX_TESTS:
+            tokenizer_names = tokenizer_names[:MAX_TESTS[model_type]]
+
         print(f'Generating tests for {model_type}')
         for tokenizer_name in tokenizer_names:
             if tokenizer_name in TOKENIZERS_TO_IGNORE:
```
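The hunk above adds two filters to the tokenizer-test loop: model types in `MODELS_TO_IGNORE` are skipped outright, and `MAX_TESTS` caps how many checkpoints are exercised for expensive families such as `marian`. A minimal standalone sketch of that pattern follows; the checkpoint names and the `select_tokenizers` helper are illustrative, not part of the PR:

```python
# Sketch of the skip-list + per-family cap pattern from generate_tests.py.
# The model types ('xlm', 'marian') mirror the PR; checkpoint names are made up.

MODELS_TO_IGNORE = ['xlm']          # families with known upstream bugs
MAX_TESTS = {'marian': 10}          # cap on checkpoints per expensive family

def select_tokenizers(tokenizers_to_test):
    """Yield (model_type, tokenizer_names) pairs surviving both filters."""
    for model_type, tokenizer_names in tokenizers_to_test:
        if model_type in MODELS_TO_IGNORE:
            continue
        if model_type in MAX_TESTS:
            tokenizer_names = tokenizer_names[:MAX_TESTS[model_type]]
        yield model_type, tokenizer_names

selected = dict(select_tokenizers([
    ('xlm', ['example-xlm-checkpoint']),
    ('marian', [f'example-marian-{i}' for i in range(25)]),
    ('bert', ['bert-base-uncased']),
]))
print(sorted(selected))  # → ['bert', 'marian']
```

Truncating the name list (rather than skipping the family) keeps some coverage for `marian` while bounding test-generation time.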
```diff
@@ -147,11 +164,16 @@ def generate_tokenizer_tests():
 def generate_config_tests():
     results = {}
     for model_type, config_names in SUPPORTED_MODELS.items():
         print(f'Generating tests for {model_type}')
+
         for config_name in config_names:
-            # Load config
-            config = AutoConfig.from_pretrained(config_name)
-
+            print(' -', config_name)
+            try:
+                # Load config
+                config = AutoConfig.from_pretrained(config_name)
+            except Exception:
+                # Something went wrong, skip this config
+                continue
             results[config_name] = config.to_dict()
 
     # TODO: Remove after https://github.com/huggingface/transformers/issues/23876 fixed
```
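The last hunk wraps each config load in `try`/`except` so one bad or unreachable checkpoint no longer aborts the whole test-generation run. A self-contained sketch of that skip-on-failure pattern is below; `load_config` and `KNOWN_CONFIGS` are hypothetical stand-ins for `AutoConfig.from_pretrained`, used here so the example runs without network access:

```python
# Sketch of the skip-on-failure loop from generate_config_tests().
# KNOWN_CONFIGS / load_config are illustrative stand-ins, not transformers APIs.

KNOWN_CONFIGS = {
    'bert-base-uncased': {'model_type': 'bert', 'hidden_size': 768},
}

def load_config(name):
    """Stand-in for AutoConfig.from_pretrained: raise on unknown checkpoints."""
    if name not in KNOWN_CONFIGS:
        raise OSError(f'{name} is not a valid model identifier')
    return KNOWN_CONFIGS[name]

def collect_configs(config_names):
    results = {}
    for config_name in config_names:
        try:
            config = load_config(config_name)
        except Exception:
            # Something went wrong, skip this config
            continue
        results[config_name] = config
    return results

results = collect_configs(['bert-base-uncased', 'does-not-exist'])
print(list(results))  # → ['bert-base-uncased']
```

Catching the broad `Exception` is a deliberate trade-off for a test generator: any failure mode (missing repo, parse error, network hiccup) just drops that one entry from `results`.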