How to export Tokenizers? #2015
Hi @fdtomasi - You cannot directly export `GPT2CausalLMPreprocessor` tokenizers.
Alternatively, create a custom Keras model that takes the tokenizer's inputs, and then save that model.
Attached a gist here for your reference.
Hi @mehtamansi29, thank you for looking into this. Yes, I am aware we can save the tokenizer using:

```python
tokenizer = GPT2CausalLMPreprocessor.from_preset("gpt2_base_en")
tf.saved_model.save(tokenizer, "tokenizer_model")
```

which works for me. However, even after wrapping the tokenizer in a `keras.Model`:

```python
class TokenizerModel(keras.Model):
    def __init__(self, tokenizer):
        super().__init__()
        self.tokenizer = tokenizer

    def call(self, inputs):
        encoded = self.tokenizer(inputs)
        return encoded[0]

tokenizer_model = TokenizerModel(tokenizer)

# Build the model
tokenizer_model("text")
tokenizer_model(ops.convert_to_tensor(["test", "test1"]))

tokenizer_model.export("test/export")
```

the call to `export` returns the same error. Since I am trying to use the tokenizers as part of a tf-serving function, I think I am in need of using the
Hi @fdtomasi - For exporting text tokenizers to be served with TF Serving, you can use a `TextVectorization` layer with the desired parameters, build a vocabulary from some dummy data, wrap the layer in your `TokenizerModel` class above, and then build the `TokenizerModel` before exporting it.
Attached a gist for reference which shows exporting the tokenizer and an example of loading the exported tokenizer.
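A minimal sketch of that suggestion, assuming the standard Keras `TextVectorization` API; the vocabulary, parameters, and export path here are placeholders, not the ones from the gist:

```python
import keras
import tensorflow as tf

# Build a TextVectorization layer and adapt it on dummy data.
vectorizer = keras.layers.TextVectorization(
    max_tokens=100, output_mode="int", output_sequence_length=8
)
vectorizer.adapt(tf.constant(["hello world", "another test sentence"]))

# Wrap the layer in a functional model so it can be exported.
inputs = keras.Input(shape=(), dtype="string")
model = keras.Model(inputs, vectorizer(inputs))

# Export a SavedModel with a default "serve" endpoint for TF Serving.
model.export("/tmp/vectorizer_export")
```

Calling `vectorizer(tf.constant(["hello world"]))` then yields an integer tensor of shape `(1, 8)`, padded to the configured sequence length.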
I do not think this suggestion works in general. When exporting a tokenizer we should ensure that it is the same tokenizer that was used during training, with the same behaviour, even when using a `TextVectorization` layer. The only issue is with exporting the original tokenizer.
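For what it's worth, the "same behaviour" requirement can be checked mechanically once an export succeeds. A hypothetical sketch, using a `TextVectorization` stand-in rather than the actual GPT-2 tokenizer, comparing the original model with the reloaded SavedModel:

```python
import numpy as np
import keras
import tensorflow as tf

# Stand-in for the training-time tokenizer.
vectorizer = keras.layers.TextVectorization(output_sequence_length=4)
vectorizer.adapt(tf.constant(["foo bar", "baz foo"]))

inputs = keras.Input(shape=(), dtype="string")
model = keras.Model(inputs, vectorizer(inputs))
model.export("/tmp/tokenizer_check")

# Reload the artifact and verify it tokenizes exactly like the original.
reloaded = tf.saved_model.load("/tmp/tokenizer_check")
batch = tf.constant(["foo baz", "bar bar"])
np.testing.assert_array_equal(
    model(batch).numpy(), reloaded.serve(batch).numpy()
)
```

If the reloaded `serve` endpoint produces different token IDs than the original, the export cannot safely replace the training-time tokenizer.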
I am encountering issues in exporting text tokenizers to be served for tf-serving as part of a tf.Graph.
To Reproduce
This should not return errors, but I get the following:
I am explicitly tracking the tokenizer since, according to https://keras.io/api/models/model_saving_apis/export/#track-method, this seems to be required when using lookup tables, but apparently it is not enough.
I am using:

- keras_hub == 0.17.0
- keras == 3.7.0
- tensorflow == 2.18.0

Thanks!