v2.0.0-rc0
Please note that, going forward, our releases and branches will match the major and minor versions of core TensorFlow. This should prevent future confusion. As such, this release (previously 1.0) is 2.0, and we will be skipping straight to 1.15 for the next 1.x release to support TF 1.15.
Major Updates:
- SentencepieceTokenizer has been added. Please see https://github.com/google/sentencepiece for more information on Sentencepiece.
- New ToDense Keras layer for RaggedTensor conversion
- Pipeline for generating a Wordpiece Vocabulary has been added to tools.
- New Rouge-L metric op for measuring text similarity. A new colab has been added to the examples directory which provides usage examples.
- New BertTokenizer which mimics the preprocessing performed in the original BERT model.
- New Detokenizer abstract class has been added to the TF.Text Tokenizer API.
- Many previously released ops have been added to the TensorFlow Serving model server. Please see https://github.com/tensorflow/serving for more information.
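
For readers unfamiliar with the Rouge-L metric mentioned above, the following is a minimal pure-Python sketch of the underlying computation, based on the longest common subsequence (LCS) of two token sequences. It is not the TF.Text op and does not use its API; the function names `lcs_length` and `rouge_l` are hypothetical, and the F-measure here assumes equal weighting of precision and recall, which may differ from the op's defaults.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(hypothesis, reference):
    """Return (f_measure, precision, recall) for two token lists.

    Assumption: F is the plain harmonic mean of precision and recall.
    """
    lcs = lcs_length(hypothesis, reference)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(hypothesis)
    recall = lcs / len(reference)
    f_measure = 2 * precision * recall / (precision + recall)
    return f_measure, precision, recall


hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
f_measure, precision, recall = rouge_l(hyp, ref)
```

In this example the LCS is "the cat on the mat" (5 tokens), so precision and recall are both 5/6. The colab in the examples directory shows how to use the actual op.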
Minor Updates:
- API docs have received an update that should make finding relevant information easier.
- Wordpiece: Add support for splitting unknown characters
- Wordpiece: Add support for max characters per token
- Wordshape: Fix finding of currency symbols.
- Update Whitespace & UnicodeScript Tokenizers to accept scalar values.
- Build now includes CC library targets. These are useful for statically linking in TF.Text custom ops, and specifically for building them into TF.Serving's model server.
- Build environment: Updated to match core TF's update.
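
To illustrate the two Wordpiece options listed above, here is a rough pure-Python sketch of greedy longest-match-first subword splitting. It is not the TF.Text implementation: the function `wordpiece_sketch` and its parameter names are hypothetical, and the exact semantics of the library's options may differ. The sketch assumes that a character limit caps the length of each candidate subword (excluding the suffix indicator), and that splitting unknown characters emits one unknown token per unmatched character instead of marking the whole word unknown.

```python
def wordpiece_sketch(word, vocab, suffix_indicator="##", unknown_token="[UNK]",
                     max_chars_per_token=None, split_unknown_characters=False):
    """Greedy longest-match-first split of a single word (illustrative only)."""
    pieces = []
    start = 0
    n = len(word)
    while start < n:
        # Assumption: the limit bounds each subword, not the whole word.
        if max_chars_per_token is None:
            limit = n
        else:
            limit = min(n, start + max_chars_per_token)
        match = None
        end = limit
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = suffix_indicator + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            if not split_unknown_characters:
                # The whole word becomes a single unknown token.
                return [unknown_token]
            # Emit an unknown token for this character and keep going.
            pieces.append(unknown_token)
            start += 1
        else:
            pieces.append(match)
            start = end
    return pieces


vocab = {"un", "##aff", "##able"}
plain = wordpiece_sketch("unaffable", vocab)
capped = wordpiece_sketch("unaffable", vocab, max_chars_per_token=3)
whole_unknown = wordpiece_sketch("un$able", vocab)
split_unknown = wordpiece_sketch("un$able", vocab, split_unknown_characters=True)
```

Under these assumptions, capping subwords at 3 characters prevents "##able" from matching, so the word falls back to the unknown token, while enabling the split option keeps the known pieces around the unmatched "$".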