
Releases: tensorflow/text

v2.9.0

18 May 02:27

Release 2.9.0

Major Features and Improvements

  • New FastBertNormalizer that improves speed for BERT normalization and is convertible to TF Lite.
  • New FastBertTokenizer that combines FastBertNormalizer and FastWordpieceTokenizer.
  • New ngrams kernel for handling STRING_JOIN reductions.
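
For illustration, a minimal sketch of the new FastBertTokenizer; the inline vocabulary is a toy stand-in for a real BERT vocab file, and the constructor arguments shown should be checked against the 2.9 API docs:

```python
import tensorflow as tf
import tensorflow_text as tf_text  # assumes tensorflow-text>=2.9

# Toy wordpiece vocabulary for illustration only; a real BERT vocab comes from a vocab.txt file.
vocab = ["[UNK]", "[CLS]", "[SEP]", "the", "fast", "token", "##izer", "rocks"]

# FastBertTokenizer bundles FastBertNormalizer and FastWordpieceTokenizer.
tokenizer = tf_text.FastBertTokenizer(vocab, token_out_type=tf.int32)

token_ids = tokenizer.tokenize(["the fast tokenizer rocks"])
print(token_ids)  # RaggedTensor of wordpiece ids, one row per input string
```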

Bug Fixes and Other Changes

  • NgramsStringJoin shape inference fixed to handle unranked tensors
  • Upgrade pybind11 and re-enable tests that were broken.
  • Rename a couple of files to match the naming of the other TF Lite kernels. Also adds missing deps to tflite_ops that were causing an error when testing :all.
  • Add to TF Lite documentation that ngrams is a convertible op.
  • Fix public access and missing ICU data for build_fast_bert_normalizer_model, and enable the disabled tests.
  • Update the doc for FastWordpieceTokenizer.
  • Refine the doc for FastWordpieceTokenizer.
  • Bug fix: make BertTokenizer work for RaggedTensors with row_splits_dtype=int32 (see the sketch after this list).
  • Fix typo in text.WordpieceTokenizer.
  • Added missing commas in the emoticons list for the normalizer.
  • Refactor build and test scripts to use prepare_tf_dep.sh
  • Fixes prepare_tf_dep.sh for macOS.
  • Fixed bug in setup.py that was requiring the wrong version.
  • Updated package with the correct versions of Python we release on.
  • Update documentation on TF Lite convertible ops.
  • Transition to use TF's version of bazel.
  • Transition to use TF's bazel configuration.
  • Add missing symbols for tokenization layers
  • Fix typo in text_generation.ipynb
  • Fix grammar typo
  • Allow fast wordpiece tokenizer to take in external wordpiece model.
  • Internal change
  • Improvement to guide where mean call is redundant. See #810 for more info.
  • Update broken link and fix typo in BERT-SNGP demo notebook
  • Consolidate disparate test-related files into a single testing_infra folder.
  • Pin tf-text version in guides & tutorials.
  • Fix bug in constrained sequence op: added a check for the edge case where num_steps = 0, which should do nothing, to prevent SIGSEGV crashes.
  • Remove outdated Keras tests because the testing utilities they relied on are no longer available.
  • Update BERT preprocessing to pad the correct tensors.
  • Update tensorflow-text notebooks from 2.7 to 2.8
  • Optimize FastWordPiece to only generate requested outputs.
  • Add a note about byte-indexing vs character indexing.
  • Add a MAX_TOKENS to the transformer tutorial.
  • Only export tensorflow symbols from shared libs.
  • (Generated change) Update tf.Text versions and/or docs.
  • Do not run the prepare_tf_dep script for Apple M1 macs.
  • Update text_classification_rnn.ipynb
  • Fix the exported symbols for the linker test. By adding them to the shared objects instead of the C++ code, the code can be compiled together into one large shared lib.
  • Implement FastBertNormalizer based on codepoint-wise mappings.
  • Add pybind for fast_bert_normalizer_model_builder.
  • Remove unused comments related to Python 2 compatibility.
  • Update transformer.ipynb
  • Update toolchain & temporarily disable tf lite tests.
  • Define manylinux2014 for the new toolchain target, and have presubmits use it.
  • Move tflite build deps to custom target.
  • Add FastBertTokenizer.
  • Update bazel version to 5.1.0
  • Update TF Text to use new Ngrams kernel.
  • Don't try to set dimension if shape is unknown for ngrams.
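
As an illustration of the BertTokenizer fix noted above, a minimal sketch of the now-supported int32 row-splits case; the vocab.txt path is hypothetical:

```python
import tensorflow as tf
import tensorflow_text as tf_text

# Hypothetical wordpiece vocab file; BertTokenizer builds its lookup table from the path.
tokenizer = tf_text.BertTokenizer("vocab.txt", token_out_type=tf.int64)

# A ragged batch of strings whose row splits use int32 instead of the default int64.
docs = tf.ragged.constant([["hello world"], ["tensorflow text", "bert"]],
                          row_splits_dtype=tf.int32)

tokens = tokenizer.tokenize(docs)  # previously failed for int32 row splits
```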

Thanks to our Contributors

This release contains contributions from many people at Google, as well as:

Aflah, Connor Brinton, devnev39, Janak Ramakrishnan, Martin, Nathan Luehr, Pierre Dulac, Rabin Adhikari, gadagashwini, mohantym, rtg0795

v2.10.0-b2

12 May 17:28
Pre-release

Release 2.10.0-b2

Major Features and Improvements

  • Added FastSentencepieceTokenizer, which is convertible to TF Lite. Note that the op name in the graph will change, so any models trained with this version will need to be retrained when the 2.10 release candidate is released.
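
For illustration, a minimal sketch of the new tokenizer, assuming it mirrors the tokenize/detokenize interface of the existing SentencepieceTokenizer; the model path is a placeholder:

```python
import tensorflow as tf
import tensorflow_text as tf_text  # 2.10.0-b2

# Load a pre-trained SentencePiece model serialized to bytes; the path is a placeholder.
sp_model = tf.io.gfile.GFile("sp_model.model", "rb").read()

tokenizer = tf_text.FastSentencepieceTokenizer(sp_model)
ids = tokenizer.tokenize(["hello world"])  # RaggedTensor of subword ids
text_back = tokenizer.detokenize(ids)      # round-trips back to strings
```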

Important Notes

  • This beta release is outside the normal release cycle and is meant to work with TF version 2.8.x.
  • Again, the op name for FastSentencepieceTokenizer will change in future releases.

v2.8.2

21 Apr 02:58

Release 2.8.2

Major Features and Improvements

  • 📦️ Fix macOS packaging so it works with package managers like Poetry (#838)

Bug Fixes and Other Changes

  • Package metadata updated with the correct available Python versions.

Thanks to our Contributors

This release contains contributions from many people at Google, as well as:

Connor Brinton

v2.9.0-rc1

15 Apr 02:06
Pre-release

Release 2.9.0-rc1

Major Features and Improvements

  • New FastBertNormalizer that improves speed for BERT normalization and is convertible to TF Lite.
  • New FastBertTokenizer that combines FastBertNormalizer and FastWordpieceTokenizer.
  • New ngrams kernel for handling STRING_JOIN reductions.
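
For illustration, a minimal sketch of using the new FastBertNormalizer on its own; the lower_case_nfd_strip_accents flag is shown as an assumption and should be checked against the 2.9 API docs:

```python
import tensorflow_text as tf_text

# FastBertNormalizer applies BERT-style text cleanup via codepoint-wise mappings;
# the flag below additionally lower-cases and strips accents.
normalizer = tf_text.FastBertNormalizer(lower_case_nfd_strip_accents=True)

normalized = normalizer.normalize(["Héllo, World!"])
print(normalized)  # tensor of normalized strings
```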

Bug Fixes and Other Changes

  • Fixed bug in setup.py that was requiring the wrong version.
  • Updated package with the correct versions of Python we release on.
  • Update documentation on TF Lite convertible ops.
  • Transition to use TF's version of bazel.
  • Transition to use TF's bazel configuration.
  • Add missing symbols for tokenization layers
  • Fix typo in text_generation.ipynb
  • Fix grammar typo
  • Allow fast wordpiece tokenizer to take in external wordpiece model.
  • Internal change
  • Improvement to guide where mean call is redundant. See #810 for more info.
  • Update broken link and fix typo in BERT-SNGP demo notebook
  • Consolidate disparate test-related files into a single testing_infra folder.
  • Pin tf-text version in guides & tutorials.
  • Fix bug in constrained sequence op: added a check for the edge case where num_steps = 0, which should do nothing, to prevent SIGSEGV crashes.
  • Remove outdated Keras tests because the testing utilities they relied on are no longer available.
  • Update BERT preprocessing to pad the correct tensors.
  • Update tensorflow-text notebooks from 2.7 to 2.8
  • Optimize FastWordPiece to only generate requested outputs.
  • Add a note about byte-indexing vs character indexing.
  • Add a MAX_TOKENS to the transformer tutorial.
  • Only export tensorflow symbols from shared libs.
  • (Generated change) Update tf.Text versions and/or docs.
  • Do not run the prepare_tf_dep script for Apple M1 macs.
  • Update text_classification_rnn.ipynb
  • Fix the exported symbols for the linker test. By adding them to the shared objects instead of the C++ code, the code can be compiled together into one large shared lib.
  • Implement FastBertNormalizer based on codepoint-wise mappings.
  • Add pybind for fast_bert_normalizer_model_builder.
  • Remove unused comments related to Python 2 compatibility.
  • Update transformer.ipynb
  • Update toolchain & temporarily disable tf lite tests.
  • Define manylinux2014 for the new toolchain target, and have presubmits use it.
  • Move tflite build deps to custom target.
  • Add FastBertTokenizer.
  • Update bazel version to 5.1.0
  • Update TF Text to use new Ngrams kernel.
  • Don't try to set dimension if shape is unknown for ngrams.

Thanks to our Contributors

This release contains contributions from many people at Google, as well as:

Aflah, Connor Brinton, devnev39, Janak Ramakrishnan, Martin, Nathan Luehr, Pierre Dulac, Rabin Adhikari

v2.9.0-rc0

14 Apr 16:59
Pre-release

Release 2.9.0-rc0

Major Features and Improvements

  • New FastBertNormalizer that improves speed for BERT normalization and is convertible to TF Lite.
  • New FastBertTokenizer that combines FastBertNormalizer and FastWordpieceTokenizer.
  • New ngrams kernel for handling STRING_JOIN reductions.
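
For illustration, a small sketch of the STRING_JOIN reduction that the new kernel serves; the input data is made up:

```python
import tensorflow as tf
import tensorflow_text as tf_text

words = tf.ragged.constant([["the", "quick", "brown", "fox"], ["jumped"]])

# STRING_JOIN joins each window of `width` tokens with the separator.
bigrams = tf_text.ngrams(words, width=2,
                         reduction_type=tf_text.Reduction.STRING_JOIN,
                         string_separator=" ")
print(bigrams)  # [["the quick", "quick brown", "brown fox"], []]
```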

Bug Fixes and Other Changes

  • Add missing symbols for tokenization layers
  • Fix typo in text_generation.ipynb
  • Fix grammar typo
  • Allow fast wordpiece tokenizer to take in external wordpiece model.
  • Internal change
  • Improvement to guide where mean call is redundant. See #810 for more info.
  • Update broken link and fix typo in BERT-SNGP demo notebook
  • Consolidate disparate test-related files into a single testing_infra folder.
  • Pin tf-text version in guides & tutorials.
  • Fix bug in constrained sequence op: added a check for the edge case where num_steps = 0, which should do nothing, to prevent SIGSEGV crashes.
  • Remove outdated Keras tests because the testing utilities they relied on are no longer available.
  • Update BERT preprocessing to pad the correct tensors.
  • Update tensorflow-text notebooks from 2.7 to 2.8
  • Optimize FastWordPiece to only generate requested outputs.
  • Add a note about byte-indexing vs character indexing.
  • Add a MAX_TOKENS to the transformer tutorial.
  • Only export tensorflow symbols from shared libs.
  • (Generated change) Update tf.Text versions and/or docs.
  • Do not run the prepare_tf_dep script for Apple M1 macs.
  • Update text_classification_rnn.ipynb
  • Fix the exported symbols for the linker test. By adding them to the shared objects instead of the C++ code, the code can be compiled together into one large shared lib.
  • Implement FastBertNormalizer based on codepoint-wise mappings.
  • Add pybind for fast_bert_normalizer_model_builder.
  • Remove unused comments related to Python 2 compatibility.
  • Update transformer.ipynb
  • Update toolchain & temporarily disable tf lite tests.
  • Define manylinux2014 for the new toolchain target, and have presubmits use it.
  • Move tflite build deps to custom target.
  • Add FastBertTokenizer.
  • Update bazel version to 5.1.0
  • Update TF Text to use new Ngrams kernel.
  • Don't try to set dimension if shape is unknown for ngrams.

Thanks to our Contributors

This release contains contributions from many people at Google, as well as:

Aflah, Connor Brinton, devnev39, Janak Ramakrishnan, Martin, Nathan Luehr, Pierre Dulac, Rabin Adhikari

v2.8.1

04 Feb 11:02

Release 2.8.1

Major Features and Improvements

  • Upgrade Sentencepiece to v0.1.96
  • Adds new trimmer ShrinkLongestTrimmer
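
For illustration, a sketch of the new trimmer, assuming it follows the same interface (a max_seq_length budget and a trim() call over a list of segments) as the existing RoundRobinTrimmer and WaterfallTrimmer:

```python
import tensorflow as tf
import tensorflow_text as tf_text

seg_a = tf.ragged.constant([[1, 2, 3, 4, 5], [10, 11]])
seg_b = tf.ragged.constant([[6, 7, 8], [12, 13, 14, 15]])

# Trim each row's combined segments to a budget of 6 items by repeatedly
# shrinking whichever segment is currently the longest.
trimmer = tf_text.ShrinkLongestTrimmer(max_seq_length=6)
trimmed_a, trimmed_b = trimmer.trim([seg_a, seg_b])
```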

Bug Fixes and Other Changes

  • Upgrade bazel to 4.2.2
  • Create .bazelversion file to guarantee using correct version
  • Update tf.Text versions and docs.
  • Add Apple Silicon support for manual builds.
  • Update configure.sh
  • tensorflow-macos is now only installed as the dependency on Apple Silicon
  • Fix merge error & add SP patch for building on Windows
  • Fix inclusion of missing libraries for Mac & Windows
  • Update word_embeddings.ipynb
  • Update classify_text_with_bert.ipynb
  • Update tensorflow_text tutorials to new preprocessing layer symbol path
  • Fixes typo in guide
  • Update Apple Silicon's requirements.
  • Release script to use tf-nightly.
  • Fix typo in ragged tensor link.
  • Update requirements for setup. It wasn't catching non-M1 Macs.
  • Add missing symbols for tokenization layers
  • Fix typo in text_generation.ipynb
  • Fix grammar typo
  • Allow fast wordpiece tokenizer to take in an external wordpiece model.
  • Update guide with redundant mean call.
  • Update broken link and fix typo in BERT-SNGP demo notebook.

Thanks to our Contributors

This release contains contributions from many people at Google, as well as:

Abhijeet Manhas, chunduriv, Dean Wyatte, Feiteng, jaymessina3, Mao, Olivier Bacs, RenuPatelGoogle, Steve R. Sun, Stonepia, sun1638650145, Tharaka De Silva, thuang513, Xiaoquan Kong, devnev39, Janak Ramakrishnan, Pierre Dulac

v2.8.0-rc0

31 Jan 20:21
Pre-release

Release 2.8.0-rc0

Major Features and Improvements

  • Upgrade Sentencepiece to v0.1.96
  • Adds new trimmer ShrinkLongestTrimmer
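
Related to the SentencePiece upgrade, for illustration, a minimal round trip with the existing SentencepieceTokenizer, which wraps the upgraded library; the model path is a placeholder:

```python
import tensorflow as tf
import tensorflow_text as tf_text

# Serialized SentencePiece model trained elsewhere; the path is hypothetical.
sp_model = tf.io.gfile.GFile("sp_model.model", "rb").read()

tokenizer = tf_text.SentencepieceTokenizer(model=sp_model, out_type=tf.int32)
ids = tokenizer.tokenize(["hello world"])
strings = tokenizer.detokenize(ids)
```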

Bug Fixes and Other Changes

  • Upgrade bazel to 4.2.2
  • Create .bazelversion file to guarantee using correct version
  • (Generated change) Update tf.Text versions and/or docs.
  • Add Apple Silicon support for manual builds.
  • Update configure.sh
  • tensorflow-macos is now only installed as the dependency on Apple Silicon
  • Fix merge error & add SP patch for building on Windows
  • Fix inclusion of missing libraries for Mac & Windows
  • Update word_embeddings.ipynb
  • Update classify_text_with_bert.ipynb
  • Update tensorflow_text tutorials to new preprocessing layer symbol path
  • Fixes typo in guide
  • Update Apple Silicon's requirements.
  • Release script to use tf-nightly.
  • Fix typo in ragged tensor link.
  • Update requirements for setup. It wasn't catching non-M1 Macs.

Thanks to our Contributors

This release contains contributions from many people at Google, as well as:

Abhijeet Manhas, chunduriv, Dean Wyatte, Feiteng, jaymessina3, Mao, Olivier Bacs, RenuPatelGoogle, Steve R. Sun, Stonepia, sun1638650145, Tharaka De Silva, thuang513, Xiaoquan Kong

v2.7.3

19 Nov 12:59

Bug Fixes and Other Changes

  • Fixed broken packages for macOS & Windows

v2.7.0

12 Nov 06:43

Release 2.7.0

Major Features and Improvements

  • Added a new tokenizer, FastWordpieceTokenizer, that is considerably faster than the original WordpieceTokenizer
  • WhitespaceTokenizer was rewritten to increase speed and reduce kernel size
  • Ability to convert WhitespaceTokenizer & FastWordpieceTokenizer to TF Lite
  • Added Keras layers for tokenizers: UnicodeScript, Whitespace, & Wordpiece
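
For illustration, a minimal sketch combining the rewritten WhitespaceTokenizer with the new FastWordpieceTokenizer; the inline vocab is a toy stand-in for a real wordpiece vocab file:

```python
import tensorflow as tf
import tensorflow_text as tf_text  # assumes tensorflow-text>=2.7

ws_tokenizer = tf_text.WhitespaceTokenizer()
words = ws_tokenizer.tokenize(["the fast tokenizer"])  # [[b"the", b"fast", b"tokenizer"]]

# Toy vocab for illustration only.
vocab = ["[UNK]", "the", "fast", "token", "##izer"]
wp_tokenizer = tf_text.FastWordpieceTokenizer(vocab=vocab)
wordpieces = wp_tokenizer.tokenize(words)  # adds a ragged wordpiece-id dimension per word
```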

Bug Fixes and Other Changes

  • (Generated change) Update tf.Text versions and/or docs.
  • Tiny change to a variable name in the transformer tutorial
  • Update nmt_with_attention.ipynb
  • Add vocab_size for the wordpiece tokenizer for consistency with sentencepiece.
  • General cleanup of the build files. The previous tf_deps paradigm was confusing; encapsulating everything into a single lib should make it easier to understand and follow.
  • This adds the builder for the new WhitespaceTokenizer config cache. This is the first in a series of changes to update the WST for mobile.
  • C++ API for new WhitespaceTokenizer. The updated API is more useful (accepts strings instead of ints), faster, and smaller in size.
  • Adds pywrap for WhitespaceTokenizer config builder.
  • Simplify configure.bzl. Since we build with C++14 on every platform, default to it across the board; this should be easier to understand and maintain.
  • Remove most of the default oss deps for kernels as they are no longer required for building.
  • Updating this BERT tutorial to use model subclassing (easier for students to hack on it this way).
  • Adds kernels for TF & TFLite for the new WhitespaceTokenizer.
  • Fix a problem with the WST template that was causing members to be exported as undefined symbols. After this change they become a unique global symbol in the shared object file.
  • Update whitespace op to use new kernel. This change still allows for building the old kernel as well so current users can continue to use it, even though we cannot make new calls to it.
  • Convert the TFLite kernel for ngram with STRING_JOIN mode to use tfshim so the same code is now used for TF and TFLite kernels.
  • fix: masked_ids -> masked_lm_ids
  • Save the transformer.
  • Remove the sentencepiece patch in OSS
  • fix vocab_table arg is not used in bert_pretrain_preprocess()
  • Disable TSAN for one more tutorial test that may run for >900 sec when TSAN is enabled.
  • Remove the sentencepiece patch in OSS
  • internal
  • (Generated change) Update tf.Text versions and/or docs.
  • Update deps to fix broken build.
  • Remove --gen_report flag.
  • Small typo fixed
  • Explain that all heads are handled with a single Dense layer
  • internal change, should be a noop in github.
  • Update whitespace op to use new kernel. This change still allows for building the old kernel as well so current users can continue to use it, even though we cannot make new calls to it.
  • Creates TF Lite registrar and adds TF Lite tests for mobile ops.
  • Fix nmt_with_attention start_index
  • Export LD_LIBRARY_PATH when configuring for build.
  • Update TF Lite test to use the function rather than globally sharing the linked library symbols so the interpreter can find the name, since that is only available on Linux.
  • Temporarily switch to the definition of REGISTER_TF_OP_SHIM while it updates.
  • Update REGISTER_TF_OP_SHIM macro to remove unnecessary parameter.
  • Remove temporary code and set back to using the op shim macro.
  • Updated import statement
  • Internal change
  • Pushed back the forward compatibility date for tf_text.WhitespaceTokenizer.
  • Add .gitignore
  • The --keep_going flag will make bazel run all tests instead of stopping at the first failure.
  • Add missing blank line between test and doctest.
  • Adds a regression test for model server for the replaced WST op. This ensures that current models using the old kernel will continue to work.
  • Fix the build by adding a new dependency required by TF to kernel targets.
  • Add sentencepiece detokenize op to stateful allowlist.
  • Fix broken build. This occurred because of a change on TF that updated the compiler infra version (tensorflow/tensorflow@e0940f2).
  • Clean up code now that the build horizon has passed.
  • Add pywrap dependency for tflite ops.
  • Update TextVectorization layer
  • Allows overridden get_selectable to be used.
  • fix: masked_input_ids is not used in bert_pretrain_preprocess()
  • Update word_embeddings.ipynb
  • Fixed a value where the training accuracy was shown instead of the validation accuracy
  • Mark old SP targets
  • Create a single SELECT_TFTEXT_OPS for registering all of the TF Text ops with TF Lite interpreter. Also adds a single target for building to them.
  • Add TF Lite op for RaggedTensorToTensor.
  • Adds a new guide for using select TF Text ops in TF Lite models for mobile.
  • Switch FastWordpieceTokenizer to default to running pre-tokenization, and rename the end_to_end parameter to no_pretokenization. This should be a no-op. The flatbuffer is not changed so as to not affect any models already using FWP currently. Only the python API is updated.
  • Update version

Thanks to our Contributors

This release contains contributions from many people at Google, as well as:

Aaron Siddhartha Mondal, Abhijeet Manhas, Dominik Schlösser, jaymessina3, Mao, Xiaoquan Kong, Yasir Modak, Olivier Bacs, Tharaka De Silva

v2.7.0-rc1

04 Nov 22:39
Pre-release

Release 2.7.0-rc1

Major Features and Improvements

  • Added a new tokenizer, FastWordpieceTokenizer, that is considerably faster than the original WordpieceTokenizer
  • WhitespaceTokenizer was rewritten to increase speed and reduce kernel size
  • Ability to convert WhitespaceTokenizer & FastWordpieceTokenizer to TF Lite
  • Added Keras layers for tokenizers: UnicodeScript, Whitespace, & Wordpiece
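
For illustration, a sketch of converting a small model that uses WhitespaceTokenizer to TF Lite via the standard TFLiteConverter flow; the module, paths, and converter flags are assumptions based on the general select-ops workflow, and on-device inference still requires registering the TF Text ops with the interpreter:

```python
import tensorflow as tf
import tensorflow_text as tf_text

class TokenizeModule(tf.Module):
  def __init__(self):
    super().__init__()
    self.tokenizer = tf_text.WhitespaceTokenizer()

  @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string)])
  def tokenize(self, strings):
    # to_tensor() gives a dense output, which is friendlier for TF Lite signatures.
    return self.tokenizer.tokenize(strings).to_tensor()

module = TokenizeModule()
tf.saved_model.save(module, "/tmp/ws_tokenizer",
                    signatures={"tokenize": module.tokenize})

converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/ws_tokenizer",
                                                     signature_keys=["tokenize"])
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
converter.allow_custom_ops = True  # the TF Text kernels are registered as custom ops
tflite_model = converter.convert()
```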

Bug Fixes and Other Changes

  • (Generated change) Update tf.Text versions and/or docs.
  • Tiny change to a variable name in the transformer tutorial
  • Update nmt_with_attention.ipynb
  • Add vocab_size for the wordpiece tokenizer for consistency with sentencepiece.
  • General cleanup of the build files. The previous tf_deps paradigm was confusing; encapsulating everything into a single lib should make it easier to understand and follow.
  • This adds the builder for the new WhitespaceTokenizer config cache. This is the first in a series of changes to update the WST for mobile.
  • C++ API for new WhitespaceTokenizer. The updated API is more useful (accepts strings instead of ints), faster, and smaller in size.
  • Adds pywrap for WhitespaceTokenizer config builder.
  • Simplify configure.bzl. Since we build with C++14 on every platform, default to it across the board; this should be easier to understand and maintain.
  • Remove most of the default oss deps for kernels as they are no longer required for building.
  • Updating this BERT tutorial to use model subclassing (easier for students to hack on it this way).
  • Adds kernels for TF & TFLite for the new WhitespaceTokenizer.
  • Fix a problem with the WST template that was causing members to be exported as undefined symbols. After this change they become a unique global symbol in the shared object file.
  • Update whitespace op to use new kernel. This change still allows for building the old kernel as well so current users can continue to use it, even though we cannot make new calls to it.
  • Convert the TFLite kernel for ngram with STRING_JOIN mode to use tfshim so the same code is now used for TF and TFLite kernels.
  • fix: masked_ids -> masked_lm_ids
  • Save the transformer.
  • Remove the sentencepiece patch in OSS
  • fix vocab_table arg is not used in bert_pretrain_preprocess()
  • Disable TSAN for one more tutorial test that may run for >900 sec when TSAN is enabled.
  • Remove the sentencepiece patch in OSS
  • internal
  • (Generated change) Update tf.Text versions and/or docs.
  • Update deps to fix broken build.
  • Remove --gen_report flag.
  • Small typo fixed
  • Explain that all heads are handled with a single Dense layer
  • internal change, should be a noop in github.
  • Update whitespace op to use new kernel. This change still allows for building the old kernel as well so current users can continue to use it, even though we cannot make new calls to it.
  • Creates TF Lite registrar and adds TF Lite tests for mobile ops.
  • Fix nmt_with_attention start_index
  • Export LD_LIBRARY_PATH when configuring for build.
  • Update TF Lite test to use the function rather than globally sharing the linked library symbols so the interpreter can find the name, since that is only available on Linux.
  • Temporarily switch to the definition of REGISTER_TF_OP_SHIM while it updates.
  • Update REGISTER_TF_OP_SHIM macro to remove unnecessary parameter.
  • Remove temporary code and set back to using the op shim macro.
  • Updated import statement
  • Internal change
  • Pushed back the forward compatibility date for tf_text.WhitespaceTokenizer.
  • Add .gitignore
  • The --keep_going flag will make bazel run all tests instead of stopping at the first failure.
  • Add missing blank line between test and doctest.
  • Adds a regression test for model server for the replaced WST op. This ensures that current models using the old kernel will continue to work.
  • Fix the build by adding a new dependency required by TF to kernel targets.
  • Add sentencepiece detokenize op to stateful allowlist.
  • Fix broken build. This occurred because of a change on TF that updated the compiler infra version (tensorflow/tensorflow@e0940f2).
  • Clean up code now that the build horizon has passed.
  • Add pywrap dependency for tflite ops.
  • Update TextVectorization layer
  • Allows overridden get_selectable to be used.
  • Update version

Thanks to our Contributors

This release contains contributions from many people at Google, as well as:

Aaron Siddhartha Mondal, Abhijeet Manhas, Dominik Schlösser, jaymessina3, Mao, Xiaoquan Kong, Yasir Modak