Releases: segment-any-text/wtpsplit
Release 2.1.2
Release 2.1.1
- Change default behaviour for newlines in SaT.split.
- Now, while the model ignores them, they will be used to split as simple post-processing.
- Small bugfixes for LoRA training
- Update Readme for advanced usage
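The newline post-processing described in this release can be pictured with a minimal sketch. This is not the library's code: the helper name `split_on_newlines` is hypothetical, and dropping the newline characters and empty pieces is an assumption made for illustration. It only shows the idea of taking segments that a model produced without looking at newlines and splitting them further wherever a newline occurs.

```python
from typing import List


def split_on_newlines(segments: List[str]) -> List[str]:
    """Split already-segmented text further at newline characters.

    Hypothetical helper for illustration only; it is not part of wtpsplit.
    Newline characters and empty pieces are dropped (an assumption).
    """
    result: List[str] = []
    for segment in segments:
        # Split only on "\n" and skip empty pieces from repeated newlines.
        result.extend(piece for piece in segment.split("\n") if piece)
    return result


# Segments as a model might return them when it ignores newlines.
model_segments = ["This is a sentence. ", "Line one\nLine two"]
print(split_on_newlines(model_segments))
```

Running this as a separate step after segmentation keeps the model's predictions untouched while still honouring line breaks in the input.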
Release 2.1.0
- Adds ONNX support for SaT models.
- Including export scripts and an updated README.
- This results in 50% improved inference time on GPU.
Release 2.0.8
- Fix splitting of short sequences into individual characters (#127)
Release 2.0.7
- Allow numpy>=2.0
- Fix adaptation code
- Add some comments
Release 2.0.5
- Fixes potential CUDA device error when the input has exactly 511 tokens (#121).
Release 2.0.4
- Fix a speed issue with SaT (#118). Now it is (as expected) ~6x faster than WtP.
Release 2.0.3
Implement SaT (https://arxiv.org/abs/2406.16678) and switch the default models to SaT🚀
The previous WtP models are still available but SaT is strictly better in accuracy and speed. See the updated README for details: https://github.com/segment-any-text/wtpsplit.
SaT was implemented and developed by @markus583 @igorsterner.
Release 1.3.0
- Fix a bug affecting some hash embeddings of the canine-* models which reduced accuracy (please upgrade to this version!).
- Add a guide on adapting to your custom data: https://github.com/bminixhofer/wtpsplit#advanced-usage.
Release 1.2.3
- Fix an error with text whose length is not a multiple of 4 and is shorter than 512 characters in canine-s-* models (#98).