Releases: segment-any-text/wtpsplit

Release 2.1.2

14 Dec 11:06

markus583

2.1.2

159a493

Fixes #142: AssertionError when the input string consists only of newlines or whitespace, or is an empty string.
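
For callers pinned to an earlier version, the failing inputs can be screened out before they reach the splitter. The following is a minimal sketch, not the fix shipped in this release: `split_guarded` and `fake_split` are hypothetical names, and returning an empty list for blank input is an assumption about what the caller wants.

```python
from typing import Callable, List


def split_guarded(text: str, split_fn: Callable[[str], List[str]]) -> List[str]:
    """Call split_fn only when text contains something other than whitespace."""
    # str.strip() removes spaces, tabs and newlines, so blank input becomes "".
    if not text.strip():
        # Assumption: blank input is treated as "no sentences".
        return []
    return split_fn(text)


def fake_split(text: str) -> List[str]:
    # Hypothetical stand-in for a real splitter; returns the text as one segment.
    return [text]


print(split_guarded(" \n\t\n", fake_split))
print(split_guarded("Hello there.", fake_split))
```

In real use, `split_fn` would be the splitter's own split method passed in place of `fake_split`.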

Release 2.1.1

27 Oct 14:19

markus583

2.1.1

16f1a2c

Change default behaviour for newlines in SaT.split.
- The model still ignores newlines, but they are now used to split the text in a simple post-processing step.
Small bugfixes for LoRA training
Update Readme for advanced usage
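
The newline behaviour described in this release can be pictured as a second pass over the segments the model returns. The sketch below only illustrates that idea and is not the library's code: `split_on_newlines` is a hypothetical helper, and keeping each newline attached to the piece before it is an assumption made so that the pieces still concatenate back to the input.

```python
import re
from typing import List

# Either a run of non-newline characters ending in "\n", or a final run without one.
_LINE_PIECE = re.compile(r"[^\n]*\n|[^\n]+")


def split_on_newlines(segments: List[str]) -> List[str]:
    """Split every segment further at newline characters."""
    pieces: List[str] = []
    for segment in segments:
        # findall returns the matches in order; together they cover the whole segment.
        pieces.extend(_LINE_PIECE.findall(segment))
    return pieces


model_output = ["Hello world.\nSecond line. ", "Third sentence."]
print(split_on_newlines(model_output))
```

Because no characters are dropped, joining the returned pieces reproduces the original text.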

Release 2.1.0

24 Sep 21:37

markus583

2.1.0

00d2d6c

Adds ONNX support for SaT models.
- Including export scripts and an updated README.
- This results in 50% improved inference time on GPU.

Release 2.0.8

09 Sep 10:49

markus583

2.0.8

46f3d19

Fix splitting of short sequences into individual characters (#127)

Release 2.0.7

02 Sep 13:26

markus583

2.0.7

0f675f7

Allow numpy>=2.0
Fix adaptation code
Add some comments

Release 2.0.5

08 Jul 07:41

bminixhofer

2.0.5

cfd5e24

Fixes potential CUDA device error when the input has exactly 511 tokens (#121).

Release 2.0.4

01 Jul 09:32

bminixhofer

2.0.4

8227d86

Fix a speed issue with SaT (#118). Now it is (as expected) ~6x faster than WtP.

Release 2.0.3

26 Jun 08:05

bminixhofer

2.0.3

66eb194

Implement SaT (https://arxiv.org/abs/2406.16678) and switch the default models to SaT🚀

The previous WtP models are still available but SaT is strictly better in accuracy and speed. See the updated README for details: https://github.com/segment-any-text/wtpsplit.

SaT was implemented and developed by @markus583 @igorsterner.

Release 1.3.0

22 Jan 15:30

bminixhofer

1.3.0

b83e419

Fix a bug affecting some hash embeddings of the canine-* models which reduced accuracy (please upgrade to this version!).
Add a guide on adapting to your custom data: https://github.com/bminixhofer/wtpsplit#advanced-usage.

Release 1.2.3

18 Jul 13:47

bminixhofer

1.2.3

cf347c3

Fix an error with text whose length is not a multiple of 4 and is shorter than 512 characters in canine-s-* models (#98).
