Releases: tsproisl/SoMaJo
v2.4.3
v2.4.2
v2.4.1
v2.4.0
- New feature: SoMaJo can output character offsets for tokens, allowing for stand-off tokenization. Pass character_offsets=True to the constructor or use the option --character-offsets on the command line to enable the feature. The character offsets are determined by aligning the tokenized output with the input, so activating the feature incurs a noticeable increase in processing time.
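To illustrate what aligning tokenized output with the input means, here is a minimal standard-library sketch. It is not SoMaJo's actual implementation: the helper name align_offsets is hypothetical, and it assumes that every token occurs verbatim and in order in the input, which a real tokenizer that normalizes or rewrites characters cannot always assume.

```python
def align_offsets(text, tokens):
    """Return a (start, end) character offset pair for each token.

    Hypothetical helper, not part of SoMaJo. Assumes every token
    occurs verbatim in `text`, in the same order as in `tokens`.
    """
    offsets = []
    cursor = 0  # position in `text` up to which tokens have been aligned
    for token in tokens:
        # Search only to the right of the previous token.
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found after position {cursor}")
        end = start + len(token)
        offsets.append((start, end))
        cursor = end
    return offsets


text = "Heute ist der 3. Mai. Morgen auch!"
tokens = ["Heute", "ist", "der", "3.", "Mai", ".", "Morgen", "auch", "!"]
offsets = align_offsets(text, tokens)

# Stand-off tokenization: the input stays untouched, and each token
# can be recovered by slicing the input with its offsets.
for token, (start, end) in zip(tokens, offsets):
    assert text[start:end] == token
```

The extra pass over the input for every token is one plausible reason why offset computation costs additional processing time.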
v2.3.1
v2.3.0
- Potentially breaking change: The somajo-tokenizer script is automatically created upon installation and bin/somajo-tokenizer is removed. For most users, this does not make a difference. If you used to run your own modified version of SoMaJo directly via bin/somajo-tokenizer, consider installing the project in editable mode (see Development section in README.md).
- Switch from setup.py to pyproject.toml and restructure the project (source in src, tests in tests).
- When creating a Token object, only known token classes can be passed.
- Fix issue #25 (dates at the end of sentences)