Skip to content

Commit

Permalink
Merge pull request #281 from WorksApplications/pre/v0.6.9
Browse files Browse the repository at this point in the history
Pre v0.6.9
  • Loading branch information
mh-northlander authored Nov 20, 2024
2 parents ea794e3 + aa72b20 commit 00c545b
Show file tree
Hide file tree
Showing 16 changed files with 457 additions and 311 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build-python-wheels.yml
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,7 @@ jobs:
strategy:
matrix:
os: [windows-latest, macOS-latest]
python-version: [ "3.8", "3.9", "3.10", "3.11", "3.12" ]
python-version: [ "3.9", "3.10", "3.11", "3.12", "3.13" ]

steps:
- uses: actions/checkout@v4
Expand Down
40 changes: 25 additions & 15 deletions .github/workflows/python-upload-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,10 +19,10 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install --upgrade setuptools setuptools-rust build
python -m pip install --upgrade setuptools setuptools-rust build packaging
- name: Make .devXX version
run: python ./python/latest_dev_version.py
- name: Modify version for TestPyPI upload
run: python ./python/modify_version_for_testpypi.py

- name: Build sdist
working-directory: ./python
Expand Down Expand Up @@ -52,8 +52,18 @@ jobs:
target/
key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}

- name: Make .devXX version
run: python ./python/latest_dev_version.py
- name: Setup python
uses: actions/setup-python@v5
with:
python-version: '3.11'

- name: Install dependencies
run: |
python -m pip install -U pip
python -m pip install -U packaging
- name: Modify version for TestPyPI upload
run: python ./python/modify_version_for_testpypi.py

- uses: eiennohito/gha-manylinux-build@master
with:
Expand All @@ -70,7 +80,7 @@ jobs:
strategy:
matrix:
os: [windows-latest, macOS-latest]
python-version: [ "3.8", "3.9", "3.10", "3.11", "3.12" ]
python-version: [ "3.9", "3.10", "3.11", "3.12", "3.13" ]

steps:
- uses: actions/checkout@v4
Expand All @@ -89,17 +99,17 @@ jobs:
target/
key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}

- name: Make .devXX version
run: python ./python/latest_dev_version.py

- name: Add aarch64 target for Rust
run: rustup target add aarch64-apple-darwin
if: startsWith(matrix.os, 'macOS')

- name: Install dependencies
run: |
python -m pip install -U pip
python -m pip install -U setuptools setuptools_rust build
python -m pip install -U setuptools setuptools_rust build packaging
- name: Modify version for TestPyPI upload
run: python ./python/modify_version_for_testpypi.py

- name: Add aarch64/x86 target for Rust
run: rustup target add aarch64-apple-darwin x86_64-apple-darwin
if: startsWith(matrix.os, 'macOS')

- name: Build wheel
working-directory: ./python
Expand Down Expand Up @@ -139,7 +149,7 @@ jobs:
strategy:
matrix:
os: [ ubuntu-latest, windows-latest, macOS-latest ]
python-version: [ "3.8", "3.9", "3.10", "3.11" ]
python-version: [ "3.9", "3.10", "3.11", "3.12", "3.13" ]
fail-fast: false
runs-on: ${{ matrix.os }}

Expand Down
231 changes: 131 additions & 100 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,148 +1,179 @@
# [0.6.8](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.8) (2023-12-14)
# Changelog

## Highlights
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

* Produce builds for Python 3.12 (#236)
* Add a simple [configuration API](https://worksapplications.github.io/sudachi.rs/python/api/sudachipy.html#config-config)
* Add surface projections (#230)
Also check [python changelog](python/CHANGELOG.md).

## Surface Projections
## [Unreleased]

* For chiTra compatibility SudachiPy can now directly produce different tokens in the surface field.
* Original surface is accessible via [`Morheme.raw_surface()`](https://worksapplications.github.io/sudachi.rs/python/api/sudachipy.html#sudachipy.Morpheme.raw_surface) method
* It is possible to customize projection dictionary-wise, via Config object, passing it on a dictionary creation, or for a single pre-tokenizer.
* [Config API](https://worksapplications.github.io/sudachi.rs/python/api/sudachipy.html#sudachipy.config.Config.projection)
* [Pretokenizer API](https://worksapplications.github.io/sudachi.rs/python/api/sudachipy.html#sudachipy.Dictionary.pre_tokenizer)
## [0.6.9](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.9) (2024-11-20)

# [0.6.7](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.7) (2023-02-16)
### Added

## Highlights
- freebsd support (#222 by @KonstantinDjairo, #251)
- Add rust minimum support version (#255)
- Add option for embedded config and fallback resources (#262 by @Kuuuube)

* Provide binary wheels for Python 3.11
* Add `Dictionary.lookup()` method which allows you to enumerate morphemes from the dictionary without performing analysis.
### Changed

# [0.6.6](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.6) (2022-07-25)
- `fetch_dictionary.sh` targets latest dictionary by default (#240)
- update dependencies (#241, #246)
- Migrate from structopt to clap (#248 by @tkhshtsh0917)

## Highlights
* Add [boundary matching mode](https://github.com/WorksApplications/Sudachi/blob/develop/docs/oov_handlers.md) to regex oov handler
* macOS binary builds are now unversal2 (arm+x64)
## [0.6.8](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.8) (2023-12-14)

## MacOS
* Binary builds are universal2
* Caveat: we don't run tests on arm because there are no public arm instances, so builds may be broken without any warning
### Highlights

# [0.6.5](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.5) (2022-06-21)
- Produce builds for Python 3.12 (#236)
- Add a simple [configuration API](https://worksapplications.github.io/sudachi.rs/python/api/sudachipy.html#config-config)
- Add surface projections (#230)

## Highlights
### Surface Projections

* Fixed invalid POS tags which appeared when using user-defined POS tags both in user dictionaries and OOV handlers.
- For chiTra compatibility SudachiPy can now directly produce different tokens in the surface field.
- Original surface is accessible via [`Morheme.raw_surface()`](https://worksapplications.github.io/sudachi.rs/python/api/sudachipy.html#sudachipy.Morpheme.raw_surface) method
- It is possible to customize projection dictionary-wise, via Config object, passing it on a dictionary creation, or for a single pre-tokenizer.
- [Config API](https://worksapplications.github.io/sudachi.rs/python/api/sudachipy.html#sudachipy.config.Config.projection)
- [Pretokenizer API](https://worksapplications.github.io/sudachi.rs/python/api/sudachipy.html#sudachipy.Dictionary.pre_tokenizer)

## [0.6.7](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.7) (2023-02-16)

### Highlights

- Provide binary wheels for Python 3.11
- Add `Dictionary.lookup()` method which allows you to enumerate morphemes from the dictionary without performing analysis.

## [0.6.6](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.6) (2022-07-25)

### Highlights

- Add [boundary matching mode](https://github.com/WorksApplications/Sudachi/blob/develop/docs/oov_handlers.md) to regex oov handler
- macOS binary builds are now unversal2 (arm+x64)

### MacOS

- Binary builds are universal2
- Caveat: we don't run tests on arm because there are no public arm instances, so builds may be broken without any warning

## [0.6.5](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.5) (2022-06-21)

### Highlights

- Fixed invalid POS tags which appeared when using user-defined POS tags both in user dictionaries and OOV handlers.
You are not affected by this bug if you did not use user-defined POS in OOV handlers.

# [0.6.4](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.3) (2022-06-16)
## [0.6.4](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.4) (2022-06-16)

## Highlights
### Highlights

* Remove Python 3.6 support which reached end-of-life status on [2021-12-23](https://endoflife.date/python)
* OOV handler plugins support user-defined POS, [similar to Java version](https://github.com/WorksApplications/Sudachi/releases/tag/v0.6.0)
* Added Regex OOV handler
- Remove Python 3.6 support which reached end-of-life status on [2021-12-23](https://endoflife.date/python)
- OOV handler plugins support user-defined POS, [similar to Java version](https://github.com/WorksApplications/Sudachi/releases/tag/v0.6.0)
- Added Regex OOV handler

## Regex OOV Handler
### Regex OOV Handler

* For details, see [Java version changelog](https://github.com/WorksApplications/Sudachi/releases/tag/v0.6.0)
* In Rust/Python Regexes do not support backtracking and backreferences
* `maxLength` setting defines maximum length in unicode codepoints, not in utf-8 bytes as in Java (will be changed to codepoints later)
- For details, see [Java version changelog](https://github.com/WorksApplications/Sudachi/releases/tag/v0.6.0)
- In Rust/Python Regexes do not support backtracking and backreferences
- `maxLength` setting defines maximum length in unicode codepoints, not in utf-8 bytes as in Java (will be changed to codepoints later)

# [0.6.3](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.3) (2022-02-10)
## [0.6.3](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.3) (2022-02-10)

## Highlights
### Highlights

* Fixed path resolution algorithm for resources. They are now resolved in the following order (first existing file wins):
- Fixed path resolution algorithm for resources. They are now resolved in the following order (first existing file wins):
1. Absolute paths stay as they are
2. Relative to "path" value of the config file
3. Relative to "resource_dir" parameter of the config object during creation
* For SudachiPy it is the parameter of `Dictionary` constructor
- For SudachiPy it is the parameter of `Dictionary` constructor
4. Relative to the location of the configuration file
5. Relative to the current directory

## Python
### Python

- `Dictionary` now has `__repr__()` function which displays absolute paths to dictionaries in use.
- `Dictionary` now has `pos_of()` function which returns a POS tuple for a given POS id.
- `PosMatcher` supports set operations
- union (`m1 | m2`)
- intersection (`m1 & m2`)
- difference (`m1 - m2`)
- negation (`~m1`)

## [0.6.2](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.2) (2021-12-09)

### Fixes

- Fix analysis differences with 0.5.4

## [0.6.1](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.1) (2021-12-08)

### Highlights

- Added Fuzzing (see `sudachi-fuzz` subdirectory), Sudachi.rs seems to be pretty robust towards arbitrary inputs (no crashes and panics)
- Issues like https://github.com/WorksApplications/sudachi.rs/issues/182 should never occur more
- ~5% analysis speed improvement over 0.6.0
- Added support for Unicode combining symbols, now Sudachi.rs/py should be much better with emoji (🎅🏾) and more complex Unicode (İstanbul)

* `Dictionary` now has `__repr__()` function which displays absolute paths to dictionaries in use.
* `Dictionary` now has `pos_of()` function which returns a POS tuple for a given POS id.
* `PosMatcher` supports set operations
* union (`m1 | m2`)
* intersection (`m1 & m2`)
* difference (`m1 - m2`)
* negation (`~m1`)
### Rust

# [0.6.2](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.2) (2021-12-09)
- Added partial dictionary read functionality, it is now possible to skip reading certain fields if they are not needed
- Improved startup times, especially for debug builds

## Fixes
### Python

* Fix analysis differences with 0.5.4
- See [Python changelog](./python/CHANGELOG.md)

# [0.6.1](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.1) (2021-12-08)
## [0.6.0](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.0) (2021-11-11)

## Highlights
* Added Fuzzing (see `sudachi-fuzz` subdirectory), Sudachi.rs seems to be pretty robust towards arbitrary inputs (no crashes and panics)
* Issues like https://github.com/WorksApplications/sudachi.rs/issues/182 should never occur more
* ~5% analysis speed improvement over 0.6.0
* Added support for Unicode combining symbols, now Sudachi.rs/py should be much better with emoji (🎅🏾) and more complex Unicode (İstanbul)
### Highlights

## Rust
* Added partial dictionary read functionality, it is now possible to skip reading certain fields if they are not needed
* Improved startup times, especially for debug builds
- Full feature parity with Java version
- ~15% analysis speed improvement over 0.6.0-rc1

## Python
* See [Python changelog](./python/CHANGELOG.md)
### Rust

# [0.6.0](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.0) (2021-11-11)
## Highlights
* Full feature parity with Java version
* ~15% analysis speed improvement over 0.6.0-rc1
- Added dictionary build functionality
- https://github.com/WorksApplications/sudachi.rs/pull/143
- Added an option to perform analysis without sentence splitting
- Use it with `--split-sentences=no`

## Rust
* Added dictionary build functionality
* https://github.com/WorksApplications/sudachi.rs/pull/143
* Added an option to perform analysis without sentence splitting
* Use it with `--split-sentences=no`
### Python

## Python
* Added bindings for dictionary build (undocumented and not supported as API).
* See https://github.com/WorksApplications/sudachi.rs/issues/157
* `sudachipy build` and `sudachipy ubuild` should work once more
* Report on build times and dictionary part sizes can differ from the original SudachiPy
- Added bindings for dictionary build (undocumented and not supported as API).
- See https://github.com/WorksApplications/sudachi.rs/issues/157
- `sudachipy build` and `sudachipy ubuild` should work once more
- Report on build times and dictionary part sizes can differ from the original SudachiPy

## [0.6.0-rc1](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.0-rc1) (2021-10-26)

# [0.6.0-rc1](https://github.com/WorksApplications/sudachi.rs/releases/tag/v0.6.0-rc1) (2021-10-26)
## Highlights
### Highlights

* First release of Sudachi.rs
* SudachiPy compatible Python bindings
* ~30x speed improvement over original SudachiPy
* Dictionary build mode will be done before 0.6.0 final (See #13)
- First release of Sudachi.rs
- SudachiPy compatible Python bindings
- ~30x speed improvement over original SudachiPy
- Dictionary build mode will be done before 0.6.0 final (See #13)

## Rust
### Rust

* Analysis: feature parity with Python and Java version
* Dictionary build is not supported in rc1
* ~2x faster than Java version (with sentence splitting)
* No public API at the moment (contact us if you want to use Rust version directly, internals will significantly change and names are not finalized)
- Analysis: feature parity with Python and Java version
- Dictionary build is not supported in rc1
- ~2x faster than Java version (with sentence splitting)
- No public API at the moment (contact us if you want to use Rust version directly, internals will significantly change and names are not finalized)

## Python
### Python

* Mostly compatible with SudachiPy 0.5.4
* We provide binary wheels for popular platforms
* ~30x faster than 0.5.4
* IgnoreYomigana input text plugin is now supported (and enabled by default)
* We provide [binary wheels for convenience (and additional speed on Linux)](https://worksapplications.github.io/sudachi.rs/python/wheels.html)
- Mostly compatible with SudachiPy 0.5.4
- We provide binary wheels for popular platforms
- ~30x faster than 0.5.4
- IgnoreYomigana input text plugin is now supported (and enabled by default)
- We provide [binary wheels for convenience (and additional speed on Linux)](https://worksapplications.github.io/sudachi.rs/python/wheels.html)

## Known Issues
### Known Issues

* List of deprecated SudachiPy API:
* `MorphemeList.empty(dict: Dictionary)`
* This also needs a dictionary as an argument.
* `Morpheme.split(mode: SplitMode)`
* `Morpheme.get_word_info()`
* Most of instance attributes are not exported: e.g. `Dictionary.grammar`, `Dictionary.lexicon`.
* See [API reference page](https://worksapplications.github.io/sudachi.rs/python/) for supported APIs.
* Dictionary Build is not supported: `sudachipy build` and `sudachipy ubuild` will not work, please use 0.5.3 in another virtual environment for the time being until the feature is implemented: #13
- List of deprecated SudachiPy API:
- `MorphemeList.empty(dict: Dictionary)`
- This also needs a dictionary as an argument.
- `Morpheme.split(mode: SplitMode)`
- `Morpheme.get_word_info()`
- Most of instance attributes are not exported: e.g. `Dictionary.grammar`, `Dictionary.lexicon`.
- See [API reference page](https://worksapplications.github.io/sudachi.rs/python/) for supported APIs.
- Dictionary Build is not supported: `sudachipy build` and `sudachipy ubuild` will not work, please use 0.5.3 in another virtual environment for the time being until the feature is implemented: #13
Loading

0 comments on commit 00c545b

Please sign in to comment.