ucd2haskell: Refactor #120

Merged

wismill
Collaborator

@wismill wismill commented Jun 7, 2024

This is a huge refactor of `ucd2haskell`, motivated by similar work in
`ghc-internal`. It should keep this tool from bit-rotting further.

- Remove the dependency on `streamly`. This package is overkill here and
  has an unstable API. The version we use is not supported by recent
  GHCs, and a non-trivial migration seems to be required at each new
  version. Furthermore, we currently process `String`s, so there is not
  much benefit.
- Mimic the `Fold` type from `streamly` for basic features. Although not
  mandatory, this avoids changing all the logic.
- Use the `ByteString` parsers from `unicode-data-parser` [1]. These
  parsers are shared with the corresponding `ucd2haskell` tool in `base`
  (now `ghc-internal`). We now have a clear separation between parsers
  and generators. Since the Unicode files are very stable, this package
  should be very stable as well.
- Move generators to independent modules. This speeds up compilation
  and adds more structure to the code base.
- Remove many anti-patterns and share more code.
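
To illustrate the second point, here is a minimal sketch of what a `streamly`-like `Fold` can look like once the dependency is gone. It is not the exact type used in this PR: the pure (non-monadic) shape and the names `runFold`, `teeWith`, `count`, `collectWhere` and `Pair` are assumptions made for the example.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

-- | Hypothetical simplified fold: a step function, an initial state and
-- a final extraction function. The state type @s@ is hidden.
data Fold a b = forall s. Fold (s -> a -> s) s (s -> b)

instance Functor (Fold a) where
  fmap f (Fold step initial extract) = Fold step initial (f . extract)

-- | Strict pair, so that both accumulators are forced at each step.
data Pair a b = Pair !a !b

-- | Run a fold over a list in a single strict left-to-right pass.
runFold :: Fold a b -> [a] -> b
runFold (Fold step initial extract) = extract . foldl' step initial

-- | Combine two folds so that both consume the same input in one pass.
teeWith :: (b -> c -> d) -> Fold a b -> Fold a c -> Fold a d
teeWith f (Fold stepL initL extractL) (Fold stepR initR extractR) =
  Fold step (Pair initL initR) extract
  where
    step (Pair l r) x = Pair (stepL l x) (stepR r x)
    extract (Pair l r) = f (extractL l) (extractR r)

-- | Count the elements of the input.
count :: Fold a Int
count = Fold (\n _ -> n + 1) 0 id

-- | Collect the elements that satisfy a predicate, in input order.
collectWhere :: (a -> Bool) -> Fold a [a]
collectWhere p = Fold (\acc x -> if p x then x : acc else acc) [] reverse

main :: IO ()
main =
  -- Prints (10,[2,4,6,8,10])
  print (runFold (teeWith (,) count (collectWhere even)) [1 .. 10 :: Int])
```

Keeping the three-part shape (step, initial state, extraction) means generators written as compositions of folds can keep their structure, which is presumably why mimicking the type avoids changing all the logic.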

The files *generated* by this tool remain identical, although I left some
comments to further improve them.

[1]: https://hackage.haskell.org/package/unicode-data-parser
@wismill wismill force-pushed the ucd2haskell/use-unicode-data-parser branch from a64cc10 to e476d06 on June 7, 2024 13:19
wismill added 2 commits June 7, 2024 15:27
Also change “exe” to “ucd2haskell” in labels, to make clear what
is really built.
@wismill wismill force-pushed the ucd2haskell/use-unicode-data-parser branch 3 times, most recently from b648ee7 to 1a2b846 on June 7, 2024 15:54
@wismill wismill force-pushed the ucd2haskell/use-unicode-data-parser branch from 1a2b846 to 4e5ba1d on June 7, 2024 16:21
@wismill wismill merged commit a2b2d34 into composewell:master Jun 7, 2024
17 checks passed