Diacritics misplaced with the default renderer #267
Comments
With HarfBuzz it's correctly placed if no break occurs, but the hyphenation point is still there, so a line break is allowed between the e and the diacritic, which is pretty much guaranteed to be wrong. Therefore I additionally think that this is a bug in the hyphenation patterns. It might make sense to do a post-hyphenation pass to validate that no automatically inserted hyphenation points fall in the middle of grapheme clusters, to avoid such issues in general, something like https://gist.github.com/zauguin/e119669fa702b112c704a9337b30d446/revisions. Additionally, it might make sense to do Unicode normalization before hyphenation, so that patterns do not fail on non-normalized text.
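To make the two ideas above concrete, here is a minimal, engine-independent sketch in Python. It is not the code from the linked gist and not how LuaTeX represents hyphenation points: the soft hyphen (U+00AD) is only a stand-in for a discretionary, and the function names are hypothetical. It also checks only for combining marks, which is narrower than full grapheme-cluster segmentation.

```python
import unicodedata

# Assumption: U+00AD stands in for an automatically inserted hyphenation point.
SOFT_HYPHEN = "\u00ad"


def is_combining_mark(ch: str) -> bool:
    """True for characters in the Unicode general categories Mn, Mc and Me."""
    return unicodedata.category(ch).startswith("M")


def drop_breaks_before_marks(text: str) -> str:
    """Remove any hyphenation point that is directly followed by a combining mark."""
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == SOFT_HYPHEN and nxt and is_combining_mark(nxt):
            continue  # a break here would separate a base letter from its diacritic
        out.append(ch)
    return "".join(out)


def normalize_before_hyphenation(text: str) -> str:
    """Compose base letter + mark into a precomposed character where one exists."""
    return unicodedata.normalize("NFC", text)
```

With this sketch, `drop_breaks_before_marks("e\u00ad\u0308x")` drops the break between the e and the combining diaeresis, while an ordinary break such as the one in `"hy\u00adphen"` is left alone. Normalizing to NFC first turns `"e\u0308"` into the single character `"\u00eb"`, so the patterns never see the decomposed sequence, but that only helps for combinations that have a precomposed form.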
I also think that this is a bug in the patterns. The topic came up a few years ago here https://tex.stackexchange.com/a/340164/2388, and recently on the luatex user list for Greek. If luaotfload could do some pre/post-processing in the right place, that would imho be quite good.
Not sure if this should belong in luaotfload. I don't really mind if we add it there, but hyphenation is not really in scope, and touching the hyphenate callback might also be problematic for non-LaTeX users of luaotfload.
Then I’ll fix it (at least for the moment) on the |
After thinking a little bit about this, with some attempts to deal with the issue, I’m not sure this is a task for the hyphenation patterns, because it’s not language dependent: no combination of ‹letter› and ‹combining char› can be hyphenated regardless of the language, and that’s true also for non-LaTeX formats. Repeating the full list of combining chars (there are ~100 of them) in every set of patterns ‘just in case’ doesn’t make much sense to me. In my tests, there is a penalty of roughly 0.2 to 0.3 seconds per language on my system if I attempt to fix it in the So, I think again it should be fixed by
@jbezos Do you see any reason why this couldn't become part of a separate package which would then be loaded by |
@zauguin With \babelposthyphenation{english}{ |[{0300}-{036F}] }{ remove, {} } (Here |
This just came up again https://tex.stackexchange.com/q/709020/2388 |
This could also be considered a bug in the hyphenation patterns, but with HarfBuzz it works as expected. Here is a MWE:
The umlaut is shifted to the right, but with HarfBuzz it’s correctly placed. It also works if we prevent a hyphen just before the diacritic.