Diacritics misplaced with the default renderer #267

Open
jbezos opened this issue Oct 28, 2023 · 8 comments

@jbezos

jbezos commented Oct 28, 2023

This could also be considered a bug in the hyphenation patterns, but with Harfbuzz it works as expected. Here is a MWE:

\documentclass{article}

\patterns{ е1 }
% \patterns{ 2^^^^0308  }

\usepackage{fontspec}

\setmainfont{Noto Sans}[
  % Renderer=Harfbuzz,
  Script=Cyrillic, Language=Bulgarian]
  
\begin{document}

азе^^^^0308аз

\end{document}

The umlaut is shifted to the right, but with Harfbuzz it’s correctly placed. It also works if we prevent a hyphen just before the diacritic.

@zauguin
Member

zauguin commented Oct 29, 2023

With HarfBuzz it's correctly placed if no break occurs, but it still has the hyphenation point there and therefore allows linebreaking between the e and the diacritic which is pretty much guaranteed to be wrong.

Therefore I at least additionally think that this is a bug in the hyphenation patterns. It might make sense to do a pass post-hyphenation to validate that no automatically inserted hyphenation points fall in the middle of grapheme clusters to avoid such issues in general, something like https://gist.github.com/zauguin/e119669fa702b112c704a9337b30d446/revisions. Additionally it might make sense to do Unicode normalization before hyphenation in order to avoid patterns not working with non-normalized text.
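
The post-hyphenation pass described above can be sketched independently of TeX. The following Python snippet is only an illustration, not the code from the gist and not luaotfload or babel code: the function name and the list-of-indices representation of hyphenation points are assumptions made for the example. It also approximates "inside a grapheme cluster" by "directly before a character with a non-zero canonical combining class", which is narrower than the full grapheme cluster rules.

```python
import unicodedata


def drop_breaks_inside_clusters(text, breaks):
    """Keep only break positions that do not separate a base letter from a
    following combining mark.

    `breaks` holds indices i meaning "a hyphenation point sits before text[i]".
    Simplification (assumption): a character counts as a combining mark when
    its canonical combining class is non-zero.
    """
    return [
        i for i in breaks
        if 0 < i < len(text) and unicodedata.combining(text[i]) == 0
    ]


# The word from the MWE: е (U+0435) followed by U+0308.
word = "азе\u0308аз"

# Hypothetical hyphenator output: one break right after е (index 3, which is
# what the pattern "е1" allows) and one after the diacritic (index 4).
candidate_breaks = [3, 4]

safe_breaks = drop_breaks_inside_clusters(word, candidate_breaks)
print(safe_breaks)  # -> [4]
```

The break at index 3 is dropped because `word[3]` is U+0308; the break at index 4 survives because the next character is an ordinary letter.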

@u-fischer
Member

I too think that this is a bug in the patterns. The topic came up a few years ago here https://tex.stackexchange.com/a/340164/2388, and recently on the luatex user list for Greek.

If luaotfload could do some pre/post processing in the right place, that would imho be quite good.

@zauguin
Member

zauguin commented Oct 29, 2023

Not sure if this should belong in luaotfload. I don't really mind if we add it there, but hyphenation is not really in scope, and touching the hyphenate callback might also be problematic for non-LaTeX users of luaotfload.

@jbezos
Author

jbezos commented Oct 29, 2023

Then I’ll fix it (at least for the moment) on the babel side, although it has to be fixed eventually in the patterns. I think adding patterns like 8^^^^0308 systematically for all languages will be safe (and let’s hope 9 is not used 🤞). @reutenauer

@jbezos
Author

jbezos commented Nov 9, 2023

After thinking a little bit about this, with some attempts to deal with the issue, I’m not sure this is a task for the hyphenation patterns, because it’s not language dependent — no combination of ‹letter› and ‹combining char› can be hyphenated regardless of the language, and that’s true also for non-LaTeX formats. Repeating the full list of combining chars (there are ~100 of them) in every set of patterns ‘just in case’ doesn’t make much sense to me.

In my tests, there is a penalty of roughly 0.2 to 0.3 seconds per language on my system if I attempt to fix it on the babel side (patterns cannot be added directly, to avoid duplicates, so we must first check that there isn’t already a similar one).
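
To make the cost of the pattern-based approach concrete, here is a rough Python sketch of what "add a pattern per combining char, but only if no similar one exists" could look like. It is not how babel does it: the function names are hypothetical, the range is limited to the Combining Diacritical Marks block (U+0300 to U+036F), and "similar" is reduced to "same text after the leading weight digit".

```python
def combining_patterns(first=0x0300, last=0x036F, weight=8):
    """Build patterns such as '8^^^^0308', one per code point in the range."""
    return [f"{weight}^^^^{cp:04x}" for cp in range(first, last + 1)]


def pattern_key(pattern):
    """Drop the leading weight digit(s): '2^^^^0308' and '8^^^^0308' share a key."""
    return pattern.lstrip("0123456789")


def missing_patterns(existing, candidates):
    """Return the candidates whose key is not already covered by `existing`."""
    seen = {pattern_key(p) for p in existing}
    return [p for p in candidates if pattern_key(p) not in seen]


candidates = combining_patterns()
print(len(candidates))   # -> 112
print(candidates[8])     # -> 8^^^^0308

# Hypothetical language that already ships a pattern for U+0308.
existing = ["е1", "2^^^^0308"]
to_add = missing_patterns(existing, candidates)
print(len(to_add))       # -> 111
```

Every language needs its own pass over its existing patterns, which is the per-language overhead mentioned above.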

So, again, I think it should be fixed by luaotfload and for any renderer. As Ulrike said:

If luaotfload could do some pre/post processing in the right place, that would imho be quite good.

@zauguin
Member

zauguin commented Nov 11, 2023

@jbezos Do you see any reason why this couldn't become part of a separate package which would then be loaded by babel (and maybe polyglossia)?
Otherwise I think that would be my plan here: Create a package based on the gist earlier, then we are completely node independent and have it applied at a more appropriate time than if luaotfload tried to do this as part of shaping.
Potentially adding an optional normalization step, I'm guessing non-normalized text isn't exactly helpful for hyphenation either.
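
As an illustration of why normalization matters for the word in the MWE, the following Python sketch (using only the standard `unicodedata` module, not any TeX code) shows what NFC does to it. The sequence е + U+0308 composes to the single letter ё (U+0451), so a pattern such as `е1` no longer has an е to match. The second example is an assumption-labelled counter-case: a combination with no precomposed form stays decomposed, so normalization alone would not remove the need for a post-hyphenation check.

```python
import unicodedata

# The word from the MWE, with е (U+0435) followed by U+0308.
decomposed = "азе\u0308аз"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))          # -> 6 5
print([f"U+{ord(c):04X}" for c in composed])
# -> ['U+0430', 'U+0437', 'U+0451', 'U+0430', 'U+0437']

# Counter-case: q + U+0308 has no precomposed character, so NFC leaves the
# combining mark in place.
unchanged = unicodedata.normalize("NFC", "q\u0308")
print(len(unchanged))                          # -> 2
```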

@jbezos
Author

jbezos commented Nov 21, 2023

@zauguin With lualatex + babel there is no real need for a package, because a simple transform can do the trick:

\babelposthyphenation{english}{ |[{0300}-{036F}] }{ remove, {} }

(Here | is a discretionary.) But it’s another loop for what I think should be handled by the font renderer.
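
For readers unfamiliar with babel transforms, the effect of the rule above can be mimicked on plain strings. This Python sketch is only an analogy, not babel's implementation: it assumes the discretionary is represented by a literal `|` in the string, and the function name is made up for the example. It removes every `|` that is immediately followed by a character in U+0300 to U+036F and keeps the mark itself.

```python
import re

# "|" stands in for a discretionary, as in the transform above.
# The lookahead keeps the combining mark and removes only the "|".
DISC_BEFORE_MARK = re.compile(r"\|(?=[\u0300-\u036F])")


def remove_disc_before_mark(text):
    """Delete each '|' that directly precedes a combining diacritical mark."""
    return DISC_BEFORE_MARK.sub("", text)


hyphenated = "азе|\u0308аз"          # discretionary between е and U+0308
cleaned = remove_disc_before_mark(hyphenated)
print(cleaned == "азе\u0308аз")      # -> True

# A discretionary between two ordinary letters is left alone.
print(remove_disc_before_mark("аз|бука"))  # -> аз|бука
```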

@u-fischer
Member

This just came up again https://tex.stackexchange.com/q/709020/2388
