Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encodings #2435

Merged
merged 16 commits into from
Nov 20, 2024
Merged

Encodings #2435

merged 16 commits into from
Nov 20, 2024

Conversation

brucemiller
Copy link
Owner

This PR deals with a variety of encoding, accent and font mapping issues. This brings the system closer to doing things the latex.ltx way.

Copy link
Collaborator

@dginev dginev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

There are some interesting failing CI tests in older texlives, maybe the babel test cases also need an update checked in?

I will take a few minutes to play with the new \accent defintion, as it has piqued my curiosity...

@dginev
Copy link
Collaborator

dginev commented Nov 15, 2024

I found just a couple of inaccuracies with \accent, for the examples:

\accent'27{right side}

\accent'27%

where TeX would stop the accent read at {, while latexml will seemingly pick up the first token of the braced group (thus putting the accent on top of r).

Also, TeX's accent quietly stops at the end of a line, while latexml seemingly reverts the following \para and puts an accent on it (in the comment example).

Attaching the 7 variations I tried, 5 of which worked well.

\accent'27a

\accent'27 a

\accent'27%

\accent'27{right-side}

{\accent'27}right-side

\accent'27\bgroup b \egroup

\accent'27\expandafter b

\bye

Idea: Maybe the {} argument of \accent should instead be a carefully vetted ->readToken call in the subroutine body?

@brucemiller
Copy link
Owner Author

It's ironic that so much of this (& recent) PR's have consisted of chasing ~ and ^, and it still keeps popping up. It seems to start with the fact that the basic CM fonts treat these directly as accent glyphs which can be placed more-or-less directly over another w/o explicit repositioning. But then how do you get the "normal" ones (assuming that that concept is even defined)? There's a lot of confusion based on whatever font one sees (even outside TeX). So that \textasciitilde, for example, gives (sometimes?) the raised tilde! I'm not yet clear on how to pass all the TeXLives here.
And then there's the mysterious \accent which accepts a bunch of "assignments" and then reads a "character". I guess I've pushed as far as I can by ignoring that and assuming a regular TeX argument, so back to TTP, sigh.

@brucemiller
Copy link
Owner Author

Warning rebased on previous merged PR

@dginev
Copy link
Collaborator

dginev commented Nov 15, 2024

@brucemiller I am really not trying to be persnickety, but reading your comment sent my mind into a very interesting corner, leading to a new test:

\accent'27\char`^

\bye

which does something sensible with pdftex (ha! sensible enough), while latexml throws a curious error. Not entirely related to your PR, but since you're looking at \accent, maybe worth checking this example out as well.

@brucemiller
Copy link
Owner Author

Hopefully the last commit fixes the french issue, but I'm going to look more carefully at \accent.

@brucemiller
Copy link
Owner Author

OK, that should handle \accent properly. Ready for review.

Copy link
Collaborator

@dginev dginev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive new test, all looks great.

@brucemiller brucemiller merged commit 9ec6a41 into master Nov 20, 2024
26 checks passed
@brucemiller brucemiller deleted the encodings branch November 20, 2024 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants