What is the best way to use string literals containing Unicode and have it produce “safe” LaTeX code for it? #98

leftaroundabout · 2017-06-02T10:01:36Z

The IsString instance is pretty robust as far as ASCII is concerned, escaping characters that would be interpreted as control characters by LaTeX and instead producing the code that's needed for making the rendered output look like the original strings.

However, when a string contains Unicode, things aren't so clear-cut. While XeLaTeX, unlike pdfLaTeX, has proper support for UTF-8, it is in my experience still not a given that Unicode input will be rendered properly. To be sure, I used to always escape e.g. German umlauts to the safe LaTeX form manually (like it's na\"ive to dismiss Mot\"orhead) when I still wrote LaTeX manually.

Now, that doesn't work in HaTeX, and I have a couple of times found myself surprised by missing or inadequately substituted Unicode characters.

That seems like a problem we shouldn't be having. It would be easy enough to add suitable rules to protectChar, like

diff --git a/Text/LaTeX/Base/Syntax.hs b/Text/LaTeX/Base/Syntax.hs
index 7801593..61ef225 100644
--- a/Text/LaTeX/Base/Syntax.hs
+++ b/Text/LaTeX/Base/Syntax.hs
@@ -134,6 +134,17 @@ protectChar '}'  = "\\}"
 protectChar '~'  = "\\~{}"
 protectChar '\\' = "\\textbackslash{}"
 protectChar '_'  = "\\_{}"
+protectChar 'ä'  = "\\\"a"
+protectChar 'ë'  = "\\\"e"
+protectChar 'ï'  = "\\\"i"
+protectChar 'ö'  = "\\\"o"
+protectChar 'ü'  = "\\\"u"
+protectChar 'Ä'  = "\\\"A"
+protectChar 'Ë'  = "\\\"E"
+protectChar 'Ï'  = "\\\"I"
+protectChar 'Ö'  = "\\\"O"
+protectChar 'Ü'  = "\\\"U"
+protectChar 'ß'  = "{\\ss}"
 protectChar x = [x]
 
 -- Syntax analysis

One might also wish to add accents etc.

But that might well be opening a can of worms. The question would immediately be, where do we end? I personally would be inclined to also add substitutions like ℝ → $\mathbb{R}$, but that's clearly rather unsafe.

Is there a sensible option to offer multiple different IsString instances, in different modules that can be imported depending on what you need in your document?
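To make the question concrete, here is a minimal sketch of one way it could look. It does not put several IsString instances on the same type; instead it uses one newtype per escaping policy, each with its own instance. The names PlainTeX, UmlautTeX, escapeAscii and escapeUmlaut are hypothetical and not part of HaTeX, and only a couple of rules are shown.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.String (IsString (..))

-- Hypothetical wrappers, one per escaping policy (not HaTeX types).
newtype PlainTeX  = PlainTeX String  deriving (Eq, Show)
newtype UmlautTeX = UmlautTeX String deriving (Eq, Show)

-- A small subset of ASCII rules, taken from the diff context above.
escapeAscii :: Char -> String
escapeAscii '_' = "\\_{}"
escapeAscii '~' = "\\~{}"
escapeAscii c   = [c]

-- Same rules, plus two illustrative umlaut substitutions.
escapeUmlaut :: Char -> String
escapeUmlaut 'ö' = "\\\"o"
escapeUmlaut 'ï' = "\\\"i"
escapeUmlaut c   = escapeAscii c

-- String literals of type PlainTeX only get the ASCII rules.
instance IsString PlainTeX where
  fromString = PlainTeX . concatMap escapeAscii

-- String literals of type UmlautTeX also get the umlaut rules.
instance IsString UmlautTeX where
  fromString = UmlautTeX . concatMap escapeUmlaut
```

With this shape the policy is chosen by the type of the literal rather than by which module is imported, which avoids orphan or overlapping instances, at the cost of an extra wrapper.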


Daniel-Diaz · 2017-06-27T10:28:49Z

Have you tried this?

http://hackage.haskell.org/package/HaTeX-3.17.2.0/docs/Text-LaTeX-Packages-Inputenc.html

It works with accents, umlauts and also with the scharfes S. In case this feature was not clear enough, I added an example to the Examples directory. And yes, this doesn't work for all Unicode characters, but it does work for many of them.

Daniel-Diaz · 2018-01-26T09:05:59Z

Did my comment help at all?

To extend a little more:

I am not opposed to adding more cases to protectChar in order to cover more than ASCII, at least common cases like the ones you describe. Using the inputenc package, however, I have never run into this problem. But maybe it depends on the LaTeX compiler?

About what would be included or not... Accents and umlauts seem reasonable to me, but I would exclude mathematical characters like ℝ.

And for the final question... We could define a type, newtype CharEscape = CharEscape (Char -> String), together with some defaults. Then a couple of functions, escapeString :: CharEscape -> String -> String and escapeText :: CharEscape -> Text -> Text, and then redefine the current escaping methods in terms of these. What do you think?
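A rough sketch of how that proposal might look, using only base. CharEscape and escapeString follow the signatures given above; asciiEscape and withUmlauts are hypothetical names for the "defaults", and the rule tables are deliberately incomplete.

```haskell
-- Wrapper around a per-character escaping rule, as proposed above.
newtype CharEscape = CharEscape (Char -> String)

-- Hypothetical default: a subset of the existing ASCII rules.
asciiEscape :: CharEscape
asciiEscape = CharEscape f
  where
    f '}'  = "\\}"
    f '~'  = "\\~{}"
    f '\\' = "\\textbackslash{}"
    f '_'  = "\\_{}"
    f c    = [c]

-- Hypothetical combinator: try the umlaut table first, then fall
-- back to whatever escape it is layered on.
withUmlauts :: CharEscape -> CharEscape
withUmlauts (CharEscape base) = CharEscape f
  where
    f c = maybe (base c) id (lookup c table)
    table =
      [ ('ä', "\\\"a"), ('ö', "\\\"o"), ('ü', "\\\"u")
      , ('Ä', "\\\"A"), ('Ö', "\\\"O"), ('Ü', "\\\"U")
      , ('ß', "{\\ss}")
      ]

-- Apply an escape to every character of a String.
escapeString :: CharEscape -> String -> String
escapeString (CharEscape f) = concatMap f
```

For example, escapeString (withUmlauts asciiEscape) applied to the text Motörhead_ would yield Mot\"orhead\_{} as LaTeX source. An escapeText counterpart could presumably be written the same way with Data.Text.concatMap, packing each result.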

leftaroundabout · 2018-01-26T11:36:58Z

Some years ago, I did have trouble with Umlauts etc. in some combinations of packages, differences between pdfLaTeX and XeLaTeX, etc., that couldn't be fixed with any input encoding, so I kind of just got used to alw\"ays esc\"aping \"Umlauts in manually-written LaTeX. Perhaps I'm just paranoid carrying this over to HaTeX; it's always possible to add a safeguard as a LaTeX -> LaTeX if such problems turn up in a given application, and keeping protectChar as simple as possible certainly has its benefits.

Then again, always escaping common letters with diacritics has the advantage that the resulting LaTeX can just be pasted in any existing document, without having to worry about encoding packages.

I don't know what's best.

Daniel-Diaz added the enhancement label Jan 26, 2018
