epub: fix fatal errors while parsing EPUB files #1854

milmazz · 2024-01-26T03:40:59Z

After generating the EPUB file for the Elixir docs with this version and checking the result with `epubcheck`, I got the following summary:

```console
$ epubcheck doc/elixir/Elixir.epub --json elixir_docs.json
Check finished with errors
Messages: 0 fatals / 140 errors / 0 warnings / 0 infos
```

If you compare this result with what we had in #1851,

```
Messages: 9 fatals / 425 errors / 0 warnings / 0 infos
```

you can see that we no longer have any messages with `fatal` severity, and the number of errors has dropped considerably (from 425 to 140) =)

I manually checked the generated EPUB in Apple Books: the previously truncated sections are fixed, I no longer see the banner _Below is a rendering of the page up to the first error_, and the links to different anchors seem to work.

Fixes: #1851


milmazz · 2024-01-26T03:43:56Z

lib/ex_doc/formatter/epub.ex

```diff
+    |> String.replace(~r{id="&+/\d+[^"]*}, &String.replace(&1, "&", "&amp;"))
+    |> String.replace(~r{href="[^#"]*#&+/\d+[^"]*}, &String.replace(&1, "&", "&amp;"))
```


I frowned a little at these nested `String.replace` calls, so please let me know if you have any advice on how to improve this function.
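One possible way to collapse the two nested `String.replace` calls is a single `Regex.replace/3` pass over every `id="..."` and `href="..."` attribute, escaping only the bare `&` characters inside the value. This is a minimal sketch, not ExDoc code: the module `EpubAnchorEscape` and the function `escape_anchors/1` are hypothetical names. Unlike the two regexes in the diff, it escapes any bare `&` in those attributes (not only anchors starting with `&+/\d+`), and it skips existing entity references, so running it twice is harmless. Escaping when the links are generated (during autolinking) may still be the cleaner fix; this only post-processes the generated XHTML.

```elixir
defmodule EpubAnchorEscape do
  # Hypothetical helper, not part of ExDoc.

  def escape_anchors(html) when is_binary(html) do
    # Matches an id="..." or href="..." attribute, capturing its name and value.
    attr_regex = ~r/\b(id|href)="([^"]*)"/

    # A "&" that does NOT already start an entity reference such as
    # &amp;, &lt;, &#38; or &#x26;, so already-escaped values are left alone.
    bare_amp = ~r/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/

    # The function receives the whole match followed by each capture.
    Regex.replace(attr_regex, html, fn _whole, attr, value ->
      attr <> "=\"" <> Regex.replace(bare_amp, value, "&amp;") <> "\""
    end)
  end
end
```

For example, `EpubAnchorEscape.escape_anchors(~s(<h3 id="&&/2">))` yields `<h3 id="&amp;&amp;/2">`, and a link such as `Kernel.xhtml#&&&/2` becomes `Kernel.xhtml#&amp;&amp;&amp;/2`.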

josevalim · 2024-01-26

@wojtekmach I thought we had already escaped those when generating the links. Maybe this is something (or an option) we can pass when autolinking? As for the `id`, we can fix it by escaping it in the document itself.

milmazz · 2024-09-20

@wojtekmach @josevalim I know it has been a while, but is any action expected on my end? How can we resume this discussion?

milmazz · 2024-01-26T03:44:51Z

test/fixtures/README.md

```diff
+
+The following text includes a reference to an anchor that causes problems in EPUB documents.
+
+To remove this anti-pattern, we can replace `&&/2`, `||/2`, and `!/1` by `and/2`, `or/2`, and `not/1` respectively.
```


Added this line to demonstrate that we're transforming the links to problematic anchors in EPUB files.

nix2intel · 2024-08-10T13:27:21Z

I don't know if this is related, but don't give up, because this is an amazing tool! Please let me know how I can help.

nix2intel · 2024-08-10T13:37:01Z

Here is the `epubcheck` output for the errors:
```
"messages" : [ {
  "ID" : "RSC-005",
  "severity" : "ERROR",
  "message" : "Error while parsing file: element \"ol\" not allowed yet; expected element \"a\" or \"span\"",
  "additionalLocations" : 0,
  "locations" : [ {
    "url" : {
      "opaque" : false,
      "hierarchical" : true
    },
    "path" : "OEBPS/nav.xhtml",
    "line" : 21,
    "column" : 15,
    "context" : null
  } ],
  "suggestion" : null
}, {
  "ID" : "RSC-012",
  "severity" : "ERROR",
  "message" : "Fragment identifier is not defined.",
  "additionalLocations" : 0,
  "locations" : [ {
    "url" : {
      "opaque" : false,
      "hierarchical" : true
    },
    "path" : "OEBPS/EntityFingerprint.Fingerprint.xhtml",
    "line" : 25,
    "column" : 26,
    "context" : "https://36ccc1be-51fc-4d98-ac07-46b49c286564.epubcheck.w3c.org/OEBPS/EntityFingerprint.Fingerprint.xhtml#functions"
  } ],
  "suggestion" : null
}, {
  "ID" : "RSC-016",
  "severity" : "FATAL",
  "message" : "Fatal Error while parsing file: Attribute name \"data-no-tooltip\" associated with an element type \"a\" must be followed by the ' = ' character.",
  "additionalLocations" : 0,
  "locations" : [ {
    "url" : {
      "opaque" : false,
      "hierarchical" : true
    },
```
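The `RSC-016` fatal error above comes from a valueless attribute: XHTML is parsed as XML, where every attribute needs a value, so `<a data-no-tooltip>` must be written as `<a data-no-tooltip="">`. Ideally the attribute would be emitted with an empty value wherever it is generated. As a stopgap, a post-processing pass could look like the minimal sketch below. The module `EpubBooleanAttrs` and the function `add_empty_values/2` are hypothetical, and the regex approach assumes attribute values never contain a literal `>`.

```elixir
defmodule EpubBooleanAttrs do
  # Hypothetical helper, not part of ExDoc: gives valueless attributes an
  # empty value so the XHTML is well-formed XML.
  def add_empty_values(html, attrs \\ ["data-no-tooltip"]) when is_binary(html) do
    Enum.reduce(attrs, html, fn attr, acc ->
      # Capture the tag up to the bare attribute name, and only rewrite it when
      # the next character ends the attribute (whitespace, "/" or ">"). An
      # attribute that already has a value is followed by "=" and is skipped.
      regex = ~r/(<[^>]*\s#{Regex.escape(attr)})(?=[\s\/>])/
      Regex.replace(regex, acc, ~S(\1=""))
    end)
  end
end
```

With this sketch, `<a href="#x" data-no-tooltip>x</a>` becomes `<a href="#x" data-no-tooltip="">x</a>`, while an attribute that already reads `data-no-tooltip=""` is left unchanged.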


milmazz added 2 commits on January 25, 2024 21:57:

014d258 Merge branch 'main' into epub/fix-fatal-errors
146388d fix nav layout

josevalim force-pushed the main branch from 8f80e45 to af1089f on May 30, 2024 12:55.
