Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

epub: fix fatal errors while parsing EPUB files #1854

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

milmazz
Copy link
Member

@milmazz milmazz commented Jan 26, 2024

After generating the EPUB file for the Elixir docs with this version, and reviewing the result with epubcheck, I got the following summary:

$ epubcheck doc/elixir/Elixir.epub --json elixir_docs.json
Check finished with errors
Messages: 0 fatals / 140 errors / 0 warnings / 0 infos

If you compare the previous result with what we had on #1851

Messages: 9 fatals / 425 errors / 0 warnings / 0 infos

you can see that now we don't have messages with fatal severity and we have reduced considerably the number of errors =)

I manually checked the generated EPUB on Apple Books and the previous truncated sections are fixed, I don't see the banner Below is a rendering of the page up to the first error and also the links to different anchors seems to work.

Fixes: #1851

After generating the EPUB file for the Elixir docs with this version,
and reviewing the result with `epubcheck`, I got the following summary:

```console
$ epubcheck doc/elixir/Elixir.epub --json elixir_docs.json                                                                                                    (base)

Check finished with errors
Messages: 0 fatals / 141 errors / 0 warnings / 0 infos
```

If you compare the previous result with what we had on #1851

```
Messages: 9 fatals / 425 errors / 0 warnings / 0 infos
```

you can see that now we don't have messages with `fatal` severity and we
have reduced considerably the number of errors =)

I manually checked the generated EPUB on Apple Books and the previous
truncated sections are solved, I don't see the banner _Below is a
rendering of the page up to the first error_ and also the links to
anchor different anchor seems to work.

Fixes: #1851
Comment on lines +66 to +67
|> String.replace(~r{id="&+/\d+[^"]*}, &String.replace(&1, "&", "&"))
|> String.replace(~r{href="[^#"]*#&+/\d+[^"]*}, &String.replace(&1, "&", "&"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I frowned a little with these nested String.replace. So, please let me now if you have any advice on how to improve this function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtekmach I though we had already escaped those when generating the links. Maybe this is something (or an option) we can pass when autolinking? The id we can fix by escaping in the document itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtekmach @josevalim I know it has been a while, but is there any action that's expected on my end? How do we move on/resume this discussion?


The following text includes a reference to an anchor that causes problems in EPUB documents.

To remove this anti-pattern, we can replace `&&/2`, `||/2`, and `!/1` by `and/2`, `or/2`, and `not/1` respectively.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this line to demonstrate that we're transforming the links to problematic anchors in EPUB files.

@nix2intel
Copy link

Screenshot 2024-08-10 at 8 26 48 AM

I don't know if this is related? But don't give up because this is an amazing tool! please let me know how I can help.

@nix2intel
Copy link

here is the output for the errors using epubcheck
"messages" : [ {
"ID" : "RSC-005",
"severity" : "ERROR",
"message" : "Error while parsing file: element "ol" not allowed yet; expected element "a" or "span"",
"additionalLocations" : 0,
"locations" : [ {
"url" : {
"opaque" : false,
"hierarchical" : true
},
"path" : "OEBPS/nav.xhtml",
"line" : 21,
"column" : 15,
"context" : null
} ],
"suggestion" : null
}, {
"ID" : "RSC-012",
"severity" : "ERROR",
"message" : "Fragment identifier is not defined.",
"additionalLocations" : 0,
"locations" : [ {
"url" : {
"opaque" : false,
"hierarchical" : true
},
"path" : "OEBPS/EntityFingerprint.Fingerprint.xhtml",
"line" : 25,
"column" : 26,
"context" : "https://36ccc1be-51fc-4d98-ac07-46b49c286564.epubcheck.w3c.org/OEBPS/EntityFingerprint.Fingerprint.xhtml#functions"
} ],
"suggestion" : null
}, {
"ID" : "RSC-016",
"severity" : "FATAL",
"message" : "Fatal Error while parsing file: Attribute name "data-no-tooltip" associated with an element type "a" must be followed by the ' = ' character.",
"additionalLocations" : 0,
"locations" : [ {
"url" : {
"opaque" : false,
"hierarchical" : true
},

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

EPUB: Links to anchors like #&&/2 cause a fatal error while parsing the file
3 participants