PDF/A: Ensure PDF files generated are Well-Tagged PDFs according to PDF/A-3 #264

Open
ronaldtse opened this issue Sep 1, 2024 · 1 comment
Labels
enhancement New feature or request

ronaldtse commented Sep 1, 2024

From @petervwyatt in

FOP 2.9 supports PDF/A-3, and we should generate files according to it.

Because MN generates tagged PDF, you will want to use PDF/A-3a (or fall back to PDF/A-3u if there are major PDF/A-3a issues).

I am not sure what internal checks FOP performs or how it changes the output (beyond adding metadata), but free tools like veraPDF can be used to verify conformance with one or more ISO subset standards.

MN already generates Tagged PDF (StructTreeRoot), which is the critical foundation of PDF/UA or, more likely, Well-Tagged PDF. I just tested our MN ISO 19005-4 spec through the free PAC PDF Accessibility Checker and it highlighted some issues (which I'd class as mostly matters of detail), so MN is definitely on the correct path. There are other free checkers too, but PAC presents its findings in a hopefully understandable way and shows you visually on the PDF where each issue lies. I do notice that some tables are "layout tables" rather than real tables, which is probably the biggest no-no since PDF (unlike HTML) clearly separates presentation from semantics. Since I think MN goes via FOP, many of these issues are possibly issues with FOP, because of the nuances of getting something "well tagged" (vs "just tagged").

Once you get to Well-Tagged, you'll also be 98% compliant with PDF/A and possibly also PDF/X (noting that a single PDF can conform to multiple standards).

PDF/A and PDF/X are fundamentally about static page visual presentation, so they prohibit implementation-dependent features. For example, all color must be defined as device-independent color and all resources (fonts, ICC profiles, images, etc.) must be included in the PDF.

I ran ISO 19005-4 through the veraPDF Docker container and it is very, very close! I obviously ignored errors about missing metadata. Since MN generates tagging, you should select "PDF/A-3u" or, best, "PDF/A-3a", since both of these are better than "PDF/A-3b" (B = basic). Looks to me like a few simple tweaks and you'd be good...

We should use veraPDF's Docker container to run these checks in GitHub Actions:
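
As a rough illustration of such a check (not the actual workflow used by the project), the Python sketch below builds a `docker run` command for veraPDF and reads the compliance flag from its XML report. Several things here are assumptions rather than facts from this thread: the image name `verapdf/cli`, the `/data` mount point, the `--flavour 3a` option, XML being the default report format, and the `validationReport` element carrying an `isCompliant` attribute. The function names are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def build_verapdf_command(pdf_dir, pdf_name, flavour="3a", image="verapdf/cli"):
    """Build a docker command that validates one PDF with veraPDF.

    Assumptions: the image name, the /data mount point and the --flavour
    option are not confirmed by this issue.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{pdf_dir}:/data",   # mount the folder that holds the PDF
        "-w", "/data",              # run veraPDF from inside that folder
        image,
        "--flavour", flavour,       # "3a" by default, "3u" as a fallback
        pdf_name,
    ]


def is_compliant(report_xml):
    """Return True only if every validationReport says isCompliant="true".

    Assumption: the report is un-namespaced XML with this element and
    attribute. A report without any validationReport counts as a failure.
    """
    root = ET.fromstring(report_xml)
    reports = list(root.iter("validationReport"))
    return bool(reports) and all(r.get("isCompliant") == "true" for r in reports)


def check_pdf(pdf, flavour="3a"):
    """Run veraPDF on one PDF and return the parsed compliance result."""
    path = Path(pdf).resolve()      # docker -v needs an absolute host path
    cmd = build_verapdf_command(str(path.parent), path.name, flavour)
    # check=False: a non-compliant file may yield a non-zero exit code,
    # and the report on stdout is still what we want to inspect.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return is_compliant(result.stdout)
```

In a GitHub Actions job, a step could call `check_pdf("document.pdf")` (the file name is a placeholder) and fail the build when it returns `False`, optionally retrying with `flavour="3u"` to see whether only the PDF/A-3a requirements are unmet.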

@Intelligent2013

For further analysis, see the check run on the PDF in the mn-native-pdf repository: https://github.com/metanorma/mn-native-pdf/actions/runs/10689964432/job/29633282268?pr=743.

Intelligent2013 added a commit that referenced this issue Sep 5, 2024
Projects
Status: 🌋 Urgent