Use lazy loading for object-streams and their objects #85

packdat · 2024-02-01T21:57:55Z

This PR attempts to resolve the issues described in #73 and #46 in a more generic way.
It also supersedes #53 by removing the need to handle objects stored in object-streams in a special way.

The "lazy loading" aspect is handled by the new class PdfReferenceToCompressedObject, a subclass of PdfReference.
While the document's xref-streams are processed, references to objects stored in object-streams are collected as PdfReferenceToCompressedObject instances.
When the Value of such a reference is accessed (which may happen while parsing another object that refers to the compressed object), the containing object-stream is loaded and decrypted, unless that has already happened, and the actual object is then read from it.
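The mechanism described above can be illustrated with a minimal Python sketch. It is not the PR's C# code. `PdfReference` and `PdfReferenceToCompressedObject` here are simplified Python analogues, and `ObjectStream`, `collect_compressed_refs` and the `load_bytes` callback (which stands in for reading, decrypting and decompressing the stream) are hypothetical. Objects are kept as raw strings instead of parsed PDF objects.

```python
from typing import Callable, Dict, List, Optional, Tuple


class PdfReference:
    """Python analogue of a reference to an indirect object whose value is already known."""

    def __init__(self, obj_num: int, gen: int = 0, value: Optional[str] = None) -> None:
        self.obj_num = obj_num
        self.gen = gen
        self._value = value

    @property
    def value(self) -> Optional[str]:
        return self._value


class ObjectStream:
    """An object-stream whose bytes are fetched and parsed only on first use, then cached."""

    def __init__(self, load_bytes: Callable[[], bytes], n: int, first: int) -> None:
        # load_bytes stands in for "read, decrypt and decompress the stream data" (hypothetical).
        self._load_bytes = load_bytes
        self.n = n          # /N: number of objects stored in the stream
        self.first = first  # /First: byte offset of the first object in the decoded data
        self._objects: Optional[Dict[int, str]] = None
        self.load_count = 0  # how many times the stream data was actually loaded

    def get(self, obj_num: int) -> str:
        if self._objects is None:  # load and parse at most once
            self._objects = self._parse(self._load_bytes())
            self.load_count += 1
        return self._objects[obj_num]

    def _parse(self, data: bytes) -> Dict[int, str]:
        # The header holds n pairs "obj_num offset"; offsets are relative to /First.
        header = data[: self.first].split()
        pairs = [(int(header[2 * i]), int(header[2 * i + 1])) for i in range(self.n)]
        objects: Dict[int, str] = {}
        for i, (num, offset) in enumerate(pairs):
            start = self.first + offset
            end = self.first + pairs[i + 1][1] if i + 1 < self.n else len(data)
            objects[num] = data[start:end].decode("latin-1").strip()
        return objects


class PdfReferenceToCompressedObject(PdfReference):
    """Reference to an object inside an object-stream; the value is read on first access."""

    def __init__(self, obj_num: int, stream: ObjectStream) -> None:
        super().__init__(obj_num, gen=0)  # objects in object-streams always have generation 0
        self._stream = stream

    @property
    def value(self) -> Optional[str]:
        if self._value is None:
            self._value = self._stream.get(self.obj_num)
        return self._value


def collect_compressed_refs(
    entries: List[Tuple[int, int, int, int]], streams: Dict[int, ObjectStream]
) -> Dict[int, PdfReference]:
    """Create lazy references for xref-stream entries of type 2:
    (obj_num, 2, object-stream number, index within the stream)."""
    refs: Dict[int, PdfReference] = {}
    for obj_num, entry_type, stream_num, _index in entries:
        if entry_type == 2:
            refs[obj_num] = PdfReferenceToCompressedObject(obj_num, streams[stream_num])
    return refs


# Usage: object-stream 5 holds objects 10 and 11.
stream = ObjectStream(lambda: b"10 0 11 17 << /Type /Foo >> (hello)", n=2, first=11)
refs = collect_compressed_refs([(10, 2, 5, 0), (11, 2, 5, 1)], {5: stream})
print(stream.load_count)  # 0: collecting references does not touch the stream
print(refs[11].value)     # (hello)
print(refs[10].value)     # << /Type /Foo >>
print(stream.load_count)  # 1: the stream was loaded once and then reused
```

In this sketch, resolution is deferred to the first `value` access, so a reference can be created for every compressed object up front without caring in which order the object-streams are later read.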

So far, I have not found any issues running automated tests with these changes against ~1000 PDF files (testing page import).

Note:
The PR also includes some minor tweaks not directly related to object loading that I think are helpful, such as reporting the position within a document where an unexpected token was encountered during parsing.
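Reporting the position can be as simple as carrying the byte offset in the parse exception. The following is a minimal Python sketch of that idea; `UnexpectedTokenError` and `expect` are hypothetical names, not the identifiers used in the PR.

```python
class UnexpectedTokenError(Exception):
    """Parse error that records where in the file the unexpected token starts."""

    def __init__(self, expected: str, found: str, position: int) -> None:
        super().__init__(
            f"Unexpected token {found!r} at byte offset {position} (expected {expected!r})"
        )
        self.expected = expected
        self.found = found
        self.position = position


def expect(data: bytes, pos: int, token: bytes) -> int:
    """Skip whitespace, require `token` at the current position and return the offset after it."""
    # Simplification: uses Python's ASCII whitespace set, which differs slightly
    # from PDF's whitespace characters (e.g. PDF also treats NUL as whitespace).
    while pos < len(data) and data[pos:pos + 1].isspace():
        pos += 1
    if data[pos:pos + len(token)] != token:
        found = data[pos:pos + len(token)].decode("latin-1")
        raise UnexpectedTokenError(token.decode("latin-1"), found, pos)
    return pos + len(token)


data = b"12 0 obj"
print(expect(data, 4, b"obj"))  # 8
try:
    expect(data, 4, b"endobj")
except UnexpectedTokenError as err:
    print(err)  # Unexpected token 'obj' at byte offset 5 (expected 'endobj')
```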

Commit 693b93a: Use lazy loading for object-streams and their objects

Mar 2, 2024: packdat mentioned this pull request in #73, "System.AggregateException: 'Invalid object ID.' when trying to open some pdfs" (open).
