Use lazy loading for object-streams and their objects #85
This PR attempts to resolve the issues described in #73 and #46 in a more generic way.
It also supersedes #53 by removing the need to handle objects stored in object-streams in a special way.
The "lazy loading" aspect is handled by the new class `PdfReferenceToCompressedObject`, which is a subclass of `PdfReference`.

While processing the document's xref-streams, references to objects stored in object-streams are collected as instances of `PdfReferenceToCompressedObject`.

When the `Value` of such a reference is accessed (which may happen while parsing another object that contains a reference to the compressed object), the object-stream is loaded and decrypted (if not already done) and the actual object is read from the object-stream.

I have not found any issues so far when running automated tests with these changes against ~1000 PDF files (testing page import).
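To illustrate the lazy-loading idea described above, here is a minimal, language-neutral sketch in Python. It is not the code from this PR: `ObjectStream`, `Reference`, `ReferenceToCompressedObject`, the `loader` callback and the `load_count` counter are hypothetical stand-ins that only mirror the roles of `PdfReference` and `PdfReferenceToCompressedObject`.

```python
from typing import Callable, Dict, Optional


class ObjectStream:
    """Hypothetical stand-in for an object-stream: decoded on first use only."""

    def __init__(self, loader: Callable[[], Dict[int, object]]) -> None:
        self._loader = loader  # performs the expensive load/decrypt/parse step
        self._objects: Optional[Dict[int, object]] = None
        self.load_count = 0  # only here to make the laziness observable

    def get(self, index: int) -> object:
        if self._objects is None:  # load and decrypt at most once
            self._objects = self._loader()
            self.load_count += 1
        return self._objects[index]


class Reference:
    """Plain reference whose value is already known."""

    def __init__(self, value: object = None) -> None:
        self._value = value

    @property
    def value(self) -> object:
        return self._value


class ReferenceToCompressedObject(Reference):
    """Reference that reads its value from an object-stream on first access."""

    def __init__(self, stream: ObjectStream, index: int) -> None:
        super().__init__(None)
        self._stream = stream
        self._index = index
        self._resolved = False

    @property
    def value(self) -> object:
        if not self._resolved:  # resolve once, then reuse the cached object
            self._value = self._stream.get(self._index)
            self._resolved = True
        return self._value


# References are collected up front; nothing is loaded at this point.
stream = ObjectStream(lambda: {0: "font dictionary", 1: "page dictionary"})
refs = [ReferenceToCompressedObject(stream, i) for i in (0, 1)]
```

In this sketch, creating the references costs nothing; the stream is decoded the first time any `value` is read, and later reads of objects from the same stream reuse the decoded content.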
Note: The PR also includes some minor tweaks not directly related to object loading, which I think are helpful (such as reporting the position within the document where an unexpected token was encountered during parsing).
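As a rough illustration of the position-reporting tweak, the following Python sketch shows one way a parser error can carry the offset of the offending token. The names `UnexpectedTokenError` and `expect_token` are hypothetical and do not correspond to anything in the PR; the actual change may report the position differently.

```python
class UnexpectedTokenError(Exception):
    """Hypothetical parse error that records where the bad token was found."""

    def __init__(self, token: str, position: int) -> None:
        super().__init__(f"Unexpected token '{token}' at position {position}")
        self.token = token
        self.position = position


def expect_token(data: str, position: int, expected: str) -> int:
    """Return the position just past `expected`, or raise with the offending offset."""
    end = position + len(expected)
    found = data[position:end]
    if found != expected:
        raise UnexpectedTokenError(found, position)
    return end
```

With the offset in the message, a failing document can be inspected at the exact location instead of being searched by hand.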