-
I have created an empty ZIP file. I think it is valid (it was created by a Python script), but Siegfried identifies it as an unknown format. File content: Siegfried output:
-
Hi @JanVomlel, Siegfried's ability to identify this file is driven by the PRONOM database, and for ZIP files it expects a slightly different pattern: https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263
So your file wouldn't match the default signature. Perhaps it is something worth bringing up with the PRONOM team? https://github.com/digital-preservation/PRONOM_Research
Your sequence matches the one in the small repo, https://github.com/mathiasbynens/small, which is a good sign that it is syntactically correct.
Incidentally, I ran DROID against this set a while back. It didn't match a number of similar smallest-possible files, and I wondered whether looking at these might improve PRONOM identification in future: https://groups.google.com/g/droid-list/c/nwAtmup35Yk/m/aWn4myTFBAAJ
-
Just to note re the above, that's the EOF sequence. The BOF sequence is seeking
-
Hi all, I think you are right, the file is valid but the PRONOM signature doesn't cover this edge case. To be precise, this is your file:
Since the ZIP file is empty, it consists of only an "end of central directory record". For the same reason, most of the values in the end of central directory record are set to 0. This all looks fine to me. (See this ZIP file example if you want to know more details.) Now, if you look at the current PRONOM signature for the ZIP file format, it essentially expects the following byte sequences in a ZIP file (in this order):
The first two sequences appear once for each file contained in a ZIP archive, because both introduce data structures that describe these files. The third appears in every ZIP archive, empty or not, because there must always be an end of central directory record, even if the archive contains no files and the central directory therefore has no entries (see section 4.3.1 of the format specification).
You already know where that leads, right? Your ZIP contains no files, so the first two sequences don't occur in it. The third does occur, but all three are required, so the PRONOM signature doesn't match.
What can we do about that? The PRONOM signature could be relaxed to require only the third sequence (the end of central directory signature), because that's the only sequence that reliably appears in every ZIP file, even an empty one. On the other hand, that would make the PRONOM signature less specific, possibly leading to more false positives. As always, it's a trade-off ... ;-)
(@Dclipsham Or does "variant signature" imply that we could have both signatures?)
Cheers,
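To make the mismatch concrete, here is a minimal Python sketch. It assumes the empty archive was produced with the standard-library `zipfile` module (the original script isn't shown in this thread), and it models the PRONOM signature as nothing more than "three record signatures found in order", which is a simplification of the real signature's offsets and wildcards. The helper names `make_empty_zip`, `parse_eocd` and `sequences_in_order` are made up for illustration.

```python
import io
import struct
import zipfile

# Record signatures as defined in the ZIP format specification.
LOCAL_FILE_HEADER = b"PK\x03\x04"    # one per archived file
CENTRAL_DIR_HEADER = b"PK\x01\x02"   # one per archived file
END_OF_CENTRAL_DIR = b"PK\x05\x06"   # exactly one, always present


def make_empty_zip() -> bytes:
    """Build a ZIP archive with no entries, entirely in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w"):
        pass  # add nothing; closing the archive writes the end record
    return buf.getvalue()


def parse_eocd(data: bytes) -> dict:
    """Decode the fixed 22-byte end of central directory record.

    Assumes the archive has no trailing comment, so the record
    occupies exactly the last 22 bytes.
    """
    names = (
        "signature", "disk_number", "cd_start_disk", "entries_on_disk",
        "total_entries", "cd_size", "cd_offset", "comment_length",
    )
    return dict(zip(names, struct.unpack("<4s4H2LH", data[-22:])))


def sequences_in_order(data: bytes) -> bool:
    """Simplified stand-in for the three-sequence check (NOT the real
    PRONOM matcher): each signature must occur after the previous one."""
    pos = 0
    for magic in (LOCAL_FILE_HEADER, CENTRAL_DIR_HEADER, END_OF_CENTRAL_DIR):
        pos = data.find(magic, pos)
        if pos < 0:
            return False
        pos += len(magic)
    return True


empty = make_empty_zip()
print(empty.hex(" "))             # the raw bytes of the archive
print(parse_eocd(empty))          # every numeric field is 0
print(sequences_in_order(empty))  # False: the first two signatures are absent
```

Under these assumptions the archive is just the end of central directory record on its own, which is why a check that also demands the two per-file signatures cannot succeed.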
-
This isn't really a DROID bug, more of an enhancement to PRONOM. I've raised an issue over there for this: digital-preservation/PRONOM_Research#42. PRONOM tends to get new official releases every few months, so it won't happen overnight, but hopefully you'll see this in a PRONOM release in the near future. PRONOM updates are officially announced on their Google Group, which you can subscribe to if you'd like to receive a notification when this happens: https://groups.google.com/g/pronom
Siegfried's companion project, Roy - https://github.com/richardlehane/siegfried/wiki/Building-a-signature-file-with-ROY - can be used to create custom signatures to enhance Siegfried.
While I'm not familiar with the capabilities of the Python script you've used to create the empty ZIP, if you can create a ZIP with a single item in it, then it should naturally conform to the existing ZIP identification signature without further adjustments.
I hope this is useful. I'm happy to answer any additional questions where I can!
David
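As a rough illustration of the single-item workaround, the following Python sketch writes an archive containing one placeholder entry using the standard-library `zipfile` module. The entry name `placeholder.txt` and the helper `make_single_entry_zip` are hypothetical, and the original poster's script may work differently. The sketch only confirms that the three record signatures now appear in order; whether Siegfried then reports x-fmt/263 depends on the full PRONOM signature and is not verified here.

```python
import io
import zipfile


def make_single_entry_zip(name: str = "placeholder.txt", payload: bytes = b"") -> bytes:
    """Build an in-memory ZIP archive holding exactly one (possibly empty) entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr(name, payload)
    return buf.getvalue()


data = make_single_entry_zip()

# Locate each record signature, searching onward from the previous hit.
local = data.find(b"PK\x03\x04")                 # local file header
central = data.find(b"PK\x01\x02", local + 4)    # central directory header
end = data.find(b"PK\x05\x06", central + 4)      # end of central directory

print(local, central, end)
```

All three offsets should be non-negative and increasing, with the local file header at offset 0, unlike the empty archive where only the final signature exists.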
-
@JanVomlel if you'd like to be credited in the PRONOM release notes, please pop your name and/or affiliation onto the PRONOM issue - digital-preservation/PRONOM_Research#42 (comment). It's just a way of acknowledging contributions by the PRONOM community, and it appears here: https://www.nationalarchives.gov.uk/aboutapps/pronom/release-notes.xml - no obligation.