v0.8.37

CorentinB released this 05 Apr 13:20
· 44 commits to master since this release
d986b14
Add command line utilities (#35)

* Add: warc extract

* Add: results report printing

* Oops, forgot to push utils.go

* Add .gitignore for output folder

* fix: improve support for spaces in the msgtype.

* fix: add warc executable and warc files to ignore.

* Add: warc verify

* Update cmd/verify.go

* small cosmetic fix

* fix: we currently cannot process revisit records.

This is currently outside the scope of this tool, but could be added in the future.

* feat: add gzip content decoding

* fix: revisit records in verify

* small cosmetic fix

* fix: revisit if statement

* feat: add folder structure to extract output.

* fix: add SHA-256 Base16 support to verify

Base16 appears to be the most common SHA-256 encoding. As such, we will check based on that.
https://github.com/iipc/warc-specifications/issues/80#issuecomment-1161479051

* Add: --host-sort

* Truncate filenames that are too long

* cmd/extract: use filename from Content-Disposition only when it's not empty

* cmd/extract: replace / in filenames

* cmd/extract: handle mime parsing failure

* feat: add (default) support for suffixing duplicate file names with a SHA1 hash if their contents differ.

* fix: resolve EOF read error
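
To illustrate the SHA-256 Base16 check mentioned above, here is a minimal sketch of comparing a payload against a hex-encoded digest. It is not the code from `cmd/verify.go`: the helper name `verifySHA256Base16`, the `label:value` header shape, and the acceptance of both `sha256` and `sha-256` as labels are assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifySHA256Base16 is a hypothetical helper, not part of the released tool.
// It assumes the digest header looks like "sha256:<hex>" and reports whether
// the payload hashes to that Base16 (hex) value.
func verifySHA256Base16(header string, payload []byte) (bool, error) {
	label, want, found := strings.Cut(header, ":")
	if !found {
		return false, fmt.Errorf("malformed digest %q: missing ':'", header)
	}

	// Assumption: accept "sha256" and "sha-256", in any letter case.
	algo := strings.ReplaceAll(strings.ToLower(strings.TrimSpace(label)), "-", "")
	if algo != "sha256" {
		return false, fmt.Errorf("unsupported digest algorithm %q", label)
	}

	// Hash the payload and encode it as lowercase hex (Base16).
	sum := sha256.Sum256(payload)
	got := hex.EncodeToString(sum[:])

	// Hex is case-insensitive, so compare without regard to case.
	return strings.EqualFold(got, strings.TrimSpace(want)), nil
}

func main() {
	// SHA-256 of the ASCII string "hello", hex-encoded.
	header := "sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"

	ok, err := verifySHA256Base16(header, []byte("hello"))
	fmt.Println("valid:", ok, "error:", err)
}
```

A Base32 digest would fail this comparison rather than being decoded, which matches the commit note that the check is based on Base16 only.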

---------

Co-authored-by: Jake L <[email protected]>