Add command line utilities (#35)
* Add: warc extract
* Add: results report printing
* Add: missing utils.go
* Add .gitignore for output folder
* fix: improve support for spaces in the msgtype.
* fix: add warc executable and WARC files to .gitignore.
* Add: warc verify
* Update cmd/verify.go
* small cosmetic fix
* fix: revisit records cannot currently be processed.
This is outside the scope of this tool for now, but support could be added in the future.
* feat: add gzip content decoding
* fix: revisit records in verify
* small cosmetic fix
* fix: revisit if statement
* feat: add folder structure to extract output.
* fix: add SHA-256 Base16 support to verify
Base16 appears to be the most common encoding for SHA-256 digests, so verify checks against that encoding.
https://github.com/iipc/warc-specifications/issues/80#issuecomment-1161479051
* Add: --host-sort
* Truncate filenames that are too long
* cmd/extract: use filename from Content-Disposition only when it's not empty
* cmd/extract: replace / in filenames
* cmd/extract: handle mime parsing failure
* feat: by default, suffix duplicate file names with a SHA1 hash when the files differ.
* fix: resolve EOF read error
---------
Co-authored-by: Jake L <[email protected]>