Commit
* Add example dictionary generated from test WARCs with `zstd --train`
* Update .gitignore to include zstd warcs
* feat: update packages and run tidy.
* fix: use 'proper' dictionary that decodes properly.
* feat: initial commit of zstd dictionary support.

  This commit allows an external ZSTD-generated dictionary to be used in the compression process. This implementation will be compliant with the IIPC spec and currently works with all known ZSTD WARC tools. It is currently a WIP and needs additional testing and validation to ensure everything is working correctly.
* feat: ensure TLS handshake time is being respected
* fix: run fieldalignment

  warc/client.go:15:25: struct of size 176 could be 144
  warc/client.go:31:23: struct of size 232 could be 216
  warc/dedupe.go:23:20: struct with 32 pointer bytes could be 24
  warc/dedupe.go:31:20: struct with 48 pointer bytes could be 40
  warc/dialer.go:24:19: struct with 168 pointer bytes could be 160
  warc/random_local_ip.go:16:19: struct with 24 pointer bytes could be 8
  warc/spooled.go:40:22: struct of size 80 could be 72
  warc.go:15:22: struct with 72 pointer bytes could be 64
  write.go:19:13: struct with 64 pointer bytes could be 56
  warc/write.go:32:18: struct with 40 pointer bytes could be 32
* fix: add comments back
* fix: run fieldalignment (again?)

  warc/client.go:15:25: struct with 96 pointer bytes could be 88
  warc/client.go:31:23: struct with 176 pointer bytes could be 168
Showing 12 changed files with 250 additions and 47 deletions.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -2,4 +2,5 @@ warcs/** |
| 2 | 2 | temp/** |
| 3 | 3 | output/** |
| 4 | 4 | warc |
| 5 | | -*.warc.gz |
| | 5 | +*.warc.gz |
| | 6 | +*.warc.zst |
Binary file not shown.