Implement Content-Code for binary data. #89

titusz · 2020-07-10T16:57:16Z

We could extract printable strings (with different encodings) from all kinds of binary data like executables or custom binary formats with https://github.com/getreu/stringsext ... and create a text similarity signature.

The question is whether we still call this Content-ID-Text or create a custom Content-ID-Binary that signals that the text was extracted from a binary format without any format-specific structured parsing.
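To make the idea concrete, here is a minimal sketch of the pipeline described above: pull printable strings out of arbitrary bytes, then fold them into a similarity signature. It is an illustration under assumptions, not the proposed spec. The function names (`extract_strings`, `simhash`, `hamming`) are hypothetical, the extraction only handles printable ASCII (stringsext covers more encodings), and a plain 64-bit SimHash stands in for whatever text similarity algorithm the Content-Code would actually use.

```python
import hashlib
import re
from typing import Iterable, List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII that are at least min_len bytes long.

    Assumption: ASCII only. A real extractor would also scan for
    UTF-8, UTF-16 and other encodings.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def simhash(tokens: Iterable[str], bits: int = 64) -> int:
    """Fold tokens into a bits-wide SimHash (stand-in similarity signature)."""
    counts = [0] * bits
    for tok in tokens:
        digest = hashlib.blake2b(tok.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the signature when the positive votes win.
    return sum(1 << i for i, c in enumerate(counts) if c > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    blob = b"\x00\x01hello world\x00ab\x00\xffconfig.ini\x00"
    strings = extract_strings(blob)
    print(strings)
    print(hex(simhash(strings)))
```

Two binaries that embed the same strings but differ in their non-printable bytes produce the same signature in this sketch, and the Hamming distance between signatures grows as the embedded strings diverge.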

lrosenthol · 2020-08-30T16:36:10Z

What about defining the algorithm to be used instead of pointing to a specific implementation? It would help to look at various binary formats and see which aspects are the most important to apply to the ID.

titusz added the labels Priority: Medium, Scope: Medium, Affects: Spec, Type: Discussion on Jul 10, 2020

titusz self-assigned this Jul 10, 2020

titusz changed the title ~~Implement Content-ID for binary data.~~ Implement Content-Code for binary data. Feb 18, 2022
