We could extract printable strings (with different encodings) from all kinds of binary data like executables or custom binary formats with https://github.com/getreu/stringsext ... and create a text similarity signature.
The question is whether we still call this Content-ID-Text or whether we create a custom Content-ID-Binary that signals that the text was extracted from a binary format without any format-specific structured parsing.
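To make the idea concrete, here is a rough sketch of what "extract printable strings, then build a text similarity signature" could look like. Everything in it is an assumption for illustration: the function names (`extract_strings`, `simhash`, `binary_text_signature`), the minimum string length of 4, the two encodings covered (ASCII and UTF-16LE), and the character-trigram simhash are all made up for this example. It does not use stringsext and it is not the existing Content-ID-Text algorithm; a plain simhash stands in for whatever similarity hash would actually be chosen.

```python
import hashlib
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return printable runs found in raw bytes (ASCII first, then UTF-16LE)."""
    # Runs of at least min_len printable ASCII bytes (space through tilde).
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters stored as UTF-16LE: each one followed by a zero byte.
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return found


def simhash(text: str, ngram: int = 3, bits: int = 64) -> int:
    """Simhash over the set of character n-grams of whitespace-normalized text."""
    text = " ".join(text.lower().split())
    grams = {text[i:i + ngram] for i in range(max(len(text) - ngram + 1, 1))}
    counts = [0] * bits
    for gram in grams:
        digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=bits // 8).digest()
        value = int.from_bytes(digest, "big")
        # Each n-gram votes +1 or -1 on every bit position.
        for bit in range(bits):
            counts[bit] += 1 if (value >> bit) & 1 else -1
    return sum(1 << bit for bit in range(bits) if counts[bit] > 0)


def binary_text_signature(data: bytes) -> int:
    """Hypothetical 'Content-ID-Binary' style signature: strings in, simhash out."""
    return simhash("\n".join(extract_strings(data)))


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")
```

Two binaries would then be compared with `hamming_distance(binary_text_signature(a), binary_text_signature(b))`, where a smaller distance suggests more shared embedded text. Because no format-specific parsing happens, the signature reflects only whatever strings happen to be embedded, which is the argument for labeling it differently from Content-ID-Text.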
What about defining the algorithm that would be used instead of naming a specific implementation? It would help to look at various binary formats and see which aspects are most important to capture in the ID.
titusz changed the title from "Implement Content-ID for binary data." to "Implement Content-Code for binary data." on Feb 18, 2022