[py-tx][mlh] Add rotation reference to brute force lookup #1665

Open
Dcallies opened this issue Oct 22, 2024 · 2 comments
Labels
mlh Related to Major League Hacking Fellowship python-threatexchange Items related to the threatexchange python tool / library

Comments

@Dcallies
Contributor

Dcallies commented Oct 22, 2024

This is a small-to-medium project.

You will learn about:

  1. Rotating images
  2. The PDQ algorithm
  3. The brute force & distance interfaces in SignalType

Images have multiple "primitive" rotations:

  1. Rotate 90: counterclockwise 90 degrees
  2. Rotate 180: 180 degrees
  3. Rotate 270: counterclockwise 270 degrees (i.e. clockwise 90 degrees)
  4. FlipX: Left is left and right is right but top and bottom change places
  5. FlipY: Top is top and bottom is bottom but left and right change places (mirror image)
  6. FlipPlus1: Upper left and lower right stay put; lower left and upper right exchange places
  7. FlipMinus1: Upper right and lower left stay put; upper left and lower right exchange places
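
To make the seven transforms concrete, here is a minimal sketch that applies each one to a tiny 2x2 grid standing in for an image. It is pure Python and does not use the PDQ library; the `Grid` type, the `TRANSFORMS` dictionary, and the snake_case names are assumptions chosen for illustration only.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # toy stand-in for image pixels, indexed [row][column]


def transpose(g: Grid) -> Grid:
    """Swap rows and columns (reflect across the main diagonal)."""
    return [list(col) for col in zip(*g)]


# Hypothetical lookup table: one entry per primitive rotation in the list above.
TRANSFORMS: Dict[str, Callable[[Grid], Grid]] = {
    "rotate90": lambda g: transpose(g)[::-1],               # counterclockwise 90
    "rotate180": lambda g: [row[::-1] for row in g[::-1]],
    "rotate270": lambda g: transpose(g[::-1]),              # clockwise 90
    "flip_x": lambda g: [list(row) for row in g[::-1]],     # top <-> bottom
    "flip_y": lambda g: [row[::-1] for row in g],           # left <-> right
    "flip_plus1": transpose,                                # main diagonal stays put
    "flip_minus1": lambda g: transpose([row[::-1] for row in g[::-1]]),
}

image = [[1, 2],
         [3, 4]]
for name, fn in TRANSFORMS.items():
    print(f"{name:12} {fn(image)}")
```

Together with the untouched original, these make eight orientations, which is why a full set of rotation hashes has eight entries.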

Some algorithms, like PDQ, can generate the hashes for multiple rotations at once as they hash. You can see the rotations implemented here: https://github.com/facebook/ThreatExchange/blob/main/pdq/cpp/hashing/pdqhashing.cpp#L440-L456

Goal

The reference brute force approach in tx match should also try rotations and pick the minimum distance. The rotation it used should be returned in the distance string.

Possible Solutions

Add a new mixin for hashing rotations

This is similar to what we did for many exceptional cases for SignalType: we can add a new class called SignalTypeWithRotations, which has a `.hash_rotations(file_ptr)` method.

h1 = pdq.hash(image)
# Pseudocode: assumes hash_all_rotations yields (hash, rotation) pairs
queries = PDQSignal.hash_all_rotations(image.flipx())
best = None
for q, rotation in queries:
  d = PDQSignal.compare(q, h1)
  if best is None or d < best[1]:
    best = q, d, rotation

print(best)
<is match is true, distance is 0[flip_x]>

Note that this doesn't (yet) affect the index matching, which we will save for later.
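
As a rough, runnable illustration of the goal, the sketch below picks the closest rotation by Hamming distance and reports it in a distance string shaped like the `0[flip_x]` output above. Everything here is hypothetical: `hamming_distance`, `best_rotation_match`, the rotation names, and the default threshold of 31 are assumptions for the example, not the existing `SignalType` interface.

```python
from typing import Dict, Tuple


def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count the differing bits between two equal-length hex hashes."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def best_rotation_match(
    query_hashes: Dict[str, str], reference: str, threshold: int = 31
) -> Tuple[bool, str]:
    """Compare every rotation hash of the query and keep the closest one.

    query_hashes maps a rotation name to the query's hash under that
    rotation. On ties, the first entry wins, so list "original" first.
    """
    rotation, distance = min(
        ((name, hamming_distance(h, reference)) for name, h in query_hashes.items()),
        key=lambda pair: pair[1],
    )
    # Distance string carries the rotation that produced the best distance.
    return distance <= threshold, f"{distance}[{rotation}]"


# Made-up 256-bit hashes: only the flip_x hash equals the reference.
reference = "f" * 64
queries = {"original": "0" * 64, "flip_x": "f" * 64}
is_match, distance_str = best_rotation_match(queries, reference)
print(is_match, distance_str)
```

Putting the rotation inside the distance string keeps the return shape unchanged for callers that only look at the match flag.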

Code

Tests

  • Add unit tests that check that all rotations are sane, and consider exposing accessors to them in PDQSignalType, which refer to the underlying pdqhash library
  • Include outputs from tx match or the HMA upload debugger to demonstrate the interface is being used correctly
@Dcallies Dcallies converted this from a draft issue Oct 22, 2024
@Dcallies Dcallies added the python-threatexchange Items related to the threatexchange python tool / library label Oct 22, 2024
@Dcallies Dcallies changed the title [py-tx] Add rotation reference to brute force lookup [py-tx][mlh] Add rotation reference to brute force lookup Oct 22, 2024
@Dcallies Dcallies added the mlh Related to Major League Hacking Fellowship label Oct 22, 2024
@16BitNarwhal
Contributor

There's a Python binding to PDQ for computing all rotation hashes of an image: compute_dihedral (bindings code).

It would be nice to know how different these hashes (computed faster via the dihedral transforms) are, if at all, from the hashes you get by directly rotating an image and hashing each result (which is slower).

@Dcallies
Contributor Author

@16BitNarwhal - Thanks for catching me offline to flag the dihedral code. I think you are right that the solution I originally proposed won't work. Instead, I think we should focus on the interface for generating rotations, assuming the underlying algorithm may not be able to generate them itself, and go from there.
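
One possible shape for such an interface, sketched under heavy assumptions: a mixin whose default `hash_rotations` transforms the content and hashes each result, so that an algorithm with a faster native path (such as a dihedral hash) could override it. The names `Rotation`, `HashesRotations`, `hash_pixels`, and `ToyHasher` are all hypothetical, the image is a toy grid of integers, and SHA-256 stands in for a real perceptual hash. Only four of the eight orientations are included to keep the sketch short.

```python
import hashlib
from abc import ABC, abstractmethod
from enum import Enum
from typing import Callable, Dict, List

Grid = List[List[int]]  # toy stand-in for decoded image pixels


class Rotation(Enum):
    ORIGINAL = "original"
    ROTATE180 = "rotate180"
    FLIPX = "flip_x"
    FLIPY = "flip_y"


# Subset of the transforms; the remaining rotations are omitted for brevity.
TRANSFORMS: Dict[Rotation, Callable[[Grid], Grid]] = {
    Rotation.ORIGINAL: lambda g: [list(row) for row in g],
    Rotation.ROTATE180: lambda g: [row[::-1] for row in g[::-1]],
    Rotation.FLIPX: lambda g: [list(row) for row in g[::-1]],
    Rotation.FLIPY: lambda g: [row[::-1] for row in g],
}


class HashesRotations(ABC):
    """Hypothetical mixin: hash content under every supported rotation."""

    @abstractmethod
    def hash_pixels(self, pixels: Grid) -> str:
        """Hash a single orientation of the content."""

    def hash_rotations(self, pixels: Grid) -> Dict[Rotation, str]:
        # Generic fallback: transform first, then hash each result.
        # A subclass with a native multi-rotation path can override this.
        return {rot: self.hash_pixels(fn(pixels)) for rot, fn in TRANSFORMS.items()}


class ToyHasher(HashesRotations):
    def hash_pixels(self, pixels: Grid) -> str:
        # Not a perceptual hash; just a deterministic placeholder.
        return hashlib.sha256(repr(pixels).encode()).hexdigest()


hashes = ToyHasher().hash_rotations([[1, 2], [3, 4]])
for rotation, value in hashes.items():
    print(rotation.value, value[:16])
```

Keeping the fallback in the mixin means the brute force lookup can call `hash_rotations` without knowing whether the underlying algorithm supports rotations natively.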
