Add rudimentary ICC color profile support to colorhash #110

david-russo · 2020-06-09T15:22:51Z

Color profiles can significantly alter the appearance of an image's underlying color values on modern display equipment. This change modifies the colorhash algorithm to take that into account by normalizing (and converting) any embedded color profiles into the standard sRGB color space prior to hashing. This allows colorhash to detect color differences between images which may only differ by the inclusion, absence, or change of a color profile.

It also provides an option to ignore embedded color profiles and instead only analyze an image's raw color data (i.e. the default behaviour before this change), allowing images to be compared with or without their profiles.

Color profiles can significantly alter the appearance of an image's color. This change modifies the colorhash algorithm to take that into account by normalizing any embedded color profiles into the standard sRGB color space before analysis. It also provides an option to ignore embedded color profiles and instead analyze the image's raw color data (i.e. the default behaviour before this change), allowing images to be compared without their profiles.

JohannesBuchner · 2020-06-09T15:46:22Z

This seems like a good idea, but it is currently failing in the CI (see the logs there).

Do these color profiles also alter the luminescence? If yes, they could be relevant for the other hashes as well.

JohannesBuchner · 2020-06-09T15:50:36Z

Actually, if I read correctly, these ICC profiles are meant to adjust to the particular viewing device. Wouldn't it be more objective to ignore them? Probably I am misunderstanding this.

david-russo · 2020-06-09T16:58:40Z

They can have an effect on luminosity, yes, since they can essentially map any internal colour value to any other colour. Which means you could use a color profile to completely invert an image's colors, if you so desired.

There are a few kinds of profile, not just display profiles. Display profiles do describe what colors a display device is capable of displaying, but usually aren't embedded in files because they're specific to a given piece of hardware.

There are also input profiles, however, which are usually the ones you'd find embedded (such as by a scanner), and which describe precisely how the color values stored in an image file map to a standard device-independent color space, such as L*a*b*. This is the profile we care about, because it tells us how to correctly interpret the stored values.

When coupled with a display profile, such as the one that comes with your OS or monitor, both profiles are used to accurately map colour from the input profile to the display profile using a device-independent color space as a connection space: Input --> L*a*b* --> Display

It's even more complicated than all that gibberish above, but all we need to care about is that it works, and it's built-in to most image rendering software, so if we compare two images that are identical except one has an accurate input profile embedded, and the other doesn't, they're going to look different perceptually though they share the same underlying RGB values.

I hope that made some sense.

david-russo · 2020-06-09T17:51:21Z

The CI error appears to be due to a dependency missing from Pillow's conda package. Pillow needs to be built with LittleCMS2 to support its ImageCms module. I developed using pip, and PyPI's Pillow package seems to come with everything included. Comparing the two packages (conda, pip), I can see it missing from conda.

I'm not familiar with conda, but one of the possible solutions seems somewhat involved: https://stackoverflow.com/questions/52741538/anaconda3-python-3-6-pillow-cannot-import-imagecms

david-russo · 2020-06-11T15:15:12Z

The relevant conda package issue is here.

JohannesBuchner · 2020-06-11T18:02:03Z

Two suggestions:

change the code so that when ImageCms is not loadable, it does not use this functionality, either with hasattr or a try-except(ImportError) construct.
try to modify the travis install to install pillow with imagecms, perhaps with pip instead of conda. (adding conda uninstall pillow and pip install pillow).

coveralls · 2020-06-12T11:21:50Z

Coverage decreased (-7.7%) to 86.047% when pulling 036d9fc on bl-dpt:colorhash-icc-support into 691f210 on JohannesBuchner:master.

david-russo · 2020-06-12T11:26:09Z

I went with not allowing ImageCms to be optional due to the impact color profiles can have on an image's appearance, and I believe you were right earlier in thinking that they could be relevant to the other hash functions, so the code should probably be refactored to share this feature.

JohannesBuchner

Good, I am impressed you got travis working with a single commit. It usually takes me a few.

Zooming out a bit, if we want this to be a feature for all hash functions, perhaps it should be applied outside this functions. It appears the correction can be applied such that the image object is modified.

Then perhaps we should instead of including this in imagehash, document for users the optimal way to pass an PIL image to imagehash. This should include how to correct for the icc profile, but also applying:

from PIL import ImageOps
image = ImageOps.exif_transpose(image)

Do you think this would be the right approach?

JohannesBuchner · 2020-06-12T11:28:41Z

.travis.yml

@@ -38,12 +38,17 @@ install:
  - conda install --quiet --file conda-requirements.txt
  - pip install coveralls

+  # Work around conda's Pillow package lacking ImageCms


in line 9 add:

env: - REINSTALL_PILLOW=no - REINSTALL_PILLOW=yes

JohannesBuchner · 2020-06-12T11:29:01Z

.travis.yml

@@ -38,12 +38,17 @@ install:
  - conda install --quiet --file conda-requirements.txt
  - pip install coveralls

+  # Work around conda's Pillow package lacking ImageCms
+  - conda uninstall pillow


and then here prefix the two commands here with [[ "$REINSTALL_PILLOW" == "yes" ]] &&

JohannesBuchner · 2020-06-12T11:30:05Z

imagehash.py

@@ -320,7 +321,7 @@ def whash(image, hash_size = 8, image_scale = None, mode = 'haar', remove_max_ha



-def colorhash(image, binbits=3):
+def colorhash(image, binbits=3, ignore_icc=False):


use a positive variable name here to avoid the double negative. For example: apply_icc_profile=True

david-russo · 2020-06-12T14:01:09Z

Yeah, I agree. Any necessary profile conversion should happen on the Image object before it being passed to any of the hash functions. As for where that's done:

ICC profiles aren't particularly simple to handle, and there are a number of ways to screw up the conversion between profiles if one is not familiar with the concepts or specifics, such as the pros and cons between different rendering intents and color spaces.

For instance, I chose to normalize on the sRGB color space here because it's the only one supported by PIL well enough to allow conversion into HSV, as the current colorhash algorithm requires. It's also usually the default color space in which software renders images without profiles, so it should be generally analogous to what most users would see on their screens... but ideally both images with and without color profiles would first be converted to one of the larger, human vision-based color spaces, such as LAB, before hashing, to better account for colors that lie outside the sRGB space.

Since such decisions require a fair bit of knowledge, and are, currently at least, fairly tightly coupled to the hashing implementations, I think we should keep profile conversion internal to imagehash.

More generally, I think we should abstract away the need for users to import PIL at all, allowing them to pass in file-type objects or file paths instead. The use of PIL internally seems more like an implementation detail to me, which we needn't bother the user with.

david-russo · 2020-06-12T14:06:21Z

For this PR, do I also need to update data_files in setup.py with the new test images?

JohannesBuchner · 2020-06-12T14:09:08Z

I agree with what you say. I think there is a benefit though of passing PIL objects however, as one can apply additional image modifications beforehand. If I understand you correctly, we would need one function that prepares an image for 'L' mode use, and one that prepare it for 'HSV' mode use. These can be used within the hash functions, just like the current color space transformations.

JohannesBuchner · 2020-06-12T14:10:09Z

For this PR, do I also need to update data_files in setup.py with the new test images?

If you use them only for testing, no, if you want them to be installed in the site-package folder, yes. I think no.

david-russo · 2020-06-12T14:32:30Z

I agree with what you say. I think there is a benefit though of passing PIL objects however, as one can apply additional image modifications beforehand.

There's little harm in supporting all three, I suppose :)

If I understand you correctly, we would need one function that prepares an image for 'L' mode use, and one that prepare it for 'HSV' mode use. These can be used within the hash functions, just like the current color space transformations.

Since the current colorhash algorithm converts images into both "L" and "HSV", Pillow seems to support conversion from RGB to either of them just fine, so we should be able to use a single function for the current implementation. It would only get more complicated if we wanted to go from LAB to certain other modes.

JohannesBuchner · 2022-09-02T14:42:46Z

I would be happy to include this as a single preprocessing helper function. A section in the README should explain why and how to use it. This will make it easy for users to plug it in.

Then we can also go with dependencies being optional, as it is not a core functionality.

Work around conda's Pillow package lacking ImageCms

036d9fc

JohannesBuchner reviewed Jun 12, 2020

View reviewed changes

JohannesBuchner mentioned this pull request Feb 1, 2022

Image interpolation method for downscaling (nearest neighbour, antialias) #128

Open

JohannesBuchner mentioned this pull request Aug 26, 2024

Add support for ISCC content hash for images (ie. phash variation) #212

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add rudimentary ICC color profile support to colorhash #110

Add rudimentary ICC color profile support to colorhash #110

david-russo commented Jun 9, 2020 •

edited

Loading

JohannesBuchner commented Jun 9, 2020

JohannesBuchner commented Jun 9, 2020

david-russo commented Jun 9, 2020 •

edited

Loading

david-russo commented Jun 9, 2020

david-russo commented Jun 11, 2020 •

edited

Loading

JohannesBuchner commented Jun 11, 2020

coveralls commented Jun 12, 2020

david-russo commented Jun 12, 2020

JohannesBuchner left a comment

JohannesBuchner Jun 12, 2020

JohannesBuchner Jun 12, 2020

JohannesBuchner Jun 12, 2020

david-russo commented Jun 12, 2020 •

edited

Loading

david-russo commented Jun 12, 2020

JohannesBuchner commented Jun 12, 2020

JohannesBuchner commented Jun 12, 2020

david-russo commented Jun 12, 2020 •

edited

Loading

JohannesBuchner commented Sep 2, 2022

		@@ -320,7 +321,7 @@ def whash(image, hash_size = 8, image_scale = None, mode = 'haar', remove_max_ha



		def colorhash(image, binbits=3):
		def colorhash(image, binbits=3, ignore_icc=False):

Add rudimentary ICC color profile support to colorhash #110

Are you sure you want to change the base?

Add rudimentary ICC color profile support to colorhash #110

Conversation

david-russo commented Jun 9, 2020 • edited Loading

JohannesBuchner commented Jun 9, 2020

JohannesBuchner commented Jun 9, 2020

david-russo commented Jun 9, 2020 • edited Loading

david-russo commented Jun 9, 2020

david-russo commented Jun 11, 2020 • edited Loading

JohannesBuchner commented Jun 11, 2020

coveralls commented Jun 12, 2020

david-russo commented Jun 12, 2020

JohannesBuchner left a comment

Choose a reason for hiding this comment

JohannesBuchner Jun 12, 2020

Choose a reason for hiding this comment

JohannesBuchner Jun 12, 2020

Choose a reason for hiding this comment

JohannesBuchner Jun 12, 2020

Choose a reason for hiding this comment

david-russo commented Jun 12, 2020 • edited Loading

david-russo commented Jun 12, 2020

JohannesBuchner commented Jun 12, 2020

JohannesBuchner commented Jun 12, 2020

david-russo commented Jun 12, 2020 • edited Loading

JohannesBuchner commented Sep 2, 2022

david-russo commented Jun 9, 2020 •

edited

Loading

david-russo commented Jun 9, 2020 •

edited

Loading

david-russo commented Jun 11, 2020 •

edited

Loading

david-russo commented Jun 12, 2020 •

edited

Loading

david-russo commented Jun 12, 2020 •

edited

Loading