Failed to pass the post-build tests related to webp images #1171

8ar10der opened this issue Nov 12, 2024 · 1 comment
8ar10der commented Nov 12, 2024

  • This is actually a bug report.
  • I am not getting good LLM Results
  • I have tried asking for help in the community on discord or discussions and have not received a response.
  • I have tried searching the documentation and have not found an answer.

Archlinux 6.6.60-1-lts
python 3.12.7

Installed pip packages

python 3.12.7

Testing Log

=================================================================================== FAILURES ===================================================================================
__________________________________________________________________ test_image_from_url_with_unusual_extension __________________________________________________________________

    def test_image_from_url_with_unusual_extension():
        url = ""
>       image = Image.from_url(url)

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'instructor.multimodal.Image'>, url = ''

    def from_url(cls, url: str) -> Image:
        if cls.is_base64(url):
            return cls.from_base64(url)
        parsed_url = urlparse(url)
        media_type, _ = mimetypes.guess_type(parsed_url.path)
        if not media_type:
            try:
                response = requests.head(url, allow_redirects=True)
                media_type = response.headers.get("Content-Type")
            except requests.RequestException as e:
                raise ValueError(f"Failed to fetch image from URL") from e
        if media_type not in VALID_MIME_TYPES:
>           raise ValueError(f"Unsupported image format: {media_type}")
E           ValueError: Unsupported image format: text/html

instructor/ ValueError
_________________________________________________________ test_image_from_various_urls[] _________________________________________________________

url = '', request = <FixtureRequest for <Function test_image_from_various_urls[]>>

    def test_image_from_various_urls(url, request):
        if url.startswith("base64"):
            url = request.getfixturevalue(url)
>       image = Image.from_url(url)

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'instructor.multimodal.Image'>, url = ''

    def from_url(cls, url: str) -> Image:
        if cls.is_base64(url):
            return cls.from_base64(url)
        parsed_url = urlparse(url)
        media_type, _ = mimetypes.guess_type(parsed_url.path)
        if not media_type:
            try:
                response = requests.head(url, allow_redirects=True)
                media_type = response.headers.get("Content-Type")
            except requests.RequestException as e:
                raise ValueError(f"Failed to fetch image from URL") from e
        if media_type not in VALID_MIME_TYPES:
>           raise ValueError(f"Unsupported image format: {media_type}")
E           ValueError: Unsupported image format: text/html

instructor/ ValueError
__________________________________________________________ test_image_autodetect[/path/to/image.webp-file-image/webp] __________________________________________________________

input_data = '/path/to/image.webp', expected_type = 'file', expected_media_type = 'image/webp'
request = <FixtureRequest for <Function test_image_autodetect[/path/to/image.webp-file-image/webp]>>

        "input_data, expected_type, expected_media_type",
            # URL tests
            ("", "url", "image/jpeg"),
            ("", "url", "image/png"),
            ("", "url", "image/webp"),
            ("", "url", "image/jpeg"),
            ),  # Default to JPEG if no extension
            # Base64 data URI tests
            # File path tests (mocked)
            ("/path/to/image.jpg", "file", "image/jpeg"),
            ("/path/to/image.png", "file", "image/png"),
            ("/path/to/image.webp", "file", "image/webp"),
    def test_image_autodetect(input_data, expected_type, expected_media_type, request):
        with (
            patch("pathlib.Path.is_file", return_value=True),
            patch("pathlib.Path.stat", return_value=MagicMock(st_size=1000)),
            patch("pathlib.Path.read_bytes", return_value=b"fake image data"),
            patch("requests.head") as mock_head,
            mock_head.return_value = MagicMock(
                headers={"Content-Type": expected_media_type}
            if input_data.startswith("base64"):
                input_data = request.getfixturevalue(input_data)
>           image = Image.autodetect(input_data)

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
instructor/ in autodetect
    return cls.from_path(source)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'instructor.multimodal.Image'>, path = PosixPath('/path/to/image.webp')

    def from_path(cls, path: Union[str, Path]) -> Image:  # noqa: UP007
        path = Path(path)
        if not path.is_file():
            raise FileNotFoundError(f"Image file not found: {path}")
        if path.stat().st_size == 0:
            raise ValueError("Image file is empty")
        media_type, _ = mimetypes.guess_type(str(path))
        if media_type not in VALID_MIME_TYPES:
>           raise ValueError(f"Unsupported image format: {media_type}")
E           ValueError: Unsupported image format: None

instructor/ ValueError
=============================================================================== warnings summary ===============================================================================
../../../../../../../../usr/lib/python3.12/site-packages/pydantic/_internal/ 11 warnings
  /usr/lib/python3.12/site-packages/pydantic/_internal/ PydanticDeprecatedSince20: `json_encoders` is deprecated. See for alternatives. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at

  /usr/lib/python3.12/site-packages/pydantic/_internal/ PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at
    warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning)

  /usr/lib/python3.12/site-packages/litellm/ DeprecationWarning: open_text is deprecated. Use files() instead. Refer to for migration advice.
    with resources.open_text("litellm.llms.tokenizers", "anthropic_tokenizer.json") as f:

  /home/amao/.cache/paru/clone/python-instructor/src/instructor-1.6.3/instructor/ DeprecationWarning: The FUNCTIONS mode is deprecated and will be removed in future versions

  tests/ PytestWarning: The test <Function test_incomplete_output_exception_raise[mock_completion0]> is marked with '@pytest.mark.asyncio' but it is not an async function. Please remove the asyncio mark. If the test is not marked explicitly, check for global marks applied via 'pytestmark'.
    @pytest.mark.asyncio  # type: ignore[misc]

  /home/amao/.cache/paru/clone/python-instructor/src/instructor-1.6.3/instructor/ DeprecationWarning: 'imghdr' is deprecated and slated for removal in Python 3.13
    import imghdr

  /home/amao/.cache/paru/clone/python-instructor/src/instructor-1.6.3/tests/ DeprecationWarning: apatch is deprecated, use patch instead

-- Docs:
=========================================================================== short test summary info ============================================================================
FAILED tests/ - ValueError: Unsupported image format: text/html
FAILED tests/[] - ValueError: Unsupported image format: text/html
FAILED tests/[/path/to/image.webp-file-image/webp] - ValueError: Unsupported image format: None
===================================================== 3 failed, 119 passed, 2 skipped, 69 deselected, 17 warnings in 3.90s =====================================================
8ar10der commented Nov 12, 2024

The URL used in the failing URL tests does not serve any webp content (the server returns text/html), so it should be replaced by another link.

For example, we could use Google's webp sample link:

I can try to fix it and open a PR if I find the time. If any developer can do a quick fix, please just do it and let me know :)
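
The third failure (`Unsupported image format: None` for `/path/to/image.webp`) looks separate from the URL problem. Here is a minimal sketch of one possible cause and workaround. It assumes that `mimetypes.guess_type` has no `.webp` mapping in this build environment, which may depend on the Python version and on whether a system `mime.types` file is present. This is not confirmed in this report and is not the project's own fix.

```python
import mimetypes

path = "/path/to/image.webp"

# Depending on the Python version and the system MIME database, this may be
# None (as in the log above) or "image/webp".
before, _ = mimetypes.guess_type(path)
print("before registration:", before)

# Assumption: registering the mapping explicitly makes the lookup
# independent of the environment.
mimetypes.add_type("image/webp", ".webp")

after, _ = mimetypes.guess_type(path)
print("after registration:", after)
```

If this assumption holds, a registration like this in the test setup (or in the package itself) might make the file-path test independent of the build environment.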

