feat: enhance multimodal support for images and audio in instructor #1212

Merged
jxnl merged 2 commits into main from doc-lint-2
Nov 23, 2024


Collaborator

@jxnl jxnl commented Nov 23, 2024

Important

Enhance multimodal support in instructor by improving image and audio handling, and updating content conversion functions for better compatibility with AI models.

  • Multimodal Enhancements:
    • Update Image class to support autodetection of image sources from URLs, file paths, and base64 data.
    • Add autodetect_safely() method to Image class for safe image detection.
    • Improve Audio class to handle audio from URLs and file paths, ensuring WAV format.
  • Content Conversion:
    • Enhance convert_contents() function to handle Image and Audio objects based on mode.
    • Update convert_messages() to support autodetection of images and conversion of content for different modes.
  • Miscellaneous:
    • Add caching to from_url() and from_path() methods in Image class.
    • Refactor url_to_base64() to cache base64 encoding of image URLs.
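
The autodetection and caching ideas listed above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this PR: `detect_source_kind` and `path_to_base64` are hypothetical helpers written for this example, and the real `Image` methods in instructor may classify sources differently.

```python
import base64
import binascii
from functools import lru_cache
from pathlib import Path
from urllib.parse import urlparse


def detect_source_kind(source: str) -> str:
    """Classify a string as 'url', 'path', 'base64', or 'unknown'.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if source.startswith("data:image/"):
        # Data URIs carry their base64 payload inline.
        return "base64"
    try:
        # Only reasonably short strings are checked against the filesystem.
        if len(source) < 4096 and Path(source).is_file():
            return "path"
    except (OSError, ValueError):
        pass
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(source, validate=True)
    except (binascii.Error, ValueError):
        return "unknown"
    return "base64" if source else "unknown"


@lru_cache(maxsize=128)
def path_to_base64(path: str) -> str:
    """Read a file and cache its base64 encoding, keyed by the path string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

One limitation of this sketch: any short string that happens to be valid base64 (for example a four-letter word) is reported as `base64`, so a real implementation would likely also inspect the decoded bytes for an image signature. The cache is keyed only by path, so a file that changes on disk would keep returning the stale encoding until the cache is cleared.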

This description was created by Ellipsis for f9d95c8. It will automatically update as commits are pushed.

@ellipsis-dev ellipsis-dev bot changed the title ... feat: enhance multimodal support for images and audio in instructor Nov 23, 2024

cloudflare-workers-and-pages bot commented Nov 23, 2024

Deploying instructor-py with Cloudflare Pages

Latest commit: f9d95c8
Status: ✅  Deploy successful!
Preview URL: https://606e7ac4.instructor-py.pages.dev
Branch Preview URL: https://doc-lint-2.instructor-py.pages.dev


@github-actions github-actions bot added the documentation (Improvements or additions to documentation), enhancement (New feature or request), and size:L (This PR changes 100-499 lines, ignoring generated files) labels Nov 23, 2024
Contributor

@ellipsis-dev ellipsis-dev bot left a comment


👍 Looks good to me! Reviewed everything up to f9d95c8 in 1 minute and 18 seconds

More details
  • Looked at 2772 lines of code in 65 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. instructor/multimodal.py:343
  • Draft comment:
    Consider using the | operator for type hinting instead of Union for consistency and readability.
    contents: str | dict[str, Any] | Image | Audio | list[str | dict[str, Any] | Image | Audio],
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The PR includes multiple instances where the Union type hint can be simplified using the | operator, which is more concise and modern. This change is consistent with the rest of the codebase and improves readability.
2. instructor/multimodal.py:213
  • Draft comment:
    Use consistent type hinting for the source attribute. Consider using Union[str, Path] or str | Path consistently throughout the code.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The code uses inconsistent type hinting for the source attribute in the Audio class. It uses str | Path in one place and Union[str, Path] in another. This inconsistency should be addressed for clarity and consistency.

Workflow ID: wflow_8Lx93v1vCi9sDFxO



@jxnl jxnl merged commit 068d183 into main Nov 23, 2024
7 of 15 checks passed
@jxnl jxnl deleted the doc-lint-2 branch November 23, 2024 14:15