# Langchain Vision Tools

The Vision Tools library provides a set of tools for image analysis and recognition, leveraging several deep learning models. It includes deep image tagging with the DeepDanbooru model, image analysis with the CLIP Interrogator, and vision-based predictions with the GPT-4 Vision Preview model.

## Installation

Before using this library, please ensure you have:

- A supported CUDA version installed (for GPU support)
- Visual Studio
- An OpenAI API key

To use this library, install the required dependencies:

pip install openai langchain deepdanbooru pillow tensorflow huggingface_hub clip_interrogator python-dotenv

Be sure to set OPENAI_API_KEY in your environment:

export OPENAI_API_KEY='api_key'
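Because python-dotenv is among the dependencies, you can instead keep the key in a `.env` file and load it at runtime. The snippet below is a minimal sketch of that approach; the `.env` file location is an assumption.

```python
# A minimal sketch (assumption): load OPENAI_API_KEY from a .env file
# with python-dotenv instead of exporting it in the shell.
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from a .env file in the current directory

if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set")
```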

## Example Usage

import vision_tools as vt
# Example usage of the VisionTools library
# Create an instance of VisionTools with DeepDanbooru support
vision_tools = vt.VisionTools(is_deepdan=True)

# Image path for analysis
image_path = "path/to/your/image.jpg"

# GPT-4 Vision Preview prediction
prompt = "Describe the image"
gpt_result = vision_tools.gpt_v_predict(prompt, image_path)
print(f"GPT Result: {gpt_result}")

# CLIP Interrogator prediction
clip_result = vision_tools.clip_interrogator_predict(image_path)
print(f"CLIP Result: {clip_result}")

# DeepDanbooru prediction with a score threshold
score_threshold = 0.5
deepdan_result = vision_tools.deepdan_predict(image_path, score_threshold)
print(f"DeepDanbooru Result (Text): {deepdan_result}")

## Acknowledgements

- OpenAI for the GPT-4 Vision Preview model
- LangChain for the tools framework
- DeepDanbooru for the pre-trained model and labels
- TensorFlow for the deep learning framework
- Hugging Face for the CLIP model
- Python Imaging Library (PIL) for image processing
- python-dotenv for environment variable loading