- [Model Release] Jan 2023, released implementation of BLIP-2
Paper, Project Page,
A generic and efficient pre-training strategy that easily harvests development of pretrained vision models and large language models (LLMs) for vision-language pretraining. BLIP-2 beats Flamingo on zero-shot VQAv2 (65.0 vs 56.3), establishing new state-of-the-art on zero-shot captioning (on NoCaps 121.6 CIDEr score vs previous best 113.2). In addition, equipped with powerful LLMs (e.g. OPT, FlanT5), BLIP-2 also unlocks the new zero-shot instructed vision-to-language generation capabilities for various interesting applications!
Image-to-prompt is a Python deep learning model for generating prompt from image for text-to-image tasks.
- (Optional) Creating conda environment
conda create -n lavis python=3.8
conda activate lavis
- install from PyPI
pip install salesforce-lavis
- Or, for development, you may build from source
git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
pip install -e .
Model are in google drive, to view:
https://drive.google.com/file/d/1IGxTKDwQX5o3-C6ttZNJwt5j2Sbvjtk1/view?usp=share_link
In this example, we use the BLIP model to generate a prompt for the image. To make inference even easier, we also associate each
pre-trained model with its preprocessors (transforms), accessed via load_model_and_preprocess()
.
model_path are modified in
lavis/configs/models/blip2/blip2_caption_opt2.7b.yaml
finetuned: local_path
import torch
from lavis.models import load_model_and_preprocess
from PIL import Image
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, vis_processors, _ = load_model_and_preprocess(name="blip2_opt", model_type="caption_coco_opt2.7b", is_eval=True, device=device)
raw_image = Image.open("docs/_static/rooster.jpg").convert("RGB")
# preprocess the image
# vis_processors stores image transforms for "train" and "eval" (validation / testing / inference)
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
# generate caption
res = model.generate({"image": image})
print("res: {}".format(res))
#['rooster in oriental armor pattern, kung fu style, intricate, high resolution, art style, kirby, kirby art,']
If you have any questions, comments or suggestions, please do not hesitate to contact us at [email protected].