Add mobileclip notebook (#1804)
eaidova authored Mar 12, 2024
1 parent 2de66b5 commit 3b0b55a
Showing 5 changed files with 977 additions and 1 deletion.
3 changes: 2 additions & 1 deletion .ci/ignore_pip_conflicts.txt
@@ -16,4 +16,5 @@ notebooks/272-paint-by-example/272-paint-by-example.ipynb # gradio==3.44.1
notebooks/273-stable-zephyr-3b-chatbot/273-stable-zephyr-3b-chatbot.ipynb # install requirements.txt after clone repo
notebooks/279-mobilevlm-language-assistant/279-mobilevlm-language-assistant.ipynb # transformers<4.35
notebooks/280-depth-anything/280-depth-anything.ipynb # install requirements.txt after clone repo
notebooks/285-surya-line-level-text-detection/285-surya-line-level-text-detection.ipynb # requires python >=3.9
notebooks/285-surya-line-level-text-detection/285-surya-line-level-text-detection.ipynb # requires python >=3.9
notebooks/289-mobileclip-video-search/289-mobileclip-video-search.ipynb # install requirements.txt inside
1 change: 1 addition & 0 deletions .ci/spellcheck/.pyspelling.wordlist.txt
@@ -402,6 +402,7 @@ MLLM
MLLMs
MMVLM
MLP
MobileCLIP
MobileLLaMA
mobilenet
MobileNet

Large diffs are not rendered by default.

31 changes: 31 additions & 0 deletions notebooks/289-mobileclip-video-search/README.md
@@ -0,0 +1,31 @@
# Visual Content Search using MobileCLIP and OpenVINO™
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/289-mobileclip-video-search/289-mobileclip-video-search.ipynb)

![example.png](https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/4e241f82-548e-41c2-b1f4-45b319d3e519)

Semantic visual content search is a machine learning task that uses either a text query or an input image to search a database of images (photo gallery, video) to find images that are semantically similar to the search query.
Historically, building a robust search engine for images was difficult. One could search by features such as file name and image metadata, and use any context around an image (e.g. alt text, or the surrounding text if the image appears in a passage) to enrich the search. This was before the advent of neural networks that can identify images semantically related to a given user query.

[Contrastive Language-Image Pre-Training (CLIP)](https://arxiv.org/abs/2103.00020) models provide the means to implement a semantic search engine with a few dozen lines of code. The CLIP model has been trained on hundreds of millions of text-image pairs, so it encodes images and text into a shared embedding space. Using CLIP, you can provide a text query and rank images by how closely their embeddings match the query, returning the images most related to it.
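
As a rough illustration of the ranking step behind such a search engine, the sketch below scores a few images against a query by cosine similarity and sorts them. It assumes the embeddings have already been produced by a CLIP-style encoder. The three-dimensional vectors, the file names, and the helper names `cosine_similarity` and `rank_images` are hypothetical and not taken from the notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = {
        name: cosine_similarity(query_embedding, embedding)
        for name, embedding in image_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings; a real encoder outputs hundreds of dimensions.
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
# Stand-in for the embedding of a text query such as "a photo of a dog".
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_images(query_embedding, image_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the text and image encoders share one embedding space, the same ranking works whether the query embedding comes from text or from another image.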

In this tutorial, we consider how to use [MobileCLIP](https://arxiv.org/pdf/2311.17049.pdf) to implement a visual content search engine that finds relevant frames in a video.

## Notebook Contents

This tutorial demonstrates step by step how to run the PyTorch MobileCLIP model with OpenVINO. It also provides an interactive user interface for searching a video for the frames most relevant to a text or image request.
The tutorial consists of the following steps:


- Select model
- Prepare PyTorch model
- Run PyTorch model inference
- Convert PyTorch model to OpenVINO IR
- Run model inference with OpenVINO
- Launch interactive demo
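
The notebook's own frame handling is not shown in this diff, so the following is only a sketch of how a frame search could map results back to moments in a video: keep every `stride`-th frame, score each kept frame against the query, and report the timestamps of the best matches. The frame rate, stride, scores, and the helper names `sampled_frame_timestamps` and `top_k_moments` are all hypothetical.

```python
import heapq


def sampled_frame_timestamps(total_frames, fps, stride):
    """Timestamps in seconds of every `stride`-th frame of a video."""
    return [index / fps for index in range(0, total_frames, stride)]


def top_k_moments(scores, timestamps, k):
    """Return the k (timestamp, score) pairs with the highest scores."""
    return heapq.nlargest(k, zip(timestamps, scores), key=lambda pair: pair[1])


# Hypothetical video: 150 frames at 30 fps, keeping one frame per second.
timestamps = sampled_frame_timestamps(total_frames=150, fps=30, stride=30)

# Hypothetical query-to-frame similarity scores, one per sampled frame.
scores = [0.12, 0.31, 0.87, 0.45, 0.29]

best = top_k_moments(scores, timestamps, k=2)
for timestamp, score in best:
    print(f"t={timestamp:.1f}s score={score:.2f}")
```

Sampling with a stride trades recall for speed: fewer frames need to be encoded, but short events that fall between sampled frames can be missed.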


## Installation Instructions

This is a self-contained example that relies solely on its own code.<br>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](../../README.md).
3 changes: 3 additions & 0 deletions selector/src/shared/notebook-tags.js
@@ -18,7 +18,10 @@ export const TASKS = /** @type {const} */ ({
TEXT_TO_AUDIO: 'Text-to-Audio',
AUDIO_TO_TEXT: 'Audio-to-Text',
VISUAL_QUESTION_ANSWERING: 'Visual Question Answering',
IMAGE_CAPTIONING: "Image Captioning",
FEATURE_EXTRACTION: 'Feature Extraction',
TEXT_TO_IMAGE_RETRIEVAL: "Text-to-Image Retrieval",
IMAGE_TO_TEXT_RETRIEVAL: "Image-to-Text Retrieval"
},
CV: {
IMAGE_CLASSIFICATION: 'Image Classification',