Multimodal retrieval with Amazon reviews dataset and LLVM reranking #1477

stefanwebb · 2024-12-16T01:21:41Z

This is a notebook I used for an upcoming AWS Open Source Developers YouTube video. It modifies an existing notebook to use only open-source models.

review-notebook-app · 2024-12-16T01:21:46Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

sre-ci-robot · 2024-12-16T01:21:47Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: stefanwebb

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

review-notebook-app · 2024-12-16T13:29:12Z

View / edit / reply to this conversation on ReviewNB

codingjaguar commented on 2024-12-16T13:29:11Z
----------------------------------------------------------------

This needs a better title; e.g., "Multimodal retrieval with Amazon reviews dataset and LLVM reranking" sounds better than the current one.

stefanwebb commented on 2024-12-20T21:06:29Z
----------------------------------------------------------------

Done!

review-notebook-app · 2024-12-16T13:29:13Z

codingjaguar commented on 2024-12-16T13:29:12Z
----------------------------------------------------------------

Please elaborate on what this notebook is trying to do, to attract the reader and lay out the background. A good opening is very important for retaining the reader's attention.

Good examples are:

https://milvus.io/docs/graph_rag_with_milvus.md

https://milvus.io/docs/contextual_retrieval_with_milvus.md

stefanwebb commented on 2024-12-20T21:06:42Z
----------------------------------------------------------------

Done!

review-notebook-app · 2024-12-16T13:29:14Z

codingjaguar commented on 2024-12-16T13:29:13Z
----------------------------------------------------------------

Does this notebook have a companion blog post? Why are there only clipped figures here and no overview diagram?

stefanwebb commented on 2024-12-20T21:07:38Z
----------------------------------------------------------------

This is from a slide in a presentation I gave. I'll include the entire figure with the irrelevant bits greyed out each time.

stefanwebb commented on 2024-12-20T21:26:13Z
----------------------------------------------------------------

Done!

review-notebook-app · 2024-12-16T13:29:14Z

codingjaguar commented on 2024-12-16T13:29:14Z
----------------------------------------------------------------

the product review text?

stefanwebb commented on 2024-12-20T21:08:05Z
----------------------------------------------------------------

I.e. the text from the customer reviews

stefanwebb commented on 2024-12-20T21:08:20Z
----------------------------------------------------------------

I'll make it clearer by rewording

stefanwebb commented on 2024-12-20T21:27:32Z
----------------------------------------------------------------

Done!

review-notebook-app · 2024-12-16T13:29:15Z

codingjaguar commented on 2024-12-16T13:29:15Z
----------------------------------------------------------------

What about adding a short description like "It can embed text and image information into the same latent space, thus enabling multimodal search."
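To illustrate what a shared latent space buys you, here is a minimal sketch. The three-dimensional vectors and file names are made up for the example and stand in for real model outputs; the point is only that a text query vector and image vectors can be compared directly with cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, image_vecs):
    """Return image names sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in image_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored]


# Hypothetical embeddings: a text query and two images in the same space.
text_query = [0.9, 0.1, 0.0]
image_vecs = {
    "leopard_case.jpg": [0.8, 0.2, 0.1],
    "plain_case.jpg": [0.1, 0.9, 0.2],
}

print(rank_by_similarity(text_query, image_vecs))
```

Because both modalities live in one space, the same ranking function works whether the query vector came from text or from an image.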

review-notebook-app · 2024-12-16T13:29:16Z

codingjaguar commented on 2024-12-16T13:29:15Z
----------------------------------------------------------------

nit, what about

easy methods -> convenient util functions

imgs -> img

stefanwebb commented on 2024-12-20T21:39:56Z
----------------------------------------------------------------

Done! I think img's seems more natural than img. (You can use 's to separate a word and an -s for plural)

review-notebook-app · 2024-12-16T13:29:17Z

codingjaguar commented on 2024-12-16T13:29:16Z
----------------------------------------------------------------

For each image in the downloaded dataset, we pass it through the embedding model to obtain the output vector. Embedding may take some time. For example, a MacBook Pro M3 embeds around nine images per second. Throughput is likely to be much higher on more powerful hardware such as an NVIDIA GPU.
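As a rough guide to what that throughput means in practice, the arithmetic can be sketched as below. The dataset size of 900 images is a placeholder chosen for the example, not the size of the actual dataset.

```python
def estimate_embedding_seconds(num_images: int, images_per_second: float) -> float:
    """Estimate wall-clock seconds needed to embed `num_images` images."""
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    return num_images / images_per_second


# 900 images (hypothetical) at roughly nine images per second.
print(estimate_embedding_seconds(900, 9.0))  # 100.0
```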

review-notebook-app · 2024-12-16T13:29:18Z

codingjaguar commented on 2024-12-16T13:29:17Z
----------------------------------------------------------------

To perform efficient similarity search over embeddings, we need to store them in a vector database. In this demo, we use Milvus Lite, a lightweight version of Milvus, a popular open-source vector database.

By setting uri to a file path, Milvus Lite persists all data to that local file.

review-notebook-app · 2024-12-16T13:29:19Z

codingjaguar commented on 2024-12-16T13:29:18Z
----------------------------------------------------------------

Nit: I feel this and the following 4 code blocks can be merged. The logic is straightforward.

review-notebook-app · 2024-12-16T13:29:19Z

codingjaguar commented on 2024-12-16T13:29:19Z
----------------------------------------------------------------

Nit: delete the collection before creating it? (In case people run the notebook repeatedly and insert duplicate data.)

https://zilliverse.feishu.cn/wiki/RrxgwEVooidRpEkH3pqcJMpTnA6#Gs2udxUMboOc0sxeGEJcHnF1nlc
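A minimal sketch of the delete-before-create pattern, assuming the notebook uses pymilvus's `MilvusClient`. The helper name `recreate_collection`, the collection name, the file path, and the dimension are placeholders for the example, not taken from the notebook.

```python
def recreate_collection(client, name: str, dimension: int) -> None:
    """Drop `name` if it already exists, then create it empty.

    `client` is expected to expose has_collection, drop_collection and
    create_collection, as pymilvus's MilvusClient does.
    """
    if client.has_collection(collection_name=name):
        client.drop_collection(collection_name=name)
    client.create_collection(collection_name=name, dimension=dimension, auto_id=True)


# Usage with Milvus Lite (assumes `pip install pymilvus`; values are placeholders):
#   from pymilvus import MilvusClient
#   client = MilvusClient("./milvus_demo.db")
#   recreate_collection(client, "multimodal_demo", dimension=768)
```

With this in place, rerunning the notebook starts from an empty collection each time instead of appending duplicates.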

review-notebook-app · 2024-12-16T13:29:20Z

codingjaguar commented on 2024-12-16T13:29:20Z
----------------------------------------------------------------

since we use auto id we can omit the id field
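For illustration, rows built for insertion could then look like the sketch below. The field names `vector` and `image_path` and the helper `build_rows` are assumptions made for the example; the notebook's actual schema may differ.

```python
def build_rows(vectors, image_paths):
    """Pair each vector with its image path.

    No "id" key is set: with auto id enabled, the database assigns one.
    """
    if len(vectors) != len(image_paths):
        raise ValueError("vectors and image_paths must have the same length")
    return [{"vector": list(v), "image_path": p} for v, p in zip(vectors, image_paths)]


rows = build_rows([[0.1, 0.2], [0.3, 0.4]], ["a.jpg", "b.jpg"])
print(rows)

# With pymilvus this would be passed along as (collection name is a placeholder):
#   client.insert(collection_name="multimodal_demo", data=rows)
```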

review-notebook-app · 2024-12-16T13:29:21Z

codingjaguar commented on 2024-12-16T13:29:21Z
----------------------------------------------------------------

Can we hide the lengthy output? (I heard there is a trick to do that, or you can clear the output.)
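One possible approach, which may or may not be the trick referred to here, is to redirect printed output while the noisy step runs. The sketch below uses only the standard library; `run_quietly` and `noisy_step` are hypothetical names for the example. In a notebook, the IPython `%%capture` cell magic is another option.

```python
import contextlib
import io


def run_quietly(func, *args, **kwargs):
    """Call `func` while discarding anything it writes to stdout or stderr."""
    sink = io.StringIO()
    with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
        return func(*args, **kwargs)


def noisy_step(n):
    """Stand-in for a cell that logs one line per inserted batch."""
    for i in range(n):
        print(f"inserted batch {i}")
    return n


count = run_quietly(noisy_step, 3)
print(count)
```

Note that this only silences output written through Python's `sys.stdout` and `sys.stderr`; output produced at a lower level by native libraries would still appear.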

review-notebook-app · 2024-12-16T13:29:22Z

codingjaguar commented on 2024-12-16T13:29:21Z
----------------------------------------------------------------

Now all the data has been ingested into the Milvus vector database. We are ready for multimodal search!

review-notebook-app · 2024-12-16T13:29:23Z

codingjaguar commented on 2024-12-16T13:29:22Z
----------------------------------------------------------------

Nit: why is the leopard squeezed horizontally? Maybe due to "display(combined_image.resize((512, 512)))"?

stefanwebb commented on 2024-12-20T21:50:51Z
----------------------------------------------------------------

I think because we have 3 rows but 4 columns... I'm going to keep it unchanged since changing it would involve rerunning everything
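For reference, the squeeze can be avoided by scaling both sides by the same factor instead of forcing a square. The sketch below assumes a combined image of 1200 x 900 pixels (4 columns by 3 rows of 300-pixel tiles), which is a made-up size for the example; `fit_within` is a hypothetical helper.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `max_side`, keeping the aspect ratio."""
    longest = max(width, height)
    return (round(width * max_side / longest), round(height * max_side / longest))


# A 4:3 grid stays 4:3 instead of being forced into 512 x 512.
print(fit_within(1200, 900, 512))  # (512, 384)

# In the notebook this could replace the fixed size, e.g.:
#   display(combined_image.resize(fit_within(*combined_image.size, 512)))
```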

review-notebook-app · 2024-12-16T13:29:24Z

codingjaguar commented on 2024-12-16T13:29:23Z
----------------------------------------------------------------

Nit: is it LLVM or VLM (vision language model)?

Also, it seems the following code isn't using "HuggingFace's Transformers library."

stefanwebb commented on 2024-12-20T21:52:59Z
----------------------------------------------------------------

Reworded to make it clearer that I'm using the phi_3_vision_mlx library

stefanwebb commented on 2024-12-20T21:53:26Z
----------------------------------------------------------------

I think LLVM is more common

review-notebook-app · 2024-12-16T13:29:25Z

codingjaguar commented on 2024-12-16T13:29:24Z
----------------------------------------------------------------

This part confuses me a little. Earlier the notebook was using Milvus Lite; why does it turn into a Zilliz Cloud commercial here?

I feel it's fine to focus on Milvus and list it as a blog post on milvus.io.

We need to anchor on one or the other and polish the marketing language here.

stefanwebb commented on 2024-12-20T21:54:18Z
----------------------------------------------------------------

That's a typo... Fixed now

review-notebook-app · 2024-12-16T13:29:26Z

codingjaguar commented on 2024-12-16T13:29:25Z
----------------------------------------------------------------

Ditto here: if we are targeting milvus.io, linking to a Milvus content aggregation page on zilliz.com is a little weird. (Linking to the Zilliz signup is fine, and that's what we want for driving traffic there.)

stefanwebb commented on 2024-12-20T21:57:16Z
----------------------------------------------------------------

Fixed

multimodal retrieval with Amazon reviews dataset

8e42db1

sre-ci-robot added the size/XL label Dec 16, 2024

renamed and formatted notebook

2f553b1

Jiang's suggested edits

5109009
