Unable to query ai-embed-qa-4; uninformative error #30

Comments
This is a good suggestion; it was also mentioned in #26 for chat completion models.
@aishwaryap is it possible that one of the documents you are sending to FAISS is empty? The service rejects empty content.
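If empty content is a suspect, a quick client-side check can rule it out before the FAISS call. This is a hedged sketch: the helper name and the whitespace-only rule are illustrative assumptions, not part of the library or the service.

```python
def drop_empty_documents(texts):
    """Filter out empty or whitespace-only texts before sending them to an
    embedding service that rejects empty content. Returns the kept texts and
    the indices of the dropped ones, so you can trace offending documents."""
    kept, dropped = [], []
    for i, t in enumerate(texts):
        if t and t.strip():
            kept.append(t)
        else:
            dropped.append(i)
    return kept, dropped

docs = ["a real document", "   ", "", "another document"]
kept, dropped = drop_empty_documents(docs)
print(kept)     # ['a real document', 'another document']
print(dropped)  # [1, 2]
```

Logging the dropped indices makes it easy to map a rejection back to the original document rather than guessing which input the service refused.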
@mattf I'm reasonably sure they are not. I took a working example with the … That said, I can create a self-contained example and add it for testing.
Sample self-contained script:
My output (stderr + stdout):
Also verified using …
@aishwaryap thank you for the reproducer, it helped me narrow this down. I believe the issue is that some of the inputs are longer than the embedding model allows. In this case you can pass … This is not an issue with the … Does that resolve your issue?
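The exact truncation option is elided in the comment above, but the over-length inputs can also be handled client-side before embedding. A minimal sketch, assuming a per-input token limit and a rough characters-per-token heuristic (both values below are assumptions, not limits confirmed by the NVIDIA service):

```python
MAX_TOKENS = 512      # assumed per-input limit of the embedding model
CHARS_PER_TOKEN = 4   # rough heuristic; use a real tokenizer in practice

def truncate_for_embedding(texts, max_tokens=MAX_TOKENS):
    """Clip each text to an approximate token budget so no input exceeds
    the embedding model's maximum length."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    return [t[:max_chars] for t in texts]

docs = ["short doc", "x" * 10_000]
clipped = truncate_for_embedding(docs)
print([len(t) for t in clipped])  # [9, 2048]
```

Truncating silently loses content from long documents; chunking them with a text splitter before embedding is usually the better fix for a RAG pipeline.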
Hi @mattf, I just found this thread and I wanted to say that your suggestion worked... at least for the issue at … I get the following error:
@apolo74 please open this as a new issue; it appears unrelated to embedding and has an informative error.
Hi again @mattf, a couple of minutes ago I solved this... I was using a small model (microsoft/phi-3-mini-4k-instruct). There were no more errors the moment I switched to larger models, so the error was related to the size of the LLM.
@aishwaryap recent changes server-side should have fully resolved this. Please reopen this if you still have an issue.
I am trying to experiment with different embedding models in a RAG application, building off of the example here. It works fine when I create an NVIDIAEmbeddings object with model="nvolveqa_40k", but with model="ai-embed-qa-4" it fails at the vectorstore creation step, i.e. with the following uninformative error:
I had noticed that for generation models this error sometimes simply means that a newer package version is required, and I have filed an issue requesting more informative errors in that case. With this model, however, I get this error even with the latest version (0.0.9) and a newly generated API key.
If the model is not yet supported, can it be hidden from the output of available_models?
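The failing step can be reduced to a self-contained sketch. FakeEmbeddings below is a stand-in for NVIDIAEmbeddings (its 2048-character limit, error messages, and toy vectors are assumptions for illustration, not the service's real behavior), and build_vectorstore is a minimal analogue of FAISS.from_texts. It shows how a single rejected document surfaces as a failure of the whole vectorstore-creation call:

```python
class EmbeddingError(ValueError):
    pass

class FakeEmbeddings:
    """Stand-in for a remote embedding model that rejects some inputs.
    The limit is an assumed value, not the real service's."""
    MAX_CHARS = 2048

    def embed_documents(self, texts):
        vectors = []
        for i, t in enumerate(texts):
            if not t.strip():
                raise EmbeddingError(f"document {i} is empty")
            if len(t) > self.MAX_CHARS:
                raise EmbeddingError(
                    f"document {i} is {len(t)} chars, over the {self.MAX_CHARS} limit"
                )
            # toy deterministic vector; a real model returns learned embeddings
            vectors.append([float(len(t)), float(sum(map(ord, t)) % 97)])
        return vectors

def build_vectorstore(texts, embeddings):
    """Minimal analogue of FAISS.from_texts: embeds everything up front,
    so one bad document fails the entire construction."""
    return list(zip(texts, embeddings.embed_documents(texts)))

try:
    build_vectorstore(["ok doc", "x" * 5000], FakeEmbeddings())
except EmbeddingError as e:
    print(f"vectorstore creation failed: {e}")
```

Because the embedding happens in one batch during construction, an uninformative server error at this step gives no hint of which document was the problem; validating inputs first (empty check, length check) localizes the failure.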